# Predictor APIs

You can deploy models from any Python framework by implementing Cortex's Predictor interface. The interface consists of an `init()` function and a `predict()` function. The `init()` function is responsible for preparing the model for serving (downloading vocabulary files, etc.). The `predict()` function is called on every request and is responsible for responding with a prediction.

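For orientation, here is a minimal sketch of the shape of such a file. The argument names (`model_path`, `sample`, `metadata`) are assumptions based on the configuration fields documented below, not a definitive spec:

```python
# predictor.py -- a sketch of the Predictor interface; the argument
# names here are assumptions, not a definitive spec
def init(model_path, metadata):
    # one-time setup: load the model, download vocabulary files, etc.
    pass

def predict(sample, metadata):
    # called on every request: run inference and return the prediction
    pass
```
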
In addition to supporting Python models via the Predictor interface, Cortex can serve the following exported model formats:

- [TensorFlow](tensorflow.md)
- [ONNX](onnx.md)

## Configuration

```yaml
- kind: api
  name: <string> # API name (required)
  endpoint: <string> # the endpoint for the API (default: /<deployment_name>/<api_name>)
  predictor:
    path: <string> # path to the predictor Python file, relative to the Cortex root (required)
    model: <string> # S3 path to a file or directory (e.g. s3://my-bucket/exported_model) (optional)
    python_path: <string> # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    metadata: <string: value> # dictionary that can be used to configure custom values (optional)
  tracker:
    key: <string> # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string> # model type, must be "classification" or "regression" (required)
  compute:
    min_replicas: <int> # minimum number of replicas (default: 1)
    max_replicas: <int> # maximum number of replicas (default: 100)
    init_replicas: <int> # initial number of replicas (default: <min_replicas>)
    target_cpu_utilization: <int> # CPU utilization threshold (as a percentage) to trigger scaling (default: 80)
    cpu: <string | int | float> # CPU request per replica (default: 200m)
    gpu: <int> # GPU request per replica (default: 0)
    mem: <string> # memory request per replica (default: Null)
```

### Example

```yaml
- kind: api
  name: my-api
  predictor:
    path: predictor.py
  compute:
    gpu: 1
```

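A matching `predictor.py` for this API might look like the following sketch. It assumes a TorchScript model file shipped alongside the project (the file name `model.pt` is hypothetical) and the `init()`/`predict()` argument names assumed above:

```python
# predictor.py -- an illustrative sketch for the API above; the model
# file name and the argument names are assumptions
import torch

model = None
device = "cuda" if torch.cuda.is_available() else "cpu"

def init(model_path, metadata):
    # no `model` field is set in the configuration above, so load a
    # local TorchScript file and move it to the GPU requested above
    global model
    model = torch.jit.load("model.pt", map_location=device)
    model.eval()

def predict(sample, metadata):
    # run inference on the request payload and return a JSON-serializable value
    with torch.no_grad():
        tensor = torch.tensor(sample["input"], dtype=torch.float32, device=device)
        return model(tensor).tolist()
```
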
## Debugging

You can log information about each request by adding a `?debug=true` parameter to your requests. This will print:

1. The raw sample
2. The value after running the `predict` function

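For example, using the `requests` library (the endpoint and payload below are placeholders, not values from this project):

```python
import requests

# placeholder endpoint (default form: /<deployment_name>/<api_name>) and
# payload; ?debug=true makes the API log the raw sample and the value
# returned by predict() for this request
response = requests.post(
    "https://<operator_endpoint>/my-deployment/my-api",
    params={"debug": "true"},
    json={"input": [1.0, 2.0, 3.0]},
)
print(response.text)
```
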
# Predictor

A Predictor is a Python file that describes how to initialize a model and use it to make a prediction.

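As a concrete illustration (a sketch only: the pickled scikit-learn model, the argument names, and the `labels` metadata key are all assumptions), a Predictor might look like:

```python
# predictor.py -- illustrative sketch for a pickled scikit-learn
# classifier; argument names and the `labels` metadata key are assumptions
import pickle

model = None

def init(model_path, metadata):
    # load the model once, before the API starts serving requests;
    # model_path is assumed to be a local copy of the `model` field
    # from the API configuration
    global model
    with open(model_path, "rb") as f:
        model = pickle.load(f)

def predict(sample, metadata):
    # run inference on the request payload and map the predicted class
    # index to a human-readable label via the metadata dictionary
    prediction = model.predict([sample["features"]])[0]
    return metadata["labels"][int(prediction)]
```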