Commit c2f5d62 (parent 7454fd8)

ospillinger authored and deliahu committed

Update docs (#212)

File tree: 11 files changed, +24 −181 lines

docs/apis/apis.md

Lines changed: 2 additions & 16 deletions
````diff
@@ -1,13 +1,13 @@
 # APIs
 
-Serve models at scale and use them to build smarter applications.
+Serve models at scale.
 
 ## Config
 
 ```yaml
 - kind: api
   name: <string>  # API name (required)
-  model: <string>  # path to a zipped model dir (e.g. s3://my-bucket/model.zip)
+  model: <string>  # path to an exported model (e.g. s3://my-bucket/model.zip)
   model_format: <string>  # model format, must be "tensorflow" or "onnx"
   request_handler: <string>  # path to the request handler implementation file, relative to the cortex root
   compute:
@@ -40,17 +40,3 @@ See [packaging models](packaging-models.md) for how to create the zipped model.
 Request handlers are used to decouple the interface of an API endpoint from its model. A `pre_inference` request handler can be used to modify request payloads before they are sent to the model. A `post_inference` request handler can be used to modify model predictions in the server before they are sent to the client.
 
 See [request handlers](request-handlers.md) for a detailed guide.
-
-## Integration
-
-APIs can be integrated into other applications or services via their JSON endpoints. The endpoint for any API follows the format: {apis_endpoint}/{deployment_name}/{api_name}.
-
-The fields in the request payload for a particular API should match the raw columns that were used to train the model that it is serving. Cortex automatically applies the same transformers that were used at training time when responding to prediction requests.
-
-## Horizontal Scalability
-
-APIs can be configured using `replicas` in the `compute` field. Replicas can be used to change the amount of computing resources allocated to service prediction requests for a particular API. APIs that have low request volumes should have a small number of replicas while APIs that handle large request volumes should have more replicas.
-
-## Rolling Updates
-
-When the model that an API is serving gets updated, Cortex will update the API with the new model without any downtime.
````
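The request handler behavior described in the hunk above (`pre_inference` modifying request payloads before they reach the model, `post_inference` modifying predictions before they reach the client) can be sketched in Python. The function names come from the docs, but the signatures, payload shapes, and module path here are assumptions for illustration, not Cortex's exact interface.

```python
# Hypothetical request handler module (e.g. handlers/iris.py).
# The signatures and payload shapes below are assumed for illustration;
# consult the request-handlers docs for the real interface.

def pre_inference(payload):
    # Reshape/rename request fields into the input the model expects
    return {"input": [payload["sepal_length"], payload["sepal_width"]]}

def post_inference(prediction):
    # Map the raw model output (assumed to be class scores) to a
    # friendlier response for the client
    labels = ["setosa", "versicolor", "virginica"]
    scores = prediction["scores"]
    return {"label": labels[scores.index(max(scores))]}
```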

docs/apis/compute.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -1,11 +1,11 @@
 # Compute
 
-Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/).
+Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).
 
 For example:
 
 ```yaml
-- kind: model
+- kind: api
   ...
   compute:
     cpu: "2"
````
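The hunk above only shows `cpu`; a fuller compute block might look like the sketch below. The `replicas` field is mentioned elsewhere in these docs (the removed Horizontal Scalability section says replicas are configured in the `compute` field), while the `mem` and `gpu` field names are assumptions, not a verified schema.

```yaml
# Hypothetical compute block; mem and gpu field names are assumptions
- kind: api
  name: my_api
  compute:
    replicas: 2   # number of replicas serving this API
    cpu: "2"      # Kubernetes-style CPU request
    mem: 4G       # assumed field name for memory
    gpu: 1        # assumed field name for GPU count
```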

docs/apis/deployment.md

Lines changed: 0 additions & 17 deletions
This file was deleted.

docs/apis/deployments.md

Lines changed: 17 additions & 0 deletions
````diff
@@ -0,0 +1,17 @@
+# Deployments
+
+Deployments are used to group a set of resources that can be deployed as a single unit. A deployment must be defined in every Cortex directory in a top-level `cortex.yaml` file.
+
+## Config
+
+```yaml
+- kind: deployment
+  name: <string>  # deployment name (required)
+```
+
+## Example
+
+```yaml
+- kind: deployment
+  name: my_deployment
+```
````
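To illustrate the grouping that deployments provide, here is a hypothetical top-level `cortex.yaml` combining a `deployment` with an `api` from the same docs; all names and the S3 path are made up for illustration.

```yaml
# Hypothetical cortex.yaml; names and the S3 path are illustrative only
- kind: deployment
  name: iris

- kind: api
  name: classifier
  model: s3://my-bucket/model.zip
  model_format: tensorflow
```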
File renamed without changes.

docs/pipelines/apis.md

Lines changed: 0 additions & 47 deletions
This file was deleted.

docs/pipelines/compute.md

Lines changed: 0 additions & 32 deletions
This file was deleted.

docs/pipelines/deployment.md

Lines changed: 0 additions & 17 deletions
This file was deleted.

docs/pipelines/packaging-models.md

Lines changed: 0 additions & 26 deletions
This file was deleted.

docs/pipelines/statuses.md

Lines changed: 0 additions & 17 deletions
````diff
@@ -14,20 +14,3 @@
 | upstream error       | Resource was not created due to an error in one of its dependencies |
 | upstream termination | Resource was not created because one of its dependencies was terminated |
 | compute unavailable  | Resource's workload could not start due to insufficient memory, CPU, or GPU in the cluster |
-
-## API statuses
-
-| Status               | Meaning |
-|----------------------|---------|
-| ready                | API is deployed and ready to serve prediction requests |
-| pending              | API is waiting for another resource to be ready, or is initializing |
-| updating             | API is performing a rolling update |
-| update pending       | API will be updated when the new model is ready; a previous version of this API is ready |
-| stopping             | API is stopping |
-| stopped              | API is stopped |
-| error                | API was not created due to an error; run `cortex logs -v <name>` to view the logs |
-| skipped              | API was not created due to an error in another resource |
-| update skipped       | API was not updated due to an error in another resource; a previous version of this API is ready |
-| upstream error       | API was not created due to an error in one of its dependencies; a previous version of this API may be ready |
-| upstream termination | API was not created because one of its dependencies was terminated; a previous version of this API may be ready |
-| compute unavailable  | API could not start due to insufficient memory, CPU, or GPU in the cluster; some replicas may be ready |
````
