
Commit 9c2c79a

Update docs (#1754)
1 parent 1224d45 commit 9c2c79a

14 files changed: +57 -481 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ Cortex is an open source platform for large-scale inference workloads.
 
 ## Model serving infrastructure
 
-* Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs.
+* Supports deploying TensorFlow, PyTorch, and other models as realtime or batch APIs.
 * Ensures high availability with availability zones and automated instance restarts.
 * Runs inference on on-demand instances or spot instances with on-demand backups.
 * Autoscales to handle production workloads with support for overprovisioning.
@@ -98,13 +98,13 @@ import cortex
 cx = cortex.client("aws")
 cx.create_api(api_spec, predictor=PythonPredictor, requirements=requirements)
 
-# creating https://example.com/text-generator
+# creating http://example.com/text-generator
 ```
 
 #### Consume your API
 
 ```bash
-$ curl https://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
+$ curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
 ```
 
 <br>
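
The README's "Consume your API" step uses curl; for completeness, here is a minimal Python sketch of the same request (the endpoint URL and payload are the README's placeholder values, and the `requests` package is assumed to be installed):

```python
# Minimal sketch: send the same request as the curl example above.
# "http://example.com/text-generator" is the README's placeholder endpoint.
import requests

response = requests.post(
    "http://example.com/text-generator",
    json={"text": "hello world"},  # same JSON payload as the curl example
)
print(response.status_code, response.text)
```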

docs/clusters/aws/gpu.md

Lines changed: 0 additions & 33 deletions
This file was deleted.

docs/clusters/aws/inferentia.md

Lines changed: 0 additions & 75 deletions
This file was deleted.

docs/clusters/aws/install.md

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 1. [Docker](https://docs.docker.com/install)
 1. Subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) (for GPU clusters)
 1. An IAM user with `AdministratorAccess` and programmatic access (see [security](security.md) if you'd like to use less privileged credentials after spinning up your cluster)
+1. You may need to [request a limit increase](https://console.aws.amazon.com/servicequotas/home?#!/services/ec2/quotas) for your desired instance type
 
 ## Spin up Cortex on your AWS account
 
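The new prerequisite points at the Service Quotas console; as a rough illustration, the same limit increase can also be requested programmatically with boto3. This is a hedged sketch: the region and quota code are placeholders rather than values from the docs, and valid AWS credentials are assumed.

```python
# Hedged sketch: request an EC2 limit increase via the Service Quotas API.
# Replace the placeholder QuotaCode with the code for your instance family,
# which you can look up in the console linked above.
import boto3

client = boto3.client("service-quotas", region_name="us-west-2")  # region is an assumption

response = client.request_service_quota_increase(
    ServiceCode="ec2",
    QuotaCode="L-XXXXXXXX",  # placeholder quota code for your instance family
    DesiredValue=8.0,        # example target limit
)
print(response["RequestedQuota"]["Status"])
```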

docs/clusters/aws/spot.md

Lines changed: 1 addition & 3 deletions
@@ -1,15 +1,13 @@
 # Spot instances
 
-[Spot instances](https://aws.amazon.com/ec2/spot) are spare capacity that AWS sells at a discount (up to 90%). The caveat is that spot instances may not always be available, and can be recalled by AWS at anytime. Cortex allows you to use spot instances in your cluster to take advantage of the discount while ensuring uptime and reliability of APIs. You can configure your cluster to use spot instances using the configuration below:
-
 ```yaml
 # cluster.yaml
 
 # whether to use spot instances in the cluster (default: false)
 spot: false
 
 spot_config:
-  # additional instances with identical or better specs than the primary instance type (defaults to only the primary instance)
+  # additional instance types with identical or better specs than the primary cluster instance type (defaults to only the primary instance type)
   instance_distribution: # [similar_instance_type_1, similar_instance_type_2]
 
   # minimum number of on demand instances (default: 0)
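
To make `instance_distribution` concrete, here is a hedged sketch that builds the same spot configuration as a Python dict and writes it to `cluster.yaml`. The instance types are illustrative placeholders, PyYAML is assumed to be installed, and generating the file from Python is only a convenience, not part of the Cortex workflow.

```python
# Hedged sketch: generate a cluster.yaml that enables spot instances.
# Only fields shown in the snippet above are used; the instance types are
# illustrative placeholders, not recommendations.
import yaml  # PyYAML, assumed to be installed

cluster_config = {
    "spot": True,  # enable spot instances in the cluster
    "spot_config": {
        # instance types with identical or better specs than the primary instance type
        "instance_distribution": ["m5.large", "m5a.large"],
    },
}

with open("cluster.yaml", "w") as f:
    yaml.safe_dump(cluster_config, f, sort_keys=False)
```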

docs/summary.md

Lines changed: 2 additions & 3 deletions
@@ -24,14 +24,15 @@
 * Multi-model
   * [Example](workloads/multi-model/example.md)
   * [Configuration](workloads/multi-model/configuration.md)
+  * [Caching](workloads/multi-model/caching.md)
 * Traffic Splitter
   * [Example](workloads/traffic-splitter/example.md)
   * [Configuration](workloads/traffic-splitter/configuration.md)
 * Managing dependencies
   * [Example](workloads/dependencies/example.md)
   * [Python packages](workloads/dependencies/python-packages.md)
   * [System packages](workloads/dependencies/system-packages.md)
-  * [Docker images](workloads/dependencies/docker-images.md)
+  * [Custom images](workloads/dependencies/images.md)
 
 ## Clusters
 
@@ -40,8 +41,6 @@
 * [Update](clusters/aws/update.md)
 * [Security](clusters/aws/security.md)
 * [Spot instances](clusters/aws/spot.md)
-* [GPUs](clusters/aws/gpu.md)
-* [Inferentia](clusters/aws/inferentia.md)
 * [Networking](clusters/aws/networking.md)
 * [VPC peering](clusters/aws/vpc-peering.md)
 * [Custom domain](clusters/aws/custom-domain.md)

docs/workloads/batch/configuration.md

Lines changed: 11 additions & 11 deletions
@@ -18,10 +18,10 @@
 endpoint: <string> # the endpoint for the API (default: <api_name>)
 api_gateway: public | none # whether to create a public API Gateway endpoint for this API (if not, the API will still be accessible via the load balancer) (default: public, unless disabled cluster-wide)
 compute:
-  cpu: <string | int | float> # CPU request per worker, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
-  gpu: <int> # GPU request per worker (default: 0)
-  inf: <int> # Inferentia ASIC request per worker (default: 0)
-  mem: <string> # memory request per worker, e.g. 200Mi or 1Gi (default: Null)
+  cpu: <string | int | float> # CPU request per worker. One unit of CPU corresponds to one virtual CPU; fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (default: 200m)
+  gpu: <int> # GPU request per worker. One unit of GPU corresponds to one virtual GPU (default: 0)
+  inf: <int> # Inferentia request per worker. One unit corresponds to one Inferentia ASIC with 4 NeuronCores and 8GB of cache memory. Each process will have one NeuronCore Group with (4 * inf / processes_per_replica) NeuronCores, so your model should be compiled to run on (4 * inf / processes_per_replica) NeuronCores. (default: 0) (aws only)
+  mem: <string> # memory request per worker. One unit of memory is one byte and can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of two counterparts: Ki, Mi, Gi, Ti) (default: Null)
 ```
 
 ## TensorFlow Predictor
@@ -54,10 +54,10 @@
 endpoint: <string> # the endpoint for the API (default: <api_name>)
 api_gateway: public | none # whether to create a public API Gateway endpoint for this API (if not, the API will still be accessible via the load balancer) (default: public, unless disabled cluster-wide)
 compute:
-  cpu: <string | int | float> # CPU request per worker, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
-  gpu: <int> # GPU request per worker (default: 0)
-  inf: <int> # Inferentia ASIC request per worker (default: 0)
-  mem: <string> # memory request per worker, e.g. 200Mi or 1Gi (default: Null)
+  cpu: <string | int | float> # CPU request per worker. One unit of CPU corresponds to one virtual CPU; fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (default: 200m)
+  gpu: <int> # GPU request per worker. One unit of GPU corresponds to one virtual GPU (default: 0)
+  inf: <int> # Inferentia request per worker. One unit corresponds to one Inferentia ASIC with 4 NeuronCores and 8GB of cache memory. Each process will have one NeuronCore Group with (4 * inf / processes_per_replica) NeuronCores, so your model should be compiled to run on (4 * inf / processes_per_replica) NeuronCores. (default: 0) (aws only)
+  mem: <string> # memory request per worker. One unit of memory is one byte and can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of two counterparts: Ki, Mi, Gi, Ti) (default: Null)
 ```
 
 ## ONNX Predictor
@@ -84,7 +84,7 @@
 endpoint: <string> # the endpoint for the API (default: <api_name>)
 api_gateway: public | none # whether to create a public API Gateway endpoint for this API (if not, the API will still be accessible via the load balancer) (default: public, unless disabled cluster-wide)
 compute:
-  cpu: <string | int | float> # CPU request per worker, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
-  gpu: <int> # GPU request per worker (default: 0)
-  mem: <string> # memory request per worker, e.g. 200Mi or 1Gi (default: Null)
+  cpu: <string | int | float> # CPU request per worker. One unit of CPU corresponds to one virtual CPU; fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (default: 200m)
+  gpu: <int> # GPU request per worker. One unit of GPU corresponds to one virtual GPU (default: 0)
+  mem: <string> # memory request per worker. One unit of memory is one byte and can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of two counterparts: Ki, Mi, Gi, Ti) (default: Null)
 ```
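
To connect these compute fields back to the Python client call shown in the README diff, here is a hedged sketch of an `api_spec` that uses the documented units. The top-level keys and all values are illustrative assumptions; only the `cpu`, `gpu`, and `mem` fields and their unit conventions come from the reference above.

```python
# Hedged sketch: a batch api_spec whose compute section uses the units
# documented above. Names and values are illustrative placeholders.
api_spec = {
    "name": "image-classifier",  # hypothetical API name
    "kind": "BatchAPI",          # assumed kind label for a batch workload
    "compute": {
        "cpu": "200m",  # fractional CPU request via the "m" suffix
        "gpu": 1,       # one virtual GPU per worker
        "mem": "1Gi",   # power-of-two memory suffix (Gi)
    },
}

# As in the README example, the spec could then be deployed with:
# cx.create_api(api_spec, predictor=PythonPredictor, requirements=requirements)
```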
