
Commit fc4034d

caleb-kaiser authored and ospillinger committed
Update README.md (#514)
1 parent abdc0cd commit fc4034d

File tree

1 file changed: +64 −58 lines changed


README.md

Lines changed: 64 additions & 58 deletions
@@ -1,122 +1,128 @@
 # Deploy machine learning models in production
 
-Cortex is an open source machine learning deployment platform that makes it simple to deploy your machine learning models as web APIs on AWS. It combines TensorFlow Serving, ONNX Runtime, and Flask into a single tool that takes models from S3 and deploys them as web APIs. It also uses Docker and Kubernetes behind the scenes to autoscale, run rolling updates, and support CPU and GPU inference. The project is maintained by a venture-backed team of infrastructure engineers with backgrounds from Google, Illumio, and Berkeley.
-
-<br>
+Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command. <br>
 
 <!-- CORTEX_VERSION_MINOR x2 (e.g. www.cortex.dev/v/0.8/...) -->
 [install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
 
 <br>
 
 <!-- Set header Cache-Control=no-cache on the S3 object metadata (see https://help.github.com/en/articles/about-anonymized-image-urls) -->
-![Demo](https://cortex-public.s3-us-west-2.amazonaws.com/demo/gif/v0.8.gif)
+![Demo](https://cortex-public.s3-us-west-2.amazonaws.com/demo/gif/v0.8.gif)<br>
 
 <br>
 
-## Key features
-
-- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
-
-- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
-
-- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
-
-- **Rolling updates:** Cortex updates deployed APIs without any downtime.
-
-- **Log streaming:** Cortex streams logs from your deployed models to your CLI.
-
-- **Prediction monitoring:** Cortex can monitor network metrics and track predictions.
+## Quickstart
 
-- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
+<!-- CORTEX_VERSION_MINOR (e.g. www.cortex.dev/v/0.8/...) -->
+Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your AWS account before getting started.
 
 <br>
 
-## How it works
+### Step 1: Configure your deployment
 
-### Define your deployment using declarative configuration
+<!-- CORTEX_VERSION_MINOR -->
+Define a `deployment` and an `api` resource. A `deployment` specifies a set of APIs that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
 
 ```yaml
 # cortex.yaml
 
+- kind: deployment
+  name: text
+
 - kind: api
-  name: my-api
-  model: s3://my-bucket/my-model.onnx
+  name: generator
+  model: s3://cortex-examples/text-generator/gpt-2/124M
   request_handler: handler.py
-  compute:
-    gpu: 1
 ```
 
-### Customize request handling
+<br>
+
+### Step 2: Add request handling
+
+The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:
 
 ```python
 # handler.py
 
-# Load data for preprocessing or postprocessing. For example:
-labels = download_labels_from_s3()
+from encoder import get_encoder
+encoder = get_encoder()
 
 
 def pre_inference(sample, metadata):
-    # Python code
+    context = encoder.encode(sample["text"])
+    return {"context": [context]}
 
 
 def post_inference(prediction, metadata):
-    # Python code
+    response = prediction["sample"]
+    return encoder.decode(response)
 ```
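The two hooks wrap the model call: Cortex runs `pre_inference` on the request payload, sends the result to the model server, and runs `post_inference` on the prediction. The sketch below is a minimal local harness for sanity-checking that round trip; `test_handler.py` and `fake_model` are hypothetical stand-ins, not part of this commit or the Cortex API:

```python
# test_handler.py (hypothetical; not part of this commit)
# Exercise the handler hooks locally, with a fake model in place of
# the TensorFlow Serving call that Cortex makes between them.
from handler import pre_inference, post_inference


def fake_model(model_input):
    # Echo the encoded context back under the "sample" key
    # that post_inference reads from the prediction.
    return {"sample": model_input["context"][0]}


model_input = pre_inference({"text": "machine learning"}, metadata={})
prediction = fake_model(model_input)
print(post_inference(prediction, metadata={}))  # should print the original text back
```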
 
-### Deploy to AWS using the CLI
+<br>
+
+### Step 3: Deploy to AWS
+
+Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
 
 ```bash
 $ cortex deploy
 
-Deploying ...
-http://***.amazonaws.com/my-api # Your API is ready!
+deployment started
 ```
 
-### Serve real-time predictions via autoscaling JSON APIs running on AWS
+You can track the status of a deployment using `cortex get`. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
 
 ```bash
-$ curl http://***.amazonaws.com/my-api -d '{"a": 1, "b": 2, "c": 3}'
+$ cortex get generator --watch
+
+status   up-to-date   available   requested   last update   avg latency
+live     1            1           1           8s            123ms
 
-{ prediction: "def" }
+url: http://***.amazonaws.com/text/generator
 ```
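The `--watch` flag already refreshes this table in place; if you would rather script against the status, one rough option is to poll the same CLI command. A sketch, assuming only the `cortex get generator` invocation and the `live` status string shown above:

```python
# watch_deployment.py (hypothetical helper; not part of this commit)
# Poll `cortex get generator` until the API reports a live status.
import subprocess
import time

while True:
    result = subprocess.run(
        ["cortex", "get", "generator"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if "live" in result.stdout:  # status column from the sample output above
        break
    time.sleep(5)  # arbitrary poll interval
```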
 
 <br>
 
-## Installation
+### Step 4: Serve real-time predictions
 
-<!-- CORTEX_VERSION_README_MINOR -->
+Once you have your endpoint, you can make requests:
 
 ```bash
-# Download the install script
-$ curl -O https://raw.githubusercontent.com/cortexlabs/cortex/0.8/cortex.sh && chmod +x cortex.sh
+$ curl http://***.amazonaws.com/text/generator \
+    -X POST -H "Content-Type: application/json" \
+    -d '{"text": "machine learning"}'
 
-# Install the Cortex CLI on your machine: the CLI sends configuration and code to the Cortex cluster
-$ ./cortex.sh install cli
+Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
+```
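The same request can be sent from Python with the `requests` library. In this sketch the host stays masked just as it is in the diff, so substitute the `url` value reported by `cortex get`; `predict.py` is hypothetical and not part of this commit:

```python
# predict.py (hypothetical client; not part of this commit)
import requests

# Host is masked in the diff; use the url printed by `cortex get generator`.
url = "http://***.amazonaws.com/text/generator"

response = requests.post(url, json={"text": "machine learning"})
print(response.text)  # the generated text returned by the API
```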
 
-# Set your AWS credentials
-$ export AWS_ACCESS_KEY_ID=***
-$ export AWS_SECRET_ACCESS_KEY=***
+Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
 
-# Configure AWS instance settings
-$ export CORTEX_NODE_TYPE="m5.large"
-$ export CORTEX_NODES_MIN="2"
-$ export CORTEX_NODES_MAX="5"
+<br>
 
-# Install the Cortex cluster in your AWS account: the cluster is responsible for hosting your APIs
-$ ./cortex.sh install
-```
+## More examples
 
-<!-- CORTEX_VERSION_MINOR (e.g. www.cortex.dev/v/0.8/...) -->
-See [installation instructions](https://www.cortex.dev/cluster/install) for more details.
+<!-- CORTEX_VERSION_README_MINOR x3 -->
+- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)
+
+- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/master/examples/sentiment-analysis) with BERT
+
+- [Image classification](https://github.com/cortexlabs/cortex/tree/master/examples/image-classifier) with Inception v3 and AlexNet
 
 <br>
 
-## Examples
+## Key features
 
-<!-- CORTEX_VERSION_README_MINOR x3 -->
-- [Text generation](https://github.com/cortexlabs/cortex/tree/0.8/examples/text-generator) with GPT-2
+- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
+
+- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
+
+- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
+
+- **Rolling updates:** Cortex updates deployed APIs without any downtime.
+
+- **Log streaming:** Cortex streams logs from deployed models to your CLI.
 
-- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.8/examples/sentiment-analysis) with BERT
+- **Prediction monitoring:** Cortex monitors network metrics and tracks predictions.
 
-- [Image classification](https://github.com/cortexlabs/cortex/tree/0.8/examples/image-classifier) with Inception v3
+- **Minimal declarative configuration:** Deployments are defined in a single `cortex.yaml` file.
