# Deploy machine learning models in production

Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command. <br>

<!-- CORTEX_VERSION_MINOR x2 (e.g. www.cortex.dev/v/0.8/...) -->
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)

<br>

<!-- Set header Cache-Control=no-cache on the S3 object metadata (see https://help.github.com/en/articles/about-anonymized-image-urls) -->
<br>

<br>

## Quickstart

<!-- CORTEX_VERSION_MINOR (e.g. www.cortex.dev/v/0.8/...) -->
Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your AWS account before getting started.
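
If you haven't installed Cortex yet, the commands below sketch the install flow from the version 0.8 install script; the [install docs](https://www.cortex.dev/install) are the authoritative reference, and the script version should match your Cortex release.

```bash
# Download the version-specific install script (0.8 shown here)
$ curl -O https://raw.githubusercontent.com/cortexlabs/cortex/0.8/cortex.sh && chmod +x cortex.sh

# Install the Cortex CLI on your machine
$ ./cortex.sh install cli

# Set your AWS credentials
$ export AWS_ACCESS_KEY_ID=***
$ export AWS_SECRET_ACCESS_KEY=***

# Install the Cortex cluster in your AWS account: the cluster is responsible for hosting your APIs
$ ./cortex.sh install
```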

<br>

### Step 1: Configure your deployment

<!-- CORTEX_VERSION_MINOR -->
Define a `deployment` and an `api` resource. A `deployment` specifies a set of APIs that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).

```yaml
# cortex.yaml

- kind: deployment
  name: text

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
```

<br>

### Step 2: Add request handling

The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:

```python
# handler.py

from encoder import get_encoder
encoder = get_encoder()


def pre_inference(sample, metadata):
    # encode the raw input text into the token IDs the model expects
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # decode the generated token IDs back into text
    response = prediction["sample"]
    return encoder.decode(response)
```

<br>

### Step 3: Deploy to AWS

Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.

```bash
$ cortex deploy

deployment started
```

You can track the status of a deployment using `cortex get`. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.

```bash
$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***.amazonaws.com/text/generator
```

<br>

### Step 4: Serve real-time predictions

Once you have your endpoint, you can make requests:

```bash
$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "machine learning"}'

Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
```
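
The same request can be made programmatically. Here is a minimal Python sketch (not part of the original walkthrough) using the `requests` library; the URL is the masked placeholder from above, so substitute the `url` printed by `cortex get`:

```python
import requests

# placeholder endpoint; replace with the url printed by `cortex get generator`
url = "http://***.amazonaws.com/text/generator"

# send a natural-language prompt to the API
response = requests.post(url, json={"text": "machine learning"})
response.raise_for_status()

# the API responds with the generated text
print(response.text)
```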

Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).

<br>

## More examples

<!-- CORTEX_VERSION_README_MINOR x3 -->
- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)

- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/master/examples/sentiment-analysis) with BERT

- [Image classification](https://github.com/cortexlabs/cortex/tree/master/examples/image-classifier) with Inception v3 and AlexNet

<br>

## Key features

- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.

- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.

- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure (see the configuration sketch after this list).

- **Rolling updates:** Cortex updates deployed APIs without any downtime.

- **Log streaming:** Cortex streams logs from deployed models to your CLI.

- **Prediction monitoring:** Cortex monitors network metrics and tracks predictions.

- **Minimal declarative configuration:** Deployments are defined in a single `cortex.yaml` file.
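
As an illustration of the CPU / GPU and declarative configuration features above, here is a sketch of how an `api` might request a GPU in `cortex.yaml`. The `compute` block is taken from an earlier revision of this README and the exact fields may vary by version, so check the docs for your release:

```yaml
# cortex.yaml (sketch; compute fields are version-dependent)

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
  compute:
    gpu: 1  # schedule this API onto GPU infrastructure
```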