<br>

-# Model serving at scale
+# Deploy, manage, and scale machine learning models in production

-Cortex is a platform for deploying, managing, and scaling machine learning in production.
+Cortex is a cloud-native model serving platform for machine learning engineering teams.

<br>

-## Key features
+## Use cases

-* Run realtime inference, batch inference, and training workloads.
-* Deploy TensorFlow, PyTorch, ONNX, and other models to production.
-* Scale to handle production workloads with server-side batching and request-based autoscaling.
-* Configure rolling updates and live model reloading to update APIs without downtime.
-* Serve models efficiently with multi-model caching and spot / preemptible instances.
-* Stream performance metrics and structured logs to any monitoring tool.
-* Perform A/B tests with configurable traffic splitting.
+* **Realtime machine learning** - build NLP, computer vision, and other APIs and integrate them into any application.
+* **Large-scale inference** - scale realtime or batch inference workloads across hundreds or thousands of instances.
+* **Consistent MLOps workflows** - create streamlined and reproducible MLOps workflows for any machine learning team.

<br>

-## How it works
+## Deploy

-### Implement a Predictor
+* Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client.
+* Run realtime inference, batch inference, asynchronous inference, and training jobs.
+* Define preprocessing and postprocessing steps in Python and chain workloads seamlessly (a minimal Predictor sketch and a sample `apis.yaml` follow below).

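+To make the Python side concrete, here is a minimal sketch of a Predictor. The class shape (`__init__` plus `predict`) follows Cortex's Python Predictor interface; the model choice and payload fields are illustrative assumptions.
+
+```python
+# predictor.py -- minimal Predictor sketch (illustrative).
+
+from transformers import pipeline
+
+
+class PythonPredictor:
+    def __init__(self, config):
+        # Runs once per replica at startup; `config` carries values from the API spec.
+        self.model = pipeline(task="text-generation")
+
+    def predict(self, payload):
+        # Preprocessing: extract the prompt from the request payload.
+        text = payload["text"]
+        # Postprocessing: return only the first generated sequence.
+        return self.model(text)[0]
+```
+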
-```python
-# predictor.py
+```text
+$ cortex deploy apis.yaml

-from transformers import pipeline
+• creating text-generator (realtime API)
+• creating image-classifier (batch API)
+• creating video-analyzer (async API)

-class PythonPredictor:
-    def __init__(self, config):
-        self.model = pipeline(task="text-generation")
-
-    def predict(self, payload):
-        return self.model(payload["text"])[0]
-```
-
-### Configure a realtime API
-
-```yaml
-# text_generator.yaml
-
-- name: text-generator
-  kind: RealtimeAPI
-  predictor:
-    type: python
-    path: predictor.py
-  compute:
-    gpu: 1
-    mem: 8Gi
-  autoscaling:
-    min_replicas: 1
-    max_replicas: 10
+all APIs are ready!
```
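+
+For illustration, the `apis.yaml` referenced above might look roughly like the following. The `RealtimeAPI` entry mirrors Cortex's API spec; the batch and async kind names and the file paths are assumptions for the sake of the example, so see the documentation for the full schema.
+
+```yaml
+# apis.yaml -- illustrative sketch, not a complete spec.
+
+- name: text-generator
+  kind: RealtimeAPI
+  predictor:
+    type: python
+    path: predictor.py
+  compute:
+    gpu: 1
+    mem: 8Gi
+  autoscaling:
+    min_replicas: 1
+    max_replicas: 10
+
+- name: image-classifier
+  kind: BatchAPI  # assumed kind name for batch workloads
+  predictor:
+    type: python
+    path: classifier.py  # hypothetical path
+
+- name: video-analyzer
+  kind: AsyncAPI  # assumed kind name for async workloads
+  predictor:
+    type: python
+    path: analyzer.py  # hypothetical path
+```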

-### Deploy
+## Manage

-```bash
-$ cortex deploy text_generator.yaml
+* Create A/B tests and shadow pipelines with configurable traffic splitting (a sample splitter is sketched after the snippet below).
+* Automatically stream logs from every workload to your favorite log management tool.
+* Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.

-# creating http://example.com/text-generator
+```text
+$ cortex get

+API               TYPE      GPUs
+text-generator    realtime  32
+image-classifier  batch     64
+video-analyzer    async     16
```
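+
+As a rough sketch, an A/B test or shadow pipeline could be configured by putting a traffic splitter in front of two versions of an API. The `kind` and field names below are assumptions for illustration; check the documentation for the exact spec.
+
+```yaml
+# traffic_splitter.yaml -- illustrative sketch; field names are assumptions.
+
+- name: text-generator
+  kind: TrafficSplitter
+  apis:
+    - name: text-generator-a
+      weight: 80
+    - name: text-generator-b
+      weight: 20
+```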

-### Serve prediction requests
+## Scale
+
+* Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
+* Create clusters with different types of instances for different types of workloads (a sample cluster configuration is sketched after the snippet below).
+* Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances.
+
+```text
+$ cortex cluster info

-```bash
-$ curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
+provider: aws
+region: us-east-1
+instance_types: [c5.xlarge, g4dn.xlarge]
+spot_instances: true
+min_instances: 10
+max_instances: 100
```
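+
+A cluster configuration mirroring the output above might look roughly like this, deployed with something like `cortex cluster up`; the key names are assumptions based on the fields that `cortex cluster info` reports, so consult the documentation for the real schema.
+
+```yaml
+# cluster.yaml -- illustrative sketch; key names mirror the `cortex cluster info`
+# output above and are assumptions, not the definitive schema.
+
+provider: aws
+region: us-east-1
+instance_types: [c5.xlarge, g4dn.xlarge]
+spot_instances: true
+min_instances: 10
+max_instances: 100
+```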