Commit c61e568

Update versions of Cortex dependencies (#1886)

Parent: bb1e7d1

23 files changed: +174 −147 lines

build/build-image.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -30,8 +30,8 @@ fi
 build_args=""

 if [ "${image}" == "python-predictor-gpu" ]; then
-  cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1")
-  cudnn=("7" "7" "8" "7" "8" "8" "8")
+  cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1" "11.2")
+  cudnn=("7" "7" "8" "7" "8" "8" "8" "8")
   for i in ${!cudnn[@]}; do
     build_args="${build_args} --build-arg CUDA_VERSION=${cuda[$i]} --build-arg CUDNN=${cudnn[$i]}"
     docker build "$ROOT" -f $ROOT/images/$image/Dockerfile $build_args -t quay.io/cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]}
```
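For context, the paired `cuda`/`cudnn` arrays in this script (and in `build/push-image.sh` below) expand into one image tag per CUDA/cuDNN combination. A minimal sketch of that expansion, with `CORTEX_VERSION` as an illustrative placeholder (the real scripts take it from the build environment):

```bash
#!/usr/bin/env bash
# Sketch: print the image tags the build/push loops generate.
image="python-predictor-gpu"
CORTEX_VERSION="x.y.z"  # placeholder for this example

cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1" "11.2")
cudnn=("7" "7" "8" "7" "8" "8" "8" "8")

# Each index pairs cuda[i] with cudnn[i], e.g. CUDA 10.1 ships with both cuDNN 7 and 8.
for i in "${!cudnn[@]}"; do
  echo "quay.io/cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]}"
done
```

This prints eight tags, ending with `quay.io/cortexlabs/python-predictor-gpu:x.y.z-cuda11.2-cudnn8`, the combination added by this commit.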

build/push-image.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -24,8 +24,8 @@ image=$1
 echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin

 if [ "$image" == "python-predictor-gpu" ]; then
-  cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1")
-  cudnn=("7" "7" "8" "7" "8" "8" "8")
+  cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1" "11.2")
+  cudnn=("7" "7" "8" "7" "8" "8" "8" "8")
   for i in ${!cudnn[@]}; do
     docker push quay.io/cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]}
   done
```

dev/versions.md

Lines changed: 15 additions & 20 deletions
````diff
@@ -143,18 +143,21 @@ python versions in our pip dependencies (e.g. [tensorflow](https://pypi.org/proj
 ## TensorFlow / TensorFlow Serving

 1. Find the latest release on [GitHub](https://github.com/tensorflow/tensorflow/releases)
-1. Search the codebase for the current minor TensorFlow version (e.g. `2.3`) and update versions as appropriate
+1. Search the codebase for the current minor TensorFlow version (e.g. `2.4`) and update versions as appropriate
+1. Update the libnvinfer version in `images/tensorflow-serving-gpu/Dockerfile` as appropriate (https://www.tensorflow.org/install/gpu)

 Note: it's ok if example training notebooks aren't upgraded, as long as the exported model still works

 ## CUDA/cuDNN

-1. Search the codebase for the previous CUDA version and `cudnn`
+1. Search the codebase for the previous CUDA version and `cudnn`. Prefer the CUDA version that does not require a special pip command when installing PyTorch.

 ## ONNX runtime

 1. Update the version in `images/onnx-predictor-cpu/Dockerfile`
    and `images/onnx-predictor-gpu/Dockerfile` ([releases](https://github.com/microsoft/onnxruntime/releases))
+   * Use the appropriate CUDA/cuDNN version in `images/onnx-predictor-gpu/Dockerfile` ([docs](https://github.com/microsoft/onnxruntime/blob/master/BUILD.md#CUDA))
+   * Search the codebase for the previous version
 1. Search the codebase for the previous ONNX runtime version

 ## Nvidia device plugin
@@ -163,10 +166,11 @@ Note: it's ok if example training notebooks aren't upgraded, as long as the expo
    , [Dockerhub](https://hub.docker.com/r/nvidia/k8s-device-plugin))
 1. In the [GitHub Repo](https://github.com/NVIDIA/k8s-device-plugin), find the latest release and go to this file (
    replacing the version number): <https://github.com/NVIDIA/k8s-device-plugin/blob/v0.6.0/nvidia-device-plugin.yml>
-1. Copy the contents to `manager/manifests/nvidia.yaml`
+1. Copy the contents to `manager/manifests/nvidia_aws.yaml`
 1. Update the link at the top of the file to the URL you copied from
 1. Check that your diff is reasonable (and put back any of our modifications, e.g. the image path, rolling update
    strategy, resource requests, tolerations, node selector, priority class, etc)
+1. For `manager/manifests/nvidia_gcp.yaml`, follow the instructions [here](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers)
 1. Confirm GPUs work for PyTorch, TensorFlow, and ONNX models

 ## Inferentia device plugin
@@ -188,10 +192,10 @@ Note: it's ok if example training notebooks aren't upgraded, as long as the expo

 1. `docker run --rm -it amazonlinux:2`
 1. Run the `echo $'[neuron] ...' > /etc/yum.repos.d/neuron.repo` command
-   from [Dockerfile.neuron-rtd](https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-container-tools/docker-example/Dockerfile.neuron-rtd) (
-   it needs to be updated to work properly with the new lines)
-1. Run `yum info aws-neuron-tools` and `yum info aws-neuron-runtime` to check the versions that were installed, and use
-   those versions in `images/neuron-rtd/Dockerfile`
+   from [Dockerfile.neuron-rtd](https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-container-tools/docker-example/Dockerfile.neuron-rtd) (it needs to be updated to work properly with the new lines)
+   * e.g. `echo $'[neuron] \nname=Neuron YUM Repository \nbaseurl=https://yum.repos.neuron.amazonaws.com \nenabled=1' > /etc/yum.repos.d/neuron.repo`
+1. Run `yum info aws-neuron-tools`, `yum info aws-neuron-runtime`, and `yum info procps-ng` to check the versions
+   that were installed, and use those versions in `images/neuron-rtd/Dockerfile`
 1. Check if there are any updates
    to [Dockerfile.neuron-rtd](https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-container-tools/docker-example/Dockerfile.neuron-rtd)
    which should be brought in to `images/neuron-rtd/Dockerfile`
@@ -268,19 +272,10 @@ Note: it's ok if example training notebooks aren't upgraded, as long as the expo
 1. Find the latest release on [GitHub](https://github.com/kubernetes-incubator/metrics-server/releases) and check the
    changelog
 1. Update the version in `images/metrics-server/Dockerfile`
-1. In the [GitHub Repo](https://github.com/kubernetes-incubator/metrics-server), find the latest release and go to this
-   directory (replacing the version
-   number): <https://github.com/kubernetes-incubator/metrics-server/tree/v0.3.7/deploy/1.8+>
-1. Copy the contents of all of the files in that directory into `manager/manifests/metrics-server.yaml`
-1. Update this line of config:
-
-   ```yaml
-   image: $CORTEX_IMAGE_METRICS_SERVER
-   ```
-
-1. Update the link at the top of the file to the URL you copied from
-1. Check that your diff is reasonable (there may have been other modifications to the file which should be
-   preserved, like resource requests)
+1. Download the manifest referenced in the latest release's changelog
+1. Copy the contents of the manifest into `manager/manifests/metrics-server.yaml`
+1. Update accordingly (e.g. image, pull policy, resource requests)
+1. Check that your diff is reasonable
 1. You can confirm the metrics server is running by showing the logs of the metrics-server pod, or
    via `kubectl get deployment metrics-server -n kube-system`
    and `kubectl get apiservice v1beta1.metrics.k8s.io -o yaml`
````
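The verification commands at the end of the metrics-server section can be run together; a minimal sketch (resource names assume the default `kube-system` install, as in the manifest):

```bash
# Confirm the metrics-server deployment is up and the APIService is registered.
kubectl get deployment metrics-server -n kube-system
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

# Inspect the pod logs for errors (pod names vary; targeting the deployment avoids that).
kubectl logs -n kube-system deployment/metrics-server
```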

docs/workloads/batch/predictors.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -90,7 +90,7 @@ class PythonPredictor:

 ## TensorFlow Predictor

-**Uses TensorFlow version 2.3.0 by default**
+**Uses TensorFlow version 2.4.1 by default**

 ### Interface

@@ -151,7 +151,7 @@ If you need to share files between your predictor implementation and the TensorF

 ## ONNX Predictor

-**Uses ONNX Runtime version 1.4.0 by default**
+**Uses ONNX Runtime version 1.6.0 by default**

 ### Interface
```

docs/workloads/dependencies/images.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -20,6 +20,7 @@ Cortex's base Docker images are listed below. Depending on the Cortex Predictor
 * `quay.io/cortexlabs/python-predictor-gpu:master-cuda10.2-cudnn8`
 * `quay.io/cortexlabs/python-predictor-gpu:master-cuda11.0-cudnn8`
 * `quay.io/cortexlabs/python-predictor-gpu:master-cuda11.1-cudnn8`
+* `quay.io/cortexlabs/python-predictor-gpu:master-cuda11.2-cudnn8`
 * Python Predictor (Inferentia): `quay.io/cortexlabs/python-predictor-inf:master`
 * TensorFlow Predictor (CPU, GPU, Inferentia): `quay.io/cortexlabs/tensorflow-predictor:master`
 * ONNX Predictor (CPU): `quay.io/cortexlabs/onnx-predictor-cpu:master`
```

docs/workloads/realtime/predictors.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -131,7 +131,7 @@ Your `predictor` method can return different types of objects such as `JSON`-par

 ## TensorFlow Predictor

-**Uses TensorFlow version 2.3.0 by default**
+**Uses TensorFlow version 2.4.1 by default**

 ### Interface

@@ -203,7 +203,7 @@ If you need to share files between your predictor implementation and the TensorF

 ## ONNX Predictor

-**Uses ONNX Runtime version 1.4.0 by default**
+**Uses ONNX Runtime version 1.6.0 by default**

 ### Interface
```

images/inferentia/Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-FROM 790709498068.dkr.ecr.us-west-2.amazonaws.com/neuron-device-plugin:1.0.11000.0
+FROM 790709498068.dkr.ecr.us-west-2.amazonaws.com/neuron-device-plugin:1.4.1.0
```

images/manager/Dockerfile

Lines changed: 2 additions & 2 deletions
```diff
@@ -18,11 +18,11 @@ RUN apk add --no-cache bash curl gettext jq openssl
 RUN curl --location "https://github.com/weaveworks/eksctl/releases/download/0.36.2/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp && \
     mv /tmp/eksctl /usr/local/bin

-RUN curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.17.9/2020-08-04/bin/linux/amd64/aws-iam-authenticator && \
+RUN curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.18.9/2020-11-02/bin/linux/amd64/aws-iam-authenticator && \
     chmod +x ./aws-iam-authenticator && \
     mv ./aws-iam-authenticator /usr/local/bin/aws-iam-authenticator

-RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.19.0/bin/linux/amd64/kubectl && \
+RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.20.2/bin/linux/amd64/kubectl && \
     chmod +x ./kubectl && \
     mv ./kubectl /usr/local/bin/kubectl
```
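The version bumps above can be sanity-checked from inside the built image. A sketch, assuming the image is tagged `cortexlabs/manager` locally (the tag is an assumption for this example):

```bash
# Run the pinned CLIs inside the manager image and eyeball their versions.
# Expected: kubectl v1.20.2, aws-iam-authenticator from the 1.18.9 EKS release, eksctl 0.36.2.
docker run --rm --entrypoint /bin/bash cortexlabs/manager -c '
  kubectl version --client
  aws-iam-authenticator version
  eksctl version
'
```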

images/metrics-server/Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-FROM k8s.gcr.io/metrics-server/metrics-server:v0.3.7
+FROM k8s.gcr.io/metrics-server/metrics-server:v0.4.2
```

images/neuron-rtd/Dockerfile

Lines changed: 2 additions & 2 deletions
```diff
@@ -9,8 +9,8 @@ enabled=1' > /etc/yum.repos.d/neuron.repo
 RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

 RUN yum install -y \
-    aws-neuron-tools-1.0.11054.0 \
-    aws-neuron-runtime-1.0.9592.0 \
+    aws-neuron-tools-1.4.2.0 \
+    aws-neuron-runtime-1.4.3.0 \
     procps-ng-3.3.10-26.amzn2.x86_64 \
     gzip \
     tar \
```
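For reference, the `echo` command quoted in `dev/versions.md` above produces the repo file that this Dockerfile's `yum install` relies on; a reconstruction of that step and its output:

```bash
# From the dev/versions.md instructions: write the Neuron YUM repo definition.
echo $'[neuron] \nname=Neuron YUM Repository \nbaseurl=https://yum.repos.neuron.amazonaws.com \nenabled=1' > /etc/yum.repos.d/neuron.repo

# Resulting /etc/yum.repos.d/neuron.repo (the $'...' string leaves trailing spaces):
# [neuron]
# name=Neuron YUM Repository
# baseurl=https://yum.repos.neuron.amazonaws.com
# enabled=1

# Then `yum info aws-neuron-tools` etc. report the versions to pin in this Dockerfile.
```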
