Commit 61e04eb

Update run_git.rst, run_python.rst, and 4 more files...
1 parent 073a661 commit 61e04eb

6 files changed, +155 -85 lines changed

docs/source/user_guide/jobs/run_git.rst

Lines changed: 25 additions & 5 deletions
@@ -3,12 +3,18 @@ Run Code from Git Repo

 The :py:class:`~ads.jobs.GitPythonRuntime` allows you to run source code from a Git repository as a job.

+.. include:: ../jobs/toc_local.rst
+
+PyTorch Example
+===============
+
 The following example shows how to run a
 `PyTorch Neural Network Example to train third order polynomial predicting y=sin(x)
 <https://github.com/pytorch/tutorials/blob/master/beginner_source/examples_nn/polynomial_nn.py>`_.

 .. include:: ../jobs/tabs/git_runtime.rst

+
 Git Repository
 ==============

@@ -38,7 +44,7 @@ Entrypoint
 The entrypoint specifies how the source code is invoked.
 The :py:meth:`~ads.jobs.GitPythonRuntime.with_entrypoint` supports the following arguments:

-* ``path``: Required. The relative path for the script, module, or file to start the job.
+* ``path``: Required. The relative path of the script/module from the root of the Git repository.
 * ``func``: Optional. The function in the script specified by ``path`` to call.
   If you don't specify it, then the script specified by ``path`` is run as a Python script in a subprocess.
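
The two invocation modes described for ``path`` and ``func`` can be illustrated with a minimal, standard-library-only sketch. This is not the ADS driver code: ``run_entrypoint`` and its parameters are hypothetical names, and importing the file as a module when ``func`` is given is an assumption about how a function entrypoint could be called.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def run_entrypoint(repo_root, path, func=None, args=()):
    """Invoke `path` (relative to `repo_root`), optionally calling `func`.

    Hypothetical helper for illustration only; not part of ADS.
    """
    script = Path(repo_root) / path
    if func is None:
        # No function given: run the file as a Python script in a subprocess.
        completed = subprocess.run(
            [sys.executable, str(script), *args],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout
    # Function given (assumed behavior): import the file and call the function.
    spec = importlib.util.spec_from_file_location(script.stem, script)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func)(*args)
```

In both modes the path is joined to the repository root, which matches the requirement that ``path`` is relative to the root of the Git repository.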

@@ -60,6 +66,18 @@ The arguments can be strings, ``list`` of strings or ``dict`` containing only st
 Arguments are not used when the entrypoint is a notebook.


+Working Directory
+=================
+
+By default, the working directory is the root of the Git repository.
+You can change it by calling :py:meth:`~ads.jobs.GitPythonRuntime.with_working_dir`
+using a relative path from the root of the Git repository.
+
+Note that the entrypoint should always be specified as a relative path from the root of the Git repository,
+regardless of the working directory.
+The Python paths and output directory should be specified relative to the working directory.
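
To make the two reference points concrete, here is a small sketch of how the paths would resolve under the rules stated above: the entrypoint relative to the repository root, and the Python paths and output directory relative to the working directory. ``resolve_job_paths`` and the ``/repo`` checkout location are hypothetical; this only illustrates the rules and is not how ADS computes paths internally.

```python
from pathlib import PurePosixPath


def resolve_job_paths(repo_root, entrypoint, working_dir=".",
                      python_paths=(), output_dir=None):
    """Resolve job paths following the documented rules (illustration only)."""
    root = PurePosixPath(repo_root)
    # The working directory is relative to the repository root.
    cwd = root / working_dir
    return {
        # The entrypoint is always relative to the repository root.
        "entrypoint": str(root / entrypoint),
        "working_dir": str(cwd),
        # The working directory itself is added to the Python paths;
        # extra paths are relative to the working directory.
        "python_paths": [str(cwd)] + [str(cwd / p) for p in python_paths],
        # The output directory is relative to the working directory.
        "output_dir": None if output_dir is None else str(cwd / output_dir),
    }
```

With ``working_dir="word_language_model"`` and ``output_dir="."``, the output directory resolves to the working directory itself, while the entrypoint still needs the ``word_language_model/`` prefix.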
+
+
 Python Paths
 ============

@@ -68,17 +86,19 @@ The working directory is added to the Python paths automatically.
 You can call :py:meth:`~ads.jobs.GitPythonRuntime.with_python_path` to add additional Python paths as needed.
 The paths should be relative paths from the working directory.

+
 Outputs
 =======

-The :py:meth:`~ads.jobs.GitPythonRuntime.with_output` method allows you to specify the output path ``output_path``
+The :py:meth:`~ads.jobs.GitPythonRuntime.with_output` method allows you to specify the output path ``output_dir``
 in the job run and a remote URI (``output_uri``).
-Files in the ``output_path`` are copied to the remote output URI after the job run finishes successfully.
-Note that the ``output_path`` should be a path relative to the working directory.
+Files in the ``output_dir`` are copied to the remote output URI after the job run finishes successfully.
+Note that the ``output_dir`` should be a path relative to the working directory.

 OCI object storage location can be specified in the format of ``oci://bucket_name@namespace/path/to/dir``.
 Please make sure you configure the IAM policy to allow the job run dynamic group to use object storage.
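
As an illustration of the ``oci://bucket_name@namespace/path/to/dir`` format, the following sketch splits such a URI into its three parts using only the Python standard library. ``parse_oci_uri`` is a hypothetical helper written for this example; ADS may validate or parse the URI differently.

```python
from urllib.parse import urlsplit


def parse_oci_uri(uri):
    """Split oci://bucket_name@namespace/path into (bucket, namespace, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "oci" or "@" not in parts.netloc:
        raise ValueError(f"Expected oci://bucket_name@namespace/path, got: {uri}")
    # The part before "@" is the bucket; the part after it is the namespace.
    bucket, _, namespace = parts.netloc.partition("@")
    # Drop the leading slash so the path reads as an object prefix.
    return bucket, namespace, parts.path.lstrip("/")
```

A quick check like this before submitting a job can catch a swapped bucket and namespace early, since the bucket name comes first in this format.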

+
 Metadata
 ========
 The :py:class:`~ads.jobs.GitPythonRuntime` updates metadata as free-form tags of the job run
@@ -93,6 +113,6 @@ after the job run finishes. The following tags are added automatically:
 The new values overwrite any existing tags.
 If you want to skip the metadata update, set ``skip_metadata_update`` to ``True`` when initializing the runtime:

-.. code-block:: python3
+.. code-block:: python

   runtime = GitPythonRuntime(skip_metadata_update=True)

docs/source/user_guide/jobs/run_python.rst

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ as described in :doc:`infra_and_runtime`. This section shows the additional enha

 .. include:: ../jobs/toc_local.rst

+Example
+=======
+
 Here is an example to define and run a job using :py:class:`~ads.jobs.PythonRuntime`:

 .. include:: ../jobs/tabs/python_runtime.rst

docs/source/user_guide/jobs/run_script.rst

Lines changed: 3 additions & 0 deletions
@@ -15,6 +15,9 @@ Here is an example:

 .. include:: ../jobs/tabs/script_runtime.rst

+An `example script <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/master/jobs/shell/shell-with-args.sh>`_
+is available in the `Data Science AI Sample GitHub Repository <https://github.com/oracle-samples/oci-data-science-ai-samples>`_.
+
 Working Directory
 =================

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+.. tabs::
+
+  .. code-tab:: python
+    :caption: Python
+
+    from ads.jobs import Job, DataScienceJob, GitPythonRuntime
+
+    job = (
+        Job(name="Training RNN with PyTorch")
+        .with_infrastructure(
+            DataScienceJob()
+            .with_log_group_id("<log_group_ocid>")
+            .with_log_id("<log_ocid>")
+            .with_shape_name("VM.GPU3.1")
+            # The following infrastructure configurations are optional
+            # if you are in an OCI data science notebook session.
+            # The configurations of the notebook session will be used as defaults.
+            .with_compartment_id("<compartment_ocid>")
+            .with_project_id("<project_ocid>")
+            # Default block storage size is 50GB
+            .with_block_storage_size(50)
+        )
+        .with_runtime(
+            GitPythonRuntime(skip_metadata_update=True)
+            # Use service conda pack
+            .with_service_conda("pytorch110_p38_gpu_v1")
+            # Specify training source code from GitHub
+            .with_source(url="https://github.com/pytorch/examples.git", branch="main")
+            # Entrypoint is a relative path from the root of the Git repository
+            .with_entrypoint("word_language_model/main.py")
+            # Pass the arguments as: "--epochs 5 --save model.pt --cuda"
+            .with_argument(epochs=5, save="model.pt", cuda=None)
+            # Set working directory, which will also be added to PYTHONPATH
+            .with_working_dir("word_language_model")
+            # Save the output to OCI object storage
+            # output_dir is relative to working directory
+            .with_output(output_dir=".", output_uri="oci://bucket@namespace/prefix")
+        )
+    )
+
+  .. code-tab:: yaml
+    :caption: YAML
+
+    kind: job
+    spec:
+      name: "My Job"
+      infrastructure:
+        kind: infrastructure
+        type: dataScienceJob
+        spec:
+          blockStorageSize: 50
+          compartmentId: <compartment_ocid>
+          jobInfrastructureType: STANDALONE
+          jobType: DEFAULT
+          logGroupId: <log_group_ocid>
+          logId: <log_ocid>
+          projectId: <project_ocid>
+          shapeConfigDetails:
+            memoryInGBs: 16
+            ocpus: 1
+          shapeName: VM.Standard.E3.Flex
+          subnetId: <subnet_ocid>
+      runtime:
+        kind: runtime
+        type: gitPython
+        spec:
+          args:
+          - --epochs
+          - '5'
+          - --save
+          - model.pt
+          - --cuda
+          branch: main
+          conda:
+            slug: pytorch110_p38_gpu_v1
+            type: service
+          entrypoint: word_language_model/main.py
+          outputDir: .
+          outputUri: oci://bucket@namespace/prefix
+          skipMetadataUpdate: true
+          url: https://github.com/pytorch/examples.git
+          workingDir: word_language_model
+
+
+.. code-block:: python
+
+  # Create the job on OCI Data Science
+  job.create()
+  # Start a job run
+  run = job.run()
+  # Stream the job run outputs
+  run.watch()
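
The example's comment states that ``with_argument(epochs=5, save="model.pt", cuda=None)`` is passed as ``--epochs 5 --save model.pt --cuda``, and the YAML ``args`` list shows the same values as strings. The sketch below reproduces that mapping in plain Python. ``kwargs_to_cli_args`` is a hypothetical helper that only mirrors the behavior shown in the example; it is not the ADS implementation.

```python
def kwargs_to_cli_args(**kwargs):
    """Convert keyword arguments to a flat list of command-line arguments.

    Each key becomes "--key"; a value of None yields a flag with no value.
    """
    args = []
    for key, value in kwargs.items():
        args.append(f"--{key}")
        if value is not None:
            # Values are passed as strings, as in the YAML `args` list.
            args.append(str(value))
    return args


cli_args = kwargs_to_cli_args(epochs=5, save="model.pt", cuda=None)
```

Here ``cli_args`` holds the same five items as the ``args`` list in the YAML tab, in the order the keyword arguments were given.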

docs/source/user_guide/jobs/tabs/training_mnist.rst

Lines changed: 0 additions & 78 deletions
This file was deleted.

docs/source/user_guide/model_training/training_with_oci.rst

Lines changed: 32 additions & 2 deletions
@@ -7,9 +7,39 @@ enables you to define and run repeatable machine learning tasks on a fully manag
 You can have compute resources on demand and run applications that perform tasks such as
 data preparation, model training, hyperparameter tuning, and batch inference.

-Here is an example for training MNIST model with PyTorch using source code directly from GitHub.
+Here is an example of training an RNN on `Word-level Language Modeling <https://github.com/pytorch/examples/tree/main/word_language_model>`_,
+using the source code directly from GitHub.

-.. include:: ../jobs/tabs/training_mnist.rst
+.. include:: ../jobs/tabs/training_job.rst
+
+The job run will:
+
+* Set up the PyTorch conda environment
+* Fetch the source code from GitHub
+* Run the training script with the specified arguments
+* Save the outputs to OCI object storage
+
+The following are example outputs of the job run:
+
+.. code-block:: text
+
+  2023-02-27 20:26:36 - Job Run ACCEPTED
+  2023-02-27 20:27:05 - Job Run ACCEPTED, Infrastructure provisioning.
+  2023-02-27 20:28:27 - Job Run ACCEPTED, Infrastructure provisioned.
+  2023-02-27 20:28:53 - Job Run ACCEPTED, Job run bootstrap starting.
+  2023-02-27 20:33:05 - Job Run ACCEPTED, Job run bootstrap complete. Artifact execution starting.
+  2023-02-27 20:33:08 - Job Run IN_PROGRESS, Job run artifact execution in progress.
+  2023-02-27 20:33:31 - | epoch 1 | 200/ 2983 batches | lr 20.00 | ms/batch 8.41 | loss 7.63 | ppl 2064.78
+  2023-02-27 20:33:32 - | epoch 1 | 400/ 2983 batches | lr 20.00 | ms/batch 8.23 | loss 6.86 | ppl 949.18
+  2023-02-27 20:33:34 - | epoch 1 | 600/ 2983 batches | lr 20.00 | ms/batch 8.21 | loss 6.47 | ppl 643.12
+  2023-02-27 20:33:36 - | epoch 1 | 800/ 2983 batches | lr 20.00 | ms/batch 8.22 | loss 6.29 | ppl 537.11
+  2023-02-27 20:33:37 - | epoch 1 | 1000/ 2983 batches | lr 20.00 | ms/batch 8.22 | loss 6.14 | ppl 462.61
+  2023-02-27 20:33:39 - | epoch 1 | 1200/ 2983 batches | lr 20.00 | ms/batch 8.21 | loss 6.05 | ppl 425.85
+  ...
+  2023-02-27 20:35:41 - =========================================================================================
+  2023-02-27 20:35:41 - | End of training | test loss 4.96 | test ppl 142.94
+  2023-02-27 20:35:41 - =========================================================================================
+  ...
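
If you want to track training progress from the streamed logs, the per-batch lines shown above can be parsed with a regular expression. The sketch below is one possible approach and assumes the line layout matches the sample output; ``BATCH_LINE`` and ``parse_batch_line`` are hypothetical names, not part of ADS or the PyTorch example.

```python
import re

# Matches lines such as:
# "... | epoch 1 | 200/ 2983 batches | lr 20.00 | ms/batch 8.41 | loss 7.63 | ppl 2064.78"
BATCH_LINE = re.compile(
    r"\|\s*epoch\s+(?P<epoch>\d+)\s*\|\s*(?P<batch>\d+)/\s*(?P<total>\d+)\s+batches"
    r"\s*\|\s*lr\s+(?P<lr>[\d.]+)\s*\|\s*ms/batch\s+(?P<ms>[\d.]+)"
    r"\s*\|\s*loss\s+(?P<loss>[\d.]+)\s*\|\s*ppl\s+(?P<ppl>[\d.]+)"
)


def parse_batch_line(line):
    """Return the metrics in a per-batch log line, or None if it does not match."""
    match = BATCH_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "batch": int(fields["batch"]),
        "total_batches": int(fields["total"]),
        "lr": float(fields["lr"]),
        "ms_per_batch": float(fields["ms"]),
        "loss": float(fields["loss"]),
        "ppl": float(fields["ppl"]),
    }
```

Lines that do not carry batch metrics, such as the job lifecycle messages, return ``None`` and can simply be skipped.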

 For more details, see:
