Commit 5985af2

Author: Ziqun Ye
Merge branch 'develop' into ODSC-39392/triton
2 parents 350f0ea + 230cde6, commit 5985af2

File tree: 12 files changed (+262 / -44 lines)

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
name: Feature Request
description: Feature and enhancement proposals in oracle-ads library
title: "[FR]: "
labels: [Task, Backlog]
assignees:
  - octocat
body:
  - type: markdown
    attributes:
      value: |
        Before proceeding, please review the [Contributing to this repository](https://github.com/oracle/accelerated-data-science/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/oracle/.github/blob/main/CODE_OF_CONDUCT.md).

        ---

        Thank you for submitting a feature request.
  - type: dropdown
    id: contribution
    attributes:
      label: Willingness to contribute
      description: Would you or another member of your organization be willing to contribute an implementation of this feature?
      options:
        - Yes. I can contribute this feature independently.
        - Yes. I would be willing to contribute this feature with guidance from the oracle-ads team.
        - No. I cannot contribute this feature at this time.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Proposal Summary
      description: |
        In a few sentences, provide a clear, high-level description of the feature request.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Motivation
      description: |
        - What is the use case for this feature?
        - Why is this use case valuable to support for OCI DataScience users in general?
        - Why is this use case valuable to support for your project(s) or organization?
        - Why is it currently difficult to achieve this use case?
      value: |
        > #### What is the use case for this feature?

        > #### Why is this use case valuable to support for OCI DataScience users in general?

        > #### Why is this use case valuable to support for your project(s) or organization?

        > #### Why is it currently difficult to achieve this use case?
    validations:
      required: true
  - type: textarea
    attributes:
      label: Details
      description: |
        Use this section to include any additional information about the feature. If you have a proposal for how to implement this feature, please include it here. For implementation guidelines, please refer to the [Contributing to this repository](https://github.com/oracle/accelerated-data-science/blob/main/CONTRIBUTING.md).
    validations:
      required: false
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
name: "[DO NOT TRIGGER] Publish to PyPI"

# To run this workflow manually from the Actions tab
on: workflow_dispatch

jobs:
  build-n-publish:
    name: Build and publish Python 🐍 distribution 📦 to PyPI
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      - name: Build distribution 📦
        run: |
          pip install wheel
          make dist
      - name: Validate
        run: |
          pip install dist/*.whl
          python -c "import ads;"
      ## To publish to Test PyPI, a secret holding a token (GH_ADS_TESTPYPI_TOKEN) needs to be added;
      ## it was removed after the initial test. The project name also needs to be updated in
      ## setup.py - setup(name="test_oracle_ads", ...) - because the regular name is occupied by a
      ## former developer and can't be used for testing.
      # - name: Publish distribution 📦 to Test PyPI
      #   env:
      #     TWINE_USERNAME: __token__
      #     TWINE_PASSWORD: ${{ secrets.GH_ADS_TESTPYPI_TOKEN }}
      #   run: |
      #     pip install twine
      #     twine upload -r testpypi dist/* -u $TWINE_USERNAME -p $TWINE_PASSWORD
      - name: Publish distribution 📦 to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.GH_ADS_PYPI_TOKEN }}
        run: |
          pip install twine
          twine upload dist/* -u $TWINE_USERNAME -p $TWINE_PASSWORD
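
For reference, a minimal local sanity check that mirrors the workflow's "Build distribution" and "Validate" steps could look like the sketch below. It is an illustration only; it assumes it is run from the repository root and that `make dist` writes a wheel into ./dist, as the workflow does.

    # Hypothetical local equivalent of the build-and-validate steps above.
    import glob
    import subprocess
    import sys

    subprocess.run([sys.executable, "-m", "pip", "install", "wheel"], check=True)
    subprocess.run(["make", "dist"], check=True)  # builds dist/*.whl

    wheel = sorted(glob.glob("dist/*.whl"))[-1]   # pick the newest wheel
    subprocess.run([sys.executable, "-m", "pip", "install", wheel], check=True)
    subprocess.run([sys.executable, "-c", "import ads"], check=True)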

ads/database/connection.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def __init__(
             The local database information store, default to ~/.database unless specified otherwise.
         kwargs: dict, optional
             Name-value pairs that are to be added to the list of connection parameters.
-            For example, database_name="mydb", database_type="oracle", username = "root", password = "pwd".
+            For example, database_name="mydb", database_type="oracle", username = "root", password = "example-password".

         Returns
         -------

ads/model/generic_model.py

Lines changed: 38 additions & 10 deletions
@@ -789,6 +789,7 @@ def prepare(
         ignore_pending_changes: bool = True,
         max_col_num: int = DATA_SCHEMA_MAX_COL_NUM,
         ignore_conda_error: bool = False,
+        score_py_uri: str = None,
         **kwargs: Dict,
     ) -> "GenericModel":
         """Prepare and save the score.py, serialized model and runtime.yaml file.
@@ -841,6 +842,10 @@
                 number of features(columns).
             ignore_conda_error: (bool, optional). Defaults to False.
                 Parameter to ignore error when collecting conda information.
+            score_py_uri: (str, optional). Defaults to None.
+                The URI of a customized score.py, which can be a local path or an OCI Object Storage URI.
+                When this attribute is provided, the `score.py` will not be auto generated, and the
+                provided `score.py` will be added to the artifact_dir.
             kwargs:
                 impute_values: (dict, optional).
                     The dictionary where the key is the column index(or names is accepted
@@ -1001,13 +1006,22 @@
             jinja_template_filename = (
                 "score-pkl" if self._serialize else "score_generic"
             )
-            self.model_artifact.prepare_score_py(
-                jinja_template_filename=jinja_template_filename,
-                model_file_name=self.model_file_name,
-                data_deserializer=self.model_input_serializer.name,
-                model_serializer=self.model_save_serializer.name,
-                **{**kwargs, **self._score_args},
-            )
+
+            if score_py_uri:
+                utils.copy_file(
+                    uri_src=score_py_uri,
+                    uri_dst=os.path.join(self.artifact_dir, "score.py"),
+                    force_overwrite=force_overwrite,
+                    auth=self.auth
+                )
+            else:
+                self.model_artifact.prepare_score_py(
+                    jinja_template_filename=jinja_template_filename,
+                    model_file_name=self.model_file_name,
+                    data_deserializer=self.model_input_serializer.name,
+                    model_serializer=self.model_save_serializer.name,
+                    **{**kwargs, **self._score_args},
+                )

             self._summary_status.update_status(
                 detail="Generated score.py", status=ModelState.DONE.value
@@ -2483,6 +2497,7 @@ def predict(
         self,
         data: Any = None,
         auto_serialize_data: bool = False,
+        local: bool = False,
         **kwargs,
     ) -> Dict[str, Any]:
         """Returns prediction of input data run against the model deployment endpoint.
@@ -2507,6 +2522,8 @@
                 Whether to auto serialize input data. Defaults to `False` for GenericModel, and `True` for other frameworks.
                 `data` required to be json serializable if `auto_serialize_data=False`.
                 If `auto_serialize_data` set to True, data will be serialized before sending to model deployment endpoint.
+            local: bool.
+                Whether to invoke the prediction locally. Defaults to False.
             kwargs:
                 content_type: str, used to indicate the media type of the resource.
                 image: PIL.Image Object or uri for the image.
@@ -2525,10 +2542,21 @@
         NotActiveDeploymentError
             If model deployment process was not started or not finished yet.
         ValueError
-            If `data` is empty or not JSON serializable.
+            If the model is not deployed yet or the endpoint information is not available.
         """
-        if not self.model_deployment:
-            raise ValueError("Use `deploy()` method to start model deployment.")
+        if local:
+            return self.verify(
+                data=data, auto_serialize_data=auto_serialize_data, **kwargs
+            )
+
+        if not (self.model_deployment and self.model_deployment.url):
+            raise ValueError(
+                "Error invoking the remote endpoint as the model is not "
+                "deployed yet or the endpoint information is not available. "
+                "Use `deploy()` method to start model deployment. "
+                "If you intend to invoke inference using locally available "
+                "model artifact, set parameter `local=True`"
+            )

         current_state = self.model_deployment.state.name.upper()
         if current_state != ModelDeploymentState.ACTIVE.name:
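
Taken together, these changes let a caller supply a custom score.py at prepare time and run inference against the local artifact without a deployment. A minimal usage sketch follows; it is illustrative only, and `my_estimator`, `sample_payload`, and the score.py path are placeholders rather than names from this commit.

    from ads.model.generic_model import GenericModel

    # my_estimator and sample_payload are placeholders defined elsewhere.
    model = GenericModel(estimator=my_estimator, artifact_dir="./model_artifact")

    # Supply a hand-written score.py instead of the auto-generated template;
    # score_py_uri may be a local path or an OCI Object Storage URI.
    model.prepare(
        inference_conda_env="generalml_p38_cpu_v1",
        force_overwrite=True,
        score_py_uri="./my_score.py",
    )

    # local=True routes the call through verify(), so no deployment is needed.
    print(model.predict(data=sample_payload, local=True))

    # Remote inference still requires deploy(); otherwise the new ValueError
    # explains how to fall back to local=True.
    model.deploy()
    print(model.predict(data=sample_payload))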

ads/opctl/cmds.py

Lines changed: 2 additions & 2 deletions
@@ -141,7 +141,7 @@ def _save_yaml(yaml_content, **kwargs):
     yaml_content : str
         YAML content as string.
     """
-    if kwargs["job_info"]:
+    if kwargs.get("job_info"):
         yaml_path = os.path.abspath(os.path.expanduser(kwargs["job_info"]))
         if os.path.isfile(yaml_path):
             overwrite = input(
@@ -210,7 +210,7 @@ def run(config: Dict, **kwargs) -> Dict:
                 "backend operator for distributed training can either be local or job"
             )
         else:
-            if not kwargs["dry_run"]:
+            if not kwargs["dry_run"] and not kwargs["nobuild"]:
                 verify_and_publish_image(kwargs["nopush"], config)
             print("running image: " + config["spec"]["cluster"]["spec"]["image"])
         cluster_def = YamlSpecParser.parse_content(config)

ads/opctl/config/resolver.py

Lines changed: 5 additions & 1 deletion
@@ -155,7 +155,11 @@ def _resolve_source_folder_path(self) -> None:
     def _resolve_entry_script(self) -> None:
         # this should be run after _resolve_source_folder_path
         if not self._is_ads_operator():
-            if os.path.splitext(self.config["execution"]["entrypoint"])[1] == ".ipynb":
+            if (
+                self.config["execution"].get("entrypoint")
+                and os.path.splitext(self.config["execution"]["entrypoint"])[1]
+                == ".ipynb"
+            ):
                 input_path = os.path.join(
                     self.config["execution"]["source_folder"],
                     self.config["execution"]["entrypoint"],
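
Both opctl changes guard optional keys the same way: indexing a missing key raises KeyError, while dict.get returns None so the branch is simply skipped. A small illustration of the pattern, using an invented execution config:

    import os

    # Invented config that omits "entrypoint", as a CLI invocation might.
    config = {"execution": {"source_folder": "./src"}}

    # config["execution"]["entrypoint"] would raise KeyError here;
    # .get() returns None and the check short-circuits instead.
    entrypoint = config["execution"].get("entrypoint")
    if entrypoint and os.path.splitext(entrypoint)[1] == ".ipynb":
        print("converting notebook entrypoint:", entrypoint)
    else:
        print("no notebook entrypoint to convert")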

ads/templates/score_pytorch.jinja2

Lines changed: 16 additions & 1 deletion
@@ -10,8 +10,21 @@ import pandas as pd
 from io import BytesIO
 import base64
 import logging
+from random import randint
+
+
+def get_torch_device():
+    num_devices = torch.cuda.device_count()
+    if num_devices == 0:
+        return "cpu"
+    if num_devices == 1:
+        return "cuda:0"
+    else:
+        return f"cuda:{randint(0, num_devices-1)}"
+

 model_name = '{{model_file_name}}'
+device = torch.device(get_torch_device())

 """
 Inference script. This script is used for prediction by scoring server when schema is known.
@@ -59,6 +72,7 @@ def load_model(model_file_name=model_name):

 {% endif %}
     print("Model is successfully loaded.")
+    the_model = the_model.to(device)
     return the_model

 @lru_cache(maxsize=1)
@@ -158,6 +172,7 @@ def pre_inference(data, input_schema_path):
     data = deserialize(data, input_schema_path)

     # Add further data preprocessing if needed
+    data = data.to(device)
     return data

 def post_inference(yhat):
@@ -199,6 +214,6 @@ def predict(data, model=load_model(), input_schema_path=os.path.join(os.path.dir

     with torch.no_grad():
         yhat = post_inference(
-            model(inputs)
+            model(inputs).to("cpu")
         )
     return {'prediction': yhat}
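
The template change follows the usual PyTorch device-placement pattern: choose a device once, move the model and the inputs to it, and bring predictions back to the CPU before post-processing. A self-contained sketch of the same pattern outside the template, using a throwaway linear model and random input purely for illustration:

    import torch
    from random import randint


    def get_torch_device():
        # Mirror the template: CPU if no GPU, the only GPU if one, else a random GPU.
        num_devices = torch.cuda.device_count()
        if num_devices == 0:
            return "cpu"
        if num_devices == 1:
            return "cuda:0"
        return f"cuda:{randint(0, num_devices - 1)}"


    device = torch.device(get_torch_device())

    model = torch.nn.Linear(4, 2).to(device)  # illustrative model
    inputs = torch.randn(1, 4).to(device)     # illustrative input batch

    with torch.no_grad():
        # Move the prediction back to the CPU before post-processing.
        prediction = model(inputs).to("cpu").numpy().tolist()

    print(prediction)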

dev-requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -13,4 +13,5 @@ xlrd>=1.2.0
 lxml
 fastparquet
 imbalanced-learn
-pyarrow
+pyarrow
+mysql-connector-python

docs/source/user_guide/model_registration/model_artifact.rst

Lines changed: 19 additions & 1 deletion
@@ -30,6 +30,7 @@ Auto generation of ``score.py`` with framework specific code for loading models

 To accommodate other frameworks that are unknown to ADS, a template code for ``score.py`` is generated in the provided artifact directory location.

+
 Prepare the Model Artifact
 --------------------------

@@ -98,8 +99,25 @@ ADS automatically captures:
 * ``UseCaseType`` in ``metadata_taxonomy`` cannot be automatically populated. One way to populate the use case is to pass ``use_case_type`` to the ``prepare`` method.
 * Model introspection is automatically triggered.

-.. include:: _template/score.rst
+Prepare with custom ``score.py``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 2.8.4

+You can provide your own ``score.py`` by passing its location through the ``score_py_uri`` parameter of :py:meth:`~ads.model.GenericModel.prepare`.
+The provided ``score.py`` is added to the model artifact.
+
+.. code-block:: python3
+
+    tf_model.prepare(
+        inference_conda_env="generalml_p38_cpu_v1",
+        use_case_type=UseCaseType.MULTINOMIAL_CLASSIFICATION,
+        X_sample=trainx,
+        y_sample=trainy,
+        score_py_uri="/path/to/score.py"
+    )
+
+.. include:: _template/score.rst

 Model Introspection
 -------------------

setup.py

Lines changed: 0 additions & 1 deletion
@@ -69,7 +69,6 @@
         "nbformat",
         "inflection",
     ],
-    "mysql": ["mysql-connector-python"],
     "bds": ["ibis-framework[impala]", "hdfs[kerberos]", "sqlalchemy"],
     "spark": ["pyspark>=3.0.0"],
     "huggingface": ["transformers"],
