
Commit 3d2e7c8

Optimum neuron 0.2.2 (#3281)
* chore(neuron): use optimum-neuron 0.2.1
* test(neuron): adjust expectations. Since the latest optimum-neuron uses a new modeling for granite and qwen, the greedy outputs are slightly different.
* test(neuron): add phi3 and qwen3 tests
* chore(neuron): use optimum-neuron 0.2.2
Parent commit: f6005d6

3 files changed: 27 additions, 5 deletions


Dockerfile.neuron

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ RUN mkdir -p /tgi
 # Fetch the optimum-neuron sources directly to avoid relying on pypi deployments
 FROM alpine AS optimum-neuron
 RUN mkdir -p /optimum-neuron
-ADD https://github.com/huggingface/optimum-neuron/archive/refs/tags/v0.2.0.tar.gz /optimum-neuron/sources.tar.gz
+ADD https://github.com/huggingface/optimum-neuron/archive/refs/tags/v0.2.2.tar.gz /optimum-neuron/sources.tar.gz
 RUN tar -C /optimum-neuron -xf /optimum-neuron/sources.tar.gz --strip-components=1

 # Build cargo components (adapted from TGI original Dockerfile)

integration-tests/fixtures/neuron/export_models.py

Lines changed: 18 additions & 0 deletions
@@ -46,6 +46,15 @@
             "auto_cast_type": "fp16",
         },
     },
+    "qwen3": {
+        "model_id": "Qwen/Qwen3-1.7B",
+        "export_kwargs": {
+            "batch_size": 4,
+            "sequence_length": 4096,
+            "num_cores": 2,
+            "auto_cast_type": "bf16",
+        },
+    },
     "granite": {
         "model_id": "ibm-granite/granite-3.1-2b-instruct",
         "export_kwargs": {
@@ -55,6 +64,15 @@
             "auto_cast_type": "bf16",
         },
     },
+    "phi3": {
+        "model_id": "microsoft/Phi-3-mini-4k-instruct",
+        "export_kwargs": {
+            "batch_size": 4,
+            "sequence_length": 4096,
+            "num_cores": 2,
+            "auto_cast_type": "bf16",
+        },
+    },
 }
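
For reference, the export_kwargs in the new qwen3 and phi3 entries correspond to optimum-neuron's static compilation arguments (batch size, sequence length, core count, cast dtype). The snippet below is a minimal sketch of how such an entry is typically turned into a compiled Neuron model; it assumes the NeuronModelForCausalLM export API and a host with the Neuron SDK installed, and the fixture itself may drive the export differently.

# Minimal sketch (assumption, not the fixture's actual code): compile a model
# for Neuron using the same kwargs as the "qwen3" entry above.
from optimum.neuron import NeuronModelForCausalLM

export_kwargs = {
    "batch_size": 4,
    "sequence_length": 4096,
    "num_cores": 2,
    "auto_cast_type": "bf16",
}

# export=True triggers compilation with the given static shapes and dtype.
neuron_model = NeuronModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    export=True,
    **export_kwargs,
)
neuron_model.save_pretrained("qwen3-neuron")  # hypothetical output directory

The same four parameters appear in every fixture entry because Neuron compilation fixes the batch size and sequence length up front.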

integration-tests/neuron/test_generate.py

Lines changed: 8 additions & 4 deletions
@@ -21,8 +21,10 @@ async def test_model_single_request(tgi_service):
     assert response.details.generated_tokens == 17
     greedy_expectations = {
         "llama": " and how does it work?\nDeep learning is a subset of machine learning that uses artificial",
-        "qwen2": " - Part 1\n\nDeep Learning is a subset of Machine Learning that is based on",
-        "granite": "\n\nDeep Learning is a subset of Machine Learning, which is a branch of Art",
+        "qwen2": " - Deep Learning is a subset of Machine Learning that involves the use of artificial neural networks",
+        "granite": "\n\nDeep learning is a subset of machine learning techniques based on artificial neural networks",
+        "qwen3": " A Deep Learning is a subset of machine learning that uses neural networks with multiple layers to",
+        "phi3": "\n\nDeep learning is a subfield of machine learning that focuses on creating",
     }
     assert response.generated_text == greedy_expectations[service_name]
@@ -78,8 +80,10 @@ async def test_model_multiple_requests(tgi_service, neuron_generate_load):
     assert len(responses) == 4
     expectations = {
         "llama": "Deep learning is a subset of machine learning that uses artificial",
-        "qwen2": "Deep Learning is a subset of Machine Learning that is based on",
-        "granite": "Deep Learning is a subset of Machine Learning, which is a branch of Art",
+        "qwen2": "Deep Learning is a subset of Machine Learning that involves",
+        "granite": "Deep learning is a subset of machine learning techniques",
+        "qwen3": "Deep Learning is a subset of machine learning that uses neural networks",
+        "phi3": "Deep learning is a subfield of machine learning that focuses on creating",
     }
     expected = expectations[tgi_service.client.service_name]
     for r in responses:
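
The updated strings are greedy (non-sampled) continuations of the test prompt, keyed by the fixture's service name; the single-request test asserts exactly 17 generated tokens. As a rough way to regenerate such an expectation against a running endpoint, the following hypothetical snippet could be used; it assumes a server already listening on http://localhost:8080 and uses huggingface_hub's AsyncInferenceClient rather than whatever client the integration tests actually wrap.

# Hypothetical reproduction of a greedy expectation string.
# Assumptions: a TGI neuron server is already running locally, and the prompt
# below is a placeholder; neither detail is taken from this diff.
import asyncio

from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient("http://localhost:8080")
    # Greedy decoding (do_sample=False) with 17 new tokens, as asserted above.
    text = await client.text_generation(
        "What is Deep Learning?",  # placeholder prompt, not from the diff
        max_new_tokens=17,
        do_sample=False,
    )
    print(repr(text))

asyncio.run(main())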
