
Commit 45d6d4d

Changes diff_mean to get 99.0 number + doc review requests
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
1 parent: 692414e


2 files changed (+3, -7 lines)


tests/README.md

Lines changed: 3 additions & 3 deletions
@@ -39,7 +39,7 @@ They are calculated in lines [228 - 231 at generate_metrics.py](../scripts/gener
 cross_entropy = lambda r, t: torch.nn.CrossEntropyLoss()(r, t.softmax(dim=1).to(dtype=torch.float32))
 prob_mean = lambda r, t: torch.mean((r.softmax(dim=1).to(dtype=torch.float32) / t.softmax(dim=1).to(dtype=torch.float32)) - 1.0)
 prob_std = lambda r, t: torch.std(r.softmax(dim=1).to(dtype=torch.float32) / t.softmax(dim=1).to(dtype=torch.float32))
-diff_mean = lambda r, t: torch.mean(r.softmax(dim=1).to(dtype=torch.float32) - t.softmax(dim=1).to(dtype=torch.float32))
+diff_mean = lambda r, t: torch.mean(torch.abs(r.softmax(dim=1).to(dtype=torch.float32) - t.softmax(dim=1).to(dtype=torch.float32)))
 ```
 More at [pytorch.org](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), [Yiren,Wang](https://courses.grainger.illinois.edu/ece598pv/fa2017/Lecture13_LM_YirenWang.pdf), [Li, Wang, Shang Et al.](https://arxiv.org/abs/2412.12177#:~:text=%5B2412.12177%5D%20Model%2Ddiff:,%3E%20cs%20%3E%20arXiv:2412.12177) and [Wu,Hilton](https://arxiv.org/html/2410.13211v1).
 </br>
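The switch to `torch.abs` matters because each row of a softmax sums to 1, so for every position the signed differences between the two distributions add up to zero and the old `diff_mean` is always close to 0 (in line with the -1.07e-08 lower value in the old `get_thresholds.py` output). The sketch below compares both definitions on two small, made-up batches of logits. It assumes rows are token positions and columns are vocabulary entries (the `dim=1` axis), and it uses plain Python instead of `torch` purely for illustration; the function names are hypothetical.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of logits
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def diff_mean_signed(r, t):
    # Old definition: mean of softmax(r) - softmax(t) over all elements
    diffs = [a - b for rr, tt in zip(r, t) for a, b in zip(softmax(rr), softmax(tt))]
    return sum(diffs) / len(diffs)


def diff_mean_abs(r, t):
    # New definition: mean of |softmax(r) - softmax(t)| over all elements
    diffs = [abs(a - b) for rr, tt in zip(r, t) for a, b in zip(softmax(rr), softmax(tt))]
    return sum(diffs) / len(diffs)


# Made-up logits: 2 positions x 3 vocabulary entries, deliberately very different
r = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
t = [[-1.0, 0.5, 2.0], [3.0, 0.2, 0.1]]

print(diff_mean_signed(r, t))  # close to 0: per-row differences cancel
print(diff_mean_abs(r, t))     # about 0.53: the gap is now visible
```

Even though the two distributions disagree strongly, only the absolute-valued version reports it.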
@@ -98,7 +98,7 @@ After running these scripts in namespace with 1 GPU, these were the thresholds g
 ```bash
 python3 get_thresholds.py --models /tmp/aiu-fms-testing-utils/models/Mistral-7B-Instruct-v0.3 --metrics diff_mean ce --file_base /tmp/aiu-fms-testing-utils/output
 found 7 metric files
---tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3 diff_mean -1.0710003217617725e-08 0.0007839603102183846
+--tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3 diff_mean 0.0007839603102183846
 found 7 metric files
 --tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3 ce 2.8364005851745624
 ```
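The updated output prints one `<model> <metric> <value>` line per metric, and the deployment table combines those values as `ce,diff_mean` in `FMS_TEST_SHAPES_METRICS_THRESHOLD`. A minimal sketch of turning the printed lines into that variable's value, assuming the line format shown above; `parse_threshold_lines` is a hypothetical helper, not part of the repository.

```python
def parse_threshold_lines(lines):
    """Map (model, metric) -> threshold from '<model> <metric> <value>' lines."""
    thresholds = {}
    for line in lines:
        parts = line.split()
        # Skip progress lines such as "found 7 metric files" (not three fields)
        if len(parts) != 3:
            continue
        model, metric, value = parts
        thresholds[(model, metric)] = float(value)
    return thresholds


# Output copied from the get_thresholds.py run above
output = """\
found 7 metric files
--tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3 diff_mean 0.0007839603102183846
found 7 metric files
--tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3 ce 2.8364005851745624
"""

model = "--tmp--aiu-fms-testing-utils--models--Mistral-7B-Instruct-v0.3"
thresholds = parse_threshold_lines(output.splitlines())

# Order taken from the deployment table: cross-entropy first, then diff_mean
env_value = f"{thresholds[(model, 'ce')]},{thresholds[(model, 'diff_mean')]}"
print(env_value)
```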
@@ -120,7 +120,7 @@ These are the variables set at the deployment:
 | FMS_TEST_SHAPES_METRICS_THRESHOLD | 2.8364005851745624,0.0007839603102183846
 
 
-> Set `FMS_TEST_SHAPES_METRICS_THRESHOLD` in case there is no need to add the model to the default ones. No code changes needed, just this environment variable set with the metrics values.
+> Set `FMS_TEST_SHAPES_METRICS_THRESHOLD` in case there is no need to add the model to the default ones. No code changes needed, just this environment variable set with the metrics values. Set `FMS_TEST_SHAPES_VALIDATION_INFO_DIR` to speed up the tests considerably when testing larger models by using the output logits saved from generating the metrics. Set `FMS_TEST_SHAPES_FAILURE_THRESHOLD` if you would like to relax the threshold - default is `0.01`.
 
 Add the new numbers at the end of the [dictionary](./models/test_decoders.py#L116):
 ```python
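The README note above introduces three environment variables. The sketch below shows one way a test could read them: the `ce,diff_mean` format comes from the deployment table and the `0.01` default from the note, while `ShapeTestSettings` and `read_shape_test_settings` are hypothetical names; the actual test module may parse these variables differently.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ShapeTestSettings(NamedTuple):
    # Hypothetical container; the real tests may store these values differently
    ce_threshold: Optional[float]
    diff_mean_threshold: Optional[float]
    failure_threshold: float
    validation_info_dir: Optional[str]


def read_shape_test_settings(env: Mapping[str, str] = os.environ) -> ShapeTestSettings:
    ce = diff_mean = None
    metrics = env.get("FMS_TEST_SHAPES_METRICS_THRESHOLD")
    if metrics:
        # Format from the deployment table: "<ce>,<diff_mean>"
        ce_str, diff_str = metrics.split(",")
        ce, diff_mean = float(ce_str), float(diff_str)
    # The README states the failure threshold defaults to 0.01
    failure = float(env.get("FMS_TEST_SHAPES_FAILURE_THRESHOLD", "0.01"))
    # Directory with logits saved while generating the metrics; None if not set
    info_dir = env.get("FMS_TEST_SHAPES_VALIDATION_INFO_DIR")
    return ShapeTestSettings(ce, diff_mean, failure, info_dir)


settings = read_shape_test_settings(
    {"FMS_TEST_SHAPES_METRICS_THRESHOLD": "2.8364005851745624,0.0007839603102183846"}
)
print(settings)
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise without touching the real environment.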

tests/resources/get_thresholds.py

Lines changed: 0 additions & 4 deletions
@@ -50,8 +50,4 @@
 metric_list.append(float(line))
 print(f"found {len(metric_files)} metric files")
 if metric == "diff_mean":
-m1 = np.percentile(metric_list, .5)
-m2 = np.percentile(metric_list, 99.5)
-print(model, metric, m1, m2)
-else:
 print(model, metric, np.percentile(metric_list, 99.0))
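This hunk drops the `diff_mean` special case that printed the 0.5th and 99.5th percentiles, leaving the single 99th-percentile value seen in the updated README output. Since the absolute-valued `diff_mean` can no longer be negative, a single upper bound is presumably sufficient. As a rough sketch of what `np.percentile(metric_list, 99.0)` computes, the plain-Python function below reproduces NumPy's default linear interpolation between closest ranks; `percentile_linear` and the sample values are made up for illustration.

```python
import math


def percentile_linear(values, q):
    """q-th percentile with linear interpolation between closest ranks,
    the same rule as numpy.percentile's default method."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0  # fractional index into the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Made-up metric values, standing in for one float per line of the metric files
metric_list = [float(i) for i in range(1, 101)]
threshold = percentile_linear(metric_list, 99.0)
print("example-model", "diff_mean", threshold)
```

With 100 sorted values the 99th percentile falls at fractional index 98.01, so the result lies just above the 99th smallest value.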
