Commit 3309773 (parent: 974194b)

Add readme doc for the paged programs scripts

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>

File tree: 2 files changed (+93, -0 lines changed)

scripts/README.md (17 additions, 0 deletions)
@@ -76,3 +76,20 @@ python3 scripts/validation.py --architecture=hf_configured --model_path=/home/de

To run a logits-based validation, pass `--validation_level=1` to the validation script. This checks that the logits output matches at every step of the model through a cross-entropy loss. You can control the acceptable threshold with `--logits_loss_threshold`.

## How to run and validate paged programs

The [drive_paged_programs.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/drive_paged_programs.py) script is designed to run and validate paged programs using a specified model variant. It supports different attention types, including `paged` and `paged_fp8`, with the default set to `paged`. The supported dataset types are `sharegpt` and `rag_factoid`, with the default set to `sharegpt`. The script can run tests in a distributed environment, utilizing multiple instances for faster execution. To see descriptions of the various command-line arguments the script can parse, run it with `--help`. The following examples demonstrate the usage of the script.

```bash
# Run with 4K context length
VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=4096 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=/home/senuser/models/fms-tests-dpp-programs/dpp-4k.json --dataset_path=/home/senuser/models/ShareGPT_V3_unfiltered_cleaned_split.json --test_type=tokens --distributed

# Run with 8K context length
VLLM_DT_MAX_BATCH_SIZE=16 VLLM_DT_MAX_CONTEXT_LEN=8192 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=/home/senuser/models/fms-tests-dpp-programs/dpp-8k-16.json --dataset_path=/home/senuser/models/ShareGPT_V3_unfiltered_cleaned_split.json --dataset_type=sharegpt --test_type=tokens --distributed

# Run with 16K context length using the rag_factoid dataset type and a program with a specific batch size and prompt length
VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=16384 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=/home/senuser/models/fms-tests-dpp-programs/dpp-16k.json --dataset_path=/home/senuser/models/long_context_factoid_post_process.jsonl --dataset_type=rag_factoid --test_type=tokens --distributed --programs 0:4,16256

# Run with a 32K context length using the rag_factoid dataset type and a program with any batch size and a specific prompt length
EN_PREFILL_OPT=1 VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=32768 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=/home/senuser/models/fms-tests-dpp-programs/dpp-32k.json --dataset_path=/home/senuser/models/long_context_factoid_post_process.jsonl --dataset_type=rag_factoid --test_type=tokens --distributed --programs 0:0,32640
```
Lines changed: 76 additions & 0 deletions

@@ -0,0 +1,76 @@
The [drive_paged_programs.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/drive_paged_programs.py) script is designed to run and validate paged programs using a specified model variant.

It supports different attention types, including `paged` and `paged_fp8`, with the default set to `paged`. The supported dataset types are `sharegpt` and `rag_factoid`, with the default set to `sharegpt`.

The script accepts many arguments and flags according to your needs; a few notable ones are described below. To see descriptions of all the command-line arguments the script can parse, run it with `--help`.
- `--distributed`: run tests in a distributed environment, utilizing multiple AIU instances, for faster execution.
- `--skip_validation`: set to true to skip CPU validation, which makes the script much faster.
- `--save_validation_info_outputs`: set to true to save CPU validation outputs for later consumption. The saved outputs allow you to reuse the CPU logits.
- `--validation_info_outputs_dir`: path to a directory containing validation info outputs. Using saved outputs avoids re-computation and significantly reduces script execution time.
- `--program_criteria_json_path` and `--dataset_path`: for both of these arguments, make sure the provided path exists on your system.
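The save-then-reuse flags above pair naturally across runs. The sketch below only assembles the extra flags and creates the output directory; the directory name is hypothetical, and we assume `--save_validation_info_outputs` accepts a boolean value, so check `--help` for the exact syntax.

```shell
# Sketch of a save-then-reuse validation workflow. The output directory name
# is hypothetical, and the boolean flag syntax is an assumption.
OUT_DIR=/tmp/dpp_validation_outputs
mkdir -p "$OUT_DIR"

# First run: pay the CPU validation cost once and save the outputs.
FIRST_RUN_FLAGS="--save_validation_info_outputs=True --validation_info_outputs_dir=$OUT_DIR"

# Later runs: reuse the saved CPU logits instead of recomputing them.
RERUN_FLAGS="--validation_info_outputs_dir=$OUT_DIR"

echo "first run extra flags: $FIRST_RUN_FLAGS"
echo "re-run extra flags:    $RERUN_FLAGS"
```

Appending these variables to the commands shown in the examples below keeps the expensive CPU pass to a single run.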
The following examples demonstrate the usage of the script. Replace `<valid_path>` with a valid path on your system.

You may run the script in a distributed environment with multiple AIU instances in production. For testing purposes, however, running the script on a single AIU can be very useful. Before looking at the examples for a distributed environment, let us first look at an example command for a single AIU instance.

```bash
# Run with 4K context length on a single AIU instance
VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=4096 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error python3 drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=<valid_path>/dpp-4k-5.json --dataset_path=<valid_path>/ShareGPT_V3_unfiltered_cleaned_split.json --dataset_type=sharegpt --test_type=tokens
```
The following is a snippet of the output that the above command should produce.

```
...
...
[ 0/ 1]: PT compile complete, took 79.305s
[ 0/ 1]: extracted prompts in 1.9285 seconds
[ 0/ 1]: *** testing program 0 ***
[ 0/ 1]: program id: 0, valid prompt: (1, 2880), input shape: torch.Size([1, 2880])
[ 0/ 1]: Prompt Length: 2847
[ 0/ 1]: For Program 0 in sentence 1:
[ 0/ 1]: Prompt:

You are to write a on road scenario configuration in pbtxt format (with other road agents like vehicles and pedestrians) that can be loaded into a self driving car simulation system to check if ego vehicle (called nurobot) can make correct decisions. The configuration contains other road agents, their behaviors, and conditions to check if the ego vehicle's behavior is expected.

Here are some general instructions
- Check the final configuration against the protobuf definition to make sure it is syntax correct
- Ego vehicle is a planner agent
- Other road agents are line follower agents
- Make sure the geo locations are accurate up to 8 digits after decimal
...
...

Below is the scenario specific description:

The ego vehicle is traveling from location (lat: 37.1233212321, lng: -122.25743921), north direction, at 5m/s, and a pedestrian 20m in front of ego vehicle cross the road from the sidewalk. There is another tailgater vehicle behind ego vehicle. We want to create a runtime check in ego vehicle config that the vehicle stopped in front of the pedestrian successfully, and not breaking too hard (<3m/s^2) to avoid collision from behind.
[ 0/ 1]: CPU tokens:
[203, 203, 2538, 322, 10284, 2760, 3488, 436]
[ 0/ 1]: AIU tokens:
[203, 306, 33964, 26755, 12251, 203, 203, 7608]
[ 0/ 1]: CPU output:


Write the pbtxt configuration for
[ 0/ 1]: AIU output:

// PBTXT CONFIG

scene
[ 0/ 1]: all tests passed
```
More examples are provided below for the distributed environment, which utilizes multiple AIU instances for faster execution.

```bash
# Run with 4K context length
VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=4096 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=<valid_path>/dpp-4k.json --dataset_path=<valid_path>/ShareGPT_V3_unfiltered_cleaned_split.json --test_type=tokens --distributed

# Run with 8K context length
VLLM_DT_MAX_BATCH_SIZE=16 VLLM_DT_MAX_CONTEXT_LEN=8192 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=<valid_path>/dpp-8k-16.json --dataset_path=<valid_path>/ShareGPT_V3_unfiltered_cleaned_split.json --dataset_type=sharegpt --test_type=tokens --distributed

# Run with 16K context length using the rag_factoid dataset type and a program with a specific batch size and prompt length
VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=16384 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=<valid_path>/dpp-16k.json --dataset_path=<valid_path>/long_context_factoid_post_process.jsonl --dataset_type=rag_factoid --test_type=tokens --distributed --programs 0:4,16256

# Run with a 32K context length using the rag_factoid dataset type and a program with any batch size and a specific prompt length
EN_PREFILL_OPT=1 VLLM_DT_MAX_BATCH_SIZE=4 VLLM_DT_MAX_CONTEXT_LEN=32768 HF_HUB_CACHE=/home/senuser/models/huggingface_cache/hub DT_DEEPRT_VERBOSE=-1 DTLOG_LEVEL=error torchrun --nproc-per-node=4 /home/senuser/aiu-fms-testing-utils/scripts/drive_paged_programs.py --max_new_tokens=8 --model_variant=ibm-granite/granite-3.3-8b-instruct --program_criteria_json_path=<valid_path>/dpp-32k.json --dataset_path=<valid_path>/long_context_factoid_post_process.jsonl --dataset_type=rag_factoid --test_type=tokens --distributed --programs 0:0,32640
```
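Judging from the 16K and 32K examples, the argument to `--programs` appears to encode `<program_id>:<batch_size>,<prompt_length>`, with `0` in the batch-size position meaning "any batch size". This reading is our inference from the examples, not a documented format; confirm it with `--help`. A small sketch of pulling the pieces apart with shell parameter expansion:

```shell
# Parse a --programs spec such as "0:4,16256". Assumed format (inferred from
# the examples above): <program_id>:<batch_size>,<prompt_length>, where a
# batch size of 0 appears to mean "any batch size".
SPEC="0:4,16256"
PROGRAM_ID=${SPEC%%:*}   # everything before the first ':'
SHAPE=${SPEC#*:}         # everything after the first ':'
BATCH_SIZE=${SHAPE%%,*}  # everything before the ','
PROMPT_LEN=${SHAPE#*,}   # everything after the ','
echo "program=$PROGRAM_ID batch=$BATCH_SIZE prompt_len=$PROMPT_LEN"
# → program=0 batch=4 prompt_len=16256
```

Under this reading, `--programs 0:0,32640` in the 32K example selects program 0 with any batch size and a 32640-token prompt.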
