-
Notifications
You must be signed in to change notification settings - Fork 31
✨ Add compilation time test #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
✨ Add compilation time test #92
Conversation
59888d0 to
8b67444
Compare
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
8b67444 to
e7c18f8
Compare
| dprint(f"PT compile complete, took {pt_compile_model_time:.3f}s") | ||
|
|
||
|
|
||
| def get_env_to_int_list(env_var_name, default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add type hints to this as well as explanation for each parameter. Should default's default here be None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'll add hints and description..
I thought of making this as a required param.. But None would work as well, since thats the default for os.environ.get
| dprint(f"PT compile complete, took {pt_compile_model_time:.3f}s") | ||
|
|
||
|
|
||
| def get_env_to_int_list(env_var_name, default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see this function serving much more utility than just ints. I think we may want to include a third optional parameter which is a function which takes the value and returns the value the user wants. (by default it would just be a list of strings). For instance, we may want to return a list of floats or custom objects. If we don't want to add that much control, we could also have a third parameter to just specify the type to return back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true.. 3rd parameter as function would require more careful error handling... But we can easily add 3rd parameter with specific return type.. I'll add that.
tests/testing/test_compilation.py
Outdated
| "SHARE_GPT_DATASET_PATH", os.path.expanduser("~/share_gpt.json") | ||
| ) | ||
|
|
||
| ATTN_NAME = "spyre_paged_attn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make this configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. I'll add that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume when we make it configurable we'll want to reuse the ATTN_TYPE map for consistency so should we move some of these constants outside?
| from aiu_fms_testing_utils.utils.aiu_setup import dprint | ||
|
|
||
| GRANITE_3p3_8B_INSTRUCT = "ibm-granite/granite-3.3-8b-instruct" | ||
| SHARE_GPT_DATASET_PATH = os.environ.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this is just testing compilation time, I don't think we need real data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.. I thought of keeping it similar to what we do in other places for consistency..
We can do a random text generation as well
tests/testing/test_compilation.py
Outdated
|
|
||
| ATTN_NAME = "spyre_paged_attn" | ||
|
|
||
| COMPILE_DYNAMIC_SHAPE = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make this configurable
tests/testing/test_compilation.py
Outdated
| "COMMON_COMPILATION_EXPECTED_TIME", [10] | ||
| ) # In minutes | ||
|
|
||
| COMMON_SHAPE_TYPE = "dynamic" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make this configurable
tests/testing/test_compilation.py
Outdated
|
|
||
| # the compiler supports certain max context lengths (VLLM_DT_MAX_CONTEXT_LEN) | ||
| # this will ensure that we select smallest supported VLLM_DT_MAX_CONTEXT_LEN that fits the largest possible context (prompt size + max_new_tokens) | ||
| __largest_context = max(common_seq_lengths) + max(common_max_new_tokens) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this test, given we are testing timing, we may want the largest context to be on a per-test basis (rather than spanning all tests). This way we can know the performance implications of changing these values
|
|
||
|
|
||
| # TODO: This is copied from test_decoders.py would be good to consolidate | ||
| def __prepare_inputs(batch_size, seq_length, tokenizer, seed=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can make this simpler and just use a simple torch.arange with the sizes required
| torch.set_default_dtype(torch.float16) | ||
| os.environ["COMPILATION_MODE"] = "offline_decoder" | ||
|
|
||
| dprint( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include attention type to this print
| COMPILE_DYNAMIC_SHAPE = False | ||
|
|
||
| model.compile(backend="sendnn", options={"sendnn.dynamic": COMPILE_DYNAMIC_SHAPE}) | ||
| warmup_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only includes the initial compile (not the device warmup). Do we want to include that as well -- inference.py has an example
aiu-fms-testing-utils/scripts/inference.py
Line 824 in bd1090e
| if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it. Thanks for the pointers
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
| @@ -0,0 +1,150 @@ | |||
| """This module contains test related to compilation operation""" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing dir right now contains tests of tests iiuc, this should live in models and we can move it to integration_tests or something similar when we restructure?
tests/testing/test_compilation.py
Outdated
| "SHARE_GPT_DATASET_PATH", os.path.expanduser("~/share_gpt.json") | ||
| ) | ||
|
|
||
| ATTN_NAME = "spyre_paged_attn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume when we make it configurable we'll want to reuse the ATTN_TYPE map for consistency so should we move some of these constants outside?
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Changes