Conversation

@gkumbhat (Contributor) commented Jul 25, 2025

Changes

  • Add a test verifying that compilation time is as expected for a given model and sequence length.
  • It currently only covers dynamic shapes and paged attention.

@gkumbhat gkumbhat force-pushed the add_compilation_time_test branch from 59888d0 to 8b67444 on July 29, 2025 02:50
@gkumbhat gkumbhat marked this pull request as ready for review July 29, 2025 02:51
gkumbhat added 2 commits July 30, 2025 10:33
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
@gkumbhat gkumbhat force-pushed the add_compilation_time_test branch from 8b67444 to e7c18f8 on July 30, 2025 15:37
dprint(f"PT compile complete, took {pt_compile_model_time:.3f}s")


def get_env_to_int_list(env_var_name, default):
@JRosenkranz (Contributor) commented Jul 30, 2025:

Can you add type hints to this, as well as a description of each parameter? Should `default`'s default here be `None`?

@gkumbhat (Author) replied:

Yep, I'll add hints and descriptions.

I thought of making this a required param, but `None` would work as well, since that's the default for `os.environ.get`.

dprint(f"PT compile complete, took {pt_compile_model_time:.3f}s")


def get_env_to_int_list(env_var_name, default):
@JRosenkranz (Contributor) commented Jul 30, 2025:

I could see this function serving much more utility than just ints. We may want a third optional parameter: a function that takes the raw value and returns what the user wants (by default it would just produce a list of strings). For instance, we may want a list of floats or of custom objects. If we don't want that much control, the third parameter could instead just specify the type to return.

@gkumbhat (Author) replied:

True, a function as the third parameter would require more careful error handling. But we can easily add a third parameter with a specific return type; I'll add that.

"SHARE_GPT_DATASET_PATH", os.path.expanduser("~/share_gpt.json")
)

ATTN_NAME = "spyre_paged_attn"
Contributor commented:

We should make this configurable.

@gkumbhat (Author) replied:

Yep, I'll add that.
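One way this could look, with a hypothetical environment variable name as the override hook:

```python
import os

# hypothetical override hook: let the attention implementation be chosen
# from the environment, keeping the current value as the default
ATTN_NAME = os.environ.get("FMS_TEST_ATTN_NAME", "spyre_paged_attn")
```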

Collaborator commented:

I assume when we make it configurable we'll want to reuse the ATTN_TYPE map for consistency, so should we move some of these constants outside?

from aiu_fms_testing_utils.utils.aiu_setup import dprint

GRANITE_3p3_8B_INSTRUCT = "ibm-granite/granite-3.3-8b-instruct"
SHARE_GPT_DATASET_PATH = os.environ.get(
Contributor commented:

Given this is just testing compilation time, I don't think we need real data.

@gkumbhat (Author) replied:

True, I thought of keeping it similar to what we do in other places for consistency.

We could do random text generation as well.


ATTN_NAME = "spyre_paged_attn"

COMPILE_DYNAMIC_SHAPE = True
Contributor commented:

We should make this configurable.

"COMMON_COMPILATION_EXPECTED_TIME", [10]
) # In minutes

COMMON_SHAPE_TYPE = "dynamic"
Contributor commented:

We should make this configurable.


# the compiler supports certain max context lengths (VLLM_DT_MAX_CONTEXT_LEN)
# this will ensure that we select smallest supported VLLM_DT_MAX_CONTEXT_LEN that fits the largest possible context (prompt size + max_new_tokens)
__largest_context = max(common_seq_lengths) + max(common_max_new_tokens)
Contributor commented:

For this test, given we are testing timing, we may want the largest context to be computed on a per-test basis (rather than spanning all tests). That way we can know the performance implications of changing these values.
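A sketch of that per-test variant, where the bound comes from the current case's parameters instead of the global maxima (the supported-length list is illustrative, not the compiler's actual table):

```python
# illustrative set of context lengths the compiler might support
SUPPORTED_CONTEXT_LENS = [256, 512, 1024, 2048, 4096, 8192]


def select_max_context_len(seq_length: int, max_new_tokens: int) -> int:
    """Smallest supported VLLM_DT_MAX_CONTEXT_LEN that fits this single
    test case (prompt size + max_new_tokens), computed per case rather
    than from max() across all parametrized tests."""
    needed = seq_length + max_new_tokens
    return min(c for c in SUPPORTED_CONTEXT_LENS if c >= needed)
```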



# TODO: This is copied from test_decoders.py would be good to consolidate
def __prepare_inputs(batch_size, seq_length, tokenizer, seed=0):
Contributor commented:

We can make this simpler and just use torch.arange with the required sizes.
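A sketch of that simplification, assuming the test only needs well-shaped token ids (the vocab bound is an arbitrary placeholder):

```python
import torch


def prepare_inputs(batch_size: int, seq_length: int, vocab_size: int = 32000):
    # deterministic ids from arange, wrapped into [0, vocab_size) so they
    # are valid token ids for any tokenizer; compilation time should not
    # depend on the actual token content
    ids = torch.arange(batch_size * seq_length) % vocab_size
    return ids.reshape(batch_size, seq_length)
```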

torch.set_default_dtype(torch.float16)
os.environ["COMPILATION_MODE"] = "offline_decoder"

dprint(
Contributor commented:

Include the attention type in this print.

COMPILE_DYNAMIC_SHAPE = False

model.compile(backend="sendnn", options={"sendnn.dynamic": COMPILE_DYNAMIC_SHAPE})
warmup_model(
@JRosenkranz (Contributor) commented Jul 30, 2025:

This only includes the initial compile, not the device warmup. Do we want to include that as well? inference.py has an example.

@gkumbhat (Author) replied:

I'll add it. Thanks for the pointers.

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
@@ -0,0 +1,150 @@
"""This module contains test related to compilation operation"""
Collaborator commented:

The testing dir right now contains tests of tests, IIUC; this should live in models, and we can move it to integration_tests or something similar when we restructure?

"SHARE_GPT_DATASET_PATH", os.path.expanduser("~/share_gpt.json")
)

ATTN_NAME = "spyre_paged_attn"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume when we make it configurable we'll want to reuse the ATTN_TYPE map for consistency so should we move some of these constants outside?

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>