
Conversation

@alex-jw-brooks (Contributor) commented Jul 30, 2025:

This PR builds on top of #20 and #93, adding a cache test that uses the refactored version of the decoder tests to allow some code reuse. #93 should probably be merged first (this is split out for readability).

Summary of changes (w.r.t. the original cache test PR):
- Makes sure GPTQ kwargs are passed through to the AIU model
- Makes sure options={"sendnn.dynamic": COMPILE_DYNAMIC_SENDNN} is passed consistently
- Clears the torch_sendnn cache. The original PR can break if the cache test runs second, because the cache paths aren't actually reset in torch_sendnn: we reset the compiler settings and clear the directory, but never clear the Spyre cache object, which causes alignment issues unless the cache test runs first
- The original PR runs the check as two tests (cache miss -> cache hit); this PR moves the cache miss step into a fixture that does the setup, so that cache hit runs as the only test (see the sketch below)

Note that there is still some weirdness around how micro models are handled, mostly due to the way we configure common model paths / micro model usage, and also check thresholds based on whether micro models exist.
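
A minimal sketch of the fixture-based miss/hit flow described above, assuming pytest; the cache path, the `use_cache` kwarg, and the fixture names are hypothetical, not this PR's actual code (`persistent_model` is the suite's fixture):

```python
import os
import shutil

import pytest

# Hypothetical cache location, for illustration only; the real path comes
# from the compiler settings configured by the test suite.
CACHE_DIR = "/tmp/torch_sendnn_cache"


@pytest.fixture(scope="module")
def compiled_model(persistent_model):
    """Cache-miss setup: start from a clean cache, then compile once."""
    # Clear the on-disk cache AND reset any in-memory cache state, so the
    # result doesn't depend on test ordering (third bullet above).
    shutil.rmtree(CACHE_DIR, ignore_errors=True)
    os.makedirs(CACHE_DIR, exist_ok=True)

    # The first compile populates the cache (the "miss").
    model = persistent_model.get_or_create(use_cache=True)  # made-up kwargs
    yield model


def test_cache_hit(compiled_model):
    """The only test that actually runs: it should be served from the cache."""
    assert os.listdir(CACHE_DIR), "expected cache artifacts from the miss"
```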

@alex-jw-brooks force-pushed the rebased_cache_tests branch 2 times, most recently from 1be911b to ad3073c on July 30, 2025 12:26
@alex-jw-brooks changed the title from "Rebased cache tests" to "Add Cache Miss/Hit Test" on Jul 30, 2025
@alex-jw-brooks (Author) commented:
bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1,2 SEQUENCE_LENGTH=1024,2048 USE_TINY_MODEL=1 NUM_AIU=4

model_kwargs = _get_common_model_kwargs(is_gptq, model_path)

# Get the AIU model w/ the persistent model fixture
model = persistent_model.get_or_create(

Contributor:

It looks like we are re-creating the model and validation_model when it's already being created in the fixture. Is this required?


@alex-jw-brooks (Author):

Nope, good point! Returned them both out of the fixture and deleted the re-creation from the cache hit check so that they'll be reused.
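
Roughly what that change looks like, as a hedged sketch; the fixture name, the validation-model helper, and the `get_or_create` kwargs are all hypothetical:

```python
import pytest


@pytest.fixture(scope="module")
def cache_miss_models(persistent_model):
    """Build the AIU model and validation model once, during the cache miss."""
    model = persistent_model.get_or_create(use_cache=True)  # made-up kwargs
    validation_model = load_validation_model()  # hypothetical helper
    # Returning both lets the cache-hit test reuse them instead of
    # re-creating them inside the test body.
    return model, validation_model


def test_cache_hit(cache_miss_models):
    model, validation_model = cache_miss_models  # reused, not re-created
    ...  # run the hit-path checks here
```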

@alex-jw-brooks (Author) commented:
bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=1,2 SEQUENCE_LENGTH=1024 USE_TINY_MODEL=1 NUM_AIU=4


@JRosenkranz (Contributor) left a comment:

lgtm once the duplicate lines are fixed
