
Conversation


@TomeHirata TomeHirata commented Jul 3, 2025

This PR enables running unit tests that require real LLM calls, which have so far been skipped in CI.
The model used for testing is configurable through the LLM_MODEL env variable; we use ollama/llama3.2:3b in the branch build to balance quality and latency. Pulling the Ollama model only adds ~13s of latency, so this PR enables real LLM tests directly in the branch build instead of deferring them to nightly tests.
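To illustrate the configuration described above, here is a minimal sketch of how a test suite might pick its model from the LLM_MODEL env variable, falling back to the Ollama model the branch build uses. The function name and env handling are illustrative, not the PR's actual code.

```python
import os

# Illustrative sketch (not the PR's actual code): pick the test model from
# the LLM_MODEL environment variable, falling back to the Ollama model
# used in the branch build.
DEFAULT_MODEL = "ollama/llama3.2:3b"

def get_test_model(env=None):
    """Return the model identifier the test suite should use."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL", DEFAULT_MODEL)
```

Passing the environment mapping in explicitly keeps the lookup easy to exercise in tests without mutating os.environ.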

@TomeHirata TomeHirata changed the title use real LLM for unit tests Use real LLM for unit tests Jul 3, 2025
@TomeHirata TomeHirata changed the title Use real LLM for unit tests Run unit tests with real LLM calls Jul 3, 2025
@chenmoneygithub

@TomeHirata This is awesome! One thing I want to discuss.

I compared the time cost of running the tests before and after this PR, and we are seeing a significant increase, from 2m to 4m:

Before: [screenshot: ~2m test run]

After: [screenshot: ~4m test run]

4m seems all right, but that's the outcome of converting only 6 test cases to Ollama, which means the suite could become much slower over time. I am thinking about the following:

  1. Use pytest markers to split tests requiring an actual LM into their own group, e.g. @pytest.mark.real_lm_call. These tests will be skipped in the normal GitHub tests and only run in the dedicated action Run Tests with Real LM. We may not need to run them across all Python versions, because they are essentially a supplement to the normal tests that checks against actual LM behavior.
  2. Prefer mocking over real LM calls in unit tests. Only when mocking becomes too complex (e.g. optimizers) or unreliable over time (e.g. streaming) should we switch to using the real LM.

Let me know what you think!
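Point 1 above could be wired up roughly like this hypothetical conftest.py sketch; the marker name, env variable, and skip logic are illustrative, not the repository's actual setup.

```python
import os
import pytest

# Hypothetical conftest.py sketch: register a real_lm_call marker and skip
# those tests unless an env flag opts in (names are illustrative).
REAL_LM_MARKER = "real_lm_call"

def should_run_real_lm(env=None):
    # The dedicated "Run Tests with Real LM" job would set this flag.
    env = os.environ if env is None else env
    return bool(env.get("RUN_REAL_LM_TESTS"))

def pytest_configure(config):
    config.addinivalue_line(
        "markers", f"{REAL_LM_MARKER}: test makes a real LLM call"
    )

def pytest_collection_modifyitems(config, items):
    if should_run_real_lm():
        return  # dedicated job: run the real-LM tests too
    skip = pytest.mark.skip(reason="requires a real LLM; run in the dedicated job")
    for item in items:
        if REAL_LM_MARKER in item.keywords:
            item.add_marker(skip)
```

With this shape, the normal CI matrix skips the marked tests automatically, and only the dedicated job sets the opt-in flag.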

@TomeHirata
Copy link
Collaborator Author

Hi, @chenmoneygithub. I agree with both points and have split the LLM call tests into a separate job. On #2, I completely agree: we should limit the use of LLM calls in unit tests because of latency and potential flakiness.
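As a generic sketch of the mocking-first approach agreed on above (names are illustrative, not DSPy's API): when the code under test delegates generation to an injected LM client, a unit test can substitute a mock instead of making a real call.

```python
from unittest.mock import MagicMock

# Illustrative code under test: delegates generation to an injected LM client.
def summarize(lm, text):
    return lm.generate(prompt=f"Summarize: {text}")

def test_summarize_uses_lm():
    # The unit test stubs the LM, avoiding latency and flakiness entirely.
    fake_lm = MagicMock()
    fake_lm.generate.return_value = "a short summary"
    assert summarize(fake_lm, "long document ...") == "a short summary"
    fake_lm.generate.assert_called_once_with(prompt="Summarize: long document ...")
```

This keeps the fast CI path deterministic, reserving real-LM coverage for the dedicated job.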


@chenmoneygithub chenmoneygithub left a comment


The setup looks good!


@chenmoneygithub chenmoneygithub left a comment


LGTM!

@TomeHirata TomeHirata merged commit b4d1a7e into stanfordnlp:main Jul 8, 2025
10 checks passed
