⚡️ Speed up function is_local_module by 667%
#597
📄 667% (6.67x) speedup for is_local_module in marimo/_utils/site_packages.py
⏱️ Runtime: 47.4 milliseconds → 6.18 milliseconds (best of 81 runs)
📝 Explanation and details
The optimization delivers a 667% speedup by eliminating expensive file system operations and replacing them with fast string comparisons.
Key optimizations applied:
1. Pre-resolved site-packages paths: The _getsitepackages() function now calls .resolve() on all site-packages directories upfront and caches them. This eliminates repeated path resolution inside is_local_module().
2. Conditional path resolution: Instead of always calling .resolve() on the module path (40.5% of runtime in the original), the optimized version resolves a path only when it is not already absolute. This cut the expensive file system calls from 1,387 to 106 in the profiled run.
3. String-based prefix matching: Replaced the expensive Path.is_relative_to() method (55.8% of original runtime) with string comparisons using startswith() plus separator validation, so the core comparison needs no file system operations.
Why this leads to speedup:
- Path.resolve() performs file system syscalls to canonicalize paths. Path.is_relative_to() is a lexical check, but it builds and compares path objects on every call.
- String comparisons with startswith() are pure CPU operations that are orders of magnitude faster.
Performance characteristics from tests: per-test timings are annotated inline in the generated regression tests below.
The optimization transforms a file-system-heavy operation into a primarily string-based one, making it much more suitable for high-frequency module checking scenarios.
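The three optimizations described above can be sketched roughly as follows. This is a minimal illustration, not the code in marimo/_utils/site_packages.py: the names resolve_roots, cached_site_packages, is_under, and sketch_is_local are hypothetical, and using site.getsitepackages() as the source of directories is an assumption.

```python
from __future__ import annotations

import functools
import os
import site
from pathlib import Path
from typing import Iterable, Optional


def resolve_roots(dirs: Iterable[str]) -> tuple[str, ...]:
    """Canonicalize each directory once and keep the result as a plain string."""
    return tuple(str(Path(d).resolve()) for d in dirs)


@functools.lru_cache(maxsize=None)
def cached_site_packages() -> tuple[str, ...]:
    """Resolve the interpreter's site-packages directories a single time."""
    # Assumption: site.getsitepackages() is a good enough source for this sketch.
    return resolve_roots(site.getsitepackages())


def is_under(path_str: str, roots: Iterable[str]) -> bool:
    """String prefix check that only matches on a path-separator boundary."""
    for root in roots:
        # Appending the separator keeps "/x/site-packages-fake" from
        # matching the root "/x/site-packages".
        prefix = root if root.endswith(os.sep) else root + os.sep
        if path_str == root or path_str.startswith(prefix):
            return True
    return False


def sketch_is_local(
    origin: Optional[str], roots: Optional[Iterable[str]] = None
) -> bool:
    """Hypothetical stand-in for is_local_module that takes the origin string."""
    if origin is None:
        return True
    path = Path(origin)
    # Only touch the file system when the path is not already absolute.
    if not path.is_absolute():
        path = path.resolve()
    active_roots = cached_site_packages() if roots is None else roots
    return not is_under(str(path), active_roots)
```

Passing roots explicitly, for example sketch_is_local(origin, roots=("/opt/venv/lib/site-packages",)), keeps the check independent of the running interpreter, which is convenient in tests. One trade-off of skipping resolve() for absolute paths is that symlinks and ".." segments in such paths are compared as written.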
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
_utils/test_site_packages.py::test_is_local_module
🌀 Generated Regression Tests and Runtime
import pathlib

# imports
import pytest
from marimo._utils.site_packages import is_local_module

# function to test
# (The function is_local_module is defined above.)


# Helper class to simulate module specs for testing
class DummySpec:
    def __init__(self, origin):
        self.origin = origin
# ---------------------------
# Basic Test Cases
# ---------------------------


def test_local_module_simple_path():
    # Local module with an absolute path not in site-packages
    spec = DummySpec("/home/user/project/mymodule.py")
    codeflash_output = is_local_module(spec)  # 88.3μs -> 146μs (39.5% slower)


def test_local_module_relative_path():
    # Local module with a relative path
    spec = DummySpec("mymodule.py")
    codeflash_output = is_local_module(spec)  # 53.8μs -> 30.5μs (76.8% faster)


def test_site_packages_module_simple():
    # Module in site-packages (typical pip-installed location)
    spec = DummySpec("/usr/local/lib/python3.10/site-packages/numpy/__init__.py")
    codeflash_output = is_local_module(spec)  # 788ns -> 784ns (0.510% faster)


def test_site_packages_module_windows():
    # Windows-style site-packages path (backslashes escaped so the literal parses)
    spec = DummySpec("C:\\Users\\User\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\requests\\__init__.py")
    codeflash_output = is_local_module(spec)  # 740ns -> 736ns (0.543% faster)


def test_none_spec():
    # None spec should be considered local
    codeflash_output = is_local_module(None)  # 362ns -> 413ns (12.3% slower)


def test_none_origin():
    # Spec with None origin should be considered local
    spec = DummySpec(None)
    codeflash_output = is_local_module(spec)  # 480ns -> 476ns (0.840% faster)
# ---------------------------
# Edge Test Cases
# ---------------------------


def test_site_packages_in_foldername_but_not_site_packages():
    # Path contains 'site-packages' in a file name but is not actually site-packages
    spec = DummySpec("/home/user/project/mysite-packagesmodule.py")
    codeflash_output = is_local_module(spec)  # 730ns -> 770ns (5.19% slower)


def test_module_in_subdirectory_of_site_packages():
    # Module is in a subdirectory of site-packages
    spec = DummySpec("/usr/local/lib/python3.10/site-packages/my_package/submodule.py")
    codeflash_output = is_local_module(spec)  # 712ns -> 719ns (0.974% slower)


def test_module_in_directory_named_site_packages_but_not_python_site_packages():
    # Path contains a directory whose name starts with 'site-packages' but is not Python's site-packages
    spec = DummySpec("/home/user/site-packages-fake/mymodule.py")
    codeflash_output = is_local_module(spec)  # 700ns -> 702ns (0.285% slower)


def test_module_with_symlinked_path():
    # Simulate a symlinked path (should resolve to real path)
    # This test is limited by the fact that we can't create real symlinks here,
    # but we can check that the function doesn't crash.
    spec = DummySpec("/usr/local/lib/python3.10/site-packages/../site-packages/numpy/__init__.py")
    codeflash_output = is_local_module(spec)  # 699ns -> 661ns (5.75% faster)


def test_module_with_nonexistent_path():
    # Path that does not exist on disk
    spec = DummySpec("/some/nonexistent/path/to/module.py")
    # Should be considered local if not in site-packages
    codeflash_output = is_local_module(spec)  # 68.7μs -> 12.5μs (449% faster)


def test_module_with_weird_characters_in_path():
    # Path with non-ASCII characters
    spec = DummySpec("/home/user/projéct/模块.py")
    codeflash_output = is_local_module(spec)  # 67.2μs -> 13.6μs (394% faster)


def test_module_with_empty_string_origin():
    # Empty string as origin
    spec = DummySpec("")
    codeflash_output = is_local_module(spec)  # 46.7μs -> 29.1μs (60.8% faster)


def test_module_with_dot_origin():
    # "." as origin (current directory)
    spec = DummySpec(".")
    codeflash_output = is_local_module(spec)  # 48.5μs -> 25.3μs (91.5% faster)


def test_module_with_site_packages_in_file_name():
    # 'site-packages' in file name, not directory
    spec = DummySpec("/home/user/project/site-packages_module.py")
    codeflash_output = is_local_module(spec)  # 754ns -> 696ns (8.33% faster)
# ---------------------------
# Large Scale Test Cases
# ---------------------------


def test_many_local_modules():
    # Test with a large number of local modules
    for i in range(500):
        spec = DummySpec(f"/home/user/project/module_{i}.py")
        codeflash_output = is_local_module(spec)  # 16.2ms -> 1.75ms (830% faster)


def test_mixed_large_scale_modules():
    # Mix of local and site-packages modules
    for i in range(250):
        spec_local = DummySpec(f"/home/user/project/module_{i}.py")
        spec_site = DummySpec(f"/usr/local/lib/python3.10/site-packages/pkg_{i}/__init__.py")
        codeflash_output = is_local_module(spec_local)  # 8.31ms -> 896μs (827% faster)
        codeflash_output = is_local_module(spec_site)


def test_large_scale_edge_cases():
    # Large set of modules with edge-case paths
    for i in range(100):
        # Path with 'site-packages' in a non-site-packages context
        spec = DummySpec(f"/home/user/project/site-packages-fake/module_{i}.py")
        codeflash_output = is_local_module(spec)  # 20.3μs -> 19.8μs (2.17% faster)
        # Path with 'site-packages' as part of the filename
        spec2 = DummySpec(f"/home/user/project/module_site-packages_{i}.py")
        codeflash_output = is_local_module(spec2)


def test_large_scale_none_and_empty():
    # Many None and empty origin specs
    for i in range(100):
        codeflash_output = is_local_module(None)  # 16.4μs -> 15.9μs (2.88% faster)
        spec = DummySpec(None)
        codeflash_output = is_local_module(spec)
        spec2 = DummySpec("")  # 17.6μs -> 16.4μs (7.06% faster)
        codeflash_output = is_local_module(spec2)
# ---------------------------
# Additional Robustness Tests
# ---------------------------


def test_module_in_site_packages_with_different_casing():
    # Path with different casing (should be case-sensitive on Unix)
    spec = DummySpec("/usr/local/lib/python3.10/SITE-PACKAGES/numpy/__init__.py")
    # On Unix, this is not site-packages
    codeflash_output = is_local_module(spec)  # 76.6μs -> 14.5μs (429% faster)
#------------------------------------------------
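The symlink test above notes that it cannot create real symlinks. As a hedged sketch of how that case could be exercised, the snippet below builds a temporary directory tree with a real symlink and compares a purely lexical prefix check against one performed after Path.resolve(). The function name compare_lexical_and_resolved and the directory names are hypothetical, and the snippet does not call is_local_module itself.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def compare_lexical_and_resolved() -> Optional[Tuple[bool, bool]]:
    """Return (lexical_match, resolved_match), or None if symlinks are unavailable."""
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir first so platform-level symlinks
        # (for example /var on macOS) do not affect the comparison.
        base = Path(tmp).resolve()
        real = base / "site-packages"  # hypothetical stand-in directory
        real.mkdir()
        (real / "pkg.py").write_text("")

        link = base / "linked"
        try:
            link.symlink_to(real, target_is_directory=True)
        except (OSError, NotImplementedError):
            # Symlink creation can be restricted, e.g. on unprivileged Windows.
            return None

        via_link = link / "pkg.py"
        prefix = str(real) + os.sep
        lexical_match = str(via_link).startswith(prefix)
        resolved_match = str(via_link.resolve()).startswith(prefix)
        return lexical_match, resolved_match
```

Where symlinks are available, the lexical check misses the file reached through the link while the resolved check finds it, which is the behavior a symlink-aware regression test would want to pin down.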
from __future__ import annotations

import functools
import pathlib
import site
from typing import Any

# imports
import pytest
from marimo._utils.site_packages import _getsitepackages, is_local_module


# unit tests
class DummySpec:
    """A dummy module spec for testing."""

    def __init__(self, origin):
        self.origin = origin


# Helper to get a plausible site-packages path
def get_site_packages_path():
    dirs = _getsitepackages()
    if dirs:
        return str(dirs[0])
    # Fallback to a typical path
    return str(pathlib.Path(__file__).parent / "site-packages")
# Basic Test Cases


def test_local_module_simple_relative():
    # Local module in current directory
    spec = DummySpec("./mymodule.py")
    codeflash_output = is_local_module(spec)  # 80.1μs -> 151μs (47.3% slower)


def test_local_module_absolute_path():
    # Local module with absolute path outside site-packages
    spec = DummySpec(str(pathlib.Path(__file__).parent / "project" / "module.py"))
    codeflash_output = is_local_module(spec)  # 64.6μs -> 7.88μs (720% faster)


def test_local_module_in_home():
    # Local module in user's home directory
    spec = DummySpec(str(pathlib.Path.home() / "myproject" / "main.py"))
    codeflash_output = is_local_module(spec)  # 59.7μs -> 8.29μs (620% faster)
# Edge Test Cases


def test_none_spec():
    # Spec is None
    codeflash_output = is_local_module(None)  # 366ns -> 397ns (7.81% slower)


def test_none_origin():
    # Spec.origin is None
    spec = DummySpec(None)
    codeflash_output = is_local_module(spec)  # 484ns -> 477ns (1.47% faster)


def test_empty_origin():
    # Spec.origin is empty string
    spec = DummySpec("")
    codeflash_output = is_local_module(spec)  # 50.2μs -> 26.8μs (87.2% faster)


def test_site_packages_in_path_but_not_in_site_packages():
    # Path contains "site-packages" but is not under actual site-packages directory
    fake_path = str(pathlib.Path(__file__).parent / "not_really_site-packages" / "foo.py")
    spec = DummySpec(fake_path)
    codeflash_output = is_local_module(spec)  # 834ns -> 825ns (1.09% faster)


def test_nonexistent_path():
    # Path does not exist
    spec = DummySpec("/nonexistent/path/to/module.py")
    codeflash_output = is_local_module(spec)  # 62.7μs -> 11.6μs (440% faster)


def test_origin_is_directory():
    # Origin is a directory, not a file
    spec = DummySpec(str(pathlib.Path(__file__).parent))
    codeflash_output = is_local_module(spec)  # 57.1μs -> 7.77μs (634% faster)


def test_origin_is_number():
    # Origin is a number (invalid type)
    spec = DummySpec(12345)
    with pytest.raises(Exception):
        # Should raise due to pathlib.Path(12345) being invalid
        is_local_module(spec)  # 1.70μs -> 1.60μs (5.79% faster)


def test_origin_is_bytes():
    # Origin is bytes (invalid type)
    spec = DummySpec(b"/tmp/module.py")
    with pytest.raises(Exception):
        is_local_module(spec)  # 1.73μs -> 1.72μs (0.465% faster)


def test_path_with_site_packages_in_parts():
    # Path has "site-packages" inside a folder name, but not in actual site-packages
    fake_path = str(pathlib.Path("/tmp/site-packages-not-real/module.py"))
    spec = DummySpec(fake_path)
    codeflash_output = is_local_module(spec)  # 832ns -> 790ns (5.32% faster)


def test_many_local_modules():
    # Test with 500 local modules
    for i in range(500):
        path = str(pathlib.Path(__file__).parent / f"local_mod_{i}.py")
        spec = DummySpec(path)
        codeflash_output = is_local_module(spec)  # 19.0ms -> 1.84ms (933% faster)
#------------------------------------------------
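To get a rough local feel for the gap between pathlib-based and string-based containment checks that the timings above reflect, a micro-benchmark along these lines may help. It is only a sketch under stated assumptions: the paths are hypothetical, best_of is an illustrative helper, Path.is_relative_to() requires Python 3.9 or later, and the absolute numbers will differ from the ones reported in this PR.

```python
import os
import timeit
from pathlib import Path

# Hypothetical paths; neither needs to exist because both checks are lexical.
root = Path.cwd() / "site-packages"
candidate = Path.cwd() / "project" / "module.py"

# Precompute the string forms once, mirroring the cached-prefix idea.
root_prefix = str(root) + os.sep
candidate_str = str(candidate)


def with_pathlib() -> bool:
    """Containment check through pathlib objects."""
    return candidate.is_relative_to(root)


def with_strings() -> bool:
    """Containment check through a separator-terminated string prefix."""
    return candidate_str.startswith(root_prefix)


def best_of(fn, repeat: int = 5, number: int = 10_000) -> float:
    """Return the best per-call time in seconds across several timing runs."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print(f"pathlib: {best_of(with_pathlib) * 1e9:.0f} ns per call")
    print(f"strings: {best_of(with_strings) * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats follows the same "best of N runs" convention used for the runtime figure in this PR, since the fastest run is the one least disturbed by other activity on the machine.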
from marimo._utils.site_packages import is_local_module
import pytest


def test_is_local_module():
    # Raw string keeps the escaped spaces in the regex from being treated as string escapes
    with pytest.raises(AttributeError, match=r"'SymbolicInt'\ object\ has\ no\ attribute\ 'origin'"):
        is_local_module(0)
To edit these changes, git checkout codeflash/optimize-is_local_module-mhv6koys and push.