Skip to content

Commit aaf0646

Browse files
feat(profiling): [unwrapt] remove wrapt dependency from Lock Profiler (#15003)
https://datadoghq.atlassian.net/browse/PROF-12854 --- ## Description This PR removes `wrapt` dependency from lock profiler - replaced with direct delegation using `__slots__`. It implements the following proposal in the Lock Profiler RFC: [Remove wrapt dependency](https://docs.google.com/document/d/12ao0XhiO8SJpEB1PNku-Brzkm6ru4OQqgVDosiW_Bi0/edit?tab=t.0#heading=h.l944x0fppaqe) ### Why - **Reduce memory overhead:** 75% reduction in per-lock memory usage - **Simpler code:** No external dependency, easier to maintain - **Consistent behavior:** No more WRAPT_C_EXT detection (wrapt's compiled C extension vs pure Python) ### Changes - Replaced `wrapt.ObjectProxy` with internal `_ProfiledLock` class using `__slots__` - `__slots__` explicitly declares data members without using `__dict__` - Eliminates wrapt's 360-byte hidden dictionary - Added `_LockAllocatorWrapper` as protocol wrapper - Removed environment-dependent frame depth logic (2 if WRAPT_C_EXT, else 3) - Implemented essential special methods: `__hash__`, `__eq__`, `__getattr__` ### Performance **Measurement tools:** - Memory: `gc.get_referents()` + `sys.getsizeof()` (direct object measurement) - Performance: Custom `benchmark.py` with `tracemalloc` and `time.perf_counter()` - Verification: Cache-busted tests in fresh Python processes **Test configuration:** Ran empirical benchmarks at 1% and 100% sampling rates: - Memory: 1,000 locks - Creation: 10,000 locks (5 iterations) - Acquire/Release: 100,000 operations (5 iterations) - Throughput: 10 threads × 10,000 operations | Metric | Before | After | Change | |--------|--------|-------|--------| | Memory (wrapper) | 416 bytes/lock | 104 bytes/lock | -75% | | Creation | 1.58 µs/lock | 1.56 µs/lock | +1% | | Acquire/Release | 1,429 ns/op | 1,419 ns/op | +1% | | Throughput | 738k ops/sec | 737k ops/sec | ≈ Same | **Key findings:** - Memory savings are consistent across all sampling rates (1%, 100%, etc.) - No performance regression in synthetic benchmarks - At 100k locks: saves ~30 MB _Note: These are synthetic benchmarks. Real-world impact will be measured using [dd-trace-doe](https://github.com/DataDog/dd-trace-doe) benchmarking framework._ _See [`benchmarks/lock_profiler_wrapt_removal/VERIFIED_COMPARISON.md`](https://github.com/DataDog/dd-trace-py/blob/vlad/benchmark-lock-profiler/benchmarks/lock_profiler_wrapt_removal/VERIFIED_COMPARISON.md) for details._ ### Implementation Notes `wrapt.ObjectProxy` provided automatic delegation. We now implement these features directly: 1. Attribute forwarding via `__getattr__` 2. Identity operations via `__hash__` and `__eq__` 3. Context management via `__enter__`, `__exit__`, `__aenter__`, `__aexit__` 4. Direct method wrapping for acquire/release profiling Removed/Not Needed: - Transparent `isinstance()` checks - Full introspection support (`dir()`, `vars()`) - Pickle (serialization) support - Weak reference support ## Testing - All existing tests pass (updated `test_patch` for new behavior) - Empirically verified with cache-busted tests on both branches - Full benchmark results: [`vlad/benchmark-lock-profiler`](https://github.com/DataDog/dd-trace-py/tree/vlad/benchmark-lock-profiler/benchmarks/lock_profiler_wrapt_removal) ## Risks **Low:** Existing functionality intact, as shown by unit/e2e tests and empirical benchmarks. **Potential issues:** Users with esoteric workflows depending on deep wrapt features (unlikely). --------- Co-authored-by: Taegyun Kim <taegyun.kim@datadoghq.com>
1 parent 81acf6d commit aaf0646

File tree

4 files changed

+351
-161
lines changed

4 files changed

+351
-161
lines changed

ddtrace/profiling/collector/_lock.py

Lines changed: 71 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,6 @@
1717
from typing import Tuple
1818
from typing import Type
1919

20-
import wrapt
21-
2220
from ddtrace.internal.datadog.profiling import ddup
2321
from ddtrace.profiling import _threading
2422
from ddtrace.profiling import collector
@@ -40,41 +38,63 @@ def _current_thread() -> Tuple[int, str]:
4038
return thread_id, _threading.get_thread_name(thread_id)
4139

4240

43-
# We need to know if wrapt is compiled in C or not. If it's not using the C module, then the wrappers function will
44-
# appear in the stack trace and we need to hide it.
45-
WRAPT_C_EXT: bool
46-
if os.environ.get("WRAPT_DISABLE_EXTENSIONS"):
47-
WRAPT_C_EXT = False
48-
else:
49-
try:
50-
import wrapt._wrappers as _w # noqa: F401
51-
except ImportError:
52-
WRAPT_C_EXT = False
53-
else:
54-
WRAPT_C_EXT = True
55-
del _w
41+
class _ProfiledLock:
42+
"""
43+
Lightweight lock wrapper that profiles lock acquire/release operations.
44+
It intercepts lock methods without the overhead of a full proxy object.
45+
"""
5646

47+
__slots__ = (
48+
"__wrapped__",
49+
"_self_tracer",
50+
"_self_max_nframes",
51+
"_self_capture_sampler",
52+
"_self_init_loc",
53+
"_self_acquired_at",
54+
"_self_name",
55+
)
5756

58-
class _ProfiledLock(wrapt.ObjectProxy):
5957
def __init__(
6058
self,
6159
wrapped: Any,
6260
tracer: Optional[Tracer],
6361
max_nframes: int,
6462
capture_sampler: collector.CaptureSampler,
65-
endpoint_collection_enabled: bool,
6663
) -> None:
67-
wrapt.ObjectProxy.__init__(self, wrapped)
64+
self.__wrapped__: Any = wrapped
6865
self._self_tracer: Optional[Tracer] = tracer
6966
self._self_max_nframes: int = max_nframes
7067
self._self_capture_sampler: collector.CaptureSampler = capture_sampler
71-
self._self_endpoint_collection_enabled: bool = endpoint_collection_enabled
72-
frame: FrameType = sys._getframe(2 if WRAPT_C_EXT else 3)
68+
# Frame depth: 0=__init__, 1=_profiled_allocate_lock, 2=_LockAllocatorWrapper.__call__, 3=caller
69+
frame: FrameType = sys._getframe(3)
7370
code: CodeType = frame.f_code
7471
self._self_init_loc: str = "%s:%d" % (os.path.basename(code.co_filename), frame.f_lineno)
7572
self._self_acquired_at: int = 0
7673
self._self_name: Optional[str] = None
7774

75+
### DUNDER methods ###
76+
77+
def __eq__(self, other: Any) -> bool:
78+
if isinstance(other, _ProfiledLock):
79+
return self.__wrapped__ == other.__wrapped__
80+
return self.__wrapped__ == other
81+
82+
def __getattr__(self, name: str) -> Any:
83+
# Delegates acquire_lock, release_lock, locked_lock, and any future methods
84+
return getattr(self.__wrapped__, name)
85+
86+
def __hash__(self) -> int:
87+
return hash(self.__wrapped__)
88+
89+
def __repr__(self) -> str:
90+
return f"<_ProfiledLock({self.__wrapped__!r}) at {self._self_init_loc}>"
91+
92+
### Regular methods ###
93+
94+
def locked(self) -> bool:
95+
"""Return True if lock is currently held."""
96+
return self.__wrapped__.locked()
97+
7898
def acquire(self, *args: Any, **kwargs: Any) -> Any:
7999
return self._acquire(self.__wrapped__.acquire, *args, **kwargs)
80100

@@ -115,11 +135,6 @@ def __aexit__(self, *args: Any, **kwargs: Any) -> Any:
115135
return self._release(self.__wrapped__.__aexit__, *args, **kwargs)
116136

117137
def _release(self, inner_func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
118-
# The underlying threading.Lock class is implemented using C code, and
119-
# it doesn't have the __dict__ attribute. So we can't do
120-
# self.__dict__.pop("_self_acquired_at", None) to remove the attribute.
121-
# Instead, we need to use the following workaround to retrieve and
122-
# remove the attribute.
123138
start: Optional[int] = getattr(self, "_self_acquired_at", None)
124139
try:
125140
# Though it should generally be avoided to call release() from
@@ -130,7 +145,6 @@ def _release(self, inner_func: Callable[..., Any], *args: Any, **kwargs: Any) ->
130145
# and unlocked lock, and the expected behavior is to propagate that.
131146
del self._self_acquired_at
132147
except AttributeError:
133-
# We just ignore the error, if the attribute is not found.
134148
pass
135149

136150
try:
@@ -196,9 +210,13 @@ def _find_self_name(self, var_dict: Dict[str, Any]) -> Optional[str]:
196210
return name
197211
if config.lock.name_inspect_dir:
198212
for attribute in dir(value):
199-
if not attribute.startswith("__") and getattr(value, attribute) is self:
200-
self._self_name = attribute
201-
return attribute
213+
try:
214+
if not attribute.startswith("__") and getattr(value, attribute) is self:
215+
self._self_name = attribute
216+
return attribute
217+
except AttributeError:
218+
# Accessing unset attributes in __slots__ raises AttributeError.
219+
pass
202220
return None
203221

204222
# Get lock acquire/release call location and variable name the lock is assigned to
@@ -225,11 +243,19 @@ def _maybe_update_self_name(self) -> None:
225243
self._self_name = self._find_self_name(frame.f_locals) or self._find_self_name(frame.f_globals) or ""
226244

227245

228-
class FunctionWrapper(wrapt.FunctionWrapper):
229-
# Override the __get__ method: whatever happens, _allocate_lock is always considered by Python like a "static"
230-
# method, even when used as a class attribute. Python never tried to "bind" it to a method, because it sees it is a
231-
# builtin function. Override default wrapt behavior here that tries to detect bound method.
232-
def __get__(self, instance: Any, owner: Optional[Type] = None) -> FunctionWrapper: # type: ignore
246+
class _LockAllocatorWrapper:
247+
"""Wrapper for lock allocator functions that prevents method binding."""
248+
249+
__slots__ = ("_func",)
250+
251+
def __init__(self, func: Callable[..., Any]) -> None:
252+
self._func: Callable[..., Any] = func
253+
254+
def __call__(self, *args: Any, **kwargs: Any) -> Any:
255+
return self._func(*args, **kwargs)
256+
257+
def __get__(self, instance: Any, owner: Optional[Type] = None) -> _LockAllocatorWrapper:
258+
# Prevent automatic method binding (e.g., Foo.lock_class = threading.Lock)
233259
return self
234260

235261

@@ -241,16 +267,14 @@ class LockCollector(collector.CaptureSamplerCollector):
241267
def __init__(
242268
self,
243269
nframes: int = config.max_frames,
244-
endpoint_collection_enabled: bool = config.endpoint_collection,
245270
tracer: Optional[Tracer] = None,
246271
*args: Any,
247272
**kwargs: Any,
248273
) -> None:
249274
super().__init__(*args, **kwargs)
250275
self.nframes: int = nframes
251-
self.endpoint_collection_enabled: bool = endpoint_collection_enabled
252276
self.tracer: Optional[Tracer] = tracer
253-
self._original: Optional[Any] = None
277+
self._original_lock: Any = None
254278

255279
@abc.abstractmethod
256280
def _get_patch_target(self) -> Callable[..., Any]:
@@ -272,23 +296,20 @@ def _stop_service(self) -> None:
272296

273297
def patch(self) -> None:
274298
"""Patch the module for tracking lock allocation."""
275-
# We only patch the lock from the `threading` module.
276-
# Nobody should use locks from `_thread`; if they do so, then it's deliberate and we don't profile.
277-
self._original = self._get_patch_target()
299+
self._original_lock = self._get_patch_target()
300+
original_lock: Any = self._original_lock # Capture non-None value
278301

279-
# TODO: `instance` is unused
280-
def _allocate_lock(wrapped: Any, instance: Any, args: Any, kwargs: Any) -> _ProfiledLock:
281-
lock: Any = wrapped(*args, **kwargs)
302+
def _profiled_allocate_lock(*args: Any, **kwargs: Any) -> _ProfiledLock:
303+
"""Simple wrapper that returns profiled locks."""
282304
return self.PROFILED_LOCK_CLASS(
283-
lock,
284-
self.tracer,
285-
self.nframes,
286-
self._capture_sampler,
287-
self.endpoint_collection_enabled,
305+
wrapped=original_lock(*args, **kwargs),
306+
tracer=self.tracer,
307+
max_nframes=self.nframes,
308+
capture_sampler=self._capture_sampler,
288309
)
289310

290-
self._set_patch_target(FunctionWrapper(self._original, _allocate_lock))
311+
self._set_patch_target(_LockAllocatorWrapper(_profiled_allocate_lock))
291312

292313
def unpatch(self) -> None:
293314
"""Unpatch the threading module for tracking lock allocation."""
294-
self._set_patch_target(self._original)
315+
self._set_patch_target(self._original_lock)
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
other:
3+
- |
4+
profiling: This removes the ``wrapt`` library dependency from the Lock Profiler implementation, improving performance and reducing overhead during lock instrumentation.

tests/profiling_v2/collector/test_asyncio.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,7 @@
2121
def test_repr():
2222
test_collector._test_repr(
2323
collector_asyncio.AsyncioLockCollector,
24-
"AsyncioLockCollector(status=<ServiceStatus.STOPPED: 'stopped'>, "
25-
"capture_pct=1.0, nframes=64, "
26-
"endpoint_collection_enabled=True, tracer=None)",
24+
"AsyncioLockCollector(status=<ServiceStatus.STOPPED: 'stopped'>, capture_pct=1.0, nframes=64, tracer=None)", # noqa: E501
2725
)
2826

2927

0 commit comments

Comments
 (0)