@codeflash-ai codeflash-ai bot commented Nov 7, 2025

📄 35% (0.35x) speedup for BCDataStream.read_uint64 in electrum/transaction.py

⏱️ Runtime : 1.83 milliseconds → 1.36 milliseconds (best of 14 runs)

📝 Explanation and details

The optimization inlines the read_uint64 method to eliminate function call overhead and avoid redundant format string operations. Instead of calling the generic _read_num('<Q') method, the optimized version directly implements the uint64 reading logic within read_uint64.

Key changes:

  • Eliminates function call overhead: Removes the method call to _read_num, reducing Python's function call stack overhead
  • Replaces struct.calcsize('<Q') with constant 8: The optimized version uses the hardcoded value 8 instead of calling struct.calcsize('<Q') which must parse the format string and calculate the size at runtime
  • Direct format string usage: Uses '<Q' directly in struct.unpack_from rather than passing it through a parameter
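A minimal before/after sketch of the idea (simplified class; the method names mirror electrum's `BCDataStream`, but the bodies here are illustrative, not the actual source):

```python
import struct

class SerializationError(Exception):
    """Stand-in for the exception defined in electrum.transaction."""

class BCDataStream:
    """Simplified sketch of the relevant parts of BCDataStream."""
    def __init__(self, data=b''):
        self.input = data
        self.read_cursor = 0

    # Original path: a generic helper, with the size recomputed on every call.
    def _read_num(self, fmt):
        try:
            (value,) = struct.unpack_from(fmt, self.input, self.read_cursor)
            self.read_cursor += struct.calcsize(fmt)  # parses fmt at runtime
        except Exception as e:
            raise SerializationError(e) from e
        return value

    def read_uint64_original(self):
        return self._read_num('<Q')  # extra Python-level call per read

    # Optimized path: inlined, with the constant 8 replacing calcsize.
    def read_uint64(self):
        try:
            (value,) = struct.unpack_from('<Q', self.input, self.read_cursor)
            self.read_cursor += 8  # struct.calcsize('<Q') is always 8
        except Exception as e:
            raise SerializationError(e) from e
        return value

s = BCDataStream(b'\x2a' + b'\x00' * 7)
print(s.read_uint64())  # → 42
```

Both paths wrap errors as `SerializationError` on truncated input, which is why the behavior-preserving claim holds in this sketch.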

Performance impact:
The line profiler shows the optimization reduces total execution time from 20.13ms to 5.96ms (70% reduction in the profiled section). The cursor increment operation becomes significantly faster (from 393.5ns to 324.3ns per hit) by avoiding the struct.calcsize call. The struct.unpack_from call itself shows slight improvement due to reduced function call overhead.

Test case performance:
All test cases show consistent 26-55% speedup, with larger improvements in scenarios involving multiple sequential reads (like the large-scale tests processing 1000 uint64 values). This suggests the optimization is particularly beneficial for Bitcoin transaction parsing workloads that process many consecutive uint64 values.
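A rough way to reproduce the sequential-read comparison outside the test harness (standalone sketch; absolute numbers will vary by machine and interpreter, and these loops only isolate the `calcsize`-vs-constant difference, not the method-call overhead):

```python
import struct
import timeit

data = b''.join(struct.pack('<Q', i) for i in range(1000))

def read_all_calcsize(buf):
    # Mirrors the original path: size looked up per read.
    cursor, out = 0, []
    while cursor < len(buf):
        (v,) = struct.unpack_from('<Q', buf, cursor)
        cursor += struct.calcsize('<Q')
        out.append(v)
    return out

def read_all_constant(buf):
    # Mirrors the optimized path: constant stride of 8.
    cursor, out = 0, []
    while cursor < len(buf):
        (v,) = struct.unpack_from('<Q', buf, cursor)
        cursor += 8
        out.append(v)
    return out

assert read_all_calcsize(data) == read_all_constant(data) == list(range(1000))
print("calcsize:", timeit.timeit(lambda: read_all_calcsize(data), number=200))
print("constant:", timeit.timeit(lambda: read_all_constant(data), number=200))
```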

The optimization maintains identical behavior and error handling while providing substantial performance gains for a commonly used deserialization operation in Bitcoin data processing.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 5069 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import struct

# imports
import pytest
from electrum.transaction import BCDataStream


# the exception raised by BCDataStream; import it rather than redefining it,
# so that pytest.raises matches the class actually raised
from electrum.transaction import SerializationError

# unit tests

# ----------- BASIC TEST CASES -----------

def test_read_uint64_zero():
    # Test reading 0 (all bytes zero)
    s = BCDataStream()
    s.input = b'\x00\x00\x00\x00\x00\x00\x00\x00'
    s.read_cursor = 0
    codeflash_output = s.read_uint64() # 2.59μs -> 1.93μs (34.5% faster)
    assert codeflash_output == 0

def test_read_uint64_one():
    # Test reading 1 (little-endian)
    s = BCDataStream()
    s.input = b'\x01\x00\x00\x00\x00\x00\x00\x00'
    s.read_cursor = 0
    codeflash_output = s.read_uint64() # 1.79μs -> 1.30μs (37.3% faster)
    assert codeflash_output == 1

def test_read_uint64_typical_value():
    # Test reading a typical value (0x0123456789abcdef)
    s = BCDataStream()
    s.input = b'\xef\xcd\xab\x89\x67\x45\x23\x01'
    s.read_cursor = 0
    codeflash_output = s.read_uint64() # 1.69μs -> 1.25μs (34.8% faster)
    assert codeflash_output == 0x0123456789abcdef

def test_read_uint64_max_value():
    # Test reading max uint64 (all bytes 0xff)
    s = BCDataStream()
    s.input = b'\xff\xff\xff\xff\xff\xff\xff\xff'
    s.read_cursor = 0
    codeflash_output = s.read_uint64() # 1.72μs -> 1.22μs (40.7% faster)
    assert codeflash_output == 0xffffffffffffffff

def test_read_uint64_cursor_advance():
    # Test that read_cursor advances by 8 after read
    s = BCDataStream()
    s.input = b'\x01\x02\x03\x04\x05\x06\x07\x08' + b'\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10'
    s.read_cursor = 0
    codeflash_output = s.read_uint64(); val1 = codeflash_output # 1.65μs -> 1.24μs (32.5% faster)
    codeflash_output = s.read_uint64(); val2 = codeflash_output # 751ns -> 563ns (33.4% faster)
    assert val1 == 0x0807060504030201
    assert val2 == 0x100f0e0d0c0b0a09
    assert s.read_cursor == 16

# ----------- EDGE TEST CASES -----------

def test_read_uint64_incomplete_data():
    # Test reading with less than 8 bytes (should raise SerializationError)
    s = BCDataStream()
    s.input = b'\x01\x02\x03\x04\x05\x06\x07'  # Only 7 bytes
    s.read_cursor = 0
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_cursor_at_end():
    # Test reading when cursor is at end (should raise SerializationError)
    s = BCDataStream()
    s.input = b'\x01\x02\x03\x04\x05\x06\x07\x08'
    s.read_cursor = 8  # Already at end
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_cursor_near_end():
    # Test reading when cursor is near end (not enough bytes left)
    s = BCDataStream()
    s.input = b'\x01\x02\x03\x04\x05\x06\x07\x08\x09'
    s.read_cursor = 2  # Only 7 bytes left
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_negative_cursor():
    # Test behavior if read_cursor is negative (should raise SerializationError)
    s = BCDataStream()
    s.input = b'\x01\x02\x03\x04\x05\x06\x07\x08'
    s.read_cursor = -1
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_non_bytes_input():
    # Test input is not bytes/bytearray (should raise SerializationError)
    s = BCDataStream()
    s.input = "not bytes"
    s.read_cursor = 0
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_none_input():
    # Test input is None (should raise SerializationError)
    s = BCDataStream()
    s.input = None
    s.read_cursor = 0
    with pytest.raises(SerializationError):
        s.read_uint64()

def test_read_uint64_cursor_in_middle():
    # Test reading from a non-zero cursor
    s = BCDataStream()
    s.input = b'\x00\x00\x00\x00' + b'\x11\x22\x33\x44\x55\x66\x77\x88'
    s.read_cursor = 4
    codeflash_output = s.read_uint64() # 2.57μs -> 1.93μs (33.6% faster)
    assert codeflash_output == 0x8877665544332211

def test_read_uint64_multiple_reads():
    # Test multiple reads in sequence with correct results and cursor
    s = BCDataStream()
    s.input = (
        b'\x01\x00\x00\x00\x00\x00\x00\x00'
        b'\x02\x00\x00\x00\x00\x00\x00\x00'
        b'\x03\x00\x00\x00\x00\x00\x00\x00'
    )
    s.read_cursor = 0
    codeflash_output = s.read_uint64() # 1.79μs -> 1.28μs (40.4% faster)
    assert codeflash_output == 1
    codeflash_output = s.read_uint64() # 637ns -> 505ns (26.1% faster)
    assert codeflash_output == 2
    codeflash_output = s.read_uint64() # 509ns -> 336ns (51.5% faster)
    assert codeflash_output == 3

# ----------- LARGE SCALE TEST CASES -----------

def test_read_uint64_large_data():
    # Test reading many uint64s in a row from a large buffer
    count = 1000
    # Each value is i, packed as little-endian uint64
    data = b''.join(struct.pack('<Q', i) for i in range(count))
    s = BCDataStream()
    s.input = data
    s.read_cursor = 0
    for i in range(count):
        codeflash_output = s.read_uint64() # 442μs -> 330μs (33.6% faster)
        assert codeflash_output == i

def test_read_uint64_large_value_at_end():
    # Test reading a large value at the end of a large buffer
    count = 999
    data = b''.join(struct.pack('<Q', i) for i in range(count))
    # Append a very large value at the end
    large_value = 0xfedcba9876543210
    data += struct.pack('<Q', large_value)
    s = BCDataStream()
    s.input = data
    s.read_cursor = count*8
    codeflash_output = s.read_uint64() # 1.78μs -> 1.18μs (50.6% faster)
    assert codeflash_output == large_value

def test_read_uint64_performance_large_buffer():
    # Test that reading from a large buffer does not throw or hang
    count = 1000
    # Use max uint64 values
    data = b''.join(struct.pack('<Q', 0xffffffffffffffff) for _ in range(count))
    s = BCDataStream()
    s.input = data
    s.read_cursor = 0
    for i in range(count):
        codeflash_output = s.read_uint64() # 457μs -> 334μs (36.6% faster)
        assert codeflash_output == 0xffffffffffffffff

def test_read_uint64_large_buffer_incomplete_final():
    # Test that an incomplete uint64 at the end raises SerializationError
    count = 999
    data = b''.join(struct.pack('<Q', i) for i in range(count))
    # Add 4 extra bytes (not enough for a full uint64)
    data += b'\x01\x02\x03\x04'
    s = BCDataStream()
    s.input = data
    s.read_cursor = count*8
    with pytest.raises(SerializationError):
        s.read_uint64()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import struct

# imports
import pytest
from electrum.transaction import BCDataStream


# the exception raised by BCDataStream; import it rather than redefining it,
# so that pytest.raises matches the class actually raised
from electrum.transaction import SerializationError

# unit tests

# ----------------------
# BASIC TEST CASES
# ----------------------

def test_read_uint64_basic_zero():
    # Test reading 0 from little-endian bytes
    stream = BCDataStream()
    stream.input = b'\x00\x00\x00\x00\x00\x00\x00\x00'
    codeflash_output = stream.read_uint64() # 2.62μs -> 1.88μs (39.1% faster)
    assert codeflash_output == 0

def test_read_uint64_basic_max():
    # Test reading max uint64 (2^64-1)
    stream = BCDataStream()
    stream.input = b'\xff\xff\xff\xff\xff\xff\xff\xff'
    codeflash_output = stream.read_uint64() # 1.80μs -> 1.39μs (29.8% faster)
    assert codeflash_output == 0xffffffffffffffff

def test_read_uint64_basic_middle_value():
    # Test reading a typical middle value
    value = 0x0123456789abcdef
    stream = BCDataStream()
    stream.input = value.to_bytes(8, 'little')
    codeflash_output = stream.read_uint64() # 1.65μs -> 1.25μs (31.7% faster)
    assert codeflash_output == value

def test_read_uint64_sequential_reads():
    # Test reading two uint64s sequentially
    v1 = 1234567890123456789
    v2 = 9876543210987654321
    stream = BCDataStream()
    stream.input = v1.to_bytes(8, 'little') + v2.to_bytes(8, 'little')
    codeflash_output = stream.read_uint64() # 1.75μs -> 1.13μs (54.6% faster)
    assert codeflash_output == v1
    codeflash_output = stream.read_uint64() # 719ns -> 511ns (40.7% faster)
    assert codeflash_output == v2

# ----------------------
# EDGE TEST CASES
# ----------------------

def test_read_uint64_empty_stream():
    # Test reading from an empty stream should raise SerializationError
    stream = BCDataStream()
    stream.input = b''
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_short_stream():
    # Test reading from a stream that's too short (less than 8 bytes)
    stream = BCDataStream()
    stream.input = b'\x01\x02\x03'
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_cursor_offset():
    # Test reading from a non-zero cursor position
    value1 = 0xdeadbeefdeadbeef
    value2 = 0xabadcafeabadcafe
    stream = BCDataStream()
    stream.input = value1.to_bytes(8, 'little') + value2.to_bytes(8, 'little')
    stream.read_cursor = 8  # skip first value
    codeflash_output = stream.read_uint64() # 2.65μs -> 1.99μs (33.4% faster)
    assert codeflash_output == value2

def test_read_uint64_cursor_past_end():
    # Test reading when cursor is already past the end
    stream = BCDataStream()
    stream.input = b'\x00' * 8
    stream.read_cursor = 9
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_partial_read():
    # Test reading when only partial bytes remain
    stream = BCDataStream()
    stream.input = b'\x01\x02\x03\x04\x05\x06\x07'
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_non_bytes_input():
    # Test with input set to None
    stream = BCDataStream()
    stream.input = None
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_non_bytearray_input():
    # Test with input set to a bytearray (should work)
    value = 0x0102030405060708
    stream = BCDataStream()
    stream.input = bytearray(value.to_bytes(8, 'little'))
    codeflash_output = stream.read_uint64() # 2.58μs -> 2.03μs (27.0% faster)
    assert codeflash_output == value

# ----------------------
# LARGE SCALE TEST CASES
# ----------------------

def test_read_uint64_large_stream_multiple_reads():
    # Test reading 1000 uint64 values in a row
    count = 1000
    values = [i * 123456789 for i in range(count)]
    data = b''.join(v.to_bytes(8, 'little') for v in values)
    stream = BCDataStream()
    stream.input = data
    for i in range(count):
        codeflash_output = stream.read_uint64() # 448μs -> 331μs (35.1% faster)
        assert codeflash_output == values[i]

def test_read_uint64_large_stream_with_partial_at_end():
    # Test reading 999 valid uint64s and 1 partial at the end
    count = 999
    values = [i for i in range(count)]
    data = b''.join(v.to_bytes(8, 'little') for v in values) + b'\x01\x02'
    stream = BCDataStream()
    stream.input = data
    for i in range(count):
        codeflash_output = stream.read_uint64()
        assert codeflash_output == i
    # Now only 2 bytes remain, should raise
    with pytest.raises(SerializationError):
        stream.read_uint64()

def test_read_uint64_performance_reasonable():
    # Test that reading 1000 values is not excessively slow (basic sanity)
    import time
    count = 1000
    values = [i for i in range(count)]
    data = b''.join(v.to_bytes(8, 'little') for v in values)
    stream = BCDataStream()
    stream.input = data
    start = time.time()
    for _ in range(count):
        stream.read_uint64() # 449μs -> 331μs (35.5% faster)
    elapsed = time.time() - start
    assert elapsed < 5.0  # generous sanity bound; typical runs are well under a second

# ----------------------
# ADDITIONAL EDGE CASES
# ----------------------

def test_read_uint64_input_mutation():
    # Test that mutating input after reading doesn't affect already-read values
    value1 = 0x1122334455667788
    value2 = 0x99aabbccddeeff00
    data = bytearray(value1.to_bytes(8, 'little') + value2.to_bytes(8, 'little'))
    stream = BCDataStream()
    stream.input = data
    codeflash_output = stream.read_uint64() # 2.24μs -> 1.61μs (38.8% faster)
    assert codeflash_output == value1
    # Mutate the low byte of value2 (0x00 -> 0x42, so the change is observable;
    # writing 0x00 over the existing 0x00 would be a no-op)
    stream.input[8] = 0x42
    # Now value2 should be changed
    changed_value2 = int.from_bytes(bytes([0x42]) + value2.to_bytes(8, 'little')[1:], 'little')
    codeflash_output = stream.read_uint64() # 770ns -> 549ns (40.3% faster)
    assert codeflash_output == changed_value2

def test_read_uint64_reset_cursor_and_reread():
    # Test that resetting cursor allows rereading the same value
    value = 0x123456789abcdef0
    stream = BCDataStream()
    stream.input = value.to_bytes(8, 'little')
    codeflash_output = stream.read_uint64() # 1.53μs -> 1.10μs (39.3% faster)
    assert codeflash_output == value
    stream.read_cursor = 0
    codeflash_output = stream.read_uint64() # 588ns -> 440ns (33.6% faster)
    assert codeflash_output == value

def test_read_uint64_struct_error_wrapping():
    # Test that a struct.error is wrapped as SerializationError
    stream = BCDataStream()
    stream.input = b'\x01\x02'
    with pytest.raises(SerializationError) as excinfo:
        stream.read_uint64()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from electrum.transaction import BCDataStream, SerializationError
import pytest

def test_BCDataStream_read_uint64():
    with pytest.raises(SerializationError, match="a\\ bytes\\-like\\ object\\ is\\ required,\\ not\\ 'NoneType'"):
        BCDataStream.read_uint64(BCDataStream())

To edit these changes, `git checkout codeflash/optimize-BCDataStream.read_uint64-mhol6i06` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 7, 2025 08:21
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 7, 2025