codeflash-ai bot commented Nov 5, 2025

📄 27% (0.27x) speedup for BCDataStream.write_int32 in electrum/transaction.py

⏱️ Runtime : 3.13 milliseconds → 2.47 milliseconds (best of 114 runs)

📝 Explanation and details

The optimization achieves a 26% speedup by inlining the _write_num method directly into write_int32, eliminating the function call overhead and redundant operations.

Key optimizations applied:

  1. Function call elimination: The original code called self._write_num('<i', val) which added function call overhead. The optimized version directly implements the struct packing and buffer operations in write_int32.

  2. Removed redundant assertion: The original _write_num included an assert isinstance(s, (bytes, bytearray)) check that's unnecessary since struct.pack() always returns bytes.

  3. Direct buffer manipulation: Both versions use the same efficient buffer operations (bytearray(s) for initialization and inp.extend(s) for appending), but the optimized version accesses them directly, without the indirection of a method call; see the sketch below.
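
A minimal before/after sketch, reconstructed from the description above rather than copied verbatim from electrum/transaction.py (the real class has more fields and methods; the name write_int32_before exists only for illustration):

```python
import struct

class BCDataStream:
    """Trimmed illustration of the serializer; only the parts relevant here."""

    def __init__(self):
        self.input = None  # the buffer is created lazily on the first write

    # Before: write_int32 delegated to the generic helper.
    def _write_num(self, fmt, num):
        s = struct.pack(fmt, num)
        # The assertion dropped by the optimization -- struct.pack() always returns bytes.
        assert isinstance(s, (bytes, bytearray))
        if self.input is None:
            self.input = bytearray(s)
        else:
            self.input.extend(s)

    def write_int32_before(self, val):
        self._write_num('<i', val)

    # After: packing and buffer handling are inlined, avoiding the extra call.
    def write_int32(self, val):
        s = struct.pack('<i', val)
        inp = self.input
        if inp is None:
            self.input = bytearray(s)
        else:
            inp.extend(s)

stream = BCDataStream()
stream.write_int32(-3)
assert stream.input == bytearray(b'\xfd\xff\xff\xff')  # little-endian two's complement
```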

Performance impact analysis:

  • The line profiler shows the original write_int32 spent 100% of its time just making the function call to _write_num
  • The optimized version eliminates this overhead, with time distributed across the actual operations: struct packing (30.4%), buffer access (19.7%), conditional check (21%), and buffer operations (28.5%)
  • Individual test cases show speedups of roughly 7-39% across different scenarios (a rough timing sketch follows)
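
The percentages above come from Codeflash's line profiler. A rough way to reproduce the end-to-end effect (assuming an importable electrum checkout; this is a sanity-check sketch, not the harness that produced the numbers above) is a plain timeit run over the same bulk workload the generated tests use:

```python
import timeit
from electrum.transaction import BCDataStream

def write_1000_int32s():
    stream = BCDataStream()
    for i in range(1000):
        stream.write_int32(i)

# Average seconds per batch of 1000 writes; run once on each branch and compare.
print(timeit.timeit(write_1000_int32s, number=200) / 200)
```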

Workload benefits:
This optimization is particularly effective for:

  • High-frequency serialization: Bitcoin transaction processing involves repeated int32 serialization for amounts, timestamps, and counters (a tiny usage example follows this list)
  • Bulk operations: The large-scale tests show 25-28% improvements when writing 1000+ values sequentially
  • Memory-constrained scenarios: The optimization maintains the same efficient memory usage patterns while reducing CPU overhead
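
A tiny illustration of the high-frequency case, using hypothetical field values (the 650_000 counter is made up for the example):

```python
from electrum.transaction import BCDataStream

s = BCDataStream()
s.write_int32(2)        # e.g. a version-style field
s.write_int32(650_000)  # e.g. a counter-style field
assert bytes(s.input) == b'\x02\x00\x00\x00' + (650_000).to_bytes(4, 'little')
```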

The optimization preserves all original behavior and error handling while providing significant performance gains for this hot-path serialization method.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9096 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import struct

# imports
import pytest
from electrum.transaction import BCDataStream

# unit tests

# -------- Basic Test Cases --------

def test_write_int32_basic_positive():
    """Test writing a basic positive int32 value."""
    stream = BCDataStream()
    stream.write_int32(42) # 2.05μs -> 1.68μs (21.6% faster)

def test_write_int32_basic_negative():
    """Test writing a basic negative int32 value."""
    stream = BCDataStream()
    stream.write_int32(-42) # 1.42μs -> 1.27μs (11.6% faster)

def test_write_int32_basic_zero():
    """Test writing zero."""
    stream = BCDataStream()
    stream.write_int32(0) # 1.40μs -> 1.13μs (24.6% faster)

def test_write_int32_multiple_calls():
    """Test writing multiple int32 values in sequence."""
    stream = BCDataStream()
    stream.write_int32(1) # 1.43μs -> 1.10μs (29.4% faster)
    stream.write_int32(2) # 779ns -> 697ns (11.8% faster)
    stream.write_int32(-3) # 476ns -> 397ns (19.9% faster)
    # struct.pack('<i', 1) == b'\x01\x00\x00\x00'
    # struct.pack('<i', 2) == b'\x02\x00\x00\x00'
    # struct.pack('<i', -3) == b'\xfd\xff\xff\xff'
    expected = bytearray(b'\x01\x00\x00\x00\x02\x00\x00\x00\xfd\xff\xff\xff')
    assert stream.input == expected

def test_write_int32_return_value():
    """Test that write_int32 returns None."""
    stream = BCDataStream()
    codeflash_output = stream.write_int32(123); ret = codeflash_output # 1.23μs -> 963ns (27.7% faster)
    assert ret is None

# -------- Edge Test Cases --------

def test_write_int32_minimum_value():
    """Test writing the minimum int32 value."""
    stream = BCDataStream()
    min_val = -2**31
    stream.write_int32(min_val) # 1.28μs -> 1.02μs (25.3% faster)

def test_write_int32_maximum_value():
    """Test writing the maximum int32 value."""
    stream = BCDataStream()
    max_val = 2**31 - 1
    stream.write_int32(max_val) # 1.35μs -> 995ns (35.3% faster)

def test_write_int32_overflow_positive():
    """Test that writing a value too large for int32 raises struct.error."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(2**31) # 2.54μs -> 2.31μs (10.0% faster)

def test_write_int32_overflow_negative():
    """Test that writing a value too small for int32 raises struct.error."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(-2**31 - 1) # 2.00μs -> 1.69μs (18.8% faster)

@pytest.mark.parametrize("val", [
    0, 1, -1, 123456, -123456, 2**31-1, -2**31
])
def test_write_int32_idempotence(val):
    """Test that writing the same value twice results in two identical 4-byte sequences."""
    stream = BCDataStream()
    stream.write_int32(val) # 9.70μs -> 7.56μs (28.3% faster)
    stream.write_int32(val) # 4.78μs -> 3.97μs (20.3% faster)
    packed = struct.pack('<i', val)
    assert stream.input == bytearray(packed + packed)

def test_write_int32_non_integer_input():
    """Test that non-integer input raises TypeError or struct.error."""
    stream = BCDataStream()
    # float
    with pytest.raises(struct.error):
        stream.write_int32(3.14)
    # string
    with pytest.raises(TypeError):
        stream.write_int32("100")
    # None
    with pytest.raises(TypeError):
        stream.write_int32(None)
    # list
    with pytest.raises(TypeError):
        stream.write_int32([1,2,3])

def test_write_int32_mutates_input():
    """Test that .input is a bytearray and is mutated, not replaced, after first write."""
    stream = BCDataStream()
    stream.write_int32(1) # 2.52μs -> 2.03μs (24.3% faster)
    first_input = stream.input
    stream.write_int32(2) # 814ns -> 709ns (14.8% faster)
    assert isinstance(stream.input, bytearray)
    assert stream.input is first_input  # extended in place, not replaced

# -------- Large Scale Test Cases --------

def test_write_int32_large_scale_sequential():
    """Test writing a large number of sequential int32 values."""
    stream = BCDataStream()
    N = 1000
    for i in range(N):
        stream.write_int32(i) # 336μs -> 267μs (26.1% faster)
    # Check a few random positions for correctness
    for idx in [0, 1, 10, 100, 999]:
        start = idx * 4
        expected = struct.pack('<i', idx)
        assert stream.input[start:start + 4] == expected

def test_write_int32_large_scale_negative():
    """Test writing a large number of negative int32 values."""
    stream = BCDataStream()
    N = 1000
    for i in range(N):
        stream.write_int32(-i) # 338μs -> 267μs (26.5% faster)
    # Check a few random positions for correctness
    for idx in [0, 1, 10, 100, 999]:
        start = idx * 4
        expected = struct.pack('<i', -idx)
        assert stream.input[start:start + 4] == expected

def test_write_int32_large_scale_pattern():
    """Test writing a repeating pattern of int32 values."""
    stream = BCDataStream()
    pattern = [0, 2**31-1, -2**31, -1, 123456789]
    N = 200  # 5*200 = 1000 writes
    for _ in range(N):
        for val in pattern:
            stream.write_int32(val)
    # Check the first 5 values
    for i, val in enumerate(pattern):
        start = i * 4
        expected = struct.pack('<i', val)
        assert stream.input[start:start + 4] == expected
    # Check the last 5 values
    for i, val in enumerate(pattern):
        start = (5*N - 5 + i) * 4
        expected = struct.pack('<i', val)
        assert stream.input[start:start + 4] == expected

def test_write_int32_large_scale_memory_efficiency():
    """Test that memory usage does not explode for large but reasonable input."""
    import sys
    stream = BCDataStream()
    N = 1000
    for i in range(N):
        stream.write_int32(i) # 337μs -> 265μs (27.3% faster)
    # The memory size should be reasonable (not much more than the data itself)
    mem_size = sys.getsizeof(stream.input)
    # bytearray over-allocates a little, so allow generous slack over the 4*N data bytes
    assert mem_size < 2 * 4 * N
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import struct  # used for packing integers

# imports
import pytest  # used for our unit tests
from electrum.transaction import BCDataStream

# unit tests

# --- BASIC TEST CASES ---

def test_write_int32_basic_positive():
    """Test writing a simple positive int32 value."""
    stream = BCDataStream()
    stream.write_int32(1) # 2.06μs -> 1.60μs (28.7% faster)

def test_write_int32_basic_negative():
    """Test writing a simple negative int32 value."""
    stream = BCDataStream()
    stream.write_int32(-1) # 1.50μs -> 1.16μs (29.4% faster)

def test_write_int32_basic_zero():
    """Test writing zero."""
    stream = BCDataStream()
    stream.write_int32(0) # 1.42μs -> 1.12μs (26.9% faster)

def test_write_int32_basic_multiple_writes():
    """Test writing multiple int32 values sequentially."""
    stream = BCDataStream()
    stream.write_int32(1) # 1.37μs -> 1.04μs (32.0% faster)
    stream.write_int32(2) # 795ns -> 657ns (21.0% faster)
    # struct.pack('<i', 1) + struct.pack('<i', 2)
    expected = bytearray(struct.pack('<i', 1) + struct.pack('<i', 2))
    assert stream.input == expected

# --- EDGE TEST CASES ---

def test_write_int32_edge_max_int32():
    """Test writing the maximum int32 value."""
    stream = BCDataStream()
    max_int32 = 2**31 - 1
    stream.write_int32(max_int32) # 1.32μs -> 988ns (33.8% faster)
    expected = bytearray(struct.pack('<i', max_int32))
    assert stream.input == expected

def test_write_int32_edge_min_int32():
    """Test writing the minimum int32 value."""
    stream = BCDataStream()
    min_int32 = -2**31
    stream.write_int32(min_int32) # 1.30μs -> 1.02μs (27.9% faster)
    expected = bytearray(struct.pack('<i', min_int32))
    assert stream.input == expected

def test_write_int32_edge_overflow_positive():
    """Test writing a value just above int32 range (should raise struct.error)."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(2**31) # 2.64μs -> 2.46μs (7.06% faster)

def test_write_int32_edge_overflow_negative():
    """Test writing a value just below int32 range (should raise struct.error)."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(-2**31 - 1) # 1.95μs -> 1.71μs (13.9% faster)

def test_write_int32_edge_non_integer():
    """Test writing a non-integer value (should raise struct.error)."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(1.5) # 1.30μs -> 1.13μs (15.0% faster)

def test_write_int32_edge_string():
    """Test writing a string (should raise struct.error)."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32("123") # 1.25μs -> 1.04μs (19.9% faster)

def test_write_int32_edge_none():
    """Test writing None (should raise struct.error)."""
    stream = BCDataStream()
    with pytest.raises(struct.error):
        stream.write_int32(None) # 1.21μs -> 1.00μs (20.6% faster)

def test_write_int32_edge_bool_true():
    """Test writing True (should be treated as 1)."""
    stream = BCDataStream()
    stream.write_int32(True) # 1.69μs -> 1.31μs (28.8% faster)
    expected = bytearray(struct.pack('<i', 1))
    assert stream.input == expected

def test_write_int32_edge_bool_false():
    """Test writing False (should be treated as 0)."""
    stream = BCDataStream()
    stream.write_int32(False) # 1.41μs -> 1.10μs (28.3% faster)
    expected = bytearray(struct.pack('<i', 0))
    assert stream.input == expected

def test_write_int32_edge_input_already_initialized():
    """Test writing when input is already a bytearray."""
    stream = BCDataStream()
    stream.input = bytearray(b'abc')
    stream.write_int32(42) # 1.38μs -> 1.11μs (24.3% faster)
    expected = bytearray(b'abc' + struct.pack('<i', 42))
    assert stream.input == expected

def test_write_int32_edge_input_is_empty_bytearray():
    """Test writing when input is an empty bytearray."""
    stream = BCDataStream()
    stream.input = bytearray()
    stream.write_int32(99) # 1.33μs -> 957ns (38.9% faster)
    expected = bytearray(struct.pack('<i', 99))
    assert stream.input == expected

# --- LARGE SCALE TEST CASES ---

def test_write_int32_large_scale_many_writes():
    """Test writing a large number of int32 values sequentially."""
    stream = BCDataStream()
    values = list(range(1000))  # 0 to 999
    for v in values:
        stream.write_int32(v) # 343μs -> 269μs (27.7% faster)
    # The expected result is the concatenation of all packed int32s
    expected = bytearray(b''.join([struct.pack('<i', v) for v in values]))
    assert stream.input == expected

def test_write_int32_large_scale_alternating_signs():
    """Test writing alternating positive and negative int32 values."""
    stream = BCDataStream()
    values = [i if i % 2 == 0 else -i for i in range(1000)]
    for v in values:
        stream.write_int32(v) # 345μs -> 273μs (26.4% faster)
    expected = bytearray(b''.join([struct.pack('<i', v) for v in values]))
    assert stream.input == expected

def test_write_int32_large_scale_max_min():
    """Test writing max and min int32 values repeatedly."""
    stream = BCDataStream()
    values = [2**31 - 1, -2**31] * 500  # 1000 elements
    for v in values:
        stream.write_int32(v) # 340μs -> 270μs (26.1% faster)
    expected = bytearray(b''.join([struct.pack('<i', v) for v in values]))
    assert stream.input == expected

def test_write_int32_large_scale_performance():
    """Test that writing 1000 int32s does not take excessive time or memory."""
    import time
    stream = BCDataStream()
    values = [i for i in range(1000)]
    start = time.time()
    for v in values:
        stream.write_int32(v) # 341μs -> 265μs (28.4% faster)
    duration = time.time() - start
    assert duration < 1.0  # generous bound; the timed loop takes well under a second

def test_write_int32_large_scale_multiple_streams():
    """Test writing to multiple BCDataStream instances independently."""
    streams = [BCDataStream() for _ in range(10)]
    for i, stream in enumerate(streams):
        for v in range(i * 100, (i + 1) * 100):
            stream.write_int32(v)
        expected = bytearray(b''.join([struct.pack('<i', v) for v in range(i * 100, (i + 1) * 100)]))
        assert stream.input == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-BCDataStream.write_int32-mhmi39uo` and push.

codeflash-ai bot requested a review from mashraf-222 on November 5, 2025 at 21:19
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Nov 5, 2025