Go Binding Does Not Support Incremental Processing #3504

@joshmux

Summary

In the Go bindings, calling Process() more than once on the same context returns no segments for any call after the first. After the first successful Process() / NextSegment() cycle, every additional Process() invocation yields empty output.

This makes it impossible to perform incremental or streaming transcription.

Expected Behavior

  • Each call to Process() with new audio should produce new segments.
  • NextSegment() should return those new segments, and ideally preserve previously produced ones.
  • This matches how the core whisper.cpp context behaves when processing audio in multiple chunks.

Actual Behavior

  • The first call to Process() works correctly and NextSegment() returns segments.
  • Subsequent calls to Process() produce zero segments, even when valid new audio is passed.
  • NextSegment() repeatedly returns no results for all subsequent Process() calls.

Why This Matters

This fully breaks incremental and streaming use cases, including:

  • Real-time / chunked audio processing
  • Live transcription
  • Processing long audio without loading the full file at once
  • Updating transcripts as new audio arrives

Users would expect the Go binding to match the C++ library’s ability to handle multiple processing passes.

Steps to Reproduce

  1. Initialize a whisper.Context.
  2. Call Process() with audio chunk A.
  3. Call NextSegment() → segments from chunk A are returned as expected.
  4. Call Process() again with audio chunk B.
  5. Call NextSegment() → no segments are returned (output is empty).
