Summary
In the Go bindings, calling Process() more than once on the same context returns no segments for any call after the first. After the first successful Process() → NextSegment() cycle, every additional Process() invocation yields empty output.
This makes it impossible to perform incremental or streaming transcription.
Expected Behavior
- Each call to Process() with new audio should produce new segments.
- NextSegment() should return those new segments, and ideally preserve previously produced ones.
- This matches how the core whisper.cpp context behaves when processing audio in multiple chunks.
Actual Behavior
- The first call to Process() works correctly and NextSegment() returns segments.
- Future calls to Process() produce zero segments, even when valid new audio is passed.
- NextSegment() repeatedly returns no results for all subsequent Process() calls.
Why This Matters
This fully breaks incremental and streaming use cases, including:
- Real-time / chunked audio processing
- Live transcription
- Processing long audio without loading the full file at once
- Updating transcripts as new audio arrives
Users would expect the Go binding to match the C++ library’s ability to handle multiple processing passes.
Steps to Reproduce
- Initialize a whisper.Context.
- Call Process() with audio chunk A.
- Call NextSegment() → segments from chunk A are returned as expected.
- Call Process() again with audio chunk B.
- Call NextSegment() → no segments are returned (output is empty).
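The steps above can be modeled without the real library. The sketch below is a toy stand-in, not the binding's actual code: fakeContext, its cursor field, errDone, and drain are hypothetical names, chunks are slices of strings rather than audio samples, and the stale-cursor behavior is only an assumption about the cause. It shows how a read position that survives across Process() calls would produce exactly the reported symptom.

```go
package main

import (
	"errors"
	"fmt"
)

// errDone is a hypothetical end-of-segments signal for this toy model.
var errDone = errors.New("no more segments")

// fakeContext is a hypothetical stand-in for the binding's context.
// It is NOT the real whisper.Context implementation.
type fakeContext struct {
	segments []string // results of the most recent pass only
	cursor   int      // read position used by NextSegment
}

// Process replaces the segment list with the new pass's results.
// Assumption under test: the cursor is left where the last pass ended.
func (c *fakeContext) Process(chunk []string) {
	c.segments = chunk
}

// NextSegment returns the segment at the cursor and advances it.
func (c *fakeContext) NextSegment() (string, error) {
	if c.cursor >= len(c.segments) {
		return "", errDone
	}
	s := c.segments[c.cursor]
	c.cursor++
	return s, nil
}

// drain calls NextSegment until it reports the end.
func drain(c *fakeContext) []string {
	var out []string
	for {
		s, err := c.NextSegment()
		if err != nil {
			return out
		}
		out = append(out, s)
	}
}

func main() {
	c := &fakeContext{}

	c.Process([]string{"a1", "a2"}) // chunk A
	fmt.Println("chunk A segments:", len(drain(c)))

	c.Process([]string{"b1", "b2"}) // chunk B
	fmt.Println("chunk B segments:", len(drain(c)))
}
```

In this model the second drain returns nothing because the cursor already sits at the end of the new, equally long segment list.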
Additional Notes
- This appears related to internal state handling within the Go binding.
- A related fix attempt is in PR #3503, "Allow NextSegment() to be called across multiple Process() calls in the Go binding".
- The issue may stem from the binding not resetting or managing internal counters needed for multi-pass processing.
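If the cause is indeed a counter that is never reset, one possible remedy is to rewind it at the start of every pass. The sketch below illustrates that idea on a toy type; resettingContext, cursor, errDone, and collect are hypothetical names, chunks are strings rather than audio samples, and nothing here is taken from the binding's source or from PR #3503.

```go
package main

import (
	"errors"
	"fmt"
)

// errDone is a hypothetical end-of-segments signal for this toy model.
var errDone = errors.New("no more segments")

// resettingContext is a hypothetical stand-in, not the real binding type.
type resettingContext struct {
	segments []string // results of the most recent pass
	cursor   int      // read position used by NextSegment
}

// Process stores the new pass's results and rewinds the cursor,
// which is the assumed fix being illustrated.
func (c *resettingContext) Process(chunk []string) {
	c.segments = chunk
	c.cursor = 0
}

// NextSegment returns the segment at the cursor and advances it.
func (c *resettingContext) NextSegment() (string, error) {
	if c.cursor >= len(c.segments) {
		return "", errDone
	}
	s := c.segments[c.cursor]
	c.cursor++
	return s, nil
}

// collect runs one pass and gathers every segment it yields.
func collect(c *resettingContext, chunk []string) []string {
	c.Process(chunk)
	var out []string
	for {
		s, err := c.NextSegment()
		if errors.Is(err, errDone) {
			return out
		}
		out = append(out, s)
	}
}

func main() {
	c := &resettingContext{}
	fmt.Println(collect(c, []string{"a1", "a2"})) // chunk A
	fmt.Println(collect(c, []string{"b1", "b2"})) // chunk B
}
```

This variant returns only the segments of the latest pass. Preserving earlier segments as well, as the expected behavior suggests, would need an accumulated history instead of a replaced list.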