WhisperTT evals #899
base: main
Conversation
- Added `openslr_librispeech_other.yaml` and `openslr_librispeech.yaml` configuration files for task definitions.
- Implemented utility functions in `utils.py` for processing audio and text documents.
- Created `basic.py`, `english.py`, and `english.json` for English text normalization, including handling of spelling variations and number normalization.
- Enhanced the whisper normalizer with new functionalities for both Chinese and English text processing.
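As a rough illustration of what basic English normalization involves (a hypothetical minimal sketch, not the vendored whisper normalizer; the `SPELLING_VARIANTS` table stands in for what `english.json` provides):

```python
import re

# Minimal stand-in for the spelling-variant table loaded from english.json.
SPELLING_VARIANTS = {"colour": "color", "theatre": "theater"}


def basic_normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)                       # drop punctuation
    words = [SPELLING_VARIANTS.get(w, w) for w in text.split()]
    return " ".join(words)                                       # collapse whitespace


print(basic_normalize("The Theatre, in COLOUR!"))                # -> "the theater in color"
```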
- Changed dataset_path to 'parquet' and updated dataset_kwargs to include a specific data file URL.
- Modified test_split from 'test' to 'train' and set dataset_name to 'null' for task definition adjustments.
- Enhanced the `import_function` to first attempt a relative file import and fall back to an absolute module import if the relative path does not exist.
- Improved error handling by re-raising import errors with context for better debugging.
- Removed unused `openslr_librispeech/_default_yaml_template` and related whisper normalizer files to streamline the codebase.
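A plausible shape for that fallback (the signature, parameter names, and error message below are assumptions for illustration, not the PR's exact code):

```python
import importlib
import importlib.util
import os


def import_function(yaml_dir: str, function_spec: str):
    """Resolve a "module.function" spec, preferring a .py file next to the
    task YAML and falling back to an absolute import of an installed module."""
    module_name, function_name = function_spec.rsplit(".", 1)
    relative_path = os.path.join(yaml_dir, f"{module_name}.py")
    try:
        if os.path.exists(relative_path):
            spec = importlib.util.spec_from_file_location(module_name, relative_path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
        else:
            module = importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with context so the failing task file is obvious.
        raise ImportError(f"Could not import '{function_spec}' (tried {relative_path})") from err
    return getattr(module, function_name)
```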
…ield names
- Updated the `librispeech_process_result` function to handle both "gt" and "transcript" as valid keys for ground truth in documents, improving compatibility with different LibriSpeech datasets.
- Added error handling to raise a KeyError if neither field is found, providing clearer feedback on document structure.
…fields
- Updated the `librispeech_process_result` function to safely retrieve the "source" field, defaulting to "unknown" if not present.
- Added logic to infer the "task" field from context, defaulting to "asr_en" for LibriSpeech datasets when not explicitly provided.
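Taken together, the field handling from these two commits looks roughly like this (a sketch only; the returned metric structure is an assumption, the fallback logic is the point):

```python
def _librispeech_ground_truth(doc):
    # Accept either ground-truth field name used across LibriSpeech variants.
    for key in ("gt", "transcript"):
        if key in doc:
            return doc[key]
    raise KeyError(f"No ground-truth field ('gt' or 'transcript') found; keys: {list(doc)}")


def librispeech_process_result(doc, result):
    gt = _librispeech_ground_truth(doc)
    source = doc.get("source", "unknown")   # default when the field is absent
    task = doc.get("task", "asr_en")        # LibriSpeech default when not provided
    return {"wer": {"gt": gt, "pred": result[0], "source": source, "task": task}}
```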
…eld names
- Updated the `librispeech_doc_to_audio` function to check for various field names ("audio", "file", "path", "audio_path") in the document, improving compatibility with different LibriSpeech datasets (see the sketch after this list).
- Added error handling to raise a KeyError if no valid audio field is found, providing clearer feedback on document structure.
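A minimal sketch of that lookup, assuming the document is a plain dict (how the HF Audio feature is unwrapped afterwards may differ):

```python
AUDIO_FIELDS = ("audio", "file", "path", "audio_path")


def librispeech_doc_to_audio(doc):
    # Try each known audio field name in turn for cross-dataset compatibility.
    for field in AUDIO_FIELDS:
        if field in doc and doc[field] is not None:
            return doc[field]
    raise KeyError(f"No audio field found; expected one of {AUDIO_FIELDS}, got keys: {list(doc)}")
```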
…ionality
- Simplified the `librispeech_doc_to_audio` function to directly return the "audio" field, removing unnecessary checks.
- Streamlined the `librispeech_process_result` function to directly access "gt", "source", and "task" fields without additional error handling, assuming their presence.
- Added a new `librispeech_doc_to_target` function to return the ground truth from the document, enhancing modularity.
…dularity
- Updated the `openasr_doc_to_audio` function to handle multiple audio field names ("audio", "file", "path", "audio_path"), enhancing compatibility with various datasets.
- Introduced a new `openasr_doc_to_target` function to normalize the retrieval of ground truth fields ("text", "transcript", "gt"), improving modularity and error handling in the `openasr_process_result` function (see the sketch below).
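A hypothetical sketch of that normalization and its use (the metric dictionary returned here is an assumption):

```python
def openasr_doc_to_target(doc):
    # Normalize ground-truth retrieval across OpenASR dataset variants.
    for field in ("text", "transcript", "gt"):
        if field in doc:
            return doc[field]
    raise KeyError(f"No ground-truth field ('text', 'transcript', 'gt'); keys: {list(doc)}")


def openasr_process_result(doc, result):
    gt = openasr_doc_to_target(doc)
    return {"wer": {"gt": gt, "pred": result[0]}}
```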
…sper model
- Updated the `warmup_model` function to create a mesh device instead of a single device, enabling compatibility with the mesh-enabled Whisper model.
- Added logging to indicate the creation and successful warming up of the Whisper model.
- Updated the `warmup_model` function to streamline the creation of the mesh device by removing unnecessary parameters, enhancing code clarity and maintainability.
- Refactored the WhisperTT class to utilize HTTP calls to the tt-media-server for audio transcription, allowing evaluations to run outside of Docker.
- Added methods for encoding audio to base64 and transcribing audio via the API.
- Updated model initialization to include parameters for base URL, timeout, and retries, enhancing flexibility and error handling.
- Added a new parameter `num_concurrent` to the WhisperTT class for improved concurrency handling.
- Updated the initialization method to log a warning for any unexpected keyword arguments instead of raising an assertion error, enhancing robustness and user feedback.
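Put together, the client described in these two commits could look roughly like the sketch below. The endpoint path, payload/response fields, and default values are assumptions for illustration; the real tt-media-server API may differ.

```python
import base64
import logging

import requests

logger = logging.getLogger(__name__)


class WhisperTT:
    def __init__(self, base_url="http://localhost:8000", timeout=300, retries=3, num_concurrent=1, **kwargs):
        self.base_url = base_url
        self.timeout = timeout
        self.retries = retries
        self.num_concurrent = num_concurrent
        if kwargs:
            # Warn rather than assert, so unexpected kwargs do not abort the run.
            logger.warning("WhisperTT received unexpected kwargs: %s", list(kwargs))

    @staticmethod
    def _encode_audio(wav_bytes: bytes) -> str:
        return base64.b64encode(wav_bytes).decode("utf-8")

    def transcribe(self, wav_bytes: bytes) -> str:
        payload = {"audio": self._encode_audio(wav_bytes)}  # field name is an assumption
        for attempt in range(self.retries):
            try:
                resp = requests.post(f"{self.base_url}/transcribe", json=payload, timeout=self.timeout)
                resp.raise_for_status()
                return resp.json().get("text", "")
            except requests.RequestException as err:
                logger.warning("Transcription attempt %d failed: %s", attempt + 1, err)
        raise RuntimeError("All transcription attempts failed")
```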
…TT class
- Updated the audio array conversion to float32 to prevent "Unsupported bit depth: 64" errors when creating WAV files, ensuring compatibility with server requirements.
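For reference, the fix amounts to casting before WAV encoding. A minimal sketch using soundfile (the in-memory buffering and function name are illustrative; the PR's exact I/O path may differ):

```python
import io

import numpy as np
import soundfile as sf


def audio_to_wav_bytes(audio_array, sample_rate: int) -> bytes:
    # Cast float64 HF audio arrays to float32 so the WAV writer / server
    # does not reject them with "Unsupported bit depth: 64".
    audio_array = np.asarray(audio_array, dtype=np.float32)
    buffer = io.BytesIO()
    sf.write(buffer, audio_array, sample_rate, format="WAV", subtype="FLOAT")
    return buffer.getvalue()
```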
…-eval into ben/samt/whisper-tt
I checked the OpenASR HF dataset, and the audio and answer keys are just `audio` and `text`. Is it necessary to add a fallback for this?
Hi, thanks for the contribution. I checked through your files, and it seems like you are using a very old version of lmms-eval. Is it possible to check out the main branch and make the edits on the newest main? Thanks!
dataset_path: parquet
dataset_kwargs:
  data_files:
    test: "https://huggingface.co/datasets/openslr/librispeech_asr/resolve/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1/other/test/0000.parquet"
I think this can be configured as a regular dataset path, e.g. `load_dataset("openslr/librispeech_asr", "other", split="test")`, so it can be expressed much more cleanly in the YAML file.
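For illustration, that suggestion would reduce the config to something like the following (key names taken from the existing task YAMLs; whether `dataset_name` maps to the HF config name here is an assumption):

```yaml
dataset_path: openslr/librispeech_asr
dataset_name: other
test_split: test
```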
| "pycocoevalcap", | ||
| "tqdm-multiprocess", | ||
| "transformers>=4.39.2", | ||
| "transformers==4.38.0", |
Please consider removing the pinned transformers version, as we are incorporating more models.
Before you open a pull-request, please check if a similar issue already exists or has been closed before.
When you open a pull-request, please be sure to include the following
If you encounter lint warnings, you can use the following scripts to reformat the code.
Thank you for your contributions!