@idjuricTT

Before you open a pull-request, please check if a similar issue already exists or has been closed before.

When you open a pull-request, please be sure to include the following:

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you encounter lint warnings, you can use the following commands to reformat the code.

pip install pre-commit
pre-commit install
pre-commit run --all-files

Thank you for your contributions!

bgoelTT and others added 28 commits April 11, 2025 13:55
- Added `openslr_librispeech_other.yaml` and `openslr_librispeech.yaml` configuration files for task definitions.
- Implemented utility functions in `utils.py` for processing audio and text documents.
- Created `basic.py`, `english.py`, and `english.json` for English text normalization, including handling of spelling variations and number normalization.
- Enhanced the whisper normalizer with new functionalities for both Chinese and English text processing.
- Changed dataset_path to 'parquet' and updated dataset_kwargs to include a specific data file URL.
- Modified test_split from 'test' to 'train' and set dataset_name to 'null' for task definition adjustments.
- Enhanced the `import_function` to first attempt a relative file import and fall back to an absolute module import if the relative path does not exist.
- Improved error handling by re-raising import errors with context for better debugging.
- Removed unused `openslr_librispeech/_default_yaml_template` and related whisper normalizer files to streamline the codebase.
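The import fallback described in the commits above could look roughly like the following sketch. The name `resolve_function` and its `dotted_name`/`base_dir` parameters are hypothetical and chosen for illustration; the actual `import_function` in the PR may have a different signature. The sketch first looks for a `.py` file relative to a base directory, falls back to an absolute module import if that file does not exist, and re-raises failures with context.

```python
import importlib
import importlib.util
import os


def resolve_function(dotted_name, base_dir):
    """Resolve 'module.function' to a callable (hypothetical helper).

    Tries a file relative to base_dir first, then an absolute module import.
    """
    module_name, _, function_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_name!r}")

    # Candidate file for the relative import, e.g. <base_dir>/utils.py
    relative_path = os.path.join(base_dir, *module_name.split(".")) + ".py"
    try:
        if os.path.exists(relative_path):
            spec = importlib.util.spec_from_file_location(module_name, relative_path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
        else:
            # Relative file is missing: fall back to an absolute module import.
            module = importlib.import_module(module_name)
        return getattr(module, function_name)
    except (ImportError, AttributeError) as exc:
        # Re-raise with context so the failing name and path are visible.
        raise ImportError(
            f"could not resolve {dotted_name!r} "
            f"(tried file {relative_path!r} and module {module_name!r})"
        ) from exc
```

Chaining with `raise ... from exc` keeps the original traceback attached, which is usually what makes such errors debuggable.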
…ield names

- Updated the `librispeech_process_result` function to handle both "gt" and "transcript" as valid keys for ground truth in documents, improving compatibility with different LibriSpeech datasets.
- Added error handling to raise a KeyError if neither field is found, providing clearer feedback on document structure.
…fields

- Updated the `librispeech_process_result` function to safely retrieve the "source" field, defaulting to "unknown" if not present.
- Added logic to infer the "task" field from context, defaulting to "asr_en" for LibriSpeech datasets when not explicitly provided.
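As a minimal sketch of the defaults described above, the lookups can be expressed with `dict.get`. The function name `extract_metadata` is hypothetical, and the "inference" here is reduced to a plain default of "asr_en", which is an assumption; the PR may use more context than this.

```python
def extract_metadata(doc, default_source="unknown", default_task="asr_en"):
    """Return the "source" and "task" fields with safe defaults (hypothetical helper)."""
    # A missing "source" falls back to "unknown".
    source = doc.get("source", default_source)
    # A missing or empty "task" falls back to the LibriSpeech default.
    task = doc.get("task") or default_task
    return {"source": source, "task": task}
```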
…eld names

- Updated the `librispeech_doc_to_audio` function to check for various field names ("audio", "file", "path", "audio_path") in the document, improving compatibility with different LibriSpeech datasets.
- Added error handling to raise a KeyError if no valid audio field is found, providing clearer feedback on document structure.
…ionality

- Simplified the `librispeech_doc_to_audio` function to directly return the "audio" field, removing unnecessary checks.
- Streamlined the `librispeech_process_result` function to directly access "gt", "source", and "task" fields without additional error handling, assuming their presence.
- Added a new `librispeech_doc_to_target` function to return the ground truth from the document, enhancing modularity.
…dularity

- Updated the `openasr_doc_to_audio` function to handle multiple audio field names ("audio", "file", "path", "audio_path"), enhancing compatibility with various datasets.
- Introduced a new `openasr_doc_to_target` function to normalize the retrieval of ground truth fields ("text", "transcript", "gt"), improving modularity and error handling in the `openasr_process_result` function.
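The field-name fallback described above might be factored into one shared helper, sketched below. The helper `first_present` and the simplified `doc_to_audio`/`doc_to_target` functions are hypothetical stand-ins, not the PR's actual `openasr_*` implementations; the candidate field names are taken from the commit messages, and treating a `None` value as missing is an assumption.

```python
AUDIO_FIELDS = ("audio", "file", "path", "audio_path")
TARGET_FIELDS = ("text", "transcript", "gt")


def first_present(doc, candidates):
    """Return the value of the first candidate key found in doc (hypothetical helper)."""
    for key in candidates:
        # Assumption: a key holding None counts as absent.
        if doc.get(key) is not None:
            return doc[key]
    raise KeyError(
        f"none of {list(candidates)} found; available keys: {sorted(doc)}"
    )


def doc_to_audio(doc):
    # Candidates are tried in the order listed in AUDIO_FIELDS.
    return first_present(doc, AUDIO_FIELDS)


def doc_to_target(doc):
    # Candidates are tried in the order listed in TARGET_FIELDS.
    return first_present(doc, TARGET_FIELDS)
```

Listing the available keys in the error message gives the "clearer feedback on document structure" that the commits mention.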
…sper model

- Updated the `warmup_model` function to create a mesh device instead of a single device, enabling compatibility with the mesh-enabled Whisper model.
- Added logging to indicate the creation and successful warming up of the Whisper model.
- Updated the `warmup_model` function to streamline the creation of the mesh device by removing unnecessary parameters, enhancing code clarity and maintainability.
- Refactored the WhisperTT class to utilize HTTP calls to the tt-media-server for audio transcription, allowing evaluations to run outside of Docker.
- Added methods for encoding audio to base64 and transcribing audio via the API.
- Updated model initialization to include parameters for base URL, timeout, and retries, enhancing flexibility and error handling.
- Added a new parameter `num_concurrent` to the WhisperTT class for improved concurrency handling.
- Updated the initialization method to log a warning for any unexpected keyword arguments instead of raising an assertion error, enhancing robustness and user feedback.
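The HTTP-based design described in the commits above can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the class name `TranscriptionClient`, the `/audio/transcriptions` endpoint path, the `{"file": ...}` payload and `{"text": ...}` response shapes, the default URL, the `backoff` parameter, and the use of `num_concurrent` as a thread-pool size. The real tt-media-server API and the `WhisperTT` class may differ.

```python
import base64
import json
import logging
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)


def encode_audio_base64(wav_bytes):
    """Encode raw WAV bytes as an ASCII base64 string for a JSON payload."""
    return base64.b64encode(wav_bytes).decode("ascii")


def _post_json(url, payload, timeout):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class TranscriptionClient:
    """Hypothetical stand-in for an HTTP-backed transcription model."""

    def __init__(self, base_url="http://127.0.0.1:8000", timeout=300,
                 max_retries=3, num_concurrent=1, backoff=0.5,
                 post=_post_json, **kwargs):
        if kwargs:
            # Warn instead of asserting, so unknown arguments do not abort a run.
            logger.warning("Ignoring unexpected arguments: %s", sorted(kwargs))
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.max_retries = max_retries
        self.num_concurrent = num_concurrent
        self.backoff = backoff
        self._post = post  # injectable, so the client can be tested offline

    def transcribe(self, wav_bytes):
        url = f"{self.base_url}/audio/transcriptions"  # hypothetical endpoint
        payload = {"file": encode_audio_base64(wav_bytes)}  # hypothetical shape
        last_error = None
        for attempt in range(1, self.max_retries + 1):
            try:
                return self._post(url, payload, self.timeout).get("text", "")
            except OSError as exc:  # covers URLError and socket timeouts
                last_error = exc
                logger.warning("Attempt %d/%d failed: %s",
                               attempt, self.max_retries, exc)
                if attempt < self.max_retries:
                    time.sleep(self.backoff * attempt)
        raise RuntimeError(
            f"transcription failed after {self.max_retries} attempts"
        ) from last_error

    def transcribe_many(self, clips):
        # Issue up to num_concurrent requests at a time, preserving input order.
        with ThreadPoolExecutor(max_workers=self.num_concurrent) as pool:
            return list(pool.map(self.transcribe, clips))
```

Passing the `post` callable in through the constructor is a design choice of this sketch: it lets the retry logic be exercised with a fake transport, without a running server.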
…TT class

- Updated the audio array conversion to float32 to prevent "Unsupported bit depth: 64" errors when creating WAV files, ensuring compatibility with server requirements.
I checked that in the OpenASR HF dataset, the audio and answer keys are just `audio` and `text`. Is it necessary to try a fallback for this?

@kcz358

kcz358 commented Nov 17, 2025

Hi, thanks for the contribution. I looked through your files, and it seems you are using a very old version of lmms-eval. Would it be possible to check out the main branch and make the edits on top of the newest main? Thanks!

Comment on lines +1 to +4
dataset_path: parquet
dataset_kwargs:
  data_files:
    test: "https://huggingface.co/datasets/openslr/librispeech_asr/resolve/71cacbfb7e2354c4226d01e70d77d5fca3d04ba1/other/test/0000.parquet"
I think this can be configured as a dataset path to be loaded with load_dataset("openslr/librispeech_asr", "other", split="test"), so that it is expressed much more cleanly in the YAML file.

"pycocoevalcap",
"tqdm-multiprocess",
"transformers>=4.39.2",
"transformers==4.38.0",
We may want to remove the pinned version of transformers, as we are incorporating more models.
