-
Notifications
You must be signed in to change notification settings - Fork 102
⚡ Improve Engine Performance and Implementation
#578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
shaneahmed
wants to merge
173
commits into
develop
Choose a base branch
from
dev-define-engines-abc
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Use `pyproject.toml` for `bdist_wheel` configuration
- Improve `Engines` performance and implementation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #578 +/- ##
===========================================
- Coverage 99.27% 94.85% -4.43%
===========================================
Files 71 75 +4
Lines 9161 9477 +316
Branches 1195 1238 +43
===========================================
- Hits 9095 8989 -106
- Misses 40 452 +412
- Partials 26 36 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Refactor engines_abc.py
Engines Performance and ImplementationEngine Performance and Implementation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
# Conflicts: # tiatoolbox/utils/misc.py
# Conflicts: # tests/models/test_feature_extractor.py # tiatoolbox/models/models_abc.py
# Conflicts: # tiatoolbox/cli/common.py # tiatoolbox/cli/nucleus_instance_segment.py # tiatoolbox/cli/patch_predictor.py # tiatoolbox/models/engine/semantic_segmentor.py
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947) This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements. - Delays reader object instantiation to the first `__getitem__` call instead of during initialization - Extracts reader creation logic into a separate `_get_reader` method - Stores image path and mode as instance variables for lazy initialization Speedup for the WSI prediction cell of the patch_prediction example notebook: 2min 48 sec with 0 loader workers -> 1min 13 sec with 4 workers. Note: this PR doesn't have any effect for Linux as the multi-threading already works fine there because Linux multithreading doesn't require things to be pickleable * 🔀 Merge branch develop into dev-engine-abc * 🐛 Fix reader_info read --------- Co-authored-by: Mark Eastwood <20169086+measty@users.noreply.github.com>
# Conflicts: # tiatoolbox/models/dataset/classification.py
# Conflicts: # tests/models/test_patch_predictor.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
# Conflicts: # tests/models/test_feature_extractor.py # tests/models/test_multi_task_segmentor.py # tests/models/test_nucleus_instance_segmentor.py # tests/models/test_patch_predictor.py # tests/models/test_semantic_segmentation.py # tiatoolbox/models/architecture/__init__.py
## Summary of Changes ### Major Additions - **Dask Integration:** - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code. - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs. - **Zarr Output Support:** - Added support for saving model predictions and intermediate results directly to Zarr format. - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes. - **SemanticSegmentor Engine:** - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`). - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI. - **Enhanced CLI and Config:** - Added CLI options for memory threshold, unified worker options, and improved mask handling. - Updated YAML configs and sample data for new models and test images. - **Utilities and Validation:** - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`). - Improved annotation store conversion for Dask arrays and Zarr-backed outputs. - **Changes to `kwarg`** - Add `memory-threshold` - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers` - Removed `cache_mode` as cache mode is automatically handled. --- ### Major Removals/Refactors - **Removed Old CLI and Redundant Code:** - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`. - Removed legacy cache mode and patch prediction Zarr store tests. - **Refactored Model and Dataset APIs:** - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs. - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic. - **Test Cleanup:** - Removed or updated tests that relied on old APIs or cache mode. - Refactored test assertions for new output types and Dask array handling. - **API Consistency:** - Standardized function and argument names across engines, CLI, and utility modules. - Updated docstrings and type hints for clarity and consistency. --- ### Notable File Changes - **New:** - `tiatoolbox/cli/semantic_segmentor.py` - `tests/engines/test_semantic_segmentor.py` - **Removed:** - `tiatoolbox/cli/semantic_segment.py` - Old cache mode and patch Zarr store tests - **Heavily Modified:** - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py` - CLI modules and test suites - Dataset and utility modules for Dask/Zarr compatibility --- ### Impact - Enables scalable, parallel, and memory-efficient inference and output saving for large images. - Simplifies downstream analysis by supporting Zarr as a native output format. - Lays the groundwork for further Dask-based optimizations in TIAToolbox. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## 🚀Summary This PR introduces a new **[GrandQC Tissue Detection Model](https://github.com/cpath-ukk/grandqc/tree/main)** for digital pathology quality control and integrates **EfficientNet-based encoder architecture** into the TIAToolbox framework. --- ## ✨Key Changes - **New Model Architecture** - Added `grandqc.py` implementing a UNet++ decoder with EfficientNet encoder for tissue segmentation. - Includes preprocessing (JPEG compression + ImageNet normalization), postprocessing (argmin-based mask generation), and batch inference utilities. - **EfficientNet Encoder** - Added `timm_efficientnet.py` providing configurable EfficientNet encoders with dilation support and custom input channels. - **Pretrained Model Config** - Updated `pretrained_model.yaml` to register `grandqc_tissue_detection_mpp10` with associated IO configuration. - Corrected `IOSegmentorConfig` references and adjusted resolutions for SCCNN models. - **Testing** - Added comprehensive unit tests for: - `GrandQCModel` functionality, preprocessing/postprocessing, and decoder blocks. - EfficientNet encoder utilities and scaling logic. ## Impact - Enables high-resolution tissue detection for WSI quality control using state-of-the-art architectures. - Improves flexibility for segmentation tasks with EfficientNet encoders. - Enhances code quality and consistency through updated linting and formatting tools. ## Tasks - [x] Re-host GrandQC model weights on TIA Hugging Face - [x] Update `pretrained_model.yaml` - [x] Update `requirements.txt` - [x] Define GrandQC model architecture - [x] Add example usage - [x] Remove segmentation-models-pytorch dependency - [x] Wait for response from GrandQC authors - [x] Add tests - [x] Tidy up --------- Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Enginesperformance and implementationmypyType Checks forcli/common.pyPatchPredictorEngine based onEngineABCreturn_probabilitiesoption to Paramsmerge_predictionsoption inPatchPredictorengine.post_process_cache_modewhich allows running the algorithm onWSIinfer_wsifor WSI inferencesave_wsi_outputas this is not required after post processing.merge_predictionsand fixes docstring in EngineABCRunParamscompile_modelis now moved to EngineABC init_calculate_scale_factorclass_dictdefinition._get_zarr_arrayis now a public functionget_zarr_arrayinmiscpatch_predictions_as_annotationsruns the loop onpatch_coordsinstead ofclass_probs