Skip to content

Commit a30dd67

Browse files
authored
Merge pull request #168 from PytorchConnectomics/claude/codebase-review-refactor-plan-01Ke4BzX3gwYwecFjWbRsXS5
Analyze codebase and plan refactoring
2 parents 1014d8d + 43129a6 commit a30dd67

File tree

3 files changed

+1770
-61
lines changed

3 files changed

+1770
-61
lines changed

CLAUDE.md

Lines changed: 149 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -82,15 +82,15 @@ git clone https://github.com/MIC-DKFZ/MedNeXt.git
8282
cd MedNeXt
8383
pip install -e .
8484
```
85-
See [.claude/MEDNEXT.md](.claude/MEDNEXT.md) for detailed documentation.
85+
MedNeXt is an optional external package installed separately (see Installation section above).
8686

8787
### Verifying Installation
8888
```bash
8989
# Test import
9090
python -c "import connectomics; print(connectomics.__version__)"
9191

9292
# List available architectures
93-
python -c "from connectomics.models.architectures import print_available_architectures; print_available_architectures()"
93+
python -c "from connectomics.models.arch import print_available_architectures; print_available_architectures()"
9494
```
9595

9696
## Development Commands
@@ -133,52 +133,94 @@ python -m pytest tests/test_loss_functions.py
133133
## Current Package Structure
134134

135135
```
136-
connectomics/
137-
├── config/
138-
│ ├── hydra_config.py # Modern dataclass-based configs (PRIMARY)
139-
│ ├── hydra_utils.py # Config utilities (load, save, merge)
136+
connectomics/ # Main Python package (77 files, ~23K lines)
137+
├── config/ # Hydra/OmegaConf configuration system
138+
│ ├── hydra_config.py # Dataclass-based config definitions (PRIMARY)
139+
│ ├── hydra_utils.py # Config utilities (load, save, merge)
140140
│ └── __init__.py
141141
142-
├── models/
143-
│ ├── build.py # Model factory (registry-based)
144-
│ ├── architectures/ # Architecture registry and model wrappers
145-
│ │ ├── __init__.py # Public API
146-
│ │ ├── registry.py # Architecture registration system
147-
│ │ ├── base.py # Base model interface (ConnectomicsModel)
148-
│ │ ├── monai_models.py # MONAI model wrappers
149-
│ │ └── mednext_models.py # MedNeXt model wrappers
150-
│ ├── loss/ # Loss function implementations
151-
│ │ ├── build.py # Loss factory
152-
│ │ ├── losses.py # MONAI-based losses
153-
│ │ └── regularization.py
154-
│ └── solver/ # Optimizers and schedulers
155-
│ ├── build.py # Optimizer/scheduler factory
156-
│ └── lr_scheduler.py
142+
├── models/ # Model architectures and training components
143+
│ ├── build.py # Model factory (registry-based)
144+
│ ├── arch/ # Architecture registry and model wrappers
145+
│ │ ├── __init__.py # Public API and registration triggers
146+
│ │ ├── registry.py # Architecture registration system
147+
│ │ ├── base.py # Base model interface (ConnectomicsModel)
148+
│ │ ├── monai_models.py # MONAI model wrappers (4 architectures)
149+
│ │ ├── mednext_models.py # MedNeXt model wrappers (2 architectures)
150+
│ │ └── rsunet.py # RSUNet models (2 architectures)
151+
│ ├── loss/ # Loss function implementations
152+
│ │ ├── build.py # Loss factory (19 loss functions)
153+
│ │ ├── losses.py # Connectomics-specific losses
154+
│ │ └── regularization.py # Regularization losses
155+
│ └── solver/ # Optimizers and learning rate schedulers
156+
│ ├── build.py # Optimizer/scheduler factory
157+
│ └── lr_scheduler.py # Custom LR schedulers
157158
158-
├── lightning/ # PyTorch Lightning integration (PRIMARY)
159-
│ ├── lit_data.py # LightningDataModule
160-
│ ├── lit_model.py # LightningModule wrapper
161-
│ └── lit_trainer.py # Trainer utilities
159+
├── lightning/ # PyTorch Lightning integration (PRIMARY)
160+
│ ├── lit_data.py # LightningDataModule (Volume/Tile/Cloud datasets)
161+
│ ├── lit_model.py # LightningModule (1.8K lines - deep supervision, TTA)
162+
│ ├── lit_trainer.py # Trainer creation utilities
163+
│ └── callbacks.py # Custom Lightning callbacks
162164
163-
├── data/
164-
│ ├── dataset/ # Dataset classes (HDF5, TIFF)
165-
│ ├── augment/ # MONAI-based augmentations
166-
│ ├── io/ # Data I/O utilities
167-
│ └── process/ # Preprocessing utilities
165+
├── data/ # Data loading and preprocessing
166+
│ ├── dataset/ # Dataset classes (HDF5, TIFF, Zarr, Cloud)
167+
│ │ ├── build.py # Dataset factory
168+
│ │ ├── dataset_base.py # Base dataset class
169+
│ │ ├── dataset_volume.py # Volume-based datasets
170+
│ │ ├── dataset_tile.py # Tile-based datasets
171+
│ │ └── ... # Multi-dataset, filename-based, etc.
172+
│ ├── augment/ # MONAI-based augmentations
173+
│ │ ├── build.py # Transform pipeline builder (791 lines)
174+
│ │ ├── monai_transforms.py # Custom MONAI transforms (1.4K lines)
175+
│ │ └── ... # EM-specific, geometry, advanced augmentations
176+
│ ├── io/ # Multi-format I/O (HDF5, TIFF, PNG, Pickle)
177+
│ ├── process/ # Preprocessing and target generation
178+
│ └── utils/ # Data utilities
168179
169-
├── metrics/ # Evaluation metrics
170-
│ └── metrics_seg.py # Segmentation metrics (Adapted Rand, etc.)
180+
├── decoding/ # Post-processing and instance segmentation
181+
│ └── ... # Auto-tuning, instance decoding
171182
172-
└── utils/ # Utilities (visualization, system setup)
173-
174-
scripts/
175-
├── main.py # Primary entry point (Lightning + Hydra)
176-
└── build.py # Legacy entry point (deprecated)
177-
178-
tutorials/
179-
├── lucchi.yaml # Example config (MONAI BasicUNet)
180-
├── mednext_lucchi.yaml # Example config (MedNeXt-S)
181-
└── mednext_custom.yaml # Advanced config (MedNeXt custom)
183+
├── metrics/ # Evaluation metrics
184+
│ └── metrics_seg.py # Segmentation metrics (Adapted Rand, VOI, etc.)
185+
186+
└── utils/ # General utilities
187+
└── ... # Visualization, system setup, misc
188+
189+
scripts/ # Entry points and utilities
190+
├── main.py # Primary entry point (53KB, Lightning + Hydra)
191+
├── profile_dataloader.py # Data loading profiling tool
192+
├── slurm_launcher.py # SLURM cluster job launcher
193+
├── visualize_neuroglancer.py # Neuroglancer visualization (29KB)
194+
└── tools/ # Additional utility scripts
195+
196+
tutorials/ # Example configurations (11 YAML files)
197+
├── monai_lucchi++.yaml # Lucchi mitochondria (MONAI)
198+
├── monai_fiber.yaml # Fiber segmentation
199+
├── monai_bouton-bv.yaml # Bouton + blood vessel multi-task
200+
├── monai2d_worm.yaml # 2D C. elegans segmentation
201+
├── mednext_mitoEM.yaml # MitoEM dataset (MedNeXt)
202+
├── mednext2d_cem-mitolab.yaml # 2D MedNeXt example
203+
├── rsunet_snemi.yaml # SNEMI3D neuron segmentation (RSUNet)
204+
├── sweep_example.yaml # Hyperparameter sweep example
205+
└── ... # Additional tutorials
206+
207+
tests/ # Test suite (organized by type)
208+
├── unit/ # Unit tests (38/61 passing - 62%)
209+
├── integration/ # Integration tests (0/6 passing - needs update)
210+
├── e2e/ # End-to-end tests (requires data setup)
211+
├── test_rsunet.py # RSUNet model tests
212+
├── test_banis_features.py # Feature extraction tests
213+
├── TEST_STATUS.md # Detailed test status report
214+
└── README.md # Testing documentation
215+
216+
configs/ # LEGACY: Deprecated YACS configs
217+
└── barcode/ # ⚠️ Old YACS format (archive candidates)
218+
└── *.yaml # 3 legacy config files
219+
220+
docs/ # Sphinx documentation
221+
notebooks/ # Jupyter notebooks
222+
docker/ # Docker containerization
223+
conda-recipe/ # Conda packaging
182224
```
183225

184226
## Configuration System
@@ -256,31 +298,36 @@ print_config(cfg)
256298
The framework uses an extensible **architecture registry** for managing models:
257299

258300
```python
259-
from connectomics.models.architectures import (
301+
from connectomics.models.arch import (
260302
list_architectures,
261303
get_architecture_builder,
262304
register_architecture,
263305
print_available_architectures,
264306
)
265307
266308
# List all available architectures
267-
archs = list_architectures() # ['monai_basic_unet3d', 'monai_unet', 'mednext', ...]
309+
archs = list_architectures() # 8 total architectures
268310
269-
# Get detailed info
311+
# Get detailed info with counts
270312
print_available_architectures()
271313
```
272314

273-
### Supported Architectures
315+
### Supported Architectures (8 Total)
274316

275-
**MONAI Models:**
276-
- `monai_basic_unet3d`: Simple and fast 3D U-Net
277-
- `monai_unet`: U-Net with residual units
278-
- `monai_unetr`: Transformer-based UNETR
279-
- `monai_swin_unetr`: Swin Transformer U-Net
317+
**MONAI Models (4)** - No deep supervision:
318+
- `monai_basic_unet3d`: Simple and fast 3D U-Net (also supports 2D)
319+
- `monai_unet`: U-Net with residual units and advanced features
320+
- `monai_unetr`: Transformer-based UNETR (Vision Transformer backbone)
321+
- `monai_swin_unetr`: Swin Transformer U-Net (SOTA but memory-intensive)
280322

281-
**MedNeXt Models:**
323+
**MedNeXt Models (2)** - WITH deep supervision:
282324
- `mednext`: MedNeXt with predefined sizes (S/B/M/L) - RECOMMENDED
283-
- `mednext_custom`: MedNeXt with full parameter control
325+
- S: 5.6M params, B: 10.5M, M: 17.6M, L: 61.8M
326+
- `mednext_custom`: MedNeXt with full parameter control for research
327+
328+
**RSUNet Models (2)** - Pure PyTorch, WITH deep supervision:
329+
- `rsunet`: Residual symmetric U-Net with anisotropic convolutions (EM-optimized)
330+
- `rsunet_iso`: RSUNet with isotropic convolutions for uniform voxel spacing
284331

285332
#### MedNeXt Integration
286333
MedNeXt (MICCAI 2023) is a ConvNeXt-based architecture optimized for 3D medical image segmentation:
@@ -312,7 +359,7 @@ model:
312359
- **Isotropic Spacing**: Prefers 1mm isotropic spacing (unlike nnUNet)
313360
- **Training**: Use AdamW with lr=1e-3, constant LR (no scheduler)
314361

315-
**See:** `.claude/MEDNEXT.md` for complete documentation
362+
**Note:** MedNeXt is an optional external dependency - see Installation section for setup
316363

317364
### Building Models
318365
```python
@@ -548,6 +595,38 @@ scheduler:
548595
5. **Test everything**: Unit tests for all components
549596
6. **Documentation**: Update docs when adding features
550597

598+
## Code Quality Status
599+
600+
### Migration Status: ✅ Complete (95%+)
601+
- ✅ **YACS → Hydra/OmegaConf**: 100% migrated (no YACS imports in active code)
602+
- ✅ **Custom trainer → Lightning**: 100% migrated
603+
- ✅ **Custom models → MONAI models**: Primary path uses MONAI
604+
- ⚠️ **Legacy configs**: 3 YACS config files remain in `configs/barcode/` (archive candidates)
605+
606+
### Codebase Metrics
607+
- **Total Python files**: 109 (77 in connectomics module)
608+
- **Lines of code**: ~23,000 (connectomics module)
609+
- **Architecture**: Modular, well-organized
610+
- **Type safety**: Good (dataclass configs, type hints in most modules)
611+
- **Test coverage**: 62% unit tests passing (38/61), integration tests need updates
612+
613+
### Known Technical Debt
614+
1. **lit_model.py size**: 1,819 lines (should be split into smaller modules)
615+
2. **Code duplication**: Training/validation steps share deep supervision logic (~140 lines)
616+
3. **NotImplementedError**: 3 files with incomplete implementations
617+
- `connectomics/data/dataset/build.py`: `create_tile_data_dicts_from_json()`
618+
- Minor placeholders in base classes
619+
4. **Hardcoded values**: Output clamping, deep supervision weights, interpolation bounds
620+
5. **Dummy validation dataset**: Masks configuration errors instead of proper handling
621+
622+
### Overall Assessment: **8.1/10 - Production Ready**
623+
- ✅ Modern architecture (Lightning + MONAI + Hydra)
624+
- ✅ Clean separation of concerns
625+
- ✅ Comprehensive feature set
626+
- ✅ Good documentation
627+
- ⚠️ Minor refactoring needed for maintainability
628+
- ⚠️ Integration tests need API v2.0 migration
629+
551630
## Migration Notes
552631

553632
### From Legacy System
@@ -721,14 +800,23 @@ pip install -e .[full]
721800
722801
# Verify installation
723802
python -c "import connectomics; print('Version:', connectomics.__version__)"
724-
python -c "from connectomics.models.architectures import list_architectures; print(list_architectures())"
803+
python -c "from connectomics.models.arch import list_architectures; print(list_architectures())"
725804
```
726805

727806
## Further Reading
728807

729-
- **DESIGN.md**: Architecture principles (Lightning + MONAI)
730-
- **MEDNEXT.md**: MedNeXt integration guide
731-
- **REFACTORING_PLAN.md**: Planned improvements
732-
- [PyTorch Lightning Docs](https://lightning.ai/docs/pytorch/stable/)
733-
- [MONAI Docs](https://docs.monai.io/en/stable/)
734-
- [Hydra Docs](https://hydra.cc/)
808+
### Documentation Files
809+
- **README.md**: Project overview and quick start
810+
- **QUICKSTART.md**: 5-minute setup guide
811+
- **TROUBLESHOOTING.md**: Common issues and solutions
812+
- **CONTRIBUTING.md**: Contribution guidelines
813+
- **RELEASE_NOTES.md**: Version history and changes
814+
- **tests/TEST_STATUS.md**: Detailed test coverage status
815+
- **tests/README.md**: Testing guide
816+
817+
### External Resources
818+
- [PyTorch Lightning Docs](https://lightning.ai/docs/pytorch/stable/) - Training orchestration
819+
- [MONAI Docs](https://docs.monai.io/en/stable/) - Medical imaging toolkit
820+
- [Hydra Docs](https://hydra.cc/) - Configuration management
821+
- [Project Documentation](https://zudi-lin.github.io/pytorch_connectomics/build/html/index.html) - Full docs
822+
- [Slack Community](https://join.slack.com/t/pytorchconnectomics/shared_invite/zt-obufj5d1-v5_NndNS5yog8vhxy4L12w) - Get help

0 commit comments

Comments
 (0)