[0.4.5]
Added
- Document Split Classification Metrics for Evaluating Page-Level Classification and Document Segmentation
  - Added `DocSplitClassificationMetrics` class for comprehensive evaluation of document splitting and classification accuracy
  - Three Accuracy Types: Page-level classification accuracy, split accuracy without order consideration, and split accuracy with exact page order matching
  - Visual Reporting: Generates markdown reports with color-coded indicators (🟢 Excellent, 🟡 Good, 🟠 Fair, 🔴 Poor), progress bars, and detailed section analysis tables
  - Automatic Integration: Integrates with evaluation service when ground truth and predicted sections are available
  - Documentation: Guide in `lib/idp_common_pkg/idp_common/evaluation/README.md` with usage examples, metric explanations, and best practices
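To make the three accuracy types concrete, here is a minimal sketch of how they could be computed. It is not the `DocSplitClassificationMetrics` implementation: the `Section` type, both function names, and the exact matching rules (a section counts as correct when its class and its pages both match) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical section: a document class plus the page IDs assigned to it."""
    doc_class: str
    pages: tuple[int, ...]


def page_level_accuracy(truth: list[Section], pred: list[Section]) -> float:
    """Fraction of ground-truth pages whose predicted class matches."""
    truth_labels = {p: s.doc_class for s in truth for p in s.pages}
    pred_labels = {p: s.doc_class for s in pred for p in s.pages}
    if not truth_labels:
        return 0.0
    correct = sum(1 for p, c in truth_labels.items() if pred_labels.get(p) == c)
    return correct / len(truth_labels)


def split_accuracy(truth: list[Section], pred: list[Section], ordered: bool) -> float:
    """Fraction of ground-truth sections reproduced by a predicted section.

    With ordered=False the pages are compared as a set (order ignored);
    with ordered=True the page sequence must match exactly.
    """
    def key(s: Section) -> tuple[str, tuple[int, ...]]:
        pages = s.pages if ordered else tuple(sorted(s.pages))
        return (s.doc_class, pages)

    if not truth:
        return 0.0
    pred_keys = {key(s) for s in pred}
    matched = sum(1 for s in truth if key(s) in pred_keys)
    return matched / len(truth)
```

Under these assumed rules, a prediction that groups the right pages in the wrong order still scores on the unordered metric but not on the ordered one, which is why the two split metrics can diverge for the same document.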
- Caching Improvements to Agentic Extraction Service
  - Optimized prompt caching by caching document context (text/images) on first LLM call, reducing token costs and quota consumption
- Enhanced Bedrock Retry Logic for Agentic Extraction
  - New `bedrock_utils.py` module with exponential backoff and comprehensive error handling
  - Improves agentic extraction reliability for transient failures and rate limiting
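For readers unfamiliar with the retry pattern, the following is a generic sketch of exponential backoff with jitter. It does not reproduce the contents of `bedrock_utils.py`: `TransientError`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and a real implementation would need to decide which service errors count as retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for throttling or other retryable failures."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random time in [0, ceiling] so that
            # concurrent callers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable only so the sketch can be exercised without real waiting; passing `sleep=delays.append` records the chosen delays in a list instead of pausing.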
- Review Agent Model Configuration
  - Added `review_agent_model` parameter to enable separate model for reviewing extraction work
  - Defaults to main extraction model if not specified
  - Configurable through Web UI extraction settings
Fixed
- Evaluation Output URI Fields Lost Across All Patterns - causing (a) missing Page Text Confidence content in UI, (b) failed Assessment step when reprocessing document after editing classes (No module named 'fitz')
  - Fixed bug where `text_confidence_uri` was being set to null in evaluation output for all three patterns
  - Root cause: AppSync service `_appsync_to_document()` method incorrectly mapped page URIs, and evaluation functions overwrote correct documents with corrupted AppSync responses
- UI: Metering Data Not Displayed During Document Processing
  - Fixed UI subscription query missing `Metering` field, preventing real-time cost display
  - Users can now see estimated costs accumulate in real-time without manual page refresh
- UI: Estimated Cost Panel Arrow Misalignment
  - Fixed expand/contract arrow displaying above "Estimated Cost" heading
- Agentic Extraction Reliability Improvements
  - Updated Pydantic model serialization to use `model_dump(mode="json")` for proper JSON handling
  - Resolved linting issues and improved code quality across extraction modules