Release v0.4.5 · aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws

[0.4.5]

Document Split Classification Metrics for Evaluating Page-Level Classification and Document Segmentation
- Added DocSplitClassificationMetrics class for comprehensive evaluation of document splitting and classification accuracy
- Three Accuracy Types: Page-level classification accuracy, split accuracy without order consideration, and split accuracy with exact page order matching
- Visual Reporting: Generates markdown reports with color-coded indicators (🟢 Excellent, 🟡 Good, 🟠 Fair, 🔴 Poor), progress bars, and detailed section analysis tables
- Automatic Integration: Integrates with evaluation service when ground truth and predicted sections are available
- Documentation: Guide in lib/idp_common_pkg/idp_common/evaluation/README.md with usage examples, metric explanations, and best practices
Caching improvements to Agentic Extraction Service
- Optimized prompt caching by caching document context (text/images) on first LLM call, reducing token costs and quota consumption
Enhanced Bedrock Retry Logic for Agentic Extraction
- New bedrock_utils.py module with exponential backoff and comprehensive error handling
- Improves agentic extraction reliability for transient failures and rate limiting
Review Agent Model Configuration
- Added review_agent_model parameter to enable separate model for reviewing extraction work
- Defaults to main extraction model if not specified
- Configurable through Web UI extraction settings

Evaluation Output URI Fields Lost Across All Patterns - causing (a) missing Page Text Confidence content in UI, (2) failed Assessment step when reprocessing document after editing classes (No module named 'fitz')
- Fixed bug where text_confidence_uri was being set to null in evaluation output for all three patterns
- Root cause: AppSync service _appsync_to_document() method incorrectly mapped page URIs, and evaluation functions overwrote correct documents with corrupted AppSync responses
UI: Metering Data Not Displayed During Document Processing
- Fixed UI subscription query missing Metering field, preventing real-time cost display
- Users can now see estimated costs accumulate in real-time without manual page refresh
UI: Estimated Cost Panel Arrow Misalignment
- Fixed expand/contract arrow displaying above "Estimated Cost" heading
Agentic Extraction Reliability Improvements
- Updated Pydantic model serialization to use model_dump(mode="json") for proper JSON handling
- Resolved linting issues and improved code quality across extraction modules