Releases: aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws

v0.3.17

26 Sep 22:16

Added

  • Edit Sections Feature for Modifying Class/Type and Reprocessing Extraction

    • Added Edit Sections interface for Pattern-2 and Pattern-3 workflows with reprocessing optimization
    • Key Features: Section management (create, update, delete), classification updates, page reassignment with overlap detection, real-time validation
    • Selective Reprocessing: Only modified sections are reprocessed while preserving existing data for unmodified sections
    • Processing Pipeline: All functions (OCR/Classification/Extraction/Assessment) automatically skip redundant operations based on data presence
    • Pattern Compatibility: Full functionality for Pattern-2/Pattern-3, informative modal for Pattern-1 explaining that BDA is not yet supported
  • Analytics Agent Schema Optimization for Improved Performance

    • Embedded Database Overview: Complete table listing and guidance embedded directly in system prompt (no tool call needed)
    • On-Demand Detailed Schemas: get_table_info(['specific_tables']) loads detailed column information only for tables actually needed by the query
    • Significant Performance Gains: Eliminates redundant tool calls on every query while maintaining token efficiency
    • Enhanced SQL Guidance: Comprehensive Athena/Trino function reference with explicit PostgreSQL operator warnings to prevent common query failures like ~ regex operator mistakes
    • Faster Time-to-Query: Agent has immediate access to table overview and can proceed directly to detailed schema loading for relevant tables
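
The PostgreSQL operator warning above can be illustrated with a small pre-flight check. This is a minimal sketch, not the agent's actual implementation: the helper name `find_postgres_regex_ops` and the sample table and column names are hypothetical. It flags `~`-style operators, which Athena/Trino reject, so they can be rewritten with `regexp_like`. It does not parse SQL, so a `~` inside a string literal would also be flagged.

```python
import re

# Matches the PostgreSQL regex operators ~, ~*, !~ and !~* when they are
# surrounded by whitespace. Athena (Trino) does not support these operators.
_PG_REGEX_OP = re.compile(r"(?<=\s)!?~\*?(?=\s)")


def find_postgres_regex_ops(sql: str) -> list[str]:
    """Return any PostgreSQL-style regex operators found in the query text."""
    return _PG_REGEX_OP.findall(sql)


# Hypothetical queries: the first uses the PostgreSQL operator, the second
# uses the Trino function regexp_like(string, pattern).
bad = "SELECT * FROM docs WHERE doc_id ~ '^inv'"
good = "SELECT * FROM docs WHERE regexp_like(doc_id, '^inv')"

print(find_postgres_regex_ops(bad))   # ['~']
print(find_postgres_regex_ops(good))  # []
```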

Fixed

  • Fix missing data in Glue tables when using a document class that contains a dash (-).
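
As a rough illustration of the naming issue behind this fix, the sketch below maps a document class to a table-safe name by lowercasing it and replacing dashes with underscores, which is the approach described for the related v0.3.16 fix. The function name `glue_table_name` is hypothetical, and the real table names may carry a prefix that is not shown here.

```python
def glue_table_name(section_type: str) -> str:
    """Return a table-safe name for a section type (hypothetical helper).

    Lowercasing follows the lowercase naming convention for section tables;
    replacing "-" with "_" keeps the name aligned with what a Glue crawler
    would generate for the same section type.
    """
    return section_type.lower().replace("-", "_")


print(glue_table_name("Bank-Statement"))  # bank_statement
```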

v0.3.16

19 Sep 22:21

[0.3.16]

Added

  • S3 Vectors Support for Cost-Optimized Knowledge Base Storage

    • Added S3 Vectors as alternative vector store option to OpenSearch Serverless for Bedrock Knowledge Base with lower storage costs
    • Custom resource Lambda implementation for S3 vector bucket and index management (using boto3 s3vectors client) with proper IAM permissions and resource cleanup
    • Unified Knowledge Base interface supporting both vector store types with automatic resource provisioning based on user selection
  • Page Limit Configuration for Classification Control

    • Added maxPagesForClassification configuration option to control how many pages are used during document classification
    • Default Behavior: "ALL" - uses all pages for classification (existing behavior)
    • Limited Page Classification: Set to numeric value (e.g., "1", "2", "3") to classify only the first N pages
    • Important: When using numeric limit, the classification result from the first N pages is applied to ALL pages in the document, effectively forcing the entire document to be assigned a single class with one section
    • Use Cases: Performance optimization for large documents, cost reduction for documents with consistent classification patterns, simplified processing for homogeneous document types
  • CloudFormation Service Role for Delegated Deployment Access

    • Added example CloudFormation service role template that enables non-administrator users to deploy and maintain IDP stacks without requiring ongoing administrator permissions
    • Administrators can provision the service role once with elevated privileges, then delegate deployment capabilities to developer/DevOps teams
    • Includes comprehensive documentation and cross-referenced deployment guides explaining the security model and setup process
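
The maxPagesForClassification behavior in this release can be sketched as follows. This is an illustrative approximation only: the function names and the shape of the returned section are assumptions, not the project's API.

```python
def select_pages_for_classification(page_ids, max_pages="ALL"):
    """Return the pages that would be sent to the classifier.

    "ALL" keeps every page (the default); a numeric string such as "2"
    keeps only the first N pages.
    """
    pages = list(page_ids)
    if str(max_pages).upper() == "ALL":
        return pages
    limit = int(max_pages)
    if limit < 1:
        raise ValueError("maxPagesForClassification must be 'ALL' or >= 1")
    return pages[:limit]


def apply_single_class(page_ids, doc_class):
    """With a numeric limit, the class found on the first N pages is applied
    to every page, so the document ends up as one section with one class."""
    return {"class": doc_class, "page_ids": list(page_ids)}


pages = ["1", "2", "3", "4"]
print(select_pages_for_classification(pages, "2"))  # ['1', '2']
print(apply_single_class(pages, "Invoice"))
```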

Fixed

  • Fixed issue where CloudFront policy statements were still appearing in generated GovCloud templates despite CloudFront resources being removed
  • Fix duplicate Glue tables being created when using a document class that contains a dash (-). Resolved by replacing dashes in section types with underscores when creating the table, to align with the table name generated later by the Glue crawler - resolves #57.
  • Fix occasional UI error 'Failed to get document details - please try again later' - resolves #58
  • Fixed UI zipfile creation to exclude .aws-sam directories and .env files from deployment package
  • Added security recommendation to set LogLevel parameter to WARN or ERROR (not INFO) for production deployments to prevent logging of sensitive information including PII data, document contents, and S3 presigned URLs
  • Hardened several aspects of the new Discovery feature
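
The zipfile exclusion fix above can be pictured with a short sketch. It is not the project's publish code: `build_zip` and `EXCLUDED_DIRS` are hypothetical names, and it assumes the archive is written outside the source directory.

```python
import os
import zipfile

# Directory names that should never end up in the deployment package.
EXCLUDED_DIRS = {".aws-sam"}


def build_zip(src_dir: str, zip_path: str) -> list[str]:
    """Zip src_dir into zip_path, skipping .aws-sam directories and .env files.

    Returns the sorted list of archive member names that were added.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune excluded directories in place so os.walk skips them.
            dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
            for name in files:
                if name == ".env":
                    continue  # may contain secrets; keep it out of the package
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
                added.append(arcname)
    return sorted(added)
```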

v0.3.15

12 Sep 21:46

[0.3.15]

Added

  • Intelligent Document Discovery Module for Automated Configuration Generation

    • Added Discovery module that automatically analyzes document samples to identify structure, field types, and organizational patterns
    • Pattern-Neutral Design: Works across all processing patterns (1, 2, 3) with unified discovery process and pattern-specific implementations
    • Dual Discovery Methods: Discovery without ground truth (exploratory analysis) and with ground truth (optimization using labeled data)
    • Automated Blueprint Creation: Pattern 1 includes zero-touch BDA blueprint generation with intelligent change detection and version management
    • Web UI Integration: Real-time discovery job monitoring, interactive results review, and seamless configuration integration
    • Advanced Features: Multi-model support (Nova, Claude), customizable prompts, configurable parameters, ground truth processing, schema conversion, and lifecycle management
    • Key Benefits: Rapid new document type onboarding, reduced time-to-production, configuration optimization, and automated workflow bootstrapping
    • Use Cases: New document exploration, configuration improvement, rapid prototyping, and document understanding
    • Documentation: Guide in docs/discovery.md with architecture details, best practices, and troubleshooting
  • Optional Pattern-2 Regex-Based Classification for Enhanced Performance

    • Added support for optional regex patterns in document class definitions for performance optimization
    • Document Name Regex: Match against document ID/name to classify all pages without LLM processing when all pages should be the same class
    • Document Page Content Regex: Match against page text content during multi-modal page-level classification for fast page classification
    • Key Benefits: Significant performance improvements and cost savings by bypassing LLM calls for pattern-matched documents, deterministic classification results for known document patterns, seamless fallback to existing LLM classification when regex patterns don't match
    • Configuration: Optional document_name_regex and document_page_content_regex fields in class definitions with automatic regex compilation and validation
    • Logging: Comprehensive info-level logging when regex patterns match for observability and debugging
    • CloudFormation Integration: Updated Pattern-2 schema to support regex configuration through the Web UI
    • Demonstration: New step2_classification_with_regex.ipynb notebook showcasing regex configuration and performance comparisons
    • Documentation: Enhanced classification module README and main documentation with regex usage examples and best practices
  • Windows WSL Development Environment Setup Guide

    • Added WSL-based development environment setup guide for Windows developers in docs/setup-development-env-WSL.md
    • Key Features: Automated setup script (wsl_setup.sh) for quick installation of Git, Python, Node.js, AWS CLI, and SAM CLI
    • Integrated Workflow: Development setup combining Windows tools (VS Code, browsers) with native Linux environment
    • Target Use Cases: Windows developers needing Linux compatibility without Docker Desktop or VM overhead
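
The optional regex classification for Pattern-2 described in this release can be sketched roughly as below. Only the field names document_name_regex and document_page_content_regex come from the release notes; the class definitions, the function names, the use of `re.search`, and the `llm_classify` callback are assumptions made for illustration.

```python
import re

# Hypothetical class definitions using the two optional regex fields.
CLASSES = [
    {
        "name": "Invoice",
        "document_name_regex": r"(?i)invoice",
        "document_page_content_regex": r"(?i)\binvoice\s+number\b",
    },
    {"name": "Payslip", "document_page_content_regex": r"(?i)\bnet\s+pay\b"},
]


def classify_document_by_name(document_id, classes):
    """Return a class for the whole document if its name matches, else None."""
    for cls in classes:
        pattern = cls.get("document_name_regex")
        if pattern and re.search(pattern, document_id):
            return cls["name"]
    return None


def classify_page(page_text, classes, llm_classify):
    """Try the page-content regexes first, then fall back to the LLM."""
    for cls in classes:
        pattern = cls.get("document_page_content_regex")
        if pattern and re.search(pattern, page_text):
            return cls["name"]  # deterministic result, no LLM call
    return llm_classify(page_text)


print(classify_document_by_name("acme_invoice_001.pdf", CLASSES))  # Invoice
```

When neither regex matches, every page still goes through the existing LLM path, so adding patterns to a class does not change results for documents they do not cover.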

Fixed

  • Throttling Error Detection and Retry Logic for Assessment Functions - GitHub Issue #45

    • Assessment Function: Enhanced throttling detection to check for throttling errors returned in document.errors field in addition to thrown exceptions, raising ThrottlingException to trigger Step Functions retry when throttling is detected
    • Granular Assessment Task Caching: Fixed caching logic to properly cache successful assessment tasks when there are ANY failed tasks (both exception-based and result-based failures), enabling efficient retry optimization by only reprocessing failed tasks while preserving successful results
    • Impact: Improved resilience for throttling scenarios, reduced redundant processing during retries, and better Step Functions retry behavior
  • Security Vulnerability Mitigation - Package Updates

  • GovCloud Compatibility - Hardcoded Service Domain References

    • Fixed hardcoded amazonaws.com references in CloudFormation templates that prevented GovCloud deployment
    • Updated all service principals and endpoints to use dynamic ${AWS::URLSuffix} expressions for automatic region-based resolution
    • Templates Updated: template.yaml (main template), patterns/pattern-3/sagemaker_classifier_endpoint.yaml
    • Services Fixed: EventBridge, Cognito, SageMaker, ECR, CloudFront, CodeBuild, AppSync, Lambda, DynamoDB, CloudWatch Logs, Glue
    • Resolves GitHub Issue #50 - templates now deploy correctly in both standard AWS and GovCloud regions
  • Bug Fixes and Code Improvements

    • Fixed HITL processing errors in both Pattern-1 (DynamoDB validation with empty strings) and Pattern-2 (string indices error in A2I output processing)
    • Fixed Step Function UI issues including auto-refresh button auto-disable and fetch failures for failed executions with datetime serialization errors
    • Cleaned up unused Step Function subscription infrastructure and removed duplicate code in Pattern-2 HITL function
    • Expanded UI Visual Editor bounding box size with padding for better visibility and user interaction
    • Fixed bug in list of models supporting cache points - previously Claude 4 Sonnet and Opus had been excluded.
    • Added validation at the assessment step to check for a valid JSON response. Validation fails after extraction/assessment is complete if JSON parsing issues are encountered.
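
The throttling fix in this release (checking errors returned in document.errors as well as thrown exceptions) can be sketched as follows. The marker strings and the local `ThrottlingException` class are assumptions; the actual detection logic may differ.

```python
class ThrottlingException(Exception):
    """Raised so the Step Functions retry policy can retry the task."""


# Hypothetical substrings that indicate a throttling error message.
THROTTLING_MARKERS = ("throttlingexception", "too many requests", "rate exceeded")


def raise_if_throttled(errors):
    """Inspect error strings returned on the document (not thrown) and raise
    ThrottlingException if any of them looks like a throttling error."""
    for err in errors:
        message = str(err).lower()
        if any(marker in message for marker in THROTTLING_MARKERS):
            raise ThrottlingException(str(err))


# A non-throttling error passes through without raising.
raise_if_throttled(["ValidationError: missing field"])
```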

v0.3.14

07 Sep 15:22

[0.3.14]

Added

  • Support for 1M token context for Claude Sonnet 4
  • Video demo of "Chat with Document" in ./docs/web-ui.md
  • Human-in-the-Loop (HITL) Support Extended to Pattern-2
    • Added HITL review capabilities for Pattern-2 (Textract + Bedrock processing) using Amazon SageMaker Augmented AI (A2I)
    • Enables human validation and correction when extraction confidence falls below configurable threshold
    • Includes same features as Pattern-1 HITL: automatic triggering, review portal integration, and seamless result updates
    • Documentation and video demo in ./docs/human-review.md
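
The confidence-threshold trigger for HITL review can be pictured with a minimal sketch. The function name, the attribute names, and the flat name-to-confidence mapping are all assumptions for illustration.

```python
def attributes_needing_review(confidences: dict, threshold: float) -> list:
    """Return the attribute names whose confidence is below the threshold.

    A non-empty result would be the signal to open a human review task.
    """
    return sorted(name for name, score in confidences.items() if score < threshold)


scores = {"InvoiceNumber": 0.98, "TotalAmount": 0.62, "DueDate": 0.79}
print(attributes_needing_review(scores, threshold=0.8))  # ['DueDate', 'TotalAmount']
```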

Removed

  • Windows development environment guide and setup script removed as it proved insufficiently robust

Fixed

  • Fix 1-click Launch URL output from the GovCloud template generation script
  • Add Agent Analytics to architecture diagram
  • Fix various UX and error reporting issues with the new Python publish script
  • Simplify UDOP model path construction and avoid invalid default for regions other than us-east-1 and us-west-2
  • Fix permission regression from previous release affecting "Chat with Document"

v0.3.13

29 Aug 22:33

[0.3.13]

Added

  • External MCP Agent Integration for Custom Tool Extension

    • Added External MCP (Model Context Protocol) Agent support that enables integration with custom MCP servers to extend IDP capabilities
    • Cross-Account Integration: Host MCP servers in separate AWS accounts or external infrastructure with secure OAuth authentication using Amazon Cognito
    • Dynamic Tool Discovery: Automatically discovers and integrates available tools from MCP servers through the IDP web interface
    • Secure Authentication Flow: Uses Amazon Cognito User Pools for OAuth bearer token authentication with proper token validation
    • Configuration Management: JSON array configuration in AWS Secrets Manager supporting multiple MCP server connections with optional custom agent names and descriptions
    • Real-time Integration: Tools become immediately available through the IDP web interface after configuration
  • AWS GovCloud Support with Automated Template Generation

    • Added GovCloud compatibility through scripts/generate_govcloud_template.py script
    • ARN Partition Compatibility: All templates updated to use arn:${AWS::Partition}: for both commercial and GovCloud regions
    • Headless Operation: Automatically removes UI-related resources (CloudFront, AppSync, Cognito, WAF) for GovCloud deployment
    • Core Functionality Preserved: All 3 processing patterns and complete 6-step pipeline (OCR, Classification, Extraction, Assessment, Summarization, Evaluation) remain fully functional
    • Automated Workflow: Single script orchestrates build + GovCloud template generation + S3 upload with deployment URLs
    • Enterprise Ready: Enables headless document processing for government and enterprise environments requiring GovCloud compliance
    • Documentation: New docs/govcloud-deployment.md with deployment guide, architecture differences, and access methods
  • Pattern-2 and Pattern-3 Assessment now generate geometry (bounding boxes) for visualization in UI 'Visual Editor' (parity with Pattern-1)

    • Added comprehensive spatial localization capabilities to both regular and granular assessment services
    • Automatic Processing: When LLM provides bbox coordinates, automatically converts to UI-compatible (Visual Edit) geometry format without any configuration
    • Universal Support: Works with all attribute types - simple attributes, nested group attributes (e.g., CompanyAddress.State), and list attributes
    • Enhanced Prompts: Updated assessment task prompts with spatial-localization-guidelines requesting bbox coordinates in normalized 0-1000 scale
    • Demo Notebooks: Assessment notebooks now showcase automatic bounding box processing
  • New Python-Based Publishing System

    • Replaced publish.sh bash script with new publish.py Python script
    • Rich console interface with progress bars, spinners, and colored output using Rich library
    • Multi-threaded artifact building and uploading for significantly improved performance
    • Native support for Linux, macOS, and Windows environments
  • Windows Development Environment Setup Guide and Helper Script

    • New scripts/dev_setup.bat (570 lines) for complete Windows development environment configuration
  • OCR Service Default Image Sizing for Resource Optimization

    • Implemented automatic default image size limits (951×1268) when no image sizing configuration is provided
    • Key Benefits: Reduction in vision model token consumption, prevents OutOfMemory errors during concurrent processing, improves processing speed and reduces bandwidth usage
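
The bounding box conversion for Pattern-2 and Pattern-3 assessment in this release can be sketched as below. The 0-1000 normalized scale comes from the release notes; the `[x1, y1, x2, y2]` input layout, the output key names, and the function name are assumptions, since the notes do not specify the geometry format.

```python
def bbox_to_geometry(bbox, scale=1000.0):
    """Convert an [x1, y1, x2, y2] box on a 0-1000 scale into fractional
    left/top/width/height values (assumed output shape)."""
    x1, y1, x2, y2 = bbox
    # Tolerate corners given in either order.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return {
        "boundingBox": {
            "left": left / scale,
            "top": top / scale,
            "width": (right - left) / scale,
            "height": (bottom - top) / scale,
        }
    }


print(bbox_to_geometry([100, 200, 300, 600]))
```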

Changed

  • Reverted to python3.12 runtime to resolve build package dependency problems

Fixed

  • Improved Visual Edit bounding box position when using image zoom or pan

v0.3.12

22 Aug 21:43

[0.3.12]

Added

  • Custom Prompt Generator Lambda Support for Patterns 2 & 3

    • Added custom_prompt_lambda_arn configuration field to enable injection of custom business logic into extraction processing
    • Key Features: Lambda interface with all template placeholders (DOCUMENT_TEXT, DOCUMENT_CLASS, ATTRIBUTE_NAMES_AND_DESCRIPTIONS, DOCUMENT_IMAGE), URI-based image handling for JSON serialization, comprehensive error handling with fail-fast behavior, scoped IAM permissions requiring GENAIIDP-* function naming
    • Use Cases: Document type-specific processing rules, integration with external systems for customer configurations, conditional processing based on document content, regulatory compliance and industry-specific requirements
    • Demo Resources: Interactive notebook demonstration (step3_extraction_with_custom_lambda.ipynb), SAM deployment template for demo Lambda function, comprehensive documentation and examples in notebooks/examples/demo-lambda/
    • Benefits: Custom business logic without core code changes, backward compatible (existing deployments unchanged), robust JSON serialization handling all object types, complete observability with detailed logging
  • Refactored Document Classification Service for Enhanced Boundary Detection

    • Consolidated multimodalPageLevelClassification and the experimental multimodalPageBoundaryClassification (from v0.3.11) into a single enhanced multimodalPageLevelClassification method
    • Implemented BIO-like sequence segmentation with document boundary indicators: "start" (new document) and "continue" (same document)
    • Automatically segments multi-document packets, even when they contain multiple documents of the same type
    • Added comprehensive classification guide with method comparisons and best practices
    • Benefits: Simplified codebase with single multimodal classification method, improved handling of complex document packets, maintains backward compatibility
    • No Breaking Changes: Existing configurations work unchanged, no configuration updates required
  • Enhanced A2I Template and Workflow Management

    • Enhanced A2I template with improved user interface and clearer instructions for reviewers
    • Added comprehensive instructions for reviewers in A2I template to guide the review process
    • Implemented capture of failed review tasks with proper error handling and logging
    • Added workflow orchestration control to stop processing when reviewer rejects A2I task
    • Removed automatic A2I task creation when Pattern-1 Bedrock Data Automation (BDA) fails to classify document to appropriate Blueprint
  • Dynamic Cost Calculation for Metering Data

    • Added automated unit cost and estimated cost calculation to metering table with new unit_cost and estimated_cost columns
    • Dynamic pricing configuration loading from configuration
    • Enhanced cost analysis capabilities with comprehensive Athena queries for cost tracking, trend analysis, and efficiency metrics
    • Automatic cost calculation as estimated_cost = value × unit_cost for all metering records
  • Configuration-Based Summarization Control

    • Summarization can now be enabled/disabled via configuration file summarization.enabled property instead of CloudFormation stack parameter
    • Key Benefits: Runtime control without stack redeployment, zero LLM costs when disabled, simplified state machine architecture, backward compatible defaults
    • Implementation: Always calls SummarizationStep but service skips processing when enabled: false
    • Cost Optimization: When disabled, no LLM API calls or S3 operations are performed
    • Configuration Example: Set summarization.enabled: false to disable, enabled: true to enable (default)
  • Configuration-Based Assessment Control

    • Assessment can now be enabled/disabled via configuration file assessment.enabled property instead of CloudFormation stack parameter
    • Key Benefits: Runtime control without stack redeployment, zero LLM costs when disabled, simplified state machine architecture, backward compatible defaults
    • Implementation: Always calls AssessmentStep but service skips processing when enabled: false
    • Cost Optimization: When disabled, no LLM API calls or S3 operations are performed
    • Configuration Example: Set assessment.enabled: false to disable, enabled: true to enable (default)
  • New guides for setting up development environments

    • EC2-based Linux development environment
    • macOS development environment
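
The BIO-like segmentation used by multimodalPageLevelClassification in this release can be illustrated with a small sketch. Only the "start" and "continue" indicators come from the release notes; the tuple layout, the section dictionary, and the rule that a class change also opens a new section are assumptions.

```python
def segment_pages(page_results):
    """Group pages into sections from per-page (page_id, doc_class, boundary)
    tuples, where boundary is "start" (new document) or "continue"."""
    sections = []
    for page_id, doc_class, boundary in page_results:
        starts_new_section = (
            not sections
            or boundary == "start"
            or sections[-1]["class"] != doc_class  # assumed safety rule
        )
        if starts_new_section:
            sections.append({"class": doc_class, "page_ids": [page_id]})
        else:
            sections[-1]["page_ids"].append(page_id)
    return sections


packet = [
    (1, "Invoice", "start"),
    (2, "Invoice", "continue"),
    (3, "Invoice", "start"),  # second invoice of the same type
    (4, "Payslip", "start"),
]
for section in segment_pages(packet):
    print(section)
```

Pages 1-2 and page 3 end up in separate Invoice sections, which is how a packet with several documents of the same type can be split.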

Removed

  • CloudFormation Parameters: Removed IsSummarizationEnabled and IsAssessmentEnabled parameters from all pattern templates
  • Related Conditions: Removed parameter conditions and state machine definition substitutions for both features
  • Conditional Logic: Eliminated complex conditional logic from state machine definitions for summarization and assessment steps

⚠️ Breaking Changes

  • Configuration Migration Required: When updating a stack that previously had IsSummarizationEnabled or IsAssessmentEnabled set to false, these features will now default to enabled: true after the update. To maintain the disabled behavior:
    1. Update your configuration file to set summarization.enabled: false and/or assessment.enabled: false as needed
    2. Save the configuration changes immediately after the stack update
    3. This ensures continued cost optimization by preventing unexpected LLM API calls
  • Action Required: Review your current CloudFormation parameter settings before updating and update your configuration accordingly to preserve existing behavior
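
The migration steps above can be summarized in a short sketch that derives the configuration overrides from the old stack parameter values. The function name and the nested dictionary shape are assumptions; only the summarization.enabled and assessment.enabled properties come from the release notes.

```python
def config_overrides(is_summarization_enabled: bool, is_assessment_enabled: bool) -> dict:
    """Map the removed CloudFormation parameters onto configuration settings.

    Both features default to enabled after the update, so only a previous
    value of false needs an explicit override.
    """
    overrides = {}
    if not is_summarization_enabled:
        overrides["summarization"] = {"enabled": False}
    if not is_assessment_enabled:
        overrides["assessment"] = {"enabled": False}
    return overrides


# Stack previously deployed with IsSummarizationEnabled=false:
print(config_overrides(False, True))  # {'summarization': {'enabled': False}}
```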

Changed

  • Updated Python Lambda Runtime to 3.13

Fixed

  • Fixed B615 "Unsafe Hugging Face Hub download without revision pinning" security finding in Pattern-3 fine-tuning module - Added revision pinning to prevent supply chain attacks and ensure reproducible deployments
  • Fixed CloudWatch Log Group Missing Retention regression
  • Security: Cross-Site Scripting (XSS) Vulnerability in FileViewer Component - Fixed high-risk XSS vulnerability in src/ui/src/components/document-viewer/FileViewer.jsx where innerHTML was used with user-controlled data
  • Add permissions boundary support to new Lambda function roles introduced in previous releases
  • Fixed OutOfMemory Errors in Pattern-2 OCR Lambda for Large High-Resolution Documents
    • Root Cause: Processing large PDFs with high-resolution images (7469×9623 pixels) caused memory spikes when 20 concurrent workers each held ~101MB images simultaneously, exceeding the 4GB Lambda memory limit
    • Optimal Solution: Refactored image extraction to render directly at target dimensions using PyMuPDF matrix transformations, completely eliminating oversized image creation
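
The idea of rendering directly at target dimensions can be sketched as a scale-factor calculation. This is an assumption-laden illustration rather than the project's code: `render_scale` is a hypothetical helper, and preserving the aspect ratio by taking the smaller of the two ratios is an assumption.

```python
def render_scale(page_width_pt, page_height_pt, target_width_px, target_height_px):
    """Return one zoom factor that fits the page inside the target box.

    Rendering at this factor yields an image that is already at (or under)
    the target size, so no oversized intermediate image is created.
    """
    return min(target_width_px / page_width_pt, target_height_px / page_height_pt)


# US Letter page (612 x 792 points) fitted into 951 x 1268 pixels.
scale = render_scale(612, 792, 951, 1268)
print(round(612 * scale), round(792 * scale))

# With PyMuPDF, this factor would typically be passed as
# page.get_pixmap(matrix=fitz.Matrix(scale, scale)).
```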

v0.3.11

15 Aug 21:15

[0.3.11]

Added

  • Chat with Document now available at the bottom of each Document Detail page.
  • Anthropic Claude Opus 4.1 model available in configuration for all document processing steps
  • Browser tab icon now features a blue background with a white "IDP"
  • Experimental new classification method - multimodalPageBoundaryClassification - for detecting section boundaries during page level classification.

v0.3.10

08 Aug 21:10

[0.3.10]

Added

  • Agent Analysis Feature for Natural Language Document Analytics

    • Added integrated AI-powered analytics agent that enables natural language querying of processed document data
    • Key Capabilities: Convert natural language questions to SQL queries, generate interactive visualizations and tables, explore database schema automatically
    • Secure Architecture: All Python code execution happens in isolated Amazon Bedrock AgentCore sandboxes, not in Lambda functions
    • Multi-Tool Agent System: Database discovery tool for schema exploration, Athena query tool for SQL execution, secure code sandbox for data transfer, Python visualization tool for charts and tables
    • Example Use Cases: Query document processing volumes and trends, analyze confidence scores and extraction accuracy, explore document classifications and content patterns, generate custom charts and data tables
    • Sample W2 Test Data: Includes 20 synthetic W2 tax documents for testing analytics capabilities
    • Configurable Models: Supports multiple AI models including Claude 3.7 Sonnet (default), Claude 3.5 Sonnet, Nova Pro/Lite, and Haiku
    • Web UI Integration: Accessible through "Document Analytics" section with real-time progress display and query history
  • Automatic Glue Table Creation for Document Sections

    • Added automatic creation of AWS Glue tables for each document section type (classification) during processing
    • Tables are created dynamically when new section types are encountered, eliminating manual table creation
    • Consistent lowercase naming convention for tables ensures compatibility with case-sensitive S3 paths
    • Tables are configured with partition projection for efficient date-based queries without manual partition management
    • Automatic schema evolution - tables update when new fields are detected in extraction results

v0.3.9

25 Jul 22:16

[0.3.9]

Added

  • Optional Permissions Boundary Support for Enterprise Deployments
    • Added PermissionsBoundaryArn parameter to all CloudFormation templates for organizations with Service Control Policies (SCPs) requiring permissions boundaries
    • Comprehensive support for both explicit IAM roles and implicit roles created by AWS SAM functions and state machines
    • Conditional implementation ensures backward compatibility - when no permissions boundary is provided, roles deploy normally

  • IDP Configuration and Prompting Best Practices documentation

Changed

  • Updated lending_package.pdf sample with more realistic driver's license image

Fixed

  • Issue #27 - removed idp_common bedrock client region default to us-west-2 - PR #28

v0.3.8

18 Jul 22:10

[0.3.8]

Added

  • Lending Package Configuration Support for Pattern-2

    • Added new lending-package-sample configuration to Pattern-2, providing comprehensive support for lending and financial document processing workflows
    • New default configuration for Pattern-2 stack deployments, optimized for loan applications, mortgage processing, and financial verification documents
    • Previous rvl-cdip-sample configuration remains available by selecting rvl-cdip-package-sample for the Pattern2Configuration parameter when deploying or updating stacks
  • Text Confidence View for Document Pages

    • Added support for displaying OCR text confidence data through new TextConfidenceUri field
    • New "Text Confidence View" option in the UI pages panel alongside existing Markdown and Text views
    • Fixed issues with view persistence - Text Confidence View button now always visible with appropriate messaging when content unavailable
    • Fixed view toggle behavior - switching between views no longer closes the viewer window
    • Reordered view buttons to: Markdown View, Text Confidence View, Text View for better user experience
  • Enhanced OCR DPI Configuration for PDF files

    • DPI for PDF image conversion is now configurable in the configuration editor under OCR image processing settings
    • Default DPI improved from 96 to 150 DPI for better default quality and OCR accuracy
    • Configurable through Web UI without requiring code changes or redeployment
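
The effect of the DPI default change from 96 to 150 can be worked out with simple arithmetic, since rendered pixels equal page size in inches times DPI. The helper name below is hypothetical, and the US Letter page size is just an example.

```python
def rendered_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter page, 8.5 x 11 inches.
print(rendered_size(8.5, 11, 96))   # (816, 1056)
print(rendered_size(8.5, 11, 150))  # (1275, 1650)
```

Going from 96 to 150 DPI gives about 2.4 times as many pixels per page, which improves OCR input quality at the cost of larger images.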

Changed

  • Converted text confidence data format from JSON to markdown table for improved readability and reduced token usage

    • Removed unnecessary "page_count" field
    • Changed "text_blocks" array to "text" field containing a markdown table with Text and Confidence columns
    • Reduces prompt size for assessment service while improving UI readability
    • OCR confidence values now rounded to 1 decimal place (e.g., 99.1, 87.3) for cleaner display
    • Markdown table headers now explicitly left-aligned using |:-----|:-----------| format for consistent appearance
  • Simplified OCR Service Initialization

    • OCR service now accepts a single config dictionary parameter for cleaner, more consistent API
    • Aligned with classification service pattern for better consistency across IDP services
    • Backward compatibility maintained - old parameter pattern still supported with deprecation warning
    • Updated all lambda functions and notebooks to use new simplified pattern
  • Removed fixed image target_height and target_width from default configurations, so images are processed in original resolution by default.

  • Updated Default Configuration for Pattern1 and Pattern2

    • Changed default configuration for new stacks from "default" to "lending-package-sample" for both Pattern1 and Pattern2
    • Maintains backward compatibility for stack updates by keeping the parameter value "default" mapped to the rvl-cdip-sample for pattern-2.
  • Reduce assessment step costs

    • Default model for granular assessment is now us.amazon.nova-lite-v1:0 - experimentation recommended
    • Improved placement of <> tags in assessment prompt to improve utilization of prompt caching
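
The new text confidence format described in this section can be sketched as a conversion from a list of text blocks to a markdown table. The "text" output field, the Text and Confidence columns, the left-aligned header row, and the one-decimal rounding come from the release notes; the input key names and the function name are assumptions.

```python
def to_confidence_table(text_blocks):
    """Build the {"text": <markdown table>} structure from OCR text blocks.

    Each block is assumed to look like {"text": str, "confidence": float}.
    """
    lines = ["| Text | Confidence |", "|:-----|:-----------|"]
    for block in text_blocks:
        # Escape pipes so OCR text cannot break the table layout.
        text = str(block["text"]).replace("|", "\\|")
        confidence = round(float(block["confidence"]), 1)
        lines.append(f"| {text} | {confidence} |")
    return {"text": "\n".join(lines)}


blocks = [
    {"text": "Total Due", "confidence": 99.1234},
    {"text": "$1,250.00", "confidence": 87.31},
]
print(to_confidence_table(blocks)["text"])
```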

Fixed

  • Fixed Image Resizing Behavior for High-Resolution Documents
    • Fixed issue where empty strings in image configuration were incorrectly resizing images to default 951x1268 pixels instead of preserving original resolution
    • Empty strings ("") in target_width and target_height configuration now preserve original document resolution for maximum processing accuracy
  • Fixed issue where PNG files were being unnecessarily converted to JPEG format and resized to lower resolution with lost quality
  • Fixed issue where PNG and JPG image files were not rendering inline in the Document Details page
  • Fixed issue where PDF files were being downloaded instead of displayed inline
  • Fixed pricing data for cacheWrite tokens for Amazon Nova models to resolve inaccurate cost estimation in UI.
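
The image resizing fix above (empty strings in target_width and target_height preserve the original resolution) can be sketched as follows. The function name is hypothetical, and treating a single missing dimension as "no resize" is an assumption.

```python
def resolve_target_size(image_config: dict):
    """Return (width, height) to resize to, or None to keep the original size.

    Empty strings mean "do not resize" rather than "use a default size".
    """
    width = str(image_config.get("target_width") or "").strip()
    height = str(image_config.get("target_height") or "").strip()
    if not width or not height:
        return None  # preserve original document resolution
    return int(width), int(height)


print(resolve_target_size({"target_width": "", "target_height": ""}))        # None
print(resolve_target_size({"target_width": "951", "target_height": "1268"}))  # (951, 1268)
```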