@iBalajiShanmugam

In some PDFs, the description text of the first transaction is extracted onto the same line as "Opening Unit Balance" because of differences in the PDF's internal structure. The transaction line that follows then contains only the date and amounts, with no description. The parser could not match such a line, so the first transaction was silently dropped.

Example of problematic extraction:
Line 1: 'NFO Purchase Opening Unit Balance: 0.000' <- description merged
Line 2: '20-Apr-2023 2,000.00 200.000 10.0000 200.000' <- no description

Fix:

  • Modified OPEN_UNITS_RE to capture optional description before "Opening Unit Balance"
  • Added TRANSACTION_RE5 to match transaction lines without description
  • Store pending description and apply it to the next transaction line
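
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the regex bodies, the `parse_lines` helper, and the dictionary keys are assumptions made for the example. Only the names `OPEN_UNITS_RE` and `TRANSACTION_RE5` come from the description.

```python
import re

# Hypothetical patterns for illustration; the project's actual regexes differ.
# Captures optional description text that precedes "Opening Unit Balance".
OPEN_UNITS_RE = re.compile(
    r"^(?P<desc>.*?)\s*Opening\s+Unit\s+Balance\s*:\s*(?P<units>[\d,.]+)\s*$",
    re.IGNORECASE,
)
# Matches a transaction line with a date and four numeric columns
# but no description.
TRANSACTION_RE5 = re.compile(
    r"^(?P<date>\d{2}-[A-Za-z]{3}-\d{4})\s+"
    r"(?P<amount>[\d,.]+)\s+(?P<units>[\d,.]+)\s+"
    r"(?P<nav>[\d,.]+)\s+(?P<balance>[\d,.]+)\s*$"
)


def parse_lines(lines):
    """Hypothetical helper: attach a pending description to the next
    description-less transaction line."""
    transactions = []
    pending_description = None
    for line in lines:
        line = line.strip()
        m = OPEN_UNITS_RE.match(line)
        if m:
            # Remember any description that was merged into this line.
            pending_description = m.group("desc").strip() or None
            continue
        m = TRANSACTION_RE5.match(line)
        # In this sketch, a description-less line is only accepted when a
        # description is pending; otherwise it is ignored.
        if m and pending_description is not None:
            transactions.append(
                {
                    "date": m.group("date"),
                    "description": pending_description,
                    "amount": m.group("amount"),
                    "units": m.group("units"),
                    "nav": m.group("nav"),
                    "balance": m.group("balance"),
                }
            )
            pending_description = None
    return transactions
```

Fed the two example lines above, this sketch yields a single transaction whose description is `NFO Purchase`. Clearing the pending description after use keeps it from leaking onto a later, unrelated line.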

This fixes the orphan stamp duty issue, in which a stamp duty entry appeared while its corresponding purchase was missing.

…it Balance

Some PDFs have their first transaction's description text extracted on the
same line as "Opening Unit Balance" due to PDF internal structure differences.
This caused the first transaction to be silently dropped since the parser
couldn't match a line with date/amounts but no description.

Example of problematic extraction:
  Line 1: 'NFO Purchase		Opening Unit Balance: 0.000'  <- description merged
  Line 2: '28-Apr-2020		2,000.00		200.000		10.0000		200.000'  <- no description

Fix:
- Modified OPEN_UNITS_RE to capture optional description before "Opening Unit Balance"
- Added TRANSACTION_RE5 to match transaction lines without description
- Store pending description and apply it to the next transaction line

Fixes orphan stamp duty issue where the corresponding purchase was missing.
@iBalajiShanmugam force-pushed the fix-first-transaction-parsing branch from 709b78b to 3db75bb on December 3, 2025 05:05