Skip to content

Conversation

@bukzor
Copy link
Contributor

@bukzor bukzor commented Oct 22, 2025

Summary

Fixes part of issue #160 by anchoring the IsJSON() regex to prevent false positives when detecting format.

Problem: The IsJSON() function uses an unanchored regex \s*[{\[] which matches a [ character anywhere in the input. This causes XML documents containing CDATA sections like <![CDATA[text]]> to be incorrectly detected as JSON format.

Solution: Changed the regex to ^\s*[{\[] to anchor it to the start of the input string. Now only inputs that actually begin with { or [ (after optional whitespace) will be detected as JSON.

Changes

  • internal/utils/utils.go: Modified IsJSON() to use anchored regex ^\s*[{\[]
  • cmd/root_test.go: Added TestFormatDetection test to verify CDATA sections are correctly detected as XML format

Test Plan

  • Added test case TestFormatDetection/CDATA_should_be_detected_as_XML_not_JSON
  • Verified test fails without the fix (shows CDATA detected as JSON)
  • Verified test passes with the fix (CDATA now correctly detected as XML)
  • All existing unit tests pass (go test ./...)

Additional Context

This is part 2 of a 4-part fix for issue #160. The other PRs address:

  1. HTML text escaping (PR Fix #160: Add proper text escaping in FormatHtml #161 - already submitted)
  2. IsJSON regex anchor (this PR)
  3. CDATA support in NodeToJSON
  4. Exhaustive switch statements for node types

🤖 Generated with Claude Code

@bukzor bukzor force-pushed the fix-160-isjson-regex branch from 44e1d28 to a11677f Compare October 22, 2025 16:40
@bukzor bukzor marked this pull request as ready for review October 22, 2025 16:41
@bukzor bukzor force-pushed the fix-160-isjson-regex branch from a11677f to 4629951 Compare October 22, 2025 17:04
… on CDATA

The IsJSON() function was using an unanchored regex `\s*[{\[]` which
would match a `[` character anywhere in the input. This caused XML
documents containing CDATA sections like `<![CDATA[text]]>` to be
incorrectly detected as JSON.

Changed the regex to `^\s*[{\[]` to anchor it to the start of the
input string. Now only inputs that actually begin with `{` or `[`
(after optional whitespace) will be detected as JSON.

Added TestIsJSON with test cases including CDATA to verify the fix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@bukzor bukzor force-pushed the fix-160-isjson-regex branch from 4629951 to 9a5c191 Compare November 14, 2025 17:34
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.57%. Comparing base (ec5a59a) to head (9a5c191).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #163   +/-   ##
=======================================
  Coverage   80.57%   80.57%           
=======================================
  Files           5        5           
  Lines         690      690           
=======================================
  Hits          556      556           
  Misses         92       92           
  Partials       42       42           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants