
Suggest replacing Python’s built-in XML parser with pygixml for major performance gains #33847

@MohammadRaziei

Description


Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Feature Description

Hi LangChain team,

While reviewing the code in libs/core/langchain_core/output_parsers/xml.py, I noticed that it currently uses Python’s built-in xml module for parsing.

The standard-library XML parser can be comparatively slow, especially on larger documents. I’d like to suggest evaluating a new library I’ve developed called pygixml, which could substantially improve performance in this area.

Why pygixml

  • Built on top of pugixml (C++) and Cython, designed for performance and clean Python integration.
  • 16× to 33× faster than Python’s built-in xml parser, and around 5× faster than lxml, in my benchmarks.
  • Offers a highly intuitive, Pythonic API with full XPath 1.0 support.
  • Each node has additional utilities like mem_id, xpath, and recursive text access.
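For reviewers weighing the API points above, it may help to see what the current standard-library parser already offers. The sketch below uses only `xml.etree.ElementTree` with a made-up sample document. ElementTree handles simple XPath predicates and recursive text through `itertext()`, but its `find()`/`findall()` accept only a limited XPath subset, so full XPath 1.0 features such as the `count()` or `contains()` functions are not available.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration only.
DOC = "<root><item id='a'>Hello <b>bold</b> world</item><item id='b'>Bye</item></root>"

root = ET.fromstring(DOC)

# Limited XPath: attribute-equality predicates are supported by ElementTree.
item_b = root.find(".//item[@id='b']")

# XPath functions such as count() are not supported, so aggregates
# have to be computed in Python instead.
n_items = len(root.findall("item"))

# Recursive text: itertext() yields the text of the element and all of
# its descendants (including the descendants' tail text).
full_text = "".join(root.find("item").itertext())

print(item_b.text, n_items, full_text)
```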

I’m the maintainer and author of pygixml, and I’d be happy to either:

  1. Submit a pull request integrating pygixml into LangChain’s XML output parser, or
  2. Help evaluate the potential performance benefits before merging.
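For the evaluation in option 2, a reproducible timing harness would make the speedup figures easier to check. The following is a minimal sketch using only the standard library (`timeit`, `xml.etree.ElementTree`, `xml.dom.minidom`). The names `make_document`, `PARSERS`, and `benchmark` are hypothetical, and the synthetic document is not representative of real LLM output. It does not call pygixml, because its exact API is not shown here; a pygixml-based parse function would be registered as one more `PARSERS` entry.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def make_document(n_items: int) -> str:
    # Hypothetical synthetic input: n_items <item> elements under one <root>.
    items = "".join(
        f'<item id="{i}"><name>item {i}</name><value>{i * 2}</value></item>'
        for i in range(n_items)
    )
    return f"<root>{items}</root>"


# Backends under test: each maps an XML string to a parsed object.
# A candidate backend (e.g. a pygixml-based function) would be added here;
# it is omitted because its exact call signature is an assumption.
PARSERS: Dict[str, Callable[[str], object]] = {
    "xml.etree.ElementTree.fromstring": ET.fromstring,
    "xml.dom.minidom.parseString": xml.dom.minidom.parseString,
}


def benchmark(doc: str, repeat: int = 5, number: int = 10) -> Dict[str, float]:
    # Seconds per parse for each backend, using the fastest of `repeat` runs.
    results: Dict[str, float] = {}
    for name, parse in PARSERS.items():
        # Bind `parse` as a default argument so each lambda keeps its own backend.
        timings = timeit.repeat(lambda p=parse: p(doc), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    sample = make_document(2_000)
    ranked = sorted(benchmark(sample, repeat=3, number=5).items(), key=lambda kv: kv[1])
    for name, seconds in ranked:
        print(f"{name:36s} {seconds * 1000:8.2f} ms")
```

Taking the minimum over repeats, rather than the mean, follows the `timeit` documentation’s advice that the lowest value best reflects the code’s speed, since higher values are usually caused by interference from other processes.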

I’d love to know whether you’re open to a PR. I’m confident this change could significantly improve XML parsing speed in LangChain.

Thanks for the great work you do,
Mohammad Raziei
Author of pygixml

Use Case

Faster XML parsing in LangChain’s XML output parser, especially for larger documents.

Proposed Solution

Use pygixml as the parsing backend in libs/core/langchain_core/output_parsers/xml.py.

Alternatives Considered

No response

Additional Context

No response

Metadata


Assignees

Labels

core: Related to the package `langchain-core`
feature request: request for an enhancement / additional functionality

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
