[Feature Request]: Identification of the relevant source material #1988

@ntsarb

Description

Do you need to file an issue?

  • I have searched the existing issues and this feature is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

The text units need to be sufficiently long to help the LLM establish the context of the extracted entities and relationships, but this makes it harder for the end user to validate GraphRAG query results.

Furthermore, citations to community summaries lead to long lists of text_units, most of which may be irrelevant (only one particular piece of information may be of interest in a community report that contains much more).

The end-user needs better access to the original text that is relevant to the response, so that the response can be validated against the source material.

Describe the solution you'd like

After GraphRAG has produced its output, a separate utility could parse this output and, for each paragraph that is supported by citations, use the LLM to extract from the original text the passages (as a series of sentences) that are relevant to that specific paragraph. The final output would need to be both human-readable and machine-parseable.
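
A minimal sketch of what such a utility might look like, to make the request concrete. Everything here is an assumption for illustration, not GraphRAG's API: the function names are hypothetical, the citation shape `[Data: Sources (1, 2, +more)]` is assumed, `text_units` is assumed to be a dict mapping a cited id to its original text, and the word-overlap selector is only a stand-in for the LLM call that would judge relevance. Resolving `Reports` citations back to their underlying text units is not shown.

```python
import re
from typing import Callable, Dict, List

# Assumed citation shape, e.g. "[Data: Sources (1, 2); Reports (5, +more)]".
CITATION_RE = re.compile(r"\[Data:\s*([^\]]+)\]")
GROUP_RE = re.compile(r"(\w+)\s*\(([^)]*)\)")


def parse_citations(paragraph: str) -> Dict[str, List[int]]:
    """Collect cited ids per record type; non-numeric tokens like '+more' are ignored."""
    cited: Dict[str, List[int]] = {}
    for block in CITATION_RE.findall(paragraph):
        for kind, ids in GROUP_RE.findall(block):
            cited.setdefault(kind, []).extend(int(t) for t in re.findall(r"\d+", ids))
    return cited


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _content_words(text: str) -> set:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def overlap_selector(paragraph: str, sentences: List[str]) -> List[str]:
    """Placeholder for the LLM: keep sentences sharing at least two content words."""
    claim = _content_words(CITATION_RE.sub("", paragraph))
    return [s for s in sentences if len(claim & _content_words(s)) >= 2]


def build_evidence(
    response: str,
    text_units: Dict[int, str],
    select: Callable[[str, List[str]], List[str]] = overlap_selector,
) -> List[dict]:
    """For each cited paragraph, list the source sentences judged relevant to it."""
    results = []
    for paragraph in response.split("\n\n"):
        ids = parse_citations(paragraph).get("Sources", [])
        if not ids:
            continue  # paragraph carries no source citations
        evidence = []
        for uid in ids:
            if uid not in text_units:
                continue  # cited id not available locally
            hits = select(paragraph, split_sentences(text_units[uid]))
            if hits:
                evidence.append({"text_unit": uid, "sentences": hits})
        results.append({"paragraph": paragraph.strip(), "evidence": evidence})
    return results
```

The returned list of dicts can be serialized with `json.dumps(..., indent=2)`, which would cover both the human-readable and the machine-parseable requirement. Passing a different `select` callable is where an LLM-backed relevance check could be plugged in.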

Additional context

No response

Metadata

Labels: enhancement (New feature or request)
