Description
Do you need to file an issue?
- I have searched the existing issues and this feature is not already filed.
- My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.
Is your feature request related to a problem? Please describe.
The text units need to be sufficiently long to help the LLM establish the context of the extracted entities and relationships, but this makes it harder for the end-user to validate the results of the GraphRAG query results.
Furthermore, citations to community summaries lead to long lists of text_units, most of which can be irrelevant (only a particular piece of information may be of interest from a community report that contains much more).
The end-user needs better access to the original text that is relevant to the response, so that the response can be validated against the source material.
Describe the solution you'd like
After GraphRAG has produced its output, a separate utility could parse that output and, for each paragraph supported by citations, use the LLM to extract the passages (as series of sentences) of the original text that are relevant to that specific paragraph. The final output would need to be both human-readable and machine-parseable.
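A rough sketch of what such a utility's first stage could look like, assuming a citation format of the form `[Data: Sources (1, 3); Reports (2)]` in the answer text (the actual format may differ between GraphRAG versions); the function and prompt wording here are hypothetical, not part of GraphRAG:

```python
import re

# Assumed citation style, e.g. "[Data: Sources (12, 34); Reports (2)]".
CITATION_RE = re.compile(r"\[Data:\s*([^\]]+)\]")

def extract_citations(paragraph: str) -> dict[str, list[int]]:
    """Return {record_type: [ids]} for every citation found in a paragraph."""
    cited: dict[str, list[int]] = {}
    for match in CITATION_RE.finditer(paragraph):
        for part in match.group(1).split(";"):
            m = re.match(r"(\w+)\s*\(([\d,\s]+)", part.strip())
            if not m:
                continue
            ids = [int(tok) for tok in re.findall(r"\d+", m.group(2))]
            cited.setdefault(m.group(1), []).extend(ids)
    return cited

def build_extraction_prompt(paragraph: str, source_texts: dict[int, str]) -> str:
    """Assemble an LLM prompt asking which sentences support `paragraph`.

    `source_texts` maps cited text_unit ids to their raw text; the JSON
    response format requested here is an illustrative choice for keeping
    the output machine-parseable.
    """
    sources = "\n\n".join(f"[{i}] {t}" for i, t in sorted(source_texts.items()))
    return (
        "Given the answer paragraph and the cited source texts below, quote "
        "the exact sentences from each source that support the paragraph. "
        'Respond as JSON: {"source_id": ["sentence", ...]}.\n\n'
        f"PARAGRAPH:\n{paragraph}\n\nSOURCES:\n{sources}"
    )
```

The second stage would send each prompt to the LLM and emit the quoted sentences alongside the paragraph, giving the end-user a direct, per-paragraph trail back to the source material.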
Additional context
No response