Skip to content

[Bug]: Saved GraphML is not importable in Gephi due to errors #2114

@ifsheldon

Description

@ifsheldon

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I followed https://microsoft.github.io/graphrag/visualization_guide/, but when importing the graphml file saved by GraphRAG, errors happen in the Gephi side. The errors shown in Gephi are not helpful (see gephi/gephi#3041), but I figured it out later that in the saved graphml file, some nodes do not have id or id="" and some edges have target="". I think this is an issue in GraphRAG.

Steps to reproduce

  1. Find some document to index
  2. Follow the steps in https://microsoft.github.io/graphrag/visualization_guide/
  3. Import the graphml file in Gephi

Expected Behavior

The graphml file should not have these errors and this importable by Gephi

GraphRAG Config Used

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    type: chat
    model_provider: openai
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file, or remove if managed identity
    model: gpt-5-mini
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-05-01-preview
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25
    async_mode: threaded # or asyncio
    retry_strategy: exponential_backoff
    max_retries: 10
    tokens_per_minute: null
    requests_per_minute: null
  default_embedding_model:
    type: embedding
    model_provider: openai
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-05-01-preview
    concurrent_requests: 25
    async_mode: threaded # or asyncio
    retry_strategy: exponential_backoff
    max_retries: 10
    tokens_per_minute: null
    requests_per_minute: null

### Input settings ###

input:
  storage:
    type: file # or blob
    base_dir: "input"
  file_type: text # [csv, text, json]

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Output/storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
    
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob]
  base_dir: "logs"

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default

### Workflow settings ###

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]
  async_mode: threaded # or asyncio

cluster_graph:
  max_cluster_size: 10

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: true # set to true to save the graph as a graphml file
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 2.7.0
  • Operating System: macOS
  • Python Version: 3.12
  • Related Issues:

Metadata

Metadata

Assignees

No one assigned

    Labels

    awaiting_responseMaintainers or community have suggested solutions or requested info, awaiting filer responsebugSomething isn't workingstaleUsed by auto-resolve bot to flag inactive issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions