14 changes: 14 additions & 0 deletions README.adoc
Original file line number Diff line number Diff line change
@@ -143,6 +143,20 @@ You can skip certain time-consuming tests by setting environment variables:
* `SKIP_LINK_CHECKS=true` - Skip validation of external links (GitHub repositories, lesson links)
* `SKIP_CYPHER_CHECKS=true` - Skip Cypher query validation (file existence checks still run)

== Cursor QA checks

You can use Cursor to run automated QA checks on the course content.

Cursor prompts are stored in the `.cursor/` folder.

To run the checks against a specific course, use the following commands:

[source, text]
----
@review-lesson-content.mdc for @genai-graphrag-python
run and fix COURSES=graphrag-python npm run test qa
@technical-lesson-review.mdc
@review-course.mdc
----

== Contributing

To create a new course or modify an existing course, please create a new branch and make your changes.
4 changes: 2 additions & 2 deletions asciidoc/courses/genai-graphrag-python/course.adoc
@@ -4,7 +4,7 @@
:duration: 2 hours
:caption: Learn how to use Python and LLMs to convert unstructured data into knowledge graphs.
:usecase: blank-sandbox
:key-points: Create a knowledge graph using Neo4j GraphRAG for Python, Model a knowledge graph of structure and unstructured data, Query a knowledge graph using retrievers, Customize the knowledge graph build process
:key-points: Create a knowledge graph using Neo4j GraphRAG for Python, Model a knowledge graph of structured and unstructured data, Query a knowledge graph using retrievers, Customize the knowledge graph build process
:repository: neo4j-graphacademy/genai-graphrag-python
:banner-style: light

@@ -52,4 +52,4 @@ How to:

* [lessons]#16 lessons#
* [challenges]#7 hands-on challenges#
* [quizes]#8 simple quizzes to support your learning#
@@ -1,12 +1,12 @@
= Constructing Knowledge Graphs
= Constructing knowledge graphs
:type: lesson
:order: 1

In this lesson you will review the process of constructing knowledge graphs from unstructured text using an LLM.
In this lesson, you will review the process of constructing knowledge graphs from unstructured text using an LLM.

== The construction process

Typically, you would follow these steps:
When constructing a knowledge graph from unstructured text, you typically follow these steps:

. Gather the data
. Chunk the data
@@ -72,12 +72,12 @@ If you wanted to construct a knowledge graph based on the link:https://en.wikipe
image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"]
. Split the text into **chunks**.
+
Neo4j is a graph database management system (GDBMS) developed
by Neo4j Inc.
+
{sp}
+
The data elements Neo4j stores are nodes, edges connecting them,
and attributes of nodes and edges...

. Generate **embeddings** and **vectors** for each chunk.
@@ -88,8 +88,8 @@ image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"]
+
Send the text to the LLM with an appropriate prompt, for example:
+
Your task is to identify the entities and relations requested
with the user prompt from a given text. You must generate the
output in a JSON format containing a list with JSON objects.

Text:
@@ -166,4 +166,4 @@ include::questions/1-steps.adoc[leveloffset=+1]

In this lesson, you learned about how to construct a knowledge graph.

In the next lesson, you will setup your development environment to build knowledge graphs using Python and Neo4j.
In the next lesson, you will set up your development environment to build knowledge graphs using Python and Neo4j.
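The extraction prompt above asks the LLM to return JSON. As a minimal sketch, assuming a response shaped as `nodes` (each with `id`, `label`, `properties`) and `relationships` (each with `type`, `start_node_id`, `end_node_id`), the standard-library Python below turns that JSON into records you could then write to Neo4j. The JSON shape, the `Node`/`Relationship` classes, and `parse_extraction` are illustrative assumptions, not the `neo4j_graphrag` API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start_node_id: str
    end_node_id: str


def parse_extraction(raw: str) -> tuple[list[Node], list[Relationship]]:
    """Parse an LLM response in the assumed {"nodes": [...], "relationships": [...]} shape."""
    data = json.loads(raw)
    nodes = [
        Node(n["id"], n["label"], n.get("properties", {}))
        for n in data.get("nodes", [])
    ]
    node_ids = {n.id for n in nodes}
    # Keep only relationships whose start and end nodes were actually returned.
    rels = [
        Relationship(r["type"], r["start_node_id"], r["end_node_id"])
        for r in data.get("relationships", [])
        if r["start_node_id"] in node_ids and r["end_node_id"] in node_ids
    ]
    return nodes, rels


response = '''
{"nodes": [
   {"id": "0", "label": "GraphDatabase", "properties": {"name": "Neo4j"}},
   {"id": "1", "label": "Company", "properties": {"name": "Neo4j Inc"}}],
 "relationships": [
   {"type": "DEVELOPED_BY", "start_node_id": "0", "end_node_id": "1"},
   {"type": "MENTIONS", "start_node_id": "0", "end_node_id": "9"}]}
'''
nodes, rels = parse_extraction(response)
for r in rels:
    print(r.start_node_id, r.type, r.end_node_id)
```

Dropping relationships that point at unknown node IDs is one defensive choice; LLM output is not guaranteed to be internally consistent.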
@@ -1,5 +1,5 @@
[.question]
= 1. Knowledge graph construction steps
= Knowledge Graph Construction Steps

Which of the following steps could be considered **optional**?

@@ -1,12 +1,12 @@
= Setup your development environment
= Set up your development environment
:type: lesson
:order: 2
:branch: main


During this course, you will:

* Use the Neo4j link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Python ()`neo4j_graphrag`) package to create a knowledge graph from unstructured and structured data
* Use the Neo4j link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Python (`neo4j_graphrag`)^] package to create a knowledge graph from unstructured and structured data
* Create vector and text to Cypher retrievers that use the knowledge graph to provide context to an LLM

You must set up a development environment to run the code examples and exercises.
@@ -16,11 +16,11 @@ include::../../../../../../shared/courses/codespace/get-started.adoc[]
[%collapsible]
.Develop on your local machine
====
You will need link:https://python.org[Python] installed and the ability to install packages using `pip`.
You will need link:https://python.org[Python^] installed and the ability to install packages using `pip`.

You may want to set up a virtual environment using link:https://docs.python.org/3/library/venv.html[`venv`^] or link:https://virtualenv.pypa.io/en/latest/[`virtualenv`^] to keep your dependencies separate from other projects.

Clone the link:{repository-link}[github.com/neo4j-graphacademy/genai-graphrag-python] repository:
Clone the link:{repository-link}[github.com/neo4j-graphacademy/genai-graphrag-python^] repository:

[source,bash]
----
@@ -63,8 +63,9 @@ ifeval::[{course-completed}==true]

[IMPORTANT]
.Sandbox no longer available
====
You have completed this course.

The Neo4j sandbox instance is no longer available. You can create a Neo4j cloud instance using link:https://console.neo4j.io[Neo4j AuraDB^].
====
@@ -77,6 +78,12 @@ endif::[]

You can test your setup by running `genai-graphrag-python/test_environment.py` - this will attempt to connect to the Neo4j sandbox and the OpenAI API.

[source,bash]
.Test your setup
----
python genai-graphrag-python/test_environment.py
----

You will see an `OK` message if you have set up your environment correctly. If any tests fail, check the contents of the `.env` file.

== Continue
@@ -86,12 +93,9 @@ When you are ready, you can move on to the next task.
read::Success - let's get started![]



read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you setup your development environment to build a knowledge graph.
In this lesson, you prepared your development environment to build a knowledge graph.

In the next module, you will create a knowledge graph from unstructured and structured data using an LLM.
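The setup lesson suggests checking the `.env` file if a test fails. A rough pre-check, assuming the file uses simple `KEY=VALUE` lines and the variable names in `REQUIRED_KEYS` (hypothetical; match them to the keys your own `.env` uses), could look like this. It only confirms that values are present; unlike `test_environment.py`, it does not connect to Neo4j or the OpenAI API.

```python
from pathlib import Path

# Hypothetical variable names; adjust to match the keys in your own .env file.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def read_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes such as KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in", env_path.resolve())
    else:
        missing = missing_keys(read_env_file(env_path.read_text()))
        print("OK" if not missing else f"Missing values: {', '.join(missing)}")
```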
@@ -8,7 +8,7 @@ Welcome to Constructing Knowledge Graphs with Neo4j GraphRAG for Python.
In this module, you will:

* Review the process of creating knowledge graphs from unstructured text.
* Setup a development environment to build your own knowledge graph.
* Set up a development environment to build your own knowledge graph.

If you are ready, let's get going!

@@ -1,4 +1,4 @@
= Extracting a Schema from Text
= Extracting a schema from text
:type: lesson
:order: 1
:branch: main
@@ -7,8 +7,6 @@ The link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Pyth

During this course, you will use the `neo4j_graphrag` package to build a knowledge graph and retrievers to extract information from the graph using LLMs.

==

In this lesson, you will review how a graph schema can be extracted from text using an LLM.

== Using the SchemaFromTextExtractor
@@ -21,13 +19,13 @@ Open `genai-graphrag-python/extract_schema.py`
include::{repository-raw}/{branch}/genai-graphrag-python/extract_schema.py[]
----

The code uses the `SchemaFromTextExtractor` class to extract a schema from a given text input.

The extractor:

. Creates a prompt instructing the LLM to:
.. Identify entities and relationships in any given text
.. Format the output as JSON
. Passes the prompt and text to the LLM for processing
. Parses the JSON response to create a schema object

@@ -37,20 +35,20 @@ Given the text, _"Neo4j is a graph database management system (GDBMS) developed
.Extracted Schema
----
node_types=(
    NodeType(label='GraphDatabase'),
    NodeType(label='Company')
)
relationship_types=(
    RelationshipType(label='DEVELOPED_BY'),
)
patterns=(
    ('GraphDatabase', 'DEVELOPED_BY', 'Company'),
)
----

Run the program and observe the output. You will see a more detailed schema based on the text provided.

This schema can be used to stored the data held within the text.
This schema can be used to store the data held within the text.

image::images/neo4j_graphdatabase.svg["a graph schema with a Neo4j GraphDatabase node connected to a Neo4j Inc Company node via a DEVELOPED_BY relationship"]

@@ -60,7 +58,7 @@ Experiment with different text inputs to see how the schema extraction varies ba
* "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France."
* "Large Language Models (LLMs) are a type of artificial intelligence model designed to understand and generate human-like text."

When you're have experimented with the schema extraction you can continue.
When you have experimented with the schema extraction, you can continue.

read::Continue[]

@@ -70,6 +68,6 @@ read::Continue[]
In this lesson, you:

* Learned how to extract a graph schema from unstructured text using an LLM.
* Explore how different text inputs can lead to different schema extractions.
* Explored how different text inputs can lead to different schema extractions.

In the next lesson, you will create a knowledge graph construction pipeline using the `SimpleKGPipeline` class.
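A schema's patterns should only use node labels and relationship types that the schema itself declares. This sketch checks that rule using a hypothetical `Schema` dataclass (not the `neo4j_graphrag` schema class), populated with the example schema from this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    node_types: tuple[str, ...]
    relationship_types: tuple[str, ...]
    patterns: tuple[tuple[str, str, str], ...]


def invalid_patterns(schema: Schema) -> list[tuple[str, str, str]]:
    """Return patterns that use a label or relationship type not declared in the schema."""
    nodes = set(schema.node_types)
    rels = set(schema.relationship_types)
    return [
        (start, rel, end)
        for start, rel, end in schema.patterns
        if start not in nodes or end not in nodes or rel not in rels
    ]


schema = Schema(
    node_types=("GraphDatabase", "Company"),
    relationship_types=("DEVELOPED_BY",),
    patterns=(("GraphDatabase", "DEVELOPED_BY", "Company"),),
)
print(invalid_patterns(schema))  # prints []
```

Running the same check on a pattern that uses an undeclared label, such as `GraphDatabaseManagementSystem`, returns that pattern, which is a quick way to spot inconsistent LLM output.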
@@ -1,4 +1,4 @@
= Create a Graph
= Create a graph
:type: lesson
:order: 2
:branch: main
@@ -17,7 +17,10 @@ The `SimpleKGPipeline` class provides a pipeline which implements a series of st
image::images/kg_builder_pipeline.png["Pipeline showing these steps"]

[TIP]
Typical default values are used for each step. Throughout the course will you learn how to customize each step to suit your requirements.
.Customizing the pipeline
====
Typical default values are used for each step. Throughout the course, you will learn how to customize each step to suit your requirements.
====

== Create the knowledge graph

@@ -31,7 +34,7 @@ include::{repository-raw}/{branch}/genai-graphrag-python/kg_builder.py[]

The code loads a single PDF file, `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf`, and runs the pipeline to create a knowledge graph in Neo4j.

The PDF document contains the content from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals/[Neo4j & Generative AI Fundamentals^] course link:https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/1-what-is-genai/[What is Generative AI?^] lesson.
The PDF document contains the content from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals/[Neo4j & Generative AI Fundamentals^] course, specifically the link:https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/1-what-is-genai/[What is Generative AI?^] lesson.

Breaking down the code, you can see the following steps:

@@ -85,7 +88,7 @@ The `SimpleKGPipeline` creates the following default graph model:

image::images/kg-builder-default-model.svg["a graph model showing (Document)<-[:FROM_DOCUMENT]-(Chunk)<-[:FROM_CHUNK]-(Entity)"]

The `Entity` nodes represent the entities extracted from the text chunks. Relevant properties are extract from the chunk and associated with the entity nodes.
The `Entity` nodes represent the entities extracted from the text chunks. Relevant properties are extracted from the chunk and associated with the entity nodes.

You can view the documents and chunks created in the graph using the following Cypher query:

@@ -125,4 +128,4 @@ In this lesson, you:
* Learned how to use the `SimpleKGPipeline` class.
* Explored the graph model created by the pipeline.

In the next lesson, you will modify the chunk size used when splitting the text and define a custom schema for the knowledge graph.
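In the default graph model, entities link to chunks with `FROM_CHUNK` and chunks link to documents with `FROM_DOCUMENT`. To make that traversal concrete without a database, this sketch walks the same path over hypothetical in-memory relationship lists; in Neo4j, you would express it as a Cypher `MATCH` over those relationships.

```python
from collections import defaultdict

# Hypothetical sample data mirroring the default model:
#   (Chunk)-[:FROM_DOCUMENT]->(Document)
#   (Entity)-[:FROM_CHUNK]->(Chunk)
FROM_DOCUMENT = [("chunk-0", "doc-1"), ("chunk-1", "doc-1"), ("chunk-2", "doc-2")]
FROM_CHUNK = [
    ("Neo4j", "chunk-0"),
    ("Neo4j Inc", "chunk-0"),
    ("Neo4j", "chunk-1"),  # the same entity can be extracted from several chunks
    ("LLM", "chunk-2"),
]


def entities_by_document(from_document, from_chunk):
    """Follow Entity -> Chunk -> Document and collect distinct entity names per document."""
    chunk_to_doc = dict(from_document)
    result = defaultdict(set)
    for entity, chunk in from_chunk:
        result[chunk_to_doc[chunk]].add(entity)
    return {doc: sorted(names) for doc, names in result.items()}


print(entities_by_document(FROM_DOCUMENT, FROM_CHUNK))
```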
@@ -15,5 +15,5 @@ Think about what the SimpleKGPipeline does when you run it - it takes unstructur
[TIP,role=solution]
.Solution
====
The SimpleKGPipeline class provides a pipeline which *implements a series of steps to create a knowledge graph from unstructured data*. These steps include loading text, splitting it into chunks, creating embeddings, extracting entities, and writing the data to Neo4j.
====
The SimpleKGPipeline class provides a pipeline which **implements a series of steps to create a knowledge graph from unstructured data**. These steps include loading text, splitting it into chunks, creating embeddings, extracting entities, and writing the data to Neo4j.
====
@@ -22,7 +22,7 @@ MATCH (n) DETACH DELETE n
To modify the chunk size, you need to create a `FixedSizeSplitter` object and pass it to the `SimpleKGPipeline` when creating the pipeline instance:

. Modify the `genai-graphrag-python/kg_builder.py` file to import the `FixedSizeSplitter` class and create an instance with a chunk size of 500 characters:
+
[source, python]
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_split.py[tag=import_text_splitter]
@@ -31,6 +31,7 @@ include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_sp
----
+
[NOTE]
.Chunk size and overlap
The `chunk_size` parameter defines the maximum number of characters in each text chunk. The `chunk_overlap` parameter ensures that there is some overlap between consecutive chunks, which can help maintain context.
. Update the `SimpleKGPipeline` instantiation to use the custom text splitter:
+
@@ -80,4 +81,4 @@ In this lesson, you:
* Learned about the impact of chunk size on entity extraction
* Modified the `SimpleKGPipeline` to use a custom chunk size with the `FixedSizeSplitter`

In the next lesson, you will define a custom schema for the knowledge graph.
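To illustrate how the `chunk_size` and `chunk_overlap` parameters described in the chunk-size lesson interact, here is a simplified character-based splitter. It is a sketch of the idea, not the `FixedSizeSplitter` implementation, which may treat chunk boundaries differently.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so context that
    spans a chunk boundary appears in both chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With `chunk_size=4` and `chunk_overlap=1`, each chunk starts 3 characters after the previous one, so the last character of one chunk is repeated at the start of the next. Larger chunks give the LLM more context per call; smaller chunks give more granular extraction.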
@@ -15,5 +15,5 @@ Consider what happens to the level of detail and context when you make text chun
[TIP,role=solution]
.Solution
====
The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data. This is the key trade-off - more context versus granularity of the extracted information.
====
**Larger chunks provide more context for entity extraction but result in less granular data**. The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data. This is the key trade-off - more context versus granularity of the extracted information.
====
@@ -3,7 +3,7 @@
:order: 4
:branch: main

The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text. This can lead to graph which are non-specific and maybe difficult to analyze and query.
The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text. This can lead to graphs that are non-specific and may be difficult to analyze and query.

In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema for the knowledge graph.

@@ -12,7 +12,7 @@ In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema fo

When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM instructing it to only identify those nodes and relationships. This allows you to create a more structured and meaningful knowledge graph.

You define a schema by expressing the desired nodes, relationships, or patterns you want to extract from the text.

For example, you might want to extract the following information:

@@ -23,7 +23,7 @@ For example, you might want to extract the following information:
[TIP]
.Iterate your schema
====
You don't have to define nodes, relationships, and patterns all at once. You can start with just nodes or just relationships and expand your schema as needed.

For example, if you only define nodes, the LLM will find any relationships between those nodes based on the text.

@@ -113,7 +113,7 @@ Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF doc

== Process all the documents

When you are happy with the schema, you can modify the program to process all the PDF documents from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals[GraphAcademy Neo4j and Generative AI Fundamentals course]:
When you are happy with the schema, you can modify the program to process all the PDF documents from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals[Neo4j and Generative AI Fundamentals course^]:

[source, python]
.All PDFs
@@ -132,12 +132,18 @@ include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_sc
----
====

[TIP]
.OpenAI Rate Limiting?
====
When using a free OpenAI API key, you may encounter rate limiting when processing multiple documents. You can add a `sleep` between documents to mitigate this.
====

Review the knowledge graph and observe how the defined schema has influenced the structure of the graph.

[source, cypher]
.Documents, Chunks, and Entity counts
----
RETURN
  count{ (:Document) } as documents,
  count{ (:Chunk) } as chunks,
  count{ (:__Entity__) } as entities
@@ -153,4 +159,4 @@ include::questions/1-define-schema.adoc[leveloffset=+2]

In this lesson, you learned how to define a custom schema for the knowledge graph.

In the next lesson, you will learn how to add structured data to the knowledge graph.
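The rate-limiting tip in the define-schema lesson suggests adding a `sleep` between documents. A minimal sketch of that loop follows. `process` is a hypothetical stand-in for whatever call runs the pipeline on one file, and the 5-second default delay is an arbitrary assumption to tune for your API limits. Passing `sleep` as a parameter lets you test the loop without actually waiting.

```python
import time
from typing import Callable, Iterable, List


def process_all(
    paths: Iterable[str],
    process: Callable[[str], None],
    delay_seconds: float = 5.0,  # arbitrary default; tune for your API limits
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run process() on each path, pausing between documents to ease API rate limits."""
    done: List[str] = []
    for i, path in enumerate(paths):
        if i > 0:
            sleep(delay_seconds)  # wait before every document except the first
        process(path)
        done.append(path)
    return done


if __name__ == "__main__":
    # Hypothetical stand-in for the call that runs the pipeline on one PDF.
    def fake_process(path: str) -> None:
        print("processing", path)

    process_all(["doc1.pdf", "doc2.pdf"], fake_process, delay_seconds=1)
```

A fixed delay is the simplest option; if you still hit limits, retrying with an increasing delay after a rate-limit error is a common alternative.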