
Commit 13b66fc

Unstructured data & Vectors course updates (#186)
* typo
* added course structure
* semantic search lesson
* find movie plots
* semantic search questions
* vectors lesson
* sentence formatting
* unstructured data lesson
* fix link
* fix reset
* lesson structure
* poster and embedding lesson
* draft lessons
* creating vector index lesson
* renumbered lessons
* create vector index lesson
* query vectors lesson
* requirements
* fix course structure
* metadata and chunking lessons
* chunking
* build graph lesson
* topics and challenge lesson
* course metadata
* tidy up
* walk through
* fixed questions
* make course draft
* fix usecase in fundamentals
* Minor updates
* update 1st walkthrough
* module 3 structure
* restructure index lesson
* module 3 restructure
* updates after walk through
* updates 2nd walkthrough
* added banner
* add sandbox icon
* updates following review
* change question title
* minor update to language
* course summary
* llm fundamentals summary

Co-authored-by: Adam Cowley <adam@adamcowley.co.uk>
1 parent 80a5002 commit 13b66fc

File tree

22 files changed: +108 -33 lines

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+= Course Summary
+
+Congratulations on completing "Neo4j & LLM Fundamentals".
+
+You have learned:
+
+* How Large Language Models (LLMs) work, their benefits, and potential drawbacks
+* How you can use Knowledge Graphs and Retrieval Augmented Generation (RAG) to enhance LLMs
+* Basics of Semantic Search and how to implement vector search in Neo4j
+* How to use Neo4j, Python, and LangChain to interact with LLMs
+* How to use LLMs to generate Cypher statements
+* How to implement Retrieval Augmented Generation (RAG) strategies, including:
+** Using prompts to provide instructions and context
+** Using Neo4j vectors to query a graph database
+
+Continue your learning with the following resources:
+
+* link:https://graphacademy.neo4j.com[GraphAcademy^] - Free online training for Neo4j
+* link:https://neo4j.com/developer/[Neo4j Developer Page^] - Resources for developers
+* link:https://graphacademy.neo4j.com/courses/llm-chatbot-python/[Build a Neo4j-backed Chatbot using Python^] - Get hands-on and create a chatbot with Neo4j, Python, and Streamlit
+* link:https://graphacademy.neo4j.com/courses/llm-vectors-unstructured/[Introduction to Vector Indexes and Unstructured Data^] - Understand and search unstructured data using vector indexes

asciidoc/courses/llm-vectors-unstructured/course.adoc

Lines changed: 10 additions & 3 deletions
@@ -1,6 +1,6 @@
 = Introduction to Vector Indexes and Unstructured Data
 :categories: llms:3
-:status: draft
+:status: active
 :next: llm-python-chatbot
 :duration: 2 hours
 :caption: Understand and search unstructured data using vector indexes
@@ -16,7 +16,7 @@ You will explore unstructured datasets, create embeddings and vector indexes and
 
 You will learn how to process unstructured data, chunking strategies, and create relationships between the data.
 
-You will build a graph database of unstructured data, use link:https://python.org[Python^], https://https://www.langchain.com/[Langchain^], and link:https://openai.com[OpenAI^] to process the data, create embeddings, and import it into Neo4j.
+You will build a graph database of unstructured data, use link:https://python.org[Python^], link:https://www.langchain.com/[LangChain^], and link:https://openai.com[OpenAI^] to process the data, create embeddings, and import it into Neo4j.
 
 After completing this course, you will have the knowledge and skill to build a graph of your unstructured data and query it using vector indexes.
 
@@ -42,5 +42,12 @@ To complete the practical tasks within this course, you will need link:https://p
 === What you will learn
 
 * Semantic search, unstructured data, and vector indexes
-* How to create embeddings using LLMs and Langchain
+* How to create embeddings using LLMs and LangChain
 * To build a graph database of unstructured data
+
+[.includes]
+== This course includes
+
+* [lessons]#11 lessons#
+* [challenges]#8 short hands-on challenges#
+* [quizes]#10 multiple choice questions#

asciidoc/courses/llm-vectors-unstructured/modules/1-introduction/lessons/1-getting-started/lesson.adoc

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ You will create a graph of the unstructured course content from GraphAcademy, ex
 
 image::images/graphacademy-full.svg[An example diagram of the data model for the course content, showing Course, Module, Lesson, Paragraph and Topics nodes]
 
-You will use Python, Langchain, Neo4j, and OpenAI to process, embed, and index the data.
+You will use Python, LangChain, Neo4j, and OpenAI to process, embed, and index the data.
 
 == What you need
 
asciidoc/courses/llm-vectors-unstructured/modules/1-introduction/lessons/2-semantic-search/lesson.adoc

Lines changed: 4 additions & 4 deletions
@@ -39,7 +39,7 @@ You would typically use the score to rank the results.
 
 === Why is semantic search useful?
 
-Semantic search allows you to find and score related data. It is useful when find similarities within unstructured data which relies on understanding the intent and contextual meaning of the search query.
+Semantic search allows you to find and score related data. It is useful when finding similarities within unstructured data that rely on understanding the intent and contextual meaning of the search query.
 
 Some typical use cases are:
 
@@ -54,9 +54,9 @@ Some typical use cases are:
 Semantic search faces several challenges that stem from the complexity of natural language, the diversity of user intents, and the dynamic nature of information. Some of these challenges include:
 
 * Understanding Context - Accurately grasping the context of queries can be difficult. Different users might use the same words to mean different things.
-* Language Ambiguity - Natural language is inherently ambiguous. Words can have multiple meanings, and sentences can be interpreted in different ways.
-* Fine tuning - To get the best result you may have to invest significant effort in fine tuning how the semantic search.
-* Transparency - The complexity behind semantic search can make it difficult to understand how a score is determined or why a particular result is returned.
+* Language Ambiguity - Natural language is inherently ambiguous. Words can have multiple meanings, and different models may interpret sentences differently.
+* Fine tuning - To get the best result, you may need to invest significant effort in fine-tuning your model, data, and search algorithms.
+* Transparency - The complexity behind semantic search can make it difficult to understand how a score is determined or why a particular result is returned.
 
 
 == Check Your Understanding

asciidoc/courses/llm-vectors-unstructured/modules/1-introduction/lessons/3-searching-text/lesson.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 = Finding Movie Plots
 :order: 3
-:type: lesson
+:type: challenge
 :sandbox: true
 
 In the previous lesson, you learned about the theory of semantic search.

asciidoc/courses/llm-vectors-unstructured/modules/1-introduction/lessons/4-vectors/lesson.adoc

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ For example, the word "apple" might be represented by an embedding with the foll
 
 When applied in a search context, the vector for "apple" can be compared to the vectors for other words or phrases to determine the most relevant results.
 
-You can create embeddings in various ways, but one of the most common methods is to use a **large language model**.
+You can create embeddings in various ways, but one of the most common methods is to use a **Large Language Model (LLM)**.
 
 For example, the embedding for the word "apple" is `0.0077788467, -0.02306925, -0.007360777, -0.027743412, -0.0045747845, 0.01289164, -0.021863015, -0.008587573, 0.01892967, -0.029854324, -0.0027962727, 0.020108491, -0.004530236, 0.009129008,` ... and so on.
 
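The comparison this hunk describes - scoring the vector for "apple" against other vectors - is typically done with cosine similarity. A minimal sketch in plain Python; the toy 3-dimensional vectors are invented for illustration (real embeddings, like the one shown in the lesson, have hundreds of dimensions):

```python
from math import sqrt

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes;
    # a result near 1.0 means the vectors point in almost the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings for illustration only.
apple = [0.9, 0.1, 0.2]
pear = [0.8, 0.2, 0.3]
car = [0.1, 0.9, 0.7]

print(cosine_similarity(apple, pear))  # higher - semantically closer
print(cosine_similarity(apple, car))   # lower - less related
```

The scores can then be used to rank results, as the lesson on semantic search describes.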

asciidoc/courses/llm-vectors-unstructured/modules/1-introduction/lessons/6-searching-images/lesson.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 = Searching images
 :order: 6
-:type: lesson
+:type: challenge
 :sandbox: true
 :optional: true

asciidoc/courses/llm-vectors-unstructured/modules/2-vector-indexes/lessons/1-embeddings/questions/1-models.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [.question]
-= Embedding models
+= Comparing embeddings
 
 True or False - you can compare embeddings from *different* models to find similarities in unstructured data.
 
asciidoc/courses/llm-vectors-unstructured/modules/2-vector-indexes/lessons/2-load-embeddings/lesson.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 = Load embeddings
 :order: 2
-:type: lesson
+:type: challenge
 :sandbox: true
 
 In this lesson, you will learn how to load embeddings into a Neo4j database.
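A common pattern for loading pre-computed embeddings into Neo4j is `LOAD CSV` combined with the `db.create.setNodeVectorProperty` procedure. The sketch below is illustrative only - the CSV URL, column names, and `:Question` label are hypothetical, not taken from the lesson:

```cypher
// Hypothetical CSV with one row per node: an id column and the
// embedding as a comma-separated list of floats in one quoted field.
LOAD CSV WITH HEADERS
FROM 'https://example.com/question-embeddings.csv' AS row
MATCH (q:Question {id: toInteger(row.id)})
CALL db.create.setNodeVectorProperty(
    q, 'embedding',
    [x IN split(row.embedding, ',') | toFloat(x)]
)
```

`db.create.setNodeVectorProperty` stores the list as a space-efficient vector rather than a plain Cypher list of doubles.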

asciidoc/courses/llm-vectors-unstructured/modules/2-vector-indexes/lessons/3-create-vector-index/lesson.adoc

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 :sandbox: true
 
 To query embeddings, you need to create a vector index.
+A vector index significantly speeds up similarity searches by organising the embeddings into an index structure that supports fast approximate nearest-neighbour lookups, rather than comparing the query against every vector.
 
 In this lesson, you will create vector indexes on the `embedding` property of the `Question` and `Answer` nodes.
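The `Question` index the lesson describes might look like the following Neo4j 5 `CREATE VECTOR INDEX` statement - a sketch only; the index name is illustrative, and the 1536 dimensions assume an OpenAI `text-embedding-ada-002` embedding:

```cypher
// Index the embedding property of :Question nodes for vector search.
// 1536 dimensions and cosine similarity are assumptions, not from the lesson.
CREATE VECTOR INDEX questionEmbeddings IF NOT EXISTS
FOR (q:Question) ON q.embedding
OPTIONS {indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
}}
```

The same statement, with the label and index name adjusted, would cover the `Answer` nodes.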
