KernelMemory File Watcher is a service designed to automate the document ingestion process for Semantic Kernel's KernelMemory service. It monitors specified directories for file changes and sends these changes to the KernelMemory service for processing. This enables the automatic creation of embeddings for Retrieval Augmented Generation (RAG) whenever a file is modified. The service is designed to run on the edge or wherever your files reside, and can be deployed as a standalone service or a Docker container.
The MessageStore is responsible for storing and managing file events. It implements the IMessageStore interface which defines methods for adding a file event, retrieving the next file event, and checking if there are any file events in the store. The MessageStore uses a ConcurrentDictionary to store file events, ensuring thread-safety.
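The three operations described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the service's actual code. The `FileEvent` fields, the method names, and the choice to key events by file path are all assumptions; a lock around a plain dictionary stands in for ConcurrentDictionary.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileEvent:
    # Hypothetical event shape; the field names are assumptions.
    path: str
    index: str
    kind: str  # "upsert" or "delete"


class MessageStore:
    """Keeps at most one pending event per file path (an assumed keying)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: dict[str, FileEvent] = {}

    def add(self, event: FileEvent) -> None:
        # A newer event for the same path replaces the older one
        # and moves to the back of the queue.
        with self._lock:
            self._events.pop(event.path, None)
            self._events[event.path] = event

    def has_next(self) -> bool:
        with self._lock:
            return bool(self._events)

    def take_next(self) -> Optional[FileEvent]:
        # Remove and return the first pending event, or None if empty.
        with self._lock:
            if not self._events:
                return None
            path = next(iter(self._events))
            return self._events.pop(path)
```

Keying by path means that a burst of changes to one file collapses into a single pending event, which is one plausible reason to prefer a dictionary over a plain queue here.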
The FileWatcherService is responsible for watching specified directories for file changes. It uses the FileSystemWatcher class to monitor directories and raises events when files are created, deleted, or modified. These events are then added to the MessageStore.
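To make the three event types concrete, here is a minimal Python sketch that derives them by comparing two directory snapshots. This is a polling technique swapped in for illustration only; FileSystemWatcher itself relies on operating-system notifications rather than polling. The function names `snapshot` and `diff` are hypothetical.

```python
import os


def snapshot(root: str, recursive: bool = True) -> dict[str, float]:
    """Map each file path under root to its last-modified time."""
    files: dict[str, float] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                files[path] = os.path.getmtime(path)
            except OSError:
                continue  # file vanished between listing and stat
        if not recursive:
            dirnames.clear()  # stop os.walk from descending further
    return files


def diff(before: dict[str, float], after: dict[str, float]) -> list[tuple[str, str]]:
    """Return (kind, path) pairs describing what changed between snapshots."""
    events = []
    for path in sorted(after.keys() - before.keys()):
        events.append(("created", path))
    for path in sorted(before.keys() - after.keys()):
        events.append(("deleted", path))
    for path in sorted(after.keys() & before.keys()):
        if after[path] != before[path]:
            events.append(("modified", path))
    return events
```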
The HttpWorker is a hosted service that periodically checks the MessageStore for new file events and sends them to the KernelMemory service. It uses an HttpClient to send HTTP requests and includes logic for handling different types of file events (e.g., upserts and deletes).
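The periodic check and the upsert/delete dispatch might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the service's implementation: `run_worker`, `fetch_pending`, `upsert`, and `delete` are hypothetical names, events are assumed to be `(kind, path)` pairs, and the HTTP calls are left to the injected callables so that no KernelMemory endpoint is guessed at.

```python
import threading
from typing import Callable, Iterable, Tuple

Event = Tuple[str, str]  # (kind, path); a hypothetical event shape


def run_worker(fetch_pending: Callable[[], Iterable[Event]],
               upsert: Callable[[str], None],
               delete: Callable[[str], None],
               interval_seconds: float,
               stop: threading.Event) -> None:
    """Drain pending events, then sleep until the next tick or until stopped."""
    while not stop.is_set():
        for kind, path in fetch_pending():
            if kind == "delete":
                delete(path)  # the file is gone: remove its document
            else:
                upsert(path)  # created and modified files are both upserts
        # wait() returns early when stop is set, so shutdown is prompt.
        stop.wait(interval_seconds)
```

Passing the send functions in as parameters keeps the loop testable without a running KernelMemory service.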
- The FileWatcherService starts watching the specified directories for file changes.
- When a file change is detected, a file event is created and added to the MessageStore.
- The HttpWorker periodically checks the MessageStore for new file events.
- When a new file event is found, the HttpWorker sends it to the KernelMemory service for processing.
The service's configuration is defined in the appsettings.json file. Here you can specify the directories to watch, the KernelMemory service's endpoint and API key, and other options.
{
  "FileWatcher": {
    "Directories": [
      {
        "Path": "/tmp/folder_01",
        "Filter": "*.md", // single filter
        "Index": "folder-01",
        "IncludeSubdirectories": true
      },
      {
        "Path": "/tmp/folder_02",
        "Filters": [ // multiple filters
          "*.md",
          "*.pdf"
        ],
        "Index": "folder-02",
        "IncludeSubdirectories": true
      },
      // More directories...
    ]
  },
  "KernelMemory": {
    "Endpoint": "http://127.0.0.1:9001",
    "ApiKey": "", // not required
    "Schedule": "00:00:30"
  }
}
In the FileWatcher section, you can specify multiple directories to watch. For each directory, you can specify a path, either a single filter (Filter) or a list of filters (Filters) for the types of files to watch, the index to use for that directory's documents, and whether to include subdirectories.
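As a rough illustration of how the single-filter and multiple-filter forms could be treated uniformly, the Python sketch below normalizes both into a list of patterns and matches file names against it. The helper names are hypothetical, and falling back to `*` when neither key is present is an assumption, not documented behavior of the service.

```python
import os
from fnmatch import fnmatch


def patterns_for(directory: dict) -> list[str]:
    """Return the wildcard patterns configured for one directory entry."""
    if "Filters" in directory:
        return list(directory["Filters"])
    # Assumption: watch everything when no filter is configured.
    return [directory.get("Filter", "*")]


def is_watched(path: str, directory: dict) -> bool:
    """True if the file name matches any configured pattern."""
    name = os.path.basename(path)
    return any(fnmatch(name, pattern) for pattern in patterns_for(directory))
```

For example, with `{"Path": "/tmp/folder_02", "Filters": ["*.md", "*.pdf"]}`, a file named `notes.md` matches and `image.png` does not.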
In the KernelMemory section, you can specify the endpoint of the KernelMemory service, your API key, and the schedule for the HttpWorker to check for new file events.
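The example value `00:00:30` looks like an `hh:mm:ss` interval, which would mean a check every 30 seconds. That reading is an assumption; the source does not spell out the format. Under that assumption, a minimal Python sketch of converting the value into a duration could be:

```python
from datetime import timedelta


def parse_schedule(value: str) -> timedelta:
    """Parse an hh:mm:ss string (assumed format) into a duration."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

Here `parse_schedule("00:00:30").total_seconds()` evaluates to `30.0`.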
To run the service, you can either run the KernelMemory.FileWatcher project directly or build and run the Docker container.
To run the service as a standalone service, build and run the KernelMemory.FileWatcher project with the following command:
dotnet run --project KernelMemory.FileWatcher
To run the service as a Docker container, mount your appsettings.json and the directories you want to watch, then start the image:
docker run -v /path/to/your/appsettings.json:/config/appsettings.json -v /path/to/your/documents-01:/data/documents-01 mplogas/km-filewatcher:latest