This project is a proof-of-concept middleware designed to reduce the token cost of long conversations with large language models (LLMs) by converting extensive message histories into a single image.
For a detailed walkthrough and explanation of this project, check out the Medium article: Cutting LLM Costs by Converting Long Text into Images
When interacting with LLMs like GPT-4, the entire conversation history is typically sent with each new request. As the conversation grows, so does the number of tokens, leading to significantly higher API costs. A conversation with thousands of words can quickly become expensive to maintain.
This project introduces a Flask-based proxy server that acts as middleware between your application and the LLM API (e.g., OpenAI's). The middleware monitors the conversation's word count; when the count exceeds a predefined threshold (currently 750 words), it performs the following steps:
- Image Conversion: The entire conversation history is rendered as a text-based image.
- Payload Reconstruction: The original message history is replaced with a new payload suitable for a vision-capable model (such as GPT-4o), sketched below. The new payload contains:
  - The generated image of the conversation history.
  - A final prompt instructing the model to use the image as context when answering the user's latest message.
- Cost Savings: By representing thousands of words as a single image, the middleware drastically reduces the number of tokens sent, leading to significant savings on long conversations.
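To make the reconstruction step concrete, here is a minimal sketch of what the rebuilt payload might look like. The function name `build_image_payload` and the exact prompt wording are illustrative assumptions, not the middleware's actual code; it assumes the OpenAI-style vision message format, where images are passed as base64 data URLs.

```python
import base64

def build_image_payload(model, image_png_bytes, latest_user_message):
    """Illustrative sketch: swap a long text history for one image + a prompt.

    Names and prompt text here are assumptions, not the project's
    actual implementation.
    """
    b64 = base64.b64encode(image_png_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "The attached image contains our conversation so far. "
                            "Use it as context to answer my latest message: "
                            + latest_user_message
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

Embedding the image as a base64 data URL keeps the proxy stateless: no separate image hosting is needed.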
The project consists of two main components:

- server.py, a Flask application that listens for requests on a local port.
  - It intercepts each request and calculates the total word count of the messages payload.
  - If the word count is over 750, it uses the TextToImageOptimizer class to generate a PNG image from the text (a sketch of this rendering step follows the list).
  - It then forwards a modified request to the target API with the image and the final user prompt.
  - If the word count is below the threshold, it simply forwards the request as is.
- request.py, a simple command-line interface for interacting with the proxy server.
  - It maintains a local message history and lets you chat in a loop.
  - After each message, it prints the conversation's total word count, so you can see when the image conversion will be triggered.
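The internals of TextToImageOptimizer aren't shown in this README; the sketch below illustrates how such a class might render the conversation text to a PNG with Pillow, including the font fallback described in the setup notes. The function name, layout parameters, and DejaVu font choice are assumptions, not the project's actual implementation.

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_to_png(text, path="history.png", width_chars=100,
                       font_size=14, margin=10):
    """Illustrative sketch: wrap conversation text and draw it onto a PNG."""
    try:
        # Prefer a real TrueType font (e.g. from the fonts-dejavu package).
        font = ImageFont.truetype("DejaVuSans.ttf", font_size)
    except OSError:
        # Fall back to PIL's built-in bitmap font if none is installed.
        font = ImageFont.load_default()

    # Wrap each paragraph to a fixed character width.
    lines = []
    for paragraph in text.splitlines():
        lines.extend(textwrap.wrap(paragraph, width=width_chars) or [""])

    # Rough canvas size estimate; a real implementation would measure text.
    line_height = font_size + 4
    img = Image.new(
        "RGB",
        (width_chars * font_size // 2 + 2 * margin,
         len(lines) * line_height + 2 * margin),
        "white",
    )
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((margin, margin + i * line_height), line,
                  fill="black", font=font)
    img.save(path)
    return path
```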
Install the required Python libraries using the requirements file:
```bash
pip install -r requirements.txt
```

You may also need to install a font for the image generation. The script tries to find a suitable font; if none is available, it falls back to a default PIL font, which may not be ideal. On Debian/Ubuntu, you can install fonts with:

```bash
sudo apt-get install fonts-dejavu
```

When using the client or sending direct requests, ensure you provide your API key. For the client script, open the request script and set the API_KEY variable.
Start the middleware by running the server script:
```bash
python server.py
```

You will see a message indicating that the server is running on port 8000.
In a separate terminal, run the interactive client to chat:
```bash
python request.py
```

You can also interact with the proxy server directly from your own application or using tools like curl.
Send a POST request to the proxy's endpoint with the following headers:
- Content-Type: application/json
- Authorization: Bearer YOUR_API_KEY (Your actual API key)
- X-Target-Url: The URL of the target LLM API
The JSON payload should follow the standard structure of the target API. For OpenAI-compatible APIs, this includes:
- model: The name of the model you want to use.
- messages: A list of message objects, each with a role (system, user, or assistant) and content.
- Other optional parameters like stream, temperature, etc.
```python
import requests

PROXY_URL = "http://localhost:8000/v1/chat/completions"
API_KEY = "YOUR_API_KEY"
TARGET_URL = "https://api.openai.com/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
    "X-Target-Url": TARGET_URL,
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, what is the capital of France?"},
    ],
    "stream": False,
}

try:
    response = requests.post(PROXY_URL, headers=headers, json=payload)
    response.raise_for_status()
    print(response.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

Here is a basic example of how to send the same request using curl.
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Target-Url: https://api.openai.com/v1/chat/completions" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, what is the capital of France?"
      }
    ],
    "stream": false
  }'
```

Some directions for future improvement:

- Dynamic Threshold: Allow the word count threshold to be set via an environment variable or header (see the sketch after this list).
- Smarter History Truncation: Instead of converting the entire history, keep the last few messages as text and convert only the older parts to an image.
- Support for Other Media: Extend the middleware to handle other types of attachments or data.
- Async Processing: Improve performance by using an asynchronous web framework.
- Upload to PyPI: Package the project and upload it to the Python Package Index (PyPI) for easier distribution.
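As a rough illustration of the dynamic-threshold idea, the middleware could read the limit from the environment instead of hard-coding 750; the variable name below is hypothetical.

```python
import os

# Hypothetical: override the 750-word default via an environment variable.
WORD_THRESHOLD = int(os.environ.get("WORD_THRESHOLD", "750"))
```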