25 changes: 25 additions & 0 deletions Dockerfile
@@ -0,0 +1,25 @@
# syntax=docker/dockerfile:1
FROM ghcr.io/astral-sh/uv:python3.13-bookworm

# Install Node.js and npm for mermaid-cli
RUN apt-get update \
&& apt-get install -y --no-install-recommends nodejs npm \
&& rm -rf /var/lib/apt/lists/*

# Install mermaid CLI globally
RUN npm install -g @mermaid-js/mermaid-cli

P1: Install Chromium runtime deps for mermaid-cli

The Docker image installs Node and `@mermaid-js/mermaid-cli` but never installs the system libraries that Puppeteer’s bundled Chromium requires. On Debian bookworm bases, running `mmdc` without packages such as `libnss3`, `libatk-bridge2.0-0`, `libgtk-3-0`, `libasound2`, and fonts fails with `Error: Failed to launch the browser process` before any diagrams are rendered. Because the entrypoint relies on mermaid-cli for the benchmark, the container will fail at runtime. Add the standard Chromium dependency packages to the `apt-get install` step so the CLI can start.
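
One way to act on this, as a rough sketch rather than a verified fix: collect the packages in a variable and install them the same way the Node.js step does. The first four package names come from the comment above; `fonts-liberation` is an assumption standing in for the unspecified "fonts", and `install_chromium_deps` is a hypothetical helper name. The full set Chromium needs is probably longer, so the list should be checked against Puppeteer's own troubleshooting notes.

```shell
#!/bin/sh
# Packages named in the review comment, plus fonts-liberation as an assumed
# stand-in for the unspecified "fonts" entry. Likely incomplete.
CHROMIUM_DEPS="libnss3 libatk-bridge2.0-0 libgtk-3-0 libasound2 fonts-liberation"

# Hypothetical helper: installs the packages and clears the apt lists,
# mirroring the existing Node.js install step. Needs root and network access.
install_chromium_deps() {
    # CHROMIUM_DEPS is left unquoted on purpose so it splits into one
    # argument per package.
    apt-get update \
        && apt-get install -y --no-install-recommends $CHROMIUM_DEPS \
        && rm -rf /var/lib/apt/lists/*
}

# Install only when explicitly requested, so sourcing this file has no side effects.
if [ "${1:-}" = "--install" ]; then
    install_chromium_deps
fi
```

In the Dockerfile itself the same package names could simply be appended to the existing `apt-get install` line, which avoids an extra layer.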


# Set work directory
WORKDIR /app

# Copy dependency files first for better caching
COPY pyproject.toml uv.lock ./

# Copy source
COPY . .

# Sync dependencies during build so they are baked into the image
RUN uv sync --frozen

# Default entrypoint
ENTRYPOINT ["/app/docker/merbench-entrypoint.sh"]
12 changes: 11 additions & 1 deletion Makefile
@@ -15,4 +15,14 @@ adk_basic_ui:
uv run adk web agents_mcp_usage/basic_mcp

adk_multi_ui:
uv run adk web agents_mcp_usage/multi_mcp
uv run adk web agents_mcp_usage/multi_mcp

merbench-docker-build:
docker build -t merbench .

merbench-docker-run:
docker run --rm \
-e GEMINI_API_KEY=$${GEMINI_API_KEY} \
-e OPENAI_API_KEY=$${OPENAI_API_KEY} \
-v "$$(pwd)/mermaid_eval_results:/app/mermaid_eval_results" \
merbench $${ARGS}
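
Note that `-e GEMINI_API_KEY=$${GEMINI_API_KEY}` passes an empty value into the container when the variable is unset on the host, so a missing key only surfaces once the benchmark starts calling the API. A possible guard, sketched here with a hypothetical helper name `require_env` that is not part of this PR, is to check the host environment before invoking the target:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every named variable is set and non-empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "environment variable $name is not set" >&2
            return 1
        fi
    done
}
```

For example, `require_env GEMINI_API_KEY OPENAI_API_KEY && make merbench-docker-run` would stop before `docker run` if either key is missing.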
28 changes: 28 additions & 0 deletions README.md
@@ -44,6 +44,34 @@ Run an Agent framework script e.g.:

Check console, Logfire, or the ADK web UI for output

## Docker quickstart

Build the pre-baked evaluation image (includes Python dependencies and the Mermaid CLI):

```bash
docker build -t merbench .
```

Run the multi-model Mermaid benchmark with your API keys and a bind mount so results persist on the host:

```bash
docker run --rm \
-e GEMINI_API_KEY="your-gemini-key" \
-e OPENAI_API_KEY="your-openai-key" \
-v "$(pwd)/mermaid_eval_results:/app/mermaid_eval_results" \
merbench --models gemini-1.5-pro,openai-gpt-4.1-mini
```

The container entrypoint defaults to `run_multi_evals.py`. Override it to launch other tooling, such as the evaluation UI:

```bash
docker run --rm \
-e GEMINI_API_KEY="your-gemini-key" \
-v "$(pwd)/mermaid_eval_results:/app/mermaid_eval_results" \
--entrypoint uv \
merbench run agents_mcp_usage/evaluations/mermaid_evals/merbench_ui.py
```

## Project Overview

This project aims to teach:
7 changes: 7 additions & 0 deletions docker/merbench-entrypoint.sh
@@ -0,0 +1,7 @@
#!/bin/sh
set -e

# Ensure results directory exists if bind-mounted
mkdir -p /app/mermaid_eval_results

exec uv run agents_mcp_usage/evaluations/mermaid_evals/run_multi_evals.py "$@"
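
Given the review comment on the Dockerfile about `mmdc` failing at runtime, the entrypoint could also fail fast when a required tool is absent. The following is a minimal sketch under that assumption; `require_commands` is a hypothetical helper, not something this PR defines, and it only confirms that the binaries are on `PATH`, not that Chromium can actually launch.

```shell
#!/bin/sh
# Hypothetical preflight helper: reports every listed command that is not
# on PATH and returns non-zero if any is missing.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the entrypoint this might be called as `require_commands uv mmdc` just before the `exec uv run ...` line; with `set -e` already in effect, a non-zero return would abort the script with the missing tool named on stderr.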