Skip to content

Commit f471265

Browse files
dsarnogreptile-apps[bot]coderabbitai[bot]
authored
Claude‑friendly edit tools + framed transport + live Unity NL test framework (#243)
* CI: gate desktop-parity on Anthropic key; pass anthropic_api_key like NL suite * Add quickprobe prompt and CI workflow (mcp-quickprobe.md, unity-mcp-quickprobe.yml) * strictier tool use to prevent subagent spawning and force mcp tools * update workflow filesto reduce likelihood of subagent spawning * improve permissions for claude agent, fix mcpbridge timeout/token issue * increase max turns to 10 * ci: align NL suite to new permissions schema; prevent subagent drift * ci: NL suite -> mini prompt for e2e; add full NL/T prompt; server: ctx optional + project_root fallback; workflows: set UNITY_PROJECT_ROOT for CI * ci: add checks:write; revert local project hardcodes (manifest, ProjectVersion.txt) * tools: text-edit routing fixes (anchor_insert via text, CRLF calc); prompts: mini NL/T clarifications * ci: use absolute UNITY_PROJECT_ROOT; prompts target TestProjects; server: accept relative UNITY_PROJECT_ROOT and bare spec URI * ci: ignore Unity test project's packages-lock.json; remove from repo to avoid absolute paths * CI: start persistent Unity Editor for MCP (guarded by license) + allow batch-mode bridge via UNITY_MCP_ALLOW_BATCH * CI: hide license and pass via env to docker; fix invalid ref format * CI: readiness probe uses handshake on Unity MCP port (deterministic) * CI: fix YAML; use TCP handshake readiness probe (FRAMING=1) * CI: prime Unity license via game-ci; mount ULF into container; extend readiness timeout * CI: use ULF write + mount for Unity licensing; remove serial/email/pass from container * CI: entitlement activation (UNITY_SERIAL=''); verify host ULF cache; keep mount * CI: write ULF from secret and verify; drop entitlement activation step * CI: detect any licensing path; GameCI prime; status dir env; log+probe readiness; fix YAML * CI: add GameCI license prime; conditional ULF write; one-shot license validation; explicit status dir + license env * CI: fix YAML (inline python), add Anthropic key detect via GITHUB_ENV; ready to run happy path * CI: mount Unity token/ulf/cache dirs into container to share host license; create dirs before start * CI: fix YAML indentation; write ULF on host; activate in container with shared mounts; mount .config and .cache too * CI: gate Claude via outputs; mount all Unity license dirs; fix inline probe python; stabilize licensing flow * CI: normalize detect to step outputs; ensure license dirs mounted and validated; fix indentation * Bridge: honor UNITY_MCP_STATUS_DIR for heartbeat/status file (CI-friendly) * CI: guard project path for activation/start; align tool allowlist; run MCP server with python; tighten secret scoping * CI: finalize Unity licensing mounts + status dir; mode-detect (ULF/EBL); readiness logs+probe; Claude gating via outputs * CI: fix YAML probe (inline python -c) and finalize happy-path Unity licensing and MCP/Claude wiring * CI: inline python probe; unify Unity image and cache mounts; ready to test * CI: fix docker run IMAGE placement; ignore cache find perms; keep same editor image * CI: pass -manualLicenseFile to persistent Editor; keep mounts and single image * CI: mount full GameCI cache to /root in persistent Unity; set HOME=/root; add optional license check * CI: make -manualLicenseFile conditional; keep full /root mount and license check * CI: set HOME=/github/home; mount GameCI cache there; adjust manualLicenseFile path; expand license check * CI: EBL sign-in for persistent Unity (email/password/serial); revert HOME=/root and full /root mount; keep conditional manualLicenseFile and improved readiness * CI: run full NL/T suite prompt (nl-unity-suite-full.md) instead of mini * NL/T: require unified diffs + explicit verdicts in JUnit; CI: remove short sanity step, publish JUnit, upload artifacts * NL/T prompt: require CDATA wrapping for JUnit XML fields; guidance for splitting embedded ]]>; keep VERDICT in CDATA only * CI: remove in-container license check step; keep readiness and full suite * NL/T prompt: add version header, stricter JUnit schema, hashing/normalization, anchors, statuses, atomic semantics, tool logging * CI: increase Claude NL/T suite timeout to 30 minutes * CI: pre-create reports dir and result files to avoid tool approval prompts * CI: skip wait if container not running; skip Editor start if project missing; broaden MCP deps detection; expand allowed tools * fixies to harden ManageScript * CI: sanitize NL/T markdown report to avoid NUL/encoding issues * revert breaking yyaml changes * CI: prime license, robust Unity start/wait, sanitize markdown via heredoc * Resolve merge: accept upstream renames/installer (fix/installer-cleanup-v2) and keep local framing/script-editing - Restored upstream server.py, EditorWindow, uv.lock\n- Preserved ManageScript editing/validation; switched to atomic write + debounced refresh\n- Updated tools/__init__.py to keep script_edits/resources and adopt new logger name\n- All Python tests via uv: 7 passed, 6 skipped, 9 xpassed; Unity compile OK * Fix Claude Desktop config path and atomic write issues - Fix macOS path for Claude Desktop config: use ~/Library/Application Support/Claude/ instead of ~/.config/Claude/ - Improve atomic write pattern with backup/restore safety - Replace File.Replace() with File.Move() for better macOS compatibility - Add proper error handling and cleanup for file operations - Resolves issue where installer couldn't find Claude Desktop config on macOS * Editor: use macConfigPath on macOS for MCP client config writes (Claude Desktop, etc.). Fallback to linuxConfigPath only if mac path missing. * Models: add macConfigPath to McpClient for macOS config path selection (fixes CS1061 in editor window). * Editor: on macOS, prefer macConfigPath in ManualConfigEditorWindow (fallback to linux path); Linux/Windows unchanged. * Fix McpClient: align with upstream/main, prep for framing split * NL suite: shard workflow; tighten bridge readiness; add MCP preflight; use env-based shard vars * NL suite: fix shard step indentation; move shard vars to env; remove invalid inputs * MCP clients: split VSCode Copilot config paths into macConfigPath and linuxConfigPath * Unity bridge: clean stale status; bind host; robust wait probe with IPv4/IPv6 + diagnostics * CI: use MCPForUnity.Editor.MCPForUnityBridge.StartAutoConnect as executeMethod * Action wiring: inline mcpServers in settings for all shards; remove redundant .claude/mcp.json step * CI: embed mcpServers in settings for all shards; fix startup sanity step; lint clean * CI: pin claude-code-base-action to e6f32c8; use claude_args --mcp-config; switch to allowed_tools; ensure MCP config per step * CI: unpin claude-code-base-action to @beta (commit ref not found) * CI: align with claude-code-base-action @beta; pass MCP via claude_args and allowedTools * Editor: Fix apply_text_edits heuristic when edits shift positions; recompute method span on candidate text with fallback delta adjustment * CI: unify MCP wiring across workflows; write .claude/mcp.json; switch to claude_args with --mcp-config/--allowedTools; remove unsupported inputs * CI: collapse NL suite shards into a single run to avoid repeated test execution * CI: minimize allowedTools for NL suite to essential Unity MCP + Bash("git:*") + Write * CI: mkdir -p reports before run; remove unsupported --timeout-minutes from claude_args * CI: broaden allowedTools to include find_in_file and mcp__unity__* * CI: enable use_node_cache and switch NL suite model to claude-3-7-haiku-20250219 * CI: disable use_node_cache to avoid setup-node lockfile error * CI: set NL suite model to claude-3-haiku-20240307 * CI: cap Haiku output with --max-tokens 2048 for NL suite * CI: switch to claude-3-7-sonnet-latest and remove unsupported --max-tokens * CI: update allowedTools to Bash(*) and explicit Unity MCP tool list * CI: update NL suite workflow (latest tweaks) * Tests: tighten NL suite prompt for logging, hash discipline, stale retry, evidence windows, diff cap, and VERDICT line * Add disallowed tools to NL suite workflow * docs: clarify stale write retry * Add fallback JUnit report and adjust publisher * Indent fallback JUnit XML in workflow * fix: correct fallback JUnit report generation * Update mcp-quickprobe.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update mcp-quickprobe.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update Response.cs Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update MCPForUnityBridge.cs Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: correct McpTypes reference * Add directory existence checks for symlink and XDG paths * fix: only set installation flag after successful server install * Update resource_tools.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: respect mac config paths * Use File.Replace for atomic config write * Remove unused imports in manage_script * bump server version * Tests: update NL suite prompt and workflows; remove deprecated smoke/desktop-parity; quickprobe tidy * Editor: atomic config write via File.Replace fallback; remove redundant backups and racey exists checks * CI: harden NL suite - idempotent docker, gate on unity_ok, safer port probe, least-priv perms * Editor: make atomic config write restoration safe (flag writeDone; copy-overwrite restore; cleanup in finally) * CI: fix fallback JUnit heredoc by using printf lines (no EOF delimiter issues) * CI: switch NL suite to mini prompt; mini prompt honors / and NL discipline * CI: replace claude_args with allowed_tools/model/mcp_config per action schema * CI: expand UNITY_PROJECT_ROOT via in MCP config heredoc * EditorWindow: add cross-platform fallback for File.Replace; macOS-insensitive PathsEqual; safer uv resolve; honor macConfigPath * CI: strengthen JUnit publishing for NL mini suite (normalize, debug list, publish both, fail_on_parse_error) * CI: set job-wide JUNIT_OUT/MD_OUT; normalization uses env; publish references env and ungroup reports * CI: publish a single normalized JUnit (reports/junit-for-actions.xml); fallback writes same; avoid checkName/reportPaths mismatch * CI: align mini prompt report filenames; redact Unity log tail in diagnostics * chore: sync workflow and mini prompt; redacted logs; JUnit normalization/publish tweaks * CI: redact sensitive tokens in Stop Unity; docs: CI usage + edit tools * prompts: update nl-unity-suite-full (mini-style setup + reporting discipline); remove obsolete prompts * CI: harden NL workflows (timeout_minutes, robust normalization); prompts: unify JUnit suite name and reporting discipline * prompts: add guarded write pattern (LF hash, stale_file retry) to full suite * prompts: enforce continue-on-failure, driver flow, and status handling in full suite * Make test list more explicit in prompt. Get rid of old test prompts for hygeine. * prompts: add stale fast-retry (server hash) + in-memory buf guidance * CI: standardize JUNIT_OUT to reports/junit-nl-suite.xml; fix artifact upload indentation; prompt copy cleanups * prompts: reporting discipline — append-only fragments, batch writes, no model round-trip * prompts: stale fast-retry preference, buffer/sha carry, snapshot revert, essential logging * workflows(nl-suite): precreate report skeletons, assemble junit, synthesize markdown; restrict allowed_tools to append-only Bash + MCP tools * thsis too * Update README-DEV.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/workflows/claude-nl-suite-mini.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/workflows/claude-nl-suite.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * workflows(nl-mini): fix YAML indentation/trailing spaces under with: and cleanup heredoc spacing * workflows(nl-suite): fix indentation on docker logs redaction line (YAML lint) * Add write to allowlist * nl-suite: harden reporting discipline (fragment-only writes, forbid alt paths); workflow: clean stray junit-*updated*.xml * nl-suite: enforce end-of-suite single Write (no bash redirection); workflow: restrict allowed_tools to Write+MCP only * prompts(nl-full): end-of-suite results must be valid XML with single <cases> root and only <testcase> children; no raw text outside CDATA * workflows(nl-suite): make Claude step non-fatal; tolerant normalizer extracts <testcase> via regex on bad fragments * nl-suite: fix stale classname to UnityMCP.NL-T in mini fallback; prompt: require re-read after every revert; correct PLAN/PROGRESS to 15 * nl-suite: fix fallback JUnit classname to UnityMCP.NL-T; prompt: forbid create_script and env/mkdir checks, enforce single baseline-byte revert flow and post-revert re-read; add corruption-handling guidance * prompts(nl-full): after each write re-read raw bytes to refresh pre_sha; prefer script_apply_edits for anchors; avoid header/using changes * prompts(nl-full): canonicalize outputs to /; allow small fragment appends via Write or Bash(printf/echo); forbid wrappers and full-file round-trips * prompts(nl-full): finalize markdown formatting for guarded write, execution order, specs, status * workflows(nl-suite, mini): header/lint fixes and constrained Bash append path; align allowed_tools * prompts(nl-full): format Fast Restore, Guarded Write, Execution, Specs, Status as proper markdown lists and code fences * workflows(nl-suite): keep header tidy and append-path alignment with prompt * minor fix * workflows(nl-suite): fix indentation and dispatch; align allowed_tools and revert helper * prompts(nl-full): switch to read_resource for buf/sha; re-read only when needed; convert 'Print this once' to heading; note snapshot helper creates parent dirs * workflows(nl-suite): normalize step removes bootstrap when real testcases present; recompute tests/failures * workflows(nl-suite): enrich Markdown summary by extracting per-test <system-out> blocks (truncated) * clarify prompt resilience instructions * ci(nl-suite): revert prompt and workflow to known-good e0f8a72 for green run; remove extra MD details * ci(nl-suite): minimal fixes — no-mkdir guard in prompt; drop bootstrap and recompute JUnit counts * ci(nl-suite): richer JUnit→Markdown report (per-test system-out) * Small guard to incorret asset read call. * ci(nl-suite): refine MD builder — unescape XML entities, safe code fences, PASS/FAIL badges * Update UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/unity_connection.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/tools/manage_script_edits.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/scripts/mark_skipped.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/scripts/mark_skipped.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/scripts/mark_skipped.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * server(manage_script): robust URI handling — percent-decode file://, normalize, strip host/leading slashes, return Assets-relative if present * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * tests(framing): reduce handshake poll window, nonblocking peek to avoid disconnect race; still enforce pre-handshake data drop * tests(manage_script): add _split_uri tests for unity://path, file:// URLs (decoded/Assets-relative), and plain paths * server+tests: fix handshake syntax error; robust file:// URI normalization in manage_script; add _split_uri tests; adjust stdout scan to ignore venv/site-packages * bridge(framing): accept zero-length frames (treat as empty keepalive) * tests(logging): use errors='replace' on decode fallback to avoid silent drops * resources(list): restrict to Assets/, resolve symlinks, enforce .cs; add traversal/outside-path tests * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/unity_connection.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * misc: framing keepalive (zero-length), regex preview consistency, resource.list hardening, URI parsing, legacy update routing, test cleanups * docs(tools): richer MCP tool descriptions; tests accept decorator kwargs; resource URI parsing hardened * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/unity_connection.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * net+docs: hard-reject zero-length frames; TCP_NODELAY on connect; Assets detection case-insensitive; NL prompt statuses aligned * prompt(nl-suite): constrain Write destinations under reports/, forbid traversal * prompt+net: harden Write path rules; use monotonic deadline and plain-text advisory for non-framed peers * unity_connection: restore recv timeout via try/finally; make global connection getter thread-safe with module lock and double-checked init * NL/T prompt: pin structured edit ops for T-D/T-E; add schema-error guarded write behavior; keep existing path/URI and revert rules * unity_connection: add FRAMED_MAX; use ValueError for framed length violations; lower framed receive log to debug; serialize connect() with per-instance lock * ManageScript: use UTF8Encoding(without BOM) for atomic writes in ApplyTextEdits/EditScript to align with Create/Update and avoid BOM-related diffs/hash mismatches * NL/T prompt: make helper deletion regex multiline-safe ((?ms) so ^ anchors line starts) * ManageScript: emit structured overlap status {status:"overlap"} for overlapping edit ranges in apply_text_edits and edit paths * NL/T prompt: clarify fallback vs failure — fallback only for unsupported/missing_field; treat bad_request as failure; note unsupported after fallback as failure * NL/T prompt: pin deterministic overlap probe (apply_text_edits two ranges from same snapshot); gate too_large behind RUN_TOO_LARGE env hint * TB update * NL/T prompt: harden Output Rules — constrain Bash(printf|echo) to stdout-only; forbid redirection/here-docs/tee; only scripts/nlt-revert.sh may mutate FS * Prompt: enumerate allowed script_apply_edits ops; add manage_editor/read_console guidance; fix T‑F atomic batch to single script_apply_edits. ManageScript: regex timeout for diagnostics; symlink ancestor guard; complete allowed-modes list. * Fixes * ManageScript: add rich overlap diagnostics (conflicts + hint) for both text range and structured batch paths * ManageScript: return structured {status:"validation_failed"} diagnostics in create/update/edits and validate before commit * ManageScript: echo canonical uri in responses (create/read/update/apply_text_edits/structured edits) to reinforce resource identity * improve clarity of capabilities message * Framing: allow zero-length frames on both ends (C# bridge, Python server). Prompt: harden T-F to single text-range apply_text_edits batch (descending order, one snapshot). URI: normalize file:// outside Assets by stripping leading slash. * ManageScript: include new sha256 in success payload for apply_text_edits; harden TryResolveUnderAssets by rejecting symlinked ancestors up to Assets/. * remove claudetest dir * manage_script_edits: normalize method-anchored anchor_insert to insert_method (map text->replacement); improves CI compatibility for T‑A/T‑E without changing Editor behavior. * tighten testing protocol around mkdir * manage_script: validate create_script inputs (Assets/.cs/name/no traversal); add Assets/ guard to delete_script; validate level+Assets in validate_script; make legacy manage_script optional params; harden legacy update routing with base64 reuse and payload size preflight. * Tighten prompt for testing * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/Editor/Tools/ManageScript.cs Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * manage_script_edits: honor ignore_case on anchor_insert and regex_replace in both direct and text-conversion paths (MULTILINE|IGNORECASE). * remove extra file * workflow: use python3 for inline scripts and port detection on ubuntu-latest. * Tighten prompt + manage_script * Update UnityMcpBridge/UnityMcpServer~/src/tools/manage_script_edits.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/UnityMcpServer~/src/tools/manage_script_edits.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update UnityMcpBridge/Editor/Tools/ManageScript.cs Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .claude/prompts/nl-unity-suite-full.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * manage_script: improve file:// UNC handling; preserve POSIX absolute semantics internally; keep test-expected slash stripping for non-Assets paths. * ManageScript.cs: add TimeSpan timeouts to all Regex uses (IsMatch/Match/new Regex) and keep CultureInvariant/Multiline options; reduces risk of catastrophic backtracking stalls. * workflow: ensure reports/ exists in markdown build step to avoid FileNotFoundError when writing MD_OUT. * fix brace * manage_script_edits: expand backrefs for regex_replace in preview->text conversion and translate to \g<n> in local apply; keeps previews and actual edits consistent. * anchor_insert: default to position=after, normalize surrounding newlines in Python conversion paths; C# path ensures trailing newline and skips duplicate insertion within class. * feat(mcp): add get_sha tool; apply_text_edits normalization+overlap preflight+strict; no-op evidence in C#; update NL suite prompt; add unit tests * feat(frames): accept zero-length heartbeat frames in client; add heartbeat test * feat(edits): guard destructive regex_replace with structural preflight; add robust tests; prompt uses delete_method for temp helper * feat(frames): bound heartbeat loop with timeout/threshold; align zero-length response with C#; update test * SDK hardening: atomic multi-span text edits; stop forcing sequential for structured ops; forward options on apply_text_edits; add validate=relaxed support and scoped checks; update NL/T prompt; add tests for options forwarding, relaxed mode, and atomic batches * Router: default applyMode=atomic for multi-span apply_text_edits; add tests * CI prompt: pass options.validate=relaxed for T-B/C; options.applyMode=atomic for T-F; emphasize always writing testcase and restoring on errors * Validation & DX: add validate=syntax (scoped), standardize evidence windows; early regex compile with hints; debug_preview for apply_text_edits * Update UnityMcpBridge/Editor/Windows/MCPForUnityEditorWindow.cs Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * NL/T suite-driven edits: LongUnityScriptClaudeTest, bridge helpers, server_version; prepare framing tests * Fix duplicate macConfigPath field in McpClient to resolve CS0102 * Editor threading: run EnsureServerInstalled on main thread; marshal EditorPrefs/DeleteKey + logging via delayCall * Docs(apply_text_edits): strengthen guidance on 1-based positions, verify-before-edit, and recommend anchors/structured edits * Docs(script_apply_edits): add safety guidance (anchors, method ops, validators) and recommended practices * Framed VerifyBridgePing in editor window; docs hardening for apply_text_edits and script_apply_edits --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 22e8016 commit f471265

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

46 files changed

+8668
-1107
lines changed
Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
# Unity NL Editing Suite — Natural Mode
2+
3+
You are running inside CI for the **unity-mcp** repository. Your task is to demonstrate end‑to‑end **natural‑language code editing** on a representative Unity C# script using whatever capabilities and servers are already available in this session. Work autonomously. Do not ask the user for input. Do NOT spawn subagents, as they will not have access to the mcp server process on the top-level agent.
4+
5+
## Mission
6+
1) **Discover capabilities.** Quietly inspect the tools and any connected servers that are available to you at session start. If the server offers a primer or capabilities resource, read it before acting.
7+
2) **Choose a target file.** Prefer `TestProjects/UnityMCPTests/Assets/Scripts/LongUnityScriptClaudeTest.cs` if it exists; otherwise choose a simple, safe C# script under `TestProjects/UnityMCPTests/Assets/`.
8+
3) **Perform a small set of realistic edits** using minimal, precise changes (not full-file rewrites). Examples of small edits you may choose from (pick 3–6 total):
9+
- Insert a new, small helper method (e.g., a logger or counter) in a sensible location.
10+
- Add a short anchor comment near a key method (e.g., above `Update()`), then add or modify a few lines nearby.
11+
- Append an end‑of‑class utility method (e.g., formatting or clamping helper).
12+
- Make a safe, localized tweak to an existing method body (e.g., add a guard or a simple accumulator).
13+
- Optionally include one idempotency/no‑op check (re‑apply an edit and confirm nothing breaks).
14+
4) **Validate your edits.** Re‑read the modified regions and verify the changes exist, compile‑risk is low, and surrounding structure remains intact.
15+
5) **Report results.** Produce both:
16+
- A JUnit XML at `reports/junit-nl-suite.xml` containing a single suite named `UnityMCP.NL` with one test case per sub‑test you executed (mark pass/fail and include helpful failure text).
17+
- A summary markdown at `reports/junit-nl-suite.md` that explains what you attempted, what succeeded/failed, and any follow‑ups you would try.
18+
6) **Be gentle and reversible.** Prefer targeted, minimal edits; avoid wide refactors or non‑deterministic changes.
19+
20+
## Assumptions & Hints (non‑prescriptive)
21+
- A Unity‑oriented MCP server is expected to be connected. If a server‑provided **primer/capabilities** resource exists, read it first. If no primer is available, infer capabilities from your visible tools in the session.
22+
- In CI/headless mode, when calling `mcp__unity__list_resources` or `mcp__unity__read_resource`, include:
23+
- `ctx: {}`
24+
- `project_root: "TestProjects/UnityMCPTests"` (the server will also accept the absolute path passed via env)
25+
Example: `{ "ctx": {}, "under": "Assets/Scripts", "pattern": "*.cs", "project_root": "TestProjects/UnityMCPTests" }`
26+
- If the preferred file isn’t present, locate a fallback C# file with simple, local methods you can edit safely.
27+
- If a compile command is available in this environment, you may optionally trigger it; if not, rely on structural checks and localized validation.
28+
29+
## Output Requirements (match NL suite conventions)
30+
- JUnit XML at `$JUNIT_OUT` if set, otherwise `reports/junit-nl-suite.xml`.
31+
- Single suite named `UnityMCP.NL`, one `<testcase>` per sub‑test; include `<failure>` on errors.
32+
- Markdown at `$MD_OUT` if set, otherwise `reports/junit-nl-suite.md`.
33+
34+
Constraints (for fast publishing):
35+
- Log allowed tools once as a single line: `AllowedTools: ...`.
36+
- For every edit: Read → Write (with precondition hash) → Re‑read; on `{status:"stale_file"}` retry once after re‑read.
37+
- Keep evidence to ±20–40 lines windows; cap unified diffs to 300 lines and note truncation.
38+
- End `<system-out>` with `VERDICT: PASS` or `VERDICT: FAIL`.
39+
40+
## Guardrails
41+
- No destructive operations. Keep changes minimal and well‑scoped.
42+
- Don’t leak secrets or environment details beyond what’s needed in the reports.
43+
- Work without user interaction; do not prompt for approval mid‑flow.
44+
45+
> If capabilities discovery fails, still produce the two reports that clearly explain why you could not proceed and what evidence you gathered.
Lines changed: 234 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,234 @@
1+
# Unity NL/T Editing Suite — CI Agent Contract
2+
3+
You are running inside CI for the `unity-mcp` repo. Use only the tools allowed by the workflow. Work autonomously; do not prompt the user. Do NOT spawn subagents.
4+
5+
**Print this once, verbatim, early in the run:**
6+
AllowedTools: Write,Bash(printf:*),Bash(echo:*),Bash(scripts/nlt-revert.sh:*),mcp__unity__manage_editor,mcp__unity__list_resources,mcp__unity__read_resource,mcp__unity__apply_text_edits,mcp__unity__script_apply_edits,mcp__unity__validate_script,mcp__unity__find_in_file,mcp__unity__read_console,mcp__unity__get_sha
7+
8+
---
9+
10+
## Mission
11+
1) Pick target file (prefer):
12+
- `unity://path/Assets/Scripts/LongUnityScriptClaudeTest.cs`
13+
2) Execute **all** NL/T tests in order using minimal, precise edits.
14+
3) Validate each edit with `mcp__unity__validate_script(level:"standard")`.
15+
4) **Report**: write one `<testcase>` XML fragment per test to `reports/<TESTID>_results.xml`. Do **not** read or edit `$JUNIT_OUT`.
16+
5) **Restore** the file after each test using the OS‑level helper (fast), not a full‑file text write.
17+
18+
---
19+
20+
## Environment & Paths (CI)
21+
- Always pass: `project_root: "TestProjects/UnityMCPTests"` and `ctx: {}` on list/read/edit/validate.
22+
- **Canonical URIs only**:
23+
- Primary: `unity://path/Assets/...` (never embed `project_root` in the URI)
24+
- Relative (when supported): `Assets/...`
25+
- File paths for the helper script are workspace‑relative:
26+
- `TestProjects/UnityMCPTests/Assets/...`
27+
28+
CI provides:
29+
- `$JUNIT_OUT=reports/junit-nl-suite.xml` (pre‑created; leave alone)
30+
- `$MD_OUT=reports/junit-nl-suite.md` (synthesized from JUnit)
31+
- Helper script: `scripts/nlt-revert.sh` (snapshot/restore)
32+
33+
---
34+
35+
## Tool Mapping
36+
- **Anchors/regex/structured**: `mcp__unity__script_apply_edits`
37+
- Allowed ops: `anchor_insert`, `replace_range`, `regex_replace` (no overlapping ranges within a single call)
38+
- **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges)
39+
- Multi‑span batches are computed from the same fresh read and sent atomically by default.
40+
- Prefer `options.applyMode:"atomic"` when passing options for multiple spans; for single‑span, sequential is fine.
41+
- **Hash-only**: `mcp__unity__get_sha` — returns `{sha256,lengthBytes,lastModifiedUtc}` without file body
42+
- **Validation**: `mcp__unity__validate_script(level:"standard")`
43+
- For edits, you may pass `options.validate`:
44+
- `standard` (default): full‑file delimiter balance checks.
45+
- `relaxed`: scoped checks for interior, non‑structural text edits; do not use for header/signature/brace‑touching changes.
46+
- **Reporting**: `Write` small XML fragments to `reports/*_results.xml`
47+
- **Editor state/flush**: `mcp__unity__manage_editor` (use sparingly; no project mutations)
48+
- **Console readback**: `mcp__unity__read_console` (INFO capture only; do not assert in place of `validate_script`)
49+
- **Snapshot/Restore**: `Bash(scripts/nlt-revert.sh:*)`
50+
- For `script_apply_edits`: use `name` + workspace‑relative `path` only (e.g., `name="LongUnityScriptClaudeTest"`, `path="Assets/Scripts"`). Do not pass `unity://...` URIs as `path`.
51+
- For `apply_text_edits` / `read_resource`: use the URI form only (e.g., `uri="unity://path/Assets/Scripts/LongUnityScriptClaudeTest.cs"`). Do not concatenate `Assets/` with a `unity://...` URI.
52+
- Never call generic Bash like `mkdir`; the revert helper creates needed directories. Use only `scripts/nlt-revert.sh` for snapshot/restore.
53+
- If you believe a directory is missing, you are mistaken: the workflow pre-creates it and the snapshot helper creates it if needed. Do not attempt any Bash other than scripts/nlt-revert.sh:*.
54+
55+
### Structured edit ops (required usage)
56+
57+
# Insert a helper RIGHT BEFORE the final class brace (NL‑3, T‑D)
58+
1) Prefer `script_apply_edits` with a regex capture on the final closing brace:
59+
```json
60+
{"op":"regex_replace",
61+
"pattern":"(?s)(\\r?\\n\\s*\\})\\s*$",
62+
"replacement":"\\n // Tail test A\\n // Tail test B\\n // Tail test C\\1"}
63+
64+
2) If the server returns `unsupported` (op not available) or `missing_field` (op‑specific), FALL BACK to
65+
`apply_text_edits`:
66+
- Find the last `}` in the file (class closing brace) by scanning from end.
67+
- Insert the three comment lines immediately before that index with one non‑overlapping range.
68+
69+
# Insert after GetCurrentTarget (T‑A/T‑E)
70+
- Use `script_apply_edits` with:
71+
```json
72+
{"op":"anchor_insert","afterMethodName":"GetCurrentTarget","text":"private int __TempHelper(int a,int b)=>a+b;\\n"}
73+
```
74+
75+
# Delete the temporary helper (T‑A/T‑E)
76+
- Prefer structured delete:
77+
- Use `script_apply_edits` with `{ "op":"delete_method", "className":"LongUnityScriptClaudeTest", "methodName":"PrintSeries" }` (or `__TempHelper` for T‑A).
78+
- If structured delete is unavailable, fall back to `apply_text_edits` with a single `replace_range` spanning the exact method block (bounds computed from a fresh read); avoid whole‑file regex deletes.
79+
80+
# T‑B (replace method body)
81+
- Use `mcp__unity__apply_text_edits` with a single `replace_range` strictly inside the `HasTarget` braces.
82+
- Compute start/end from a fresh `read_resource` at test start. Do not edit signature or header.
83+
- On `{status:"stale_file"}` retry once with the server-provided hash; if absent, re-read once and retry.
84+
- On `bad_request`: write the testcase with `<failure>…</failure>`, restore, and continue to next test.
85+
- On `missing_field`: FALL BACK per above; if the fallback also returns `unsupported` or `bad_request`, then fail as above.
86+
> Don’t use `mcp__unity__create_script`. Avoid the header/`using` region entirely.
87+
88+
Span formats for `apply_text_edits`:
89+
- Prefer LSP ranges (0‑based): `{ "range": { "start": {"line": L, "character": C}, "end": {…} }, "newText": "…" }`
90+
- Explicit fields are 1‑based: `{ "startLine": L1, "startCol": C1, "endLine": L2, "endCol": C2, "newText": "…" }`
91+
- SDK preflights overlap after normalization; overlapping non‑zero spans → `{status:"overlap"}` with conflicts and no file mutation.
92+
- Optional debug: pass `strict:true` to reject explicit 0‑based fields (else they are normalized and a warning is emitted).
93+
- Apply mode guidance: router defaults to atomic for multi‑span; you can explicitly set `options.applyMode` if needed.
94+
95+
---
96+
97+
## Output Rules (JUnit fragments only)
98+
- For each test, create **one** file: `reports/<TESTID>_results.xml` containing exactly a single `<testcase ...> ... </testcase>`.
99+
Put human-readable lines (PLAN/PROGRESS/evidence) **inside** `<system-out><![CDATA[ ... ]]></system-out>`.
100+
- If content contains `]]>`, split CDATA: replace `]]>` with `]]]]><![CDATA[>`.
101+
- Evidence windows only (±20–40 lines). If showing a unified diff, cap at 100 lines and note truncation.
102+
- **Never** open/patch `$JUNIT_OUT` or `$MD_OUT`; CI merges fragments and synthesizes Markdown.
103+
- Write destinations must match: `^reports/[A-Za-z0-9._-]+_results\.xml$`
104+
- Snapshot files must live under `reports/_snapshots/`
105+
- Reject absolute paths and any path containing `..`
106+
- Reject control characters and line breaks in filenames; enforce UTF‑8
107+
- Cap basename length to ≤64 chars; cap any path segment to ≤100 and total path length to ≤255
108+
- Bash(printf|echo) must write to stdout only. Do not use shell redirection, here‑docs, or `tee` to create/modify files. The only allowed FS mutation is via `scripts/nlt-revert.sh`.
109+
110+
**Example fragment**
111+
```xml
112+
<testcase classname="UnityMCP.NL-T" name="NL-1. Method replace/insert/delete">
113+
<system-out><![CDATA[
114+
PLAN: NL-0,NL-1,NL-2,NL-3,NL-4,T-A,T-B,T-C,T-D,T-E,T-F,T-G,T-H,T-I,T-J (len=15)
115+
PROGRESS: 2/15 completed
116+
pre_sha=<...>
117+
... evidence windows ...
118+
VERDICT: PASS
119+
]]></system-out>
120+
</testcase>
121+
122+
```
123+
124+
Note: Emit the PLAN line only in NL‑0 (do not repeat it for later tests).
125+
126+
127+
### Fast Restore Strategy (OS‑level)
128+
129+
- Snapshot once at NL‑0, then restore after each test via the helper.
130+
- Snapshot (once after confirming the target):
131+
```bash
132+
scripts/nlt-revert.sh snapshot "TestProjects/UnityMCPTests/Assets/Scripts/LongUnityScriptClaudeTest.cs" "reports/_snapshots/LongUnityScriptClaudeTest.cs.baseline"
133+
```
134+
- Log `snapshot_sha=...` printed by the script.
135+
- Restore (after each mutating test):
136+
```bash
137+
scripts/nlt-revert.sh restore "TestProjects/UnityMCPTests/Assets/Scripts/LongUnityScriptClaudeTest.cs" "reports/_snapshots/LongUnityScriptClaudeTest.cs.baseline"
138+
```
139+
- Then `read_resource` to confirm and (optionally) `validate_script(level:"standard")`.
140+
- If the helper fails: fall back once to a guarded full‑file restore using the baseline bytes; then continue.
141+
142+
### Guarded Write Pattern (for edits, not restores)
143+
144+
- Before any mutation: `res = mcp__unity__read_resource(uri)`; `pre_sha = sha256(res.bytes)`.
145+
- Write with `precondition_sha256 = pre_sha` on `apply_text_edits`/`script_apply_edits`.
146+
- To compute `pre_sha` without reading file contents, you may instead call `mcp__unity__get_sha(uri).sha256`.
147+
- On `{status:"stale_file"}`:
148+
- Retry once using the server-provided hash (e.g., `data.current_sha256` or `data.expected_sha256`, per API schema).
149+
- If absent, one re-read then a final retry. No loops.
150+
- After success: immediately re-read via `res2 = mcp__unity__read_resource(uri)` and set `pre_sha = sha256(res2.bytes)` before any further edits in the same test.
151+
- Prefer anchors (`script_apply_edits`) for end-of-class / above-method insertions. Keep edits inside method bodies. Avoid header/using.
152+
153+
**On non‑JSON/transport errors (timeout, EOF, connection closed):**
154+
- Write `reports/<TESTID>_results.xml` with a `<testcase>` that includes a `<failure>` or `<error>` node capturing the error text.
155+
- Run the OS restore via `scripts/nlt-revert.sh restore …`.
156+
- Continue to the next test (do not abort).
157+
158+
**If any write returns `bad_request`, or `unsupported` after a fallback attempt:**
159+
- Write `reports/<TESTID>_results.xml` with a `<testcase>` that includes a `<failure>` node capturing the server error, include evidence, and end with `VERDICT: FAIL`.
160+
- Run `scripts/nlt-revert.sh restore ...` and continue to the next test.
161+
### Execution Order (fixed)
162+
163+
- Run exactly: NL-0, NL-1, NL-2, NL-3, NL-4, T-A, T-B, T-C, T-D, T-E, T-F, T-G, T-H, T-I, T-J (15 total).
164+
- Before NL-1..T-J: Bash(scripts/nlt-revert.sh:restore "<target>" "reports/_snapshots/LongUnityScriptClaudeTest.cs.baseline") IF the baseline exists; skip for NL-0.
165+
- NL-0 must include the PLAN line (len=15).
166+
- After each testcase, include `PROGRESS: <k>/15 completed`.
167+
168+
169+
### Test Specs (concise)
170+
171+
- NL‑0. Sanity reads — Tail ~120; ±40 around `Update()`. Then snapshot via helper.
172+
- NL‑1. Replace/insert/delete — `HasTarget → return currentTarget != null;`; insert `PrintSeries()` after `GetCurrentTarget` logging "1,2,3"; verify; delete `PrintSeries()`; restore.
173+
- NL‑2. Anchor comment — Insert `// Build marker OK` above `public void Update(...)`; restore.
174+
- NL‑3. End‑of‑class — Insert `// Tail test A/B/C` (3 lines) before final brace; restore.
175+
- NL‑4. Compile trigger — Record INFO only.
176+
177+
### T‑A. Anchor insert (text path) — Insert helper after `GetCurrentTarget`; verify; delete via `regex_replace`; restore.
178+
### T‑B. Replace body — Single `replace_range` inside `HasTarget`; restore.
179+
- Options: pass {"validate":"relaxed"} for interior one-line edits.
180+
### T‑C. Header/region preservation — Edit interior of `ApplyBlend`; preserve signature/docs/regions; restore.
181+
- Options: pass {"validate":"relaxed"} for interior one-line edits.
182+
### T‑D. End‑of‑class (anchor) — Insert helper before final brace; remove; restore.
183+
### T‑E. Lifecycle — Insert → update → delete via regex; restore.
184+
### T‑F. Atomic batch — One `mcp__unity__apply_text_edits` call (text ranges only)
185+
- Compute all three edits from the **same fresh read**:
186+
1) Two small interior `replace_range` tweaks.
187+
2) One **end‑of‑class insertion**: find the **index of the final `}`** for the class; create a zero‑width range `[idx, idx)` and set `replacement` to the 3‑line comment block.
188+
- Send all three ranges in **one call**, sorted **descending by start index** to avoid offset drift.
189+
- Expect all‑or‑nothing semantics; on `{status:"overlap"}` or `{status:"bad_request"}`, write the testcase fragment with `<failure>…</failure>`, **restore**, and continue.
190+
- Options: pass {"applyMode":"atomic"} to enforce all‑or‑nothing.
191+
- T‑G. Path normalization — Make the same edit with `unity://path/Assets/...` then `Assets/...`. Without refreshing `precondition_sha256`, the second attempt returns `{stale_file}`; retry with the server-provided hash to confirm both forms resolve to the same file.
192+
193+
### T-H. Validation (standard)
194+
- Restore baseline (helper call above).
195+
- Perform a harmless interior tweak (or none), then MUST call:
196+
mcp__unity__validate_script(level:"standard")
197+
- Write the validator output to system-out; VERDICT: PASS if standard is clean, else include <failure> with the validator message and continue.
198+
199+
### T-I. Failure surfaces (expected)
200+
- Restore baseline.
201+
- (1) OVERLAP:
202+
* Fresh read of file; compute two interior ranges that overlap inside HasTarget.
203+
* Prefer LSP ranges (0‑based) or explicit 1‑based fields; ensure both spans come from the same snapshot.
204+
* Single mcp__unity__apply_text_edits call with both ranges.
205+
* Expect `{status:"overlap"}` (SDK preflight) → record as PASS; else FAIL. Restore.
206+
- (2) STALE_FILE:
207+
* Fresh read → pre_sha.
208+
* Make a tiny legit edit with pre_sha; success.
209+
* Attempt another edit reusing the OLD pre_sha.
210+
* Expect {status:"stale_file"} → record as PASS; else FAIL. Re-read to refresh, restore.
211+
212+
### Per‑test error handling and recovery
213+
- For each test (NL‑0..T‑J), use a try/finally pattern:
214+
- Always write a testcase fragment and perform restore in finally, even when tools return error payloads.
215+
- try: run the test steps; always write `reports/<ID>_results.xml` with PASS/FAIL/ERROR
216+
- finally: run Bash(scripts/nlt-revert.sh:restore …baseline) to restore the target file
217+
- On any transport/JSON/tool exception:
218+
- catch and write a `<testcase>` fragment with an `<error>` node (include the message), then proceed to the next test.
219+
- After NL‑4 completes, proceed directly to T‑A regardless of any earlier validator warnings (do not abort the run).
220+
- (3) USING_GUARD (optional):
221+
* Attempt a 1-line insert above the first 'using'.
222+
* Expect {status:"using_guard"} → record as PASS; else note 'not emitted'. Restore.
223+
224+
### T-J. Idempotency
225+
- Restore baseline.
226+
- Repeat a replace_range twice (second call may be noop). Validate standard after each.
227+
- Insert or ensure a tiny comment, then delete it twice (second delete may be noop).
228+
- Restore and PASS unless an error/structural break occurred.
229+
230+
231+
### Status & Reporting
232+
233+
- Safeguard statuses are non‑fatal; record and continue.
234+
- End each testcase `<system-out>` with `VERDICT: PASS` or `VERDICT: FAIL`.

0 commit comments

Comments
 (0)