diff --git a/.cursor/rules/wormhole-ai-resources.mdc b/.cursor/rules/wormhole-ai-resources.mdc new file mode 100644 index 0000000..b35d4cc --- /dev/null +++ b/.cursor/rules/wormhole-ai-resources.mdc @@ -0,0 +1,20 @@ +--- +description: When to use Wormhole AI doc bundles vs repo Chain 138 / CCIP canonicals +alwaysApply: false +--- + +# Wormhole AI resources vs this repo + +## Use Wormhole’s bundles for + +- Wormhole protocol behavior: VAAs, Guardians, NTT, Connect, Executor, Wormhole CCTP integration, Wormhole Queries, MultiGov, Settlement, TypeScript/Solidity SDK **as documented by Wormhole**. +- Prefer the **tier ladder**: `llms.txt` → `site-index.json` → category `.md` → `llms-full.jsonl` only for RAG or very large context. +- Canonical URLs and mirror script: [docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md](docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md). +- Optional MCP: `mcp-wormhole-docs` (read-only resources + `wormhole_doc_search`); see [docs/04-configuration/MCP_SETUP.md](docs/04-configuration/MCP_SETUP.md). + +## Use repo canonical docs for + +- **Chain 138** token addresses, PMM pools, DODOPMMIntegration, deployer wallet, Blockscout alignment. +- **CCIP** routes, receivers, and this project’s bridge runbooks. + +Do not answer “what is the canonical cUSDT address on 138?” from Wormhole docs. Do not answer “how does Wormhole NTT deploy on Solana?” from `ADDRESS_MATRIX_AND_STATUS.md` unless it explicitly cites Wormhole. 
diff --git a/docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md b/docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md new file mode 100644 index 0000000..313ce99 --- /dev/null +++ b/docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md @@ -0,0 +1,63 @@ +# Wormhole AI resources — LLM and agent playbook + +**Purpose:** How agents and humans should use Wormhole’s published documentation bundles for **Wormhole protocol** work, without mixing them up with **DBIS Chain 138** canonical facts in this repo. + +**Upstream hub:** [Wormhole AI Resources](https://wormhole.com/docs/ai-resources/ai-resources/) +**Source repo (docs):** [wormhole-foundation/wormhole-docs](https://github.com/wormhole-foundation/wormhole-docs) + +--- + +## Canonical fetch URLs (verified) + +Use these URLs in automation and MCP. The path `https://wormhole.com/docs/docs/...` (double `docs`) returns **404**; prefer the paths below. + +| Tier | Artifact | URL | |------|-----------|-----| | 1 | `llms.txt` | `https://wormhole.com/docs/llms.txt` | | 2 | `site-index.json` | `https://wormhole.com/docs/ai/site-index.json` | | 3 | Category bundles | `https://wormhole.com/docs/ai/categories/<category>.md` | | 4 | Full corpus | `https://wormhole.com/docs/ai/llms-full.jsonl` | + +**Category file names:** `basics`, `ntt`, `connect`, `wtt`, `settlement`, `executor`, `multigov`, `queries`, `transfer`, `typescript-sdk`, `solidity-sdk`, `cctp`, `reference` (each as `<name>.md`). + +**Per-page markdown (optional):** entries in `site-index.json` include `resolved_md_url` / `raw_md_url` under `https://wormhole.com/docs/ai/pages/...` for single-page depth. + +--- + +## Consumption ladder (smallest context first) + +1. **`llms.txt`** — Map of the doc site and links; use first to decide where to go next. +2. **`site-index.json`** — Lightweight index (title, preview, categories, URLs); use for retrieval and “which page answers X”. +3. 
**Category `.md`** — Focused implementation tasks (NTT, TypeScript SDK, reference, etc.). +4. **`llms-full.jsonl`** — Full text + metadata; use only for large-context models or **indexed RAG** (see [WORMHOLE_AI_RESOURCES_RAG.md](WORMHOLE_AI_RESOURCES_RAG.md)). Do not paste whole file into a small context window. + +Wormhole notes these files are **informational only** (no embedded persona); safe to combine with project [AGENTS.md](../../AGENTS.md) and Cursor rules. + +--- + +## Boundary: Wormhole vs this repo + +| Topic | Source | +|--------|--------| +| Wormhole NTT, Connect, VAAs, Guardians, Executor, Wormhole CCTP integration, chain IDs **on Wormhole-supported networks** | Wormhole AI bundles + official Wormhole reference | +| Chain **138** token addresses, PMM pools, DODOPMMIntegration, **CCIP** routes, deployer wallet, Blockscout labels | Repo canonical docs: [EXPLORER_TOKEN_LIST_CROSSCHECK.md](../11-references/EXPLORER_TOKEN_LIST_CROSSCHECK.md), [ADDRESS_MATRIX_AND_STATUS.md](../11-references/ADDRESS_MATRIX_AND_STATUS.md), [07-ccip/](../07-ccip/) runbooks | + +Do **not** treat Wormhole docs as authority for Chain 138 deployment facts. Do **not** treat this repo’s CCIP docs as authority for Wormhole core contracts on other chains. + +--- + +## Local mirror and MCP + +- **Sync script:** `bash scripts/doc/sync-wormhole-ai-resources.sh` — downloads tier 1–3 (and optionally tier 4) into `third-party/wormhole-ai-docs/` and writes `manifest.json` (SHA-256, timestamps). +- **MCP server:** `mcp-wormhole-docs/` — read-only resources and `wormhole_doc_search`; see [MCP_SETUP.md](MCP_SETUP.md). +- **Health check:** `bash scripts/verify/verify-wormhole-ai-docs-setup.sh` — mirror presence + `node --check` on the MCP entrypoint. + +--- + +## Security and ops + +- Fetch only from `https://wormhole.com/docs/` (allowlisted in the MCP server when using live fetch). +- `llms-full.jsonl` is large; mirror with `INCLUDE_FULL_JSONL=1` only when needed for RAG or offline use. 
+- Re-sync when Wormhole ships breaking doc changes; keep `manifest.json` for audit (“which snapshot was used?”). + +**Licensing:** Wormhole Foundation material — use mirrors and RAG **consistent with their terms**; link to original URLs in answers when possible. diff --git a/docs/04-configuration/WORMHOLE_AI_RESOURCES_RAG.md b/docs/04-configuration/WORMHOLE_AI_RESOURCES_RAG.md new file mode 100644 index 0000000..e008a87 --- /dev/null +++ b/docs/04-configuration/WORMHOLE_AI_RESOURCES_RAG.md @@ -0,0 +1,64 @@ +# Wormhole `llms-full.jsonl` — RAG and chunking strategy + +**Purpose:** How to index Wormhole’s full documentation export for retrieval-augmented generation without blowing context limits or drowning out Chain 138 canonical facts. + +**Prerequisite:** Download the corpus with `INCLUDE_FULL_JSONL=1 bash scripts/doc/sync-wormhole-ai-resources.sh` (or `--full-jsonl`). File: `third-party/wormhole-ai-docs/llms-full.jsonl` (gitignored; large). + +**Playbook (tiers):** [WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md](WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md) + +--- + +## Category-first retrieval (default policy) + +1. **Before** querying `llms-full.jsonl`, resolve intent: - **Broad protocol** → start from mirrored `categories/basics.md` or `reference.md`. - **Product-specific** → pick the matching category file (`ntt.md`, `cctp.md`, `typescript-sdk.md`, etc.) from the mirror or `https://wormhole.com/docs/ai/categories/<category>.md`. +2. Use **`site-index.json`** (tier 2) to rank **page-level** `id` / `title` / `preview` / `categories` and obtain `html_url` / `resolved_md_url`. +3. Only then ingest or search **full JSONL** lines that correspond to those pages (if your pipeline supports filtering by `id` or URL prefix). + +This keeps answers aligned with Wormhole’s own doc structure and reduces irrelevant hits. + +--- + +## Chunking `llms-full.jsonl` + +The file is **JSON Lines**: each line is one JSON object (typically one doc page or chunk with metadata). 
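A minimal streaming reader for that layout, sketched in Python. The field names (`id`, `title`, `slug`, `html_url`, `categories`, `hash`, and the body/`text` key) are assumptions inferred from the `site-index.json` schema; inspect one real line of your download before wiring this into an indexer.

```python
import json

def iter_chunks(jsonl_path):
    """Stream llms-full.jsonl one line at a time; never load the whole file.

    Field names here (id, title, slug, html_url, categories, hash, text)
    are assumptions -- verify against one real line of the export.
    """
    with open(jsonl_path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the run
            title = obj.get("title") or ""
            body = obj.get("text") or obj.get("content") or obj.get("preview") or ""
            yield {
                "id": obj.get("id") or f"line-{line_no}",
                "embed_text": f"{title}\n\n{body}",  # what goes to the embedder
                "metadata": {  # kept for citations, never embedded
                    "title": title,
                    "slug": obj.get("slug"),
                    "html_url": obj.get("html_url"),
                    "categories": obj.get("categories"),
                    "hash": obj.get("hash"),
                },
            }
```

Each yielded dict is one vector-store record: embed `embed_text`, store `metadata` alongside so answers can cite `html_url`.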
+ +**Recommended:** + +- **Parse line-by-line** (streaming); do not load the entire file into RAM for parsing. +- **One line = one logical chunk** if each object already represents a single page; if objects are huge, split on `sections` or headings when present in the schema. +- **Metadata to store per chunk:** at minimum `id`, `title`, `slug`, `html_url`, and any `categories` / `hash` fields present in that line. Prefer storing **source URL** for citation in agent answers. +- **Embeddings:** embed `title + "\n\n" + body_or_preview` (or equivalent text field in the object); keep URL in metadata only for the retriever to return to the user. + +**Deduplication:** if the same `hash` or `id` appears across syncs, replace vectors for that id on re-index. + +--- + +## Query flow (RAG) + +```mermaid +flowchart TD + Q[User query] --> Intent{Wormhole product area?} + Intent -->|yes| Cat[Retrieve from category md slice] + Intent -->|unclear| Idx[Search site-index.json previews] + Cat --> NeedFull{Need deeper text?} + Idx --> NeedFull + NeedFull -->|no| Ans[Answer with citations] + NeedFull -->|yes| JSONL[Vector search filtered llms-full.jsonl by category or id] + JSONL --> Ans +``` + +--- + +## Boundaries + +- RAG over Wormhole docs improves **Wormhole** answers; it does **not** override [EXPLORER_TOKEN_LIST_CROSSCHECK.md](../11-references/EXPLORER_TOKEN_LIST_CROSSCHECK.md) or CCIP runbooks for **Chain 138** deployment truth. +- If a user question mixes both (e.g. “bridge USDC to Chain 138 via Wormhole”), answer in **two explicit sections**: Wormhole mechanics vs this repo’s CCIP / 138 facts. + +--- + +## Re-sync and audit + +- After `sync-wormhole-ai-resources.sh`, commit or archive **`third-party/wormhole-ai-docs/manifest.json`** when you want a recorded snapshot (hashes per file). +- Rebuild or delta-update the vector index when `manifest.json` changes. 
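The rebuild-on-change rule can be mechanized: diff two `manifest.json` snapshots and touch only the vectors whose files changed. A minimal sketch, assuming the `{"files": {path: {"sha256": ...}}}` shape that `sync-wormhole-ai-resources.sh` writes:

```python
def manifest_delta(old_manifest, new_manifest):
    """Compare two parsed manifest.json documents from the sync script.

    Returns (added, removed, changed) relative paths: re-embed
    added + changed, delete vectors for removed, skip the rest.
    """
    old_files = old_manifest.get("files", {})
    new_files = new_manifest.get("files", {})
    added = sorted(set(new_files) - set(old_files))
    removed = sorted(set(old_files) - set(new_files))
    changed = sorted(
        path
        for path in set(old_files) & set(new_files)
        if old_files[path].get("sha256") != new_files[path].get("sha256")
    )
    return added, removed, changed
```

This keeps delta re-indexing proportional to what actually changed in a sync rather than the full corpus.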
diff --git a/mcp-wormhole-docs/README.md b/mcp-wormhole-docs/README.md new file mode 100644 index 0000000..58469ba --- /dev/null +++ b/mcp-wormhole-docs/README.md @@ -0,0 +1,31 @@ +# mcp-wormhole-docs + +Read-only MCP server exposing Wormhole’s **AI documentation exports** as **resources**, plus **`wormhole_doc_search`** over `site-index.json`. + +- **Playbook:** [docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md](../docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md) +- **Client wiring:** [docs/04-configuration/MCP_SETUP.md](../docs/04-configuration/MCP_SETUP.md) +- **Mirror:** `bash scripts/doc/sync-wormhole-ai-resources.sh` → `third-party/wormhole-ai-docs/` + +## Environment + +| Variable | Default | Meaning | |----------|---------|---------| | `WORMHOLE_DOCS_MIRROR` | `<repo>/third-party/wormhole-ai-docs` | Directory with synced files | | `WORMHOLE_DOCS_FETCH` | `0` | If `1`, fall back to HTTPS from `https://wormhole.com/docs/...` when a file is missing locally | | `WORMHOLE_MAX_RESOURCE_BYTES` | `5242880` | Max bytes returned for `llms-full.jsonl` via MCP read (avoid OOM); increase it, or read the file on disk, for RAG | + +## Run + +```bash +cd mcp-wormhole-docs && pnpm install && node index.js +``` + +## Resources + +URIs use the `wormhole://ai/...` scheme, e.g. `wormhole://ai/site-index.json`, `wormhole://ai/categories/ntt.md`. + +## Tools + +- **`wormhole_doc_search`** — `query` (string), optional `limit` (number, default 10). Searches titles, previews, slugs, and categories in `site-index.json`. + +This server does **not** submit transactions or hold keys. It is documentation context only. 
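When a pipeline has the mirror on disk but no MCP client, the tool's ranking can be approximated directly over `site-index.json`. A sketch in Python rather than the server's JavaScript; the substring weights mirror `index.js`, and it assumes the index parses to a list of objects with `title` / `preview` / `slug` / `id` / `categories`.

```python
def search_site_index(index, query, limit=10):
    """Approximate wormhole_doc_search scoring from index.js:
    substring hits on title (+10), slug/id (+8), preview (+5),
    categories (+3), plus per-word bonuses on title (+2) and preview (+1)."""
    q = (query or "").lower().strip()
    if not q:
        return []
    words = [w for w in q.split() if len(w) > 2]
    scored = []
    for page in index:
        title = (page.get("title") or "").lower()
        preview = (page.get("preview") or "").lower()
        slug = (page.get("slug") or "").lower()
        page_id = (page.get("id") or "").lower()
        cats = " ".join(page.get("categories") or []).lower()

        score = 0
        if q in title:
            score += 10
        if q in slug or q in page_id:
            score += 8
        if q in preview:
            score += 5
        if q in cats:
            score += 3
        for w in words:
            if w in title:
                score += 2
            if w in preview:
                score += 1
        if score > 0:
            scored.append((score, page))

    # Sort by score only (a plain tuple sort would try to compare dicts on ties).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:limit]]
```

Load the index with `json.load` on `third-party/wormhole-ai-docs/site-index.json` and pass the resulting list in.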
diff --git a/mcp-wormhole-docs/index.js b/mcp-wormhole-docs/index.js new file mode 100644 index 0000000..8c50f60 --- /dev/null +++ b/mcp-wormhole-docs/index.js @@ -0,0 +1,289 @@ +#!/usr/bin/env node + +import { Server } from '@modelcontextprotocol/sdk/server/index.js'; +import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; +import { + CallToolRequestSchema, + ListResourcesRequestSchema, + ListToolsRequestSchema, + ReadResourceRequestSchema, +} from '@modelcontextprotocol/sdk/types.js'; +import fetch from 'node-fetch'; +import { readFileSync, existsSync } from 'fs'; +import { join, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = dirname(__filename); + +const REPO_ROOT = join(__dirname, '..'); +const DEFAULT_MIRROR = join(REPO_ROOT, 'third-party', 'wormhole-ai-docs'); +const ALLOWED_FETCH_PREFIX = 'https://wormhole.com/docs'; +const LLMS_TXT_URL = `${ALLOWED_FETCH_PREFIX}/llms.txt`; + +const CATEGORIES = [ + 'basics', + 'ntt', + 'connect', + 'wtt', + 'settlement', + 'executor', + 'multigov', + 'queries', + 'transfer', + 'typescript-sdk', + 'solidity-sdk', + 'cctp', + 'reference', +]; + +function mirrorRoot() { + return process.env.WORMHOLE_DOCS_MIRROR || DEFAULT_MIRROR; +} + +function allowFetch() { + return process.env.WORMHOLE_DOCS_FETCH === '1' || process.env.WORMHOLE_DOCS_FETCH === 'true'; +} + +function maxResourceBytes() { + const n = parseInt(process.env.WORMHOLE_MAX_RESOURCE_BYTES || '5242880', 10); + return Number.isFinite(n) && n > 0 ? 
n : 5242880; +} + +/** @param {string} uri */ +function uriToHttps(uri) { + if (uri === 'wormhole://ai/llms.txt') return LLMS_TXT_URL; + if (uri === 'wormhole://ai/site-index.json') + return `${ALLOWED_FETCH_PREFIX}/ai/site-index.json`; + if (uri === 'wormhole://ai/llms-full.jsonl') + return `${ALLOWED_FETCH_PREFIX}/ai/llms-full.jsonl`; + const m = uri.match(/^wormhole:\/\/ai\/categories\/([a-z0-9-]+\.md)$/); + if (m) return `${ALLOWED_FETCH_PREFIX}/ai/categories/${m[1]}`; + return null; +} + +/** @param {string} uri */ +function uriToRelativePath(uri) { + if (uri === 'wormhole://ai/llms.txt') return 'llms.txt'; + if (uri === 'wormhole://ai/site-index.json') return 'site-index.json'; + if (uri === 'wormhole://ai/llms-full.jsonl') return 'llms-full.jsonl'; + const m = uri.match(/^wormhole:\/\/ai\/categories\/([a-z0-9-]+\.md)$/); + if (m) return join('categories', m[1]); + return null; +} + +/** @param {string} url */ +async function fetchAllowed(url) { + if (!url || !url.startsWith(ALLOWED_FETCH_PREFIX)) { + throw new Error(`Fetch URL not allowlisted: ${url}`); + } + const res = await fetch(url, { + redirect: 'follow', + headers: { 'user-agent': 'mcp-wormhole-docs/1.0' }, + }); + if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`); + const len = res.headers.get('content-length'); + if (len && parseInt(len, 10) > 80 * 1024 * 1024) { + throw new Error(`Refusing to fetch body larger than 80MB (${len} bytes)`); + } + return Buffer.from(await res.arrayBuffer()); +} + +/** + * @param {string} uri + */ +async function readResourceContentAsync(uri) { + const rel = uriToRelativePath(uri); + if (!rel) throw new Error(`Unknown resource URI: ${uri}`); + const root = mirrorRoot(); + const localPath = join(root, rel); + let buf; + if (existsSync(localPath)) buf = readFileSync(localPath); + else if (allowFetch()) { + const u = uriToHttps(uri); + if (!u) throw new Error(`No HTTPS mapping for ${uri}`); + buf = await fetchAllowed(u); + } else { + throw new Error(`Missing 
${localPath}. Sync mirror or set WORMHOLE_DOCS_FETCH=1`); + } + return formatBuffer(uri, rel, buf); +} + +/** + * @param {string} uri + * @param {string} rel + * @param {Buffer} buf + */ +function formatBuffer(uri, rel, buf) { + const isJsonl = rel === 'llms-full.jsonl'; + const max = maxResourceBytes(); + if (isJsonl && buf.length > max) { + const head = buf.subarray(0, max).toString('utf8'); + return { + mimeType: 'text/plain; charset=utf-8', + text: + `[Truncated: ${buf.length} bytes, showing first ${max} bytes. Set WORMHOLE_MAX_RESOURCE_BYTES or read ${join(mirrorRoot(), rel)} on disk for RAG.]\n\n` + + head, + }; + } + + if (rel.endsWith('.json')) { + return { mimeType: 'application/json; charset=utf-8', text: buf.toString('utf8') }; + } + if (rel.endsWith('.jsonl')) { + return { mimeType: 'application/x-ndjson; charset=utf-8', text: buf.toString('utf8') }; + } + return { mimeType: 'text/plain; charset=utf-8', text: buf.toString('utf8') }; +} + +function buildResourceList() { + const resources = [ + { + uri: 'wormhole://ai/llms.txt', + name: 'Wormhole llms.txt', + description: 'Tier 1: site map and links (llms.txt standard)', + mimeType: 'text/plain', + }, + { + uri: 'wormhole://ai/site-index.json', + name: 'Wormhole site-index.json', + description: 'Tier 2: lightweight page index with previews', + mimeType: 'application/json', + }, + { + uri: 'wormhole://ai/llms-full.jsonl', + name: 'Wormhole llms-full.jsonl', + description: 'Tier 4: full doc corpus (large; may truncate via MCP read)', + mimeType: 'application/x-ndjson', + }, + ]; + for (const c of CATEGORIES) { + resources.push({ + uri: `wormhole://ai/categories/${c}.md`, + name: `Wormhole category: ${c}`, + description: `Tier 3: bundled markdown for ${c}`, + mimeType: 'text/markdown', + }); + } + return resources; +} + +async function loadSiteIndex() { + const root = mirrorRoot(); + const p = join(root, 'site-index.json'); + if (existsSync(p)) { + return JSON.parse(readFileSync(p, 'utf8')); + } + if 
(allowFetch()) { + const buf = await fetchAllowed(`${ALLOWED_FETCH_PREFIX}/ai/site-index.json`); + return JSON.parse(buf.toString('utf8')); + } + throw new Error( + `site-index.json not found under ${root}. Run scripts/doc/sync-wormhole-ai-resources.sh or set WORMHOLE_DOCS_FETCH=1` + ); +} + +/** + * @param {string} query + * @param {number} limit + */ +async function searchDocs(query, limit) { + const q = (query || '').toLowerCase().trim(); + if (!q) return { results: [], note: 'Empty query' }; + + const index = await loadSiteIndex(); + if (!Array.isArray(index)) { + return { results: [], note: 'site-index.json is not an array' }; + } + + const scored = []; + for (const page of index) { + const title = (page.title || '').toLowerCase(); + const prev = (page.preview || '').toLowerCase(); + const slug = (page.slug || '').toLowerCase(); + const id = (page.id || '').toLowerCase(); + const cats = Array.isArray(page.categories) ? page.categories.join(' ').toLowerCase() : ''; + + let score = 0; + if (title.includes(q)) score += 10; + if (slug.includes(q) || id.includes(q)) score += 8; + if (prev.includes(q)) score += 5; + if (cats.includes(q)) score += 3; + for (const word of q.split(/\s+/).filter((w) => w.length > 2)) { + if (title.includes(word)) score += 2; + if (prev.includes(word)) score += 1; + } + + if (score > 0) { + scored.push({ + score, + title: page.title, + id: page.id, + html_url: page.html_url, + resolved_md_url: page.resolved_md_url, + preview: page.preview + ? page.preview.slice(0, 400) + (page.preview.length > 400 ? 
'…' : '') + : undefined, + categories: page.categories, + }); + } + } + + scored.sort((a, b) => b.score - a.score); + return { results: scored.slice(0, limit) }; +} + +const server = new Server( + { name: 'wormhole-docs', version: '1.0.0' }, + { capabilities: { resources: {}, tools: {} } } +); + +server.setRequestHandler(ListResourcesRequestSchema, async () => ({ + resources: buildResourceList(), +})); + +server.setRequestHandler(ReadResourceRequestSchema, async (request) => { + const uri = request.params.uri; + const { text, mimeType } = await readResourceContentAsync(uri); + return { + contents: [{ uri, mimeType, text }], + }; +}); + +server.setRequestHandler(ListToolsRequestSchema, async () => ({ + tools: [ + { + name: 'wormhole_doc_search', + description: + 'Search Wormhole documentation site-index (titles, previews, categories). Use for tier-2 retrieval before loading full category markdown or llms-full.jsonl.', + inputSchema: { + type: 'object', + properties: { + query: { type: 'string', description: 'Search string (e.g. 
NTT, VAA, CCTP)' }, + limit: { type: 'number', description: 'Max results (default 10)', default: 10 }, + }, + required: ['query'], + }, + }, + ], +})); + +server.setRequestHandler(CallToolRequestSchema, async (request) => { + if (request.params.name !== 'wormhole_doc_search') { + throw new Error(`Unknown tool: ${request.params.name}`); + } + const args = request.params.arguments || {}; + const limit = Math.min(50, Math.max(1, parseInt(String(args.limit || 10), 10) || 10)); + const { results, note } = await searchDocs(String(args.query || ''), limit); + return { + content: [ + { + type: 'text', + text: JSON.stringify({ note, count: results.length, results }, null, 2), + }, + ], + }; +}); + +const transport = new StdioServerTransport(); +await server.connect(transport); diff --git a/mcp-wormhole-docs/package.json b/mcp-wormhole-docs/package.json new file mode 100644 index 0000000..736de08 --- /dev/null +++ b/mcp-wormhole-docs/package.json @@ -0,0 +1,15 @@ +{ + "name": "mcp-wormhole-docs", + "version": "1.0.0", + "description": "Read-only MCP server: Wormhole AI doc mirror resources + site-index search", + "type": "module", + "main": "index.js", + "scripts": { + "start": "node index.js" + }, + "dependencies": { + "@modelcontextprotocol/sdk": "^0.4.0", + "node-fetch": "^3.3.2" + }, + "license": "MIT" +} diff --git a/scripts/doc/sync-wormhole-ai-resources.sh b/scripts/doc/sync-wormhole-ai-resources.sh new file mode 100755 index 0000000..8312d51 --- /dev/null +++ b/scripts/doc/sync-wormhole-ai-resources.sh @@ -0,0 +1,97 @@ +#!/usr/bin/env bash +# Sync Wormhole AI documentation exports for offline use, MCP mirror, and RAG prep. +# See docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../.." 
&& pwd)" +OUT="${WORMHOLE_AI_DOCS_DIR:-$REPO_ROOT/third-party/wormhole-ai-docs}" +BASE="https://wormhole.com/docs" + +CATEGORIES=( + basics ntt connect wtt settlement executor multigov queries transfer + typescript-sdk solidity-sdk cctp reference +) + +usage() { + echo "Usage: $0 [--dry-run] [--full-jsonl]" + echo " WORMHOLE_AI_DOCS_DIR output directory (default: third-party/wormhole-ai-docs)" + echo " INCLUDE_FULL_JSONL=1 or pass --full-jsonl to download llms-full.jsonl (large)" + exit "${1:-0}" +} + +DRY=0 +FULL=0 +while [[ $# -gt 0 ]]; do + case "$1" in + --dry-run) DRY=1 ;; + --full-jsonl) FULL=1 ;; + -h|--help) usage 0 ;; + *) echo "Unknown option: $1" >&2; usage 1 ;; + esac + shift +done + +if [[ -n "${INCLUDE_FULL_JSONL:-}" && "$INCLUDE_FULL_JSONL" != "0" ]]; then + FULL=1 +fi + +echo "Output directory: $OUT" +if [[ "$DRY" -eq 1 ]]; then + echo "[dry-run] would mkdir, curl downloads, write manifest.json" + exit 0 +fi + +mkdir -p "$OUT/categories" + +fetch() { + local url="$1" + local dest="$2" + echo " GET $url -> $dest" + curl -fsSL --connect-timeout 30 --max-time 600 -o "$dest" "$url" +} + +fetch "$BASE/llms.txt" "$OUT/llms.txt" +fetch "$BASE/ai/site-index.json" "$OUT/site-index.json" + +for name in "${CATEGORIES[@]}"; do + fetch "$BASE/ai/categories/${name}.md" "$OUT/categories/${name}.md" +done + +if [[ "$FULL" -eq 1 ]]; then + fetch "$BASE/ai/llms-full.jsonl" "$OUT/llms-full.jsonl" +else + rm -f "$OUT/llms-full.jsonl" +fi + +MANIFEST="$OUT/manifest.json" +export OUT MANIFEST BASE +python3 <<'PY' +import hashlib, json, os +from datetime import datetime, timezone +from pathlib import Path + +out = Path(os.environ["OUT"]) +base = os.environ["BASE"] +manifest_path = Path(os.environ["MANIFEST"]) +files = {} +for p in sorted(out.rglob("*")): + if not p.is_file(): + continue + if p.name in ("manifest.json", "README.md"): + continue + rel = str(p.relative_to(out)).replace("\\", "/") + data = p.read_bytes() + files[rel] = { + "sha256": 
hashlib.sha256(data).hexdigest(), + "bytes": len(data), + } +doc = { + "synced_at_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"), + "base_url": base, + "files": files, +} +manifest_path.write_text(json.dumps(doc, indent=2) + "\n", encoding="utf-8") +PY + +echo "Done. Manifest: $MANIFEST" diff --git a/scripts/verify/verify-wormhole-ai-docs-setup.sh b/scripts/verify/verify-wormhole-ai-docs-setup.sh new file mode 100755 index 0000000..b108588 --- /dev/null +++ b/scripts/verify/verify-wormhole-ai-docs-setup.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash +# Quick health check: mirror present (or fetch mode documented), MCP server syntax OK. +# Success criteria: docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)" +MIRROR="${WORMHOLE_AI_DOCS_DIR:-$ROOT/third-party/wormhole-ai-docs}" +MCP_JS="$ROOT/mcp-wormhole-docs/index.js" + +err() { echo "verify-wormhole-ai-docs-setup: $*" >&2; exit 1; } + +[[ -f "$MCP_JS" ]] || err "missing $MCP_JS" +node --check "$MCP_JS" || err "node --check failed for mcp-wormhole-docs" + +if [[ -f "$MIRROR/site-index.json" && -f "$MIRROR/llms.txt" ]]; then + echo "OK: mirror at $MIRROR (site-index.json + llms.txt present)" +else + echo "WARN: mirror incomplete under $MIRROR — run: bash scripts/doc/sync-wormhole-ai-resources.sh" + echo "OK: MCP server syntax (mirror optional if WORMHOLE_DOCS_FETCH=1 when using MCP)" +fi + +echo "verify-wormhole-ai-docs-setup: all checks passed" diff --git a/third-party/wormhole-ai-docs/README.md b/third-party/wormhole-ai-docs/README.md new file mode 100644 index 0000000..5e2f593 --- /dev/null +++ b/third-party/wormhole-ai-docs/README.md @@ -0,0 +1,11 @@ +# Wormhole AI docs mirror (local) + +This directory holds a **local mirror** of Wormhole’s AI-oriented documentation exports, produced by: + +```bash +bash scripts/doc/sync-wormhole-ai-resources.sh +``` + +**Do not commit** 
mirrored blobs (see repo root `.gitignore`). **`manifest.json` is not ignored** so you can commit hashes and sync timestamps after a deliberate update. + +See [docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md](../../docs/04-configuration/WORMHOLE_AI_RESOURCES_LLM_PLAYBOOK.md) for URLs, tier ladder, and boundaries vs Chain 138 canonical docs. diff --git a/third-party/wormhole-ai-docs/manifest.json b/third-party/wormhole-ai-docs/manifest.json new file mode 100644 index 0000000..4f39044 --- /dev/null +++ b/third-party/wormhole-ai-docs/manifest.json @@ -0,0 +1,66 @@ +{ + "synced_at_utc": "2026-04-01T03:36:28Z", + "base_url": "https://wormhole.com/docs", + "files": { + "categories/basics.md": { + "sha256": "654b583a9d20233a860baeb28354337df0bd8754b51c3bb30f6d3ee79a579efb", + "bytes": 309599 + }, + "categories/cctp.md": { + "sha256": "084c52a9c7d65b99aaa6d44f0091bbfd2dd14859e435fc197f22e4c07484c859", + "bytes": 727059 + }, + "categories/connect.md": { + "sha256": "cfdb3325f66f942f1670dbaf673e4fd4070973eebfad549f2ad03408c85838c9", + "bytes": 646763 + }, + "categories/executor.md": { + "sha256": "fb2fd2deb69aa309af752ab4b56f91ad0502506698cc4752bc953b9726a57292", + "bytes": 596573 + }, + "categories/multigov.md": { + "sha256": "04c83a133ba56bf46c9d0a7f729e6391d1b270aea72a745aaa0856b1bc192a72", + "bytes": 611925 + }, + "categories/ntt.md": { + "sha256": "88760d62be53ed8cb87d8cfc8b4991a9ef9465cfff38d4bae6cd806502dd6205", + "bytes": 893967 + }, + "categories/queries.md": { + "sha256": "19b05c57e678bff1f754cd6aab9464072710f72d38cd0dbf665bd9b9661b7a59", + "bytes": 601482 + }, + "categories/reference.md": { + "sha256": "9fdcff586be614876ff4c65d9931709f216fe25e4729c19b67fb22c6d4113be6", + "bytes": 229733 + }, + "categories/settlement.md": { + "sha256": "bc0fd2c552a86281999df1149760446ce77d7c651faafb0c67f838e33628093d", + "bytes": 571959 + }, + "categories/solidity-sdk.md": { + "sha256": "26320c1ce21ae9c8da84a7cacfb0fb71a7604526c36afb654df1b8c13d8d0f97", + "bytes": 
569370 + }, + "categories/transfer.md": { + "sha256": "66f489b29821a6043d2f87335658a63959b9216b06fcf1eb64ffefa6f472b7ac", + "bytes": 1321837 + }, + "categories/typescript-sdk.md": { + "sha256": "852deb75e3dccd1c3de7f683c0123c79895b3620187b0156e55cbda0eae3cc91", + "bytes": 569374 + }, + "categories/wtt.md": { + "sha256": "114b66925c93b44d4440934a1de95b2035ad4b41735b752750e5be83cf35d819", + "bytes": 714877 + }, + "llms.txt": { + "sha256": "eb0ae47244d27ada0c404b4c62ccbef1d8934af0987709cafa3603e4edaf387a", + "bytes": 51280 + }, + "site-index.json": { + "sha256": "0bd89a911bb33f89c2d7b3fad458c272269c4f99e77a1edd715eb83c8ed298cf", + "bytes": 272264 + } + } +}