# Graph Database Schema Specification ## Overview This document specifies the Neo4j graph database schema for storing cross-chain entity relationships, address clustering, and protocol interactions. ## Schema Design ### Node Types #### Address Node **Labels**: `Address`, `Chain{chain_id}` (e.g., `Chain138`) **Properties**: ```cypher { address: "0x...", // Unique identifier chainId: 138, // Chain ID label: "My Wallet", // Optional label isContract: false, // Is contract address firstSeen: timestamp, // First seen timestamp lastSeen: timestamp, // Last seen timestamp transactionCount: 100, // Transaction count balance: "1.5" // Current balance (string for precision) } ``` **Constraints**: ```cypher CREATE CONSTRAINT address_address_chain_id FOR (a:Address) REQUIRE (a.address, a.chainId) IS UNIQUE; ``` #### Contract Node **Labels**: `Contract`, `Address` **Properties**: Inherits from Address, plus: ```cypher { name: "MyToken", verificationStatus: "verified", compilerVersion: "0.8.19" } ``` #### Token Node **Labels**: `Token`, `Contract` **Properties**: Inherits from Contract, plus: ```cypher { symbol: "MTK", decimals: 18, totalSupply: "1000000", type: "ERC20" // ERC20, ERC721, ERC1155 } ``` #### Protocol Node **Labels**: `Protocol` **Properties**: ```cypher { name: "Uniswap V3", category: "DEX", website: "https://uniswap.org" } ``` ### Relationship Types #### TRANSFERRED_TO **Purpose**: Track token transfers between addresses. **Properties**: ```cypher { amount: "1000000000000000000", tokenAddress: "0x...", transactionHash: "0x...", blockNumber: 12345, timestamp: timestamp } ``` **Example**: ```cypher (a1:Address {address: "0x..."})-[r:TRANSFERRED_TO { amount: "1000000000000000000", tokenAddress: "0x...", transactionHash: "0x..." }]->(a2:Address {address: "0x..."}) ``` #### CALLED **Purpose**: Track contract calls between addresses. **Properties**: ```cypher { transactionHash: "0x...", blockNumber: 12345, timestamp: timestamp, gasUsed: 21000, method: "transfer" } ``` #### OWNS **Purpose**: Track token ownership (current balances). **Properties**: ```cypher { balance: "1000000000000000000", tokenId: "123", // For ERC-721/1155 updatedAt: timestamp } ``` **Example**: ```cypher (a:Address)-[r:OWNS { balance: "1000000000000000000", updatedAt: timestamp }]->(t:Token) ``` #### INTERACTS_WITH **Purpose**: Track protocol interactions. **Properties**: ```cypher { interactionType: "swap", // swap, deposit, withdraw, etc. transactionHash: "0x...", timestamp: timestamp } ``` **Example**: ```cypher (a:Address)-[r:INTERACTS_WITH { interactionType: "swap", transactionHash: "0x..." }]->(p:Protocol) ``` #### CLUSTERED_WITH **Purpose**: Link addresses that belong to the same entity (address clustering). **Properties**: ```cypher { confidence: 0.95, // Clustering confidence score method: "heuristic", // Clustering method createdAt: timestamp } ``` #### CCIP_MESSAGE_LINK **Purpose**: Link transactions across chains via CCIP messages. **Properties**: ```cypher { messageId: "0x...", sourceTxHash: "0x...", destTxHash: "0x...", status: "delivered", timestamp: timestamp } ``` **Example**: ```cypher (srcTx:Transaction)-[r:CCIP_MESSAGE_LINK { messageId: "0x...", status: "delivered" }]->(destTx:Transaction) ``` ## Query Patterns ### Find Token Holders ```cypher MATCH (t:Token {address: "0x...", chainId: 138})-[r:OWNS]-(a:Address) WHERE r.balance > "0" RETURN a.address, r.balance ORDER BY toFloat(r.balance) DESC LIMIT 100; ``` ### Find Transfer Path ```cypher MATCH path = (a1:Address {address: "0x..."})-[:TRANSFERRED_TO*1..3]-(a2:Address {address: "0x..."}) WHERE ALL(r in relationships(path) WHERE r.tokenAddress = "0x...") RETURN path LIMIT 10; ``` ### Find Protocol Users ```cypher MATCH (a:Address)-[r:INTERACTS_WITH]->(p:Protocol {name: "Uniswap V3"}) RETURN a.address, count(r) as interactionCount ORDER BY interactionCount DESC LIMIT 100; ``` ### Address Clustering ```cypher MATCH (a1:Address)-[r:CLUSTERED_WITH]-(a2:Address) WHERE a1.address = "0x..." RETURN a2.address, r.confidence, r.method; ``` ### Cross-Chain CCIP Links ```cypher MATCH (srcTx:Transaction {hash: "0x..."})-[r:CCIP_MESSAGE_LINK]-(destTx:Transaction) RETURN srcTx, r, destTx; ``` ## Data Ingestion ### Transaction Ingestion **Process**: 1. Process transaction from indexer 2. Create/update address nodes 3. Create TRANSFERRED_TO relationships for token transfers 4. Create CALLED relationships for contract calls 5. Update OWNS relationships for token balances ### Batch Ingestion **Strategy**: - Use Neo4j Batch API for bulk inserts - Batch size: 1000-10000 operations - Use transactions for atomicity ### Incremental Updates **Process**: - Update relationships as new transactions processed - Maintain OWNS relationships (update balances) - Add new relationships for new interactions ## Performance Optimization ### Indexing **Indexes**: ```cypher CREATE INDEX address_address FOR (a:Address) ON (a.address); CREATE INDEX address_chain_id FOR (a:Address) ON (a.chainId); CREATE INDEX transaction_hash FOR (t:Transaction) ON (t.hash); ``` ### Relationship Constraints **Uniqueness**: Use MERGE to avoid duplicate relationships **Example**: ```cypher MATCH (a1:Address {address: "0x...", chainId: 138}) MATCH (a2:Address {address: "0x...", chainId: 138}) MERGE (a1)-[r:TRANSFERRED_TO { transactionHash: "0x..." }]->(a2) ON CREATE SET r.amount = "1000000", r.timestamp = timestamp(); ``` ## Data Retention **Strategy**: - Keep all current relationships - Archive old relationships (older than 1 year) to separate database - Keep aggregated statistics (interaction counts) instead of all relationships ## References - Entity Graph: See `../multichain/entity-graph.md` - CCIP Integration: See `../ccip/ccip-tracking.md`