Files
explorer-monorepo/docs/specs/database/graph-schema.md

301 lines
5.8 KiB
Markdown

# Graph Database Schema Specification
## Overview
This document specifies the Neo4j graph database schema for storing cross-chain entity relationships, address clustering, and protocol interactions.
## Schema Design
### Node Types
#### Address Node
**Labels**: `Address`, `Chain{chain_id}` (e.g., `Chain138`)
**Properties**:
```cypher
{
address: "0x...", // Unique identifier
chainId: 138, // Chain ID
label: "My Wallet", // Optional label
isContract: false, // Is contract address
firstSeen: timestamp, // First seen timestamp
lastSeen: timestamp, // Last seen timestamp
transactionCount: 100, // Transaction count
balance: "1.5" // Current balance (string for precision)
}
```
**Constraints**:
```cypher
CREATE CONSTRAINT address_address_chain_id FOR (a:Address)
REQUIRE (a.address, a.chainId) IS UNIQUE;
```
#### Contract Node
**Labels**: `Contract`, `Address`
**Properties**: Inherits from Address, plus:
```cypher
{
name: "MyToken",
verificationStatus: "verified",
compilerVersion: "0.8.19"
}
```
#### Token Node
**Labels**: `Token`, `Contract`
**Properties**: Inherits from Contract, plus:
```cypher
{
symbol: "MTK",
decimals: 18,
totalSupply: "1000000",
type: "ERC20" // ERC20, ERC721, ERC1155
}
```
#### Protocol Node
**Labels**: `Protocol`
**Properties**:
```cypher
{
name: "Uniswap V3",
category: "DEX",
website: "https://uniswap.org"
}
```
### Relationship Types
#### TRANSFERRED_TO
**Purpose**: Track token transfers between addresses.
**Properties**:
```cypher
{
amount: "1000000000000000000",
tokenAddress: "0x...",
transactionHash: "0x...",
blockNumber: 12345,
timestamp: timestamp
}
```
**Example**:
```cypher
(a1:Address {address: "0x..."})-[r:TRANSFERRED_TO {
amount: "1000000000000000000",
tokenAddress: "0x...",
transactionHash: "0x..."
}]->(a2:Address {address: "0x..."})
```
#### CALLED
**Purpose**: Track contract calls between addresses.
**Properties**:
```cypher
{
transactionHash: "0x...",
blockNumber: 12345,
timestamp: timestamp,
gasUsed: 21000,
method: "transfer"
}
```
#### OWNS
**Purpose**: Track token ownership (current balances).
**Properties**:
```cypher
{
balance: "1000000000000000000",
tokenId: "123", // For ERC-721/1155
updatedAt: timestamp
}
```
**Example**:
```cypher
(a:Address)-[r:OWNS {
balance: "1000000000000000000",
updatedAt: timestamp
}]->(t:Token)
```
#### INTERACTS_WITH
**Purpose**: Track protocol interactions.
**Properties**:
```cypher
{
interactionType: "swap", // swap, deposit, withdraw, etc.
transactionHash: "0x...",
timestamp: timestamp
}
```
**Example**:
```cypher
(a:Address)-[r:INTERACTS_WITH {
interactionType: "swap",
transactionHash: "0x..."
}]->(p:Protocol)
```
#### CLUSTERED_WITH
**Purpose**: Link addresses that belong to the same entity (address clustering).
**Properties**:
```cypher
{
confidence: 0.95, // Clustering confidence score
method: "heuristic", // Clustering method
createdAt: timestamp
}
```
#### CCIP_MESSAGE_LINK
**Purpose**: Link transactions across chains via CCIP messages.
**Properties**:
```cypher
{
messageId: "0x...",
sourceTxHash: "0x...",
destTxHash: "0x...",
status: "delivered",
timestamp: timestamp
}
```
**Example**:
```cypher
(srcTx:Transaction)-[r:CCIP_MESSAGE_LINK {
messageId: "0x...",
status: "delivered"
}]->(destTx:Transaction)
```
## Query Patterns
### Find Token Holders
```cypher
MATCH (t:Token {address: "0x...", chainId: 138})-[r:OWNS]-(a:Address)
WHERE r.balance > "0"
RETURN a.address, r.balance
ORDER BY toFloat(r.balance) DESC
LIMIT 100;
```
### Find Transfer Path
```cypher
MATCH path = (a1:Address {address: "0x..."})-[:TRANSFERRED_TO*1..3]-(a2:Address {address: "0x..."})
WHERE ALL(r in relationships(path) WHERE r.tokenAddress = "0x...")
RETURN path
LIMIT 10;
```
### Find Protocol Users
```cypher
MATCH (a:Address)-[r:INTERACTS_WITH]->(p:Protocol {name: "Uniswap V3"})
RETURN a.address, count(r) as interactionCount
ORDER BY interactionCount DESC
LIMIT 100;
```
### Address Clustering
```cypher
MATCH (a1:Address)-[r:CLUSTERED_WITH]-(a2:Address)
WHERE a1.address = "0x..."
RETURN a2.address, r.confidence, r.method;
```
### Cross-Chain CCIP Links
```cypher
MATCH (srcTx:Transaction {hash: "0x..."})-[r:CCIP_MESSAGE_LINK]-(destTx:Transaction)
RETURN srcTx, r, destTx;
```
## Data Ingestion
### Transaction Ingestion
**Process**:
1. Process transaction from indexer
2. Create/update address nodes
3. Create TRANSFERRED_TO relationships for token transfers
4. Create CALLED relationships for contract calls
5. Update OWNS relationships for token balances
### Batch Ingestion
**Strategy**:
- Use Neo4j Batch API for bulk inserts
- Batch size: 1000-10000 operations
- Use transactions for atomicity
### Incremental Updates
**Process**:
- Update relationships as new transactions processed
- Maintain OWNS relationships (update balances)
- Add new relationships for new interactions
## Performance Optimization
### Indexing
**Indexes**:
```cypher
CREATE INDEX address_address FOR (a:Address) ON (a.address);
CREATE INDEX address_chain_id FOR (a:Address) ON (a.chainId);
CREATE INDEX transaction_hash FOR (t:Transaction) ON (t.hash);
```
### Relationship Constraints
**Uniqueness**: Use MERGE to avoid duplicate relationships
**Example**:
```cypher
MATCH (a1:Address {address: "0x...", chainId: 138})
MATCH (a2:Address {address: "0x...", chainId: 138})
MERGE (a1)-[r:TRANSFERRED_TO {
transactionHash: "0x..."
}]->(a2)
ON CREATE SET r.amount = "1000000", r.timestamp = timestamp();
```
## Data Retention
**Strategy**:
- Keep all current relationships
- Archive old relationships (older than 1 year) to separate database
- Keep aggregated statistics (interaction counts) instead of all relationships
## References
- Entity Graph: See `../multichain/entity-graph.md`
- CCIP Integration: See `../ccip/ccip-tracking.md`