# Data Models Specification

## Overview

This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-scoped models support multi-chain operation via a `chain_id` field; `users` is the only chain-agnostic table.

## Core Data Models

### Block Schema

**Table**: `blocks`

**Fields**:
```sql
CREATE TABLE blocks (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    parent_hash VARCHAR(66) NOT NULL,
    nonce VARCHAR(18),
    sha3_uncles VARCHAR(66),
    logs_bloom TEXT,
    transactions_root VARCHAR(66),
    state_root VARCHAR(66),
    receipts_root VARCHAR(66),
    miner VARCHAR(42),
    difficulty NUMERIC,
    total_difficulty NUMERIC,
    size BIGINT,
    extra_data TEXT,
    gas_limit BIGINT,
    gas_used BIGINT,
    timestamp TIMESTAMP NOT NULL,
    transaction_count INTEGER DEFAULT 0,
    base_fee_per_gas BIGINT,  -- EIP-1559
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, number),
    UNIQUE(chain_id, hash)
);
```

**Indexes**:
- `idx_blocks_chain_number` ON (chain_id, number)
- `idx_blocks_chain_hash` ON (chain_id, hash)
- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)

**Relationships**:
- One-to-many with `transactions`
- One-to-many with `logs`
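As a rough sketch, the block index list could be written as DDL like this. PostgreSQL already builds an index for every `UNIQUE` constraint, so the first two statements may duplicate indexes that exist implicitly; whether to keep them is a deployment decision this document does not settle. The chain ID in the sample query is only an example value.

```sql
-- Sketch: the Block Schema index list as DDL.
-- IF NOT EXISTS keeps the script safe to re-run.
CREATE INDEX IF NOT EXISTS idx_blocks_chain_number
    ON blocks (chain_id, number);
CREATE INDEX IF NOT EXISTS idx_blocks_chain_hash
    ON blocks (chain_id, hash);
CREATE INDEX IF NOT EXISTS idx_blocks_chain_timestamp
    ON blocks (chain_id, "timestamp");

-- Typical lookup served by (chain_id, number): latest indexed block.
-- chain_id = 138 is an example value.
SELECT number, hash, "timestamp"
FROM blocks
WHERE chain_id = 138
ORDER BY number DESC
LIMIT 1;
```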

### Transaction Schema

**Table**: `transactions`

**Fields**:
```sql
CREATE TABLE transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),  -- NULL for contract creation
    value NUMERIC NOT NULL DEFAULT 0,
    gas_price BIGINT,
    max_fee_per_gas BIGINT,  -- EIP-1559
    max_priority_fee_per_gas BIGINT,  -- EIP-1559
    gas_limit BIGINT NOT NULL,
    gas_used BIGINT,
    nonce BIGINT NOT NULL,
    input_data TEXT,  -- Contract call data
    status INTEGER,  -- 0 = failed, 1 = success
    contract_address VARCHAR(42),  -- NULL if not contract creation
    cumulative_gas_used BIGINT,
    effective_gas_price BIGINT,  -- Actual gas price paid
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
    UNIQUE(chain_id, hash)
);
```

**Indexes**:
- `idx_transactions_chain_hash` ON (chain_id, hash)
- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
- `idx_transactions_chain_from` ON (chain_id, from_address)
- `idx_transactions_chain_to` ON (chain_id, to_address)
- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)

**Relationships**:
- Many-to-one with `blocks`
- One-to-many with `logs`
- One-to-many with `internal_transactions`
- One-to-many with `token_transfers`
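To illustrate the many-to-one relationship with `blocks`, the following sketch lists a sender's recent transactions together with the block time and the fee paid. The address and chain ID are placeholder values. The casts to `NUMERIC` matter because `gas_used * effective_gas_price` can exceed the `BIGINT` range when both columns are multiplied as 64-bit integers.

```sql
-- Sketch: recent transactions for one sender, with block time and fee in wei.
SELECT t.hash,
       t.block_number,
       b."timestamp" AS block_time,
       -- Cast before multiplying to avoid BIGINT overflow.
       t.gas_used::numeric * t.effective_gas_price::numeric AS fee_wei
FROM transactions AS t
JOIN blocks AS b
  ON b.chain_id = t.chain_id
 AND b.number = t.block_number
WHERE t.chain_id = 138                                  -- example chain
  AND t.from_address = '0x' || repeat('ab', 20)         -- placeholder address
ORDER BY t.block_number DESC, t.transaction_index DESC
LIMIT 25;
```

The filter on `(chain_id, from_address)` matches the `idx_transactions_chain_from` index listed above.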

### Receipt Schema

**Note**: Receipt data is stored denormalized in the `transactions` table for efficiency. If separate storage is needed, use the following table:

**Table**: `transaction_receipts`

```sql
CREATE TABLE transaction_receipts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),
    gas_used BIGINT,
    cumulative_gas_used BIGINT,
    contract_address VARCHAR(42),
    logs_bloom TEXT,
    status INTEGER,
    root VARCHAR(66),  -- Pre-Byzantium
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash)
);
```
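When receipts stay denormalized in `transactions`, a view can still expose a receipt-shaped result to consumers. This is only a sketch: the view name `transaction_receipts_view` is hypothetical, and `logs_bloom` and `root` are omitted because the `transactions` table defined above does not carry them.

```sql
-- Sketch: receipt-shaped projection over the denormalized transactions table.
-- The view name is hypothetical.
CREATE OR REPLACE VIEW transaction_receipts_view AS
SELECT chain_id,
       hash AS transaction_hash,
       transaction_index,
       block_number,
       block_hash,
       from_address,
       to_address,
       gas_used,
       cumulative_gas_used,
       contract_address,
       status
FROM transactions;
```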

### Log Schema

**Table**: `logs`

**Fields**:
```sql
CREATE TABLE logs (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    log_index INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    topic0 VARCHAR(66),  -- Event signature
    topic1 VARCHAR(66),  -- First indexed parameter
    topic2 VARCHAR(66),  -- Second indexed parameter
    topic3 VARCHAR(66),  -- Third indexed parameter
    data TEXT,  -- Non-indexed parameters
    decoded_data JSONB,  -- Decoded event data (if ABI available)
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash, log_index)
);
```

**Indexes**:
- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
- `idx_logs_chain_address` ON (chain_id, address)
- `idx_logs_chain_topic0` ON (chain_id, topic0)
- `idx_logs_chain_block` ON (chain_id, block_number)
- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) -- For event filtering

**Relationships**:
- Many-to-one with `transactions`
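The event-filtering index `(chain_id, address, topic0)` is aimed at queries like the one sketched below, which pulls ERC-20 `Transfer` events for a single contract. The `topic0` value is the Keccak-256 hash of `Transfer(address,address,uint256)`; the contract address and chain ID are placeholders. Indexed address parameters are left-padded to 32 bytes in a topic, so the last 40 hex characters hold the address.

```sql
-- Sketch: ERC-20 Transfer events emitted by one contract.
SELECT block_number,
       log_index,
       transaction_hash,
       '0x' || right(topic1, 40) AS from_address,  -- strip 12 bytes of padding
       '0x' || right(topic2, 40) AS to_address,
       data AS raw_amount_hex
FROM logs
WHERE chain_id = 1                                   -- example chain
  AND address = '0x' || repeat('ab', 20)             -- placeholder contract
  AND topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
ORDER BY block_number, log_index;
```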

### Trace Schema

**Table**: `traces`

**Fields**:
```sql
CREATE TABLE traces (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    trace_address INTEGER[],  -- Array representing call hierarchy [0,1,2]
    subtraces INTEGER,  -- Number of child calls
    action_type VARCHAR(20) NOT NULL,  -- 'call', 'create', 'suicide', 'delegatecall'
    action_from VARCHAR(42),
    action_to VARCHAR(42),
    action_value NUMERIC DEFAULT 0,
    action_input TEXT,
    action_gas BIGINT,
    action_call_type VARCHAR(20),  -- 'call', 'delegatecall', 'staticcall'
    result_type VARCHAR(20),  -- 'callresult', 'createresult'
    result_gas_used BIGINT,
    result_output TEXT,
    result_address VARCHAR(42),  -- For create results
    result_code TEXT,  -- For create results
    error TEXT,  -- Error message if trace failed
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
);
```

**Indexes**:
- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
- `idx_traces_chain_block` ON (chain_id, block_number)
- `idx_traces_chain_from` ON (chain_id, action_from)
- `idx_traces_chain_to` ON (chain_id, action_to)

**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.
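The `trace_address` array encodes each call's position in the call tree, so a transaction's traces can be read back in execution order with a plain `ORDER BY`. PostgreSQL compares arrays element by element and sorts a prefix before its extensions, which yields a pre-order walk. The sketch below assumes the top-level call is stored with an empty array rather than `NULL`; the transaction hash is a placeholder.

```sql
-- Sketch: print the call tree of one transaction.
SELECT trace_address,
       cardinality(trace_address) AS depth,          -- 0 for the top-level call
       repeat('  ', cardinality(trace_address)) || action_type AS call_tree,
       action_from,
       action_to,
       action_value,
       error
FROM traces
WHERE chain_id = 138                                   -- example chain
  AND transaction_hash = '0x' || repeat('ab', 32)      -- placeholder hash
ORDER BY trace_address;
```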

### Internal Transaction Schema

**Table**: `internal_transactions`

**Purpose**: Track value transfers that occur within transactions (via calls).

**Fields**:
```sql
CREATE TABLE internal_transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    trace_address INTEGER[] NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    value NUMERIC NOT NULL,
    call_type VARCHAR(20),  -- 'call', 'delegatecall', 'staticcall', 'create'
    gas_limit BIGINT,
    gas_used BIGINT,
    input_data TEXT,
    output_data TEXT,
    error TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
);
```

**Indexes**:
- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
- `idx_internal_tx_chain_from` ON (chain_id, from_address)
- `idx_internal_tx_chain_to` ON (chain_id, to_address)
- `idx_internal_tx_chain_block` ON (chain_id, block_number)

**Relationships**:
- Many-to-one with `transactions`
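One possible way to populate this table is to derive it from `traces`. This document does not say how the rows are produced, so treat the statement below as an assumption-laden sketch: it copies every nested call that moved value, and it assumes the top-level call has an empty `trace_address`. Because the table has no unique constraint, re-running it would insert duplicates.

```sql
-- Sketch (assumed derivation): build internal transactions from traces.
INSERT INTO internal_transactions (
    chain_id, transaction_hash, block_number, trace_address,
    from_address, to_address, value, call_type,
    gas_limit, gas_used, input_data, output_data, error
)
SELECT chain_id,
       transaction_hash,
       block_number,
       trace_address,
       action_from,
       COALESCE(action_to, result_address),      -- created contract for 'create'
       action_value,
       COALESCE(action_call_type, action_type),
       action_gas,
       result_gas_used,
       action_input,
       result_output,
       error
FROM traces
WHERE cardinality(trace_address) > 0             -- skip the top-level call
  AND action_value > 0                           -- value transfers only
  AND action_from IS NOT NULL
  AND COALESCE(action_to, result_address) IS NOT NULL;
```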

### Token Transfer Schema

**Table**: `token_transfers`

**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.

**Fields**:
```sql
CREATE TABLE token_transfers (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL,  -- 'ERC20', 'ERC721', 'ERC1155'
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC,  -- For ERC-20 and ERC-1155
    token_id VARCHAR(78),  -- For ERC-721 and ERC-1155 (can be large)
    operator VARCHAR(42),  -- For ERC-1155
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    -- The tokens table must exist before this constraint can be created
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
);
```

**Indexes**:
- `idx_token_transfers_chain_token` ON (chain_id, token_address)
- `idx_token_transfers_chain_from` ON (chain_id, from_address)
- `idx_token_transfers_chain_to` ON (chain_id, to_address)
- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
- `idx_token_transfers_chain_block` ON (chain_id, block_number)
- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)

**Relationships**:
- Many-to-one with `transactions`
- Many-to-one with `tokens`
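As an illustration of what the transfer table supports, ERC-20 balances can be reconstructed by summing incoming and outgoing amounts per address. The sketch below uses a placeholder token address and assumes the transfer history for that token is complete; mint sources such as the zero address end up with a negative sum and are filtered out.

```sql
-- Sketch: top holders of one ERC-20 token, derived from transfer history.
SELECT address, SUM(delta) AS balance
FROM (
    SELECT to_address AS address, amount AS delta
    FROM token_transfers
    WHERE chain_id = 1
      AND token_address = '0x' || repeat('ab', 20)   -- placeholder token
      AND token_type = 'ERC20'
    UNION ALL
    SELECT from_address AS address, -amount AS delta
    FROM token_transfers
    WHERE chain_id = 1
      AND token_address = '0x' || repeat('ab', 20)
      AND token_type = 'ERC20'
) AS moves
GROUP BY address
HAVING SUM(delta) > 0
ORDER BY balance DESC
LIMIT 10;
```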

### Token Schema

**Table**: `tokens`

**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).

**Fields**:
```sql
CREATE TABLE tokens (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL,  -- 'ERC20', 'ERC721', 'ERC1155'
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER,  -- For ERC-20
    total_supply NUMERIC,
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
);
```

**Indexes**:
- `idx_tokens_chain_address` ON (chain_id, address)
- `idx_tokens_chain_type` ON (chain_id, type)
- `idx_tokens_chain_symbol` ON (chain_id, symbol) -- For search

**Relationships**:
- One-to-many with `token_transfers`
- One-to-many with `token_holders` (if maintained)
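The `decimals` column is what turns a raw integer `amount` into a display value. A minimal sketch, assuming raw amounts are stored as unscaled integers and treating a missing `decimals` value as 0:

```sql
-- Sketch: recent ERC-20 transfers with human-readable amounts.
SELECT tt.transaction_hash,
       tk.symbol,
       tt.amount AS raw_amount,
       tt.amount / power(10::numeric, COALESCE(tk.decimals, 0)::numeric) AS display_amount
FROM token_transfers AS tt
JOIN tokens AS tk
  ON tk.chain_id = tt.chain_id
 AND tk.address = tt.token_address
WHERE tt.chain_id = 1          -- example chain
  AND tk.type = 'ERC20'
ORDER BY tt.block_number DESC, tt.log_index DESC
LIMIT 20;
```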

### Contract Metadata Schema

**Table**: `contracts`

**Purpose**: Store verified contract information.

**Fields**:
```sql
CREATE TABLE contracts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL,  -- 'pending', 'verified', 'failed'
    verified_at TIMESTAMP,
    verification_method VARCHAR(50),  -- 'standard_json', 'sourcify', 'multi_file'
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
);
```

**Indexes**:
- `idx_contracts_chain_address` ON (chain_id, address)
- `idx_contracts_chain_verified` ON (chain_id, verification_status)

**Relationships**:
- One-to-one with `contract_abis` (if separate ABI storage)

### Contract ABI Schema

**Table**: `contract_abis`

**Purpose**: Store contract ABIs for decoding (can be separate from verification).

**Fields**:
```sql
CREATE TABLE contract_abis (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL,  -- 'verification', 'sourcify', 'public', 'user_submitted'
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
);
```

**Indexes**:
- `idx_abis_chain_address` ON (chain_id, address)
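Storing the ABI as `JSONB` lets the database unpack it directly. The sketch below lists the events a contract declares; it assumes `abi` holds the standard JSON ABI array (one object per function or event), since `jsonb_array_elements` raises an error on non-array values. The address is a placeholder.

```sql
-- Sketch: list the events declared in a stored ABI.
SELECT a.address,
       entry->>'name' AS event_name,
       jsonb_array_length(entry->'inputs') AS input_count
FROM contract_abis AS a
CROSS JOIN LATERAL jsonb_array_elements(a.abi) AS entry
WHERE a.chain_id = 1                                 -- example chain
  AND a.address = '0x' || repeat('ab', 20)           -- placeholder contract
  AND entry->>'type' = 'event';
```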

## Address-Related Models

### Address Labels Schema

**Table**: `address_labels`

**Purpose**: User-defined and public labels for addresses.

**Fields**:
```sql
CREATE TABLE address_labels (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL,  -- 'user', 'public', 'contract_name'
    user_id UUID,  -- NULL for public labels
    source VARCHAR(50),  -- 'user', 'etherscan', 'blockscout', etc.
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, label_type, user_id)
);
```

**Indexes**:
- `idx_labels_chain_address` ON (chain_id, address)
- `idx_labels_chain_user` ON (chain_id, user_id)
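One caveat with the unique constraint on this table: PostgreSQL treats `NULL` values as distinct by default, so rows with `user_id IS NULL` (public labels) never conflict with each other. If one public label per address and type is the intent, which this document does not state, a partial unique index is one way to enforce it. The index name is hypothetical.

```sql
-- Sketch: enforce at most one public label per (chain, address, type).
-- Assumes that is the intended rule; the index name is hypothetical.
CREATE UNIQUE INDEX IF NOT EXISTS uq_labels_public
    ON address_labels (chain_id, address, label_type)
    WHERE user_id IS NULL;
```

The same consideration applies to `address_tags`, whose unique constraint also includes a nullable `user_id`.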

### Address Tags Schema

**Table**: `address_tags`

**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").

**Fields**:
```sql
CREATE TABLE address_tags (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL,  -- 'category', 'risk', 'protocol'
    user_id UUID,  -- NULL for public tags
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, tag, user_id)
);
```

**Indexes**:
- `idx_tags_chain_address` ON (chain_id, address)
- `idx_tags_chain_tag` ON (chain_id, tag)

## User-Related Models

### User Accounts Schema

**Table**: `users`

**Purpose**: User accounts for watchlists, alerts, preferences.

**Fields**:
```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT,  -- If using password auth
    api_key_hash TEXT,  -- Hashed API key
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
);
```

**Indexes**:
- `idx_users_email` ON (email)
- `idx_users_username` ON (username)

### Watchlists Schema

**Table**: `watchlists`

**Purpose**: User-defined lists of addresses to monitor.

**Fields**:
```sql
CREATE TABLE watchlists (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(user_id, chain_id, address)
);
```

**Indexes**:
- `idx_watchlists_user` ON (user_id)
- `idx_watchlists_chain_address` ON (chain_id, address)
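A watchlist becomes useful once it is joined against chain activity. The following sketch shows recent transactions touching any address on one user's watchlist; the UUID is a placeholder. The `OR` in the join keeps the example short, though splitting it into two queries combined with `UNION ALL` may let the planner use the `from` and `to` indexes more effectively on large tables.

```sql
-- Sketch: recent activity for every address on a user's watchlist.
SELECT w.label,
       w.address AS watched_address,
       t.hash,
       t.block_number,
       t.from_address,
       t.to_address,
       t.value
FROM watchlists AS w
JOIN transactions AS t
  ON t.chain_id = w.chain_id
 AND (t.from_address = w.address OR t.to_address = w.address)
WHERE w.user_id = '00000000-0000-0000-0000-000000000000'::uuid  -- placeholder user
ORDER BY t.block_number DESC, t.transaction_index DESC
LIMIT 50;
```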

## Data Type Definitions

### Numeric Types

- **BIGINT**: Used for block numbers, gas values, nonces (64-bit integers)
- **NUMERIC**: Used for token amounts, ETH values (arbitrary precision decimals)
  - Precision: 78 digits (sufficient for any `uint256` value, including wei amounts)
  - Scale: 0 (integers) or configurable for token decimals
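The 78-digit precision comes from the largest `uint256` value, 2^256 - 1, which has 78 decimal digits. The schemas above declare these columns as unconstrained `NUMERIC`; the sketch below shows what an explicit `NUMERIC(78, 0)` declaration would look like, using a throwaway table whose name is hypothetical.

```sql
-- Sketch: NUMERIC(78, 0) holds the full uint256 range as an integer.
CREATE TEMP TABLE wei_demo (
    amount NUMERIC(78, 0) NOT NULL CHECK (amount >= 0)
);

-- Largest uint256 value (2^256 - 1).
INSERT INTO wei_demo (amount)
VALUES (115792089237316195423570985008687907853269984665640564039457584007913129639935);

SELECT amount,
       length(amount::text) AS digit_count
FROM wei_demo;
```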

### Address Types

- **VARCHAR(42)**: Ethereum addresses (0x + 40 hex chars)
- Normalize to lowercase for consistency
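Lowercase normalization can be applied on write with `lower()` and, if desired, enforced by the database. The sketch below adds a format check to one table as an example; the constraint name is hypothetical, and adding it would fail if existing rows still hold mixed-case addresses.

```sql
-- Sketch: a mixed-case address passes the format check once lowercased.
SELECT lower('0xDe0B295669a9FD93d5F28D9Ec85E40f4cb697BAe') ~ '^0x[0-9a-f]{40}$'
       AS is_normalized;

-- Optional enforcement on one table (constraint name is hypothetical).
ALTER TABLE watchlists
    ADD CONSTRAINT chk_watchlists_address_format
    CHECK (address ~ '^0x[0-9a-f]{40}$');
```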

### Hash Types

- **VARCHAR(66)**: Transaction/block hashes (0x + 64 hex chars)
- **TEXT**: For variable-length hex data such as `logs_bloom`, `extra_data`, and `input_data`

### JSONB Types

- Used for: ABIs, decoded event data, complex nested structures
- Benefits: GIN indexing, querying with JSON operators, efficient binary storage

## Multi-Chain Considerations

### Chain ID Partitioning

All chain-scoped tables include `chain_id`, in most cases as the first column after the primary key (`watchlists` places it after `user_id`, and `users` is chain-agnostic):
- Enables efficient partitioning by `chain_id`
- Ensures data isolation between chains
- Simplifies multi-chain queries

### Partitioning Strategy

**Recommended**: Partition large tables by `chain_id`:
- `blocks`, `transactions`, `logs` partitioned by `chain_id`
- Benefits: Faster queries, easier maintenance, parallel processing

**Implementation** (PostgreSQL):
```sql
-- Example partitioning
CREATE TABLE blocks (
    -- columns
    -- Note: on a partitioned table, every PRIMARY KEY and UNIQUE constraint
    -- must include the partition key (chain_id).
) PARTITION BY LIST (chain_id);

CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
```

## Data Consistency

### Foreign Key Constraints

- Enforce referential integrity where possible
- Consider the performance impact on high-throughput inserts
- Constraints may be dropped for the initial backfill and re-added after catch-up
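PostgreSQL has no switch to pause a foreign key, so the usual pattern is to drop the constraint before a bulk load and add it back afterwards. The sketch below uses `logs` as an example and a hypothetical constraint name, `fk_logs_transaction`. Adding the constraint as `NOT VALID` and validating it in a second step keeps the heavier lock short; support for this on partitioned tables depends on the PostgreSQL version, so check before relying on it.

```sql
-- Sketch: drop a foreign key for backfill, then restore and validate it.
-- fk_logs_transaction is a hypothetical constraint name.
ALTER TABLE logs DROP CONSTRAINT IF EXISTS fk_logs_transaction;

-- ... run the bulk backfill here ...

-- Re-add without scanning existing rows; new writes are checked immediately.
ALTER TABLE logs
    ADD CONSTRAINT fk_logs_transaction
    FOREIGN KEY (chain_id, transaction_hash)
    REFERENCES transactions (chain_id, hash)
    NOT VALID;

-- Check the rows loaded during backfill; fails if any are orphaned.
ALTER TABLE logs VALIDATE CONSTRAINT fk_logs_transaction;
```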

### Unique Constraints

- Prevent duplicate blocks, transactions, logs
- Enable idempotent processing
- Use `ON CONFLICT` for upserts
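A minimal sketch of an idempotent block write, relying on the `UNIQUE(chain_id, number)` constraint from the block schema. The values are placeholders. Re-processing the same block number overwrites the stored row instead of failing, which also covers the case where a reorg replaces a block at the same height.

```sql
-- Sketch: upsert a block keyed on (chain_id, number).
INSERT INTO blocks (chain_id, number, hash, parent_hash, "timestamp", transaction_count)
VALUES (
    138,                              -- example chain
    1000,                             -- example block number
    '0x' || repeat('11', 32),         -- placeholder block hash
    '0x' || repeat('22', 32),         -- placeholder parent hash
    TIMESTAMP '2024-01-01 00:00:00',
    0
)
ON CONFLICT (chain_id, number) DO UPDATE
SET hash = EXCLUDED.hash,
    parent_hash = EXCLUDED.parent_hash,
    "timestamp" = EXCLUDED."timestamp",
    transaction_count = EXCLUDED.transaction_count,
    updated_at = NOW();
```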

## Indexing Strategy

### Index Types

1. **B-tree**: Default for most indexes (equality, range queries)
2. **Hash**: For exact match lookups on a single column (addresses, hashes); equality only
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For large, naturally ordered columns (block numbers, timestamps)
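For reference, the non-default index types are selected with `USING`. The index names and column choices below are illustrative and not part of the index lists defined earlier in this document.

```sql
-- Sketch: one index per non-default type (names are hypothetical).

-- Hash: equality lookups only, single column, cannot be UNIQUE.
CREATE INDEX IF NOT EXISTS idx_transactions_hash_hash
    ON transactions USING hash (hash);

-- GIN: containment and key-existence queries on JSONB.
CREATE INDEX IF NOT EXISTS idx_logs_decoded_gin
    ON logs USING gin (decoded_data);

-- BRIN: very small index for columns whose values follow insertion order.
CREATE INDEX IF NOT EXISTS idx_blocks_number_brin
    ON blocks USING brin (number);
```

BRIN pays off only while rows are physically stored in roughly ascending order, which holds for append-only ingestion but can degrade after heavy updates or out-of-order backfills.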

### Index Maintenance

- Regular `VACUUM` and `ANALYZE`
- Monitor index bloat
- Consider partial indexes for filtered queries
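The maintenance points above might look like this in practice. The partial index name is hypothetical, and indexing only failed transactions is just one example of a filtered query worth covering. Note that `VACUUM` cannot run inside a transaction block.

```sql
-- Sketch: partial index covering only failed transactions (status = 0).
CREATE INDEX IF NOT EXISTS idx_transactions_failed
    ON transactions (chain_id, block_number)
    WHERE status = 0;

-- Reclaim dead tuples and refresh planner statistics in one pass.
VACUUM (ANALYZE) transactions;

-- Index size and usage per index; large, rarely scanned indexes
-- are candidates for a bloat check or removal.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'transactions'
ORDER BY pg_relation_size(indexrelid) DESC;
```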

## References

- Indexer Architecture: See `indexer-architecture.md`
- Database Schema: See `../database/postgres-schema.md`
- Search Index Schema: See `../database/search-index-schema.md`