# Data Models Specification
## Overview
This document specifies the data models used throughout the indexing pipeline and stored in the database. Every chain-scoped model supports multi-chain operation via a `chain_id` field; account-level tables such as `users` are shared across chains.
## Core Data Models
### Block Schema
**Table**: `blocks`
**Fields**:
```sql
CREATE TABLE blocks (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
number BIGINT NOT NULL,
hash VARCHAR(66) NOT NULL,
parent_hash VARCHAR(66) NOT NULL,
nonce VARCHAR(18),
sha3_uncles VARCHAR(66),
logs_bloom TEXT,
transactions_root VARCHAR(66),
state_root VARCHAR(66),
receipts_root VARCHAR(66),
miner VARCHAR(42),
difficulty NUMERIC,
total_difficulty NUMERIC,
size BIGINT,
extra_data TEXT,
gas_limit BIGINT,
gas_used BIGINT,
timestamp TIMESTAMP NOT NULL,
transaction_count INTEGER DEFAULT 0,
base_fee_per_gas BIGINT, -- EIP-1559
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, number),
UNIQUE(chain_id, hash)
)
```
**Indexes**:
- `idx_blocks_chain_number` ON (chain_id, number)
- `idx_blocks_chain_hash` ON (chain_id, hash)
- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)
**Relationships**:
- One-to-many with `transactions`
- One-to-many with `logs`
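Because every block row stores both `hash` and `parent_hash`, chain continuity can be checked with a self-join. The query below is a minimal sketch with hypothetical values (chain ID 138, as in the partitioning example, and an arbitrary block window). It lists stored blocks whose `parent_hash` does not match the stored hash of the previous height, which usually means a reorg has not been fully reconciled.

```sql
-- Find stored blocks that do not link to the stored block at number - 1.
-- A mismatch usually means a reorg replaced one of the two blocks.
SELECT b.number,
       b.parent_hash,
       p.hash AS stored_parent_hash
FROM blocks AS b
JOIN blocks AS p
  ON p.chain_id = b.chain_id
 AND p.number   = b.number - 1
WHERE b.chain_id = 138
  AND b.number BETWEEN 999000 AND 1000000  -- hypothetical recent window
  AND b.parent_hash <> p.hash
ORDER BY b.number;
```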
### Transaction Schema
**Table**: `transactions`
**Fields**:
```sql
CREATE TABLE transactions (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
transaction_index INTEGER NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42), -- NULL for contract creation
value NUMERIC NOT NULL DEFAULT 0,
gas_price BIGINT,
max_fee_per_gas BIGINT, -- EIP-1559
max_priority_fee_per_gas BIGINT, -- EIP-1559
gas_limit BIGINT NOT NULL,
gas_used BIGINT,
nonce BIGINT NOT NULL,
input_data TEXT, -- Contract call data
status INTEGER, -- 0 = failed, 1 = success
contract_address VARCHAR(42), -- NULL if not contract creation
cumulative_gas_used BIGINT,
effective_gas_price BIGINT, -- Actual gas price paid
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
UNIQUE(chain_id, hash)
)
```
**Indexes**:
- `idx_transactions_chain_hash` ON (chain_id, hash)
- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
- `idx_transactions_chain_from` ON (chain_id, from_address)
- `idx_transactions_chain_to` ON (chain_id, to_address)
- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)
**Relationships**:
- Many-to-one with `blocks`
- One-to-many with `logs`
- One-to-many with `internal_transactions`
- One-to-many with `token_transfers`
- One-to-many with `traces`
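A typical explorer query built on these indexes is an address's transaction history. A minimal sketch, assuming addresses are stored in lowercase (see Data Type Definitions) and using a hypothetical address and chain ID:

```sql
-- Latest 50 transactions sent or received by one address.
-- Each UNION branch can use its own index
-- (idx_transactions_chain_from / idx_transactions_chain_to);
-- UNION also drops the duplicate row produced by a self-transfer.
SELECT hash, block_number, transaction_index,
       from_address, to_address, value, status
FROM transactions
WHERE chain_id = 138
  AND from_address = ('0x' || repeat('1', 40))  -- hypothetical address
UNION
SELECT hash, block_number, transaction_index,
       from_address, to_address, value, status
FROM transactions
WHERE chain_id = 138
  AND to_address = ('0x' || repeat('1', 40))
ORDER BY block_number DESC, transaction_index DESC
LIMIT 50;
```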
### Receipt Schema
**Note**: By default, receipt data is stored denormalized in the `transactions` table, which avoids a join on the common transaction-detail query. If receipts must be stored separately, use:
**Table**: `transaction_receipts`
```sql
CREATE TABLE transaction_receipts (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
transaction_index INTEGER NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42),
gas_used BIGINT,
cumulative_gas_used BIGINT,
contract_address VARCHAR(42),
logs_bloom TEXT,
status INTEGER,
root VARCHAR(66), -- Pre-Byzantium
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
UNIQUE(chain_id, transaction_hash)
)
```
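Since receipt fields live in `transactions` by default, they can be sanity-checked against block headers. In Ethereum, the `cumulative_gas_used` of a block's last receipt equals the block's `gas_used`, so this minimal sketch (chain ID and block range are hypothetical) lists blocks where the stored values disagree, for example after a partially failed ingest.

```sql
-- Blocks whose header gas_used disagrees with the last receipt's cumulative gas.
-- cumulative_gas_used only grows within a block, so MAX() picks the last receipt.
-- Blocks with no stored transactions are skipped by the inner join.
SELECT b.number,
       b.gas_used,
       MAX(t.cumulative_gas_used) AS last_cumulative_gas_used
FROM blocks AS b
JOIN transactions AS t
  ON t.chain_id     = b.chain_id
 AND t.block_number = b.number
WHERE b.chain_id = 138
  AND b.number BETWEEN 999000 AND 1000000  -- hypothetical range
GROUP BY b.number, b.gas_used
HAVING MAX(t.cumulative_gas_used) IS DISTINCT FROM b.gas_used;
```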
### Log Schema
**Table**: `logs`
**Fields**:
```sql
CREATE TABLE logs (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
log_index INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
topic0 VARCHAR(66), -- Event signature
topic1 VARCHAR(66), -- First indexed parameter
topic2 VARCHAR(66), -- Second indexed parameter
topic3 VARCHAR(66), -- Third indexed parameter
data TEXT, -- Non-indexed parameters
decoded_data JSONB, -- Decoded event data (if ABI available)
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
UNIQUE(chain_id, transaction_hash, log_index)
)
```
**Indexes**:
- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
- `idx_logs_chain_address` ON (chain_id, address)
- `idx_logs_chain_topic0` ON (chain_id, topic0)
- `idx_logs_chain_block` ON (chain_id, block_number)
- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) -- For event filtering
**Relationships**:
- Many-to-one with `transactions`
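The topic columns turn event filtering into plain equality lookups. As an illustration, the ERC-20 `Transfer(address,address,uint256)` event has `topic0` equal to the Keccak-256 hash of that signature, and indexed addresses are left-padded to 32 bytes, so the address is the last 40 hex characters of the topic. This sketch assumes lowercase hex storage and uses a hypothetical token address; the filter matches `idx_logs_chain_address_topic0`.

```sql
-- Raw ERC-20 Transfer logs for one token contract.
SELECT transaction_hash,
       block_number,
       log_index,
       '0x' || right(topic1, 40) AS from_address,  -- drop the 24 hex chars of zero padding
       '0x' || right(topic2, 40) AS to_address,
       data AS raw_amount_hex                        -- uint256 amount, hex-encoded
FROM logs
WHERE chain_id = 138
  AND address = ('0x' || repeat('2', 40))  -- hypothetical token contract
  AND topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
  AND topic3 IS NULL  -- ERC-721 Transfer also indexes tokenId in topic3
ORDER BY block_number, log_index;
```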
### Trace Schema
**Table**: `traces`
**Fields**:
```sql
CREATE TABLE traces (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
subtraces INTEGER, -- Number of child calls
action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide', 'reward'
action_from VARCHAR(42),
action_to VARCHAR(42),
action_value NUMERIC DEFAULT 0,
action_input TEXT,
action_gas BIGINT,
action_call_type VARCHAR(20), -- 'call', 'callcode', 'delegatecall', 'staticcall'
result_type VARCHAR(20), -- 'callresult', 'createresult'
result_gas_used BIGINT,
result_output TEXT,
result_address VARCHAR(42), -- For create results
result_code TEXT, -- For create results
error TEXT, -- Error message if trace failed
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```
**Indexes**:
- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
- `idx_traces_chain_block` ON (chain_id, block_number)
- `idx_traces_chain_from` ON (chain_id, action_from)
- `idx_traces_chain_to` ON (chain_id, action_to)
**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.
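PostgreSQL compares arrays element by element, and a shorter array sorts before a longer one with the same prefix. Ordering by `trace_address` therefore returns a transaction's traces in depth-first call order: `{}`, `{0}`, `{0,0}`, `{0,1}`, `{1}`. A minimal sketch with a hypothetical transaction hash:

```sql
-- Call tree of one transaction in depth-first order.
-- cardinality() gives the nesting depth: 0 for the top-level call.
SELECT trace_address,
       cardinality(trace_address) AS depth,
       action_type,
       action_call_type,
       action_from,
       action_to,
       action_value,
       error
FROM traces
WHERE chain_id = 138
  AND transaction_hash = ('0x' || repeat('a', 64))  -- hypothetical tx hash
ORDER BY trace_address;
```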
### Internal Transaction Schema
**Table**: `internal_transactions`
**Purpose**: Track value transfers that occur within transactions (via calls).
**Fields**:
```sql
CREATE TABLE internal_transactions (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
trace_address INTEGER[] NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42) NOT NULL,
value NUMERIC NOT NULL,
call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
gas_limit BIGINT,
gas_used BIGINT,
input_data TEXT,
output_data TEXT,
error TEXT,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```
**Indexes**:
- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
- `idx_internal_tx_chain_from` ON (chain_id, from_address)
- `idx_internal_tx_chain_to` ON (chain_id, to_address)
- `idx_internal_tx_chain_block` ON (chain_id, block_number)
**Relationships**:
- Many-to-one with `transactions`
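One way to populate this table is to derive it from `traces`. The spec does not say where rows come from, so this is a sketch under that assumption. It keeps only nested calls that move value (a non-empty `trace_address`, since the top-level call is the transaction itself) and uses a hypothetical block number.

```sql
BEGIN;

-- Clear any earlier run for this block so the job can be re-run safely.
DELETE FROM internal_transactions
WHERE chain_id = 138 AND block_number = 1000000;   -- hypothetical block

INSERT INTO internal_transactions (
    chain_id, transaction_hash, block_number, trace_address,
    from_address, to_address, value, call_type,
    gas_limit, gas_used, input_data, output_data, error
)
SELECT chain_id, transaction_hash, block_number, trace_address,
       action_from,
       COALESCE(action_to, result_address),         -- creates have no action_to
       action_value,
       CASE WHEN action_type = 'create' THEN 'create' ELSE action_call_type END,
       action_gas, result_gas_used, action_input, result_output, error
FROM traces
WHERE chain_id = 138
  AND block_number = 1000000
  AND cardinality(trace_address) > 0               -- skip the top-level call
  AND action_value > 0                             -- value transfers only
  AND action_from IS NOT NULL
  AND COALESCE(action_to, result_address) IS NOT NULL;

COMMIT;
```

Wrapping the delete and insert in one transaction matters here because `internal_transactions` has no unique constraint that would reject duplicate rows.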
### Token Transfer Schema
**Table**: `token_transfers`
**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.
**Fields**:
```sql
CREATE TABLE token_transfers (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
log_index INTEGER NOT NULL,
token_address VARCHAR(42) NOT NULL,
token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42) NOT NULL,
amount NUMERIC, -- For ERC-20 and ERC-1155
token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (uint256: up to 78 decimal digits)
operator VARCHAR(42), -- For ERC-1155
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
)
```
**Indexes**:
- `idx_token_transfers_chain_token` ON (chain_id, token_address)
- `idx_token_transfers_chain_from` ON (chain_id, from_address)
- `idx_token_transfers_chain_to` ON (chain_id, to_address)
- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
- `idx_token_transfers_chain_block` ON (chain_id, block_number)
- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)
**Relationships**:
- Many-to-one with `transactions`
- Many-to-one with `tokens`
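Because `amount` is an integer in the token's smallest unit, a holder's ERC-20 balance can be reconstructed by netting incoming against outgoing transfers. This minimal sketch uses a hypothetical holder address. It cannot see balance changes that emit no `Transfer` event (for example, rebasing tokens), and it scans every transfer involving the address, so it suits occasional lookups rather than hot paths.

```sql
-- Net ERC-20 balance per token for one address, summed from transfers.
-- A self-transfer adds and subtracts the same amount, netting to zero.
SELECT token_address,
       SUM(CASE WHEN to_address   = addr THEN amount ELSE 0 END)
     - SUM(CASE WHEN from_address = addr THEN amount ELSE 0 END) AS balance
FROM token_transfers,
     (SELECT '0x' || repeat('1', 40) AS addr) AS params  -- hypothetical holder
WHERE chain_id = 138
  AND token_type = 'ERC20'
  AND (from_address = addr OR to_address = addr)
GROUP BY token_address;
```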
### Token Schema
**Table**: `tokens`
**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).
**Fields**:
```sql
CREATE TABLE tokens (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
name VARCHAR(255),
symbol VARCHAR(50),
decimals INTEGER, -- For ERC-20
total_supply NUMERIC,
holder_count INTEGER DEFAULT 0,
transfer_count INTEGER DEFAULT 0,
logo_url TEXT,
website_url TEXT,
description TEXT,
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_tokens_chain_address` ON (chain_id, address)
- `idx_tokens_chain_type` ON (chain_id, type)
- `idx_tokens_chain_symbol` ON (chain_id, symbol) -- For search
**Relationships**:
- One-to-many with `token_transfers`
- One-to-many with `token_holders` (if maintained)
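`holder_count` and `transfer_count` are denormalized counters. One simple, idempotent way to refresh `transfer_count` is to recompute it from `token_transfers` rather than increment it per batch, so re-running a job after a retry or reorg cannot double-count. A sketch for one chain (chain ID hypothetical):

```sql
-- Recompute transfer_count from scratch; safe to re-run.
-- Tokens with no transfers are not touched and keep their current value.
UPDATE tokens AS t
SET transfer_count = c.n,
    updated_at     = NOW()
FROM (
    SELECT chain_id, token_address, COUNT(*) AS n
    FROM token_transfers
    WHERE chain_id = 138
    GROUP BY chain_id, token_address
) AS c
WHERE t.chain_id = c.chain_id
  AND t.address  = c.token_address;
```

The trade-off is a full scan of the chain's transfers on every run; incremental updates are cheaper but have to be made reorg-aware.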
### Contract Metadata Schema
**Table**: `contracts`
**Purpose**: Store verified contract information.
**Fields**:
```sql
CREATE TABLE contracts (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
name VARCHAR(255),
compiler_version VARCHAR(50),
optimization_enabled BOOLEAN,
optimization_runs INTEGER,
evm_version VARCHAR(20),
source_code TEXT,
abi JSONB,
constructor_arguments TEXT,
verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
verified_at TIMESTAMP,
verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
license VARCHAR(50),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_contracts_chain_address` ON (chain_id, address)
- `idx_contracts_chain_verified` ON (chain_id, verification_status)
**Relationships**:
- One-to-one with `contract_abis` (if separate ABI storage)
### Contract ABI Schema
**Table**: `contract_abis`
**Purpose**: Store contract ABIs for decoding (can be separate from verification).
**Fields**:
```sql
CREATE TABLE contract_abis (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
abi JSONB NOT NULL,
source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_abis_chain_address` ON (chain_id, address)
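A Solidity ABI is a JSON array of function, event, and error entries. Assuming `abi` stores that array unchanged, a decoder can pull a contract's event definitions with `jsonb_array_elements`. The contract address below is hypothetical.

```sql
-- List event definitions from a stored ABI (input to event decoding).
SELECT e ->> 'name'  AS event_name,
       e -> 'inputs' AS inputs,   -- each event input has name, type, indexed
       COALESCE((e ->> 'anonymous')::boolean, false) AS anonymous
FROM contract_abis AS a
CROSS JOIN LATERAL jsonb_array_elements(a.abi) AS e
WHERE a.chain_id = 138
  AND a.address = ('0x' || repeat('2', 40))  -- hypothetical contract
  AND e ->> 'type' = 'event';
```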
## Address-Related Models
### Address Labels Schema
**Table**: `address_labels`
**Purpose**: User-defined and public labels for addresses.
**Fields**:
```sql
CREATE TABLE address_labels (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255) NOT NULL,
label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
user_id UUID, -- NULL for public labels
source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address, label_type, user_id)
)
```
**Indexes**:
- `idx_labels_chain_address` ON (chain_id, address)
- `idx_labels_chain_user` ON (chain_id, user_id)
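Public labels have `user_id IS NULL`, so the labels a signed-in user sees are the public ones plus their own. A minimal sketch with a hypothetical address and user ID:

```sql
-- Labels shown to one user: public labels plus that user's private labels.
SELECT label,
       label_type,
       source,
       user_id IS NOT NULL AS is_private
FROM address_labels
WHERE chain_id = 138
  AND address = ('0x' || repeat('1', 40))                          -- hypothetical address
  AND (user_id IS NULL
       OR user_id = '00000000-0000-0000-0000-000000000001'::uuid)  -- hypothetical user
ORDER BY is_private DESC, label;                                   -- user's own labels first
```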
### Address Tags Schema
**Table**: `address_tags`
**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").
**Fields**:
```sql
CREATE TABLE address_tags (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
tag VARCHAR(50) NOT NULL,
tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
user_id UUID, -- NULL for public tags
created_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address, tag, user_id)
)
```
**Indexes**:
- `idx_tags_chain_address` ON (chain_id, address)
- `idx_tags_chain_tag` ON (chain_id, tag)
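A caveat for both `address_labels` and `address_tags`: by default PostgreSQL treats NULLs as distinct in `UNIQUE` constraints, so constraints that include `user_id` do not prevent duplicate public rows (`user_id IS NULL`). A partial unique index is one way to close that gap; on PostgreSQL 15 and later, declaring the constraint as `UNIQUE NULLS NOT DISTINCT (...)` is an alternative. The index names are hypothetical.

```sql
-- At most one public label per (chain, address, label_type).
CREATE UNIQUE INDEX idx_labels_public_unique
    ON address_labels (chain_id, address, label_type)
    WHERE user_id IS NULL;

-- At most one public copy of each tag per (chain, address).
CREATE UNIQUE INDEX idx_tags_public_unique
    ON address_tags (chain_id, address, tag)
    WHERE user_id IS NULL;
```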
## User-Related Models
### User Accounts Schema
**Table**: `users`
**Purpose**: User accounts for watchlists, alerts, preferences.
**Fields**:
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE,
username VARCHAR(100) UNIQUE,
password_hash TEXT, -- If using password auth
api_key_hash TEXT, -- Hashed API key
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
last_login_at TIMESTAMP
)
```
**Indexes**:
- `idx_users_email` ON (email)
- `idx_users_username` ON (username)
### Watchlists Schema
**Table**: `watchlists`
**Purpose**: User-defined lists of addresses to monitor.
**Fields**:
```sql
CREATE TABLE watchlists (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
UNIQUE(user_id, chain_id, address)
)
```
**Indexes**:
- `idx_watchlists_user` ON (user_id)
- `idx_watchlists_chain_address` ON (chain_id, address)
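Watchlists are typically matched against freshly indexed data to drive alerts. The spec does not define the alert pipeline, so this minimal sketch assumes alerting runs once per indexed block (block number hypothetical). It finds watchers of any address that sent or received a transaction in that block; `idx_watchlists_chain_address` serves the lookup.

```sql
-- Users watching an address touched by any transaction in one block.
SELECT DISTINCT w.user_id,
       w.address,
       w.label,
       t.hash AS transaction_hash
FROM transactions AS t
JOIN watchlists AS w
  ON w.chain_id = t.chain_id
 AND w.address IN (t.from_address, t.to_address)
WHERE t.chain_id = 138
  AND t.block_number = 1000000;  -- hypothetical block just indexed
```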
## Data Type Definitions
### Numeric Types
- **BIGINT**: Used for block numbers, gas values, nonces (signed 64-bit integers)
- **NUMERIC**: Used for token amounts, ETH values (arbitrary-precision decimals)
  - Precision: up to 78 digits, enough for any uint256 value such as a wei amount (2^256 - 1 has 78 decimal digits)
  - Scale: 0 for raw integer amounts, or configurable for values already scaled by token decimals
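With scale 0, raw amounts stay integral in storage and decimals are applied only when formatting. A sketch, assuming ERC-20 `decimals` is populated in `tokens`; the transaction hash and the display scale of 6 fractional digits are hypothetical choices. Dividing in `NUMERIC` avoids binary floating-point error, and `round()` fixes the displayed precision explicitly.

```sql
-- Human-readable ERC-20 amounts: raw integer amount / 10^decimals.
SELECT tt.transaction_hash,
       tk.symbol,
       tt.amount AS raw_amount,
       round(tt.amount / power(10::numeric, tk.decimals), 6) AS display_amount
FROM token_transfers AS tt
JOIN tokens AS tk
  ON tk.chain_id = tt.chain_id
 AND tk.address  = tt.token_address
WHERE tt.chain_id = 138
  AND tt.token_type = 'ERC20'
  AND tt.transaction_hash = ('0x' || repeat('a', 64));  -- hypothetical tx

-- Native value in ETH (1 ETH = 10^18 wei).
SELECT hash, round(value / 1e18, 6) AS value_eth
FROM transactions
WHERE chain_id = 138 AND block_number = 1000000;        -- hypothetical block
```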
### Address Types
- **VARCHAR(42)**: Ethereum addresses (0x + 40 hex chars)
  - Normalize to lowercase for consistency; EIP-55 mixed-case checksums are for display and input validation only
### Hash Types
- **VARCHAR(66)**: Transaction/block hashes (0x + 64 hex chars)
- **TEXT**: Variable-length hex data that does not fit a fixed width, such as `logs_bloom` (0x + 512 hex chars), call data, and bytecode
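The lowercase-normalization rule can be enforced by the database instead of trusted to every writer. A sketch using CHECK constraints on `transactions`; the constraint names are hypothetical, and the same pattern applies to the other address and hash columns. Writers should still apply `lower()` to user-supplied input, which is often EIP-55 mixed-case.

```sql
-- Reject non-normalized values: 0x prefix, lowercase hex, exact length.
ALTER TABLE transactions
    ADD CONSTRAINT chk_transactions_hash_format
        CHECK (hash ~ '^0x[0-9a-f]{64}$'),
    ADD CONSTRAINT chk_transactions_from_format
        CHECK (from_address ~ '^0x[0-9a-f]{40}$'),
    ADD CONSTRAINT chk_transactions_to_format
        CHECK (to_address IS NULL OR to_address ~ '^0x[0-9a-f]{40}$');

-- Normalize mixed-case input (hypothetical address) before comparing or inserting.
SELECT lower('0xAbCdEf0123456789aBcDeF0123456789AbCdEf01') AS normalized;
```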
### JSONB Types
- Used for: ABIs, decoded event data, complex nested structures
- Benefits: GIN indexing, containment (`@>`) and path queries, and no re-parsing on read (values are stored in a decomposed binary format)
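JSONB columns can be filtered with containment (`@>`) and read with `->` / `->>`. This spec does not fix the layout of `decoded_data`, so the sketch assumes a hypothetical shape such as `{"event": "Transfer", "args": {"from": "0x...", "to": "0x...", "value": "1000"}}`, with `value` stored as a decimal string.

```sql
-- Decoded Transfer events, assuming the hypothetical decoded_data shape above.
SELECT transaction_hash,
       log_index,
       decoded_data -> 'args' ->> 'from' AS from_address,
       decoded_data -> 'args' ->> 'to'   AS to_address,
       (decoded_data -> 'args' ->> 'value')::numeric AS amount
FROM logs
WHERE chain_id = 138
  AND decoded_data @> '{"event": "Transfer"}'  -- can use a GIN index on decoded_data
ORDER BY block_number, log_index;
```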
## Multi-Chain Considerations
### Chain ID Partitioning
All chain-scoped tables (every table except `users`) include a `chain_id` column, and most of their indexes use it as the leading column:
- Enables efficient partitioning by `chain_id`
- Keeps each chain's data isolated
- Simplifies multi-chain queries (one schema serves every chain)
### Partitioning Strategy
**Recommended**: Partition large tables by `chain_id`:
- `blocks`, `transactions`, `logs` partitioned by chain_id
- Benefits: Faster queries, easier maintenance, parallel processing
**Implementation** (PostgreSQL):
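```sql
-- Example partitioning. On a partitioned table every PRIMARY KEY and UNIQUE
-- constraint must include the partition key, so the key becomes (chain_id, id).
CREATE TABLE blocks (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    -- remaining columns as in the Block Schema above
    PRIMARY KEY (chain_id, id),
    UNIQUE (chain_id, number),
    UNIQUE (chain_id, hash)
) PARTITION BY LIST (chain_id);
CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
```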
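With list partitioning, a query that filters on a constant `chain_id` only touches the matching partition (partition pruning), and `EXPLAIN` shows which partitions are scanned. A default partition (PostgreSQL 11+) catches rows for chains without a dedicated partition. A sketch; the default-partition name and the extra chain ID 10 are hypothetical.

```sql
-- Catch-all partition for chain IDs without their own partition.
CREATE TABLE blocks_default PARTITION OF blocks DEFAULT;

-- Adding another chain later.
CREATE TABLE blocks_chain_10 PARTITION OF blocks FOR VALUES IN (10);

-- The plan should scan only blocks_chain_138.
EXPLAIN
SELECT number, hash
FROM blocks
WHERE chain_id = 138
ORDER BY number DESC
LIMIT 10;
```

When a default partition exists, PostgreSQL checks it while creating a new partition and raises an error if it already holds rows for the new partition's values, so such rows must be moved out first.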
## Data Consistency
### Foreign Key Constraints
- Enforce referential integrity where possible
- Weigh the cost of constraint checks on high-throughput inserts
- Consider dropping foreign keys for the initial backfill and re-adding them once the indexer has caught up
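PostgreSQL has no switch to disable a single foreign key, so turning one off for a backfill in practice means dropping it and adding it back afterwards. Re-adding it as `NOT VALID` and then running `VALIDATE CONSTRAINT` keeps the heavy lock short: validation runs under a lock that still allows concurrent inserts. The constraint name below is an assumption (the schema leaves FKs unnamed, so PostgreSQL generates a name; check it with `\d transactions`). The PostgreSQL `ALTER TABLE` documentation has noted that foreign keys on partitioned tables cannot be declared `NOT VALID`; if `transactions` is partitioned, check your version's documentation and add the constraint without `NOT VALID` if needed.

```sql
-- Before the backfill: drop the FK (name assumed; verify with \d transactions).
ALTER TABLE transactions
    DROP CONSTRAINT transactions_chain_id_block_number_fkey;

-- ... bulk-load blocks and transactions here ...

-- After catch-up: re-add without checking existing rows (brief lock) ...
ALTER TABLE transactions
    ADD CONSTRAINT transactions_chain_id_block_number_fkey
    FOREIGN KEY (chain_id, block_number) REFERENCES blocks (chain_id, number)
    NOT VALID;

-- ... then check existing rows while concurrent inserts continue.
ALTER TABLE transactions
    VALIDATE CONSTRAINT transactions_chain_id_block_number_fkey;
```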
### Unique Constraints
- Prevent duplicate blocks, transactions, logs
- Enable idempotent processing
- Use ON CONFLICT for upserts
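The unique constraints are what make `INSERT ... ON CONFLICT` usable for idempotent writes: re-processing the same data either refreshes the existing row or is a no-op. A minimal sketch with hypothetical values, showing only a subset of columns:

```sql
-- Upsert a block header: re-processing the same height overwrites the stored row.
INSERT INTO blocks (chain_id, number, hash, parent_hash, timestamp, transaction_count)
VALUES (138, 1000000,
        '0x' || repeat('b', 64),   -- hypothetical hash
        '0x' || repeat('c', 64),   -- hypothetical parent hash
        '2024-01-01 00:00:00', 3)
ON CONFLICT (chain_id, number) DO UPDATE
SET hash              = EXCLUDED.hash,
    parent_hash       = EXCLUDED.parent_hash,
    timestamp         = EXCLUDED.timestamp,
    transaction_count = EXCLUDED.transaction_count,
    updated_at        = NOW();

-- A log is fixed once identified by (tx, log_index), so duplicates are skipped.
-- Assumes the parent transaction row already exists (foreign key).
INSERT INTO logs (chain_id, transaction_hash, block_number, block_hash,
                  log_index, address, topic0, data)
VALUES (138, '0x' || repeat('a', 64), 1000000, '0x' || repeat('b', 64),
        0, '0x' || repeat('2', 40),
        '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef',
        '0x')
ON CONFLICT (chain_id, transaction_hash, log_index) DO NOTHING;
```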
## Indexing Strategy
### Index Types
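1. **B-tree**: Default for most indexes (equality, range queries, multi-column keys)
2. **Hash**: Single-column exact-match lookups (addresses, hashes); no range or multi-column support
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: Very large tables whose column values follow physical insert order (block numbers, timestamps in append-only tables)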
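How the non-default index types might be declared, as a sketch with hypothetical index names. Hash indexes cover a single column, so they fit a lookup such as `transactions.hash` on its own rather than the composite `(chain_id, ...)` keys used elsewhere in this spec.

```sql
-- Hash: single-column equality only (e.g. WHERE hash = '0x...').
CREATE INDEX idx_transactions_hash_hash
    ON transactions USING hash (hash);

-- GIN with jsonb_path_ops: smaller and faster for @> (and jsonpath @?, @@),
-- but no support for the key-existence operators ?, ?| and ?&.
CREATE INDEX idx_logs_decoded_data_gin
    ON logs USING gin (decoded_data jsonb_path_ops);

-- GIN with the default jsonb_ops: also supports key-existence operators.
CREATE INDEX idx_contract_abis_abi_gin
    ON contract_abis USING gin (abi);

-- BRIN: tiny index that summarizes ranges of heap pages; effective only
-- when rows are physically stored in roughly timestamp order.
CREATE INDEX idx_blocks_timestamp_brin
    ON blocks USING brin (timestamp);
```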
### Index Maintenance
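- Run VACUUM and ANALYZE regularly
- Monitor index bloat
- Consider partial indexes for filtered queries
- Skip separate indexes that duplicate a `UNIQUE` constraint (for example `idx_blocks_chain_number` and `idx_blocks_chain_hash`): PostgreSQL already builds a unique index for every `UNIQUE` constraint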
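The maintenance tasks map to a few routine statements. In this sketch, `pg_stat_user_indexes` reports per-index size and scan counts, which helps spot unused or redundant indexes. The partial index (hypothetical name) illustrates the filtered-query case for a view of failed transactions.

```sql
-- Reclaim dead tuples and refresh planner statistics for a hot table.
VACUUM (ANALYZE) logs;

-- Largest indexes and how often each has been scanned since stats were reset.
-- idx_scan = 0 on a large index marks a removal candidate,
-- unless it backs a UNIQUE or PRIMARY KEY constraint.
SELECT relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

-- Partial index: only failed transactions (status = 0).
CREATE INDEX idx_transactions_chain_failed
    ON transactions (chain_id, block_number)
    WHERE status = 0;
```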
## References
- Indexer Architecture: See `indexer-architecture.md`
- Database Schema: See `../database/postgres-schema.md`
- Search Index Schema: See `../database/search-index-schema.md`