# Data Models Specification

## Overview

This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-scoped models support multi-chain operation via a `chain_id` field.

## Core Data Models

### Block Schema

**Table**: `blocks`

**Fields**:

```sql
blocks (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    parent_hash VARCHAR(66) NOT NULL,
    nonce VARCHAR(18),
    sha3_uncles VARCHAR(66),
    logs_bloom TEXT,
    transactions_root VARCHAR(66),
    state_root VARCHAR(66),
    receipts_root VARCHAR(66),
    miner VARCHAR(42),
    difficulty NUMERIC,
    total_difficulty NUMERIC,
    size BIGINT,
    extra_data TEXT,
    gas_limit BIGINT,
    gas_used BIGINT,
    timestamp TIMESTAMP NOT NULL,
    transaction_count INTEGER DEFAULT 0,
    base_fee_per_gas BIGINT, -- EIP-1559
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, number),
    UNIQUE(chain_id, hash)
)
```

**Indexes**:
- `idx_blocks_chain_number` ON (chain_id, number)
- `idx_blocks_chain_hash` ON (chain_id, hash)
- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)

**Relationships**:
- One-to-many with `transactions`
- One-to-many with `logs`

### Transaction Schema

**Table**: `transactions`

**Fields**:

```sql
transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42), -- NULL for contract creation
    value NUMERIC NOT NULL DEFAULT 0,
    gas_price BIGINT,
    max_fee_per_gas BIGINT, -- EIP-1559
    max_priority_fee_per_gas BIGINT, -- EIP-1559
    gas_limit BIGINT NOT NULL,
    gas_used BIGINT,
    nonce BIGINT NOT NULL,
    input_data TEXT, -- Contract call data
    status INTEGER, -- 0 = failed, 1 = success
    contract_address VARCHAR(42), -- NULL if not contract creation
    cumulative_gas_used BIGINT,
    effective_gas_price BIGINT, -- Actual gas price paid
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
    UNIQUE(chain_id, hash)
)
```

**Indexes**:
- `idx_transactions_chain_hash` ON (chain_id, hash)
- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
- `idx_transactions_chain_from` ON (chain_id, from_address)
- `idx_transactions_chain_to` ON (chain_id, to_address)
- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)

**Relationships**:
- Many-to-one with `blocks`
- One-to-many with `logs`
- One-to-many with `internal_transactions`
- One-to-many with `token_transfers`

### Receipt Schema

**Note**: Receipt data is stored denormalized in the `transactions` table for efficiency. If separate storage is needed:

**Table**: `transaction_receipts`

```sql
transaction_receipts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),
    gas_used BIGINT,
    cumulative_gas_used BIGINT,
    contract_address VARCHAR(42),
    logs_bloom TEXT,
    status INTEGER,
    root VARCHAR(66), -- Post-transaction state root (pre-Byzantium receipts only)
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash)
)
```

### Log Schema

**Table**: `logs`

**Fields**:

```sql
logs (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    log_index INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    topic0 VARCHAR(66), -- Event signature
    topic1 VARCHAR(66), -- First indexed parameter
    topic2 VARCHAR(66), -- Second indexed parameter
    topic3 VARCHAR(66), -- Third indexed parameter
    data TEXT, -- Non-indexed parameters
    decoded_data JSONB, -- Decoded event data (if ABI available)
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash, log_index)
)
```

**Indexes**:
- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
- `idx_logs_chain_address` ON (chain_id, address)
- `idx_logs_chain_topic0` ON (chain_id, topic0)
- `idx_logs_chain_block` ON (chain_id, block_number)
- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) (for event filtering)

**Relationships**:
- Many-to-one with `transactions`

### Trace Schema

**Table**: `traces`

**Fields**:

```sql
traces (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
    subtraces INTEGER, -- Number of child calls
    action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide', 'delegatecall'
    action_from VARCHAR(42),
    action_to VARCHAR(42),
    action_value NUMERIC DEFAULT 0,
    action_input TEXT,
    action_gas BIGINT,
    action_call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall'
    result_type VARCHAR(20), -- 'callresult', 'createresult'
    result_gas_used BIGINT,
    result_output TEXT,
    result_address VARCHAR(42), -- For create results
    result_code TEXT, -- For create results
    error TEXT, -- Error message if trace failed
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```

**Indexes**:
- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
- `idx_traces_chain_block` ON (chain_id, block_number)
- `idx_traces_chain_from` ON (chain_id, action_from)
- `idx_traces_chain_to` ON (chain_id, action_to)

**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.

### Internal Transaction Schema

**Table**: `internal_transactions`

**Purpose**: Track value transfers that occur within transactions (via calls).
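
Because `internal_transactions` shares `trace_address` with `traces`, its rows can plausibly be derived from the `traces` table. The statement below is a minimal sketch of that derivation for one block, using only columns defined in this document. The selection rule is an assumption: keep nested (non-top-level), successful `call` or `create` traces that move a nonzero value, and skip the top-level call because its value is already recorded in `transactions.value`. `$1` and `$2` are hypothetical prepared-statement parameters for chain ID and block number.

```sql
-- Sketch under an assumed selection rule: derive internal transactions for one block.
-- $1 = chain_id, $2 = block_number (prepared-statement parameters).
INSERT INTO internal_transactions (
    chain_id, transaction_hash, block_number, trace_address,
    from_address, to_address, value, call_type,
    gas_limit, gas_used, input_data, output_data
)
SELECT
    t.chain_id,
    t.transaction_hash,
    t.block_number,
    t.trace_address,
    t.action_from,
    COALESCE(t.action_to, t.result_address), -- create traces carry the new address in result_address
    t.action_value,
    CASE WHEN t.action_type = 'create' THEN 'create'
         ELSE COALESCE(t.action_call_type, 'call')
    END,
    t.action_gas,
    t.result_gas_used,
    t.action_input,
    t.result_output
FROM traces t
WHERE t.chain_id = $1
  AND t.block_number = $2
  AND cardinality(t.trace_address) > 0 -- skip the top-level call (empty trace_address)
  AND t.action_type IN ('call', 'create')
  AND t.action_value > 0
  AND t.error IS NULL
  AND t.action_from IS NOT NULL
  AND COALESCE(t.action_to, t.result_address) IS NOT NULL;
```

As specified, `internal_transactions` has no unique key, so running this twice for the same block inserts duplicate rows. A constraint such as `UNIQUE(chain_id, transaction_hash, trace_address)` (not part of this specification) would let the statement end with `ON CONFLICT DO NOTHING` and become idempotent.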

**Fields**:

```sql
internal_transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    trace_address INTEGER[] NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    value NUMERIC NOT NULL,
    call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
    gas_limit BIGINT,
    gas_used BIGINT,
    input_data TEXT,
    output_data TEXT,
    error TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```

**Indexes**:
- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
- `idx_internal_tx_chain_from` ON (chain_id, from_address)
- `idx_internal_tx_chain_to` ON (chain_id, to_address)
- `idx_internal_tx_chain_block` ON (chain_id, block_number)

**Relationships**:
- Many-to-one with `transactions`

### Token Transfer Schema

**Table**: `token_transfers`

**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.

**Fields**:

```sql
token_transfers (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC, -- For ERC-20 and ERC-1155
    token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (can be large)
    operator VARCHAR(42), -- For ERC-1155
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
)
```

**Indexes**:
- `idx_token_transfers_chain_token` ON (chain_id, token_address)
- `idx_token_transfers_chain_from` ON (chain_id, from_address)
- `idx_token_transfers_chain_to` ON (chain_id, to_address)
- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
- `idx_token_transfers_chain_block` ON (chain_id, block_number)
- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)

**Relationships**:
- Many-to-one with `transactions`
- Many-to-one with `tokens`

### Token Schema

**Table**: `tokens`

**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).

**Fields**:

```sql
tokens (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER, -- For ERC-20
    total_supply NUMERIC,
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_tokens_chain_address` ON (chain_id, address)
- `idx_tokens_chain_type` ON (chain_id, type)
- `idx_tokens_chain_symbol` ON (chain_id, symbol) (for search)

**Relationships**:
- One-to-many with `token_transfers`
- One-to-many with `token_holders` (if maintained)

### Contract Metadata Schema

**Table**: `contracts`

**Purpose**: Store verified contract information.
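
Since `contracts` carries a `UNIQUE(chain_id, address)` constraint, a verification result can be written idempotently with `INSERT ... ON CONFLICT`. The statement below is a minimal sketch using the columns from the field list that follows. The `$n` placeholders and the choice to overwrite every compiler setting when a contract is re-verified are assumptions, not part of this specification.

```sql
-- Sketch: record a successful verification; safe to re-run for the same contract.
-- $1 = chain_id, $2 = lowercase address, $3..$12 = values from the verifier (assumed).
INSERT INTO contracts (
    chain_id, address, name, compiler_version, optimization_enabled,
    optimization_runs, evm_version, source_code, abi, constructor_arguments,
    verification_method, license, verification_status, verified_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, 'verified', NOW())
ON CONFLICT (chain_id, address) DO UPDATE SET
    name                  = EXCLUDED.name,
    compiler_version      = EXCLUDED.compiler_version,
    optimization_enabled  = EXCLUDED.optimization_enabled,
    optimization_runs     = EXCLUDED.optimization_runs,
    evm_version           = EXCLUDED.evm_version,
    source_code           = EXCLUDED.source_code,
    abi                   = EXCLUDED.abi,
    constructor_arguments = EXCLUDED.constructor_arguments,
    verification_method   = EXCLUDED.verification_method,
    license               = EXCLUDED.license,
    verification_status   = 'verified',
    verified_at           = EXCLUDED.verified_at,
    updated_at            = NOW(); -- a column DEFAULT applies only on INSERT
```

The update branch sets `updated_at` explicitly because `DEFAULT NOW()` only fills the column when a row is first inserted; without a trigger, updates leave it unchanged.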

**Fields**:

```sql
contracts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
    verified_at TIMESTAMP,
    verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_contracts_chain_address` ON (chain_id, address)
- `idx_contracts_chain_verified` ON (chain_id, verification_status)

**Relationships**:
- One-to-one with `contract_abis` (if ABIs are stored separately)

### Contract ABI Schema

**Table**: `contract_abis`

**Purpose**: Store contract ABIs for decoding (can be separate from verification).

**Fields**:

```sql
contract_abis (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_abis_chain_address` ON (chain_id, address)

## Address-Related Models

### Address Labels Schema

**Table**: `address_labels`

**Purpose**: User-defined and public labels for addresses.

**Fields**:

```sql
address_labels (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
    user_id UUID, -- NULL for public labels
    source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    -- A plain UNIQUE treats NULLs as distinct, so public labels (NULL user_id)
    -- would never conflict; NULLS NOT DISTINCT (PostgreSQL 15+) deduplicates them.
    UNIQUE NULLS NOT DISTINCT (chain_id, address, label_type, user_id)
)
```

**Indexes**:
- `idx_labels_chain_address` ON (chain_id, address)
- `idx_labels_chain_user` ON (chain_id, user_id)

### Address Tags Schema

**Table**: `address_tags`

**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").

**Fields**:

```sql
address_tags (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
    user_id UUID, -- NULL for public tags
    created_at TIMESTAMP DEFAULT NOW(),
    -- NULLS NOT DISTINCT (PostgreSQL 15+) also deduplicates public tags (NULL user_id)
    UNIQUE NULLS NOT DISTINCT (chain_id, address, tag, user_id)
)
```

**Indexes**:
- `idx_tags_chain_address` ON (chain_id, address)
- `idx_tags_chain_tag` ON (chain_id, tag)

## User-Related Models

### User Accounts Schema

**Table**: `users`

**Purpose**: User accounts for watchlists, alerts, and preferences.

**Fields**:

```sql
users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT, -- If using password auth
    api_key_hash TEXT, -- Hashed API key
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
)
```

**Indexes**:
- `idx_users_email` ON (email)
- `idx_users_username` ON (username)

### Watchlists Schema

**Table**: `watchlists`

**Purpose**: User-defined lists of addresses to monitor.
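
Watchlist entries are keyed by `(chain_id, address)`, the same scoping the `transactions` table uses. A hypothetical alerting query that finds every watchlist entry touched by the transactions of one block might look like the sketch below. The `$1`/`$2` placeholders and the `'outgoing'`/`'incoming'` labels are illustrative assumptions, and the plain equality match relies on addresses being stored in lowercase as recommended under Address Types.

```sql
-- Sketch: watchlist entries matched by any transaction in one block.
-- $1 = chain_id, $2 = block_number (prepared-statement parameters).
SELECT
    w.user_id,
    w.address,
    w.label,
    tx.hash AS transaction_hash,
    CASE WHEN w.address = tx.from_address THEN 'outgoing' ELSE 'incoming' END AS direction
FROM transactions tx
JOIN watchlists w
  ON w.chain_id = tx.chain_id
 AND w.address IN (tx.from_address, tx.to_address) -- to_address is NULL for contract creation
WHERE tx.chain_id = $1
  AND tx.block_number = $2
ORDER BY tx.transaction_index;
```

The block filter can use `idx_transactions_chain_block`, and the join can use `idx_watchlists_chain_address`. A self-transfer matches a watched address once and is reported as `'outgoing'`.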

**Fields**:

```sql
watchlists (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(user_id, chain_id, address)
)
```

**Indexes**:
- `idx_watchlists_user` ON (user_id)
- `idx_watchlists_chain_address` ON (chain_id, address)

## Data Type Definitions

### Numeric Types

- **BIGINT**: Used for block numbers, gas values, and nonces (signed 64-bit integers)
- **NUMERIC**: Used for token amounts and ETH values (arbitrary-precision decimals)
  - Precision: 78 digits, enough for any uint256 value (2^256 - 1 has 78 decimal digits), including amounts in wei
  - Scale: 0 (integers) or configurable for token decimals

### Address Types

- **VARCHAR(42)**: Ethereum addresses (`0x` + 40 hex chars)
  - Normalize to lowercase for consistency

### Hash Types

- **VARCHAR(66)**: Transaction and block hashes (`0x` + 64 hex chars)
- **TEXT**: For long or variable-length hex data such as `logs_bloom`, `extra_data`, and `input_data`

### JSONB Types

- Used for: ABIs, decoded event data, and other nested structures
- Benefits: supports GIN indexing and JSON operators (e.g. containment with `@>`), and is stored in a decomposed binary form that is faster to query than plain `JSON`

## Multi-Chain Considerations

### Chain ID Partitioning

Every chain-scoped table has a `chain_id` column, and its unique constraints and composite indexes include it (usually as the leading column). This:
- Enables partitioning by `chain_id`
- Ensures data isolation between chains
- Simplifies multi-chain queries

### Partitioning Strategy

**Recommended**: Partition large tables by `chain_id`:
- `blocks`, `transactions`, and `logs` partitioned by `chain_id`
- Benefits: partition pruning for single-chain queries, per-chain maintenance, and parallel processing

**Implementation** (PostgreSQL):

```sql
-- Example: list-partition `blocks` by chain_id. On a partitioned table,
-- PostgreSQL requires PRIMARY KEY and UNIQUE constraints to include the
-- partition key, so the primary key becomes (chain_id, id).
CREATE TABLE blocks (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    -- ... remaining columns from the Block Schema ...
    PRIMARY KEY (chain_id, id),
    UNIQUE (chain_id, number),
    UNIQUE (chain_id, hash)
) PARTITION BY LIST (chain_id);

CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
```

## Data Consistency

### Foreign Key Constraints

- Enforce referential integrity where possible
- Consider the performance impact on high-throughput inserts
- May be dropped for the initial backfill and re-added after catch-up (re-adding validates the existing rows)

### Unique Constraints

- Prevent duplicate blocks, transactions, and logs
- Enable idempotent processing
- Use `INSERT ... ON CONFLICT` for upserts
- In PostgreSQL each UNIQUE constraint is already backed by a B-tree index, so a separately listed index on the same columns (e.g. `idx_blocks_chain_number`, `idx_tokens_chain_address`) is redundant

## Indexing Strategy

### Index Types

1. **B-tree**: Default for most indexes (equality and range queries)
2. **Hash**: For exact-match lookups (addresses, hashes); PostgreSQL hash indexes are single-column only, so composite keys such as (chain_id, address) stay on B-tree
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For very large tables whose column values follow physical insertion order (block numbers, timestamps)

### Index Maintenance

- Run VACUUM and ANALYZE regularly
- Monitor index bloat
- Consider partial indexes for filtered queries

## References

- Indexer Architecture: See `indexer-architecture.md`
- Database Schema: See `../database/postgres-schema.md`
- Search Index Schema: See `../database/search-index-schema.md`