Data Models Specification

Overview

This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-scoped models support multi-chain operation via a chain_id field; only the users table, which holds chain-independent accounts, omits it.

Core Data Models

Block Schema

Table: blocks

Fields:

blocks (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    parent_hash VARCHAR(66) NOT NULL,
    nonce VARCHAR(18),
    sha3_uncles VARCHAR(66),
    logs_bloom TEXT,
    transactions_root VARCHAR(66),
    state_root VARCHAR(66),
    receipts_root VARCHAR(66),
    miner VARCHAR(42),
    difficulty NUMERIC,
    total_difficulty NUMERIC,
    size BIGINT,
    extra_data TEXT,
    gas_limit BIGINT,
    gas_used BIGINT,
    timestamp TIMESTAMP NOT NULL,
    transaction_count INTEGER DEFAULT 0,
    base_fee_per_gas BIGINT, -- EIP-1559
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, number),
    UNIQUE(chain_id, hash)
)

Indexes:

  • idx_blocks_chain_number ON (chain_id, number)
  • idx_blocks_chain_hash ON (chain_id, hash)
  • idx_blocks_chain_timestamp ON (chain_id, timestamp)
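
As a rough sketch, the index bullets above might translate into DDL like the following. The first two indexes are probably unnecessary in practice, because PostgreSQL already builds a B-tree index for each UNIQUE constraint on the table, so they are shown commented out. The final query is only an illustration; the chain ID and date range are arbitrary.

```sql
-- Likely redundant: UNIQUE(chain_id, number) and UNIQUE(chain_id, hash)
-- already create equivalent B-tree indexes.
-- CREATE INDEX idx_blocks_chain_number ON blocks (chain_id, number);
-- CREATE INDEX idx_blocks_chain_hash   ON blocks (chain_id, hash);

CREATE INDEX idx_blocks_chain_timestamp ON blocks (chain_id, timestamp);

-- Blocks produced on chain 1 during one (arbitrary) day, newest first.
SELECT number, hash, transaction_count
FROM blocks
WHERE chain_id = 1
  AND timestamp >= TIMESTAMP '2024-01-01 00:00:00'
  AND timestamp <  TIMESTAMP '2024-01-02 00:00:00'
ORDER BY timestamp DESC;
```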

Relationships:

  • One-to-many with transactions
  • One-to-many with logs

Transaction Schema

Table: transactions

Fields:

transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42), -- NULL for contract creation
    value NUMERIC NOT NULL DEFAULT 0,
    gas_price BIGINT,
    max_fee_per_gas BIGINT, -- EIP-1559
    max_priority_fee_per_gas BIGINT, -- EIP-1559
    gas_limit BIGINT NOT NULL,
    gas_used BIGINT,
    nonce BIGINT NOT NULL,
    input_data TEXT, -- Contract call data
    status INTEGER, -- 0 = failed, 1 = success
    contract_address VARCHAR(42), -- NULL if not contract creation
    cumulative_gas_used BIGINT,
    effective_gas_price BIGINT, -- Actual gas price paid
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
    UNIQUE(chain_id, hash)
)

Indexes:

  • idx_transactions_chain_hash ON (chain_id, hash)
  • idx_transactions_chain_block ON (chain_id, block_number, transaction_index)
  • idx_transactions_chain_from ON (chain_id, from_address)
  • idx_transactions_chain_to ON (chain_id, to_address)
  • idx_transactions_chain_block_from ON (chain_id, block_number, from_address)

Relationships:

  • Many-to-one with blocks
  • One-to-many with logs
  • One-to-many with internal_transactions
  • One-to-many with token_transfers
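
A typical explorer query against this table is an address history. The sketch below assumes lowercase-normalized addresses and uses a placeholder address; splitting the lookup into two branches is one way to let each branch use its own index (idx_transactions_chain_from and idx_transactions_chain_to).

```sql
-- Transactions sent or received by one address on chain 1, newest first.
SELECT hash, block_number, transaction_index, 'out' AS direction, value
FROM transactions
WHERE chain_id = 1
  AND from_address = '0x1111111111111111111111111111111111111111'
UNION ALL
SELECT hash, block_number, transaction_index, 'in' AS direction, value
FROM transactions
WHERE chain_id = 1
  AND to_address = '0x1111111111111111111111111111111111111111'
  AND from_address <> to_address   -- self-transfers are already listed as 'out'
ORDER BY block_number DESC, transaction_index DESC
LIMIT 50;
```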

Receipt Schema

Note: Receipt data is stored denormalized in the transactions table for efficiency. If separate storage is needed, use the following table:

Table: transaction_receipts

transaction_receipts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),
    gas_used BIGINT,
    cumulative_gas_used BIGINT,
    contract_address VARCHAR(42),
    logs_bloom TEXT,
    status INTEGER,
    root VARCHAR(66), -- Pre-Byzantium
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash)
)
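
Because the receipt fields are normally read from the transactions table, a fee calculation can be done there directly. This is a minimal sketch with an arbitrary block number; the cast matters because both columns are BIGINT and their product can exceed the 64-bit range.

```sql
-- Fee paid per transaction, in wei, for one block.
-- Cast to NUMERIC before multiplying: BIGINT * BIGINT raises
-- "bigint out of range" once the product exceeds 2^63 - 1.
SELECT hash,
       gas_used::NUMERIC * effective_gas_price::NUMERIC AS fee_wei
FROM transactions
WHERE chain_id = 1
  AND block_number = 19000000          -- arbitrary example block
  AND gas_used IS NOT NULL
  AND effective_gas_price IS NOT NULL;
```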

Log Schema

Table: logs

Fields:

logs (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    log_index INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    topic0 VARCHAR(66), -- Event signature
    topic1 VARCHAR(66), -- First indexed parameter
    topic2 VARCHAR(66), -- Second indexed parameter
    topic3 VARCHAR(66), -- Third indexed parameter
    data TEXT, -- Non-indexed parameters
    decoded_data JSONB, -- Decoded event data (if ABI available)
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash, log_index)
)

Indexes:

  • idx_logs_chain_tx ON (chain_id, transaction_hash)
  • idx_logs_chain_address ON (chain_id, address)
  • idx_logs_chain_topic0 ON (chain_id, topic0)
  • idx_logs_chain_block ON (chain_id, block_number)
  • idx_logs_chain_address_topic0 ON (chain_id, address, topic0) -- For event filtering

Relationships:

  • Many-to-one with transactions
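
The (chain_id, address, topic0) index is aimed at event filtering. As an illustration, the query below selects ERC-20 Transfer events for one contract; the topic0 value is the keccak-256 hash of Transfer(address,address,uint256), while the contract address is a placeholder.

```sql
SELECT block_number,
       log_index,
       '0x' || right(topic1, 40) AS from_address,  -- last 20 bytes of the 32-byte topic
       '0x' || right(topic2, 40) AS to_address,
       data AS raw_amount_hex                      -- still hex-encoded
FROM logs
WHERE chain_id = 1
  AND address = '0x1111111111111111111111111111111111111111'
  AND topic0  = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
ORDER BY block_number, log_index;
```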

Trace Schema

Table: traces

Fields:

traces (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
    subtraces INTEGER, -- Number of child calls
    action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide', 'reward'
    action_from VARCHAR(42),
    action_to VARCHAR(42),
    action_value NUMERIC DEFAULT 0,
    action_input TEXT,
    action_gas BIGINT,
    action_call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall'
    result_type VARCHAR(20), -- 'callresult', 'createresult'
    result_gas_used BIGINT,
    result_output TEXT,
    result_address VARCHAR(42), -- For create results
    result_code TEXT, -- For create results
    error TEXT, -- Error message if trace failed
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)

Indexes:

  • idx_traces_chain_tx ON (chain_id, transaction_hash)
  • idx_traces_chain_block ON (chain_id, block_number)
  • idx_traces_chain_from ON (chain_id, action_from)
  • idx_traces_chain_to ON (chain_id, action_to)

Note: Trace data can be large. Consider partitioning or separate storage for historical traces.
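
The trace_address array encodes each call's position in the call tree, which makes tree-shaped queries possible in plain SQL. The sketch below assumes the top-level call is stored with an empty array and uses a placeholder transaction hash.

```sql
-- Call tree of one transaction in pre-order (parent before its children).
-- PostgreSQL compares arrays element by element, so {} < {0} < {0,0} < {1}.
SELECT trace_address,
       cardinality(trace_address) AS depth,
       action_type, action_call_type, action_from, action_to, action_value, error
FROM traces
WHERE chain_id = 1
  AND transaction_hash = '0x' || repeat('ab', 32)   -- placeholder hash
ORDER BY trace_address;

-- Direct children of the call at trace_address {0,1}.
SELECT *
FROM traces
WHERE chain_id = 1
  AND transaction_hash = '0x' || repeat('ab', 32)
  AND cardinality(trace_address) = 3
  AND trace_address[1:2] = ARRAY[0, 1];
```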

Internal Transaction Schema

Table: internal_transactions

Purpose: Track value transfers that occur within transactions (via calls).

Fields:

internal_transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    trace_address INTEGER[] NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    value NUMERIC NOT NULL,
    call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
    gas_limit BIGINT,
    gas_used BIGINT,
    input_data TEXT,
    output_data TEXT,
    error TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)

Indexes:

  • idx_internal_tx_chain_tx ON (chain_id, transaction_hash)
  • idx_internal_tx_chain_from ON (chain_id, from_address)
  • idx_internal_tx_chain_to ON (chain_id, to_address)
  • idx_internal_tx_chain_block ON (chain_id, block_number)

Relationships:

  • Many-to-one with transactions
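
One plausible way to populate this table is to derive it from traces. The sketch below is an assumption about the pipeline, not a description of the actual indexer: it copies value-carrying nested calls for one arbitrary block. Since the table defines no unique constraint, re-running the statement would insert duplicates.

```sql
INSERT INTO internal_transactions
    (chain_id, transaction_hash, block_number, trace_address,
     from_address, to_address, value, call_type,
     gas_limit, gas_used, input_data, output_data, error)
SELECT chain_id, transaction_hash, block_number, trace_address,
       action_from, action_to, action_value, action_call_type,
       action_gas, result_gas_used, action_input, result_output, error
FROM traces
WHERE chain_id = 1
  AND block_number = 19000000              -- arbitrary example block
  AND action_type = 'call'
  AND action_value > 0                     -- only calls that move value
  AND cardinality(trace_address) > 0       -- skip the top-level call
  AND action_from IS NOT NULL
  AND action_to IS NOT NULL;
```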

Token Transfer Schema

Table: token_transfers

Purpose: Track ERC-20, ERC-721, and ERC-1155 token transfers.

Fields:

token_transfers (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC, -- For ERC-20 and ERC-1155
    token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (can be large)
    operator VARCHAR(42), -- For ERC-1155
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
)

Indexes:

  • idx_token_transfers_chain_token ON (chain_id, token_address)
  • idx_token_transfers_chain_from ON (chain_id, from_address)
  • idx_token_transfers_chain_to ON (chain_id, to_address)
  • idx_token_transfers_chain_tx ON (chain_id, transaction_hash)
  • idx_token_transfers_chain_block ON (chain_id, block_number)
  • idx_token_transfers_chain_token_from ON (chain_id, token_address, from_address)
  • idx_token_transfers_chain_token_to ON (chain_id, token_address, to_address)

Relationships:

  • Many-to-one with transactions
  • Many-to-one with tokens
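
As an example of what the transfer rows support, an ERC-20 balance can be approximated by netting incoming against outgoing amounts. Both addresses below are placeholders, and the result is in raw token units (divide by 10^decimals from the tokens table for display). Tokens that change balances without emitting Transfer events would not be captured this way.

```sql
SELECT COALESCE(SUM(CASE
           WHEN to_address = '0x2222222222222222222222222222222222222222'
           THEN amount ELSE -amount END), 0) AS raw_balance
FROM token_transfers
WHERE chain_id = 1
  AND token_address = '0x1111111111111111111111111111111111111111'
  AND token_type = 'ERC20'
  AND (to_address   = '0x2222222222222222222222222222222222222222'
    OR from_address = '0x2222222222222222222222222222222222222222')
  AND from_address <> to_address;   -- self-transfers net to zero
```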

Token Schema

Table: tokens

Purpose: Store token metadata (ERC-20, ERC-721, ERC-1155).

Fields:

tokens (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER, -- For ERC-20
    total_supply NUMERIC,
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)

Indexes:

  • idx_tokens_chain_address ON (chain_id, address)
  • idx_tokens_chain_type ON (chain_id, type)
  • idx_tokens_chain_symbol ON (chain_id, symbol) -- For search

Relationships:

  • One-to-many with token_transfers
  • One-to-many with token_holders (if maintained)

Contract Metadata Schema

Table: contracts

Purpose: Store verified contract information.

Fields:

contracts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
    verified_at TIMESTAMP,
    verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)

Indexes:

  • idx_contracts_chain_address ON (chain_id, address)
  • idx_contracts_chain_verified ON (chain_id, verification_status)

Relationships:

  • One-to-one with contract_abis (if separate ABI storage)

Contract ABI Schema

Table: contract_abis

Purpose: Store contract ABIs for decoding (can be separate from verification).

Fields:

contract_abis (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)

Indexes:

  • idx_abis_chain_address ON (chain_id, address)
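
Storing the ABI as JSONB lets the decoder pick entries out of it in SQL. The sketch below assumes the abi column holds the standard Solidity JSON ABI (an array of objects with type, name, and inputs keys) and uses a placeholder address.

```sql
-- List the events declared in one contract's ABI.
SELECT elem ->> 'name'                      AS event_name,
       jsonb_array_length(elem -> 'inputs') AS input_count
FROM contract_abis,
     jsonb_array_elements(abi) AS elem
WHERE chain_id = 1
  AND address = '0x1111111111111111111111111111111111111111'
  AND elem ->> 'type' = 'event';
```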

Address Labels Schema

Table: address_labels

Purpose: User-defined and public labels for addresses.

Fields:

address_labels (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
    user_id UUID, -- NULL for public labels
    source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, label_type, user_id)
)

Indexes:

  • idx_labels_chain_address ON (chain_id, address)
  • idx_labels_chain_user ON (chain_id, user_id)
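
One caveat about the UNIQUE constraint above: PostgreSQL treats NULLs as distinct by default, so rows with user_id IS NULL (public labels) never conflict with each other. If duplicate public labels should be rejected, a partial unique index is one option; the index name here is hypothetical. On PostgreSQL 15 or later, declaring the constraint with NULLS NOT DISTINCT is an alternative.

```sql
-- Enforce at most one public label per (chain, address, label type).
CREATE UNIQUE INDEX uq_labels_public
    ON address_labels (chain_id, address, label_type)
    WHERE user_id IS NULL;
```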

Address Tags Schema

Table: address_tags

Purpose: Categorize addresses (e.g., "exchange", "defi", "wallet").

Fields:

address_tags (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
    user_id UUID, -- NULL for public tags
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, tag, user_id)
)

Indexes:

  • idx_tags_chain_address ON (chain_id, address)
  • idx_tags_chain_tag ON (chain_id, tag)

User Accounts Schema

Table: users

Purpose: User accounts for watchlists, alerts, preferences.

Fields:

users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT, -- If using password auth
    api_key_hash TEXT, -- Hashed API key
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
)

Indexes:

  • idx_users_email ON (email)
  • idx_users_username ON (username)

Watchlists Schema

Table: watchlists

Purpose: User-defined lists of addresses to monitor.

Fields:

watchlists (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(user_id, chain_id, address)
)

Indexes:

  • idx_watchlists_user ON (user_id)
  • idx_watchlists_chain_address ON (chain_id, address)
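
To illustrate how a watchlist might be consumed, the following sketch lists recent transactions touching any address a user watches. The user ID is a placeholder, and a production query would probably restrict the block range.

```sql
SELECT w.address, w.label,
       t.hash, t.block_number, t.from_address, t.to_address, t.value
FROM watchlists w
JOIN transactions t
  ON t.chain_id = w.chain_id
 AND (t.from_address = w.address OR t.to_address = w.address)
WHERE w.user_id = '00000000-0000-0000-0000-000000000001'   -- placeholder user id
ORDER BY t.block_number DESC, t.transaction_index DESC
LIMIT 100;
```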

Data Type Definitions

Numeric Types

  • BIGINT: Used for block numbers, gas values, nonces (64-bit integers)
  • NUMERIC: Used for token amounts, ETH values (arbitrary precision decimals)
    • Precision: 78 digits (enough for the largest uint256 value, 2^256 - 1)
    • Scale: 0 (integers) or configurable for token decimals
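
The schemas above use unconstrained NUMERIC. If the 78-digit bound should be enforced, a domain is one possible approach; the name uint256 is hypothetical and not part of this spec.

```sql
-- NUMERIC(78, 0) restricted to the uint256 range [0, 2^256 - 1].
CREATE DOMAIN uint256 AS NUMERIC(78, 0)
    CHECK (VALUE >= 0
       AND VALUE <= 115792089237316195423570985008687907853269984665640564039457584007913129639935);

-- The maximum value is accepted; anything larger or negative is rejected.
SELECT 115792089237316195423570985008687907853269984665640564039457584007913129639935::uint256;
```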

Address Types

  • VARCHAR(42): Ethereum addresses (0x + 40 hex chars)
  • Normalize to lowercase before storing or comparing, so that checksummed (mixed-case) input matches stored values

Hash Types

  • VARCHAR(66): Transaction/block hashes (0x + 64 hex chars)
  • TEXT: Variable-length hex data (logs_bloom, extra_data, input_data, trace output)
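
The address and hash conventions could also be enforced at the database level. The domains below are a sketch with hypothetical names (eth_address, eth_hash); the tables in this spec use plain VARCHAR columns.

```sql
-- Lowercase, 0x-prefixed hex of the expected length.
CREATE DOMAIN eth_address AS VARCHAR(42)
    CHECK (VALUE ~ '^0x[0-9a-f]{40}$');
CREATE DOMAIN eth_hash AS VARCHAR(66)
    CHECK (VALUE ~ '^0x[0-9a-f]{64}$');

-- Mixed-case input must be lowercased before it passes the check.
SELECT lower('0xABCDEF0123456789ABCDEF0123456789ABCDEF01')::eth_address;
```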

JSONB Types

  • Used for: ABIs, decoded event data, complex nested structures
  • Benefits: GIN indexing, querying with JSON operators, compact binary storage
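
For example, decoded event data could be indexed and filtered as sketched below. The index name and the key names inside decoded_data ("event", "args", "to") are assumptions, since this spec does not define the decoded layout.

```sql
-- jsonb_path_ops produces a smaller index that supports the @> operator.
CREATE INDEX idx_logs_decoded_gin ON logs USING GIN (decoded_data jsonb_path_ops);

-- Containment query that the GIN index can serve.
SELECT transaction_hash, log_index
FROM logs
WHERE chain_id = 1
  AND decoded_data @> '{"event": "Transfer",
                        "args": {"to": "0x2222222222222222222222222222222222222222"}}';
```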

Multi-Chain Considerations

Chain ID Partitioning

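All chain-scoped tables include chain_id near the top of the column list (directly after the primary key, or after user_id in watchlists); users has no chain_id. This layout: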

  • Enables efficient partitioning by chain_id
  • Ensures data isolation between chains
  • Simplifies multi-chain queries

Partitioning Strategy

Recommended: Partition large tables by chain_id:

  • blocks, transactions, logs partitioned by chain_id
  • Benefits: Faster queries, easier maintenance, parallel processing

Implementation (PostgreSQL):

-- Example partitioning
CREATE TABLE blocks (
    -- columns; every PRIMARY KEY and UNIQUE constraint must include chain_id
) PARTITION BY LIST (chain_id);

CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
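
A slightly fuller sketch, assuming PostgreSQL 11 or later and showing only a subset of the blocks columns. PostgreSQL requires primary keys and unique constraints on a partitioned table to include the partition key, so the surrogate id alone cannot be the primary key here. The default partition is an optional addition, not part of the spec.

```sql
CREATE TABLE blocks (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    -- remaining columns omitted for brevity
    PRIMARY KEY (chain_id, id),          -- must contain the partition key
    UNIQUE (chain_id, number),
    UNIQUE (chain_id, hash)
) PARTITION BY LIST (chain_id);

CREATE TABLE blocks_chain_1   PARTITION OF blocks FOR VALUES IN (1);
CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
-- Catch-all so inserts for a chain without its own partition do not fail.
CREATE TABLE blocks_default   PARTITION OF blocks DEFAULT;
```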

Data Consistency

Foreign Key Constraints

  • Enforce referential integrity where possible
  • Consider performance impact for high-throughput inserts
  • Consider dropping them for the initial backfill and re-adding them after catch-up
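
One way to handle the backfill case, sketched for the logs table on the assumption that it is not partitioned. The constraint name fk_logs_transaction is hypothetical; the actual name depends on how the table was created.

```sql
-- Before the backfill: remove the constraint.
ALTER TABLE logs DROP CONSTRAINT IF EXISTS fk_logs_transaction;

-- ... bulk load runs here ...

-- After catch-up: re-add without checking existing rows, then validate
-- in a separate step that still allows concurrent reads and writes on logs.
ALTER TABLE logs
    ADD CONSTRAINT fk_logs_transaction
    FOREIGN KEY (chain_id, transaction_hash)
    REFERENCES transactions (chain_id, hash)
    NOT VALID;

ALTER TABLE logs VALIDATE CONSTRAINT fk_logs_transaction;
```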

Unique Constraints

  • Prevent duplicate blocks, transactions, logs
  • Enable idempotent processing
  • Use ON CONFLICT for upserts
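
As an illustration, an idempotent block write might look like the following. The hashes are placeholders built with repeat(), and only the NOT NULL columns plus transaction_count are set. For rows that never change once written, such as logs, ON CONFLICT ... DO NOTHING would serve the same purpose.

```sql
INSERT INTO blocks (chain_id, number, hash, parent_hash, timestamp, transaction_count)
VALUES (1, 19000000,
        '0x' || repeat('ab', 32),          -- placeholder 32-byte hashes
        '0x' || repeat('cd', 32),
        TIMESTAMP '2024-01-13 00:00:00', 0)
ON CONFLICT (chain_id, number) DO UPDATE
SET hash              = EXCLUDED.hash,     -- a re-processed height may carry a new hash
    parent_hash       = EXCLUDED.parent_hash,
    timestamp         = EXCLUDED.timestamp,
    transaction_count = EXCLUDED.transaction_count,
    updated_at        = NOW();
```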

Indexing Strategy

Index Types

  1. B-tree: Default for most indexes (equality, range queries)
  2. Hash: For exact-match lookups on a single column (addresses, hashes); PostgreSQL hash indexes cannot span multiple columns or enforce uniqueness
  3. GIN: For JSONB columns (ABIs, decoded data)
  4. BRIN: For large ordered columns (block numbers, timestamps)
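
One possible statement per index type is sketched below. Apart from idx_transactions_chain_block, the index names are hypothetical, and whether each index pays off depends on the actual query mix.

```sql
-- B-tree (default): composite equality plus range/ordering.
CREATE INDEX idx_transactions_chain_block
    ON transactions (chain_id, block_number, transaction_index);

-- Hash: single-column equality lookups only.
CREATE INDEX idx_transactions_hash_hash
    ON transactions USING HASH (hash);

-- GIN: containment and key-existence queries on JSONB.
CREATE INDEX idx_contract_abis_abi_gin
    ON contract_abis USING GIN (abi);

-- BRIN: very small index for columns that grow with insertion order.
CREATE INDEX idx_logs_block_brin
    ON logs USING BRIN (block_number);
```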

Index Maintenance

  • Regular VACUUM and ANALYZE
  • Monitor index bloat
  • Consider partial indexes for filtered queries
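
The maintenance points above could be exercised roughly as follows. The partial index name is hypothetical. The last query only reports index size and scan counts, which is a starting point for spotting unused or oversized indexes; measuring bloat precisely would need something like the pgstattuple extension.

```sql
-- Reclaim dead tuples and refresh planner statistics in one pass.
VACUUM (ANALYZE) transactions;

-- Partial index: only failed transactions are indexed.
CREATE INDEX idx_transactions_failed
    ON transactions (chain_id, block_number)
    WHERE status = 0;

-- Largest indexes and how often each has been scanned.
SELECT relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
```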

References

  • Indexer Architecture: See indexer-architecture.md
  • Database Schema: See ../database/postgres-schema.md
  • Search Index Schema: See ../database/search-index-schema.md