# Data Models Specification

## Overview

This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-scoped models support multi-chain operation via a `chain_id` field; the `users` table is chain-agnostic.

## Core Data Models

### Block Schema

**Table**: `blocks`

**Fields**:
```sql
blocks (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    hash VARCHAR(66) NOT NULL,
    parent_hash VARCHAR(66) NOT NULL,
    nonce VARCHAR(18),
    sha3_uncles VARCHAR(66),
    logs_bloom TEXT,
    transactions_root VARCHAR(66),
    state_root VARCHAR(66),
    receipts_root VARCHAR(66),
    miner VARCHAR(42),
    difficulty NUMERIC,
    total_difficulty NUMERIC,
    size BIGINT,
    extra_data TEXT,
    gas_limit BIGINT,
    gas_used BIGINT,
    timestamp TIMESTAMP NOT NULL,
    transaction_count INTEGER DEFAULT 0,
    base_fee_per_gas BIGINT, -- EIP-1559
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, number),
    UNIQUE(chain_id, hash)
)
```

**Indexes**:
- `idx_blocks_chain_number` ON (chain_id, number)
- `idx_blocks_chain_hash` ON (chain_id, hash)
- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)
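
In PostgreSQL, the `UNIQUE(chain_id, number)` and `UNIQUE(chain_id, hash)` constraints are already backed by unique B-tree indexes, so `idx_blocks_chain_number` and `idx_blocks_chain_hash` may duplicate them. A minimal sketch of the remaining index and a query it can serve, assuming the `blocks` table above exists; the chain ID and timestamp are example values only:

```sql
-- Only the timestamp index needs explicit DDL; the two UNIQUE constraints
-- on blocks already provide indexes on (chain_id, number) and (chain_id, hash).
CREATE INDEX IF NOT EXISTS idx_blocks_chain_timestamp
    ON blocks (chain_id, timestamp);

-- Latest block at or before a point in time (example values).
SELECT number, hash, timestamp
FROM blocks
WHERE chain_id = 1
  AND timestamp <= TIMESTAMP '2024-01-01 00:00:00'
ORDER BY timestamp DESC
LIMIT 1;
```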

**Relationships**:
- One-to-many with `transactions`
- One-to-many with `logs`

### Transaction Schema

**Table**: `transactions`

**Fields**:
```sql
transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42), -- NULL for contract creation
    value NUMERIC NOT NULL DEFAULT 0,
    gas_price BIGINT,
    max_fee_per_gas BIGINT, -- EIP-1559
    max_priority_fee_per_gas BIGINT, -- EIP-1559
    gas_limit BIGINT NOT NULL,
    gas_used BIGINT,
    nonce BIGINT NOT NULL,
    input_data TEXT, -- Contract call data
    status INTEGER, -- 0 = failed, 1 = success
    contract_address VARCHAR(42), -- NULL if not contract creation
    cumulative_gas_used BIGINT,
    effective_gas_price BIGINT, -- Actual gas price paid
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
    UNIQUE(chain_id, hash)
)
```

**Indexes**:
- `idx_transactions_chain_hash` ON (chain_id, hash)
- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
- `idx_transactions_chain_from` ON (chain_id, from_address)
- `idx_transactions_chain_to` ON (chain_id, to_address)
- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)
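
The address indexes above are aimed at account-history lookups. The following is a sketch of such a query, assuming the `transactions` table above; the chain ID and address are placeholder values. Splitting the lookup into two branches lets each branch use its own index (`idx_transactions_chain_from` or `idx_transactions_chain_to`), and `UNION` removes the duplicate row produced by a self-transfer.

```sql
-- Most recent transactions sent or received by one address (example values).
SELECT hash, block_number, transaction_index, from_address, to_address, value
FROM transactions
WHERE chain_id = 1
  AND from_address = '0x0000000000000000000000000000000000000001'
UNION
SELECT hash, block_number, transaction_index, from_address, to_address, value
FROM transactions
WHERE chain_id = 1
  AND to_address = '0x0000000000000000000000000000000000000001'
ORDER BY block_number DESC, transaction_index DESC
LIMIT 25;
```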

**Relationships**:
- Many-to-one with `blocks`
- One-to-many with `logs`
- One-to-many with `traces`
- One-to-many with `internal_transactions`
- One-to-many with `token_transfers`

### Receipt Schema

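**Note**: Receipt data (`status`, `gas_used`, `cumulative_gas_used`, `contract_address`, `effective_gas_price`) is stored denormalized in the `transactions` table for efficiency. If separate storage is needed, use the following table: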

**Table**: `transaction_receipts`

```sql
transaction_receipts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    transaction_index INTEGER NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),
    gas_used BIGINT,
    cumulative_gas_used BIGINT,
    contract_address VARCHAR(42),
    logs_bloom TEXT,
    status INTEGER,
    root VARCHAR(66), -- Pre-Byzantium
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash)
)
```
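
When the denormalized layout is kept, a receipt-shaped result can still be exposed without a second table. A possible sketch, assuming the `transactions` table above; the view name `transaction_receipts_v` is hypothetical. Note that `logs_bloom` and `root` have no counterpart in `transactions`, so they are absent here.

```sql
-- Receipt-shaped view over the denormalized transactions table.
CREATE VIEW transaction_receipts_v AS
SELECT
    chain_id,
    hash AS transaction_hash,
    transaction_index,
    block_number,
    block_hash,
    from_address,
    to_address,
    gas_used,
    cumulative_gas_used,
    contract_address,
    status
FROM transactions;
```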

### Log Schema

**Table**: `logs`

**Fields**:
```sql
logs (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    log_index INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    topic0 VARCHAR(66), -- Event signature
    topic1 VARCHAR(66), -- First indexed parameter
    topic2 VARCHAR(66), -- Second indexed parameter
    topic3 VARCHAR(66), -- Third indexed parameter
    data TEXT, -- Non-indexed parameters
    decoded_data JSONB, -- Decoded event data (if ABI available)
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    UNIQUE(chain_id, transaction_hash, log_index)
)
```

**Indexes**:
- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
- `idx_logs_chain_address` ON (chain_id, address)
- `idx_logs_chain_topic0` ON (chain_id, topic0)
- `idx_logs_chain_block` ON (chain_id, block_number)
- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) -- For event filtering
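
As an illustration of the event-filtering case, the query below selects ERC-20 `Transfer` events emitted by one contract. It is a sketch that assumes the `logs` table above; the chain ID and contract address are placeholders. The `topic0` value is the Keccak-256 hash of `Transfer(address,address,uint256)`. Indexed address parameters are left-padded to 32 bytes in topics, so the last 40 hex characters hold the address.

```sql
-- ERC-20 Transfer events for one contract, in chain order (example values).
SELECT
    block_number,
    log_index,
    transaction_hash,
    '0x' || right(topic1, 40) AS from_address,  -- first indexed parameter
    '0x' || right(topic2, 40) AS to_address,    -- second indexed parameter
    data                                        -- ABI-encoded amount
FROM logs
WHERE chain_id = 1
  AND address = '0x0000000000000000000000000000000000000001'
  AND topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
ORDER BY block_number, log_index
LIMIT 100;
```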

**Relationships**:
- Many-to-one with `transactions`

### Trace Schema

**Table**: `traces`

**Fields**:
```sql
traces (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    block_hash VARCHAR(66) NOT NULL,
    trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
    subtraces INTEGER, -- Number of child calls
    action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide'
    action_from VARCHAR(42),
    action_to VARCHAR(42),
    action_value NUMERIC DEFAULT 0,
    action_input TEXT,
    action_gas BIGINT,
    action_call_type VARCHAR(20), -- For 'call' actions: 'call', 'callcode', 'delegatecall', 'staticcall'
    result_type VARCHAR(20), -- 'callresult', 'createresult'
    result_gas_used BIGINT,
    result_output TEXT,
    result_address VARCHAR(42), -- For create results
    result_code TEXT, -- For create results
    error TEXT, -- Error message if trace failed
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```

**Indexes**:
- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
- `idx_traces_chain_block` ON (chain_id, block_number)
- `idx_traces_chain_from` ON (chain_id, action_from)
- `idx_traces_chain_to` ON (chain_id, action_to)

**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.
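
The `trace_address` array encodes each call's position in the call tree. PostgreSQL compares arrays element by element and sorts a shorter array before a longer one that extends it, so ordering by `trace_address` yields a depth-first listing. A sketch, assuming the `traces` table above and assuming the top-level call is stored as an empty array (`'{}'`) rather than NULL; the transaction hash is a placeholder:

```sql
-- Call tree of one transaction in depth-first order (example values).
SELECT
    trace_address,
    cardinality(trace_address) AS depth,  -- 0 for the top-level call
    action_type,
    action_call_type,
    action_from,
    action_to,
    action_value,
    error
FROM traces
WHERE chain_id = 1
  AND transaction_hash = '0x' || repeat('0', 63) || '1'
ORDER BY trace_address;
```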

### Internal Transaction Schema

**Table**: `internal_transactions`

**Purpose**: Track value transfers that occur within transactions (via calls).

**Fields**:
```sql
internal_transactions (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    trace_address INTEGER[] NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    value NUMERIC NOT NULL,
    call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
    gas_limit BIGINT,
    gas_used BIGINT,
    input_data TEXT,
    output_data TEXT,
    error TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```

**Indexes**:
- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
- `idx_internal_tx_chain_from` ON (chain_id, from_address)
- `idx_internal_tx_chain_to` ON (chain_id, to_address)
- `idx_internal_tx_chain_block` ON (chain_id, block_number)

**Relationships**:
- Many-to-one with `transactions`
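
One way to populate this table is to derive it from `traces`. The statement below is a sketch under several assumptions: both tables exist as defined above, the top-level call is stored with an empty `trace_address`, and only nested calls carrying a non-zero value count as internal transactions. The chain ID and block number are example values. Because `internal_transactions` has no unique constraint, re-running the statement for the same block inserts duplicates.

```sql
-- Copy value-carrying nested calls of one block into internal_transactions.
INSERT INTO internal_transactions (
    chain_id, transaction_hash, block_number, trace_address,
    from_address, to_address, value, call_type,
    gas_limit, gas_used, input_data, output_data, error
)
SELECT
    chain_id, transaction_hash, block_number, trace_address,
    action_from,
    COALESCE(action_to, result_address),      -- created contract for 'create'
    action_value,
    COALESCE(action_call_type, action_type),  -- 'create' has no call type
    action_gas, result_gas_used, action_input, result_output, error
FROM traces
WHERE chain_id = 1
  AND block_number = 1000000                  -- example block
  AND cardinality(trace_address) > 0          -- skip the top-level call
  AND action_value > 0                        -- value transfers only
  AND action_from IS NOT NULL
  AND COALESCE(action_to, result_address) IS NOT NULL;
```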

### Token Transfer Schema

**Table**: `token_transfers`

**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.

**Fields**:
```sql
token_transfers (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC, -- For ERC-20 and ERC-1155
    token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (uint256; up to 78 decimal digits)
    operator VARCHAR(42), -- For ERC-1155
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
)
```

**Indexes**:
- `idx_token_transfers_chain_token` ON (chain_id, token_address)
- `idx_token_transfers_chain_from` ON (chain_id, from_address)
- `idx_token_transfers_chain_to` ON (chain_id, to_address)
- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
- `idx_token_transfers_chain_block` ON (chain_id, block_number)
- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)

**Relationships**:
- Many-to-one with `transactions`
- Many-to-one with `tokens`
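
Transfer rows can be aggregated into a holder balance. The query below is a sketch, assuming the `token_transfers` table above; the chain ID and both addresses are placeholders. It sums incoming minus outgoing amounts, so it only reflects tokens whose balances change exclusively through transfer events.

```sql
-- Net ERC-20 balance of one holder, derived from transfers (example values).
SELECT COALESCE(SUM(
           CASE WHEN to_address = '0x0000000000000000000000000000000000000001'
                THEN amount ELSE 0 END
         - CASE WHEN from_address = '0x0000000000000000000000000000000000000001'
                THEN amount ELSE 0 END
       ), 0) AS balance
FROM token_transfers
WHERE chain_id = 1
  AND token_address = '0x0000000000000000000000000000000000000002'
  AND token_type = 'ERC20'
  AND (from_address = '0x0000000000000000000000000000000000000001'
       OR to_address = '0x0000000000000000000000000000000000000001');
```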

### Token Schema

**Table**: `tokens`

**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).

**Fields**:
```sql
tokens (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER, -- For ERC-20
    total_supply NUMERIC,
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_tokens_chain_address` ON (chain_id, address)
- `idx_tokens_chain_type` ON (chain_id, type)
- `idx_tokens_chain_symbol` ON (chain_id, symbol) -- For search

**Relationships**:
- One-to-many with `token_transfers`
- One-to-many with `token_holders` (if maintained)

### Contract Metadata Schema

**Table**: `contracts`

**Purpose**: Store verified contract information.

**Fields**:
```sql
contracts (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
    verified_at TIMESTAMP,
    verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_contracts_chain_address` ON (chain_id, address)
- `idx_contracts_chain_verified` ON (chain_id, verification_status)

**Relationships**:
- One-to-one with `contract_abis` (if separate ABI storage)

### Contract ABI Schema

**Table**: `contract_abis`

**Purpose**: Store contract ABIs for decoding (can be separate from verification).

**Fields**:
```sql
contract_abis (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address)
)
```

**Indexes**:
- `idx_abis_chain_address` ON (chain_id, address)

## Address-Related Models

### Address Labels Schema

**Table**: `address_labels`

**Purpose**: User-defined and public labels for addresses.

**Fields**:
```sql
address_labels (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
    user_id UUID, -- NULL for public labels
    source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, label_type, user_id) -- Rows with NULL user_id are never treated as duplicates
)
```

**Indexes**:
- `idx_labels_chain_address` ON (chain_id, address)
- `idx_labels_chain_user` ON (chain_id, user_id)
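
PostgreSQL treats NULL values as distinct in a `UNIQUE` constraint, so the constraint on `(chain_id, address, label_type, user_id)` does not stop two public labels (`user_id` NULL) of the same type on one address. One possible remedy is a partial unique index. The sketch below assumes the `address_labels` table above; the index name `uq_labels_public` and the inserted values are hypothetical.

```sql
-- At most one public label per (chain, address, label_type).
CREATE UNIQUE INDEX IF NOT EXISTS uq_labels_public
    ON address_labels (chain_id, address, label_type)
    WHERE user_id IS NULL;

-- Upsert a public label against that partial index (example values).
INSERT INTO address_labels (chain_id, address, label, label_type, source)
VALUES (1, '0x0000000000000000000000000000000000000001',
        'Example Exchange', 'public', 'etherscan')
ON CONFLICT (chain_id, address, label_type) WHERE user_id IS NULL
DO UPDATE SET label = EXCLUDED.label, updated_at = NOW();
```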

### Address Tags Schema

**Table**: `address_tags`

**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").

**Fields**:
```sql
address_tags (
    id BIGSERIAL PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
    user_id UUID, -- NULL for public tags
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(chain_id, address, tag, user_id) -- Rows with NULL user_id are never treated as duplicates
)
```

**Indexes**:
- `idx_tags_chain_address` ON (chain_id, address)
- `idx_tags_chain_tag` ON (chain_id, tag)
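
On PostgreSQL 15 or later, duplicate public tags (`user_id` NULL) can be rejected by declaring the constraint with `NULLS NOT DISTINCT`. This is a sketch that assumes the `address_tags` table above; the constraint name is hypothetical, and the original `UNIQUE` constraint would normally be dropped once this one is in place.

```sql
-- PostgreSQL 15+: treat NULL user_id values as equal for uniqueness.
ALTER TABLE address_tags
    ADD CONSTRAINT uq_tags_nulls_not_distinct
    UNIQUE NULLS NOT DISTINCT (chain_id, address, tag, user_id);
```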

## User-Related Models

### User Accounts Schema

**Table**: `users`

**Purpose**: User accounts for watchlists, alerts, preferences.

**Fields**:
```sql
users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- Built in since PostgreSQL 13; earlier versions need pgcrypto
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT, -- If using password auth
    api_key_hash TEXT, -- Hashed API key
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
)
```

**Indexes**:
- `idx_users_email` ON (email)
- `idx_users_username` ON (username)

### Watchlists Schema

**Table**: `watchlists`

**Purpose**: User-defined lists of addresses to monitor.

**Fields**:
```sql
watchlists (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(user_id, chain_id, address)
)
```

**Indexes**:
- `idx_watchlists_user` ON (user_id)
- `idx_watchlists_chain_address` ON (chain_id, address)
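
A typical read path joins a user's watchlist against `transactions`. The query below is a sketch, assuming both tables as defined above; the user ID is a placeholder. The `OR` in the join condition keeps the example short but may plan poorly on large tables, where two separate joins combined with `UNION` could be preferable.

```sql
-- Recent activity for every address on one user's watchlist (example values).
SELECT w.label, t.hash, t.block_number, t.from_address, t.to_address, t.value
FROM watchlists AS w
JOIN transactions AS t
  ON t.chain_id = w.chain_id
 AND (t.from_address = w.address OR t.to_address = w.address)
WHERE w.user_id = '00000000-0000-0000-0000-000000000001'
ORDER BY t.block_number DESC, t.transaction_index DESC
LIMIT 50;
```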

## Data Type Definitions

### Numeric Types

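- **BIGINT**: Used for block numbers, gas values, and nonces (signed 64-bit integers, maximum 9,223,372,036,854,775,807)
- **NUMERIC**: Used for token amounts and native-currency values in wei (arbitrary-precision decimals)
  - Precision: 78 digits, enough for any `uint256` value (2^256 - 1 has 78 decimal digits). The schemas above declare unconstrained `NUMERIC`; `NUMERIC(78, 0)` enforces this limit
  - Scale: 0 for raw on-chain integers, or a configurable scale when amounts are stored already adjusted for token decimals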
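
The two statements below illustrate the precision claim and a display-time conversion. They are a sketch only; the 18-decimal divisor is an assumption that holds for ETH and many, but not all, tokens.

```sql
-- The largest uint256 value (2^256 - 1) fits exactly in NUMERIC(78, 0).
SELECT '115792089237316195423570985008687907853269984665640564039457584007913129639935'::NUMERIC(78, 0)
       AS max_uint256;

-- Convert a wei amount to whole units for display (18 decimals assumed).
SELECT 1500000000000000000::NUMERIC / 1000000000000000000::NUMERIC AS ether;
```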

### Address Types

- **VARCHAR(42)**: Ethereum addresses (0x + 40 hex chars)
- Normalize to lowercase before storing, so that equality lookups and joins match regardless of EIP-55 checksum casing
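
The lowercase rule can be enforced in the database rather than left to each writer. A possible sketch using a domain type; the name `eth_address` is hypothetical and the schemas above use plain `VARCHAR(42)`:

```sql
-- Domain that accepts only lowercase, 0x-prefixed, 40-hex-character addresses.
CREATE DOMAIN eth_address AS VARCHAR(42)
    CHECK (VALUE ~ '^0x[0-9a-f]{40}$');

-- Normalize a checksummed address before storing it.
SELECT lower('0xAbCdEf0123456789aBcDeF0123456789AbCdEf01')::eth_address
       AS normalized;
```

Casting a mixed-case address to the domain without `lower()` fails the check, which surfaces missing normalization at write time.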

### Hash Types

- **VARCHAR(66)**: Transaction/block hashes (0x + 64 hex chars)
- **TEXT**: Hex data that is longer than a hash or of variable length (`logs_bloom`, `input_data`, `extra_data`)

### JSONB Types

- Used for: ABIs, decoded event data, complex nested structures
- Benefits: Indexing, querying, efficient storage
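
As an example of the indexing and querying benefit, the sketch below adds a GIN index on `logs.decoded_data` and runs a containment query. It assumes the `logs` table above; the index name and the `"event"` key are hypothetical, since the layout of `decoded_data` is not specified in this document.

```sql
-- GIN index specialized for containment (@>) queries on JSONB.
CREATE INDEX IF NOT EXISTS idx_logs_decoded_data
    ON logs USING gin (decoded_data jsonb_path_ops);

-- Logs whose decoded payload contains the given key/value pair.
SELECT transaction_hash, log_index, decoded_data
FROM logs
WHERE chain_id = 1
  AND decoded_data @> '{"event": "Transfer"}';
```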

## Multi-Chain Considerations

### Chain ID Partitioning

All chain-scoped tables include `chain_id`, normally as the first column after the primary key (`watchlists` places it after `user_id`, and `users` has no `chain_id`):
- Enables efficient partitioning by chain_id
- Ensures data isolation between chains
- Simplifies multi-chain queries

### Partitioning Strategy

**Recommended**: Partition large tables by `chain_id`:
- `blocks`, `transactions`, `logs` partitioned by chain_id
- Benefits: Faster queries, easier maintenance, parallel processing

**Implementation** (PostgreSQL):
```sql
-- Example partitioning
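CREATE TABLE blocks (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    number BIGINT NOT NULL,
    -- remaining columns as in the Block Schema
    PRIMARY KEY (chain_id, id),  -- keys on a partitioned table must include the partition key
    UNIQUE (chain_id, number)
) PARTITION BY LIST (chain_id);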

CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
```

## Data Consistency

### Foreign Key Constraints

- Enforce referential integrity where possible
- Consider performance impact for high-throughput inserts
- For the initial backfill, foreign keys may be dropped and re-added (or added as `NOT VALID` and validated) after catch-up
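
A sketch of the deferred-validation approach for a non-partitioned table, assuming `logs` was created or backfilled without its foreign key; the constraint name `fk_logs_transaction` is hypothetical. Adding the constraint as `NOT VALID` enforces it for new rows immediately, and the separate validation step checks existing rows later.

```sql
-- Step 1: add the constraint without scanning existing rows.
ALTER TABLE logs
    ADD CONSTRAINT fk_logs_transaction
    FOREIGN KEY (chain_id, transaction_hash)
    REFERENCES transactions (chain_id, hash)
    NOT VALID;

-- Step 2: after catch-up, verify the rows loaded during the backfill.
ALTER TABLE logs VALIDATE CONSTRAINT fk_logs_transaction;
```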

### Unique Constraints

- Prevent duplicate blocks, transactions, logs
- Enable idempotent processing
- Use ON CONFLICT for upserts
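
The unique constraints make re-processing a block safe. Below is a minimal sketch of an idempotent block write, assuming the `blocks` table above; all values are placeholders. Only a few columns are shown, and a real reorg would also need to replace the dependent `transactions` and `logs` rows.

```sql
-- Insert a block, or overwrite the stored row if this height already exists.
INSERT INTO blocks (chain_id, number, hash, parent_hash, timestamp)
VALUES (
    1,
    1000000,
    '0x' || repeat('a', 64),  -- placeholder block hash
    '0x' || repeat('b', 64),  -- placeholder parent hash
    TIMESTAMP '2024-01-01 00:00:00'
)
ON CONFLICT (chain_id, number) DO UPDATE
SET hash        = EXCLUDED.hash,
    parent_hash = EXCLUDED.parent_hash,
    timestamp   = EXCLUDED.timestamp,
    updated_at  = NOW();
```

For append-only rows such as logs, `ON CONFLICT (chain_id, transaction_hash, log_index) DO NOTHING` is usually sufficient.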

## Indexing Strategy

### Index Types

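1. **B-tree**: Default for most indexes (equality and range queries); the only type that can back a `UNIQUE` constraint
2. **Hash**: For exact-match lookups on a single column (addresses, hashes); cannot be multicolumn or unique, so it does not fit the composite `(chain_id, ...)` indexes above
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For very large tables whose physical row order follows the column value (block numbers, timestamps on append-mostly data)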
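
The non-default index types are selected with `USING`. The statements below are a sketch, assuming the `logs` and `transactions` tables above; both index names are hypothetical and neither index is part of the schemas listed earlier.

```sql
-- BRIN: compact index for an append-mostly table stored in block order.
CREATE INDEX IF NOT EXISTS idx_logs_block_brin
    ON logs USING brin (block_number);

-- Hash: single-column exact-match lookup on the transaction hash.
CREATE INDEX IF NOT EXISTS idx_transactions_hash_hash
    ON transactions USING hash (hash);
```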

### Index Maintenance

- Regular VACUUM and ANALYZE
- Monitor index bloat
- Consider partial indexes for filtered queries
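
A partial index covers only the rows a query actually filters on, which keeps it small. The sketch below assumes the `transactions` table above and a workload that frequently lists failed transactions; the index name is hypothetical.

```sql
-- Index only failed transactions (status = 0).
CREATE INDEX IF NOT EXISTS idx_transactions_failed
    ON transactions (chain_id, block_number)
    WHERE status = 0;

-- Reclaim dead tuples and refresh planner statistics in one pass.
VACUUM (ANALYZE) transactions;
```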

## References

- Indexer Architecture: See `indexer-architecture.md`
- Database Schema: See `../database/postgres-schema.md`
- Search Index Schema: See `../database/search-index-schema.md`