Files

defiQUG 903c03c65b Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-10 11:32:49 -08:00

9.5 KiB

Raw Permalink Blame History

Search Index Schema Specification

Overview

This document specifies the Elasticsearch/OpenSearch index schema for full-text search and faceted querying across blocks, transactions, addresses, tokens, and contracts.

Architecture

flowchart LR
    PG[(PostgreSQL<br/>Canonical Data)]
    Transform[Data Transformer]
    ES[(Elasticsearch<br/>Search Index)]
    
    PG --> Transform
    Transform --> ES
    
    Query[Search Query]
    Query --> ES
    ES --> Results[Search Results]

Index Structure

Blocks Index

Index Name: blocks-{chain_id} (e.g., blocks-138)

Document Structure:

{
  "block_number": 12345,
  "hash": "0x...",
  "timestamp": "2024-01-01T00:00:00Z",
  "miner": "0x...",
  "transaction_count": 100,
  "gas_used": 15000000,
  "gas_limit": 20000000,
  "chain_id": 138,
  "parent_hash": "0x...",
  "size": 1024
}

Field Mappings:

block_number: long (not analyzed, for sorting/filtering)
hash: keyword (exact match)
timestamp: date
miner: keyword (exact match)
transaction_count: integer
gas_used: long
gas_limit: long
chain_id: integer
parent_hash: keyword

Searchable Fields:

Hash (exact match)
Miner address (exact match)

Transactions Index

Index Name: transactions-{chain_id}

Document Structure:

{
  "hash": "0x...",
  "block_number": 12345,
  "transaction_index": 5,
  "from_address": "0x...",
  "to_address": "0x...",
  "value": "1000000000000000000",
  "gas_price": "20000000000",
  "gas_used": 21000,
  "status": "success",
  "timestamp": "2024-01-01T00:00:00Z",
  "chain_id": 138,
  "input_data_length": 100,
  "is_contract_creation": false,
  "contract_address": null
}

Field Mappings:

hash: keyword
block_number: long
transaction_index: integer
from_address: keyword
to_address: keyword
value: text (for full-text search on large numbers)
value_numeric: long (for range queries)
gas_price: long
gas_used: long
status: keyword
timestamp: date
chain_id: integer
input_data_length: integer
is_contract_creation: boolean
contract_address: keyword

Searchable Fields:

Hash (exact match)
From/to addresses (exact match)
Value (range queries)

Addresses Index

Index Name: addresses-{chain_id}

Document Structure:

{
  "address": "0x...",
  "chain_id": 138,
  "label": "My Wallet",
  "tags": ["wallet", "exchange"],
  "token_count": 10,
  "transaction_count": 500,
  "first_seen": "2024-01-01T00:00:00Z",
  "last_seen": "2024-01-15T00:00:00Z",
  "is_contract": true,
  "contract_name": "MyToken",
  "balance_eth": "1.5",
  "balance_usd": "3000"
}

Field Mappings:

address: keyword
chain_id: integer
label: text (analyzed) + keyword (exact match)
tags: keyword (array)
token_count: integer
transaction_count: long
first_seen: date
last_seen: date
is_contract: boolean
contract_name: text + keyword
balance_eth: double
balance_usd: double

Searchable Fields:

Address (exact match, prefix match)
Label (full-text search)
Contract name (full-text search)
Tags (facet filter)

Tokens Index

Index Name: tokens-{chain_id}

Document Structure:

{
  "address": "0x...",
  "chain_id": 138,
  "name": "My Token",
  "symbol": "MTK",
  "type": "ERC20",
  "decimals": 18,
  "total_supply": "1000000000000000000000000",
  "holder_count": 1000,
  "transfer_count": 50000,
  "logo_url": "https://...",
  "verified": true,
  "description": "A token description"
}

Field Mappings:

address: keyword
chain_id: integer
name: text (analyzed) + keyword (exact match)
symbol: keyword (uppercase normalized)
type: keyword
decimals: integer
total_supply: text (for large numbers)
total_supply_numeric: double (for sorting)
holder_count: integer
transfer_count: long
logo_url: keyword
verified: boolean
description: text (analyzed)

Searchable Fields:

Name (full-text search)
Symbol (exact match, prefix match)
Address (exact match)

Contracts Index

Index Name: contracts-{chain_id}

Document Structure:

{
  "address": "0x...",
  "chain_id": 138,
  "name": "MyContract",
  "verification_status": "verified",
  "compiler_version": "0.8.19",
  "source_code": "contract MyContract {...}",
  "abi": [...],
  "verified_at": "2024-01-01T00:00:00Z",
  "transaction_count": 1000,
  "created_at": "2024-01-01T00:00:00Z"
}

Field Mappings:

address: keyword
chain_id: integer
name: text + keyword
verification_status: keyword
compiler_version: keyword
source_code: text (analyzed, indexed but not stored in full for large contracts)
abi: object (nested, for structured queries)
verified_at: date
transaction_count: long
created_at: date

Searchable Fields:

Name (full-text search)
Address (exact match)
Source code (full-text search, limited)

Indexing Pipeline

Data Transformation

Purpose: Transform canonical PostgreSQL data into search-optimized documents.

Transformation Steps:

Fetch Data: Query PostgreSQL for entities to index
Enrich Data: Add computed fields (balances, counts, etc.)
Normalize Data: Normalize addresses, format values
Index Document: Send to Elasticsearch/OpenSearch

Indexing Strategy

Initial Indexing:

Bulk index existing data
Process in batches (1000 documents per batch)
Use bulk API for efficiency

Incremental Indexing:

Index new entities as they're created
Update entities when changed
Delete entities when removed

Update Frequency:

Real-time: Index immediately after database insert/update
Batch: Bulk update every N minutes for efficiency

Index Aliases

Purpose: Enable zero-downtime index updates.

Strategy:

Write to new index (e.g., blocks-138-v2)
Build index in background
Switch alias when ready
Delete old index after switch

Alias Names:

blocks-{chain_id} → points to latest version
transactions-{chain_id} → points to latest version
etc.

Query Patterns

Full-Text Search

Blocks Search:

{
  "query": {
    "match": {
      "hash": "0x123..."
    }
  }
}

Address Search:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "label": "wallet" } },
        { "prefix": { "address": "0x123" } }
      ]
    }
  }
}

Token Search:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "My Token" } },
        { "match": { "symbol": "MTK" } }
      ]
    }
  }
}

Faceted Search

Filter by Multiple Criteria:

{
  "query": {
    "bool": {
      "must": [
        { "term": { "chain_id": 138 } },
        { "term": { "type": "ERC20" } },
        { "range": { "holder_count": { "gte": 100 } } }
      ]
    }
  },
  "aggs": {
    "by_type": {
      "terms": { "field": "type" }
    }
  }
}

Unified Search

Cross-Entity Search:

Search across blocks, transactions, addresses, tokens
Use _index field to filter by entity type
Combine results with relevance scoring

Multi-Index Query:

{
  "query": {
    "multi_match": {
      "query": "0x123",
      "fields": ["hash", "address", "from_address", "to_address"],
      "type": "best_fields"
    }
  }
}

Index Configuration

Analysis Settings

Custom Analyzer:

Address analyzer: Lowercase, no tokenization
Symbol analyzer: Uppercase, no tokenization
Text analyzer: Standard analyzer with lowercase

Example Configuration:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "address_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

Sharding and Replication

Sharding:

Number of shards: Based on index size
Large indices (> 50GB): Multiple shards
Small indices: Single shard

Replication:

Replica count: 1-2 (for high availability)
Increase replicas for read-heavy workloads

Performance Optimization

Index Optimization

Refresh Interval:

Default: 1 second
For bulk indexing: Increase to 30 seconds, then reset

Bulk Indexing:

Batch size: 1000-5000 documents
Use bulk API
Disable refresh during bulk indexing

Query Optimization

Query Caching:

Enable query cache for repeated queries
Cache filter results

Field Data:

Use doc_values for sorting/aggregations
Avoid fielddata for text fields

Maintenance

Index Monitoring

Metrics:

Index size
Document count
Query performance (p50, p95, p99)
Index lag (time behind database)

Index Cleanup

Strategy:

Delete old indices (after alias switch)
Archive old indices to cold storage
Compress indices for storage efficiency

Integration with PostgreSQL

Data Sync

Sync Strategy:

Real-time: Listen to database changes (CDC, triggers, or polling)
Batch: Periodic sync jobs
Hybrid: Real-time for recent data, batch for historical

Change Detection:

Use updated_at timestamp
Use database triggers to queue changes
Use CDC (Change Data Capture) if available

Consistency

Eventual Consistency:

Search index is eventually consistent with database
Small lag acceptable (< 1 minute)
Critical queries can fall back to database

References

Database Schema: See postgres-schema.md
Indexer Architecture: See ../indexing/indexer-architecture.md
Unified Search: See ../multichain/unified-search.md

9.5 KiB Raw Permalink Blame History

Search Index Schema Specification

Overview

Architecture

Index Structure

Blocks Index

Transactions Index

Addresses Index

Tokens Index

Contracts Index

Indexing Pipeline

Data Transformation

Indexing Strategy

Index Aliases

Query Patterns

Full-Text Search

Faceted Search

Unified Search

Index Configuration

Analysis Settings

Sharding and Replication

Performance Optimization

Index Optimization

Query Optimization

Maintenance

Index Monitoring

Index Cleanup

Integration with PostgreSQL

Data Sync

Consistency

References

9.5 KiB

Raw Permalink Blame History