Search Index Schema Specification
Overview
This document specifies the Elasticsearch/OpenSearch index schema for full-text search and faceted querying across blocks, transactions, addresses, tokens, and contracts.
Architecture
flowchart LR
PG[(PostgreSQL<br/>Canonical Data)]
Transform[Data Transformer]
ES[(Elasticsearch<br/>Search Index)]
PG --> Transform
Transform --> ES
Query[Search Query]
Query --> ES
ES --> Results[Search Results]
Index Structure
Blocks Index
Index Name: blocks-{chain_id} (e.g., blocks-138)
Document Structure:
{
"block_number": 12345,
"hash": "0x...",
"timestamp": "2024-01-01T00:00:00Z",
"miner": "0x...",
"transaction_count": 100,
"gas_used": 15000000,
"gas_limit": 20000000,
"chain_id": 138,
"parent_hash": "0x...",
"size": 1024
}
Field Mappings:
- block_number: long (not analyzed, for sorting/filtering)
- hash: keyword (exact match)
- timestamp: date
- miner: keyword (exact match)
- transaction_count: integer
- gas_used: long
- gas_limit: long
- chain_id: integer
- parent_hash: keyword
Searchable Fields:
- Hash (exact match)
- Miner address (exact match)
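As a rough illustration, the Blocks field mappings could be expressed as an index-creation body along these lines. This is a minimal sketch: the helper names blocks_index_name and blocks_mapping are hypothetical, and the size field is left out because the spec lists no mapping for it.

```python
def blocks_index_name(chain_id: int) -> str:
    """Return the per-chain index name, e.g. blocks-138."""
    return f"blocks-{chain_id}"


def blocks_mapping() -> dict:
    """Build a create-index body mirroring the Blocks field mappings."""
    return {
        "mappings": {
            "properties": {
                "block_number": {"type": "long"},
                "hash": {"type": "keyword"},
                "timestamp": {"type": "date"},
                "miner": {"type": "keyword"},
                "transaction_count": {"type": "integer"},
                "gas_used": {"type": "long"},
                "gas_limit": {"type": "long"},
                "chain_id": {"type": "integer"},
                "parent_hash": {"type": "keyword"},
                # "size" appears in the sample document but has no listed
                # mapping, so it is left to dynamic mapping in this sketch.
            }
        }
    }
```

The returned dictionary is the kind of body a create-index request accepts; how it is sent depends on the client in use.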
Transactions Index
Index Name: transactions-{chain_id}
Document Structure:
{
"hash": "0x...",
"block_number": 12345,
"transaction_index": 5,
"from_address": "0x...",
"to_address": "0x...",
"value": "1000000000000000000",
"gas_price": "20000000000",
"gas_used": 21000,
"status": "success",
"timestamp": "2024-01-01T00:00:00Z",
"chain_id": 138,
"input_data_length": 100,
"is_contract_creation": false,
"contract_address": null
}
Field Mappings:
- hash: keyword
- block_number: long
- transaction_index: integer
- from_address: keyword
- to_address: keyword
- value: text (for full-text search on large numbers)
- value_numeric: long (for range queries; a long holds at most 2^63 - 1, about 9.22 × 10^18 wei, so larger amounts need separate handling)
- gas_price: long
- gas_used: long
- status: keyword
- timestamp: date
- chain_id: integer
- input_data_length: integer
- is_contract_creation: boolean
- contract_address: keyword
Searchable Fields:
- Hash (exact match)
- From/to addresses (exact match)
- Value (range queries)
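Because value is carried as a decimal string in wei while range queries need a numeric field, the transformer has to derive value_numeric somewhere. The sketch below shows one way to do that. The function name derive_value_fields is hypothetical, and omitting value_numeric on overflow is an assumption, not something the spec prescribes.

```python
LONG_MAX = 2**63 - 1  # largest value a signed 64-bit `long` field can hold


def derive_value_fields(value_wei: str) -> dict:
    """Return the `value` string plus `value_numeric` when it fits in a long."""
    amount = int(value_wei)
    fields = {"value": value_wei}
    if 0 <= amount <= LONG_MAX:
        fields["value_numeric"] = amount
    # Assumption: amounts that overflow a long are indexed without
    # value_numeric rather than being truncated or rounded.
    return fields
```

With this approach a transfer of 1 ETH (10^18 wei) keeps its numeric field, while a transfer of 10 ETH (10^19 wei) does not, so range queries would silently miss it unless another representation is chosen.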
Addresses Index
Index Name: addresses-{chain_id}
Document Structure:
{
"address": "0x...",
"chain_id": 138,
"label": "My Wallet",
"tags": ["wallet", "exchange"],
"token_count": 10,
"transaction_count": 500,
"first_seen": "2024-01-01T00:00:00Z",
"last_seen": "2024-01-15T00:00:00Z",
"is_contract": true,
"contract_name": "MyToken",
"balance_eth": "1.5",
"balance_usd": "3000"
}
Field Mappings:
- address: keyword
- chain_id: integer
- label: text (analyzed) + keyword (exact match)
- tags: keyword (array)
- token_count: integer
- transaction_count: long
- first_seen: date
- last_seen: date
- is_contract: boolean
- contract_name: text + keyword
- balance_eth: double
- balance_usd: double
Searchable Fields:
- Address (exact match, prefix match)
- Label (full-text search)
- Contract name (full-text search)
- Tags (facet filter)
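The "text + keyword" entries are usually realized as a multi-field: one analyzed text field with a keyword sub-field beneath it. A minimal sketch follows, assuming the sub-field is named keyword (so exact matches would target label.keyword). The helper names are hypothetical.

```python
def text_with_keyword() -> dict:
    """A text field with a `keyword` sub-field for exact match and sorting."""
    return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}


def addresses_properties() -> dict:
    """Mapping properties mirroring the Addresses field mappings."""
    return {
        "address": {"type": "keyword"},
        "chain_id": {"type": "integer"},
        "label": text_with_keyword(),
        "tags": {"type": "keyword"},  # arrays need no special mapping
        "token_count": {"type": "integer"},
        "transaction_count": {"type": "long"},
        "first_seen": {"type": "date"},
        "last_seen": {"type": "date"},
        "is_contract": {"type": "boolean"},
        "contract_name": text_with_keyword(),
        "balance_eth": {"type": "double"},
        "balance_usd": {"type": "double"},
    }
```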
Tokens Index
Index Name: tokens-{chain_id}
Document Structure:
{
"address": "0x...",
"chain_id": 138,
"name": "My Token",
"symbol": "MTK",
"type": "ERC20",
"decimals": 18,
"total_supply": "1000000000000000000000000",
"holder_count": 1000,
"transfer_count": 50000,
"logo_url": "https://...",
"verified": true,
"description": "A token description"
}
Field Mappings:
- address: keyword
- chain_id: integer
- name: text (analyzed) + keyword (exact match)
- symbol: keyword (uppercase normalized)
- type: keyword
- decimals: integer
- total_supply: text (for large numbers)
- total_supply_numeric: double (for sorting)
- holder_count: integer
- transfer_count: long
- logo_url: keyword
- verified: boolean
- description: text (analyzed)
Searchable Fields:
- Name (full-text search)
- Symbol (exact match, prefix match)
- Address (exact match)
Contracts Index
Index Name: contracts-{chain_id}
Document Structure:
{
"address": "0x...",
"chain_id": 138,
"name": "MyContract",
"verification_status": "verified",
"compiler_version": "0.8.19",
"source_code": "contract MyContract {...}",
"abi": [...],
"verified_at": "2024-01-01T00:00:00Z",
"transaction_count": 1000,
"created_at": "2024-01-01T00:00:00Z"
}
Field Mappings:
- address: keyword
- chain_id: integer
- name: text + keyword
- verification_status: keyword
- compiler_version: keyword
- source_code: text (analyzed, indexed but not stored in full for large contracts)
- abi: object (nested, for structured queries)
- verified_at: date
- transaction_count: long
- created_at: date
Searchable Fields:
- Name (full-text search)
- Address (exact match)
- Source code (full-text search, limited)
Indexing Pipeline
Data Transformation
Purpose: Transform canonical PostgreSQL data into search-optimized documents.
Transformation Steps:
- Fetch Data: Query PostgreSQL for entities to index
- Enrich Data: Add computed fields (balances, counts, etc.)
- Normalize Data: Normalize addresses, format values
- Index Document: Send to Elasticsearch/OpenSearch
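The transformation steps might look roughly like this for a single block. This is a sketch under stated assumptions: the input row shape (number, transaction_hashes, and so on) is hypothetical and does not come from the PostgreSQL schema, and naive timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone


def normalize_address(address: str) -> str:
    """Lowercase a hex address so exact-match lookups ignore case."""
    return address.strip().lower()


def to_block_document(row: dict, chain_id: int) -> dict:
    """Turn a (hypothetical) PostgreSQL block row into a search document."""
    ts = row["timestamp"]
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return {
        "block_number": row["number"],
        "hash": row["hash"].lower(),
        # Format: ISO 8601 in UTC, matching the sample documents.
        "timestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Normalize: addresses are lowercased.
        "miner": normalize_address(row["miner"]),
        # Enrich: a computed count rather than a stored column.
        "transaction_count": len(row["transaction_hashes"]),
        "gas_used": row["gas_used"],
        "gas_limit": row["gas_limit"],
        "chain_id": chain_id,
        "parent_hash": row["parent_hash"].lower(),
        "size": row["size"],
    }
```

Keeping the transformer a pure function of the row makes it easy to reuse for both initial and incremental indexing.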
Indexing Strategy
Initial Indexing:
- Bulk index existing data
- Process in batches (1000 documents per batch)
- Use bulk API for efficiency
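A sketch of the batching step: documents are split into groups of 1000 and each group is serialized as a newline-delimited bulk request body. The helper names are hypothetical, and using the hash as the document _id is an assumption.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batched(docs: Iterable[dict], size: int = 1000) -> Iterator[list]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def bulk_body(index: str, batch: list, id_field: str = "hash") -> str:
    """Serialize one batch as a newline-delimited _bulk request body."""
    lines = []
    for doc in batch:
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

Using a deterministic _id means re-running a batch overwrites documents instead of duplicating them, which keeps retries safe.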
Incremental Indexing:
- Index new entities as they're created
- Update entities when changed
- Delete entities when removed
Update Frequency:
- Real-time: Index immediately after database insert/update
- Batch: Bulk update every N minutes for efficiency
Index Aliases
Purpose: Enable zero-downtime index updates.
Strategy:
- Write to new index (e.g., blocks-138-v2)
- Build index in background
- Switch alias when ready
- Delete old index after switch
Alias Names:
- blocks-{chain_id} → points to latest version
- transactions-{chain_id} → points to latest version
- etc.
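The alias switch can be done as a single request to the aliases API, which applies all listed actions atomically. A minimal sketch of the request body follows. The function name alias_swap_actions and the -v1/-v2 suffixes are illustrative.

```python
def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build a body for POST /_aliases that repoints `alias` in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Example: move blocks-138 from the v1 index to the v2 index.
body = alias_swap_actions("blocks-138", "blocks-138-v1", "blocks-138-v2")
```

An alias cannot share its name with a concrete index, so this scheme implies every physical index carries a version suffix and only the alias uses the bare blocks-{chain_id} name.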
Query Patterns
Full-Text Search
Blocks Search:
{
"query": {
"match": {
"hash": "0x123..."
}
}
}
Address Search:
{
"query": {
"bool": {
"should": [
{ "match": { "label": "wallet" } },
{ "prefix": { "address": "0x123" } }
]
}
}
}
Token Search:
{
"query": {
"bool": {
"should": [
{ "match": { "name": "My Token" } },
{ "match": { "symbol": "MTK" } }
]
}
}
}
Faceted Search
Filter by Multiple Criteria:
{
"query": {
"bool": {
"must": [
{ "term": { "chain_id": 138 } },
{ "term": { "type": "ERC20" } },
{ "range": { "holder_count": { "gte": 100 } } }
]
}
},
"aggs": {
"by_type": {
"terms": { "field": "type" }
}
}
}
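A faceted query like this can be built and its facet counts read back with two small helpers. This sketch places the criteria in filter context, which skips scoring and lets results be cached. The function names are hypothetical.

```python
def token_facet_query(chain_id: int, token_type: str, min_holders: int) -> dict:
    """Build a token search body with exact filters and a type facet."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"chain_id": chain_id}},
                    {"term": {"type": token_type}},
                    {"range": {"holder_count": {"gte": min_holders}}},
                ]
            }
        },
        "aggs": {"by_type": {"terms": {"field": "type"}}},
    }


def facet_counts(response: dict, agg_name: str = "by_type") -> dict:
    """Map each terms-aggregation bucket key to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```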
Unified Search
Cross-Entity Search:
- Search across blocks, transactions, addresses, tokens
- Use the _index field to filter by entity type
- Combine results with relevance scoring
Multi-Index Query:
{
"query": {
"multi_match": {
"query": "0x123",
"fields": ["hash", "address", "from_address", "to_address"],
"type": "best_fields"
}
}
}
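One way to run the cross-entity query is to target several indices in a single request with a comma-separated list, then group the hits by the _index value each one carries. The sketch below assumes index names follow the {entity}-{chain_id} pattern, optionally with a version suffix. The helper names are hypothetical.

```python
from collections import defaultdict

ENTITIES = ("blocks", "transactions", "addresses", "tokens")


def unified_search_target(chain_id: int) -> str:
    """Comma-separated index list for a multi-index search request."""
    return ",".join(f"{entity}-{chain_id}" for entity in ENTITIES)


def entity_type(index_name: str) -> str:
    """Derive the entity type from a name such as blocks-138-v2."""
    return index_name.split("-", 1)[0]


def group_hits_by_entity(response: dict) -> dict:
    """Group search hits by entity type, keeping relevance order per group."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        grouped[entity_type(hit["_index"])].append(hit)
    return dict(grouped)
```

When a search goes through an alias, each hit reports the concrete index behind it, which is why entity_type strips everything after the first hyphen.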
Index Configuration
Analysis Settings
Custom Analyzer:
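- Address analyzer: Lowercase, no tokenization (keyword fields accept a normalizer rather than an analyzer, so on those fields the same lowercasing is configured as a normalizer)
- Symbol analyzer: Uppercase, no tokenization (likewise a normalizer on keyword fields)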
- Text analyzer: Standard analyzer with lowercase
Example Configuration:
{
"settings": {
"analysis": {
"analyzer": {
"address_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
}
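Since the address and symbol fields are mapped as keyword, and keyword fields take a normalizer rather than an analyzer, the same lowercase and uppercase behaviour can be declared as normalizers next to the analyzer. This is a sketch, and the names address_normalizer and symbol_normalizer are hypothetical.

```python
def analysis_settings() -> dict:
    """Index settings with the address analyzer plus keyword normalizers."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "address_analyzer": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                },
                "normalizer": {
                    # Hypothetical names; normalizers apply to keyword fields.
                    "address_normalizer": {"type": "custom", "filter": ["lowercase"]},
                    "symbol_normalizer": {"type": "custom", "filter": ["uppercase"]},
                },
            }
        }
    }


def keyword_with_normalizer(normalizer: str) -> dict:
    """Field mapping for a keyword field that uses the given normalizer."""
    return {"type": "keyword", "normalizer": normalizer}
```

A normalizer is applied at both index and query time for term-level queries, so a lookup for "mtk" would still match a symbol stored as "MTK".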
Sharding and Replication
Sharding:
- Number of shards: Based on index size
- Large indices (> 50GB): Multiple shards
- Small indices: Single shard
Replication:
- Replica count: 1-2 (for high availability)
- Increase replicas for read-heavy workloads
Performance Optimization
Index Optimization
Refresh Interval:
- Default: 1 second
- For bulk indexing: Increase to 30 seconds, then reset
Bulk Indexing:
- Batch size: 1000-5000 documents
- Use bulk API
- Disable refresh (refresh_interval of -1) or lengthen it during bulk indexing, then restore it afterward
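The "increase, then reset" pattern is easy to get wrong when a bulk job fails halfway, so it can help to wrap it in a context manager. In the sketch below, put_settings is a hypothetical callable standing in for whatever client call updates index settings, and the default intervals are taken from the values listed here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def relaxed_refresh(
    put_settings: Callable[[str, dict], None],
    index: str,
    bulk_interval: str = "30s",
    normal_interval: str = "1s",
) -> Iterator[None]:
    """Lengthen refresh_interval for a bulk load and always restore it."""
    put_settings(index, {"index": {"refresh_interval": bulk_interval}})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index is not left slow.
        put_settings(index, {"index": {"refresh_interval": normal_interval}})
```

Passing "-1" as bulk_interval would disable refresh entirely for the duration of the load.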
Query Optimization
Query Caching:
- Enable query cache for repeated queries
- Put non-scoring clauses (exact terms, ranges) in filter context so their results can be cached
Field Data:
- Use doc_values for sorting/aggregations
- Avoid fielddata for text fields
Maintenance
Index Monitoring
Metrics:
- Index size
- Document count
- Query performance (p50, p95, p99)
- Index lag (time behind database)
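Index lag can be estimated by comparing the newest timestamp in the database with the newest timestamp in the index. A minimal sketch follows; the function names are hypothetical, and the one-minute default threshold is an assumption.

```python
from datetime import datetime, timedelta


def index_lag(latest_db: datetime, latest_indexed: datetime) -> timedelta:
    """Time the index is behind the database, never negative."""
    return max(latest_db - latest_indexed, timedelta(0))


def lag_exceeds(lag: timedelta, threshold: timedelta = timedelta(minutes=1)) -> bool:
    """True when the lag is above the alerting threshold."""
    return lag > threshold
```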
Index Cleanup
Strategy:
- Delete old indices (after alias switch)
- Archive old indices to cold storage
- Compress indices for storage efficiency
Integration with PostgreSQL
Data Sync
Sync Strategy:
- Real-time: Listen to database changes (CDC, triggers, or polling)
- Batch: Periodic sync jobs
- Hybrid: Real-time for recent data, batch for historical
Change Detection:
- Use updated_at timestamp
- Use database triggers to queue changes
- Use CDC (Change Data Capture) if available
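The updated_at approach amounts to polling with a watermark: fetch rows changed since the last seen timestamp, index them, then advance the watermark. The sketch below shows that logic over in-memory rows. The function name changed_since is hypothetical, and a real job would run the equivalent WHERE updated_at > ... ORDER BY updated_at query against PostgreSQL.

```python
def changed_since(rows: list, watermark):
    """Return rows updated after `watermark`, oldest first, plus the new watermark."""
    changed = sorted(
        (row for row in rows if row["updated_at"] > watermark),
        key=lambda row: row["updated_at"],
    )
    # Advance only as far as the newest row actually seen.
    new_watermark = changed[-1]["updated_at"] if changed else watermark
    return changed, new_watermark
```

A strict greater-than comparison can miss a row that commits later with the same timestamp as the watermark, which is one reason triggers or CDC are listed as alternatives.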
Consistency
Eventual Consistency:
- Search index is eventually consistent with database
- Small lag acceptable (< 1 minute)
- Critical queries can fall back to database
References
- Database Schema: See postgres-schema.md
- Indexer Architecture: See ../indexing/indexer-architecture.md
- Unified Search: See ../multichain/unified-search.md