# Data Catalog

**Purpose**: Unified data catalog for tracking and discovering datasets

**Status**: 🚧 Planned

---

## Overview

The data catalog provides a centralized registry for all datasets across the workspace, enabling discovery, access control, and metadata management.

---

## Features

- Dataset registration
- Metadata management
- Search and discovery
- Access control
- Schema tracking
- Lineage tracking

---

## Schema

See `metadata-schema.json` for the complete metadata schema.

### Key Fields

- **id**: Unique dataset identifier
- **name**: Human-readable name
- **source**: Source system or project
- **storage**: Storage location details
- **schema**: Data schema definition
- **tags**: Categorization tags
- **access**: Access control settings

---

## Implementation Options

### Option 1: Custom API

- Build a custom API using the shared packages
- Use PostgreSQL for metadata storage
- Implement search using PostgreSQL full-text search

### Option 2: DataHub

- Deploy DataHub (open source)
- Use its existing metadata models
- Leverage its built-in features

### Option 3: Amundsen

- Deploy Amundsen (open source)
- Use its existing metadata models
- Leverage its built-in features

---

## Usage

### Register Dataset

A dataset is registered by submitting a metadata record such as:

```json
{
  "id": "user-events-2025",
  "name": "User Events 2025",
  "description": "User interaction events for 2025",
  "source": "analytics-service",
  "storage": {
    "type": "minio",
    "bucket": "analytics",
    "path": "events/2025/"
  },
  "format": "parquet",
  "tags": ["events", "analytics", "2025"],
  "owner": "analytics-team",
  "access": {
    "level": "internal",
    "permissions": ["read"]
  }
}
```

### Search Datasets

```bash
# CATALOG_URL is a placeholder for the catalog service's base URL (not yet defined)

# Search by tag
curl "$CATALOG_URL/api/catalog/datasets?tag=analytics"

# Search by source
curl "$CATALOG_URL/api/catalog/datasets?source=analytics-service"

# Full-text search ("+" encodes a space in the query string)
curl "$CATALOG_URL/api/catalog/datasets?q=user+events"
```

---

## Next Steps

1. Choose an implementation option
2. Set up metadata storage
3. Implement the registration API
4. Implement search functionality
5. Set up access control
6. Integrate with projects

---

**Status**: 🚧 Planned - Schema and design complete, implementation pending
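As a rough illustration of Option 1 and the search queries under Usage, the sketch below shows how registration and the `tag`, `source`, and `q` query parameters might map onto a metadata store. It rests on several assumptions: `InMemoryCatalog`, `register`, `search`, and `REQUIRED` are hypothetical names; a Python dict stands in for PostgreSQL; the required-field set is a guess (`metadata-schema.json` is the real source of truth); and "full-text" search is approximated by a case-insensitive substring match, not PostgreSQL full-text search.

```python
from urllib.parse import parse_qs


class InMemoryCatalog:
    """Hypothetical in-memory stand-in for the catalog's metadata store."""

    # Assumed minimal required fields; metadata-schema.json defines the real set.
    REQUIRED = ("id", "name", "source", "storage")

    def __init__(self):
        self._datasets = {}  # dataset id -> metadata record

    def register(self, record):
        """Store a metadata record, rejecting incomplete or duplicate entries."""
        missing = [f for f in self.REQUIRED if f not in record]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        if record["id"] in self._datasets:
            raise ValueError(f"dataset already registered: {record['id']}")
        self._datasets[record["id"]] = record

    def search(self, query_string):
        """Return ids of datasets matching every given filter (tag, source, q)."""
        params = parse_qs(query_string)  # decodes "user+events" to "user events"
        tag = params.get("tag", [None])[0]
        source = params.get("source", [None])[0]
        text = params.get("q", [None])[0]

        results = []
        for ds in self._datasets.values():
            if tag is not None and tag not in ds.get("tags", []):
                continue
            if source is not None and ds.get("source") != source:
                continue
            if text is not None:
                # Simplified "full-text" match: every term must appear as a
                # case-insensitive substring of the name, description, or tags.
                haystack = " ".join(
                    [ds.get("name", ""), ds.get("description", ""), *ds.get("tags", [])]
                ).lower()
                if not all(term in haystack for term in text.lower().split()):
                    continue
            results.append(ds["id"])
        return results


# The registration example from the Usage section.
EXAMPLE_RECORD = {
    "id": "user-events-2025",
    "name": "User Events 2025",
    "description": "User interaction events for 2025",
    "source": "analytics-service",
    "storage": {"type": "minio", "bucket": "analytics", "path": "events/2025/"},
    "format": "parquet",
    "tags": ["events", "analytics", "2025"],
    "owner": "analytics-team",
    "access": {"level": "internal", "permissions": ["read"]},
}

if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.register(EXAMPLE_RECORD)
    print(catalog.search("tag=analytics"))  # ['user-events-2025']
    print(catalog.search("source=analytics-service"))  # ['user-events-2025']
    print(catalog.search("q=user+events"))  # ['user-events-2025']
```

Filters combine with AND semantics here, so `tag=analytics&source=analytics-service` narrows results rather than widening them. Whether the real API should behave that way is an open design question.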