# Offer ingestion (scrape and email)

Offers can be ingested from external sources so they appear in the database for potential purchases, without manual data entry.

## Sources

1. **Scraped** – e.g. site content from theserverstore.com (Peter as Manager). A scraper job fetches pages, parses offer-like content, and creates offer records.
2. **Email** – a dedicated mailbox accepts messages (e.g. from Sergio and others); a pipeline parses them and creates offer records.

Ingested offers are stored with:

- `source`: `scraped` or `email`
- `source_ref`: URL (scrape) or email message id (email)
- `source_metadata`: optional JSON (e.g. sender, subject, page title, contact name)
- `ingested_at`: timestamp of ingestion
- `vendor_id`: optional; may be null until procurement assigns the offer to a vendor

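For reference, the stored fields above can be sketched as a record type. This is an illustrative sketch only, not the actual schema; the class name and defaults are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal, Optional

# Illustrative sketch of an ingested offer record, using the field names
# listed above; the real schema lives in the application's database layer.
@dataclass
class IngestedOffer:
    source: Literal["scraped", "email"]
    source_ref: Optional[str] = None        # URL (scrape) or email message id
    source_metadata: Optional[dict] = None  # e.g. sender, subject, page title
    vendor_id: Optional[str] = None         # null until procurement assigns one
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

offer = IngestedOffer(source="email", source_ref="msg-12345")
print(offer.vendor_id)  # None until assigned
```
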
## API: ingestion endpoint

Internal or automated callers use a dedicated endpoint, secured by an API key (no user JWT).

**POST** `/api/v1/ingestion/offers`

- **Auth:** Header `x-ingestion-api-key` must equal the environment variable `INGESTION_API_KEY`. If missing or wrong, returns `401`.
- **Org:** Header `x-org-id` (default `default`) specifies the org for the new offer.

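Server-side, the key check can be done with a constant-time comparison. A minimal sketch, assuming a generic headers dict rather than the actual framework middleware:

```python
import hmac
import os

def check_ingestion_key(headers: dict) -> bool:
    """Return True when x-ingestion-api-key matches INGESTION_API_KEY.

    Uses hmac.compare_digest to avoid timing side channels. Illustrative
    sketch; the real API wires this into its own request handling.
    """
    expected = os.environ.get("INGESTION_API_KEY", "")
    provided = headers.get("x-ingestion-api-key", "")
    # Missing/empty env var or missing/wrong header -> reject (401).
    return bool(expected) and hmac.compare_digest(provided, expected)

os.environ["INGESTION_API_KEY"] = "secret-key"
print(check_ingestion_key({"x-ingestion-api-key": "secret-key"}))  # True
print(check_ingestion_key({}))                                     # False -> 401
```
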
**Body (JSON):**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `source` | `"scraped"` \| `"email"` | yes | Ingestion source |
| `source_ref` | string | no | URL or message id |
| `source_metadata` | object | no | e.g. `{ "sender": "Sergio", "subject": "...", "page_url": "..." }` |
| `vendor_id` | UUID | no | Vendor to attach; omit for unassigned |
| `sku` | string | no | Vendor SKU |
| `mpn` | string | no | Manufacturer part number |
| `quantity` | number | yes | Units offered |
| `unit_price` | string | yes | Decimal string, e.g. `"450.00"` |
| `incoterms` | string | no | e.g. `EXW`, `FOB` |
| `lead_time_days` | number | no | Lead time in days |
| `country_of_origin` | string | no | |
| `condition` | string | no | e.g. `new`, `refurbished` |
| `warranty` | string | no | |
| `evidence_refs` | array | no | `[{ "key": "s3-key", "hash": "..." }]` |

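A caller can sanity-check a payload against the required fields before POSTing. A hedged sketch of only the minimal checks implied by the table above; the server remains the source of truth for full validation:

```python
REQUIRED = {"source", "quantity", "unit_price"}
ALLOWED_SOURCES = {"scraped", "email"}

def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks POSTable."""
    problems = [f"missing: {f}" for f in sorted(REQUIRED - payload.keys())]
    if payload.get("source") not in ALLOWED_SOURCES:
        problems.append("source must be 'scraped' or 'email'")
    if "unit_price" in payload and not isinstance(payload["unit_price"], str):
        problems.append("unit_price must be a decimal string, e.g. '450.00'")
    return problems

print(validate_payload({"source": "email", "quantity": 1, "unit_price": "320.00"}))  # []
```
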

**Response:** `201` with the created offer (including `id`, `source`, `source_ref`, `source_metadata`, `ingested_at`).

Example (scrape):

```json
{
  "source": "scraped",
  "source_ref": "https://theserverstore.com/...",
  "source_metadata": { "contact": "Peter", "site": "theserverstore.com" },
  "vendor_id": null,
  "sku": "DL380-G9",
  "quantity": 2,
  "unit_price": "450.00",
  "condition": "refurbished"
}
```

Example (email):

```json
{
  "source": "email",
  "source_ref": "msg-12345",
  "source_metadata": { "from": "sergio@example.com", "subject": "Quote for R630" },
  "vendor_id": null,
  "mpn": "PowerEdge R630",
  "quantity": 1,
  "unit_price": "320.00"
}
```

## Scraper (e.g. theserverstore.com)

- **Responsibility:** Fetch pages (respecting robots.txt and rate limits), extract product/offer fields, then POST each offer to `/api/v1/ingestion/offers`.
- **Where:** Can run as a scheduled job in `apps/` or `packages/`, or as an external service that calls the API. No scraper implementation is in-repo yet; this doc defines the contract.
- **Vendor:** If the site is known (e.g. The Server Store, Peter as Manager), the scraper can resolve or create a vendor and pass `vendor_id`; otherwise leave it null for procurement to assign later.
- **Idempotency:** Use `source_ref` (e.g. the canonical product URL) so the same offer is not duplicated; downstream you can upsert by `(org_id, source, source_ref)` if desired.

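The idempotency note above can be sketched as an upsert keyed on `(org_id, source, source_ref)`. The in-memory dict below is purely illustrative; a real implementation would be a unique index plus an `ON CONFLICT` upsert in the database:

```python
# In-memory stand-in for the offers table, keyed the way this doc suggests:
# (org_id, source, source_ref). Re-ingesting the same URL updates in place.
offers: dict[tuple[str, str, str], dict] = {}

def upsert_offer(org_id: str, payload: dict) -> dict:
    key = (org_id, payload["source"], payload.get("source_ref") or "")
    existing = offers.get(key)
    if existing:
        existing.update(payload)  # refresh price/quantity on re-scrape
        return existing
    offers[key] = dict(payload)
    return offers[key]

url = "https://theserverstore.com/x"
upsert_offer("default", {"source": "scraped", "source_ref": url, "unit_price": "450.00"})
upsert_offer("default", {"source": "scraped", "source_ref": url, "unit_price": "430.00"})
print(len(offers))  # 1 -- same offer, price refreshed, no duplicate
```
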
## Email intake (e.g. Sergio and others)
|
||
|
||
- **Flow:** Incoming messages to a dedicated mailbox (e.g. `offers@your-org.com`) are read by an IMAP poller or processed via an inbound webhook (SendGrid, Mailgun, etc.). The pipeline parses sender, subject, body, and optional attachments, then POSTs one or more payloads to `POST /api/v1/ingestion/offers`.
|
||
- **Storing raw email:** Attachments or full message can be uploaded to object storage (e.g. S3/MinIO) and referenced in `evidence_refs` or `source_metadata` (e.g. `raw_message_key`).
|
||
- **Vendor matching:** Match sender address or name to an existing vendor and set `vendor_id` when possible; otherwise leave null and set `source_metadata.sender` / `from` for later assignment.
|
||
|
||
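The header-parsing step of the flow above can be sketched with the standard library's `email` module. The field mapping is an assumption; a real pipeline would also parse the body and attachments for quantity and price:

```python
from email import message_from_string

# Hypothetical raw RFC 822 message, shaped like the email example above.
RAW = """\
From: sergio@example.com
Subject: Quote for R630
Message-ID: msg-12345

PowerEdge R630, 1 unit at 320.00 USD
"""

def email_to_payload(raw: str) -> dict:
    """Map message headers to an ingestion payload (illustrative sketch)."""
    msg = message_from_string(raw)
    return {
        "source": "email",
        "source_ref": msg["Message-ID"],
        "source_metadata": {"from": msg["From"], "subject": msg["Subject"]},
        "vendor_id": None,  # left null unless the sender matches a vendor
    }

payload = email_to_payload(RAW)
print(payload["source_metadata"]["subject"])  # Quote for R630
```
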
## Configuration

- Set `INGESTION_API_KEY` in the environment where the API runs. Scraper and email pipeline must use the same value in `x-ingestion-api-key`.
- Use `x-org-id` on each request to target the correct org.

## Procurement workflow

- Ingested offers appear in the offers list with `source` = `scraped` or `email` and an optional `vendor_id`.
- Offers with a null `vendor_id` are “unassigned”; procurement can assign them to a vendor (PATCH the offer, or create/link a vendor and then update the offer).
- Existing RBAC and org/site scoping apply; audit can track creation via `ingested_at` and `source_metadata`.
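The unassigned queue described above is just a filter on a null `vendor_id`. A minimal sketch (the PATCH path in the comment restates this doc's description and is not a confirmed route):

```python
offers = [
    {"id": "a1", "source": "scraped", "vendor_id": None},
    {"id": "b2", "source": "email", "vendor_id": "vend-42"},
]

# Offers with vendor_id null are "unassigned" and wait for procurement.
unassigned = [o for o in offers if o["vendor_id"] is None]
print([o["id"] for o in unassigned])  # ['a1']

# Assignment would then be a PATCH on the offer, e.g. (illustrative):
# PATCH /api/v1/offers/a1   { "vendor_id": "vend-7" }
```
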