AI-Driven Company Enrichment with DeepSeek via Cloudflare Browser Rendering
This page documents an AI-first enrichment pipeline that turns a company website into a clean, structured company profile you can safely persist into your database and expose through GraphQL.
The core idea is simple:
- Use the Cloudflare Browser Rendering /json endpoint to load a real rendered page (including JavaScript-heavy sites).
- Use DeepSeek to convert the rendered page into a strict JSON-only object (no markdown, no prose).
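DeepSeek's reply is only useful if it really is JSON-only. Below is a minimal TypeScript sketch of that gate, assuming the model reply arrives as a plain string. The helper name parseStrictJsonObject is hypothetical. It rejects replies wrapped in markdown fences or prose instead of trying to repair them:

```typescript
// Hypothetical gate for the extraction step: accept a reply only if it is a bare JSON object.
type JsonObject = { [key: string]: unknown };

function parseStrictJsonObject(reply: string): JsonObject {
  const trimmed = reply.trim();
  // Reject anything wrapped in markdown fences or surrounded by prose instead of trying to repair it.
  if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
    throw new Error("model reply is not a bare JSON object");
  }
  // JSON.parse throws a SyntaxError on malformed JSON; valid JSON text that starts with "{" is always an object.
  return JSON.parse(trimmed) as JsonObject;
}
```

Failing loudly here keeps a half-parsed reply from ever reaching the governance layer. A retry policy, if you want one, can wrap this call.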
High-level architecture
This pipeline has five clear layers, each with a single responsibility:
- Entry: GraphQL mutation identifies the target company.
- Acquisition: Browser Rendering fetches a fully rendered page.
- Extraction: DeepSeek converts the HTML into a JSON-only structure.
- Governance: validation, normalization, and an audit snapshot.
- Persistence: upsert the company and its ATS boards, then return the result.
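The five layers can be wired together as one function, with each layer injected as a dependency. This is a sketch, not the pipeline's actual code: enrichCompany, EnrichmentDeps, and the CompanyProfile shape are hypothetical names. The real calls to Browser Rendering, DeepSeek, and your database would live behind the injected functions.

```typescript
// Hypothetical profile shape; the real schema is not specified here.
interface CompanyProfile {
  category: string;
  careers_url: string | null;
  linkedin_url: string | null;
}

interface EnrichmentDeps {
  // Acquisition: fetch fully rendered HTML (e.g. via Browser Rendering).
  renderPage(url: string): Promise<string>;
  // Extraction: turn rendered HTML into a JSON-only object (e.g. via DeepSeek).
  extractProfile(html: string): Promise<unknown>;
  // Governance: keep the raw model output for auditing.
  saveAuditSnapshot(companyId: string, raw: unknown): Promise<void>;
  // Governance: validate and normalize, throwing on anything unusable.
  validate(raw: unknown): CompanyProfile;
  // Persistence: idempotent upserts.
  upsertCompany(companyId: string, profile: CompanyProfile): Promise<void>;
  upsertAtsBoards(companyId: string, profile: CompanyProfile): Promise<void>;
}

// What a GraphQL mutation resolver could call once it has identified the target company.
async function enrichCompany(
  deps: EnrichmentDeps,
  companyId: string,
  website: string,
): Promise<CompanyProfile> {
  const html = await deps.renderPage(website);
  const raw = await deps.extractProfile(html);
  // Snapshot before validation so rejected extractions are still auditable.
  await deps.saveAuditSnapshot(companyId, raw);
  const profile = deps.validate(raw);
  await deps.upsertCompany(companyId, profile);
  await deps.upsertAtsBoards(companyId, profile);
  return profile;
}
```

Injecting each layer keeps the orchestration testable with fakes, and every layer stays swappable without touching the others.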
Classification
A single enum-like category so downstream logic can branch cleanly:
company.category is one of: CONSULTANCY | AGENCY | STAFFING | DIRECTORY | PRODUCT | OTHER | UNKNOWN
UNKNOWN is intentionally allowed to prevent “forced certainty”.
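In TypeScript the category list maps naturally onto a closed union. Below is a minimal sketch, assuming the raw model value arrives as unknown. normalizeCategory is a hypothetical helper that trims and upper-cases the value and falls back to UNKNOWN for anything outside the list:

```typescript
// The category values listed above, as a closed union type.
const CATEGORIES = ["CONSULTANCY", "AGENCY", "STAFFING", "DIRECTORY", "PRODUCT", "OTHER", "UNKNOWN"] as const;
type CompanyCategory = (typeof CATEGORIES)[number];

// Hypothetical normalizer: anything outside the closed set becomes UNKNOWN instead of being guessed.
function normalizeCategory(value: unknown): CompanyCategory {
  if (typeof value !== "string") return "UNKNOWN";
  const upper = value.trim().toUpperCase();
  return (CATEGORIES as readonly string[]).includes(upper) ? (upper as CompanyCategory) : "UNKNOWN";
}
```

Mapping unrecognized labels to UNKNOWN rather than OTHER keeps the two meanings apart: OTHER is a deliberate classification, while UNKNOWN means no usable answer came back.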
Link intelligence (the AI “money fields”)
Two links that unlock most automation:
- company.careers_url: best official careers entrypoint (prefer internal)
- company.linkedin_url: best LinkedIn company page (/company/...)
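Both link fields are worth validating before they are persisted. The sketch below rests on two assumptions. First, "prefer internal" means preferring a careers URL on the company's own domain (including subdomains) over a third-party host. Second, a LinkedIn company page is any linkedin.com URL whose path starts with /company/ followed by a slug. pickCareersUrl and normalizeLinkedinUrl are hypothetical helpers.

```typescript
// Hypothetical helpers; "internal" is assumed to mean "same domain as the company website".

// Parse an absolute URL, returning null instead of throwing.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw.trim());
  } catch {
    return null;
  }
}

// Hostname without a leading "www.", so www.acme.com and acme.com compare equal.
function bareHost(u: URL): string {
  return u.hostname.toLowerCase().replace(/^www\./, "");
}

// Prefer an http(s) careers URL on the company's own domain; otherwise take the first valid candidate.
function pickCareersUrl(candidates: string[], website: string): string | null {
  const siteHost = bareHost(new URL(website));
  const valid: URL[] = [];
  for (const candidate of candidates) {
    const u = parseUrl(candidate);
    if (u && (u.protocol === "https:" || u.protocol === "http:")) valid.push(u);
  }
  const isInternal = (u: URL): boolean => {
    const host = bareHost(u);
    return host === siteHost || host.endsWith("." + siteHost);
  };
  const chosen = valid.find(isInternal) ?? valid[0];
  return chosen ? chosen.toString() : null;
}

// Accept only linkedin.com company pages and canonicalize them to https://www.linkedin.com/company/<slug>/.
function normalizeLinkedinUrl(raw: string): string | null {
  const u = parseUrl(raw);
  if (!u) return null;
  const host = u.hostname.toLowerCase();
  const onLinkedin = host === "linkedin.com" || host.endsWith(".linkedin.com");
  const match = u.pathname.match(/^\/company\/([^/]+)/);
  return onLinkedin && match ? `https://www.linkedin.com/company/${match[1]}/` : null;
}
```

Personal profiles (/in/...) and look-alike hosts are rejected rather than stored, which matches the spirit of allowing UNKNOWN instead of forcing an answer.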
