# Data Flow Architecture

## Web Intelligence: AI's Eyes on the Web
Web Intelligence is a standalone environment that serves as the AI's primary interface for web data collection. It is NOT limited to contact enrichment: it enables ANY task that requires web data.
### Core Pattern
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Web Intelligence │────▶│     Staging      │────▶│      Target      │
│   (6670/6671)    │     │   (6680/6681)    │     │   Environment    │
│   crawl/scrape   │     │  temporary hold  │     │final destination │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
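The collect → stage → route pattern above can be sketched in plain Python. This is a minimal illustration only: the `staging` dict stands in for the Staging Environment, and every function and field name here is hypothetical, not the real Nexus API.

```python
# Illustrative sketch of the collect -> stage -> route pattern.
# A dict stands in for the Staging Environment; all names are hypothetical.
staging = {}  # staging_id -> record

def stage(record_id, payload, source="web-intelligence"):
    """Hold raw web data temporarily, with metadata for later routing."""
    staging[record_id] = {"payload": payload, "source": source, "routed_to": None}
    return record_id

def route(record_id, target, transform=lambda p: p):
    """Transform a staged record and mark where it was delivered."""
    record = staging[record_id]
    record["payload"] = transform(record["payload"])
    record["routed_to"] = target
    return record

rid = stage("r1", "<html>docs</html>")
route(rid, target="kb", transform=str.upper)
print(staging["r1"]["routed_to"])  # kb
```

The point of the sketch is the decoupling: collection writes into staging, and routing reads from it later, so the two steps never need to run in the same process.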
### Why Staging?

The Staging Environment is a multi-tiered intermediary that:

- Holds raw web data temporarily during processing
- Allows transformation and enrichment before routing
- Decouples web collection from final storage
- Enables retry/reprocessing without re-crawling
- Supports audit trails and processing logs
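The retry and audit benefits can be shown with a small sketch. Assumptions: the staged record shape and both parsers are invented for illustration; the real staging store is a service, not an in-memory dict.

```python
# Illustrative only: holding raw data in staging lets a failed processing
# step be retried without a second crawl, and every attempt is logged.
staged = {"raw_html": "<h1>Pricing</h1>", "attempts": []}

def process(record, parser):
    """Log each attempt (audit trail); the raw payload is never mutated,
    so a retry needs no re-crawl."""
    record["attempts"].append(parser.__name__)
    return parser(record["raw_html"])

def broken_parser(html):
    raise ValueError("bad selector")

def fixed_parser(html):
    return html.replace("<h1>", "").replace("</h1>", "")

try:
    process(staged, broken_parser)
except ValueError:
    pass  # fix the parser, then retry against the same staged payload

result = process(staged, fixed_parser)
print(result)              # Pricing
print(staged["attempts"])  # ['broken_parser', 'fixed_parser']
```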
### Downstream Environments
Web Intelligence data can flow to ANY Nexus environment:
| Environment | Port | Use Case |
|---|---|---|
| KB | 6625/6626 | Documentation, knowledge articles |
| Documents | TBD | Reports, scraped content |
| Contacts | 6630/6631 | Contact enrichment |
| Corpus | TBD | Training data, transcripts |
| Links | 6635/6636 | URL bookmarks, discovered links |
| Track | 6640/6641 | Project research data |
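The table above could be expressed as a routing map. This is a hypothetical sketch: the dict shape, the `endpoint()` helper, and the `localhost` host are assumptions, and the environments whose ports are still TBD are left as `None` rather than guessed.

```python
# Hypothetical routing map mirroring the downstream-environments table.
# Ports marked TBD in the table are deliberately left as None.
ROUTES = {
    "kb":        (6625, 6626),
    "documents": None,          # TBD
    "contacts":  (6630, 6631),
    "corpus":    None,          # TBD
    "links":     (6635, 6636),
    "track":     (6640, 6641),
}

def endpoint(env):
    """Return a host:port string for an environment, or fail loudly
    if no port has been assigned yet."""
    ports = ROUTES.get(env)
    if ports is None:
        raise ValueError(f"no port assigned yet for {env!r}")
    return f"localhost:{ports[0]}"

print(endpoint("kb"))  # localhost:6625
```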
### Example Data Flows
#### 1. Documentation Scraping

```
web.scrape(docs_url)
        ↓
staging.store(raw_html)
        ↓
web.to_markdown()
        ↓
kb.create(title, markdown_content)
```
Use case: Scraping API documentation for LARS reference.
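The `web.to_markdown()` step could look roughly like the following. This is a minimal stdlib-only sketch handling just `<h1>` and `<p>`; a real pipeline would use a full HTML-to-markdown converter, and none of these names are the actual Web Intelligence implementation.

```python
# Minimal sketch of an HTML -> markdown step, handling only <h1> and <p>.
# Stdlib only; illustrative, not the real web.to_markdown().
from html.parser import HTMLParser

class Markdowner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._prefix = "# "   # heading marker for the next text run
        elif tag == "p":
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(self._prefix + text)
            self._prefix = ""

def to_markdown(raw_html):
    md = Markdowner()
    md.feed(raw_html)
    return "\n\n".join(md.out)

print(to_markdown("<h1>API Reference</h1><p>GET /v1/items</p>"))
# # API Reference
#
# GET /v1/items
```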
#### 2. Competitive Research

```
web.research(competitor, depth='deep')
        ↓
staging.store(research_data)
        ↓
Process and structure
        ↓
documents.create(research_report)
```
Use case: Building competitive analysis reports.
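The "process and structure" step might group raw findings by topic before `documents.create()` is called. The findings data and field names below are invented purely for illustration.

```python
# Illustrative "process and structure" step: raw research findings from
# staging are grouped into report sections by topic.
from collections import defaultdict

findings = [
    {"topic": "pricing",  "note": "tiered plans"},
    {"topic": "features", "note": "API access"},
    {"topic": "pricing",  "note": "annual discount"},
]

def structure_report(items):
    """Group flat findings into topic -> [notes] sections."""
    sections = defaultdict(list)
    for item in items:
        sections[item["topic"]].append(item["note"])
    return dict(sections)

report = structure_report(findings)
print(report["pricing"])  # ['tiered plans', 'annual discount']
```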
#### 3. Contact Enrichment

```
web.research(person_company, depth='standard')
        ↓
staging.store(enrichment_data)
        ↓
Extract entities and verify
        ↓
contact.update(contact_id, enriched_fields)
```
Use case: Adding business intelligence to CRM contacts.
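One reasonable merge policy for the final `contact.update()` step is to fill gaps without clobbering existing CRM values. The contact record, field names, and policy below are assumptions, shown with plain dicts rather than the real Contacts API.

```python
# Illustrative enrichment merge: only non-empty enriched fields are
# applied, and only where the existing contact has no value.
contact  = {"id": 42, "name": "A. Rivera", "title": "", "company": "Acme"}
enriched = {"title": "VP Engineering", "company": "Acme Corp", "phone": ""}

def merge_enrichment(existing, fields):
    """Fill gaps only: never overwrite a populated CRM field, and skip
    empty enrichment values."""
    updated = dict(existing)
    for key, value in fields.items():
        if value and not existing.get(key):
            updated[key] = value
    return updated

contact = merge_enrichment(contact, enriched)
print(contact["title"])    # VP Engineering
print(contact["company"])  # Acme  (existing value preserved)
```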
#### 4. YouTube Transcripts

```
web.fetch_transcript(video_url)
        ↓
staging.store(raw_transcript)
        ↓
Process and clean
        ↓
corpus.add(transcript_document)
```
Use case: Building training data from educational videos.
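The "process and clean" step for transcripts might strip timestamp markers before `corpus.add()`. The `[MM:SS]` line format here is an assumption; real transcript formats vary, so treat this as a sketch.

```python
# Sketch of transcript cleaning: strips assumed "[MM:SS] text" timestamps
# and joins the lines into flowing text for the corpus.
import re

raw = "[00:01] Welcome back\n[00:05] Today we cover staging\n"

def clean_transcript(text):
    lines = []
    for line in text.splitlines():
        lines.append(re.sub(r"^\[\d{2}:\d{2}\]\s*", "", line))
    return " ".join(l for l in lines if l)

print(clean_transcript(raw))  # Welcome back Today we cover staging
```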
#### 5. Link Discovery

```
web.crawl(site_url, depth=2)
        ↓
web.discover_links()
        ↓
web.filter_links(criteria)
        ↓
links.save_batch(discovered_urls)
```
Use case: Building curated bookmark collections.
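A filter step like `web.filter_links(criteria)` might keep only links matching a host and path prefix. The criteria shape, URLs, and function name below are all hypothetical; only `urllib.parse` is real stdlib.

```python
# Illustrative link filter: keep discovered URLs on an allowed host
# whose path starts with a given prefix.
from urllib.parse import urlparse

discovered = [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/blog/news",
    "https://other.com/api/auth",
]

def filter_links(urls, host, path_prefix):
    kept = []
    for url in urls:
        parts = urlparse(url)
        if parts.netloc == host and parts.path.startswith(path_prefix):
            kept.append(url)
    return kept

print(filter_links(discovered, "docs.example.com", "/api"))
# ['https://docs.example.com/api/auth']
```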
### LARS Integration
LARS (Local AI Runtime System) uses Web Intelligence as its primary tool for gathering external information:
- Research tasks → `web.research()`
- Documentation → `web.scrape()` + `web.to_markdown()`
- Monitoring → `web.crawl()` for changes
- Enrichment → `web.research()` with entity extraction
All results flow through Staging, allowing LARS to process data incrementally and route to appropriate destinations.
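The task-to-tool mapping above could be modeled as a dispatch table. Everything here is a placeholder: the stub functions merely tag their input, standing in for the real Web Intelligence tools.

```python
# Hypothetical dispatch table mirroring the LARS task mapping; the
# callables are stubs, not the real Web Intelligence tools.
def web_research(query): return f"research:{query}"
def web_scrape(url):     return f"scrape:{url}"
def web_crawl(url):      return f"crawl:{url}"

LARS_TOOLS = {
    "research":   web_research,
    "docs":       web_scrape,
    "monitoring": web_crawl,
    "enrichment": web_research,  # same tool, different post-processing
}

def run_task(kind, target):
    return LARS_TOOLS[kind](target)

print(run_task("docs", "https://docs.example.com"))
# scrape:https://docs.example.com
```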
### Key Principles
- **Separation of Concerns**: Web Intelligence collects; Staging holds; target environments store
- **Flexibility**: the same crawl data can route to multiple destinations
- **Idempotency**: Staging enables safe reprocessing
- **Audit Trail**: processing steps are tracked in Staging
- **Universal Interface**: one web tool serves all AI tasks