AI Traffic Analytics — Setup & Implementation Guide
Everything you need to connect your site to Kachi and start measuring AI Indexing, AI Realtime Retrieval, AI Training, and LLM-referred conversions.
Overview
Kachi measures three types of AI activity on your website — AI Indexing, AI Realtime Retrieval, and AI Training — all from verified server-level log data. No JavaScript snippets, no sampling, no guesswork.
Once connected, your dashboard shows:
- Which AI platforms are caching, fetching, or training on your pages
- Which pages receive the most AI attention (AEO scores)
- Which visitors arrived from AI platforms and whether they converted
- How your AI visibility compares to your traditional SEO performance (GSC)
This guide walks through every step from initial connection to a fully operational dashboard.
Quick Start
Kachi follows a three-step onboarding flow:
- Connect your data source — choose the ingestion method that matches your hosting stack
- Connect your analytics integrations — link Google Search Console and GA4
- Verify data flow — confirm logs are arriving and your dashboard is populating
Your dedicated onboarding contact will guide you through each step. The sections below provide technical detail for each stage.
Step 1 — Choose Your Data Ingestion Method
Kachi supports four ingestion methods. Your hosting environment determines which one applies.
Option A — Cloudflare Worker (Recommended)
Best for: Sites already using Cloudflare as a reverse proxy or CDN.
Kachi deploys a lightweight Cloudflare Worker on your zone that captures every incoming request and forwards structured log events to Kachi’s ingest pipeline in real time.
How it works:
```
Browser → Cloudflare (Worker intercepts) → Your Origin Server
                    ↓
        Kachi API Gateway → Ingest Lambda → Kinesis Firehose → S3
```
What you need to provide:
- Cloudflare Zone ID
- API token with Worker deploy permissions (provided by your team, or Kachi handles deployment)
- Confirmation that your origin is behind Cloudflare proxy (orange cloud enabled)
What Kachi handles:
- Worker script deployment and maintenance
- Log normalization from Cloudflare format to Kachi canonical Parquet
- Automatic daily processing and dashboard updates
Note: Your origin server does not need any changes. The Worker runs entirely at the Cloudflare edge.
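Kachi deploys and maintains the Worker for you, but conceptually the script does something like the sketch below. The ingest URL, event fields, and handler wiring are illustrative assumptions for this example, not Kachi's production code:

```typescript
// Sketch of the log-forwarding logic a Cloudflare Worker might run
// (illustrative only — not Kachi's production script).
const INGEST_URL = "https://ingest.kachi.example/v1/events"; // hypothetical endpoint

interface LogEvent {
  ts: string;        // ISO-8601 timestamp
  method: string;    // HTTP method
  url: string;       // full request URL
  userAgent: string; // raw User-Agent header
}

// Minimal structural types so the sketch is self-contained and unit-testable.
interface RequestLike {
  method: string;
  url: string;
  headers: { get(name: string): string | null };
}
interface ContextLike {
  // In a real Worker, ctx.waitUntil keeps the event POST alive after the
  // response has already been returned to the visitor.
  waitUntil(p: Promise<unknown>): void;
}
type FetchFn = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ status: number }>;

// Pure helper: build the structured event forwarded to the ingest pipeline.
function buildLogEvent(req: RequestLike, now: Date): LogEvent {
  return {
    ts: now.toISOString(),
    method: req.method,
    url: req.url,
    userAgent: req.headers.get("user-agent") ?? "",
  };
}

// Worker-style handler: proxy the request to the origin, then forward the
// log event without delaying the visitor's response.
async function handleRequest(req: RequestLike, ctx: ContextLike, fetchFn: FetchFn) {
  const response = await fetchFn(req.url, { method: req.method }); // pass through
  const event = buildLogEvent(req, new Date());
  ctx.waitUntil(
    fetchFn(INGEST_URL, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(event),
    })
  );
  return response;
}
```

In a real Worker, `handleRequest` would be wired into the default export's `fetch` handler and would forward the full request (headers and body) rather than just the URL and method.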
Option B — Server Log Streaming (SFTP / FTP)
Best for: Sites hosted on traditional servers running Nginx or Apache, or on managed hosting platforms that can export access logs.
Your server or CDN sends raw access logs to Kachi’s secure ingestion endpoint on a scheduled basis (typically hourly or daily).
Supported log formats:
- Nginx access logs (`combined` or `json` format)
- Apache access logs (`combined` format)
- CDN logs (Fastly, CloudFront, Akamai) in standard formats
- Custom NDJSON or CSV formats (discuss with your onboarding contact)
Connection details:
Kachi will provide you with:
- SFTP host, port, username, and SSH key
- Target directory path for log delivery
- Expected filename format and delivery schedule
What Kachi handles:
- Log ingestion, deduplication, and normalization
- Conversion to canonical Parquet format
- Daily processing pipeline and dashboard refresh
Note: Logs should cover all traffic including bot and crawler requests. Do not pre-filter logs before delivery — Kachi’s normalization pipeline distinguishes AI traffic from human traffic.
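To see why unfiltered logs matter, here is a sketch of parsing a single Nginx `combined`-format line and flagging a few well-known AI crawler User-Agents. The token list is a tiny illustrative subset, not Kachi's classifier:

```typescript
// Parse an Nginx "combined" format access-log line (sketch, not Kachi's pipeline):
// $remote_addr - $remote_user [$time_local] "$request" $status $bytes "$referer" "$ua"
const COMBINED_RE =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/;

// A few well-known AI crawler User-Agent tokens (illustrative subset only).
const AI_UA_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "meta-externalagent"];

interface ParsedLine {
  ip: string;
  method: string;
  path: string;
  status: number;
  userAgent: string;
  isAiCrawler: boolean;
}

function parseCombinedLine(line: string): ParsedLine | null {
  const m = COMBINED_RE.exec(line);
  if (!m) return null; // malformed or non-combined line
  const userAgent = m[8];
  return {
    ip: m[1],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    userAgent,
    isAiCrawler: AI_UA_TOKENS.some((t) => userAgent.includes(t)),
  };
}
```

Pre-filtering bots before delivery would strip exactly the lines this kind of classification needs.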
Option C — Cloudflare Logpush
Best for: Sites on Cloudflare Pro, Business, or Enterprise plans that prefer configuration-only log delivery without deploying a Worker script.
Cloudflare Logpush automatically exports HTTP request logs from Cloudflare’s edge to a Kachi-provisioned S3 bucket or SFTP endpoint on a scheduled basis. No Worker code is deployed — the job is configured entirely in the Cloudflare dashboard or via API.
How it works:
```
Cloudflare Edge (Logpush job) → Kachi S3 / SFTP
                ↓
        Ingest Lambda → Kinesis Firehose → S3 → Dashboard
```
What you need to provide:
- Cloudflare Zone ID
- Cloudflare account with Logpush enabled (Pro plan or higher)
- Ability to create a Logpush job in the Cloudflare dashboard (Kachi provides the destination and configuration)
What Kachi handles:
- Destination provisioning (S3 bucket or SFTP credentials)
- Logpush job configuration template
- Log normalization from Cloudflare format to Kachi canonical Parquet
- Daily processing pipeline and dashboard refresh
Note: Logpush delivers logs in batches (typically every 5 minutes to hourly, depending on your plan), so data arrives with a slight delay compared to the Worker approach, but no code deployment is required.
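For teams who prefer to script the setup, the Logpush job can also be created via Cloudflare's API (`POST /zones/{zone_id}/logpush/jobs`). The sketch below builds an illustrative job payload; the field names follow Cloudflare's Logpush jobs API, but the job name, bucket, and region are placeholders, and in practice Kachi supplies the real `destination_conf`:

```typescript
// Sketch of a Cloudflare Logpush job payload (illustrative values).
interface LogpushJob {
  name: string;
  dataset: string;          // "http_requests" for HTTP request logs
  destination_conf: string; // where Cloudflare pushes log batches
  enabled: boolean;
}

function buildLogpushJob(domain: string, bucket: string, region: string): LogpushJob {
  return {
    name: `kachi-${domain}-logpush`,                          // placeholder naming scheme
    dataset: "http_requests",
    destination_conf: `s3://${bucket}/${domain}?region=${region}`, // Kachi provides the real value
    enabled: true,
  };
}

// This object would be sent as the JSON body of:
//   POST https://api.cloudflare.com/client/v4/zones/{zone_id}/logpush/jobs
```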
Option D — Cloud Hosting (AWS / Azure / GCP and others)
Best for: Sites hosted on any cloud provider — AWS, Azure, Google Cloud, DigitalOcean, Linode, or similar — where access logs can be exported to a storage bucket or delivered via SFTP.
Each cloud provider has its own native log export mechanism. Kachi ingests logs from whichever destination your provider supports, normalizes them, and feeds them into the dashboard pipeline.
Supported export methods:
| Provider | Log Export Method |
|---|---|
| AWS | CloudWatch Logs, S3 access logs, or ALB access logs |
| Azure | Azure Monitor → Storage Account export |
| Google Cloud | Cloud Logging → Log Router → Cloud Storage |
| DigitalOcean / Linode | SFTP log delivery from your droplet or VM |
| Other cloud VMs | SFTP delivery of raw access logs |
How it works:
```
Cloud Hosting (any provider) → Log Export / Storage Bucket
                ↓
        Kachi Ingest Pipeline → S3 (normalized) → Athena → Dashboard
```
What you need to provide:
- Your cloud provider and hosting environment details
- Access to configure a log export or delivery method (Kachi provides the destination and configuration)
What Kachi handles:
- Destination provisioning (S3 bucket or SFTP credentials)
- Normalization pipeline for your provider’s log format
- Daily aggregation and dashboard refresh
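Whatever the provider, the normalization step boils down to mapping provider-specific record shapes onto one canonical schema. A sketch with simplified, assumed record shapes for two providers (Kachi's actual canonical Parquet schema is not shown here, and real ALB logs are space-delimited text rather than the JSON-like shape used below):

```typescript
// Canonical hit record (illustrative field names, not Kachi's real schema).
interface CanonicalHit {
  ts: string;
  url: string;
  status: number;
  userAgent: string;
}

// Simplified provider record shapes, assumed for illustration.
interface AlbRecord {
  time: string;
  request_url: string;
  elb_status_code: number;
  user_agent: string;
}
interface GcpRecord {
  timestamp: string;
  httpRequest: { requestUrl: string; status: number; userAgent: string };
}

// One adapter per provider maps its shape onto the canonical record.
function fromAlb(r: AlbRecord): CanonicalHit {
  return { ts: r.time, url: r.request_url, status: r.elb_status_code, userAgent: r.user_agent };
}

function fromGcp(r: GcpRecord): CanonicalHit {
  return {
    ts: r.timestamp,
    url: r.httpRequest.requestUrl,
    status: r.httpRequest.status,
    userAgent: r.httpRequest.userAgent,
  };
}
```

Once every provider's records land in the same canonical shape, the downstream aggregation and dashboard logic is identical regardless of where the site is hosted.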
Step 2 — Connect Google Search Console (GSC)
Connecting GSC enables the SEO vs AI Comparison view in your dashboard — showing GSC impressions, clicks, CTR, and average position alongside AI Indexing and AI Realtime Retrieval counts for every page.
Setup process:
- Your onboarding contact will request read-only access to your GSC property
- Add the Kachi service account as a restricted user in your GSC property settings
- Kachi’s `lambda_google_snapshots` function will pull GSC data nightly
GSC data imported:
| Field | Description |
|---|---|
| Impressions | How many times your pages appeared in Google Search |
| Clicks | How many times users clicked through to your site |
| CTR | Click-through rate per page |
| Average Position | Your average ranking position in Google Search |
What you’ll see in the dashboard:
Every page on your site appears in a unified table with GSC columns alongside AI Indexing and AI Realtime Retrieval hit counts — so you can immediately spot pages that Google ranks highly but AI ignores, or vice versa.
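The gap-spotting described above amounts to a per-page join of the two data sources. A sketch, with illustrative field names and flag labels (not Kachi's internal schema), which for brevity only covers pages present in the GSC export:

```typescript
// Join GSC metrics with AI hit counts per page and flag visibility gaps.
interface GscRow { page: string; impressions: number; clicks: number }
interface AiRow { page: string; aiIndexing: number; aiRetrieval: number }

interface UnifiedRow extends GscRow, AiRow {
  flag: "google-only" | "ai-only" | "both" | "neither";
}

function unify(gsc: GscRow[], ai: AiRow[]): UnifiedRow[] {
  const aiByPage = new Map(ai.map((r) => [r.page, r] as [string, AiRow]));
  return gsc.map((g) => {
    // Pages with no AI activity default to zero hit counts.
    const a = aiByPage.get(g.page) ?? { page: g.page, aiIndexing: 0, aiRetrieval: 0 };
    const hasGoogle = g.impressions > 0;
    const hasAi = a.aiIndexing + a.aiRetrieval > 0;
    const flag: UnifiedRow["flag"] =
      hasGoogle && hasAi ? "both" : hasGoogle ? "google-only" : hasAi ? "ai-only" : "neither";
    return { ...g, ...a, flag };
  });
}
```

Pages flagged `google-only` are the ones Google ranks but AI ignores; `ai-only` pages are the reverse.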
Step 3 — Connect Google Analytics 4 (GA4)
Connecting GA4 enables LLM-Referred Conversion Tracking — the most direct measure of AI’s business impact on your site.
Setup process:
- Your onboarding contact will request read access to your GA4 property
- Add the Kachi service account as a Viewer on your GA4 property
- Kachi imports GA4 data nightly via the Google Analytics Data API
GA4 data imported:
| Metric | Description |
|---|---|
| Sessions | Total sessions by channel |
| Users | Unique users |
| Engaged Sessions | Sessions meeting GA4 engagement criteria |
| Conversions | Key events configured in your GA4 property |
| Revenue | E-commerce or custom revenue events (if configured) |
Attribution windows:
Kachi matches LLM-referred visits from your server logs against GA4 conversions using two windows:
- Same Day — visitor converted within 24 hours of the AI-referred session
- Days 2–14 — conversion occurred 2 to 14 days after the AI-referred visit
This captures both immediate conversions and longer consideration cycles common in B2B and high-value purchases.
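As a sketch, the two windows above amount to a simple classification by elapsed time (the function name and labels are illustrative, not Kachi's implementation):

```typescript
// Classify a conversion into one of Kachi's two attribution windows,
// relative to an earlier AI-referred visit.
type AttributionWindow = "same-day" | "days-2-14" | "outside";

const DAY_MS = 24 * 60 * 60 * 1000;

function attributionWindow(visit: Date, conversion: Date): AttributionWindow {
  const elapsed = conversion.getTime() - visit.getTime();
  if (elapsed < 0) return "outside";             // converted before the visit
  if (elapsed < DAY_MS) return "same-day";       // within 24 hours
  if (elapsed < 14 * DAY_MS) return "days-2-14"; // 2 to 14 days later
  return "outside";
}
```

Visits whose conversions fall outside both windows simply do not contribute to the LLM-referred conversion totals.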
Step 4 — Verify Data Flow
After setup, Kachi’s team will confirm that data is flowing correctly before your dashboard goes live. You can also verify independently:
Cloudflare Worker verification
Check that the Worker is active in your Cloudflare dashboard under Workers & Pages. The Worker name follows the pattern `kachi-{your-domain}-ingest`. A green status indicator confirms it is deployed and handling requests.
Cloudflare Logpush verification
Confirm the Logpush job is active under Analytics & Logs → Logpush in your Cloudflare dashboard. The job status should show as Healthy. Your onboarding contact will confirm receipt of the first pushed batch within a few hours of activation.
Log streaming verification
Your onboarding contact will confirm receipt of the first log batch. Typical first-delivery confirmation takes 24–48 hours depending on your log delivery schedule.
Dashboard population
Your dashboard is populated on a daily refresh cycle. Within 48 hours of a confirmed connection you should see:
- AI Indexing, AI Realtime Retrieval, and AI Training KPI counts
- LLM Breakdown panels with vendor-level hit counts
- Top Pages table with AEO scores
If your dashboard shows zeros after 48 hours, reach out to your onboarding contact. Common causes include log format mismatches and firewall rules blocking SFTP delivery.
What Kachi Measures
Once live, your dashboard tracks the following across 40+ AI platforms:
AI Indexing
Pages that AI platforms have cached and pre-built as answers. High indexing counts mean your content is stored across AI ecosystems and being served at scale.
Top vendors tracked: Apple, You.com, Amazon, Petal, Meta, and others.
AI Realtime Retrieval
Live fetches that happen mid-query when an AI system needs current information to answer a user’s question. These are high-intent signals — an AI is actively consulting your site during a live session.
Top vendors tracked: OpenAI, Perplexity, Claude, Meta, Operator, and others.
AI Training
Crawls by model-training bots that ingest your content into future AI knowledge bases. Training activity shapes what AI systems know and recommend, even without a real-time citation.
Top vendors tracked: Meta, OpenAI, Claude, Common Crawl, Timpibot, and others.
AEO Score
A 0–100 score per page reflecting normalized AI citation frequency across all three interaction types. Higher scores indicate pages that AI systems consistently prefer as sources.
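Kachi's exact scoring formula isn't published here, but "normalized citation frequency" can be illustrated with a weighted min-max normalization. The weights and scaling below are assumptions for illustration only, not the real AEO formula:

```typescript
// Illustrative 0-100 score from raw per-page hit counts across the three
// interaction types. Weights and min-max scaling are assumptions — this is
// NOT Kachi's published AEO formula.
interface PageHits { indexing: number; retrieval: number; training: number }

function aeoScores(pages: Map<string, PageHits>): Map<string, number> {
  // Weighted raw signal per page (weights are arbitrary for illustration).
  const raw = new Map<string, number>();
  pages.forEach((h, page) => {
    raw.set(page, 1.0 * h.indexing + 1.5 * h.retrieval + 0.5 * h.training);
  });

  const values = Array.from(raw.values());
  const max = Math.max(...values);
  const min = Math.min(...values);
  const span = max - min || 1; // avoid division by zero when all pages tie

  // Min-max normalize to 0-100 and round.
  const scores = new Map<string, number>();
  raw.forEach((v, page) => {
    scores.set(page, Math.round(((v - min) / span) * 100));
  });
  return scores;
}
```

The page AI systems cite most lands at 100 and the least-cited page at 0, with the rest spread proportionally in between.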
Supported Hosting Platforms
| Platform | Recommended Method |
|---|---|
| Cloudflare-proxied sites | Cloudflare Worker |
| Cloudflare Pro / Business / Enterprise | Cloudflare Worker or Logpush |
| Framer (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Squarespace (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Webflow (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| SiteGround GrowBig (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Nginx / Apache (self-hosted) | Server Log Streaming (SFTP) |
| AWS CloudFront | Server Log Streaming |
| Vercel | Server Log Streaming (Log Drain) |
| Fastly | Server Log Streaming |
| AWS EC2 / ECS / Elastic Beanstalk | Cloud Hosting (Log Export) |
| Azure App Service / AKS | Cloud Hosting (Log Export) |
| Google Cloud / Firebase Hosting | Cloud Hosting (Log Export) |
| DigitalOcean / Linode | Cloud Hosting (SFTP) |
| Managed WordPress hosts | Server Log Streaming (if supported) |
Not sure which method applies to your stack? Your onboarding contact will confirm the right approach during the initial scoping call.
Need Help?
If you run into any issues during setup:
- Onboarding contact — your dedicated contact is your first port of call for setup questions
- Email support — reach us at [email protected]