
AI Traffic Analytics — Setup & Implementation Guide

Everything you need to connect your site to Kachi and start measuring AI Indexing, AI Realtime Retrieval, AI Training, and LLM-referred conversions.

Overview

Kachi measures three types of AI activity on your website — AI Indexing, AI Realtime Retrieval, and AI Training — all from verified server-level log data. No JavaScript snippets, no sampling, no guesswork.

Once connected, your dashboard shows:

  • Which AI platforms are caching, fetching, or training on your pages
  • Which pages receive the most AI attention (AEO scores)
  • Which visitors arrived from AI platforms and whether they converted
  • How your AI visibility compares to your traditional SEO performance (GSC)

This guide walks through every step from initial connection to a fully operational dashboard.


Quick Start

Kachi follows a three-step onboarding flow:

  1. Connect your data source — choose the ingestion method that matches your hosting stack
  2. Connect your analytics integrations — link Google Search Console and GA4
  3. Verify data flow — confirm logs are arriving and your dashboard is populating

Your dedicated onboarding contact will guide you through each step. The sections below provide technical detail for each stage.


Step 1 — Choose Your Data Ingestion Method

Kachi supports four ingestion methods. Your hosting environment determines which one applies.

Option A — Cloudflare Worker

Best for: Sites already using Cloudflare as a reverse proxy or CDN.

Kachi deploys a lightweight Cloudflare Worker on your zone that captures every incoming request and forwards structured log events to Kachi’s ingest pipeline in real time.

How it works:

Browser → Cloudflare (Worker intercepts) → Your Origin Server
                    ↓
          Kachi API Gateway → Ingest Lambda → Kinesis Firehose → S3

What you need to provide:

  • Cloudflare Zone ID
  • API token with Worker deploy permissions (provided by your team, or Kachi handles deployment)
  • Confirmation that your origin is behind Cloudflare proxy (orange cloud enabled)

What Kachi handles:

  • Worker script deployment and maintenance
  • Log normalization from Cloudflare format to Kachi canonical Parquet
  • Automatic daily processing and dashboard updates

Note: Your origin server does not need any changes. The Worker runs entirely at the Cloudflare edge.


Option B — Server Log Streaming (SFTP / FTP)

Best for: Sites hosted on traditional servers, Nginx, Apache, or managed hosting platforms that can export access logs.

Your server or CDN sends raw access logs to Kachi’s secure ingestion endpoint on a scheduled basis (typically hourly or daily).

Supported log formats:

  • Nginx access logs (combined or json format)
  • Apache access logs (combined format)
  • CDN logs (Fastly, CloudFront, Akamai) in standard formats
  • Custom NDJSON or CSV formats (contact your onboarding contact)
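Whichever format you deliver, each access-log line already carries the fields Kachi needs: client IP, timestamp, request, status, and (crucially) the user agent. As a rough illustration of how a combined-format line breaks out — the regex below is our own sketch, not Kachi's actual parser:

```python
import re

# Nginx "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.7 - - [12/Mar/2025:10:15:32 +0000] '
        '"GET /pricing HTTP/1.1" 200 5123 "-" '
        '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"')

m = COMBINED.match(line)
fields = m.groupdict()
print(fields["addr"])    # client IP
print(fields["status"])  # HTTP status
print(fields["agent"])   # user agent — where AI crawlers identify themselves
```

The user-agent field is the primary signal for separating AI crawlers from human visitors, which is why unfiltered logs matter.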

Connection details:

Kachi will provide you with:

  • SFTP host, port, username, and SSH key
  • Target directory path for log delivery
  • Expected filename format and delivery schedule

What Kachi handles:

  • Log ingestion, deduplication, and normalization
  • Conversion to canonical Parquet format
  • Daily processing pipeline and dashboard refresh

Note: Logs should cover all traffic including bot and crawler requests. Do not pre-filter logs before delivery — Kachi’s normalization pipeline distinguishes AI traffic from human traffic.
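To see why unfiltered logs matter, here is a toy version of that distinction: matching known AI crawler user-agent strings against a vendor map. The map below is a small illustrative sample of real crawler user agents, not Kachi's actual ruleset, which covers 40+ platforms:

```python
# Illustrative sample of real AI crawler user-agent markers.
# Kachi's production ruleset is far broader than this.
AI_AGENTS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "CCBot": "Common Crawl",
    "meta-externalagent": "Meta",
}

def classify(user_agent: str) -> str:
    """Return the AI vendor behind a crawler UA, or 'human/other'."""
    for marker, vendor in AI_AGENTS.items():
        if marker.lower() in user_agent.lower():
            return vendor
    return "human/other"

print(classify("Mozilla/5.0 (compatible; GPTBot/1.0)"))      # OpenAI
print(classify("Mozilla/5.0 (Windows NT 10.0) Chrome/120"))  # human/other
```

Pre-filtering bots out of your delivered logs would remove exactly the requests this classification needs to see.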


Option C — Cloudflare Logpush

Best for: Sites on Cloudflare Pro, Business, or Enterprise plans that prefer configuration-only log delivery without deploying a Worker script.

Cloudflare Logpush automatically exports HTTP request logs from Cloudflare’s edge to a Kachi-provisioned S3 bucket or SFTP endpoint on a scheduled basis. No Worker code is deployed — the job is configured entirely in the Cloudflare dashboard or via API.

How it works:

Cloudflare Edge (Logpush job) → Kachi S3 / SFTP
                                      ↓
                        Ingest Lambda → Kinesis Firehose → S3 → Dashboard

What you need to provide:

  • Cloudflare Zone ID
  • Cloudflare account with Logpush enabled (Pro plan or higher)
  • Ability to create a Logpush job in the Cloudflare dashboard (Kachi provides the destination and configuration)

What Kachi handles:

  • Destination provisioning (S3 bucket or SFTP credentials)
  • Logpush job configuration template
  • Log normalization from Cloudflare format to Kachi canonical Parquet
  • Daily processing pipeline and dashboard refresh

Note: Logpush delivers logs in batches (typically every 5 minutes to hourly depending on your plan). Batching introduces a slight delay compared to the Worker approach, but no code deployment is required.
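For reference, a Logpush job created via Cloudflare's API takes a payload along these lines. The destination path and field list below are placeholders used for illustration — Kachi supplies the real values during onboarding:

```python
import json

# Sketch of a payload for Cloudflare's
# POST /zones/{zone_id}/logpush/jobs endpoint.
# Destination and field list are placeholders, not real values.
job = {
    "name": "kachi-http-requests",
    "dataset": "http_requests",
    "enabled": True,
    # Destination provisioned by Kachi (S3 bucket or SFTP endpoint):
    "destination_conf": "s3://example-kachi-ingest/logs?region=us-east-1",
    # Request fields needed for AI-traffic classification:
    "logpull_options": "fields=ClientIP,ClientRequestHost,ClientRequestURI,"
                       "ClientRequestUserAgent,EdgeStartTimestamp,"
                       "EdgeResponseStatus&timestamps=rfc3339",
}

print(json.dumps(job, indent=2))
```

Your onboarding contact provides a ready-to-use template, so you only need permission to create the job in your zone.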


Option D — Cloud Hosting (AWS / Azure / GCP and others)

Best for: Sites hosted on any cloud provider — AWS, Azure, Google Cloud, DigitalOcean, Linode, or similar — where access logs can be exported to a storage bucket or delivered via SFTP.

Each cloud provider has its own native log export mechanism. Kachi ingests logs from whichever destination your provider supports, normalizes them, and feeds them into the dashboard pipeline.

Supported export methods:

| Provider | Log Export Method |
| --- | --- |
| AWS | CloudWatch Logs, S3 access logs, or ALB access logs |
| Azure | Azure Monitor → Storage Account export |
| Google Cloud | Cloud Logging → Log Router → Cloud Storage |
| DigitalOcean / Linode | SFTP log delivery from your droplet or VM |
| Other cloud VMs | SFTP delivery of raw access logs |

How it works:

Cloud Hosting (any provider) → Log Export / Storage Bucket
                                      ↓
          Kachi Ingest Pipeline → S3 (normalized) → Athena → Dashboard

What you need to provide:

  • Your cloud provider and hosting environment details
  • Access to configure a log export or delivery method (Kachi provides the destination and configuration)

What Kachi handles:

  • Destination provisioning (S3 bucket or SFTP credentials)
  • Normalization pipeline for your provider’s log format
  • Daily aggregation and dashboard refresh

Step 2 — Connect Google Search Console (GSC)

Connecting GSC enables the SEO vs AI Comparison view in your dashboard — showing GSC impressions, clicks, CTR, and average position alongside AI Indexing and AI Realtime Retrieval counts for every page.

Setup process:

  1. Your onboarding contact will request read-only access to your GSC property
  2. Add the Kachi service account as a restricted user in your GSC property settings
  3. Kachi’s lambda_google_snapshots function will pull GSC data nightly

GSC data imported:

| Field | Description |
| --- | --- |
| Impressions | How many times your pages appeared in Google Search |
| Clicks | How many times users clicked through to your site |
| CTR | Click-through rate per page |
| Average Position | Your average ranking position in Google Search |

What you’ll see in the dashboard:

Every page in your site appears in a unified table with GSC columns alongside AI Indexing and AI Realtime Retrieval hit counts — so you can immediately spot pages that Google ranks highly but AI ignores, or vice versa.
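As a sketch of that side-by-side view, the join is a simple merge on page path. The field names and numbers below are illustrative, not Kachi's schema:

```python
# Hypothetical per-page metrics — names and values are illustrative.
gsc = {
    "/pricing":  {"impressions": 12000, "clicks": 480, "position": 3.2},
    "/docs/api": {"impressions": 900,   "clicks": 40,  "position": 8.1},
}
ai_hits = {
    "/pricing":  {"indexing": 5,   "realtime": 2},
    "/docs/api": {"indexing": 210, "realtime": 88},
}

# Merge into one row per page so mismatches stand out:
# here /pricing ranks well in Google but AI rarely cites it,
# while /docs/api is the opposite.
for page in sorted(set(gsc) | set(ai_hits)):
    g = gsc.get(page, {})
    a = ai_hits.get(page, {})
    print(page, g.get("impressions", 0), a.get("indexing", 0))
```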


Step 3 — Connect Google Analytics 4 (GA4)

Connecting GA4 enables LLM-Referred Conversion Tracking — the most direct measure of AI’s business impact on your site.

Setup process:

  1. Your onboarding contact will request read access to your GA4 property
  2. Add the Kachi service account as a Viewer on your GA4 property
  3. Kachi imports GA4 data nightly via the Google Analytics Data API

GA4 data imported:

| Metric | Description |
| --- | --- |
| Sessions | Total sessions by channel |
| Users | Unique users |
| Engaged Sessions | Sessions meeting GA4 engagement criteria |
| Conversions | Key events configured in your GA4 property |
| Revenue | E-commerce or custom revenue events (if configured) |

Attribution windows:

Kachi matches LLM-referred visits from your server logs against GA4 conversions using two windows:

  • Same Day — visitor converted within 24 hours of the AI-referred session
  • Days 2–14 — conversion occurred 2 to 14 days after the AI-referred visit

This captures both immediate conversions and longer consideration cycles common in B2B and high-value purchases.
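A simplified sketch of the window logic follows. The exact boundary handling is our assumption, and production matching runs against GA4 sessions rather than raw timestamps:

```python
from datetime import datetime, timedelta
from typing import Optional

def attribution_window(visit: datetime,
                       conversion: datetime) -> Optional[str]:
    """Classify a conversion against the two attribution windows.
    Simplified sketch — boundary handling is an assumption."""
    delta = conversion - visit
    if timedelta(0) <= delta < timedelta(days=1):
        return "Same Day"
    if timedelta(days=1) <= delta <= timedelta(days=14):
        return "Days 2-14"
    return None  # outside both windows — not attributed

visit = datetime(2025, 3, 1, 9, 0)
print(attribution_window(visit, datetime(2025, 3, 1, 15, 0)))  # Same Day
print(attribution_window(visit, datetime(2025, 3, 6, 9, 0)))   # Days 2-14
print(attribution_window(visit, datetime(2025, 4, 1, 9, 0)))   # None
```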


Step 4 — Verify Data Flow

After setup, Kachi’s team will confirm that data is flowing correctly before your dashboard goes live. You can also verify independently:

Cloudflare Worker verification

Check that the Worker is active in your Cloudflare dashboard under Workers & Pages. The Worker name follows the pattern kachi-{your-domain}-ingest. A green status indicator confirms it is deployed and handling requests.

Cloudflare Logpush verification

Confirm the Logpush job is active under Analytics & Logs → Logpush in your Cloudflare dashboard. The job status should show as Healthy. Your onboarding contact will confirm receipt of the first pushed batch within a few hours of activation.

Log streaming verification

Your onboarding contact will confirm receipt of the first log batch. Typical first-delivery confirmation takes 24–48 hours depending on your log delivery schedule.

Dashboard population

Your dashboard is populated on a daily refresh cycle. Within 48 hours of a confirmed connection you should see:

  • AI Indexing, AI Realtime Retrieval, and AI Training KPI counts
  • LLM Breakdown panels with vendor-level hit counts
  • Top Pages table with AEO scores

If your dashboard shows zeros after 48 hours, contact your onboarding contact. Common causes are log format mismatches or firewall rules blocking SFTP delivery.


What Kachi Measures

Once live, your dashboard tracks the following across 40+ AI platforms:

AI Indexing

Pages that AI platforms have cached and pre-built as answers. High indexing counts mean your content is stored across AI ecosystems and being served at scale.

Top vendors tracked: Apple, You.com, Amazon, Petal, Meta, and others.

AI Realtime Retrieval

Live fetches that happen mid-query when an AI system needs current information to answer a user’s question. These are high-intent signals — an AI is actively consulting your site during a live session.

Top vendors tracked: OpenAI, Perplexity, Claude, Meta, Operator, and others.

AI Training

Crawls by model-training bots that ingest your content into future AI knowledge bases. Training activity shapes what AI systems know and recommend, even without a real-time citation.

Top vendors tracked: Meta, OpenAI, Claude, Common Crawl, Timpibot, and others.

AEO Score

A 0–100 score per page reflecting normalized AI citation frequency across all three interaction types. Higher scores indicate pages that AI systems consistently prefer as sources.
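Kachi's exact scoring formula is its own; as a rough illustration of what "normalized citation frequency" can mean, here is a toy score that scales each page's total AI hits against the site's most-cited page. Equal weighting of the three interaction types is our assumption:

```python
# Toy illustration of a 0-100 normalized citation score.
# Kachi's real AEO formula is different; equal weighting
# across interaction types here is an assumption.
def aeo_scores(hits: dict) -> dict:
    totals = {page: sum(h.values()) for page, h in hits.items()}
    top = max(totals.values()) or 1
    return {page: round(100 * t / top) for page, t in totals.items()}

hits = {
    "/pricing":  {"indexing": 40, "realtime": 10, "training": 30},
    "/blog/faq": {"indexing": 10, "realtime": 2,  "training": 4},
}
print(aeo_scores(hits))  # {'/pricing': 100, '/blog/faq': 20}
```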


Supported Hosting Platforms

| Platform | Recommended Method |
| --- | --- |
| Cloudflare-proxied sites | Cloudflare Worker |
| Cloudflare Pro / Business / Enterprise | Cloudflare Worker or Logpush |
| Framer (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Squarespace (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Webflow (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| SiteGround GrowBig (Cloudflare reverse proxy) | Cloudflare Worker or Logpush |
| Nginx / Apache (self-hosted) | Server Log Streaming (SFTP) |
| AWS CloudFront | Server Log Streaming |
| Vercel | Server Log Streaming (Log Drain) |
| Fastly | Server Log Streaming |
| AWS EC2 / ECS / Elastic Beanstalk | Cloud Hosting (Log Export) |
| Azure App Service / AKS | Cloud Hosting (Log Export) |
| Google Cloud / Firebase Hosting | Cloud Hosting (Log Export) |
| DigitalOcean / Linode | Cloud Hosting (SFTP) |
| Managed WordPress hosts | Server Log Streaming (if supported) |

Not sure which method applies to your stack? Your onboarding contact will confirm the right approach during the initial scoping call.


Need Help?

If you run into any issues during setup:

  • Onboarding contact — your dedicated contact is your first port of call for setup questions
  • Email support — reach us at [email protected]