Adding an AI chatbot to your website sounds simple: install a widget, connect an API key, write a system prompt. Done. Except that approach produces a chatbot that handles 30–45% of queries and confuses or frustrates the rest of your visitors.
We have built chatbots across all four architecture tiers for clients ranging from single-location service businesses to mid-market SaaS platforms. The honest conclusion from our data: widget platforms resolve 30–45% of queries. Custom RAG chatbots resolve 60–80%. The gap is not the AI model quality. It is the knowledge base quality and retrieval architecture.
This guide walks through all four approaches, explains when each makes sense, and provides the technical depth to actually implement them.
The Four Architecture Options
Option 1: Widget Platforms (Intercom, Drift, Tidio)
Cost: $29–$449/month. Setup time: 30–90 minutes. Resolution rate: 30–45% on typical business queries.
These platforms provide an embeddable widget with AI built in. Intercom Fin, Drift AI, and Tidio's Lyro all work similarly: you upload your help documentation, connect your live chat team, and the AI handles what it can before escalating. They are genuinely good products for what they are.
The problem is cost at scale. Intercom charges $0.99 per AI resolution. At 2,000 resolutions/month, that is $1,980/month in resolution fees alone, plus the base platform fee. For high-volume use cases, this is economically irrational compared to a direct API implementation.
Use widget platforms when: you need something live in a day, you have under 500 conversations/month, or your team has no technical capacity for anything else.
Option 2: No-Code Builders (Voiceflow, Botpress)
Cost: $0–$749/month. Setup time: 1–5 days. Resolution rate: 35–55% depending on flow design.
Voiceflow and Botpress let you design conversation flows visually, add AI-powered intent detection, and embed the result on any website. They are significantly more powerful than simple widget platforms — you can design branching logic, integrate APIs, and build multi-step qualification flows.
The limitation is conversation intelligence. These platforms excel at structured flows ("press 1 for X, press 2 for Y" evolved into AI intent detection), but they struggle with open-ended questions that require genuine reasoning. Botpress improved significantly with their LLM integration, but the rigid flow structure fights against natural conversation.
Use no-code builders when: you need complex conversation flows with branching logic, your team is non-technical but has more time than money, or you need specific integrations (Salesforce, HubSpot) without custom development.
Option 3: API-First (OpenAI Assistants API + Custom Widget)
Cost: $50–$200/month at 1,000 conversations. Setup time: 3–7 days with a developer. Resolution rate: 55–70%.
Build a lightweight backend that calls the OpenAI Assistants API, create a React chat widget, and embed it on your website via a script tag. You get file search (vector store), code interpreter, function calling, and conversation memory out of the box from OpenAI. This is significantly faster than building a full custom RAG pipeline.
Option 4: Full Custom (Next.js + Vercel AI SDK + Your LLM)
Cost: $30–$100/month at 1,000 conversations plus development time. Setup time: 2–4 weeks. Resolution rate: 60–80%.
Full ownership of the entire stack. Your own RAG pipeline, your own vector store, your own conversation storage, your own analytics. Maximum control, maximum resolution rate, maximum development investment.
Use this approach when: you need maximum performance, have compliance requirements that prevent data going to third-party platforms, or are building a chatbot as a core product feature rather than a support tool.
Decision Framework
Answer these four questions to find your option:
- Team: Do you have a developer available? No → Option 1 or 2. Yes → Option 3 or 4.
- Budget: Is your monthly budget under $100? Option 1 (Tidio free tier or Botpress free) or Option 4 if developer available. Over $100/month in API costs acceptable? Any option.
- Conversation complexity: Mostly FAQ and simple questions? Option 1–3. Complex reasoning, multi-step processes, account-specific lookups? Option 3–4.
- Integration needs: Standard CRM/helpdesk integrations? Option 1–2. Custom APIs, internal databases, proprietary systems? Option 3–4.
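The four questions above can be sketched as a small function. All field names here are illustrative, and conflicting answers fall back to the team constraint, since Options 3–4 require a developer regardless of other needs:

```typescript
// Illustrative sketch of the decision framework. Field names are
// invented for this example, not from any library.
type Answers = {
  hasDeveloper: boolean;
  complexQueries: boolean;      // multi-step reasoning, account-specific lookups
  customIntegrations: boolean;  // internal databases, proprietary APIs
  needsBranchingFlows: boolean; // multi-step qualification, standard CRM hooks
};

function recommendOption(a: Answers): 1 | 2 | 3 | 4 {
  if (!a.hasDeveloper) {
    // No developer: widget platform, or a no-code builder when you
    // need branching flows and standard integrations.
    return a.needsBranchingFlows ? 2 : 1;
  }
  // Developer available: full custom only when both complexity and
  // integration needs justify the 2–4 week build; otherwise API-first.
  return a.complexQueries && a.customIntegrations ? 4 : 3;
}
```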
Option 3 Step-by-Step: API-First Implementation
This is the implementation we recommend most often. Here is how to build it.
Step 1: Set Up the OpenAI Assistant
In the OpenAI playground, create a new Assistant. Set the model to gpt-4o-mini (best cost/performance for support use cases). Enable "File Search" (not Code Interpreter unless needed — it costs extra). Write your system prompt with explicit boundaries as described in our ChatGPT customer service guide.
Upload your knowledge base documents to the vector store. OpenAI handles chunking and embedding automatically. For a 100-document knowledge base, this takes 5–10 minutes.
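The same configuration can also be done programmatically with the OpenAI Node.js SDK. Here is a minimal sketch; the assistant name and instructions are placeholders, and the client is passed in so the function can be exercised with a stub:

```typescript
// Sketch: create the support assistant via the SDK instead of the
// playground. `client` is an OpenAI SDK instance (or a stub in tests).
// The name and instructions below are placeholders, not a recommendation.
async function createSupportAssistant(client: any) {
  return client.beta.assistants.create({
    model: "gpt-4o-mini",             // best cost/performance for support
    name: "Support Assistant",
    instructions:
      "Answer only from the provided knowledge base. " +
      "Escalate to a human when you are unsure.",
    tools: [{ type: "file_search" }], // File Search only; no Code Interpreter
  });
}
```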
Step 2: Create the API Route
Build a server-side API route that manages thread creation and message sending. The pattern is: receive user message → create or retrieve thread → add message → create run → poll for completion → return assistant response. Using the OpenAI Node.js SDK:
- POST /api/chat — accepts {threadId, message}, returns {threadId, response}
- Thread IDs are stored client-side (localStorage or cookie) to maintain conversation continuity
- Server-side: validate input, enforce rate limiting, log conversations to your database
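The receive → thread → message → run → poll flow above can be sketched as one handler function. The OpenAI client is injected so the flow is testable; validation, rate limiting, and logging are elided here:

```typescript
// Sketch of the POST /api/chat flow. `client` is an OpenAI SDK
// instance (or a stub in tests). Error handling beyond the run
// status check is omitted for brevity.
async function handleChat(
  client: any,
  assistantId: string,
  threadId: string | null,
  message: string,
): Promise<{ threadId: string; response: string }> {
  // Create a thread on first contact, otherwise reuse the caller's.
  const tid = threadId ?? (await client.beta.threads.create()).id;

  // Add the user message, then run the assistant and wait for completion.
  await client.beta.threads.messages.create(tid, { role: "user", content: message });
  const run = await client.beta.threads.runs.createAndPoll(tid, {
    assistant_id: assistantId,
  });
  if (run.status !== "completed") throw new Error(`run ended as ${run.status}`);

  // The newest message (the assistant's reply) is first in the list.
  const msgs = await client.beta.threads.messages.list(tid);
  const reply = msgs.data[0].content[0].text.value;
  return { threadId: tid, response: reply };
}
```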
Step 3: Build the React Widget
A minimal chat widget needs: a toggle button (typically bottom-right), a message thread display, a text input, and typing indicators. Keep it under 200 lines of React. The critical UX elements are:
- Optimistic UI: show user message immediately, spinner while waiting for response
- Streaming (if you switch to Chat Completions): use Server-Sent Events for real-time text rendering
- Mobile-responsive: the widget must work on phones. Use a fixed-position container with max-height and scroll.
- Welcome message: shown on first open, sets expectations and offers quick-start options
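The optimistic-UI behavior can be isolated in two pure state transitions, which keeps the React component thin and makes the behavior testable outside the browser. Names here are illustrative:

```typescript
// Pure state transitions for the widget's message list. The React
// component just calls these from its event handlers.
type Msg = { role: "user" | "assistant"; text: string };
type ChatState = { messages: Msg[]; pending: boolean };

// User hits send: show their message immediately and start the spinner.
function sendOptimistic(s: ChatState, text: string): ChatState {
  return { messages: [...s.messages, { role: "user", text }], pending: true };
}

// Assistant reply arrives: append it and stop the spinner.
function receiveReply(s: ChatState, text: string): ChatState {
  return { messages: [...s.messages, { role: "assistant", text }], pending: false };
}
```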
Step 4: Embed on Any Website
Build the widget as a Next.js app, deploy to Vercel, and create a script tag embed:
- The widget loads from your domain as an iframe or shadow DOM component
- Pass configuration (company name, colors, initial message) via data attributes on the script tag
- This approach works on any HTML website — WordPress, Squarespace, Shopify, static HTML
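Reading the data attributes into widget config can look like the following sketch. In the browser you would call it with `document.currentScript?.dataset`; the attribute names and defaults here are illustrative:

```typescript
// Sketch: turn the embed script tag's data attributes into widget
// config, with fallbacks for anything the site owner omits.
// Attribute and field names are invented for this example.
type WidgetConfig = { company: string; color: string; welcome: string };

function parseConfig(dataset: Record<string, string | undefined>): WidgetConfig {
  return {
    company: dataset.company ?? "Support",
    color: dataset.color ?? "#0f62fe",
    welcome: dataset.welcome ?? "Hi! How can we help?",
  };
}
```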
Option 4 Step-by-Step: Full Custom with RAG
The full custom approach uses Next.js route handlers, the Vercel AI SDK, Supabase pgvector for the knowledge base, and your choice of LLM.
Backend Architecture
The core is a streaming API route using Vercel AI SDK's streamText():
- Receive user message via POST
- Generate embedding for the query using text-embedding-3-small
- Run vector similarity search against Supabase pgvector
- Retrieve top 4–6 relevant chunks
- Construct prompt: system instructions + retrieved context + conversation history + user message
- Call streamText() with the constructed messages array
- Return the stream to the client
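The prompt-construction step in the pipeline above is a pure function: system instructions plus retrieved chunks plus prior turns plus the new user message, in the messages-array shape that streamText() accepts. A sketch, with illustrative names:

```typescript
// Step 5 of the pipeline: assemble the messages array. Retrieved
// chunks are numbered so the model can ground answers in them.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(
  system: string,
  chunks: string[],
  history: ChatMessage[],
  userMessage: string,
): ChatMessage[] {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    { role: "system", content: `${system}\n\nContext:\n${context}` },
    ...history,
    { role: "user", content: userMessage },
  ];
}
```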
Supabase pgvector Setup
Enable the vector extension and create a documents table with a content column (text) and an embedding column (vector(1536) for text-embedding-3-small). Add a metadata JSONB column for category, source_url, and last_updated. Create an HNSW index on the embedding column for fast approximate nearest neighbor search at scale:
- Enable extension: CREATE EXTENSION IF NOT EXISTS vector;
- Create index: CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
- Query with: ORDER BY embedding <=> query_embedding LIMIT 5;
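For intuition, the `<=>` operator with `vector_cosine_ops` computes cosine distance, i.e. 1 minus the cosine similarity of the two vectors. The same quantity in TypeScript:

```typescript
// What pgvector's <=> computes under vector_cosine_ops: cosine
// distance. 0 means identical direction, 1 means orthogonal.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```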
Knowledge Base Ingestion Pipeline
Build a server-side ingestion script that runs on a schedule or webhook trigger:
- Fetch updated documents from your content source (help center API, Notion API, local files)
- Chunk documents into 300–400 token segments with 50-token overlap
- Generate embeddings in batches of 100 (OpenAI rate limits apply)
- Upsert to Supabase using the source URL as a unique key to handle updates
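The chunking step can be sketched as follows. This approximates tokens by whitespace-separated words for simplicity; a real pipeline would count tokens with a tokenizer such as tiktoken:

```typescript
// Split a document into ~300–400 "token" chunks with 50-token overlap.
// Tokens are approximated by words here; swap in a real tokenizer
// for production-accurate chunk sizes.
function chunkDocument(text: string, size = 350, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  // Advance by (size - overlap) so consecutive chunks share `overlap` words.
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```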
UI/UX Best Practices That Actually Affect Resolution Rate
These are not cosmetic suggestions. They directly affect how many conversations resolve successfully.
Widget Placement
Bottom-right is the expected position. Do not be creative here. Customers have been trained by every major website to find chat support there. Deviation hurts discoverability without any benefit.
Welcome Message A/B Testing
The welcome message is the highest-leverage optimization in a chatbot. We have tested 20+ variants across clients and the pattern is consistent: specific beats generic. "Hi! Ask me anything about our return policy, shipping times, or account setup." outperforms "Hi! How can I help you today?" by 23% in message initiation rate. Specific prompts reduce off-topic queries by 40%.
Quick Reply Chips
Offer 3–4 common question buttons below the welcome message. These remove the blank-input anxiety and route users to your highest-quality knowledge base content. They also represent your most common Tier 1 queries — serve them first.
Conversation History
Persist conversation history in localStorage with a 24-hour expiry. Returning users should not have to repeat context. This is table-stakes UX that most implementations miss.
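The 24-hour expiry logic can be kept as two small functions, with the raw string passed in so the same code works against localStorage in the browser and plain strings in tests:

```typescript
// Persist-and-expire helpers for conversation history. In the browser:
// localStorage.setItem("chat", saveHistory(msgs, Date.now())) and
// loadHistory(localStorage.getItem("chat"), Date.now()).
const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour expiry

type Stored = { savedAt: number; messages: unknown[] };

function saveHistory(messages: unknown[], now: number): string {
  return JSON.stringify({ savedAt: now, messages });
}

function loadHistory(raw: string | null, now: number): unknown[] {
  if (!raw) return [];
  const stored: Stored = JSON.parse(raw);
  // Discard anything older than 24 hours so stale context never resurfaces.
  return now - stored.savedAt > TTL_MS ? [] : stored.messages;
}
```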
Mobile Responsiveness
Test on a 375px-wide viewport. The chat widget should occupy 85–95% of screen width on mobile and not overlap the page scroll. Common mistake: a widget sized for desktop that partially covers the mobile viewport on open.
Cost Comparison at 1,000 Conversations/Month
- Intercom Fin: $74 base + $0.99/resolution × ~650 resolutions = $717/month
- Tidio Lyro: $299/month for 200 conversations, $0.50/conversation overage = $699/month
- Botpress Pro: $445/month all-in at that volume
- OpenAI Assistants API (Option 3): ~$100 API costs + $20 hosting = $120/month
- Full Custom (Option 4): ~$50 API costs + $25 Supabase + $0 Vercel = $75/month
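The per-platform math above can be reproduced in integer cents to avoid floating-point drift. These use the prices quoted in this comparison (the Intercom line works out to $717.50, which the figure above rounds down):

```typescript
// Monthly cost models for the comparison above, in integer cents.
function intercomMonthlyCents(resolutions: number): number {
  return 74_00 + 99 * resolutions; // $74 base + $0.99 per resolution
}

function tidioMonthlyCents(conversations: number): number {
  const overage = Math.max(0, conversations - 200);
  return 299_00 + 50 * overage; // $299 covers 200, then $0.50 each
}
```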
The Metrics That Matter
Do not measure chatbot "engagement." That is a vanity metric. Measure:
- Resolution rate: Conversations fully handled without human escalation. Target 60%+ with good RAG.
- Deflection rate: Tickets that would have gone to your support team but were handled by the bot. Calculate value using your cost-per-ticket.
- Post-chat CSAT: Send a one-question survey after chat ends. 4–5 stars on 60%+ of responses is a healthy benchmark.
- Escalation rate: The percentage reaching a human. Below 30% suggests over-escalation (too conservative). Above 60% suggests under-performance.
- False resolution rate: Conversations marked as resolved where the customer returned within 24 hours with the same question. Target below 8%.
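The metrics above reduce to simple ratios over conversation counts. A sketch with illustrative field names:

```typescript
// The chatbot metrics above as calculations over raw tallies.
type Tally = {
  total: number;
  resolvedByBot: number;
  escalatedToHuman: number;
  falseResolutions: number; // customer returned within 24h, same question
};

function metrics(t: Tally) {
  return {
    resolutionRate: t.resolvedByBot / t.total,         // target 60%+
    escalationRate: t.escalatedToHuman / t.total,      // healthy: 30–60%
    falseResolutionRate: t.falseResolutions / t.resolvedByBot, // target < 8%
  };
}

// Deflection value: resolved conversations × your human cost-per-ticket.
function deflectionValue(resolved: number, costPerTicket: number): number {
  return resolved * costPerTicket;
}
```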
Ready to Build?
For businesses just starting out, our recommendation is to begin with the API-first approach (Option 3) and graduate to full custom when volume justifies the investment. The 3–7 day build time and $50–$200/month operational cost make it accessible to businesses of almost any size.
If you want a pre-built implementation deployed and customized for your specific knowledge base and use case, explore our AI chatbot service or read our complete guide to building an AI customer service agent.