LangChain gets a lot of attention and a lot of criticism. The criticism is often valid — LangChain can add complexity where it isn't needed, and the documentation has historically been inconsistent. But when you need to build an agent that uses multiple tools, maintains memory across a conversation, and retrieves information from your own data sources, LangChain genuinely shortens the development time.
We've built production agents using LangChain, LangGraph, the Vercel AI SDK, and raw API calls. This guide reflects what we've learned shipping these systems for clients: when to use LangChain, how to build two common agent types (customer support and data analysis), and when to upgrade to LangGraph for complex state management.
What LangChain Actually Is (and Isn't)
LangChain is an open-source Python and TypeScript framework for building applications powered by large language models. It provides abstractions for: connecting to LLM APIs (OpenAI, Anthropic, Groq, etc.), defining tools the AI can call, managing conversation memory, loading and chunking documents, creating vector store retrievers, and orchestrating multi-step agent loops.
What LangChain is not: it is not a hosted service, not an AI product you subscribe to, and not a replacement for the LLM APIs themselves. You still need an OpenAI or Anthropic API key. LangChain is glue code — well-designed, well-tested glue code that handles boilerplate so you don't have to.
LangSmith, made by the same company (LangChain AI), is a separate hosted product for tracing and debugging LangChain applications. It's optional but genuinely useful in production.
LangChain vs LangGraph vs Vercel AI SDK vs From Scratch
Before writing any code, choose the right framework; the wrong choice means refactoring later.
LangChain: Best for Standard Agent Patterns
Use LangChain when you need a single-agent setup with tool calling, conversation memory, and optional RAG. It handles the ReAct agent loop (Reason → Act → Observe) out of the box. Quick to prototype, good community, lots of pre-built tool integrations (search, SQL, Python execution, APIs). Weakness: complex workflows with conditional branching become messy quickly.
LangGraph: Best for Multi-Step Agents with State Machines
LangGraph is built on top of LangChain and adds a graph-based execution model with explicit state management. Use it when your agent needs conditional routing ("if the user asks about billing, route to billing agent; if technical, route to engineering agent"), human-in-the-loop checkpoints, or multi-agent coordination. More setup, much more control.
Vercel AI SDK: Best for Next.js Web Apps with Streaming
If you're building a Next.js app and want streaming chat responses, the Vercel AI SDK is the cleanest solution. It handles streaming, tool calling, and multi-step agent loops with minimal code. We use this for client-facing chat interfaces. Weakness: TypeScript/JavaScript only, and it's designed for web — not ideal for backend data pipelines.
From Scratch: Best for Full Control and Minimal Dependencies
Direct API calls to OpenAI or Anthropic with custom tool calling logic. More code, but no framework constraints. Use this for simple chatbots (LangChain adds zero value), high-performance systems where every millisecond matters, or when you need unusual orchestration patterns that frameworks don't support. Also useful when a client's security review prohibits third-party libraries.
Prerequisites
For the Python examples in this guide, you need:
- Python 3.10 or higher (3.11 recommended)
- An OpenAI API key (or Anthropic — the patterns are nearly identical)
- pip or uv for package management
- Basic familiarity with Python async/await
LangChain also has a TypeScript/Node 18+ version if you prefer. The Python version is more mature and has more integrations, so that's what we use for backend agents.
Step-by-Step: Building a Customer Support Agent
This is the most common agent type we build for clients. It handles incoming support queries, looks up relevant information from a knowledge base, can check order status or account data, and escalates to humans when needed.
Step 1: Installation
langchain-openai provides the OpenAI integration. langchain-community includes pre-built tools and integrations. chromadb is the vector store for RAG. fastapi and uvicorn are for the deployment endpoint.
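If you're starting from scratch, the install looks something like this (package names taken from this section; pin exact versions once you have a working set):

```shell
pip install langchain langchain-openai langchain-community chromadb fastapi "uvicorn[standard]"
```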
Step 2: Define Your Tools
Tools are functions the agent can call to take actions or retrieve information. Each tool has a name, description (which the LLM reads to decide when to use it), and a function that executes when called.
For a customer support agent, you typically need three tools:
- FAQ lookup tool: Searches your knowledge base using vector similarity for relevant policy answers, product info, or how-to guides.
- Order/account status tool: Queries your database or API with the customer's identifier to retrieve their current order status, subscription tier, or account information.
- Escalation tool: Creates a support ticket and notifies a human agent when the AI determines the issue is out of scope or the customer requests human help.
The description you write for each tool is critical — the LLM decides when to use each tool based on this description. Be specific: "Use this tool when the customer asks about their order status, shipping, or delivery. Input: the order ID or email." Generic descriptions lead to the agent using tools at inappropriate times.
Step 3: Build the Agent
The agent is assembled from three components: the LLM (ChatOpenAI with gpt-4o, or ChatAnthropic with claude-3-5-sonnet), the tools list, and an AgentExecutor that manages the Reason → Act → Observe loop. The system prompt defines the agent's persona, scope, and escalation criteria.
A production-quality system prompt for a customer support agent covers: who the agent is, what company it represents, what it can and cannot do, how to handle sensitive requests (billing disputes, account termination, legal matters → always escalate), and the tone/language requirements (professional, empathetic, concise).
Step 4: Add Memory
Without memory, every message is isolated — the agent forgets what the user said one turn ago. LangChain's ConversationBufferWindowMemory keeps the last N message pairs in context. We use a window of 10 (last 10 exchanges) as default — enough for a full support conversation without blowing up the context window.
For longer-running or multi-session memory, you need persistent storage: ConversationSummaryBufferMemory (summarizes old messages automatically) or a custom implementation that stores conversation history in your database and retrieves it by session ID. We use the database approach for any production deployment where users might return to a conversation.
Step 5: Add RAG (Knowledge Base Retrieval)
RAG (Retrieval-Augmented Generation) is what allows the agent to answer questions from your specific documentation rather than hallucinating answers. The setup: load your documents (PDFs, markdown files, web pages) → split into chunks (500–1000 tokens each) → embed with OpenAI embeddings → store in Chroma vector database → create a retriever → wrap as a tool.
The retriever tool's search description should say something like: "Search the company knowledge base for answers to customer questions about our products, policies, pricing, and procedures. Use this before attempting to answer any factual question." This pushes the agent to retrieve first rather than answering from its training data.
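Under some assumptions (markdown files in a local ./knowledge_base folder, OpenAI embeddings, a valid OPENAI_API_KEY in the environment), the pipeline looks roughly like this; note that the splitter counts characters, not tokens, by default:

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.tools.retriever import create_retriever_tool

# Hypothetical path; point at your own documentation folder.
docs = DirectoryLoader("./knowledge_base", glob="**/*.md",
                       loader_cls=TextLoader).load()

chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000,   # characters, not tokens
    chunk_overlap=100,
).split_documents(docs)

vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

kb_tool = create_retriever_tool(
    retriever,
    name="search_knowledge_base",
    description=("Search the company knowledge base for answers to customer "
                 "questions about our products, policies, pricing, and "
                 "procedures. Use this before attempting to answer any "
                 "factual question."),
)
```

Add kb_tool to the agent's tools list like any other tool.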
Step 6: Deploy with FastAPI
A minimal FastAPI endpoint wraps the agent: POST /chat accepts a session_id and message, retrieves the conversation history for that session, invokes the agent, stores the updated history, and returns the response. Deploy to a VPS, Railway, or Fly.io. Typical cold start: under 2 seconds. Warm response time: 1–3 seconds depending on tool calls and LLM latency.
Step-by-Step: Building a Data Analysis Agent
The second common agent type we deploy for clients is a data analysis agent: a natural language interface to structured data that can query databases, run calculations, and generate charts. This is particularly valuable for internal business intelligence tools.
The Three Core Tools
A data analysis agent typically needs three tools:
- SQL tool: LangChain has a built-in SQLDatabaseToolkit (wrapping the SQLDatabase utility) that allows the agent to inspect your database schema and run SELECT queries. Configure it with read-only credentials — the agent should never write to the database directly. Limit it to the specific tables relevant to the analysis use case.
- Python REPL tool: Allows the agent to execute Python code for calculations, data transformations, and statistical analysis. LangChain's PythonREPLTool (in langchain_experimental) works, but it is not sandboxed by default, so you must run it inside a restricted environment. Important: restrict what's importable in the sandbox — no file system access, no network calls, just pandas, numpy, scipy, and similar analytical libraries.
- Chart generation tool: A custom tool that takes a pandas DataFrame (or JSON data), generates a matplotlib or plotly chart, saves it to a temporary URL (S3 or similar), and returns the URL. The agent includes the URL in its response.
The agent decides which tools to use based on the question. "What was our revenue last quarter?" → SQL query. "What's the year-over-year growth rate?" → SQL query followed by Python calculation. "Show me a bar chart of monthly revenue" → SQL query, Python aggregation, chart generation.
LangGraph for Complex Flows
At some point, a single AgentExecutor isn't enough. The signal to upgrade to LangGraph is when you need conditional routing between different agent behaviors, approval gates, or multi-agent coordination.
When to Upgrade from LangChain Agents to LangGraph
- You need to route different query types to specialized sub-agents (billing agent, technical support agent, sales agent)
- Some steps require human review before proceeding (e.g., agent proposes a refund, human must approve before it's issued)
- The agent needs to take multi-step actions with checkpoints and rollback capability
- You're building a workflow that involves multiple LLM calls in a defined sequence, not open-ended tool calling
LangGraph State Machine Pattern
LangGraph represents your agent as a directed graph with typed state. A customer support workflow might look like: intake node (receive message) → classify node (LLM determines: billing, technical, general, or off-topic) → route node (conditional edge based on classification) → specialized handler node → respond node.
Each node receives the current state, does its work, and returns state updates. Conditional edges implement routing logic. Human-in-the-loop checkpoints are implemented as interrupt nodes that pause execution and wait for external input before continuing.
Production Considerations
LangSmith for Tracing and Debugging
LangSmith is the observability platform for LangChain applications. Every agent run is logged as a trace showing each node executed, tools called, input/output for each step, token counts, and latency. When an agent gives a wrong answer, you pull up the trace and see exactly where it went wrong.
Pricing: Developer plan is $39/month for 10,000 traces/month. Usage-based production pricing above that. For any production agent, LangSmith is a required investment — debugging LLM agents without tracing is guesswork.
Cost Controls
LLM API costs in an agent can be unpredictable because the agent might call tools multiple times and generate multiple completions per user turn. Implement:
- Per-conversation token budget: Track tokens used in each session and stop the agent loop if it exceeds the budget. Return a graceful response.
- Tool call caching: Cache results from expensive tools (database queries, external API calls) with appropriate TTLs. If the user asks the same question twice in a session, return the cached answer without a new LLM call.
- Model tiering: Use gpt-4o-mini or claude-3-haiku for initial intent classification (cheap) and gpt-4o or claude-3-5-sonnet only for the final response generation (expensive but necessary for quality).
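The first two controls above can be sketched as plain Python helpers; the class names and the 20,000-token default are illustrative, not a LangChain API:

```python
import time

class TokenBudget:
    """Per-conversation token budget: stop the agent loop when exceeded."""
    def __init__(self, max_tokens: int = 20_000):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens

    def exceeded(self) -> bool:
        return self.used >= self.max_tokens

class ToolCache:
    """Cache expensive tool results (DB queries, external APIs) with a TTL."""
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str):
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        return None  # miss or expired

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)
```

Call `budget.record(...)` after each LLM response (token counts come from the API's usage metadata) and check `budget.exceeded()` before each loop iteration.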
Error Handling When Tools Fail
Tools fail. APIs go down, databases time out, network requests fail. Wrap every tool in a try/except that returns a structured error message rather than raising an exception. The error message should tell the agent what happened and suggest alternatives: "Order status lookup failed: database unavailable. Ask the customer to check their email confirmation or contact support@company.com directly."
Without this, a tool failure causes the agent loop to crash or enter confused behavior. With it, the agent gracefully handles the failure and continues the conversation.
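One way to implement this is a decorator that converts exceptions into structured messages the agent can reason about; the decorator name and the simulated outage are illustrative:

```python
import functools

def safe_tool(fallback_hint: str):
    """Wrap a tool function so failures return a structured message
    instead of raising into the agent loop."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                return f"{fn.__name__} failed: {exc}. {fallback_hint}"
        return wrapper
    return decorator

@safe_tool("Ask the customer to check their email confirmation or "
           "contact support@company.com directly.")
def lookup_order(order_id: str) -> str:
    raise TimeoutError("database unavailable")  # simulated outage

print(lookup_order("ORD-1001"))
# -> "lookup_order failed: database unavailable. Ask the customer to ..."
```

Apply the decorator before wrapping the function as a LangChain tool, so the agent always receives a string it can act on.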
Honest Assessment: When Not to Use LangChain
LangChain adds complexity. Here is when we recommend against it:
For simple FAQ chatbots: If your use case is "answer questions about our product based on this document," the OpenAI Assistants API is simpler. You upload documents, create an assistant, and you're done. No framework, no code. LangChain would add 200 lines of setup for zero additional capability.
For a basic RAG chatbot: Again, OpenAI Assistants API with file search handles this with minimal code. LangChain's RAG pipeline is more flexible (use any vector store, any embedding model, custom retrieval strategies) but only matters if you need that flexibility.
When performance is critical: LangChain adds abstraction overhead. For agents where response time is tightly constrained (sub-second responses), direct API calls with a minimal custom orchestration layer will consistently outperform LangChain.
When your team doesn't know Python: LangChain's TypeScript version exists but lags the Python version significantly. If your team is TypeScript-first, the Vercel AI SDK is a better choice.
Cost Reality: What You'll Actually Pay
A LangChain agent with three tools and RAG costs approximately $0.02–$0.08 per conversation in API fees, depending on query complexity, number of tool calls, and document retrieval size. For a support agent handling 500 conversations per day, that's $10–$40/day in API costs — $300–$1,200/month.
Development time for a production-ready deployment: 40–80 hours for an experienced developer building their first LangChain agent. This includes the agent itself, tool development, RAG pipeline setup, FastAPI deployment, and basic observability with LangSmith. Subsequent agents in the same codebase are faster — 20–40 hours — because the infrastructure and patterns are already established.
Do not underestimate the data preparation work for RAG. Getting your knowledge base into clean, chunked, embedded form is often 20–30% of the total project time. Low-quality source documents produce low-quality retrieval and low-quality agent responses.
For a deeper understanding of how RAG works and when to use it, see our RAG explained guide. For a broader view of agentic workflow patterns beyond LangChain, see our agentic AI workflows guide.