Best AI/LLM Libraries for JavaScript in 2026
TL;DR
Vercel AI SDK for React/Next.js streaming; OpenAI SDK for direct API access; LangChain.js for complex agent pipelines. The Vercel AI SDK (~1.5M weekly downloads) provides React hooks and streaming primitives for chat UIs — useChat, useCompletion with minimal boilerplate. The OpenAI SDK (~4M downloads) is the direct API client with excellent TypeScript types. LangChain.js (~800K) is the heavyweight for chains, agents, and RAG — powerful but complex.
Key Takeaways
- OpenAI SDK: ~4M weekly downloads — official, direct API access, best TypeScript types
- Vercel AI SDK: ~1.5M downloads — React streaming hooks, multi-provider, Next.js-native
- LangChain.js: ~800K downloads — chains, agents, RAG, vector stores, 100+ integrations
- Vercel AI SDK v4 — streamText, generateObject, tool calling across providers
- AI SDK — provider-agnostic: OpenAI, Anthropic, Google, AWS Bedrock, Mistral
The JavaScript AI Library Landscape in 2026
Building AI features in JavaScript in 2026 looks very different from 2023. The ecosystem has consolidated around a few clear patterns: streaming token-by-token responses, tool calling (function calling) for structured workflows, and structured output (JSON that matches a schema). Libraries that don't handle these three patterns well have lost ground.
The architectural decision isn't which LLM to call — it's which abstraction layer to use. The OpenAI SDK gives you direct access to one provider's API with excellent types. The Vercel AI SDK abstracts across providers and provides React-native streaming primitives. LangChain.js provides chains, agents, and retrieval abstractions for complex pipelines. Each layer adds value but also adds complexity.
Provider diversity is now a real consideration. In 2023, most teams defaulted to OpenAI because alternatives weren't competitive. In 2026, Anthropic's Claude models, Google's Gemini, and open-source models via Ollama or AWS Bedrock are genuinely competitive options. Using the OpenAI SDK directly ties you to OpenAI. Using the Vercel AI SDK or LangChain.js lets you swap providers with a config change.
Vercel AI SDK (React/Next.js)
// AI SDK — streamText with tool calling
// app/api/chat/route.ts
import { streamText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4o'),
    messages,
    tools: {
      getWeather: tool({
        description: 'Get weather for a location',
        parameters: z.object({
          city: z.string().describe('City name'),
          unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
        }),
        execute: async ({ city, unit }) => {
          const data = await fetchWeather(city, unit);
          return data;
        },
      }),
      searchPackages: tool({
        description: 'Search npm packages by keyword',
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          return searchNpm(query);
        },
      }),
    },
    maxSteps: 5, // Allow multi-step tool use
  });

  return result.toDataStreamResponse();
}
// AI SDK — useChat React hook
'use client';
import { useChat } from 'ai/react';

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    onError: (error) => console.error('Chat error:', error),
  });

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map(msg => (
          <div key={msg.id} className={`mb-4 ${msg.role === 'user' ? 'text-right' : 'text-left'}`}>
            <div className={`inline-block p-3 rounded-lg ${
              msg.role === 'user' ? 'bg-blue-500 text-white' : 'bg-gray-200'
            }`}>
              {msg.content}
            </div>
          </div>
        ))}
        {isLoading && <div>Thinking...</div>}
      </div>
      <form onSubmit={handleSubmit} className="p-4 border-t">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          className="w-full p-2 border rounded"
          disabled={isLoading}
        />
      </form>
    </div>
  );
}
// AI SDK — generateObject (structured output)
import { generateObject } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

const schema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(0).max(1),
  topics: z.array(z.string()),
  summary: z.string().max(200),
});

const { object } = await generateObject({
  model: anthropic('claude-3-5-sonnet-20241022'),
  schema,
  prompt: `Analyze this review: "${userReview}"`,
});

// object is fully typed as z.infer<typeof schema>
console.log(object.sentiment, object.score);
The Vercel AI SDK's value proposition centers on useChat and streamText working together. The server-side streamText function calls the LLM and returns a streaming response. The client-side useChat hook consumes that stream and manages the message list, loading state, and input — all without any custom state management code.
The provider abstraction is the SDK's other major feature. Swapping from openai('gpt-4o') to anthropic('claude-3-5-sonnet') is a single line change. The tool calling interface, streaming protocol, and useChat hook work identically across providers. Teams that need to experiment with different models or switch providers for cost/performance reasons benefit significantly from this abstraction.
generateObject is particularly valuable for applications that need structured data from LLMs. The schema-based approach eliminates the brittle JSON parsing that was common in earlier LLM integrations — the SDK handles schema prompting, JSON extraction, and validation automatically. The output is fully typed as the Zod schema's inferred type.
OpenAI SDK (Direct API)
// OpenAI SDK — streaming chat
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Streaming
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain monads in 3 sentences' }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(delta); // Stream to stdout
}
// OpenAI SDK — function/tool calling
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What packages are similar to lodash?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_packages',
        description: 'Search npm packages',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string', description: 'Search query' },
            limit: { type: 'number', default: 5 },
          },
          required: ['query'],
        },
      },
    },
  ],
  tool_choice: 'auto',
});

const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  const results = await searchNpm(args.query, args.limit);

  // Continue conversation with tool result
  const finalResponse = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'What packages are similar to lodash?' },
      response.choices[0].message,
      { role: 'tool', content: JSON.stringify(results), tool_call_id: toolCall.id },
    ],
  });
}
The OpenAI SDK is the right choice when you want direct API access without abstraction. Every OpenAI API feature is immediately available without waiting for the AI SDK to add support. Advanced features like Assistants API, Batch API for bulk requests, fine-tuning management, and embeddings are all available with first-class TypeScript types.
For non-React environments — standalone scripts, background jobs, serverless functions that don't need streaming to a browser — the OpenAI SDK is simpler than the AI SDK. There's no framework-specific code, no React dependency, just a clean API client.
The tradeoff is lock-in. The OpenAI SDK is OpenAI-specific. If you later want Anthropic's Claude for certain tasks, or Google Gemini for cost reasons, you'll need to rewrite the integration. For many applications this tradeoff is acceptable; for others, provider portability is worth the abstraction cost.
LangChain.js (Complex Pipelines)
// LangChain.js — RAG (Retrieval Augmented Generation) pipeline
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { ChatPromptTemplate } from '@langchain/core/prompts';

const model = new ChatOpenAI({ model: 'gpt-4o' });
const embeddings = new OpenAIEmbeddings();

// 1. Split documents
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const docs = await splitter.createDocuments([yourDocumentText]);

// 2. Create vector store
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);

// 3. Create retrieval chain
const prompt = ChatPromptTemplate.fromTemplate(`
Answer based on the context:
Context: {context}
Question: {input}
`);
const documentChain = await createStuffDocumentsChain({ llm: model, prompt });
const retrievalChain = await createRetrievalChain({
  combineDocsChain: documentChain,
  retriever: vectorStore.asRetriever(),
});

// 4. Query
const result = await retrievalChain.invoke({
  input: 'What are the key features?',
});
console.log(result.answer);
LangChain.js is most valuable for RAG pipelines and complex multi-step agent workflows. The retrieval abstractions — document loaders, text splitters, vector stores, retrievers — represent a significant amount of infrastructure code that LangChain handles. Building a RAG pipeline from scratch involves document chunking logic, embedding generation, vector storage (Pinecone, Chroma, Weaviate, Qdrant), similarity search, and context injection — LangChain provides consistent interfaces for all of these.
The criticism of LangChain is valid: for simple use cases, the abstraction adds complexity without proportional value. A direct OpenAI API call is about 5 lines; the LangChain equivalent is 20, with more abstraction layers. The complexity pays off when you're building a production RAG system with multiple document sources, a custom retrieval strategy, and a complex prompt pipeline.
LangChain's 100+ integrations are the strongest argument for it in enterprise environments. If your data lives in multiple sources (PDFs, databases, APIs, web pages), LangChain's document loaders standardize the ingestion pipeline. The vector store integrations mean you can switch from in-memory to Pinecone to Qdrant with a single line change.
Provider Comparison
| Provider | Model | Strengths | Vercel AI SDK | OpenAI SDK | LangChain |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | Tool calling, vision | ✅ | ✅ | ✅ |
| Anthropic | Claude 3.5 | Long context, reasoning | ✅ | ❌ | ✅ |
| Google | Gemini 1.5 | Multimodal, long context | ✅ | ❌ | ✅ |
| AWS Bedrock | Multiple | Enterprise, compliance | ✅ | ❌ | ✅ |
| Ollama | Open-source | Privacy, cost | ✅ | ❌ | ✅ |
When to Choose
| Scenario | Pick |
|---|---|
| React/Next.js chat UI with streaming | Vercel AI SDK |
| Multi-provider switching (OpenAI → Anthropic) | Vercel AI SDK |
| Direct OpenAI API with full control | OpenAI SDK |
| RAG pipeline with vector stores | LangChain.js |
| Complex agent workflows (multi-step) | LangChain.js |
| Type-safe structured output | Vercel AI SDK (generateObject) |
| Edge runtime compatible | Vercel AI SDK |
| OpenAI-specific features (Assistants, Batch) | OpenAI SDK |
| Multi-document search/retrieval | LangChain.js |
Streaming, Token Management, and Error Handling
The practical challenges in AI application development go beyond choosing a library — they involve managing streaming responses, handling token limits, implementing graceful error recovery, and controlling costs at scale.
Why Streaming Matters
LLM responses are generated token by token. A 500-token response at 50 tokens/second takes 10 seconds to complete. Waiting for the full response before displaying anything creates a blank screen for 10 seconds — an experience that feels broken compared to the token-by-token streaming that ChatGPT and Claude.ai users expect.
The Vercel AI SDK's streaming APIs (streamText, streamObject) handle this correctly. streamText returns a ReadableStream that delivers tokens as they arrive; in a Next.js route handler, this enables server-sent events (SSE) with minimal configuration.
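Stripped of the framework, the SSE wire format these helpers produce can be sketched with a simulated token source. The tokens and the `data:` payload shape below are illustrative only, not the AI SDK's exact protocol:

```typescript
// Simulated token source standing in for an LLM stream (hypothetical tokens).
async function* fakeTokens(): AsyncGenerator<string> {
  for (const t of ['Streaming ', 'feels ', 'instant.']) yield t;
}

// Frame an async token iterator as SSE text — one "data:" event per token
// chunk, plus a terminator — the same wire shape a streaming route emits.
async function toSSE(tokens: AsyncGenerator<string>): Promise<string> {
  let out = '';
  for await (const token of tokens) {
    out += `data: ${JSON.stringify({ token })}\n\n`;
  }
  out += 'data: [DONE]\n\n';
  return out;
}

toSSE(fakeTokens()).then((sse) => console.log(sse));
```

In a real route handler you would enqueue each framed event into a ReadableStream controller instead of concatenating a string, so the browser receives tokens as they arrive.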
For React components, the useChat hook subscribes to the stream and updates component state with each incoming token, providing the progressive rendering that users expect. LangChain.js provides similar streaming through its stream() method on runnables, and the OpenAI SDK's streaming API (openai.chat.completions.stream()) gives direct access to the SSE stream.
Context Window Management
LLMs have hard token limits: GPT-4o at 128K tokens, Claude 3.5 Sonnet at 200K, Gemini 1.5 Pro at 1M. Applications that maintain conversation history must manage context growth — long conversations eventually exceed the model's context window.
Effective context management strategies range from simple truncation (remove oldest messages when approaching the limit) to semantic summarization (use the LLM itself to summarize earlier conversation turns). LangChain.js provides several built-in memory implementations (ConversationSummaryMemory, BufferWindowMemory) that automate these strategies. The Vercel AI SDK provides the context primitives but leaves memory strategy to the application layer.
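The simple-truncation strategy fits in a few lines. A minimal sketch, assuming a rough 4-characters-per-token estimate (the ratio is an assumption; a real application would use a tokenizer such as tiktoken):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Crude token estimate: ~4 characters per token (an assumption).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Drop the oldest non-system messages until the history fits the budget.
// System messages are always kept, as is the most recent turn.
function truncateHistory(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  const kept = [...rest];
  const total = (msgs: Message[]) =>
    [...system, ...msgs].reduce((n, m) => n + estimateTokens(m.content), 0);
  while (kept.length > 1 && total(kept) > maxTokens) {
    kept.shift(); // remove the oldest turn first
  }
  return [...system, ...kept];
}
```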
For Retrieval Augmented Generation (RAG) applications — where relevant documents are retrieved and injected into the context — LangChain's vector store integrations (Pinecone, Chroma, pgvector via Drizzle) provide the infrastructure. The Vercel AI SDK's embed and embedMany functions generate embeddings, but the retrieval pipeline is the application's responsibility.
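The retrieval step itself is small once embeddings exist: rank stored vectors by cosine similarity to the query vector. A dependency-free sketch — the vectors here are toy values; real ones would come from embed/embedMany or an embeddings API:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the ids of the k documents most similar to the query vector.
function topK(
  query: number[],
  docs: { id: string; vector: number[] }[],
  k: number,
): string[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k)
    .map((d) => d.id);
}
```

Production vector stores do the same ranking with approximate-nearest-neighbor indexes so it scales past a few thousand documents.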
Cost Control and Rate Limiting
API costs for LLMs scale with token usage. A production chatbot with 10,000 conversations per day at 2,000 tokens each consumes 20 million tokens daily — at GPT-4o's pricing of $5/million input tokens and $15/million output tokens, this is meaningful spend. Cost visibility requires tracking token consumption per request.
The Vercel AI SDK includes usage metadata in every response: usage.promptTokens, usage.completionTokens, and usage.totalTokens. Logging these values per user and per conversation enables cost attribution and anomaly detection (unusually large contexts that indicate prompt injection or excessive history accumulation).
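Turning that usage metadata into dollars is a small calculation worth centralizing. A sketch using the GPT-4o rates quoted above — treat the rates as parameters, since pricing changes:

```typescript
type Usage = { promptTokens: number; completionTokens: number };

// Per-request cost from usage metadata. Default rates are the GPT-4o
// figures cited in the text ($5/M input, $15/M output) — an assumption
// that should be kept in config, not hardcoded.
function requestCostUSD(
  usage: Usage,
  ratePerMInput = 5,
  ratePerMOutput = 15,
): number {
  return (
    (usage.promptTokens / 1_000_000) * ratePerMInput +
    (usage.completionTokens / 1_000_000) * ratePerMOutput
  );
}
```

Logging this value alongside a user id per request gives you the cost-attribution and anomaly-detection signal described above.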
Rate limiting AI endpoints is distinct from rate limiting regular APIs. LLM API calls can take 5-30 seconds; request queuing with per-user concurrency limits prevents runaway parallel requests. An exponential backoff strategy on provider errors (particularly OpenAI's 429 rate limit responses) is essential for production reliability.
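A minimal backoff helper might look like this. The `err.status === 429` check assumes errors carry a numeric `status` field, as the OpenAI SDK's API errors do; that field name is the only SDK-specific assumption here:

```typescript
// Retry an async call with exponential backoff on rate-limit (429) and
// server (5xx) errors; rethrow anything else or once retries are spent.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const retriable = err?.status === 429 || err?.status >= 500;
      if (!retriable || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 250, 500, 1000, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

Adding jitter (a random fraction of the delay) is a common refinement that prevents synchronized retry storms across many clients.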
Multi-Provider Fallback
The Vercel AI SDK's provider model enables graceful fallback between providers. If your primary provider has an outage or returns errors above a threshold, switching to a backup provider is a matter of changing one function parameter.
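A provider-agnostic sketch of the pattern — try each generator in order and return the first success. With the AI SDK, each entry would wrap generateText with a different model (openai('gpt-4o') first, anthropic(...) as backup); here the providers are plain async functions so the control flow stands on its own:

```typescript
type Generate = (prompt: string) => Promise<string>;

// Try each provider in order; return the first successful response.
// If all fail, rethrow the last error.
async function withFallback(providers: Generate[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const generate of providers) {
    try {
      return await generate(prompt);
    } catch (err) {
      lastError = err; // record and try the next provider
    }
  }
  throw lastError;
}
```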
This multi-provider resilience pattern is one of the strongest arguments for the Vercel AI SDK over provider-specific SDKs. OpenAI's SDK requires significant refactoring to swap providers; the Vercel AI SDK makes it a configuration change.
Production Considerations: Latency, Reliability, and Cost
Moving AI features from prototype to production requires addressing concerns that don't appear during development: latency under load, provider reliability, cost scaling, and user experience during degraded states.
LLM API latency is non-deterministic and inherently higher than typical API calls. A chat completion request that takes 200ms on average can take 3-5 seconds during peak provider load. Users need visual feedback immediately when they submit — a streaming indicator ("typing...") rather than a blank wait state. The Vercel AI SDK's streaming primitives make implementing this feedback simple, but it requires architectural awareness from the beginning rather than being retrofitted later.
Outages are more frequent with LLM providers than with typical API providers; major ones experienced incidents ranging from 15 minutes to several hours across 2025-2026. For applications where AI responses are critical features (not nice-to-have enhancements), multi-provider failover is worth implementing. The Vercel AI SDK's provider-agnostic interface makes this a configuration change rather than a refactor.
Cost attribution is the operational concern teams underestimate most. LLM API costs scale non-linearly: a user who includes extensive conversation history in each request consumes dramatically more tokens than a new user. A few power users can drive a significant fraction of your API costs. Logging token usage per user, setting per-user rate limits, and pruning conversation history at configurable thresholds are table-stakes features for any production AI application.
Model selection within a provider matters. GPT-4o and Claude 3.5 Sonnet provide strong capability but cost 10-20x more than GPT-4o mini or Claude 3 Haiku. For constrained tasks (classifying intent, extracting structured data, summarizing short texts), smaller models often perform comparably to larger ones at a fraction of the cost. A tiered approach — route simple tasks to smaller models and complex reasoning to larger ones — reduces cost without sacrificing quality on the queries that need it.
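A tiered router can start very small. The keyword heuristic below is a deliberately naive stand-in for a real intent classifier, and the model ids are just the tiers named above:

```typescript
// Pick a model id by rough task complexity. The regex is a toy heuristic —
// production systems typically use a cheap classification call instead.
function pickModel(prompt: string): string {
  const complex = /\b(why|explain|compare|design|debug)\b/i.test(prompt);
  return complex ? 'gpt-4o' : 'gpt-4o-mini';
}
```

Because the router only returns a model id, upgrading the heuristic later (say, to an embedding-based classifier) doesn't touch the calling code.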
Compare AI library package health on PkgPulse. Related: AI Development Stack for JavaScript 2026, Best Realtime Libraries 2026, and Best TypeScript Build Tools 2026.
See the live comparison
View AI SDK vs. LangChain on PkgPulse →