
How I Built a Private AI Assistant for WordPress That Never Sends Data to OpenAI

11 min read

I’ve been building WordPress plugins for over a decade. Last month, a client asked me a question I didn’t have a good answer to: “Our members are going to be talking to this AI assistant about their health history. Can you guarantee that OpenAI never sees any of that?”

I could not. That started a three-week build that I want to share with you.

The result is WP Private AI — an open-source proof of concept that adds a fully functional AI chat assistant to any WordPress site, where every inference call happens on your own server. No API keys sent to OpenAI. No data processed by Anthropic. No conversation logs stored anywhere except your own database.

The code is on GitHub: github.com/wbcomdesigns/wp-private-ai. This post explains how it works, why I built it the way I did, and what I learned.

The Problem with Cloud AI in WordPress

The WordPress ecosystem has embraced AI quickly. There are now dozens of plugins that let you add an AI chatbot to your site. Almost all of them work the same way: the user sends a message, the plugin forwards it to OpenAI or Anthropic’s API, gets a response, and displays it.

That works fine for many use cases. But it creates a serious problem for sites where users share personal information.

Consider a membership site where members discuss sensitive topics — health conditions, financial situations, legal matters. Every message they send to the AI assistant gets transmitted to a US company’s servers, processed by their model, and potentially logged for safety monitoring or model improvement.

Under GDPR Article 28, any company that processes personal data on your behalf is a data processor. Using OpenAI as your AI backend means OpenAI is a data processor for your site. That relationship requires a signed Data Processing Agreement, documented legal basis for the cross-border transfer of EU user data to US servers, and disclosure in your privacy policy.

Most WordPress site operators skip all of this. This is not a theoretical risk — EU data protection authorities have issued substantial fines for exactly this pattern. See the Privacy & GDPR wiki page for the full legal analysis.

There’s a simpler answer. What if the AI model ran on your server?

The Architecture

The key insight is that modern open-source models are good enough for site-specific Q&A tasks. We don’t need GPT-4 to tell a member what their last booking was, how much they’ve donated, or whether their license is still active. We need a model that can reliably call a WordPress function, receive structured data back, and describe it accurately.

Ollama makes this practical. It’s a tool that lets you run open-source models locally — llama3.1:8b, qwen2.5:14b, and others — as a simple HTTP API server on localhost:11434. The request format is compatible with the OpenAI chat format, so existing WordPress AI plugins can point to it with minimal changes.
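
Once a model is pulled, inference is a single local HTTP call. For example, using Ollama's native endpoint (which streams NDJSON by default):

```bash
# Pull the model once, then chat with it over plain local HTTP.
ollama pull llama3.1:8b

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{ "role": "user", "content": "Say hello in five words." }]
}'
```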

The data flow looks like this:

[User browser]
      │
      ▼ HTTPS
[WordPress REST API — /wp-json/wp-agent/v1/chat/stream]
      │
      ▼ wp_remote_post() to 127.0.0.1:11434
[Ollama — llama3.1:8b running on your server]
      │
      ▼ NDJSON response (streamed)
[WordPress SSE endpoint]
      │
      ▼
[User browser — tokens appear as they're generated]

The user’s message goes in at the top. It never leaves the server. The model’s response comes out at the bottom via server-sent events. No external API calls. No data leaves your infrastructure.

For the VPS setup (firewall, TLS, systemd config), see the Self-Hosted Setup wiki; both deployment models (multi-site shared server and per-site Docker container) are documented there.

What Makes It Actually Useful: The WP Abilities API

An AI that can only talk in generalities isn’t useful on a WordPress site. What makes it useful is when it can answer questions using real data from your database.

WordPress 6.9 introduced the WP Abilities API — a way to register callable functions that the AI can invoke during a conversation. When a user asks “What’s my last purchase?”, the AI doesn’t guess. It calls the edd/get-user-orders ability, which runs a real database query, and answers using the actual data.

Here’s what registering an ability looks like in practice:
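
(A sketch against EDD 3.x: the exact wp_register_ability() keys and registration hook may differ from the repo's generated adapter, and edd_get_orders() is assumed for the data fetch.)

```php
add_action( 'init', function () {
    if ( ! function_exists( 'wp_register_ability' ) ) {
        return; // Abilities API not available on this install.
    }

    wp_register_ability( 'edd/get-user-orders', array(
        'label'       => 'Get user orders',
        'description' => "Return the current user's recent Easy Digital Downloads orders.",

        // Runs in PHP before any data is fetched. Returning false means
        // the execute callback below never runs.
        'permission_callback' => function () {
            return is_user_logged_in();
        },

        // Only reached after the permission check passes.
        'execute_callback' => function () {
            $orders = edd_get_orders( array(
                'user_id' => get_current_user_id(),
                'number'  => 5,
            ) );

            return array_map( function ( $order ) {
                return array(
                    'id'     => $order->id,
                    'total'  => $order->total,
                    'status' => $order->status,
                    'date'   => $order->date_created,
                );
            }, $orders );
        },
    ) );
} );
```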

The permission_callback runs before execute_callback. If the current user doesn’t have permission — because they’re not logged in, because the data belongs to someone else, or because they don’t have the right role — the ability returns a WP_Error and the AI is told the ability is not available. The data is never fetched.

This is important: the access control is in PHP, not in the AI. The AI can’t talk its way past a permission_callback. The function simply doesn’t run. See the Access Control wiki page for all four security layers.

The Scanner: Generating Adapters for Any Plugin

Writing an ability adapter for every WordPress plugin manually would be impractical. Instead, I built a scanner (scanner/wp-plugin-scanner.py) that analyzes a plugin’s PHP codebase and generates a starting-point adapter automatically.

The scanner does four things (the first two are sketched below):

  • Finds REST routes by parsing register_rest_route() calls
  • Finds Custom Post Types by parsing register_post_type() calls
  • Finds CRUD methods by scanning for patterns like get_*, create_*, update_*, delete_*
  • Generates abilities that map each finding to a wp_register_ability() call with a stub execute_callback
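
In miniature, the discovery passes are regular expressions over the plugin's PHP files (a simplified sketch; the real scanner/wp-plugin-scanner.py also matches the CRUD patterns and writes out the adapter file):

```python
import pathlib
import re

# Simplified patterns: capture the first string argument
# (the REST namespace / the CPT slug).
ROUTE_RE = re.compile(r"register_rest_route\(\s*['\"]([^'\"]+)['\"]")
CPT_RE = re.compile(r"register_post_type\(\s*['\"]([^'\"]+)['\"]")

def scan_plugin(plugin_dir):
    """Collect REST route namespaces and CPT slugs from a plugin's PHP source."""
    routes, cpts = [], []
    for php_file in pathlib.Path(plugin_dir).rglob("*.php"):
        source = php_file.read_text(errors="ignore")
        routes += ROUTE_RE.findall(source)
        cpts += CPT_RE.findall(source)
    return routes, cpts
```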

I ran it against 10 popular plugins:

| Plugin | REST Routes | CPTs | Abilities Generated |
| --- | --- | --- | --- |
| Fluent Forms | 12 | 0 | 8 |
| FluentCRM | 31 | 2 | 18 |
| WPForms | 8 | 1 | 6 |
| Groundhogg | 24 | 3 | 15 |
| Easy Digital Downloads | 19 | 2 | 12 |
| GiveWP | 16 | 1 | 10 |
| AffiliateWP | — (paid) | — | 6 (manual) |
| LifterLMS | 22 | 4 | 14 |
| Fluent Booking | 14 | 1 | 9 |
| WP Job Manager | 11 | 1 | 7 |
After the scanner runs, you copy the generated file to wp-content/mu-plugins/, implement the execute_callback functions using the target plugin’s PHP API, and the AI can start answering questions about that plugin’s data. No activation needed — mu-plugins run automatically. All 10 generated adapters are in poc/ in the repo.

The Problem That Almost Killed the Proof of Concept

About two weeks in, I had the chat widget working. I asked the AI: “How many members does this site have?”

It replied: “Your site currently has over 500,000 registered members.”

The test site had 27 members.

This is the hallucination problem. The 8B model, when it doesn’t have real data, fills in the gap with a plausible-sounding number. 500,000 members is a plausible number for a large community site. The model doesn’t know it’s wrong because it doesn’t know the actual number.

If first impressions are wrong, people won’t trust the tool. And they’d be right not to.

The fix was to build a site indexer that runs at plugin activation time and caches the real facts from the WordPress database:
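
(A simplified sketch: the function name is hypothetical and the real indexer caches more facts; count_users(), wp_count_posts(), and the transient API are WordPress core.)

```php
// Hypothetical helper name; the real indexer gathers more than this.
function wp_agent_index_site_facts() {
    $facts = array(
        'site_name'       => get_bloginfo( 'name' ),
        'member_count'    => count_users()['total_users'], // the real number: 27, not 500,000
        'published_posts' => wp_count_posts( 'post' )->publish,
        'active_plugins'  => array_map( 'dirname', (array) get_option( 'active_plugins' ) ),
    );

    // Cache for 6 hours; see below for when the cache is busted early.
    set_transient( 'wp_agent_site_facts', $facts, 6 * HOUR_IN_SECONDS );

    return $facts;
}
```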

These facts are injected into the system prompt before every conversation, labeled as authoritative. Here’s what the system prompt section looks like:
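
(A representative excerpt; the repo's exact wording may differ, but the instruction quoted below is the important part.)

```text
SITE FACTS (authoritative; never contradict them):
- Site name: Example Community
- Registered members: 27
- Published posts: 143
- Active plugins: buddypress, easy-digital-downloads, wp-job-manager
```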

The key phrase is “never contradict them.” Without that instruction, a small model might still override the injected data with its training priors. With it, asking the same question now gets: “Your site currently has 27 registered members.”

The indexer refreshes every 6 hours via a transient, and the cache is busted immediately whenever a plugin is activated or deactivated. Plugin activations change the site’s capability set — the AI needs to know when WooCommerce goes live, for example.
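
The early bust is two core hooks (a sketch, reusing the hypothetical helper above):

```php
// activated_plugin / deactivated_plugin are core WordPress actions.
foreach ( array( 'activated_plugin', 'deactivated_plugin' ) as $hook ) {
    add_action( $hook, function () {
        delete_transient( 'wp_agent_site_facts' );
        wp_agent_index_site_facts(); // re-index right away (hypothetical helper)
    } );
}
```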

The Chat Widget

The user-facing part is a side panel that slides in from the right when a floating action button is clicked. It’s built with vanilla JavaScript and a WordPress REST endpoint — no React, no Vue, no heavy frontend framework.

Streaming works via the browser’s EventSource API connecting to /wp-json/wp-agent/v1/chat/stream. The widget renders markdown properly: **bold** becomes bold text, `code` becomes inline code, numbered lists become <ol> elements. Tokens appear as they’re generated — the same “typing” effect you get from ChatGPT, but entirely on your server.
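
In sketch form, the client side is only a few lines (the query parameter and chunk shape are assumptions, not the repo's exact wire format):

```js
const outputEl = document.querySelector('#wp-agent-output'); // hypothetical element
const conversationId = window.wpAgentConversationId;         // supplied by the widget

// Open the SSE stream for the current conversation.
const stream = new EventSource(
  '/wp-json/wp-agent/v1/chat/stream?conversation=' + conversationId
);

stream.onmessage = (event) => {
  const chunk = JSON.parse(event.data);
  outputEl.textContent += chunk.token; // tokens render as they arrive
  if (chunk.done) {
    stream.close(); // server marks the end of the response
  }
};
```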

The full conversation flow: user types a message → JavaScript POSTs to the REST endpoint → PHP builds the system prompt + conversation history → calls Ollama → if the model invokes an ability, WP Agent executes the PHP callback and returns the result → model generates its final response → SSE streams tokens to the browser.

The entire conversation is stored in WordPress database tables: wp_agent_conversations and wp_agent_messages. GDPR Article 17 erasure is a DELETE query per table. WP Agent includes hooks into WordPress’s built-in personal data export and erasure tools (Tools → Export Personal Data / Erase Personal Data).
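
At the SQL level, per-user erasure looks roughly like this (the user_id and conversation_id column names are assumptions; in practice it runs inside a registered eraser callback):

```php
global $wpdb;

// Find the user's conversations, then delete messages and conversations.
$conversation_ids = $wpdb->get_col( $wpdb->prepare(
    "SELECT id FROM {$wpdb->prefix}agent_conversations WHERE user_id = %d",
    $user_id
) );

if ( $conversation_ids ) {
    $in = implode( ',', array_map( 'intval', $conversation_ids ) );
    $wpdb->query( "DELETE FROM {$wpdb->prefix}agent_messages WHERE conversation_id IN ($in)" );
    $wpdb->query( $wpdb->prepare(
        "DELETE FROM {$wpdb->prefix}agent_conversations WHERE user_id = %d",
        $user_id
    ) );
}
```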

Multi-Site Deployment: One VPS for 10 Sites

Running a separate VPS for each site would be expensive. The shared server model runs one Ollama instance on a DigitalOcean 16 GB droplet ($96/month) behind an nginx gateway with per-site Bearer token authentication.

Each WordPress site gets a unique token: openssl rand -hex 32. The nginx gateway validates tokens, enforces per-site rate limiting, and proxies to Ollama on the loopback:
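
(A condensed sketch of nginx/ollama-gateway.conf: tokens and rate numbers are placeholders, and the Certbot TLS directives are omitted.)

```nginx
map $http_authorization $site {
    default                   "";
    "Bearer <site-one-token>" "site-one";
    "Bearer <site-two-token>" "site-two";
}

limit_req_zone $site zone=per_site:10m rate=30r/m;

server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;
    # ssl_certificate / ssl_certificate_key managed by Certbot (see wiki)

    location /api/ {
        if ($site = "") { return 401; }    # no valid token, no inference
        limit_req zone=per_site burst=10;

        proxy_pass http://127.0.0.1:11434; # Ollama never listens publicly
        proxy_buffering off;               # required for streamed responses
        proxy_read_timeout 300s;           # model generations can be slow
    }
}
```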

Adding a new site: generate a token, add one line to the nginx map, reload nginx (zero downtime), enter the token in WP Agent settings. Removing a site: delete the token line, reload nginx. No Ollama restart needed. Immediate effect.

For full VPS setup instructions — Ollama install, systemd config, firewall, TLS with Certbot — see the Self-Hosted Setup wiki. For the Docker Compose model (one container per site), see Per-Site Container.

The Fallback

The shared Ollama server will occasionally be down — maintenance, memory issues, a model loading error. Rather than showing the user an error, WP Agent automatically falls back to a cloud provider.

The AI Router checks the primary provider (Ollama), and if it returns a WP_Error, it immediately tries the configured fallback (Google Gemini Flash). From the user’s perspective, the response is slightly slower. They don’t see an error.
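
The logic itself is small; in sketch form (the provider objects and method names are illustrative, not the repo's actual API):

```php
// Try the private, on-server path first.
$response = $primary_provider->chat( $messages );      // Ollama

if ( is_wp_error( $response ) ) {
    // Primary is down: retry once against the configured cloud fallback.
    $response = $fallback_provider->chat( $messages ); // Gemini Flash
}
```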

Gemini Flash costs $0.10/$0.40 per million tokens — extremely cheap for fallback-only usage. At 10 sites with light usage, the monthly fallback bill would be a few dollars at most. The $96/month droplet is not a single point of failure. It’s the primary path. Cloud is the safety net.

Access Control: Four Layers

One thing I was careful about: the AI should never be able to talk its way past the access control system.

Layer 1 — WordPress role check: The chat widget isn’t rendered for logged-out users. No one can send a message without being authenticated to WordPress.

Layer 2 — Ability permission callbacks: Every registered ability has a permission_callback that runs in PHP before the data is fetched. If the user doesn’t have permission, the callback returns false and the ability returns a WP_Error. The data is never touched.

Layer 3 — System prompt scope restrictions: The system prompt tells the model what topics it’s allowed to address. A membership site assistant won’t answer questions about coding. An e-commerce assistant won’t offer competitor pricing advice.

Layer 4 — nginx Bearer token: At the infrastructure level, only requests with a valid site-specific Bearer token reach Ollama. An unauthenticated request gets a 401 before the query is ever processed.

These four layers are independent. A failure at Layer 3 (the model goes off-script) doesn’t bypass Layer 2 (the PHP function still checks permissions). The system is secure even if the model does something unexpected. Full details in the Access Control wiki page.

Why Small Models Work for This Use Case

A question I get when showing this to other developers: “Won’t a smaller model like llama3.1:8b get things wrong? Shouldn’t we use GPT-4?”

For the specific use case of site-specific Q&A with structured function calls, smaller models work well. The model’s job is narrow: read the system prompt, understand what the user asked, decide whether to call an ability, call it with the right parameters, receive the structured result, and describe it in natural language.

It’s not being asked to write essays or reason about complex multi-step problems. The site indexer makes this even easier by providing exact numbers in the system prompt. The model doesn’t have to reason about “how many members does this site probably have?” — it’s told. The job becomes description, not inference.

The practical result: for 80–90% of common questions on a WordPress site — “what are my recent purchases?”, “am I enrolled in this course?”, “when is my next booking?” — the 8B model gets it right and responds in 5–15 seconds. That’s good enough for production use.

Where small models still struggle: very complex multi-turn conversations that require holding a lot of context, or novel ability combinations they haven’t seen examples of. For those cases, the cloud fallback provides a graceful degradation path rather than a hard failure.

Getting Started

Everything you need to try this is in the GitHub repository. Here’s the high-level path:

  1. Set up the VPS — Install Ollama, pull llama3.1:8b, configure nginx with TLS and Bearer tokens. Full instructions: Self-Hosted Setup.
  2. Install WP Agent — The WordPress plugin is in the plugin/ directory. Upload it to your site and activate it; configuration lives under Settings → WP Agent.
  3. Run the scanner — python3 scanner/wp-plugin-scanner.py /path/to/plugin generates an ability adapter file. Copy the output to wp-content/mu-plugins/, implement the execute_callback functions, and the AI can query that plugin’s data.
  4. Connect your first site — Set the primary provider to Ollama, enter the endpoint URL (https://ollama.yourdomain.com/api/chat) and your site’s Bearer token. Set a fallback provider (Google Gemini Flash is recommended). Test with the built-in connection checker.
  5. Add more sites — Generate a new Bearer token for each site with openssl rand -hex 32, add one line to the nginx map, reload nginx. No VPS restart needed.

The full setup for VPS + one WordPress site takes about 2–3 hours. Each additional site after the first takes about 15 minutes.

If something doesn’t work, the Plugin Compatibility and WP Abilities API wiki pages cover common adapter issues. Open a GitHub issue if you get stuck.

What This Proof of Concept Demonstrates

After a three-week build, here’s what this proof of concept shows is possible:

For GDPR-sensitive sites, you can add an AI assistant that never sends user data to a third-party processor. No DPA required, no cross-border transfer analysis, no provider disclosure in your privacy policy. The legal case is simple because the architecture is simple.

For community sites, the AI can answer real questions about real member data: “What courses am I enrolled in?”, “Show my recent donations”, “What’s the status of my job application?”. These aren’t guesses — they’re database queries wrapped in natural language.

For agencies, the shared server model means one VPS can serve 10+ client sites. At $96/month split across clients, the cost is negligible compared to per-token cloud AI costs at moderate usage volumes.

For plugin developers, the WP Abilities API plus the scanner gives you a path to add AI capability to any plugin without modifying it. Drop the generated adapter in mu-plugins, implement the execute callbacks, and any AI-enabled WordPress site can query your plugin’s data.

What’s Next

This is a proof of concept, not a finished product. The things I’d build next:

Better model support — qwen2.5:14b has significantly better tool-calling quality than llama3.1:8b and fits in 32 GB RAM. For sites where accuracy matters most, it’s worth the upgrade.

Streaming ability results — Currently the model waits for the entire ability response before starting to generate. For slow queries, showing “Checking your orders…” while the database query runs would improve the experience.

Conversation memory — The current implementation sends the full conversation history with every request. For long conversations, this grows large. A summarization pass after N turns would keep the context window manageable.

Admin dashboard — Site owners should be able to see which abilities are registered, how often they’re called, and whether any are returning errors. Right now this requires digging in logs.

More plugin adapters — The scanner ran against 10 plugins. The WordPress.org plugin directory has 60,000+. The scanner could run against all of them and generate a public library of adapters.

The Code

Everything is on GitHub: github.com/wbcomdesigns/wp-private-ai

The repository includes:

  • The WP Agent plugin (plugin/) — the WordPress plugin powering the chat widget
  • The scanner (scanner/wp-plugin-scanner.py) — generates ability adapters from plugin source
  • Generated adapters for 10 plugins (poc/)
  • Docker Compose file (docker/docker-compose.yml) — for per-site container deployment
  • nginx gateway config (nginx/ollama-gateway.conf) — for shared server deployment
  • Documentation for both deployment models in the GitHub wiki

If you’re running a WordPress site where user privacy matters — a health community, a financial forum, a legal advice membership — I’d encourage you to try it. The setup is about two hours of work for the VPS and another hour to connect your first WordPress site.

Questions or contributions welcome on GitHub Issues.

Varun Dubey

We specialize in web design & development, search engine optimization and web marketing, eCommerce, multimedia solutions, content writing, graphic and logo design. We build web solutions that evolve with the changing needs of your business.