Ollama + Continue.dev: Free Copilot for WordPress Devs

GitHub Copilot costs $10-19 per month per developer. Ollama plus Continue.dev costs $0 per month and runs entirely on local hardware. For WordPress developers who want AI autocomplete, chat, and inline edit suggestions without a subscription, this is the setup that delivers it.

What Ollama and Continue.dev Do Together

Ollama is the local model server. It downloads quantized LLMs, manages them on disk, and serves them over an HTTP API at http://localhost:11434, including an OpenAI-compatible route under /v1. You install it once and it runs as a background service. Continue.dev is the VS Code extension that connects to Ollama or any OpenAI-compatible endpoint and adds AI autocomplete, a chat sidebar, and inline code edit suggestions to your editor. Together they replace the Copilot API plus Copilot VS Code extension stack with a fully local equivalent that works offline, has no monthly subscription, and never transmits your code to a third-party server.

For WordPress plugin developers working with client code under NDA, zero data transmission is the most important benefit. Client plugin code, database schemas, authentication logic, and API keys that appear in development configs should not pass through an external API endpoint during code generation. Running Ollama plus Continue.dev keeps everything on your own machine. Latency is higher than with cloud API endpoints, even on fast hardware, but on an M2 Pro or M3 MacBook with 16GB of RAM, the 14B Qwen 2.5 Coder model responds in 2-5 seconds for typical autocomplete completions, which is workable for interactive use during coding sessions.

Step 1: Install Ollama and Pull the Right Model

Choose your model based on available RAM. The Qwen 2.5 Coder series is a strong choice for WordPress PHP development: its largest instruct variant scores above 90 on HumanEval in the publisher's reported benchmarks, and it produces PHP that follows WordPress patterns reliably when given a WordPress-focused system prompt. Use the 7B model on 8GB RAM Macs, 14B on 16GB, and 32B on 24GB or more. On Windows with an NVIDIA GPU, the 14B model fits comfortably in 10GB VRAM. On Linux servers shared by a team, the 32B model on a 48GB A6000 is better served through vLLM than Ollama to get team-level throughput.

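# Install Ollama (this script targets Linux; on macOS or Windows, use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the one model that fits your RAM
ollama pull qwen2.5-coder:7b     # 8GB RAM
ollama pull qwen2.5-coder:14b    # 16GB RAM
ollama pull qwen2.5-coder:32b    # 24GB+ RAM

# Verify it runs: with a prompt argument, the command prints one response and exits
ollama run qwen2.5-coder:14b "Write a WordPress wp_ajax handler"
# Without a prompt, ollama run opens an interactive session; exit with /bye or Ctrl+D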

After the model pulls successfully, verify the API endpoint is responding before configuring Continue.dev, for example by requesting http://localhost:11434/api/tags with curl or a browser. The endpoint should return a JSON response listing each pulled model's name and other metadata. If you are running Ollama on a separate machine from your development workstation, replace localhost with the IP address of the Ollama server throughout this guide. By default Ollama listens only on 127.0.0.1; it accepts connections from other machines when started with OLLAMA_HOST=0.0.0.0. On a shared network, bind it to a specific local network interface address rather than 0.0.0.0 to avoid exposing the API to other users.
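
If Ollama does run on another machine, Continue.dev's model entries need to point at it explicitly, because the ollama provider otherwise assumes localhost. Below is a minimal sketch of a chat model entry for the config.json covered in Step 2. The server address 192.168.1.50 is a hypothetical placeholder, so substitute your own. The apiBase field is how Continue.dev's config.json sets a custom endpoint for a model:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 14B (remote Ollama)",
      "provider": "ollama",
      "model": "qwen2.5-coder:14b",
      "apiBase": "http://192.168.1.50:11434"
    }
  ]
}
```

The same apiBase line would go on the tabAutocompleteModel entry if autocomplete should also use the remote server.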

Step 2: Install Continue.dev and Configure config.json

Install the Continue.dev extension from the VS Code Extensions marketplace. After installation, open the Continue sidebar and click the gear icon to open the config.json file. The config file controls which model handles each feature: autocomplete (fast, low-latency completions), chat (higher-quality responses for the sidebar), and edit (inline code modifications). For WordPress development, use the same Qwen 2.5 Coder model for all three features from the same Ollama endpoint, with different temperature and maximum-token settings (set under completionOptions in Continue.dev's config) for each role.

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 14B (Chat)",
      "provider": "ollama",
      "model": "qwen2.5-coder:14b",
      "systemMessage": "You are a WordPress plugin developer. Follow WordPress Coding Standards. Use tabs for indentation, not spaces. Always sanitize user input with appropriate WordPress sanitization functions and escape output with esc_html, esc_attr, or wp_kses_post. Use wpdb->prepare() for all database queries."
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 14B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:14b"
  },
  "contextProviders": [
    { "name": "codebase" },
    { "name": "diff" },
    { "name": "file" },
    { "name": "terminal" }
  ]
}

The systemMessage in the chat model configuration is what makes the difference between generic PHP output and WordPress-standard PHP. Without the system prompt, Qwen produces PSR-12 style code with spaces for indentation and curly braces on new lines. The WordPress Coding Standards require tabs and same-line braces. A single system prompt instruction in the config.json fixes this for every chat session automatically, without needing to repeat the instruction on every prompt.
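
Step 2 mentions different temperature and token-limit settings per role, which the config above leaves at their defaults. Here is a sketch of how that could look using the completionOptions block on each model entry. The numbers are illustrative assumptions rather than tuned values: a low temperature and a short limit for autocomplete, a longer limit for chat. The systemMessage is omitted only for brevity:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 14B (Chat)",
      "provider": "ollama",
      "model": "qwen2.5-coder:14b",
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 2048
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 14B (Autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:14b",
    "completionOptions": {
      "temperature": 0.1,
      "maxTokens": 128
    }
  }
}
```

Merge these keys into your existing entries rather than replacing the whole file, and adjust the values after a few days of real use.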

Step 3: Codebase Indexing for WordPress Plugin Repositories

Continue.dev includes a codebase indexer that creates embeddings of your local files, enabling the @codebase context provider to retrieve relevant code snippets when answering questions about your project. For a WordPress plugin project, enable codebase indexing from the Continue sidebar, then open your plugin directory. Continue.dev will index all PHP, JavaScript, and CSS files in the project, creating a local vector store that it queries when you ask questions about your codebase structure and hook registrations.

Indexing a typical WordPress plugin with 30-50 PHP files takes about 2-5 minutes on first run and then updates incrementally as you save files. The embeddings are stored locally, not on any external server. For the embedding model, Continue.dev in VS Code defaults to a small model that runs locally inside the extension, and it can instead use an Ollama-served or API-based embedding model. Using the all-minilm:l6-v2 Ollama embedding model (90MB) keeps everything local while providing reasonable retrieval quality for codebase search within a WordPress plugin project. Pull it once with ollama pull all-minilm:l6-v2 and add it to the embeddingsProvider section of config.json.
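
For reference, the embeddingsProvider section might look like the sketch below once the model is pulled. It assumes Ollama is running on the default localhost port; the key sits at the top level of config.json, alongside models and tabAutocompleteModel:

```json
{
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "all-minilm:l6-v2"
  }
}
```

Changing the embedding model generally means the codebase has to be reindexed, so expect a fresh indexing pass after saving this.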

Autocomplete Performance on WordPress Hooks and Filters

The autocomplete experience with Qwen 2.5 Coder 14B via Ollama and Continue.dev on WordPress PHP files is meaningfully better than raw autocomplete without context, but slower than GitHub Copilot’s cloud-backed response time. Typical latency for a 10-30 token completion suggestion is 2-5 seconds on an M2 Pro MacBook. This is acceptable for triggered completions (Tab key) but feels slow for speculative suggestions (grey ghost text appearing as you type). Configure Continue.dev’s tabAutocompleteOptions with a debounce delay of 500-800ms to avoid firing autocomplete requests on every keystroke, which would make the editor feel laggy during fast typing.
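
A minimal sketch of that debounce setting follows. The 600ms value is simply a midpoint of the 500-800ms range suggested above, not a measured optimum, and debounceDelay is specified in milliseconds:

```json
{
  "tabAutocompleteOptions": {
    "debounceDelay": 600
  }
}
```

Add this key at the top level of config.json. If suggestions still fire too eagerly while you type, raise the value toward 800.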

Quality on WordPress-specific completions is strong when the plugin file is open in the editor. Continue.dev sends the current file context along with the autocomplete request, which means Qwen sees the existing hooks, the plugin slug, and the coding style already established in the file. Completions for add_action, add_filter, register_rest_route, and WP_Query calls are accurate for common patterns. Less common patterns like custom walker classes or complex meta query structures sometimes need a prompt in the chat sidebar to generate the initial scaffold before autocomplete can complete variations of it reliably. Read through the Gemini 2.5 Pro pricing breakdown for WordPress agencies if you want to compare this free local setup against the cheapest API-backed alternative at scale.

WPCS Integration via VS Code Tasks

Continue.dev does not run PHP_CodeSniffer automatically, but you can wire WPCS into the workflow through VS Code Tasks whose terminal output Continue can read. Create a tasks.json in your project’s .vscode directory with one task that runs phpcs on the current file and another that runs phpcbf to auto-fix violations. Then in the Continue chat sidebar, use the @terminal context provider to include the PHPCS output in your prompt when asking Qwen to fix the violations. This creates a manual but effective loop: run PHPCS, include the output in Continue chat, ask Qwen to fix each violation, and apply the suggested changes.
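
A sketch of what that tasks.json could contain is shown below. It assumes phpcs and phpcbf are on your PATH and that the WordPress standard is installed and registered with PHP_CodeSniffer. The task labels are arbitrary names chosen for illustration; ${file} is the VS Code variable for the file open in the active editor:

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "WPCS: check current file",
      "type": "shell",
      "command": "phpcs",
      "args": ["--standard=WordPress", "${file}"],
      "problemMatcher": []
    },
    {
      "label": "WPCS: fix current file",
      "type": "shell",
      "command": "phpcbf",
      "args": ["--standard=WordPress", "${file}"],
      "problemMatcher": []
    }
  ]
}
```

Run either task from the Command Palette with Tasks: Run Task. phpcs exits with a non-zero status when it finds violations, so VS Code may report the check task as failed even though it ran correctly. The violation report is still in the terminal for @terminal to pick up.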

For a more automated workflow, configure the VS Code PHPCS extension alongside Continue.dev. The PHPCS extension shows violations as inline squiggles in the editor. When you see a PHPCS error, you can open the Continue chat, select the problematic code, and ask Qwen to fix it with the violation description in your prompt. This is less automated than Claude Code’s native PHPStan loop, but it works well enough for freelancers who do not need a fully autonomous fix pipeline and prefer to review changes before applying them. Read the AI subscription comparison for freelance WordPress developers for context on how this free setup compares to paying $20/mo for a more integrated tool.

Troubleshooting Common Setup Problems

The most common problem with the Ollama plus Continue.dev stack is the autocomplete feature failing silently after installation. If you open a PHP file and see no autocomplete suggestions after 5-10 seconds, the most likely cause is that the tabAutocompleteModel in config.json references a model name that does not exactly match what Ollama has pulled. Ollama model names are case-sensitive and include the size tag. Running ollama list in the terminal shows exactly what names are available. Copy the name from that output and paste it into the config.json model field to resolve the mismatch without reinstalling anything.

The second common issue is memory pressure causing slow responses when multiple applications compete for RAM. On a 16GB MacBook running Docker for local development alongside Ollama, combined RAM usage during active development often exceeds available physical memory and forces the system to swap to disk, which makes model inference dramatically slower. The fix is to quit Docker Desktop during AI-heavy coding sessions and restart it when you need the local dev environment. On a Mac with 24GB or more of unified memory this is rarely an issue, but on 16GB machines the RAM budget requires active management whenever Docker and local LLM inference run together.

A third issue that trips up WordPress developers specifically is getting generic PHP instead of WordPress-specific output even with the system prompt configured. The usual cause is that the system prompt is set in the models array, which applies to chat, while the generic output is coming from autocomplete, which uses tabAutocompleteModel and does not inherit that prompt. The tabAutocompleteModel configuration supports a systemMessage field just like the entries in the models array. Add the same WordPress Coding Standards instruction there too, so that both chat and autocomplete follow WordPress conventions.

When the embedding-based codebase search returns irrelevant results, the fix is usually a reindex. Delete Continue.dev's index directory (by default inside the .continue folder in your home directory, at ~/.continue/index) to force Continue.dev to rebuild the vector store from scratch on the next indexing run. This is needed when you add or rename a large number of files that the incremental indexer did not pick up correctly. A full reindex on a 50-file WordPress plugin takes under 5 minutes on most machines and resolves stale codebase context issues without any changes to config.json.

Bottom Line

Ollama plus Continue.dev is a viable zero-cost alternative to GitHub Copilot for WordPress developers on hardware with at least 16GB of RAM. The setup takes about 30 minutes, produces WordPress-standard PHP when properly configured with the right system prompt, and keeps all code entirely local. The main trade-off is autocomplete latency (2-5 seconds vs under 1 second for cloud-backed tools) and slightly lower quality on complex completions involving obscure WordPress internals. For developers who have never tried a local AI stack before, the initial setup is the hardest part, and once that is done the workflow is straightforward.

The use cases where this setup wins clearly are client projects under NDA, high-volume scaffolding work where per-token API costs add up, and developers who work frequently offline or in restricted network environments. The use cases where paying $10-20/mo for GitHub Copilot or Cursor is still worth it are complex multi-file plugin architecture tasks and situations where 2-5 second autocomplete latency interrupts your typing flow enough to affect productivity. If you work at a pace where you notice a 3-second pause, budget $10/mo for Windsurf or Copilot instead.

Install Ollama and verify it runs before configuring Continue.dev
Pick the largest model size that fits your RAM comfortably (not the maximum possible)
Add the WordPress system prompt to both the chat model and the tabAutocompleteModel
Enable codebase indexing immediately after pointing Continue.dev at your plugin folder
Wire phpcs as a VS Code task and use the @terminal context provider to feed violations to the chat

Run the free local stack for two weeks on a real client project before deciding. Latency tolerance is personal, and two weeks of real use tells you whether the 2-5 second completions work within your flow.
