Self-Host a GitHub Copilot Clone With Tabby: WordPress Dev Setup on Any Old GPU
Tabby is an open-source, self-hosted Copilot alternative that you can run on a GPU you already own. Unlike Ollama plus Continue.dev, which focuses on chat and flexible autocomplete, Tabby is purpose-built for self-hosted autocomplete, which makes it a good fit for WordPress development workflows. It ships with a UI dashboard, team analytics, and native VS Code and JetBrains plugins whose inline completions work much like GitHub Copilot’s.
What Makes Tabby Different From Ollama Plus Continue.dev
Tabby is a dedicated autocomplete server, not a general-purpose LLM interface. It does one thing well: provide fast, contextual code completion suggestions in your editor, served from your own hardware. The Tabby server includes a built-in web dashboard where you see completion acceptance rates, per-user telemetry for team setups, and model management. You deploy it once on a server or workstation with a GPU, and every developer on your team connects their IDE to the same Tabby instance through a shared API token.
The key advantage over the Ollama plus Continue.dev stack is that Tabby is optimized for completion speed at the cost of general-purpose chat capability. A dedicated autocomplete model running in Tabby responds in 200-800ms on adequate hardware, which is competitive with GitHub Copilot’s cloud response times. Continue.dev with Ollama typically delivers completions in 2-5 seconds when the model behind it is a general-purpose LLM tuned for quality rather than completion speed. If autocomplete latency is your primary concern and you have a GPU available to dedicate to the server, Tabby is the better choice for WordPress teams that want an always-available, low-latency autocomplete experience.
Hardware Requirements and GPU Compatibility
Tabby’s GPU requirements depend on the model you choose to serve. The 1B parameter Qwen 2.5 Coder model fits in 2GB of VRAM and delivers fast completions with reasonable quality for single-line suggestions. The 3B model fits in 4GB. The 7B model fits in 8GB and is the minimum recommended size for WordPress PHP development where the model needs enough capacity to understand plugin architecture patterns, hook callbacks, and sanitization conventions. The 7B model on a GTX 1080 8GB (which sold for around $400 in 2021 and under $200 used in 2026) delivers completions in 300-600ms, which is fast enough for smooth interactive autocomplete.
| GPU | VRAM | Max Model | Completion Speed | Est. Used Price |
|---|---|---|---|---|
| GTX 1080 | 8GB | 7B | 300-600ms | ~$150-200 |
| RTX 3080 | 10GB | 7B | 150-300ms | ~$250-350 |
| RTX 3090 | 24GB | 14B | 200-400ms | ~$500-700 |
| RTX 4090 | 24GB | 14B+ | 100-200ms | ~$1,400-1,800 |
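As a quick way to apply the table above, here is a minimal sketch of a helper that maps available VRAM to the largest model size this article says will fit. The function name `largest_model_for_vram` is hypothetical, and the thresholds are the article's rough figures rather than hard limits.

```shell
# Map available VRAM (whole GB) to the largest model size the article
# says will fit. Thresholds are rough guides taken from the text and table.
largest_model_for_vram() (
  vram_gb=$1
  if [ "$vram_gb" -ge 24 ]; then
    echo "14B"
  elif [ "$vram_gb" -ge 8 ]; then
    echo "7B"
  elif [ "$vram_gb" -ge 4 ]; then
    echo "3B"
  elif [ "$vram_gb" -ge 2 ]; then
    echo "1B"
  else
    echo "cpu-only"
  fi
)

largest_model_for_vram 10   # RTX 3080 -> prints 7B
```

On a host with the NVIDIA driver installed, `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` reports total memory in MiB, which you can round to GB before calling the helper.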
CPU-only inference is possible but produces completions in 3-8 seconds, which is too slow for interactive autocomplete. If you do not have a compatible GPU, the better choice is the Ollama plus Continue.dev stack, which is designed to work well at higher latency because Continue.dev handles async completions more gracefully than Tabby does. Tabby is built around the assumption that completions arrive fast enough to feel like inline suggestions rather than triggered lookups, and slow completions break that user experience assumption.
Installing Tabby With Docker
Tabby’s recommended installation method is Docker. The image bundles the CUDA runtime libraries and handles model downloads and server configuration automatically, but the host still needs a compatible NVIDIA driver. You need Docker Desktop installed with GPU access enabled, or Docker Engine on Linux with the nvidia-container-toolkit package. On Windows, Docker Desktop with the WSL2 backend and CUDA support enabled is the standard configuration. On Linux, install nvidia-container-toolkit from the NVIDIA package repository before running the Tabby container.
# Run Tabby with Qwen 2.5 Coder 7B on an NVIDIA GPU
docker run -d \
  --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  --name tabby \
  tabbyml/tabby serve \
  --model Qwen2.5-Coder-7B \
  --device cuda
# For CPU-only (slow, not recommended for production use).
# Run this INSTEAD of the GPU command above: both use the same container name and port.
docker run -d \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  --name tabby \
  tabbyml/tabby serve \
  --model Qwen2.5-Coder-7B
# Open the Tabby dashboard (macOS; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:8080
After the container starts, the Tabby web dashboard is available at http://localhost:8080. First-run model download takes 10-30 minutes depending on model size and internet speed. The 7B Qwen model is approximately 4GB. After the download completes, the dashboard shows the server status, a health check endpoint, and an API token generation interface. Generate an API token from the dashboard and save it. You will use this token to authenticate the VS Code and JetBrains extensions.
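Because the first-run download can take a while, it can help to poll the server instead of refreshing the dashboard. The following is a minimal sketch, assuming the `/v1/health` path and bearer-token header match the HTTP API of the Tabby version you run; `wait_for_tabby`, `TABBY_URL`, and `TABBY_TOKEN` are names introduced here for illustration.

```shell
# Poll the Tabby server until it answers, or give up after N attempts.
# ASSUMPTION: /v1/health and the Authorization header reflect your Tabby
# version's API; check the API docs served by your own instance.
TABBY_URL="${TABBY_URL:-http://localhost:8080}"
TABBY_TOKEN="${TABBY_TOKEN:-}"   # paste the token generated in the dashboard

wait_for_tabby() (
  attempts=${1:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null \
         -H "Authorization: Bearer $TABBY_TOKEN" \
         "$TABBY_URL/v1/health"; then
      echo "tabby is up"
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "tabby did not respond after $attempts attempts" >&2
  return 1
)
```

Calling `wait_for_tabby 60` waits up to roughly ten minutes. If it times out, `docker logs tabby` usually shows whether the model is still downloading.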
VS Code and JetBrains Integration for WordPress Development
Install the Tabby VS Code extension from the Extensions marketplace and configure it to point at your local Tabby server. Open the extension settings, set the endpoint to http://localhost:8080 (or the IP address of your server if running Tabby on a separate machine), and paste in the API token from the Tabby dashboard. After saving, the extension shows a connected status indicator in the VS Code status bar. Open any PHP file and start typing. Completions appear as grey ghost text in the same style as GitHub Copilot, triggered automatically as you type without any keyboard shortcut required.
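If the extension connects but no ghost text appears, it can be useful to test the same endpoint and token outside the editor. This sketch builds a request body in the shape I understand Tabby's `/v1/completions` endpoint to accept (a language plus prefix and suffix segments). Treat the field names as an assumption to verify against your server, and note that `completion_payload` is a hypothetical helper that does no JSON escaping.

```shell
# Build the JSON body for a Tabby completion request.
# ASSUMPTION: the {"language", "segments": {"prefix", "suffix"}} shape follows
# Tabby's /v1/completions API; verify it against your server's API docs.
# Arguments must not contain double quotes or backslashes (no escaping is done).
completion_payload() {
  printf '{"language":"%s","segments":{"prefix":"%s","suffix":"%s"}}' \
    "$1" "$2" "$3"
}

completion_payload php 'function my_plugin_init() { add_action(' ' }'

# To send it (requires a running server and a valid token):
# curl -s -X POST "http://localhost:8080/v1/completions" \
#   -H "Authorization: Bearer $TABBY_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$(completion_payload php 'function my_plugin_init() { add_action(' ' }')"
```

A JSON response with a non-empty completion means the server and token are fine and the problem is on the editor side.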
For JetBrains PhpStorm users, install the Tabby plugin from the JetBrains Plugin marketplace. The configuration is identical: endpoint URL and API token. The PhpStorm plugin integrates with the native PhpStorm completion system, so Tabby completions appear alongside PhpStorm’s built-in PHP type-aware suggestions rather than replacing them. This means you get both PhpStorm’s static analysis-based completions and Tabby’s learned pattern completions simultaneously, which is a stronger autocomplete experience than either tool provides alone. The Gemini 2.5 Pro pricing breakdown for agencies provides context on what you are replacing when you switch from API-backed tools to local inference.
WordPress Codebase Indexing in Tabby
Tabby supports repository-level code indexing, which significantly improves completion quality by supplying the model with context from your specific codebase rather than relying solely on the pre-trained model weights. No retraining is involved. Configure the Tabby server to index your WordPress plugin repositories by adding repository paths to the config.toml file in the Tabby data directory. Tabby indexes the files, creates embeddings, and uses retrieval-augmented generation at inference time to include relevant snippets from your codebase in the completion context.
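As an illustration of the config.toml step, here is a sketch of a helper that appends one repository entry. It assumes the `[[repositories]]` table with `name` and `git_url` keys, which is the format I know from Tabby's documentation; newer releases may manage repositories from the web UI instead, so check the docs for your version. The helper name `add_tabby_repo`, the repository name `acme-forms`, and its path are hypothetical.

```shell
# Append one repository entry to Tabby's config.toml.
# ASSUMPTION: [[repositories]] with name/git_url keys matches the config
# format of the Tabby version you run.
add_tabby_repo() {
  config_file=$1 repo_name=$2 repo_url=$3
  cat >> "$config_file" <<EOF

[[repositories]]
name = "$repo_name"
git_url = "$repo_url"
EOF
}

# With the Docker command from this article, /data in the container is
# $HOME/.tabby on the host, so the call would look like:
# add_tabby_repo "$HOME/.tabby/config.toml" acme-forms "file:///data/repos/acme-forms"
```

A `file://` path has to be visible from inside the container, which is why the example points under `/data`. Restart the container after editing the file so the change is picked up.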
For a WordPress agency running multiple client plugins on a shared Tabby server, codebase indexing is the feature that makes Tabby competitive with GitHub Copilot’s paid team tiers. Copilot Business at $19/mo per developer includes codebase-aware completions. Tabby provides a comparable capability on your own hardware with no per-seat fee. The GPU server cost for a 5-developer team amortizes to about $3.33/mo per developer, comfortably under $5, assuming a used RTX 3090 at $600 and a 3-year hardware life, before electricity. Read the AI subscription comparison for freelance WordPress developers for context on what paying $10-19/mo per developer gets you from GitHub Copilot in comparison.
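The amortization figure is simple enough to recompute for your own team size and hardware price. This is a small sketch using the article's assumptions ($600 GPU, 3-year life, 5 developers). The function name is hypothetical, and electricity and maintenance are left out.

```shell
# Hardware cost per developer per month, amortized over the hardware's life.
# Excludes electricity and maintenance.
amortized_cost_per_dev() {
  # $1 = hardware price (USD), $2 = lifetime in years, $3 = number of developers
  awk -v price="$1" -v years="$2" -v devs="$3" \
    'BEGIN { printf "%.2f\n", price / (years * 12 * devs) }'
}

amortized_cost_per_dev 600 3 5   # prints 3.33
```

Swap in your own numbers to see where the break-even sits against a $10-19/mo per developer subscription.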
Team Setup: Sharing One Tabby Server Across Developers
Running Tabby on a shared server is the most cost-effective configuration for WordPress agencies. Instead of each developer running their own Ollama instance locally, a single Tabby server with a mid-range GPU handles all team members simultaneously through token-authenticated requests. The server component is stateless between requests, so GPU memory holds the model weights and each incoming completion request runs inference without any per-user session overhead. A single RTX 3090 running Qwen 2.5 Coder 7B handles 4-6 concurrent developers without noticeable latency increases, because code completion requests are short sequences that return in under a second and the GPU is free for the next request immediately after.
To set up team access, generate a unique API token for each developer from the Tabby admin dashboard. Each token appears separately in the usage analytics, so the dashboard shows acceptance rates, total suggestions, and active usage hours broken down by token. This telemetry is useful for understanding which developers are actively using the autocomplete and whether the model is producing suggestions that are accepted or routinely dismissed. Low acceptance rates for a specific developer often mean their workflow or file types are outside the training distribution for the current model, which is a signal to try a different model rather than a problem with the server setup.
Exposing Tabby beyond your local network requires a reverse proxy. Run nginx or Caddy in front of the Tabby container and terminate TLS at the proxy layer. The Tabby API authenticates via bearer token, so the proxy does not need to do any credential handling beyond TLS termination. A Caddy reverse proxy configuration for Tabby takes under 10 lines and provides automatic TLS certificate renewal through Let’s Encrypt. After setting this up, developers on remote machines or connecting from client offices can use the same shared Tabby server as the local team without any VPN requirement.
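For reference, a minimal Caddy configuration along these lines might look like the sketch below, generated from a small shell helper. `tabby.example.com` is a placeholder domain whose DNS is assumed to point at the proxy host, `print_caddyfile` is a hypothetical helper name, and `localhost:8080` matches the port published by the Docker command earlier.

```shell
# Print a minimal Caddyfile that terminates TLS and forwards to Tabby.
# ASSUMPTIONS: $1 is a public domain pointing at this host; $2 is the
# address where the Tabby container is reachable from the proxy.
print_caddyfile() {
  cat <<EOF
$1 {
    reverse_proxy $2
}
EOF
}

print_caddyfile tabby.example.com localhost:8080
```

Redirect the output to your Caddyfile and reload Caddy. With a public domain as the site address, Caddy obtains and renews the certificate on its own.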
Model Selection for WordPress PHP Work
Tabby supports any model from the Hugging Face Hub that is compatible with the FIM (Fill-in-the-Middle) completion format. Qwen 2.5 Coder is a strong choice for WordPress development in 2026 because its training data covers PHP alongside many other languages and it posts strong HumanEval scores. HumanEval is a Python-centric benchmark, so treat it as a proxy for general coding ability rather than a measurement of PHP quality. StarCoder2 is the alternative: it is specifically trained on code from a wide range of programming languages, has strong PHP coverage, and produces clean WordPress-style PHP without needing a system prompt to specify coding conventions. The trade-off is that StarCoder2 7B scores slightly lower than Qwen 2.5 Coder 7B on HumanEval, but many developers find its completions more consistent in style on WordPress codebases specifically.
For teams on hardware with only 8GB VRAM, the choice between Qwen 2.5 Coder 7B and StarCoder2 7B comes down to evaluation on your actual codebase. Pull both models into Tabby, switch between them in the dashboard, and use each for a week of real WordPress development work. The model that produces more accepted completions on your specific codebase is the right choice. Neither model requires any system prompt configuration in Tabby because the FIM format communicates the task implicitly through the prompt structure, unlike chat interfaces that need explicit instructions about the output format and coding style expected.
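To make the "task is implicit in the prompt structure" point concrete, here is a sketch of how a FIM prompt is assembled from the code before and after the cursor. The sentinel tokens shown are the ones documented for Qwen 2.5 Coder; StarCoder2 uses `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` instead. Tabby applies the model's template for you, so `fim_prompt` is only an illustrative helper and not part of Tabby.

```shell
# Assemble a Fill-in-the-Middle prompt: the model is asked to generate the
# text that belongs between the prefix and the suffix.
fim_prompt() {
  printf '<|fim_prefix|>%s<|fim_suffix|>%s<|fim_middle|>' "$1" "$2"
}

fim_prompt '$title = sanitize_text_field( ' ' );'
```

Everything the model needs is carried by the two code fragments and the sentinel tokens, which is why no instruction text or system prompt appears anywhere in the request.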
Troubleshooting Common Tabby Setup Problems
The most common issue with new Tabby installs is the CUDA version mismatch between the Docker image and the host GPU driver. Tabby Docker images are compiled against specific CUDA versions. If your NVIDIA driver is older, the container exits immediately with a CUDA initialization error. The fix is to either update your NVIDIA driver to the version required by the current Tabby image (check the Tabby GitHub releases page for the CUDA version listed in each release), or pin Tabby to an older image version that matches your installed CUDA. Running nvidia-smi on the host shows your driver version and maximum supported CUDA version. Match that against the Tabby Docker tag.
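The driver comparison can be scripted. This sketch assumes GNU `sort -V` is available for version ordering. `REQUIRED_DRIVER` is a hypothetical placeholder that you should replace with the minimum driver for the CUDA version your Tabby image targets, and `version_ge` and `check_driver` are names introduced here.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value: look up the real minimum for your image's CUDA version.
REQUIRED_DRIVER="${REQUIRED_DRIVER:-525.60.13}"

check_driver() {
  installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$installed" "$REQUIRED_DRIVER"; then
    echo "driver $installed is new enough"
  else
    echo "driver $installed is older than $REQUIRED_DRIVER: update it or pin an older Tabby image"
  fi
}
```

Run `check_driver` on the host, not inside the container, since the driver belongs to the host.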
The second most common issue is the VS Code extension showing a disconnected status even when the Tabby server is running. This happens when the endpoint URL in the extension settings uses localhost but the Tabby container is accessible only at 127.0.0.1 (or vice versa) depending on the network interface binding. Try switching between localhost and 127.0.0.1 in the extension settings. On Windows with WSL2, localhost sometimes resolves to the WSL2 network adapter rather than the host loopback. If the server is running in WSL2, use the WSL2 IP address shown by ip addr show eth0 rather than localhost.
- CUDA version mismatch: check nvidia-smi and match to the Tabby Docker image CUDA requirement
- Extension disconnected: try 127.0.0.1 instead of localhost, or the WSL2 IP on Windows
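- Slow completions (3+ seconds): verify the GPU is actually in use by running nvidia-smi on the host while requesting completions (docker stats reports only CPU and memory), and confirm the container was started with --gpus all and --device cuda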
- Out of memory errors: switch to a smaller model (3B instead of 7B) or reduce context window
- No completions in PHP files: check that the file extension is in Tabby’s language support list in config.toml
Bottom Line
Tabby is the right choice for WordPress agencies that want a GitHub Copilot replacement with Copilot-level autocomplete speed (typically a few hundred milliseconds on the GPUs listed above) and are willing to dedicate a GPU to the server. The Docker install is straightforward, the VS Code and JetBrains plugins configure in under 5 minutes, and the codebase indexing feature makes it genuinely competitive with paid enterprise Copilot tiers for teams sharing a single inference server. The upfront hardware cost is real, but the ongoing cost is electricity and maintenance rather than a $10-19/mo per developer subscription.
If you do not have a GPU available or are evaluating whether local AI tooling is worth the investment before buying hardware, start with Ollama plus Continue.dev at zero cost and accept the higher latency. If you already have an old gaming GPU in a workstation or server, Tabby turns it into a team-grade autocomplete server in an afternoon. The 7B model on a GTX 1080 delivers fast enough completions to replace GitHub Copilot for most WordPress development workflows at zero per-month cost after setup.
If that old GPU has at least 8GB of VRAM, it will likely run the 7B Qwen Coder model well enough to replace your Copilot subscription, and for a small team the setup afternoon can pay for itself within a couple of months of avoided subscription fees.