I Built an MCP Server That Migrates WordPress Sites to Astro – Here Is What I Learned
I have been building WordPress sites, plugins, and themes for over a decade. Our agency manages more than a dozen sites with thousands of posts across them. When the conversation around static site generators, Astro specifically, started getting serious, I kept hitting the same wall: migrating real WordPress content is painful.
Not painful because the concept is hard. Painful because every site is different. One uses Elementor, another has WPBakery with custom shortcodes, a third has ACF fields nested three levels deep. The content looks clean in the WordPress editor but the underlying HTML is a mess of wrapper divs, inline styles, and plugin-specific markup.
So I built WP Astro MCP, an MCP server that automates the entire WordPress-to-Astro migration through conversational AI commands. And building it taught me things about AI-powered development tooling that I think every developer should know.
Why I Chose MCP Over Everything Else
I could have built a CLI tool. Or a WordPress plugin. Or a Node.js script with a config file. I seriously considered all of these. But I chose MCP because of one fundamental insight: migration is a conversation, not a command.
A CLI tool requires you to know the exact commands, flags, and sequences upfront. A migration involves dozens of decisions along the way: which post types to export, how to handle that one weird shortcode the client uses everywhere, what to do with Elementor markup that has custom CSS, where to put the output files, which deploy platform to target, whether to use Markdown or JSON for large sites. In a CLI, all of those decisions become flags and config files that you have to get right before you start.
With MCP, you just talk to Claude:
"Add my site vapvarun.com"
"Analyze it"
"Show me what the gallery shortcode looks like after conversion"
"Actually, make that an image grid instead"
"Preview some posts"
"Export everything"
"Push to GitHub"
The AI handles the tool orchestration, something I explored in depth in my piece on why Claude Code stuck in my development workflow. It calls the right tools in the right order, passes the right parameters, and adapts when something unexpected comes up. You can change your mind mid-migration. You can ask questions about the content. You can preview before committing. The entire workflow feels natural because it is a conversation, not a command sequence.
That realization, that MCP turns multi-step workflows into adaptive conversations, is what convinced me this is the future of development tooling. Not just for migrations, but for any complex workflow where decisions need to be made along the way.
Building the Auto-Detection Engine
The first thing WP Astro MCP does when you add a site is test the connection and detect what the site is running. This sounds simple but it was one of the most valuable features to get right.
The detection engine queries the WordPress REST API to discover the WordPress version, all available REST namespaces, which SEO plugin is active (Yoast, RankMath, or AIOSEO by checking for their specific REST endpoints), which page builder is in use (Elementor, WPBakery, Divi, Beaver Builder, Bricks, or Oxygen by checking for their registered post meta and REST endpoints), whether ACF is active, whether WooCommerce is installed, and all registered custom post types and taxonomies.
This matters because the entire migration pipeline adapts based on what it detects. An Elementor site triggers Elementor-specific DOM cleanup in the conversion pipeline. A Yoast site extracts SEO metadata from Yoast-specific post meta fields. A WooCommerce site knows to handle product post types and their specific metadata structure. A site with ACF gets intelligent field normalization that turns image fields into objects with dimensions, relationship fields into references with slugs, and repeater fields into proper arrays.
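The core of the detection is a mapping from the REST index's `namespaces` array to features. Here is a minimal sketch of that idea; the namespace strings are plausible ones for Yoast, RankMath, AIOSEO, and WooCommerce, but treat the exact list as an assumption, and the real server also inspects post meta and endpoints:

```typescript
// Sketch of namespace-based detection. The WordPress REST index at /wp-json/
// exposes a "namespaces" array; plugin-specific namespaces reveal what is
// installed. The namespace strings here are illustrative assumptions.
interface Detection {
  seoPlugin?: 'yoast' | 'rankmath' | 'aioseo';
  pageBuilder?: string;
  woocommerce: boolean;
}

const SEO_NAMESPACES: Record<string, Detection['seoPlugin']> = {
  'yoast/v1': 'yoast',
  'rankmath/v1': 'rankmath',
  'aioseo/v1': 'aioseo',
};

function detectFromNamespaces(namespaces: string[]): Detection {
  const detection: Detection = { woocommerce: namespaces.includes('wc/v3') };
  for (const ns of namespaces) {
    if (SEO_NAMESPACES[ns]) detection.seoPlugin = SEO_NAMESPACES[ns];
    if (ns.startsWith('elementor/')) detection.pageBuilder = 'elementor';
  }
  return detection;
}
```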
I did not have to configure any of this manually for our own sites. Connect the site, and the pipeline knows what to do. That zero-configuration experience was a design goal from the start because agencies need tools that work immediately, not tools that need hours of setup per client site.
The Hardest Part: The 13-Step Conversion Pipeline
The most engineering-intensive part was the content conversion. WordPress content is not just HTML. It is HTML wrapped in page builder divs, sprinkled with shortcodes, decorated with Gutenberg block comments, and seasoned with inline styles from years of different editors making changes.
I ended up building a 13-step sequential pipeline where each step operates on the output of the previous one:
- Sanitize with DOMPurify to strip XSS vectors while keeping legitimate content
- Resolve shortcodes with 20+ built-in handlers, custom per-site rules, and multi-pass nesting support up to 10 passes deep
- Strip page builder markup from Elementor, WPBakery, Divi, Beaver Builder, Bricks, and Oxygen using Cheerio DOM manipulation
- Remove Gutenberg block comments while preserving the semantic HTML content inside blocks
- Normalize HTML by decoding entities, removing empty paragraphs and spans, cleaning inline styles
- Convert to Markdown using Turndown with 12 WordPress-specific custom rules
- Rewrite internal links mapping WordPress URLs to their new Astro paths using the URL map
- Rewrite media URLs for go-live domain swaps when WordPress moves to a subdomain
- Clean conversion artifacts removing leftover markup and fixing double-encoded entities
- Process embeds converting YouTube, Vimeo, and other iframe embeds to clean URLs
- Handle galleries transforming WordPress gallery markup to structured image grids
- Fix whitespace ensuring proper spacing around headings, lists, and code blocks
- Validate and report any remaining issues like unconverted HTML, broken image references, or potential content loss
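The pipeline's shape itself is simple: an ordered list of string-to-string transforms folded over the content. A minimal sketch with two toy steps; the real implementations are far more involved:

```typescript
// Each pipeline step is a pure string-to-string transform.
type Step = (content: string) => string;

const stripBlockComments: Step = (html) =>
  // Drop Gutenberg delimiters like <!-- wp:paragraph --> / <!-- /wp:paragraph -->
  html.replace(/<!--\s*\/?wp:[^>]*-->/g, '');

const removeEmptyParagraphs: Step = (html) => html.replace(/<p>\s*<\/p>/g, '');

// Fold the steps left over the input: each operates on the previous output.
function runPipeline(steps: Step[], input: string): string {
  return steps.reduce((content, step) => step(content), input);
}
```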
Each step exists because I hit a real problem on a real site. Step 3 exists because an Elementor page that looks like a simple blog post in the frontend is actually 15 nested divs with data attributes and CSS classes in the source HTML. Step 2 exists because one of our sites had a custom product comparison shortcode that was used in over 200 posts and needed specific handling.
The lesson I learned: you cannot build a WordPress migration tool from theory. You have to run it against real sites with real accumulated mess and fix what breaks. Every single step in that pipeline is a scar from a real migration that exposed an edge case I had not anticipated.
The Page Builder Challenge
Page builders were the single biggest obstacle. What looks like a clean two-column layout in the Elementor editor is actually a deeply nested DOM tree in the database. An elementor-section wraps elementor-inner, which wraps elementor-column, which wraps elementor-column-wrap, which wraps elementor-widget-wrap, which finally wraps elementor-widget-container, which contains the actual content. That is 6 levels of wrapper divs for every piece of content.
The pipeline strips all of these wrappers while preserving what is inside. A heading widget becomes a heading tag. A text editor widget becomes clean paragraphs. An image widget becomes a standalone image with alt text. I built pattern recognition for six different page builders, each with their own nesting conventions and class naming patterns.
WPBakery was different because it uses shortcodes rather than HTML markup. The pipeline resolves vc_row, vc_column, and all the widget shortcodes first, extracting content from each container, then runs the normal HTML cleanup on the result. Divi uses its own et_pb prefixed shortcode system. Each builder required its own cleanup strategy, but they all fundamentally do the same thing: wrap content in unnecessary containers. Strip the containers, keep the content.
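To give a feel for the unwrapping, here is a deliberately simplified regex version. The real pipeline uses Cheerio for proper DOM manipulation because regexes break on anything non-trivial; this sketch only works on clean, simple markup:

```typescript
// Illustration only: the actual pipeline does this with Cheerio.
// The class names are Elementor wrapper classes.
const WRAPPER_CLASSES = [
  'elementor-section',
  'elementor-column',
  'elementor-widget-wrap',
  'elementor-widget-container',
];

function stripWrappers(html: string): string {
  const pattern = new RegExp(
    `<div[^>]*class="[^"]*(?:${WRAPPER_CLASSES.join('|')})[^"]*"[^>]*>([\\s\\S]*?)</div>`,
    'g'
  );
  // Unwrap repeatedly: each pass peels one layer of nesting.
  let previous: string;
  do {
    previous = html;
    html = html.replace(pattern, '$1');
  } while (html !== previous);
  return html;
}
```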
Shortcode Resolution Is Harder Than It Looks
Shortcodes seem simple. Square brackets around a tag name with some attributes. But in practice, shortcodes nest inside other shortcodes, they generate complex HTML, and every plugin invents its own. I built support for five resolution modes: strip (remove tags, keep content), keep_content (preserve everything), remove (delete entirely), component (convert to Astro component reference), and html (replace with static HTML).
The multi-pass resolver handles nesting by processing from innermost to outermost over up to 10 passes. This handles real patterns like a tabs shortcode containing columns shortcodes containing button shortcodes. Per-site custom rules let you configure any shortcode with one command so every instance across thousands of posts gets handled consistently.
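The innermost-to-outermost trick can be sketched with a regex that only matches shortcodes whose body contains no other shortcode. This is a simplification of the real resolver (only two of the five modes, no self-closing shortcodes, no attribute parsing):

```typescript
// Resolve nested shortcodes from the inside out, up to maxPasses deep.
// Unconfigured shortcodes default to 'strip' (remove tags, keep content).
function resolveShortcodes(
  content: string,
  rules: Record<string, 'strip' | 'remove'>,
  maxPasses = 10
): string {
  // Matches a shortcode whose body contains no brackets, i.e. the
  // innermost level of any nested structure.
  const innermost = /\[(\w+)[^\]]*\]([^\[\]]*)\[\/\1\]/g;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = content.replace(innermost, (_match: string, tag: string, body: string) => {
      const rule = rules[tag] ?? 'strip';
      return rule === 'remove' ? '' : body;
    });
    if (next === content) break; // nothing left to resolve
    content = next;
  }
  return content;
}
```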
Where I Got Stuck: Lessons from the Commit History
If you look at the commit history, you can trace exactly where I struggled. The code tells a story that polished blog posts usually hide. Here are the real problems I hit and how I solved them.
The OOM Problem Hit Me Twice
Out-of-memory errors showed up at two completely different stages, and each one required a different fix.
The first OOM was during TypeScript compilation itself. The MCP server codebase is substantial, over 13,000 lines across 36 files in the initial release. Compiling that with source maps, declarations, and strict mode pushed past memory limits on CI/CD platforms like Cloudflare Pages and GitHub Actions runners that have constrained memory. I had to set a 4GB heap limit in the build script and add an incremental compilation option for tight CI environments. I also stripped unnecessary declaration and sourceMap generation from tsconfig, which reduced memory pressure during compilation significantly.
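The tsconfig side of that fix looked roughly like this (a sketch; the project's exact settings may differ):

```json
{
  "compilerOptions": {
    "incremental": true,
    "declaration": false,
    "sourceMap": false
  }
}
```

The heap limit itself went into the build script by invoking tsc with a larger `--max-old-space-size` via Node options.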
The second OOM was worse because it did not hit me during development, it hit during Astro site builds on production deploy platforms. Astro’s content collection system parses each Markdown file through its full markdown pipeline. For a site with 500+ posts, that meant 500+ files each getting parsed, and the combined memory usage hit roughly 4GB which crashed Cloudflare Pages and Netlify builds. On top of that, I had been recommending astro-compress in the scaffolded project for HTML minification. That plugin tried to minify all 500+ generated HTML pages after the build, pushing past deploy timeouts entirely.
The fix was JSON mode. Instead of writing individual Markdown files, write a single JSON file per collection containing all posts as an array. Astro imports the JSON directly and renders content via set:html, completely bypassing the markdown parsing overhead. I also removed astro-compress from all scaffolded projects because edge compression from Vercel, Netlify, and Cloudflare is sufficient and does not eat build time. That commit message (feat: add JSON data mode for large sites) documents the problem clearly because I wanted future contributors to understand why JSON mode exists.
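The difference between the two modes, sketched with a trimmed-down post shape (field names here are illustrative):

```typescript
// Markdown mode: one file per post, each parsed by Astro's markdown
// pipeline at build time. JSON mode: one array per collection, rendered
// with set:html, so the markdown parser never runs during the build.
interface ExportedPost {
  slug: string;
  title: string;
  date: string;
  html: string; // the already-converted body
}

function toJsonCollection(posts: ExportedPost[]): string {
  // The Astro side imports this file and renders each post with
  // <article set:html={post.html} /> in a [slug].astro route.
  return JSON.stringify(posts, null, 2);
}
```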
I Built Sync Before Finishing Export
This is the mistake I am most embarrassed about. I got excited about the content sync feature and built all 7 sync tools, 1,251 lines of code in sync.ts. The sync system needed two pieces of data from the export: the wp_modified_gmt timestamp (to know what changed since last sync) and the content_hash (to detect duplicate work).
The problem? The export pipeline was not storing either of those values. I had written the export_posts table schema but forgot to populate the wp_modified_gmt and content_hash columns during the actual export process. The sync tools would run but immediately report that everything was new because they had no baseline timestamps to compare against.
On top of that, the sync_history table was not being created during database initialization. I was creating it on-the-fly with an ensureSyncSchema() call in the sync tools themselves, which worked but was fragile and inconsistent with how every other table was created.
The fix commit (Fix 10 audit issues: sync wiring, schema gaps, stale docs) addressed this along with 9 other issues I found during a thorough audit. The lesson: when you build feature B that depends on data from feature A, go back and verify that feature A is actually producing that data. Do not assume it. Test the full chain.
Twenty Silent Catch Blocks Almost Killed Debugging
During that same audit, I discovered 20 bare catch blocks scattered across the codebase. Just empty catch {} blocks that swallowed errors silently. When something went wrong during a migration, there would be no error message, no log entry, nothing. The tool would just silently produce incomplete or incorrect output.
This happened because during rapid development I would wrap risky operations in try-catch to prevent the whole server from crashing, then forget to add actual error handling. Every single one of those empty catches was a potential hour of debugging pain for anyone using the tool. I fixed all 20 in one commit, giving each catch a typed error parameter and either proper error propagation or, at minimum, logging.
My rule now: never write an empty catch block, not even temporarily. If you do not know how to handle the error yet, at least log it. A logged error you can find in 30 seconds. A swallowed error can cost you a full day of debugging.
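In practice the rule looks like this. Even the laziest acceptable catch names the error and logs it (the function here is just an illustration, not code from the project):

```typescript
// A catch must rethrow, return a sentinel, or log — never silently continue.
function safeParse(json: string): unknown {
  try {
    return JSON.parse(json);
  } catch (e: unknown) {
    // A logged error you can find in 30 seconds; a swallowed one you cannot.
    console.error('safeParse: invalid JSON:', e instanceof Error ? e.message : String(e));
    return undefined;
  }
}
```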
The Node 25 ESM Import Break
The he library is used for HTML entity decoding throughout the conversion pipeline. It is a critical dependency. When testing on Node 25, the named import broke because he’s ESM export structure changed between Node versions. The import that worked on Node 18-22 failed on Node 25.
The fix was simple once identified: switch from a named import to a default import. But finding it took longer than it should have because the error message from Node was not obvious about which import was failing, and the pipeline has multiple dependencies that could have been the culprit. This reinforced my belief that ESM compatibility is still one of the most annoying aspects of the Node.js ecosystem.
The Documentation Lied About Tool Count
The initial release had 48 tools across 6 categories. When I added the sync category with 7 tools, the total became 55. But I forgot to update the README, the architecture docs, the router description, and the CLAUDE.md. Every single document still said 48 tools. If someone read the docs and then listed the actual tools, the numbers would not match.
This seems trivial but it matters for trust. When documentation contradicts reality, users lose confidence in the entire project. I caught this during the audit and fixed every reference across all documentation files. My process now: when adding tools, I search the entire project for the old tool count before committing.
The Router Pattern Was a Game-Changer
WP Astro MCP has 55 tools. If I exposed all 55 to Claude, the tool definitions alone would eat thousands of tokens from the context window. The AI would get confused choosing between similar tools and the experience would degrade.
So I built a router pattern with just 3 meta-tools:
- wp_astro_run to execute any action by name
- wp_astro_help to discover available actions by category
- wp_astro_describe to get the full input schema for a specific action
The AI asks for help first, discovers the right tool for the task, gets its schema to understand the required parameters, then calls it. Progressive discovery instead of upfront loading of all 55 tool definitions.
This pattern is reusable for any MCP server with more than 10-15 tools. I would use it again without hesitation. If you are building an MCP server and you have more than a dozen tools, consider this approach early. It fundamentally changes how the AI interacts with your tools for the better. Less confusion, fewer wasted tokens, more accurate tool selection.
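A minimal sketch of the pattern: a registry of actions behind three small entry points. The meta-tool names match the article; the registry entry and its handler are hypothetical:

```typescript
interface Action {
  category: string;
  description: string;
  schema: Record<string, unknown>;
  handler: (input: Record<string, unknown>) => unknown;
}

const registry = new Map<string, Action>();

// Hypothetical example entry — the real server registers 55 of these.
registry.set('site_add', {
  category: 'sites',
  description: 'Register a WordPress site by URL',
  schema: { type: 'object', properties: { url: { type: 'string' } }, required: ['url'] },
  handler: (input) => ({ added: input.url }),
});

// wp_astro_help — discover actions, optionally filtered by category.
function help(category?: string) {
  return [...registry.entries()]
    .filter(([, action]) => !category || action.category === category)
    .map(([name, action]) => ({ name, description: action.description }));
}

// wp_astro_describe — full input schema for one action.
function describe(name: string) {
  return registry.get(name)?.schema;
}

// wp_astro_run — execute an action by name.
function run(name: string, input: Record<string, unknown>): unknown {
  const action = registry.get(name);
  if (!action) throw new Error(`Unknown action: ${name}`);
  return action.handler(input);
}
```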
SQLite for State Management Was the Right Call
Early versions of the tool had no persistent state. Export 2,000 posts, if the connection drops at post 1,500, you start over from the beginning. That was obviously not going to work for real-world usage where internet connections are unreliable and WordPress servers have rate limits.
I added SQLite with WAL mode and 8 tables tracking everything:
- export_jobs: one row per migration run with site ID, status, and progress counts
- export_posts: per-post state tracking with status (pending, in-progress, completed, failed), retry count, WordPress modified timestamp, and content hash
- cached_terms: pre-fetched taxonomy terms for fast lookups during conversion
- cached_authors: pre-fetched authors with avatars for frontmatter generation
- url_map: WordPress URL to Astro URL mappings for redirect generation and internal link rewriting
- shortcode_map: per-site shortcode handling rules configured during the preview phase
- audit_log: timestamped operation log for debugging and progress tracking
- sync_history: content sync run history with timestamps, change counts, and status
Now when a migration fails halfway through, export_resume picks up from the last pending post. export_retry reprocesses only the posts that failed. Content hashes prevent duplicate work if you accidentally run the same export again. The URL map persists across sessions for accurate link rewriting and redirect generation.
SQLite with WAL mode gives concurrent read access without locking issues. It is a single file with no server to manage, and it travels with the project directory. For any MCP server that needs to maintain state between tool calls, I cannot recommend SQLite enough. It is the simplest approach that actually works at scale.
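The resume logic itself is just a selection over per-post status. Sketched here without a database driver (the retry cutoff is my assumption, not the project's actual policy):

```typescript
// In the real server this is a SQLite query against export_posts, roughly:
//   SELECT wp_id FROM export_posts WHERE status IN ('pending', 'failed');
type Status = 'pending' | 'in-progress' | 'completed' | 'failed';

interface PostRow {
  wp_id: number;
  status: Status;
  retry_count: number;
  content_hash?: string;
}

function postsToResume(rows: PostRow[], maxRetries = 3): number[] {
  return rows
    .filter(
      (row) =>
        row.status === 'pending' ||
        row.status === 'in-progress' || // interrupted mid-flight
        (row.status === 'failed' && row.retry_count < maxRetries)
    )
    .map((row) => row.wp_id);
}
```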
The ACF and Custom Post Type Challenge
Advanced Custom Fields is everywhere in professional WordPress sites. Portfolio sites have project URLs, client names, image galleries, and testimonials as ACF fields. Real estate sites have property details. Event sites have dates, venues, and speaker relationships. All of this data needs to end up in Astro frontmatter in a structured, usable format.
The frontmatter builder normalizes ACF field data based on field type. Image fields become objects with URL, alt text, width, and height instead of just a media ID. Relationship and post object fields become references with WordPress ID, slug, and title so you can build proper links in Astro. Repeater fields become arrays of objects with all their sub-fields preserved. Group fields become nested objects.
The key insight was that WordPress stores ACF data as flat post meta keys with naming conventions like field_name_0_sub_field for repeater entries. The normalization step reconstructs the proper data structure from these flat keys so the YAML frontmatter matches what you would expect from the ACF field group definition.
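A sketch of that reconstruction for repeater rows. Real ACF meta also stores an underscore-prefixed field-key twin for every value and a row count under the field name; this illustration ignores both:

```typescript
// Rebuild repeater arrays from flat meta keys like "team_0_name".
function reconstructRepeaters(meta: Record<string, string>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  const repeaterRow = /^(\w+?)_(\d+)_(\w+)$/;
  for (const [key, value] of Object.entries(meta)) {
    const match = key.match(repeaterRow);
    if (!match) {
      out[key] = value; // plain field, keep as-is
      continue;
    }
    const [, field, index, subField] = match;
    const rows = (out[field] ??= []) as Record<string, string>[];
    (rows[Number(index)] ??= {})[subField] = value;
  }
  return out;
}
```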
Custom post types get their own content collection directories in the Astro project. A portfolio CPT exports to src/content/portfolio/. Products export to src/content/products/. Each collection gets its own schema based on the fields present in the exported data. The scaffold tool generates the Astro content collection configuration automatically based on what it finds.
Content Sync Changed Everything
The migration itself was version 1 of the project. But the real unlock, the feature that makes WP Astro MCP genuinely useful long-term, was content sync.
WordPress is a living CMS. Editors publish posts every day. They update old content. They change featured images, reassign categories, update SEO metadata. If you migrate to Astro but still use WordPress as the content management layer, you need a way to keep the Astro site current without re-running a full export every time someone publishes a post.
I built 7 sync tools that handle the complete lifecycle. sync_check compares WordPress content against local files by checking modified timestamps. sync_pull fetches only posts that have changed and writes them locally. sync_delete removes local files for posts that have been trashed or deleted in WordPress. sync_full does everything in one command and optionally commits to git. sync_status shows the history of sync operations. sync_schedule generates automation configurations. sync_reset clears tracking data to force a full re-check.
The sync handles edge cases that matter in practice. When a post slug changes, the old file is deleted and a new file is created at the correct path. The redirect map is updated automatically. When a featured image changes, the frontmatter is regenerated with the new image metadata. When taxonomy assignments change, the frontmatter categories and tags update. Everything that can change in WordPress gets detected and synchronized.
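The heart of sync_check is a three-way diff between tracked state and the live API. A sketch, with field names following the article (wp_modified_gmt stored at export time versus modified_gmt from WordPress; ISO GMT timestamps compare correctly as plain strings):

```typescript
interface TrackedPost { wp_id: number; wp_modified_gmt: string }
interface RemotePost { id: number; modified_gmt: string }

function diffPosts(local: TrackedPost[], remote: RemotePost[]) {
  const byId = new Map<number, TrackedPost>(
    local.map((post): [number, TrackedPost] => [post.wp_id, post])
  );
  const created: number[] = [];
  const changed: number[] = [];
  for (const r of remote) {
    const tracked = byId.get(r.id);
    if (!tracked) created.push(r.id);
    else if (tracked.wp_modified_gmt < r.modified_gmt) changed.push(r.id);
  }
  // Anything tracked locally but gone from WordPress was trashed or deleted.
  const remoteIds = new Set(remote.map((r) => r.id));
  const deleted = local.filter((l) => !remoteIds.has(l.wp_id)).map((l) => l.wp_id);
  return { created, changed, deleted };
}
```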
Then I added sync_schedule which generates ready-to-use automation configs for four platforms: GitHub Actions workflows that run on a schedule (hourly, daily, or weekly), cron scripts for server-based automation, Vercel webhook endpoints that WordPress calls on every publish, and Netlify functions that receive WordPress webhooks and trigger rebuilds.
This turned WP Astro MCP from a one-time migration tool into a permanent WordPress-to-Astro bridge. You do not have to choose between WordPress and Astro. You use WordPress for content management because your editorial team already knows it, and Astro for delivery because it is faster, more secure, and cheaper to host. Best of both worlds.
Media URLs and the Go-Live Problem
Here is a problem I did not anticipate until we did our first production go-live. During development, WordPress runs on myblog.com and all media URLs in the exported content point to myblog.com/wp-content/uploads/. That is fine for development.
But when you go live, Astro takes over myblog.com and WordPress moves to something like app.myblog.com or cms.myblog.com. Suddenly every media URL in every exported file is pointing to the wrong domain. On a site with 2,000 posts, that could be 10,000+ image URLs that need updating.
The media_rewrite tool handles this with a single command. Specify the old domain and the new domain, and it rewrites every media URL in every content file. It is a bulk find-and-replace but it is smart about it, only targeting media URLs and not touching other references to the old domain that might be legitimate.
The media_audit tool is the companion. It scans all exported files and reports every media domain referenced, counts per domain, and any potentially broken references. Run the audit before and after the rewrite to verify everything looks correct.
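The "smart about it" part is mostly scoping: anchor the replacement to the uploads path so other links to the old domain are left alone. A simplified sketch of the idea (the real tool handles more URL shapes than this):

```typescript
function rewriteMediaUrls(content: string, oldDomain: string, newDomain: string): string {
  const escaped = oldDomain.replace(/\./g, '\\.');
  // Only URLs under /wp-content/uploads/ move to the new domain.
  const uploads = new RegExp(`https?://${escaped}(/wp-content/uploads/)`, 'g');
  return content.replace(uploads, `https://${newDomain}$1`);
}
```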
The Numbers from Real Migrations
Across the migrations I have run internally and for clients, here are the rough numbers:
- A 200-post blog with no page builder: 45 minutes start to finish including deploy
- A 500-page Elementor site: 2-3 hours including preview, shortcode configuration, and validation
- A 2,000-product WooCommerce catalog: 3-4 hours with all product metadata preserved in frontmatter
- A 5,000-post media site with WPBakery: 2 days with resume and retry, because the server rate-limited heavily
- An 8-site agency migration: 8 working days, one site per day, all using WordPress as the CMS with Astro frontends
Compare these to manual migration estimates. A 500-page Elementor site manually would take 2-3 developers working 4-6 weeks. The 5,000-post media site would be a 3-month project with manual conversion. The economics are not even close.
What I Would Do Differently
If I started over with everything I know now:
- Start with the sync tools, not the migration. Content sync is the feature that makes people stay. Migration is a one-time event. Sync is ongoing value that justifies the relationship
- Build the router pattern from day one. I started with all tools exposed directly and refactored to the 3-tool router later. It should have been the first architecture decision because it affects how the AI discovers and uses every tool
- Never write an empty catch block. Those 20 bare catches across the codebase cost me hours of debugging before I audited them all. Log every error even if you do not handle it yet
- Test the full data chain end-to-end. I built sync before verifying that export was storing the timestamps sync needed. Feature B depended on data from feature A that feature A was not actually producing. Always test the complete flow
- Add JSON mode from the beginning. The OOM problem on deploy platforms with 500+ posts should have been anticipated. I added JSON mode reactively after hitting the wall. It should have been a launch feature
- Search for stale numbers in docs when adding features. The tool count mismatch (docs said 48 when reality was 55) is the kind of thing that erodes trust. Automate doc verification or at minimum search for the old count before committing
What This Taught Me About Where Development Is Going
Building WP Astro MCP gave me a clear view of where software development is heading. MCP is not just another API standard or protocol specification. It is the layer that connects AI assistants to real-world development tooling in a way that is practical, composable, and genuinely useful.
The moment I could say “migrate my 6,000-post WordPress network to Astro” and watch Claude orchestrate 55 tools across 8 categories to actually make it happen, calling the right tools in the right order, handling errors, adapting to site-specific quirks, that was the moment I understood that development is fundamentally changing.
We are not going to stop writing code. But we are going to spend dramatically less time on the tedious, repetitive, well-understood parts of software development. The developers who encode their expertise into MCP servers, who package their hard-won domain knowledge into tools that AI can discover and orchestrate, those are the developers who will be shaping how software gets built in the coming years.
Every agency has workflows that could be an MCP server. I talked about how AI restructured who does what at our agency; MCP servers are a natural extension of that thinking. Every developer has domain knowledge that could be encoded into reusable, conversational tools. The barrier to building an MCP server is low. The MCP SDK handles the protocol. You just need to define your tools and implement the handlers.
Update: v3.0 “Living Bridge”, From Migration Tool to Headless WordPress Platform
Updated April 2, 2026
Since I wrote this article, WP Astro MCP has evolved significantly. The biggest shift: we stopped calling it a migration tool. It is a headless WordPress platform. WordPress stays as your CMS. Astro becomes the fast, public-facing frontend. Editors keep using wp-admin. Visitors get static HTML. WordPress moves to a private subdomain; Astro serves the public domain.
Here is what shipped in v3.0:
Setup wizard. The 8-10 separate commands I described above are now a single setup_wizard action. Register your WordPress site, analyze content, configure export, preview posts, scaffold the Astro project, export everything, generate redirects, and initialize git. One command. Five minutes from WordPress blog to deployed Astro frontend on Vercel.
wp-astro-bridge WordPress plugin. This was the missing piece. A lightweight plugin (3 PHP classes, zero dependencies) that lives inside WordPress and connects it to the Astro frontend:
- Webhook dispatcher: editor publishes a post, plugin fires an HMAC-signed webhook, Astro site rebuilds on Vercel/Netlify/Cloudflare within 1-2 minutes. No manual sync.
- Preview URL rewriter: the WordPress “Preview” button now points to the Astro frontend. Editors see their draft rendered on the real Astro design with a “You are previewing a draft” banner. Token-based auth, 5-minute expiry.
- Normalized SEO REST field: one astro_seo field that outputs the same format whether the site runs Yoast, RankMath, or AIOSEO.
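The Astro side of the webhook handshake boils down to verifying an HMAC-SHA256 signature over the raw body. A sketch; the header name and signing scheme here are my assumptions, not the bridge plugin's documented contract:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so guard first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```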
Reactive sync. The new sync_webhook action processes individual post webhooks from the bridge plugin. Instead of scanning the entire site for changes, it syncs just the one post that changed. Targeted and fast.
Draft preview via Astro hybrid SSR. The scaffolded Astro project now ships with a /preview server-side route. Astro config switches to output: 'hybrid' so the preview page runs on the server while everything else stays static. Token validation against the WordPress plugin’s verify endpoint ensures only authorized previews work.
Production-ready scaffolding. The generated Astro project is no longer a starter template. It ships with paginated blog index (/blog/page/2/), Pagefind-powered search (/search), related posts computed at build time using shared categories and tags, JSON Feed alongside RSS and sitemap, Open Graph metadata, JSON-LD structured data, a reading progress bar on post pages, and a proper 404 page with search.
Tool count: 57 across 9 categories. Up from the original 48. The new additions: setup_wizard, sync_webhook, and the scaffold upgrades that generate webhook endpoints and preview routes.
Honest about page builders. I overclaimed Elementor and page builder handling in the original version. The reality: page builder content comes through the REST API as deeply nested HTML with far more markup than actual text. The converter extracts the content, but complex visual layouts do not carry over. This tool works best with standard Gutenberg and classic editor content. For page-builder-heavy sites, expect manual cleanup on complex pages.
The tool is no longer a one-time migration script. With the bridge plugin installed, WordPress and Astro stay connected. Editors publish in WordPress, the Astro site updates automatically, drafts are previewable on the real frontend. It is a living bridge between the CMS editors love and the frontend visitors deserve.
I wrote more about where this is headed and how it fits into the broader WordPress ecosystem shift in Something Is Happening in the WordPress World.
Try It
I have open-sourced WP Astro MCP under the MIT license because I believe this pattern should spread. The project is at github.com/vapvarun/wp-astro-mcp. If you have a WordPress site, try adding an Astro frontend with one command.
If you are a WordPress developer exploring headless architecture, an agency looking to offer Astro frontends as a service, or a developer curious about building MCP servers, take a look. And if you build your own MCP server inspired by this project, I would genuinely love to hear about it.