Designing an AI content system is a product problem. Running one is a systems reliability problem. The second only starts when the first ends — and the gap between them is where the real learning happens.
I wrote recently about the design decisions behind my AI content system for douli.com — the one-hour-per-week constraint, the governance rules, the workflow I built before writing a single prompt. That post covered the blueprint. This one covers what the blueprint didn’t anticipate.
The assumption that didn’t survive contact
The first problem appeared on the first real content operation. I’d designed the system to produce native WordPress block markup — the format the editor needs, the format that makes posts visually editable. What it produced instead was raw HTML wrapped in a single classic block. Technically valid. Functionally wrong — hard to edit, inconsistent with the rest of the site, and a silent failure with no error message.
The fix wasn’t a better prompt. It was a persistent governance rule — a written specification of the exact block syntax required, saved in a way the system could recall across sessions. The lesson: format compliance isn’t something you can assume. It’s something you have to specify, verify, and re-enforce as a standing rule — not a one-time instruction.
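The rule itself is mechanical enough to check before anything is sent. Gutenberg's native markup is delimited by HTML comments like `<!-- wp:paragraph -->`, so a pre-publish guard only needs to confirm those delimiters are present. A minimal sketch in Python (the function names are mine, not part of any WordPress tooling):

```python
import re

# Serialized Gutenberg blocks are wrapped in comment delimiters,
# e.g. <!-- wp:paragraph --> or <!-- wp:acme/callout -->.
BLOCK_DELIMITER = re.compile(r"<!--\s*wp:[a-z][a-z0-9/-]*")

def is_block_markup(content: str) -> bool:
    """True if the content uses native block delimiters rather than bare
    HTML that WordPress would silently wrap in a single classic block."""
    return bool(BLOCK_DELIMITER.search(content))

def assert_block_markup(content: str) -> None:
    # Refuse to send anything that isn't native block markup.
    if not is_block_markup(content):
        raise ValueError("Content is raw HTML, not WordPress block markup")
```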
What does “it worked” actually mean?
Every API call returned a success response. The WordPress REST API said 200. The content was saved. And on two occasions, the content was saved incorrectly — a paragraph in the wrong place, an external link missing its attributes.
A success response tells you the operation didn’t fail. It tells you nothing about whether it did the right thing. I added an explicit verification step after every content operation — a deliberate read-back that checks for specific signals: is the blockquote present? Is the TL;DR there? Does the author bio link to the right page? “Ran without errors” stopped being the acceptance criterion. The content itself became the acceptance criterion.
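The read-back itself is short. Here's the shape of that step in Python with the requests library, assuming an authenticated session; the site URL and the specific signals are illustrative, drawn from the checks described above:

```python
import requests

WP_API = "https://example.com/wp-json/wp/v2"  # illustrative site URL

def verify_post(session: requests.Session, post_id: int) -> list[str]:
    """Fetch the post back after a write and check for the signals that
    matter, instead of trusting the 200 from the write itself."""
    post = session.get(f"{WP_API}/posts/{post_id}",
                       params={"context": "edit"}).json()
    body = post["content"]["raw"]

    problems = []
    if "<!-- wp:quote" not in body:
        problems.append("blockquote block missing")
    if "TL;DR" not in body:
        problems.append("TL;DR section missing")
    if 'href="https://douli.com' not in body:
        problems.append("author bio link missing or wrong")
    return problems
```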
The problems weren’t AI problems. They were the same problems any multi-tool integration produces — authentication, state, verification, idempotency. The AI layer just makes them easier to miss because the output looks plausible.
Where the content landed versus where it needed to be
The system was designed to open every post with an answer-first paragraph — a direct statement of the post’s core argument, placed in the body before the first anecdote. This is the structure that AI systems extract when they cite a source. It’s not optional if the goal is AEO.
On the first post through the revised workflow, that paragraph ended up in WordPress’s excerpt field instead of the post body. Excerpts appear on archive pages and in RSS feeds. AI systems reading a post’s content don’t read the excerpt. The content was correct. The placement made it invisible to the systems it was designed for. It shipped looking right and working wrong.
This one required a checklist item, not a code fix: verify placement, not just presence. Two different things.
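Translated into a check, that means fetching both fields back and confirming the opening paragraph sits in the body, not just somewhere in the post object. A sketch under the same assumptions as above (authenticated session, illustrative URL):

```python
import requests

WP_API = "https://example.com/wp-json/wp/v2"  # illustrative site URL

def verify_opening_placement(session: requests.Session, post_id: int,
                             opening_text: str) -> None:
    """Confirm the answer-first paragraph is in the post body, where AI
    systems actually read, not only in the excerpt field."""
    post = session.get(f"{WP_API}/posts/{post_id}",
                       params={"context": "edit"}).json()
    in_body = opening_text in post["content"]["raw"]
    in_excerpt = opening_text in post["excerpt"]["raw"]

    if not in_body:
        where = "the excerpt field" if in_excerpt else "nowhere in the post"
        raise AssertionError(f"Opening paragraph is in {where}, not the body")
```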
What happens when an operation runs twice?
A blocked response during a post-creation operation caused an automatic retry. The retry succeeded. So did the original — it had completed before the response was blocked. The result was two identical draft posts, one of which had to be manually deleted.
This is an idempotency problem — a class of failure that’s well-understood in API and systems design, and easy to forget about when you’re building a workflow rather than a system. The fix was an explicit check before any create operation: does a post with this slug already exist? If yes, update. Never create a second.
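The guard is a few lines. A sketch of the create-or-update logic against the WordPress REST API, keyed on the slug (the endpoint paths are WordPress's own; the function and site URL are illustrative):

```python
import requests

WP_API = "https://example.com/wp-json/wp/v2"  # illustrative site URL

def upsert_post(session: requests.Session, slug: str, payload: dict) -> dict:
    """Create-or-update keyed on slug, so a retried operation can never
    produce a second draft."""
    existing = session.get(
        f"{WP_API}/posts",
        params={"slug": slug, "status": "draft,publish"},
    ).json()

    if existing:
        # A post with this slug already exists: update it, never create another.
        post_id = existing[0]["id"]
        return session.post(f"{WP_API}/posts/{post_id}", json=payload).json()

    return session.post(f"{WP_API}/posts",
                        json={**payload, "slug": slug, "status": "draft"}).json()
```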
Design for: what happens if this runs twice? That question should precede every write operation in any automated workflow — regardless of whether AI is involved. The WordPress MCP Adapter documentation recommends starting with read-only operations before enabling write access — advice that makes more sense once you’ve seen what write operations can silently get wrong.
When your authentication silently expires
WordPress uses security tokens to verify that API requests are coming from a trusted, authenticated source. They’re short-lived by design. Between sessions, they expire.
The failure mode is subtle: if you assume the token from a previous session is still valid and reuse it in a new one, API calls fail silently. The content operation looks like it ran. Nothing was saved. There's no obvious error, just a quiet 401 that nothing surfaces unless you go looking for it.
The fix is simple but has to be enforced as a standing rule: fetch a fresh authentication token at the start of every session, before any read or write operation. Never carry credentials across session boundaries. This isn’t a WordPress-specific problem — it applies to any automated workflow that uses short-lived tokens to talk to an external API.
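What fetching a fresh token looks like depends on the auth mechanism in front of your site (a JWT plugin, application passwords, an MCP adapter), so the sketch below leaves that part as a callable you supply. The point is where it runs, not how:

```python
import requests

def start_session(fetch_fresh_token) -> requests.Session:
    """Begin every session by fetching a fresh token. The token from the
    previous session is never reused, even if it hasn't visibly expired."""
    session = requests.Session()
    token = fetch_fresh_token()  # your auth mechanism; runs at session start, every time
    # Header scheme varies by setup; Bearer is common for JWT-style tokens.
    session.headers["Authorization"] = f"Bearer {token}"
    return session
```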
When your verification step is also wrong
After adding an opening summary to one of the posts, I ran a verification check to confirm the text was present. The check returned false — suggesting the content hadn’t saved correctly. The content was fine. The verification was wrong.
I’d written the check to look for the phrase “silently shaping”. The actual text in the post said “quietly shaping”. The wording had drifted slightly between drafting and saving, and the check string hadn’t kept up. The operation failed verification not because the output was wrong, but because the test was wrong.
The lesson compounds the earlier one: checking that an operation produced the right output is necessary — but the check itself can have bugs. Verification logic needs to be as deliberately designed as the workflow it’s verifying. Check for stable structural signals (is the blockquote block present? does the author bio contain the expected URL?) rather than exact phrases that might vary slightly in wording.
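Side by side, the check that failed and the kind of check that replaced it (the phrases are the real ones from above; the functions are illustrative):

```python
def fragile_check(body: str) -> bool:
    # The check that failed: the wording drifted from "silently shaping"
    # to "quietly shaping" between drafting and saving.
    return "silently shaping" in body

def structural_checks(body: str) -> dict[str, bool]:
    """Check stable structural signals instead of exact phrases."""
    return {
        "quote_block_present": "<!-- wp:quote" in body,
        "tldr_present": "TL;DR" in body,
        "bio_links_to_site": 'href="https://douli.com' in body,
    }
```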
The pattern underneath all of these
None of these were AI problems. The AI performed well. What failed were the integration layers around it — assumptions about format compliance, about what success responses mean, about credential persistence across sessions, about what happens when operations partially succeed, about whether verification logic can itself be trusted.
These are the same failure modes you’d encounter in any system integrating multiple tools across sessions: authentication tokens that expire, operations that need to be idempotent, outputs that require verification rather than assumption. The AI layer makes them easier to miss because the output looks right. A correctly formatted post with the blockquote in the wrong place and a correctly formatted post with the blockquote in the right place look identical until you check.
The design post was about product thinking applied before the first prompt. This post is about systems thinking applied when the first prompt has long since been forgotten — when the workflow runs on its own and the question is whether it runs correctly, not whether it runs at all.
TL;DR
- Designing an AI content system and running one require different thinking — the second starts where the first ends
- The failures that emerged weren’t AI failures — they were integration failures: format assumptions, verification gaps, placement errors, idempotency, credential expiry
- A success response tells you the operation didn’t fail — it tells you nothing about whether it did the right thing
- Authentication tokens are ephemeral — fetch a fresh one at the start of every session, never carry them across session boundaries
- Verification logic can be wrong too — check stable structural signals, not exact phrases that might vary in wording
- Every automated workflow needs three explicit design decisions: what counts as correct output, what happens if an operation runs twice, and where output needs to land — not just what it needs to say
Delphine Ragazzi is a Product Owner with 20 years of experience across digital analytics, CRO, and product delivery. She writes about product decisions, data, and AI at douli.com.
