
Humanizing AI-Written Content: What a Rule Engine Actually Catches

I built a six-rule engine that scans my own blog drafts for AI-writing patterns. Three months in, here is what it catches that human editors miss, and why the rules are smaller than you would expect.

7 min read
Gagan Deep Singh

Founder | GLINR Studios


I have been writing on the internet for a while. Lately my drafts keep coming back from a self-review feeling a little plastic. The sentences read fine. The logic holds. The structure stays clean. But the prose feels machine-made rather than written, even when I wrote every word.

I built a rule engine. Not to generate content. To flag it.

Six rules, about a hundred lines of TypeScript, pluggable into the tool chain I use to publish. The engine runs before I hit submit on Dev.to or my own blog. Three months of using it has taught me something surprising: the rules that catch the most are not the ones I expected.

The rules, in order of how often they fire

1. Em dashes

This one fires more than any other. Not because em dashes are inherently bad but because the tools that generate prose love them. Open any model-written blog post from the last two years and count the em dashes. You will run out of fingers on one hand in the first paragraph.

I do not ban them entirely in my own writing, but I flag them so I have to look at each one. Most of the time the em dash can be a period or a comma without losing anything. When it cannot, I leave it in. The rule is not the rule. The awareness is.
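For concreteness, a rule like this can be a pure function from markdown to violations. A minimal sketch of how it might look; the `Violation` shape and the rule's field names are my assumptions, not the engine's actual code:

```typescript
// Hypothetical Violation shape; the real engine's fields may differ.
type Violation = { rule: string; message: string };

// Flag every em dash so each one gets a deliberate look.
export function emDashes(md: string): Violation[] {
  return [...md.matchAll(/\u2014/g)].map(m => ({
    rule: 'em-dash',
    message: `em dash near "${md.slice(Math.max(0, (m.index ?? 0) - 15), (m.index ?? 0) + 15)}"`,
  }));
}
```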

2. Banned words

My list is short: delve, landscape, leverage, utilize, game-changer, revolutionary.

I did not make this list. It leaked out of the research on detecting AI-written content. These six words are statistically over-represented in model output because training data rewarded them. You never need any of them. Utilize is a longer way to say use. Landscape is what someone types when they have not thought hard enough about what the thing actually does. Delve is a clerical verb that stands in for the simpler look at or dig into.

Striking these words forces a writer to commit to specifics. That alone has improved my drafting more than any other rule.
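This rule works naturally as a factory closed over the word list, which is why the snippet later calls `bannedWords([...])` rather than passing a bare function. A sketch under the same assumed `Violation` shape:

```typescript
type Violation = { rule: string; message: string };

// Factory: returns a rule closed over the word list.
// \b boundaries so "delve" fires but "delivered" does not.
export function bannedWords(words: string[]) {
  const pattern = new RegExp(`\\b(${words.join('|')})\\b`, 'gi');
  return (md: string): Violation[] =>
    [...md.matchAll(pattern)].map(m => ({
      rule: 'banned-word',
      message: `banned word: ${m[1].toLowerCase()}`,
    }));
}
```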

3. Cliche opener

The pattern: In today's rapidly evolving ... and its five cousins.

I have not written that opening in my life, but the humanizer still catches variants that slip in. "As the world of X continues to transform..." is the same opener with a different coat. If the opening sentence would work as the intro to a dozen different posts, the sentence is not actually an opening. It is filler.
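Catching the cousins means matching templates, not exact strings. A rough sketch; the specific pattern list is mine, not the author's:

```typescript
type Violation = { rule: string; message: string };

// Templates, not exact strings, so noun swaps still match.
const OPENERS: RegExp[] = [
  /^in today'?s (rapidly |ever[- ])?(evolving|changing)/i,
  /^as the world of .+ continues to/i,
  /^in the (age|era|world) of/i,
];

export function clicheOpener(md: string): Violation[] {
  const firstLine = md.trimStart().split('\n')[0] ?? '';
  return OPENERS.some(re => re.test(firstLine))
    ? [{ rule: 'cliche-opener', message: firstLine.slice(0, 80) }]
    : [];
}
```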

4. Bold-colon list items

The pattern:

- **Fast:** 10x faster.
- **Simple:** Zero config.
- **Cheap:** 90% less.

This is the AI tic that tipped me off first. Models love this formatting because it looks crisp, but it is lazy. Each bold-colon bullet is a sentence fragment standing in for a real sentence. The reader's eye skims and absorbs nothing.

When I rewrite one of these as prose, the content gets better. Half the time I discover the original bullet was bluffing and the prose version forces me to explain what I actually meant.
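Detecting the tic itself is a line-level regex over the markdown source. A sketch, assuming standard `**bold**` markdown with the colon either inside or outside the bold:

```typescript
type Violation = { rule: string; message: string };

// Matches bullets like "- **Fast:** 10x faster." or "- **Fast**: 10x faster."
const BOLD_COLON = /^\s*[-*]\s*\*\*[^*\n]+?(:\*\*|\*\*:)/;

export function boldColonList(md: string): Violation[] {
  return md
    .split('\n')
    .filter(line => BOLD_COLON.test(line))
    .map(line => ({ rule: 'bold-colon-list', message: line.trim() }));
}
```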

5. Future-looks-bright ending

The pattern: a post that ends with exciting times ahead or the future looks bright or some variant. The ending that says nothing because the writer ran out of conclusion and reached for a warm fade-out.

The fix: either a concrete next action ("try it yourself at X") or a genuine uncertainty ("I still do not know if this scales past a hundred users, but I will write an update when I do"). Both beat the warm fade.
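The check only needs to look at the final paragraph; a bright future mentioned mid-post is fine. A sketch, with fade phrases that are my guesses rather than the author's list:

```typescript
type Violation = { rule: string; message: string };

const WARM_FADES: RegExp[] = [
  /the future (looks|is) bright/i,
  /exciting times (are )?ahead/i,
  /the possibilities are endless/i,
];

// Only the last paragraph counts as the ending.
export function brightFutureEnding(md: string): Violation[] {
  const paragraphs = md.trim().split(/\n{2,}/);
  const last = paragraphs[paragraphs.length - 1] ?? '';
  return WARM_FADES.some(re => re.test(last))
    ? [{ rule: 'bright-future-ending', message: last.slice(0, 80) }]
    : [];
}
```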

6. Monotone rhythm

This is the rule I almost did not include and am glad I did. It measures the standard deviation of sentence lengths across a piece. If every sentence is roughly the same length, the piece reads robotic even when the words are fine.

Human writing breathes. Short sentences punch. Long sentences explain. The mix is what makes prose feel alive. The monotone-rhythm rule flags pieces where the mix is missing, and it catches things no other rule does.
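In code, the rhythm check reduces to a standard deviation over per-sentence word counts. A sketch; the threshold and minimum sentence count are my guesses at tuning, not the author's values:

```typescript
type Violation = { rule: string; message: string };

// Guessed tuning: skip short pieces, flag stdev below 4 words.
const MIN_SENTENCES = 5;
const MIN_STDEV = 4;

export function monotoneRhythm(md: string): Violation[] {
  const lengths = md
    .split(/[.!?]+\s+/)
    .map(s => s.trim())
    .filter(Boolean)
    .map(s => s.split(/\s+/).length);
  if (lengths.length < MIN_SENTENCES) return [];
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  const stdev = Math.sqrt(variance);
  return stdev < MIN_STDEV
    ? [{ rule: 'monotone-rhythm', message: `sentence-length stdev ${stdev.toFixed(1)}` }]
    : [];
}
```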

What a rule engine cannot catch

The rules are aware of patterns. They are not aware of meaning. Three things I have had to stay on top of manually:

False confidence. A draft can clear every rule and still read wrong. The engine does not know if your facts check out or your argument holds. The engine only knows your rhythm flows well and you avoided the usual six.

Genuine voice vs. trained voice. The rules can flag AI tics. They cannot tell the difference between "this writer is a natural AI-sounding writer" and "this writer used a model to generate the draft." I have caught myself on a few edits where my own instinct produced a cliche the rule correctly flagged. The rule was right. My draft was AI-sounding even though I wrote it.

Topic-specific jargon. A blog post about Kubernetes has different rhythm constraints than a blog post about gardening. The rules are genre-agnostic. That is a feature, not a bug, but it means the threshold is tuned for technical prose and might misfire on poetry.

Why I run it on my own drafts

The obvious use case is catching AI-generated content from other sources. That is not why I built it.

I built it because the AI era has raised the floor on writing clarity. Every model can produce technically correct, logically structured, grammatically clean prose. Which means human writing that wants to stand out has to do something the machine cannot do yet: feel like a specific person wrote it.

The humanizer is not a shield against AI. It is a discipline for the human. When my own draft gets flagged, the flag is teaching me something about which patterns I have been picking up from reading too much model output. Each violation is a little reminder that I got lazy in a way that coincides with how a model would have been lazy, and that I can do better.

Three months in, my drafts clear the rules on the first pass more often than they did at the start. Not because the rules got easier. Because I started writing with them in mind.

The rule engine, minimal version

I am not going to paste the whole thing, but the shape is simple. One hundred lines, six rules, pure functions, no dependencies beyond JavaScript's built-in regex engine.

type Violation = { rule: string; message: string };
type Rule = (md: string) => Violation[];

const rules: Rule[] = [
  emDashes,
  bannedWords(['delve', 'landscape', 'leverage', 'utilize', 'game-changer', 'revolutionary']),
  clicheOpener,
  boldColonList,
  brightFutureEnding,
  monotoneRhythm,
];

export function checkRules(md: string): Violation[] {
  return rules.flatMap(r => r(md));
}

Each rule is small enough to hold in your head. That is the point. If a rule is too complex to explain in a paragraph, it is too complex to be a rule. It should be a conversation with a human editor instead.

The unreasonable rule

If I had to keep only one rule it would be the monotone rhythm check. Everything else the reader might forgive on a single post. A piece that reads at one cadence all the way through cannot hide. The reader's ear is calibrated for variance. Remove the variance and the reader disengages by the second paragraph, even if they do not know why.

The other five rules catch specific tics. This one catches the absence of a specific thing. Both are useful. The absence-catcher is the one that moves the needle most.

What I would build next

A voice profile. Right now the humanizer flags patterns against a universal baseline. It does not know what I specifically sound like. If I fed it my last twenty posts and let it compute my rhythm, my typical sentence length, my idioms, it could flag when a draft drifts away from my voice specifically. That would catch the cases where a draft is technically-correct AI-sounding prose that still clears all six rules.
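The baseline computation itself would be cheap. One way it could be bootstrapped; this is entirely hypothetical, since nothing like it exists in the engine yet:

```typescript
// Hypothetical voice profile: mean and spread of sentence lengths
// computed from past posts, used to flag drafts that drift.
type VoiceProfile = { meanLen: number; stdevLen: number };

export function buildProfile(posts: string[]): VoiceProfile {
  const lengths = posts
    .flatMap(p => p.split(/[.!?]+\s+/))
    .map(s => s.trim())
    .filter(Boolean)
    .map(s => s.split(/\s+/).length);
  const meanLen = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - meanLen) ** 2, 0) / lengths.length;
  return { meanLen, stdevLen: Math.sqrt(variance) };
}

// Flag a draft whose average sentence length drifts more than
// two profile stdevs from the writer's baseline.
export function driftsFromVoice(draft: string, profile: VoiceProfile): boolean {
  const lengths = draft
    .split(/[.!?]+\s+/)
    .map(s => s.trim())
    .filter(Boolean)
    .map(s => s.split(/\s+/).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  return Math.abs(mean - profile.meanLen) > 2 * profile.stdevLen;
}
```

Sentence length is only one axis of voice; idioms and punctuation habits would need their own features. But even this one-axis version would catch drafts that clear all six rules while sounding like nobody in particular.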

Not building it this quarter. Writing this down so I remember it exists.

If you write anything on the internet that has to sound like you, consider running something similar on your own drafts. The tool does not replace editing. It flags the places where editing would help most. That is enough.

