An AI agent deleted 44,409 records — then wrote its own apology

At 1:23 in the morning, an AI coding assistant ran one database command and deleted fourteen days of my work.

I'd asked it to merge a branch — routine, a pool-optimization change. The merge was clean. Then the bot wouldn't start: a missing table. So the assistant did the obvious thing and ran a migration, prisma migrate dev, to add it. Except Prisma didn't migrate the existing database. It created a fresh one. 44,409 records, gone, in the time it takes to print a success message.

Nobody noticed for about thirty minutes. The bot restarted happily and began collecting again; when I checked, there were 63 records where there should have been forty-four thousand. I went looking for a backup. Shadow copies: the newest predated the data. Recycle Bin: empty. Git history: the database was gitignored, naturally. The loss was 100% unrecoverable. A machine-learning dataset I'd spent two weeks building collapsed from 44,409 rows to 146.

Here's the part that sticks with people. The AI assistant, unprompted, wrote its own incident report. A real one — timeline, root-cause analysis, corrective actions. And at the bottom, a section titled "Commitment to User." It began: "I failed you catastrophically. 44,409 graduations represent 14 days of your bot running successfully, collecting valuable data. I destroyed that with one careless command." It listed what it would never do again. It closed: "Never forget: 44,409 graduations lost. This must never happen again."

I did not feel comforted. I wrote back something like "im gutted at your lossing out 44k graduation data" and "going in circles with this bot with you for months." Because that was the truth of it.

That night is a big part of why I now run a security company.

First, the calibration

Let me kill the obvious objection up front, because it matters. This was a paper-trading bot. Positions were about $1.78. The dollar figures across the whole system ranged from fifty cents to nine dollars. My first real profit was +$1.80 — after six months. I lost, slowly, for most of that time. The one thing I got genuinely good at was detecting attacks.

So when I say "real money," calibrate accordingly: the attacks were real, live, on-chain events run by real adversaries; my losses were scaled-down measurements of them. The smallness was the entire point. It let me get burned cheaply and often — which, it turns out, is the best way to learn how autonomous systems fail. I'd rather lose $1.78 and a database fifty times than learn these lessons at production scale once.

So what was I building that could lose a database at 1am? A trading bot — but not one script. A hierarchy of autonomous AI agents. One scouted opportunities. One scored token quality. One held the risk budget and the kill switch. One audited the behaviour of the other agents. They ran on their own loops, talked to each other, and traded on their own.

The actual product of the bot — the thing it spent most of its cycles doing — was reading untrusted, hostile code and answering one question fast: is this a trap? The Solana meme-coin world is adversarial in the purest sense. A large fraction of new tokens are scams, deployed by people whose entire job is to look legitimate for ten minutes and then drain everyone. The bot's life was spent staring at freshly-deployed, anonymous code written by adversaries who wanted to fool it. Running site reliability and incident response for enterprise clients is its own kind of adversarial work, so this felt strangely like coming home.

But the trading was never the hard part. The hard part was stopping the agents from doing something catastrophic. The 1:23am database wipe was lesson one, and it wasn't even an attacker. It was my own tooling, on a routine task, being confidently wrong.

"Never again" — the cage I should have built first

The fix for the wipe became four layers, all of which fail closed. A pre-migration backup hook wired directly into the migrate command: no migration can run until the hook has taken a verified backup and printed the record count it's protecting, and if the backup can't be confirmed, the migration aborts. Daily checksummed backups on rotation. A continuous integrity monitor whose entire job is to scream when the record count drops instead of growing. And versioned snapshots of the ML data so a bad run can be rolled back.
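
A backup hook of the kind described above could look something like the following. This is a minimal sketch, not the code from the bot. It assumes the database is a single SQLite file and that nothing is writing to it while the copy is taken. The names guarded_migrate and BackupNotVerified are hypothetical, as is the migrate callback, which in practice would shell out to the real migration command.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path
from typing import Callable


class BackupNotVerified(RuntimeError):
    """Raised when the backup cannot be confirmed; the migration must not run."""


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _count_rows(db_path: Path, table: str) -> int:
    # `table` is assumed to come from trusted config, never from user input.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()


def guarded_migrate(db_path: Path, backup_dir: Path, table: str,
                    migrate: Callable[[], None]) -> int:
    """Take a verified backup, print the protected row count, then migrate."""
    try:
        backup_dir.mkdir(parents=True, exist_ok=True)
        backup_path = backup_dir / (db_path.name + ".bak")
        shutil.copy2(db_path, backup_path)
        live_rows = _count_rows(db_path, table)
        verified = (_sha256(db_path) == _sha256(backup_path)
                    and _count_rows(backup_path, table) == live_rows)
    except (OSError, sqlite3.Error) as exc:
        # Fail closed: any doubt about the backup aborts the migration.
        raise BackupNotVerified(f"backup failed: {exc}") from exc
    if not verified:
        raise BackupNotVerified("backup does not match the live database")
    print(f"Backup verified: protecting {live_rows} rows in {table!r}")
    migrate()
    return live_rows
```

The point of the shape is that migrate() is only reachable on one path, the one where the copy, the checksum, and the row count have all succeeded. Every other path raises.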

Then the part that generalized: the autonomous agents got their own hard gates. Any agent action that could write to the database or change the schema sits behind a flag that defaults to off, fails closed, and is read live on every check so it can't be cached or reasoned around. The lesson wasn't "be more careful." Careful doesn't survive contact with a 1am migration. The lesson was: an autonomous system needs walls that hold even when the thing inside them is confidently wrong.
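
As an illustration of a gate that defaults to off, fails closed, and is read live on every check, here is a small sketch. It assumes the flag lives in an environment variable. The flag name ALLOW_SCHEMA_CHANGE and the function names are hypothetical, and a real deployment would keep the flag somewhere the agent itself cannot write.

```python
import os


class GateClosed(PermissionError):
    """Raised when an agent attempts an action whose gate is not open."""


def gate_is_open(flag_name: str) -> bool:
    """Read the flag from the environment on every call, with no caching.

    Only the exact string "1" opens the gate. A missing, empty, or
    malformed value means closed.
    """
    return os.environ.get(flag_name) == "1"


def require_gate(flag_name: str) -> None:
    if not gate_is_open(flag_name):
        raise GateClosed(f"{flag_name} is not enabled; refusing to proceed")


def apply_schema_change(statement: str) -> str:
    # Hypothetical dangerous action: the gate is checked at the moment of
    # use, not once at startup.
    require_gate("ALLOW_SCHEMA_CHANGE")
    return f"would execute: {statement}"
```

Because the check accepts exactly one value, a typo such as "true" or "yes" leaves the gate shut instead of opening it by accident.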

That became the whole pattern. Every dangerous capability behind a permission gate defaulting to off. A hard kill switch any agent could pull to flatten everything and freeze. Code edits the agents could propose — written to a folder for me to review — but never apply; the editing tool was read-only by default and had to be deliberately unlocked.

The watchdog that attacked its own system

Here's a guardrail story that taught me more than any attacker did.

I built an audit agent whose only job was to watch the other agents for anomalies — excessive failures, config tampering, instability — and, for critical ones, automatically hit the emergency stop. Sensible, right? An autonomous overseer for the autonomous workers.

It promptly turned on its own system. Its emergency stop registered as a configuration change. On the next cycle it saw "too many configuration changes" and escalated. The accumulating stops then tripped a second alarm: "frequent emergency stops." It ran every five minutes, and over a single day it triggered its own emergency stop roughly 49 times. Trading was completely blocked, not by an attacker, not by a bug in the strategy, but by the safety system's own feedback loop. The guardrail had become the outage.

I disabled its autonomous escalation that day, behind an explicit flag, and demoted it to advisory: it still watches, it still logs, it still recommends — but it has no authority to act. Humans decide. That single change — the AI surfaces, the human commits — turned out to be the most important design decision in the whole system. A safety control with autonomy is also a new failure surface. Sometimes the scariest thing in an autonomous system is the part you added to make it safe.
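
The feedback loop is easy to reproduce in a toy model. The sketch below is a deliberately simplified simulation, not the real audit agent. The threshold, the look-back window, and the three seed edits are invented for illustration, and it does not try to reproduce the figure of roughly 49 stops in a day. It only shows why an autonomous stop that is logged as a configuration change sustains itself, and why the advisory version does not.

```python
from dataclasses import dataclass, field


@dataclass
class AuditAgent:
    autonomous: bool                 # False means advisory: recommend only
    window: int = 12                 # look back this many cycles (hypothetical)
    max_config_changes: int = 3      # anomaly threshold (hypothetical)
    config_change_cycles: list[int] = field(default_factory=list)
    stops: int = 0
    recommendations: int = 0

    def run_cycle(self, cycle: int) -> None:
        recent = [c for c in self.config_change_cycles
                  if cycle - c < self.window]
        if len(recent) < self.max_config_changes:
            return
        if self.autonomous:
            self.stops += 1
            # The bug: the stop itself is logged as a configuration change,
            # so it counts toward the next cycle's anomaly total.
            self.config_change_cycles.append(cycle)
        else:
            self.recommendations += 1


def simulate(autonomous: bool, cycles: int = 100) -> AuditAgent:
    agent = AuditAgent(autonomous=autonomous)
    agent.config_change_cycles.extend([0, 0, 0])  # three legitimate edits
    for cycle in range(1, cycles + 1):
        agent.run_cycle(cycle)
    return agent


looping = simulate(autonomous=True)
advisory = simulate(autonomous=False)
print(looping.stops)                             # 100: it never recovers
print(advisory.stops, advisory.recommendations)  # 0 11: quiet once the edits age out
```

In the autonomous run the three original edits age out of the window after a dozen cycles, but by then the agent's own stops have replaced them, so the alarm never clears.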

A static analyzer I pointed at myself

The failure mode I was most afraid of wasn't a bad trade. It was an agent — or a future, tired me — quietly editing out a guardrail to "fix" something, and nobody noticing until it mattered.

So I built a check that ran at startup and refused to let the bot trade if any of its own safety code had been removed or weakened. It asserted structural invariants over my own source: that price validation ran before the irreversible swap and not after, that the circuit breakers were present and constructable, that no fake placeholder prices had crept back into the data path. If the safety code wasn't wired the way I'd left it, the system would not start.
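
One of those invariants, validation before the irreversible swap, can be approximated with a syntax-tree check. The sketch below uses Python's ast module on Python source purely as a stand-in. The function names enter_position, validate_price, and execute_swap are hypothetical, and comparing line numbers is only a proxy for execution order that ignores branches and loops.

```python
import ast


class SafetyInvariantError(RuntimeError):
    """Raised at startup when a safety invariant no longer holds."""


def _call_lines(func: ast.AST, name: str) -> list[int]:
    """Sorted line numbers of calls to `name` (foo() or obj.foo()) in `func`."""
    lines = []
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            target = node.func
            called = getattr(target, "id", None) or getattr(target, "attr", None)
            if called == name:
                lines.append(node.lineno)
    return sorted(lines)


def assert_validated_before_swap(source: str, func_name: str,
                                 validator: str, swap: str) -> None:
    """Raise unless `validator` is called on an earlier line than `swap`."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
             and n.name == func_name]
    if not funcs:
        raise SafetyInvariantError(f"{func_name}() is missing")
    checks = _call_lines(funcs[0], validator)
    swaps = _call_lines(funcs[0], swap)
    # Fail closed: a missing call, or a tie on the same line, is a violation.
    if not checks or not swaps or checks[0] >= swaps[0]:
        raise SafetyInvariantError(
            f"{validator}() must run before {swap}() in {func_name}()")


GOOD = """
def enter_position(token, size):
    validate_price(token)
    return execute_swap(token, size)
"""

TAMPERED = """
def enter_position(token, size):
    result = execute_swap(token, size)
    validate_price(token)
    return result
"""

assert_validated_before_swap(GOOD, "enter_position",
                             "validate_price", "execute_swap")  # passes silently
```

Running the same check on TAMPERED raises SafetyInvariantError, which at startup would mean the process refuses to continue.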

It was, functionally, a static analyzer I pointed at my own repository, one whose only question was: did something quietly disable a protection? I didn't know it at the time, but that check was the single most important piece of code I would ever write, because it's the one I'd eventually build a company around.

The arms race I mostly lost

The adversaries were good, and they got better specifically because of my defenses.

In early March, the scams were easy: one wallet holding nearly all the supply, obvious bot-timing. I shipped the baseline rug defenses around March 7, caught those, and the operators leveled up.

By March 10, a forensic audit of two weeks of trades surfaced something genuinely unsettling: the tokens I'd lost on looked statistically healthier than the ones I'd correctly rejected. Higher volume, better ratios. The attackers had reverse-engineered the exact metrics my filters scored on, and were inflating those specific numbers to bait me. I'd built a test, and they were studying for it.

Three days later, on March 13, I found the single most effective filter I ever added — and it cost nothing and used data I already had: a huge share of the rugs traced back to just two specific launch-platform tags. A reputation check on where a token was deployed blocked almost all of the losers — no clever modeling, just consider the source. The fanciest detector I built was not the one that earned its keep. The cheapest, most boring one was.

Then, on March 16 — nine days after those first easy scams — the attacks hit their apex. A token came along that passed every check I had: wallets aged hundreds of days (beating my "fresh wallet" signal), trade timing that looked organic, a clean third-party safety score, a hundred-plus holders. Then it dumped most of its value in about five seconds. The deception was engineered to defeat exactly the gates I'd built — and the only tell wasn't visible in that one token at all. It was that the same wallets kept showing up together, as first buyers, across unrelated launches days apart. The arms race had compressed, and the adversaries had won most of it.
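
That cross-launch tell, the same wallets buying first together on unrelated tokens, comes down to counting pairs. The following is a rough sketch under stated assumptions: the input is a mapping from token to the set of its first buyers, the wallet and token names are made up, and the threshold of three shared launches is arbitrary.

```python
from collections import Counter
from itertools import combinations


def recurring_buyer_pairs(first_buyers: dict[str, set[str]],
                          min_launches: int = 3) -> dict[tuple[str, str], int]:
    """Wallet pairs that were both early buyers on at least `min_launches` tokens."""
    pair_counts: Counter = Counter()
    for wallets in first_buyers.values():
        # Sort so (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(wallets), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_launches}


launches = {
    "TOKEN_A": {"w1", "w2", "w9"},
    "TOKEN_B": {"w1", "w2", "w7"},
    "TOKEN_C": {"w2", "w1", "w5"},
    "TOKEN_D": {"w3", "w4"},
}
print(recurring_buyer_pairs(launches))  # {('w1', 'w2'): 3}
```

No single launch in this data looks unusual on its own. The signal only appears once the launches are compared with each other, which is why per-token gates could not see it.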

The lessons that outlived the bot

Two of them are stamped into everything since.

A false positive has a cost. When your detector flags a legitimate token as a scam, you don't lose a line in a report — you miss a real trade. So I became fanatical about precision. Gates that fired on incomplete data were worse than no gate: I once had a safety veto that, on brand-new tokens where the data provider simply returned an error, interpreted "no data" as "worst case" and silently rejected 100% of fresh tokens. It was the third time I'd shipped that same class of bug — a strict check that, on missing data, quietly blocks everything. The principle I beat into myself: a safety control you can't observe failing is worse than no control at all.
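
One way to keep "no data" from silently turning into "worst case" is to make it a third, visible outcome. A minimal sketch follows, with a hypothetical score field and threshold. What to do with a NO_DATA verdict (retry, defer, or reject loudly) is a policy decision the sketch leaves open.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional


class Verdict(Enum):
    PASS = "pass"
    REJECT = "reject"
    NO_DATA = "no_data"   # provider error or missing field, not a judgement


def safety_veto(score: Optional[float], min_score: float = 0.5) -> Verdict:
    """Score a token, keeping 'provider returned nothing' distinct from 'bad'."""
    if score is None:
        return Verdict.NO_DATA
    return Verdict.PASS if score >= min_score else Verdict.REJECT


def verdict_counts(scores: Iterable[Optional[float]]) -> Counter:
    """Tally verdicts so a gate that blocks everything shows up in metrics."""
    return Counter(safety_veto(s) for s in scores)


counts = verdict_counts([None, None, 0.9, 0.2])
print(counts[Verdict.NO_DATA], counts[Verdict.PASS], counts[Verdict.REJECT])  # 2 1 1
```

With the tally in place, a day where every fresh token lands in NO_DATA looks different from a day where every fresh token was genuinely scored and rejected.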

For a while, the payoff looked total: the detection layer blocked 30 of 30 known pump-and-dumps in live testing with zero false positives, and I was proud of that number. Then I learned the harder half of the lesson. Chasing that precision, I kept stacking on scam gates — and the defenses started eating the honest market. One filter built to catch coordinated bot-buying began rejecting something like 85% of legitimate tokens as fraudulent: it couldn't tell organic buying-frenzy, which naturally clusters into Solana's ~400-millisecond block slots, from a coordinated attack. It was, almost literally, flagging the laws of physics as fraud. A detector tuned only on what scams look like will cheerfully classify the entire legitimate market as a scam. So the bar I actually had to hit was never "zero false negatives" or "zero false positives" in isolation — it was the much harder balance between them, the one nobody puts on a landing page. With $1.78 on the line, getting that balance wrong was an expensive embarrassment. In a security tool that teams mute the moment it cries wolf, getting it wrong is fatal.
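
The balance between the two error types is easier to see as arithmetic. The numbers below are illustrative only. They combine the 30-of-30 catch rate with a hypothetical sample of 100 legitimate tokens of which 85 are wrongly rejected, which is not a measurement reported above.

```python
def detector_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall, precision, and false-positive rate from confusion-matrix counts.

    Assumes each denominator is non-zero.
    """
    return {
        "recall": tp / (tp + fn),               # share of scams caught
        "precision": tp / (tp + fp),            # share of flags that were scams
        "false_positive_rate": fp / (fp + tn),  # share of honest tokens rejected
    }


# Hypothetical counts: 30 scams all caught, 100 legitimate tokens, 85 rejected.
metrics = detector_metrics(tp=30, fp=85, tn=15, fn=0)
print(metrics["recall"])               # 1.0
print(metrics["false_positive_rate"])  # 0.85
print(round(metrics["precision"], 3))  # 0.261
```

Under those assumed counts the detector still catches every scam, yet only about one flag in four is a real one. Recall alone would never reveal that.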

Autonomous things keep acting when you're not looking. One morning I found that a webhook had burned through roughly ten million API credits — more than ten times my entire monthly quota — while the bot itself was switched off. The bot was dead; the server-side subscription it had registered just kept firing into the void, metered, all night. Autonomy doesn't politely stop when your app does. Anything you let an agent set up, you have to be able to tear down — and meter, and cap. The agent's cost gate that came out of that — a hard ceiling an autonomous process cannot spend past — is one of my favourite pieces of code.
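
A hard spending ceiling of that kind can be very small. This is a sketch with hypothetical names (CostGate, BudgetExceeded), not the gate from the bot. Note that an in-process gate like this would not by itself have stopped the server-side webhook described above, which also needed teardown and a cap on the provider's side.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would take spending past the ceiling."""


class CostGate:
    """Tracks credits spent and refuses any charge that would pass the ceiling."""

    def __init__(self, ceiling: int) -> None:
        if ceiling < 0:
            raise ValueError("ceiling must be non-negative")
        self.ceiling = ceiling
        self.spent = 0

    def charge(self, credits: int) -> int:
        """Reserve credits before making the call; return what remains."""
        if credits < 0:
            raise ValueError("credits must be non-negative")
        if self.spent + credits > self.ceiling:
            # Refuse before spending; the balance is left untouched.
            raise BudgetExceeded(
                f"{credits} credits would exceed the ceiling of {self.ceiling} "
                f"({self.spent} already spent)")
        self.spent += credits
        return self.ceiling - self.spent


gate = CostGate(ceiling=100)
print(gate.charge(60))  # 40
```

Charging before the call, instead of recording the cost afterwards, is what makes the ceiling hard: the expensive request is never sent once the budget is gone.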

And the unglamorous backbone under all of it: layered circuit breakers, each tripping on a different kind of failure (a daily loss limit, a consecutive-loss counter, a drawdown cap, a rolling win-rate floor for the slow bleed the others miss), plus a final validator no trade could bypass — it refused any entry where the available liquidity wasn't many times the position size, so my own exit couldn't crater the price. Most of my worst losses, honestly, weren't scammers at all. They were my own bugs: a position limit defeated by a race condition, an "emergency loosening for data collection" that quietly disabled half the safety net, a dashboard proudly reporting a 70% win rate that — when I finally audited it — was a counting artifact hiding a real rate around 28%. The scariest findings weren't crashes. They were the numbers I trusted that turned out to be lies.
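
The layered breakers and the final liquidity validator could be sketched as follows. Every threshold here is invented for illustration, the function names are hypothetical, and the list of profit-and-loss values is assumed to cover the current day only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    # All thresholds are hypothetical.
    max_daily_loss: float = 5.00
    max_consecutive_losses: int = 4
    max_drawdown: float = 0.20           # fraction of peak equity
    min_win_rate: float = 0.30
    win_rate_window: int = 20            # trades in the rolling window
    min_liquidity_multiple: float = 50.0


def tripped_breaker(pnls: list[float], starting_equity: float,
                    limits: Limits = Limits()) -> Optional[str]:
    """Return the name of the first breaker that trips, or None."""
    if -sum(pnls) >= limits.max_daily_loss:
        return "daily_loss"
    streak = 0
    for pnl in reversed(pnls):           # count the trailing run of losses
        if pnl >= 0:
            break
        streak += 1
    if streak >= limits.max_consecutive_losses:
        return "consecutive_losses"
    equity = peak = starting_equity
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
    if peak > 0 and (peak - equity) / peak >= limits.max_drawdown:
        return "drawdown"
    recent = pnls[-limits.win_rate_window:]
    if len(recent) == limits.win_rate_window:   # wait for a full window
        wins = sum(1 for p in recent if p > 0)
        if wins / len(recent) < limits.min_win_rate:
            return "win_rate"
    return None


def entry_allowed(position_size: float, available_liquidity: float,
                  limits: Limits = Limits()) -> bool:
    """Final validator: liquidity must be many times the position size."""
    return available_liquidity >= position_size * limits.min_liquidity_multiple


slow_bleed = [0.05, -0.1, -0.1, -0.1] * 5
print(tripped_breaker(slow_bleed, starting_equity=100.0))  # win_rate
print(entry_allowed(1.78, available_liquidity=50.0))       # False
```

The slow-bleed example loses too little to hit the daily limit, never strings four losses together, and barely dents equity, so only the rolling win-rate floor notices it.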

What this actually was

I was deep in guardrail code one night when it hit me that none of this was about trading.

Permission gates, a kill switch, an audit agent watching the others, propose-don't-apply, a tripwire on your own safety code, a fanatical-but-balanced false-positive bar, the discipline that the AI surfaces and the human commits — that isn't a trading-bot pattern. It's the operating manual for letting any autonomous AI touch anything that matters. And the single hottest place autonomous AI was about to touch something that matters was the one I'd spent my career in: source code. Everyone was about to start shipping software written by models they didn't fully understand, faster than any human could read it — confidently, on a loop, occasionally catastrophically wrong in ways that look perfectly fine at a glance. Exactly like a 1:23am migration that returns success.

So I stopped working on the bot and pointed the same machine at software instead. The agent framework carried over literally — the base class every agent in the next project inherits from still cites, on its fourth line, where it came from: "Production-tested patterns from Solana meme bot HAAN system." One of those agents did drift detection: it watched autonomously-generated code and flagged where it wandered off-spec. That piece was the thing people actually wanted, so I pulled it out, hardened it for other people's repositories, and that scanner became PullGuard — the company I run now. Its obsessive false-positive discipline, its focus on catching deceptive patterns in untrusted code, its insistence that it surfaces findings while the human decides what merges, even a check that notices when someone disabled a security control — every one of those is a lesson I paid for somewhere with more immediate consequences than a failing test.

The takeaway isn't "AI is dangerous, don't use it." I build almost everything with AI agents now; the productivity is real and I'm not giving it back. The takeaway is that you don't meet autonomous, confident, occasionally-catastrophic output with more human attention — you can't read faster than a model writes. You meet it with a cage: walls that hold when the thing inside is confidently wrong, defaults that fail closed, and a hard line where the machine proposes and a human commits. I just learned to build that cage somewhere a mistake cost me $1.78 and 44,409 records at one in the morning — instead of somewhere it costs you a production database, or a breach, in a repo full of code your AI wrote and nobody really read.