

Fine-tuning your own AI doesn't cost $35,000. It cost us about $50.

George Pu



Two A100 graphics cards. Spinning quietly in a Google datacenter.

Five hours of training.

About $50 in compute.

That's what it cost us to fine-tune our own 4-billion-parameter AI model this week.

The base model went from 30% accuracy on the tasks we care about to 98%.

Read any article on fine-tuning costs and you'll see numbers between $5,000 and $35,000.

One blog called it a 'CFO conversation.'

Another listed 'hidden expenses' that could double your initial estimate.

A third quoted teams spending $3,000 to $10,000 'before hidden costs.'

None of that is true anymore.

It might have been true two years ago. Now it's a myth that survives because someone is profiting from you believing it.

What you actually spend

An A100 GPU on Google Cloud rents for about $3.67 an hour. We used two of them.

The actual training - the part where the model learns - ran for 5 hours.

That's about $37 of compute. Round it to $50 if you count all the test runs, the failed experiments, the time the GPUs sat idle while we debugged.
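
If you want to sanity-check that number, the math fits in a few lines:

    # Back-of-envelope compute cost using the numbers above. The hourly rate is
    # the on-demand A100 price quoted in this essay; real bills vary by region,
    # discounts, and how long the GPUs sit idle while you debug.
    a100_per_hour = 3.67   # USD per A100-hour
    gpus = 2
    hours = 5
    print(gpus * hours * a100_per_hour)   # ~36.7 USD of pure training compute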

The 9-billion-parameter version we're training next week will cost maybe twice that. The 27-billion-parameter one after that, maybe four times. We're talking single-digit hundreds of dollars to fine-tune models that beat the off-the-shelf alternatives on the work we actually do.

So why does every article you read price this in the thousands?

Where the myth comes from

Two reasons.

First: managed fine-tuning services.

OpenAI charges $25 per million tokens to fine-tune their model. AWS charges separately for the GPU, the storage, the egress, the support contract. Snowflake and Databricks bundle fine-tuning into platform deals that start at five figures.

If their pricing is the headline, the headline is going to be 'expensive.'

Second: failed experiments.

The articles aren't lying about teams spending $5,000-$10,000. They're pricing in the cost of not knowing what you're doing.

Five botched runs at $1,000 each is real money. So is a week of an engineer's time tracking down why the model regressed.

That's not the cost of fine-tuning. That's the cost of doing it for the first time.

The actual GPU bill is rounding error.

What we built

I should introduce the person who actually did this work.

Ayush is our engineering lead. He ran this project end to end - the configs, the launch scripts, the crashes at one in the morning, the eventual ship.

We didn't train this because the API economics finally broke our way.

We trained it because in 2026, every business decision that depends on an AI model is a sovereignty decision.

If your product runs on someone else's model through someone else's API, you don't own the most important part of your product.

You rent it. The landlord can change the price, the terms, or the model itself.

The argument for training your own isn't economic. It's structural.

So we built it.

Here's what actually broke.

The crash with no error message

Day one. Ayush launches training.

The model loads. The trainer prints 'Starting.'

Thirty seconds later, the process dies. No error message. No stack trace. Just gone.

Six hours of debugging.

The root cause turned out to be a library conflict. Google Cloud ships its own version of the software that lets multiple graphics cards talk to each other.

PyTorch ships a different version. The computer didn't know which one to use, picked Google's, and crashed.

The fix was one line: tell the computer to ignore Google's version.

Six hours to find one line.
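
For the curious, a fix in this family usually looks like the sketch below. It assumes the clashing library is NCCL - the piece PyTorch uses for multi-GPU communication - and that training launches through torchrun. The script name is a placeholder, and this isn't Ayush's exact line.

    # Hypothetical sketch, not the actual fix. Assumes the clash is between the
    # NCCL the cloud image ships and the NCCL bundled inside the PyTorch wheel.
    # The loader path has to be set before the training process starts, which is
    # why it lives in a small launcher. "train.py" is a placeholder.
    import os
    import pathlib
    import subprocess

    import torch

    torch_lib = str(pathlib.Path(torch.__file__).parent / "lib")
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = torch_lib + os.pathsep + env.get("LD_LIBRARY_PATH", "")

    subprocess.run(
        ["torchrun", "--nproc_per_node=2", "train.py"],
        env=env,
        check=True,
    )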

The process that died at 96%

Two days later. Training is running. Healthy progress. Step 48 of 50.

Then the screen prints: process killed.

The SSH session - the remote terminal Ayush was working in - had disconnected.

We had software running that was supposed to keep the training alive even if the connection dropped. It didn't.

47 steps of training - most of a working model - vanished.

No checkpoint had been saved. A checkpoint is a snapshot of the model partway through training.

If the process dies, you resume from the last snapshot. We hadn't configured the system to save snapshots often enough.

Full restart.


The fix was to save a snapshot every 25 steps. The lesson is simpler:

If your process can die at 96% with nothing recoverable, you set it up wrong.
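
What that looks like in a config depends on your trainer. A minimal sketch, assuming a Hugging Face-style Trainer:

    # Minimal sketch, assuming a Hugging Face-style Trainer; the essay doesn't
    # name the framework, so treat this as illustrative rather than our config.
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="checkpoints",
        max_steps=50,
        save_steps=25,          # write a snapshot every 25 steps
        save_total_limit=2,     # keep only the most recent snapshots on disk
    )

    # After a crash, resume from the last snapshot instead of starting over:
    # trainer.train(resume_from_checkpoint=True)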

The benchmark that lied

After training finished, we ran the evaluation suite.

One number jumped out. The 'citation present rate' had dropped from 97% to 19%.

A 78-point regression on the metric we cared most about.

We almost rolled back a week of work.

Then Ayush sat down with five test prompts and read the actual outputs.

The model cited perfectly. Every claim attributed. Every source grounded. Zero made-up references.

So we looked at what the benchmark was actually measuring.

It was running a text search for the pattern '[1]' in the model's output.

The base model scored 97% because it was inserting '[1]' markers all over its responses semi-randomly. Hundreds of fake citation markers, attached to nothing.

Our fine-tuned model only inserts citations when it's actually citing something. The present rate dropped. The precision - whether the citations were real - stayed at 100%.

The benchmark was penalizing our model for stopping a bad behavior.

If we'd trusted the number without reading the outputs, we would have shipped the model that hallucinates citations instead of the one that doesn't.
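
To make that concrete, here's roughly what a check like that reduces to - a sketch, not the actual eval suite:

    # Sketch of the failure mode. A "citation present" check that only greps for
    # a marker rewards models that spray markers, whether or not they cite anything.
    import re

    def citation_present(output: str) -> bool:
        # Counts any "[1]"-style marker as a citation, real or not.
        return re.search(r"\[\d+\]", output) is not None

    print(citation_present("Revenue grew 12% last quarter. [1]"))        # True
    print(citation_present("Revenue grew 12% last quarter. [1][7][3]"))  # True, even with no sources behind it
    print(citation_present("Revenue grew 12% last quarter."))            # False, even if no source was needed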

The same week, we ran another benchmark on the same model two different ways. One method scored it 46.5%. The other scored it 73%.

Same model. Same data. Same week.

The test was the variable. Not the model.

The model that destroyed itself

The next stage of training is called preference optimization. You show the model two answers - one good, one bad - and teach it to prefer the good one.
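
Schematically, one training example is a pair like this. The chosen/rejected field names follow a common convention; the content is made up:

    # One preference-optimization example, schematically. The field names follow
    # the common "chosen"/"rejected" convention (e.g. in TRL); the text is invented.
    pair = {
        "prompt":   "What does the attached invoice say is due, and when?",
        "chosen":   "Total due is $1,240, net 30 days from the invoice date. [1]",
        "rejected": "The invoice appears to be about a payment of some kind.",
    }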

The framework ships with default settings. We trusted them.

The model collapsed in five steps. Function calling accuracy dropped from 98% back to 36%. We were below the base model on every metric we'd just spent days improving.

We dialed the settings down. The next run snapped right back to base - as if fine-tuning had never happened.

Ayush dug into the framework's source code.

One of the settings - the one that's supposed to continue from the previous training stage - was loading the previous work for display purposes but training a fresh model underneath. The progress was visible in the logs but absent from the actual updates.

Two runs deleted before we figured out the pattern.

The fix was to merge the previous stage permanently into the model before starting the next one. No more chaining.
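
A sketch of what that merge looks like, assuming the first stage was trained as a LoRA adapter, with placeholder paths:

    # Hedged sketch, assuming the first stage produced a LoRA adapter; the essay
    # doesn't name the adapter method or framework. Merging folds the adapter
    # weights into the base model, so the next stage starts from one
    # self-contained checkpoint instead of a chain of configs.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
    tuned = PeftModel.from_pretrained(base, "path/to/stage-one-adapter")

    merged = tuned.merge_and_unload()                     # bake adapter weights into the base
    merged.save_pretrained("path/to/stage-one-merged")    # preference training starts from here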

Run three worked.

Tutorials show the happy path. They don't show the two runs you delete.

What it actually costs

The $50 number isn't the punchline. The $50 is the easy part.

Here's the actual cost of training your own AI model in 2026:

  • One engineer who can read framework source code when the docs are wrong
  • The willingness to hand-test your own model instead of trusting the benchmark
  • A week of debugging time you'll never get back
  • The judgment to know when to dial settings down and when to merge weights permanently

That's the bill. The GPUs are rounding error.

This is why the discourse is wrong. Everyone is pricing the part of fine-tuning that's already commoditized - the compute - and ignoring the part that isn't.

The model science in 2026 is mostly settled. The recipes work. You can read the papers in a weekend.

The work is in the infrastructure underneath. The library conflict you find at midnight. The signal handler that doesn't fire. The configuration setting that does something different from what its name suggests. The benchmark that measures the wrong thing.

None of these are AI problems. They're engineering problems. And right now, in 2026, they are the bottleneck.

Which is exactly why this is the moment to learn how to do it.

The companies that will own their AI stack in five years are the ones building this muscle today.

The ones that won't are still describing 'AI strategy' as a procurement decision.

The compute is cheap.

The discipline isn't.

Own or be owned.
