induwara.lk
Opinion · ai-infrastructure · vendor-lock-in · developer-tips

Notion's Anthropic Outage Is a Lesson in Vendor Lock-In

Notion's AI broke when Anthropic had a disruption. Here's how a solo Sri Lankan builder should design around single-provider AI dependency and free-tier fragility.

Induwara Ashinsana · 4 min read

[Image: Notion logo on a laptop screen with an error or loading state visible. Credit: TechCrunch]

When Notion restored access to Anthropic after a service disruption this week, the more interesting story wasn't the downtime. It was how visible the dependency suddenly became. According to TechCrunch, Notion's head of product said he was "astonished" at "the amount of people RT-ing this."

That reaction is the real signal. A productivity app most people think of as "where I keep my notes" was, for a few hours, just a front-end to someone else's model. If you are building anything with AI in it from Sri Lanka, on a free tier or a tight budget, that is your lesson, not theirs.


🔗 Why one outage exposed a whole architecture

Notion didn't go down. Its AI features did, because they route through Anthropic. When the model provider had a disruption, the part of the product that depends on it stopped working, and users noticed immediately.

This is the shape of nearly every modern AI app:

  • A nice UI you control.
  • A thin orchestration layer you wrote.
  • A third-party model API you absolutely do not control.

The third box is where your reliability actually lives. You can have perfect uptime on your own servers and still show users a spinner that never resolves, because the call you made to the model hung or failed and nothing in your code handled it.

Key takeaway: If a single upstream provider can take out a headline feature, you don't have an AI feature. You have a dependency wearing a feature's clothes.


🛠️ How to design for the provider being down

The fix is not "self-host a model." For a solo builder that is usually unrealistic on cost and hardware. The fix is graceful degradation plus a fallback path. Three patterns, cheapest first:

  1. Fail loud, fail fast. Set a timeout (5–10s) on every model call. When it trips, show a clear "AI is temporarily unavailable, try again" message instead of an infinite spinner. Notion's users RT-ed because the failure was confusing, not because it existed.
  2. Degrade to non-AI. If the AI summary fails, still show the raw text. If smart-tagging fails, let the user type a tag. The product should lose a feature, not a function.
  3. Fallback provider. Keep a second model wired behind a flag. When provider A errors or times out, retry once against provider B.
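
Patterns 1 and 3 combine naturally into one call path. Below is a minimal sketch using Python's asyncio. It assumes each provider is wrapped as an async function that takes a prompt and returns text. `complete_with_fallback`, `call_primary`, and `call_backup` are hypothetical names, not any vendor's SDK. The 8-second default is simply a value inside the 5–10s range above.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# A provider is any async function that takes a prompt and returns text.
Provider = Callable[[str], Awaitable[str]]


class AIUnavailable(Exception):
    """Raised when every configured provider failed or timed out."""


async def complete_with_fallback(
    prompt: str,
    primary: Provider,
    backup: Optional[Provider] = None,
    timeout_s: float = 8.0,
) -> str:
    """Try the primary under a hard timeout; on error or timeout, try the backup once."""
    providers: List[Provider] = [primary] if backup is None else [primary, backup]
    last_error: Optional[Exception] = None
    for call in providers:
        try:
            # wait_for cancels the call if it runs past timeout_s (fail fast).
            return await asyncio.wait_for(call(prompt), timeout=timeout_s)
        except Exception as exc:  # timeouts, HTTP errors, connection resets
            last_error = exc
    # Fail loud: the UI catches this and shows a clear message plus a retry button.
    raise AIUnavailable("AI is temporarily unavailable, try again") from last_error


# Hypothetical stand-ins for two vendors' SDK calls.
async def call_primary(prompt: str) -> str:
    await asyncio.sleep(60)  # simulates a provider that never answers
    return "unreachable"


async def call_backup(prompt: str) -> str:
    return f"summary of: {prompt}"


print(asyncio.run(complete_with_fallback("meeting notes", call_primary, call_backup, timeout_s=0.1)))
# prints "summary of: meeting notes"
```

Retrying only once keeps the worst case bounded. With the defaults, a user waits at most about two timeouts (roughly 16 seconds) before seeing the error, never an endless spinner.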

Here is the difference in user experience:

| Approach | What the user sees when the provider is down | Build effort |
|---|---|---|
| No handling | Infinite spinner, app feels broken | None |
| Fail loud | Clear error + retry button | Low |
| Degrade to non-AI | Core feature still works, AI absent | Medium |
| Fallback provider | Mostly invisible, slight latency bump | Medium-high |

You do not need all three patterns. For most side projects, fail loud + degrade gets you through an afternoon-long outage without touching a second vendor.
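
The fail loud + degrade combination is small enough to show whole. The sketch below works under the same assumption of an async summarizer function. `render_note`, `NoteView`, and `down_provider` are hypothetical names. The note is always returned, and the AI summary is an optional extra on top of it.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class NoteView:
    """What the UI renders: the note always, the AI summary only if it arrived."""
    text: str
    summary: Optional[str]
    notice: Optional[str]  # user-facing message when the AI part failed


async def render_note(
    text: str,
    summarize: Callable[[str], Awaitable[str]],
    timeout_s: float = 8.0,
) -> NoteView:
    try:
        summary = await asyncio.wait_for(summarize(text), timeout=timeout_s)
        return NoteView(text=text, summary=summary, notice=None)
    except Exception:
        # Fail loud: one plain message instead of a spinner.
        # Degrade: the raw note is still returned, so the core feature works.
        return NoteView(text=text, summary=None,
                        notice="AI summary is temporarily unavailable. Try again.")


# Hypothetical summarizer that simulates a provider outage.
async def down_provider(text: str) -> str:
    raise ConnectionError("provider outage")


view = asyncio.run(render_note("Buy milk. Call the client.", down_provider))
print(view.text, "|", view.summary, "|", view.notice)
# prints "Buy milk. Call the client. | None | AI summary is temporarily unavailable. Try again."
```

The product loses a feature (the summary), not a function (reading and editing the note).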


💰 The free-tier angle nobody warns you about

If you are learning on a student budget, your dependency is even more fragile than Notion's, because you are on a free or rate-limited tier. Two things bite people from Colombo to Kandy:

  • Shared rate limits. A free-tier key can get throttled the moment usage spikes, which looks exactly like an outage to your users.
  • Single key, single account. One suspended key takes down your whole demo right before you show it to a client.

A few cheap habits:

  • Never hard-code one provider's call deep in your UI. Put it behind a small function you can swap.
  • Log every failed model call with a timestamp so you can tell "provider was down" apart from "my code is broken."
  • Estimate your real token spend before you commit to a tier, so a viral moment doesn't silently blow your cap.
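
The third habit is mostly arithmetic. This minimal sketch assumes the provider prices input and output tokens separately, quoted per million tokens (common, but check yours), and caps a tier in tokens per day. Every number below is a placeholder, not any real provider's price or limit.

```python
def monthly_cost_usd(requests_per_day: int, input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float, days: int = 30) -> float:
    """Rough spend: requests x tokens x price, with prices quoted per million tokens."""
    per_request = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return requests_per_day * days * per_request


def fits_daily_cap(requests_per_day: int, input_tokens: int, output_tokens: int,
                   daily_token_cap: int) -> bool:
    """True if a day's traffic stays under a tier's token-per-day limit."""
    return requests_per_day * (input_tokens + output_tokens) <= daily_token_cap


# Placeholder numbers only; substitute your provider's published prices and limits.
normal = monthly_cost_usd(200, 1_500, 300, usd_per_m_input=1.00, usd_per_m_output=5.00)
viral = monthly_cost_usd(2_000, 1_500, 300, usd_per_m_input=1.00, usd_per_m_output=5.00)
print(f"normal: ${normal:.2f}/month, viral: ${viral:.2f}/month")
# prints "normal: $18.00/month, viral: $180.00/month"
print(fits_daily_cap(200, 1_500, 300, daily_token_cap=1_000_000))    # True (360,000 tokens/day)
print(fits_daily_cap(2_000, 1_500, 300, daily_token_cap=1_000_000))  # False (3,600,000 tokens/day)
```

The second check is the one that bites on a free tier. A 10x spike might be affordable in dollars and still trip the daily cap, which your users will experience as an outage.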

If you're sizing those costs, our free AI agent cost calculator and AI model comparison tool let you compare providers on price and limits before you wire one in as your only option.
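
The first two habits in the list above fit in a few lines: keep one swappable function, and log every failed call. This is a minimal sketch. The hypothetical adapters `provider_a` and `provider_b` stand in for real SDK calls. It assumes the SDK's error objects carry an HTTP `status_code` attribute (many do, but check yours), and it uses that attribute to tell throttling apart from an outage and from a bug in your own code.

```python
import logging
from typing import Callable, Dict

# asctime gives every failure a timestamp in the log line.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("ai_calls")


# Hypothetical adapters: each would wrap one vendor's SDK behind the same signature.
def provider_a(prompt: str) -> str:
    raise TimeoutError("no response")  # simulates an outage


def provider_b(prompt: str) -> str:
    return prompt.upper()  # simulates a working backup


PROVIDERS: Dict[str, Callable[[str], str]] = {"a": provider_a, "b": provider_b}


def classify(exc: Exception) -> str:
    """Rough failure bucket. Assumes the SDK error exposes `status_code`; adjust per SDK."""
    status = getattr(exc, "status_code", None)
    if status == 429:
        return "rate_limited"  # free-tier throttling, looks like an outage to users
    if isinstance(exc, (TimeoutError, ConnectionError)) or (status is not None and status >= 500):
        return "provider_down"
    return "possible_bug"  # other 4xx, bad input, or our own code


def ask_ai(prompt: str, provider: str = "a") -> str:
    """The only function the UI calls; switching vendors means changing `provider`."""
    try:
        return PROVIDERS[provider](prompt)
    except Exception as exc:
        log.warning("model call failed provider=%s kind=%s error=%r", provider, classify(exc), exc)
        raise


print(ask_ai("hello", provider="b"))  # prints "HELLO"
try:
    ask_ai("hello", provider="a")
except TimeoutError:
    pass  # the log line records kind=provider_down with a timestamp
```

Because the UI only ever calls `ask_ai`, adding a second vendor later means writing one more adapter, not hunting through screens.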

Bottom line: A second provider is not over-engineering when your first one is on a free tier that can throttle you without notice.


⚡ What "restored access" really tells you

The outage resolved on its own once Anthropic recovered. Notion's team didn't fix the model. They waited. That is the honest position most of us are in, and it's fine, as long as you've decided in advance how your product behaves during the wait.

The companies that handle these moments well aren't the ones who never depend on a vendor. Everyone depends on a vendor. They're the ones whose product fails in a way that keeps user trust:

  • Clear status messaging.
  • Core workflow still usable.
  • A retry that actually works when the provider comes back.

The ones that get ratio'd on social media are the ones where a hidden dependency surfaces as a confusing dead end.
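
One common way to get the third point, a retry that works when the provider comes back, is a circuit breaker. It is a general reliability pattern, not anything Notion has described using. In this minimal sketch for a single-process app, `CircuitBreaker` and its parameters are hypothetical names. After a few consecutive failures it stops calling the provider, so users get the status message instantly. Once a cooldown has passed, it lets calls through again.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Skip calls to a provider that keeps failing; try it again after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the example can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: call the provider normally
        # open: refuse until the cooldown passes, then let calls through as a trial
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0  # provider is back: close the breaker
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_s=30.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.allow_request())  # prints False: show the status message, skip the call
now[0] = 31.0
print(breaker.allow_request())  # prints True: cooldown over, try the provider again
```

In use, each model call checks `allow_request()` first and shows the unavailable message right away if it returns false. Otherwise it calls the provider and reports the outcome with `record_success()` or `record_failure()`.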


💡 What this means for you

You are probably one engineer, maybe a tiny team, shipping something with an AI call in it. You can't outspend Anthropic on reliability, and you don't need to. You need three decisions made before launch, not during the incident:

  1. What's my timeout, and what message shows when it trips?
  2. What still works when the AI is gone?
  3. Is there a second provider I can flip to, even manually?
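
One way to force those decisions before launch is to write them down as configuration. This sketch assumes you configure the app through environment variables. `AIFailurePlan`, `load_plan`, `AI_PROVIDER`, and `AI_TIMEOUT_S` are hypothetical names. Changing `AI_PROVIDER` and restarting is the manual switch from question 3.

```python
import os
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AIFailurePlan:
    timeout_s: float                   # Q1: how long before a model call gives up
    unavailable_message: str           # Q1: what the user sees when it does
    works_without_ai: Tuple[str, ...]  # Q2: features that must survive an outage
    provider: str                      # Q3: which provider is currently active


def load_plan() -> AIFailurePlan:
    """Read the plan from environment variables (hypothetical names), with defaults."""
    return AIFailurePlan(
        timeout_s=float(os.environ.get("AI_TIMEOUT_S", "8")),
        unavailable_message="AI is temporarily unavailable, try again.",
        works_without_ai=("view notes", "edit notes", "manual tags"),
        provider=os.environ.get("AI_PROVIDER", "primary"),
    )


# During an incident: set AI_PROVIDER=backup in the deploy config and restart.
os.environ["AI_PROVIDER"] = "backup"  # simulated here for the example
print(load_plan().provider)  # prints "backup"
```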

Make those calls now and the next provider disruption is a non-event for your users instead of a screenshot they share. Notion's outage got attention because the dependency was invisible until it broke. Yours doesn't have to be.


Induwara Ashinsana

Information Systems student at UCSC and Executive Director at Ryzera Technologies. Writes about software, AI, and what it means for builders in Sri Lanka.
