AI Vendor Lock-in: Real Risks and Fake Ones

Bottom line. Vendor lock-in is a real concern in AI infrastructure, but it's a specific set of concerns that most leadership conversations conflate with generic procurement anxiety. In 2026, there are five real lock-in risks worth managing actively — pricing leverage, data gravity, fine-tune portability, retrieval-layer dependency, and regulatory single-vendor risk — and four overstated risks that absorb leadership attention and produce no decision value. This briefing decomposes both lists, names the specific engineering decisions that buy portability where portability is worth having, and closes out Module L2 with the vendor strategy that fits the org patterns from L2.1-L2.3.

Procurement will always worry about lock-in. Engineering will sometimes worry about it. Your job as a leader is to know which worries are load-bearing and which are generic procurement reflexes. Fifteen minutes of reading; one clarified position you can defend in your next vendor review.

Why most lock-in conversations are unhelpful

Start with the pathology. Generic "we should avoid vendor lock-in" statements in leadership meetings do two specific things: they sound prudent, and they prevent decisions. A leader who says "we need to be vendor-independent" is signaling caution and offloading the specific strategy question to a later meeting. A team that says "we can't commit to any vendor until we have an abstraction layer" is postponing shipping indefinitely. Both patterns are pathologies, and the right response is to force the conversation into specifics: which specific lock-in risk are you worried about, and what specific engineering or contractual choice would mitigate it?

If the answer is "general risk" or "strategic flexibility," the concern is noise and should be named as such. If the answer is "pricing leverage erodes when we hit 100M monthly calls" or "our retrieval layer would need 6 months of rework to swap providers," that's a specific concern and deserves a specific engineering decision. The rest of this briefing is a vocabulary for making the distinction.

Two branches. The "noise" branch is where most leadership conversations end; the "specific risk" branch is where the useful work happens.

The five real lock-in risks

Each of these is a specific scenario where over-commitment to a single AI vendor creates measurable exposure, and each has a specific mitigation that's cheaper than the risk.

Real risk 1: pricing leverage at scale

The concern: your inference cost is a significant line item, your vendor knows your volume, and when the contract comes up for renewal they have pricing power because you can't cheaply switch. A 15-25% price increase at renewal is hard to refuse if migrating would take three quarters of engineering work.

When it becomes real: when your annual spend with a single vendor crosses ~$500K-$1M and the vendor knows your alternative is expensive. Below $500K, you're in the "below their attention threshold" regime and pricing stays at public tiers. Above $1M, the vendor has sales reps actively modeling your churn risk and pricing accordingly.

The specific mitigation: maintain a second-provider fallback for your hot-path features. This doesn't mean splitting 50/50 — it means having a tested, deployable configuration that uses a different provider for the same feature, so that at contract renewal you can credibly threaten to switch. The cost of maintaining the fallback is real (maybe 2-4 engineer-weeks per quarter on eval against the second provider), but it recovers 5-15% on renewal pricing, which on a $1M annual contract is $50K-$150K saved per year. Clear ROI above ~$500K annual spend; unclear below.

What not to do: actively run traffic through both providers for "resilience." The operational complexity of dual-provider production is expensive and the reliability gain is marginal because frontier providers rarely go down simultaneously. Keep the fallback deployable, not running.

Real risk 2: data gravity in proprietary formats

The concern: over time, you accumulate data in a vendor-specific format — stored prompts in their management platform, eval sets in their proprietary eval tool, traces in their observability system, fine-tuning datasets in their format. Migrating away means re-doing all of it. The gravity is proportional to accumulated years of data.

When it becomes real: when you've been on a single vendor's platform for 12+ months and have >500 prompts, >20 eval sets, or >1M logged traces in their format. Below those volumes, migration is annoying but tractable. Above them, migration is a project.

The specific mitigation: from day one, treat your prompts, eval sets, and key traces as your data, stored in your systems, synchronised to vendor tools, not as "in vendor's tool, exportable if needed." Concretely:

Prompts live in code in your git repo (Course 2 B2.2), synced to the vendor's prompt tool only as a read-only deployment target.
Eval sets live in a format-agnostic spec (markdown, JSON, SQLite) in your git repo, executed by your code calling whichever runner.
Observability traces stream to your logging system (Postgres, ClickHouse, BigQuery) first, with the vendor tool consuming from there.
Fine-tuning datasets live in your data warehouse, exported to vendor format on demand.

This is extra engineering work up front — maybe 2-3 weeks of platform team effort — that permanently eliminates data-gravity lock-in across your entire AI stack. High ROI, done once, worth doing.

Real risk 3: fine-tune portability

The concern: if you fine-tune models with a specific provider, the resulting weights and the training pipeline are tied to that provider's fine-tuning stack. Migrating means starting over with a new provider's fine-tuning process, re-curating data, re-running training, re-validating quality. The gap between "I have a fine-tuned model here" and "I have the equivalent on a different provider" is months of work.

When it becomes real: if you have any fine-tuned models in production with a specific vendor. Even one. The work to reproduce on another vendor is real.

The specific mitigation: keep your fine-tuning datasets, your training recipes, and your evaluation criteria in provider-agnostic formats. Your training data sits in your data warehouse. Your recipe is a documented specification ("take this data, apply these transformations, run a LoRA with these hyperparameters") that could be executed against any provider's fine-tuning API. Your evals are independent of the provider. If you ever need to migrate, you re-run the recipe against the new provider, not recreate the work from scratch.

Secondary mitigation: reconsider whether you need fine-tuning at all (see Course 2's B6.2). In 2026, most fine-tuning is avoidable with good prompting and retrieval, and the simplest way to avoid fine-tune lock-in is to not fine-tune. This is the most-recommended move I give leadership: if your team is fine-tuning, ask them whether they can achieve the same quality with prompting and retrieval instead. 70% of the time the answer is yes.

Real risk 4: retrieval-layer dependency

The concern: if you build your RAG pipeline on a specific vendor's vector database, their embedding model, or their rerankers, switching providers means rebuilding the retrieval layer. This is especially expensive because retrieval quality depends on specific embedding/chunking/reranking choices that don't transfer cleanly across providers.

When it becomes real: if your RAG pipeline is central to multiple features and uses a specific vendor's proprietary vector store with their embedding model and their reranker. Multi-dependency is the killer; single-component swaps are tractable.

The specific mitigation: use Postgres with pgvector for your vector store (Course 2's B3.2 covers this in detail). It's an open extension on a database you already run. Embeddings are stored as portable float vectors. You can re-embed with a different embedding model and keep the same vector store. You can swap retrieval strategies without touching the storage layer. This is the single highest-ROI anti-lock-in decision in a RAG stack — by far.

When dedicated vector databases are worth the lock-in: extreme scale (>10M vectors with demanding latency), exotic query patterns, or specific compliance features that pgvector can't provide. These are real cases; they're just much narrower than the vector-DB vendor marketing suggests.

Real risk 5: regulatory single-vendor risk

The concern: if a regulator takes action against your vendor (or the vendor itself has a major incident), your business is affected proportionally to how much you depend on them. An outage is short-term; a regulatory ban or a security incident can be weeks to months of disruption.

When it becomes real: in heavily regulated industries (finance, healthcare, government), and in jurisdictions where the regulatory environment is actively changing (EU, UK, some US states).

The specific mitigation: maintain contractual portability. Your vendor contract should allow rapid migration in specific regulatory scenarios — the DPA should cover data portability on request, the SLA should allow contract termination on regulatory grounds, and your data should be exportable in standard formats on short notice. This is a legal and contractual mitigation, not an engineering one. Your legal team, not your engineering team, owns it.

The operational mitigation: have a documented "emergency vendor swap" runbook — not a system that auto-swaps, but a written plan for how you would migrate in 2-4 weeks if you had to. Teams that have the runbook can act in a crisis; teams that don't, can't.

The four overstated lock-in risks

These are concerns that dominate leadership meetings in 2026 but are not proportionally important to actual exposure. Each deserves naming because the defence against them absorbs engineering capacity that should go to real risks.

Overstated 1: "we're using their API"

The claim: using a specific vendor's API (Anthropic, OpenAI, Google) is lock-in.

Why it's overstated: frontier model APIs are similar enough in shape that swapping providers for the base inference is a small engineering task — not free, but measured in weeks, not quarters. A codebase that wraps client.messages.create(...) in a thin helper function can swap providers in an afternoon, with eval-set re-validation taking another 2-3 days. The lock-in is real at the millisecond-by-millisecond output level (different providers have different biases, formats, temperatures), but at the feature level it is tractable.

The correct response: write a thin abstraction layer that exposes your app's calls in a provider-neutral way. One function llm(prompt, settings) that internally calls whichever provider. This is 50-150 lines of code. It is not an "AI framework" — it is a 3-function shim specific to your app. Don't adopt LangChain or similar to solve this problem; a custom shim is simpler and more portable.

Don't solve: elaborate multi-provider routing layers that dynamically pick between providers based on latency or cost. These are operationally complex, rarely worth their complexity at mid-market scale, and easier to build when you actually need them than to maintain speculatively.

Overstated 2: "prompts are lock-in"

The claim: prompts tuned for one provider's model don't transfer to another, so investing in prompt engineering creates lock-in.

Why it's overstated: prompts transfer well between frontier models in 2026. A prompt that works on Claude Sonnet 4.6 typically works on GPT-5, Gemini 2.5 Pro, and equivalent models with 3-7 points of eval variance — meaningful but small. Re-tuning a prompt for a new model is a 1-2 day exercise per major prompt, not a project.

The correct response: treat prompts as code (Course 2 B2.2), version them, eval them, and re-run evals against a new provider when migrating. The "lock-in" is a small one-time cost, not a permanent dependency.

Overstated 3: "context window features lock us in"

The claim: features that depend on 200K-token context windows are tied to specific providers because not every provider offers that window.

Why it's overstated: every major frontier provider offers 200K+ context in 2026. Claude Sonnet 4.6, GPT-5, Gemini 2.5 Pro all ship similar context capabilities. The days of "only one provider has long context" are over.

The correct response: don't worry about it. Use long context where your feature needs it; assume any frontier provider you might swap to also has it.

Overstated 4: "we can't commit until we have a multi-cloud AI platform"

The claim: until we've built a full abstraction layer across multiple clouds and multiple AI vendors, we shouldn't commit to any AI investment.

Why it's overstated: building a full multi-cloud AI platform is a 6-12 month project that delays shipping any AI feature. During that delay, competitors ship. When the abstraction layer is finally complete, it solves theoretical portability for features you haven't built yet. The ROI math is bad, and the opportunity cost is enormous.

The correct response: ship AI features with a thin per-feature abstraction (see Overstated 1). Worry about multi-cloud platforms only if you have a specific regulatory requirement that mandates them. "Strategic flexibility" is not a specific requirement; it's procurement paranoia.

The specific failure mode to recognise: a team that keeps proposing "foundational platform work" before shipping any features. Six months of foundational work with zero features shipped is a strong signal that the team is procrastinating by building infrastructure for a future that isn't coming. Redirect them to ship one feature on a single vendor, then revisit.

A practical vendor strategy that fits the org patterns

Given all of the above, here's the vendor strategy that fits the Patterns from L2.1, with the specific engineering choices that buy real portability without overinvesting in fake-risk mitigation.

For Pattern 1 (champion, <150 people):

One primary provider (Anthropic, OpenAI, or Google), picked based on your eval set.
No second-provider fallback needed — you're below the attention threshold for pricing leverage.
Thin in-code abstraction (one helper function) for API calls.
Postgres with pgvector for any retrieval. No dedicated vector database.
Prompts and eval sets in git, in provider-agnostic format.
No fine-tuning. Prompt engineering + retrieval for everything.
Total annual vendor spend: typically $20K-$200K, well within ad-hoc sourcing.

For Pattern 2 (150-800 people):

One primary provider for most work, second provider pre-tested as deployable fallback (2-4 weeks of dev time per quarter on keeping it current).
Thin in-code abstraction layer — 50-150 lines.
Postgres with pgvector for retrieval. Consider dedicated vector DB only if pgvector is measurably insufficient.
Prompts, eval sets, traces all stored in your systems first.
Fine-tuning only for specific high-ROI cases, with portable recipes.
Contractual portability (exportable data, termination clauses).
Total annual vendor spend: $300K-$3M, worth real strategic attention.

For Pattern 3 (>800 people):

Primary provider + fully tested second provider + periodic eval against third.
Full abstraction layer (not a framework — a bespoke shim per-service).
Vector DB choice evaluated on merits (Postgres for most; dedicated only at extreme scale).
Governance function tracks regulatory single-vendor risk.
Multi-regional data handling.
Contractual portability + operational runbook for emergency migration.
Total annual vendor spend: $3M-$25M, managed as a strategic sourcing relationship with regular reviews.

Notice what all three have in common: thin abstraction, portable data, Postgres for retrieval, avoid fine-tuning where possible, provider-agnostic prompts and evals. These are the five decisions that carry most of the real portability. The rest — the elaborate platforms, the multi-cloud frameworks, the sophisticated routing — are premature and mostly harmful.

The failure mode: "infinite flexibility theater"

The specific failure mode that drains the most engineering time in 2026 on the vendor question: building elaborate multi-vendor infrastructure speculatively, before committing to any vendor, because "we want to be flexible".

The pattern: a team decides before adopting any AI vendor that they must first build a multi-vendor abstraction layer, a vendor-neutral eval framework, a flexible prompt-management system, and a portable fine-tuning recipe engine. They spend 4-6 months on this infrastructure. They ship zero user-facing features during those months. At the end, they have impressive infrastructure and no evidence that it actually matches what they need, because they haven't built anything real on top of it yet. The first real feature they try to ship surfaces dozens of assumptions their infrastructure got wrong, and the team rebuilds.

Infinite flexibility theater is procrastination dressed up as architecture. It happens because "we need flexibility" is hard to argue against and easy to propose. It prevents shipping.

The specific defence: ship a real feature on a single vendor first, then extract the abstractions that prove necessary from the friction points you observe. This is the opposite of building the abstraction first. The concrete feature produces the forcing function; the friction produces the abstractions; the abstractions are then proven useful rather than speculative. Teams that do this ship faster and end up with better abstractions than teams that speculate.

If a team in your org is proposing "foundational multi-vendor work" before shipping a single production AI feature, redirect them. "Ship one feature on one vendor, then we'll talk about what abstractions are worth building" is the right guidance 95% of the time.

What to decide on Monday

Audit your current AI stack for the five real lock-in risks. One bullet per risk: current exposure, current mitigation, gap.
Adopt Postgres with pgvector for retrieval unless you've already measured that it doesn't work for your scale. This is the single biggest anti-lock-in move available.
Write a thin in-code abstraction for your LLM calls. 50-150 lines. Not a framework.
Store prompts, eval sets, and traces in your systems, not in vendor tools. Vendor tools become read-only downstream consumers.
Reject "infinite flexibility theater" proposals. Require a shipping feature before abstraction investment.
Maintain a deployable second-provider fallback if your annual spend crosses ~$500K. Below that threshold, skip.
Do not fine-tune unless you've explicitly ruled out prompting + retrieval. Fine-tuning is the single largest source of real lock-in in an AI stack; avoiding it solves the problem.
Write the emergency vendor migration runbook for Pattern 3 organisations. Three pages; updated annually.

And that closes Module L2 — Org and Strategy. You now have the pattern picker, the hiring sequence, the talent matrix, and the vendor lock-in decomposition. Four briefings, four decisions, ready to execute on your 2026 AI org design.

Next up: Module L3 — Money, Risk, Governance. Four briefings on pricing AI work internally, the compliance surface, IP and data handling, and incident response at the leadership level. The back-office infrastructure that keeps AI investments accountable and safe.

⬅️ Previous	📍 You are here	Next ➡️
⬅️ Previous L2.3 · Hire, Retrain, Borrow	L2.4 of L4.3	Next ➡️ L3.1 · Pricing AI Work Internally

📚 AI for Leaders · Course Home — 15 briefings, four modules.

Cover photo via Unsplash. This post is part of the AI for Leaders series.

Vendor Lock-in: Real AI Risks and Fake Ones

Why most lock-in conversations are unhelpful