Pricing an AI Feature Without Burning Margin

Your AI feature costs pennies to run and thousands to build. Flat pricing will bankrupt you on power users. Usage pricing will drive normal users away. Here is the pricing framework that actually works in 2026.


You have shipped the AI feature. Stage 4 is complete. Usage is growing. Then the finance email lands on a Tuesday: the feature is costing you real money — call it $22,000 a month at current volume, or roughly 14% of the revenue from the customers using it. The unit economics, which looked fine in the one-pager six months ago, have quietly shifted. A few power users are consuming 40% of the total cost. A few casual users are using it so rarely they wonder why they're paying for it at all. Your pricing was "included in Pro plan." Nobody's happy.

This is the pricing gap, and in 2026 it is one of the three most common ways an AI feature goes from "launched" to "we need to rethink this" — behind "quality wasn't ready" and "compliance stalled the deal" — but ahead of pretty much everything else. The gap is caused by shipping AI features on pricing models designed for traditional software, where marginal cost was near zero and flat pricing made sense. AI features have real marginal cost, and the pricing has to account for it without making the product feel like a metered utility.

This post is the framework. Four pricing models, a decision matrix for picking between them, real COGS math for 2026, a worked "this is how we priced the same feature at three different volumes" example, and the specific failure modes that kill AI product economics. Module P5 opens with this because pricing is the hinge between "the feature works" and "the feature makes money," and most teams still treat pricing as something to figure out after launch. That's the failure mode this post exists to prevent.

No eval talk and no engineering deep-dive. Just spreadsheet-level thinking about the economics of an AI feature and the pricing model that exposes them to customers in a way that's both fair and sustainable.


The three pricing anchors, restated for AI

Traditional SaaS pricing has been dominated for a decade by three anchors: per-seat, tiered plans, and enterprise contracts. Each of them has a clean story because traditional features have near-zero marginal cost: once a customer is on the plan, their 10,000th action costs you as little as their first, which is to say almost nothing. The cost was all fixed, and pricing just had to clear the line.

AI features break this. The 10,000th call to a frontier model costs roughly as much as the first, and that is a real marginal cost per invocation, not a rounding error. Pricing has to account for that cost somehow, and the choices about how are the subject of this post.

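At a very high level, the pricing choices for an AI feature in 2026 fall into four patterns:

  • Flat include: the feature is simply part of an existing plan, with no usage limits.
  • Usage caps on a flat plan: the feature is included, but with a hard or soft limit.
  • Credits: customers pre-pay for a bucket of credits that each call burns down.
  • Metered: every call or unit of output is priced individually.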

Four patterns, each with a sweet spot and a failure mode. Let me take each in turn, then give you the decision matrix and real 2026 COGS math.


Model 1: flat include — "it's just part of the plan"

The traditional default. The AI feature is included in an existing plan (Pro, Business, whatever) with no usage limits, and the price of the plan stays the same or goes up a modest amount. Users see a normal feature they can use freely; you absorb the cost as part of the plan's gross margin.

When this wins:

  • The feature's per-call cost is small: under roughly $0.005, which usually means the workload runs mostly on a mid-tier model.
  • Usage is predictable and concentrated — most users invoke it a handful of times per week, not hundreds.
  • The feature is a "value-add," not the core product. A smart-summary button in an inbox. A draft-reply feature in a CRM. Not a chatbot that IS the product.
  • You have healthy margins on the existing plan and can absorb even a doubling of AI cost without going under.

The 2026 math that makes this work:

Suppose your AI feature uses a mid-tier model (Claude Haiku 4.5 or GPT-5-mini, roughly $0.003/call for a typical 3,000-input/300-output use case). A user who hits the feature 30 times a week (about 120 calls in a four-week month) costs you $0.36/month in direct API cost. At 10,000 users, that's $3,600/month total. If your Pro plan is $25/user/month, the AI feature costs about 1.4% of plan revenue, and you can absorb it.
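The arithmetic above is easy to check in a few lines. This is a minimal sketch, assuming a four-week month (which is what the $0.36 figure implies) and taking the $0.003 per-call cost as given; the function names are illustrative, not part of any tool.

```python
WEEKS_PER_MONTH = 4  # assumption: this is what the $0.36/month figure implies


def monthly_cost_per_user(calls_per_week: float, cost_per_call: float) -> float:
    """Direct API cost for one user over a four-week month."""
    return calls_per_week * WEEKS_PER_MONTH * cost_per_call


def cost_share_of_plan(monthly_cost: float, plan_price: float) -> float:
    """AI cost as a fraction of the per-user plan price."""
    return monthly_cost / plan_price


per_user = monthly_cost_per_user(calls_per_week=30, cost_per_call=0.003)
fleet = per_user * 10_000
share = cost_share_of_plan(per_user, plan_price=25.0)

print(f"per user: ${per_user:.2f}/month")    # per user: $0.36/month
print(f"10,000 users: ${fleet:,.0f}/month")  # 10,000 users: $3,600/month
print(f"share of plan price: {share:.1%}")   # share of plan price: 1.4%
```

Swapping in your own measured calls per week and per-call cost is the quickest way to see whether flat include is even on the table.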

When this fails:

  • Power users invoke it hundreds of times a day, silently dragging your margin down while everyone else pays the same.
  • The feature turns out to be "too good," and usage grows 5x in three months — cost grows with it, revenue doesn't.
  • You migrate the feature from a mid-tier to a frontier model mid-flight (for quality reasons), and cost per call rises 5x while pricing stays the same.
  • Your customer base includes automation scripts that call the feature programmatically at scale.

The specific 2026 trap with flat include: no visibility into power-user costs. You can't see who's driving your bill, you can't throttle the outliers, and you can't have the "hey, your usage is significantly above average" conversation because you never set a norm. Teams that ship flat include without usage analytics wake up to a finance email and have no tool to respond with.

The defence: even if you price flat, instrument per-user AI usage in your dashboard. Know who the power users are. Have a path to having a conversation with them if costs run away. This is infrastructure work that costs a day and saves quarters.
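As a rough illustration of that instrumentation, the sketch below rolls a call log up into per-user cost and ranks the heaviest users. The log format (a list of `(user_id, cost)` pairs) and the function names are assumptions made for the example; a real system would read from whatever your billing or observability pipeline records.

```python
from collections import defaultdict

# Hypothetical call log: one (user_id, cost_in_dollars) entry per AI call.
call_log = [
    ("u1", 0.003), ("u1", 0.003), ("u2", 0.003),
    ("u3", 0.003), ("u3", 0.003), ("u3", 0.003), ("u3", 0.003),
]


def cost_by_user(log):
    """Sum the direct API cost per user."""
    totals = defaultdict(float)
    for user_id, cost in log:
        totals[user_id] += cost
    return dict(totals)


def top_spenders(totals, n=3):
    """Return the n most expensive users as (user, cost, share_of_total)."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(user, cost, cost / grand_total) for user, cost in ranked[:n]]


for user, cost, share in top_spenders(cost_by_user(call_log)):
    print(f"{user}: ${cost:.3f} ({share:.0%} of total)")
```

Even a report this crude answers the question the finance email will ask: who is driving the bill.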


Model 2: usage caps on a flat plan

A middle ground. The feature is included in the plan, but with a hard or soft limit. "500 AI drafts per user per month, then the feature pauses until next month" or "unlimited up to reasonable use, we'll notify you at 1,000 and pause at 1,500." The plan price accounts for the cap, and users who need more buy an upgrade.

When this wins:

  • Feature cost is medium — too high to absorb uncapped, too sensitive to meter per-call.
  • Usage has a long tail — most users are nowhere near the cap, a few pro users are well above it, and you can monetise the latter.
  • The cap is high enough that normal users never notice it. This is crucial. A cap users bump into is a cap users resent; a cap users never see is invisible and effective.
  • You have tiered plans already — upgrading to a higher tier for more AI usage slots naturally into your existing pricing structure.

Concrete 2026 example: Notion AI's early pricing used soft caps and an unlimited add-on. Linear's Copilot features include "reasonable use" language in the higher plans. GitHub's Copilot offers per-seat pricing with effectively unlimited use, but individual request rate-limits apply under the hood. Each of these is some version of the "usage cap" model.

The math at 2026 mid-tier pricing: if you set a cap at 500 AI calls/user/month on a mid-tier model at $0.003/call, the worst-case cost per user hitting the cap is $1.50/month, or roughly 6% of a $25/month plan. You can absorb that on every user, even if they all hit the cap — which they won't, because cap-hitting users are a fraction of total users.

The failure mode: the cap is too low, and power users bounce. You set a 200-call cap thinking it was generous. A power user hits it in week 2, gets blocked, realises the feature is unreliable for their workflow, and downgrades or churns. The cap saved you $2 in cost and cost you $300 in churn. Fix: monitor cap-hit rates and adjust the cap so no more than ~2% of users ever bump into it. If more are, either raise the cap or move to credits (Model 3).
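One way to turn the "no more than ~2% of users" rule into a number is to derive the cap from observed usage. The sketch below is an assumption-laden illustration: the usage sample is invented, and the helper names are hypothetical.

```python
import math


def worst_case_cost(cap: int, cost_per_call: float) -> float:
    """Cost of one user who uses every call the cap allows."""
    return cap * cost_per_call


def cap_hit_rate(monthly_calls: list[int], cap: int) -> float:
    """Fraction of users whose monthly usage reaches the cap."""
    return sum(1 for calls in monthly_calls if calls >= cap) / len(monthly_calls)


def smallest_cap_within(monthly_calls: list[int], max_hit_rate: float = 0.02) -> int:
    """Lowest cap that at most `max_hit_rate` of users would reach."""
    ranked = sorted(monthly_calls, reverse=True)
    allowed = math.floor(len(ranked) * max_hit_rate)  # users allowed to hit the cap
    # The cap must sit just above the heaviest user who should NOT hit it.
    return ranked[allowed] + 1


# Invented sample: 90 light users, 8 heavier users, 2 very heavy users.
usage = [40] * 90 + [180] * 8 + [900] * 2

print(worst_case_cost(500, 0.003))   # 1.5, the $1.50 worst case from the text
print(cap_hit_rate(usage, 150))      # 0.1, so a 150-call cap is too tight
print(smallest_cap_within(usage))    # 181
```

The derived number is a lower bound, not a recommendation; rounding it up to something users can remember (200, 500) keeps the cap invisible to almost everyone.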

The other failure mode: the cap is easy to circumvent. A power user hits the cap on their account, creates a second account, and continues. Now you have no revenue from them and extra support load. Caps at the account level work in B2B; at the user level in consumer they're trivially bypassable. Pick the right level for your product.


Model 3: credits — pre-paid burndown

The model where customers buy a bucket of "credits" up front (either included in their plan or purchased separately), and each AI call consumes credits at a defined rate. Different feature types can cost different amounts of credits — a simple summary might be 1 credit, a multi-step research task might be 10.

When this wins:

  • Feature cost varies meaningfully per call (short queries vs long document processing), and flat pricing over-charges some users and under-charges others.
  • Usage is bursty — users don't invoke the feature daily but they occasionally run a big batch job.
  • The cost is high enough that you need to tie revenue directly to usage, but you also want users to feel some detachment from per-call pricing.
  • Your product already has a "plan credits" metaphor (e.g., storage, seats, API calls) that credits can slot into.

Concrete 2026 examples:

  • Midjourney and other image generators use credit models almost universally — each generation burns a set number, different qualities cost different amounts.
  • Many B2B vertical AI tools use credits for agentic / multi-step tasks where per-call pricing would confuse customers.
  • Anthropic's own usage tiers include a credits-style "message limit" on higher tiers.

The math that makes this work: you set the credit price at roughly 3x the direct API cost. For a $0.003 API call, that's roughly $0.01/credit in revenue terms. You bundle 500 credits with the Pro plan (a $5 value at your credit price, or 20% of the $25 plan price, though the bundle only costs you $1.50 in API spend) and sell additional credits at $0.01 each. Power users who need 2,000 extra credits/month pay $20 more, and that extra $20 comes with a gross margin of roughly 70% (since your cost on those credits was only $6).
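The credit arithmetic can be laid out as a small calculation. This sketch assumes one credit per call and that the 2,000 credits are bought on top of the bundle (which is what the $20 and $6 figures imply); `credit_economics` is a hypothetical helper, not an existing API.

```python
def credit_economics(api_cost_per_call, credit_price, bundled_credits, extra_credits):
    """Revenue, cost and margin for a credit bundle plus a top-up purchase.

    Assumes one credit is consumed per API call.
    """
    extra_revenue = extra_credits * credit_price
    extra_cost = extra_credits * api_cost_per_call
    return {
        "markup": credit_price / api_cost_per_call,
        "bundle_value": bundled_credits * credit_price,
        "bundle_cost": bundled_credits * api_cost_per_call,
        "extra_revenue": extra_revenue,
        "extra_cost": extra_cost,
        "extra_margin": (extra_revenue - extra_cost) / extra_revenue,
    }


result = credit_economics(
    api_cost_per_call=0.003,
    credit_price=0.01,
    bundled_credits=500,
    extra_credits=2_000,
)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Run against the figures in the text, this gives a $5.00 bundle value, a $1.50 bundle cost, and a 70% margin on the $20 top-up.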

The failure mode: credits feel expensive even when they aren't. Users see "1 credit per draft" and the credit counter going down, and they become anxious about every call, using the feature less than they should. This is worse than it sounds — it produces lower engagement than either flat or metered pricing because the psychological weight of "spending" on each call is higher than the actual cost.

The defence: make credits plentiful by default. The median user should never come close to running out. The 90th-percentile user should occasionally consider topping up, not constantly be near zero. If most users are in the "anxiety zone" of credits, you've set the bundle too tight.

The other failure mode: confusing credit rates across feature types. A summary costs 1 credit; a research task costs 10; a long-document analysis costs 25. The user has no way to predict their bill. Fix: either keep all features at the same credit rate, or surface a clear "this action will cost N credits" estimate before the user commits.
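The "this action will cost N credits" estimate is little more than a lookup shown before the user commits. A minimal sketch, using the three rates from the example above; the action names and the message wording are assumptions.

```python
# Credit rates per action type, taken from the example in the text.
CREDIT_RATES = {
    "summary": 1,
    "research_task": 10,
    "long_document_analysis": 25,
}


def credit_estimate(action: str, balance: int) -> str:
    """Message to show the user before they commit to an action."""
    cost = CREDIT_RATES[action]
    if cost > balance:
        return f"This action needs {cost} credits; you have {balance}."
    return f"This action will cost {cost} credits ({balance - cost} left afterwards)."


print(credit_estimate("research_task", balance=40))
print(credit_estimate("long_document_analysis", balance=10))
```

The point of the design is that the user sees the number before the spend, so the bill is never a surprise even when rates differ by feature.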


Model 4: metered — per-call or per-unit pricing

The most transparent, most technically honest, most customer-hostile of the four. Every AI call (or every unit of output — per-page, per-minute, per-token) is priced individually. The customer sees exactly what each action costs and pays for it. This is how most frontier model APIs sell to developers; very few end-user products get away with it successfully.

When this wins:

  • Your customer is a developer or technical buyer who is comfortable with per-call pricing.
  • The feature is expensive enough that flat pricing would either bankrupt you or require an absurdly high plan cost.
  • Your product is itself an infrastructure tool, not an end-user product — a transcription API, a vector search service, a specialised vertical model.
  • Customers use the feature in batch workloads where cost predictability doesn't require hiding the per-unit price.

Concrete 2026 examples:

  • Frontier model APIs (Anthropic, OpenAI, Google) — per-token pricing.
  • Transcription services (Whisper-based, Deepgram, AssemblyAI) — per-minute pricing.
  • Embedding APIs, reranking APIs, image generation APIs — all per-call.
  • Some enterprise vertical AI tools (legal research, radiology analysis) — per-document or per-study pricing.

The failure mode: per-call pricing is hostile for consumer and most B2B SaaS products. Users don't want to think about what each click costs. The metering creates cognitive drag that suppresses usage. Teams that ship consumer AI products on metered pricing usually migrate to credits or caps within 6 months.

The specific trap: translating a metered cost from your frontier provider directly into metered pricing for your product. "Our cost is $0.003/call, we charge $0.01/call." This is technically correct and strategically terrible, because customers now see a micro-cost metric they didn't want, and you've exposed your COGS structure to your customers who will use it as a negotiation anchor. Better: absorb the metering into credits or caps, and let the customer see a higher-level unit.


The decision matrix

Five criteria, four models. For each row, check which model the criterion favours.

| Criterion | Flat include | Usage caps | Credits | Metered |
| --- | --- | --- | --- | --- |
| Per-call cost | Very low ($0.002 or less) | Low to medium ($0.003-$0.015) | Medium to high ($0.015-$0.10) | Very high ($0.10+) |
| Usage shape | Predictable across users | Long tail with few power users | Bursty (big jobs occasionally) | Batch / developer workloads |
| Customer type | Consumer or B2B SaaS end-users | B2B SaaS with varied usage | B2B SaaS with power users | Developers, technical buyers |
| Margin tolerance | Healthy existing margins | Tight margins on power users | Need to monetise variable usage | COGS-heavy products |
| Product maturity | Mature product, stable plan | Mature product, adding AI | New AI-first product | Infrastructure product |

Run your feature through the matrix. Count which column wins. If one column wins clearly (3+ rows), that's your starting model. If the rows split 2/2/1, you likely need a hybrid — usually flat include with soft caps plus a credits upgrade path for power users, which is the most common 2026 pattern for B2B SaaS products adding AI features.
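The counting rule can be written down directly. In this sketch you record, for each of the five criteria, which column your feature falls into; the function then applies the "3+ rows wins, otherwise hybrid" rule. The feature answers shown are invented for illustration.

```python
from collections import Counter

MODELS = ("flat include", "usage caps", "credits", "metered")


def pick_model(row_winners: dict[str, str]) -> str:
    """Apply the matrix rule: a column with 3+ rows wins, otherwise go hybrid."""
    unknown = set(row_winners.values()) - set(MODELS)
    if unknown:
        raise ValueError(f"unknown pricing model(s): {sorted(unknown)}")
    model, rows_won = Counter(row_winners.values()).most_common(1)[0]
    return model if rows_won >= 3 else "hybrid"


# Invented example: a B2B feature with a long-tailed usage distribution.
clear_case = {
    "per-call cost": "usage caps",
    "usage shape": "usage caps",
    "customer type": "usage caps",
    "margin tolerance": "credits",
    "product maturity": "flat include",
}
split_case = dict(clear_case, **{"customer type": "credits"})  # rows split 2/2/1

print(pick_model(clear_case))  # usage caps
print(pick_model(split_case))  # hybrid
```

The output is a starting point for the pricing conversation, not a verdict; the matrix itself is a heuristic.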


The COGS math you must do before you ship

Before you commit to a pricing model, you need one concrete number: your cost per active user per month at expected launch volume, broken down by usage tier. Not an estimate from the one-pager. A real number from either a Rung 3 prototype run (Course 3 P2.3) or from the internal beta stage of the rollout (P4.3).

Here is the worksheet:

## AI feature COGS worksheet

Feature: [name]
Model tier: [frontier / mid / small] + specific provider
Avg input tokens per call: [measured from pilot data]
Avg output tokens per call: [measured]
Cost per call: (input × input_price) + (output × output_price) = $[X]

## Usage tiers (from internal beta observation)

- Casual user: [N] calls/month → $[N × X]/month cost
- Median user: [N] calls/month → $[N × X]/month cost
- Power user (90th percentile): [N] calls/month → $[N × X]/month cost
- Whale (99th percentile): [N] calls/month → $[N × X]/month cost

## At expected launch volume

- [X]% casual × [expected count] × $[cost] = $[total]
- [X]% median × [count] × $[cost] = $[total]
- [X]% power × [count] × $[cost] = $[total]
- [X]% whale × [count] × $[cost] = $[total]
Total monthly direct API cost: $[sum]

## Revenue side

- Plan revenue from AI users: $[price × count]
- Plan gross margin before AI: [%]
- AI cost as % of AI-user revenue: [%]
- Effective gross margin after AI: [%]

## Red flags

- If power user + whale cost > 40% of total, flat pricing will fail
- If AI cost > 20% of AI-user revenue, pricing needs re-thinking
- If effective margin < 50% of plan margin, do not ship at this model

Fill this in before you pick a pricing model. The shape of the distribution (how much of your cost comes from power users vs median users) tells you which model will survive contact with real traffic. Teams that skip this step ship pricing on intuition and rework it in month 3; teams that do it ship pricing once and live with it.
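If you prefer code to a spreadsheet, the worksheet reduces to a short calculation. The sketch below is one possible reading of it: it treats margins as fractions of revenue (so AI cost subtracts directly from plan margin), borrows the tier sizes and $0.022 per-call cost from the Stage 3 scenario later in this post, and assumes an 80% pre-AI plan margin purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str              # "casual", "median", "power" or "whale"
    users: int
    calls_per_month: float


def cogs_worksheet(tiers, cost_per_call, price_per_user, plan_margin):
    """Compute the worksheet totals and evaluate the three red flags."""
    tier_cost = {t.name: t.users * t.calls_per_month * cost_per_call for t in tiers}
    total_cost = sum(tier_cost.values())
    revenue = price_per_user * sum(t.users for t in tiers)  # revenue from AI users only
    ai_share = total_cost / revenue
    # Assumption: margins are fractions of revenue, so AI cost subtracts directly.
    effective_margin = plan_margin - ai_share
    heavy_share = (tier_cost.get("power", 0.0) + tier_cost.get("whale", 0.0)) / total_cost

    flags = []
    if heavy_share > 0.40:
        flags.append("power + whale cost > 40% of total: flat pricing will fail")
    if ai_share > 0.20:
        flags.append("AI cost > 20% of AI-user revenue: pricing needs re-thinking")
    if effective_margin < 0.5 * plan_margin:
        flags.append("effective margin < 50% of plan margin: do not ship at this model")
    return {
        "total_cost": total_cost,
        "ai_share": ai_share,
        "effective_margin": effective_margin,
        "heavy_share": heavy_share,
        "flags": flags,
    }


stage3 = [
    Tier("casual", 48_000, 30),
    Tier("median", 48_000, 80),
    Tier("power", 24_000, 300),
    Tier("whale", 1_200, 2_000),
]
report = cogs_worksheet(stage3, cost_per_call=0.022, price_per_user=25.0, plan_margin=0.80)
print(f"total: ${report['total_cost']:,.0f}/month")
print(f"AI cost share of AI-user revenue: {report['ai_share']:.1%}")
print(f"power + whale share of cost: {report['heavy_share']:.1%}")
for flag in report["flags"]:
    print("RED FLAG:", flag)
```

With these inputs only the first red flag fires: power users and whales account for well over 40% of the cost. Note that the share here is measured against revenue from AI users only, as the worksheet specifies, so it comes out higher than a share measured against total MRR.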

The specific 2026 reality: direct API costs for typical RAG + generation features on Claude Sonnet 4.6 run $0.015-$0.030 per call. Mid-tier models come in at $0.002-$0.005. A feature that runs on Sonnet for quality reasons is 5-10x more expensive per call than the mid-tier equivalent, and that gap is the single biggest lever on your pricing model. Before committing to Sonnet, check: can mid-tier (with better prompting or retrieval) pass your eval-set target? If yes, you just cut the cost side of this whole conversation by 80-90%.


A worked example: pricing the same feature at three volumes

Let me ground this in a realistic 2026 scenario. Same feature — an AI support ticket drafter — priced at three different company scales to show how the right model changes with volume.

Stage 1: early startup, 300 customers, ~$50K MRR

The company is selling a $25/user Pro plan. 300 customers × average 5 seats = 1,500 seats. About 40% (600 seats) use the AI drafter. Each active user averages 60 drafts/month. At Claude Haiku pricing (~$0.003/call) that's $0.18/user/month cost, or $108/month total AI cost.

Right model: flat include. Cost is 0.2% of revenue. Absorb it. Put usage analytics in the dashboard but don't meter or cap. Use the early data to learn the usage distribution before committing to anything.

Stage 2: growth stage, 3,000 customers, ~$500K MRR

Same plan price. 3,000 customers × 10 seats = 30,000 seats. About 50% (15,000 seats) use the drafter. The distribution has shifted: 40% of active users are casual, 40% are median, power users (20% of active) are now averaging 300 drafts/month, and whales (a further 1% of active) are averaging 2,000 drafts/month. Cost math:

  • Casual: 6,000 users × 30 drafts × $0.003 = $540/mo
  • Median: 6,000 × 80 × $0.003 = $1,440/mo
  • Power: 3,000 × 300 × $0.003 = $2,700/mo
  • Whale: 150 × 2,000 × $0.003 = $900/mo

Total: $5,580/mo, or 1.1% of revenue. Still manageable but the whale tier is now visible.

Right model: flat include with soft caps. Set a "fair use" guideline at 500 drafts/user/month. For the 1% of whales consuming 4x that, reach out individually and offer them a higher-tier plan with more headroom (or a direct conversation about pricing). The 99% who never approach the cap never notice.

Stage 3: scale, 20,000 customers, ~$5M MRR

Now it's meaningful. 200,000 seats, 60% use the drafter = 120,000 active users. But now the company has moved the feature to Claude Sonnet 4.6 for better quality — cost per call has gone from $0.003 to $0.022. Same distribution:

  • Casual: 48,000 × 30 × $0.022 = $31,680/mo
  • Median: 48,000 × 80 × $0.022 = $84,480/mo
  • Power: 24,000 × 300 × $0.022 = $158,400/mo
  • Whale: 1,200 × 2,000 × $0.022 = $52,800/mo

Total: $327,360/mo, or 6.5% of revenue. A real line item.

Right model: hybrid — flat with hard caps on Pro, credits upgrade for Business plan, enterprise custom pricing. The Pro plan ($25) includes 200 drafts/month. The Business plan ($50) includes 800 drafts/month. Enterprise contracts include custom allotments with volume discounts. Whales are now contractually priced. The 6.5% COGS line is directly addressed by the plan structure: power users and whales pay more for more capacity, and the flat tier is sized so median users never hit the limit.
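A quick way to sanity-check a tiered structure like this is to map each usage tier's average onto the smallest plan that covers it. The sketch below uses the allotments and tier averages from the Stage 3 scenario; the function name is hypothetical, and treating everything above the Business allotment as Enterprise is an assumption of the example.

```python
# Drafts/month included in each plan, from the Stage 3 structure above.
PLAN_ALLOTMENTS = [("Pro", 200), ("Business", 800)]

# Average drafts/month per usage tier, from the Stage 3 cost math.
TIER_USAGE = {"casual": 30, "median": 80, "power": 300, "whale": 2_000}


def smallest_fitting_plan(drafts_per_month: int) -> str:
    """Cheapest plan whose allotment covers the given usage."""
    for plan, allotment in PLAN_ALLOTMENTS:
        if drafts_per_month <= allotment:
            return plan
    return "Enterprise"  # assumption: anything above Business is custom-priced


for tier, drafts in TIER_USAGE.items():
    print(f"{tier}: {drafts} drafts/month -> {smallest_fitting_plan(drafts)}")
```

Casual and median users stay on Pro, power users land on Business, and whales fall through to Enterprise, which is the segmentation the plan structure is meant to produce.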

The key observation: the same feature needed three different pricing models at three different scales. Pricing is not a decision you make once — it's a decision you revisit every ~6-12 months as your usage distribution and cost shape evolve. Teams that assume "we priced it at launch, we're done" are the teams that find themselves in the "finance email on Tuesday" situation at Stage 3.


The three specific failure modes

Three patterns deserve naming because they are the most common ways AI pricing goes wrong in 2026.

Failure 1: pricing on COGS, not value

Engineering says the feature costs $0.02/call. Finance says "price it at $0.05 for a 60% gross margin." Product ships it at $0.05/call. Adoption tanks because customers think "five cents to write an email? Absurd."

The failure is pricing on cost, not value. What's the feature worth to the customer? If it saves them 2 minutes per ticket and they handle 50 tickets a day, that's about 8 hours/week of saved time, call it $200/week in value. The feature is worth $800/month to this user, not $0.05/call. Price to value, not to cost plus margin. Cost sets a floor, value sets a ceiling, and you charge somewhere in between.
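To make the floor and ceiling concrete, here is a small sketch. It takes the 8 hours/week and $200/week figures at face value (which implies roughly $25 per saved hour), and it assumes one AI call per ticket, 20 working days a month, and a four-week month; all of those are assumptions of the example rather than figures from a real customer.

```python
def monthly_value_ceiling(hours_saved_per_week: float, hourly_value: float,
                          weeks_per_month: int = 4) -> float:
    """What the feature is worth to one user per month."""
    return hours_saved_per_week * hourly_value * weeks_per_month


def monthly_cost_floor(calls_per_day: float, cost_per_call: float,
                       working_days_per_month: int = 20) -> float:
    """What the feature costs you to serve one user per month."""
    return calls_per_day * working_days_per_month * cost_per_call


ceiling = monthly_value_ceiling(hours_saved_per_week=8, hourly_value=25.0)
floor = monthly_cost_floor(calls_per_day=50, cost_per_call=0.02)

print(f"cost floor: ${floor:.0f}/month")       # cost floor: $20/month
print(f"value ceiling: ${ceiling:.0f}/month")  # value ceiling: $800/month
```

The gap between the two numbers is the room you have to price in. Cost-plus pricing at $0.05/call would land at about $50/month for this user, close to the floor and far from what the feature is worth.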

The defence: always start the pricing conversation with "what is this worth?" not "what does it cost?" If you can't articulate the value in dollars saved, hours returned, or revenue generated, the feature isn't ready to price.

Failure 2: no visibility into power user cost

The feature ships on flat pricing. Three months in, finance notices total AI cost has grown 3x. Nobody can tell why — the analytics don't break down cost per user. The team investigates for a week, finds that 1% of users are driving 40% of cost, and has no existing relationship with those users because "they're just using the feature, like everyone else."

The defence: the day the feature ships, usage-per-user and cost-per-user must be in the dashboard. If a user is 5x the median, you want to know. Proactive outreach at 10x the median. A pre-written "hey, you're a power user, let's talk" playbook by the time anyone hits 20x. This is the single cheapest defensive investment against pricing failures, and it costs a day of engineering work.
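The 5x / 10x / 20x thresholds translate into a simple classification against the median. A minimal sketch, with invented per-user monthly costs and hypothetical tier labels:

```python
from statistics import median


def outreach_tier(user_cost: float, median_cost: float) -> str:
    """Map a user's cost, as a multiple of the median, to an action."""
    multiple = user_cost / median_cost
    if multiple >= 20:
        return "run power-user playbook"
    if multiple >= 10:
        return "proactive outreach"
    if multiple >= 5:
        return "flag on dashboard"
    return "normal"


# Invented monthly AI cost per user, in dollars.
monthly_costs = {
    "a": 0.24, "b": 0.24, "c": 0.24, "d": 0.24, "e": 0.24,
    "f": 1.50, "g": 3.10, "h": 6.50,
}
typical = median(monthly_costs.values())

for user, cost in monthly_costs.items():
    tier = outreach_tier(cost, typical)
    if tier != "normal":
        print(f"{user}: {cost / typical:.1f}x median -> {tier}")
```

Using the median rather than the mean matters here: a handful of whales can drag the mean up enough to hide themselves.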

Failure 3: letting sales or finance pick the pricing alone

The feature is ready to launch. Product is busy with Stage 4 of the rollout. Sales wants to price it as an "Enterprise AI add-on" at $10K/year. Finance wants to bundle it into the existing Pro plan because "that's easier to sell." The two teams argue, settle on something nobody loves, and ship. Six months later, neither power users nor casual users are happy, and both sales and finance blame each other.

The defence: the PM owns pricing. Not exclusively — sales, finance, and marketing all have strong inputs — but the PM runs the pricing decision and takes accountability for it. The PM has the best view of the usage distribution, the customer value, the competitive landscape, and the product strategy. Other teams are inputs to the decision, not the owners of it. If you as a PM are not running your feature's pricing conversation, someone less well-informed is running it, and that's worse.


What just changed in your roadmap

  • Run the 4-model decision matrix on every new AI feature before commit. Pick the model that matches your cost, usage shape, and customer type — don't default to flat.
  • Fill out the COGS worksheet using real data from Rung 3 or the internal beta. Numbers in the one-pager estimated from assumptions are not enough for a pricing decision.
  • Instrument per-user AI usage and cost from day one, regardless of which pricing model you pick. You need to see power users before finance does.
  • Price to value, not to cost. Cost sets a floor; value sets a ceiling. Charge somewhere in between.
  • Revisit pricing every 6-12 months, or whenever your model tier or usage shape changes materially. Pricing is not a one-time decision.
  • Pre-write the power-user outreach playbook before shipping. It will save you weeks when the first whale appears.
  • Own pricing as the PM. Sales and finance are inputs, not owners.
  • For most 2026 B2B SaaS products adding AI, default to hybrid: flat with soft caps in the core plan, credits or explicit higher tiers for power users, enterprise custom for whales. This is the most common successful pattern.

Next post, P5.2, takes on the topic that every AI product team eventually has to deal with and usually does badly: positioning against "we added AI" noise. How to stand out in a market where every competitor is adding "AI-powered" to their landing page, the specific positioning patterns that work in 2026, and why "better AI" is almost never a winning claim.



📚 AI for Product · Course Home — 20 posts, five modules.


Cover photo via Unsplash. This post is part of the AI for Product series.
