The Meter Is Running: Making Sense of Microsoft's New AI Credit Fees — and How Hybrid Infrastructure Can Keep Them Under Control

Executive summary

What changed: On 1 July 2026, Microsoft completed its shift to usage-based, credit-metered billing across Copilot Studio (AI Agents), Azure AI Foundry / AI Studio, and Copilot Cowork. Flat per-seat pricing is now the exception.
How it’s charged: A two-layer model — a fixed base licence ($21–$32/user/month) plus variable Copilot Credits at ~$0.01/credit. Credits are drawn per action, with rates ranging from 1 credit (classic answer) to 100+ (premium AI tools), and reasoning models are billed twice.
The risk: Costs now track architecture and behaviour, not headcount — driving industry warnings of “meter shock.” Budget alerts notify but don’t stop spend; only per-agent hard caps do.
The counter-strategy: A hybrid AI infrastructure — routing high-volume work to on-premise open models, hard problems to frontier APIs (OpenAI, Anthropic), and reserving Copilot credits for deep Microsoft 365 integration — converts an unpredictable meter into a controllable, largely fixed cost.

On 1 July 2026, the way businesses pay for Microsoft’s AI stopped being a flat, predictable line item and started behaving like a utility bill. With the general availability of usage-based billing across Copilot Studio (AI Agents), Azure AI Foundry / AI Studio, and Copilot Cowork, Microsoft has completed its pivot from “buy a seat” to “pay for what you consume.” For finance and IT leaders, this is one of the most consequential shifts in enterprise software economics in years.

Here’s what actually changed, how the charging model works, and critically, how a hybrid AI infrastructure can turn an unpredictable meter into a controllable cost.

What changed on 1 July 2026

Microsoft’s AI story emphasised simplicity: pay $30 per user per month for Microsoft 365 Copilot and use it as much as you liked. That flat model is now the exception, not the rule. Three consumption-metered products now share the spotlight:

AI Agents (Copilot Studio): Custom agents you build to answer questions, take actions, and run workflows.
AI Studio / Azure AI Foundry: The developer surface for building, grounding, and deploying models and agents.
Copilot Cowork: Microsoft’s autonomous “digital coworker” that executes multi-step tasks across your Microsoft 365 environment. It reached general availability on 16 June 2026, and for organisations that were on the Frontier early-access program, official billing begins in July 2026 — with a 30 June 2026 deadline to configure credit-based billing before access is gated (Microsoft Partner Center announcements, June 2026).

XENON Microsoft Copilot Studio

The connective tissue across all three is a single unit of account: the Copilot Credit (with the parallel GitHub AI Credit on the developer side). Everything now gets metered, priced, and billed in credits.

How the charging model works

1. A two-layer model: fixed licence + variable usage

Microsoft’s consumption model is best understood as two layers stacked on top of each other:

Layer 1 — Base licence (fixed). A Microsoft 365 Copilot User Subscription License, roughly $21–$32 per user per month depending on your base plan (E3, E5, Business Standard/Premium) and commitment term. This is the entry ticket.
Layer 2 — Copilot Credits (variable). Actual AI work is metered and billed on top, in credits (Avantiico).

2. What a credit costs

The credit has a clean, published exchange rate:

Pay-as-you-go: $0.01 per Copilot Credit, billed through a linked Azure subscription.
Prepaid capacity packs: 25,000 credits for $200/month — also effectively $0.01/credit — pooled tenant-wide.
Prepaid annual (P3): roughly $0.008 per credit (~20% discount) with an annual commitment (Microsoft: Copilot Studio pricing).

On the GitHub side, the same logic applies: $0.01 USD = 1 GitHub AI Credit as of 1 June 2026, so a $10/month plan includes ~1,000 credits, and 100 Business licences at $19 each pool into ~190,000 shared credits per month.

3. What consumes credits — and how fast

This is where predictability breaks down. Credits are drawn per action, and the rate depends heavily on the type of work and the model invoked. Microsoft’s own published billing rates for Copilot Studio agents illustrate the spread (Microsoft Learn: Billing rates and management):

Agent activity	Copilot Credits
Classic answer	1
Generative answer	2
Agent action	5
Tenant graph grounding (per message)	10
AI tools — basic (per 10 responses)	1
AI tools — standard (per 10 responses)	15
AI tools — premium (per 10 responses)	100
Content processing (per page)	8
GenAI Voice (per minute)	35

Note the 100x gap between basic and premium AI tools. And when an agent calls a reasoning-capable model, Microsoft bills twice: the feature rate for the action plus the premium token rate (10 credits per 1,000 tokens). Total cost = feature rate + premium token rate.

For Copilot Cowork, every task is metered on four inputs: the model invoked (e.g. Claude Opus, Sonnet, or GPT-5.5), the volume of data retrieved from Microsoft 365, the number of tool calls executed, and total runtime. In practice:

Light task (email draft, policy lookup): ~125 credits (~$1.25)
Medium task (meeting-prep brief, vendor research): ~500 credits (~$5.00)
Heavy task (board-deck generation, multi-tool cross-tenant workflow): ~2,500 credits (~$25.00)

Important to note that Copilot Studio and Cowork draw from the same shared credit pool, so you have to budget for both together.

4. Governance: alerts warn, they don’t stop

A crucial nuance: budget alerts in the Microsoft 365 Admin Center (at 50%, 80%, 100%) notify but do not halt usage. To enforce a genuine hard stop you must set per-agent limits in the Power Platform Admin Center. Prepaid capacity triggers enforcement at 125% of capacity, but pay-as-you-go simply keeps billing overage to Azure with no cap (Microsoft Learn).

Why the industry is nervous: “meter shock”

The shift has drawn pointed reactions. Analysts and developers have warned of “meter shock” —the phenomenon where “organizations don’t realize how much AI they’ve used until the monthly bill arrives.” One widely-cited headline captured the developer mood bluntly: “You Will Get Less, but Pay the Same Price” (Ramel, 2026). Microsoft also removed the old fallback experience that used to downshift users to a cheaper model when credits ran low — now, when the balance is exhausted, usage simply stops until you buy more.

The strategic takeaway: your AI bill is now a direct function of architecture and behaviour. Which model runs, how much data it grounds against, and how many tool calls it makes are all now line items. That is precisely where infrastructure design becomes a cost lever.

The counter-strategy: hybrid AI infrastructure

If every token and every action is now metered at premium rates through Microsoft’s meter, the obvious question is: does every workload need to run there? For most enterprises, the answer is no. A hybrid AI infrastructure — some models running on your own hardware, some routed to closed frontier models, all integrated into the Microsoft world — is emerging as the most effective way to control and minimise AI spend.

The core idea: route work to the cheapest capable tier

Not all AI work is equal, and it shouldn’t be priced as if it were. A hybrid architecture sorts workloads across three tiers:

On-premise / self-hosted open models (e.g. Llama, Mistral, Qwen, Phi running on your own GPUs or a private cloud). Best for high-volume, predictable, sensitive, or “good-enough” tasks: classification, summarisation, RAG retrieval, internal Q&A, first-draft generation. Cost is your fixed hardware + power — effectively $0 marginal cost per token once provisioned, versus 10–100 credits per interaction on the premium meter.
Closed frontier models via API (OpenAI GPT-5.x, Anthropic Claude Opus/Sonnet). Reserved for the genuinely hard reasoning, coding, and multi-step tasks where quality justifies premium pricing — but paid at raw API rates, which are typically far below the marked-up per-credit rate embedded in Copilot’s premium tools.
Microsoft Copilot / Cowork credits. Kept for what they’re uniquely good at: deep, native integration with Microsoft 365 data, Graph grounding, and cross-app automation — used deliberately, not as the default for every prompt.

For an added bonus, bring this all together with an AI Infrastructure layer where an Agentic Orchestration layer directs work to the right model and tier. Users also can also override the default Orchestration choices and direct work with full visibility of marginal costs.

Where the savings come from

Displace premium-metered volume. The 100-credit “premium AI tools” rate and double-billed reasoning models are the expensive part of Microsoft’s meter. Serving those requests from on-prem open models — or a cheaper direct frontier API call — removes the multiplier for the bulk of everyday work.
Turn variable cost into fixed cost. On-prem inference converts an unpredictable per-token meter into a known, depreciable hardware cost. For high, steady volumes this is dramatically cheaper and, just as importantly, immunises you from “meter shock.”
Buy frontier quality at wholesale. Calling OpenAI or Anthropic directly avoids the credit markup and the double-billing on reasoning models, while still giving you top-tier capability where it matters.
Data gravity and compliance. Sensitive or regulated data can be processed on-premise without ever traversing a metered cloud service — cutting both cost and risk.

Keeping it in the Microsoft world

The hybrid model doesn’t mean abandoning Microsoft — it means using it surgically. Azure AI Foundry already supports bringing your own models and routing across providers, and an intelligent orchestration/gateway layer (model router, semantic caching, prompt-complexity scoring) can decide per request whether to serve it on-prem, via a frontier API, or through Copilot. Copilot and Cowork stay in place for the native Microsoft 365 experiences users love — but the expensive, high-volume inference is quietly redirected to cheaper tiers behind the scenes. Semantic caching alone can eliminate a large share of repeat requests before they ever hit a meter.

A simple decision rule

Default to the cheapest tier that can do the job. Escalate to frontier or Copilot credits only when the task genuinely requires their strengths.

Pair that routing logic with the governance controls Microsoft does provide — per-agent hard caps in the Power Platform Admin Center, prepaid capacity for baseline load, pay-as-you-go only for controlled overflow — and the metered model becomes far less threatening.

Bottom line

Microsoft’s 1 July 2026 move to credit-based AI billing is a rational monetisation of genuinely expensive compute — but it transfers cost risk squarely onto the customer. Left unmanaged, agents, AI Studio, and Cowork can generate exactly the “meter shock” the industry is warning about, because every model call, grounding operation, and tool action now has a price.

The organisations that will come out ahead aren’t the ones that avoid Microsoft AI — they’re the ones that treat infrastructure as a cost-control strategy. By running the right workloads on-premise, calling frontier models directly for the hard problems, and reserving Copilot credits for the deep Microsoft integrations only Microsoft can provide, a hybrid architecture lets you capture the full value of AI while keeping the meter — and the invoice — firmly under control.

Start Building Your Hybrid AI Infrastructure Today

XENON helps organisations build the entire AI stack from the DGX Spark, workstations and servers – all the way to enterprise AI Factories. XENON also supports AI orchestration and provides managed services to help organisation fast track deployment and adoption.

Contact the team today to start building your hybrid AI infrastructure.