AzureAzure AI ServicesMachine Learning

`azurerm_cognitive_account` cost estimation

Managed AI APIs: OpenAI, Vision, Speech, Language, Translation. Pay-per-call (varies by service). OpenAI GPT-4o: $5/M input tokens, $15/M output. Vision: $1-$5/1K transactions. Speech-to-Text: $1/audio-hour.

Azure AI Services (formerly Cognitive Services) is the umbrella for managed AI APIs. The azurerm_cognitive_account resource creates an account for a specific service (OpenAI, Vision, Speech, Language, etc.) identified by the kind attribute. Pricing varies dramatically by service.

OpenAI (kind = "OpenAI"): - GPT-4o: $5/1M input tokens, $15/1M output tokens - GPT-4: $30/1M input, $60/1M output - GPT-3.5 Turbo: $0.50/1M input, $1.50/1M output - DALL-E 3: $0.040 per standard image - Embeddings (text-embedding-3-small): $0.020/1M tokens - Whisper (speech-to-text): $0.006/minute

Computer Vision (kind = "ComputerVision"): - Standard tier: $1.00 per 1,000 transactions (first 5K free) - Advanced features (Read, Analyze): $2.50 per 1,000 transactions

Speech (kind = "SpeechServices"): - Speech-to-Text Standard: $1.00 per audio-hour - Real-time STT: $1.40 per audio-hour - Text-to-Speech (Neural voices): $16 per 1M characters - Custom voice: training + $24/audio-hour

Language (kind = "TextAnalytics"): - Sentiment, Entity Recognition: $1/1K records (first 5K free) - Custom NER: training + per-prediction fee

Translator (kind = "TextTranslation"): - Standard: $10/1M characters

A typical Azure OpenAI workload using GPT-4o-mini for chatbot: 100M input tokens + 30M output tokens/month: - Input: 100 × $0.15 = $15 - Output: 30 × $0.60 = $18 - Total: ~$33/month

The same workload on GPT-4 (legacy): 100 × $30 + 30 × $60 = $4,800/month. Model choice dominates cost.

Provisioned Throughput Units (PTU) provide reserved capacity for OpenAI: $1-$60/hour per PTU depending on model. Right for high-throughput production where predictable latency matters more than pay-per-token efficiency.

c3x estimates Cognitive Services based on kind, sku_name, and (via c3x-usage.yml) expected transaction volume.

Terraform example

A minimal but realistic configuration that C3X can estimate.

resource "azurerm_cognitive_account" "openai" {
  name                = "prod-openai"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  kind                = "OpenAI"
  sku_name            = "S0"

  tags = {
    Environment = "production"
  }
}

resource "azurerm_cognitive_deployment" "gpt4" {
  name                 = "gpt-4o-deployment"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-08-06"
  }

  scale {
    type     = "Standard"
    capacity = 10  # 10K TPM tokens-per-minute
  }
}

Pricing dimensions

What you actually pay for when you provision azurerm_cognitive_account.

Dimension	Unit	What's being charged
OpenAI tokens (input)	per million tokens	Input tokens for OpenAI models. Varies by model. $5/1M GPT-4o, $0.50/1M GPT-3.5
OpenAI tokens (output)	per million tokens	Output tokens. Typically 3x input rate. $15/1M GPT-4o, $1.50/1M GPT-3.5
Vision transactions	per 1,000 transactions	Image analysis API calls. $1-$5 per 1K transactions
Speech-to-Text	per audio-hour	Audio transcription, billed by audio length. $1/audio-hour Standard
Provisioned Throughput Unit (PTU)	per PTU-hour	Reserved OpenAI capacity for production workloads. Better latency, predictable throughput. $1-$60/PTU-hour depending on model

Optimization tips

Common ways to reduce azurerm_cognitive_account cost without changing the workload.

Use smaller models when possible

30x or more

GPT-4o-mini is 30x cheaper than GPT-4o for many tasks. GPT-3.5 Turbo is 10x cheaper than GPT-4o. For simple classification, summarization, extraction, smaller models often work fine. A/B test quality before scaling up.

Cache responses

Up to 90%

Common queries (FAQ chatbot, product descriptions) can cache responses for hours/days. Each cached hit avoids the LLM call entirely. A 50% cache hit rate cuts token costs in half.

Optimize prompts

Variable, often 30-70%

Token counts in prompts add up. Trim system prompts, remove redundant context, use shorter formats. A 4K-token prompt vs 1K-token prompt is 4x more expensive per call.

Use Provisioned Throughput for high volume

20-40% for high volume

Above ~10M tokens/day sustained on a single model, PTU can be cheaper than pay-per-token. PTU also gives predictable latency. Calculate: token volume × pay-per-token vs PTU × $/hour × 24.

Batch where possible

50% on batch endpoints

Some Cognitive Services support batch endpoints with discounts (typically 50% off). For non-real-time workloads (overnight analysis), batch is dramatically cheaper.

FAQ

Is Azure OpenAI cheaper than direct OpenAI?

Roughly the same per-token pricing. Azure OpenAI's advantages are: integration with Azure auth/networking, data residency (data not used to train), better SLAs, and Microsoft enterprise contracts. For most enterprise users, Azure OpenAI is preferred. For early-stage testing, direct OpenAI is sometimes easier to access.

Do AI Services have a free tier?

Most have small free tiers per month (5K transactions for Vision/Language, etc.). OpenAI doesn't have a free tier. For development and testing, free tiers usually suffice.

How do I estimate OpenAI token costs?

Token count is roughly 1 token per 4 characters for English. Use the tiktoken library for exact counts. Multiply prompt tokens × input rate + completion tokens × output rate. Add overhead for system prompts and few-shot examples. Test with representative queries before estimating production cost.

When should I use Bedrock vs Azure OpenAI vs Vertex AI?

If you're on Azure, Azure OpenAI. If on AWS, Bedrock. If on GCP, Vertex AI. They've largely converged on similar models and pricing. The main differentiator is cloud-native integration (auth, networking, billing). Cross-cloud requires extra networking complexity.

Related resources

AWS

aws_sagemaker_endpoint

AWS ML inference alternative

Google Cloud

google_dataflow_job

GCP data pipeline reference

Estimate this resource in your own Terraform

Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_cognitive_account.

Get Started Star on GitHub

← Browse all Azure resources