azurerm_cognitive_account cost estimation
Managed AI APIs: OpenAI, Vision, Speech, Language, Translation. Pay-per-call (varies by service). OpenAI GPT-4o: $5/M input tokens, $15/M output. Vision: $1-$5/1K transactions. Speech-to-Text: $1/audio-hour.
Azure AI Services (formerly Cognitive Services) is the umbrella for managed AI APIs. The azurerm_cognitive_account resource creates an account for a specific service (OpenAI, Vision, Speech, Language, etc.) identified by the kind attribute. Pricing varies dramatically by service.
OpenAI (kind = "OpenAI"): - GPT-4o: $5/1M input tokens, $15/1M output tokens - GPT-4: $30/1M input, $60/1M output - GPT-3.5 Turbo: $0.50/1M input, $1.50/1M output - DALL-E 3: $0.040 per standard image - Embeddings (text-embedding-3-small): $0.020/1M tokens - Whisper (speech-to-text): $0.006/minute
Computer Vision (kind = "ComputerVision"): - Standard tier: $1.00 per 1,000 transactions (first 5K free) - Advanced features (Read, Analyze): $2.50 per 1,000 transactions
Speech (kind = "SpeechServices"): - Speech-to-Text Standard: $1.00 per audio-hour - Real-time STT: $1.40 per audio-hour - Text-to-Speech (Neural voices): $16 per 1M characters - Custom voice: training + $24/audio-hour
Language (kind = "TextAnalytics"): - Sentiment, Entity Recognition: $1/1K records (first 5K free) - Custom NER: training + per-prediction fee
Translator (kind = "TextTranslation"): - Standard: $10/1M characters
A typical Azure OpenAI workload using GPT-4o-mini for chatbot: 100M input tokens + 30M output tokens/month: - Input: 100 × $0.15 = $15 - Output: 30 × $0.60 = $18 - Total: ~$33/month
The same workload on GPT-4 (legacy): 100 × $30 + 30 × $60 = $4,800/month. Model choice dominates cost.
Provisioned Throughput Units (PTU) provide reserved capacity for OpenAI: $1-$60/hour per PTU depending on model. Right for high-throughput production where predictable latency matters more than pay-per-token efficiency.
c3x estimates Cognitive Services based on kind, sku_name, and (via c3x-usage.yml) expected transaction volume.
Terraform example
A minimal but realistic configuration that C3X can estimate.
resource "azurerm_cognitive_account" "openai" {
name = "prod-openai"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
kind = "OpenAI"
sku_name = "S0"
tags = {
Environment = "production"
}
}
resource "azurerm_cognitive_deployment" "gpt4" {
name = "gpt-4o-deployment"
cognitive_account_id = azurerm_cognitive_account.openai.id
model {
format = "OpenAI"
name = "gpt-4o"
version = "2024-08-06"
}
scale {
type = "Standard"
capacity = 10 # 10K TPM tokens-per-minute
}
}Pricing dimensions
What you actually pay for when you provision azurerm_cognitive_account.
| Dimension | Unit | What's being charged |
|---|---|---|
| OpenAI tokens (input) | per million tokens | Input tokens for OpenAI models. Varies by model. $5/1M GPT-4o, $0.50/1M GPT-3.5 |
| OpenAI tokens (output) | per million tokens | Output tokens. Typically 3x input rate. $15/1M GPT-4o, $1.50/1M GPT-3.5 |
| Vision transactions | per 1,000 transactions | Image analysis API calls. $1-$5 per 1K transactions |
| Speech-to-Text | per audio-hour | Audio transcription, billed by audio length. $1/audio-hour Standard |
| Provisioned Throughput Unit (PTU) | per PTU-hour | Reserved OpenAI capacity for production workloads. Better latency, predictable throughput. $1-$60/PTU-hour depending on model |
Optimization tips
Common ways to reduce azurerm_cognitive_account cost without changing the workload.
Use smaller models when possible
30x or moreGPT-4o-mini is 30x cheaper than GPT-4o for many tasks. GPT-3.5 Turbo is 10x cheaper than GPT-4o. For simple classification, summarization, extraction, smaller models often work fine. A/B test quality before scaling up.
Cache responses
Up to 90%Common queries (FAQ chatbot, product descriptions) can cache responses for hours/days. Each cached hit avoids the LLM call entirely. A 50% cache hit rate cuts token costs in half.
Optimize prompts
Variable, often 30-70%Token counts in prompts add up. Trim system prompts, remove redundant context, use shorter formats. A 4K-token prompt vs 1K-token prompt is 4x more expensive per call.
Use Provisioned Throughput for high volume
20-40% for high volumeAbove ~10M tokens/day sustained on a single model, PTU can be cheaper than pay-per-token. PTU also gives predictable latency. Calculate: token volume × pay-per-token vs PTU × $/hour × 24.
Batch where possible
50% on batch endpointsSome Cognitive Services support batch endpoints with discounts (typically 50% off). For non-real-time workloads (overnight analysis), batch is dramatically cheaper.
FAQ
Is Azure OpenAI cheaper than direct OpenAI?
Roughly the same per-token pricing. Azure OpenAI's advantages are: integration with Azure auth/networking, data residency (data not used to train), better SLAs, and Microsoft enterprise contracts. For most enterprise users, Azure OpenAI is preferred. For early-stage testing, direct OpenAI is sometimes easier to access.
Do AI Services have a free tier?
Most have small free tiers per month (5K transactions for Vision/Language, etc.). OpenAI doesn't have a free tier. For development and testing, free tiers usually suffice.
How do I estimate OpenAI token costs?
Token count is roughly 1 token per 4 characters for English. Use the tiktoken library for exact counts. Multiply prompt tokens × input rate + completion tokens × output rate. Add overhead for system prompts and few-shot examples. Test with representative queries before estimating production cost.
When should I use Bedrock vs Azure OpenAI vs Vertex AI?
If you're on Azure, Azure OpenAI. If on AWS, Bedrock. If on GCP, Vertex AI. They've largely converged on similar models and pricing. The main differentiator is cloud-native integration (auth, networking, billing). Cross-cloud requires extra networking complexity.
Related resources
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_cognitive_account.