The GPT-4o-audio-preview model from OpenAI introduces robust support for audio inputs as prompts. This significant enhancement allows the model to process and understand spoken language with remarkable accuracy, detecting subtle nuances within audio recordings. This capability adds considerable depth to generated user experiences, making it ideal for applications requiring sophisticated audio analysis and interpretation. Designed for PRO access, GPT-4o Audio boasts a substantial 128K token context window and a maximum output of 8K tokens. It supports streaming, audio input, functions, and structured outputs. Pricing is competitive at $2.50 per million input tokens and $10.00 per million output tokens. While it excels in understanding audio, please note that audio outputs are not currently supported. Leverage its power for superior transcription and audio-driven AI applications on Multi AI.
✅ Best For
🚀 Capabilities
❌ Limitations
Specifications
| Provider | openai |
| Context Window | 128,000 tokens |
| Max Output | 16,384 tokens |
| Minimum Plan | Premium |
Pricing
| Input Price | $2.5000 / 1M tokens |
| Output Price | $10.0000 / 1M tokens |
💡 With PRO subscription, cost is reduced by 20%