whisper-tiny-en

Beta

Model ID: @cf/openai/whisper-tiny-en

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

Properties

Task Type: Automatic Speech Recognition

Code Examples

Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Download a sample WAV file to transcribe.
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    // The model expects the audio as an array of byte values.
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper-tiny-en",
      input
    );

    // Echo an empty audio array rather than the full byte list.
    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;
curl
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper-tiny-en \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
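The same REST call can be described from TypeScript. The following is a minimal sketch that only assembles the request pieces shown in the curl command; `RunRequest` and `buildRunRequest` are hypothetical names introduced for illustration, and the account ID and API token are assumed to be supplied by the caller.

```typescript
// Hypothetical container for the parts of the curl request above.
interface RunRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string };
  body: Uint8Array; // raw audio bytes, like curl's --data-binary
}

// Hypothetical helper: assemble the request without sending it.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): RunRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper-tiny-en`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: audio,
  };
}
```

The result could then be passed to `fetch(req.url, req)` in a runtime that provides `fetch`; keeping construction separate from sending makes the request easy to inspect.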

Response

Automatic speech recognition responses return a single string text property containing the audio transcription and, if the model supports it, an optional array of words with start and end timestamps.

Here’s an example of the output from the @cf/openai/whisper model:

{
  "text": "It is a good day",
  "word_count": 5,
  "words": [
    {
      "word": "It",
      "start": 0.5600000023841858,
      "end": 1
    },
    {
      "word": "is",
      "start": 1,
      "end": 1.100000023841858
    },
    {
      "word": "a",
      "start": 1.100000023841858,
      "end": 1.2200000286102295
    },
    {
      "word": "good",
      "start": 1.2200000286102295,
      "end": 1.3200000524520874
    },
    {
      "word": "day",
      "start": 1.3200000524520874,
      "end": 1.4600000381469727
    }
  ]
}
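As one way to consume the words array, the sketch below turns each entry into a human-readable timing line. It assumes the start and end values are in seconds, as the example output suggests; `WordTiming`, `formatTimestamp`, and `describeWords` are hypothetical names, not part of the Workers AI API.

```typescript
// Shape of one entry in the "words" array, following the example output.
interface WordTiming {
  word: string;
  start: number; // assumed to be seconds
  end: number;   // assumed to be seconds
}

// Hypothetical helper: left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Hypothetical helper: format seconds as mm:ss.mmm.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  return `${pad(Math.floor(totalSec / 60), 2)}:${pad(totalSec % 60, 2)}.${pad(ms, 3)}`;
}

// Hypothetical helper: one "start --> end word" line per word.
function describeWords(words: WordTiming[]): string[] {
  return words.map(
    (w) => `${formatTimestamp(w.start)} --> ${formatTimestamp(w.end)} ${w.word}`
  );
}
```

Rounding to whole milliseconds also hides the float noise visible in values such as 1.100000023841858.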

API Schema

The following schemas are based on JSON Schema.

Input JSON Schema
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}
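The object form of the input schema can be built from raw audio bytes. This is a minimal sketch, assuming each array item is one byte value (0 to 255), as in the Workers example; `AudioInput` and `toAudioInput` are hypothetical names used only for illustration.

```typescript
// Object form of the input schema: { "audio": number[] }.
interface AudioInput {
  audio: number[];
}

// Hypothetical helper: copy raw bytes into a plain number array.
function toAudioInput(buffer: ArrayBuffer): AudioInput {
  const bytes = new Uint8Array(buffer);
  const audio: number[] = [];
  for (let i = 0; i < bytes.length; i++) {
    audio.push(bytes[i]);
  }
  return { audio };
}
```

A plain array is used because the schema declares `audio` as an array of numbers, and a typed array would not serialize to a JSON array.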
Output JSON Schema
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    },
    "word_count": {
      "type": "number"
    },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": {
            "type": "string"
          },
          "start": {
            "type": "number"
          },
          "end": {
            "type": "number"
          }
        }
      }
    },
    "vtt": {
      "type": "string"
    }
  },
  "required": [
    "text"
  ]
}
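The output schema can be mirrored as a TypeScript type with a small runtime check. The sketch below is an assumption-laden illustration: `TranscriptionOutput` and `isTranscriptionOutput` are hypothetical names, and the check covers only the top-level properties, treating everything except `text` as optional because the schema lists only `text` as required.

```typescript
// Mirrors the output schema: only "text" is required.
interface TranscriptionWord {
  word?: string;
  start?: number;
  end?: number;
}

interface TranscriptionOutput {
  text: string;
  word_count?: number;
  words?: TranscriptionWord[];
  vtt?: string;
}

// Hypothetical type guard: shallow check of the top-level properties.
function isTranscriptionOutput(value: unknown): value is TranscriptionOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.text !== "string") return false;
  if (v.word_count !== undefined && typeof v.word_count !== "number") return false;
  if (v.words !== undefined && !Array.isArray(v.words)) return false;
  if (v.vtt !== undefined && typeof v.vtt !== "string") return false;
  return true;
}
```

A guard like this lets calling code narrow an untyped response before reading `text` or iterating over `words`.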