Although running Whisper locally avoids API fees, it requires GPUs and other hardware, which brings its own challenges in processing speed and operational cost. In this article, we introduce a way to run Whisper at very low cost using Cloudflare Workers AI, at a fraction of the cost of the hosted Whisper API. This cost difference makes Cloudflare Workers AI a very attractive option, especially for large-scale audio processing projects.
The project introduced here provides an audio transcription API that utilizes the Whisper model on Cloudflare Workers AI. Its main features include:

- Interactive API documentation (Swagger UI) served at the `/docs` endpoint.
- Support for multiple Whisper models: `@cf/openai/whisper`, `@cf/openai/whisper-tiny-en`, and `@cf/openai/whisper-large-v3-turbo`.
This project uses TypeScript, Hono, and Zod to implement a simple yet robust API. Below is an explanation of the key files:
src/index.ts — The Application Entry Point

This file defines the overall configuration and routing of the API.
Importing Libraries:
Libraries such as `hono/logger`, `@hono/zod-openapi`, and `@hono/swagger-ui` are imported to handle logging, automatic generation of the OpenAPI specification, and serving Swagger UI, respectively.
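As a minimal sketch, assuming the exact import list may differ slightly from the repository, the top of `src/index.ts` looks roughly like this:

```ts
import { logger } from "hono/logger";
import { OpenAPIHono } from "@hono/zod-openapi";
import { swaggerUI } from "@hono/swagger-ui";
// The import path of the audio sub-router is assumed; see src/v1/audio.ts below.
import v1Audio from "./v1/audio";
```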
Generating an OpenAPIHono Instance:

```ts
const app = new OpenAPIHono<{ Bindings: CloudflareBindings }>();
```

This line creates an OpenAPI-compatible Hono application while making the Cloudflare Workers environment bindings (in this case, the `AI` binding) available.
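The `CloudflareBindings` type describes the bindings declared in the project's Wrangler configuration and is typically generated with Wrangler's `types` command. As a hedged sketch of its shape (the generated file in the repository is the source of truth):

```ts
// Hypothetical shape; in practice this interface is generated by `wrangler types`.
interface CloudflareBindings {
  AI: Ai; // the Workers AI binding declared in the Wrangler configuration
}
```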
Applying Middleware:

```ts
app.use(logger());
```

This applies logging middleware that records the details of every request.
Configuring OpenAPI Documentation and Swagger UI:

```ts
app.doc("openapi.json", { ... });
app.get("/docs", swaggerUI({ url: "/openapi.json" }));
```

The API specification is served as JSON at `/openapi.json`, and the documentation can be browsed via Swagger UI at `/docs`.
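The second argument to `app.doc` carries the OpenAPI metadata. A minimal sketch, assuming typical values rather than the repository's exact ones:

```ts
app.doc("openapi.json", {
  openapi: "3.0.0",
  info: {
    title: "openai-workers-ai", // assumed title and version
    version: "1.0.0",
  },
});

// Serve the interactive documentation at /docs, backed by the JSON spec above.
app.get("/docs", swaggerUI({ url: "/openapi.json" }));
```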
Routing:

```ts
app.get("/", (c) => { return c.text("Hello Hono!"); });
app.route("/v1/audio", v1Audio);
```

The root endpoint returns a simple greeting, while audio-related processing is routed to `/v1/audio`. The `v1Audio` router is defined in a separate file, where the audio transcription functionality is implemented; a sketch of its skeleton follows.
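As a rough sketch of that structure (the file contents here are an assumption; only the mount point above comes from the source), `src/v1/audio.ts` exports its own `OpenAPIHono` instance:

```ts
// src/v1/audio.ts (skeleton sketch)
import { OpenAPIHono } from "@hono/zod-openapi";

const app = new OpenAPIHono<{ Bindings: CloudflareBindings }>();

// Transcription routes (e.g. POST /transcriptions) are registered on this
// instance with app.openapi(...), as described in the next section.

export default app;
```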
src/v1/audio.ts — Audio Transcription Endpoint

This file implements the endpoint for transcribing audio files. The key points include:
Request Validation:

Zod is used to validate requests sent as `multipart/form-data`.
```ts
schema: z.object({
  file: z.custom<File>(),
  model: z.union([
    z.literal("@cf/openai/whisper"),
    z.literal("@cf/openai/whisper-tiny-en"),
    z.literal("@cf/openai/whisper-large-v3-turbo"),
  ]),
  // Other optional parameters...
})
```
This strict validation ensures that the received data is in the correct format, providing a reliable API for developers.
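For illustration, a transcription request could look like the following; the route path `/v1/audio/transcriptions` and the Worker URL are assumptions based on the OpenAI-compatible layout, not values taken from the repository:

```bash
# --form-string keeps curl from treating the leading "@" in the model name as a file path.
curl https://<your-worker>.workers.dev/v1/audio/transcriptions \
  -F "file=@sample.mp3" \
  --form-string "model=@cf/openai/whisper" \
  --form-string "response_format=json"
```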
Conditional Processing Based on Model:

Depending on the `model` specified in the request, Cloudflare Workers AI's `run` method is invoked.

```ts
const response = await c.env.AI.run(model, { audio: [...new Uint8Array(arrayBuffer)] });
```

Here, the audio file is converted into a byte array and passed to the AI runtime, which then performs transcription using the specified Whisper model.
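Putting the pieces together, the handler body looks roughly like this sketch (names not shown in the snippet above are assumptions):

```ts
// Read the form fields that passed Zod validation.
const { file, model } = c.req.valid("form");

// Convert the uploaded File into the byte array the Workers AI runtime expects.
const arrayBuffer = await file.arrayBuffer();
const response = await c.env.AI.run(model, {
  audio: [...new Uint8Array(arrayBuffer)],
});
```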
Response Format Handling:

Currently, only the `json` format is implemented. Other formats (such as `text`, `srt`, `verbose_json`, and `vtt`) are not yet implemented and result in a "Not implemented" error. This design leaves room for future expansion.
```ts
switch (response_format) {
  case "json":
    return c.json({ text: response.text });
  default:
    throw new HTTPException(500, { message: "Not implemented" });
}
```
Overall Code Structure:

The entire file is documented according to the OpenAPI specification. Using `createRoute`, the endpoint's path, request parameters, and response formats are described declaratively, and the OpenAPI documentation is generated from them automatically. This allows API users to review the specification easily via Swagger UI and similar tools; a rough sketch of the pattern follows.
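As a rough illustration of this pattern (the path, schema details, and descriptions here are assumptions, not the repository's exact definitions):

```ts
import { createRoute, z } from "@hono/zod-openapi";

// Hypothetical route definition for the transcription endpoint.
const transcriptionsRoute = createRoute({
  method: "post",
  path: "/transcriptions",
  request: {
    body: {
      content: {
        "multipart/form-data": {
          schema: z.object({
            file: z.custom<File>(),
            model: z.literal("@cf/openai/whisper"),
            // Other optional parameters...
          }),
        },
      },
    },
  },
  responses: {
    200: {
      description: "The transcription result.",
      content: {
        "application/json": { schema: z.object({ text: z.string() }) },
      },
    },
  },
});

// app.openapi(transcriptionsRoute, handler) then ties this definition to the
// handler logic sketched earlier, and the OpenAPI document is generated from it.
```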
Clone the Repository:

```bash
git clone https://github.com/Lqm1/openai-workers-ai.git
cd openai-workers-ai
npm install
```
Test Locally:

```bash
npm run dev
```

This command starts a local server where you can test the API endpoints and view the Swagger UI documentation.
Deploy to Cloudflare Workers:

```bash
npm run deploy
```

Alternatively, you can deploy directly using the "Deploy to Cloudflare Workers" button provided at the top of the README.
Using the Official Client:

Once deployment is complete, simply replace the `base_url` in the official OpenAI client with the URL of your deployed Worker, and use it just as you would the original API; a sketch with the Node.js client is shown below.
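For example, with the official Node.js client (which uses `baseURL` rather than Python's `base_url`); the Worker URL is a placeholder, and whether an API key is checked depends on your deployment, so both are assumptions:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://<your-worker>.workers.dev/v1", // your deployed Worker
  apiKey: "unused", // assumes the Worker does not verify API keys
});

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream("sample.mp3"),
  model: "@cf/openai/whisper",
});

console.log(transcription.text);
```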
This project has been developed with an emphasis on the following aspects:
Integration with Cloudflare Workers:
By leveraging the benefits of serverless computing, a low-latency and scalable API is achieved.
Type Safety and Validation:
The use of TypeScript and Zod ensures a robust and error-resistant implementation.
OpenAPI Compatibility:
Documentation is automatically generated based on the OpenAPI specification, making it extremely accessible for developers.
Note: Some response formats (such as `text`, `srt`, `verbose_json`, and `vtt`) are not yet implemented. Future updates are planned to expand this functionality. Contributions in the form of ideas, code improvements, or bug fixes are very welcome.
Reducing the cost of using the Whisper API offers significant benefits—especially for projects involving large-scale audio processing or those with strict budget constraints. By leveraging Cloudflare Workers AI, you can achieve Whisper transcription in a low-cost, scalable environment.
We hope this explanation has given you a clear understanding of the design philosophy and implementation methods used in this project. Please check out openai-workers-ai and consider applying it to your own projects. Contributions of any size, including suggestions for unimplemented features or improvements, are greatly appreciated!