Cloudflare Launches the Most Complete Platform to Deploy Fast, Secure, Compliant AI Inference at Scale

Cloudflare today announced that developers can now build full-stack AI applications on Cloudflare’s network. Cloudflare’s developer platform will provide the best end-to-end experience for developers building AI applications, enabling fast and affordable inference without the need to manage infrastructure. As every business, from startups to enterprises, looks to augment their services with artificial intelligence, Cloudflare’s platform is empowering developers with the velocity to ship a production-ready application quickly, with security, compliance, and speed built in.

AI Gateway will give developers observability features to understand AI traffic, such as the number of requests, number of users, cost of running the app, and duration of requests. Developers can also manage costs with caching and rate limiting. With caching, customers will be able to serve cached answers to repeated questions, reducing the need to make repeated calls to expensive APIs. Rate limiting will help developers contend with malicious actors and heavy traffic, keeping growth and costs in check and giving them more control over how they scale their applications.
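The following is a minimal Python sketch that illustrates the caching and rate-limiting ideas for a gateway placed in front of a model API. It is not AI Gateway's implementation or API. Every name in it is a hypothetical illustration: the class `HypotheticalGateway`, the `call_model` callback, the whitespace- and case-normalized cache key, and the per-user token-bucket limiter with its `capacity` and `rate` parameters. It also keeps simple counters, in the spirit of the observability metrics described above.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TokenBucket:
    """Hypothetical per-user limiter: bursts up to `capacity`, refills at `rate` tokens/second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class HypotheticalGateway:
    """Illustrative wrapper: rate-limit per user, then answer from cache or call the model."""

    def __init__(self, call_model: Callable[[str], str], capacity: float = 5, rate: float = 1.0):
        self.call_model = call_model  # assumed callback to an expensive model API
        self.capacity = capacity
        self.rate = rate
        self.cache: Dict[str, str] = {}
        self.buckets: Dict[str, TokenBucket] = {}  # one bucket per distinct user
        self.stats = {"requests": 0, "cache_hits": 0, "upstream_calls": 0, "rate_limited": 0}

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so trivially repeated questions share one entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def handle(self, user: str, prompt: str) -> Optional[str]:
        self.stats["requests"] += 1
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = self.buckets[user] = TokenBucket(self.capacity, self.rate)
        if not bucket.allow():
            self.stats["rate_limited"] += 1
            return None  # a real HTTP gateway would typically answer 429 here
        key = self._key(prompt)
        if key in self.cache:
            self.stats["cache_hits"] += 1
            return self.cache[key]
        self.stats["upstream_calls"] += 1
        answer = self.call_model(prompt)
        self.cache[key] = answer
        return answer
```

In this sketch, the number of distinct users is `len(gw.buckets)`. A production gateway would also need to expire cache entries and record request durations. Both are left out here to keep the example short.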

Cloudflare has also announced collaborations with Databricks, Microsoft, Hugging Face, and NVIDIA.

The continued collaboration with Databricks, the Data and AI company, aims to bring MLflow capabilities to developers building on Cloudflare’s serverless developer platform. Cloudflare is joining the open-source MLflow project as an active contributor. The goal is to bridge the gap between training models and easily deploying them to Cloudflare’s global network, where AI models can run close to end users for a low-latency experience.

To make it easier for companies to run AI in the location most suitable for their needs, Cloudflare has collaborated with Microsoft. As inference tasks become increasingly distributed, this collaboration will enable businesses to seamlessly deploy AI models across a computing continuum that spans device, network edge, and cloud environments, maximizing the benefits of both centralized and distributed computing models.

The partnership with Hugging Face, the leading open platform for AI builders, aims to make the best open AI models more accessible and affordable to developers. Cloudflare will be the first serverless GPU preferred partner for deploying Hugging Face models. This will enable developers to quickly and easily deploy AI globally without managing infrastructure or paying for unused compute capacity.

Lastly, as part of its collaboration with NVIDIA, Cloudflare will deploy NVIDIA GPUs at the edge of its global network, combined with NVIDIA Ethernet switches, putting AI inference compute power close to users around the globe. The network will also feature NVIDIA’s full-stack inference software, including NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server, to further accelerate the performance of AI applications, including large language models.

“Cloudflare has all the infrastructure developers need to build scalable AI-powered applications, and can now provide AI inference as close to the user as possible. We’re investing to make it easy for every developer to have access to powerful, affordable tools to build the future,” said Matthew Prince, CEO and co-founder of Cloudflare. “Workers AI will empower developers to build production-ready AI experiences efficiently and affordably, and in days, instead of what typically takes entire teams weeks or even months.”

“As enterprises look to maximize their operational velocity, more and more of them are turning to artificial intelligence,” said Stephen O’Grady, Principal Analyst with RedMonk. “But it’s critical to deliver a quality developer experience around AI, with abstractions to simplify the interfaces and controls to monitor costs. This is precisely what Cloudflare has optimized its Workers platform for.”