
Valuation: $2.15B (2025)
Funding: $285.00M (2025)
Valuation
Baseten closed a $150M Series D in September 2025 at a $2.15B valuation, led by BOND with participation from existing investors. That is a roughly 2.6x step-up from the $825M valuation set by its $75M Series C in February 2025.
Baseten's fundraising cadence has accelerated sharply. After raising a $2.5M seed round led by First Round Capital in 2021 and a $13.5M Series A led by Sequoia in 2023, the company completed three funding rounds in 2025 alone. Key investors include CapitalG, Premji Invest, IVP, Spark Capital, Greylock, Conviction, and 01 Advisors.
Baseten has raised over $285M in total funding across all rounds.
Product
Baseten is a serverless inference platform that turns machine learning models into production-ready APIs, functioning much like AWS Lambda for AI workloads. Developers package their models with Baseten's open-source Truss framework and get back auto-scaling HTTPS endpoints that handle GPU orchestration, caching, and monitoring.
The workflow begins with Truss, a CLI tool where developers run `truss init`, add model code to `model.py`, configure settings in `config.yaml`, and push to Baseten. The platform encapsulates models in Firecracker-style micro-VMs, shards weight files across GPU fleets, and employs cold-start snapshots to bring models as large as 20GB online in under 10 seconds.
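A minimal `model.py` gives a feel for the packaging format. The `Model` class with `load` and `predict` methods follows Truss's documented interface; the scikit-learn model inside is an arbitrary stand-in:

```python
# model.py -- a minimal Truss model (the sklearn model is an illustrative stand-in)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


class Model:
    def __init__(self, **kwargs):
        # Truss passes deployment metadata (config, secrets) via kwargs.
        self._model = None

    def load(self):
        # Called once per replica at startup -- load weights here so each
        # cold start pays the cost a single time.
        X, y = load_iris(return_X_y=True)
        self._model = LogisticRegression(max_iter=1000).fit(X, y)

    def predict(self, model_input):
        # Invoked per request; input and output are JSON-serializable.
        features = model_input["features"]
        return {"prediction": int(self._model.predict([features])[0])}
```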
Baseten provides three deployment options. Dedicated Deployments allow customers to select specific GPU instances with configurable autoscaling parameters. Model APIs enable one-click access to open-source models such as Llama and DeepSeek via OpenAI-compatible endpoints. The Chains SDK supports multi-model workflows, enabling each step to run on different hardware with point-to-point communication.
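For the Model APIs specifically, OpenAI compatibility means switching an existing client can be as small as changing the base URL. A sketch, where the endpoint URL and model slug are hypothetical placeholders:

```python
# Hypothetical example: calling a Baseten Model API through the standard
# OpenAI client. The base_url and model slug are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://example-baseten-endpoint/v1",  # placeholder URL
)

response = client.chat.completions.create(
    model="deepseek-v3",  # hypothetical model slug
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```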
The platform includes detailed observability features, such as per-deployment dashboards that track request volumes, latencies, GPU utilization, and logs. Developers can iterate locally using `truss watch` for live-reload development and deploy to production with `truss push --publish`.
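Those commands compose into a tight loop, roughly:

```
truss init my-model      # scaffold model.py and config.yaml
truss watch              # live-reload local changes against a dev deployment
truss push --publish     # promote the model to a production deployment
```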
Business Model
Baseten operates as a B2B SaaS platform with usage-based pricing. The company runs an asset-light model, relying on multi-cloud capacity management across more than 10 cloud providers rather than owning GPU infrastructure.
For dedicated deployments, customers pay per-minute rates for GPU instances, with compute costs itemized transparently. Pricing is tiered by hardware, from entry-level T4 GPUs to premium B200 chips. Customers can tune autoscaling parameters, including scale-to-zero, to control costs.
Model APIs are priced per token, similar to OpenAI but at rates typically more than 50% lower for comparable model access. This structure appeals to developers already building against OpenAI-style APIs while offering better unit economics.
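To make both pricing modes concrete, here is a back-of-the-envelope sketch; every rate below is a hypothetical placeholder, not Baseten's published pricing:

```python
# Back-of-the-envelope sketch of both pricing modes. All rates are
# hypothetical placeholders, not Baseten's published prices.

GPU_PRICE_PER_MINUTE = 0.10  # mid-tier dedicated GPU instance (hypothetical)

def dedicated_monthly_cost(active_minutes_per_day: float, replicas: int = 1) -> float:
    """Per-minute billing plus scale-to-zero: idle minutes cost nothing."""
    return GPU_PRICE_PER_MINUTE * active_minutes_per_day * 30 * replicas

# A bursty workload active ~4 hours/day on one replica:
print(f"dedicated: ${dedicated_monthly_cost(4 * 60):,.2f}/month")  # $720.00/month

# Token-based Model APIs at >50% below a comparable closed API
# (hypothetical rates): 1B tokens/month at $1.00 vs. $2.50 per 1M tokens.
tokens_millions = 1_000
print(f"model API: ${tokens_millions * 1.00:,.0f} vs. closed API: ${tokens_millions * 2.50:,.0f}")
```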
The business benefits from expansion dynamics, as customers often begin with small workloads and increase usage as their AI applications grow. Revenue scales in direct proportion to customer usage, as pricing is tied to consumption rather than fixed contracts or seat-based models.
Baseten's multi-cloud strategy delivers cost efficiencies and reduces risk compared to single-cloud competitors. The platform dynamically allocates workloads across providers based on GPU availability and pricing, addressing supply constraints in high-end AI chips and mitigating vendor lock-in risks.
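The allocation idea can be pictured as a cost-and-availability scan across providers. A conceptual sketch of that logic, not Baseten's actual scheduler:

```python
# Conceptual sketch of multi-cloud GPU allocation: pick the cheapest
# provider that currently has the requested GPU type available.
# Illustrates the idea only, not Baseten's real scheduler.
from dataclasses import dataclass

@dataclass
class ProviderQuote:
    provider: str
    gpu_type: str
    price_per_minute: float
    available: int

def place_workload(quotes: list[ProviderQuote], gpu_type: str, count: int) -> ProviderQuote | None:
    candidates = [
        q for q in quotes
        if q.gpu_type == gpu_type and q.available >= count
    ]
    # Cheapest eligible provider wins; None signals a capacity shortfall
    # that a real system would retry, queue, or route to another GPU tier.
    return min(candidates, key=lambda q: q.price_per_minute, default=None)

quotes = [
    ProviderQuote("cloud-a", "H100", 1.80, available=4),
    ProviderQuote("cloud-b", "H100", 1.55, available=8),
    ProviderQuote("cloud-c", "H100", 1.40, available=0),  # sold out
]
print(place_workload(quotes, "H100", 8))  # -> cloud-b
```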
The company sustains strong gross margins by minimizing infrastructure costs through methods such as weight caching, efficient container orchestration, and intelligent workload distribution across its multi-cloud network.
Competition
Serverless GPU specialists
Modal Labs is Baseten's closest competitor; its Rust-based runtime scales GPU functions to hundreds of GPUs within seconds. Modal reports six-figure monthly recurring revenue and has acquired workflow-tooling companies to expand its capabilities.
Replicate targets the developer community with one-line APIs for thousands of open-source models and is backed by Andreessen Horowitz. While it performs well in the hobbyist and indie developer segments, Replicate does not provide enterprise governance features or compliance certifications, which are part of Baseten's offering.
Together AI competes on pricing with a low-margin, high-volume token model and dedicated H100/H200 clusters. Its SOC 2 Type 2 certification and competitive pricing on models such as Llama create pricing pressure on Baseten's Model APIs.
Cloud provider incumbents
AWS, Google Cloud, and Microsoft Azure represent the most significant long-term competitive challenge due to their vertical integration of AI services within broader cloud platforms. These providers leverage enterprise relationships and cloud spend commitments to bundle inference capabilities.
Baseten differentiates by focusing on developer experience, faster iteration cycles, and support for open-source models, avoiding the vendor lock-in risks associated with proprietary cloud AI services.
Specialized infrastructure players
OctoAI prioritizes hardware portability across NVIDIA, AMD, and AWS Inferentia chips, with strong on-premises deployment options. This appeals to regulated enterprises, a segment Baseten also targets with its HIPAA compliance.
Anyscale uses the Ray ecosystem for distributed compute workloads, competing in scenarios where customers require full-stack distributed computing beyond inference serving.
TAM Expansion
New product categories
Baseten's introduction of Model APIs and Training capabilities in 2025 expands its role in the AI value chain from inference-only services to include model supply and lifecycle management. The Training offering supports multi-node fine-tuning jobs with seamless promotion to inference endpoints, addressing a larger portion of the machine learning workflow.
Baseten Embeddings Inference targets the expanding RAG and search segments, optimizing throughput and latency for embedding workloads. This approach enables the company to capture value across the broader AI application ecosystem, extending beyond LLM inference.
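To ground the embedding use case, here is a sketch of the retrieval core of a RAG pipeline, assuming an OpenAI-compatible embeddings endpoint; the URL and model slug are placeholders:

```python
# Hypothetical RAG retrieval sketch, assuming an OpenAI-compatible
# embeddings endpoint. base_url and model slug are placeholders.
import math
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://example-baseten-endpoint/v1",  # placeholder URL
)

def embed(texts: list[str]) -> list[list[float]]:
    result = client.embeddings.create(
        model="bge-base-en",  # hypothetical embedding model slug
        input=texts,
    )
    return [d.embedding for d in result.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

docs = ["Refunds are issued within 30 days.", "We support H100 and B200 GPUs."]
doc_vecs = embed(docs)
query_vec = embed(["what gpus can i use"])[0]
# Rank documents by similarity to the query -- the core of RAG retrieval.
best = max(zip(docs, doc_vecs), key=lambda dv: cosine(query_vec, dv[1]))
print(best[0])
```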
The Chains SDK facilitates orchestration of compound AI systems, broadening Baseten's addressable market from single-model serving to comprehensive AI application backends. These include use cases such as voice AI, agents, and complex RAG pipelines.
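The Chains SDK has its own API; as a plain-Python conceptual sketch (illustrative stubs only, not the SDK itself), a compound voice-AI pipeline of the kind described chains steps that could each run on different hardware:

```python
# Conceptual sketch of a compound AI pipeline of the kind the Chains SDK
# orchestrates. Plain Python with illustrative stubs; the real SDK adds
# remote execution and per-step hardware targeting.

def transcribe(audio: bytes) -> str:
    # Step 1: speech-to-text -- could run on a small GPU.
    return "what is our refund policy"

def retrieve(query: str) -> list[str]:
    # Step 2: embedding + vector search -- CPU or small GPU.
    return ["Refunds are issued within 30 days of purchase."]

def generate(query: str, context: list[str]) -> str:
    # Step 3: LLM generation -- a large GPU (e.g. H100-class).
    return f"Per policy: {context[0]}"

def voice_agent(audio: bytes) -> str:
    """Each step can scale and be placed on hardware independently."""
    query = transcribe(audio)
    docs = retrieve(query)
    return generate(query, docs)

print(voice_agent(b"<audio bytes>"))
```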
Enterprise market penetration
HIPAA and SOC 2 Type II compliance certifications enable access to regulated industries such as healthcare and financial services, which have historically been unable to adopt managed inference platforms. These industries typically exhibit higher willingness to pay and favor longer contract durations.
The introduction of single-tenant and self-hosted deployment options addresses enterprise security requirements while preserving the benefits of Baseten's optimization stack. This hybrid model supports large enterprise deals that require on-premises components.
Multi-cloud capacity management mitigates vendor lock-in concerns, a common barrier to enterprise adoption of cloud-native AI platforms. This capability aligns with the needs of large organizations implementing multi-cloud strategies.
Geographic expansion
Baseten's multi-cloud architecture, spanning dozens of regions, supports entry into markets with data residency requirements, including the EU, Latin America, and Asia-Pacific. These regions often face limited local GPU availability, creating opportunities for Baseten's capacity aggregation model.
Hybrid deployment capabilities allow customers to retain sensitive data on-premises while utilizing Baseten Cloud for additional capacity. This approach addresses markets with constrained local GPU infrastructure, such as Japan and the Middle East.
The open-source Truss framework, with over 6,000 GitHub stars, fosters grassroots adoption. As projects scale beyond self-hosting capabilities, this channel can drive paid conversions, supporting global developer acquisition.
Risks
GPU supply constraints: Baseten's multi-cloud model relies on GPU availability from cloud providers, which are subject to the same chip supply limitations. Shortages of H100/B200 GPUs could restrict the company's ability to scale customer workloads and sustain competitive pricing. This may lead customers to consider alternatives with more reliable hardware access.
Hyperscale competition: AWS, Google, and Microsoft leverage extensive enterprise relationships to bundle AI inference with broader cloud commitments at below-market rates. As these providers enhance their developer experience and expand model catalogs, they may commoditize the inference layer. This could exert pricing pressure and create integration advantages that challenge independent platforms like Baseten.
Model commoditization: Accelerated open-source model development risks reducing differentiation in model hosting, potentially turning inference platforms into commodity services. If model performance converges and switching costs remain low, competition could center on price, compressing margins and challenging Baseten's ability to sustain premium pricing for its optimization and developer experience features.
DISCLAIMERS
This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.
This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.
Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.
Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.
All rights reserved. All material presented in this report, unless specifically indicated otherwise is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.