How does Levity optimize infrastructure and control operational costs, and how will this change as the company scales up?
Co-founder & CTO at Levity
We do a lot of things on top of Valohai, which runs inside AWS but can also run in any other cloud. That’s good if we want to switch or need a lot more resources. There have been quota issues recently, and it was hard to actually get more GPU resources from the cloud providers, probably because of chip shortages and crypto mining. That made it very hard to say, "I want 50 more powerful GPUs now."
The challenge is that we don't have a constant load on these GPUs. It's not like I know I need one GPU, then I add a hundred more customers and suddenly I need two. It's hard to predict when people will make training or inference requests, so we have to build things in a way that lets them scale up and down very quickly.
Recently, we adopted a newer technology from AWS called SageMaker Serverless Inference. It doesn't run the model all the time; it responds to requests whenever they come in. It's similar to Lambda, and I think it's built on top of it, so we don't have the big cost of a model sitting and waiting for requests all the time. Before that, we had driven the cost down to just a couple of dollars per month to host a single model, but that was still not economically viable for the ones that only got sparse requests or not too many requests in a month.
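The key idea is that the endpoint is configured with a serverless variant instead of a provisioned instance, so billing happens per invocation. A minimal sketch of what that looks like with boto3 is below; the model name and the memory/concurrency numbers are hypothetical, and the actual AWS calls are left commented out since they need credentials and an existing model:

```python
# A sketch of pointing a SageMaker endpoint at a serverless variant
# instead of an always-on instance. Names and sizing are assumptions.

def serverless_endpoint_config(model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> dict:
    """Build the request body for sagemaker.create_endpoint_config with a
    ServerlessConfig, so the model is billed per invocation rather than
    for an instance that sits idle between requests."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",
        "ProductionVariants": [{
            "ModelName": model_name,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,       # memory per invocation
                "MaxConcurrency": max_concurrency,  # cap on parallel invocations
            },
        }],
    }

# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**serverless_endpoint_config("my-classifier"))
# sm.create_endpoint(EndpointName="my-classifier",
#                    EndpointConfigName="my-classifier-serverless")
```

With a config like this, an idle model costs nothing between requests, at the price of a cold start on the first invocation after a quiet period.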
Now, we’ve switched to something that's actually serverless. For training, it just spins up instances as they're needed and shuts them down a few minutes later. It scales really well, and costs don't explode just because we have more customers. We're trying to keep costs variable where it matters, so we don't have fixed costs that just grow all the time.
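The fixed-versus-variable trade-off above can be shown with a back-of-the-envelope comparison. All prices here are made-up placeholders, not real AWS rates; the point is only the shape of the two cost curves:

```python
# Illustration of fixed vs. variable hosting cost for one model.
# Both rates below are assumed placeholder numbers, not real prices.

ALWAYS_ON_HOURLY = 0.25         # assumed hourly price of a small always-on instance
SERVERLESS_PER_SECOND = 0.0001  # assumed per-second price of serverless compute

def always_on_monthly() -> float:
    """Fixed cost: you pay for every hour whether or not requests arrive."""
    return ALWAYS_ON_HOURLY * 24 * 30

def serverless_monthly(requests: int, seconds_per_request: float = 0.5) -> float:
    """Variable cost: scales with actual usage, near zero for idle models."""
    return requests * seconds_per_request * SERVERLESS_PER_SECOND
```

For a model that serves only a handful of requests a month, the serverless figure stays near zero while the always-on figure is the same regardless of traffic, which is exactly why per-customer fixed costs don't work when load is sparse and unpredictable.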