
Deploy GPU-Ready Kubernetes on Google Cloud with Terraform

Michael Hannecke

Introduction


Testing applications built on Large Language Models (LLMs) requires hardware with powerful GPUs. Smaller LLMs such as Mistral-7B can, with some effort, run on a MacBook (see also here) with acceptable performance, and some gaming PCs also provide enough GPU power for local testing at a smaller scale.


Ultimately, especially if you want to develop and test as close as possible to a potential production environment, you can set up a Kubernetes cluster with GPU capabilities.


In the following, I will describe how to configure a Kubernetes cluster with NVIDIA L4 GPUs on Google Cloud, which will allow you to test larger models.


For the setup, I chose a managed cluster (GKE), as most of the required management and monitoring functionality is provided and operated by GCP.


As a GPU-capable compute node, I have chosen a g2-standard-24 instance, as it offers the best price/performance ratio for my requirements. Other instance types can be deployed depending on your requirements.


An overview of which instance types Google offers with which GPU support can be found here:


GPU on GKE


However, Google does not offer every instance/GPU combination in every region/zone. Therefore, take a look at this list to identify a suitable zone near you:


GPU availability


And not to forget: GPUs don’t come for free, so you should estimate the costs in advance based on this list:


GPU Pricing on GKE


The configuration discussed in the following costs about €2.70 per hour.


I have chosen to deploy via Terraform, as I can tear down the entire environment with a single command and (re-)deploy it again at any time.

I strongly recommend ensuring the cluster is removed once you have finished your evaluation, to avoid unpleasant surprises on your GCP bill. At this point, I also recommend configuring appropriate budget alerts…


Prerequisites

The following prerequisites must be met to be able to rebuild the environment:


  1. A valid GCP project with sufficient quota for the chosen instance size. All NVIDIA-capable instances are outside the standard quotas, so a quota increase is required. A description of how to request a quota adjustment can be found on this page.
  2. The Terraform CLI is installed (documentation can be found here).
  3. A service account with the necessary permissions to make infrastructure changes. A description of how to set this up can be found here: https://medium.com/bluetuple-ai/terraform-remote-state-on-gcp-d50e2f69b967
  4. For access to the Kubernetes cluster, kubectl is required. Information about the installation can be found on the official Kubernetes website.


Let’s get started

First, we need to log in to GCP with sufficient permissions:
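A minimal sketch, assuming you authenticate with your own user account (Terraform can also use the service account from the prerequisites instead of application-default credentials):

```bash
# Log in with your user account
gcloud auth login

# Provide application-default credentials that Terraform can pick up
gcloud auth application-default login
```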



Then, we need to set the project, target region, and zone as environment variables. When doing so, make sure that the desired instance/GPU combination has sufficient quotas in the selected region for the project.
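For example; the project ID, region, and zone below are placeholders, so substitute values where your chosen instance/GPU combination is actually available:

```bash
export PROJECT_ID="my-gcp-project"   # replace with your project ID
export REGION="us-central1"          # example region with L4 availability; verify against the list above
export ZONE="us-central1-a"          # zone within that region

gcloud config set project "${PROJECT_ID}"
```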


We also set the cluster name as an environment variable:
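For example (the name itself is arbitrary):

```bash
export CLUSTER_NAME="gke-gpu-cluster"   # any name you like
```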

In the next step, we need to enable some required APIs for the project, if they are not enabled already:
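The Kubernetes Engine and Compute Engine APIs are the ones this deployment relies on; a sketch:

```bash
gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  --project "${PROJECT_ID}"
```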


Terraform

For better readability, I have omitted the backend configuration for storing the Terraform state files from the Terraform code. A detailed description of how remote state can be configured can be found in this article:


Terraform Remote State on GCP


In general, the following configuration should be stored in a separate directory to avoid side effects with other deployments, especially with a `terraform destroy` command without additional parameters.


We will create the following files for Terraform:

main.tf, variables.tf, provider.tf, and terraform.tfvars contain the variable definitions and configurations that Terraform needs for the deployment. I will not bore you with the details; you can find the files in this GitHub repository.


For the Kubernetes cluster, I am using two definition files, one for the cluster itself and one for the GPU node pool.


Of course, all resource definitions could also go into a single file, but with certain changes Terraform then tends to delete the entire node pool or even the cluster and redeploy it, which can take a very long time.


I have had better experiences with separate definition files: changes to a node pool then have no impact on other pools or on the cluster itself. Ultimately, it also improves readability.


The definition of the cluster itself:
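The exact file is in the repository; a minimal sketch, assuming variables such as `cluster_name` and `zone` are declared in `variables.tf` and filled via `terraform.tfvars`, could look like this:

```hcl
# gke-cluster.tf (sketch) -- the full version lives in the repository
resource "google_container_cluster" "gpu_cluster" {
  name     = var.cluster_name
  location = var.zone   # zonal cluster keeps costs and deployment time down

  # One small default node; the GPU nodes come from a separate node pool.
  initial_node_count = 1

  # On google provider v5+ deletion protection defaults to true and would
  # block `terraform destroy`; omit this on older provider versions.
  deletion_protection = false
}
```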

The definition of our GPU-enabled node pool should look something like this:
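Again a sketch rather than the exact repository contents; the g2-standard-24 machine type already bundles two NVIDIA L4 GPUs, and the accelerator block is spelled out here mainly for clarity:

```hcl
# gke-gpu-nodepool.tf (sketch)
resource "google_container_node_pool" "gpu_pool" {
  name       = "gpu-pool"
  cluster    = google_container_cluster.gpu_cluster.name
  location   = var.zone
  node_count = 1

  node_config {
    machine_type = "g2-standard-24"   # comes with 2x NVIDIA L4

    guest_accelerator {
      type  = "nvidia-l4"
      count = 2
    }

    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```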

Adjust the variables to your needs.

Then we can ask Terraform to carry out the preparatory steps and check whether our code is deployable:
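Typically that means initializing the working directory, validating the syntax, and reviewing the plan:

```bash
terraform init       # download the Google provider and set up the (remote) state
terraform validate   # syntax and basic consistency check
terraform plan       # preview of the resources that would be created
```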

If Terraform does not report any errors (perhaps a typo?), the cluster can be created with:
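Terraform shows the plan once more and asks for confirmation before creating anything:

```bash
terraform apply
```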

From now on the bill starts ticking, so don’t forget to tear everything down once you’re finished.

The deployment takes about 10 minutes: time for a coffee.

Once the cluster has been fully deployed, we need to request the credentials for kubectl:
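Assuming the environment variables from above are still set, the credentials can be fetched like this:

```bash
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
  --zone "${ZONE}" \
  --project "${PROJECT_ID}"
```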


From now on we can communicate with the cluster as usual:
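For example:

```bash
kubectl get nodes -o wide
```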

You should see two nodes: one from the default node pool and one from our GPU node pool.


So far so good

Now we can deploy a small pod that uses the GPUs. To do this, create a file called `cuda-pod.yaml` with the following content in the current folder:
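A minimal sketch of such a pod; the CUDA image tag and the requested GPU count are assumptions you may want to adjust:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4   # schedule onto the GPU node pool
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example tag, adjust as needed
      command: ["sleep", "infinity"]              # keep the pod alive so we can exec into it
      resources:
        limits:
          nvidia.com/gpu: 2                       # the g2-standard-24 node exposes two L4 GPUs
```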

Deploy the pod with:
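A possible sequence:

```bash
kubectl apply -f cuda-pod.yaml
kubectl get pod cuda-pod --watch   # wait until the STATUS column shows Running
```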

Once `kubectl get pod` shows the pod in the `Running` status, we can connect to a shell in the pod:
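For example:

```bash
kubectl exec -it cuda-pod -- /bin/bash
```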

Now we can run `nvidia-smi` in the pod’s shell; the necessary drivers have already been installed during the deployment.

The output should look something like this:

In addition, the status can also be viewed in the GCP console under Kubernetes Engine -> Clusters -> your cluster -> Nodes -> your GPU node -> Overview, and it should look similar to this:

Now you can run containerized LLMs or other machine learning tasks on the GPU. I’ll publish some articles on how to containerize LLMs later.

Don’t forget to delete the cluster when you no longer need it, otherwise you will incur significant costs.

The cluster can be deleted as usual:
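Run it from the directory that holds the cluster definitions:

```bash
terraform destroy
```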


Keep in mind that the destroy command deletes every piece of infrastructure whose definition files live in the same directory as the GKE cluster definition.


Keep an eye on the output of the destroy command before you approve the execution by typing `yes`.


Summary

Now you can deploy and test suitable LLMs such as Mistral-7B on the GPU-enabled cluster. I will publish a separate article about this.


The framework can be easily adapted to other requirements and used as a basis for your own deployments.

Of course, the same cluster can also be easily deployed with a single CLI command, but the Terraform approach is closer to DevOps principles and can be integrated into a CI/CD pipeline with only minor modifications.
