
Deployment 2

langdonholmes edited this page Aug 18, 2024 · 9 revisions

Overview

We have done our best to document all the commands we ran during deployment. The purpose of this document is mainly to assist with troubleshooting when something goes wrong with the server. Besides the following commands, you may also want to review the configuration files in this repository.

SSH Access

SSH access is available to pre-approved users on Vanderbilt networks only. The server's IP address is 10.33.2.13.

Log of commands used during initial deployment

Disable power management features and remove swapfile.

Make sure Secure Boot is disabled.

These settings are recommended by Dell for data science workstations.

/usr/bin/gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout 0

sudo systemctl mask hibernate.target

sudo swapoff -v /swapfile

sudo sed -i '/^\/swapfile/d' /etc/fstab

sudo rm /swapfile
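Because the sed command edits /etc/fstab in place, it can be useful to preview the result first. The following is a minimal sketch, not part of the original deployment: `preview_swapfile_removal` is a hypothetical helper that applies the same sed expression but prints the result to stdout and leaves the file untouched.

```shell
# preview_swapfile_removal is a hypothetical helper (an assumption, not from the deployment log).
# It prints the given fstab-style file with every line starting with /swapfile removed.
# The file itself is not modified.
preview_swapfile_removal() {
    local fstab
    fstab="${1:-/etc/fstab}"
    sed '/^\/swapfile/d' "$fstab"
}

# Example: print what /etc/fstab would look like after the in-place edit.
# preview_swapfile_removal /etc/fstab
```

If the printed output still contains every non-swap mount, the in-place edit should be safe to run.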

Disable Unattended Upgrades

NVIDIA drivers are finicky and often require a reboot. To make sure nothing is updated without supervision, disable unattended upgrades:

sudo dpkg-reconfigure unattended-upgrades
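To confirm that unattended upgrades are really off, one option is to inspect the APT configuration. This is a minimal sketch under two assumptions: the setting is stored in /etc/apt/apt.conf.d/20auto-upgrades (the usual location on Ubuntu; verify on your system), and `unattended_upgrades_enabled` is a hypothetical helper name.

```shell
# unattended_upgrades_enabled is a hypothetical helper (an assumption, not from the deployment log).
# It succeeds (exit status 0) only if the given APT config file sets
# APT::Periodic::Unattended-Upgrade to a non-zero value.
# A missing file or a value of "0" yields a non-zero exit status.
unattended_upgrades_enabled() {
    local conf
    conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"[1-9][0-9]*";' "$conf" 2>/dev/null
}

# Example:
# if unattended_upgrades_enabled; then echo "unattended upgrades are still enabled"; fi
```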

Follow NVIDIA's directions for CUDA installation.

NVIDIA's directions at time of deployment.

sudo apt-get install linux-headers-$(uname -r)

We installed the Debian package over the network:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda nvidia-container-toolkit

Add the following to PATH (using the correct CUDA version):

export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
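The `${PATH:+:${PATH}}` expansion in the export above appends `:$PATH` only when PATH is already set and non-empty, which avoids a stray trailing colon. If the export ends up in a shell startup file that is sourced more than once, the directory can be added repeatedly. The sketch below guards against that; `prepend_path_once` is a hypothetical helper name, and the CUDA directory simply mirrors the one above.

```shell
# prepend_path_once is a hypothetical helper (an assumption, not from the deployment log).
# It prepends a directory to PATH unless PATH already contains it.
prepend_path_once() {
    local dir
    dir="$1"
    case ":${PATH}:" in
        *":${dir}:"*) ;;                        # already present, nothing to do
        *) PATH="${dir}${PATH:+:${PATH}}" ;;    # same ${PATH:+...} idiom as the export
    esac
    export PATH
}

# Use the directory that matches the installed CUDA version.
prepend_path_once /usr/local/cuda-12.6/bin
```

Calling the function a second time with the same directory leaves PATH unchanged.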

Install MicroK8s

We followed the getting started guide.

sudo snap install microk8s --classic --channel=1.29

sudo usermod -a -G microk8s $USER

su $USER

Move the Kube config to a shared location:

sudo mkdir -p /srv/jupyter/.kube

microk8s kubectl config view --raw | sudo tee /srv/jupyter/.kube/config > /dev/null

sudo chown -R $USER:microk8s /srv/jupyter

Make sure to point the KUBECONFIG environment variable at that file:
export KUBECONFIG=/srv/jupyter/.kube/config
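When kubectl cannot reach the cluster, a quick first check is whether the kubeconfig file is readable by the current user. The following is a minimal sketch: `check_kubeconfig` is a hypothetical helper, and it assumes KUBECONFIG holds a single path rather than a colon-separated list.

```shell
# check_kubeconfig is a hypothetical helper (an assumption, not from the deployment log).
# It reports whether the file named by $1, or by KUBECONFIG if no argument is given,
# exists and is readable by the current user. Assumes a single path, not a list.
check_kubeconfig() {
    local cfg
    cfg="${1:-${KUBECONFIG:-}}"
    if [ -z "$cfg" ]; then
        echo "KUBECONFIG is not set" >&2
        return 1
    fi
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    echo "using kubeconfig: $cfg"
}

# Example:
# check_kubeconfig /srv/jupyter/.kube/config
```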

Install MicroK8s Addons

microk8s enable dns storage helm3 gpu ingress

sudo reboot

Install GitHub CLI

type -p curl >/dev/null || sudo apt install curl -y
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
&& sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& sudo apt update \
&& sudo apt install gh -y

Install Docker

We followed the instructions to install Docker via its apt repository.

Install NVIDIA Container Toolkit

We followed the instructions to install the NVIDIA Container Toolkit.

Appendix

I ran the following command, which should permanently set the KUBECONFIG environment variable for all users (an export only lasts for the current shell session). Appending to /etc/environment requires root privileges.

echo "KUBECONFIG=/srv/jupyter/.kube/config" >> /etc/environment
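Running an append like this more than once leaves duplicate lines in the file. A minimal sketch of an idempotent variant follows; `ensure_line` is a hypothetical helper, not something used in the original deployment.

```shell
# ensure_line is a hypothetical helper (an assumption, not from the deployment log).
# It appends a line to a file only if an identical line is not already present,
# so running it repeatedly does not create duplicates.
ensure_line() {
    local line
    local file
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (needs root to write /etc/environment):
# ensure_line 'KUBECONFIG=/srv/jupyter/.kube/config' /etc/environment
```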

If you need to run a new service on the machine, you can request a route from Vanderbilt IT here: https://it.vanderbilt.edu/services/catalog/infrastructure/network/load_balancing.php
