Deployment

langdonholmes edited this page Aug 18, 2024 · 49 revisions

Overview

We have done our best to document all the commands we ran during deployment. The purpose of this document is mainly to assist with troubleshooting when something goes wrong with the server. Besides the following commands, you may also want to review the configuration files in this repository.

SSH Access

SSH access is provided to pre-approved users on Vanderbilt networks only. The IP address is 10.33.2.42.

Frequently used commands and notes

To apply changes to /srv/jupyter/config.yaml, you will need to run:

helm upgrade --cleanup-on-fail jhub jupyterhub/jupyterhub --values /srv/jupyter/config.yaml --namespace jhub

If any private shared volumes (shared directories that only specific users can access) are in use, add this flag to the above command:

--set-file hub.extraFiles.custom_pod_hook.stringData=/srv/jupyter/shared-private-volumes.py
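The choice between the two forms of the upgrade command can be scripted. The following is a minimal sketch, not something that was part of the deployment: the function name build_upgrade_cmd is hypothetical, and it assumes the extra flag is wanted whenever the hook file exists, which may not match how shared volumes are tracked on the server. It only prints the command so it can be reviewed before running.

```shell
# Print the helm upgrade command, adding --set-file only when the custom
# pod hook file is present. (Hypothetical helper; the default paths are
# the ones used on this page.)
build_upgrade_cmd() (
    config="${1:-/srv/jupyter/config.yaml}"
    hook="${2:-/srv/jupyter/shared-private-volumes.py}"
    cmd="helm upgrade --cleanup-on-fail jhub jupyterhub/jupyterhub --values ${config} --namespace jhub"
    if [ -f "${hook}" ]; then
        cmd="${cmd} --set-file hub.extraFiles.custom_pod_hook.stringData=${hook}"
    fi
    printf '%s\n' "${cmd}"
)

# Show the command that would be run with the default paths.
build_upgrade_cmd
```

Once the printed command looks right, it can be copied and run by hand.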

If you encounter the following:

Error: UPGRADE FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused

then you probably need to point KUBECONFIG at the shared config file:

export KUBECONFIG=/srv/jupyter/.kube/config
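A small guard can set the variable only when it is missing, and report a clearer error when the file cannot be read. This is a sketch under assumptions: the function name use_shared_kubeconfig is hypothetical, and the default path is the one used on this page.

```shell
# Point KUBECONFIG at the shared config if it is not already set.
# (Hypothetical helper; returns non-zero if the file cannot be read.)
use_shared_kubeconfig() {
    _shared="${1:-/srv/jupyter/.kube/config}"
    if [ -n "${KUBECONFIG:-}" ]; then
        echo "KUBECONFIG already set to ${KUBECONFIG}"
    elif [ -r "${_shared}" ]; then
        export KUBECONFIG="${_shared}"
        echo "KUBECONFIG set to ${KUBECONFIG}"
    else
        echo "Cannot read ${_shared}; check that the file exists and its permissions" >&2
        return 1
    fi
}

# Usage: run the helper first, then retry the helm or kubectl command.
# use_shared_kubeconfig && helm list --namespace jhub
```

The function is meant to be defined in the current shell (for example by sourcing a file), since an export made in a child process would not persist.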

To update any other service's configuration file, you can run:

microk8s kubectl apply -f /srv/jupyter/SERVICE_NAME.yaml

Log of commands used during initial deployment

Disable power management features and remove swapfile.

Make sure secure boot is disabled.

Recommended by Dell for data science workstations.

/usr/bin/gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout 0

sudo systemctl mask hibernate.target

sudo swapoff -v /swapfile

sudo sed -i '/^\/swapfile/d' /etc/fstab

sudo rm /swapfile
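Before editing /etc/fstab in place, the effect of the sed expression can be previewed. The sketch below wraps the same expression in a filter that reads standard input and prints the result instead of modifying a file. The function name drop_swapfile_entries is hypothetical and the sample entries are made up for illustration.

```shell
# Preview filter: same sed expression as the in-place command, but it
# reads stdin and prints to stdout, so nothing on disk is changed.
drop_swapfile_entries() {
    sed '/^\/swapfile/d'
}

# Made-up sample entries; only lines starting with /swapfile are removed.
printf '%s\n' \
    'UUID=0000-example / ext4 errors=remount-ro 0 1' \
    '/swapfile none swap sw 0 0' |
    drop_swapfile_entries
```

On the server itself, `drop_swapfile_entries < /etc/fstab` previews the real file, and `swapon --show` printing nothing indicates that no swap is active.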

Disable Unattended Upgrades

NVIDIA drivers are finicky and often require a reboot. Let's make sure nothing is updated without some supervision:

sudo dpkg-reconfigure unattended-upgrades

Follow NVIDIA's directions for CUDA installation.

NVIDIA's directions at time of deployment.

We installed the Debian packages over the network.

Add the following to PATH:

export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
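The `${PATH:+:${PATH}}` part of that line appends a colon and the old value only when PATH is already set and non-empty, which avoids leaving a trailing colon. The sketch below shows the same expansion on an ordinary argument rather than on PATH itself; the function name prepend_dir is hypothetical.

```shell
# Demonstrate the ${VAR:+:${VAR}} expansion without touching PATH.
# $1: directory to prepend, $2: existing colon-separated list (may be empty)
prepend_dir() (
    existing="$2"
    printf '%s\n' "$1${existing:+:${existing}}"
)

prepend_dir /usr/local/cuda-12.6/bin "/usr/bin:/bin"   # prints /usr/local/cuda-12.6/bin:/usr/bin:/bin
prepend_dir /usr/local/cuda-12.6/bin ""                # prints /usr/local/cuda-12.6/bin
```

Note that the export only affects the current shell session unless it is also added to a shell startup file.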

Install MicroK8s

We followed the getting started guide.

sudo snap install microk8s --classic --channel=1.26

sudo usermod -a -G microk8s $USER

su $USER

Move the Kube config to a shared location:

sudo mkdir /srv/jupyter/.kube

microk8s kubectl config view --raw > /srv/jupyter/.kube/config

sudo chown -R $USER:microk8s /srv/jupyter

Make sure the KUBECONFIG environment variable points to that file:

export KUBECONFIG=/srv/jupyter/.kube/config

Install MicroK8s Addons

microk8s enable dns storage helm3 gpu ingress

sudo reboot

Configure dynamic volume provisioning

We had to install and enable iSCSI (open-iscsi) before the OpenEBS addon could be enabled:

sudo apt install open-iscsi

sudo systemctl enable iscsid.service

microk8s enable community

microk8s enable openebs

microk8s enable registry:size=100Gi

Configure MetalLB with ingress service

Instructions

microk8s enable metallb:10.0.0.100-10.0.0.200

Configure Volume Claims

microk8s.kubectl apply -f /srv/jupyter/local-storage-dir.yaml

microk8s.kubectl apply -f /srv/jupyter/active-projects.yaml

microk8s.kubectl apply -f /srv/jupyter/project-archive.yaml

Install helm

curl https://raw.githubusercontent.com/helm/helm/HEAD/scripts/get-helm-3 | bash

We also installed git:

sudo apt-get install git

Install JupyterHub

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/

helm repo update

We install with the following command:

helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace jhub --create-namespace --values config.yaml

Set jhub to be the default namespace for microk8s kubectl commands:

microk8s kubectl config set-context microk8s --namespace jhub

Configure Networking

Allow SSH

sudo ufw enable

sudo ufw allow 22

Install GitHub CLI

type -p curl >/dev/null || sudo apt install curl -y
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
&& sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& sudo apt update \
&& sudo apt install gh -y

Appendix

I ran the following command, which should permanently set the KUBECONFIG environment variable for all users (export only lasts for the current shell session). It appends to /etc/environment, so it has to be run as root:

echo "KUBECONFIG=/srv/jupyter/.kube/config">>/etc/environment
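Running an append like this more than once leaves duplicate lines in the file. A minimal sketch of an idempotent variant follows. The helper name ensure_line is hypothetical, and the example writes to a scratch file rather than the real /etc/environment, which would require root.

```shell
# Append a line to a file only if that exact line is not already present.
# (Hypothetical helper; run as root when the target is /etc/environment.)
ensure_line() (
    line="$1"
    file="$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "${line}" "${file}" 2>/dev/null; then
        printf '%s\n' "${line}" >> "${file}"
    fi
)

# Example against a scratch file:
scratch_env="$(mktemp)"
ensure_line 'KUBECONFIG=/srv/jupyter/.kube/config' "${scratch_env}"
ensure_line 'KUBECONFIG=/srv/jupyter/.kube/config' "${scratch_env}"
cat "${scratch_env}"    # the line appears once
rm -f "${scratch_env}"
```

Changes to /etc/environment are typically picked up at the next login, so existing sessions still need the export.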

If you need to run a new service on the machine, you can request a route from Vanderbilt IT here: https://it.vanderbilt.edu/services/catalog/infrastructure/network/load_balancing.php