vasttools

The aim is to set up a list of tools that can be used with Vast.ai. The tools are free to use, modify and distribute. If you find this helpful and would like to donate, you can send your donations to the following wallets.

BTC 15qkQSYXP2BvpqJkbj2qsNFb6nd7FyVcou

XMR 897VkA8sG6gh7yvrKrtvWningikPteojfSgGff3JAUs3cu7jxPDjhiAZRdcQSYPE2VGFVHAdirHqRZEpZsWyPiNK6XPQKAg

RVN RSgWs9Co8nQeyPqQAAqHkHhc5ykXyoMDUp

USDT(ETH ERC20) 0xa5955cf9fe7af53bcaa1d2404e2b17a1f28aac4f

PayPal PayPal.Me/cryptolabsZA

CryptoLabs Datacenter Tools

These tools have evolved into a complete datacenter management suite under the CryptoLabs organisation. If you're running GPU infrastructure on Vast.ai, RunPod, or bare metal, check out the full toolkit:

| Tool | Description | Link |
| --- | --- | --- |
| IPMI Monitor | IPMI/BMC server monitoring with AI-powered insights, SSH log collection, and web dashboard. Monitors SEL events, sensors, GPU health, and more. | cryptolabsza/ipmi-monitor |
| DC Overview | Prometheus/Grafana datacenter monitoring dashboards. Full visibility into your fleet with pre-built dashboards for GPU, network, and system metrics. | cryptolabsza/dc-overview |
| DC Exporter | Rust-based GPU metrics exporter (dc-exporter-rs). Collects GPU core, hotspot, and VRAM temps including GDDR6X. Runs alongside Vast.ai and RunPod without interference. | cryptolabsza/dc-exporter-releases |
| DC Watchdog | SaaS uptime monitoring for your fleet. Replaces the old Telegram uptime bot with a managed service: multi-machine agents, alerts, and a dashboard. | cryptolabsza/dc-overview |

Recommended: Start with DC Overview + DC Exporter for monitoring, and IPMI Monitor if you have IPMI/BMC access to your servers.

Host install guide for Vast.ai

#Start with a clean install of ubuntu 22.04.x HWE Kernel server. Just add openssh.
sudo apt update && sudo apt upgrade -y && sudo apt dist-upgrade -y && sudo apt install update-manager-core -y
#if you did not install HWE kernel do the following  
sudo apt install --install-recommends linux-generic-hwe-22.04 -y
sudo reboot

#install the drivers.
sudo apt install build-essential -y
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
# to search for available NVIDIA drivers: use this command 
sudo apt search nvidia-driver | grep nvidia-driver | sort -r
sudo apt install nvidia-driver-560  -y    # assuming the latest is 560

#Remove unattended-upgrades Package so that the drivers don't upgrade when you have clients
sudo apt purge --auto-remove unattended-upgrades -y
sudo systemctl disable apt-daily-upgrade.timer
sudo systemctl mask apt-daily-upgrade.service 
sudo systemctl disable apt-daily.timer
sudo systemctl mask apt-daily.service

# This is needed to remove xserver and GNOME if you started with Ubuntu desktop. Clients can't run a desktop GUI in a container without an X server.
bash -c 'sudo apt-get update; sudo apt-get -y upgrade; sudo apt-get install -y libgtk-3-0; sudo apt-get install -y xinit; sudo apt-get install -y xserver-xorg-core; sudo apt-get remove -y gnome-shell; sudo update-grub; sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration --enable-all-gpus' 


#if Ubuntu is installed to an SSD and you plan to have the Vast.ai client data stored on an NVMe follow the below instructions.
#WARNING IF YOUR OS IS ON /dev/nvme0n1 IT WILL BE WIPED. CHECK TWICE. Change this device to the intended device name that you plan to use.

# This is one command that will create the XFS partition and write it to the disk /dev/nvme0n1.
echo -e "n\n\n\n\n\n\nw\n" | sudo fdisk /dev/nvme0n1 && sudo mkfs.xfs /dev/nvme0n1p1 
sudo mkdir /var/lib/docker

#I added discard so that the SSD is trimmed by Ubuntu and nofail so that if there is some problem with the drive the system will still boot.
sudo bash -c 'uuid=$(sudo xfs_admin -lu /dev/nvme0n1p1  | sed -n "2p" | awk "{print \$NF}"); echo "UUID=$uuid /var/lib/docker/ xfs rw,auto,pquota,discard,nofail 0 0" >> /etc/fstab'

sudo mount -a

# check that /dev/nvme0n1p1 is mounted to /var/lib/docker/
df -h

#this will enable Persistence mode on reboot so that the GPUs can go to idle power when not used
sudo bash -c '(crontab -l; echo "@reboot nvidia-smi -pm 1" ) | crontab -' 

#run the install command for Vast.ai
sudo apt install python3 -y
sudo wget https://console.vast.ai/install -O install; sudo python3 install YourKey; history -d $((HISTCMD-1)); 

sudo nano /etc/default/grub   # find the GRUB_CMDLINE_LINUX="" line and make sure it looks like this:
GRUB_CMDLINE_LINUX="amd_iommu=on nvidia_drm.modeset=0 systemd.unified_cgroup_hierarchy=false"

#only run this command if you plan to support VMs on your machines. Read the Vast.ai guide to understand more https://vast.ai/docs/hosting/vms
sudo bash -c 'sed -i "/^GRUB_CMDLINE_LINUX=\"\"/s/\"\"/\"amd_iommu=on nvidia_drm.modeset=0\"/" /etc/default/grub && update-grub'

sudo update-grub

#if you get an NVML error then run this
sudo wget https://raw.githubusercontent.com/jjziets/vasttools/main/nvml_fix.py
sudo python3 nvml_fix.py
sudo reboot

#follow the Configure Networking instructions as per https://console.vast.ai/host/setup

#test the ports by running sudo nc -l -p <port> on the host machine and use https://portchecker.co to verify
sudo bash -c 'echo "40000-40019" > /var/lib/vastai_kaalia/host_port_range'
sudo reboot 

#After reboot, check that the drive is mounted to /var/lib/docker and that your systems show up on the Vast.ai dashboard.
df -h # look for /var/lib/docker mount
sudo systemctl status vastai
sudo systemctl status docker

Self-verification test

You can run the following test to ensure your new machine will be on the shortlist for verification testing. If you pass, there is a high chance that your machine will be eligible for verification. Take note that your router needs to allow loopback if you run this from a machine on the same network as the machine you want to test. If you do not know how to enable loopback it will be better to run this on a VM from a cloud provider or with a mobile connection to your PC.

Download the latest vastcli and set your API key

wget https://raw.githubusercontent.com/vast-ai/vast-python/master/vast.py; chmod +x vast.py;

./vast.py set api-key xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Usage Examples

Single machine, fail if not meeting requirements:
```
./vast.py self-test machine 54321
```
If it fails, you see the failing requirements in the output, and the test ends.
Single machine, continue testing anyway:
```
./vast.py self-test machine 54321 --ignore-requirements
```
Prints the failure reasons but still runs the tests.
Multiple machines from a single host ID, ignoring requirements:
```
python3 vast_machine_tester.py --host_id 123456 --ignore-requirements
```
In a few minutes, you’ll have passed_machines.txt and failed_machines.txt with a summary.

Overview (old and deprecated; use the new self-test built into the vastcli)

The autoverify_machineid.sh script is part of a suite of tools designed to automate the testing of machines on the Vast.ai marketplace. This script specifically tests a single machine to determine if it meets the minimum requirements necessary for further verification.

Prerequisites

Before you start using ./autoverify_machineid.sh, ensure you have the following:

Vast.ai Command Line Interface (vastcli): This tool is used to interact with the Vast.ai platform.
Vast.ai Listing: The machine should be listed on the Vast.ai marketplace.
Ubuntu OS: The scripts are designed to run on Ubuntu 20.04 or newer.

Setup and Installation

Download and Setup vastcli:

Download the Vast.ai CLI tool using the following command:

wget https://raw.githubusercontent.com/vast-ai/vast-python/master/vast.py -O vast
chmod +x vast

Set your Vast.ai API key:

./vast set api-key <your-api-key>

Download autoverify_machineid.sh:
- Use wget to download autoverify_machineid.sh to your local machine:
```
wget https://github.com/jjziets/VastVerification/releases/download/0.4-beta/autoverify_machineid.sh
```
Make Scripts Executable:
- Change the permissions of the main scripts to make them executable:
```
chmod +x autoverify_machineid.sh
```
Dependencies
- Run the following to install the required packages
```
apt update
apt install bc jq
```

Using `./autoverify_machineid.sh`

Check Machine Requirements:
- The ./autoverify_machineid.sh script is designed to test if a single machine meets the minimum requirements for verification. This is useful for hosts who want to verify their own machines.
- To test a specific machine by its machine_id, use the following command:
```
./autoverify_machineid.sh <machine_id>
```
  Replace <machine_id> with the actual ID of the machine you want to test.
To Ignore Requirements Check:
```
./autoverify_machineid.sh --ignore-requirements <machine_id>
```
This command runs the tests for the machine, regardless of whether it meets the minimum requirements.

Monitoring and Results

Progress and Results Logging:
- The script logs the progress and results of the tests.
- Successful results and machines that pass the requirements will be logged in Pass_testresults.log.
- Machines that do not meet the requirements or encounter errors during testing will be logged in Error_testresults.log.
Understanding the Logs:
- Pass_testresults.log: This file contains entries for machines that successfully passed all tests.
- Error_testresults.log: This file contains entries for machines that failed to meet the minimum requirements or encountered errors during testing.

Example Usage

Here’s how you can run the autoverify_machineid.sh script to test a machine with machine_id 10921:

./autoverify_machineid.sh 10921

Troubleshooting

API Key Issues: Ensure your API key is correctly set using ./vast set api-key <your-api-key>.
Permission Denied: If you encounter permission issues, make sure the script files have executable permissions (chmod +x <script_name>).
Connection Issues: Verify your network connection and ensure the Vast.ai CLI can communicate with the Vast.ai servers.

Summary

By following this guide, you will be able to use the ./autoverify_machineid.sh script to test individual machines on the Vast.ai marketplace. This process helps ensure that machines meet the required specifications for GPU and system performance, making them candidates for further verification and use in the marketplace.

Speedtest-cli fix for vast

If your machine is not showing its upload and download speed correctly, first check whether there is a problem by forcing the speedtest to run:

cd /var/lib/vastai_kaalia
./send_mach_info.py --speedtest

output should look like this

2024-10-03 08:50:04.587469

os version
running df
checking errors
nvidia-smi
560035003
/usr/bin/fio
checking speedtest
/usr/bin/speedtest
speedtest

running speedtest on random server id 19897
{"type":"result","timestamp":"2024-10-03T08:50:24Z","ping":{"jitter":0.243,"latency":21.723,"low":21.526,"high":22.047},"download":{"bandwidth":116386091,"bytes":1010581968,"elapsed":8806,"latency":{"iqm":22.562,"low":20.999,"high":296.975,"jitter":3.976}},"upload":{"bandwidth":116439919,"bytes":980885877,"elapsed":8508,"latency":{"iqm":36.457,"low":6.852,"high":349.495,"jitter":34.704}},"packetLoss":0,"isp":"Vox Telecom","interface":{"internalIp":"192.168.1.101","name":"bond0","macAddr":"F2:6A:67:0C:85:8B","isVpn":false,"externalIp":"41.193.204.66"},"server":{"id":19897,"host":"speedtest.wibernet.co.za","port":8080,"name":"Wibernet","location":"Cape Town","country":"South Africa","ip":"102.165.64.110"},"result":{"id":"18bb02e4-466d-43dd-b1fc-3f106319a9f6","url":"https://www.speedtest.net/result/c/18bb02e4-466d-43dd-b1fc-3f106319a9f6","persisted":true}}
....
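
To read the result line above, it helps to convert the bandwidth fields into megabits per second. As far as I know, the Ookla CLI reports bandwidth in bytes per second; that unit is an assumption here, and the helper names below are hypothetical. This is a minimal sketch that works on the single-line JSON layout shown above.

```shell
# Extract "bandwidth" for download or upload from one line of compact JSON on stdin.
# $1 is "download" or "upload". Assumes the single-line layout shown above.
extract_bw() {
    sed -n "s/.*\"$1\":{\"bandwidth\":\([0-9]*\).*/\1/p"
}

# Convert bytes per second (assumed unit) to whole megabits per second.
bw_to_mbps() {
    echo $(( $1 * 8 / 1000000 ))
}

bw_to_mbps 116386091   # download value from the sample above: prints 931
```

With the sample output, both download and upload come out at roughly 931 Mbit/s, which is what you would expect from a gigabit link.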

If the speedtest above does not work, you can install the newer Ookla speedtest as an alternative. Because the newer speedtest output does not have the same format, a script translates it so that Vast.ai can use it. All the commands combined:

bash -c "sudo apt-get install curl -y && sudo curl -s https://packagecloud.io/install/repositories/ookla/speedtest-cli/script.deb.sh | sudo bash && sudo apt-get install speedtest -y && sudo apt install python3 -y && cd /var/lib/vastai_kaalia/latest && sudo mv speedtest-cli speedtest-cli.old && sudo wget -O speedtest-cli https://raw.githubusercontent.com/jjziets/vasttools/main/speedtest-cli.py && sudo chmod +x speedtest-cli"

or step by step

sudo apt-get install curl
sudo curl -s https://packagecloud.io/install/repositories/ookla/speedtest-cli/script.deb.sh | sudo bash
sudo apt-get install speedtest -y
sudo apt install python3 -y
cd /var/lib/vastai_kaalia/latest
sudo mv speedtest-cli speedtest-cli.old
sudo wget -O speedtest-cli https://raw.githubusercontent.com/jjziets/vasttools/main/speedtest-cli.py
sudo chmod +x speedtest-cli

This updates your speed test to the newer one and translates the output so that the Vast daemon can use it. If you now get slower speeds, follow this:

## If migrating from prior bintray install instructions please first...
# sudo rm /etc/apt/sources.list.d/speedtest.list
# sudo apt-get update
# sudo apt-get remove speedtest -y
## Other non-official binaries will conflict with Speedtest CLI
# Example how to remove using apt-get
# sudo apt-get remove speedtest-cli
sudo apt-get install curl
curl -s https://packagecloud.io/install/repositories/ookla/speedtest-cli/script.deb.sh | sudo bash
sudo apt-get install speedtest

Analytics dashboard

Recommended: Use the CryptoLabs monitoring stack for a modern, production-ready solution:

DC Overview — Prometheus/Grafana dashboards with pre-built panels for GPU, network, earnings, and system metrics.

DC Exporter — Rust-based GPU metrics exporter. Runs as a lightweight binary alongside Vast.ai and RunPod without affecting your rentals.

IPMI Monitor — IPMI/BMC monitoring with AI-powered insights, SSH log collection, and a web dashboard.

Legacy DCMontoring (archived): https://github.com/jjziets/DCMontoring

Addressing NVML error when using Ubuntu 22 and 24

Run the script below if you have a problem with the Vast.ai installer on 22 or 24 and receive an NVML error. This script is based on Bo26fhmC5M, so credit goes to him.

sudo wget https://raw.githubusercontent.com/jjziets/vasttools/main/nvml_fix.py
sudo python3 nvml_fix.py

Remove Persistent red error messages

If you have a red error message on your machine that you have confirmed has been addressed, it might help to delete /var/lib/vastai_kaalia/kaalia.log and restart the vastai service.

sudo rm /var/lib/vastai_kaalia/kaalia.log
sudo systemctl restart vastai

Monitor your Nvidia 3000/4000 Core, GPU Hotspot and VRAM temps

Recommended: dc-exporter-rs — A modern Rust-based GPU metrics exporter that collects GPU core, hotspot, and VRAM temperatures (including GDDR6X). Exposes Prometheus metrics for Grafana dashboards.

Key advantage: Runs as a lightweight system binary and does not interfere with Vast.ai or RunPod — your rentals and workloads are unaffected.
# Quick install (see repo for full instructions)
wget https://github.com/cryptolabsza/dc-exporter-releases/releases/latest/download/dc-exporter-rs-linux-amd64
chmod +x dc-exporter-rs-linux-amd64
sudo ./dc-exporter-rs-linux-amd64

Legacy nvml_direct_access tool (archived):

sudo wget https://github.com/jjziets/gddr6_temps/raw/master/nvml_direct_access
sudo chmod +x nvml_direct_access
sudo ./nvml_direct_access

Memory OC

Set the OC of the RTX 3090. It requires the following:

On the host run the following command:

sudo apt-get install libgtk-3-0 && sudo apt-get install xinit && sudo apt-get install xserver-xorg-core && sudo update-grub && sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration --enable-all-gpus
wget https://raw.githubusercontent.com/jjziets/vasttools/main/set_mem.sh
sudo chmod +x set_mem.sh
sudo ./set_mem.sh 2000 # this will set the memory OC to +1000MHz on all the GPUs. You can use 3000 on some GPUs, which will give 1500MHz OC.
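
The comment above implies that the value passed to set_mem.sh is twice the effective memory offset (2000 gives +1000MHz, 3000 gives +1500MHz). A small sketch of that arithmetic, using a hypothetical helper name and assuming the 2:1 ratio holds for your GPUs:

```shell
# Hypothetical helper: the set_mem.sh argument for a desired effective offset in MHz,
# assuming the 2:1 ratio described above.
mem_arg_for_mhz() {
    echo $(( $1 * 2 ))
}

mem_arg_for_mhz 1000   # prints 2000
mem_arg_for_mhz 1500   # prints 3000
```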

OC monitor

Set up the monitoring program that will change the memory OC based on what program is running. It is designed for RTX3090s and targets ethminer at this stage. It requires both set_mem.sh and ocminitor.sh to run as root.

wget https://raw.githubusercontent.com/jjziets/vasttools/main/ocminitor.sh
sudo chmod +x ocminitor.sh
sudo ./ocminitor.sh # I suggest running this in tmux or screen so that when you close the SSH connection it keeps running. It looks for ethminer and if it finds it, it will set the OC based on your choice. You can also set power limits with nvidia-smi -pl 350

To load at reboot use the crontab below

sudo bash -c '(crontab -l; echo "@reboot screen -dmS ocmonitor /home/jzietsman/ocminitor.sh") | crontab -'  # replace the user with your user
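
Running the crontab command above more than once adds the same entry each time. One way to avoid duplicates, sketched here with a hypothetical ensure_line helper, is to filter the current crontab and append the line only when it is missing.

```shell
# Hypothetical helper: copy stdin to stdout and append line $1 only if
# no identical line is already present. Lines containing backslashes are
# not handled, because awk -v interprets escape sequences.
ensure_line() {
    awk -v want="$1" '
        { print }
        $0 == want { found = 1 }
        END { if (!found) print want }
    '
}

# Possible use for root's crontab (replace the user with your user):
#   sudo crontab -l 2>/dev/null \
#     | ensure_line '@reboot screen -dmS ocmonitor /home/jzietsman/ocminitor.sh' \
#     | sudo crontab -
```

The same helper could be used for the other @reboot entries in this guide, such as the persistence mode line.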

Stress testing GPUs on Vast with this Python benchmark of RTX3090s

Mining does not stress your system the same as Python workloads do, so this is a good test to run as well.

First, set a maintenance window, and then once you have no clients running, you can do the stress testing.

https://github.com/jjziets/pytorch-benchmark-volta

A full suite of stress tests can be found in the docker image jjziets/vastai-benchmarks:latest in folder /app/

stress-ng - CPU stress
stress-ng - Drive stress
stress-ng - Memory stress
sysbench - Memory latency and speed benchmark
dd - Drive speed benchmark
Hashcat - Benchmark
bandwidthTest - GPU bandwidth benchmark
pytorch - Pytorch DL benchmark

#test or bash interface

sudo docker run --shm-size 1G --rm -it --gpus all jjziets/vastai-benchmarks /bin/bash
apt update && apt upgrade -y
./benchmark.sh

#Run using default settings. Results are saved to ./output.

sudo docker run -v ${PWD}/output:/app/output --shm-size 1G --rm -it --gpus all jjziets/vastai-benchmarks

#Run with parameters SLEEP_TIME/BENCH_TIME

sudo docker run -v ${PWD}/output:/app/output --shm-size 1G --rm -it -e SLEEP_TIME=2 -e BENCH_TIME=2 --gpus all jjziets/vastai-benchmarks

You can also do a GPU burn test.

sudo docker run --gpus all --rm oguzpastirmaci/gpu-burn <test duration in seconds>

If you want to run it for one GPU, run the command below, replacing the x with the GPU number starting at 0.

sudo docker run --gpus '"device=x"' --rm oguzpastirmaci/gpu-burn <test duration in seconds>
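
To burn-test each GPU in turn, the per-GPU command above can be generated in a loop. This sketch only prints the commands (a dry run) so that you can review them first. The function name is hypothetical, and you supply the GPU count and duration yourself.

```shell
# Hypothetical helper: print one gpu-burn command per GPU index.
# $1 = number of GPUs, $2 = test duration in seconds.
gpu_burn_cmds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf "sudo docker run --gpus '\"device=%s\"' --rm oguzpastirmaci/gpu-burn %s\n" "$i" "$2"
        i=$(( i + 1 ))
    done
}

gpu_burn_cmds 2 60
```

Once the printed commands look right, piping the output into sh runs them one after another.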

*Based on Leona / vast.ai-tools

Telegram-Vast-Uptime-Bot / DC Watchdog

Recommended: DC Watchdog — A managed SaaS uptime monitoring service that replaces the self-hosted Telegram bot. Features include:

Multi-machine monitoring with lightweight agents

Alerts via Telegram, email, and push notifications

Centralized dashboard with history and analytics

No need to run your own server — the agents report to the CryptoLabs cloud

Legacy self-hosted bot (still works for simple setups): This is a set of scripts for monitoring machine crashes. Run the client on your Vast.ai machine and the server on a remote one. You get notifications on Telegram if no heartbeats are sent within the timeout (default 12 seconds). https://github.com/jjziets/Telegram-Vast-Uptime-Bot

Auto update the price for host listing based on mining profits

Based on an RTX 3090 at 120 MH/s for ETH. It sets the price of my two hosts. It works with a custom Vast CLI, which can be found here: https://github.com/jjziets/vast-python/blob/master/vast.py The manager script is here: https://github.com/jjziets/vasttools/blob/main/setprice.sh

This should be run on a VPS, not on a host. Do not expose your Vast API keys by using it on the host.

wget https://raw.githubusercontent.com/jjziets/vast-python/master/vast.py
sudo chmod +x vast.py
./vast.py set api-key YOUR_API_KEY
wget https://raw.githubusercontent.com/jjziets/vasttools/main/setprice.sh
sudo chmod +x setprice.sh

Background job or idle job for Vast.ai

The best way to manage your idle job is via the Vast CLI. To my knowledge, setting the job through the GUI is broken, so follow the steps below to set an idle job. You will need to download the Vast CLI and run the following commands. The idea is to rent your own machine as an interruptible job. The Vast CLI allows you to set one idle job for all the GPUs or one GPU per instance. You can also set the SSH connection method or any other method. Go to https://cloud.vast.ai/cli/ and install your CLI flavor.

Set up your account key so that you can use the Vast CLI. You get this key from your account page.

./vast set api-key API_KEY

You can use my SetIdleJob.py script to set up your idle job based on the minimum price set on your machines.

wget https://raw.githubusercontent.com/jjziets/vasttools/main/SetIdleJob.py

Here is an example of how I mine to NiceHash

python3 SetIdleJob.py --args 'env | grep _ >> /etc/environment; echo "starting up"; apt -y update; apt -y install wget; apt -y install libjansson4; apt -y install xz-utils; wget https://github.com/develsoftware/GMinerRelease/releases/download/3.44/gminer_3_44_linux64.tar.xz; tar -xvf gminer_3_44_linux64.tar.xz; while true; do ./miner --algo kawpow --server stratum+tcp://kawpow.auto.nicehash.com:9200 --user 3LNHVWvUEufL1AYcKaohxZK2P58iBHdbVH.${VAST_CONTAINERLABEL:2}; done'

Or the full command if you don't want to use the defaults

python3 SetIdleJob.py --image nvidia/cuda:12.4.1-runtime-ubuntu22.04 --disk 16 --args 'env | grep _ >> /etc/environment; echo "starting up"; apt -y update; apt -y install wget; apt -y install libjansson4; apt -y install xz-utils; wget https://github.com/develsoftware/GMinerRelease/releases/download/3.44/gminer_3_44_linux64.tar.xz; tar -xvf gminer_3_44_linux64.tar.xz; while true; do ./miner --algo kawpow --server stratum+tcp://kawpow.auto.nicehash.com:9200 --user 3LNHVWvUEufL1AYcKaohxZK2P58iBHdbVH.${VAST_CONTAINERLABEL:2}; done' --api-key API_KEY

Troubleshoot your bash -c command by using the logs on the instance page

Alternatively, you can rent yourself with the following command and then log in and load what you want to run. Make sure to add your process to onstart.sh. To rent yourself first find your machine with the machine ID

./vast search offers "machine_id=14109 verified=any gpu_frac=1 " # gpu_frac=1 will give you the instance with all the gpus.

or

./vast search offers -i "machine_id=14109 verified=any  min_bid>0.1 num_gpus=1" # it will give you the instance with one GPU

Once you have the offer ID, you can create the instance. In this case, the search with the -i switch returns the ID of an interruptible offer.

Let's assume you want to mine with lolminer

./vast create instance 9554646 --price 0.2 --image nvidia/cuda:12.0.1-devel-ubuntu20.04   --env '-p 22:22' --onstart-cmd 'bash -c "apt  -y update; apt  -y install wget; apt  -y install libjansson4; apt -y install xz-utils; wget https://github.com/Lolliedieb/lolMiner-releases/releases/download/1.77b/lolMiner_v1.77b_Lin64.tar.gz; tar -xf lolMiner_v1.77b_Lin64.tar.gz -C ./; cd 1.77b; ./lolMiner --algo ETCHASH --pool etc.2miners.com:1010 --user 0xYour_Wallet_Goes_Here.VASTtest"'  --ssh  --direct --disk 100

It will start the instance at price 0.2.

./vast show instances # will give you the list of instances
./vast change bid 9554646  --price 0.3 # This will change the price to 0.3 for the instance

Setting fan speeds if you have a headless system

Here is a repo with two programs and a few scripts that you can use to manage your fans https://github.com/jjziets/GPU_FAN_OC_Manager/tree/main

bash -c "wget https://github.com/jjziets/GPU_FAN_OC_Manager/raw/main/set_fan_curve; chmod +x set_fan_curve; CURRENT_PATH=\$(pwd); nohup bash -c \"while true; do \$CURRENT_PATH/set_fan_curve 65; sleep 1; done\" > output.txt & (crontab -l; echo \"@reboot screen -dmS gpuManger bash -c 'while true; do \$CURRENT_PATH/set_fan_curve 65; sleep 1; done'\") | crontab -"

Remove unattended-upgrades package

If your system updates while Vast.ai is running, or even worse when a client is renting you, then you might get de-verified or banned. It's advised to only update when the system is unrented and delisted. The best approach would be to set an end date for your listing and conduct updates and upgrades at that stage. To stop unattended-upgrades run the following commands.

sudo apt purge --auto-remove unattended-upgrades -y
sudo systemctl disable apt-daily-upgrade.timer
sudo systemctl mask apt-daily-upgrade.service 
sudo systemctl disable apt-daily.timer
sudo systemctl mask apt-daily.service

How to update a host

When the system is idle and delisted, run the following command. It stops the Vast daemon and Docker services, applies the upgrades, and starts the services again. It is also a good idea to upgrade the Nvidia drivers this way. If you don't and the upgrades break a package, you might get de-verified or even banned from Vast.ai.

bash -c ' sudo systemctl stop vastai; sudo systemctl stop docker.socket; sudo systemctl stop docker; sudo apt update; sudo apt upgrade -y; sudo systemctl start docker.socket ; sudo systemctl start docker; sudo systemctl start vastai'

How to move your Vast.ai Docker driver to another drive

This guide illustrates how to back up Vast.ai Docker data from an existing drive and transfer it to a new drive. In this case a RAID drive /dev/md0

Prerequisites:

No clients are running and you are unlisted from the Vast.ai market.
Docker data exists on the current drive.

Steps:

Install required tools:
```
sudo apt install pv pixz
```

Stop and disable relevant services:

sudo systemctl stop vastai docker.socket docker
sudo systemctl disable vastai docker.socket docker

Backup the Docker directory: Create a compressed backup of the /var/lib/docker directory. Ensure there's enough space on the OS drive for this backup, or move the data to a backup server. See https://github.com/jjziets/vasttools/blob/main/README.md#backup-varlibdocker-to-another-machine-on-your-network
```
sudo tar -c -I 'pixz -k -1' -f - /var/lib/docker | pv > ./docker.tar.pixz  # you can change ./ to a destination directory
```
Note: pixz utilizes multiple cores for faster compression.
Unmount the Docker directory: If you're planning to shut down and install a new drive:
```
sudo umount /var/lib/docker
```
Update /etc/fstab: Disable auto-mounting of the current Docker directory at startup to prevent boot issues:
```
sudo nano /etc/fstab
```
Comment out the line associated with /var/lib/docker by adding a # at the start of the line.
Partition the New Drive: (Adjust the device name based on your system. The guide uses /dev/md0 for RAID and /dev/nvme0n1 for NVMe drives as examples.)
```
sudo cfdisk /dev/md0
```
Format the new partition with XFS:
```
sudo mkfs.xfs -f /dev/md0p1
```
Retrieve the UUID: You'll need the UUID for updating /etc/fstab.
```
sudo xfs_admin -lu /dev/md0p1
```
Update /etc/fstab with the New Drive:
```
sudo nano /etc/fstab
```
Add the following line (replace the UUID with the one you retrieved):
```
UUID="YOUR_UUID_HERE" /var/lib/docker xfs rw,auto,pquota,discard,nofail 0 0
```
Mount the new partition:
```
sudo mount -a
```
Confirm the mount:
```
df -h
```
Ensure /dev/md0p1 (or the appropriate device name) is mounted to /var/lib/docker.
Restore the Docker data: Navigate to the root directory:
```
cd /
```
Decompress and restore: Ensure you change the user to the relevant name
```
sudo cat /home/user/docker.tar.pixz | pv | sudo tar -x -I 'pixz -d -k'
```

Enable services:

sudo systemctl enable vastai docker.socket docker

Reboot:
```
sudo reboot
```

Post-Reboot:

Check if the desired drive is mounted to /var/lib/docker and ensure vastai is operational.
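
The mount check can also be scripted instead of reading df -h by eye. This is a minimal sketch with a hypothetical mount_source helper. It reads lines in the /proc/mounts layout (device, mount point, filesystem type, options) and prints the device mounted at the given path, or fails if nothing is mounted there.

```shell
# Hypothetical helper: print the device mounted at mount point $1,
# reading /proc/mounts-style lines from stdin. Exits non-zero if none is found.
mount_source() {
    awk -v mp="$1" '
        $2 == mp { src = $1 }
        END { if (src != "") print src; else exit 1 }
    '
}

# On the host:
#   mount_source /var/lib/docker < /proc/mounts
```

If the printed device is not the one you expect (for example /dev/md0p1), recheck the UUID in /etc/fstab before starting the services again.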

Backup `/var/lib/docker` to Another Machine on Your Network

If you're looking to migrate your Docker setup to another machine, whether for replacing the drive or setting up a RAID, follow this guide. For this example, we'll assume the backup server's IP address is 192.168.1.100.

Setup on the Backup Server:

Temporarily Enable Root SSH Login: It's essential to ensure uninterrupted SSH communication during the backup process, especially when transferring large files like compressed Docker data. a. Open the SSH configuration:
```
sudo nano /etc/ssh/sshd_config
```
b. Locate and change the line:
```
PermitRootLogin no
```
to:
```
PermitRootLogin yes
```
c. Reload the SSH configuration:
```
sudo systemctl restart sshd
```

Setup on the Source Machine:

Generate an SSH Key and Transfer it to the Backup Server: a. Create the SSH key:
```
sudo ssh-keygen
```
b. Copy the SSH key to the backup server:
```
sudo ssh-copy-id -i ~/.ssh/id_rsa root@192.168.1.100
```
Disable Root Password Authentication: Ensure only the SSH key can be used for root login, enhancing security. a. Modify the SSH configuration:
```
sudo nano /etc/ssh/sshd_config
```
b. Change the line to:
```
PermitRootLogin prohibit-password
```
c. Reload the SSH configuration:
```
sudo systemctl restart sshd
```

Preparation for Backup: Before backing up, ensure relevant services are halted:

sudo systemctl stop docker.socket
sudo systemctl stop docker
sudo systemctl stop vastai
sudo systemctl disable vastai 
sudo systemctl disable docker.socket 
sudo systemctl disable docker

Backup Procedure: This procedure compresses the /var/lib/docker directory and transfers it to the backup server. a. Switch to the root user and install necessary tools:
```
sudo su
apt install pixz
apt install pv
```
It might be a good idea to run the backup command in tmux or screen so that if you lose the SSH connection the process will finish. b. Perform the backup:
```
tar -c -I 'pixz -k -0' -f - /var/lib/docker | pv | ssh root@192.168.1.100 "cat > /mnt/backup/machine/docker.tar.pixz"
```

Restoration:

Restoring the Backup: Make sure your new drive is mounted at /var/lib/docker. a. Switch to the root user:
```
sudo su
```
b. Restore from the backup:
```
cd /
ssh root@192.168.1.100 "cat /mnt/backup/machine/docker.tar.pixz" | pv | sudo tar -x -I 'pixz -d -k'
```

Reactivate Services:

sudo systemctl enable vastai 
sudo systemctl enable docker.socket 
sudo systemctl enable docker
sudo reboot

Post-reboot: Ensure your target drive is mounted to /var/lib/docker and that vastai is operational.

Connecting to running instance with VNC to see applications GUI

Using an instance with open ports. If the colors look wrong at the 16-bit depth used below, try another VNC viewer. TightVNC worked for me on Windows.

First tell Vast.ai to allow a port to be assigned. Use the -p 8081:8081 and tick the direct command.

Find a host with open ports and then rent it, preferably on demand. Go to the client instances page and wait for the connect button.

Use SSH to connect to the instances.

Run the commands below. The second part can be placed in the onstart.sh to run on restart.

bash -c 'apt-get update; apt-get -y upgrade;  apt-get install -y x11vnc; apt-get install -y xvfb; apt-get install -y firefox;apt-get install -y xfce4;apt-get install -y  xfce4-goodies'


export DISPLAY=:20
Xvfb :20 -screen 0 1920x1080x16 &
x11vnc -passwd TestVNC -display :20 -N -forever -rfbport 8081 &
startxfce4

To connect, use the IP of the host and the port that was provided. In this case it is 40010.

Then enjoy the desktop. Sadly this is not hardware accelerated, so no games will work.

Setting up 3D accelerated desktop in a web browser on Vast.ai

We will be using ghcr.io/ehfd/nvidia-glx-desktop:latest Use these environment parameters

-e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc -e BASIC_AUTH_PASSWORD=mypasswd -p 8080:8080

Find a system that has open ports

When done loading click Open.

The username is user and the password is what you set, mypasswd in this case.

Click Start

3D accelerated desktop environment in a web browser

How to set up a Docker registry for the systems on your network

This will reduce the number of pull requests from your public IP. Docker Hub restricts unauthenticated users to 100 pulls per six hours, and a local registry can also speed up the startup time for your rentals. This guide provides instructions on how to set up a Docker registry server using Docker Compose, as well as configuring Docker clients to use this registry.

Prerequisites

Docker and Docker Compose are installed on a server on your local LAN that has a lot of fast storage. Docker is installed on all client machines.

Setting Up the Docker Registry Server

Install docker-compose if you have not already.

sudo su
curl -L "https://github.com/docker/compose/releases/download/v2.24.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
apt-get update && sudo apt-get install -y gettext-base

Create a docker-compose.yml file: Create a file named docker-compose.yml on your server with the following content:

version: '3'
services:
  registry:
    restart: unless-stopped
    image: registry:2
    ports:
      - 5000:5000
    environment:
      - REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io
      - REGISTRY_STORAGE_DELETE_ENABLED=true
    volumes:
      - data:/var/lib/registry
volumes:
  data:

This configuration sets up a Docker registry server running on port 5000 and uses a volume named data for storage.

Start the Docker Registry:

Run the following command in the directory where your docker-compose.yml file is located:

sudo docker-compose up -d

This command will start the Docker registry in detached mode.

If space is limited, you can run this cleanup task as a cron job on the server.

wget https://github.com/jjziets/vasttools/raw/main/cleanup-registry.sh
chmod +x cleanup-registry.sh

Add this line to your crontab -e

0 * * * * /path/to/cleanup-registry.sh

Replace /path/to/ with where the file is saved.

Configuring Docker Clients

To configure Docker clients to use the registry, follow these steps on each client machine: Edit the Docker Daemon Configuration: Run the following command to add your Docker registry as a mirror in the Docker daemon configuration:

echo '{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": ["http://192.168.100.7:5000"]
}' | sudo tee /etc/docker/daemon.json

Replace 192.168.100.7:5000 with the IP address and port of your Docker registry server.

Restart the Docker daemon:

sudo systemctl restart docker
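
If you manage several clients, the same daemon.json can be generated for any mirror address and reviewed before it is written. This is a sketch only; render_daemon_json is a hypothetical name, and the content simply mirrors the configuration shown above.

```shell
# Hypothetical helper: print a daemon.json that keeps the nvidia runtime
# and points registry-mirrors at the address given in $1 (host:port).
render_daemon_json() {
    cat <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": ["http://$1"]
}
EOF
}

render_daemon_json 192.168.100.7:5000
```

Once the output looks right, it can be piped into sudo tee /etc/docker/daemon.json as in the command above. Since Python 3 is already installed by this guide, piping the output through python3 -m json.tool first is one way to confirm that it is valid JSON.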

Verifying the setup

To verify that the Docker registry is set up correctly, you can try pulling an image from the registry:

docker pull 192.168.100.7:5000/your-image

Replace 192.168.100.7:5000/your-image with the appropriate registry URL and image name.

Useful commands

If you set up the Vast CLI, you can enter this

./vast show machines | grep "current_rentals_running_on_demand"

If it returns 0, then it's an interruptible rent.

Command on a host that provides logs of the daemon running

tail /var/lib/vastai_kaalia/kaalia.log -f

Uninstall Vast

wget https://s3.amazonaws.com/vast.ai/uninstall.py
sudo python uninstall.py
