This is an example of how to run inference with the model llama3.2:1b using Ollama on IDUN.
Create a job file "ollama.slurm" that starts "ollama serve" on a compute node:
#!/bin/sh
#SBATCH --partition=GPUQ
#SBATCH --account=MY-GROUP-ACCOUNT
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --output=ollama_serve.txt
#SBATCH --gres=gpu:1
module load ollama/0.6.0-GCCcore-13.3.0-CUDA-12.6.0
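# Bind the Ollama server to all network interfaces so it can be reached from other nodes (it listens on port 11434 by default)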
export OLLAMA_HOST=0.0.0.0
ollama serve
Submit the job:
$ sbatch ollama.slurm
Submitted batch job 22988538
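While the job is pending or running, you can also watch it with squeue (a standard Slurm command; the job ID is the one returned by sbatch, and the %T/%N format codes print the job state and node list):
$ squeue -j 22988538 -o "%T %N"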
Check that the job is RUNNING and note the compute node name:
$ scontrol show job 22988538
JobId=22988538 JobName=ollama.slurm
. . .
JobState=RUNNING Reason=None Dependency=(null)
. . .
BatchHost=idun-09-04
. . .
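To pull out just the fields of interest, you can filter the scontrol output with a plain grep (same job ID as above):
$ scontrol show job 22988538 | grep -E "JobState|BatchHost"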
Log in to the compute node and load the model:
$ ssh idun-09-04
$ module load ollama/0.6.0-GCCcore-13.3.0-CUDA-12.6.0
$ ollama run llama3.2:1b
>>> Hi
How can I help you today?
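Because OLLAMA_HOST=0.0.0.0 is exported in the job script, the server also accepts HTTP requests on Ollama's default port 11434. A minimal sketch, assuming the compute node name idun-09-04 from above and that the port is reachable from where you run the command:
$ curl http://idun-09-04:11434/api/generate -d '{
    "model": "llama3.2:1b",
    "prompt": "Hi",
    "stream": false
  }'
Alternatively, instead of ssh-ing to the compute node, you can point the ollama client at it from a login node (assuming the same node name and port):
$ OLLAMA_HOST=idun-09-04:11434 ollama run llama3.2:1b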