This is an example of how to run inference with the model llama3.2:1b using Ollama on IDUN.
Create a job file "ollama.slurm" that starts "ollama serve" on a compute node:
#!/bin/sh
#SBATCH --partition=GPUQ
#SBATCH --account=MY-GROUP-ACCOUNT
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10G
#SBATCH --output=ollama_serve.txt
#SBATCH --gres=gpu:1
module load ollama/0.6.0-GCCcore-13.3.0-CUDA-12.6.0
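# Bind the Ollama server to all network interfaces so it can be reached from other nodes (it listens on port 11434 by default)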
export OLLAMA_HOST=0.0.0.0
ollama serve
Submit the job:
$ sbatch ollama.slurm
Submitted batch job 22988538
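While the job is pending or running, you can also watch it with squeue (a standard Slurm command; the job ID is the one returned by sbatch, and the %T/%N format codes print the job state and node list):
$ squeue -j 22988538 -o "%T %N"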
Check that the job is RUNNING and note the compute node name:
$ scontrol show job 22988538
JobId=22988538 JobName=ollama.slurm
. . .
JobState=RUNNING Reason=None Dependency=(null)
. . .
BatchHost=idun-09-04
. . .
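To pull out just the fields of interest, you can filter the scontrol output with a plain grep (same job ID as above):
$ scontrol show job 22988538 | grep -E "JobState|BatchHost"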
Log in to the compute node and load the model:
$ ssh idun-09-04
$ module load ollama/0.6.0-GCCcore-13.3.0-CUDA-12.6.0
$ ollama run llama3.2:1b
>>> Hi
How can I help you today?
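Because OLLAMA_HOST=0.0.0.0 is exported in the job script, the server also accepts HTTP requests on Ollama's default port 11434. A minimal sketch, assuming the compute node name idun-09-04 from above and that the port is reachable from where you run the command:
$ curl http://idun-09-04:11434/api/generate -d '{
    "model": "llama3.2:1b",
    "prompt": "Hi",
    "stream": false
  }'
Alternatively, instead of ssh-ing to the compute node, you can point the ollama client at it from a login node (assuming the same node name and port):
$ OLLAMA_HOST=idun-09-04:11434 ollama run llama3.2:1b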