The IDUN HPC cluster provides access to LLMs (running locally).
How to access
IDUN LLM models are available from NTNU networks or NTNU VPN.
You can use IDUN LLM models via Desktop AI applications like:
- VS Code with extensions (Cline, Roo Code, Continue, Kilo Code, ... )
- ZED
- OpenCode
- Claude Code
- Witsy
- Anything LLM
- Cherry Studio
Web chat/agent:
- Open WebUI: https://idun-llm.hpc.ntnu.no
- LibreChat: https://ai.hpc.ntnu.no/chat/
Login with your NTNU short username.
iPhone and Android applications, for example Apollo (tested)
API key:
We create a personal API key for each user. Send an e-mail to: help@hpc.ntnu.no
See more details and examples in this document.
What about sensitive data?
All LLM models on IDUN run locally, and data does not leave the NTNU network.
- API calls go directly to the model and are not stored. (vLLM is started with "--no-enable-prefix-caching" and the LLM proxy is started with the "cache: False" option.)
- Use API with desktop AI applications.
- The web interfaces Open WebUI and LibreChat have a "Temporary Chat" feature. These chats are not saved. "Temporary Chat" is not enabled by default, so users' questions and answers are stored for their convenience. Users can delete saved conversations manually.
- There is a plan to officially approve the service for "røde data" (red data), but we have not done the formal assessment yet.
Temporary chat toggle is located in the top right corner:

Usage statistics: https://ai.hpc.ntnu.no/stats
LLM models
Updated: 2025-12-28
| Model name | Input format | Created by | Country | License | Parameters | Context Window |
|---|---|---|---|---|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 | image and text | Mistral AI SAS | France | Apache 2.0 | 675B | 294912 |
| openai/gpt-oss-120b | text | OpenAI | USA | Apache 2.0 | 117B | 131072 |
| zai-org/GLM-4.7-FP8 | text | Z.ai | China | MIT | 358B | 202752 |
| moonshotai/Kimi-K2.5 | image and text | Moonshot AI | China | Modified MIT | 1000B | 262144 |
| Qwen/Qwen3-Coder-Next-FP8 | text | Alibaba Cloud | China | Apache 2.0 | 30.5B | 262144 |
| NorwAI/NorwAI-Magistral-24B-reasoning | text | NorwAI, NTNU | Norway | NorLLM License by NTNU | 24B | 40960 |
| Qwen/Qwen3-Embedding-8B | text | Alibaba Cloud | China | Apache 2.0 | 8B | 40960 |
NOTE: context window = max_input_tokens + max_output_tokens. We configure a 50% / 50% split. For example, for the model openai/gpt-oss-120b with context window 131072, use:
- max_input_tokens: 65536
- max_output_tokens: 65536
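The 50/50 split above can be computed for any model in the table. The following is a small illustrative Python sketch; the helper function and the dictionary of context windows are examples written for this document, not part of the IDUN configuration:

```python
# Illustrative helper: split a model's context window 50/50 into
# input and output token budgets, as described in the note above.
def split_context_window(context_window: int) -> tuple[int, int]:
    """Return (max_input_tokens, max_output_tokens) for a 50/50 split."""
    half = context_window // 2
    return half, half

# Context windows taken from the model table above.
context_windows = {
    "openai/gpt-oss-120b": 131072,
    "zai-org/GLM-4.7-FP8": 202752,
    "moonshotai/Kimi-K2.5": 262144,
}

for model, ctx in context_windows.items():
    max_in, max_out = split_context_window(ctx)
    print(f"{model}: max_input_tokens={max_in}, max_output_tokens={max_out}")
# openai/gpt-oss-120b: max_input_tokens=65536, max_output_tokens=65536
```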
Click to show "What model developers write about their models"
| Model ID | Comments |
|---|---|
| mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 | The Mistral Large 3 Instruct model offers the following capabilities: - Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text. - Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic. - System Prompt: Maintains strong adherence and support for system prompts. - Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting. Read about this model: https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 https://mistral.ai/news/mistral-3 |
| openai/gpt-oss-120b | Highlights: - Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. - Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. - Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. - Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Read about this model: https://huggingface.co/openai/gpt-oss-120b https://openai.com/index/introducing-gpt-oss/ |
| zai-org/GLM-4.7-FP8 | GLM-4.7, your new coding partner, comes with the following features: - Core Coding: multilingual agentic coding and terminal-based tasks. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code. - Vibe Coding: GLM-4.7 takes a big step forward in improving UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing. - Tool Using: GLM-4.7 achieves significant improvements in tool use. - Complex Reasoning: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities. Read about this model: https://huggingface.co/zai-org/GLM-4.7-FP8 https://z.ai/blog/glm-4.7 |
| moonshotai/Kimi-K2.5 | Key Features: - Native Multimodality: Pre-trained on vision–language tokens, K2.5 excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs. - Coding with Vision: K2.5 generates code from visual specifications (UI designs, video workflows) and autonomously orchestrates tools for visual data processing. - Agent Swarm: K2.5 transitions from single-agent scaling to a self-directed, coordinated swarm-like execution scheme. It decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents. Read about this model: https://huggingface.co/moonshotai/Kimi-K2.5 https://www.kimi.com/blog/kimi-k2-5.html |
| Qwen/Qwen3-Coder-Next-FP8 | Key enhancements: - Super Efficient with Significant Performance. - Advanced Agentic Capabilities. - Versatile Integration with Real-World IDE. Read about this model: https://huggingface.co/Qwen/Qwen3-Coder-Next-FP8 https://qwen.ai/blog?id=qwen3-coder-next |
| NorwAI/NorwAI-Magistral-24B-reasoning | Reasoning language model, by the NorwAI research center at the Norwegian University of Science and Technology (NTNU) in collaboration with Schibsted, NRK, VG and the National Library of Norway. The model is designed to adapt its reasoning depth dynamically based on the type and complexity of the user's question: - Completion mode for straightforward answers without reasoning - Short-thinking mode for moderately difficult questions requiring some reasoning - Long-thinking mode for more complex questions requiring deeper reasoning Read about this model: https://huggingface.co/NorwAI/NorwAI-Magistral-24B-reasoning |
| Qwen/Qwen3-Embedding-8B | Highlights: - supports 100+ languages, including Norwegian Bokmål and Norwegian Nynorsk. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities. - long-text understanding - reasoning skills - The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. Read about this model: https://huggingface.co/Qwen/Qwen3-Embedding-8B https://github.com/QwenLM/Qwen3-Embedding |
Web interface (Open WebUI)
Open WebUI: https://idun-llm.hpc.ntnu.no
LibreChat: https://ai.hpc.ntnu.no/chat/

API (OpenAI Compatible)
Send an email to help@hpc.ntnu.no and we'll generate a new personal API token for you.
https://llm.hpc.ntnu.no is an LLM gateway for accessing all LLMs on the IDUN HPC cluster. It provides a consistent OpenAI-compatible API.
You can see all endpoints on this web page: https://llm.hpc.ntnu.no/
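To sanity-check a new token before wiring it into an application, the gateway's model-list endpoint can be called with only the Python standard library. This is a sketch, not an official client: the token value is a placeholder you must replace, and the script deliberately skips the network call while the placeholder is still in place:

```python
import json
import urllib.request

BASE_URL = "https://llm.hpc.ntnu.no/v1"
API_KEY = "sk-..MY..PERSONAL..API..TOKEN.."  # placeholder - replace with your own key

url = f"{BASE_URL}/models"

# Only contact the gateway once a real token has been filled in.
if "MY..PERSONAL" in API_KEY:
    print("Replace API_KEY with your personal token first.")
else:
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The OpenAI-compatible /v1/models response contains a "data" list of model objects.
    for model in data.get("data", []):
        print(model["id"])
```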
Visual Studio Code - extensions
There are several popular open source VS Code extensions:
- Cline (recommended)
- Roo Code
- Kilo Code
- Continue
NOTE: Current experience with Roo Code: the extension gets updates every 1-3 days, which sometimes causes issues. Things that worked yesterday can stop working today. This was my experience with Roo Code and Mistral models. You can downgrade the extension and disable auto-update.
Base URL: https://llm.hpc.ntnu.no/
API provider: LiteLLM or OpenAI compatible
API Key: send an email to help@hpc.ntnu.no to create a new one.
NOTE 1: Some LLM models work better (no errors) with Cline, some with Roo Code.
NOTE 2: You can use the API provider "OpenAI Compatible" or "LiteLLM". In most cases they behave the same, but for example with Mistral Large 3 in Roo Code, the "OpenAI Compatible" provider sometimes shows errors. This may be fixed in a future Roo Code release.

Comparing LLM models' understanding and coding capabilities. Text prompt (source: https://z.ai/blog/glm-4.7):
Design a richly crafted voxel-art environment featuring an ornate pagoda set within a vibrant garden.
Include diverse vegetation—especially cherry blossom trees—and ensure the composition feels lively, colorful, and visually striking.
Use any voxel or WebGL libraries you prefer, but deliver the entire project as a single, self-contained HTML file that I can paste and open directly in Chrome.

Desktop AI applications
BYOK (Bring Your Own Keys) AI applications. We tested several applications, looking for these features:
- Application can connect to LLM models via API on IDUN
- Open Source
- Can work with local documents
- Can use local web search
- Can send images for recognition
These applications were tested:
| Desktop AI application | License | Comment |
|---|---|---|
| Open WebUI | permissive license with branding protection | Can be installed on a local computer and accessed via web browser at http://localhost:8080. Uses an embedding model to work with local documents. |
| Witsy | AGPL-3.0 license | Uses embedding model to work with local documents. |
| Cherry Studio | AGPL-3.0 license | Uses embedding model to work with local documents. |
| Anything LLM | MIT License | Uses embedding model to work with local documents. |
| ZED | Open source | Zed is a minimal code editor with AI support out of the box. It is designed to work with code. It does not use an embedding model, but it understands the question and finds the answer in local files using the LLM. |
| Visual Studio Code | MIT license | VS Code is a code editor. It can connect to the LLM API with extensions like Cline, Roo Code, Kilo Code, etc. It does not use an embedding model, but it understands the question and finds the answer in local files using the LLM. |
Click to show: Configuration example local Open WebUI
Install guide: https://docs.openwebui.com/getting-started/quick-start
Settings location: Click User icon > Admin panel > Settings
Screenshot main interface http://localhost:8080:

Connect to IDUN LLMs:

Configure search engine:

Configure Embedding model for local documents:

Add local documents to a knowledge base:

Add image generation model:

Click to show: Configuration example Witsy
Main interface:

Connect to IDUN LLM

Add embedding model for local documents:



Click to show: Configuration example Cherry Studio
Main interface:

Connect to IDUN LLM


Configure web search engine:

Add directory to knowledge base:



Click to show: Configuration example Anything LLM
Main interface

Connect to IDUN LLM:

Add embedding model:

Configure web search:

Add directory to knowledge base:

Click to show: Configuration example ZED
Main interface:

Connect IDUN LLM:

iPhone (iOS) and Android applications
Example configuration for the Apollo application:


API examples with curl and Python
Test API token - get model list:
curl https://llm.hpc.ntnu.no/v1/models -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.."
Test chat response:
curl https://llm.hpc.ntnu.no/v1/chat/completions -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.." -H "Content-Type: application/json" -d '{
"model": "openai/gpt-oss-120b",
"messages": [
{"role": "user", "content": "Who are you?"}
]
}'
Example with curl command - embedding:
curl https://llm.hpc.ntnu.no/v1/embeddings -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.." -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-Embedding-8B",
"input": ["hello world", "this is another sentence"]
}'
This example uses the Python module openai. First create a Python virtual environment and install the openai module:
python3 -m venv venv-openai
source venv-openai/bin/activate
pip install openai
Create a file chat-tools.py with this tool-calling code example:
import openai
import json
import datetime

client = openai.OpenAI(
    base_url="https://llm.hpc.ntnu.no/v1",
    api_key="sk-..MY..PERSONAL..API..TOKEN.."
)

def get_current_time():
    current_datetime = datetime.datetime.now()
    return f"Current Date and Time: {current_datetime}"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get current date and time"
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the time right now?"}],
    tools=tools
)

# Process the response
response_message = response.choices[0].message
if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        if function_name == "get_current_time":
            time_info = get_current_time()
            print(f"Tool call executed: {function_name}() -> {time_info}")
        else:
            print(f"Unknown tool call: {function_name}")
else:
    print(f"Model response (no tool call): {response_message.content}")
Example output:
$ python3 chat-tools.py
Tool call executed: get_current_time() -> Current Date and Time: 2025-12-29 12:40:34.581986
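The embeddings endpoint can be used from Python as well. The sketch below mirrors the earlier curl embedding example using only the Python standard library, and adds a cosine-similarity helper for comparing the returned vectors. The helper function, the embed() wrapper, and the guard around the placeholder token are illustrative additions, not part of the IDUN setup:

```python
import json
import math
import urllib.request

API_KEY = "sk-..MY..PERSONAL..API..TOKEN.."  # placeholder - replace with your own key

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def embed(texts):
    """Call the /v1/embeddings endpoint and return one vector per input text."""
    body = json.dumps({"model": "Qwen/Qwen3-Embedding-8B", "input": texts}).encode()
    req = urllib.request.Request(
        "https://llm.hpc.ntnu.no/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # OpenAI-compatible responses put the vectors in data["data"][i]["embedding"].
    return [item["embedding"] for item in data["data"]]

# Only contact the gateway once a real token has been filled in.
if "MY..PERSONAL" in API_KEY:
    print("Replace API_KEY with your personal token first.")
else:
    v1, v2 = embed(["hello world", "this is another sentence"])
    print(f"similarity: {cosine_similarity(v1, v2):.3f}")
```

A higher cosine similarity means the two input texts are semantically closer, which is the basis for the "local documents" features in the desktop applications above.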
OpenCode - configuration example

File: ~/.config/opencode/config.json
{
  "provider": {
    "idun-llm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "IDUN LLM",
      "options": {
        "baseURL": "https://llm.hpc.ntnu.no/v1",
        "apiKey": "sk-...Personal...API.KEY..."
      },
      "models": {
        "openai/gpt-oss-120b": {
          "name": "GPT OSS 120B"
        },
        "Qwen/Qwen3-Coder-Next-FP8": {
          "name": "Qwen3 Coder Next FP8"
        },
        "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4": {
          "name": "Mistral Large 3 675B Instruct"
        },
        "zai-org/GLM-4.7-FP8": {
          "name": "GLM 4.7 FP8"
        },
        "moonshotai/Kimi-K2.5": {
          "name": "Kimi K2.5"
        }
      }
    }
  }
}
Claude Code - configuration example

Install Claude Code. Instructions: https://code.claude.com/docs/en/quickstart.
I used these commands on a Mac:
curl -fsSL https://claude.ai/install.sh | bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
IMPORTANT (to start without a subscription): only the first time, add this environment variable before starting claude:
export ANTHROPIC_AUTH_TOKEN="ollama"
Create a config file claude_code_config.json in your home directory with these lines:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://llm.hpc.ntnu.no",
    "ANTHROPIC_AUTH_TOKEN": "sk-...YOUR_API_KEY....",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "zai-org/GLM-4.7-FP8",
    "ANTHROPIC_SMALL_FAST_MODEL": "zai-org/GLM-4.7-FP8",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "zai-org/GLM-4.7-FP8",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "zai-org/GLM-4.7-FP8",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "zai-org/GLM-4.7-FP8"
  }
}
Test it. Create a directory "garden" and change into it:
mkdir garden
cd garden
Start Claude Code:
claude --settings ~/claude_code_config.json
Test prompt:
Design a richly crafted voxel-art environment featuring an ornate pagoda set within a vibrant garden.
Include diverse vegetation—especially cherry blossom trees—and ensure the composition feels lively, colorful, and visually striking.
Use any voxel or WebGL libraries you prefer, but deliver the entire project as a single, self-contained HTML file that I can paste and open directly in Chrome.
Test prompts:
Prompt - 3D solar system:
Create a single HTML file with a 3D rotating solar system to open in a web browser. Show planet text information when the mouse is over a planet. It should be possible to zoom and rotate with the mouse.
Prompt - 3D racing game:
Design and create a 3D highway racing game. The game must feature 3D graphics in any style you choose. A Start Screen that allows the user to select the car they will use. The user may select from three potential options as follows: A Sports Car, A Sedan, An option of your choosing. Each Car must have realistic limitations on its performance (e.g., top speed, acceleration, handling), which should also be displayed graphically on the car selection screen. Once the car is selected and the game starts, the player's car will begin driving on a busy highway. The player must navigate through dynamic traffic, swerving between lanes to avoid collisions. There MUST be a visible "nitrous oxide" or speed boost effect when used, as well as functional damage implementation for the player's car from collisions. If the player successfully navigates through the traffic for a set distance/time, the level repeats with increased difficulty (e.g., denser traffic, higher speeds, adverse weather). If the player's car sustains critical damage or crashes, the vehicle becomes uncontrollable (e.g., spins out, rolls over), and the screen returns to the home screen following a 2-second black screen. You may use any library for this implementation, but it must be contained within a single script, and be able to be opened and played in the Chrome browser.
Prompt - rotating 3D globe:
Create a single HTML file that sets up a basic Three.js scene with a rotating 3D globe. The globe should have high detail (64 segments), use a placeholder texture for the Earth's surface, and include ambient and directional lighting for realistic shading. Implement smooth rotation animation around the Y-axis, handle window resizing to maintain proper proportions, and use antialiasing for smoother edges. Explanation: Scene Setup: Initializes the scene, camera, and renderer with antialiasing. Sphere Geometry: Creates a high-detail sphere geometry (64 segments). Texture: Loads a placeholder texture using THREE.TextureLoader. Material & Mesh: Applies the texture to the sphere material and creates a mesh for the globe. Lighting: Adds ambient and directional lights to enhance the scene's realism. Animation: Continuously rotates the globe around its Y-axis. Resize Handling: Adjusts the renderer size and camera aspect ratio when the window is resized.