Large Language Models (LLMs) and API on IDUN


    IDUN provides access to coding assistants and LLMs running locally on the IDUN HPC cluster.

    How to access

    IDUN LLM models are available from NTNU networks or via NTNU VPN.

    You can use the IDUN LLM models via VS Code plugins, a web interface, or the API.

    Web chat/agent:
    - Open WebUI: https://idun-llm.hpc.ntnu.no
    - LibreChat: https://ai.hpc.ntnu.no/chat/
    Log in with your NTNU short username.

    API key:
    We create a personal API key for each user. Send an e-mail to: help@hpc.ntnu.no

    See more details and examples in this document.

    What about sensitive data?

    All LLM models on IDUN run locally, and data does not leave the NTNU network.

    • API calls go directly to the model and are not stored anywhere.
    • The web interfaces Open WebUI and LibreChat have a "Temporary Chat" feature; these chats are not saved. "Temporary Chat" is not enabled by default, so users' questions and answers are stored for convenience, and users can delete saved conversations manually.
    • There is a plan to officially approve the service for "røde data" (red data), but we have not done the formal assessment yet.

    The Temporary Chat toggle is located in the top right corner:

    Usage statistics: https://ai.hpc.ntnu.no/stats

    LLM models

    Updated: 2025-12-28

    Model name | Input format | Created by | Country | License | Parameters | Context window | GPU KV cache
    Mistral Large 3 | image and text | Mistral AI SAS | France | Apache 2.0 | 675B | 294912 | 905984
    GPT-OSS-120B | text | OpenAI | USA | Apache 2.0 | 117B | 131072 | 882368
    GLM 4.7 | text | Z.ai | China | MIT | 358B | 202752 | 452112
    Kimi K2 Thinking | text | Moonshot AI | China | Modified MIT | 1000B | 262144 | 1665792
    Qwen3 Coder 30B | text | Alibaba Cloud | China | Apache 2.0 | 30.5B | 262144 | 798752
    NorwAI-Magistral-24B-reasoning | text | NorwAI, NTNU | Norway | NorLLM License by NTNU | | |
    Qwen3 Embedding 8B | text | Alibaba Cloud | China | Apache 2.0 | 8B | 40960 |

    What model developers write about their models:

    mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4
    The Mistral Large 3 Instruct model offers the following capabilities:
    - Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
    - Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
    - System Prompt: Maintains strong adherence and support for system prompts.
    - Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.

    Read about this model:
    https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4
    https://mistral.ai/news/mistral-3
    openai/gpt-oss-120b
    Highlights:
    - Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
    - Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
    - Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
    - Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.

    Read about this model:
    https://huggingface.co/openai/gpt-oss-120b
    https://openai.com/index/introducing-gpt-oss/
    zai-org/GLM-4.7-FP8
    GLM-4.7, your new coding partner, comes with the following features:
    - Core coding: multilingual agentic coding and terminal-based tasks. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
    - Vibe coding: GLM-4.7 takes a big step forward in improving UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing.
    - Tool use: GLM-4.7 achieves significant improvements in tool use.
    - Complex reasoning: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities.

    Read about this model:
    https://huggingface.co/zai-org/GLM-4.7-FP8
    https://z.ai/blog/glm-4.7
    moonshotai/Kimi-K2-Thinking
    Key Features:
    - Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
    - Stable Long-Horizon Agency: Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.

    Read about this model:
    https://huggingface.co/moonshotai/Kimi-K2-Thinking
    https://moonshotai.github.io/Kimi-K2/thinking.html
    Qwen/Qwen3-Coder-30B-A3B-Instruct
    Key enhancements:
    - Significant performance among open models on agentic coding, agentic browser use, and other foundational coding tasks.
    - Long-context capabilities with native support for 256K tokens, extendable up to 1M tokens using YaRN, optimized for repository-scale understanding.
    - Agentic coding support for most platforms such as Qwen Code and CLINE, featuring a specially designed function call format.

    Read about this model:
    https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
    https://qwenlm.github.io/blog/qwen3-coder/
    NorwAI/NorwAI-Magistral-24B-reasoning
    A reasoning language model by the NorwAI research centre at the Norwegian University of Science and Technology (NTNU), in collaboration with Schibsted, NRK, VG, and the National Library of Norway. The model is designed to adapt its reasoning depth dynamically based on the type and complexity of the user's question:
    - Completion mode for straightforward answers without reasoning
    - Short-thinking mode for moderately difficult questions requiring some reasoning
    - Long-thinking mode for more complex questions requiring deeper reasoning

    Read about this model:
    https://huggingface.co/NorwAI/NorwAI-Magistral-24B-reasoning
    Qwen/Qwen3-Embedding-8B
    Highlights:
    - Supports 100+ languages, including Norwegian Bokmål and Norwegian Nynorsk, as well as various programming languages, providing robust multilingual, cross-lingual, and code retrieval capabilities.
    - Long-text understanding
    - Reasoning skills
    - The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations.

    Read about this model:
    https://huggingface.co/Qwen/Qwen3-Embedding-8B
    https://github.com/QwenLM/Qwen3-Embedding

    Web interface (Open WebUI)

    Open WebUI: https://idun-llm.hpc.ntnu.no
    LibreChat: https://ai.hpc.ntnu.no/chat/

    Screenshots: Open WebUI and LibreChat

    API (OpenAI Compatible) and models

    There are two ways to access the same IDUN models via the API:

    1. Use the unified API URL. This requires a personal API token; send an e-mail to help@hpc.ntnu.no and we'll generate a new API token for you.
    2. Use the shared API token sk-IDUN-NTNU-LLM-API-KEY, but with a unique base URL for each model.

    Why two choices? We want to make the unified API URL the default, but this is a very early installation and we are still learning how best to provide access to the LLM models.

    Solution 1 - Unified API URL

    https://llm.hpc.ntnu.no is an LLM gateway (LiteLLM) that acts as a universal adapter to all LLMs on the IDUN HPC cluster. It provides a consistent OpenAI-compatible API and allows us to generate personalised API tokens.

    You can see all endpoints on this web page https://llm.hpc.ntnu.no/

    At the moment, API tokens are generated manually. To get your own API token, send an e-mail to help@hpc.ntnu.no and we'll generate a new one for you.

    Visual Studio Code - extensions

    There are two popular open-source VS Code extensions:
    - Cline (recommended)
    - Roo Code

    NOTE: Current experience with Roo Code: the extension gets updates every 1-3 days, which sometimes causes issues. Things that worked yesterday can stop working today; this was my experience with Roo Code and the Mistral models. You can downgrade the extension and disable auto-update.

    Base URL: https://llm.hpc.ntnu.no/
    API provider: LiteLLM or OpenAI compatible
    API Key: send an e-mail to help@hpc.ntnu.no to create a new one.

    NOTE 1: Some LLM models work better (no errors) with Cline, some work better with Roo Code.

    NOTE 2: You can use the API provider "OpenAI Compatible" or "LiteLLM". In most cases they behave the same, but, for example, with Mistral Large 3 in Roo Code the "OpenAI Compatible" provider sometimes shows errors. This may be fixed in a future Roo Code release.

    Screenshots: Cline and Roo Code

    Comparing the LLM models' understanding and coding capabilities. Text prompt (source: https://z.ai/blog/glm-4.7):

    Design a richly crafted voxel-art environment featuring an ornate pagoda set within a vibrant garden.
    Include diverse vegetation—especially cherry blossom trees—and ensure the composition feels lively, colorful, and visually striking.
    Use any voxel or WebGL libraries you prefer, but deliver the entire project as a single, self-contained HTML file that I can paste and open directly in Chrome.
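    The prompt above can also be sent through the unified API and the returned page saved to a file. A minimal sketch using only the Python standard library (the model ID comes from the table in this document; the token is a placeholder; the fence-stripping helper is illustrative, since models often wrap code in a markdown fence):

```python
import json
import re
import urllib.request

def extract_html(text):
    """Strip a surrounding ```html ... ``` markdown fence, if present."""
    match = re.search(r"```(?:html)?\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

def build_request(prompt, model="zai-org/GLM-4.7-FP8"):
    """Build an OpenAI-compatible chat completion request for the IDUN gateway."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://llm.hpc.ntnu.no/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer sk-..MY..PERSONAL..API..TOKEN..",  # placeholder
            "Content-Type": "application/json",
        },
    )

# With network access and a valid token, send the voxel-art prompt and save the page:
# with urllib.request.urlopen(build_request("Design a richly crafted voxel-art ...")) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
# with open("pagoda.html", "w") as f:
#     f.write(extract_html(reply))
```

    The helper keeps the script usable even when the model wraps its answer in markdown; if the reply is plain HTML, the text is returned unchanged.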

    API examples with curl and Python

    Test API token - get model list:

    curl https://llm.hpc.ntnu.no/v1/models -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.."
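    The same model-list check can be done from Python using only the standard library (a sketch; the token is a placeholder, as above):

```python
import json
import urllib.request

def list_models_request(api_key):
    """Build the GET /v1/models request for the IDUN LLM gateway."""
    return urllib.request.Request(
        "https://llm.hpc.ntnu.no/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

# With network access and a valid token, print the available model IDs:
# with urllib.request.urlopen(list_models_request("sk-..MY..PERSONAL..API..TOKEN..")) as resp:
#     for m in json.loads(resp.read())["data"]:
#         print(m["id"])
```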

    Test chat response:

    curl https://llm.hpc.ntnu.no/v1/chat/completions -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.." -H "Content-Type: application/json" -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [
          {"role": "user", "content": "Who are you?"}
        ]
      }'

    Embeddings example with curl:

    curl https://llm.hpc.ntnu.no/v1/embeddings -H "Authorization: Bearer sk-..MY..PERSONAL..API..TOKEN.." -H "Content-Type: application/json" -d '{
        "model": "Qwen/Qwen3-Embedding-8B",
        "input": ["hello world", "this is another sentence"]
      }'

    The next example uses the Python module openai. First create a Python virtual environment and install the openai module:

    python3 -m venv venv-openai
    source venv-openai/bin/activate
    pip install openai

    Create a file chat-tools.py with a tool-calling code example:

    import openai
    import json
    import datetime
    
    client = openai.OpenAI(
        base_url="https://llm.hpc.ntnu.no/v1",
        api_key="sk-..MY..PERSONAL..API..TOKEN.."
    )
    
    def get_current_time():
        current_datetime = datetime.datetime.now()
        return f"Current Date and Time: {current_datetime}"
    
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_time",
                "description": "Get current date and time"
            },
        }
    ]
    
    response = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "What's the time right now?"}],
        tools=tools
    )
    
    # Process the response
    response_message = response.choices[0].message
    
    if response_message.tool_calls:
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
    
            if function_name == "get_current_time":
                time_info = get_current_time()
                print(f"Tool call executed: {function_name}() -> {time_info}")
            else:
                print(f"Unknown tool call: {function_name}")
    else:
        print(f"Model response (no tool call): {response_message.content}")

    Example output:

    $ python3 chat-tools.py
    Tool call executed: get_current_time() -> Current Date and Time: 2025-12-29 12:40:34.581986
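    The embeddings endpoint shown earlier with curl also has a natural Python use: comparing texts by cosine similarity. A minimal sketch (the cosine helper is plain Python; the commented request mirrors the curl embeddings example and needs network access and a valid token):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the openai client configured as in chat-tools.py above:
# emb = client.embeddings.create(
#     model="Qwen/Qwen3-Embedding-8B",
#     input=["hello world", "this is another sentence"],
# )
# print(cosine_similarity(emb.data[0].embedding, emb.data[1].embedding))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```

    Values close to 1.0 mean the two texts are semantically similar; values near 0 mean they are unrelated.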

    See more examples below in Solution 2.

    Solution 2 - one API token but many URLs

    All models have the same API token/key: sk-IDUN-NTNU-LLM-API-KEY

    These LLM models are available (updated 2025-12-28):

    Model ID | Base URL
    openai/gpt-oss-120b | https://ai.hpc.ntnu.no/api/gpt-oss-120b/v1
    Qwen/Qwen3-Coder-30B-A3B-Instruct | https://ai.hpc.ntnu.no/api/qwen3-coder-30b-a3b-instruct/v1
    mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 | https://ai.hpc.ntnu.no/api/mistral-large-3-675b-instruct-2512-nvfp4/v1
    zai-org/GLM-4.7-FP8 | https://ai.hpc.ntnu.no/api/glm-4.7-fp8/v1
    moonshotai/Kimi-K2-Thinking | https://ai.hpc.ntnu.no/api/kimi-k2-thinking/v1
    Qwen/Qwen3-Embedding-8B | https://ai.hpc.ntnu.no/api/qwen3-embedding-8b/v1
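    Any OpenAI-compatible client works against these per-model base URLs together with the shared key. A small sketch (the dictionary simply mirrors the table above; the helper name is our own):

```python
# Base URLs from the table above; every model shares the key sk-IDUN-NTNU-LLM-API-KEY.
BASE_URLS = {
    "openai/gpt-oss-120b": "https://ai.hpc.ntnu.no/api/gpt-oss-120b/v1",
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": "https://ai.hpc.ntnu.no/api/qwen3-coder-30b-a3b-instruct/v1",
    "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4": "https://ai.hpc.ntnu.no/api/mistral-large-3-675b-instruct-2512-nvfp4/v1",
    "zai-org/GLM-4.7-FP8": "https://ai.hpc.ntnu.no/api/glm-4.7-fp8/v1",
    "moonshotai/Kimi-K2-Thinking": "https://ai.hpc.ntnu.no/api/kimi-k2-thinking/v1",
    "Qwen/Qwen3-Embedding-8B": "https://ai.hpc.ntnu.no/api/qwen3-embedding-8b/v1",
}

def endpoint_for(model):
    """Chat completions endpoint for a given model ID from the table."""
    return BASE_URLS[model] + "/chat/completions"

# Usage with the openai package (requires network access and the shared key):
# client = openai.OpenAI(base_url=BASE_URLS["openai/gpt-oss-120b"],
#                        api_key="sk-IDUN-NTNU-LLM-API-KEY")
```

    Note that the embedding model is served under the same scheme but is called via the /embeddings endpoint rather than /chat/completions.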

    OUTDATED - Introduction video

    OUTDATED - Visual Studio Code and Code Server ( plugin Continue.dev )

    Start VS Code locally or Code Server ( VS Code web version) on https://apps.hpc.ntnu.no

    Install extension Continue from VS Code Marketplace.

    Add a configuration file by clicking the gear icon in the chat model list (see screenshot):

    Another way to add the configuration: create a directory named ".continue" (with a leading dot) in your user directory, and create the configuration file "config.yaml" in that directory.

    File content:

    name: IDUN Assistant
    version: 1.0.0
    schema: v1
    models:
      - name: Coder
        provider: openai
        model: Qwen/Qwen3-Coder-30B-A3B-Instruct
        apiBase: https://ai.hpc.ntnu.no/api/coder/v1
        apiKey: sk-IDUN-NTNU-LLM-API-KEY
        roles:
          - chat
          - edit
          - apply
        capabilities:
            - tool_use
      - name: Autocomplete
        provider: openai
        model: Qwen/Qwen3-Coder-30B-A3B-Instruct
        apiBase: https://ai.hpc.ntnu.no/api/coder/v1
        apiKey: sk-IDUN-NTNU-LLM-API-KEY
        roles:
          - autocomplete
    context:
      - provider: code
      - provider: docs
      - provider: diff
      - provider: terminal
      - provider: problems
      - provider: folder
      - provider: codebase

    Main elements (screenshot):

    OUTDATED - Visual Studio Code and Code Server ( plugin Cline )

    Start VS Code locally or Code Server ( VS Code web version) on https://apps.hpc.ntnu.no

    Install extension Cline from VS Code Marketplace.

    Add configuration file by clicking on "Use your own API key" (see screenshot):

    Use these settings:

    API Provider: OpenAI Compatible
    Base URL: https://ai.hpc.ntnu.no/api/coder/v1
    OpenAI Compatible API Key: sk-IDUN-NTNU-LLM-API-KEY
    Model ID: Qwen/Qwen3-Coder-30B-A3B-Instruct

    OUTDATED - Visual Studio Code and Code Server ( plugin Roo Code )

    Start VS Code locally or Code Server ( VS Code web version) on https://apps.hpc.ntnu.no

    Install extension Roo Code from VS Code Marketplace.

    Use these settings (see screenshot):

    API Provider: OpenAI Compatible
    Base URL: https://ai.hpc.ntnu.no/api/coder/v1
    OpenAI Compatible API Key: sk-IDUN-NTNU-LLM-API-KEY
    Model ID: Qwen/Qwen3-Coder-30B-A3B-Instruct

    There is no need to connect to Roo Code Cloud; close the request:

    OUTDATED - JupyterLab ( plugin Jupyter AI )

    Install the plugin Jupyter AI. Example with a new Python environment:

    module load Python/3.12.3-GCCcore-13.3.0
    python -m venv /cluster/home/USERNAME/JupiterAI
    source /cluster/home/USERNAME/JupiterAI/bin/activate
    pip install jupyterlab
    pip install jupyter-ai[all]

    Start JupyterLab via https://apps.hpc.ntnu.no

    Change Jupyter AI settings:

    Language Model
    - Completion model: OpenAI (general interface)::*
    - Model ID: Qwen/Qwen3-Coder-30B-A3B-Instruct
    - Base API URL: https://ai.hpc.ntnu.no/api/coder/v1

    Inline completions model (Optional)
    - Completion model: OpenAI (general interface)::*
    - Model ID: Qwen/Qwen3-Coder-30B-A3B-Instruct
    - Base API URL: https://ai.hpc.ntnu.no/api/coder/v1

    API Keys: sk-IDUN-NTNU-LLM-API-KEY
