Published 2025-11-03.
Last modified 2025-11-04.
Time to read: 6 minutes.
llm collection.
The following articles were written sequentially.
This article follows Best Local LLMs For Coding. Here I discuss how I set up one of the currently leading LLMs for coding with Ollama.
Get Started
Paste sllm/glm-z1-9b into the model selector, then press the model button and press Enter. The model will download, install, and run.
If you open the sidebar, select Settings, and enable Expose Ollama to the network, you will be asked to grant network permissions to the Ollama executable.
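Once that setting is enabled, a quick way to confirm the Ollama API is actually reachable from another machine on your LAN is to query its version endpoint. A minimal sketch, assuming the workstation running Ollama is at 192.168.1.50 (a placeholder; substitute your own address):
$ curl http://192.168.1.50:11434/api/version   # 192.168.1.50 is a placeholder LAN address
A small JSON response containing the Ollama version means the server is exposed; a connection error usually points at the firewall prompt or the network-exposure setting not having taken effect.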
Important Settings
Here are some settings that will probably improve your coding experience with
GLM-Z1-9B-0414.
/set parameter repeat_penalty 1.2
/set parameter repeat_last_n 128
/set parameter top_k 50
/set parameter temperature 0.2
Repetition is a known quirk with GLM-4 models in general. The above settings try to ensure that only one copy of an answer is returned to you. This happens because:
- The community quant (sllm) uses an older tokenizer/chat template that doesn’t fully suppress loops in long reasoning chains.
- GLM-Z1’s “deep thinking” mode can self-reinforce patterns.
- Ollama’s default repeat_last_n value is too low; it only penalizes the last 64 tokens.
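These parameters can also be passed per request when talking to Ollama over HTTP, which is essentially how Continue drives it later in this article. A minimal sketch against Ollama's /api/generate endpoint (the prompt is only an illustration; bash-style quoting shown):
$ curl http://localhost:11434/api/generate -d '{
    "model": "sllm/glm-z1-9b",
    "prompt": "Write a function that reverses a string.",
    "stream": false,
    "options": {
      "repeat_penalty": 1.2,
      "repeat_last_n": 128,
      "top_k": 50,
      "temperature": 0.2
    }
  }'
The options field overrides the model's defaults for that single request only.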
You can also type /help.
The currently active settings can be saved as a preset called coding:
/save coding
At the start of every coding chat session, reload the coding settings:
/load coding
Define Model With Desired Settings
It is easy to define a new model with any desired settings. For example, let's create a model called glm-z1-fixed that bakes in the desired settings:
$ ollama run sllm/glm-z1-9b
>>> /set parameter repeat_penalty 1.2
>>> /set parameter repeat_last_n 128
>>> /set parameter top_k 40
>>> /set parameter top_p 0.95
>>> /set parameter temperature 0.6
>>> /save glm-z1-fixed
From now on, search for and run glm-z1-fixed instead of sllm/glm-z1-9b. There is no need to reload settings each session.
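An alternative, if you prefer a version-controllable file over an interactive session, is to define the same parameters in a Modelfile and build the model with ollama create. A minimal sketch, assuming bash (the file name Modelfile.glm-z1-fixed is arbitrary; on Windows, create the file with any text editor):
$ cat > Modelfile.glm-z1-fixed <<'EOF'
# Base the new model on the community quant
FROM sllm/glm-z1-9b
# Bake in the anti-repetition and sampling parameters
PARAMETER repeat_penalty 1.2
PARAMETER repeat_last_n 128
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER temperature 0.6
EOF
$ ollama create glm-z1-fixed -f Modelfile.glm-z1-fixed
Either route produces the same glm-z1-fixed model; the Modelfile version is easier to tweak and re-create later.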
Agentic Assistance in Visual Studio Code
The sllm/glm-z1-9b Ollama model is an instance of GLM-Z1-9B-0414.
The remainder of this article consists of my notes on setting it up as an agentic coding assistant in Visual Studio Code. The assistant can provide contextual help like code generation, refactoring, debugging, file creation and editing, and multi-step reasoning (e.g., "Plan and implement a full auth system"), powered by Ollama’s local inference on your RTX 3060.
Grok recommended the Continue.dev Visual Studio Code extension, which is open-source, privacy-focused, and designed for Ollama integration. This Visual Studio Code extension turns your LLM into a Copilot-like agent, with inline completions, a sidebar chat with codebase context, and "agent mode" for autonomous tasks (e.g., using tools to edit files or run commands).
"Best" is a relative term. Continue is what would have been called bleeding-edge in the dotcom boom. You were warned.
Run Ollama
Ensure the Ollama app is open and sllm/glm-z1-9b is pulled. One
way to test if the model has been pulled is to type the following in a CMD
or PowerShell console:
PS C:\Users\Mike Slinn> ollama run sllm/glm-z1-9b
>>> Send a message (/? for help)
>>> /help
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM sllm/glm-z1-9b:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-a7b407b938bdbab835e0237edbc30fb88fe8cd9342a3fe43cf3ee4c75f1dae38
TEMPLATE {{ .Prompt }}

>>> /show parameters
Model defined parameters:
No additional parameters were specified for this model.
>>> CTRL+D
You can exit the Ollama session with CTRL+D once you know it works.
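Another quick check that does not start an interactive session is to list the locally installed models:
PS C:\Users\Mike Slinn> ollama list
sllm/glm-z1-9b should appear in the output along with its size and modification time.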
Install the Continue Visual Studio Code Extension
Install the Continue Visual Studio Code extension by typing:
$ code --install-extension Continue.continue
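To confirm the extension registered, list the installed extensions and look for Continue.continue:
$ code --list-extensions | grep -i continue   # in PowerShell, pipe to Select-String instead of grep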
Adjust Context Window
A computer like Bear with 12 GB VRAM and 64 GB RAM should handle contexts from 8K to 32K. Use the Ollama app's settings slider to tune the size of the context window.
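If you would rather pin the context length in the model itself instead of using the slider, Ollama exposes it as the num_ctx parameter. A minimal sketch that bakes a 16K context into the glm-z1-fixed model defined earlier:
$ ollama run glm-z1-fixed
>>> /set parameter num_ctx 16384
>>> /save glm-z1-fixed
Larger values consume more VRAM, so drop back to 8192 if generation slows or spills into system RAM.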
Visual Studio Code Extension Settings
- Open Visual Studio Code.
- Go to the Continue.continue extension setup.
- I enabled Enable Console and disabled Telemetry Enabled.
Configure Continue for GLM-Z1-9B
To configure Continue to use the GLM-Z1-9B model, edit the Continue configuration file.
On macOS and Linux the file is ~/.continue/config.json.
On Windows the file is %USERPROFILE%\.continue\config.json.
Because I run Ollama on native Windows, not WSL, I only edit the file in one place.
Set the contents of this file to the following exactly as shown:
{
"codebase": {
"embedOnStartup": true
},
"contextProviders": [
{
"name": "codebase",
"params": {
"nRetrieve": 15,
"nFinal": 5,
"useChunks": true
}
},
{
"name": "docs",
"params": {
"urls": [
"https://ollama.com",
"https://mslinn.com",
"https://go.dev",
"https://developer.mozilla.org/en-US/docs/Web"
]
}
}
],
"embeddingsProvider": {
"model": "nomic-embed-text",
"provider": "ollama"
},
"models": [
{
"apiBase": "http://localhost:11434",
"default": true,
"model": "sllm/glm-z1-9b",
"name": "GLM-Z1-9B",
"provider": "ollama",
"roles": [
"apply",
"chat",
"edit"
],
"supportsTools": true,
"title": "GLM-Z1-9B (Local Ollama)"
}
],
"rules": [
{
"name": "No Philosophy",
"description": "Never give abstract or philosophical answers. Be direct and practical.",
"prompt": "Answer in 1-2 sentences. No explanations unless asked."
},
{
"name": "Use Think Tags",
"description": "Use <think> only for complex reasoning, not simple questions.",
"prompt": "For simple queries, respond immediately. Use <think> only when planning multi-step actions."
},
{
"name": "File Names Only",
"description": "When asked for file name, return just the name.",
"prompt": "If asked 'what is your name' or similar, say Jeannie VaVoom."
}
],
"systemMessage": "You are a helpful coding agent. Use <think> for reasoning.",
"tabAutocompleteModel": {
"apiBase": "http://localhost:11434",
"model": "sllm/glm-z1-9b",
"provider": "ollama",
"title": "GLM-Z1-9B Autocomplete"
}
}
"supportsTools": true, above enables agentic tools if
"roles" are also provided as shown.
I found that the configuration file was automatically reread every time it was modified. I liked that very much. No need to request a reload, restart Visual Studio Code or reboot the computer. Grok truly loves to go into gory detail about how to reset the universe to a defined state.
Defining contextProviders above caused the Codebase and Docs context providers to show up in the Continue dropdown that appears when you type @. If you type @codebase, your entire project is included in the context.
The complete list of contexts is:
@codebase @docs @files @current @git @terminal @problems
The above config.json is sufficient.
However, Continue is in the midst of transitioning from JSON to YAML.
While a JSON file is currently required, the equivalent YAML file may soon be required instead.
Only provide one or the other.
BTW, here is a YAML equivalent of the above JSON file:
---
codebase:
  embedOnStartup: true
contextProviders:
  - name: "codebase"
    params:
      nRetrieve: 15
      nFinal: 5
      useChunks: true
  - name: "docs"
    params:
      urls:
        - "https://ollama.com"
        - "https://mslinn.com"
        - "https://go.dev"
        - "https://developer.mozilla.org/en-US/docs/Web"
embeddingsProvider:
  model: "nomic-embed-text"
  provider: "ollama"
models:
  - apiBase: "http://localhost:11434"
    default: true
    systemMessage: "You are a helpful coding agent. Use <think> for reasoning."
    model: "sllm/glm-z1-9b"
    name: "GLM-Z1-9B"
    provider: "ollama"
    supportsTools: true
    title: "GLM-Z1-9B (Local Ollama)"
tabAutocompleteModel:
  apiBase: "http://localhost:11434"
  model: "sllm/glm-z1-9b"
  provider: "ollama"
  title: "GLM-Z1-9B Autocomplete"
Continue tries to load every .yaml file in
.continue/agents/ as an agent definition.
These files require a name: field.
The example file from the installation process, .continue/agents/new-config.yaml, lacks this field, causing a fatal error.
Delete the example file, or rename it with a different file extension, perhaps .yaml-.
The following Bash incantation uses brace expansion to do the rename:
$ mv ~/.continue/agents/new-config.yaml{,-}
Here is the minimal structure for a valid local config.yaml:
name: Whatever
version: 1.0.0
schema: v1
Also delete ./.continue/mcpServers/new-mcp-server.yaml if it exists:
$ rm ./.continue/mcpServers/new-mcp-server.yaml
Use Nomic To Improve Agentic Context Awareness
For better agentic context awareness, stir in a dash of nomic-embed-text:
PS C:\Users\Mike Slinn> ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90: 100% ▕█████████████████████▏ 274 MB
pulling c71d239df917: 100% ▕█████████████████████▏  11 KB
pulling ce4a164fc046: 100% ▕█████████████████████▏   17 B
pulling 31df23ea7daa: 100% ▕█████████████████████▏  420 B
verifying sha256 digest
writing manifest
success
nomic-embed-text is another model that can easily be added to a Continue.dev configuration, resulting in a dual-model setup.
nomic-embed-text provides significant value in single, dual, and triad LLM setups, primarily by enabling efficient Retrieval-Augmented Generation (RAG) for codebase and docs context. It acts as a semantic search layer, injecting relevant snippets into prompts without overwhelming token limits, boosting accuracy and reducing hallucinations by ~20–30% on agentic tasks (e.g., multi-file refactors). Without it, models rely on keyword search, which is noisier and slower.
Nomic-embed-text is a lightweight (137M params, ~250 MB)
open-source embedding model optimized for long contexts (up to 8K tokens) and
multilingual RAG. It converts code and documents into dense vectors for fast
similarity search, outperforming Ada-002 on BEIR/MIRACL benchmarks (e.g.,
65–70% recall at 8192 tokens).
The embed role in Continue.dev specifies that a model can
generate embeddings, which are vector representations of text used for
semantic search and retrieval-augmented generation (RAG) in features like
@codebase or @docs. Embeddings are created during
indexing of your codebase or documentation and enable the system to retrieve
relevant snippets based on similarity to the user's query, improving context
accuracy without loading entire files into prompts. This role is assigned to
embedding models (typically small, specialized ones) rather than full LLMs,
and it is configured in the config.json under
embeddingsProvider or per-model roles. For example,
adding "roles": ["embed"] to a model allows it to be used for
this purpose.
Without an embed-capable model, Continue falls back to keyword search, which is less accurate and slower for code-heavy queries.
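Before Continue starts indexing, you can sanity-check that the embedding model answers by hitting Ollama's embeddings endpoint directly; a minimal sketch (the prompt text is arbitrary):
$ curl http://localhost:11434/api/embeddings -d '{
    "model": "nomic-embed-text",
    "prompt": "def hello(): pass"
  }'
The response should contain an embedding array of floating-point numbers; if this call fails, @codebase indexing will fail too.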
If focusing on code-heavy tasks, use voyage-code-3 for a 5–10%
recall boost, but nomic is the sweet spot for your hardware.
Enable Agentic Features
- Open the Visual Studio Code command palette with CTRL+SHIFT+P, then type Continue: Open Settings.
- Open the right side bar by pressing CTRL+SHIFT+B. Now open the Continue pane by pressing CTRL+SHIFT+L. If you press that key chord while in an editor, the current line will be deleted; you do not want that to happen. You can now drag the Continue chat pane to the right side bar; that is where I put all my agents.
- Click the Settings icon at the bottom of the Continue pane, then you should see User Settings.
- Agent mode is the default, set by the small pull-down menu below where you type commands for LLMs. This mode allows multi-step tasks like "Create a new file and implement X".
- In the configuration file shown above, set the docs context provider's urls to the list of URLs that it should treat as references.
- In chat prompts, prepend <think> before requests to encourage step-by-step reasoning.
More settings for you to play with are shown to the right.
Now Use It!
Here are all the ways you can use Continue with GLM-Z1-9B:
- Open a code file in VS Code (e.g., a Python script).
- To use GLM-Z1-9B with context:
  - Highlight code.
  - Press CTRL+I.
  - Type explain or refactor into the chat.
- Type <think>Debug this function and suggest fixes.</think>
- In the chat area, type: As an agent, plan and implement a REST API endpoint for user login in this Node.js project. The LLM will reason, edit files, and ask for approval.
- Autocomplete starts suggesting things as you type. Press TAB to accept the displayed suggestion.
Expected Performance on Bear
- Speed: 50–70 t/s for chat/generation (Q6_K quant); inline completions ~80–100 t/s.
- Context: Up to 128K via Ollama slider, but stick to 8K–32K for snappiness (YaRN-enabled for GLM-Z1).
- Agentic Limits: GLM-Z1-9B excels at reasoning but may need prompting for tools (e.g., "Use file edit tool to..."). For more advanced agents (e.g., browser/tool-calling), upgrade to GLM-4.6:cloud if limits allow.
Troubleshooting
- Connection errors or no response: Ensure the Ollama app is running (run ollama serve in PowerShell if needed), then restart VS Code. A quick connectivity check is sketched after this list.
- Slow performance or VRAM Issues: Lower context in Ollama slider to 4K–8K.
- Agent mode not offered: Update the Continue Extension.
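The connectivity check mentioned above is a single request to the endpoint Continue uses:
$ curl http://localhost:11434/api/tags
This returns the locally available models as JSON; if it fails, the problem is Ollama or the network path, not the Continue configuration.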
This gives you a fully local, agentic Copilot alternative—private, fast, and tailored to GLM-Z1's strengths.
Continue CLI
The Continue CLI command stub is called cn.
$ npm i -g @continuedev/cli

added 86 packages in 11s
I could not get cn to respond.
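If you hit the same wall, it is worth first confirming that the package actually installed and that the cn stub is on your PATH; these generic checks verify the installation only, not that cn itself works:
$ npm ls -g @continuedev/cli
$ which cn                                # macOS/Linux
PS C:\Users\Mike Slinn> where.exe cn      # Windows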
The following articles were written sequentially.