
fix(ai): use enable_thinking for Z.ai instead of thinking param#1674

Merged
badlogic merged 1 commit into badlogic:main from okuyam2y:fix/zai-enable-thinking-param on Feb 27, 2026

Conversation

@okuyam2y
Contributor

Problem

The Z.ai thinking control in openai-completions.ts sends thinking: { type: "disabled" }, but Z.ai actually uses enable_thinking: false — the same format as Qwen.

Because Z.ai doesn't recognize the thinking parameter, it ignores the disable request and always runs with thinking enabled. This wastes reasoning tokens and adds significant latency (measured 1.8x slower on chat responses).
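For illustration, here are the two request shapes side by side. This is a sketch assuming Z.ai's OpenAI-compatible chat completions endpoint; the model name and message are placeholders, not values from this PR:

```typescript
// What the old code sent: Z.ai does not recognize the `thinking`
// parameter, so the disable request is silently ignored and
// thinking stays enabled by default.
const oldBody = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "hello" }],
  thinking: { type: "disabled" },
};

// What Z.ai (like Qwen) actually understands: a top-level boolean.
const newBody = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "hello" }],
  enable_thinking: false,
};

console.log("thinking" in oldBody);        // true
console.log("enable_thinking" in oldBody); // false
console.log(newBody.enable_thinking);      // false
```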

Fix

Merge the Z.ai and Qwen code paths since they use the same enable_thinking: boolean format.

Before:

if (compat.thinkingFormat === "zai" && model.reasoning) {
    (params as any).thinking = { type: options?.reasoningEffort ? "enabled" : "disabled" };
} else if (compat.thinkingFormat === "qwen" && model.reasoning) {
    (params as any).enable_thinking = !!options?.reasoningEffort;
}

After:

if ((compat.thinkingFormat === "zai" || compat.thinkingFormat === "qwen") && model.reasoning) {
    (params as any).enable_thinking = !!options?.reasoningEffort;
}
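The merged branch can be exercised in isolation. `buildThinkingParams` below is a hypothetical standalone helper mirroring the merged logic; it is not code from openai-completions.ts, which mutates `params` in place:

```typescript
// Hypothetical mirror of the merged Z.ai/Qwen branch above.
type ThinkingFormat = "zai" | "qwen" | undefined;

interface BuildArgs {
  thinkingFormat?: ThinkingFormat;
  reasoning?: boolean;      // does the model support reasoning at all?
  reasoningEffort?: string; // unset means thinking should be disabled
}

function buildThinkingParams(args: BuildArgs): Record<string, unknown> {
  const params: Record<string, unknown> = {};
  // Z.ai and Qwen share the same enable_thinking: boolean format.
  if ((args.thinkingFormat === "zai" || args.thinkingFormat === "qwen") && args.reasoning) {
    params.enable_thinking = !!args.reasoningEffort;
  }
  return params;
}

console.log(JSON.stringify(buildThinkingParams({ thinkingFormat: "zai", reasoning: true })));
// {"enable_thinking":false}
console.log(JSON.stringify(buildThinkingParams({ thinkingFormat: "qwen", reasoning: true, reasoningEffort: "high" })));
// {"enable_thinking":true}
```

Note that models without reasoning support get no parameter at all, matching the `model.reasoning` guard in the original branch.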

Testing

Verified with Z.ai GLM-4.7 and GLM-5 models:

  • Before fix: all responses contain reasoning_content (thinking=True)
  • After fix: no reasoning_content when enable_thinking: false (thinking=False)
  • GLM-4.7 chat response: 3.2s → 1.8s (1.8x faster)
  • GLM-5 chat response: 12.9s → 7.0s (1.8x faster)

🤖 Generated with Claude Code

Z.ai uses the same `enable_thinking: boolean` parameter as Qwen to
control reasoning, not `thinking: { type: "enabled" | "disabled" }`.

The wrong parameter name means Z.ai ignores the disable request and
always runs with thinking enabled, wasting tokens and adding latency.

Merge the Z.ai and Qwen branches since they use the same format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Hi @okuyam2y, thanks for your interest in contributing!

We ask new contributors to open an issue first before submitting a PR. This helps us discuss the approach and avoid wasted effort.

Next steps:

  1. Open an issue describing what you want to change and why (keep it concise, write in your human voice, AI slop will be closed)
  2. Once a maintainer approves with lgtm, you'll be added to the approved contributors list
  3. Then you can submit your PR

This PR will be closed automatically. See https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.md for more details.

@github-actions github-actions bot closed this Feb 27, 2026
@badlogic badlogic reopened this Feb 27, 2026
@badlogic badlogic merged commit 22b3be8 into badlogic:main Feb 27, 2026
1 check passed
@badlogic
Owner

cheers

@okuyam2y
Contributor Author

Thanks for merging! 🙏

