
[infer] expose finish_reason from inference engine#2249

Open
oelachqar wants to merge 2 commits into main from oelachqar/add_finish_reason

Conversation

@oelachqar
Contributor

Description

[infer] expose finish_reason from inference engine

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the Pull Request guidelines in the contributor guide?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@gitar-bot

gitar-bot bot commented Mar 7, 2026


"stop": FinishReason.STOP,
"length": FinishReason.LENGTH,
}
return mapping.get(raw_reason.lower(), FinishReason.UNKNOWN)

Bug: The _normalize_sglang_finish_reason function incorrectly assumes finish_reason from SGLang is a string, but it's a dictionary, causing an AttributeError on .lower().
Severity: CRITICAL

Suggested Fix

Modify _normalize_sglang_finish_reason to handle a dictionary input. Check if raw_reason is a dictionary and extract the reason string from the type key (e.g., raw_reason.get("type")). Then, perform the lowercasing and mapping on this extracted string. Update the function's type hint to reflect it can accept a dictionary.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/oumi/inference/sglang_inference_engine.py#L293

Potential issue: The `_normalize_sglang_finish_reason` function expects its `raw_reason`
argument to be a string or `None`. However, the SGLang API returns a dictionary for the
`finish_reason` field, such as `{"type": "stop"}`. The function attempts to call
`.lower()` on this dictionary, which will raise an `AttributeError` at runtime. This
will cause any SGLang inference that includes a `finish_reason` in its response to fail.

Did we get this right? 👍 / 👎 to inform future reviews.

*conversation.messages,
Message(role=Role.ASSISTANT, content=response),
]
metadata = dict(conversation.metadata)

Bug: When exclude_prompt_from_response is False, the finish_reason is calculated using total sequence length (prompt + generation) instead of just generated tokens, leading to incorrect results.
Severity: MEDIUM

Suggested Fix

When exclude_prompt_from_response is False, calculate the length of the generated tokens by subtracting the prompt length from the total sequence length before comparing it to max_new_tokens. This ensures the finish_reason is determined based on the number of newly generated tokens only.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/oumi/inference/native_text_inference_engine.py#L368

Potential issue: In the `native_text_inference_engine`, when
`generation_params.exclude_prompt_from_response` is set to `False`, the code calculates
the length of the generated sequence including the original prompt's tokens. This total
length is then compared against `generation_params.max_new_tokens` to determine if the
finish reason was `LENGTH` or `STOP`. This logic is flawed because `max_new_tokens`
applies only to the newly generated part. As a result, the `finish_reason` can be
incorrectly reported as `LENGTH` even if the model stopped generating text naturally.

