[infer] expose finish_reason from inference engine#2249
Conversation
    "stop": FinishReason.STOP,
    "length": FinishReason.LENGTH,
}
return mapping.get(raw_reason.lower(), FinishReason.UNKNOWN)
Bug: The _normalize_sglang_finish_reason function incorrectly assumes finish_reason from SGLang is a string, but it's a dictionary, causing an AttributeError on .lower().
Severity: CRITICAL
Suggested Fix
Modify _normalize_sglang_finish_reason to handle a dictionary input. Check if raw_reason is a dictionary and extract the reason string from the type key (e.g., raw_reason.get("type")). Then, perform the lowercasing and mapping on this extracted string. Update the function's type hint to reflect it can accept a dictionary.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/oumi/inference/sglang_inference_engine.py#L293
Potential issue: The `_normalize_sglang_finish_reason` function expects its `raw_reason`
argument to be a string or `None`. However, the SGLang API returns a dictionary for the
`finish_reason` field, such as `{"type": "stop"}`. The function attempts to call
`.lower()` on this dictionary, which will raise an `AttributeError` at runtime. This
will cause any SGLang inference that includes a `finish_reason` in its response to fail.
    *conversation.messages,
    Message(role=Role.ASSISTANT, content=response),
]
metadata = dict(conversation.metadata)
Bug: When exclude_prompt_from_response is False, the finish_reason is calculated using total sequence length (prompt + generation) instead of just generated tokens, leading to incorrect results.
Severity: MEDIUM
Suggested Fix
When exclude_prompt_from_response is False, calculate the length of the generated tokens by subtracting the prompt length from the total sequence length before comparing it to max_new_tokens. This ensures the finish_reason is determined based on the number of newly generated tokens only.
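The adjustment can be sketched as a small helper (the function name and parameters here are hypothetical, chosen to illustrate the length arithmetic rather than mirror the engine's internals):

```python
def resolve_finish_reason(
    total_sequence_len: int,
    prompt_len: int,
    max_new_tokens: int,
    exclude_prompt_from_response: bool,
) -> str:
    """Decide between "length" and "stop" based on generated tokens only."""
    if exclude_prompt_from_response:
        # The sequence already contains only generated tokens.
        generated_len = total_sequence_len
    else:
        # The prompt tokens are still prepended; subtract them so the
        # comparison against max_new_tokens covers only new tokens.
        generated_len = total_sequence_len - prompt_len
    return "length" if generated_len >= max_new_tokens else "stop"
```

Without the subtraction, a 100-token prompt plus 20 generated tokens (120 total) would wrongly compare 120 against `max_new_tokens=50` and report `LENGTH` even though the model stopped naturally.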
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/oumi/inference/native_text_inference_engine.py#L368
Potential issue: In the `native_text_inference_engine`, when
`generation_params.exclude_prompt_from_response` is set to `False`, the code calculates
the length of the generated sequence including the original prompt's tokens. This total
length is then compared against `generation_params.max_new_tokens` to determine if the
finish reason was `LENGTH` or `STOP`. This logic is flawed because `max_new_tokens`
applies only to the newly generated part. As a result, the `finish_reason` can be
incorrectly reported as `LENGTH` even if the model stopped generating text naturally.
Description
[infer] expose finish_reason from inference engine
Related issues
Fixes # (issue)
Before submitting
Reviewers
At least one review from a member of oumi-ai/oumi-staff is required.