[infer] expose finish_reason from inference engine#2249
Conversation
    "stop": FinishReason.STOP,
    "length": FinishReason.LENGTH,
}
return mapping.get(raw_reason.lower(), FinishReason.UNKNOWN)
Bug: The _normalize_sglang_finish_reason function incorrectly assumes finish_reason from SGLang is a string, but it's a dictionary, causing an AttributeError on .lower().
Severity: CRITICAL
Suggested Fix
Modify _normalize_sglang_finish_reason to handle a dictionary input. Check if raw_reason is a dictionary and extract the reason string from the type key (e.g., raw_reason.get("type")). Then, perform the lowercasing and mapping on this extracted string. Update the function's type hint to reflect it can accept a dictionary.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/oumi/inference/sglang_inference_engine.py#L293
Potential issue: The `_normalize_sglang_finish_reason` function expects its `raw_reason`
argument to be a string or `None`. However, the SGLang API returns a dictionary for the
`finish_reason` field, such as `{"type": "stop"}`. The function attempts to call
`.lower()` on this dictionary, which will raise an `AttributeError` at runtime. This
will cause any SGLang inference that includes a `finish_reason` in its response to fail.
    *conversation.messages,
    Message(role=Role.ASSISTANT, content=response),
]
metadata = dict(conversation.metadata)
Bug: When exclude_prompt_from_response is False, the finish_reason is calculated using total sequence length (prompt + generation) instead of just generated tokens, leading to incorrect results.
Severity: MEDIUM
Suggested Fix
When exclude_prompt_from_response is False, calculate the length of the generated tokens by subtracting the prompt length from the total sequence length before comparing it to max_new_tokens. This ensures the finish_reason is determined based on the number of newly generated tokens only.
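The adjustment can be sketched as a small helper (the function name and parameters here are hypothetical, chosen to illustrate the length arithmetic rather than mirror the engine's internals):

```python
def resolve_finish_reason(
    total_sequence_len: int,
    prompt_len: int,
    max_new_tokens: int,
    exclude_prompt_from_response: bool,
) -> str:
    """Decide between "length" and "stop" based on generated tokens only."""
    if exclude_prompt_from_response:
        # The sequence already contains only generated tokens.
        generated_len = total_sequence_len
    else:
        # The prompt tokens are still prepended; subtract them so the
        # comparison against max_new_tokens covers only new tokens.
        generated_len = total_sequence_len - prompt_len
    return "length" if generated_len >= max_new_tokens else "stop"
```

Without the subtraction, a 100-token prompt plus 20 generated tokens (120 total) would wrongly compare 120 against `max_new_tokens=50` and report `LENGTH` even though the model stopped naturally.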
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/oumi/inference/native_text_inference_engine.py#L368
Potential issue: In the `native_text_inference_engine`, when
`generation_params.exclude_prompt_from_response` is set to `False`, the code calculates
the length of the generated sequence including the original prompt's tokens. This total
length is then compared against `generation_params.max_new_tokens` to determine if the
finish reason was `LENGTH` or `STOP`. This logic is flawed because `max_new_tokens`
applies only to the newly generated part. As a result, the `finish_reason` can be
incorrectly reported as `LENGTH` even if the model stopped generating text naturally.
Description
[infer] expose finish_reason from inference engine
Related issues
Fixes # (issue)
Before submitting
Reviewers
At least one review from a member of oumi-ai/oumi-staff is required.