
ChatGLM4 padding details during batch inference #424

Closed
1 of 2 tasks
geekchen007 opened this issue Jul 31, 2024 · 2 comments

Comments

@geekchen007

geekchen007 commented Jul 31, 2024

System Info / 系統信息

transformers==4.39.0

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Similar handling code from the official repo:
GLM-4/basic_demo/trans_batch_demo.py at main · THUDM/GLM-4: https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_batch_demo.py

Taking the code below as an example, the batch-inference results do not match the single-query inference results.

A greedy decoding strategy has already been configured:
gen_kwargs = {"max_new_tokens": max_new_tokens, "do_sample": True, "top_k": 1,}

Batch inference:

Your previous question was about why your parents could not bring you along when they got married. This question may come from thinking about or imagining a scenario, namely what it would be like if your parents had been able to bring you to their wedding. Normally, the child has not yet been born when the parents marry, so naturally an unborn child cannot be brought to the wedding. This is a question based on real-world circumstances.

Non-batch inference:

new_batch_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True)
Your previous question was about why your parents could not bring you along when they got married. This question may be because you are curious about the scene of your parents' wedding, or because you are wondering why you, as their child, were not present when they married. Normally, children are born after their parents marry, so naturally they could not bring the yet-unborn you along at the wedding.
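For reference, a minimal sketch of the single-query path behind this comparison, assuming query, tokenizer, model, device, and gen_kwargs are defined as in the batch code below:

messages = [{"role": "user", "content": query}]
new_batch_input = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)
with torch.no_grad():
    output = model.generate(**new_batch_input, **gen_kwargs)
    # keep only the newly generated tokens
    output = output[:, new_batch_input["input_ids"].shape[1]:]
print(tokenizer.decode(output[0], skip_special_tokens=True))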

queries = ["小明的女朋友是谁?"]

def batch_infer(self, queries, gen_kwargs):
    if not isinstance(queries, list):
        queries = [queries]
    inputs_list = []
    generate_text = []
    # Build the chat prompt string for each query, then tokenize each prompt separately
    for query in queries:
        input_dict = self.tokenizer.apply_chat_template(
            [{"role": "user", "content": query}],
            add_generation_prompt=True,
            tokenize=False,
        )
        encoded_inputs = self.tokenizer(
            input_dict,
            padding=True,
            truncation=True,
            max_length=1024,
            return_tensors="pt"
        )
        inputs_list.append(encoded_inputs)

    # Concatenate the per-query tensors along the batch dimension
    batch_inputs = {}
    for key in inputs_list[0].keys():
        batch_inputs[key] = torch.cat([inputs[key] for inputs in inputs_list], dim=0)

    batch_inputs = {k: v.to(self.device) for k, v in batch_inputs.items()}

    with torch.no_grad():
        outputs = self.model.generate(**batch_inputs, **gen_kwargs)
        # Keep only the newly generated tokens, dropping the prompt part
        outputs = outputs[:, batch_inputs['input_ids'].shape[1]:]

    for output in outputs:
        summary_res = self.tokenizer.decode(output, skip_special_tokens=True)
        generate_text.append(summary_res)
    return generate_text

Expected behavior / 期待表现

Results from looping over single-query inference should match the results from padding the queries into a batch and running them together.

Reference:
LLM padding details - Zhihu: https://zhuanlan.zhihu.com/p/675273498?utm_psn=1751559938508574720
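As an illustration of the padding detail, a minimal sketch that tokenizes all prompts in a single call with left padding, so the attention_mask marks the pad tokens for the decoder-only model. It assumes the same tokenizer, model, device, gen_kwargs, and queries as above, and that the tokenizer defines a pad token; whether this exactly matches the official GLM-4 demo is an assumption:

prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}],
        add_generation_prompt=True,
        tokenize=False,
    )
    for q in queries
]
tokenizer.padding_side = "left"  # left padding so generation starts right after the real prompt tokens
batch_inputs = tokenizer(
    prompts,
    padding=True,        # pad shorter prompts up to the longest one in the batch
    truncation=True,
    max_length=1024,
    return_tensors="pt",
).to(device)
with torch.no_grad():
    outputs = model.generate(**batch_inputs, **gen_kwargs)
    outputs = outputs[:, batch_inputs["input_ids"].shape[1]:]
results = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]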

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this Aug 1, 2024
@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Aug 11, 2024

"do_sample": True, "top_k": 1,}
Then shouldn't you set it to False? Try changing it to "do_sample": False, "top_k": 1,}

@geekchen007
Author

"do_sample": True, "top_k": 1,} 那你不得设置成 False么,尝试将要 "do_sample": False, "top_k": 1,}
尝试过这个参数,也无法一致。
prompt内容可以切换的更复杂些,更容易验证一致性。比如"小明的女朋友是谁?"会不一致;简单的提示词"锄禾日当午下一句"会一致。
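A rough sketch of that consistency check, where runner stands for the object exposing the batch_infer method above and single_infer is a hypothetical single-query counterpart:

prompts = ["小明的女朋友是谁?", "锄禾日当午下一句"]
batch_results = runner.batch_infer(prompts, gen_kwargs)
single_results = [runner.single_infer(p, gen_kwargs) for p in prompts]
for prompt, b, s in zip(prompts, batch_results, single_results):
    print(prompt, "consistent" if b == s else "inconsistent")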
