
ChatGLM4 padding details during batch inference #424

Closed
1 of 2 tasks
geekchen007 opened this issue Jul 31, 2024 · 2 comments

Comments

@geekchen007

geekchen007 commented Jul 31, 2024

System Info / 系統信息

transformers==4.39.0

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Similar handling code from the official repo:
GLM-4/basic_demo/trans_batch_demo.py at main · THUDM/GLM-4: https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_batch_demo.py

Taking the code below as an example, the batch-inference results do not match the single-query inference results.

A greedy decoding strategy has already been configured:
gen_kwargs = {"max_new_tokens": max_new_tokens, "do_sample": True, "top_k": 1,}

Batch inference:

Your previous question was about why your parents could not bring you along when they got married. This question may come from thinking about or imagining a scenario, namely what it would be like if your parents had been able to bring you to their wedding. Normally, the child has not yet been born when the parents marry, so naturally an unborn child cannot be brought to the wedding. This is a question based on real-world circumstances.

Non-batch inference:

new_batch_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True)
Your previous question was about why your parents could not bring you along when they got married. This question may be because you are curious about the scene of your parents' wedding, or because you are wondering why you, as their child, were not present when they married. Normally, children are born after their parents marry, so naturally they could not bring the yet-unborn you along at the wedding.
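For reference, a minimal sketch of the single-query path behind this comparison, assuming query, tokenizer, model, device, and gen_kwargs are defined as in the batch code below:

messages = [{"role": "user", "content": query}]
new_batch_input = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(device)
with torch.no_grad():
    output = model.generate(**new_batch_input, **gen_kwargs)
    # keep only the newly generated tokens
    output = output[:, new_batch_input["input_ids"].shape[1]:]
print(tokenizer.decode(output[0], skip_special_tokens=True))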

queries = ["小明的女朋友是谁?"]

def batch_infer(self, queries, gen_kwargs):
    if not isinstance(queries, list):
        queries = [queries]
    inputs_list = []
    generate_text = []
    # Build the chat prompt string for each query, then tokenize each prompt separately
    for query in queries:
        input_dict = self.tokenizer.apply_chat_template(
            [{"role": "user", "content": query}],
            add_generation_prompt=True,
            tokenize=False,
        )
        encoded_inputs = self.tokenizer(
            input_dict,
            padding=True,
            truncation=True,
            max_length=1024,
            return_tensors="pt"
        )
        inputs_list.append(encoded_inputs)

    # Concatenate the per-query tensors along the batch dimension
    batch_inputs = {}
    for key in inputs_list[0].keys():
        batch_inputs[key] = torch.cat([inputs[key] for inputs in inputs_list], dim=0)

    batch_inputs = {k: v.to(self.device) for k, v in batch_inputs.items()}

    with torch.no_grad():
        outputs = self.model.generate(**batch_inputs, **gen_kwargs)
        # Keep only the newly generated tokens, dropping the prompt part
        outputs = outputs[:, batch_inputs['input_ids'].shape[1]:]

    for output in outputs:
        summary_res = self.tokenizer.decode(output, skip_special_tokens=True)
        generate_text.append(summary_res)
    return generate_text

Expected behavior / 期待表现

Results from looping over single-query inference should match the results from padding the queries into a batch and running them together.

Reference:
LLM padding details - Zhihu: https://zhuanlan.zhihu.com/p/675273498?utm_psn=1751559938508574720
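As an illustration of the padding detail, a minimal sketch that tokenizes all prompts in a single call with left padding, so the attention_mask marks the pad tokens for the decoder-only model. It assumes the same tokenizer, model, device, gen_kwargs, and queries as above, and that the tokenizer defines a pad token; whether this exactly matches the official GLM-4 demo is an assumption:

prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}],
        add_generation_prompt=True,
        tokenize=False,
    )
    for q in queries
]
tokenizer.padding_side = "left"  # left padding so generation starts right after the real prompt tokens
batch_inputs = tokenizer(
    prompts,
    padding=True,        # pad shorter prompts up to the longest one in the batch
    truncation=True,
    max_length=1024,
    return_tensors="pt",
).to(device)
with torch.no_grad():
    outputs = model.generate(**batch_inputs, **gen_kwargs)
    outputs = outputs[:, batch_inputs["input_ids"].shape[1]:]
results = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]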

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this Aug 1, 2024
@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Aug 11, 2024

"do_sample": True, "top_k": 1,}
Then shouldn't you set it to False? Try changing it to "do_sample": False, "top_k": 1,}

@geekchen007
Author

"do_sample": True, "top_k": 1,} 那你不得设置成 False么,尝试将要 "do_sample": False, "top_k": 1,}
尝试过这个参数,也无法一致。
prompt内容可以切换的更复杂些,更容易验证一致性。比如"小明的女朋友是谁?"会不一致;简单的提示词"锄禾日当午下一句"会一致。
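A rough sketch of that consistency check, where runner stands for the object exposing the batch_infer method above and single_infer is a hypothetical single-query counterpart:

prompts = ["小明的女朋友是谁?", "锄禾日当午下一句"]
batch_results = runner.batch_infer(prompts, gen_kwargs)
single_results = [runner.single_infer(p, gen_kwargs) for p in prompts]
for prompt, b, s in zip(prompts, batch_results, single_results):
    print(prompt, "consistent" if b == s else "inconsistent")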
