System Info / 系統信息
transformers==4.39.0

Who can help? / 谁可以帮助到您?
No response

Reproduction / 复现过程
Similar official handling code: GLM-4/basic_demo/trans_batch_demo.py at main · THUDM/GLM-4: https://github.com/THUDM/GLM-4/blob/main/basic_demo/trans_batch_demo.py

Taking the code below as an example, the batch-inference results do not match the single-query inference results, even though greedy decoding has already been configured:
gen_kwargs = {"max_new_tokens": max_new_tokens, "do_sample": True, "top_k": 1}
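A side note on the gen_kwargs above: do_sample=True with top_k=1 restricts sampling to the single most likely token, so it coincides with greedy decoding (do_sample=False). A toy pure-Python sketch of the two selection rules (illustration only, not the transformers implementation):

```python
# Toy illustration: with top_k=1, sampling can only ever pick the argmax
# token, so it behaves exactly like greedy decoding.
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def greedy_pick(logits):
    # do_sample=False: always take the highest-logit token
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_sample(logits, k, rng):
    # do_sample=True with top_k=k: keep the k largest logits, sample among them
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in order])
    r, acc = rng.random(), 0.0
    for idx, p in zip(order, probs):
        acc += p
        if r <= acc:
            return idx
    return order[-1]

rng = random.Random(0)
logits = [0.1, 2.5, -1.0, 0.7]
# With k=1 the candidate set is just the argmax, so every draw equals greedy.
assert all(top_k_sample(logits, 1, rng) == greedy_pick(logits) for _ in range(100))
```

So in principle either setting should be deterministic; the inconsistency reported here is between batched and non-batched runs, not between these two decoding configurations.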
Batch inference output:
"Your question just now was about why your parents could not take you along when they got married. You may be thinking about or imagining a scenario: what it would be like if your parents had been able to bring you to their wedding. Normally, when parents get married the child has not yet been born, so naturally they cannot bring an unborn child to the wedding. This is a question grounded in reality."
new_batch_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True)
Single-query (non-batch) inference output:
"Your question just now was about why your parents could not take you along when they got married. This may be because you are curious about the scene of your parents' wedding, or you are wondering why you, as their child, were not present when they married. Normally, a child is born only after the parents marry, so at their wedding they naturally could not bring along a you who had not yet been born."
queries = ["小明的女朋友是谁?"]
def batch_infer(self, queries, gen_kwargs):
    if not isinstance(queries, list):
        queries = [queries]
    inputs_list = []
    generate_text = []
    for query in queries:
        # Render the chat template to text, then tokenize each query separately
        input_dict = self.tokenizer.apply_chat_template(
            [{"role": "user", "content": query}],
            add_generation_prompt=True,
            tokenize=False,
        )
        encoded_inputs = self.tokenizer(
            input_dict,
            padding=True,
            truncation=True,
            max_length=1024,
            return_tensors="pt",
        )
        inputs_list.append(encoded_inputs)
    # Concatenate the per-query tensors into one batch
    batch_inputs = {}
    for key in inputs_list[0].keys():
        batch_inputs[key] = torch.cat([inputs[key] for inputs in inputs_list], dim=0)
    batch_inputs = {k: v.to(self.device) for k, v in batch_inputs.items()}
    with torch.no_grad():
        outputs = self.model.generate(**batch_inputs, **gen_kwargs)
    # Strip the prompt tokens, keep only the newly generated ones
    outputs = outputs[:, batch_inputs['input_ids'].shape[1]:]
    for output in outputs:
        summary_res = self.tokenizer.decode(output, skip_special_tokens=True)
        generate_text.append(summary_res)
    return generate_text
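One detail worth checking in the reproduction code above: each query is tokenized in its own tokenizer call, where padding=True only pads to that single sequence's length, and decoder-only models such as GLM-4 generally require left padding so that every sequence in the batch ends at its last real token. A plain-Python sketch of left padding and the matching attention mask (tokenizer and model calls omitted; this illustrates the padding layout, not the transformers API):

```python
# Sketch: for decoder-only models, prompts in a batch are usually left-padded
# so each row's final position holds its last real token -- generation
# continues from that position.
PAD = 0  # hypothetical pad token id for this illustration

def left_pad(batch, pad_id=PAD):
    """Left-pad variable-length id sequences and build the attention mask."""
    width = max(len(seq) for seq in batch)
    ids = [[pad_id] * (width - len(s)) + s for s in batch]
    mask = [[0] * (width - len(s)) + [1] * len(s) for s in batch]
    return ids, mask

ids, mask = left_pad([[11, 12, 13], [21]])
# Every row now ends with a real token, so position -1 is valid for all rows.
assert ids == [[11, 12, 13], [0, 0, 21]]
assert mask == [[1, 1, 1], [0, 0, 1]]
```

Tokenizing all queries together in one call (with the tokenizer's padding side set to the left) avoids both the misaligned concatenation and any right-padding mismatch.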
Expected behavior / 期待表现
Looping over the queries one at a time should produce the same results as padding them into a batch and running batched inference.

References / 参考资料:
LLM padding 细节 - 知乎: https://zhuanlan.zhihu.com/p/675273498?utm_psn=1751559938508574720
zRzRzRzRzRzRzR commented: With "do_sample": True, "top_k": 1, shouldn't that be set to False? Try "do_sample": False, "top_k": 1 instead.

Reply: I have tried "do_sample": False, "top_k": 1 as well; the results still do not match. Switching to a more complex prompt makes the inconsistency easier to verify: for example, "小明的女朋友是谁?" ("Who is Xiao Ming's girlfriend?") gives inconsistent results, while a simple prompt such as "锄禾日当午下一句" (asking for the line that follows "锄禾日当午") gives consistent results.
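A plausible explanation (my assumption, not confirmed in this thread) for why complex prompts diverge while simple ones do not: batched kernels reduce floating-point sums in a different order than single-item kernels, which perturbs logits in their last bits; when the top-2 logits are nearly tied, as tends to happen on harder prompts, that tiny perturbation can flip the greedy argmax. A minimal pure-Python illustration:

```python
# Floating-point addition is not associative, so changing the reduction
# order (as a batched kernel may do) changes the result in the last bits.
left_to_right = (0.1 + 0.2) + 0.3   # one summation order
regrouped = 0.1 + (0.2 + 0.3)       # another summation order
assert left_to_right != regrouped   # they differ by one last bit

def argmax(xs):
    # Index of the first maximal element
    return max(range(len(xs)), key=lambda i: xs[i])

# Hypothetical near-tie: token 1's logit comes out of one reduction order in
# the single-query pass and another in the batched pass, against a rival
# token 0 whose logit is exactly 0.6:
single_pass_logits = [0.6, left_to_right]  # token 1 wins by one last bit
batched_logits = [0.6, regrouped]          # exact tie, so token 0 wins
assert argmax(single_pass_logits) == 1
assert argmax(batched_logits) == 0
```

Once one token differs, the divergence compounds over the rest of the generated sequence, which would match the observation that short, "easy" prompts stay consistent while longer, more ambiguous ones drift apart.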