“This property ensures that the model can only attend to previous positions in the sequence, not future positions, in order to generate predictions sequentially.”
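The causal masking described in the quote above can be sketched minimally with NumPy (the function name and uniform scores are illustrative assumptions, not from the original): future positions are set to negative infinity before the softmax, so each position attends only to itself and earlier positions.

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask out future positions (strict upper triangle) before the
    softmax, so position i can only attend to positions j <= i."""
    seq_len = scores.shape[-1]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # numerically stable softmax along the last axis
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((3, 3))  # uniform raw scores, purely for illustration
w = causal_attention_weights(scores)
# row i has nonzero weight only on columns 0..i:
# row 0 -> [1, 0, 0], row 1 -> [0.5, 0.5, 0], row 2 -> [1/3, 1/3, 1/3]
```

Because exp(-inf) is 0, masked positions receive exactly zero weight, which is equivalent to the additive -inf mask used in common attention implementations.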
“Regarding batched inference: I believe that all sequences in the batch are unrolled together, so there is no need to pad anything during inference; at the end you just have to slice the shorter outputs. I really couldn't find any reason for having the mask, and yet these "reference" implementations that