Cited By
- You H, Fu Y, Wang Z, Yazdanbakhsh A, Lin Y (2024). When linear attention meets autoregressive decoding. In: Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (eds.), Proceedings of the 41st International Conference on Machine Learning, pp. 57350-57366. doi:10.5555/3692070.3694435. Online publication date: 21-Jul-2024
- Lee W, Lee J, Seo J, Sim J (2024). InfiniGen. In: Gavrilovska A, Terry D (eds.), Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pp. 155-172. doi:10.5555/3691938.3691947. Online publication date: 10-Jul-2024
- Song Y, Meng Y, Chen B, Chen S, Kang Y (2024). SALTM: Accelerating Large Transformers in Multi-Device System With 2-D Model Partitioning Method. Integrated Circuits and Systems 1:3, pp. 144-156. doi:10.23919/ICS.2024.3458897. Online publication date: Jul-2024