Saeed Maleki
2020 – today
- 2024
- [j4] Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang:
Efficient Schedule Construction for Distributed Execution of Large DNN Models. IEEE Trans. Parallel Distributed Syst. 35(12): 2375-2391 (2024)
- [c21] Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi:
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels. CGO 2024: 93-105
- [c20] Guodong Liu, Youshan Miao, Zhiqi Lin, Xiaoxiang Shi, Saeed Maleki, Fan Yang, Yungang Bao, Sa Wang:
Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation. EuroSys 2024: 163-181
- [c19] Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang:
Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search. HPCA 2024: 803-816
- [c18] Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini:
Splitwise: Efficient Generative LLM Inference Using Phase Splitting. ISCA 2024: 118-132
- [c17] Zhiqi Lin, Youshan Miao, Quanlu Zhang, Fan Yang, Yi Zhu, Cheng Li, Saeed Maleki, Xu Cao, Ning Shang, Yilei Yang, Weijiang Xu, Mao Yang, Lintao Zhang, Lidong Zhou:
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training. OSDI 2024: 347-363
- [i18] Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy:
ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics. CoRR abs/2402.06787 (2024)
- 2023
- [c16] Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong:
MSCCLang: Microsoft Collective Communication Language. ASPLOS (2) 2023: 502-514
- [c15] Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi:
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches. NSDI 2023: 593-612
- [i17] Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou:
SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction. CoRR abs/2301.08984 (2023)
- [i16] Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi:
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels. CoRR abs/2305.13450 (2023)
- [i15] Behnaz Arzani, Siva Kesava Reddy Kakarla, Miguel Castro, Srikanth Kandula, Saeed Maleki, Luke Marshall:
Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem. CoRR abs/2305.13479 (2023)
- [i14] Saeed Maleki:
Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM. CoRR abs/2310.06178 (2023)
- [i13] Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang:
Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search. CoRR abs/2311.15269 (2023)
- [i12] Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Aashaka Shah, Saeed Maleki, Ricardo Bianchini:
Splitwise: Efficient generative LLM inference using phase splitting. CoRR abs/2311.18677 (2023)
- 2022
- [c14] Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi:
Breaking the computation and communication abstraction barrier in distributed machine learning workloads. ASPLOS 2022: 402-416
- [i11] Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong:
MSCCL: Microsoft Collective Communication Library. CoRR abs/2201.11840 (2022)
- [i10] Saeed Maleki, Adhiti Raman, Yang Cheng, John L. Crassidis, Matthias Schmid:
Optimal Pose Estimation and Covariance Analysis with Simultaneous Localization and Mapping Applications. CoRR abs/2210.11697 (2022)
- [i9] Saeed Maleki, John L. Crassidis, Yang Cheng, Matthias Schmid:
Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares. CoRR abs/2210.12157 (2022)
- 2021
- [c13] Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi:
Distributed Training of Embeddings using Graph Analytics. IPDPS 2021: 973-983
- [c12] Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum:
Scaling Distributed Training with Adaptive Summation. MLSys 2021
- [c11] Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi:
Synthesizing optimal collective algorithms. PPoPP 2021: 62-75
- [i8] Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi:
CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning. CoRR abs/2105.05720 (2021)
- [i7] Saeed Maleki, John L. Crassidis, Yang Cheng, Matthias Schmid:
Total Least Squares for Optimal Pose Estimation. CoRR abs/2106.11522 (2021)
- [i6] Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh:
Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL. CoRR abs/2111.04867 (2021)
- 2020
- [i5] Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum:
Scaling Distributed Training with Adaptive Summation. CoRR abs/2006.02924 (2020)
- [i4] Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi:
Synthesizing Optimal Collective Algorithms. CoRR abs/2008.08708 (2020)
2010 – 2019
- 2019
- [c10] Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin E. Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. PLDI 2019: 142-156
- [i3] Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi:
Distributed Word2Vec using Graph Analytics Frameworks. CoRR abs/1909.03359 (2019)
- 2018
- [j3] Zhangxiaowen Gong, Zhi Chen, Justin Josef Szaday, David C. Wong, Zehra Sura, Neftali Watkinson, Saeed Maleki, David A. Padua, Alexander V. Veidenbaum, Alexandru Nicolau, Josep Torrellas:
An empirical study of the effect of source-level loop transformations on compiler stability. Proc. ACM Program. Lang. 2(OOPSLA): 126:1-126:29 (2018)
- [c9] Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Semantics-Preserving Parallelization of Stochastic Gradient Descent. IPDPS 2018: 224-233
- [i2] Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin E. Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs. CoRR abs/1810.00845 (2018)
- 2017
- [c8] Zhi Chen, Zhangxiaowen Gong, Justin Josef Szaday, David C. Wong, David A. Padua, Alexandru Nicolau, Alexander V. Veidenbaum, Neftali Watkinson, Zehra Sura, Saeed Maleki, Josep Torrellas, Gerald DeJong:
LORE: A loop repository for the evaluation of compilers. IISWC 2017: 219-228
- [i1] Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Parallel Stochastic Gradient Descent with Sound Combiners. CoRR abs/1705.08030 (2017)
- 2016
- [j2] Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Efficient parallelization using rank convergence in dynamic programming algorithms. Commun. ACM 59(10): 85-92 (2016)
- [j1] Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Low-Rank Methods for Parallelizing Dynamic Programming Algorithms. ACM Trans. Parallel Comput. 2(4): 26:1-26:32 (2016)
- [c7] Charith Mendis, Jasha Droppo, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Geoffrey Zweig:
Parallelizing WFST speech decoders. ICASSP 2016: 5325-5329
- [c6] Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Jesús Garzarán, David A. Padua, Keshav Pingali:
DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem. ICS 2016: 32:1-32:14
- [c5] Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Jesús Garzarán, David A. Padua, Keshav Pingali:
DSMR: a shared and distributed memory algorithm for single-source shortest path problem. PPoPP 2016: 39:1-39:2
- 2015
- [b1] Saeed Maleki:
Communication avoiding parallel algorithms for amorphous problems. University of Illinois Urbana-Champaign, USA, 2015
- 2014
- [c4] Saeed Maleki, G. Carl Evans, David A. Padua:
Tiled Linear Algebra a System for Parallel Graph Algorithms. LCPC 2014: 116-130
- [c3] Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz:
Parallelizing dynamic programming through rank convergence. PPoPP 2014: 219-232
- 2012
- [c2] Albert Sidelnik, Saeed Maleki, Bradford L. Chamberlain, María Jesús Garzarán, David A. Padua:
Performance Portability with the Chapel Language. IPDPS 2012: 582-594
- 2011
- [c1] Saeed Maleki, Yaoqing Gao, María Jesús Garzarán, Tommy Wong, David A. Padua:
An Evaluation of Vectorizing Compilers. PACT 2011: 372-382
last updated on 2024-11-07 21:35 CET by the dblp team
all metadata released as open data under CC0 1.0 license