- research-article, April 2023
DeepMerge: Learning to Merge Programs
IEEE Transactions on Software Engineering (ISOF), Volume 49, Issue 4, Pages 1599–1614
https://doi.org/10.1109/TSE.2022.3183955
In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts ...
- research-article, November 2022
Program merge conflict resolution via neural transformers
- Alexey Svyatkovskiy,
- Sarah Fakhoury,
- Negar Ghorbani,
- Todd Mytkowicz,
- Elizabeth Dinella,
- Christian Bird,
- Jinu Jang,
- Neel Sundaresan,
- Shuvendu K. Lahiri
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 822–833
https://doi.org/10.1145/3540250.3549163
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge ...
- research-article, July 2022
Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper)
ISSTA 2022: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages 77–88
https://doi.org/10.1145/3533767.3534396
Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests as a textual merge conflict where the merge ...
- research-article, July 2022
TOGA: a neural method for test oracle generation
ICSE '22: Proceedings of the 44th International Conference on Software Engineering, Pages 2130–2141
https://doi.org/10.1145/3510003.3510141
Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as bug finding, preventing regressions, and documentation. In terms of documentation, unit tests express a unit'...
Breaking the computation and communication abstraction barrier in distributed machine learning workloads
- Abhinav Jangda,
- Jun Huang,
- Guodong Liu,
- Amir Hossein Nodehi Sabet,
- Saeed Maleki,
- Youshan Miao,
- Madanlal Musuvathi,
- Todd Mytkowicz,
- Olli Saarikivi
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 402–416
https://doi.org/10.1145/3503222.3507778
Recent trends towards large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best ...
Synthesizing optimal collective algorithms
PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 62–75
https://doi.org/10.1145/3437801.3441620
Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep-learning, collective communication is the Amdahl's bottleneck of data-parallel training.
This paper introduces SCCL (for Synthesized ...
- research-article, October 2019
Niijima: sound and automated computation consolidation for efficient multilingual data-parallel pipelines
- Guoqing Harry Xu,
- Margus Veanes,
- Michael Barnett,
- Madan Musuvathi,
- Todd Mytkowicz,
- Ben Zorn,
- Huan He,
- Haibo Lin
SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles, Pages 306–321
https://doi.org/10.1145/3341301.3359649
Multilingual data-parallel pipelines, such as Microsoft's Scope and Apache Spark, are widely used in real-world analytical tasks. While the involvement of multiple languages (often including both managed and native languages) provides much convenience ...
- research-article, June 2019
CHET: an optimizing compiler for fully-homomorphic neural-network inferencing
- Roshan Dathathri,
- Olli Saarikivi,
- Hao Chen,
- Kim Laine,
- Kristin Lauter,
- Saeed Maleki,
- Madanlal Musuvathi,
- Todd Mytkowicz
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, Pages 142–156
https://doi.org/10.1145/3314221.3314628
Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations on encrypted data without requiring a secret key. Recent cryptographic advances have pushed FHE into the realm of practical applications. However, ...
- research-article, May 2018
Cross-language optimizations in big data systems: a case study of SCOPE
ICSE-SEIP '18: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, Pages 45–54
https://doi.org/10.1145/3183519.3183528
Building scalable big data programs currently requires programmers to combine relational (SQL) with non-relational code (Java, C#, Scala). Relational code is declarative - a program describes what the computation is and the compiler decides how to ...
Static stages for heterogeneous programming
Proceedings of the ACM on Programming Languages (PACMPL), Volume 1, Issue OOPSLA, Article No.: 71, Pages 1–27
https://doi.org/10.1145/3133895
Heterogeneous hardware is central to modern advances in performance and efficiency. Mainstream programming models for heterogeneous architectures, however, sacrifice safety and expressiveness in favor of low-level control over performance details. The ...
- research-article, August 2017
Static analysis for optimizing big data queries
- Diego Garbervetsky,
- Zvonimir Pavlinovic,
- Michael Barnett,
- Madanlal Musuvathi,
- Todd Mytkowicz,
- Edgardo Zoppi
ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Pages 932–937
https://doi.org/10.1145/3106237.3117774
Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving ...
- research-article, June 2017
Debugging probabilistic programs
MAPL 2017: Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, Pages 18–26
https://doi.org/10.1145/3088525.3088564
Many applications compute with estimated and uncertain data. While advances in probabilistic programming help developers build such applications, debugging them remains extremely challenging. New types of errors in probabilistic programs include 1) ...
- research-article, June 2017
Fusing effectful comprehensions
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, Pages 17–32
https://doi.org/10.1145/3062341.3062362
List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way ...
Also Published in:
ACM SIGPLAN Notices: Volume 52 Issue 6
- research-article, September 2016
Efficient parallelization using rank convergence in dynamic programming algorithms
This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not ...
- research-article, March 2016
Parallelizing WFST speech decoders
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pages 5325–5329
https://doi.org/10.1109/ICASSP.2016.7472694
The performance-intensive part of a large-vocabulary continuous speech-recognition system is the Viterbi computation that determines the sequence of words that are most likely to generate the acoustic-state scores extracted from an input utterance. This ...
- research-article, February 2016
Low-Rank Methods for Parallelizing Dynamic Programming Algorithms
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 4, Article No.: 26, Pages 1–32
https://doi.org/10.1145/2884065
This article proposes efficient parallel methods for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend ...
- research-article, October 2015
Parallelizing user-defined aggregations using symbolic execution
SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles, Pages 153–167
https://doi.org/10.1145/2815400.2815418
User-defined aggregations (UDAs) are integral to large-scale data-processing systems, such as MapReduce and Hadoop, because they let programmers express application-specific aggregation logic. System-supported associative aggregations, such as counting ...
- Article, July 2015
Yinyang K-means: a drop-in replacement of the classic K-means with consistent speedup
ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37, Pages 579–587
This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary ...
- research-article, June 2015
TOP: a framework for enabling algorithmic optimizations for distance-related problems
Proceedings of the VLDB Endowment (PVLDB), Volume 8, Issue 10, Pages 1046–1057
https://doi.org/10.14778/2794367.2794374
Computing distances among data points is an essential part of many important algorithms in data analytics, graph analysis, and other domains. In each of these domains, developers have spent significant manual effort optimizing algorithms, often through ...
- research-article, May 2015
Uncertain<T>: Abstractions for Uncertain Hardware and Software
Building correct, efficient systems that reason about the approximations produced by sensors, machine learning, big data, humans, and approximate hardware and software requires new standards and abstractions. The Uncertain<T> software abstraction ...