- Research article, June 2023
MTIA: First Generation Silicon Targeting Meta's Recommendation Systems
- Amin Firoozshahian,
- Joel Coburn,
- Roman Levenstein,
- Rakesh Nattoji,
- Ashwin Kamath,
- Olivia Wu,
- Gurdeepak Grewal,
- Harish Aepala,
- Bhasker Jakka,
- Bob Dreyer,
- Adam Hutchin,
- Utku Diril,
- Krishnakumar Nair,
- Ehsan K. Aredestani,
- Martin Schatz,
- Yuchen Hao,
- Rakesh Komuravelli,
- Kunming Ho,
- Sameer Abu Asal,
- Joe Shajrawi,
- Kevin Quinn,
- Nagesh Sreedhara,
- Pankaj Kansal,
- Willie Wei,
- Dheepak Jayaraman,
- Linda Cheng,
- Pritam Chopda,
- Eric Wang,
- Ajay Bikumandla,
- Arun Karthik Sengottuvel,
- Krishna Thottempudi,
- Ashwin Narasimha,
- Brian Dodds,
- Cao Gao,
- Jiyuan Zhang,
- Mohammed Al-Sanabani,
- Ana Zehtabioskuie,
- Jordan Fix,
- Hangchen Yu,
- Richard Li,
- Kaustubh Gondkar,
- Jack Montgomery,
- Mike Tsai,
- Saritha Dwarakapuram,
- Sanjay Desai,
- Nili Avidan,
- Poorvaja Ramani,
- Karthik Narayanan,
- Ajit Mathews,
- Sethu Gopal,
- Maxim Naumov,
- Vijay Rao,
- Krishna Noru,
- Harikrishna Reddy,
- Prahlad Venkatapuram,
- Alexis Bjorlin
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 80, Pages 1–13. https://doi.org/10.1145/3579371.3589348
Meta has traditionally relied on using CPU-based servers for running inference workloads, specifically Deep Learning Recommendation Models (DLRM), but the increasing compute and memory requirements of these models have pushed the company towards using ...
- Research article, June 2022
Software-hardware co-design for fast and scalable training of deep learning recommendation models
- Dheevatsa Mudigere,
- Yuchen Hao,
- Jianyu Huang,
- Zhihao Jia,
- Andrew Tulloch,
- Srinivas Sridharan,
- Xing Liu,
- Mustafa Ozdal,
- Jade Nie,
- Jongsoo Park,
- Liang Luo,
- Jie (Amy) Yang,
- Leon Gao,
- Dmytro Ivchenko,
- Aarti Basant,
- Yuxi Hu,
- Jiyan Yang,
- Ehsan K. Ardestani,
- Xiaodong Wang,
- Rakesh Komuravelli,
- Ching-Hsiang Chu,
- Serhat Yilmaz,
- Huayu Li,
- Jiyuan Qian,
- Zhuobo Feng,
- Yinbin Ma,
- Junjie Yang,
- Ellie Wen,
- Hong Li,
- Lin Yang,
- Chonglin Sun,
- Whitney Zhao,
- Dimitry Melts,
- Krishna Dhulipala,
- KR Kishore,
- Tyler Graf,
- Assaf Eisenman,
- Kiran Kumar Matam,
- Adi Gangidi,
- Guoqiang Jerry Chen,
- Manoj Krishnan,
- Avinash Nayak,
- Krishnakumar Nair,
- Bharath Muthiah,
- Mahmoud khorashadi,
- Pallab Bhattacharya,
- Petr Lapukhov,
- Maxim Naumov,
- Ajit Mathews,
- Lin Qiao,
- Mikhail Smelyanskiy,
- Bill Jia,
- Vijay Rao
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 993–1011. https://doi.org/10.1145/3470496.3533727
Deep learning recommendation models (DLRMs) have been used across many business-critical services at Meta and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper, we present Neo, a software-hardware ...
- Research article, June 2022
Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product
- Mark Zhao,
- Niket Agarwal,
- Aarti Basant,
- Buğra Gedik,
- Satadru Pan,
- Mustafa Ozdal,
- Rakesh Komuravelli,
- Jerry Pan,
- Tianshu Bao,
- Haowei Lu,
- Sundaram Narayanan,
- Jack Langman,
- Kevin Wilfong,
- Harsha Rastogi,
- Carole-Jean Wu,
- Christos Kozyrakis,
- Parik Pol
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 1042–1057. https://doi.org/10.1145/3470496.3533044
Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators (DSA) are used to train increasingly complex deep learning models. These clusters rely on a data storage and ingestion (DSI) pipeline, responsible for storing ...
- Research article, February 2018
HPVM: heterogeneous parallel virtual machine
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 68–80. https://doi.org/10.1145/3178487.3178493
We propose a parallel program representation for heterogeneous systems, designed to enable performance portability across a wide range of popular parallel hardware, including GPUs, vector instruction sets, multicore CPUs and potentially FPGAs. Our ...
Also Published in:
ACM SIGPLAN Notices: Volume 53, Issue 1
- Poster, September 2016
POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 443–445. https://doi.org/10.1145/2967938.2976039
Programming heterogeneous parallel systems can be extremely complex because a single system may include multiple different parallelism models, instruction sets, and memory hierarchies, and different systems use different combinations of these features. ...
- Research article, June 2015
Stash: have your scratchpad and cache it too
- Rakesh Komuravelli,
- Matthew D. Sinclair,
- Johnathan Alsop,
- Muhammad Huzaifa,
- Maria Kotsifakou,
- Prakalp Srivastava,
- Sarita V. Adve,
- Vikram S. Adve
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, Pages 707–719. https://doi.org/10.1145/2749469.2750374
Heterogeneous systems employ specialization for energy efficiency. Since data movement is expected to be a dominant consumer of energy, these systems employ specialized memories (e.g., scratchpads and FIFOs) for better efficiency for targeted data. ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 43, Issue 3S
- Research article, December 2014
Revisiting the Complexity of Hardware Cache Coherence and Some Implications
ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 4, Article No.: 37, Pages 1–22. https://doi.org/10.1145/2663345
Cache coherence is an integral part of shared-memory systems but is also widely considered to be one of the most complex parts of such systems. Much prior work has addressed this complexity and the verification techniques to prove the correctness of ...
- Research article, March 2013
DeNovoND: efficient hardware support for disciplined non-determinism
ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 13–26. https://doi.org/10.1145/2451116.2451119
Recent work has shown that disciplined shared-memory programming models that provide deterministic-by-default semantics can simplify both parallel software and hardware. Specifically, the DeNovo hardware system has shown that the software guarantees of ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 41, Issue 1
ACM SIGPLAN Notices: Volume 48, Issue 4
- Article, October 2011
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
- Byn Choi,
- Rakesh Komuravelli,
- Hyojin Sung,
- Robert Smolinski,
- Nima Honarmand,
- Sarita V. Adve,
- Vikram S. Adve,
- Nicholas P. Carter,
- Ching-Tsun Chou
PACT '11: Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, Pages 155–166. https://doi.org/10.1109/PACT.2011.21
For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors"; e.g., unstructured parallelism, arbitrary data races, and ...
- Research article, June 2010
Parallel SAH k-D tree construction
The k-D tree is a well-studied acceleration data structure for ray tracing. It is used to organize primitives in a scene to allow efficient execution of intersection operations between rays and the primitives. The highest quality k-D tree can be ...
- Research article, October 2009
A type and effect system for deterministic parallel Java
- Robert L. Bocchino,
- Vikram S. Adve,
- Danny Dig,
- Sarita V. Adve,
- Stephen Heumann,
- Rakesh Komuravelli,
- Jeffrey Overbey,
- Patrick Simmons,
- Hyojin Sung,
- Mohsen Vakilian
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, Pages 97–116. https://doi.org/10.1145/1640089.1640097
Today's shared-memory parallel programming models are complex and error-prone. While many parallel programs are intended to be deterministic, unanticipated thread interleavings can lead to subtle bugs and nondeterministic semantics. In this paper, we ...
Also Published in:
ACM SIGPLAN Notices: Volume 44, Issue 10
- Article, December 2007
A Prototype for Tiger Hash Primitive Hardware Architecture
ADCOM '07: Proceedings of the 15th International Conference on Advanced Computing and Communications, Pages 327–332. https://doi.org/10.1109/ADCOM.2007.25
With the increasing prominence of the Internet as a tool of commerce, security has become a tremendously important issue. One essential aspect for secure communication over networks is that of cryptography. The increasing prominence of mobile devices ...