Research article | Open access | DOI: 10.1145/3620678.3624666

tf.data service: A Case for Disaggregating ML Input Data Processing

Published: 31 October 2023

Abstract

Machine learning (ML) computations commonly execute on expensive specialized hardware, such as GPUs and TPUs, which provide high FLOPs and performance-per-watt. For cost efficiency, it is essential to keep these accelerators highly utilized. This requires preprocessing input data at the rate at which the accelerators can ingest and perform ML computations on it. The host CPU and RAM required per accelerator core to process input data fast enough to avoid data stalls vary across jobs. Hence, the traditional approach of processing input data on ML accelerator hosts with a fixed hardware ratio leads to under-utilizing either the accelerators or the host CPU and RAM. In this paper, we address these concerns by building a disaggregated ML data processing system.
We present tf.data service, an open-source disaggregated input data processing service built on top of tf.data in TensorFlow. We show that disaggregating data preprocessing has three key advantages for large-scale ML training jobs. First, the service can horizontally scale out to right-size the host CPU/RAM resources for data processing in each job, reducing training time by 32× and cost by 26×, on average. Second, the service can share ephemeral preprocessed data results across jobs, to optimize CPU usage and reduce redundant computations. Finally, the service supports coordinated reads, a technique that avoids stragglers caused by differing input sizes in distributed training, reducing training time by 2.2×, on average. Our design is informed by lessons learned from deploying tf.data service in production, including relaxing data visitation guarantees without impacting model accuracy.
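
To make this concrete, here is a minimal sketch of offloading an input pipeline to the service using the public tf.data.experimental.service Python API. For illustration it runs the dispatcher and a single worker in-process; in a real deployment these run on CPU hosts separate from the accelerator hosts, and it is the worker pool that scales out horizontally. The toy pipeline and the job name "shared_preprocessing" are hypothetical stand-ins for a real preprocessing job.

```python
import tensorflow as tf

# In-process dispatcher and worker for illustration only; production
# deployments run these as separate, independently scaled CPU servers.
dispatcher = tf.data.experimental.service.DispatchServer()
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher.target.split("://")[1]))

# A toy input pipeline; the map stands in for real preprocessing work.
dataset = tf.data.Dataset.range(100)
dataset = dataset.map(lambda x: x * 2)

# Offload the pipeline: elements are now produced by service workers
# rather than on the training host. Clients that pass the same job_name
# share one preprocessing job instead of recomputing its output.
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs",
        service=dispatcher.target,
        job_name="shared_preprocessing"))

for element in dataset.take(5):
    print(element.numpy())
```

Coordinated reads are exposed through the same distribute call: passing num_consumers together with a per-client consumer_index asks the service to hand out data to the clients in lockstep, which is what prevents the stragglers described above.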




Published In

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing
October 2023
624 pages
ISBN:9798400703874
DOI:10.1145/3620678
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2023


Author Tags

  1. Data Processing
  2. Distributed Systems
  3. Machine Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '23: ACM Symposium on Cloud Computing
October 30 - November 1, 2023
Santa Cruz, CA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Article Metrics

  • Downloads (Last 12 months)579
  • Downloads (Last 6 weeks)49
Reflects downloads up to 11 Jan 2025

Cited By
  • (2024) Pecan. In Proceedings of the 2024 USENIX Annual Technical Conference, 649-665. https://doi.org/10.5555/3691992.3692032. Online publication date: 10-Jul-2024.
  • (2024) A Selective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training. In Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 63-70. https://doi.org/10.1145/3655038.3665947. Online publication date: 8-Jul-2024.
  • (2024) PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 340-353. https://doi.org/10.1109/ISCA59077.2024.00033. Online publication date: 29-Jun-2024.
  • (2023) FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems. IEEE Computer Architecture Letters 23(1), 7-10. https://doi.org/10.1109/LCA.2023.3336841. Online publication date: 28-Nov-2023.
