High-Performance Big Data Computing

About this ebook

An in-depth overview of an emerging field that brings together high-performance computing, big data processing, and deep learning.
 


Over the last decade, the exponential explosion of data known as big data has changed the way we understand and harness the power of data. The emerging field of high-performance big data computing, which brings together high-performance computing (HPC), big data processing, and deep learning, aims to meet the challenges posed by large-scale data processing. This book offers an in-depth overview of high-performance big data computing and the associated technical issues, approaches, and solutions. 
 
The book covers basic concepts and necessary background knowledge, including data processing frameworks, storage systems, and hardware capabilities; offers a detailed discussion of technical issues in accelerating big data computing in terms of computation, communication, memory and storage, codesign, workload characterization and benchmarking, and system deployment and management; and surveys benchmarks and workloads for evaluating big data middleware systems. It presents a detailed discussion of big data computing systems and applications with high-performance networking, computing, and storage technologies, including state-of-the-art designs for data processing and storage systems. Finally, the book considers some advanced research topics in high-performance big data computing, including designing high-performance deep learning over big data (DLoBD) stacks and HPC cloud technologies.
 
Language: English
Publisher: The MIT Press
Release date: Aug 2, 2022
ISBN: 9780262369428

    Cover: High-Performance Big Data Computing by Dhabaleswar K. Panda, Xiaoyi Lu, and Dipti Shankar

    High-Performance Big Data Computing

    Dhabaleswar K. Panda, Xiaoyi Lu, and Dipti Shankar

    The MIT Press

    Cambridge, Massachusetts

    London, England

    © 2022 Massachusetts Institute of Technology

    All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

    The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

    Library of Congress Cataloging-in-Publication Data

    Names: Panda, Dhabaleswar K., author. | Lu, Xiaoyi (Professor of computer science), author. | Shankar, Dipti, author.

    Title: High-performance big data computing / Dhabaleswar K. Panda, Xiaoyi Lu, and Dipti Shankar.

    Description: Cambridge, Massachusetts: The MIT Press, [2022] | Series: Scientific and engineering computation | Includes bibliographical references and index.

    Identifiers: LCCN 2021038754 | ISBN 9780262046855 (hardcover)

    Subjects: LCSH: High performance computing. | Big data.

    Classification: LCC QA76.88.P36 2022 | DDC 005.7—dc23/eng/20211020

    LC record available at https://lccn.loc.gov/2021038754


    Contents

    Acknowledgments

    1 Introduction

    1.1 Overview

    1.2 Big Data Characteristics and Trends

    1.3 Current Systems for Data Management and Processing

    1.4 Technological Trends

    1.5 Convergence in HPC, Big Data, and Deep Learning

    1.6 Outline of the Book

    1.7 Summary

    2 Parallel Programming Models and Systems

    2.1 Overview

    2.2 Batch Processing Frameworks

    2.3 Stream Processing Frameworks

    2.4 Query Processing Frameworks

    2.5 Graph Processing Frameworks

    2.6 Machine Learning and Deep Learning Frameworks

    2.7 Interactive Big Data Tools

    2.8 Monitoring and Diagnostics Tools

    2.9 Summary

    3 Parallel and Distributed Storage Systems

    3.1 Overview

    3.2 File Storage

    3.3 Object Storage

    3.4 Block Storage

    3.5 Memory-Centric Storage

    3.6 Monitoring and Diagnostics Tools

    3.7 Summary

    4 HPC Architectures and Trends

    4.1 Overview

    4.2 Computing Capabilities

    4.3 Storage

    4.4 Network Interconnects

    4.5 Summary

    5 Opportunities and Challenges in Accelerating Big Data Computing

    5.1 Overview

    5.2 C1: Computational Challenges

    5.3 C2: Communication and Data Movement Challenges

    5.4 C3: Memory and Storage Management Challenges

    5.5 C4: Challenges of Codesigning Big Data Systems and Applications

    5.6 C5: Challenges of Big Data Workload Characterization and Benchmarking

    5.7 C6: Deployment and Management Challenges

    5.8 Summary

    6 Benchmarking Big Data Systems

    6.1 Overview

    6.2 Offline Analytical Data Processing

    6.3 Streaming Data Processing

    6.4 Online Data Processing

    6.5 Graph Data Processing

    6.6 Machine Learning and Deep Learning Workloads

    6.7 Comprehensive Benchmark Suites

    6.8 Summary

    7 Accelerations with RDMA

    7.1 Overview

    7.2 Batch and Stream Processing Systems

    7.3 Graph Processing Systems

    7.4 RPC Libraries

    7.5 Query Processing in Databases

    7.6 In-Memory KV Stores

    7.7 HiBD Project

    7.8 Case Studies and Performance Benefits

    7.9 Summary

    8 Accelerations with Multicore/Accelerator Technologies

    8.1 Introduction

    8.2 Multicore CPUs

    8.3 GPU Acceleration for Big Data Computing

    8.4 FPGAs and ASICs

    8.5 Case Studies and Performance Benefits

    8.6 Summary

    9 Accelerations with High-Performance Storage Technologies

    9.1 Overview

    9.2 Exploring NVM-Centric Designs

    9.3 Hybrid and Hierarchical Storage Middleware

    9.4 Burst Buffer Systems

    9.5 Case Studies and Performance Benefits

    9.6 Summary

    10 Deep Learning over Big Data

    10.1 Overview

    10.2 Convergence of Deep Learning, Big Data, and HPC

    10.3 Challenges of Designing DLoBD Stacks

    10.4 Distributed Deep Learning Training Basics

    10.5 Overview of DLoBD Stacks

    10.6 Characterization of DLoBD Stacks

    10.7 Case Studies and Performance Benefits

    10.8 Discussions on Optimizations for Deep Learning Workloads

    10.9 Summary

    11 Designs with Cloud Technologies

    11.1 Overview

    11.2 Overview of High-Performance Cloud Technologies

    11.3 State-of-the-Art Designs

    11.4 Case Studies and Performance Benefits

    11.5 Summary

    12 Frontier Research on High-Performance Big Data Computing

    12.1 Heterogeneity-Aware Big Data Processing and Management Systems

    12.2 Big Data Processing and Management for Hybrid Storage Systems

    12.3 Efficient and Coherent Communication and Computation in Network for Big Data Systems

    12.4 Summary

    References

    Index

    List of Figures

    Figure 1.1

    Four V characteristics of big data.

    Figure 1.2

    Data Never Sleeps 8.0. Source: Courtesy of Domo.

    Figure 1.3

    Convergence of HPC, big data, and deep learning.

    Figure 1.4

    Challenges in bringing HPC, big data processing, and deep learning into a convergent trajectory.

    Figure 1.5

    Can we efficiently run big data and deep learning jobs on existing HPC infrastructure?

    Figure 1.6

    Challenges of high-performance big data computing. HDD, hard disk drive; NICs, network interface cards; QoS, quality of service; SR-IOV, single root input/output virtualization.

    Figure 1.7

    Outline of the book.

    Figure 2.1

    Programming models for distributed data processing.

    Figure 2.2

    Overview of data processing with MapReduce.

    Figure 2.3

    Overview of data processing with Hadoop MapReduce.

    Figure 2.4

    Overview of Spark architecture.

    Figure 2.5

    Spark dependencies.

    Figure 2.6

    Overview of streaming processing with Apache Storm.

    Figure 2.7

    Overview of streaming processing with Apache Flink.

    Figure 2.8

    Overview of TensorFlow stack.

    Figure 2.9

    Overview of distributed TensorFlow environment.

    Figure 3.1

    Various types of parallel and distributed storage systems. PCIe, Peripheral Component Interconnect Express; RADOS, Reliable, Autonomic Distributed Object Store; SATA, Serial AT Attachment.

    Figure 3.2

    File system architecture. MR, MapReduce; OSS, Object Storage Server; OST, Object Storage Target.

    Figure 3.3

    OpenStack Swift: architecture overview.

    Figure 3.4

    Apache Cassandra: architecture overview.

    Figure 3.5

    Memcached (distributed caching over DRAM).

    Figure 3.6

    Redis (distributed in-memory data store). HA, High Availability.

    Figure 4.1

    Overview of a typical HPC system architecture.

    Figure 4.2

    Storage device hierarchy.

    Figure 4.3

    NVMe command processing.

    Figure 4.4

    Overview of high-performance network interconnects and protocols. OFI, OpenFabrics Interfaces.

    Figure 4.5

    One-way latency: MPI over RDMA networks with MVAPICH2. (a) Small message latency. (b) Large message latency.

    Figure 4.6

    Bandwidth: MPI over RDMA networks with MVAPICH2. (a) Unidirectional bandwidth. (b) Bidirectional bandwidth.

    Figure 4.7

    RDMA over NVM: contrasting NVMeoF and remote PMoF.

    Figure 5.1

    Envisioned architecture for next-generation HEC systems. Courtesy of Panda et al. (2018).

    Figure 5.2

    Challenges of achieving high-performance big data computing.

    Figure 6.1

    Big data benchmarks.

    Figure 7.1

    Design overview of RDMA-Memcached (Shankar, Lu, Islam, et al., 2016; Jose et al., 2011). KNL, kernel; LRU, least recently used.

    Figure 7.2

    RDMA-based Hadoop architecture and its different modes. PBS, Portable Batch System.

    Figure 7.3

    Performance improvement of RDMA-based designs for Apache Spark and Hadoop on SDSC Comet cluster. (a) PageRank with RDMA-Spark. (b) Sort with RDMA–Hadoop 2.x.

    Figure 7.4

    Performance benefits with RDMA-Memcached based workloads. (a) Memcached Set/Get over simulated MySQL. (b) Hadoop TestDFSIO throughput with Boldio.

    Figure 8.1

    Architecture overview of GPU-aware hash table in Memcached.

    Figure 8.2

    Stand-alone throughput with CPU and GPU-centric hash table (based on Mega-KV (K. Zhang, Wang, et al., 2015)). (a) Insert. (b) Search. MOPS, millions of operations per second; thrs, threads.

    Figure 8.3

    Stand-alone hash table probing performance on the twenty-eight–core Intel Skylake CPU, over a three-way cuckoo hash table versus non-SIMD CPU-optimized MemC3 hash table with 32-bit key/payload (Shankar et al., 2019a).

    Figure 9.1

    Performance benefits of heterogeneous storage-aware designs for Hadoop on SDSC Comet. (a) NVM-assisted MapReduce design. (b) Spark TeraSort over heterogeneity-aware HDFS. MR-IPoIB, Default MapReduce running with the IPoIB protocol; RMR, RDMA-based MapReduce; RMR-NVM, RDMA-based MapReduce running with NVM in a naive manner; NVMD, Non-Volatile Memory-assisted design for MapReduce and DAG execution frameworks (Rahman et al., 2017).

    Figure 9.2

    Performance benefits with RDMA-Memcached–based workloads.

    Figure 10.1

    Deep learning and big data analytics pipeline. Source: Courtesy of Flickr (Garrigues, 2015).

    Figure 10.2

    Overview of a unified DLoBD stack. IB, InfiniBand.

    Figure 10.3

    Convergence of deep learning, big data, and HPC.

    Figure 10.4

    Overview of CaffeOnSpark. DB, database.

    Figure 10.5

    Overview of TensorFlowOnSpark.

    Figure 10.6

    Overview of MMLSpark (CNTKOnSpark).

    Figure 10.7

    Overview of BigDL.

    Figure 10.8

    Comparison of DNNs. Source: Courtesy of Canziani et al. (2016). BN, batch normalization; ENet, efficient neural network; G-Ops, billions (10⁹) of operations; ResNet, residual neural network; M, million; NIN, Network in Network; GoogLeNet, a 22-layer deep convolutional neural network that is a variant of the Inception architecture developed by researchers at Google.

    Figure 10.9

    Performance and accuracy comparison of training AlexNet on ImageNet with CaffeOnSpark running over IPoIB and RDMA. Source: Courtesy of X. Lu et al. (2018).

    Figure 10.10

    Performance analysis of TensorFlowOnSpark and stand-alone TensorFlow (lower is better). The numbers were taken by training the SoftMax Regression model over the MNIST dataset on a four-node cluster, which includes one PS and three workers. Source: Courtesy of X. Lu et al. (2018).

    Figure 11.1

    Overview of virtualization techniques. (a) VM architecture. (b) Container architecture. libs, libraries; OS, operating system.

    Figure 11.2

    SR-IOV architecture. IOMMU, input-output memory management unit.

    Figure 11.3

    Topology-aware resource allocation in Hadoop-Virt.

    Figure 11.4

    NVMe hardware arbitration overview.

    Figure 11.5

    Performance benefits of Hadoop-Virt on HPC clouds. Execution times for (a) WordCount, (b) PageRank, (c) Sort, and (d) Self-Join (30 GB).

    Figure 11.6

    Evaluation with synthetic application scenarios. (a) Bandwidth over time with scenario 1. (b) Job bandwidth ratio for scenarios 2–5.

    Acknowledgments

    We are grateful to our students and collaborators, Adithya Bhat, Rajarshi Biswas, Shashank Gugnani, Yujie Hui, Nusrat Islam, Haseeb Javed, Arjun Kashyap, Kunal Kulkarni, Tianxi Li, Yuke Li, Hao Qi, Md. Wasi-ur-Rahman, Haiyang Shi, and Jie Zhang, for their joint scientific work over the past ten years. We sincerely thank Shashank Gugnani, Haseeb Javed, Arjun Kashyap, Yuke Li, Hao Qi, and Haiyang Shi for their contributions to this collection or for proofreading several versions of this manuscript. Special thanks to Marie Lee, Kate Elwell, and Elizabeth Swayze from The MIT Press for their significant help in publishing this book. In addition, we are indebted to the National Science Foundation (NSF) for multiple grants (e.g., IIS-1447804, OAC-1636846, CCF-1822987, OAC-2007991, OAC-2112606, and CCF-2132049). This book would not have been possible without this support.

    Finally, we dedicate this book to our loving families (P. S. Panda, S. M. Panda, Debashree Pati, Abha Panda, Zonghe Lu, Haiying Yu, Sherry Peng, Ada Lu, Alivia Lu, Alan Lu, Dr. R. Shivashankar, G. S. Usharani, and Manju G. Siddappa) for their love and understanding during the long process of writing this book over the past five years.

    Dhabaleswar K. (DK) Panda, Xiaoyi Lu, and Dipti Shankar

    March 19, 2022

    1

    Introduction

    Human society is in an era of data explosion, in which data are growing exponentially. This era has been called the big data era, characterized by the 5Vs: volume, velocity, variety, veracity, and value. To tackle the challenges associated with these five Vs, a new field, high-performance big data computing, is emerging that aims to bring high-performance computing (HPC), big data processing, and deep learning onto a convergent trajectory. This book provides an in-depth overview of this field and the associated technical challenges, approaches, and solutions. This chapter gives a high-level overview of the research topics and challenges in the field and outlines the rest of the book.

    1.1 Overview

    During the last decade, big data has changed the way people understand and harness the power of data in both business and research domains. Big data has become one of the most important elements of business analytics, and big data, HPC, and deep learning/machine learning (DL/ML) are converging to meet large-scale data processing challenges. Running high-performance data analytics workloads in HPC and cloud computing environments is gaining popularity. According to a recent Hyperion research report (Norton et al., 2020), high-performance data analytics workloads have seen robust growth in the last few years, both in budget allocations and in organizational focus, and this trend is poised to continue over the next decade. The field of big data is even expanding into huge data (Wang et al.).

    In this context, challenging issues are emerging along the following four major directions: (1) understanding big data characteristics and trends; (2) understanding the interplay among big data, HPC, and deep learning/machine learning; (3) understanding the trends of HPC technologies (processing, networking, and storage) to accelerate big data processing; and (4) understanding the benefits of accelerating big data processing.

    1.2 Big Data Characteristics and Trends

    Traditionally, big data problems and solutions have been characterized by the 3Vs (volume, velocity, and variety). In recent years, a fourth V (veracity) has been added. These characteristics are illustrated in figure 1.1. Volume reflects the amount of data at rest to be processed. Velocity refers to data in motion. Variety refers to the wide range of data types that must be processed. Veracity refers to data in doubt.

    Figure 1.1

    Four V characteristics of big data.

    Data are being generated in many different forms and shapes by many different businesses, organizations, and entities. Figure 1.2 illustrates the amount of data generated every minute of the day by various entities in 2020. For example, five hundred hours of video are uploaded to YouTube every minute, WhatsApp users post around 41.67 million messages every minute, and Amazon ships 6,659 packages per minute. This scale creates a major challenge in designing appropriate systems for big data analytics.

    Figure 1.2

    Data Never Sleeps 8.0. Source: Courtesy of Domo.

    Efficient processing of big data with the 4Vs poses significant challenges for current-generation technologies, especially as data volumes keep growing. Large volumes of data typically result in out-of-core data processing and movement, as well as significant input/output (I/O) bottlenecks. Data in motion, popularly known as big velocity, requires real-time data processing capability, which places high-performance expectations on the underlying computing resources, networks, and storage systems for computation, communication, and I/O. The third V (variety) has led the big data community to develop several data processing frameworks, for example, Hadoop, Spark, Flink, Storm, and Kafka.

    However, it is unlikely that a single standardized specification or implementation will be converged upon in the near future, and because each framework is designed differently, it is hard to move toward highly optimized frameworks; optimizations proposed in the literature and in the community have mostly been made case by case. Therefore, to address the challenges presented by the 4Vs, there is a critical need to design next-generation big data software stacks that process data in a high-performance, scalable manner and optimally leverage the underlying network, computation, and storage capabilities. Against this backdrop, a fifth V (value) has been added. The value of certain data, and the business intelligence to be derived from it, can differ from organization to organization. A certain type of data might have significant value for one organization, which will therefore be willing to invest significant effort (and cost) in processing it, while another organization might see no such criticality. This trend is leading to various value propositions in the big data community for processing different types of data.

    1.3 Current Systems for Data Management and Processing

    Broadly, current-generation data management and processing systems in modern data centers work in two major tiers: a front-end tier and a back-end tier. The front-end tier software components are typically deployed to serve data-access queries and online data processing. The corresponding data management and processing software components in this tier usually include (1) web servers, such as Apache HTTP Server, NGINX, and Tomcat; (2) databases, such as MySQL, PostgreSQL, Oracle Database, and Microsoft SQL Server; (3) distributed caching layers, such as Memcached and Redis; and (4) NoSQL (Not Only SQL) databases, such as HBase and MongoDB. From the performance perspective, online applications require these front-end tier systems to process data with low latency and high throughput so that the data center can deliver a positive user experience. This is why system administrators usually deploy a distributed caching layer on top of the traditional database system: many data queries can then be served directly from cached copies in memory, which is much faster than loading the data from the database.
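    To make the caching pattern concrete, the following minimal sketch shows a cache-aside lookup using the redis-py client. It is an illustration only, not a design from this book: the connection parameters and the query_database helper are hypothetical placeholders. The logic is what matters: check the in-memory cache first and fall back to the database on a miss.

        import json
        import redis  # redis-py client for a Redis caching layer

        cache = redis.Redis(host="localhost", port=6379)  # hypothetical deployment

        def query_database(user_id):
            # Placeholder for a (slow) SQL query against the back-end database.
            raise NotImplementedError

        def get_user_profile(user_id, ttl=300):
            key = f"user:{user_id}"
            cached = cache.get(key)            # fast path: served from DRAM
            if cached is not None:
                return json.loads(cached)
            profile = query_database(user_id)  # slow path: query the database
            cache.setex(key, ttl, json.dumps(profile))  # cache the result with a TTL
            return profile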

    With increasing amounts of data being processed and stored by the front-end tier, data are gradually moved to the back-end tier for further processing, such as data mining, data cleaning, machine learning, and data warehousing. The major goal of the back-end tier software components is to mine value, in an offline fashion, from huge amounts of data through data analytics and machine learning or deep learning jobs. The corresponding data management and processing software components in this tier usually include (1) distributed storage systems, such as the Hadoop Distributed File System (HDFS), Ceph, and Swift; (2) data analytics middleware, such as Hadoop, Spark, and Flink; (3) machine learning or deep learning frameworks, such as TensorFlow and PyTorch; and (4) various data analytics and machine learning tools and libraries, such as MLlib and Keras. From the performance perspective, high throughput and horizontal scalability are the most important properties pursued by these back-end tier systems.
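    As a small illustration of the back-end analytics middleware mentioned above, the sketch below counts words with PySpark's public RDD API; the HDFS input path is a hypothetical placeholder for any text file visible to the cluster.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("WordCount").getOrCreate()
        sc = spark.sparkContext

        # Hypothetical HDFS path; replace with any input the cluster can read.
        lines = sc.textFile("hdfs:///data/corpus.txt")

        counts = (lines.flatMap(lambda line: line.split())  # tokenize each line
                       .map(lambda word: (word, 1))         # emit (word, 1) pairs
                       .reduceByKey(lambda a, b: a + b))    # aggregate counts per word

        for word, count in counts.take(10):
            print(word, count)

        spark.stop()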

    In this book, we discuss the programming models and software architectures of example systems in both the front-end and back-end tiers. More details can be found in chapters 2 and 3.

    1.4 Technological Trends

    In the last few years, big data analytics and management software stacks have been significantly enhanced for performance and scalability. Among various factors, hardware evolution is one of the key drivers of this enhancement. The last few years have witnessed a rapid increase in the number of processor cores and an equally impressive increase in memory capacity and network bandwidth on modern cluster-based systems in both HPC centers and data centers. This growth has been fueled by current trends in multi-/many-core architectures; emerging heterogeneous memory technologies, such as DRAM, nonvolatile memory (NVM) or persistent memory (PMEM) (Qureshi et al., 2009; Kültürsay et al., 2013), high-bandwidth memory, and NVM Express solid-state drives (NVMe-SSDs); and high-speed interconnects, such as InfiniBand, Omni-Path, and RDMA (remote direct memory access) over Converged Enhanced Ethernet (RoCE).

    These multi-/many-core architectures, heterogeneous memory, and high-speed interconnects are currently gaining momentum for designing next-generation HPC and cloud computing environments. These novel hardware architectures with higher performance and advanced features open up many opportunities to redesign the big data analytics and management software stacks to achieve unprecedented performance and scalability.

    Thus, hardware-conscious, or architecture-aware, designs for big data analytics and management software stacks have been a fruitful research area. We have seen many exciting research results and promising performance improvements from architecture-aware optimizations, ranging from emerging memory technologies (such as NVM/PMEM) and high-speed interconnects (such as RDMA-enabled networks) that ease the I/O and communication bottlenecks to multicore/many-core parallel processing for big data analytics and management software stacks.

    In the HPC community, advanced technologies have been widely adopted to meet the challenge of processing and storing huge amounts of scientific data. Modern HPC systems and their associated middleware (such as the message passing interface, or MPI, burst buffers, and parallel file systems) have been exploiting advances in HPC technologies (multi-/many-core architectures, RDMA-enabled networking, NVRAMs, and SSDs) over the last few decades. However, current-generation out-of-the-box big data analytics and management software stacks (e.g., Hadoop, Spark, Flink, Memcached) have not fully embraced such technologies. For instance, recent studies (Rahman et al., 2014; Lu et al., 2014; Islam et al., 2016b; Shankar, Lu, Islam, et al., 2016; Y. Wang et al., 2015; Lim et al., 2014; Huang et al., 2014; Arulraj et al., 2015) have shed light on possible performance improvements for different big data middleware through RDMA over InfiniBand networks and the byte addressability and persistence of NVM. We discuss technological trends in modern HPC and data center clusters in more detail in chapter 4.

    1.5 Convergence in HPC, Big Data, and Deep Learning

    In recent years, the community has been witnessing an important convergence among three big fields, HPC, big data, and deep learning, as shown in figure 1.3. As the HPC environment keeps providing more advanced capabilities, the big data community has recently been able to take advantage of them. In the meantime, the deep learning community has been leveraging technological advances from both the HPC and big data fields to form its two critical pillars: unprecedented computing capability and huge amounts of data for model training. This convergence cycle continues year after year. We believe this trend will benefit all three fields, and we will see increasingly better solutions proposed and developed by these communities to achieve higher performance and scalability for end applications.

    Figure 1.3

    Convergence of HPC, big data, and deep learning.

    The convergence of HPC, big data, and deep learning is becoming the next game-changing business opportunity. This trend has led to many important research and development activities aimed at bringing HPC, big data processing, and deep learning onto a convergent trajectory. As demonstrated in figure 1.4, from a user’s perspective, many critical questions and challenges must be answered to make this convergence happen. Some example questions include the following:

    Figure 1.4

    Challenges in bringing HPC, big data processing, and deep learning into a convergent trajectory.

    • What are the major bottlenecks in current big data processing and deep learning middleware (e.g., Hadoop, Spark, TensorFlow, PyTorch)?

    • Can these bottlenecks be alleviated with new designs by taking advantage of HPC technologies?

    • Can RDMA-enabled high-performance interconnects, which are commonly deployed on HPC systems, benefit big data processing and deep learning systems and applications?

    • Can HPC clusters with high-performance storage systems (e.g., PMEM, NVMe-SSDs, parallel file systems) benefit big data and deep learning applications?

    • How much performance benefit can be achieved through enhanced designs or codesigns?

    • How do we design benchmarks for evaluating the performance of big data and deep learning middleware on HPC clusters?

    There are definitely more questions that can be added to this list. To help answer these questions, this book aims to provide an in-depth and systematic overview of the latest research findings in major and emerging topics for HPC + big data + deep learning over HPC clusters and clouds.

    As a starting point for exploring these research opportunities, we can deploy and run current-generation big data and deep learning jobs (e.g., Hadoop, Spark, and TensorFlow jobs) on existing HPC infrastructures, as shown in figure 1.5. Through workload characterization and performance analysis, we can then examine the potential efficiency and scalability bottlenecks in this execution model and stack.

    Figure 1.5

    Can we efficiently run big data and deep learning jobs on existing HPC infrastructure?

    Figure 1.6 provides a high-level overview of the challenges of achieving high-performance big data computing on HPC and cloud computing systems. The bottom layer of the figure shows the advanced technologies provided by HPC and cloud computing infrastructures, such as high-speed networking technologies, high-performance and commodity computing system architectures, and advanced storage technologies. The top layer shows the technology consumers, such as data-intensive applications, benchmarks, and workloads. In the middle are three important layers that help deliver near-peak hardware performance to the application layer: the communication and I/O library layer, the programming model layer, and the big data processing and management middleware layer. Each of these layers must be designed efficiently to expose maximum performance and flexibility to the components above it.

    Figure 1.6

    Challenges of high-performance big data computing. HDD, hard disk drive; NICs, network interface cards; QoS, quality of service; SR-IOV, single root input/output virtualization.

    The major challenges in designing a high-performance, scalable communication and I/O layer include efficient point-to-point communication protocols; thread models and synchronization mechanisms; virtualization support with near-native performance; low-latency, high-throughput I/O operations on file or storage systems; quality-of-service and fault tolerance support; and performance tuning. All of these are critical features of the desired communication and I/O library in a next-generation high-performance big data computing stack. A successful example of such a communication and I/O layer in the HPC community is MPI. Unfortunately, the big data community has not yet converged on a standardized communication and I/O layer, a situation that can be seen as a pre-MPI stage (R. Lu et al., 2014; Lu et al., 2011). Historical lessons tell us that high-performance big data computing needs a standard and efficient communication and I/O infrastructure.
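    For readers unfamiliar with MPI, the following minimal sketch, using the mpi4py bindings (one common Python interface to MPI), shows the kind of standardized point-to-point communication the HPC community converged on; launching with an MPI runner such as mpirun -np 2 python demo.py is assumed.

        from mpi4py import MPI  # Python bindings over a standard MPI library

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()  # each process gets a unique rank

        if rank == 0:
            # Rank 0 sends a Python object; MPI handles serialization and transport.
            comm.send({"payload": list(range(5))}, dest=1, tag=11)
        elif rank == 1:
            data = comm.recv(source=0, tag=11)
            print(f"rank 1 received: {data}")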

    Traditionally, big data processing and management middleware, such as Hadoop, Spark, HBase, and Memcached, has been designed on top of conventional communication and I/O protocols, such as TCP/IP, remote procedure calls (RPCs), and file system calls. These protocols are built on operating system-centric concepts and interfaces, such as sockets and the Portable Operating System Interface (POSIX), and they typically incur high overhead from context switches and buffer copies between user space and kernel space (Rahman et al., 2014; Lu et al., 2014; Islam et al., 2016b; Shankar, Lu, Islam, et al., 2016). As the underlying hardware provides more and more advanced capabilities, new programming models and interfaces are becoming available that offer purely user-space, zero-copy communication and I/O protocols to applications. For instance, RDMA is one such promising communication model, and it has been widely used in the HPC community for more than twenty years. In addition, PMEM- and NVMe-SSD-based I/O programming models are emerging in the storage community and have demonstrated substantial performance benefits for data-intensive applications compared with traditional POSIX-based I/O approaches (Klimovic et al., 2017; Cao et al., 2018; Xia et al., 2017; Islam et al., 2016b). These new programming models not only significantly improve the performance and scalability of big data processing and management middleware but also open up many new codesign opportunities for upper-layer systems and applications.
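    The copy and context-switch overhead of the conventional path can be illustrated even without RDMA hardware. The sketch below is our own simplified example, not a design from the literature cited above: it contrasts a read/send loop, in which every chunk of file data is copied into a user-space buffer and back into the kernel, with os.sendfile, which keeps the transfer inside the kernel on Linux. RDMA goes further still by bypassing the kernel on the remote side entirely.

        import os

        # Conventional path: each chunk crosses the user/kernel boundary twice,
        # once for read() and once for sendall(), with a buffer copy each time.
        def send_with_copies(sock, path, chunk_size=64 * 1024):
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    sock.sendall(chunk)

        # Kernel-side zero-copy path (Linux): data flows from the page cache to
        # the socket inside the kernel and never enters user space.
        def send_zero_copy(sock, path):
            with open(path, "rb") as f:
                size = os.fstat(f.fileno()).st_size
                offset = 0
                while offset < size:
                    sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
                    offset += sent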

    While the research community exploits these commodity hardware platforms and technologies (e.g., RDMA, NVMe, PMEM), many high-tech companies (such as Google and Amazon) build their own proprietary chips, motherboards, and networks. Their networking, I/O, and software stacks are presumably optimized to exploit the unique capabilities of their hardware devices. Because the technical details of those proprietary designs are unavailable, we do not discuss them in this book. The major goal of all these designs is similar, however: to significantly improve the performance and scalability of current-generation big data analytics and management systems to meet the growing challenges of huge data or big data.

    In the meantime, we should note that several big cloud providers (such as Microsoft Azure, AWS, Oracle Cloud, and Alibaba Cloud) have been adopting HPC networking technologies (such as InfiniBand and RoCE) in their latest HPC instances, so the designs discussed in this book can also run on these cloud HPC instances. Many social-site data centers, such as those of Facebook, Microsoft, and Alibaba, have likewise adopted InfiniBand and RoCE. In addition to traditional big data analytics workloads, these data centers now run deep learning and artificial intelligence workloads. More details about these cloud-based designs are discussed in chapter 11.

    More technical challenges of designing high-performance big data computing systems and applications will be discussed in detail in chapter 5.

    1.6 Outline of the Book

    Based on the preceding discussion of the research challenges in achieving high-performance big data computing, this book is organized into five parts with twelve chapters in total, as shown in figure 1.7.

    Figure 1.7

    Outline of the book.

    • Chapters 1–4 describe the basic introductory concepts and background knowledge necessary for a good understanding of HPC, big data, deep learning, and so on. Chapter 1 has presented a global view of the field of high-performance big data computing. Chapter 2 describes popular
