Search | arXiv e-print repository

A Survey of Pipeline Tools for Data Engineering

Authors: Anthony Mbata, Yaji Sripada, Mingjun Zhong

Abstract: Currently, a variety of pipeline tools are available for use in data engineering. Data scientists can use these tools to resolve data wrangling issues and to accomplish data engineering tasks ranging from data ingestion through data preparation to utilization as input for machine learning (ML). Some of these tools have essential built-in components or can be combined with other tools to perform desired data engineering operations. While some tools are wholly or partly commercial, several open-source tools are available to perform expert-level data engineering tasks. This survey examines the broad categories and examples of pipeline tools based on their design and data engineering intentions. These categories are Extract Transform Load/Extract Load Transform (ETL/ELT); pipelines for Data Integration, Ingestion, and Transformation; Data Pipeline Orchestration and Workflow Management; and Machine Learning Pipelines. The survey also outlines how tools within these groups are used, with examples, and finally presents a discussion with case studies indicating the usage of pipeline tools for data engineering. The studies present some first-user application experiences with sample data, some complexities of the applied pipeline, and a summary note of approaches to using these tools to prepare data for machine learning.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures
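The abstract names ETL/ELT as the first tool category without spelling out the difference. As a minimal sketch (not taken from the survey; the records, table names, and cleaning rule are hypothetical), the Python code below uses only the standard-library sqlite3 module to contrast the two: ETL cleans rows in application code before loading them, while ELT loads the raw rows first and transforms them inside the database with SQL.

```python
import sqlite3

# Hypothetical raw records standing in for an extracted source (e.g. a CSV export).
RAW_ROWS = [
    ("alice", " 42 "),
    ("bob", "n/a"),
    ("carol", "17"),
]


def extract():
    """Extract: return the raw (name, value) string pairs from the hypothetical source."""
    return list(RAW_ROWS)


def is_ascii_integer(text):
    """True if text is non-empty and consists only of ASCII digits."""
    return text.isascii() and text.isdigit()


def etl(conn):
    """ETL: transform (trim, validate, cast) in Python, then load only the cleaned rows."""
    cleaned = [
        (name, int(value.strip()))
        for name, value in extract()
        if is_ascii_integer(value.strip())
    ]
    conn.execute("CREATE TABLE etl_scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO etl_scores VALUES (?, ?)", cleaned)
    conn.commit()


def elt(conn):
    """ELT: load the raw rows unchanged, then transform them inside the database with SQL."""
    conn.execute("CREATE TABLE raw_scores (name TEXT, value TEXT)")
    conn.executemany("INSERT INTO raw_scores VALUES (?, ?)", extract())
    # Keep values that are non-empty after trimming and contain no non-digit character.
    conn.execute(
        """
        CREATE TABLE elt_scores AS
        SELECT name, CAST(TRIM(value) AS INTEGER) AS score
        FROM raw_scores
        WHERE TRIM(value) <> '' AND TRIM(value) NOT GLOB '*[^0-9]*'
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    for table in ("etl_scores", "elt_scores"):
        rows = conn.execute(f"SELECT name, score FROM {table} ORDER BY name").fetchall()
        print(table, rows)
```

Both paths produce the same cleaned table here; the practical difference is where the transformation runs and that ELT keeps the raw rows (raw_scores) in the target store for later reprocessing.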

arXiv:2210.09394 [pdf]

Review Learning: Alleviating Catastrophic Forgetting with Generative Replay without Generator

Authors: Jaesung Yoo, Sunghyuk Choi, Ye Seul Yang, Suhyeon Kim, Jieun Choi, Dongkyeong Lim, Yaeji Lim, Hyung Joon Joo, Dae Jung Kim, Rae Woong Park, Hyeong-Jin Yoon, Kwangsoo Kim

Abstract: When a deep learning model is sequentially trained on different datasets, it forgets the knowledge acquired from previous data, a phenomenon known as catastrophic forgetting. This degrades the model's performance across diverse datasets, which is a critical problem in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we propose review learning (RL), a generative-replay-based continual learning technique that does not require a separate generator. Data samples are generated from the memory stored within the synaptic weights of the deep learning model and are used to review knowledge acquired from previous datasets. The performance of RL was validated through PPDL experiments. Simulations and real-world medical multi-institutional experiments were conducted using three types of binary classification electronic health record data. In the real-world experiments, the global area under the receiver operating characteristic curve was 0.710 for RL and 0.655 for TL. Thus, RL was highly effective in retaining previously learned knowledge.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2005.08870 [pdf, other]

Topology design of two-fluid heat exchange

Authors: Hiroki Kobayashi, Kentaro Yaji, Shintaro Yamasaki, Kikuo Fujita

Abstract: Heat exchangers are devices that typically transfer heat between two fluids. The performance of a heat exchanger, such as its heat transfer rate and pressure loss, strongly depends on the flow regime in the heat transfer system. In this paper, we present a density-based topology optimization method for a two-fluid heat exchange system that maximizes the heat transfer rate under a fixed pressure loss. We propose a representation model that accounts for three states, i.e., the two fluids and a solid wall between them, by using a single design variable field. The key aspect of the proposed model is that mixing of the two fluids is essentially prevented without any penalty scheme, because the single design variable field ensures that solid always exists between the two fluids. We demonstrate the effectiveness of the proposed approach through three-dimensional numerical examples in which an optimized design is compared with a simple reference design, and the effects of design conditions (i.e., Reynolds number, Prandtl number, design domain size, and flow arrangements) are investigated.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 19 pages, 17 figures

arXiv:1504.06828 [pdf, other]

A bi-convex optimization problem to compute Nash equilibrium in n-player games and an algorithm

Authors: Vinayaka Yaji, Shalabh Bhatnagar

Abstract: In this paper we present optimization problems with a biconvex objective function and linear constraints such that the set of global minima of the optimization problems is the same as the set of Nash equilibria of an n-player general-sum normal-form game. We further show that the objective function is an invex function and consider a projected gradient descent algorithm. We prove that the projected gradient descent scheme converges to a partial optimum of the objective function. We also present simulation results on certain test cases showing convergence to a Nash equilibrium strategy.

Submitted 26 April, 2015; originally announced April 2015.
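The abstract mentions projected gradient descent without further detail. As a generic illustration of that technique only (not the authors' algorithm or objective), the plain-Python sketch below assumes the feasible set is a single probability simplex, i.e. the set of one player's mixed strategies, whereas the paper's problem has its own linear constraints. It uses a hypothetical quadratic objective as a stand-in for the paper's biconvex function, and the standard sort-based Euclidean projection onto the simplex.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex {x : x_i >= 0, sum(x) = 1}.

    Sort-based method: find the threshold theta so that max(v_i - theta, 0) sums to 1.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for a prefix j = 1..rho, so theta ends at j = rho.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def projected_gradient_descent(grad, x0, step=0.1, iters=500):
    """Alternate a gradient step with a projection back onto the simplex."""
    x = project_to_simplex(x0)
    for _ in range(iters):
        g = grad(x)
        x = project_to_simplex([xi - step * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical objective f(x) = ||x - C||^2, used only to exercise the loop;
# it is NOT the biconvex objective from the paper.
C = [0.9, 0.6, -0.2]


def grad_f(x):
    """Gradient of the hypothetical objective: 2 (x - C)."""
    return [2.0 * (xi - ci) for xi, ci in zip(x, C)]


if __name__ == "__main__":
    x_star = projected_gradient_descent(grad_f, [1 / 3, 1 / 3, 1 / 3])
    print([round(xi, 4) for xi in x_star])
```

With this quadratic stand-in, the minimizer over the simplex is simply the projection of C, namely [0.65, 0.35, 0.0], which the iterates approach; for a biconvex objective, the same loop would instead be expected to reach only a partial optimum, as the abstract states.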

Showing 1–4 of 4 results for author: Yaji