From the course: Data Pipeline Automation with GitHub Actions Using R and Python


Data pipeline scope and requirements

- [Instructor] In this video, we will define the data scope and the data pipeline requirements. Before getting started with the scope and requirements, let's first define what a data pipeline is. A simple definition of a data pipeline is the process of moving data from one data source to another. In most cases, it includes intermediate steps such as data processing, cleaning, transformation, aggregation, and creating new fields. This process is also described as ETL, which stands for Extract, Transform, and Load. Typically, the stages of the data in this process are named Raw for the data source, Calculation for the intermediate steps, and Normalized for the final output. Moving forward, we'll refer to the API as the source of raw data and to the processed data as normalized. In the previous chapter, we saw the process of pulling data from the API to our local machine, where the API in this case is our raw data source, which comes in a JSON format. And…
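To make the Raw, Calculation, and Normalized stages concrete, here is a minimal Python sketch of that flow: extract JSON from an API, transform it into a clean table, and load the normalized output to disk. The endpoint URL and the field names ("data", "timestamp", "value") are placeholders for illustration only; the actual API, parameters, and schema are defined elsewhere in the course.

```python
import requests
import pandas as pd

# Hypothetical endpoint; the real API and its query parameters
# are introduced in the course.
API_URL = "https://example.com/api/v1/data"


def extract(url: str) -> dict:
    """Raw stage: pull the raw data (JSON) from the API."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(raw: dict) -> pd.DataFrame:
    """Calculation stage: clean, type, and derive new fields."""
    df = pd.DataFrame(raw["data"])                      # flatten the JSON payload
    df["timestamp"] = pd.to_datetime(df["timestamp"])   # parse timestamps
    df["value"] = pd.to_numeric(df["value"])            # enforce numeric type
    df["date"] = df["timestamp"].dt.date                # example of a new field
    return df


def load(df: pd.DataFrame, path: str = "normalized.csv") -> None:
    """Normalized stage: write the processed output to its destination."""
    df.to_csv(path, index=False)


if __name__ == "__main__":
    raw = extract(API_URL)        # raw data from the source
    normalized = transform(raw)   # intermediate calculations
    load(normalized)              # normalized final output
```

The same three-step structure is what the automated pipeline will run; only the source, the transformations, and the destination change as the requirements are filled in.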
