Lecture Notes: Data Ingestion For Structured/Unstructured Data
In this module, you learnt about data ingestion for structured/unstructured data. You were first
introduced to what data ingestion is, and then you understood Apache Sqoop, a tool for ingesting
structured data, and Apache Flume, a tool for ingesting unstructured data. You got to know how
these tools are used for ingesting data and how insights are generated from the ingested data in
a typical production setup in the industry.
Data ingestion can be referred to as the process of absorbing data for immediate use or
storage. It is a bridge that transfers data from the source to a destination, such as HDFS, where it
can be used efficiently.
But there can be numerous data sources and the data can be in a plethora of formats. So, it
becomes a challenge for businesses across the globe to ingest data efficiently and process it at
a reasonable speed in order to get business insights. Moreover, the velocity at which data is
being generated nowadays is rapid, and the volume of data being generated is humongous.
With such a colossal volume of data being generated at such an accelerated velocity, it becomes
a challenge to ingest data and process it at a proper pace so that businesses can get the
desired analysis at the required time.
Moreover, to generate any business insights, you have to prioritise the sources of data you will
analyse, filter out data you don’t want, and set up a system that allows you to draw conclusions
from this data.
Data ingestion tools, such as Sqoop, Flume, Kafka, Gobblin, etc., which have come up in the
recent past, can help alleviate these challenges and generate business insights.
Along with the above-discussed challenges, there are network-related challenges as well in a
typical production setup for data ingestion.
1. Data collection: When beginning to conduct any analysis, you have to first collect data. This
is where you analyse the sources from where the data has to be imported, based on the
requirement, out of the many sources available.
2. Data validation: Once you prioritise the sources and collect the data, you have to validate
it so that the unwanted data can be filtered out.
3. Data routing: Next, you have to route the validated data to its particular destination such as
HBase, HDFS, Hive, or some other system, where it will be further analysed.
Once data has been successfully imported, validated, and routed to its particular destination,
data processing tools are run on the data to get the desired output. You may also use business
intelligence (BI) tools and business analytics (BA) tools to get meaningful insights.
There are multiple sources of data. We have structured data coming from RDBMS,
multi-structured data coming from social media, streaming data and real-time data coming in
from back-end servers, weblogs, etc.
To ingest data from all such sources, the following commands and tools can be used:
1. File transfer using commands: Use commands such as put to copy files from a local file
system to HDFS, and get to copy files from HDFS back to the local file system (a short sketch is
given after this list). For data transformation, however, you cannot depend on this method.
2. Apache Sqoop: Sqoop is short for SQL to Hadoop. It is used for importing data from RDBMS
to Big Data Ecosystem (Hive, HDFS, HBase, etc.) and exporting the data back to RDBMS after
it gets processed in the Big Data Ecosystem. It was created by Cloudera and was then open
sourced.
3. Apache Flume: Flume is a distributed data collection service for collecting, aggregating, and
transporting large amounts of real-time data from various sources to a centralised place, where
the data can be processed. It was released as an open source service.
4. Apache Kafka: Kafka is a fast, scalable distributed system that can handle a high volume of
data; it enables programmers to pass messages from one point to another. Apache Kafka was
developed by LinkedIn; later, it became open source.
5. Apache Gobblin: Gobblin is an open source data ingestion framework for extracting,
transforming, and loading a large volume of data from different data sources. It supports both
streaming and batch data ecosystems. Gobblin is LinkedIn's Data Ingestion Platform.
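As a quick illustration of the file-transfer approach in point 1 above, here is a minimal sketch; the file and directory paths are only examples:

# copy a local file into HDFS, then copy it back
hadoop fs -put /home/user/sales.csv /user/root/sales.csv
hadoop fs -get /user/root/sales.csv /home/user/sales_copy.csv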
Apart from the tools you have seen till now, tools such as Apache Storm, Apache Chukwa,
Apache Spark, etc. are used for ingesting data the way we want.
Since data can be in any type and choosing a particular tool for data ingestion depends a lot on
the type of data we are going to ingest, let’s have a quick refresher on the types of data.
1. Structured data: It is organised and can be stored in databases, i.e. in tables with rows and
columns. Therefore, it has the advantage of being entered, stored, queried, and analysed
efficiently using Structured Query Language (SQL). In other words, structured data is data that
can be read easily by machines. Examples include Aadhaar data, financial data, the metadata
of files, etc.
We are now familiar with the types of data, but each type of data can be stored in a plethora of
file formats, and each format has its advantages and disadvantages. When it comes to data
ingestion, file formats play a crucial role. A file format represents a way in which the information
is stored or encoded in a computer. Choosing a particular file format is important if you want to
have maximum efficiency in terms of factors such as processing power, network bandwidth,
available storage, etc. A file format directly affects the processing power of the system ingesting
the data, the capacity of the network carrying the data, and the available storage for storing the
ingested data. Following are some of the widely used file formats:
1. Text/CSV: CSV stands for Comma-Separated Values. This is the most commonly used file
format for exchanging large amounts of data between Hadoop and external systems. A CSV file has very
limited support for schema evolution and it does not support block compression. It is not
compact.
2. XML and JSON: XML stands for Extensible Markup Language, which defines a set of rules
using which documents can be encoded in a format that is both machine-readable and
human-readable. JSON stands for JavaScript Object Notation and is an open-standard file
format consisting of key-value pairs. Since both are text files, they don’t support block
compression and are not compact. Splitting is very tricky in these files as Hadoop doesn’t
provide a built-in InputFormat for either. Since splitting is tricky, these files cannot be split easily
to be processed in parallel in Hadoop.
3. Sequence Files: These are binary files that store data as binary key-value pairs and are,
hence, more compact than text files. A binary file is more compact because each of its bytes has
256 possible values, as opposed to pure unextended ASCII, which has only 128, so it is
immediately twice as compact. Sequence files support block compression and can be split and
processed in parallel, due to which they are extensively used in MapReduce jobs.
4. Avro: This is the language-neutral data serialization system developed within Apache’s
Hadoop project. Serialization is the process of turning data structures into a format that can be
used for either storage or transmission over a network. Language-neutral means Avro files can
be easily read later, even from a language different from the one used to write the file. Avro files
are self-describing, compressible, and splittable, and thus suitable for MapReduce jobs as they can
be split and processed in parallel. These are binary files and are, hence, more compact than
text files. Avro also supports schema evolution, which means that the schema used to read the
file doesn’t have to match the schema used to write the file.
Based on your requirement, you may choose the appropriate file format for storing data.
Schema Evolution: To understand schema evolution, suppose you are working on a particular
schema of a database in a software company. Now a requirement comes in from the client to
update the schema. So can you go ahead and directly update it? Well, no! This is because there
will be applications running and fetching data based on the current schema. If you simply update
the schema without considering them, those applications will be affected. So you need to evolve
the schema in such a way that it caters to the new requirements as well as to the existing
applications. This is schema evolution.
Block Compression: Since Hadoop stores large files by splitting them into blocks, it will be
best if the individual blocks can be compressed. Thus block compression is the process of
compressing each individual block.
Apache Sqoop, short for ‘SQL to Hadoop’, is used for ingesting relational data. Sqoop
provides efficient, bi-directional data transfer between Hadoop and relational databases, in
parallel. The data can be imported directly into HDFS or into HBase or Hive tables as per the
use case.
When you use Sqoop to transfer data, the dataset being transferred is split into multiple
partitions, and a map-only job is launched. Individual mappers are now responsible for the
transfer of each slice/partition of the dataset. The metadata of the database is used to handle
each data record in a type-safe manner.
Once Sqoop connects to the database, it uses JDBC to examine the table to be imported by
retrieving a list of all the columns and their SQL data types. The SQL data types (integer,
varchar, etc.) can be mapped to Java data types (Integer, String, etc.). Sqoop has a code
generator which creates a table-specific Java class to hold the records extracted from the table,
using the information given by JDBC about the data types, etc. Then Sqoop connects to
the cluster to submit a MapReduce job using the generated Java class. The dataset being
transferred is split into multiple partitions, and a map-only job is launched. The output of this is a
set of files containing the imported data. Since the import process is performed in parallel, the
output is in multiple files.
The basic ‘import’ command used for importing data from RDBMSs to the HDFS is —
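A minimal sketch of such an import, assuming a MySQL instance on localhost that hosts the ‘sqoopdemo’ database (the host name and credentials are placeholders; use the ones for your setup):

# import the 'Categories' table into HDFS (default location: the user's home directory)
sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories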
After firing this command on your terminal, data from the ‘Categories’ table in the MySQL
database is transferred to HDFS. You can see the transferred data by running this command
on your console:
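For instance, assuming the default import location under your HDFS home directory:

hadoop fs -ls Categories
hadoop fs -cat Categories/part-m-00000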
There will be files generated like part-m-00000, part-m-00001, etc. depending on the number of
mappers used by Sqoop during import. Each mapper works on a portion of the data and imports
it to the corresponding file like part-m-00000.
There are two steps involved in the execution of the ‘import’ command:
1. First, Sqoop connects to the database and fetches the table metadata — the number of
columns and the column names and their types. In the ‘Categories’ table, it finds that there are
six columns, namely, ItemCode, ItemName, Category, Stock, LastStockedOn, and MRP with
VARCHAR, VARCHAR, VARCHAR, INT, DATE, and NUMBER as their data types, respectively.
Based on the metadata retrieved, Sqoop internally generates a Java class and compiles it using
the JDK and the Hadoop libraries available on the machine.
2. Next, Sqoop connects to the Hadoop cluster and submits a MapReduce job where each
mapper transfers a slice of the table’s data. As multiple mappers run at the same time, the
transfer of data between the database and the Hadoop cluster takes place in parallel.
Note: For all the tables imported using the given ‘import’ command structure, the primary key is
mandatory. If there’s no primary key, then the ‘--split-by’ parameter has to be specified like so:
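A hedged sketch, using the ItemCode column of the ‘Categories’ table purely as an illustration:

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories \
--split-by ItemCode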
Note: The target directories should be created via the hdfs user, and then the owner has to be
changed to root in order to avoid permission issues in AWS EC2. Wherever required, the
commands for creating the directory and changing the permissions have been given.
Create a /user/root in hdfs and change permissions for it by using the following commands:
[root@ip-10-0-0-163 ~]# su - hdfs
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -mkdir /user/root
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -chown root /user/root
[hdfs@ip-10-0-0-163 ~]$ exit
To list databases using Sqoop, you can run the following command:
To list the tables of a database using Sqoop, you can run the following command:
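Hedged sketches of both commands, assuming the same local MySQL connection details used earlier:

# list all databases visible to the given user
sqoop list-databases \
--connect jdbc:mysql://localhost \
--username root \
--password cloudera

# list the tables inside the 'sqoopdemo' database
sqoop list-tables \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera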
The --target-dir parameter can be used to specify the target directory where you want to store
your imported data. The command to do this is —
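A sketch with --target-dir; the directory name is only an example:

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories \
--target-dir /input/data/categories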
The issue with the previous approach is that every time you run the ‘import’ script, you need to
change the target directory. To overcome this issue, you need to import the data to a
warehouse directory. The --warehouse-dir parameter can be used for multiple table imports.
That is, if you need to import the ‘Categories’ table followed by the ‘Products’ table, then you
just need to change the table name in each ‘import’ command like so:
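A sketch of the two imports, assuming /input/data/tables as the warehouse directory (the location used below):

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories \
--warehouse-dir /input/data/tables

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Products \
--warehouse-dir /input/data/tables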
As you can see, you only need to change the table name without changing the target location. As
a result, the data from both the tables gets imported at the specified location (/input/data/tables),
into folders with the same names as the tables being imported.
Note: To run commands on the Products table, you first need to have it in your ‘sqoopdemo’
database. Create a ‘Products’ table of your choice if you want to run the preceding command.
Create /input/data/ in hdfs and change permissions for it by using the following commands:
[root@ip-10-0-0-163 ~]# su - hdfs
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -mkdir -p /input/data/
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -chown root /input/data/
[hdfs@ip-10-0-0-163 ~]$ exit
Till now, you were focussed on importing just one table. What if you need to import all tables
from the database? You’ll obviously prefer not to write individual Sqoop Import commands for
every table! This is where you will use the ‘import-all-tables’ command. This command retrieves
a list of all the tables from the database and calls the ‘import’ tool to import the data of each
table in a sequential manner to avoid putting excessive load on the database server.
Note: To avoid conflict with the previously created ‘Categories’ directory in segment 3, remove it
using —
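A hedged sketch of the cleanup and of the import of all tables; it assumes the earlier ‘Categories’ import landed in your HDFS home directory:

# remove the old 'Categories' directory
hadoop fs -rm -r Categories

# import every table of the database, one after another
sqoop import-all-tables \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera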
You can see the folders created for each table using this command:
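For instance, listing your HDFS home directory, where the per-table folders are created by default:

hadoop fs -ls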
In a typical database, you will have plenty of data, and in most situations, you won't require all of
this data. Instead, you may specifically need some rows that satisfy some properties. You can
do this in SQL by using the 'WHERE' clause. By using the command line parameter --where you
can specify the SQL condition that the imported rows should satisfy. The command is —
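A sketch with an illustrative condition on the ‘Stock’ column of the ‘Categories’ table:

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories \
--where "Stock > 100" \
--target-dir /input/data/categories_instock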
To use SQL queries within Sqoop Import commands, use the --query parameter. The command
looks like this:
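A hedged sketch of a free-form query import. Note that Sqoop requires the literal token $CONDITIONS in the WHERE clause of such queries, along with a --target-dir; the query itself is only illustrative:

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--query 'SELECT ItemCode, ItemName, MRP FROM Categories WHERE Stock > 100 AND $CONDITIONS' \
--split-by ItemCode \
--target-dir /input/data/categories_query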
Here, the ‘--query’ parameter holds the query, the ‘--split-by’ parameter indicates the column
that is to be used for slicing the data into parallel tasks (by default, this column is the primary
key of the main table). Remember, Sqoop connects to the Hadoop cluster and submits a
MapReduce job where each mapper transfers a slice of the table’s data. As multiple mappers
run at the same time, the transfer of data between the database and the Hadoop cluster takes
place in parallel.
So far, you have covered use cases that import data as a one-time operation. However, you can
also use Hadoop as an active backup for your database, i.e. you can keep the data in
Hadoop in sync with the relational database. It is in such cases that incremental import comes
into the picture. When the table is getting new rows and no existing rows are changed, you need
to import just the new rows. To achieve this incremental import, you have to use the parameter
--incremental with its value as append. Also, you need a mechanism to indicate how to track
new rows. You can use the primary key of the table to identify the new rows. The parameter
--check-column can be used to tell Sqoop which column should be checked to find whether a row is new.
The --last-value parameter indicates the value of the said column for the row inserted last. In
other words, Sqoop checks the column for rows that have values greater than the last value and
inserts just these rows.
Note: After running the initial ‘import’ command, remember to insert the two rows in the
Morning_Shift table as mentioned in the initial Database Setup document given in segment 2.
The commands for incremental import using the ‘--last-value’ parameter are —
mysql -u root -p
Enter password: cloudera
use sqoopdemo;
exit;
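A sketch of the incremental import itself, assuming the ‘Morning_Shift’ table has an ‘agentid’ column and that the last row imported earlier had agentid 4:

sqoop import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Morning_Shift \
--incremental append \
--check-column agentid \
--last-value 4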
So, in this import example, Sqoop checks the ‘agentid’ column for all rows and only those rows
with their ‘agentid’ value greater than 4 are imported.
You can check the output by running the following command:
mysql -u root -p
Enter password: 123
use sqoopdemo;
exit;
Note: What if you want to import the updated rows as well, along with the newly added rows?
Sqoop provides the ‘lastmodified’ mode of the --incremental parameter for this.
To transfer processed or backed-up data from Hadoop back to the database, use sqoop export.
Note: Remember to create the table ‘Consolidated_Stocks‘ as mentioned in the Database
Setup document given in segment 2.
Create the table into which the exported data will be put by logging into MySQL.
mysql -u root -p
Enter password: cloudera
use sqoopdemo;
exit;
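A minimal export sketch, assuming the processed data sits in an HDFS directory such as /input/data/consolidated and is to be loaded into the ‘Consolidated_Stocks’ table created above:

sqoop export \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Consolidated_Stocks \
--export-dir /input/data/consolidated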
1. Sqoop connects to the database and extracts the metadata information about the table to
which data is to be loaded — the number of columns, data types of columns, and more. This
information is used by Sqoop to create and compile the Java class used in the MapReduce job.
2. Sqoop connects to the cluster and submits the MapReduce job to transfer the data from
Hadoop to the database table in parallel.
Note: What happens when the data is corrupted, say, for instance, when the type of the data in any of
the columns does not match the expected type? In this case, the export will fail. Sqoop does
not skip rows when it encounters an error, so the error must be fixed before running the command
again.
For EC2 Users:
Create a table where exported data will be put by logging into MySQL.
mysql -u root -p
Enter password: 123
use sqoopdemo;
exit;
Now that you are familiar with the import and export commands in Sqoop, it might have
occurred to you that instead of writing and executing them every single time you wanted to carry
out an operation, it would be great if you could have some jobs that were scheduled to run at a
particular time on their own.
To create Sqoop jobs that can be re-run as and when required, the command is —
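A sketch of creating a saved job; the job name ‘categories_import’ is only an example:

sqoop job \
--create categories_import \
-- import \
--connect jdbc:mysql://localhost/sqoopdemo \
--username root \
--password cloudera \
--table Categories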
Note: There is a space before import (-- import) in the above command.
You can use the following command to run this import job:
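For example, assuming the job name used above:

sqoop job --exec categories_import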
The Sqoop metastore, which is a metadata repository, stores all the saved jobs. You can view
the parameters of a saved job using the following command:
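For example (sqoop job --list shows all the saved jobs):

sqoop job --show categories_import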
Scheduling and using Sqoop jobs effectively will be discussed in Module 4: Oozie of this course.
We saw a case study for Sqoop on archiving relational databases in Hadoop using
different parameters available in Sqoop. For this case study, we made use of the Enron Email
Dataset. Enron Corporation was a US energy trading and utility company. When the company
collapsed in 2001, almost 0.5 million (5,00,000) internal emails were made public. This dataset is
available as a SQL dump. We loaded this dataset into MySQL and we saw how to import it to
Hadoop using Sqoop.
Sqoop supports importing data to file formats that do not generally support the NULL value. So,
it is required to encode this missing value or NULL value in the data.
Sqoop encodes it to a string constant 'null' in lowercase. But there is a problem with this
approach. If your data itself contains NULL as a string/regular value rather than a missing value,
this doesn't help. Also, it is possible that further processing steps expect a different substitution
for missing values.
In such cases, you can override the NULL substitution string with the --null-string and
--null-non-string parameters to any required value.
For text-based columns’ missing values, we use the --null-string parameter, and for other
columns, we use the --null-non-string parameter. For example, we have data for three columns
— ID, Name, and Address. Now, ID is an integer column, and Name and Address are of
character array/string data types. If ID has a NULL value, it gets encoded with the value
specified for --null-non-string. If Address has a NULL value, it gets encoded
with the value specified for the --null-string.
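A hedged sketch that substitutes ‘NA’ for missing text values (matching the output described next) and -1 for missing non-text values; the database name, the table name, and the -1 substitution are only illustrative:

sqoop import \
--connect jdbc:mysql://localhost/enron \
--username root \
--password cloudera \
--table Employees \
--null-string 'NA' \
--null-non-string '-1'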
So you can see how NULL was replaced by 'NA' in the files. In the same way, if any non-text
based column had contained NULL, it would have been replaced by the value defined for the
--null-non-string parameter.
Note: If we create directories directly such as enron instead of /enron, we don’t have to change
permissions every time. So in the commands ahead, we have created directories without /. This
is specific to AWS EC2. In the videos, you are shown how to create directories with /, but you can
do so without /, as shown in all the commands ahead. We are not creating the directories inside / to
avoid permission issues in AWS EC2.
Create a /user/root in HDFS and change permissions for it using the following commands:
[root@ip-10-0-0-163 ~]# su - hdfs
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -mkdir /user/root
[hdfs@ip-10-0-0-163 ~]$ hadoop fs -chown root /user/root
[hdfs@ip-10-0-0-163 ~]$ exit
Sqoop uses four map tasks to achieve the import by default, but we can control the number of
mappers by using the --num-mappers parameter. However, increasing the number of mappers
does not necessarily reduce the processing time. It is possible that the database
gets overwhelmed by a large number of mappers and loses time in context switching between
these tasks rather than getting the data transferred. The best method to determine the optimal
number of mappers is to go with trial and error. You can set the number of mappers at a starting
value, increase it, and test until no further improvement is achieved.
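For instance, a sketch that raises the mapper count from the default four to eight; eight is only a starting value to tune from, and the database and table names are illustrative:

sqoop import \
--connect jdbc:mysql://localhost/enron \
--username root \
--password cloudera \
--table Employees \
--num-mappers 8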
We can import data in a binary file format via Sqoop. Binary formats are used to store images,
PDFs, etc. If the text itself contains characters that are used as separators in the text file (CSV),
then it is preferable to use binary formats. To import data in a sequence file format, use the
following command:
Import:
You can see that all the files created have a .avro suffix.
You can see that all files are compressed and, hence, have a .gz suffix in their filename.
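Hedged sketches of the three variants referred to above (a sequence file import, an Avro import, and a compressed gzip import); the database, table, and directory names are only examples:

# store the imported data as sequence files
sqoop import \
--connect jdbc:mysql://localhost/enron \
--username root \
--password cloudera \
--table Employees \
--as-sequencefile \
--target-dir enron_seq

# store the imported data as Avro data files (.avro)
sqoop import \
--connect jdbc:mysql://localhost/enron \
--username root \
--password cloudera \
--table Employees \
--as-avrodatafile \
--target-dir enron_avro

# compress the imported text files with the default gzip codec (.gz)
sqoop import \
--connect jdbc:mysql://localhost/enron \
--username root \
--password cloudera \
--table Employees \
--compress \
--target-dir enron_gz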
Once you get this Enron Email Data into Hadoop, you can run data processing tools on the data
to get the desired output. You may also use business intelligence (BI) tools and business
analytics (BA) tools to get meaningful insights. In a typical production setup in the industry,
Sqoop is used to ingest data such as Enron Email Data from RDBMS into Hadoop, and then,
the analysis is performed on that data.
Note: We created the database inside MySQL and then imported data to Hadoop. But in a
practical scenario, we would already have the enron database on-premise or on the cloud, and Sqoop
would directly connect to it to import data.
Note: Just as you have connected Sqoop to MySQL, you can also connect Sqoop to other
RDBMSs like Oracle, DB2, etc.
Industry use case of Sqoop: There is an RDBMS system that hosts five years of legacy data,
which needs to be imported to Hadoop, and this imported data needs to be processed inside the
Big Data Ecosystem to generate business insights. The RDBMS system is on-premise and Hadoop
is on the cloud, and there is a networking layer involved between the RDBMS system and Hadoop.
4.2 Sqoop In A Typical Production Architecture
You can see from the diagram how Sqoop fits in a typical production architecture and what
factors are involved in building the connection between the RDBMS and Hadoop to transfer data via
Sqoop. There are network challenges involved in the production setup, and resolving these
challenges plays an important role in the successful transfer of the data. Once those
network challenges are taken care of, Sqoop can transfer data from the RDBMS system to Hadoop.
Moreover, we also have network latency issues when data transfer takes place between two
networks in different zones. Here, since the RDBMS system is on-premise and Hadoop is on the
cloud, network latency will also be a factor slowing down the transfer of data.
There is a concept of availability zones using which we can reduce network latency.
You can see from the diagram how Sqoop fits into the entire big data ecosystem in a typical
production setup in the industry to generate business insights.
Apache Flume is a tool for ingesting unstructured data. According to the Flume user
guide, Apache Flume is a distributed, reliable, and available system for efficiently collecting,
aggregating, and moving large amounts of log data from many different sources to a centralised
data store. Flume is also used for ingesting massive quantities of event data such as network
traffic data, social-media-generated data, data from email messages, etc. All of this is
unstructured and there are challenges involved in ingesting unstructured data.
The second challenge involves latency issues. Servers are scattered across multiple geographic
locations, but they all try to write data to a centralised Hadoop system. The servers at far-off
geographic locations will not be able to write data at a fast speed due to network lag, which
is called latency. Hence, you need a system that is extendable to such far-off locations.
Flume is extendable in such scenarios, which means that you can configure Flume once and
extend it to wherever you want with the same configuration, and it will work the same way.
The third challenge is that the data might get lost in the network due to network-related issues.
So you need to ensure you have a system that is fault tolerant (keeps account of the data sent
and received). Flume is fault-tolerant.
A Flume agent is a Java application that generates or receives data and buffers it until it is
written to the next Flume agent or a storage system. A chain of Flume agents can be used to
move data from the data sources to the target HDFS or HBase in a scalable and durable
manner. A Flume agent has three components, namely, the source, the sink, and the
channel. Data is represented as ‘events’. The source receives or produces the data, which
contains events. The sink reads these events and sends them to the next Flume agent or to the
target Hadoop system. The channel acts as a buffer to hold the data written by the source until
the sink successfully writes the data to the next stage.
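To make these components concrete, here is a minimal hedged sketch of an agent configuration (agent ‘a1’ with a netcat source on port 44444, an in-memory channel, and an HDFS sink; all names, ports, and paths are only examples), followed by the command to start the agent:

cat > example.conf <<'EOF'
# one source, one channel, one sink for agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source: listens on a TCP port for lines of text
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# channel: in-memory buffer between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# sink: writes the events to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
EOF

flume-ng agent --conf conf --conf-file example.conf --name a1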
An event can be considered equivalent to a data structure holding data. It consists of header(s)
and a body. Headers are key-value pairs used to convey routing information or other structured
information. The body contains the actual data, which is an array of bytes. Data flows only in the
form of these events.
Sources are the components of a Flume agent which receive data from any application that
produces data. They take the data by listening to a port or the file system. Every source is
connected to at least one channel. The source writes the data in the form of events to the
channels. There are different types of sources supported by Flume, such as the Avro source, the
HTTP source, etc.
Sinks are the components of a Flume agent which deliver the data to the final destination. A sink
continuously polls the connected channel to retrieve the events that were written by the source
into the channel. A sink writes events to the next hop (Flume agent) or the final destination. Once
it has successfully written the events to the next destination, it informs the channel to remove the
written events. There are different types of sinks supported by Flume, such as the Avro sink, the
HDFS sink, etc.
Channels are the components of a Flume agent which act as a conduit between the source
and the sink. A channel is a buffer (such as an in-memory queue) which keeps events until the
sink writes them to the next hop or the target destination. Multiple sources can write to the same
channel, and multiple sinks can read from the same channel, but a sink can be connected to only
one channel.
The number of sources, channels, and sinks in a Flume agent is not restricted to one; there can
be multiple of each. In such a scenario, we have other components as well to take into account.
Let's see how the data flows in such a case.
The source uses a channel processor to write events to the channels. The channel processor passes
the events to one or more interceptors. An interceptor reads, modifies, or drops events
as required and then sends them back to the channel processor. The channel processor then
passes the events to the channel selector. The channel selector determines how the events will now
move to the channels. There are mainly two types of channel selectors: replicating and
multiplexing. A replicating channel selector simply sends a copy of each event to all the
connected channels; this is the default channel selector. A multiplexing channel selector writes
events to the channels based on some criteria, such as the header information. Once the data is
available in the channel, it is used by the sinks. There can be more than one sink; in such a
case, we have sink groups. A sink group contains one or more sinks, and each group has a
sink processor which determines which sink carries out event processing. Once a sink is
selected, it writes events to the next destination and removes the written events from the
channel.
Here is a diagram showing how all the components of Flume interact with each other. Have a
look at it to get a better understanding of the components discussed above.
5.3 Flume: Industrial Discussion
Industry use case of Flume: You have banking transactions going on, and in real time or
near-real time, you have to infer which transaction is fraudulent and which is not. You got an
overview of how it is done using Flume and the architecture involved.
However, for things to happen in real time or near-real time, the network latency should be
minimal, and it can be reduced using the concept of availability zones.
Now that you are familiar with both Sqoop and Flume, let’s compare and summarise the
features of Sqoop and Flume. This will help you determine which tool suits which use cases.
Sqoop is used to import data from an RDBMS to HDFS, HBase, or Hive, and to export data from
HDFS back to an RDBMS. It is a tool for ingesting structured data. On the other hand, Flume is a
distributed data collection service for collecting, aggregating and transporting large amounts of
real-time data from various sources to a centralised place where the data can be processed. It
is a tool for ingesting unstructured data.
Sqoop is not event-driven, which means that its functioning is not dependent on events, and it is
thus suitable for moving data to and from RDBMSs such as Oracle, MySQL, etc. On the other hand,
Flume has an event-based architecture, which means it is dependent on events. The events can
be tweets generated on Twitter, log files of a server, etc.
Sqoop has a connector-based architecture, which means that the JDBC connector is primarily
responsible for connecting to the data sources and fetching the data correspondingly, while Flume
has an agent-based architecture, which means that the Flume agent is responsible for the data
transfer.
Sqoop can transfer data in parallel for better performance, while Flume scales horizontally, and
multiple Flume agents can be configured to collect high volumes of data. Flume also has
several recovery and failover mechanisms, due to which it is highly reliable.
This will help you determine when to use which tool. You further got to know about the
companies that use Sqoop and Flume.