Big Data Analytics: Free Guide: 5 Data Science Tools To Consider

big data analytics
Posted by: Margaret Rouse

WhatIs.com

Contributor(s): Mark Labbe, Lisa Martinek and Craig Stedman






Big data analytics is the often complex process of examining large and
varied data sets, or big data, to uncover information -- such as hidden
patterns, unknown correlations, market trends and customer preferences --
that can help organizations make informed business decisions.

On a broad scale, data analytics technologies and techniques provide a
means to analyze data sets and draw conclusions about them to help
organizations make informed business decisions. Business intelligence (BI)
queries answer basic questions about business operations and
performance.
Big data analytics is a form of advanced analytics, which involves complex
applications with elements such as predictive models, statistical algorithms
and what-if analysis powered by high-performance analytics systems.
The importance of big data analytics
Driven by specialized analytics systems and software, as well as
high-powered computing systems, big data analytics offers various business
benefits, including:
• New revenue opportunities
• More effective marketing
• Better customer service
• Improved operational efficiency
• Competitive advantages over rivals
Big data analytics applications enable big data analysts, data scientists,
predictive modelers, statisticians and other analytics professionals to
analyze growing volumes of structured transaction data, plus other forms of
data that are often left untapped by conventional BI and analytics
programs. This encompasses a mix of semi-structured and unstructured
data -- for example, internet clickstream data, web server logs, social
media content, text from customer emails and survey responses, mobile
phone records, and machine data captured by sensors connected to the
internet of things (IoT).
Figure (TechTarget): Big data analytics is a form of advanced analytics,
which has marked differences compared to traditional BI.
Big data analytics technologies and tools
Unstructured and semi-structured data types typically don't fit well in
traditional data warehouses that are based on relational databases oriented
to structured data sets. Further, data warehouses may not be able to
handle the processing demands posed by sets of big data that need to be
updated frequently or even continually, as in the case of real-time data on
stock trading, the online activities of website visitors or the performance of
mobile applications.
As a result, many of the organizations that collect, process and analyze big
data turn to NoSQL databases, as well as Hadoop and its companion data
analytics tools, including:
• YARN: a cluster management technology and one of the key features
in second-generation Hadoop.
• MapReduce: a software framework that allows developers to write
programs that process massive amounts of unstructured data in parallel
across a distributed cluster of processors or stand-alone computers.
• Spark: an open source, parallel processing framework that enables
users to run large-scale data analytics applications across clustered
systems.
• HBase: a column-oriented key/value data store built to run on top of
the Hadoop Distributed File System (HDFS).
• Hive: an open source data warehouse system for querying and
analyzing large data sets stored in Hadoop files.
• Kafka: a distributed publish/subscribe messaging system designed to
replace traditional message brokers.
• Pig: an open source technology that offers a high-level mechanism
for the parallel programming of MapReduce jobs executed on Hadoop
clusters.
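To make the MapReduce entry in the list above more concrete, here is a minimal single-process sketch of the map, shuffle and reduce pattern in Python. It does not use Hadoop or any cluster API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample log lines are hypothetical and chosen only for illustration. A real framework would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (token, 1) pairs for one input record (a line of text)."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce sequentially over in-memory records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical web server log fragments.
    logs = ["GET /home", "GET /cart", "POST /cart"]
    print(word_count(logs))
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across workers, which is the property the framework exploits.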
How big data analytics works
In some cases, Hadoop clusters and NoSQL systems are used primarily as
landing pads and staging areas for data before it gets loaded into a data
warehouse or analytical database for analysis -- usually in a summarized
form that is more conducive to relational structures.
More frequently, however, big data analytics users are adopting the
concept of a Hadoop data lake that serves as the primary repository for
incoming streams of raw data. In such architectures, data can be analyzed
directly in a Hadoop cluster or run through a processing engine like Spark.
As in data warehousing, sound data management is a crucial first step in
the big data analytics process. Data being stored in the HDFS must be
organized, configured and partitioned properly to get good performance out
of both extract, transform and load (ETL) integration jobs and analytical
queries.
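As a rough illustration of what partitioning data properly can mean in practice, the Python sketch below groups events into date-based directories. The key=value directory naming follows the convention Hive uses for partitioned tables; the base directory /data/clicks, the timestamp field name and both function names are assumptions made for this example, and nothing here writes to HDFS.

```python
from collections import defaultdict
from datetime import datetime


def partition_path(base_dir, event_time):
    """Build a key=value style partition directory for one event time."""
    return (
        f"{base_dir}/year={event_time:%Y}"
        f"/month={event_time:%m}/day={event_time:%d}"
    )


def group_by_partition(base_dir, events):
    """Assign each event to the partition directory for its timestamp."""
    partitions = defaultdict(list)
    for event in events:
        # The "timestamp" field name is hypothetical; ISO 8601 is assumed.
        event_time = datetime.fromisoformat(event["timestamp"])
        partitions[partition_path(base_dir, event_time)].append(event)
    return dict(partitions)


if __name__ == "__main__":
    events = [
        {"timestamp": "2019-03-07T10:15:00", "page": "/home"},
        {"timestamp": "2019-03-07T23:59:59", "page": "/cart"},
        {"timestamp": "2019-03-08T00:00:01", "page": "/home"},
    ]
    for path, rows in group_by_partition("/data/clicks", events).items():
        print(path, len(rows))
```

With a layout like this, a query restricted to one day only needs to read the files under that day's directory rather than scanning the whole data set.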
Once the data is ready, it can be analyzed with the software commonly
used for advanced analytics processes. That includes tools for:
• data mining, which sifts through data sets in search of patterns and
relationships;
• predictive analytics, which builds models to forecast customer
behavior and other future developments;
• machine learning, which taps algorithms to analyze large data sets;
and
• deep learning, a more advanced offshoot of machine learning.
Text mining and statistical analysis software can also play a role in the big
data analytics process, as can mainstream business intelligence software
and data visualization tools. For both ETL and analytics applications,
queries can be written in MapReduce, with programming languages such
as R, Python, Scala, and SQL, the standard language for relational
databases that is supported via SQL-on-Hadoop technologies.
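As a small illustration of the predictive analytics step in Python, one of the languages named above, the sketch below fits a straight line to past observations by ordinary least squares and uses it to forecast the next value. The function names and the sample figures are hypothetical. Production work would normally rely on a statistics or machine learning library and far more data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Forecast y for a new x using a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: month number versus orders placed by a customer.
    months = [1, 2, 3, 4]
    orders = [3, 5, 7, 9]
    model = fit_line(months, orders)
    print(model, predict(model, 5))
```

The same two-step shape, fitting a model on historical data and then applying it to new inputs, carries over to the more elaborate models used in practice.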
Big data analytics uses and challenges
Big data analytics applications often include data from both internal
systems and external sources, such as weather data or demographic data
on consumers compiled by third-party information services providers. In
addition, streaming analytics applications are becoming common in big
data environments as users look to perform real-time analytics on data fed
into Hadoop systems through stream processing engines, such
as Spark, Flink and Storm.
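One common building block of the streaming analytics described above is the tumbling window, which counts events in fixed, non-overlapping time intervals. The Python sketch below shows the idea in memory only. It does not use Spark, Flink or Storm, which would handle distribution, late data and fault tolerance. The function name and the event format, (timestamp in seconds, key), are assumptions for this example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events is an iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


if __name__ == "__main__":
    # Hypothetical clickstream: (seconds since start, page visited).
    clicks = [(0, "/home"), (30, "/cart"), (59, "/home"), (60, "/home"), (125, "/cart")]
    for start, counts in sorted(tumbling_window_counts(clicks, 60).items()):
        print(start, dict(counts))
```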
[Figure not shown: the four types of big data analytics and what each is
used for.]
Early big data systems were mostly deployed on premises, particularly in
large organizations that collected, organized and analyzed massive
amounts of data. But cloud platform vendors, such as Amazon Web
Services (AWS) and Microsoft, have made it easier to set up and manage
Hadoop clusters in the cloud, as have Hadoop suppliers such as
Cloudera-Hortonworks, which supports the distribution of the big data
framework on the AWS and Microsoft Azure clouds. Users can now spin up
clusters in the cloud, run them for as long as they need and then take them
offline with usage-based pricing that doesn't require ongoing software
licenses.
Big data has become increasingly beneficial in supply chain analytics. Big
supply chain analytics uses big data and quantitative methods to
improve decision-making across the supply chain. Specifically,
big supply chain analytics expands the data sets available for analysis
beyond the traditional internal data found in enterprise resource
planning (ERP) and supply chain management (SCM) systems. It also
applies statistical methods to both new and existing data sources. The
resulting insights support better-informed and more effective supply
chain decisions.
Potential pitfalls of big data analytics initiatives include a lack of internal
analytics skills and the high cost of hiring experienced data scientists
and data engineers to fill the gaps.
Figure (TechTarget): Big data analytics involves analyzing structured and
unstructured data.

Emergence and growth of big data analytics
The term big data was first used to refer to increasing data volumes in the
mid-1990s. In 2001, Doug Laney, then an analyst at consultancy Meta
Group Inc., expanded the notion of big data to also include increases in the
variety of data being generated by organizations and the velocity at which
that data was being created and updated. Those three factors -- volume,
velocity and variety -- became known as the 3Vs of big data, a concept
Gartner popularized after acquiring Meta Group and hiring Laney in 2005.

Separately, the Hadoop distributed processing framework was launched as
an Apache open source project in 2006, planting the seeds for a clustered
platform built on top of commodity hardware and geared to run big data
applications. By 2011, big data analytics began to take a firm hold in
organizations and the public eye, along with Hadoop and various related
big data technologies that had sprung up around it.
Initially, as the Hadoop ecosystem took shape and started to mature, big
data applications were primarily the province of large internet and
e-commerce companies such as Yahoo, Google and Facebook, as well as
analytics and marketing services providers. In the ensuing years, though,
big data analytics has increasingly been embraced by retailers, financial
services firms, insurers, healthcare organizations, manufacturers, energy
companies and other enterprises.
