This document provides an introduction to key concepts in big data, including:
- The three types of data are structured, semi-structured, and unstructured, with unstructured data making up 80-90% of organizational data.
- Common big data technologies and concepts discussed include relational databases, data mining algorithms, UIMA, ERP, CRM, data lakes, and the CAP theorem.
- Companies should leverage big data to look at newer architectures, tools, and practices in order to gain insights from large and diverse datasets.
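The three types of data can be illustrated with a minimal Python sketch. The employee records, field names, and the helper `fields_per_record` are hypothetical and not taken from any particular system. Structured rows share one fixed schema. Semi-structured JSON records carry their own keys, and those keys can differ from record to record. Unstructured text has no data model at all.

```python
import json

# Structured: every row follows one predefined schema (same fields, same order).
SCHEMA = ("emp_id", "name", "salary")
structured_rows = [
    ("E001", "Asha", 52000),
    ("E002", "Ravi", 48000),
]

# Semi-structured: records are self-describing (keys/tags) but need not share a schema.
semi_structured = json.loads("""
[
  {"emp_id": "E001", "name": "Asha", "skills": ["SQL", "Python"]},
  {"emp_id": "E002", "name": "Ravi", "manager": "E001"}
]
""")

# Unstructured: free text with no data model; any structure must be extracted later.
unstructured = "Asha joined in March and has been mentoring Ravi on SQL tuning."


def fields_per_record(records):
    """Return the sorted field names of each semi-structured record."""
    return [sorted(record.keys()) for record in records]


if __name__ == "__main__":
    # Structured rows map directly onto the fixed schema.
    print([dict(zip(SCHEMA, row)) for row in structured_rows])
    # Semi-structured records expose their own, differing, sets of keys.
    print(fields_per_record(semi_structured))
    # Unstructured text yields only raw tokens until it is analysed further.
    print(len(unstructured.split()))
```

A program can consume the structured rows directly. It has to inspect each semi-structured record's keys before using them. The free text needs further processing before any fields can be extracted.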
1 Within the firewalls of the enterprise, data is present in homogeneous sources as well as in heterogeneous sources.
2 Digital data can be classified into structured, semi-structured, and unstructured data.
3 Unstructured data makes up 80-90% of an organization's data.
4 Data that does not conform to a data model but has some structure is called semi-structured data.
5 Structured data is in an organized form and can be easily used by a computer program.
6 Most enterprise data has traditionally been stored in relational databases.
7 Data held in an RDBMS is typically structured data.
8 Structured data can be generated from objects and classes.
9 Data with a predefined schema is called structured data.
10 The number of tuples in a relation is called the cardinality of the relation.
Big Data and Analytics - 1
11 The number of columns is referred to as the degree of a relation.
12 ACID stands for Atomicity, Consistency, Isolation, Durability.
13 DML stands for Data Manipulation Language.
14 SOAP stands for Simple Object Access Protocol.
15 JSON stands for JavaScript Object Notation.
16 Popular data mining algorithms include association rule mining, regression analysis, and collaborative filtering.
17 UIMA stands for Unstructured Information Management Architecture.
18 Relational databases evolved in the 1980s and 1990s.
19 Today’s big data becomes tomorrow’s normal.
20 The 3 Vs concept was proposed by analyst Doug Laney in 2001 (then at META Group, which Gartner later acquired).
21 A terabyte is 1024^4 bytes (the binary convention, strictly a tebibyte; the SI terabyte is 10^12 bytes).
22 ERP stands for Enterprise Resource Planning.
23 CRM stands for Customer Relationship Management.
24 Companies should consider leveraging big data in order to look at newer architectures, tools, and practices.
25 The variability characteristic of data explains spikes in the data.
26 Volatility is the characteristic of data dealing with its retention.
27 Near-real-time or real-time processing deals with the velocity characteristic of data.
28 A data lake is a large data repository that stores data in its native format until it is needed.
29 Under variability, data flow can be highly inconsistent, with periodic peaks.
30 Data science is the science of extracting knowledge from data using statistical and mathematical techniques.
31 Raw data is just “raw”: it is unsuitable for analysis.
32 Accessing data from non-volatile storage such as a hard disk is a slow process.
33 In-database processing is also called in-database analytics.
34 Symmetric multiprocessor (SMP) systems are tightly coupled multiprocessor systems.
35 Distributed data systems are known to be loosely coupled and are composed of individual machines.
36 The CAP theorem is also called Brewer’s theorem.
37 Consistency implies that every read fetches the last write.
38 Availability implies that reads and writes always succeed.
39 Partition tolerance implies that the system will continue to function when a network partition occurs.
40 A shared-nothing architecture provides the benefit of fault isolation.
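Items 10 and 11 define the cardinality (number of tuples) and degree (number of columns) of a relation. The minimal sketch below uses Python's built-in `sqlite3` module with a hypothetical `student` table. `COUNT(*)` gives the cardinality. SQLite's `PRAGMA table_info` returns one row per column, so counting its rows gives the degree.

```python
import sqlite3

# Hypothetical relation used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student (roll_no, name, dept) VALUES (?, ?, ?)",
    [(1, "Meera", "CSE"), (2, "Arjun", "ECE"), (3, "Kiran", "CSE"), (4, "Divya", "ME")],
)


def cardinality(conn, table):
    """Number of tuples (rows) in the relation."""
    # Table names cannot be bound as SQL parameters; use trusted, hard-coded names only.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def degree(conn, table):
    """Number of attributes (columns) in the relation."""
    # PRAGMA table_info returns one row per column of the table.
    return len(conn.execute(f"PRAGMA table_info({table})").fetchall())


print(cardinality(conn, "student"), degree(conn, "student"))  # 4 3
```

Inserting more rows changes the cardinality but not the degree. The degree changes only when the schema itself changes.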
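Atomicity, the A in ACID (item 12), means a transaction either applies completely or not at all. The sketch below assumes a hypothetical `account` table in an in-memory SQLite database. A transfer that would drive a balance below zero violates a `CHECK` constraint. Both updates run inside one transaction, so the credit already applied to the other account is rolled back as well.

```python
import sqlite3


def make_db():
    """Create a hypothetical two-account database; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # commit if the block succeeds, roll back if an exception escapes
            # Credit first, so a failed debit leaves a partial change the rollback must undo.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    """Current balances as a dict mapping account id to balance."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


db = make_db()
print(transfer(db, "A", "B", 30), balances(db))   # True {'A': 70, 'B': 80}
print(transfer(db, "A", "B", 500), balances(db))  # False {'A': 70, 'B': 80}
```

The `with conn:` block relies on the `sqlite3` connection's context-manager behavior. It commits when the block succeeds and rolls back when an exception escapes. The failed transfer therefore leaves no trace of its first update.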
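Association rule mining (item 16) scores rules of the form A => C with two measures. Support is how often the items occur together. Confidence is how often C appears given A. The sketch below computes only these two measures for one rule over a hypothetical list of shopping baskets. It is not a full algorithm such as Apriori, which would also search for the frequent itemsets.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    # Guard against rules whose antecedent never occurs.
    return joint / base if base else 0.0


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
```

Here {bread, milk} appears in 2 of 4 baskets, giving a support of 0.5. Bread appears in 3 baskets, so the rule bread => milk has a confidence of 2/3.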
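Item 21 uses the binary convention for a terabyte. A quick Python check of the arithmetic shows how far 1024^4 bytes is from the SI terabyte of 10^12 bytes. The gap is roughly 10%, which matters when comparing storage figures that use different conventions.

```python
KiB = 1024
TiB = KiB ** 4       # binary "terabyte" (strictly a tebibyte), 2**40 bytes
TB_SI = 10 ** 12     # SI terabyte

print(TiB)                    # 1099511627776
print(TiB - TB_SI)            # 99511627776
print(f"{TiB / TB_SI:.4f}")   # 1.0995
```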