Big Data
Prepared by,
SHADHA K
Assistant Professor (IT)
MESCE
Big Data
• Global data volumes are growing at an unprecedented rate.
• These enormous datasets are what we call big data.
Gartner’s Big Data Definition
• “Big data” is high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making.
Part One: 3Vs
• Companies are digging out amazing insights from text, locations or log
files.
• Elevator logs help to predict vacated real estate, shoplifters tweet
about stolen goods right next to the store, emails contain
communication patterns of successful projects.
• Most of this data already belongs to organizations, but it is sitting
there unused — that’s why Gartner calls it dark data. Similar to dark
matter in physics, dark data cannot be seen directly, yet it is the bulk
of the organizational universe.
Part Two: Cost-Effective, Innovative Forms of
Information Processing
• To sort out what can indeed be solved by the new technologies.
• Different technologies can be combined.
• Technology capabilities to store and process unstructured data
• To link data of various types, origins and rates of change
• Comprehensive analysis has become possible for many organizations,
rather than for a select few.
• Expect cost-effective and appropriate answers to your problems.
Part Three: Enhanced Insight and Decision Making
Big data is not just a big pile of data; it also holds hidden
patterns and useful knowledge that can be of high business value.
But because processing big data requires an investment of resources,
a preliminary inquiry should be made into its potential for value
discovery, or else our efforts could be in vain.
Advanced Analytics
• Deeper insights
• All data
• Broader insights
• New data sources
• Frictionless actions
• Increased reliability and accuracy
Big Data Business Models
Challenges of Big Data
• Data Representation
• Many datasets exhibit some level of heterogeneity in type, structure,
semantics, organization, granularity, and accessibility.
• Data representation aims to make data more meaningful for computer
analysis and user interpretation.
• Improper data representation will reduce the value of the original data and
may even obstruct effective data analysis.
• Efficient data representation should reflect the structure, class, and type
of the data, as well as integrated technologies, so as to enable efficient
operations on different datasets.
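As a small illustration of the representation problem, here is a minimal sketch (all field names and record formats are invented for illustration) that maps records of different structure and granularity onto one common schema, so a single analysis can consume both:

```python
from datetime import datetime, timezone

# Hypothetical raw records from two sources that differ in
# structure, semantics, and granularity.
sensor_reading = {"ts": 1700000000, "temp_c": 21.5}           # machine log
support_ticket = {"date": "2023-11-14", "text": "AC broken"}  # human text

def to_common_schema(record):
    """Normalize heterogeneous records into one shared representation:
    an ISO-8601 timestamp, a source kind, and a typed payload."""
    if "ts" in record:  # epoch seconds from a sensor
        when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        return {"time": when.isoformat(), "kind": "sensor",
                "payload": {"temperature_c": record["temp_c"]}}
    # calendar date from a ticketing system
    when = datetime.strptime(record["date"], "%Y-%m-%d")
    return {"time": when.isoformat(), "kind": "ticket",
            "payload": {"text": record["text"]}}

for r in (sensor_reading, support_ticket):
    print(to_common_schema(r))
```

The point is not this particular schema but that the common form preserves the semantics (time, source, payload) that each source expresses differently.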
Challenges of Big Data
• Redundancy Reduction and Data Compression
• Datasets often contain a high level of redundancy.
• Redundancy reduction and data compression are effective ways to reduce the
indirect cost of the entire system, provided the potential value of the data
is not affected.
• For example, most data generated by sensor networks is highly redundant and
may be filtered and compressed by orders of magnitude.
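To make the sensor example concrete, a minimal sketch (the threshold and readings are invented): drop readings that barely differ from the last kept one, then compress what remains with the standard zlib library:

```python
import json
import zlib

# Hypothetical, highly redundant sensor stream (values barely change).
readings = [20.00, 20.01, 20.00, 20.02, 23.50, 23.51, 23.49, 20.00]

def filter_redundant(values, epsilon=0.1):
    """Keep a reading only when it differs from the last kept one
    by more than epsilon (a simple dead-band filter)."""
    kept = [values[0]]
    for v in values[1:]:
        if abs(v - kept[-1]) > epsilon:
            kept.append(v)
    return kept

filtered = filter_redundant(readings)           # -> [20.0, 23.5, 20.0]
raw = json.dumps(readings).encode()
packed = zlib.compress(json.dumps(filtered).encode())
print(len(raw), "bytes raw ->", len(packed), "bytes filtered+compressed")
```

On real sensor streams the dead-band filter alone often removes most samples, and lossless compression then shrinks the rest, without touching the analytically useful changes.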
Challenges of Big Data
• Data Life Cycle Management
• Sensors are generating data at unprecedented rates and scales.
• Current storage systems cannot support such massive data.
• Values hidden in big data depend on data freshness.
• Therefore, an importance principle related to the analytical value should be
developed to decide which data shall be stored and which data shall be
discarded.
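One hedged reading of this importance principle is a retention score that combines analytical value with data freshness; in the sketch below the decay rate, weights, and threshold are all assumptions:

```python
import time

def importance(record, now=None, half_life_s=86_400.0):
    """Toy importance score: analytical value decays with age.
    The exponential decay and one-day half-life are assumptions."""
    now = now if now is not None else time.time()
    age_s = now - record["created_at"]
    freshness = 0.5 ** (age_s / half_life_s)  # halves every half_life_s
    return record["value"] * freshness

def decide_retention(records, threshold=0.3):
    """Store records whose importance clears the threshold; discard the rest."""
    return [r for r in records if importance(r) >= threshold]

now = time.time()
records = [
    {"id": 1, "value": 1.0, "created_at": now - 3_600},    # an hour old
    {"id": 2, "value": 0.9, "created_at": now - 604_800},  # a week old
]
print([r["id"] for r in decide_retention(records)])  # -> [1]
```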
Challenges of Big Data
• Data Confidentiality
• Most big data service providers cannot effectively maintain and analyze
such huge datasets because of their limited capacity.
• They must rely on professionals or tools to analyze the data, which
increases the potential safety risks.
• For example, a transactional dataset generally includes a set of complete
operating data that drives key business processes. Such data contains details
at the lowest granularity and some sensitive information such as credit card
numbers. Therefore, analysis of big data may be delivered to a third party for
processing only when proper preventive measures are taken to protect the
sensitive data and ensure its safety.
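Following the credit-card example, here is a minimal masking sketch of what such a preventive measure might look like before records leave for a third party (the field names and masking policy are assumptions, not a complete protection scheme):

```python
import hashlib

def mask_record(record):
    """Mask sensitive fields before handing data to a third party:
    keep only the last four card digits, and replace the customer id
    with a one-way hash so records stay linkable but not identifiable."""
    masked = dict(record)
    masked["card_number"] = "*" * 12 + record["card_number"][-4:]
    masked["customer_id"] = hashlib.sha256(
        record["customer_id"].encode()).hexdigest()[:16]
    return masked

txn = {"customer_id": "C-1042", "card_number": "4111111111111111",
       "amount": 59.90}
print(mask_record(txn))
```

In practice a keyed hash or a tokenization service would replace the plain SHA-256 here, since low-entropy identifiers can be brute-forced from an unkeyed hash.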
Challenges of Big Data
• Energy Management
• With the increase of data volume and analytical demands, the processing,
storage, and transmission of big data will inevitably consume more and more
electric energy.
• Therefore, system-level power consumption control and management
mechanisms shall be established for big data while expandability and
accessibility are both ensured.
Challenges of Big Data
• Expandability and Scalability
• The analytical system of big data must support present and future datasets.
• The analytical algorithms must be able to process ever-expanding and
increasingly complex datasets.
Challenges of Big Data
• Cooperation
• Big data analysis is interdisciplinary research, which requires experts in
different fields to cooperate in harvesting its potential.
• A comprehensive big data network architecture must be established to help
scientists and engineers in various fields access different kinds of data and
fully utilize their expertise, so as to cooperate to complete the analytical
objectives.
Related Technologies
• Cloud Computing
• IoT
• Data Center
• Hadoop
Cloud Computing
• Cloud services allow individuals and businesses to use software and
hardware that are managed by third parties at remote locations.
• A reliable hardware infrastructure is critical to providing reliable storage.
• Cloud computing evolved from:
• Distributed Computing
• Parallel Computing
• Grid Computing
Big Data Generation
• Enterprise Data
• Production data
• Inventory data
• Sales data
• Financial data
Big Data Generation
• IoT Data
Features
• Large-Scale Data
• Heterogeneity
• Strong Time and Space Correlation
• Effective Data Accounts for Only a Small Portion of the Big Data
Big Data Generation
• Internet Data
• searching entries
• Internet forum posts
• chatting records
• microblog messages
• Bio-medical Data
Big Data Acquisition
• Data Collection
• Data Transmission
• Data Pre-processing
Data Collection
• Log Files
• Sensors
• Mobile equipment
• Methods for Acquiring Network Data
• web crawler (a minimal sketch follows this list)
• word segmentation system
• task system
• index system
• Libpcap-Based Packet Capture Technology
• Zero-Copy Packet Capture Technology
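Since the list above starts with the web crawler, here is a hedged, stdlib-only sketch of the breadth-first fetch-parse-enqueue loop such a crawler runs (the start URL and page limit are placeholders; a production crawler would also respect robots.txt and rate limits):

```python
from urllib.request import urlopen, Request
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect href attributes from <a> tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    """Breadth-first crawl: fetch a page, extract its links, enqueue them."""
    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            req = Request(url, headers={"User-Agent": "demo-crawler"})
            html = urlopen(req, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue  # skip unreachable pages and malformed URLs
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        queue.extend(urljoin(url, link) for link in parser.links)
    return pages

print(len(crawl("https://example.com")), "pages fetched")
```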
Data Transmission
• Data will be transferred to a data storage infrastructure for processing and
analysis
• Inter-DCN transmissions
• Intra-DCN transmissions
Inter-DCN transmissions
• Transmissions are from data source to data center
• Achieved with the existing physical network infrastructure
• IP-based wavelength division multiplexing
• Orthogonal frequency-division multiplexing
Intra-DCN transmissions
• Data communication flows within data centers
• Relies on physical connection plates, chips, internal memories of data
servers, network architectures of data centers, and communication protocols
Data Pre-processing
• The collected datasets vary with respect to noise, redundancy, and
consistency.
• Integration
• Cleaning
• Data cleaning is a process to identify inaccurate, incomplete, or unreasonable data, and
then modify or delete such data to improve data quality.
• Searching and identifying errors
• Redundancy Elimination
• Compression
• Repeated data deletion (deduplication)
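A hedged end-to-end sketch of the steps above using pandas (the column names, value ranges, and sample data are invented): integrate two sources, clean incomplete and unreasonable values, then delete repeated records:

```python
import pandas as pd

# Hypothetical records from two sources to be integrated.
source_a = pd.DataFrame({"id": [1, 2, 3], "temp_c": [21.5, None, 23.0]})
source_b = pd.DataFrame({"id": [3, 4], "temp_c": [23.0, 400.0]})

# Integration: combine the datasets into one frame.
df = pd.concat([source_a, source_b], ignore_index=True)

# Cleaning: identify incomplete or unreasonable data, then delete it
# (here: missing readings and values outside an assumed plausible range).
df = df.dropna(subset=["temp_c"])
df = df[df["temp_c"].between(-50, 60)]

# Redundancy elimination: repeated data deletion.
df = df.drop_duplicates()

print(df)  # rows 1 and 3 survive; the duplicate of row 3 is gone
```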