Big dataappliance hadoopworld_final
2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remain at the sole discretion of Oracle.
3. Case: On-line Ads and Content
[Architecture diagram] A low-latency, real-time path looks up the user profile in the NoSQL DB (adding the user if not present) to determine the best ad to place on the page for this user. Web logs of the ads actually served and the user's browsing flow into HDFS, where high-scale batch data reductions feed BI, analytics, and billing. The resulting predictions and profiles are written back to the NoSQL DB as expert input into the system.
4. Agenda
• Big Data Technology
• Oracle Big Data Appliance
• Big Data Applications
• Summary
• Q&A
6. Big Data: Infrastructure Requirements
• Acquire
  • Low, predictable latency
  • High transaction volume
  • Flexible data structures
• Organize
  • High throughput
  • In-place preparation
  • All data sources/structures
• Analyze
  • Deep analytics
  • Agile development
  • Massive scalability
  • Real-time results
12. Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 servers
  – 48 GB memory per node = 864 GB memory
  – 12 Intel cores per node = 216 cores
  – 24 TB storage per node = 432 TB storage
• 40 Gb/sec InfiniBand
• 10 Gb/sec Ethernet
13. Big Data Appliance
Cluster of industry-standard servers for Hadoop and NoSQL Database
• Focus on scalability and availability at low cost
InfiniBand Network
• Redundant 40 Gb/s switches
• IB connectivity to Exadata
10GigE Network
• 8 10GigE ports
• Datacenter connectivity
Compute and Storage
• 18 high-performance, low-cost servers acting as Hadoop nodes
• 24 TB capacity per node
• 2 six-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple replication
14. Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch
15. Oracle Big Data Appliance Software
• Oracle Linux 5.6
• Java HotSpot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
16. Why Open-Source Apache Hadoop?
• Fast evolution in critical features
  • Built by the Hadoop experts in the community
  • Practical instead of esoteric
  • Focus on what is needed for large clusters
• Proven at very large scale
  • In production at all the large consumers of Hadoop
  • Extremely stable in those environments
  • Well-understood by practitioners
17. Software Layout
• Node 1:
  • M: Name Node, Balancer & HBase Master
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 2:
  • M: Secondary Name Node, Management, Zookeeper, MySQL Slave
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 3:
  • M: JobTracker, MySQL Master, ODI Agent, Hive Server
  • S: HDFS Data Node, NoSQL DB Storage Node
• Nodes 4–18:
  • S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  • Your MapReduce runs here!
18. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  • Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  • Analyze all your data
• Easy to Deploy
  • Risk-free, quick installation and setup
• Single Vendor Support
  • Full Oracle support for the entire system and software set
20. Key-Value Store Workloads
• Large, dynamic, schema-based data repositories
• Data capture
  • Web applications
  • Online retail
  • Sensor/statistics/network capture/mobile devices
• Data services
  • Scalable authentication
  • Real-time communication (MMS, SMS, routing)
  • Personalization/localization
  • Social networks
21. Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
  • Key-value pair with major+sub-key paradigm
  • Read/insert/update/delete operations
• Scalability
  • Dynamic data partitioning and distribution
  • Optimized data access via intelligent driver
• High availability
  • One or more replicas
  • Disaster recovery through location of replicas
  • Resilient to partition master failures
  • No single point of failure
• Transparent load balancing
  • Reads from master or replicas
  • Driver is network topology & latency aware
[Diagram: applications go through the NoSQL DB driver to storage nodes replicated across Data Center A and Data Center B]
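To make the major+sub-key data model concrete, below is a minimal sketch of a put and a get using the oracle.kv Java driver; the store name, helper host:port, and key paths are placeholder assumptions, not values from this deck.

```java
import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class MajorSubKeyExample {
    public static void main(String[] args) {
        // Connect through the intelligent driver; "kvstore" and
        // "node01:5000" are placeholders for a real deployment.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node01:5000"));

        // The major path identifies the record owner (a user); the
        // minor (sub) path selects one component of that record.
        Key key = Key.createKey(
                Arrays.asList("user", "42"),   // major path
                Arrays.asList("profile"));     // minor path

        // Insert or update the value stored under this key.
        store.put(key, Value.createValue("segment=sports".getBytes()));

        // Read it back; the driver routes the request to a master
        // or replica according to the consistency policy in effect.
        ValueVersion vv = store.get(key);
        if (vv != null) {
            System.out.println(new String(vv.getValue().getValue()));
        }
        store.close();
    }
}
```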
22. Resolving a Request
A client request carries Operation + Key[M,m] + Value + Transaction Policy. The driver resolves it as follows:
• Hash the major key to determine the partition id
• Use the Partition Map to map the partition id to a rep group
• Use the State Table to determine the eligible Storage Node(s) within the rep group
• Use the Load Balancer to select the best eligible rep node
• Contact the rep node directly
The response carries the operation result plus refreshed routing state: a new Partition Map and rep node / storage table information.
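The routing steps above can be pictured with a small schematic sketch; every type and method below is invented for illustration and does not correspond to the actual oracle.kv driver internals.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of the driver-side routing steps (all names hypothetical).
public class RequestRouter {

    private final int numPartitions;
    private final Map<Integer, List<String>> partitionMap; // partition id -> rep group nodes
    private final Map<String, Boolean> stateTable;         // node -> currently eligible?

    public RequestRouter(int numPartitions,
                         Map<Integer, List<String>> partitionMap,
                         Map<String, Boolean> stateTable) {
        this.numPartitions = numPartitions;
        this.partitionMap = partitionMap;
        this.stateTable = stateTable;
    }

    public String route(String majorKey) {
        // 1. Hash the major key to a partition id.
        int partitionId = Math.floorMod(majorKey.hashCode(), numPartitions);

        // 2. Partition Map: partition id -> replication group.
        List<String> repGroup = partitionMap.get(partitionId);

        // 3. State Table: keep only the eligible nodes in the group.
        List<String> eligible = repGroup.stream()
                .filter(node -> stateTable.getOrDefault(node, false))
                .collect(Collectors.toList());

        // 4. Load balancer: pick the "best" eligible node (trivially,
        //    the first here); the client then contacts it directly.
        return eligible.get(0);
    }
}
```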
23. ACID Transactions
Transaction policies are configurable per operation, and the application can set defaults.
Write Durability
• Write transaction durability consists of both:
  a) Sync policy (on master and replica)
    • Sync – force to disk
    • Write No Sync – force to OS buffer
    • No Sync – write to local log buffer, flush when convenient
  b) Replica acknowledgement policy
    • All
    • Simple Majority
    • None
Read Consistency
• Read consistency is specified as Absolute, Time-based, Version, or None
  • Absolute – read from the master
  • Time-based – read from any replica that is within <time-interval> of the master or better
  • Version – read from any replica that is current with <transaction-token> or higher
  • None – read from any replica
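As a sketch of how these per-operation policies might be set with the oracle.kv Java driver (store name, host:port, key paths, and timeout values are placeholder assumptions):

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import oracle.kv.Consistency;
import oracle.kv.Durability;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class PolicyExample {
    public static void main(String[] args) {
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node01:5000")); // placeholders

        Key key = Key.createKey(Arrays.asList("user", "42"),
                                Arrays.asList("profile"));

        // Write durability: Sync on the master, No Sync on replicas,
        // acknowledged by a simple majority of replicas.
        Durability durability = new Durability(
                Durability.SyncPolicy.SYNC,      // master sync policy
                Durability.SyncPolicy.NO_SYNC,   // replica sync policy
                Durability.ReplicaAckPolicy.SIMPLE_MAJORITY);

        store.put(key, Value.createValue("v1".getBytes()),
                  null,                 // don't return the previous value
                  durability,
                  5, TimeUnit.SECONDS); // request timeout

        // Time-based read consistency: any replica within 2 seconds
        // of the master is acceptable for this read.
        Consistency consistency = new Consistency.Time(
                2, TimeUnit.SECONDS,    // permissible replica lag
                5, TimeUnit.SECONDS);   // timeout

        ValueVersion vv = store.get(key, consistency, 5, TimeUnit.SECONDS);
        System.out.println(vv != null ? "read ok" : "not found");
        store.close();
    }
}
```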
24. Oracle NoSQL DB Differentiation
• Commercial-grade software and support
  • General purpose
  • Reliable – based on proven Berkeley DB JE HA
  • Easy to install and configure
  • Scalable throughput, bounded latency
• Simple programming and operational model
  • Simple major+sub-key and value data structure
  • ACID transactions
  • Configurable consistency & durability
• Easy management
  • Web-based console, API accessible
  • Manages and monitors: topology, load, performance, events, alerts
• Completes Oracle's large-scale data storage offerings
25. Try NoSQL Database on OTN
Oracle NoSQL Database:
• Community Edition is available as a software-only distribution
• Enterprise Edition is available as a separately licensable product or as part of the Big Data Appliance
38. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  • Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  • Analyze all your data
• Easy to Deploy
  • Risk-free, quick installation and setup
• Single Vendor Support
  • Full Oracle support for the entire system and software set
39. Big Data Appliance and Exadata
Big Data for the Enterprise
[Diagram: NoSQL DB and HDFS/Hadoop on the Big Data Appliance feeding the RDBMS on Exadata]
Benefits for Online Mode: no need to write to disk after the Hadoop job, and simpler management for use cases with lots of nodes generating output files. Benefits for Offline Mode (DP files): the import operation can be parallelized in the database, and it is the fastest option for external tables.
Direct HDFS: access data on HDFS through the external table mechanism. Benefits: data on HDFS can be queried from the database and imported into the database as needed.
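Both loader modes and the external-table mechanism ultimately stream files out of HDFS; for comparison, here is a minimal sketch of reading an HDFS file directly with the standard Hadoop FileSystem API (the namenode URI and file path are placeholder assumptions):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Namenode URI and file path are placeholders.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:8020"), new Configuration());

        Path path = new Path("/data/weblogs/part-00000");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // one record per line
            }
        }
        fs.close();
    }
}
```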