Data Modeling and Scale Out
Jason Stamper, 451 Research
Vladi Vexler and Paul Campaniello, ScaleBase
2
Agenda
Data Modeling and Scale Out
1. 451 Research
• Key challenges in the data landscape
• Evolution of distributed database environments
2. ScaleBase
• Pros and cons of abstracting complex database topology
• Top strategies of distributed data modeling
• Advanced data modeling and “what-if” simulations with Analysis Genie
• Scaling real apps – From need to deployment
• Demo
3. Q & A (please type questions directly into the GoToWebinar side panel)
3
Today’s Presenters
Jason Stamper
Analyst, Data Management and Analytics
- 451 Research
• Over 20 years of
experience in IT
• Formerly Editor
of Computer Business
Review & Technology
Editor at The New
Statesman
Vladi Vexler
Vice President, Tech.
& Product Marketing
- ScaleBase
• Over 15 years of experience
in software development
and product management
• Author of patents in the field
of database innovation,
dynamic data caching and
machine learning analytics
Paul Campaniello
Vice President,
Worldwide Marketing
- ScaleBase
• Over 25 years of software
marketing & sales
experience
• Held senior marketing
and sales positions at
Mendix, Lumigent, ESI,
ComBrio, Savantis and
Precise Software
4
About 451 Research
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service
providers, corporate
advisory, finance, professional services, and IT
decision makers
10,000+ senior IT professionals in our research
community
Over 52 million data points each quarter
Headquartered in New York with offices in
Boston, San Francisco, Washington, London…
Research & Data
Advisory Services
Events
5
The Challenge
Businesses and their users are facing what one might call a
perfect storm – decision-makers need insight faster than ever,
and yet IT is struggling to avoid becoming a bottleneck.
6
The Facts Speak for Themselves…
Recent survey by trade magazine Computer Business
Review: 98% (of 200 UK CIOs) admit “significant gap”
between what business expects and what IT can deliver.
7
So What Does the Business Want?
Speed
Information, not
data
Flexibility
Ease-of-use
Mobility
New ways of
working
Self-service
Scale
Collaboration
8
What Causes IT to Become a Bottleneck?
Governance
Control
Security
Budget
Legacy
Staff
9
What Have We Learned So Far?
• So far, the emergence of so-called ‘hot’ data platform and
analytics technologies has not solved the IT information
bottleneck.
• Hadoop isn’t going to save the world (and neither is
NoSQL).
• The ability to analyze large data sets, in real- or near
real-time, is only set to grow in the era of the Internet of
Things.
• IT is still critical, but it needs to enable the business to
help itself. The question is how to achieve the right blend
of usability, value-for-money and scalability.
10
A Word or Two on Hadoop Adoption
[Bar chart comparing 2012 and 2013, axis 0 to 8000, by workload: DW and DBMS, Unstructured file, Virtualized server/OS, Backup, Archive, Other, Big data/Hadoop]
Average total storage capacity (TBs), and total storage footprint
by workload illustrate the low level of adoption today
11
451 Research’s View of the ‘Total Data Approach’
12
What is Driving the Change?
Developers: Agile, REST, JSON, Schemaless, Schema-on-read, Flexible
Applications: Web, Social, Mobile, Always-on, Interactive, Local
Architecture: Cloud, Scalable, Elastic, Virtual, Distributed, Flexible
• New applications require distributed architecture
• Distributed architecture encourages new development approaches
• New development approaches demand new architecture
• Distributed architecture enables new applications
• New app requirements demand new development approaches
• New dev approaches enable new lightweight apps
13
The Database Challenge
– The traditional relational database has been stretched beyond its
normal capacity limits by the needs of high-volume, highly
distributed or highly complex applications.
– There are workarounds – such as DIY sharding – but manual,
homegrown efforts can result in database administrators being
stretched beyond their available capacity in terms of managing
complexity.
– Scalability
– Performance
– Relaxed consistency
– Agility
– Intricacy
– Necessity
Increased willingness to look for emerging alternatives
14
Scalability, and Other Challenges
• As usage of MySQL and MariaDB has grown, so has the usage
of applications that depend on MySQL and MariaDB:
– Games; Social; Customer Facing; Web; Business apps like Ad Networks
• This has highlighted a number of challenges
– Scalability of master-slave architecture
– Performance and predictability at scale
– Lower latency; greater throughput; richer apps
– User expectations rising
– Manageability of increasing database/app sprawl
• External factors driving greater complexity:
– Distributed computing architectures
– Proliferation of cloud and elasticity requirements
– Geo-distributed application requirements
– Viral success means growth can come very quickly
15
Conclusions
• The success of MySQL and MariaDB has raised scalability
concerns
• Distributed computing, proliferation of cloud, and geo-
distributed applications are adding to the complexity
• Manual sharding techniques transfer the strain from the
database to the database administrator
• MySQL and its administrators have never been
under so much strain
• Database scalability software enables users to move beyond
the limitations and complexity of DIY sharding; precisely how
data is managed with a distributed database in the cloud or on
premises is key.
Scale Out Designs
17
About ScaleBase
Distributed Database Management System
Architected for the Cloud
Simple. Reliable. Powerful.
18
Designs of Distributed MySQL Environments
Quick Scale Out
• Medium scale needs
• Multiple database replicas performing load balancing with read/write splitting
Massive Scale Out
• High scale needs
• Complete distributed database environment, with policy-based data sharding/distribution
19
Quick Scale-Out
Read/Write Splitting and Continuous Availability
[Diagram: the application connects through redirection (ip/port); the MySQL master handles reads and writes (R/W), and three MySQL replicas handle reads (R).]
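The read/write splitting shown on this slide can be sketched roughly as follows. This is a minimal illustration, not ScaleBase's implementation: the `ReadWriteSplitter` class, the endpoint names, and the rule that only plain SELECT statements count as reads are all assumptions made for the example.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql):
        # Simplifying assumption: only plain SELECT statements are reads.
        # SELECT ... FOR UPDATE takes locks, so it is sent to the master.
        statement = sql.strip().upper()
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        return next(self._replicas) if is_read else self.master


# Hypothetical endpoints, for illustration only.
splitter = ReadWriteSplitter("master:3306", ["replica1:3306", "replica2:3306"])
targets = [
    splitter.route("SELECT * FROM users WHERE id = 7"),
    splitter.route("UPDATE users SET name = 'a' WHERE id = 7"),
    splitter.route("select count(*) from users"),
]
```

Round-robin is only one possible balancing policy; a real deployment would also have to account for replication lag and failed replicas, which this sketch ignores.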
20
Massive Scale-Out
[Diagram: shards 0, 1, 2, etc.; each shard has its own master and replicas.]
21
The Right Solution for You Depends on Your Goals
• Scale (mostly) reads
• Scale (mostly) writes
• Performance of reads
– Affected by joins and scans of big tables
• Performance of writes
– Affected by I/O reads and writes, CPU and table indexes
(a growing overhead)
• Locks
• CPU/IO/RAM issues
• Load peaks
• Data growth
• Geo-distribution, special data distribution needs
Pros and Cons of
Abstracting Complex Database Topology
23
Pros of Abstracting Complex Database Topology
• Development Agility - Accelerates
your innovation speed
• Simplifies application code
• Simplifies maintenance and
reduces its costs
• Operations Efficiency – Zero
downtime for applications
• Reduces operation costs
• Better monitoring, analytics, HA,
scale, elasticity, etc.
24
Cons of Abstracting Complex Database Topology
• Additional technology component may increase complexity
• Additional layer to monitor and manage
• Additional machines to monitor and manage (possible increased opex)
• Less control at the application code level (the layer is transparent)
25
Scale Out
Methodologies
Comparison
Characteristics & Modeling in a
Distributed Database System
27
Characteristics of Distributed Table Types
• MASTER – On master shard (0) only
Site settings, Admin data tables
• GLOBAL – Full copy on all shards
Lookups, Frequently joined tables, Slow growing tables
• DISTRIBUTED-ROOT – Distribution based on a key column
User.Id
• DISTRIBUTED-CASCADED (child) – Based on parent row
User_Photos, User_Photos_Likes – depend on Users
Shards:        0            1            2            3
MASTER         Full table
GLOBAL         Full table   Full table   Full table   Full table
DISTRIBUTED    ¼ table      ¼ table      ¼ table      ¼ table
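The placement rules for the table types above can be sketched as a small function. This is an illustration under stated assumptions, not ScaleBase's API: the `TableType` enum, `shards_for_write`, integer keys, and modulo placement are all hypothetical choices made for the example.

```python
from enum import Enum

NUM_SHARDS = 4  # matches the four shards (0-3) on the slide


class TableType(Enum):
    MASTER = "master"
    GLOBAL = "global"
    DISTRIBUTED = "distributed"  # root and cascaded tables share one rule


def shards_for_write(table_type, root_key=None):
    """Return the shard numbers that must store a row of the given table type."""
    if table_type is TableType.MASTER:
        return [0]  # master shard only
    if table_type is TableType.GLOBAL:
        return list(range(NUM_SHARDS))  # full copy on every shard
    # Root rows use their own key; cascaded (child) rows use the parent's key,
    # so a user's photos land on the same shard as the user.
    # Assumption: integer keys and simple modulo placement.
    return [root_key % NUM_SHARDS]


settings_shards = shards_for_write(TableType.MASTER)
lookup_shards = shards_for_write(TableType.GLOBAL)
user_shards = shards_for_write(TableType.DISTRIBUTED, root_key=42)   # Users row, Id 42
photo_shards = shards_for_write(TableType.DISTRIBUTED, root_key=42)  # User_Photos row of user 42
```

Routing child rows by the parent's key is what keeps a join between Users and User_Photos inside a single shard.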
28
Characteristics of Distributed Queries
• ONE-DB – 1 shard, 1 node. Most efficient.
1) Any call when the data is known to be in one shard (Distributed/Master)
2) Call to a Global table (load balanced)
• ALL-DB – All shards, 1 node.
1) Aggregated reads (like map-reduce)
2) DML (writes) on Global tables
3) DDL (create, drop, alter schema)
• FULL-DB – All shards, all nodes.
Session calls (USE, SET)
• CROSS-DB – #n shards, 1 node. Least efficient, but critical.
Cross-shard conflict resolution.
Note: Not all sharding platforms support all distributed query types.
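An ALL-DB aggregated read can be pictured as a scatter-gather: each shard computes a partial result and a coordinator merges them. The sketch below assumes in-memory lists standing in for shards and a hypothetical "orders" table; the function names are invented for the example and do not come from any product.

```python
# Hypothetical per-shard data: each shard holds a slice of an "orders" table.
shards = {
    0: [{"user_id": 4, "total": 10.0}, {"user_id": 8, "total": 5.0}],
    1: [{"user_id": 1, "total": 7.5}],
    2: [{"user_id": 2, "total": 2.5}, {"user_id": 6, "total": 20.0}],
}


def shard_partial(rows):
    """'Map' step: what one shard would return for SELECT COUNT(*), SUM(total)."""
    return len(rows), sum(row["total"] for row in rows)


def all_db_aggregate(shards):
    """'Reduce' step: merge the per-shard partial results into one answer."""
    partials = [shard_partial(rows) for rows in shards.values()]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    # AVG cannot be merged by averaging per-shard averages; it has to be
    # rebuilt from the merged SUM and COUNT.
    average = total / count if count else None
    return {"count": count, "sum": total, "avg": average}


result = all_db_aggregate(shards)
```

The AVG case shows why the merge step needs to understand the aggregate: some functions combine directly (COUNT, SUM), while others must be rewritten into combinable parts.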
29
Why Is Data Modeling Important?
• DATA and LOAD – Efficient distribution of:
– DATA - all / main tables and data
– READS
– WRITES
• QUERIES
– Handle ALL-DB Queries (Map-reduce concept)
– Minimize (but support!) CROSS-DB Queries – higher performance and scale
• OPTIMIZE DEVELOPMENT with SQL ANALYTICS
– Insight into the real database usage
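One way to reason about "efficient distribution of data" is to compare how evenly candidate distribution keys would spread rows across shards. The following is a rough sketch under simple assumptions (integer keys, modulo placement, made-up sample values); it is not how any particular tool scores a policy.

```python
from collections import Counter


def shard_row_counts(keys, num_shards):
    """Count how many rows each shard would receive under modulo placement."""
    counts = Counter(key % num_shards for key in keys)
    return [counts.get(shard, 0) for shard in range(num_shards)]


def skew(counts):
    """Ratio of the fullest shard to the average shard; 1.0 is perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical comparison of two candidate keys for the same 8 rows:
# a well-spread user id versus a country code with only two distinct values.
by_user_id = shard_row_counts([1, 2, 3, 4, 5, 6, 7, 8], num_shards=4)
by_country = shard_row_counts([1, 1, 1, 1, 1, 1, 2, 2], num_shards=4)
```

Here `skew(by_user_id)` is 1.0 while `skew(by_country)` is 3.0, which suggests the low-cardinality key would overload one shard. Row counts are only one input; read and write load per key would matter as well.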
30
Data Relationships Can be Extremely Complex
Usually, scale out is applied to growing or mature apps.
How do you define an optimal data distribution policy?
Analysis Genie:
MySQL Visual Analysis &
Optimal Distribution Policy Configuration
32
ScaleBase Analysis Genie
• A tool enabling MySQL visual analysis and building an optimal data
distribution policy; Designed for DBAs, Architects & Dev. Managers
• Two-step process:
– Analysis Assistant
  – An agent that captures app/DB information, including SQL traffic and
    database metrics
  – Obfuscates, summarizes and packages the app/DB data
– Analysis Genie
  – A SaaS application that receives the Analysis Assistant (AA) package,
    presents the visual analysis and details the policy configuration
[Diagram: Analysis Assistant and Analysis Genie]
33
ScaleBase Analysis Genie
• Advanced analytics
– Schemas, data & queries
– Semantic structure analysis
– Usage, Load and Scale analytics
• Data Modeling and
Scale-out planning
– Customized for the most complex
applications
– Auto identification of optimal
data distribution policy
– Complete policy control
• Quality assurance
– Review before production
• Simulation of results
– “What-if” analysis
34
Relationship Identification
Mapping includes:
• Schema structures
• Table and column name
matching
• Query parsing and
identification of joined
tables and columns
• Statistics on the size and
access of every object
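Identifying joined tables and columns from captured queries can be approximated with a pattern match over the SQL text. This sketch is deliberately naive and is not the Analysis Genie's method: it uses a regular expression instead of a SQL parser, does not resolve table aliases, and the column names `User_Id` and `Photo_Id` are hypothetical.

```python
import re
from collections import Counter

# Matches join conditions of the form  table_a.col = table_b.col
JOIN_CONDITION = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def joined_columns(queries):
    """Count how often each pair of table.column references is equated."""
    pairs = Counter()
    for sql in queries:
        for t1, c1, t2, c2 in JOIN_CONDITION.findall(sql):
            if t1.lower() == t2.lower():
                continue  # ignore comparisons within one table
            # Sort so that a.x = b.y and b.y = a.x count as the same pair.
            pair = tuple(sorted([f"{t1}.{c1}".lower(), f"{t2}.{c2}".lower()]))
            pairs[pair] += 1
    return pairs


# Hypothetical captured SQL traffic; table names follow the earlier slide.
queries = [
    "SELECT * FROM Users JOIN User_Photos ON Users.Id = User_Photos.User_Id",
    "SELECT * FROM User_Photos, Users WHERE User_Photos.User_Id = Users.Id",
    "SELECT * FROM User_Photos JOIN User_Photos_Likes "
    "ON User_Photos.Id = User_Photos_Likes.Photo_Id",
]
relationships = joined_columns(queries)
```

Pairs that are joined most often are natural candidates for a parent/child (cascaded) relationship, since keeping them on the same shard avoids cross-shard joins.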
35
Analyzing Relationships: From Chaos to Order
Understanding
and mapping
complex
relationships
ScaleBase Genie Demo
37
MySQL Visual Analysis Demo
• Visual analysis
• Distribution policy identification and configuration
• Scale out load via data sharding (massive scale out)
ScaleBase Enterprise
Analysis Genie
Summary
39
Reading Plus
Who:
• Online education company
Problem:
• Busy season (back-to-school) was approaching and they needed a solution
that could be quickly implemented, while guaranteeing uptime
• With increasing growth, they needed to implement a scale out solution quickly
Alternatives Considered:
• A clustering technology, which proved to be infeasible due to schema
complexity and a lengthy re-architecture requirement
Solution:
• Used visual analysis to determine best scale out plan
• ScaleBase Lite for instant scale out and continuous availability
• 35 Tomcat application servers were connected to 3 ScaleBase controllers
• ScaleBase performed automated read/write splitting and load balancing
40
Next Gen SaaS ERP Company
Who:
• Inventory management
ecommerce company
• Hosted on Rackspace
(ScaleBase Partner)
Problem:
• Largest available hardware could not support workload
Alternatives Considered:
• Initially went with a “black box” solution, encountering many issues
Solution:
• Scaled out a single MySQL instance to 8 clustered shards
• On-demand growth – current workload over 20,000 TPS
– Plan to double footprint in next quarter
– Support all production customers during Black Friday & Cyber Monday
41
Scale out to unlimited users
Continuous availability
Dynamic workload optimization
Fast and simple deployment
Easily scale out a single
MySQL instance
Optimized for the Cloud
Reduces time-to-market
No changes needed to app or database
Database usage analytics
Intelligent load balancing
Centralized data management
ScaleBase
Distributed Database Management System
42
Products and Editions
Community: Limited by Deployment
Startup: Free for Qualified Candidates
Lite: Quick Scale Out
Enterprise: Massive Scale Out
Analysis Genie: Database Performance Analytics
Also available on: [partner logos]
43
How Can I Learn More?
Use visual analysis to plan your
scale out strategy
Download the
Analysis Genie:
https://www.scalebase.com/software
Read the 451 report about
ScaleBase (& the DB market)
Download Jason’s Report
(authored last week)
https://www.scalebase.com/resources/whitepapers
Questions?
Contact Info:
Paul Campaniello
paul.campaniello@scalebase.com
Vladi Vexler
vladi.vexler@scalebase.com
Resources:
www.scalebase.com
www.scalebase.com/resources
www.scalebase.com/blog
info@scalebase.com
(617) 630.2800


Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015


Editor's Notes

  1. Here is a summary of the different approaches. A more detailed description can be found on our website, under Resources -> Competitive Comparison. Explain the circles. For example, we are the only one that provides Advanced Analytics, which is the foundation for defining an optimal distribution policy. The ScaleBase solution is the simplest to deploy, enabling the shortest go-to-market and the lowest maintenance.
  2. One of the first steps is to visually analyze a complete summary of the state of your MySQL tables: physical and logical sizes, writes, reads, joins.
  3. Determine the optimal distribution policy for your specific application and database. Analyze your existing schema and queries: What is the current structure of your data? How is your data accessed by the applications? What is the size and rate of writes to individual tables?
  4. Risk; cost savings (ROI); time to market; building a solution takes years; open source is limited; not comprehensive; lack of technical support and services; custom built; inefficient and hard to maintain.