This document discusses PowerBI and R. It provides an overview of Microsoft R products including Microsoft R Open, Microsoft R Server, and SQL Server R Services. It explains how SQL Server R Services integrates R with SQL Server for scalable in-database analytics. Examples of using R with PowerBI, SQL Server, and Azure are provided. The document also compares the capabilities of Microsoft R Open, Microsoft R Server, and open source R and discusses using R for advanced analytics, predictive modeling, and big data at scale.
20160317 - PAZUR - PowerBI & R
1. PowerBI & R
Łukasz Grala
Architect Data Platform & Advanced Analytics & BI Solutions
Data Platform MVP
Uniwersytet Ekonomiczny w Poznaniu
2016-03-17
2. @Łukasz Grala – lukasz@tidk.pl
• Architect of Data Platform, Business Intelligence, and Advanced Analytics solutions at TIDK
• Microsoft Certified Trainer and university lecturer
• Author of advanced training courses and workshops, and of numerous publications and webcasts
• Recipient of the Microsoft Data Platform MVP award since 2010
• PhD student at Politechnika Poznańska, Faculty of Computing (databases, data mining, machine learning)
• Speaker at numerous conferences in Poland and abroad
• Holds numerous certifications (MCT, MCSE, MCSA, MCITP, …)
• Member of Polskie Towarzystwo Informatyczne (Polish Information Processing Society)
• Member and leader of the Polish SQL Server User Group (PLSSUG)
• Passionate about data analysis, storage, and processing; jazz enthusiast
4. Gartner MQ
• BI and Analytics Platform
• Advanced Analytics
• Data Warehouse
5. New BI Solutions
• ETL approach: an ETL tool (SSIS, etc.) extracts the original data, transforms it, and loads the transformed data into the EDW (SQL Server, Teradata, etc.), which feeds the BI tools
• Data lake approach: original data, including streaming data, is ingested (EL) into scale-out storage & compute (HDFS, Blob Storage, etc.), the data lake(s), and then transformed & loaded into data marts
• Consumers: BI tools, dashboards, apps
6. Microsoft R Products
Microsoft R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
7. Microsoft R Products
SQL Server R Services
• Built-in advanced analytics and standalone server capability
• Leverages the benefits of SQL Server 2016 Enterprise Edition
8. Microsoft R Server
• Microsoft R Server for Red Hat Linux
• Microsoft R Server for SUSE Linux
• Microsoft R Server for Teradata DB
• Microsoft R Server for Hadoop on Red Hat
9. Introducing SQL Server 2016 R Services
• Enterprise speed and performance
• Near-DB analytics
• Parallel threading and processing
• Model on-premises, store in cloud, or vice versa
• Hybrid memory and disk scalability
• Not bound by memory limits, enabling larger datasets
• Included in SQL Server 2016
• Reuse and optimize existing R code
• Eliminate data movement across machines
• Write once, deploy anywhere
10. Scalable in-database analytics
Data Scientist
• Interacts directly with data
• Creates models and experiments
Data Analyst/DBA
• Manages data and analytics together
Example Solutions
• Fraud detection
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
[Diagram: Relational Data; Extensibility; R Integration; Analytic Library (Open Source R, Revolution PEMA); T-SQL Interface]
How is it Integrated?
• T-SQL calls a stored procedure
• The script is run in SQL Server through the extensibility model
• Result sets are sent through a Web API to the database or applications
Benefits
• Faster deployment of ML models
• Less data movement, faster insights
• Work with large datasets: mitigate R memory and scalability limitations
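The integration steps above can be sketched from the client side. In SQL Server 2016 the stored procedure in question is the system procedure `sp_execute_external_script`, which takes the R script and an input query as parameters. The sketch below only builds the T-SQL text and does not connect to a server. The helper name `build_r_services_call`, the table `dbo.Flights`, and the column `ArrDelay` are hypothetical and not taken from the slides. It also assumes that external scripts have been enabled on the instance.

```python
def build_r_services_call(r_script: str, input_query: str) -> str:
    """Return a T-SQL batch that runs r_script in-database over input_query.

    Hypothetical helper for illustration: it only assembles text.
    """
    def quote(text: str) -> str:
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXECUTE sp_execute_external_script\n"
        "    @language = N'R',\n"
        f"    @script = {quote(r_script)},\n"
        f"    @input_data_1 = {quote(input_query)};"
    )


if __name__ == "__main__":
    # InputDataSet and OutputDataSet are the default data frame names that
    # R Services uses for the query result and the returned result set.
    r_script = (
        "OutputDataSet <- data.frame("
        "mean_delay = mean(InputDataSet$ArrDelay, na.rm = TRUE))"
    )
    # dbo.Flights and ArrDelay are assumed names, not from the deck.
    print(build_r_services_call(r_script, "SELECT ArrDelay FROM dbo.Flights"))
```

The printed batch can be pasted into any T-SQL client. Because the query runs where the data lives, only the small result set leaves the server, which is the "less data movement" benefit listed above.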
16. Multi-threaded performance
• Multithreaded library replaces standard BLAS/LAPACK algorithms
• Intel MKL on Windows/Linux; Accelerate on Mac
• High-performance algorithms
• Sequential to parallel
• Uses as many threads as there are available cores
• No need to change any R code
• Included with RRO binary distributions
17. ScaleR - Performance comparison
Microsoft R Server does not limit data size to the amount of available RAM. Open source R fails when a data set exceeds RAM. In contrast, Microsoft R Server scales linearly well beyond RAM limits, and its parallel algorithms are much faster.
Benchmark setup:
• US flight data for 20 years
• Linear regression on arrival delay
• Run on a 4-core laptop with 16 GB RAM and a 500 GB SSD
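One general way a regression can run on data larger than RAM is to read the data in chunks and keep only a few running sums per chunk. The sketch below illustrates that idea for a single predictor. It is not ScaleR's implementation, and the function names and chunk size are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # (predictor x, response y)


def read_in_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most chunk_size rows."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_linear_fit(rows: Iterable[Row], chunk_size: int = 1000) -> Tuple[float, float]:
    """Fit y = intercept + slope * x while holding only one chunk in memory."""
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for chunk in read_in_chunks(rows, chunk_size):
        # A chunk contributes only to these running sums,
        # so it can be discarded once it has been processed.
        n += len(chunk)
        for x, y in chunk:
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y
    denom = n * sum_xx - sum_x * sum_x
    if n < 2 or denom == 0:
        raise ValueError("need at least two rows with distinct x values")
    # Closed-form least-squares solution from the accumulated sums.
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope
```

Passing a generator (for example, one that parses a file line by line) keeps memory use proportional to `chunk_size` rather than to the data set. Because the sums from different chunks simply add up, the same structure also allows chunks to be processed in parallel.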
18. Microsoft R Server: "Write Once. Deploy Anywhere."
Components: DistributedR, ScaleR, ConnectR, DevelopR
Distributed R - Model development and model compute choice
Code portability across platforms
• In the Cloud: Azure Marketplace
• Workstations & Servers: Linux, Windows
• EDW: Teradata
• Hadoop: Hortonworks, Cloudera, MapR
Roadmap
• + HD Insights
• + Hadoop Spark
• + R Tools for Visual Studio
• + Azure ML
• + SQL Server v16
19. CRAN, MRO, MRS Comparison
Data size
• CRAN R: in-memory
• Microsoft R Open: in-memory
• Microsoft R Server: in-memory or disk based
Speed of analysis
• CRAN R: single threaded
• Microsoft R Open: multi-threaded
• Microsoft R Server: multi-threaded, parallel processing on 1:N servers
Support
• CRAN R: community
• Microsoft R Open: community
• Microsoft R Server: community + commercial
Analytic breadth & depth
• CRAN R: 7500+ innovative analytic packages
• Microsoft R Open: 7500+ innovative analytic packages
• Microsoft R Server: 7500+ innovative packages + commercial parallel high-speed functions
Licence
• CRAN R: open source
• Microsoft R Open: open source
• Microsoft R Server: commercial license; supported release with indemnity
20. CRAN R compared to Microsoft R Open
• More efficient, multi-threaded math computation
• Benefits math-intensive processing
• No benefit to program logic and data transformation
• Matrix calculation – up to 27x faster
• Matrix functions – up to 16x faster
• Program logic – no speedup
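Since only the math-heavy part of a script gets faster, the gain for a whole script depends on how much of its runtime is matrix math. Amdahl's law gives a rough estimate. In the sketch below, the 50% share is a hypothetical workload and not a figure from the slides. The function name is also an assumption.

```python
def overall_speedup(math_fraction: float, math_speedup: float) -> float:
    """Amdahl's law: whole-script speedup when only the math part gets faster.

    math_fraction is the share of the original runtime spent in matrix math;
    math_speedup is how many times faster that share becomes.
    """
    if not 0.0 <= math_fraction <= 1.0:
        raise ValueError("math_fraction must be between 0 and 1")
    if math_speedup <= 0.0:
        raise ValueError("math_speedup must be positive")
    return 1.0 / ((1.0 - math_fraction) + math_fraction / math_speedup)


if __name__ == "__main__":
    # Hypothetical script: half of the runtime is matrix calculation.
    print(round(overall_speedup(0.5, 27.0), 2))  # prints 1.93
```

Under that assumption, a 27x faster matrix calculation makes the whole script a little under 2x faster, because the program logic and data transformation still take as long as before.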
21. ScaleR Functions & Algorithms
Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Variable Selection
• Stepwise Regression
Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation
Cluster Analysis
• K-Means
Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes
Combination
• PEMA-R API
• rxDataStep
• rxExec
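Many of the descriptive statistics listed above suit parallel, external-memory processing because partial results computed on separate chunks can be merged exactly. The sketch below shows this for mean and variance, using Welford's one-pass update within a chunk and the pairwise combination rule of Chan et al. between chunks. It illustrates the general technique and is not the ScaleR or PEMA-R API. All names in it are assumptions.

```python
from typing import Iterable, Tuple

# A partial summary: (count, mean, m2), where m2 is the sum of squared
# deviations from the mean over the values seen so far.
Summary = Tuple[int, float, float]


def summarize(chunk: Iterable[float]) -> Summary:
    """Welford's one-pass update over a single chunk."""
    count, mean, m2 = 0, 0.0, 0.0
    for value in chunk:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    return count, mean, m2


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries without revisiting the raw data."""
    count_a, mean_a, m2_a = a
    count_b, mean_b, m2_b = b
    if count_a == 0:
        return b
    if count_b == 0:
        return a
    count = count_a + count_b
    delta = mean_b - mean_a
    mean = mean_a + delta * count_b / count
    m2 = m2_a + m2_b + delta * delta * count_a * count_b / count
    return count, mean, m2


def sample_variance(summary: Summary) -> float:
    """Unbiased sample variance; requires at least two values."""
    count, _, m2 = summary
    if count < 2:
        raise ValueError("need at least two values")
    return m2 / (count - 1)
```

Each chunk can be summarized on a different core or server, and the summaries merged in any order. Median and quantiles do not merge exactly in this way, which is consistent with the slide marking them as approximate.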
22. Microsoft and R
• Microsoft R Open / R Server
• SQL Server 2016
• Reporting Services & Mobile Reports
• PowerBI
• Azure Data Lake Storage & Analytics
• Azure Machine Learning
• Azure HDInsight
24. sqlday.pl @sqlday
• 16-18 May 2016
• Wrocław, Conference Centre
• 3 days, 6 workshops, 4 tracks, over 30 speakers, 50 sessions
• 600 attendees + sponsors + speakers + organizers
• Guests from, among others, the USA, England, Germany, Ukraine, Bulgaria, and Slovenia
• Technical premiere of SQL Server 2016
• Including the Big Data Analytics workshop by Łukasz Grala & Marcin Szeliga
Editor's Notes
Slide objective
Introduce the three value proposition pillars of SQL Server 2016 R Services.
Talking points
SQL Server 2016 R Services brings the perfect mix of fast querying and In-Memory OLTP optimization from SQL Server 2016, as well as data exploration, predictive modeling, scoring, and visualization from the R Services family of products.
It delivers unprecedented enterprise speed and performance for advanced analytics, thanks to near-database analytics and parallel threading and processing.
It also delivers scalability and choice not seen before from a stable, commercial platform for advanced analytics. Its on-premises, cloud, and hybrid deployment options, and its ability to work with datasets larger than memory, are unmatched.
Finally, there is no additional cost because the offering is included in SQL Server 2016. In addition, the ability to reuse existing R code and eliminate data movement across machines provides significant value.