Microsoft Technologies
for Data Science
Mark Tabladillo, Ph.D.
Solution Architect (Data Scientist)
Microsoft
August 2016: SQL Saturday Columbus GA
Networking
Interactive















Terms Definition
Data Science
Machine Learning
Data Mining
Applied Statistics
the automated or semi-automated process of discovering patterns in data
Applied scientific method
http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html
http://products.office.com/en-us/excel
http://www.microsoft.com/en-us/server-cloud/products/sql-server/
http://pytools.codeplex.com/
http://azure.microsoft.com/en-us/services/hdinsight/
http://www.revolutionanalytics.com/
Microsoft Data Science Technologies 201608

Technology Choices
SQL SERVER ANALYSIS SERVICES: Enterprise; Business Intelligence
EXCEL ADD-IN FOR SSAS: Office 365; Office 2013 or Higher x64
SEMANTIC SEARCH: Enterprise; Business Intelligence; Standard; Web; Express with Advanced Services
MICROSOFT AZURE ML: Free (Size Limited); Paid (Web Service): Experiment + Query
F#: Open Source
SQL SERVER R SERVICES: SQL Server 2016 or higher
http://download.microsoft.com/download/F/C/2/FC21C981-4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf
SS
SQL
AS
NoSQL
Data mining add-in for business analysts
• Ease of use
• Rich data mining
• Scalable
[Diagram: Semantic Search. Inputs: Varchar, NVarchar, Office and PDF documents (via iFilters). Indexes: Full-Text Keyword Index (“FTI”), Semantic Key Phrase Index / Tag Index (“TI”), Semantic Document Similarity Index (“DSI”), with the Semantic Database. Output: rowset with scores.]
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
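The idea of a document similarity index that returns rows with scores can be illustrated with a small sketch. This is not the algorithm SQL Server Semantic Search uses; it is a generic cosine similarity over raw term counts, and the function names (`term_counts`, `cosine_similarity`, `rank_similar`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text and count whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, documents):
    """Return (doc_id, score) rows for every other document, best first."""
    vectors = {doc_id: term_counts(text) for doc_id, text in documents.items()}
    query = vectors[query_id]
    rows = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample documents, for illustration only.
documents = {
    "a": "sql server data mining",
    "b": "data mining with sql server analysis services",
    "c": "functional programming in f sharp",
}
ranked = rank_similar("a", documents)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

A production index would typically weight terms statistically and precompute results, which is where the indexing time shown in the chart comes from.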
Features | Microsoft R Open (R Distribution, Free) | Microsoft R Client (Free) | Microsoft R Server (Commercial)

Big Data
• Microsoft R Open: In-memory bound. Can only process datasets that fit into the available memory.
• Microsoft R Client: In-memory bound. Can process datasets that fit into the available memory. Operates on large volumes when connected to R Server.
• Microsoft R Server: Disk scalability. Operates on bigger volumes & factors.

Speed of Analysis
• Microsoft R Open: Multi-threaded when MKL is installed for non-ScaleR functions.
• Microsoft R Client: Multi-threaded with MKL for non-ScaleR functions. Up to 2 threads for ScaleR functions with a local compute context.
• Microsoft R Server: Full parallel threading & processing.

Enterprise Readiness
• Microsoft R Open: Community support
• Microsoft R Client: Community support
• Microsoft R Server: Commercial support

Analytic Breadth & Depth
• Microsoft R Open: 8000+ open source packages
• Microsoft R Client: Leverage & optimize open source R packages plus 'Big Data'-ready ScaleR packages
• Microsoft R Server: Leverage & optimize open source R packages plus 'Big Data'-ready + Multithreaded ready ScaleR packages

Commercial Viability
• Microsoft R Open: Risk of deployment to open source
• Microsoft R Client: Free for everyone
• Microsoft R Server: Commercial licenses

DeployR Enterprise
• Microsoft R Open: Not available
• Microsoft R Client: Not available
• Microsoft R Server: Included
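The contrast between "in-memory bound" and "disk scalability" can be sketched generically. The example below is not ScaleR and makes no claim about how R Server is implemented; it only shows, under the assumption of a CSV source with a hypothetical `value` column, how a statistic can be computed chunk by chunk so that the full dataset never has to fit in memory.

```python
import csv
import io


def in_memory_mean(lines):
    """Load every value into a list first: bounded by available memory."""
    values = [float(row["value"]) for row in csv.DictReader(lines)]
    return sum(values) / len(values)


def chunked_mean(lines, chunk_size=2):
    """Stream the rows and keep only one chunk plus running totals."""
    total, count, chunk = 0.0, 0, []
    for row in csv.DictReader(lines):
        chunk.append(float(row["value"]))
        if len(chunk) == chunk_size:
            total += sum(chunk)
            count += len(chunk)
            chunk = []
    # Fold in the final, possibly partial, chunk.
    total += sum(chunk)
    count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# A tiny in-memory stand-in for a large file on disk (hypothetical data).
SAMPLE = "value\n1\n2\n3\n4\n5\n"

mean_all_at_once = in_memory_mean(io.StringIO(SAMPLE))
mean_streamed = chunked_mean(io.StringIO(SAMPLE), chunk_size=2)
print(mean_all_at_once, mean_streamed)
```

Both functions return the same mean here; the difference is that the chunked version holds at most `chunk_size` values at a time.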
Microsoft R Server Editions | Description | Install | ScaleR Get Started
R Server for Hadoop | Scale your analysis transparently by distributing work across nodes without complex programming | Doc | Doc
R Server for Teradata DB | Run advanced analytics in-database for seamless data analysis | Doc | Doc
R Server for Linux | Bring predictive and prescriptive analytics power to your Linux environments | Doc | Doc
http://datacamp.com

Mutable | Immutable
Classic Open Source: Java | Scala
.NET (Now Open Source): C#, C++, VB.NET | F#
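The mutable versus immutable distinction in the grid above can be shown in a few lines. The slide's languages are Java, Scala, C#, C++, VB.NET and F#; the sketch below uses Python only as a neutral illustration, and the class names `MutablePoint` and `ImmutablePoint` are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass
class MutablePoint:
    """Fields can be reassigned in place after construction."""
    x: float
    y: float


@dataclass(frozen=True)
class ImmutablePoint:
    """Fields are fixed at construction; changes produce a new value."""
    x: float
    y: float


# Mutable style: update the existing object.
p = MutablePoint(1.0, 2.0)
p.x = 10.0

# Immutable style: in-place assignment is rejected...
q = ImmutablePoint(1.0, 2.0)
try:
    q.x = 10.0
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# ...so an updated copy is created instead, leaving the original intact.
q2 = replace(q, x=10.0)
print(p, q, q2, assignment_blocked)
```

Immutable-by-default values are the norm in functional-first languages, which is the side of the grid where the slide places F#.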



https://www.microsoft.com/en-us/cloud-platform/what-is-cortana-intelligence-suite
Preconfigured solutions
• Capabilities: Business scenarios
• Products: Forecasting, churn, etc.

Intelligence
• Capabilities: Integration with Cortana; Bot services; Cognitive services
• Products: Cortana; Bot Framework; Cognitive Services

Dashboards and visualizations
• Capabilities: Dashboards and visualizations
• Products: Power BI

Machine learning and advanced analytics
• Capabilities: Machine learning; Hadoop; Distributed analytics; Complex event processing
• Products: Machine Learning; HDInsight (Data Lake service); Data Lake analytics; Stream Analytics

Big data stores
• Capabilities: Big Data repository; Elastic data warehouse
• Products: Data Lake store, Blobs; SQL Data Warehouse

Information management
• Capabilities: Data orchestration; Data catalog; Event ingestion
• Products: Data Factory; Data catalog; Event Hubs

https://github.com/jakevdp/sklearn_pycon2015
http://www.bing.com/explore/predicts
https://techcrunch.com/2016/07/07/microsoft-now-helps-businesses-use-the-data-that-powers-bing-predicts/
https://academy.microsoft.com/en-US/professional-degree/data-science/
https://borntolearn.mslearn.net/b/weblog/posts/announcing-the-microsoft-professional-degree-mpd-program
http://www.kdnuggets.com/2015/09/free-data-science-books.html

https://channel9.msdn.com/Blogs/Windows-Azure

https://mva.microsoft.com/



http://blogs.technet.com/b/machinelearning/
http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning
http://sqlserverdatamining.com
http://marktab.net
http://curah.microsoft.com/342704/azure-machine-learning-videos-february-2015

http://datascience.sqlpass.org/

https://www.youtube.com/channel/UCqB3xWdwjA9soFV6EOu7qfg

