
Solution Path For Implementing A Comprehensive Architecture For Data and Analytics Strategies

The document discusses building a holistic data management architecture to support business intelligence and advanced analytics. It provides a solution path and overview of key components including data governance, the logical data warehouse, data lakes, machine learning, and analytic deployment. The document also includes several figures illustrating components of a data reference architecture.


Solution Path for Implementing a Comprehensive Architecture for Data and Analytics Strategies
Downloadable Figures

Carlton Sapp

CONFIDENTIAL AND PROPRIETARY


This presentation, including any supporting materials, is owned by Gartner, Inc. and/or its affiliates and is for the sole use of the intended Gartner audience or other intended recipients. This presentation may contain information that is
confidential, proprietary or otherwise legally protected, and it may not be further copied, distributed or publicly displayed without the express written permission of Gartner, Inc. or its affiliates. © 2018 Gartner, Inc. and/or its affiliates.
All rights reserved.
Figure 1
How Can I Build a Holistic Data Management Architecture to Support Business Intelligence and Advanced Analytics?

How can I build and maintain a holistic data management architecture to support business intelligence and advanced analytics capabilities?

1. Enterprise Information Management
▪ 1.1 Information Governance
▪ 1.2 Master Data Management
▪ 1.3 Enterprise Metadata Management
▪ 1.4 Data Quality Assurance
▪ 1.5 Information Life Cycle Management
▪ 1.6 Privacy and Security

2. Acquire and Organize
▪ 2.1 Ingest and Analyze
▪ 2.2 Process and Transform
▪ 2.3 Integrate and Store

3. Enable for Analytics
▪ 3.1 Data Warehouse
▪ 3.2 Data Lake
▪ 3.3 Data Virtualization
▪ 3.4 Logical Data Warehouse

4. Enable Business Insights
▪ 4.1 Business Intelligence and Visualization
▪ 4.2 Self-Service Data Preparation and Analytics
▪ 4.3 Advanced Analytics

5. Extend and Automate With AI/ML
▪ 5.1 Real-Time Analytics
▪ 5.2 Machine Learning
▪ 5.3 Automate and Scale

6. Deploy and Integrate Into Operations
▪ 6.1 Analytic Deployment Architectures
▪ 6.2 Points of Integration for Analytic Systems
▪ 6.3 Model Deployment for Data Scientists

Considerations spanning all steps
▪ Continuous Improvement
▪ Align With Business Strategy
▪ Governance
▪ People and Skills
▪ Account for Citizen Roles
▪ Agile Database Development
▪ Cloud Versus On-Premises

Documents referenced in the figure
▪ "EIM 1.0: Setting Up Enterprise Information Management and Governance"
▪ "Enabling Streaming Architectures for Continuous Data and Events"
▪ "Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs"
▪ "Solution Path for Planning and Implementing the Logical Data Warehouse"
▪ "Create a Data Reference Architecture to Enable Self-Service BI"
▪ "Making Machine Learning a Scalable Enterprise Reality — From Development to Production"
▪ "Enable Essential Data Governance for Successful Big Data Architecture Deployment"
▪ "Applying Effective Data Governance to Secure Your Data Lake"
▪ "Implement Agile Database Development to Support Your Continuous Delivery Initiative"
▪ "Migrating Enterprise Databases and Data to the Cloud"
ID: 351281 © 2018 Gartner, Inc.

1 © 2018 Gartner, Inc. and/or its affiliates. All rights reserved.


Figure 2

An End-to-End Data Reference Architecture

[Figure: reference architecture diagram organized into four stages (Acquire, Organize, Analyze, Deliver) above a Manage and Govern layer. Labels recoverable from the extracted text:]
▪ Data sources, streaming/in motion: LOB apps, IT log, IoT feeds, streaming, video, API, image, audio
▪ Data sources, staging/at rest: operational systems, both distributed (NoSQL, IMDB, other) and centralized/monolithic (SQL RDBMS, ERPs), plus external sources (doc, image, feeds, video, text, IT log)
▪ Stream ingestion: RT integration, message broker, message queue, high-performance ingestion, IoT platform
▪ Logical data warehouse: data management (SQL, NoSQL, other/Hadoop), distributed process, virtualization, API integration, transform, aggregate
▪ Policy orchestration and RT algorithm
▪ Analyze: self-service data preparation/data access layer, columnar storage/in-memory temporary storage, data services (marketplace or datasets), administration, advanced analytics, and analytic capabilities (analyze, optimize, forecast, report, plan, discover, collaborate, predict, model, enrich)
▪ Deliver (analytic deployment): real time, traditional reporting, smart data discovery, visual exploration, analytic dashboards, storytelling/narrative, delivered through interact (device), pub/sub, embed, app, API, automate and data store
▪ Manage and Govern: Information Governance (Including Metadata Management, Data Quality, Data Modeling, Master Data Management), Data Management (Data Admin., Security, Privacy and Identity) and Organization (People)
▪ Legend: Optional; Cloud, On-Premises or Hybrid



Figure 3
EIM Components and Their Relationship to Information Governance

[Figure: Information Governance at the center, surrounded by the EIM components:]
▪ Metadata Management
▪ Master Data Management
▪ Data Quality Management
▪ Privacy and Security
▪ Information Lifecycle Management



Figure 4
EIM and Information Governance Creates a Virtuous Cycle

[Figure: cycle diagram with the nodes Governance, Compliance, Protection, Trust, Findability, Insights and Value.]


Figure 5
Social Metadata — The Tip of the Metadata Spear

Social
▪ Specialist insight
▪ Tribal knowledge
▪ Usage tips
▪ Context and veracity

Business
▪ Definitions
▪ Quality rules
▪ Integration rules
▪ Usage rules
Technical
▪ Data model
▪ Type
▪ Format
▪ Structure

Operational
▪ Lineage
▪ Quality
▪ Provenance



Figure 6

Application and Data Security Overview

[Figure: wheel diagram of security domains around the Security Team: Cloud Security and Emerging Technology Security; Applications and Data Security; IAM; Threat and Vulnerability Management; Endpoint and Mobile Security; Security Monitoring and Operations; Network and Gateway Security.]

▪ Data security governance defines policies and controls
▪ Data is pervasive; data security must be too
▪ Data residency and hacking are top risks
▪ Application security testing is critical



Figure 7
Data and Analytics Architecture — Acquire and Organize
[Figure: the Acquire, Organize, Analyze, Deliver reference architecture, with the Acquire and Organize stages in focus. Labels recoverable from the extracted text:]
▪ Data sources, streaming/in motion: LOB apps, IT log, IoT feeds, streaming, video, API, image, audio
▪ Data sources, staging/at rest: operational systems, both distributed (NoSQL, IMDB, other) and centralized/monolithic (SQL RDBMS, ERPs), plus doc, image, feeds, message queues, video, text, IT log and external apps
▪ Stream ingestion: message broker, high-performance ingestion, ingestion strategy, with the note "Evolving Into IoT Platforms"
▪ Logical data warehouse layer: traditional data management, distributed process, virtualization, API integration, transform, aggregate (SQL, NoSQL, other/Hadoop)
▪ Analyze and Deliver: shown with the same analytic capabilities, analytic artifacts and output options as the end-to-end reference architecture

Figure 8
Data and Analytics Architecture — Store and Enable
[Figure: the Acquire, Organize, Analyze, Deliver reference architecture, with the storage components in focus. Labels recoverable from the extracted text:]
▪ Data sources, streaming/in motion: LOB apps, IT log, IoT feeds, streaming, video, API, image, audio
▪ Data sources, staging/at rest: operational systems, both distributed (NoSQL, IMDB, other) and centralized/monolithic (SQL RDBMS, ERPs), plus doc, image, feeds, message queues, video, text, IT log and external apps
▪ Stream ingestion: message broker, high-performance ingestion, ingestion strategy
▪ Logical data warehouse layer: traditional data management, distributed process, virtualization, API integration, transform, aggregate (SQL, NoSQL, other/Hadoop)
▪ Logical data lakes: internal data lake, external data lake, data science data lake
▪ Object cloud stores
▪ Analyze and Deliver: shown with the same analytic capabilities, analytic artifacts and output options as the end-to-end reference architecture

Figure 9
The LDW Conceptual Architectural Diagram
[Figure: layered diagram, top to bottom:]
▪ Analytical Clients: Descriptive, Diagnostic, Predictive, Prescriptive
▪ Logical Data Warehouse: Semantic Management, Metadata Management, Repository, Virtualization, Distributed Process
▪ Data Integration Layer
▪ Data Sources: Social, Image, Text, Feeds, Video, Audio, IT Log, Streaming, RDF, RDBMS, HDFS, ERPs, Doc



Figure 10
The LDW Architecture
Key Question: "What are we building?"

[Figure: layered diagram of the LDW. Labels recoverable from the extracted text, top to bottom:]
▪ Company Strategy; Business Processes
▪ Uses: operational reporting; query and reporting; standard and ad hoc; self-service reporting/data sourcing; data science, AI, machine learning, statistical, predictive; APIs
▪ Metadata/Model, Governance, Change Control
▪ LDW: Noncore Systems, ODS Platform, DW Platforms, Virtual Marts, Nonrelational Platforms, Other DB
▪ Access/Virtualization/Federation
▪ Stores: Remote & Third-Party Systems, Legacy DWs, ODS, DW, Marts, Sandboxes, Staging, Data Lake (Not Just Hadoop), Graph, Document, …
▪ Security
▪ Infrastructure Platforms: On-Premises Servers (Compute) and/or Cloud Servers (Compute), each with nodes, Hot Data Storage and Cool Data Storage
▪ Ingest: ETL, ELT, Replication, Streaming, Business Rules; Streaming Analytics
▪ Sources, including HTAP
▪ Key: Queryable, Process, Governance, Ingest, Platform


Figure 11 Business Analytic Reference Architecture — From Data to
Insight to Action
[Figure: flow from data source connectors through analytic capabilities to analytic artifacts and output options.]
▪ Data Source Connectors
▪ Self-Service Data Preparation/Data Access Layer
▪ Columnar Storage/In-Memory Temporary Storage
▪ Data Services (Marketplace or Datasets)
▪ Administration
▪ Advanced Analytics
▪ Analytic Capabilities: Analyze, Optimize, Forecast, Report, Plan, Discover, Collaborate, Predict, Model, Enrich
▪ Analytic Artifacts: Traditional Reporting, Smart Data Discovery, Visual Exploration, Analytic Dashboards, Storytelling/Narrative
▪ Output Options: Interact (Device), Embed, App, Automate, Data Store



Figure 12

The Basics of Machine Learning Technology


Machine Learning (a Data-Driven Approach)
▪ Input Data: Feed Machine Learning Various Data (e.g., structured and unstructured)
▪ "Training": Align Appropriate Type of Machine Learning Algorithm (e.g., supervised and unsupervised)
▪ Output Data: Present Results Based on Data (e.g., exploratory, predictive and classification)



Figure 13

Machine Learning Architecture


[Figure: machine learning pipeline from data acquisition to deployment. Labels recoverable from the extracted text:]
▪ Data Acquisition: ERP, databases, mainframe, IoT devices
▪ Data Ingestion: stream processing platform, batch, data warehouse
▪ Data Processing (Feature Engineering) on a Processing Engine: transformation, normalization, cleaning and encoding
▪ Preprocessing: sample data selection, training/testing set
▪ Data Storage
▪ Model Engineering: algorithms (clustering algorithm, machine learning algorithm)
▪ Execution: experimentation, testing, tuning; algorithm execution
▪ Deployment
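
The data processing and preprocessing boxes in this figure (normalization, encoding, training/testing set) can be illustrated with a minimal sketch. It uses only the Python standard library; the record layout, the function names and the 25% test fraction are assumptions made for illustration and do not come from the figure.

```python
import random

def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(labels):
    """Encode categorical labels as one-hot vectors (categories sorted alphabetically)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical raw records: (sensor_reading, source_system)
raw = [(10.0, "erp"), (20.0, "iot"), (15.0, "erp"), (30.0, "mainframe"),
       (25.0, "iot"), (12.5, "erp"), (18.0, "mainframe"), (22.0, "iot")]

readings = min_max_normalize([reading for reading, _ in raw])
sources = one_hot_encode([source for _, source in raw])

# One feature row per record: normalized reading followed by the one-hot source.
features = [[x] + onehot for x, onehot in zip(readings, sources)]
train, test = train_test_split(features)
```

Fixing the shuffle seed keeps the training/testing split reproducible between runs, which matters once experimentation and tuning results need to be compared.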



Figure 14
End-to-End ML and Analytics Architecture
[Figure: the Acquire, Organize, Analyze, Deliver reference architecture extended with machine learning components. Labels recoverable from the extracted text:]
▪ Data sources, streaming/in motion: LOB apps, IT log, IoT feeds, streaming, video, API, image, audio
▪ Data sources, staging/at rest: operational systems, both distributed (NoSQL, IMDB, other) and centralized/monolithic (SQL RDBMS, ERPs), plus external sources (doc, image, feeds, video, text, IT log)
▪ Ingestion: messaging protocols, stream ingestion, IoT platform
▪ Processing and Data Preparation
  ▪ Feature Engineering (includes feature extraction and feature transformation): cleaning and encoding, transformation, normalization
  ▪ Preprocessing: sample data selection, training/testing set
▪ Data Modeling
  ▪ Model Engineering (includes model fitting and model evaluation): clustering algorithm, learning algorithm
▪ R/T Execution and Operationalizing: experiment, test, tune
▪ Logical Data Warehouse: data integration (API, transform, aggregate), traditional data management (SQL), distributed process (NoSQL, other/Hadoop), virtualization
▪ Policy orchestration
▪ Analyze: self-service data preparation/data access layer, columnar storage/in-memory temporary storage, data services (marketplace or datasets), administration, advanced analytics, and analytic capabilities (analyze, optimize, forecast, report, plan, discover, collaborate, predict, model, enrich)
▪ Deliver: real time, traditional reporting, smart data discovery, visual exploration, analytic dashboards, storytelling/narrative, delivered through interact (device), pub/sub, embed, app, API, automate and data store
▪ Manage and Govern: Information Governance (including Metadata Management, Data Quality, Data Modeling, Master Data Management), Data Management (Data Admin, Security, Privacy and Identity), Organization (People)
▪ Legend: Optional; Cloud, On-Premises or Hybrid



Figure 15

Taxonomy for AI Services


[Figure: four groups of services; the extracted text also carries the labels "Compute" and "Quantitative".]

AI Services
▪ Computer vision APIs
▪ Language processing APIs
▪ Speech APIs
▪ CRM AI APIs

Machine Learning Services
▪ Predictive analytics with ML
▪ Platform/Infra services for data scientists (PaaS/IaaS)
▪ Python distribution services (Anaconda)
▪ ML Pkgs (Spark MLlib)
▪ AutoML (DataRobot)

Expert Systems (Insight Engines)
▪ Decision engines
▪ Intelligent search
▪ Knowledge-based systems

R Distribution Services
▪ Archive network service (CRAN, MRAN)
▪ R distribution services
▪ Package managers


Figure 16

Example of Connecting to Data Versus Collecting It

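[Figure: an application connecting to data and executing in-database, rather than collecting the data first.]
▪ The application issues a stored procedure call to the (R)DBMS server
▪ The stored procedure contains Python code that executes in-database, using the server's ML Services and Python runtime
▪ Results are returned to the application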
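
The contrast between collecting data and connecting to it can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: SQLite has no stored procedures, so a Python function registered with `create_function` stands in for the stored procedure in the figure, and the `orders` table, the `risk_score` logic and the threshold are hypothetical.

```python
import sqlite3

def risk_score(amount):
    """Hypothetical scoring logic standing in for a trained model."""
    return min(amount / 1000.0, 1.0)

def make_demo_db():
    """Build an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)",
        [(120.0,), (640.0,), (980.0,), (75.5,), (510.0,)],
    )
    # Register the Python function so SQL statements can call it.
    conn.create_function("risk_score", 1, risk_score)
    return conn

def count_high_risk_by_collecting(conn, threshold=0.5):
    """Collecting: pull every row into the application, then score it there."""
    rows = conn.execute("SELECT amount FROM orders").fetchall()
    return sum(1 for (amount,) in rows if risk_score(amount) > threshold)

def count_high_risk_by_connecting(conn, threshold=0.5):
    """Connecting: the query engine invokes the scoring code; only the result comes back."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE risk_score(amount) > ?", (threshold,)
    ).fetchone()
    return count

conn = make_demo_db()
collected = count_high_risk_by_collecting(conn)
connected = count_high_risk_by_connecting(conn)
```

Both paths return the same count. The difference is what crosses the boundary: the first moves every row to the application, the second moves one number.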



Figure 17

Key Drivers for MLaaS

Built-In Reliability (Trust)
▪ Can we rely on the technology?
▪ Can we trust the output being produced?
▪ Do we have what we need when we need it?

Established Uniformity (Dealing With Diversity)
▪ Will it support diverse business objectives?
▪ Will it easily interoperate with diverse technology platforms?
▪ Will it help me homogenize or consolidate my technology stacks?

Consistent Reproducibility (Managing Complexity)
▪ Can we reproduce the output?
▪ Can we consistently monitor and manage results?
▪ Can we quickly scale?



Figure 18
Integration Points for Various Analytics
[Figure: the Acquire, Organize, Analyze, Deliver reference architecture annotated with integration points for analytics. Labels recoverable from the extracted text:]
▪ Integration points: Events and Alerts, Self-service Analytics, Push-method analytics, Enterprise Analytics
▪ Data sources, streaming/in motion: LOB apps, IT log, IoT feeds, streaming, video, API, image, audio
▪ Data sources, staging/at rest: operational systems, both distributed (NoSQL, IMDB, other) and centralized/monolithic (SQL RDBMS, ERPs), plus external sources (doc, image, feeds, video, text, IT log)
▪ Stream ingestion, IoT platform, policy orchestration, RT algorithm
▪ Logical data warehouse: traditional data management, distributed process, virtualization, API integration, transform, aggregate (SQL, NoSQL, other/Hadoop)
▪ Analyze: self-service data preparation/data access layer, columnar storage/in-memory temporary storage, data services (marketplace or datasets), administration, advanced analytics, and analytic capabilities (analyze, optimize, forecast, report, plan, discover, collaborate, predict, model, enrich)
▪ Deliver: real time, traditional reporting, smart data discovery, visual exploration, analytic dashboards, storytelling/narrative, delivered through interact (device), pub/sub, embed, app, API, automate and data store
▪ Manage and Govern: Information Governance (Including Metadata Management, Data Quality, Data Modeling, Master Data Management), Data Management (Data Admin., Security, Privacy and Identity) and Organization (People)
▪ Legend: Optional; Cloud, On-Premises or Hybrid



Figure 19

Traditional Analytic Provisioning


Stages
▪ Explore and Discover: The search for business through exploratory analysis and discovery
▪ Predeployment Handoff: The first phase of deploying the analytic model
▪ Productionizing the Analytic: Getting the model integrated into the production environment
▪ Scaling the Analytic: How do we scale the analytic model in production?
▪ Monitoring and Maintenance: How do we improve the analytic or iterate over time?

Callouts
▪ What do I need to prep my analytic for deployment?
▪ IT often re-engineers the analytic models to support application integration or to enhance performance (e.g., embedded analytics)
▪ Portability? How do I port the analytic across multiple architectures?
▪ New capabilities, such as AI/ML, can overwhelm your existing operational environments
▪ Changes that need to be made based on frequent updates to analytical models



Figure 20

Target Analytic Provisioning Model


Stages
▪ Explore and Discover: The search for business through exploratory analysis and discovery
▪ Predeployment Handoff: The first phase of deploying the analytic model
▪ Productionizing the Analytic: Getting the model integrated into the production environment
▪ Scaling the Analytic: How do we scale the analytic model in production?
▪ Monitoring and Maintenance: How do we improve the analytic or iterate over time?

Target components
▪ Model Interchange Format Standards (shown at two points in the flow)
▪ Integrated Analytic/Processing Engines: Executes code associated with analytic models
▪ Embedded Analytic Model
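
The idea of a model interchange format feeding an integrated processing engine can be sketched as follows. This is a toy illustration, not an implementation of any real standard: the JSON layout, the `demo-linear-v1` identifier, the `ScoringEngine` class and the feature names are all hypothetical, and real interchange standards such as PMML or ONNX are far richer.

```python
import json

def export_model(weights, intercept, feature_names):
    """Serialize a linear model to a portable, engine-neutral JSON document."""
    return json.dumps({
        "format": "demo-linear-v1",  # hypothetical format identifier
        "features": feature_names,
        "weights": weights,
        "intercept": intercept,
    })

class ScoringEngine:
    """Stand-in for an integrated analytic/processing engine."""

    def __init__(self, document):
        spec = json.loads(document)
        if spec["format"] != "demo-linear-v1":
            raise ValueError("unsupported model format")
        self.features = spec["features"]
        self.weights = spec["weights"]
        self.intercept = spec["intercept"]

    def score(self, record):
        """Apply the model to one record given as a dict of feature values."""
        return self.intercept + sum(
            w * record[name] for w, name in zip(self.weights, self.features)
        )

# The data scientist exports the model once...
document = export_model([0.5, 2.0], 1.0, ["tenure_years", "open_tickets"])

# ...and the production engine loads and executes it without re-engineering.
engine = ScoringEngine(document)
prediction = engine.score({"tenure_years": 4, "open_tickets": 3})
```

Because the engine depends only on the document, the model can be retrained and re-exported without changing the production code, which is the portability concern raised for traditional provisioning.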



Figure 21
Aligning Business Strategy With Development

[Figure: stakeholder levels mapped to their focus. Each row reads stakeholder, then metrics, then results.]
▪ Stakeholders: Outcomes; Bottom-Line and Other Results
▪ Executives: Performance Metrics; Feedback on Strategy
▪ Process Owners: Business Process Metrics (Process, Task, Data); Efficiency and Effectiveness
▪ Data and Information Stewards: Data Metrics (Process, Task, Data); Business Data Quality

Axis labels: Stakeholder Level; Focus

Make the Connection to the Business




Figure 22 Information Governance — Stages
Awareness
▪ Promote visibility
▪ Training
▪ Auditing and reporting

Compliance
▪ Regulatory compliance
▪ Corporate policies and procedures
▪ IT policies and procedures

Information Profiling
▪ Data inventory and analysis
▪ Data quality, veracity
▪ Lineage and provenance

Privacy and Security


▪ Information classification
▪ Privacy management
▪ Role-based access controls

Information Life Cycle


▪ Retention policy
▪ Hot, warm, cold – tiered storage
▪ Archival
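
The information life cycle stage (retention policy, hot/warm/cold tiered storage, archival) can be expressed as a simple rule on data age. The sketch below is a minimal illustration; the thresholds and the `storage_tier` function are assumptions, and real values would come from an organization's own retention policy.

```python
from datetime import date

# Hypothetical thresholds, in days since last access.
HOT_DAYS = 30
WARM_DAYS = 365
ARCHIVE_AFTER_DAYS = 7 * 365

def storage_tier(last_accessed, today):
    """Return the storage tier for a dataset based on how long ago it was accessed."""
    age = (today - last_accessed).days
    if age > ARCHIVE_AFTER_DAYS:
        return "archive"
    if age > WARM_DAYS:
        return "cold"
    if age > HOT_DAYS:
        return "warm"
    return "hot"

today = date(2018, 6, 1)
tiers = {
    "daily_sales": storage_tier(date(2018, 5, 20), today),
    "q4_report": storage_tier(date(2018, 1, 1), today),
    "old_campaign": storage_tier(date(2016, 6, 1), today),
    "legacy_export": storage_tier(date(2010, 1, 1), today),
}
```

Keeping the thresholds as named constants makes the policy easy to audit and change, which fits the auditing and compliance stages listed above.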



Figure 23

Example Team Structure for Building an LDW


Key Question: "Who builds the LDW?"

[Figure: organization chart. Roles and groups recoverable from the extracted text; each is marked in the figure as IT, Business or IT/Business:]
▪ Governance Council
▪ Chief Data Officer (CDO)
▪ Program Manager
▪ BICC (Evolves From Development)
▪ Business Units, with "Citizen Analysts"/Power Users and "Citizen Integrators"
▪ DW Project Manager
▪ Lake Team
▪ Central Solution Team (Server, Cloud)
▪ Data In Team: ETL Developer and Integrator, Migration Specialist, Data Modeller
▪ Information Out Team: Business Analyst (Requirements), Report/Analysis Developers
▪ IT Architect (e.g., Data, Application, Infrastructure and Security Architects)
▪ Data Scientists
▪ Data Stewards
▪ Dev/Build, Operations Planning, Admin/Monitoring
▪ Platforms, Including Appliances, Cloud, … ; Virtualization
▪ Database Administrators
▪ Other labels: Information Architect, Virtualization Architect, Virtualization Specialist, Business Intelligence, Self-Service Enablement, Governance Manager, Business Analytics
▪ Infrastructure, Cloud, App
▪ Key: DW, Agile, Lake, Governance



Figure 24
Cloud Deployment Options for Data

On-Premises

Client’s Own Data Center
This is the traditional option, but it can tie the business down in processing and maintaining infrastructure and paying the associated costs. It is best suited for data needs that are well understood and predictable or are governed by stringent regulations.

Private Cloud
Unlike public cloud, the infrastructure is dedicated to a single organization. This can help meet certain needs such as strong security control. Private clouds can also be hosted at an external provider. These are best suited for mission-critical applications that have very high security and uptime requirements.

Public

IaaS
Provides clients full control of deploying their data store(s) in public cloud. This also reduces some of the infrastructure overhead but still requires operating system and data store skills.

BDaaS
This provides clients with access to an already-installed data store that clients can configure to their needs. The advantage of this option is that businesses don’t need to procure and manage infrastructure, and they don’t need to invest in the corresponding skills. Cloud providers use multitenancy, in which multiple clients use the same data store in the public cloud but have a clear separation of data.

