Revolutionary container-based hybrid cloud solution for the ML platform
Ness' data science platform, NextGenML, puts the entire machine learning process (modelling, execution and deployment) in the hands of data science teams.
The whole paradigm is built around collaboration on AI/ML, implemented with full respect for best practices and a commitment to innovation.
Kubernetes (on-prem) + Docker, Azure Kubernetes Service (AKS), Nexus, Azure Container Registry (ACR), GlusterFS
Workflow: Argo -> Kubeflow
DevOps: Helm, ksonnet, Kustomize, Azure DevOps
Code Management & CI/CD: Git, TeamCity, SonarQube, Jenkins
Security: MS Active Directory, Azure VPN, Dex (K8s) integrated with GitLab
Machine Learning: TensorFlow (model training, TensorBoard, serving), Keras, Seldon
Storage (Azure): Storage Gen1 & Gen2, Data Lake, File Storage
ETL (Azure): Databricks, Spark on K8s, Data Factory (ADF), HDInsight (Kafka and Spark), Service Bus (ASB), Lambda functions & VMs, Cache for Redis
Monitoring and Logging: Grafana, Prometheus, Graylog
2. Ness Machine Learning Platform (MLP)
• Robust, container-oriented pipeline to:
  - train a model
  - provide metrics and generic evaluation
• Measure the performance of the trained model
• Deploy the model as a service (see the serving sketch below)
… all of it, of course, integrated into a Kubernetes container solution
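Seldon (listed in the stack above) typically exposes a trained model as a service through a small Python wrapper class. The following is only a minimal sketch, assuming a scikit-learn style model saved as model.joblib; the file name, class name and run command are illustrative, not part of the platform.

# OrganSegModel.py -- illustrative Seldon Python wrapper (names are hypothetical)
import joblib

class OrganSegModel:
    def __init__(self):
        # Load the trained artifact that was baked into the Docker image
        self.model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # Seldon calls predict() for every REST/gRPC request
        return self.model.predict(X)

# Packaged in a container, this class is exposed as a microservice with
# something like: seldon-core-microservice OrganSegModel --service-type MODEL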
3. Ness Machine Learning Platform (MLP)
Ness MLP acts as the hub around which all data science work takes place at enterprise scale. Ness' data science platform puts the entire data modelling process in the hands of data science teams so they can focus on gaining insights from data and communicating them to key stakeholders in your business.
4. • Infrastructure agnostic (Docker)
• deployed on premises as well as in the cloud (Azure Kubernetes Service)
• still easy to integrate with cloud services (Databricks / Spark)
• uses cutting-edge DevOps container technologies (Docker, K8s)
• Azure heterogeneous cluster (CPU & GPU) with dynamic GPU allocation for ML training
In the client's production environment we ran more than 20 ML models for organ segmentation in parallel, each taking around 20-24 hours. The GPU nodes were allocated on the fly, minimizing both time and cost (see the sketch below).
Not vendor-locked: the entire technology stack is open source.
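One way this on-the-fly GPU allocation can be expressed is per pipeline step: a sketch assuming the Kubeflow Pipelines v1 SDK on an AKS cluster whose GPU node pool carries an "agentpool: gpu" label and a matching taint (the label, taint, image and entrypoint below are assumptions, not the actual cluster configuration).

import kfp.dsl as dsl
from kubernetes import client as k8s_client

def gpu_training_op(image):
    # Each training step asks for one GPU; the cluster autoscaler can then
    # bring a GPU node up only while the step is running.
    op = dsl.ContainerOp(
        name='organ-seg-training',
        image=image,                      # hypothetical training image
        command=['bash', '-c'],
        arguments=['python train.py'])    # illustrative entrypoint
    op.set_gpu_limit('1')
    # Pin the pod to the GPU node pool (label name is an assumption)
    op.add_node_selector_constraint('agentpool', 'gpu')
    # Tolerate the taint that keeps ordinary pods off the GPU nodes
    op.add_toleration(k8s_client.V1Toleration(
        key='sku', operator='Equal', value='gpu', effect='NoSchedule'))
    return op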
5. • ML pipeline (Kubeflow)
Build, deploy and manage multi-step ML workflows based on Docker containers
• Pipeline creation is done programmatically using the Python DSL (low-level control); see the sketch after this list
• Real-time view of progress, execution, logs, and input/output parameters
• Unifies the logs from pipeline containers and system containers, so it is easy for all users (data engineers/scientists) to check progress
• An easy way to compare different parameters and hyper-parameters side by side across scientific experiments
• Also keeps track of the entire history of all runs and re-runs
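A minimal sketch of such a programmatic pipeline definition, assuming the Kubeflow Pipelines v1 Python SDK; the step names, images and file paths are illustrative only.

import kfp
import kfp.dsl as dsl

@dsl.pipeline(name='demo-pipeline',
              description='Two-step workflow, each step a Docker container')
def demo_pipeline(message: str = 'hello'):
    # Step 1: writes its result to a file that KFP captures as an output
    produce = dsl.ContainerOp(
        name='produce',
        image='alpine:3.12',
        command=['sh', '-c'],
        arguments=['echo "%s" > /tmp/out.txt' % message],
        file_outputs={'text': '/tmp/out.txt'})
    # Step 2: consumes that output, which also defines the execution order
    dsl.ContainerOp(
        name='consume',
        image='alpine:3.12',
        command=['sh', '-c'],
        arguments=['echo received: %s' % produce.outputs['text']])

if __name__ == '__main__':
    # Compiles to an Argo workflow that the Kubeflow Pipelines UI can run
    kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')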
6. • Asset Catalog
Built for ML/AI collaboration (Records, Models, Experiments, Pipelines, Deployments)
• built by Ness from scratch as a single source of truth around ML concepts
• the central component for data scientists and the data-integration reference point for data engineers
• data traceability UI/UX designed to easily drill down into data and ML execution
• REST API and Python SDK to manage records, experiments and pipelines during ML execution (illustrated below)
• "experiment metadata" is available through the UI, so there is no need to SSH into machines for information
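The Asset Catalog API is proprietary, so the following is only a hypothetical sketch of how a pipeline step could register an experiment over REST; the endpoint, fields and token handling are invented for illustration and do not describe the actual API.

import os
import requests

# Hypothetical catalog endpoint and credentials -- not the real Asset Catalog API
CATALOG_URL = os.environ.get('ASSET_CATALOG_URL', 'http://asset-catalog/api/v1')
TOKEN = os.environ['ASSET_CATALOG_TOKEN']

def register_experiment(name, pipeline_id, params):
    """Create an experiment record so pipeline runs can be traced back to it."""
    resp = requests.post(
        f'{CATALOG_URL}/experiments',
        headers={'Authorization': f'Bearer {TOKEN}'},
        json={'name': name, 'pipeline_id': pipeline_id, 'parameters': params},
        timeout=30)
    resp.raise_for_status()
    return resp.json()['id']          # assumed response shape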
8. • ETL (Spark)
Built for easy integration with the ML platform
• full integration with the Asset Catalog
• created our own Spark adapters and a Spark Operator to deploy and run Spark on Kubernetes
• created our own Spark Python SDK to create a Databricks Spark cluster and to deploy and execute Spark jobs on Databricks (see the sketch below)
• integration of Azure Data Factory with Databricks and the Asset Catalog, with support for Azure Service Bus (distributed queue)
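The Ness Spark Python SDK itself is internal, but under the hood a Databricks job submission typically boils down to a call like the one below against the Databricks Jobs REST API; the workspace URL, cluster sizing and script path are placeholders.

import os
import requests

DATABRICKS_HOST = os.environ['DATABRICKS_HOST']       # e.g. the workspace URL
DATABRICKS_TOKEN = os.environ['DATABRICKS_TOKEN']

def submit_spark_job(python_file, parameters):
    """Submit a one-off PySpark job run on a new Databricks cluster."""
    payload = {
        'run_name': 'etl-ingest',                      # illustrative name
        'new_cluster': {
            'spark_version': '7.3.x-scala2.12',        # placeholder runtime
            'node_type_id': 'Standard_DS3_v2',
            'num_workers': 2,
        },
        'spark_python_task': {
            'python_file': python_file,                # e.g. a dbfs:/ path
            'parameters': parameters,
        },
    }
    resp = requests.post(
        f'{DATABRICKS_HOST}/api/2.0/jobs/runs/submit',
        headers={'Authorization': f'Bearer {DATABRICKS_TOKEN}'},
        json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()['run_id']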
9. • Automation (CI/CD)
Created for easy integration of code and workflow with users' demands
• smooth propagation of released code
• easily deploys exactly the same Docker image to different environments such as development, integration and production (see the promotion sketch below)
• built on top of Jenkins and DevOps best practices
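Promoting "exactly the same Docker image" between environments usually means re-tagging and pushing the already-built image rather than rebuilding it. A minimal sketch using the Docker SDK for Python; the registry path and tag scheme are placeholders, not the platform's actual conventions.

import docker

client = docker.from_env()

def promote_image(repo, version, source_env='dev', target_env='prod'):
    """Re-tag the image that was tested in source_env and push it for target_env."""
    # Pull the exact image that already passed the pipeline in source_env
    image = client.images.pull(f'{repo}:{version}-{source_env}')
    # Give the very same image the target environment tag (no rebuild)
    image.tag(repo, tag=f'{version}-{target_env}')
    # Push the new tag; the image digest stays identical
    client.images.push(repo, tag=f'{version}-{target_env}')

# Example (placeholder registry path):
# promote_image('myregistry.azurecr.io/nextgenml/training', '1.4.2')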
10. Developing a cognitive interface between engineers and technology – data catalog
[Diagram: data engineers, DevOps and data scientists collaborating around the data catalog]
- unify the work of data scientists across different lines of business
- automate most of their manual data processing
- data discovery and tracing
- normalize different computed data
- visual navigation of experiments & deployments
11.
import kfp.dsl as dsl
from kubernetes import client as k8s_client

# One GPU training step of the Kubeflow pipeline
training_op = dsl.ContainerOp(
    name='training',
    image=AZURE_ML___GPU_IMAGE,          # constant holding the GPU training image
    command=['bash', '-c'],
    arguments=[...],
    file_outputs={'output': '....'})
# Pod annotation and GPU request (annotation key/value elided on the slide)
training_op.add_pod_annotation('....', 'false')
training_op.set_gpu_limit("1")
[Diagram: CPU cluster extended with a GPU cluster; TensorFlow/Keras GPU training split between user implementation and system automation]