This presentation introduces the audience to DataOps and AIOps practices. It covers organizational and technical aspects, and provides hints to start your data journey.
“TODAY, COMPANIES ACROSS ALL INDUSTRIES ARE BECOMING SOFTWARE COMPANIES.”
The familiar refrain is certainly true of the new-school, born-in-the-cloud set. But it can also apply to traditional enterprises that are reinventing themselves by coupling DevOps excellence with intelligent DataOps.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
MLOps Virtual Event: Automating ML at Scale (Databricks)
ML is transforming many industries but operating ML systems at scale is complex as it involves many teams, constant data and model updates, and moving from development to production. ML platforms aim to help with this by providing software to manage the entire ML lifecycle from data to experimentation to production deployment through a consistent interface. Desirable features for an ML platform include ease of use, integration with data infrastructure for governance, and collaboration functions to enable sharing of code, data, models and experiments. Databricks provides an open source ML platform that integrates with data lakes and a data science workspace to help organizations perform MLOps at scale.
Building Modern Data Platform with Microsoft Azure (Dmitry Anoshin)
This document provides an overview of building a modern cloud analytics solution using Microsoft Azure. It discusses the role of analytics, a history of cloud computing, and a data warehouse modernization project. Key challenges covered include lack of notifications, logging, self-service BI, and integrating streaming data. The document proposes solutions to these challenges using Azure services like Data Factory, Kafka, Databricks, and SQL Data Warehouse. It also discusses alternative implementations using tools like Matillion ETL and Snowflake.
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
Machine learning operations brings data science to the world of DevOps. Data scientists create models on their workstations; MLOps adds automation, validation and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) for curating and processing massive amounts of data, developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib). It has built-in integration with many data sources, a workflow scheduler, real-time workspace collaboration, and performance improvements over traditional Apache Spark.
The Importance of DataOps in a Multi-Cloud World (DATAVERSITY)
There’s no denying that Cloud has evolved from being an outlying market disruptor to a mainstream method for delivering IT applications and services. In fact, it’s not uncommon to find that Enterprises use the services of more than one cloud at the same time. However, while a multi-cloud strategy offers many benefits, it also increases data management complexity and consequently reduces data availability. This webinar defines the meaning of DataOps and why it’s a crucial component for every multi-cloud approach.
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. It is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover common pain points of machine learning developers such as tracking experiments, reproducibility, deployment tooling and model versioning. Get ready to get your hands dirty with a quick ML project using MLflow, releasing it to production to understand the MLOps lifecycle.
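To make the Tracking idea concrete, here is a minimal, library-free sketch of what an experiment tracker does: record parameters and metric histories per run so experiments can be compared later. The class and method names are hypothetical stand-ins, not MLflow's actual API (which exposes functions such as `mlflow.start_run`, `mlflow.log_param` and `mlflow.log_metric`):

```python
import json
import time
import uuid


class ToyTracker:
    """Toy stand-in for an experiment tracker: records params and metrics per run."""

    def __init__(self):
        self.runs = {}

    def start_run(self):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"start": time.time(), "params": {}, "metrics": {}}
        return run_id

    def log_param(self, run_id, key, value):
        self.runs[run_id]["params"][key] = value

    def log_metric(self, run_id, key, value):
        # Keep a history, not just the latest value, so metric curves
        # can be compared across runs.
        self.runs[run_id]["metrics"].setdefault(key, []).append(value)

    def export(self, run_id):
        return json.dumps(self.runs[run_id], default=str)


tracker = ToyTracker()
run = tracker.start_run()
tracker.log_param(run, "learning_rate", 0.01)
for epoch, loss in enumerate([0.9, 0.5, 0.3]):
    tracker.log_metric(run, "loss", loss)
print(tracker.export(run))
```

The point of the sketch is the data model: a run is a bag of immutable parameters plus append-only metric series, which is what makes experiments reproducible and comparable.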
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift toward a modern distributed architecture that allows domain-specific data, views "data-as-a-product," and enables each domain to handle its own data pipelines.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
The Data Phoenix Events team invites everyone, on August 17 at 19:00, to the first webinar in "The A-Z of Data" series, dedicated to MLOps. In this introductory webinar, we will look at what MLOps is, its core principles and practices, the best tools and possible architectures. We will start with a simple ML solution development lifecycle and finish with a complex, maximally automated cycle that MLOps allows us to implement.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
The document discusses Microsoft's approach to implementing a data mesh architecture using their Azure Data Fabric. It describes how the Fabric can provide a unified foundation for data governance, security, and compliance while also enabling business units to independently manage their own domain-specific data products and analytics using automated data services. The Fabric aims to overcome issues with centralized data architectures by empowering lines of business and reducing dependencies on central teams. It also discusses how domains, workspaces, and "shortcuts" can help virtualize and share data across business units and data platforms while maintaining appropriate access controls and governance.
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021 (Tristan Baker)
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data-worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
Learn to Use Databricks for Data Science (Databricks)
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks' open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Using MLOps to Bring ML to Production / The Promise of MLOps (Weaveworks)
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
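As a hedged illustration of the progressive-delivery idea mentioned above (this is a toy sketch, not Weaveworks' or Flagger's actual mechanism), a canary rollout can be reduced to a weighted routing decision between a stable and a canary version, with the canary weight ramped up step by step as confidence grows:

```python
import random


def route_request(canary_weight: float, rng=random.random) -> str:
    """Route one request to 'canary' with probability canary_weight, else 'stable'."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be in [0, 1]")
    return "canary" if rng() < canary_weight else "stable"


# Progressive delivery: ramp the canary share in stages. In a real system,
# each stage would be gated on error-rate/latency checks, with automatic
# rollback on violation (not modeled here).
ramp = [0.05, 0.25, 0.5, 1.0]
counts = {"stable": 0, "canary": 0}
for weight in ramp:
    for _ in range(1000):
        counts[route_request(weight)] += 1
print(counts)
```

The same routing decision applies whether the "versions" are application builds or ML models, which is why progressive delivery transfers so naturally to MLOps.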
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
Big Data Tools: A Deep Dive into Essential Tools (FredReynolds2)
Today, practically every firm uses big data to gain a competitive advantage in the market. With this in mind, freely available big data tools for analysis and processing are a cost-effective and beneficial choice for enterprises. Hadoop is the sector's leading open-source initiative and big data trailblazer. But this is not the final chapter: numerous other businesses follow Hadoop's free and open-source path.
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at... (Yael Garten)
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
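The "maintainable data contracts" idea above can be sketched as a schema check a producer applies before publishing an event. This is a simplified, hypothetical illustration (field names invented), not LinkedIn's Dali or its actual governance tooling:

```python
# Minimal data contract: required fields and their expected types.
CONTRACT = {
    "member_id": int,
    "event_type": str,
    "timestamp_ms": int,
}


def validate_event(event: dict, contract: dict = CONTRACT) -> list:
    """Return a list of violations; an empty list means the event honors the contract."""
    violations = []
    for field, expected_type in contract.items():
        if field not in event:
            violations.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            violations.append(
                f"wrong type for {field}: got {type(event[field]).__name__}"
            )
    return violations


good = {"member_id": 42, "event_type": "page_view", "timestamp_ms": 1700000000000}
bad = {"member_id": "42", "event_type": "page_view"}
print(validate_event(good))
print(validate_event(bad))
```

Enforcing the contract at the producer side is what keeps downstream consumers (data scientists, analysts) insulated from silent schema drift.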
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... (Shirshanka Das)
Developing and deploying AI solutions on the cloud using Team Data Science Pr... (Debraj GuhaThakurta)
Presented at: Global Big AI Conference, Santa Clara, Jan 2018. Developing and deploying AI solutions on the cloud using Team Data Science Process (TDSP) and Azure Machine Learning (AML).
The document contains the resume of Naveen Reddy Tamma which summarizes his work experience and qualifications. He has over 7 years of experience working as an Associate at Cognizant Technology Solutions on various projects involving Informatica ETL development, data quality, and reporting. He holds a B.Tech in Computer Science and has experience with technologies like Informatica, Teradata, Oracle, and Cognos.
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu... (DianaGray10)
This session is focused on the art of application architecture, where we unravel the intricacies of creating a standard, yet dynamic application structure.
We'll explore:
Essential components of a typical application, emphasizing their roles and interactions.
Learn how to connect UiPath RPA Processes, UiPath Apps, and Data Service together to build a stronger app.
Gain insights into building more efficient, interconnected, and robust applications in the UiPath ecosystem.
Speaker:
David Kroll, Director, Product Marketing @Ashling Partners and UiPath MVP
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning (Kai Wähner)
Comparison of Data Preparation vs. Data Wrangling Programming Languages, Frameworks and Tools in Machine Learning / Deep Learning Projects.
A key task to create appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources like files, databases, big data storages, sensors or social networks. This step can take up to 80% of the whole project.
This session compares different alternative techniques to prepare data, including extract-transform-load (ETL) batch processing (like Talend, Pentaho), streaming analytics ingestion (like Apache Storm, Flink, Apex, TIBCO StreamBase, IBM Streams, Software AG Apama), and data wrangling (DataWrangler, Trifacta) within visual analytics. Various options and their trade-offs are shown in live demos using different advanced analytics technologies and open source frameworks such as R, Python, Apache Hadoop, Spark, KNIME or RapidMiner. The session also discusses how this is related to visual analytics tools (like TIBCO Spotfire), and best practices for how the data scientist and business user should work together to build good analytic models.
Key takeaways for the audience:
- Learn various options for preparing data sets to build analytic models
- Understand the pros and cons and the targeted persona for each option
- See different technologies and open source frameworks for data preparation
- Understand the relation to visual analytics and streaming analytics, and how these concepts are actually leveraged to build the analytic model after data preparation
Video Recording / Screencast of this Slide Deck: https://youtu.be/2MR5UynQocs
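The kind of preparation work described in this session often boils down to normalizing heterogeneous records before any model sees them. As a small stdlib-only sketch (column names and rules invented for illustration, not taken from the talk):

```python
import csv
import io

# Raw input with typical problems: mixed case, stray whitespace, missing values.
raw = """name,age,country
 Alice ,34,de
BOB,,US
carol,29, us
"""


def prepare(rows):
    """Trim strings, normalize case, and impute a mean age for missing values."""
    ages = [int(r["age"]) for r in rows if r["age"].strip()]
    default_age = sum(ages) // len(ages) if ages else 0
    cleaned = []
    for r in rows:
        cleaned.append({
            "name": r["name"].strip().title(),
            "age": int(r["age"]) if r["age"].strip() else default_age,
            "country": r["country"].strip().upper(),
        })
    return cleaned


rows = list(csv.DictReader(io.StringIO(raw)))
result = prepare(rows)
print(result)
```

Whether you express these rules in an ETL tool, a streaming pipeline, or a wrangling UI, the underlying transformations are the same; the talk's comparison is about who authors them and at what point in the pipeline they run.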
This document discusses how data science models have transitioned to the cloud to take advantage of greater computing resources. It notes that data science models are resource-intensive and traditionally required powerful local machines. The cloud allows data scientists to run models on cloud infrastructure for lower costs than high-end laptops and with access to many GPUs. Several major cloud platforms - Azure, AWS, and Google Cloud - are discussed and compared in terms of their machine learning offerings. The document also introduces Microsoft's Team Data Science Process, which aims to help data science teams collaborate more effectively on projects in the cloud.
Jeff has over 33 years of experience in IT consulting, product development, and system operations. He has expertise in big data technologies including Hadoop, Spark, and Hive. Most recently as a Big Data Architect, he helped customers optimize data warehouse workloads on Hadoop. He also led teams to design and build innovative tools for automating data warehouse migrations to Hadoop. Jeff has extensive experience developing, operating, and administering large-scale production environments and big data initiatives.
Shivaprasada Kodoth is seeking a position as an ETL Lead/Architect with experience in data warehousing and ETL. He has over 8 years of experience in data warehousing and Informatica design and development. He is proficient in technologies like Oracle, Teradata, SQL, and PL/SQL. His key projects include developing ETL mappings and workflows for integrating various systems at Boehringer Ingelheim and UBS. He is looking for opportunities in Bangalore, Mangalore, Cochin, Europe, the USA, Australia, or Singapore.
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal, such as notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP with the focus on integration, security and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
This document discusses best practices for developing data science products at Philip Morris International (PMI). It covers:
- PMI's data science team of over 40 people across four hubs working on fraud prevention and other problems.
- Key principles for PMI's data science work, including being business-driven, investing in people, self-organizing, iterating to improve, and co-creating solutions.
- Challenges in data product development involving integrating work between data scientists and other teams, and practices like continuous integration/delivery to overcome these challenges.
- The role of data scientists in contributing code that is readable, testable, reusable, reproducible, and usable by other teams to integrate into
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …) who often do not talk to each other very well; the model can be trained in one environment but then productionalized in a completely different environment; and it is not just about the code, but also about the data (features) and the model itself. At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand – on our internal projects and on clients' projects as well.
This document provides an introduction to big data and analytics. It discusses definitions of key concepts like business intelligence, data analysis, and big data. It also provides a brief history of analytics, describing how technologies have evolved from early business intelligence systems to today's big data approaches. The document outlines some of the key components of Hadoop, including HDFS and MapReduce, and how it addresses issues like volume, variety and velocity of big data. It also discusses related technologies in the Hadoop ecosystem.
Agile Testing Days 2017: Introducing AgileBI Sustainably - Exercises (Raphael Branger)
"We now do Agile BI too" is often heard in today's BI community. But can you really "create" agility in Business Intelligence projects? This presentation shows that Agile BI doesn't necessarily start with the introduction of an iterative project approach. An organisation is well advised to first establish the necessary foundations in regards to organisation, business and technology in order to become capable of an iterative, incremental project approach in the BI domain.
In this session you will learn which building blocks you need to consider, and see a meaningful sequence for introducing them. Selected aspects like test automation, BI-specific design patterns, and the Disciplined Agile framework will be explained in more practical detail.
Coding software and tools used for data science management - Phdassistance (phdAssistance1)
The technique of extracting usable information from data is known as data science. This is the procedure for collecting, modelling and analysing data in order to address real-world issues. Data science tools have been developed as a result of the vast range of applications and rising demand. The following section goes through the greatest data science tools in detail. The most notable attribute of these tools is that they do not require the usage of programming languages to implement data science.
Read More: https://bit.ly/3rbp1Lb
For Enquiry:
India: +91 91769 66446
UK: +44 7537144372
Email: info@phdassistance.com
This document discusses DevOps and MLOps practices for machine learning models. It outlines that while ML development shares some similarities with traditional software development, such as using version control and CI/CD pipelines, there are also key differences related to data, tools, and people. Specifically, ML requires additional focus on exploratory data analysis, feature engineering, and specialized infrastructure for training and deploying models. The document provides an overview of how one company structures their ML team and processes.
This document discusses various tools and technologies used in data science. It covers popular programming languages like Python, R, Java and C++; databases like MySQL, NoSQL, SQL Server and Oracle; data analytics tools like SAS, Tableau, SPSS and Excel; APIs like TensorFlow; servers and frameworks like Hadoop and Spark; and compares SQL and NoSQL databases. It provides details on languages and tools like R, Python, Excel, SAS, SPSS and discusses their uses and popularity in data science.
Ansible, Terraform, CloudFormation, [insert your favorite tech here]… Infra-as-code solutions abound. So why talk about the latest trendy offspring backed by the CNCF? Let's spoil it a bit! Built on Kubernetes, Crossplane lets you converge the delivery of a containerized app with all the other resources it requires outside your favorite K8s cluster, yet badly needs in order to run correctly: an S3 bucket, a managed database, etc. You thus orchestrate the lifecycle of your complete application from one single perspective. Add to that easier multicloud support and a genuine fit with a GitOps approach, and you get a very effective solution for organizing your next deployments!
This presentation explains what serverless is all about, explaining the context from the Dev and Ops points of view and presenting the various ways to achieve serverless (Functions as a Service, BaaS…). It also presents the various competitors on the market and demos one of them, OpenFaaS. Finally, it enlarges the picture, positioning serverless, combined with edge computing and IoT, as a valuable triptych that cloud vendors are building end-to-end offers on top of.
Two self-managed Docker clusters are deployed on public clouds and fight each other in a ruthless battle. One has been designed to resist any form of threat. The other one's only aim is to destroy the first. Who's going to win?
Although it's presented as entertainment, this talk shows off two serious platforms built on different principles. Beyond the technical aspects covered (Swarm/Kubernetes orchestration, IaaS clouds, various tools such as Terraform, kops or Helm), it is an opportunity to discuss broader architecture topics such as immutable infrastructure, hybridization, microservices, etc.
DevOps at scale: what we did, what we learned at Societe Generale (Adrien Blind)
The following talk discusses Societe Generale's transformation journey to DevOps, and more broadly to continuous delivery principles, inside a large, traditional company. It emphasizes the importance of practices over tooling, a human-centric approach heavily leveraging coaching, and our "framework" approach to make it scale up to the IS level.
It was initially delivered at the DevOps Rex conference with teammate Laurent Dussault, also a DevOps coach at Societe Generale.
Unleash software architecture leveraging Docker (Adrien Blind)
The following talk first revisits key aspects of microservices architectures. It then shifts to Docker, explaining in this context the benefits of containers and especially the new orchestration features that appeared with version 1.12.
Docker, cornerstone of cloud hybridization? [Cloud Expo Europe 2016] (Adrien Blind)
The following talk discusses the opportunity to leverage Docker to create a hybrid logical cloud, built simultaneously on top of traditional datacenters and public cloud vendors, enabling the management of new kinds of containers (Windows, Linux on ARM). It also discusses the value of such a capacity for applications in a context of topology orchestration and microservice-oriented applications.
DevOps at scale: what we did, what we learned at Societe Gen... (Adrien Blind)
The following talk discusses Societe Generale's transformation journey to DevOps, and more broadly to continuous delivery principles, inside a large, traditional company. It emphasizes the importance of practices over tooling, a human-centric approach heavily leveraging coaching, and our "framework" approach to make it scale up to the IS level.
It was initially delivered at the DevOps Rex conference with teammate Laurent Dussault, also a DevOps coach at Societe Generale.
Docker, cornerstone of a hybrid cloud? (Adrien Blind)
In this presentation, I propose to explore the orchestration and hybridization potential raised by Docker 1.12 Swarm Mode, and the subsequent benefits.
I'll first recall why Docker fits the microservices paradigm well, and how this architecture engenders new challenges: service discovery, app-centric security, scalability and resilience, and of course, orchestration.
I'll then discuss the opportunity to create your own Docker CaaS platform spanning various cloud vendors and traditional datacenters simultaneously, rather than just relying on vendors' integrated offers.
Finally, I'll discuss the rise of new technologies (Windows containers, ARM architectures) in the Docker landscape, and the opportunity of integrating them into a global composite Docker orchestration, making it possible to describe complex apps globally.
Petit déjeuner Octo - Infrastructure at the service of its projects (Adrien Blind)
This presentation revisits Société Générale's IT infrastructure automation project, in the broader context of rolling out continuous delivery and DevOps practices and tools.
Since many apps are not about just a single container, this talk discusses the ability and benefits of creating a hybrid Docker cluster capacity spanning Linux and Windows operating systems and x86 and ARM architectures.
Moreover, the Docker nodes composing this cloud will be hosted across several providers (local datacenters, cloud vendors such as Azure or AWS), in order to address various scenarios (cloud migration, elasticity...).
DevOps, NoOps, everything-as-code, commoditisation… Quel futur pour les ops ?Adrien Blind
La mise en oeuvre du continuous delivery engendre de nouvelles pressions sur les Ops, l’infra et l’opérabilité d’une application se bâtissant désormais au rythme croissant des itérations livrées. En parallèle, les patterns d’architecture évoluent eux aussi : résilience et scalabilité se traitent désormais de plus en plus au sein même des applications, ramenant progressivement l’infrastructure au rang de commodité… Enfin, les équipes de Devs n’ont de cesse de réclamer plus d’autonomie et une ergonomie plus adaptée à leurs besoins : les acteurs du cloud et de solutions star comme Docker ne s’y sont pas trompés en proposant des produits qui leur parlent directement : la tentation du NoOps grandit peu à peu…
L’enjeu pour les Ops consiste donc à proposer un positionnement et une offre en résonance avec ces nouvelles attentes. Les challenges sont nombreux, revêtant à la fois des aspects techniques (infra-as-code, software-defined-software/storage/, hybridation du SI…) et non techniques (agilité, craftsmanship, devops…).
Des Devs s’arrogeant la place des Ops, des Ops acquérant des compétence de Dev… Dans cette session, nous vous proposons ainsi d’explorer ces profondes mutations culturelles et techniques, et nous vous partagerons quelques recettes pour le plus grand bénéfice des OPs… comme des DEVs. Comme l’écrivait Audiard, « Quand ça change, ça change... Faut jamais se laisser démonter » !
Introduction to Unikernels at first Paris Unikernels meetupAdrien Blind
This is an introduction to unikernels and their impact on architecture and IT organizations (in French, I'll translate it in short terms). I produced this talk for the first Paris Unikernels Meetup.
When Docker Engine 1.12 features unleashes software architectureAdrien Blind
This slidedeck deals with new features delivered with Docker Engine 1.12, in a larger context of application architecture & security. It has been presented at Voxxed Days Luxembourg 2016
The document discusses full stack automation and DevOps. It introduces Clément Cunin and Adrien Blind and their roles. Some key benefits discussed are reduced time to market, repeatability, and serenity. Methods discussed include deploying new releases daily with a 15 minute commit to production time, treating infrastructure as code, using ephemeral environments, and measuring everything.
This presentation discusses how to achieve continuous delivery, leveraging on docker containers, here used as universal application artifacts. It has been presented at Voxxed '15 Bucharest.
Docker: Redistributing DevOps cards, on the way to PaaSAdrien Blind
This talk first presents Docker through its key characteristics: being Portable, Disposable, Live, Social. It then discusses a new type of cloud, the CaaS (Container as a Service), and it potential benefits for PaaS (Platform as a Service).
Docker, Pierre angulaire du continuous delivery ?Adrien Blind
This presentation explores continuous delivery principles leveraging on Docker : it depicts the use of Docker containers as universal application artifacts, delivered flowly all along a deployment pipeline.
This slideshow has been initially presented at Devops D-Day conference, Marseille.
Identity & Access Management in the cloudAdrien Blind
This presentation discusses the evolution of IAM (Identity & Access Management) problematic, considering a context pushing more & more externalization & opening (B2B, B2C) of enterprises IS, also leveraging massively on the cloud.
The talk particularly focuses on IAM SSO & federation topics, and subsequent technologies (SAML, OpenID, OAuth...).
The missing piece : when Docker networking and services finally unleashes so...Adrien Blind
Docker now provides several building blocks, combining engine, clustering, and componentization, while the new networking and service features enable many new usecases such as multi-tenancy. In this session, you will first discover the new experimental networking and service features expected soon, and then drift rapidly to software architecture, explaining how a complete Docker stack unleashes microservices paradigms.
The first part of the talk will introduce what SDNs and service registries are to the audience and will cover corresponding network & service experimental features of docker accordingly, with a technical focus. For instance, it explains how to create an overlay network of top of a swarm cluster or how to publish services.
The second part of the talk moves from infrastructure to application concerns, explaining that application architecture paradigms are shifting. In particular, we discuss the growing porosity of companies’s IS (especially due to massive use of cloud services) drifting security boundaries from the global IS perimeter, to the application shape. We also remind that traditional SOA patterns leveraging on buses (ie. ESBs & ETLs) are being replaced by microservices promoting more direct, full-mesh, interactions. To get the picture really complete, we’ll also rapidely remind other trends and shifts which are already covered by other docker components: scalability & resiliency to be supported by the apps themselves, fine-grained applications, or even infrastructure commoditization…
Most of all, the last part depicts a concrete, state-of-the-art application, applying all the properties discussed previously, and leveraging on a multi-tenant docker full stack using new networking and services features, in addition to traditional swarm, compose, and engine components. And just because we say it doesn’t mean it’s true, we’ll be happy to demonstrate this live !
Introduction to DataOps and AIOps (or MLOps)
1. An introduction to
DataOps & AIOps (or MLOps)
Adrien Blind (@adrienblind)
Disclaimer and credits:
Parts of this presentation have been built with former teammates outside the context of Saagie:
- a broader talk initially co-developed and co-delivered with Frederic Petit for the DevOps D-Day and Snow Camp conferences. Original slides here: https://bit.ly/2Ci3Ilh
- a talk discussing Continuous Delivery and DevOps, co-developed and co-delivered with Laurent Dussault for the DevOps Rex conference. Slides here: https://bit.ly/2CmEIcB
5. The point is to operationalize data projects
From Proof of Concept to Operational product:
● Robust, resilient
● Scalable
● Secure
● Updatable
● Shareable
6. Challenges delivering value from Big Data / AI
Value is hard to demonstrate, implementation takes a long time, and projects are rarely deployed in production:
● Only 27% of CxOs considered their Big Data projects valuable
● 12 to 18 months to build and deploy AI pilots
● Only 15% of AI projects have been deployed
Sources:
● Gartner's CIO Survey (2018)
● The Big Data Payoff: Turning Big Data into Business Value (Cap Gemini and Informatica survey, 2016)
● BCG, Putting Artificial Intelligence to Work, September 2017
8. Challenges ㅡ Process: a DIY, time/budget-consuming, multi-skills, high-risk approach
A hand-rolled data project chains many steps across many roles:
● Provision cluster(s), grant access, connect databases / files, integrate data frameworks (IT Ops, Security)
● Write/build ETL code (Data Engineer) and ML code (Data Scientist)
● Deploy test jobs & validate models, define new policies, change algos and integrate new libs
● Rewrite/build ETL code for prod (Data Engineer) and ML code for prod (Data Scientist)
● Deploy prod jobs, monitor & audit activity (IT Ops)
● Align processes w/ business reqs (Data Steward, Business Analyst)
9. Challenges ㅡ People & organization
Barriers between organizations: silos and different cultures!
● BUSINESS: Data Analyst, Data Steward
● ANALYTICS TEAM: Data Engineer, Data Scientists
● IT: IT Ops, IT Architect & Coders
14. Infrastructure landscape: infrastructure driven
Information Technology (on premises, cloud, etc.)
#0 ITOps: provide compute & storage to host data processing / models / app code
15. Application landscape: API driven
#0 ITOps: provide compute & storage to host data processing / models / app code (Information Technology: on premises, cloud, etc.)
#1 DevOps: build, deliver & run apps. Developers need pipelines to deliver innovative apps, with continuous improvement.
The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features exposed as APIs (APIs used internally & shared externally, plus external APIs you consume).
16. Data processing landscape: data driven
#0 ITOps: provide compute & storage to host data processing / models / app code (Information Technology: on premises, cloud, etc.)
#1 DevOps: build, deliver & run apps. Developers need pipelines to deliver innovative apps, with continuous improvement. The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs.
#2 DataOps: process & share data. Data engineers need pipelines to deliver a capital of data, with continuous improvement. Inputs: internal raw data generated by your apps, plus external data you consume (opendata, from partners...).
The Data Information System is data processing centric: input is data, output is data and data models. It is generally not directly plugged into the operational IS (you copy data over and process it there).
17. Data processing landscape: outputs
Same layers as before (#1 DevOps building, delivering & running apps; #2 DataOps processing & sharing data, both with continuous improvement), now with the DataOps outputs made explicit. Besides internal raw data generated by your apps and external data you consume (opendata, from partners...), DataOps pipelines produce:
● Datasets for analytics, exposed as shared datamarts & more & more as APIs
● Training sets for AI
● Data you share externally
● Data you share back to the operational IS
#0 ITOps: provide compute & storage to host data processing / models / app code. The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs. The Data Information System is data processing centric: input is data, output is data and data models.
18. Data science landscape: model driven
#3 AIOps is added on top: explore & build models. Data scientists need pipelines to deliver valuable models, with continuous improvement:
● DataOps provides training sets, plus datasets for analytics (as shared datamarts & more & more as APIs)
● AIOps produces models, to be bundled and run as APIs in the operational IS
● Performance drift analysis feeds back into AIOps (to retrain & optimize models)
The other layers are unchanged: #0 ITOps (compute & storage on premises, cloud, etc.), #1 DevOps (build, deliver & run apps), #2 DataOps (process & share data), internal raw data generated by your apps, external data and APIs you consume, data you share externally and back to the operational IS. The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs. The Data Information System is data processing centric: input is data, output is data and data models.
19. AIOps needs DataOps
In the data landscape, spotlights are on data analytics, and even more on data science/AI, which valorizes data in a revolutionary way… because it solves business challenges.
… But it first requires having built up a data capital to process!
Said differently, I like to say that…
20. Summary: Pensé par les Devs… Pansé par les Ops! ("Designed by the Devs, bandaged by the Ops": less fun in English)

#0 ITOps
● Tech side: ITOps operationalizes the delivery of infrastructure assets. The purpose is to deliver an underlying platform on top of which assets will be hosted (apps / data processing / ML). CloudOps lands here, but is opinionated on the way to achieve this.
● Non-tech side: fosters collaboration between infrastructure teams working in project mode to deliver new assets, and those running them (support/run/monitoring, etc.).

#1 DevOps
● Tech side: DevOps operationalizes the delivery of app code (automates, measures, etc.). The purpose is to deliver innovative services to the business.
● Non-tech side: fosters collaboration between devs who build apps, and ops responsible for deploying & running these apps. "You build it, you run it!"

#2 DataOps
● Tech side: DataOps operationalizes the setup of data (automates data processing). The purpose is to deliver/shape a capital (of data).
● Non-tech side: fosters collaboration between data engineers who own and shape the data, and ops deploying the underlying data processing jobs.

#3 AIOps
● Tech side: AIOps operationalizes the delivery of models. The purpose is to deliver value.
● Non-tech side: fosters collaboration between data scientists who explore data to build up models, and ops delivering these as usable assets.

So, what about BizDevOps, ITSecOps, DevFinOps, etc.? Business, Security, Finance, etc. are transversal interlocutors / topics which are to be addressed anyway, whether we're speaking about DevOps, DataOps or AIOps.
22. Agile & DevOps are not enough for data projects
Agile + DevOps worked well for app-centric projects, where data was isolated. But data-centric projects trigger new, additional challenges:
● New players to involve: data scientists, data engineers... These may have a completely different background (mathematicians...) and approach technology differently. → Need common understanding and appropriate ergonomics (notebooks, GUIs…)
● A recurring technology/language stack used for the various types of jobs to handle: ingestion, dataprep, modeling… → Need for a ready-to-use toolbox
● Coordinate the various jobs applied to the data → Need for job pipelining/orchestration
● The dev process is fed massively with production data (ex. for machine learning) → Need to strengthen security
● Identify the patrimony (cataloging), share data, control spreading → Need for governance
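The cataloging and governance needs above can be sketched minimally in code. The following is an illustrative toy only (the `Dataset`/`Catalog` names and API are hypothetical, not any real product's): a registry records datasets with tags so the patrimony is searchable, and checks a consumer's role before granting access.

```python
# Hypothetical sketch: a tiny data catalog with basic access control.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    owner: str
    tags: list = field(default_factory=list)
    allowed_roles: set = field(default_factory=set)

class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, ds: Dataset):
        # Cataloging: make the data patrimony visible and searchable
        self._entries[ds.name] = ds

    def search(self, tag: str):
        # Discovery: find datasets by tag
        return [d.name for d in self._entries.values() if tag in d.tags]

    def can_access(self, name: str, role: str) -> bool:
        # Governance: control who may consume which dataset
        ds = self._entries.get(name)
        return ds is not None and role in ds.allowed_roles

catalog = Catalog()
catalog.register(Dataset("sales_2020", "data-eng", ["sales", "raw"], {"data_scientist"}))
print(catalog.search("sales"))                      # ['sales_2020']
print(catalog.can_access("sales_2020", "analyst"))  # False
```

A real platform adds lineage, metadata, and audit on top, but the core contract is the same: no dataset is consumed without being registered and authorized.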
23. One DataOps definition
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.
The goal of DataOps is to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts.
DataOps uses technology to automate the design, deployment and management of data delivery with the appropriate levels of governance and metadata to improve the use and value of data in a dynamic environment.
Source: Gartner - Innovation Insight for DataOps - Dec. 2018
24. DataOps is gaining momentum
The number of data and analytics experts in business units will grow at 3X the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets.
80% of organizations will initiate deliberate competency development in the field of data literacy, acknowledging their extreme deficiency.
26. Data engineers need pipelines to deliver data
Extract → Transform → Aggregate → Share: data processing turns the DATALAKE into shared dataset(s) & data APIs for consumers. That's where your good old data warehouse generally stands!
Data storing: datalakes, object storage, data virtualization.
If data is the new oil, datalakes are just oil fields (a passive mass of raw structured & unstructured data), Hive/Impala & co. are oil rigs, while the DataOps pipelines are refineries, aimed at processing data… Car engines are the data science leveraging this fuel to provide a disruptive means of transportation!
#1 The datalake is not the point (even though companies focused on it). Data processing is.
#2 You don't process data just for the pleasure. You do it to support activities which, in turn, bring value to the business.
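The refinery metaphor maps onto a linear Extract → Transform → Aggregate → Share flow. A minimal sketch in plain Python (the sample records and function names are illustrative, not a real pipeline):

```python
# Hypothetical Extract -> Transform -> Aggregate -> Share sketch.
from collections import defaultdict

def extract():
    # In reality: pull from databases, files, APIs... here, inline sample rows
    return [
        {"country": "FR", "amount": "100"},
        {"country": "FR", "amount": "50"},
        {"country": "DE", "amount": "70"},
    ]

def transform(rows):
    # The "refinery" step: clean and normalize raw records
    return [{"country": r["country"], "amount": int(r["amount"])} for r in rows]

def aggregate(rows):
    # Build the view that will live in a shared datamart
    totals = defaultdict(int)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)

def share(datamart):
    # In reality: publish to a datamart or expose as a data API; here, return it
    return datamart

result = share(aggregate(transform(extract())))
print(result)  # {'FR': 150, 'DE': 70}
```

Each stage being an ordinary function is the point of the next slides: data jobs are just code, so they can be versioned, tested and delivered like any app code.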
27. In comparison, Devs needed pipelines to deliver innovative apps
Code → Commit → Compile & test → Package → Deploy to Dev & test → Promote to … & test → Promote to PROD → Running app
29. Inception: DataOps (and AIOps) delivered in a DevOps way
Extract → Transform → Aggregate → Share → Consume
Data processing jobs (for ingesting, transforming data, etc.) are in the end just pieces of code. These pieces of code can themselves be delivered using DevOps principles :) automated through delivery pipelines.
30. Inception: DataOps (& AIOps) to be achieved... in a DevOps way!
● DataOps Orchestrator: enables the delivery and run of data projects
● DataLab Teams: data projects governance
● Software factory
On the regular landscape for apps (app servers…), feature teams x and y deliver successive versions (n, n+1, n+2, n+3) from business needs through DEV → UAT → PREPROD → PROD environments, exposed as APIs.
31. Building up a DataOps platform
Concretely, you need a platform providing the following features:
- It must make it possible to deploy data processing jobs, leveraging the languages/stacks and technologies commonly used by data engineers (Apache Sqoop, Python, Java…). Regular ETLs may be part of the story
- It must make it possible to schedule and run pipelines that aggregate jobs in logical sequences (acquiring data, preparing it, delivering it to datamarts (databases, indexing clusters…))
- It must provide data cataloging & governance features (to give a clear view of the data assets), and make it possible to manage data governance/security (perform access control, etc.)
- It must provide the appropriate types of datamarts for those data assets (structured/unstructured, time-oriented or not, etc.)
- Its ergonomics must let data engineers and DataOps people be autonomous and productive (avoid tools not designed for them, such as regular "Ops" schedulers or raw use of complex tools such as Kubernetes…)
Progressively, more event-driven, data streaming projects are arriving on the market. They also need an appropriate set of underlying technologies (Kafka clusters among them)
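As a toy illustration of the scheduling feature above (not any specific product's API; all job names are invented), an orchestrator essentially runs jobs in a logical sequence, feeding each job the output of the previous one:

```python
# Minimal sketch of a pipeline of jobs: acquire -> prepare -> deliver.
# Each function stands in for a real job a DataOps platform would deploy.

def acquire():
    # stand-in for an ingestion job (e.g. pulling rows from a source DB)
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

def prepare(rows):
    # stand-in for a transformation/preparation job
    return [r for r in rows if r["value"] > 10]

def deliver(rows):
    # stand-in for loading into a datamart (database, index, ...)
    datamart = list(rows)
    return datamart

def run_pipeline():
    """Run the jobs in their logical sequence, as an orchestrator would."""
    return deliver(prepare(acquire()))
```

Real platforms add what this sketch omits: scheduling, retries, monitoring, and access control around each step.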
33. Datahub commitments: build up a data capital
Data Dictionary & Catalog · Data Extraction / Lineage · Expertise animation, marketing, communication · Data Exposition · Data Processing · Data Warehouse / Data Lake · Data Viz · Data Quality · Governance / Security · Modeling
Transversal commitment: build up & share a transverse data capital for the company.
The process is largely driven by DataOps pipelines!
This is an extract from a longer presentation: the full version can be found here https://bit.ly/33tfoNJ
34. Datahub commitments: deliver use cases
Data Collection · Data Verification · Data Exploration & Analysis tools · ML Code · ML Training (Model) · Monitoring · Data Viz · Service · Presentation
Deliver valuable use cases for the business.
The process is largely driven by a combination of DevOps + DataOps + ML/AIOps pipelines!
36. From DevOps to DataOps & AIOps
[Diagram: a tribe made of squads, with chapters for devs, data science, data engineering, …]
A false good idea!
It sounds logical, extending the agile/DevOps paradigms. But it's too early! You don't have the maturity & critical mass to do this at the beginning!
37. From DevOps to DataOps & AIOps: short term
[Diagram: the tribe keeps its squads and dev chapters; a separate DataHub hosts its own squads plus the data science and data engineering chapters, delivering valuable use cases for the business alongside transversal activities]
Build a datahub first: it creates a clear positioning and visibility across the org.
Two objectives: deliver valuable use cases to ignite & show off the value of data, while the data used for them become the first data to enter your data catalog
38. From DevOps to DataOps & AIOps: longer term
[Diagram: data scientist chapters and data engineer chapters (per tribe & datahub) linked through guilds; the DataHub keeps transversal activities while squads deliver valuable use cases for the business]
People working on business use cases will progressively move back into the regular organization: if they don't, you are just creating a new silo, while the DevOps/agile organizations were intended to remove them (a paradox). Useful as a first step, the datahub should progressively spread into the org. You may keep only a few squads working on very innovative tech to address new use cases (e.g. deep learning while regular ML becomes commonplace); they will also be responsible for fostering their expertise through the guild they animate. However, you keep people working on transversal data engineering topics.
39. Matrix organization & serendipity
This matrix organization (transversal datasets owned by the Datahub, securely shared with several isolated use cases) makes it possible to factorize the work (and so raise your dataset ROI). Each time a use-case team needs a new dataset, it should be capitalized on by integrating it into the data catalog owned by the datahub (see the central team's value?)
Serendipity: with a clear understanding of your data assets, you can of course extract value from them, but it may also spark new ideas! "Since I have this data, and this one, I may be able to [your_new_idea_here]"
“If only HP knew what HP knows, we'd be three times more productive”
- Lew Platt, former CEO of Hewlett-Packard
Dataset #1 Dataset #2 Dataset #3 Dataset #4
Usecase #1
Usecase #2
Usecase #3
Data Catalog
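A minimal sketch of this catalog-driven factorization (a toy in-memory registry; the class, dataset names, and locations are all invented for illustration): each dataset is described once by the datahub, then looked up by any number of use cases instead of being re-ingested per project.

```python
# Toy data catalog: register a dataset once, reuse it from many use cases.

class DataCatalog:
    """Minimal registry mapping dataset names to their metadata."""

    def __init__(self):
        self._datasets = {}

    def register(self, name, owner, location):
        """Capitalize a dataset: record who owns it and where it lives."""
        self._datasets[name] = {"owner": owner, "location": location}

    def lookup(self, name):
        """Return the dataset's metadata, or None if it is not cataloged."""
        return self._datasets.get(name)

catalog = DataCatalog()
# The datahub registers the dataset once (hypothetical location):
catalog.register("sales_2023", owner="datahub", location="s3://lake/sales")
# Use case #1 and use case #2 both find it here instead of rebuilding
# their own ingestion pipeline:
entry = catalog.lookup("sales_2023")
```

Real catalogs add lineage, access control, and search on top, but the ROI argument is already visible: one registration, many consumers.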
41. Data engineering vs Data Science
[80%]
of a data project is roughly about
data acquisition/preparation/sharing
(data engineering)
[20%]
of a data project is roughly about
extracting value from data
(data science, data analytics)
→ Your data scientists generally spend most of their time doing data engineering, empirically, when a clear data engineer position doesn't exist in your organization!
- It's not very efficient (data scientists cost much more than data engineers and are difficult to hire)
- They generally don't like this activity (and may eventually leave your company!)
- It happens regularly that two data scientists using the same data for different use cases build two identical ingestion/preparation pipelines for their projects (you miss the factorization effect)
42. Create clear Data Engineer and DataOps positions!
Data engineers are the tech plumbers of data
Key missions
- Create and configure transformation/preparation jobs to ingest and shape the data
- Deliver them through appropriate datamarts (DBs, indexing clusters, APIs…)
- In small / loosely constrained setups, they may handle the deployment/run of these processes themselves in PROD (a quasi-"NoOps" pattern), or this is offloaded to a specialized DataOps person shared among several data engineers
Background
- Closer to a developer/integrator than to a data scientist! (but with an awareness of data challenges and technologies: Sqoop, HDFS, Hive, Impala, Spark, object storage, etc.)
Data analysts & scientists are experts in extracting value from data
Key missions
- Develop BI, analytics, and models based on the datasets they have.
Background
- May come from a very non-IT background (former statisticians are common). Knowledgeable about specific frameworks (TensorFlow, etc.)
The data steward is a functional manager of data
Key missions
- Manage governance and security
Background
- Has a functional / business knowledge of the data
DataOps people are the local, specialized Ops attached to the data engineers & scientists
Key missions
- Offload the deployment of jobs, pipelines, and the various assets built by data engineers (and data scientists) from dev to prod
- Set up CI/CD toolchains and teach data engineers to work "in a DevOps way"
- Instrument/monitor data flows and data quality, manage the runtime
- ...
Background
- Mostly a DevOps person, with an awareness of data challenges and technologies
Transversal, supporting data functions
44. How to start?
Focus on delivering early use cases to gain trust: data scientists and analysts should be your best friends
● Define clear Data Engineer or even DataOps positions
● Provide them with an industrial platform, enabling them to be more autonomous and productive (fewer round trips with Ops)
● Empower pluridisciplinary data project teams and have them achieve some first (simple!) use cases to create confidence and gain more budget if needed
● Set up, empirically, a basic data catalog made of the datasets gathered and prepared for your use cases
Don't enforce organization changes yet! Foster day-to-day collaboration on operational topics first. Adopting technologies and automation comes naturally to tech people (the IT dept. in the first row). But changing the organization is much more sensitive (it means management reorganization, changes to people's objectives, etc.). It should come in a later step, once some early victories have helped to gain trust and proven your path is the right one.
45. How to start?
Now, it's time to shape your datahub
● On the tech side: automate the whole toolchain (CI/CD); shift to more (complex) use cases (AI…); scale out the platform
● Start changing the organization / management: set up your datahub with a clear commitment, and spend more energy on the DataOps part, since enough use cases have been delivered to justify the factorization/transversal effect
On a longer term, scuttle your work!
● More seriously: your initial siloed approach gave you the critical mass to bootstrap. Now it's time to de-silo your datalab so it spreads across the whole IT dept; if you don't, you have just created a data-driven sub-IT inside the larger IT ecosystem, with little porosity
46. BEWARE
Data engineering is a hidden (because the spotlights are on data scientists) key success factor to accelerate, increase the reliability, and enhance the ROI of your data projects.
But don't "do DataOps for DataOps' sake"!
Remember: DataOps is there to serve and to offload the pains of data scientists & analysts, who in turn transform business needs into solutions. Exactly like ITOps is there to provide infrastructure assets to all the app / data teams of the IT dept...
47. WeWork
92 Av. des Champs-Élysées
75008 Paris - France
Seine Innopolis
72, rue de la République
76140 Le Petit-Quevilly - France
Thank you!
@adrienblind