ML.NET is an open-source machine learning framework built in .NET that runs on Windows, Linux, and macOS. It allows developers to integrate custom machine learning into their applications without prior expertise in developing or tuning machine learning models. Enhance your .NET apps with sentiment analysis, price prediction, fraud detection and more using custom models built with ML.NET.
In this session, Andy will show not only the core of ML.NET but also best practices around Azure Data Lake and data in general when using .NET.
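ML.NET examples are written in C#; purely as a conceptual sketch of what a custom sentiment model does (train on labeled text, then score new text), here is the same idea in a few lines of self-contained Python. The training sentences are invented for illustration:

```python
# Tiny Naive Bayes sentiment classifier: the concept behind the custom
# sentiment models mentioned above, not the ML.NET API itself.
from collections import Counter
import math

def train(samples):
    """samples: list of (text, label). Returns per-label word counts and label totals."""
    counts = {}          # label -> Counter of words
    totals = Counter()   # label -> number of training samples
    for text, label in samples:
        counts.setdefault(label, Counter()).update(text.lower().split())
        totals[label] += 1
    return counts, totals

def predict(model, text):
    counts, totals = model
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        # log prior + log likelihood with add-one smoothing
        score = math.log(totals[label] / sum(totals.values()))
        n = sum(c.values())
        for word in text.lower().split():
            score += math.log((c[word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

model = train([
    ("great product love it", "pos"),
    ("works perfectly very happy", "pos"),
    ("terrible waste of money", "neg"),
    ("broken and useless", "neg"),
])
print(predict(model, "love it works great"))   # "pos"
```

In ML.NET the same pipeline is expressed with `MLContext`, a featurizer, and a trainer, but the train/score shape is the same.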
HA/DR options with SQL Server in Azure and hybrid - James Serra
What are all the high availability (HA) and disaster recovery (DR) options for SQL Server in an Azure VM (IaaS)? Which of these options can be used in a hybrid combination (Azure VM and on-prem)? I will cover features such as AlwaysOn AG, Failover Cluster, Azure SQL Data Sync, Log Shipping, SQL Server data files in Azure, Mirroring, Azure Site Recovery, and Azure Backup.
This document discusses using Azure HDInsight for big data applications. It provides an overview of HDInsight and describes how it can be used for various big data scenarios like modern data warehousing, advanced analytics, and IoT. It also discusses the architecture and components of HDInsight, how to create and manage HDInsight clusters, and how HDInsight integrates with other Azure services for big data and analytics workloads.
This presentation is for those of you who are interested in moving your on-prem SQL Server databases and servers to Azure virtual machines (VMs) in the cloud so you can take advantage of all the benefits of being in the cloud. This is commonly referred to as a “lift and shift” as part of an Infrastructure-as-a-Service (IaaS) solution. I will discuss the various Azure VM sizes and options, migration strategies, storage options, high availability (HA) and disaster recovery (DR) solutions, and best practices.
These slides are a copy of the Azure Cosmos DB + Gremlin API in Action session which I had the pleasure to present on June 2nd, 2018 at the PASS SQL Saturday event in Montreal. The original PowerPoint version contained a much more elaborate series of animations, which had to be flattened for upload in this case. Though I guess you'll get the idea of the logic involved.
These are the slides for my talk "An intro to Azure Data Lake" at Azure Lowlands 2019. The session was held on Friday January 25th from 14:20 - 15:05 in room Santander.
In our first Windows webinar, find out about the benefits of migrating your Windows workloads to AWS. During the session, we will explain why AWS makes your Windows applications faster, more reliable and more secure. We will also talk about how to bring your own license (BYOL) and how to architect, deploy, and manage your Windows platforms on AWS.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” to creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally back up an on-premises database and restore it into an Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (i.e. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc.) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE... - Michael Rys
This presentation shows how you can build solutions that follow the modern data warehouse architecture and introduces the .NET for Apache Spark support (https://dot.net/spark, https://github.com/dotnet/spark)
Microsoft Azure Cosmos DB is a multi-model database that supports document, key-value, wide-column and graph data models. It provides high throughput, low latency and global distribution across multiple regions. Cosmos DB supports multiple APIs including SQL, MongoDB, Cassandra and Gremlin to allow developers to use their preferred API based on their application needs and skills. It also provides automatic scaling of throughput and storage across all data partitions.
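The partition-based scaling mentioned above rests on routing each item by its partition key. As a rough, hypothetical sketch only (Cosmos DB's real hashing scheme differs), key-to-partition routing can be pictured like this:

```python
# Illustrative hash-based partition routing: every document with the same
# partition key lands on the same physical partition. Not the Cosmos DB API.
import hashlib

def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index by hashing it."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

keys = ["user-1", "user-2", "user-3", "user-1"]
placed = [partition_for(k, 4) for k in keys]
print(placed)
# The same key always routes to the same partition:
assert placed[0] == placed[3]
```

Choosing a high-cardinality partition key is what lets throughput and storage spread evenly across partitions.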
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Microsoft Data Platform - What's included - James Serra
This document provides an overview of a speaker and their upcoming presentation on Microsoft's data platform. The speaker is a 30-year IT veteran who has worked in various roles including BI architect, developer, and consultant. Their presentation will cover collecting and managing data, transforming and analyzing data, and visualizing and making decisions from data. It will also discuss Microsoft's various product offerings for data warehousing and big data solutions.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... - James Serra
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Cortana Analytics Suite is a fully managed big data and advanced analytics suite that transforms your data into intelligent action. It is comprised of data storage, information management, machine learning, and business intelligence software in a single convenient monthly subscription. This presentation will cover all the products involved, how they work together, and use cases.
Azure SQL Database & Azure SQL Data Warehouse - Mohamed Tawfik
This document provides an overview of Microsoft Azure Data Services and Azure SQL Database. It discusses Infrastructure as a Service (IaaS) versus Platform as a Service (PaaS), and highlights the opportunities in the Linux database market. It also discusses Microsoft's commitment to customer choice and partnerships with companies like Red Hat. The remainder of the document focuses on features of Azure SQL Database, including an overview of the DTU and vCore purchasing models, managed instances, backup and recovery, high availability options, elastic scalability, and data sync capabilities.
Should I move my database to the cloud? - James Serra
So you have been running on-prem SQL Server for a while now. Maybe you have taken the step to move it from bare metal to a VM, and have seen some nice benefits. Ready to see a TON more benefits? If you said “YES!”, then this is the session for you as I will go over the many benefits gained by moving your on-prem SQL Server to an Azure VM (IaaS). Then I will really blow your mind by showing you even more benefits by moving to Azure SQL Database (PaaS/DBaaS). And for those of you with a large data warehouse, I've also got you covered with Azure SQL Data Warehouse. Along the way I will talk about the many hybrid approaches so you can take a gradual approach to moving to the cloud. If you are interested in cost savings, additional features, ease of use, quick scaling, improved reliability and ending the days of upgrading hardware, this is the session for you!
This document provides an overview of Azure Databricks, including:
- Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It includes Spark SQL, streaming, machine learning libraries, and integrates fully with Azure services.
- Clusters in Azure Databricks provide a unified platform for various analytics use cases. The workspace stores notebooks, libraries, dashboards, and folders. Notebooks provide a code environment with visualizations. Jobs and alerts can run and notify on notebooks.
- The Databricks File System (DBFS) stores files in Azure Blob storage in a distributed file system accessible from notebooks. Business intelligence tools can connect to Databricks clusters via JDBC.
Scaling face recognition with big data - Bogdan Bocse (ITCamp)
Exploring the experience and insight of VisageCloud into building, testing, training and ramping up to production face recognition workloads which can be easily integrated with big data stores.
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
This document summarizes a presentation on governing cloud resources. The presentation covered:
1. The need for cloud governance to properly organize, secure, audit, and control costs of cloud resources as complexity increases.
2. How to implement governance on Microsoft Azure using tools like management groups, role-based access control, Azure Policy, auditing with activity logs, and blueprints to define repeatable resource deployments.
3. Demos of setting up management groups and policies in Azure, integrating governance with DevOps pipelines, and using autoscaling to optimize costs.
The presentation provided an overview of the importance of cloud governance and specific approaches for implementing it on Azure to manage permissions, compliance, and costs.
Azure SQL Database From A Developer's Perspective - Alex Mang (ITCamp)
SQL Server had, has and will certainly have a lot to offer, but the number one concern for a common developer when it comes to SQL Server is the management cost involved. This also happens to be the number one reason for why Azure SQL Database is so successful for hardcore developers who simply don’t want to become accidental DBAs and worry too much about the SQL production workload health. Throughout this fun session, I will demo (a lot!) and walk you through the techniques and technologies Microsoft brought to bring Azure SQL Database’s game up a few notches. These will literally make you, the developer, more productive and your application, more performant.
ITCamp 2018 - Damian Widera - U-SQL in great depth
I would like to invite you to the session about Microsoft Azure Data Lake and U-SQL. I would like to show how quickly you can do data analysis using traditional C# and a new language that is a bit similar to T-SQL. I will also show more complicated things - how to run Python and R scripts to perform even more robust analysis.
Testing your PowerShell code with Pester - Florin Loghiade (ITCamp)
As Infrastructure as Code is growing more in popularity, system administrators and devs started writing more and more sophisticated systems code and scripts.
Testing code is something that devs have been doing for a long time, while system administrators have only just started adopting the idea. With the growing popularity of PowerShell, more and more system administrators and devs began to write PowerShell code for provisioning and configuring infrastructure either on-premises or in the cloud, but the biggest problem was that there was no useful framework to test that code when a breaking change occurred.
This is the “I ran it, and it worked” mindset - but did it, really?
Enter Pester.
Pester is a unit testing framework for PowerShell. It provides a few simple-to-use keywords that let you create tests for your scripts. Pester implements a test drive to isolate your test files, and it can replace almost any command in PowerShell with your implementation. This makes it an excellent framework for both Black-box and White-box testing.
In this presentation, you will learn what Pester is, how you can use Pester as your daily driver when you’re writing scripts, and how you can use Pester to make your life better when change happens.
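Pester itself is PowerShell (Describe/It/Should/Mock keywords). As a loose analogue only, the core idea described above - test a function, and replace the command it calls underneath, as Pester's Mock does - looks like this in Python's standard library:

```python
# Pester-style test sketch in Python: mock the underlying system command
# so the test runs the same way everywhere, with no real disk involved.
import shutil
from unittest import mock

def free_space_percent(path="/"):
    total, used, free = shutil.disk_usage(path)
    return 100 * free / total

def check_drive(path="/"):
    return "ALERT" if free_space_percent(path) < 10 else "OK"

# Describe "check_drive" / It "alerts when space is low":
with mock.patch("shutil.disk_usage", return_value=(100, 95, 5)):
    assert check_drive() == "ALERT"
# It "reports OK when space is fine":
with mock.patch("shutil.disk_usage", return_value=(100, 50, 50)):
    assert check_drive() == "OK"
print("2 tests passed")
```

In Pester the equivalent would mock a cmdlet such as `Get-PSDrive` inside a `Describe` block; the principle - isolate the script logic from the system it touches - is identical.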
ITCamp 2018 - Damian Widera - SQL Server 2016. Meet the Row Level Security. P...
I would like to present a very important feature of the next SQL Server, called Row Level Security. That feature gives a new security level to the product and must be understood in depth by all developers. I would like to present the feature and show all its implications, especially those important from a performance point of view. I will be doing demos all the time.
ITCamp 2018 - Andrea Martorana Tusa - Failure prediction for manufacturing in...
Working in the manufacturing industry means that you must deal with product failures. As a BI and/or Data Science developer, your task is not only to monitor and report on a product's health state during its lifecycle, but also to predict the likelihood of a failure in the production phase or after the product has been delivered to the customer.
Machine Learning techniques can help us to accomplish this task. Starting from past failure data, we can build up a predictive model to forecast the likelihood for a product to fail or to give an estimate of its duration. And now it is possible to develop an end-to-end solution in SQL Server, thanks to the introduction of advanced analytics tools like R since the 2016 release.
In this session, we start from the real case of a manufacturing company to create some predictive models: a) a regression model, to predict how many months a product will last once it is delivered to the customer - Time to Failure (TTF); b) classification models, to predict if a product will fail within a given time frame (binary or multi-class classification problems).
The solution is built using SQL Server 2016, R and R services. On top of that, some reports are created to deliver the outcome to the stakeholders.
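The session's models are built with R inside SQL Server; purely as a toy illustration of the regression idea (predicting months to failure from an operating-stress measure), with invented data points:

```python
# Toy "time to failure" regression: ordinary least squares fit of
# months-until-failure against a stress measure. Data is invented;
# the real solution in the talk uses R models inside SQL Server 2016.
stress = [1.0, 2.0, 3.0, 4.0, 5.0]        # e.g. average load on the unit
months = [58.0, 55.0, 52.0, 49.0, 46.0]   # observed time to failure

n = len(stress)
mean_x = sum(stress) / n
mean_y = sum(months) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(stress, months))
         / sum((x - mean_x) ** 2 for x in stress))
intercept = mean_y - slope * mean_x

def predict_ttf(x):
    """Predicted months until failure for a given stress level."""
    return intercept + slope * x

print(slope, intercept)      # -3.0 61.0 for this perfectly linear data
print(predict_ttf(6.0))      # 43.0
```

The classification variant would instead threshold the prediction ("will it fail within 12 months?"), turning the same features into a binary or multi-class problem.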
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ... - Databricks
This document summarizes Walmart's transition to building an enterprise data platform on Azure Databricks to enable machine learning and data science at scale. Previously, Walmart had a complex and slow legacy technology stack. The new platform goals were to centralize data in the cloud, increase productivity with data science tools, and reduce costs. Key aspects of the new platform included using Azure and Databricks for data processing and machine learning, Airflow for orchestration, and building several machine learning models for applications like fraud detection and product recommendations. Challenges in the transition included optimizing performance and managing resources across the platforms.
Bridging the Gap: Analyzing Data in and Below the Cloud - Inside Analysis
The Briefing Room with Dean Abbott and Tableau Software
Live Webcast July 23, 2013
http://www.insideanalysis.com
Today’s desire for analytics extends well beyond the traditional domain of Business Intelligence. That’s partly because business users are realizing the value of mixing and matching all kinds of data, from all kinds of sources. One emerging market driver is Cloud-based data, and the desire companies have to analyze this data cohesively with their on-premise data sets.
Register for this episode of The Briefing Room to learn from Analyst Dean Abbott, who will explain how the ability to access data in the cloud can play a critical role for generating business value from analytics. He’ll be briefed by Ellie Fields of Tableau Software who will tout Tableau’s latest release, which includes native connectors to cloud-based applications like Salesforce.com, Amazon Redshift, Google Analytics and BigQuery. She’ll also demonstrate how Tableau can combine cloud data with other data sources, including spreadsheets, databases, cubes and even Big Data.
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
Let's face it, our world will be taken over by robots - or at least our jobs will, as the scary ML & AI speculations seem to say. But until that day arrives, I want to take you on a hypothetical journey of designing and creating a fully automated restaurant of the future, where a fine-tuned and efficiently orchestrated group of RoboChefs will cook your desired meal perfectly each time. And all of this is possible thanks to Actions, Timers, Monitors, Orchestrators, Sub-Orchestrators and more, all concepts from Azure Durable Functions, the real focus of this session - an extension to Azure Functions that adds state, and part of Azure's Serverless Compute technologies.
From Developer to Data Scientist - Gaines Kergosien (ITCamp)
ABSTRACT: Due to recent advances in technology, humanity is collecting vast amounts of data at an unprecedented rate, making the skills necessary to mine insights from this data increasingly valuable. So what does it take for a Developer to enter the world of data science?
Join me on a journey into the world of big data and machine learning where we will explore what the work actually looks like, identify which skills are most important, and design a road map for how you too can join this exciting and profitable industry.
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar... (ITCamp)
If there is a common practice in architecting software systems, it is to have them store the last known state of business entities in a relational database: though widely adopted and effectively supported by existing development tools, this practice trades ease of implementation for the loss of the history of such entities.
Event Sourcing provides a pivotal solution to this problem, giving systems the capability of restoring the state they had at any given point in time. Furthermore, injecting mock-up events and having them replayed by the business logic allows for an easy implementation of simulations and “what if” scenarios.
In this session, Andrea will demonstrate how to design time travelling systems by examining real-world, production-tested solutions.
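A minimal sketch of the Event Sourcing idea, assuming a hypothetical account entity: keep the events, and rebuild the entity's state at any point in time by replaying them.

```python
# Event Sourcing in miniature: state is never stored, only derived by
# replaying the event log up to a chosen moment ("time travelling").
# Event names and amounts are invented for illustration.
from datetime import date

events = [
    (date(2018, 1, 1), "Deposited", 100),
    (date(2018, 2, 1), "Withdrawn", 30),
    (date(2018, 3, 1), "Deposited", 50),
]

def balance_at(events, as_of):
    """Replay events up to `as_of` to reconstruct the state at that date."""
    balance = 0
    for when, kind, amount in events:
        if when > as_of:
            break
        balance += amount if kind == "Deposited" else -amount
    return balance

print(balance_at(events, date(2018, 2, 15)))   # 70  (state back in February)
print(balance_at(events, date(2018, 12, 31)))  # 120 (current state)
```

Injecting extra mock events into the list and replaying gives exactly the "what if" simulations the abstract mentions.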
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark - Databricks
Interested in learning how Showtime is leveraging the power of Spark to transform a traditional premium cable network into a data-savvy analytical competitor? The growth in our over-the-top (OTT) streaming subscription business has led to an abundance of user-level data not previously available. To capitalize on this opportunity, we have been building and evolving our unified platform which allows data scientists and business analysts to tap into this rich behavioral data to support our business goals. We will share how our small team of data scientists is creating meaningful features which capture the nuanced relationships between users and content; productionizing machine learning models; and leveraging MLflow to optimize the runtime of our pipelines, track the accuracy of our models, and log the quality of our data over time. From data wrangling and exploration to machine learning and automation, we are augmenting our data supply chain by constantly rolling out new capabilities and analytical products to help the organization better understand our subscribers, our content, and our path forward to a data-driven future.
Authors: Josh McNutt, Keria Bermudez-Hernandez
Day zero of a cloud project - Radu Vunvulea, ITCamp 2018
In this session, we will take a look at what we need to keep on our radar when we start a new project inside the Cloud. When not only we but also our clients are new to the Cloud ecosystem, we need to be aware of small things that can make a difference or secure the success of the project. Things like security, environment isolation and access controls are just a tiny part of what we need to be aware of. There will be examples from real-life projects and hands-on implementations inside Microsoft Azure. After joining this session, you will have an initial view of what to be aware of when starting a new project inside the Cloud.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... - DATAVERSITY
Many data scientists are well grounded in creating accomplishments in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but it doesn’t count until their product gets to production and in use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
ITCamp 2015 - How to scale above the cloud's limits - Radu Vunvulea
The number of devices that are online increases every day. The quantity of digital content that is produced every year sets a new record each time. Last but not least, clients are more and more demanding. A cloud provider offers us a great basket of resources, but we need to know how to use and manage them.
In this session we will talk about how to scale over these limits and how to be prepared for this kind of situation. If we design our system to be prepared to scale over cloud service limits, then we will have a system that can still be used 5 years from now. We will talk about different scenarios where it is easy to reach different limits, and we will learn how to overcome them.
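One standard tactic when pushing against cloud service limits is retrying throttled calls with exponential backoff. A sketch (delays are recorded rather than slept, so the logic stays easy to follow; real code would call time.sleep before each retry):

```python
# Retry with exponential backoff: a common answer to hitting a cloud
# service's rate limits. RuntimeError stands in for a 429/throttling error.
def call_with_backoff(operation, max_attempts=5, base_delay=1.0):
    delays = []
    for attempt in range(max_attempts):
        try:
            return operation(), delays
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise                     # out of attempts: give up
            delays.append(base_delay * 2 ** attempt)   # 1, 2, 4, 8, ...

attempts = {"n": 0}
def flaky():
    """Simulated call that is throttled twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

result, delays = call_with_backoff(flaky)
print(result, delays)    # ok [1.0, 2.0]
```

Adding random jitter to each delay avoids synchronized retry storms when many clients hit the same limit at once.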
One Azure Monitor to Rule Them All? - Marius Zaharia (ITCamp)
After winding paths, the different Azure services finally harmonize into a unified monitoring strategy. Focus on Azure Monitor and its features, as well as the modalities of integration between Azure Monitor and complementary blocks, Application Insights, or Log Analytics.
One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO) - Marius Zaharia
Session presented at IT Camp 2017, Cluj, Romania.
Data Con LA 2020
Description
Coming from a grand belief in data democratization, I believe that in order for any team to be made of successful collaborators, it has to be data-centric and data should be accessible to all.
*To ensure that your team - whether or not it is software-engineering-centric - has maximum efficiency, data should be visible and the data lake should be accessible.
*Form a database for analytics summaries, talk about the different technologies(SQL, NoSQL) cost of deployment, need, team driven structure. Build an API for this database for external/inter team crosstalk.
*Build analytics and visual layer on top of it. Flask/Django/Node, etc.., to enable the team to have high visibility in their analysis, and to ensure a higher turnaround of data.
*Talk about an easy way of enabling the team to run code, could be local/cloud, JupyterHub is a great way of doing so, talk about the tremendous value added in that and the potential it enables
*Talk about the common tools used for version control/CI-CD/coding technologies, etc.
*Finally summarize the value of the mixture of all these tools and technologies in order to ensure the maximum efficiency.
Speaker
Nawar Khabbaz, Rivian, Data Engineer
Similar to ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
Protecting your company's data: by psychologically evaluating potential Espionage and Spy activity
•We talk about protecting data.
•We talk about outside forces seeking to obtain our data by unconventional means.
•I will speak about PROTECTING OUR DATA that is stolen by trusted individuals within.
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
Microsoft "Automated Machine Learning" (AutoML) is an amazing toolkit now available on Azure that's really starting to ramp up.
In a nutshell, it is an automated service that identifies the best machine learning pipelines for labeled data ... it dramatically frees up time for experienced practitioners and gives a tremendous boost in productivity to engineers at the start of their ML journey.
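Reduced to a toy, the loop at the heart of any AutoML service - try candidate pipelines on labeled data and keep the best scorer - can be sketched like this (the candidates and data are invented stand-ins, not the Azure AutoML API):

```python
# Toy "AutoML" search: evaluate several candidate models against labeled
# data and select the one with the highest accuracy.
data = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]   # (feature, label) pairs

candidates = {
    "threshold>=3": lambda x: 1 if x >= 3 else 0,
    "threshold>=5": lambda x: 1 if x >= 5 else 0,
    "always-one":   lambda x: 1,
}

def accuracy(model):
    """Fraction of labeled examples the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

best_name = max(candidates, key=lambda name: accuracy(candidates[name]))
print(best_name, accuracy(candidates[best_name]))   # threshold>=3 1.0
```

A real AutoML service searches over featurization steps, algorithms, and hyperparameters, and scores candidates with cross-validation rather than a single pass, but the select-the-best-pipeline loop is the same.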
ITCamp 2019 - Peter Leeson - Managing Skills
Understanding skills is key to managing any organisation. Skills are not necessarily related to your job, your qualifications or your studies, they are related to what you can do and the responsibilities you have (or should have) within your organisation. Through a systematic and structured approach to understanding, analysing and classifying skills, the business can become more effective, staff has a better understanding of their roles and responsibilities, there is increased job satisfaction, and clear career and training progression plans can be defined.
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
Color. It has the power to evoke emotions and empower the effectiveness of a product, but it also has the ability to ruin otherwise meticulously crafted user experiences. It often rules from the shadows, disguised as a purely aesthetic element and a means of beautification. Let’s see how to take control and strategically use color in digital product development.
Product teams often fail to remember that color has an enormous impact on our response to visual stimulation during human-computer interaction. The most immediate and direct psychological impact on experiences is of course - color. With its complexity and various levels of subconscious effects, it triggers an emotional response.
Color doesn’t live in a vacuum, and we need to start considering it in the context of use. There are many aspects that we need to take into account: the target audience and their potential visual impairments, cultural background and individual differences, previous experiences and memories, the physical environment of use and compliance with the brand.
In this talk, we will immerse into approaches and best practices that product teams should take for strategic use of color in their product design process. After a basic introduction to color theory and psychology (to make sure everyone is up to speed), we will elaborate in detail how even subtle differences in color schemes have a significant impact on interface perception and product success. We will show a series of interface examples we tested on various users and do some live testing on site as well.
Clean Architecture as a term has been around for a while. However, the path to implementing it is not always clear nor easy to follow. When projects fail for reasons that are primarily technical, the reason is often uncontrolled complexity. The complexity gets out of hand when the code lacks structure, when it lacks Clean Architecture.
In this session, I will show how to achieve consistency by implementing Clean Architecture through structure, rather than relying on discipline only. We will look at some basic building blocks of an application infrastructure which will enforce the way dependencies are created, how dependency injection is used or how separation of the data access concerns is enforced.
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
You played around with containers? You feel you can handle the adrenaline rush of publishing your containers in production? Well, hold on there, because there are some aspects you need to consider before you start rushing to production. How will you handle auto-scaling? What about updates/upgrades? Downtime of your app? Version 1 and version 2? CI/CD? Etc.
This session is about deploying your services on containers using the Azure Kubernetes managed offering. You will learn about what problems you might encounter and how to handle them during your deployment journey, and we will cover the main features of Kubernetes and how they can be of use to you.
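On the auto-scaling question above, Kubernetes has a concrete answer: the Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A sketch of that formula with made-up CPU numbers:

```python
# The Kubernetes Horizontal Pod Autoscaler scaling formula, applied to
# illustrative CPU-utilization numbers.
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """desired = ceil(current * currentMetric / targetMetric)"""
    return math.ceil(current_replicas * current_metric / target_metric)

print(desired_replicas(3, 90, 60))   # 5 -> scale out under load
print(desired_replicas(5, 30, 60))   # 3 -> scale back in
```

The ceil keeps the autoscaler conservative: it would rather run one pod too many than let utilization overshoot the target.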
ITCamp 2019 - Florin Flestea - How 3rd Level support experience influenced m...
After being a 3rd level support guy for 2 years, my code changed in several ways. Why did this happen? Is this change good? Should you care about it?
I will tell from experience how and in what ways my code changed, so that you can avoid the same mistakes I made and make your days better instead of wasting time debugging and trying to understand what happened in production.
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
Azure offers a wide range of services, with which we can build powerful solutions. But how do we know which services to choose, and how to combine them to create even better architectures? In this session, we will take a look at real-life scenarios and how we solved them by leveraging the power of Azure.
Blockchain is one of the main legal tech trends today and, like any new technology, comes with strings attached. Issues like enforceability of smart contracts, performance risks, data privacy and compliance with various regulations in different jurisdictions are main legal concerns. The session will focus on the main legal risks by means of case studies and offer a hands-on approach for risk management in case of blockchain and architectures of distributed ledgers.
ITCamp 2019 - Andy Cross - Business Outcomes from AI
Andy Cross, Director of Elastacloud, Microsoft Regional Director, Azure MVP and all round good guy, gives a session on how to successfully build or transform a business using AI technologies.
Over the last years, Elastacloud have delivered analytics projects to a variety of customers. The greatest challenges around AI are both technical and organisational. The existing landscape of process and strategy doesn't solve these challenges in combination, and the gap between them causes friction and the failure of AI projects.
When modelling the outcome of actions that were informed by AI, possibly enacted by AI, the standard risk modelling approaches need to be transformed to include a factor that can change over time to represent the effectiveness of the AI solutions. Given that we should accept errors as part of the AI solution, and that errors are reinforcing of better future decisions, we need to project risk as a decreasing vector over time.
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud StoryITCamp
"App Modernisation" is such a buzzword you might end up thinking there's no such thing. That code just needs to be rewritten every "N" years, that existing apps couldn't take advantage of new platforms, technologies or frameworks. That all the fuss about "goin' cloud" is a fad. Let me tell you why you might be wrong.
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...ITCamp
Thanks to the recently released v4 of the Bot Framework SDK, creating your first bot is a breeze; still, implementing a production viable one is no easy task since several aspects must be taken into account such as user authentication, integration within existing apps, multi language support, technical considerations (e.g.: Azure Functions vs. MVC Core, Blob Storage vs. CosmosDB) and, last but not least, operational costs.
Moreover, you might want to reuse your bot’s Azure hosted, Cognitive Services-backed code to address Amazon’s Alexa users to avoid the need to implement (and evolve) it twice.
Eager to learn how to do that for real? Don’t miss this code-based talk then.
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...ITCamp
'There are multiple ways to skin a cat' says a famous Chinese proverb. However, when it comes to container orchestration in Azure you might feel confused and overwhelmed by the high number of available services.
During this pragmatic session, you will get a better understanding of the pros and cons of choosing either Service Fabric or AKS for container orchestration.
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go NowITCamp
You may have heard me talk about the capabilities of Azure Logic Apps and Azure Functions before, but now I'm taking it up a few notches! And this is mostly because a lot of things have changed over the past few months in terms of serverless and cloud-native applications.
Join me at this session for a deep dive into the ins and outs of Azure Functions when it comes to developing real applications, not just 'Hello, World's, plus the brand-new, top-notch Azure Service Fabric Mesh offering.
I will point out each bad practice and the things you should avoid, but at the end of the day we'll have created a highly scalable, production-ready application. So, how far and how fast can we actually go... now?
ITCamp 2019 - Peter Leeson - Vitruvian QualityITCamp
Marcus Vitruvius Pollio, commonly known as Vitruvius, was a Roman author, architect, civil engineer and military engineer during the 1st century BC. He is known for his multi-volume work entitled “De architectura” and his discussion of perfect proportion in architecture and the human body, which led, among other things, to the famous drawing by Leonardo da Vinci called the “Vitruvian Man”.
Within the principles of “Vitruvian Quality”, we seek to find those perfect proportions and how to align all components of the business architecture in order to make them fit the human needs of the impacted stakeholders.
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World ApplicationITCamp
This session might look like a joke, and it partially is.
On one hand it is a parody about how the most recent trends in industry can significantly increase the cost associated with launching an application (design, development, hosting & operations, etc).
However, it is also a live demo of how you can incrementally evolve your application to take advantage of all the cool technologies out there without actually needing the million dollars.
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...ITCamp
The document discusses building enterprise applications with TypeScript. It provides an overview of TypeScript, describing it as a superset of JavaScript that adds types and other features. It also discusses some common technologies that work well with TypeScript, such as Node.js, Nest.js, Docker, Kubernetes, MongoDB, and Angular. The presentation aims to demonstrate how TypeScript can help build robust, scalable enterprise applications when combined with these complementary technologies.
ITCamp 2018 - Mete Atamel Ian Talarico - Google Home meets .NET containers on...ITCamp
What does it take to connect a Google Home to a .NET container running in the cloud? Surprisingly, not much! In this talk, we will use Dialogflow to setup a Google Home device to talk to a .NET container managed by Kubernetes Engine.
We will take a look at some of the Google Cloud services such as Machine Learning APIs, BigQuery, Stackdriver diagnostics and see how they can elevate our Google Home to the next level. If you’re curious about what Google has to offer for your .NET apps, this talk is for you!
ITCamp 2018 - Magnus Mårtensson - Azure Global Application PerspectivesITCamp
Building and running a service for a truly global audience has always been the ultimate challenge for any business and for any application developer. In this session, we will discuss global perspectives on running your application tier in a scalable way – WebApps/APIs, Traffic Manager and Serverless. We will discuss the new Cosmos DB service offering in Azure and its built-in global sync, available with little more than a press of a button on your end – data was always the final frontier of globalization of your app. We will look at what it takes to monitor this kind of an environment. Naturally this is a very big set of topics, which means this session aims to give an overview, spark a discussion and provide some directional and inspirational input.
ITCamp 2018 - Magnus Mårtensson - Azure Resource Manager For The WinITCamp
With the new model, Azure Resource Manager, Microsoft are gaining the repeatability they always wanted to have for deployment to the Cloud and removing the dreary, repetitive, error-prone manual deployment tasks which have always held us back! With ARM, you can create a Template for your environment and use it to deploy identical environments every time without fail! There is some news in the world of “infrastructure as code” that we need to consider while setting up our Cloud environments. The win we get from being able to deploy our development environment or our temporary load-test environment automatically and identically to our production environment cannot be overstated. This is ARM from a project-efficiency, development and DevOps perspective. This is what you need to know to become much more efficient every day of development.
Performance Budgets for the Real World by Tammy EvertsScyllaDB
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include:
• Understanding performance budgets vs. performance goals
• Aligning budgets with user experience
• Pros and cons of Core Web Vitals
• How to stay on top of your budgets to fight regressions
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datums and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
this resume for sadika shaikh bca studentSadikaShaikh7
I am a dedicated BCA student with a strong foundation in web technologies, including PHP and MySQL. I have hands-on experience in Java and Python, and a solid understanding of data structures. My technical skills are complemented by my ability to learn quickly and adapt to new challenges in the ever-evolving field of computer science.
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called OpenTelemetry, but before diving into the specifics, we’ll start with de-mystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality and percentile to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we’ll explore the OpenTelemetry community; its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor. We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of OpenTelemetry, and know how to take their first steps towards an open-source contribution!
Key Takeaways: Open source, vendor-neutral instrumentation is an exciting new reality as the industry standardizes on OpenTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve, and in order to achieve ubiquity the project would benefit from growing its contributor community.
MYIR Product Brochure - A Global Provider of Embedded SOMs & SolutionsLinda Zhang
This brochure gives introduction of MYIR Electronics company and MYIR's products and services.
MYIR Electronics Limited (MYIR for short), established in 2011, is a global provider of embedded System-On-Modules (SOMs) and comprehensive solutions based on various architectures such as ARM, FPGA, RISC-V, and AI. We cater to customers' needs for large-scale production, offering customized design, industry-specific application solutions, and one-stop OEM services.
MYIR, recognized as a national high-tech enterprise, is also listed among the "Specialized and Special new" Enterprises in Shenzhen, China. Our core belief is that "Our success stems from our customers' success", and we embrace the philosophy of "Make Your Idea Real, then My Idea Realizing!"
Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threatsanupriti
In the rapidly evolving landscape of blockchain technology, the advent of quantum computing poses unprecedented challenges to traditional cryptographic methods. As quantum computing capabilities advance, the vulnerabilities of current cryptographic standards become increasingly apparent.
This presentation, "Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threats," explores the intersection of blockchain technology and quantum computing. It delves into the urgent need for resilient cryptographic solutions that can withstand the computational power of quantum adversaries.
Key topics covered include:
An overview of quantum computing and its implications for blockchain security.
Current cryptographic standards and their vulnerabilities in the face of quantum threats.
Emerging post-quantum cryptographic algorithms and their applicability to blockchain systems.
Case studies and real-world implications of quantum-resistant blockchain implementations.
Strategies for integrating post-quantum cryptography into existing blockchain frameworks.
Join us as we navigate the complexities of securing blockchain networks in a quantum-enabled future. Gain insights into the latest advancements and best practices for safeguarding data integrity and privacy in the era of quantum threats.
In this follow-up session on knowledge and prompt engineering, we will explore structured prompting, chain of thought prompting, iterative prompting, prompt optimization, emotional language prompts, and the inclusion of user signals and industry-specific data to enhance LLM performance.
Join EIS Founder & CEO Seth Earley and special guest Nick Usborne, Copywriter, Trainer, and Speaker, as they delve into these methodologies to improve AI-driven knowledge processes for employees and customers alike.
Quantum Communications Q&A with Gemini LLM. These are based on Shannon's Noisy channel Theorem and offers how the classical theory applies to the quantum world.
UiPath Community Day Kraków: Devs4Devs ConferenceUiPathCommunity
We are honored to launch and host this event for our UiPath Polish Community, with the help of our partners - Proservartner!
We certainly hope we have managed to spike your interest in the subjects to be presented and the incredible networking opportunities at hand, too!
Check out our proposed agenda below 👇👇
08:30 ☕ Welcome coffee (30')
09:00 Opening note/ Intro to UiPath Community (10')
Cristina Vidu, Global Manager, Marketing Community @UiPath
Dawid Kot, Digital Transformation Lead @Proservartner
09:10 Cloud migration - Proservartner & DOVISTA case study (30')
Marcin Drozdowski, Automation CoE Manager @DOVISTA
Pawel Kamiński, RPA developer @DOVISTA
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
09:40 From bottlenecks to breakthroughs: Citizen Development in action (25')
Pawel Poplawski, Director, Improvement and Automation @McCormick & Company
Michał Cieślak, Senior Manager, Automation Programs @McCormick & Company
10:05 Next-level bots: API integration in UiPath Studio (30')
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
10:35 ☕ Coffee Break (15')
10:50 Document Understanding with my RPA Companion (45')
Ewa Gruszka, Enterprise Sales Specialist, AI & ML @UiPath
11:35 Power up your Robots: GenAI and GPT in REFramework (45')
Krzysztof Karaszewski, Global RPA Product Manager
12:20 🍕 Lunch Break (1hr)
13:20 From Concept to Quality: UiPath Test Suite for AI-powered Knowledge Bots (30')
Kamil Miśko, UiPath MVP, Senior RPA Developer @Zurich Insurance
13:50 Communications Mining - focus on AI capabilities (30')
Thomasz Wierzbicki, Business Analyst @Office Samurai
14:20 Polish MVP panel: Insights on MVP award achievements and career profiling
Hire a private investigator to get cell phone recordsHackersList
Learn what private investigators can legally do to obtain cell phone records and track phones, plus ethical considerations and alternatives for addressing privacy concerns.
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecJames Anderson
The lecture titled "Automating AppSec" delves into the critical challenges associated with manual application security (AppSec) processes and outlines strategic approaches for incorporating automation to enhance efficiency, accuracy, and scalability. The lecture is structured to highlight the inherent difficulties in traditional AppSec practices, emphasizing the labor-intensive triage of issues, the complexity of identifying responsible owners for security flaws, and the challenges of implementing security checks within CI/CD pipelines. Furthermore, it provides actionable insights on automating these processes to not only mitigate these pains but also to enable a more proactive and scalable security posture within development cycles.
The Pains of Manual AppSec:
This section will explore the time-consuming and error-prone nature of manually triaging security issues, including the difficulty of prioritizing vulnerabilities based on their actual risk to the organization. It will also discuss the challenges in determining ownership for remediation tasks, a process often complicated by cross-functional teams and microservices architectures. Additionally, the inefficiencies of manual checks within CI/CD gates will be examined, highlighting how they can delay deployments and introduce security risks.
Automating CI/CD Gates:
Here, the focus shifts to the automation of security within the CI/CD pipelines. The lecture will cover methods to seamlessly integrate security tools that automatically scan for vulnerabilities as part of the build process, thereby ensuring that security is a core component of the development lifecycle. Strategies for configuring automated gates that can block or flag builds based on the severity of detected issues will be discussed, ensuring that only secure code progresses through the pipeline.
Triaging Issues with Automation:
This segment addresses how automation can be leveraged to intelligently triage and prioritize security issues. It will cover technologies and methodologies for automatically assessing the context and potential impact of vulnerabilities, facilitating quicker and more accurate decision-making. The use of automated alerting and reporting mechanisms to ensure the right stakeholders are informed in a timely manner will also be discussed.
Identifying Ownership Automatically:
Automating the process of identifying who owns the responsibility for fixing specific security issues is critical for efficient remediation. This part of the lecture will explore tools and practices for mapping vulnerabilities to code owners, leveraging version control and project management tools.
Three Tips to Scale the Shift Left Program:
Finally, the lecture will offer three practical tips for organizations looking to scale their Shift Left security programs. These will include recommendations on fostering a security culture within development teams, employing DevSecOps principles to integrate security throughout the development
How Netflix Builds High Performance Applications at Global ScaleScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
Implementations of Fused Deposition Modeling in real worldEmerging Tech
The presentation showcases the diverse real-world applications of Fused Deposition Modeling (FDM) across multiple industries:
1. **Manufacturing**: FDM is utilized in manufacturing for rapid prototyping, creating custom tools and fixtures, and producing functional end-use parts. Companies leverage its cost-effectiveness and flexibility to streamline production processes.
2. **Medical**: In the medical field, FDM is used to create patient-specific anatomical models, surgical guides, and prosthetics. Its ability to produce precise and biocompatible parts supports advancements in personalized healthcare solutions.
3. **Education**: FDM plays a crucial role in education by enabling students to learn about design and engineering through hands-on 3D printing projects. It promotes innovation and practical skill development in STEM disciplines.
4. **Science**: Researchers use FDM to prototype equipment for scientific experiments, build custom laboratory tools, and create models for visualization and testing purposes. It facilitates rapid iteration and customization in scientific endeavors.
5. **Automotive**: Automotive manufacturers employ FDM for prototyping vehicle components, tooling for assembly lines, and customized parts. It speeds up the design validation process and enhances efficiency in automotive engineering.
6. **Consumer Electronics**: FDM is utilized in consumer electronics for designing and prototyping product enclosures, casings, and internal components. It enables rapid iteration and customization to meet evolving consumer demands.
7. **Robotics**: Robotics engineers leverage FDM to prototype robot parts, create lightweight and durable components, and customize robot designs for specific applications. It supports innovation and optimization in robotic systems.
8. **Aerospace**: In aerospace, FDM is used to manufacture lightweight parts, complex geometries, and prototypes of aircraft components. It contributes to cost reduction, faster production cycles, and weight savings in aerospace engineering.
9. **Architecture**: Architects utilize FDM for creating detailed architectural models, prototypes of building components, and intricate designs. It aids in visualizing concepts, testing structural integrity, and communicating design ideas effectively.
Each industry example demonstrates how FDM enhances innovation, accelerates product development, and addresses specific challenges through advanced manufacturing capabilities.
AC Atlassian Coimbatore Session Slides( 22/06/2024)apoorva2579
This is the combined Sessions of ACE Atlassian Coimbatore event happened on 22nd June 2024
The session order is as follows:
1.AI and future of help desk by Rajesh Shanmugam
2. Harnessing the power of GenAI for your business by Siddharth
3. Fallacies of GenAI by Raju Kandaswamy
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
1. @ITCAMPRO #ITCAMP19 Community Conference for IT Professionals
MACHINE LEARNING WITH ML.NET AND AZURE DATA LAKE
ANDY CROSS
Director Elastacloud / Azure MVP / Microsoft Regional Director
3. Who am I?
• Andy Cross; andy@elastacloud.com; @andyelastacloud
• Microsoft Regional Director
• Azure MVP for 7 years
• Co-Founder of Elastacloud, an international Microsoft Gold Partner specialising in data
4. What am I here to talk about?
• Machine Learning
• A new dotnet approach to data
– ML.NET (https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet)
– Parquet.NET
• Big Data in production
– Relevant tools in Azure for data
6. Machine learning for predicting talk quality
Formal definition:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
E = historical quantities of beer consumed and corresponding audience ratings
T = predicting talk quality
P = accuracy of the talk quality prediction
7. Examples today
• Sentiment analysis of website comments
• Finding mappings of historic taxi fares
• Predicting the most likely group of flowers a certain flower belongs to
• Time Series
• Image Recognition
• NOTE: We reuse algorithms; writing your own in most cases is akin to writing a new database in order to build a website – a vast overreach of engineering
8. Machine learning best practice
Machine learning should be approached in a methodical manner; this way we are more likely to achieve accurate, reliable and generalisable models.
This is achieved by following best practice for machine learning.
Best practice mostly revolves around how the data is used.
9. Machine learning best practice
Data preparation
– Ensure that the features do not contain future data (aka time-travelling)
– Split into training, validation and testing data sets
Cross validation
– Think of this as using mini training and testing data sets to find a model that generalises to the problem
Validation and testing data sets
– Data that the model has never seen before – simulates the future
– Gives a final ‘sanity’ check of our model’s performance
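The "mini training and testing data sets" idea behind cross validation can be sketched in a few lines of plain Python (a language-agnostic illustration of the splitting, not the ML.NET API; the helper name `k_fold_indices` is our own):

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists: each fold acts as a 'mini' test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # shuffle once, deterministically
    fold_size = n_samples // k
    for fold in range(k):
        test = indices[fold * fold_size:(fold + 1) * fold_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

folds = list(k_fold_indices(10, k=5))
# Every sample is tested exactly once across the five folds.
assert sorted(i for _, test in folds for i in test) == list(range(10))
```

Each of the five folds trains on 8 samples and tests on the other 2, so a model that only does well by memorising its training data is exposed quickly.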
10. Model training
Provide an algorithm with the training data set.
The algorithm is ‘tuned’ to find the best parameters that minimise some measure of error.
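As a toy illustration of "tuning to minimise some measure of error" (pure Python with made-up data; real trainers search far more cleverly than this sweep), we can fit a single slope parameter by trying candidates and keeping the one with the lowest mean squared error:

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]                      # roughly y = 2x, with noise

candidates = [w / 10 for w in range(0, 41)]    # 0.0, 0.1, ..., 4.0
best_w = min(candidates, key=lambda w: mse(w, xs, ys))
assert abs(best_w - 2.0) < 1e-9                # the sweep recovers the slope
```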
11. Model testing
To ensure that our model has not overfit to the training data, it is imperative that we use it to predict values from unseen data.
This is how we ensure that our model is generalisable and will provide reliable predictions on future data.
Use a validation set and test sets.
12. Validation set
The validation set is randomly chosen data from the same data that the model is trained with – but not used for training.
Used to check that the trained model gives predictions that are representative of all data.
Used to prevent overfitting.
Gives a ‘best’ accuracy.
13. Test set
The test set data should be ‘future’ data.
We simulate this by selecting data from the end of the available time-frame, with a gap.
Use the trained model to predict for this.
More realistic of the model in production.
Gives a conservative estimate of the accuracy.
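A minimal sketch of that "future data with a gap" split, in plain Python over an already time-ordered list (the 70% cut and the gap of 2 rows are arbitrary choices for illustration):

```python
def time_split(records, train_frac=0.7, gap=2):
    """Train on the past, skip a small gap, test on the 'future' tail.

    No shuffling, so the test set simulates data the model will meet
    in production and no future information leaks into training."""
    cut = int(len(records) * train_frac)
    train = records[:cut]
    test = records[cut + gap:]      # the gap rows are ignored entirely
    return train, test

days = list(range(20))              # stand-in for rows ordered by date
train, test = time_split(days)
assert max(train) < min(test)       # the model never sees the future
```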
14. Test 1 / Test 2
[Figure: timeline comparing two test configurations, with segments marked as training data, validation data, testing data and ignored data. Date markers shown: 01/01/15, 04/02/17, 09/06/17, 21/06/17, 01/11/17, 13/11/17.]
15. Model training, validation and testing
1. Train with cross validation to find the best parameters
2. Assess overfitting on the validation set
3. Retrain with the best parameters on the training data set
4. Evaluate performance on the test data sets
17. Distributed Computing
• Break a larger problem into smaller tasks
• Distribute tasks around multiple computation hosts
• Aggregate results into an overall result
• Types of Distributed Compute
– HPC – Compute Bound
– Big Data – IO Bound
• Big Data – database-free, at-scale IO over flat files
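The break/distribute/aggregate pattern above can be sketched with Python's standard library, using threads as stand-ins for the computation hosts (an illustration of the shape of the pattern, not of any particular Azure service):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(task):
    """The 'small task': sum of squares over one sub-range."""
    lo, hi = task
    return sum(i * i for i in range(lo, hi))

n = 10_000
# 1. Break a larger problem into smaller tasks.
tasks = [(lo, min(lo + 1000, n)) for lo in range(0, n, 1000)]
# 2. Distribute tasks around multiple computation hosts (threads here).
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, tasks))
# 3. Aggregate results into an overall result.
total = sum(partials)
assert total == sum(i * i for i in range(n))
```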
18. Freed of historic constraints
• Algorithms such as we’ll see tonight are not new
– Many date from the 1960s
• The difference is the ability to show the algorithm more examples
• Removing IO bounds gives us access to more data
• Removing CPU bounds allows us to compute over larger domains
19. Azure Services for Distributed Compute
• Azure Batch
• Azure HDInsight
• Azure Databricks
• Bring your own partitioner
–Azure Functions
–Azure Service Fabric (Mesh)
–Virtual Machines
–ACS/AKS
21. Various places for ML
• Azure Databricks
• Azure HDInsight
• Azure Data Lake Analytics
• Azure ML
– V1
– V2 with Workbench
– V2 RC no Workbench
• R/Python in many hosts
– Functions
– Batch
– SQL Database
• C# and dotnet hosted in many
places
• Typical Azure DevOps pipelines
more mature for .NET
22. Azure Databricks
• Databricks Spark hosted as SaaS on Azure
• Focussed on collaboration between data scientists and
data engineers
• Powered by Apache Spark
• Dev in:
–Scala
–Python
–Spark-SQL
–More?
23. Collaboration in Databricks
Collaborative Editing
Switch between Scala, Python, R & SQL
On notebook comments
Link notebooks with GitHub
24. Databricks Notebooks Visuals
Visualisations made easy
Use popular libraries – ggplot, matplotlib
Create a dashboard of pinned visuals
Use Pip, CRAN & JAR packages to add
additional visual tools
25. Azure HDInsight
• Multipurpose Big Data platform
• Operates as a host and configurator for various tools
–Hadoop
–Hive
–HBase
–Kafka
–Spark
–Storm
27. Data Landscape
[Diagram: the typical flow today. .NET output via System.IO is unstructured – our evidence is that most users hand-roll some form of CSV/TSV or use JSON encoding as it is a safe space. A first stage transcodes from “something .NET can write to” into Parquet; all subsequent data inspection and manipulation happens outside of .NET. Operationalisation then mandates database rectangularisation to take advantage of ADO.NET back in .NET.]
28. Data Landscape
[Diagram: the proposed flow with Parquet.NET (machine learning for .NET). Rather than hand-rolling CSV/TSV or JSON via System.IO, immediately serialize .NET POCOs to Parquet, with Parquet-specific optimisation and inspection tools in .NET. .NET can then read operational output from ML and Big Data natively using Parquet.NET.]
30. V1 Stable!!! ML.NET Pipelines
• Library evolving quickly
• Moving towards the common approach shared with Spark/SciKit-Learn
• LearningPipeline weaknesses:
– An enumerable of actions is not common elsewhere
– Logging and state not handled
• Brings in MLContext, Environment and the lower-level data tool IDataView
• Not all <0.6.0 features available, so later demos are “Legacy”
33. Sentiment Analysis
• By showing examples of content and a Toxic bit flag, learn what makes things Toxic
• The quality of the model reflects the quality of the data
– Representation
– Variety
– Breadth
34. @ITCAMPRO #ITCAMP19Community Conference for IT Professionals
Testing and Training Data
Sentiment SentimentText
1 ==You're cool== You seem like a really cool guy... *bursts out laughing at sarcasm*.
0 I just want to point something out (and I'm in no way a supporter of the strange old git), but he is referred to as Dear Leader, and his father
was referred to as Great Leader.
1 ==RUDE== Dude, you are rude upload that carl picture back, or else.
0 " : I know you listed your English as on the ""level 2"", but don't worry, you seem to be doing nicely otherwise, judging by the same page -
so don't be taken aback. I just wanted to know if you were aware of what you wrote, and think it's an interesting case. : I would write that
sentence simply as ""Theoretically I am an altruist, but only by word, not by my actions."". : PS. You can reply to me on this same page, as I
have it on my watchlist. "
Tab-separated, two-column data set with headers.
36.
Simple to build classifier
var pipeline = new TextTransform(env, "Text", "Features")
    .Append(new LinearClassificationTrainer(env,
        new LinearClassificationTrainer.Arguments(),
        "Features",
        "Label"));
var model = pipeline.Fit(trainingData);
37.
Evaluation of Models
IDataView testData = GetData(env, _testDataPath);
var predictions = model.Transform(testData);
var binClassificationCtx = new BinaryClassificationContext(env);
var metrics = binClassificationCtx.Evaluate(predictions, "Label");
PredictionModel quality metrics evaluation
------------------------------------------
Accuracy: 94.44%
Auc: 98.77%
F1Score: 94.74%
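Metrics like these fall out of the model's confusion matrix. A small sketch in plain Python, using made-up true/false positive counts (not the actual counts behind this run) that happen to reproduce figures close to the ones above:

```python
# Hypothetical confusion-matrix counts, chosen only to illustrate
# where Accuracy and F1Score come from.
tp, fp, fn, tn = 90, 5, 5, 80

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were right
precision = tp / (tp + fp)                   # of predicted-toxic, how many were toxic
recall = tp / (tp + fn)                      # of actually-toxic, how many were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Accuracy: {accuracy:.2%}")   # 94.44%
print(f"F1Score: {f1:.2%}")          # 94.74%
```

AUC is different in kind: it summarises ranking quality across all classification thresholds, so it cannot be read off a single confusion matrix.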
41.
The problem with flat files
• With no database or storage engine, data is written arbitrarily to disk
• Format errors
– Caused by bugs
– Caused by human error
• Inefficiencies
– Compression an afterthought
– The GZIP splittable problem: a gzipped file cannot be split for parallel reads
42.
More problems with flat files
• Access errors
–Mutability
–Variability between files in a fileset
• Naivety
–Just because brute force scanning is possible doesn’t mean
it’s optimal
–Predicate Push-Down
–Partitioning
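To make predicate push-down concrete, here is a conceptual sketch in plain Python (not a real Parquet reader): per-chunk min/max statistics let a query skip whole chunks of a file without scanning them, which is exactly what brute-force scanning of a flat file cannot do.

```python
# Each "row group" carries min/max statistics for a column, the way
# Parquet stores per-row-group column statistics.
row_groups = [
    {"min": 0,   "max": 99,  "values": list(range(0, 100))},
    {"min": 100, "max": 199, "values": list(range(100, 200))},
    {"min": 200, "max": 299, "values": list(range(200, 300))},
]

def scan_greater_than(groups, threshold):
    """Return values > threshold, counting how many groups were read."""
    hits, groups_read = [], 0
    for g in groups:
        if g["max"] <= threshold:   # statistics prove no match: skip the chunk
            continue
        groups_read += 1            # only now pay the cost of reading it
        hits.extend(v for v in g["values"] if v > threshold)
    return hits, groups_read

hits, groups_read = scan_greater_than(row_groups, 250)
# Only the last of the three row groups is actually read.
```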
43.
Parquet
• Apache Parquet is a file format, primarily driven by the Hadoop ecosystem, particularly loved by Spark
• Columnar format
• Block
–File
• Row Group
– Column Chunk
» Page
44.
Designed for Data(!)
• Schemas are consistent through the file
• Held at the end of the file so they can be seeked to quickly
• By Convention: WRITE ONCE
• Parallelism:
– Partition workload by File or Row Group
– Read data by column chunk
– Compress per Page
46.
Parquet.NET
• Until recently no approach was available to .NET
– Leading to System.IO being used to write arbitrary data, then requiring data engineering to sort the data out
• Libraries for C++ can be used
– An implementation by G-Research called ParquetSharp uses P/Invoke
• A full .NET Core implementation is Parquet.NET
– https://github.com/elastacloud/parquet-dotnet
48.
Parq!
• End to end tooling for .NET devs on a platform they’re
familiar with.
• Uses dotnet global tools
51.
What is regression?
• Supervised machine learning
• Features map to a Label
• The Label is a “real value”, not a categorical as in classification
• Regression algorithms generate weights for features
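As a sketch of what "weights for features" means, ordinary least squares for a single feature can be computed in closed form with nothing but the standard library (illustrative numbers, not the taxi data used later):

```python
# Fit y = w*x + b by ordinary least squares; the learned w is the
# "weight" the slide refers to, b the intercept.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # e.g. a single feature such as trip distance
ys = [4.5, 7.0, 9.5, 12.0, 14.5]  # e.g. fare; constructed here as exactly 2.5*x + 2

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Closed-form slope: covariance(x, y) / variance(x)
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

def predict(x):
    return w * x + b
```

With more features the same idea generalises to one weight per feature, which is what an ML.NET regression trainer produces internally.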
53.
Taxi Fare Data
• NYC data
• Taxi Fares, Time, Distance, Payment Method etc
• Use these as features and predict the most likely Fare
ahead of time.
56.
Evaluation with RMSE and r²
Rms = 3.30299146626885
RSquared = 0.885729301000846
Predicted fare: 31.14972, actual fare: 29.5
57.
COEFFICIENT OF DETERMINATION
r2 is the proportion of the variance in the dependent variable that is predictable from the
independent variables
62.
What does it tell us?
• In a multiple-regression model to determine solar energy
production:
– The energy production is the dependent variable (Y)
– The cloud cover level is an independent variable (X1)
– The season of year is an independent variable (X2)
– Y = X1*weightX1 + X2*weightX2
• It’s a coefficient (ratio) of how good my predictions are
versus the amount of variability in my dependent variable.
63.
How does it do that?
• It measures the relationship between two variables:
– Y-hat : ŷ
• The estimator is based on regression variables, Intercept, X Variable 1 to X
Variable n
• The distance from this estimator (prediction) and the real y value (y- ŷ)
• Squared
– Y-bar : ȳ
• The average value of all Ys
• The distance of the real y value from the mean of all y values (y-ȳ), which is how
much the data varies from average
• Squared
• These two squared quantities are summed over the data set and combined:
– r² = 1 − (Σ(y−ŷ)² / Σ(y−ȳ)²)
– i.e. 1 − (estimation error / actual distance from average)
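The calculation above can be sketched in a few lines of plain Python (illustrative actual/predicted values, not data from the session):

```python
# r-squared: 1 - (sum of squared residuals / sum of squared distances from the mean)
actual    = [10.0, 12.0, 15.0, 11.0, 17.0]
predicted = [10.5, 11.5, 14.0, 12.0, 16.5]

mean_y = sum(actual) / len(actual)

ss_res = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))  # Σ(y-ŷ)²
ss_tot = sum((y - mean_y) ** 2 for y in actual)                      # Σ(y-ȳ)²

r2 = 1 - ss_res / ss_tot   # here about 0.92: ~92% of variance explained
```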
64.
Y is our actual value: energy generation. X1 is our first input value we want to predict from, X2 our second: irradiance percentage and days left until service. The data science team works hard to understand the data and model it, which means producing weights (weightX1, weightX2) for how much each input variable affects the actual value. If you were to plot out Intercept + X1 × weightX1 + X2 × weightX2, you would get a straight line like the red one on the slide.
65.
To calculate a prediction (called y-hat) we add the intercept to the variable values multiplied by their coefficients (weights).
66.
If we subtract the prediction (estimator) from the actual value we have the residual, or the size of the error. If the prediction is too low, the residual is positive; if it is too high, the residual is negative.
You might sometimes hear data scientists talking about residuals; these are the residuals. It is simply the actual value minus the prediction.
67.
We can also measure the difference between the observed actual value and the average (mean) of the whole set of actuals. This gives us a distance measure of variance: how far the actual varies from the average value of the actuals. If the number is bigger than the average, it is positive; if it is smaller, it is negative.
68.
Both of the measures we just created sum to zero, because some values are bigger and some smaller than the actuals, so when added up they cancel out. This is correct, but we are trying to compare the sizes of the errors, not whether they are above or below actual, so we need to lose the sign and make everything positive. The way we’ll do that is by multiplying each number by itself (squaring it), as −1 × −1 = 1, −2 × −2 = 4, and so on. Adding all these up across the set gives us the sums of squares.
69.
We then divide the sum of squares of our estimator error by the sum of squares of our distance from average. The estimator errors are always lower than the variance of the data, and we subtract the result from 1 to give a value like 0.80, which we shouldn’t describe as 80% accurate, but you can think of it along these lines: 80% of the variance is explained by the model.
70.
ROOT MEAN SQUARED ERROR
RMSE is the average of how far out we are on an estimation by estimation basis
74.
What does it tell us?
• By comparing the average distance between the estimator ŷ
and the observed actual, we can get a measure of how
close in real world terms we are on average to the actual.
• Unlike r2 which gives an abstract view on variance, RMSE
gives the bounds to the accuracy of our prediction.
• On average, the real value is +/- RMSE from the estimator.
75.
How do we calculate it?
• It’s very easy and we’ve already done the hard work.
• The MSE part means average of the squared distance
of the estimator from the actual
–=AVERAGE(data[(y-ŷ)^2])
• Since the values were squared, this number is still big;
square rooting this gives us a real world value.
–=SQRT(AVERAGE(data[(y-ŷ)^2]))
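The same spreadsheet-style steps in plain Python (illustrative values, not data from the session):

```python
import math

actual    = [10.0, 12.0, 15.0, 11.0, 17.0]
predicted = [10.5, 11.5, 14.0, 12.0, 16.5]

# MSE: the average of the squared estimator distance from actual
mse = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual)

# RMSE: square-root back into real-world units
rmse = math.sqrt(mse)
# On average the real value is within +/- rmse of the estimator.
```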
76.
Take the average of the squared estimator distance from actual. Squaring it earlier was useful, as the non-squared value averages out to zero!
77.
Since the MSE is still in the order of magnitude of the Squares, square root it to give us a real world value
and this is Root Mean Square Error (RMSE).
In this example, the estimate is on average 147.673 away from the actual value.
80.
What is Clustering?
• Used to generate groups or clusters of similarities
• Identify relationships that are not evident to humans
• Examples may be:
–Who is a VIP customer?
–Who is cheating in a game?
–How many people are likely to drop out of class?
–Which components in manufacturing are likely to fail?
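As a sketch of the mechanics, a toy k-means in plain Python: assign points to the nearest centroid, move each centroid to its cluster mean, repeat. (Illustrative data; a real workload would use a library trainer such as ML.NET's clustering support.)

```python
def kmeans_1d(points, centroids, iterations=10):
    """Toy 1-D k-means: alternate assignment and centroid update."""
    for _ in range(iterations):
        # assign each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# e.g. spend-per-month values: two obvious groups (regular vs VIP customers)
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
# The centroids settle on the two group means, 1.5 and 9.5.
```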
82.
Supervised Learning
Definition
- You give the input data (X)
and an output variable (Y)
(labels), and you use an
algorithm to learn the
mapping function from
the input to the output.
Y = f(X)
Techniques
- Classification: you want to
classify a new input value.
- Regression
83.
Unsupervised Learning
Definition
- You give the input data (X)
and no corresponding
output variables (labels).
Techniques
- Clustering: you want to
discover the inherent
groupings in the data.
- Association: you want to
discover rules that
describe large portions of
your data.
88.
TensorFlow support since 0.5.0
• Additional work under way for:
– CNTK
– Torch
• This means ML.NET can be the OSS, cross-platform host for many data science frameworks, anywhere that .NET Core runs
• On devices with Xamarin, on edge devices, on web servers
91.
Unified Cloud Data Environment
• Input: files and streams (Excel, XML, other)
• Landing (Excel, XML, other): a true copy of the input data, used to trace lineage and rebuild data
• Staging (Parquet): validated once and for all, optimised for processing
• Enriched (Parquet): results of machine learning experiments
• Operationalised: any combination of modern data platform to support visualisations and operational systems
• Enabled throughout by Parquet.NET and ML.NET
93.
The questions to answer
• Opportunity discovery
–What can we do?
• Adoption
–How do I get my company to use this?
• Sustainability
–How do we control costs and/or build a viable price point?
• Measurement
–How do we know it’s generating value?
Build → Use → Profit
94.
The end goals
• Providing better customer service and experience
– Better products
– Self-healing and self-improving
• Improving operational processes and resource
consumption
– Faster Time to Market
– Lighter Bill of Materials
– Stronger Supply Chain
• New business models
– Insights from data
– Different pricing models
96.
Mental Models
In order to envision the transformation possibilities, adopt the following mental models:
• AI is an extremely focussed, entry-level clerical assistant
• AI is exceptionally reactive
• AI is auditable, repeatable, improvable
• AI is resistant to external bias
98.
Billy the Kid Reaction Times
What if all workers could react and form a course of action in 1 millisecond?
99.
Auditable Actions
What if all actions and their impact were measurable
and qualified?
Great companies have high cultures of accountability, it
comes with this culture of criticism […], and I think our
culture is strong on that.
Steve Ballmer
100.
Cement the views of the Founders
What if the decision-making process of the hyper-successful founders was cemented and eternally referenceable? Resistant to the distractions of hype and false promises.
101.
All tools are Smart Tools
What if every tool used by a business was as smart as a person? A forklift could tell you the contents of a crate were the wrong weight. A machine could tell you it was feeling unwell. A room could tell you someone had left a dangerous tool in an unsafe state.
103.
JIT Service Regimes
• Migrate from manufacturer led maintenance cycles to
just-in-time
• Use data to establish mean time to failure
• Conditional maintenance
104.
AzureML
[Diagram: ~1 minute telemetry through a dedicated cloud gateway, feeding reporting and alarms]
- Machine learning is used in real time with Anomaly Detection, which enables history to be assessed and false-positive spikes in behaviour to be discarded
- Model retraining can occur to help understand acceptable lowering of efficiency as equipment degrades over time within acceptable parameters
- Predictions through AI become an integral part of business reporting
105.
Supplier efficiency
• Use data to judge the reliability of assets
• Track fault recurrence and resolution speeds
• Contract with SLAs around reducing outages
107.
JIT Warehousing
• Regarding parts: JIT maintenance leads to JIT warehousing
• Regarding fuel: integrated data streams including ERP and CRM
• Manage the supply chain and keep the right stock level
108.
G4S Data Science Programme
• G4S plc is a British multinational security services company and operates
an integrated security business in more than 90 countries across the
globe.
• They aim to differentiate G4S by providing industry leading security
solutions that are innovative, reliable and efficient.
• Aim: Predict anomalous card activity in
monitored buildings
110.
Data Science Conceptual Model – CRISP-DM outline methodology
[Diagram: cycle of Business Understanding → Data Understanding → Data Preparation → Modelling and Evaluation → Deployment, driven by a Business Objective, with the Data Science Team at the centre]
Business Understanding
• IDENTIFYING YOUR BUSINESS GOALS
• A problem that your management wants to address
• The business goals
• Constraints (limitations on what you may do, the kinds of solutions that can be
used, when the work must be completed, and so on)
• Impact (how the problem and possible solutions fit in with the business)
• ASSESSING YOUR SITUATION
• Inventory of resources: A list of all resources available for the project.
• Requirements, assumptions, and constraints:
• Risks and contingencies:
• Terminology
• Costs and benefits:
• DEFINING YOUR DATA-MINING GOALS
• Data-mining goals: Define data-mining deliverables, such as models, reports,
presentations, and processed datasets.
• Data-mining success criteria: Define the data-mining technical criteria necessary
to support the business success criteria. Try to define these in quantitative terms
(such as model accuracy or predictive improvement compared to an existing
method).
• PRODUCING YOUR PROJECT PLAN
• Project plan: Outline your step-by-step action plan for the project. (for example,
modelling and evaluation usually call for several back-and-forth repetitions).
• Initial assessment of tools and techniques
111.
Data Science Conceptual Model – CRISP-DM outline methodology
Data Understanding
• GATHERING DATA
• Outline data requirements: Create a list of the types of data necessary to address
the data mining goals. Expand the list with details such as the required time
range and data formats.
• Verify data availability: Confirm that the required data exists, and that you can
use it.
• Define selection criteria: Identify the specific data sources (databases, files,
documents, and so on.)
• DESCRIBING DATA
• Now that you have data, prepare a general description of what you have.
• EXPLORING DATA
• Get familiar with the data.
• Spot signs of data quality problems.
• Set the stage for data preparation steps.
• VERIFYING DATA QUALITY
• The data you need doesn’t exist. (Did it never exist, or was it discarded? Can this
data be collected and saved for future use?)
• It exists, but you can’t have it. (Can this restriction be overcome?)
• You find severe data quality issues (lots of missing or incorrect values that can’t
be corrected).
112.
Data Science Conceptual Model – CRISP-DM outline methodology
Data Preparation
• SELECTING DATA
• Now you will decide which portion of the data that you have is actually going to
be used for data mining.
• The deliverable for this task is the rationale for inclusion and exclusion. In it,
you’ll explain what data will, and will not, be used for further data-mining work.
• You’ll explain the reasons for including or excluding each part of the data that
you have, based on relevance to your goals, data quality, and technical issues
• CLEANING DATA
• The data that you’ve chosen to use is unlikely to be perfectly clean (error-free).
• You’ll make changes, perhaps tracking down sources to make specific data
corrections, excluding some cases or individual cells (items of data), or replacing
some items of data with default values or replacements selected by a more
sophisticated modelling technique.
• CONSTRUCTING DATA
• You may need to derive some new fields (for example, use the delivery date and
the date when a customer placed an order to calculate how long the customer
waited to receive an order), aggregate data, or otherwise create a new form of
data.
• INTEGRATING DATA
• Your data may now be in several disparate datasets. You’ll need to merge some
or all of those disparate datasets together to get ready for the modelling phase.
• FORMATTING DATA
• Data often comes to you in formats other than the ones that are most
convenient for modelling. (Format changes are usually driven by the design of
your tools.) So convert those formats now.
113.
Data Science Conceptual Model – CRISP-DM outline methodology
Modelling and Evaluation (Modelling)
• SELECTING MODELLING TECHNIQUES
• Modelling technique: Specify the technique(s) that you will use.
• Modelling assumptions: Many modelling techniques are based on certain
assumptions.
• DESIGNING TESTS
• The test in this task is the test that you’ll use to determine how well your model
works. It may be as simple as splitting your data into a group of cases for model
training and another group for model testing.
• Training data is used to fit mathematical forms to the data model, and test data
is used during the model-training process to avoid overfitting
• BUILDING MODEL(S)
• Parameter settings: When building models, most tools give you the option of
adjusting a variety of settings, and these settings have an impact on the structure
of the final model. Document these settings in a report.
• Model descriptions: Describe your models. State the type of model (such as
linear regression or neural network) and the variables used.
• Models: This deliverable is the models themselves. Some model types can be
easily defined with a simple equation; others are far too complex and must be
transmitted in a more sophisticated format.
• ASSESSING MODEL(S)
• Model assessment: Summarizes the information developed in your model
review. If you have created several models, you may rank them based on your
assessment of their value for a specific application.
• Revised parameter settings: You may choose to fine-tune settings that were used
to build the model and conduct another round of modelling and try to improve
your results.
114.
Data Science Conceptual Model – CRISP-DM outline methodology
Modelling and Evaluation Cont… (Evaluation)
• EVALUATING RESULTS
• Assessment of results (for business goals): Summarize the results with respect to
the business success criteria that you established in the business-understanding
phase. Explicitly state whether you have reached the business goals defined at
the start of the project.
• Approved models: These include any models that meet the business success
criteria.
• REVIEWING THE PROCESS
• Now that you have explored data and developed models, take time to review
your process. This is an opportunity to spot issues that you might have
overlooked and that might draw your attention to flaws in the work that you’ve
done while you still have time to correct the problem before deployment. Also
consider ways that you might improve your process for future projects.
• DETERMINING THE NEXT STEPS
• List of possible actions: Describe each alternative action, along with the
strongest reasons for and against it.
• Decision: State the final decision on each possible action, along with the
reasoning behind the decision.
115.
Data Science Conceptual Model – CRISP-DM outline methodology
Deployment
• PLANNING DEPLOYMENT
• When your model is ready to use, you will need a strategy for putting it to work
in your business.
• PLANNING MONITORING AND MAINTENANCE
• Data-mining work is a cycle, so expect to stay actively involved with your models
as they are integrated into everyday use.
• REPORTING FINAL RESULTS
• Final report: The final report summarizes the entire project by assembling all the
reports created up to this point, and adding an overview summarizing the entire
project and its results.
• Final presentation: A summary of the final report is presented in a meeting with
management. This is also an opportunity to address any open questions.
• REVIEW PROJECT
• Finally, the data-mining team meets to discuss what worked and what didn’t,
what would be good to do again, and what should be avoided!
116.
Data Science Continuous Integration and Improvement Cycle
[Diagram: three successive CRISP-DM cycles; each Deployment feeds a New Objective, and the Data Science Team iterates through Business Understanding, Data Understanding, Data Preparation, Modelling and Evaluation, and Deployment again]
Editor's Notes
Notebooks have a primary language R, Python & Scala
Use magic to change language
Supports SQL, R, Python & Scala
Create table from dataframe in any language to share between spark contexts
Easy to pair to translate
Treat Notebooks as pseudo-OO call out to function or values in other notebooks, can be combined with if / switch logic etc
Chain notebooks passing data frames and resources from one to another
Schedule notebooks
Plots made easy with a one-line display(dataframe)
GUI driven experience between plots with drag and drop
Use SQL, R, Python & Scala as primary language
Support for popular visual libraries including GGPlot & Matplotlib
Pip, R & JAR packages to add additional visuals
Interactive Visuals i.e. Plotly
Create a Dashboard of pinned visuals