At Lennox International, we have thousands of IoT-connected devices streaming data into the Azure platform at a minute-level polling interval. The challenge was to combine these data sets with external data sources such as weather and predict equipment failure with high accuracy, along with the patterns and parameters that influence it. Previously, the team used a combination of on-premises and desktop tools to run algorithms on a sample set of devices. The result was low accuracy (around 65%) from a process that took more than 6 hours. The team had to work through several data orchestration challenges and identify a machine learning platform that enabled collaboration among our engineering SMEs, data engineers, and data scientists. The team decided to use Azure Databricks to build the data engineering pipelines and appropriate machine learning models, and to extract predictions using PySpark. To enhance the sophistication of the learning, the team worked with a variety of Spark ML models such as Gradient Boosted Trees and Random Forest. The team also implemented stacking and ensemble methods using H2O Driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1000 cores. Join us in this session to see how this resulted in models that run in 40 minutes with minimal tuning and predict failures with about 90% accuracy.