Machine Learning Project

Mercedes-Benz aims to reduce testing time for new car configurations in order to lower emissions. Its engineers developed a testing system, but optimizing its speed across many feature combinations is complex. The document works with a dataset for predicting testing times. After preprocessing steps such as removing zero-variance columns, checking for missing values, label encoding, and dimensionality reduction, an XGBoost model is trained to predict testing times for new configurations. The objective is to contribute to faster testing while maintaining Mercedes-Benz's high standards.

Uploaded by

poongothai s

Mercedes-Benz Greener Manufacturing

DESCRIPTION
Reduce the time a Mercedes-Benz spends on the test bench.
Problem
Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has
stood for important automotive innovations. These include the passenger safety cell
with a crumple zone, the airbag, and intelligent assistance systems. Mercedes-Benz
applies for nearly 2000 patents per year, making the brand the European leader
among premium carmakers. Mercedes-Benz is the leader in the premium car industry.
With a huge selection of features and options, customers can choose the customized
Mercedes-Benz of their dreams.
To ensure the safety and reliability of every unique car configuration before it hits
the road, the company’s engineers have developed a robust testing system. For one
of the world’s biggest manufacturers of premium cars, safety and efficiency are
paramount on the production lines. However, optimizing the speed of
the testing system for so many possible feature combinations is complex and
time-consuming without a powerful algorithmic approach.
You are required to reduce the time that cars spend on the test bench. You will work
with a dataset representing different permutations of features in a Mercedes-Benz car
to predict the time it takes to pass testing. Optimal algorithms will contribute to faster
testing, resulting in lower carbon dioxide emissions without reducing Mercedes-Benz’s
standards.
The following actions should be performed:
• If the variance of any column is equal to zero, remove that column.
• Check for null and unique values for test and train sets.
• Apply label encoder.
• Perform dimensionality reduction.
• Predict your test_df values using XGBoost.

Objective:

This dataset contains an anonymized set of variables that describe different Mercedes
cars. The ground truth is labelled 'y' and represents the time (in seconds) that the car
took to pass testing.

First, the necessary modules are imported.

The dataset has a small number of rows and 388 columns.

Target Variable:
The "y" variable is the one to be predicted, so some analysis is done on it first.

A single data point appears to lie well above the rest.

Next, the distribution of "y" is plotted.
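The observation that one point sits well above the rest can also be checked numerically rather than by eye. The sketch below uses made-up target values and the common 1.5 × IQR rule, which is an assumption of this example and not something the report says it applied.

```python
import pandas as pd

# Made-up target values (seconds); the last one is deliberately extreme.
y = pd.Series([88.5, 92.1, 95.3, 99.8, 101.2, 104.6, 110.4, 250.0], name="y")

# Gap between the largest and the second-largest value.
top_two = y.nlargest(2)
gap = top_two.iloc[0] - top_two.iloc[1]

# Flag values above the upper fence Q3 + 1.5 * IQR.
q1, q3 = y.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = y[y > upper_fence]

print(y.describe())
print("gap between top two values:", gap)
print("values above the upper fence:", outliers.tolist())
```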

Next, the data types of all the variables in the dataset are examined.
The majority of the columns are integers, with 8 categorical columns and 1 float column.

The 8 categorical columns are named in the range X0 to X8 (the range spans nine names, so one of them is not present in the data).
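A data-type summary of this kind can be produced with pandas as sketched below. The frame is a made-up stand-in with only a few columns, so the counts it prints are not those of the real dataset.

```python
import pandas as pd

# Made-up stand-in for the training frame.
train_df = pd.DataFrame({
    "ID": [1, 2, 3],
    "y": [101.5, 93.2, 110.0],
    "X0": ["a", "b", "a"],
    "X1": ["c", "c", "d"],
    "X10": [0, 0, 1],
})

# Number of columns per data type.
dtype_counts = train_df.dtypes.astype(str).value_counts()

# Columns that are not numeric are treated as categorical here.
categorical_cols = train_df.select_dtypes(exclude="number").columns.tolist()

print(dtype_counts)
print("categorical columns:", categorical_cols)
```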

Missing values:

Next, the dataset is checked for missing values.

There are no missing values in the dataset.
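A missing-value check along these lines might look like the following sketch. The frame is made up and deliberately contains gaps so that the report has something to show; on data with no missing values, as described above, `columns_with_missing` would simply be empty.

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberate gaps.
train_df = pd.DataFrame({
    "X0": ["a", "b", None],
    "X10": [0, 1, 0],
    "y": [95.0, np.nan, 101.0],
})

# Count missing entries per column and keep only columns that have any.
missing_per_column = train_df.isnull().sum()
columns_with_missing = missing_per_column[missing_per_column > 0]

print(columns_with_missing)
```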

Integer Columns Analysis:

All the integer columns are binary, and some of them have only one unique value, 0.
Those columns can possibly be excluded from this modelling activity.
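One way to arrive at this picture is to group the integer columns by the set of values they actually take. The sketch below does that on a made-up frame; the column names are illustrative only.

```python
import pandas as pd

# Made-up frame: X11 only ever takes the value 0.
train_df = pd.DataFrame({
    "X0": ["a", "b", "a", "c"],
    "X10": [0, 1, 0, 1],
    "X11": [0, 0, 0, 0],
    "X12": [1, 0, 0, 0],
    "y": [95.0, 102.5, 99.1, 110.3],
})

int_cols = [c for c in train_df.columns
            if pd.api.types.is_integer_dtype(train_df[c])]

# Map each distinct set of observed values to the columns that show it.
unique_value_sets = {}
for col in int_cols:
    key = tuple(sorted(int(v) for v in train_df[col].unique()))
    unique_value_sets.setdefault(key, []).append(col)

# Columns whose only value is 0 are candidates for exclusion.
constant_cols = unique_value_sets.get((0,), [])

print(unique_value_sets)
print("candidates to drop:", constant_cols)
```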
Next, the categorical columns present in the dataset are explored.
Binary Variables:
Next come the binary variables. There are quite a few of them, as seen before.
The first step is to get the number of 0's and 1's in each of these variables.
Then the mean y value for each value of each binary variable is checked.
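The two summaries described here, the 0/1 counts and the mean of y for each value, can be computed as in this sketch. The data is made up and small enough to verify by hand; the report itself presents these quantities as plots.

```python
import pandas as pd

# Made-up frame with two binary variables and the target.
train_df = pd.DataFrame({
    "X10": [0, 0, 1, 1, 1, 0],
    "X12": [0, 0, 0, 0, 0, 1],
    "y": [90.0, 100.0, 110.0, 120.0, 130.0, 95.0],
})
binary_cols = ["X10", "X12"]

# Rows: variables; columns: the values 0 and 1; cells: number of rows.
counts = pd.DataFrame({
    col: train_df[col].value_counts().reindex([0, 1], fill_value=0)
    for col in binary_cols
}).T

# Same layout, but each cell holds the mean of y for that value.
mean_y = pd.DataFrame({
    col: train_df.groupby(col)["y"].mean().reindex([0, 1])
    for col in binary_cols
}).T

print(counts)
print(mean_y)
```

In this toy data X10 is evenly split and its two means differ by 25, while X12 has a single row with value 1, so its mean for that value rests on one observation.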
Binary variables which show a good colour difference in the resulting graphs between
0 and 1 are likely to be more predictive, provided the count distribution is also
reasonably balanced between both classes. The important variables are examined in
more detail in the later part of the notebook.
ID variable:

One more important thing to look at is the ID variable. This gives an idea of how the
splits are done across train and test, and also shows whether ID has some potential
predictive capability.

The 'y' variable is examined against the ID variable.

There seems to be a slight decreasing trend in 'y' with respect to the ID variable. The
IDs are distributed across both train and test.
It seems like a random split of the ID variable between train and test samples.
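Both observations, the trend of y against ID and the way IDs are spread over train and test, can be quantified with a few lines. The sketch below builds made-up data with a slight downward trend on purpose, and uses a least-squares line via `numpy.polyfit` as one possible way to measure the trend; the report drew its conclusion from plots.

```python
import numpy as np
import pandas as pd

# Made-up IDs: odd IDs go to train, even IDs go to test.
ids = np.arange(1, 201)
train_df = pd.DataFrame({"ID": ids[::2]})
test_df = pd.DataFrame({"ID": ids[1::2]})

# Made-up target with a slight downward trend in ID.
train_df["y"] = 100.0 - 0.01 * train_df["ID"]

# Slope of the least-squares line of y against ID.
slope, intercept = np.polyfit(train_df["ID"].to_numpy(),
                              train_df["y"].to_numpy(), deg=1)

# IDs present in both sets, and the ID range covered by each set.
shared_ids = set(train_df["ID"]) & set(test_df["ID"])
id_summary = pd.DataFrame({
    "train": train_df["ID"].describe(),
    "test": test_df["ID"].describe(),
})

print("slope of y against ID:", slope)
print("IDs in both sets:", len(shared_ids))
print(id_summary)
```

Similar ID ranges and quartiles in the two columns of `id_summary` are consistent with a random split, although they do not prove it.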
Important Variables:

Run an XGBoost model to get the important variables.

Categorical variables occupy the top spots, followed by binary variables.
Then, a Random Forest model is built and its important variables are checked.
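Ranking variables by importance can be sketched with scikit-learn's `RandomForestRegressor`, as below. The data, the column names and the hyperparameters are all assumptions made for illustration, so the ranking printed here says nothing about the real dataset. `xgboost.XGBRegressor` exposes the same `feature_importances_` attribute after fitting, so the ranking step would look the same for the XGBoost model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_rows = 300

# Made-up features: only X_signal actually influences the target.
X = pd.DataFrame({
    "X_signal": rng.integers(0, 2, size=n_rows),
    "X_noise1": rng.integers(0, 2, size=n_rows),
    "X_noise2": rng.integers(0, 2, size=n_rows),
})
y = 90 + 15 * X["X_signal"] + rng.normal(0, 1, size=n_rows)

forest = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
forest.fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(importances)
```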
