Exercise - First Machine Learning Model
3","language":"python","name":"python3"},"language_info":{"codemirror_mode":
{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-
python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","vers
ion":"3.6.5"},"kaggle":{"accelerator":"none","dataSources":
[{"sourceId":10211,"databundleVersionId":111096,"sourceType":"competition"},
{"sourceId":15520,"sourceType":"datasetVersion","datasetId":11167},
{"sourceId":38454,"sourceType":"datasetVersion","datasetId":2709}],"isInternetEnabled":f
alse,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_mino
r":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"**This notebook is an exercise
in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-
learning) course. You can reference the tutorial at [this
link](https://www.kaggle.com/dansbecker/your-first-machine-learning-model).**\n\n---\
n","metadata":{}},{"cell_type":"markdown","source":"## Recap\nSo far, you have loaded
your data and reviewed it with the following code. Run this cell to set up your coding
environment where the previous step left off.","metadata":{}},
{"cell_type":"code","source":"# Code you have previously used to load data\nimport pandas
as pd\n\n# Path of the file to read\niowa_file_path =
'../input/home-data-for-ml-course/train.csv'\n\nhome_data = pd.read_csv(iowa_file_path)\n\
n# Set up code checking\nfrom learntools.core import binder\nbinder.bind(globals())\nfrom
learntools.machine_learning.ex3 import *\n\nprint(\"Setup Complete\")","metadata":
{"collapsed":true,"jupyter":{"outputs_hidden":true}},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"# Exercises\n\n## Step 1: Specify Prediction Target\
nSelect the target variable, which corresponds to the sales price. Save this to a new
variable called `y`. You'll need to print a list of the columns to find the name of the column
you need.\n","metadata":{}},{"cell_type":"code","source":"# Print the list of columns in the
dataset to find the name of the prediction target\nprint(home_data.columns)\n\n# Columns in
the dataset:\n# Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',\n#
'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',\n# 'LandSlope', 'Neighborhood',
'Condition1', 'Condition2', 'BldgType',\n# 'HouseStyle', 'OverallQual', 'OverallCond',
'YearBuilt', 'YearRemodAdd',\n# 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
'MasVnrType',\n# 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',\n#
'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',\n# 'BsmtFinType2',
'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',\n# 'HeatingQC', 'CentralAir',
'Electrical', '1stFlrSF', '2ndFlrSF',\n# 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath',
'BsmtHalfBath', 'FullBath',\n# 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr',
'KitchenQual',\n# 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu',
'GarageType',\n# 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea',
'GarageQual',\n# 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',\n#
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',\n# 'Fence',
'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',\n# 'SaleCondition',
'SalePrice'],\n# dtype='object')","metadata":{},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":"y = home_data['SalePrice']\n\n# Check your answer\
nstep_1.check()","metadata":{"collapsed":true,"jupyter":
{"outputs_hidden":true}},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":"# The lines below will show you a hint or the solution.\n#
step_1.hint() \n# step_1.solution()","metadata":{},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 2: Create X\nNow you will create a DataFrame
called `X` holding the predictive features.\n\nSince you want only some columns from the
original data, you'll first create a list with the names of the columns you want in `X`.\n\
nYou'll use just the following columns in the list (you can copy and paste the whole list to
save some typing, though you'll still need to add quotes):\n * LotArea\n * YearBuilt\n *
1stFlrSF\n * 2ndFlrSF\n * FullBath\n * BedroomAbvGr\n * TotRmsAbvGrd\n\nAfter
you've created that list of features, use it to create the DataFrame that you'll use to fit the
model.","metadata":{}},{"cell_type":"code","source":"# Create the list of features below\
nfeature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath',
'BedroomAbvGr', 'TotRmsAbvGrd']\n\n# Select data corresponding to features in
feature_names\nX =
home_data[feature_names]\n\n# Check your answer\nstep_2.check()","metadata":
{"collapsed":true,"jupyter":{"outputs_hidden":true}},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":"# step_2.hint()\n# step_2.solution()","metadata":
{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Review
Data\nBefore building a model, take a quick look at **X** to verify it looks
sensible.","metadata":{}},{"cell_type":"code","source":"# Review data\n# Print summary
statistics from X\nprint(X.describe())\n\n# Print the top few lines\
nprint(X.head())","metadata":{},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 3: Specify and Fit Model\nCreate a
`DecisionTreeRegressor` and save it as `iowa_model`. Ensure you've done the relevant import
from sklearn to run this command.\n\nThen fit the model you just created using the data in
`X` and `y` that you saved above.","metadata":{}},{"cell_type":"code","source":"from
sklearn.tree import DecisionTreeRegressor\n\n# Specify the model.\n# For model
reproducibility, set a numeric value for random_state when specifying the model\
niowa_model = DecisionTreeRegressor(random_state=1)\n\n# Fit the model\
niowa_model.fit(X, y)\n\n# Check your answer\nstep_3.check()","metadata":
{},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# step_3.hint()\n#
step_3.solution()","metadata":{"collapsed":true,"jupyter":
{"outputs_hidden":true}},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 4: Make Predictions\nMake predictions with
the model's `predict` command using `X` as the data. Save the results to a variable called
`predictions`.","metadata":{}},{"cell_type":"code","source":"predictions =
iowa_model.predict(X)\nprint(predictions)\n\n# Check your answer\
nstep_4.check()","metadata":{"collapsed":true,"jupyter":
{"outputs_hidden":true}},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":"# step_4.hint()\n# step_4.solution()","metadata":
{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Think About
Your Results\n\nUse the `head` method to compare the top few predictions to the actual
home values (in `y`) for those same homes. Anything surprising?\n","metadata":{}},
{"cell_type":"code","source":"# You can write code in this cell\
nhead_predictions = iowa_model.predict(X.head())\nprint(head_predictions)\nactual_prices = y.head()\
nprint(actual_prices)","metadata":{},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"It's natural to ask how accurate the model's predictions
will be and how you can improve that. That will be your next step.\n\n# Keep Going\n\
nYou are ready for **[Model Validation](https://www.kaggle.com/dansbecker/model-
validation).**\n","metadata":{}},{"cell_type":"markdown","source":"---\n\n\n\n\n*Have
questions or comments? Visit the [course discussion
forum](https://www.kaggle.com/learn/intro-to-machine-learning/discussion) to chat with
other learners.*","metadata":{}}]}
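
The exercise steps above (select the target, select the features, fit a `DecisionTreeRegressor`, predict) can be sketched end to end outside Kaggle. This is a minimal sketch under stated assumptions: the small DataFrame below is a made-up stand-in for `train.csv` (only the column names come from the exercise; the values are invented for illustration), and the `learntools` answer checks are left out.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for train.csv: invented rows, real column names.
home_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 7000, 14000],
    "YearBuilt": [2001, 1975, 1999, 1920, 2005],
    "1stFlrSF": [850, 1250, 900, 950, 1150],
    "2ndFlrSF": [800, 0, 850, 700, 1000],
    "FullBath": [2, 2, 2, 1, 2],
    "BedroomAbvGr": [3, 3, 3, 2, 4],
    "TotRmsAbvGrd": [8, 6, 6, 5, 9],
    "SalePrice": [200000, 180000, 220000, 140000, 250000],
})

# Step 1: the prediction target.
y = home_data["SalePrice"]

# Step 2: the predictive features.
feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                 "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]
X = home_data[feature_names]

# Step 3: specify and fit the model (random_state fixed for reproducibility).
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)

# Step 4: predict on the same rows the model was fit on.
predictions = iowa_model.predict(X)

# Compare predictions with the actual values side by side.
comparison = pd.DataFrame({"predicted": predictions, "actual": y})
print(comparison)
```

In this sketch the predicted and actual columns agree on every row, because the tree is evaluated on the same rows it was fit on and each row has distinct feature values. That is one reason an in-sample comparison says little about how the model would do on homes it has not seen.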