Machine Learning Practice

Data playground for improving machine learning skills using Kaggle datasets.

To create the mlp environment, run:

conda env create -f environment.yml

INGV - Volcanic Eruption Prediction 🌋

What if scientists could anticipate volcanic eruptions as they predict the weather? While determining rain or shine days in advance is more difficult, weather reports become more accurate on shorter time scales. A similar approach with volcanoes could make a big impact. Just one unforeseen eruption can result in tens of thousands of lives lost. If scientists could reliably predict when a volcano will next erupt, evacuations could be more timely and the damage mitigated.

Enter Italy's Istituto Nazionale di Geofisica e Vulcanologia (INGV), with its focus on geophysics and volcanology. The INGV's main objective is to contribute to the understanding of the Earth's system while mitigating the associated risks. Tasked with the 24-hour monitoring of seismicity and volcanic activity across the country, the INGV seeks to find the earliest detectable precursors that provide information about the timing of future volcanic eruptions.

The dataset is 31.25 GB in size and contains 8953 files.

Download the data zip file directly from Kaggle by running the following command within the data/ directory:

kaggle competitions download -c predict-volcanic-eruptions-ingv-oe

The data zip file can then be unzipped via:

unzip predict-volcanic-eruptions-ingv-oe.zip
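Because the archive is large, it can be useful to check what it contains before extracting everything. The following is a minimal sketch using Python's standard zipfile module; the helper name summarize_zip is hypothetical and not part of this repository, and it assumes the zip file sits in the current directory under the name used above.

```python
import zipfile
from pathlib import Path


def summarize_zip(zip_path):
    """Return (file_count, total_uncompressed_bytes) without extracting."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so only real files are counted.
        files = [info for info in archive.infolist() if not info.is_dir()]
        return len(files), sum(info.file_size for info in files)


if __name__ == "__main__":
    # Assumed location: the archive downloaded into the current directory.
    archive_path = Path("predict-volcanic-eruptions-ingv-oe.zip")
    if archive_path.exists():
        count, size = summarize_zip(archive_path)
        print(f"{count} files, {size / 1e9:.2f} GB uncompressed")
    else:
        print(f"{archive_path} not found; download it first")
```

The reported file count and size can then be compared against the figures quoted above as a quick sanity check on the download.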

For the data zip file to download successfully, please ensure your ~/.kaggle folder contains a valid Kaggle API token kaggle.json.

If not, please create a new token from within your Kaggle account settings, then move the token from the Downloads folder to the ~/.kaggle folder.
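A quick way to confirm the token is in place before starting a long download is sketched below. This is an illustrative helper, not part of the repository: the function name check_kaggle_token is hypothetical, and the sketch assumes the token is a JSON file with "username" and "key" fields stored at ~/.kaggle/kaggle.json.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(token_path=Path.home() / ".kaggle" / "kaggle.json"):
    """Return a list of problems found with the token file (empty if none)."""
    token_path = Path(token_path)
    if not token_path.is_file():
        return [f"{token_path} does not exist"]
    try:
        token = json.loads(token_path.read_text())
    except json.JSONDecodeError:
        return [f"{token_path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{token_path} does not contain a JSON object"]
    problems = []
    # Assumed token layout: a JSON object with "username" and "key".
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"missing '{field}' field")
    # On POSIX systems, flag tokens that other users can read.
    mode = token_path.stat().st_mode
    if os.name == "posix" and mode & (stat.S_IRGRP | stat.S_IROTH):
        problems.append("file is readable by other users; consider chmod 600")
    return problems


if __name__ == "__main__":
    issues = check_kaggle_token()
    print("Token looks OK" if not issues else "\n".join(issues))
```

An empty result only means the file is well-formed; whether the key is still valid is decided by Kaggle when the download command runs.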

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Warm-up Exercises 🔥

Each exercise includes the following:

data/: contains the dataset description, train and test sets, and sample prediction CSV file

real_data/: contains the full dataset and/or the Kaggle leaderboard score distribution

model.ipynb: sample ML workflow using Jupyter Notebook

This is a fun competition aimed at helping you get started with machine learning. While the dataset is publicly available on the internet, looking up the answers defeats the entire purpose. So seriously, don't do that.

1. Titanic - Machine Learning from Disaster 🚢

Image credit: adapted from GIF animation by Artistosteles, Wikimedia Commons

2. House Prices - Advanced Regression Techniques 🏠

3. Digit Recognizer 🔢

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check your score! 📈

score.py is a Python script for evaluating final model performance on the test set. Call it from within each exercise directory via:

python ../score.py -f [prediction csv filepath]
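The exact CSV layout that score.py expects is not documented here; the sample prediction file in each exercise's data/ directory is the reference. As a rough sketch, one way to produce a matching file is to copy the header and row order from that sample. The helper name write_predictions and all file names below are hypothetical, and the sketch assumes the sample has exactly two columns: an ID column followed by a target column.

```python
import csv


def write_predictions(sample_csv, out_csv, predictions):
    """Write predictions using the header and row order of a sample CSV.

    Assumes the sample file has two columns: an ID, then a target.
    `predictions` maps each ID (as a string) to its predicted value.
    Returns the number of prediction rows written.
    """
    with open(sample_csv, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        ids = [row[0] for row in reader if row]
    with open(out_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for row_id in ids:
            writer.writerow([row_id, predictions[row_id]])
    return len(ids)
```

The resulting file path can then be passed to score.py through the -f option shown above.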


