This repository is an example that demonstrates how dvc and Streamlit can help track model performance during R&D exploration.
The Python code itself is not the focus of this repository; it is adapted from the TensorFlow transfer learning tutorial.
Data, metrics, and model weights produced during training and evaluation are tracked with dvc, while a Streamlit app lets you visually explore model predictions and compare trained models.
Install the dependencies with Poetry:
poetry install
The repository contains a single dvc pipeline; its graph is stored in docs/images/dvc-pipeline.png.
Stages description:
# | Stage Name | Description |
---|---|---|
1 | download_dataset | Download the cat_vs_dogs dataset to the data/raw folder |
2 | split_dataset | The cats vs dogs dataset has no test subset. This stage keeps the train subset as is, splits the val subset into val and test subsets, then copies the images into train / val / test subfolders of data/dataset |
3 | train | Train a classifier using transfer learning from a pre-trained network |
4 | evaluate | Compute the accuracy of the trained model on the test subset |
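The split_dataset stage can be pictured as a deterministic shuffle-and-cut of the val files. The sketch below is a hypothetical illustration, not the repository's code: the function names, the 50/50 ratio (test_fraction) and the seed are all assumptions.

```python
import random
import shutil
from pathlib import Path


def split_val_test(files, test_fraction=0.5, seed=42):
    """Split a list of files into (val, test) lists, reproducibly.

    test_fraction and seed are illustrative assumptions, not the repository's values.
    """
    files = sorted(files)  # sort first so the result does not depend on filesystem order
    rng = random.Random(seed)  # dedicated RNG: same seed -> same split every run
    rng.shuffle(files)
    n_test = int(len(files) * test_fraction)
    return files[n_test:], files[:n_test]


def copy_subset(files, src_root, dst_root, subset):
    """Copy files into dst_root/<subset>/, keeping their path relative to src_root."""
    for f in files:
        target = Path(dst_root) / subset / Path(f).relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
```

Seeding a dedicated random.Random instance keeps the split stable across dvc repro runs, so the test subset does not silently change between experiments.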
Useful dvc commands:
Command | Description |
---|---|
dvc pull | Pull all the data: dataset images, model weights, etc. |
dvc repro | Relaunch the whole pipeline. Use -f to force pipeline execution or -s to launch a single stage. |
dvc plots show data/evaluation/predictions.csv --out data/evaluation/confusion.html | Generate a confusion matrix using the dvc predefined template. |
dvc dag --full --dot \| dot -Tpng -o docs/images/dvc-pipeline.png | Regenerate the pipeline graph. The graphviz package is required. |
To go further, see the dvc CLI reference.
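A confusion matrix is simply a count of (actual, predicted) pairs, and accuracy is the share of pairs on its diagonal. The sketch below computes both from a predictions CSV using only the standard library. The column names actual and predicted are hypothetical assumptions; check the header of data/evaluation/predictions.csv before reusing it.

```python
import csv
from collections import Counter
from io import StringIO


def confusion_counts(rows, actual_key="actual", predicted_key="predicted"):
    """Count (actual, predicted) pairs; the column names are hypothetical."""
    return Counter((r[actual_key], r[predicted_key]) for r in rows)


def accuracy(counts):
    """Fraction of rows whose prediction matches the label (0.0 for no rows)."""
    total = sum(counts.values())
    correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # In practice: open("data/evaluation/predictions.csv", newline="") instead of StringIO.
    sample = StringIO("actual,predicted\ncat,cat\ncat,dog\ndog,dog\ndog,dog\n")
    counts = confusion_counts(csv.DictReader(sample))
    print(counts[("cat", "dog")])  # 1
    print(accuracy(counts))  # 0.75
```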
⚠️ A note on dvc remote storage: the remote storage is Sicara's public S3 bucket (see the dvc config file, .dvc/config). By default, you have read permission (dvc pull) but not write permission (dvc push). If you want to run experiments and save your results with dvc push, consider adding your own dvc remote.
Launch the Streamlit app: streamlit run st_scripts/st_dashboard.py
Open your browser at the URL printed by Streamlit (http://localhost:8501 by default) to see the app.
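To run the project with Docker, first build the image:
docker build -t dvc-streamlit-example .
Then run any command ${CMD} inside the container:
docker run --gpus all --rm -v $PWD:/tmp --shm-size=1g dvc-streamlit-example ${CMD}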
For instance, to relaunch the training pipeline:
docker run --gpus all --rm -v $PWD:/tmp --shm-size=1g dvc-streamlit-example dvc repro
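If you drive experiments from a Python script, the docker run command above can be assembled as an argument list, with ${CMD} split into separate arguments. This is a hypothetical helper, not part of the repository; docker_run_argv and its image and workdir parameters are assumptions, while the flags are copied from the command above.

```python
import os
import shlex


def docker_run_argv(cmd, image="dvc-streamlit-example", workdir=None):
    """Build the docker run argv shown above; cmd plays the role of ${CMD}."""
    workdir = workdir or os.getcwd()  # stands in for $PWD
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "-v", f"{workdir}:/tmp",  # mount the repository at /tmp, as in the shell command
        "--shm-size=1g",
        image,
        *shlex.split(cmd),  # split "dvc repro -s train" into separate arguments
    ]
```

For example, subprocess.run(docker_run_argv("dvc repro"), check=True) relaunches the pipeline and raises if the container exits with a non-zero status. Passing a list rather than a shell string avoids quoting issues with paths that contain spaces.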