DASF is a generic framework for accelerating and scaling common Machine Learning techniques. It builds on methods and functions from the most widely used libraries to speed up most algorithms. Part of this is to use Dask data structures to scale computation, and RAPIDS AI algorithms to extend support to GPUs as well.
For now, the installation can be done using docker or singularity (if available).
To install DASF using docker or singularity, go to the build/ directory and execute the command below according to your build type: cpu or gpu. Note that DASF uses HPC Container Maker (HPCCM) to generate recipes for all sorts of container types, so you should install HPCCM first in order to generate them.
./build_container.sh --device <cpu|gpu>
You can also configure other parameters of the container if you want, including the container backend: docker or singularity. Run the script with -h for further information.
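Before running the build script, it can help to confirm that the tools it relies on are reachable. The sketch below is only an illustration and not part of DASF: the helper names `preflight` and `build_command` are hypothetical, and it assumes HPCCM is exposed on the PATH as the `hpccm` executable and that the chosen backend is available as `docker` or `singularity`. It only assembles the `--device` invocation shown above and does not run it.

```python
from __future__ import annotations

import shutil


def preflight(backend: str = "docker") -> list[str]:
    """Return the required executables that are missing from PATH.

    Hypothetical helper: assumes HPCCM is installed as `hpccm`.
    """
    if backend not in ("docker", "singularity"):
        raise ValueError(f"unsupported backend: {backend!r}")
    required = ["hpccm", backend]
    return [tool for tool in required if shutil.which(tool) is None]


def build_command(device: str) -> list[str]:
    """Assemble the build invocation for a given device type."""
    if device not in ("cpu", "gpu"):
        raise ValueError(f"device must be 'cpu' or 'gpu', got {device!r}")
    # Only the --device flag is used here because it is the only one
    # documented above; run the script with -h for the others.
    return ["./build_container.sh", "--device", device]


if __name__ == "__main__":
    missing = preflight("docker")
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("Ready to run:", " ".join(build_command("cpu")))
```

The resulting list can be passed to `subprocess.run(..., cwd="build")` if you prefer to drive the build from Python.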
The dasf image will be created and ready to use. Once it is ready, you can start a jupyter instance by executing the command:
./start_jupyter_server.sh --device <cpu|gpu>
You can also define a different port by using the --port PORT argument.
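When picking a value for `--port`, one option is to check on the host that the port is not already taken. The following is a minimal sketch using only the Python standard library; `port_is_free` and `first_free_port` are hypothetical helpers, the candidate port numbers are arbitrary examples, and the check says nothing about how the container itself maps ports.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


def first_free_port(candidates):
    """Return the first free port among the candidates, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    # Arbitrary example range; adjust to whatever suits your machine.
    port = first_free_port(range(8888, 8898))
    print("Suggested --port value:", port)
```

The check is inherently racy: another process may claim the port between the check and the moment the server starts, so treat the result as a hint.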
To install this development version, all you need to do is run pip from the root project directory (the same one where pyproject.toml lives).
pip3 install .
If you want to see some examples of how to use DASF, you can visit the tutorials page for more information on basic and advanced usage.
If you have a working environment with DASF installed, you can execute all the test sets. Make sure you have all development packages installed, such as pytest, parameterized and mock. To run the tests, execute pytest on the tests/ directory from the project root.
pytest tests/
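If the test run fails at import time, a quick way to narrow it down is to check which of the packages mentioned above are importable in the current environment. This is a small sketch, not a DASF utility: `missing_modules` is a hypothetical helper, and it assumes each package is imported under the same name used above (`dasf`, `pytest`, `parameterized`, `mock`).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names, taken from the package names listed above.
REQUIRED = ["dasf", "pytest", "parameterized", "mock"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

`find_spec` only locates a module without importing it, so this check is fast but will not catch a package that is present yet broken.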
The table below lists the machine learning algorithms supported by the DASF framework.
ML Algorithm | CPU | GPU | Multi-CPU | Multi-GPU | Path |
---|---|---|---|---|---|
K-Means | X | X | X | X | dasf.ml.cluster |
SOM | X | X | X | X | dasf.ml.cluster |
Agglomerative Clustering | X | X | dasf.ml.cluster | ||
DBSCAN | X | X | X | dasf.ml.cluster | |
HDBSCAN | X | X | dasf.ml.cluster | ||
Spectral Clustering | X | X | dasf.ml.cluster | ||
Gaussian Mixture Models | X | dasf.ml.mixture | |||
PCA | X | X | X | X | dasf.ml.decomposition |
SVM | X | X | dasf.ml.svm | ||
Boosted Trees | X | X | X | X | dasf.ml.xgboost |
KNN | X | X | dasf.ml.neighbors |
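The four columns of the table (CPU, GPU, Multi-CPU, Multi-GPU) suggest that one algorithm may be backed by up to four implementations. The sketch below illustrates, under stated assumptions, how a single `fit` call could be routed to one of them. It is not DASF's actual API: `BackendDispatcher`, `ToyMean` and the `_fit_cpu` method are hypothetical names, and an importable CuPy is used only as a rough proxy for an available GPU stack.

```python
from importlib.util import find_spec


class BackendDispatcher:
    """Hypothetical base class that selects one of four backends."""

    def __init__(self, use_gpu=None, distributed=False):
        if use_gpu is None:
            # Assumption: an importable CuPy hints at a usable GPU stack.
            use_gpu = find_spec("cupy") is not None
        self.use_gpu = use_gpu
        self.distributed = distributed

    def backend(self):
        """Map the two flags onto the table's four column names."""
        if self.distributed:
            return "multi-gpu" if self.use_gpu else "multi-cpu"
        return "gpu" if self.use_gpu else "cpu"


class ToyMean(BackendDispatcher):
    """Toy 'algorithm' that only ships a single-CPU implementation."""

    def fit(self, data):
        handlers = {"cpu": self._fit_cpu}
        name = self.backend()
        if name not in handlers:
            # Mirrors an empty cell in the support table.
            raise NotImplementedError(f"ToyMean has no {name} backend")
        return handlers[name](data)

    def _fit_cpu(self, data):
        self.mean_ = sum(data) / len(data)
        return self


if __name__ == "__main__":
    model = ToyMean(use_gpu=False).fit([1.0, 2.0, 3.0])
    print(model.backend(), model.mean_)
```

An algorithm marked in all four columns would register four handlers in such a scheme, while one marked only for CPU would register one and raise for the rest.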
If you are using this project in your research, please cite the paper in which DASF was first proposed.
@inproceedings{dasf,
title = {DASF: a high-performance and scalable framework for large seismic datasets},
author = {Julio C. Faracco and Otávio O. Napoli and João Seródio and Carlos A. Astudillo and Leandro Villas and Edson Borin and Alan A. Souza and Daniel C. Miranda and João Paulo Navarro},
year = {2024},
month = {August},
booktitle = {Proceedings of the International Meeting for Applied Geoscience and Energy},
address = {Houston, TX},
organization = {AAPG/SEG}
}
For further reference, the list of authors is below:
- Julio Faracco
- João Seródio
- Otavio Napoli
- Edson Borin