DISSEC-COZY is a proof-of-concept implementation of DISSEC-ML, the academic work done by Cozy and PETRUS.
It is a decentralized aggregation protocol designed to be used for privacy-preserving machine learning. Nodes locally learn a model and then use a simple additively homomorphic secret-sharing scheme to send data to aggregators. Aggregators form a tree to maximize efficiency.
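To illustrate the additively homomorphic secret sharing mentioned above, here is a minimal sketch for Node.js. The modulus, the function names and the choice of three shares are assumptions made for this example and are not taken from the DISSEC-COZY code base. Real model parameters would first have to be encoded as integers.

```javascript
const { randomInt } = require('node:crypto')

// Hypothetical public modulus: all arithmetic is done modulo this prime.
const MODULUS = 2147483647 // 2^31 - 1

// Non-negative remainder (JavaScript's % can return negative values).
const mod = x => ((x % MODULUS) + MODULUS) % MODULUS

// Split `secret` into `n` shares that sum to `secret` modulo MODULUS.
function share(secret, n) {
  const shares = []
  let sum = 0
  for (let i = 0; i < n - 1; i++) {
    const r = randomInt(MODULUS) // uniform integer in [0, MODULUS)
    shares.push(r)
    sum = mod(sum + r)
  }
  shares.push(mod(secret - sum)) // last share makes the total match
  return shares
}

// Recombine shares by summing them modulo MODULUS.
const reconstruct = shares => shares.reduce((acc, s) => mod(acc + s), 0)

// Additive homomorphism: adding shares position-wise yields shares of the sum.
const addShares = (a, b) => a.map((s, i) => mod(s + b[i]))

const alice = share(12, 3)
const bob = share(30, 3)
console.log(reconstruct(addShares(alice, bob))) // 42
```

An aggregator holding one share from each node can add them without learning any individual value. Only the recombination of all partial sums reveals the total.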
📌 Note: we recommend using Yarn instead of NPM for package management. Don't hesitate to install and use it for your Cozy projects; it is now our main Node package tool for official Cozy apps.
Setting up the Cozy DISSEC-COZY app requires you to set up a dev environment. However, you will need a specific version of cozy-stack, so we recommend that you install it from sources and run git checkout 3bca7d384076a21367c24bf69f5381ad9e54223b before running make.
You can then clone the app repository and install dependencies:
$ git clone https://github.com/JulienMirval/dissec_cozy.git
$ cd dissec_cozy
$ yarn install
📌 If you use a node environment wrapper like nvm or ndenv, don't forget to set your local node version before doing a yarn install.
Cozy's apps use a standard set of npm scripts to run common tasks, like watch, lint, test, build…
If you want to check that everything is working smoothly, you can run the integration tests. If you want to see how the protocol works in detail, run the demonstration.
To allow performance to improve continuously, nodes in the protocol can start training from a pretrained model, which is the result of the last training.
Currently, this model is stored on the local file system and the path needs to be defined for the execution to work. In the file dissec.config.json, set the localModelPath value to a path where you want this shared model to be stored.
Also, you need to add a line in the cozy.yml configuration of the Cozy Stack to serve the model as a remote asset. The remote asset of the model currently NEEDS to be called dissec_model.
Before:
remote_assets:
  bank: https://myassetserver.com/remote_asset.json
After:
remote_assets:
  dissec_model: file:///home/dode/cozy/models/model.data
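The two settings above are presumably meant to point at the same file. As a hedged sketch, the following Node.js helper derives the cozy.yml entry from a parsed dissec.config.json object. The helper name remoteAssetLine is hypothetical, and the requirement that localModelPath be absolute is an assumption of this example (a file:// URL needs an absolute path). The expected output assumes a POSIX system.

```javascript
const path = require('node:path')
const { pathToFileURL } = require('node:url')

// Build the remote_assets entry from a parsed dissec.config.json object.
// `remoteAssetLine` is a hypothetical helper, not part of DISSEC-COZY.
function remoteAssetLine(config) {
  const modelPath = config.localModelPath
  if (typeof modelPath !== 'string' || !path.isAbsolute(modelPath)) {
    throw new Error('localModelPath must be an absolute path')
  }
  // The asset name is fixed: the stack must serve it as dissec_model.
  return `dissec_model: ${pathToFileURL(modelPath).href}`
}

console.log(
  remoteAssetLine({ localModelPath: '/home/dode/cozy/models/model.data' })
)
// dissec_model: file:///home/dode/cozy/models/model.data
```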
Tests are run by Jest under the hood. You can easily run the unit test suite with:
$ cd dissec_cozy
$ yarn test
There are also integration tests that verify that training and prediction work as expected. The Cozy Stack needs to be running for integration tests to work. They can be run with:
$ yarn test:integration
It currently tests the following properties:
- A distributed training in which the union of the individual datasets equals the dataset used by a single participant to train a local model yields the same accuracy as that local model.
📌 Don't forget to update / create new tests when you contribute code, to keep the app consistent.
A test scenario has been developed to test the protocol with multiple instances. It is currently static and involves 11 nodes: 7 nodes each contribute a single piece of data to the protocol, 3 nodes serve as intermediate aggregators and the last one acts as the querier, creating the tree and triggering the contributors.
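The shape of this scenario can be sketched as a small tree. The round-robin assignment of contributors to aggregators, the node ids and the plain (non secret-shared) sum are all assumptions of this example; the actual wiring is decided by the querier in the app.

```javascript
// Hypothetical static layout: 1 querier, 3 aggregators, 7 contributors.
function buildTree(contributorValues, aggregatorCount) {
  const aggregators = Array.from({ length: aggregatorCount }, (_, i) => ({
    id: `aggregator${i + 1}`,
    children: []
  }))
  contributorValues.forEach((value, i) => {
    // Round-robin assignment is an assumption of this sketch.
    aggregators[i % aggregatorCount].children.push({
      id: `contributor${i + 1}`,
      value
    })
  })
  return { id: 'querier', children: aggregators }
}

// A leaf returns its value; an inner node returns the sum of its children.
function aggregate(node) {
  if (!node.children) return node.value
  return node.children.reduce((sum, child) => sum + aggregate(child), 0)
}

// Count every node in the tree, the root included.
const countNodes = node =>
  1 + (node.children || []).reduce((n, child) => n + countNodes(child), 0)

const tree = buildTree([1, 2, 3, 4, 5, 6, 7], 3)
console.log(countNodes(tree)) // 11
console.log(aggregate(tree)) // 28
```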
The steps to execute the demonstration are as follows:
- Have a build folder in the dissecozy repo. For development purposes, you can run yarn watch, which watches the repo for updates and automatically builds the latest version. Otherwise, run yarn build.
- Launch cozy-stack serve --disable-csp to start the stack with the dissecozy app loaded.
- Create test instances by running yarn run populate. This creates the 10 test instances (test1.cozy.localhost:8080 to test10.cozy.localhost:8080) and automatically provides 10 banking operations of 10 different categories by default. It also outputs a JSON file containing all these instances' webhooks and uploads these webhooks to the querier, which uses them to construct the aggregation tree. The file is located in generated/webhooks.json.
- Open a browser and go to the dissecozy URL of your default instance (e.g. http://dissecozy.cozy.localhost:8080/).
- In the Nodes section, click the 'Choose a file' button and select the JSON file containing the webhooks. Then, click upload to register all the test instances with the querier.
- Go to the Execution section and, in the Full Aggregation subsection, first click the 'Generate new tree' button, then the 'Launch execution' button.
Congratulations, you launched the execution. After a few seconds, you should see a new file created at the location indicated by the localModelPath value of the dissec.config.json file.
The command yarn run measure can be used to demonstrate the efficiency of distributed learning.
It runs both types of learning and measures their accuracy on a single validation dataset.
This script assumes that yarn run populate has been run beforehand.
To obtain experimental results, a simulation has been implemented. It simulates only the phase of the protocol spanning from contacting the contributors to the querier recomposing the final result. It abstracts away the network overlay and assumes that the tree construction, as presented in the paper, is already done.
The simulation starts in the following state:
- All the aggregators (leaf aggregators and querier included) start the process of monitoring their children's health with periodic pings
- The first member of each leaf aggregator group sends a request to all its associated contributors to send their data to its group.
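The health monitoring in the first point can be pictured as a timeout-based check on the last answer received from each child. This is only a sketch: the interval, the timeout and the function names are hypothetical and are not taken from the simulator.

```javascript
// Hypothetical parameters, chosen for illustration only.
const PING_INTERVAL = 100 // time units between two pings
const TIMEOUT = 250 // a child silent for longer than this is considered failed

// Times at which a parent pings its children within [0, until].
function pingTimes(until) {
  const times = []
  for (let t = 0; t <= until; t += PING_INTERVAL) times.push(t)
  return times
}

// Return the ids of children whose last answer is older than TIMEOUT.
function failedChildren(lastSeen, now) {
  return Object.entries(lastSeen)
    .filter(([, seenAt]) => now - seenAt > TIMEOUT)
    .map(([id]) => id)
}

const lastSeen = { child1: 300, child2: 0, child3: 200 }
console.log(pingTimes(300)) // [ 0, 100, 200, 300 ]
console.log(failedChildren(lastSeen, 300)) // [ 'child2' ]
```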
The simulator has two components: the simulator itself and a visualization dashboard.
- cd simulation
- Rename example.dissec.config.json to dissec.config.json and fill it in properly
- yarn install
- yarn start
Once the simulation has executed, the results are stored at the path indicated in dissec.config.json.
To launch the dashboard:
- pip install -r requirements.txt
- yarn dashboard
- Open your browser at localhost:8050
The Cozy datastore stores documents, which can be seen as JSON objects. A doctype is simply a declaration of the fields in a given JSON object, used to store similar objects in a homogeneous fashion.
Cozy ships a built-in list of doctypes for representing most common documents (Bills, Contacts, Files, ...).
DISSEC-COZY uses the following additional doctypes:
- dissec.nodes is used to register instances willing to participate in the protocol, so that the querier can organize them to create an efficient tree.
If you want to work on DISSEC-COZY and submit code modifications, feel free to open pull-requests! See the contributing guide for more information about how to properly open pull-requests.
Cozy is a platform that brings all your web services in the same private space. With it, your webapps and your devices can share data easily, providing you with a new experience. You can install Cozy on your own hardware where no one's tracking you.
Localization and translations are handled by Transifex, which is used by all Cozy's apps.
As a translator, you can log in to Transifex (using your GitHub account) and claim access to the app repository. Locales are pulled when the app is built, before publishing.
As a developer, you must configure the Transifex client and claim access as a maintainer to the app repository. Then please only update the source locale file (usually en.json in the client and/or server parts), and push it to the Transifex repository using the tx push -s command.
The lead maintainer for DISSEC-COZY is Julien Mirval; send him/her a 🍻 to say hello!
You can reach the Cozy Community by:
- Chatting with us on IRC #cozycloud on Freenode
- Posting on our Forum
- Posting issues on the GitHub repos
- Saying Hi! on Twitter
DISSEC-COZY is developed by Julien Mirval and distributed under the AGPL v3 license.