FCIM IS 211M EN Dodon Ion Master
Chişinău, 2023
MINISTRY OF EDUCATION AND RESEARCH OF THE REPUBLIC OF MOLDOVA
Technical University of Moldova
Faculty of Computers, Informatics and Microelectronics
Department of Software Engineering and Automatics
Approved for defense
Head of department: Fiodorov I., PhD, associate professor
_____________________________
„____” _________________ 2022
Master's thesis
SUMMARY
Keywords: Data, security, decentralization, blockchain, IPFS.
The field of information technologies is developing very quickly, and more and more new techniques and
technologies appear that make this field progress. Some of the newest ideas that have emerged in recent years
are to use the principles of democracy to solve certain problems of our information age. These problems are
related to data safety, recognition of data ownership, and others.
This work, titled "The application of blockchain and IPFS technologies for assuring the confidentiality
and accessibility of sensitive data", elaborated as a master's thesis by Ion Dodon, a student of the Technical
University of Moldova, aims to solve some problems related to the confidentiality of sensitive data. For this
purpose, blockchain and IPFS technologies will be used, since they have the characteristics needed to create
a secure data protection system that does not depend on a third-party company.
The work is divided into the following parts: domain analysis, specification of the functional and
especially the non-functional requirements, system design, and system implementation. At the end, the
conclusion discusses the observations made during the work and the results obtained.
The Internet is becoming more important every day. Every day, in every area of society, personal
computers, servers, and smartphones are used, and all of them communicate with each other. A large part
of the data about us, which exposes our personality, is stored online. This data can be used for malicious
purposes: for example, it can be collected by a company or even by individuals and sold to other companies
or individuals, so that society can later be manipulated by those who already know its preferences or other
data about the population well. Much of the data stored on servers is sensitive, such as personal medical
records. Of course, at the moment this data is kept as securely as possible, but ultimately it is kept on servers
belonging to companies that offer IT (cloud) services, which means that we entrust this data to them. Another
risk is the case when these servers fail and the data can no longer be recovered. The mentioned problems
are well known and preventive measures exist, but these measures are certainly not perfect. This work aims
to come up with new ideas for protecting the data of internet users without depending on an external company:
data that nobody other than its owner can obtain, and that cannot be lost if a server goes down.
To achieve this level of security, a system will be created that stores data on an IPFS network, which
means that it will be a decentralized system: if a server goes down, the data is quickly recovered from other
nodes (servers). A Smart Contract will be created that will act as a data manager and record who owns which
data. All these operations will be transparent, and the stored data will be encrypted.
ABSTRACT
Keywords: Data, security, decentralization, blockchain, IPFS.
The field of information technologies develops very quickly, and more and more new techniques
and technologies appear that make this field progress. Some of the newest ideas that have emerged in
recent years are to use the principles of democracy to solve certain problems of our information age. These
problems are related to data safety, recognition of data ownership, and others.
This work, titled "The application of blockchain and IPFS technologies for assuring confidentiality
and accessibility of sensitive data", elaborated as a master's thesis by Ion Dodon, a student of the Technical
University of Moldova, aims to solve some problems related to the confidentiality of sensitive data. For this
purpose, blockchain and IPFS technologies will be used, as they have the features needed to create a secure
data protection system that does not depend on a third-party company.
The work is divided into the following parts: domain analysis, specification of functional and especially
non-functional requirements, system design, and implementation. At the end, the conclusion presents the
observations made during the work and the results obtained.
The Internet is becoming more and more important every day. Today, personal computers, servers,
and smartphones are used in every area of society, and all of them communicate with each other. A large
part of the data about us, which exposes our personality, is stored online. This data can be used for
malicious purposes: for example, it can be collected by a company or even by individuals and sold to other
companies or individuals, so that society can later be manipulated by those who already know its preferences
or other data about the population. Much of the data stored on servers is of a sensitive nature, such as
personal medical records. Of course, at the moment this data is kept as safely as possible, but ultimately it is
kept on servers belonging to companies that offer IT (cloud) services, which means that we entrust them with
this data. Another risk is that these servers fail and the data cannot be recovered. The mentioned problems
are well known and there are preventive measures, but these measures are certainly not perfect. This paper
aims to come up with new ideas to protect internet users' data without depending on an external company:
data that cannot be obtained by anyone other than its owner and that cannot be lost if a server goes down.
To achieve this level of security, a system will be created that stores data on an IPFS network,
which means that it will be a decentralized system: if a server goes down, the data is quickly recovered
from other nodes (servers). A Smart Contract will be created that will act as a data manager and record
who owns which data. All these operations will be transparent, and the stored data will be encrypted.
Contents
List of figures
List of tables
Acronyms
Listings
INTRODUCTION
1 DOMAIN ANALYSIS
2 REQUIREMENTS SPECIFICATION
3 SYSTEM DESIGN
4 SYSTEM IMPLEMENTATION
4.1 Technologies
4.1.1 Solidity
4.1.2 Hardhat
4.1.3 IPFS-core
4.1.4 Data protection using AES
4.1.5 Goerli testnet
4.2 The Smart Contract
4.2.1 Build process
4.2.2 Resulting artifacts
4.2.3 Deployment
4.3 The client application
4.4 License
CONCLUSIONS
Bibliography
Appendices
List of Figures
List of Tables
Acronyms
ABI Application Binary Interface.
RSA Rivest–Shamir–Adleman.
Listings
4.1.1 Smart Contract definition example
4.1.2 How to connect to IPFS using IPFS-core
4.2.1 Generate Hardhat project
4.2.2 Hardhat project dependencies
4.2.3 Environment variables
4.2.4 Hardhat testnet configuration
4.2.5 Hardhat project structure
4.2.6 The Ledger
4.2.7 Smart Contract function to publish CID
4.2.8 Function to get the owner address of CID
4.2.9 Compile the contract
4.2.10 The Smart Contract ABI
4.2.11 Contract deployment
4.2.12 Deployer accounts per blockchain network
4.3.1 Client app dependencies
4.3.2 Client side function to upload a file
A.0.1 The Smart Contract
INTRODUCTION
People use the internet every day: at work, at school, at home for entertainment or remote work, and
almost everywhere else. At first, computers were used to access the internet, and people started to share data
about themselves online, such as what they like and what they do. A bit later, the internet and computers
spread into almost every field of society, such as the medical and governmental fields. At this point, people
started to host even more sensitive data on the internet, for example, medical certificates or personal IDs.
These data should be protected from malicious people, and those who host the data should make sure they
will not lose it. Techniques to protect the data have of course been developed, but they are sometimes not
reliable enough.
This paper describes a system that can be used to protect data with a higher level of security. First,
the functional and non-functional requirements for such a system had to be defined. Based on these
requirements, the most appropriate technologies and techniques for implementing the system were selected.
The functional requirements are few and very simple: to be able to store some data and to retrieve what one
stored earlier. The non-functional requirements, on the other hand, are the most important part of the
project; they are what makes this project special.
After analysing the non-functional requirements and researching which technologies could be used,
it was determined that a good technology for always assuring availability is IPFS, a protocol that stores
data in a decentralized way: if one server goes down, the other servers (also known as nodes) that hold a
copy of the data can still provide it. To assure transparency, Smart Contracts running on the Ethereum
blockchain can be used, while the data itself is encrypted before it is stored. The Smart Contract will store
the information about who owns what.
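The thesis protects the data with AES before it reaches IPFS (Section 4.1.4), while the Smart Contract only records ownership. The sketch below shows what that client-side step could look like with Node.js's built-in crypto module. It is a minimal illustration under assumptions: the AES-256-GCM mode, the 12-byte random IV, the packing format (IV, then authentication tag, then ciphertext), and the helper names encryptFile and decryptFile are hypothetical choices, not necessarily those of the actual implementation, and key management is left out.

```javascript
const crypto = require("crypto");

// Hypothetical helper: encrypt file bytes with AES-256-GCM.
// Returns IV + auth tag + ciphertext packed into one Buffer; this packed
// Buffer, not the original file, is what would be uploaded to IPFS.
function encryptFile(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const authTag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, authTag, ciphertext]);
}

// Hypothetical helper: reverses encryptFile. Throws if the key is wrong or
// the data was modified, because GCM verifies the authentication tag.
function decryptFile(packed, key) {
  const iv = packed.subarray(0, 12);
  const authTag = packed.subarray(12, 28);
  const ciphertext = packed.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Usage with a random 256-bit key (a real client would have to derive or
// store the user's key securely, which is not shown here).
const key = crypto.randomBytes(32);
const original = Buffer.from("medical certificate contents");
const encrypted = encryptFile(original, key);
const restored = decryptFile(encrypted, key);
console.log(restored.toString()); // medical certificate contents
```

GCM is used in this sketch because it also authenticates the data: a tampered file or a wrong key makes decryption fail instead of silently returning garbage.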
The first chapter is the domain analysis. In this chapter, the problem is analysed and defined, and the
scope and the solution are explained. The first chapter also explains why a user would have to pay about
$0.36 to store a document on this system. The second chapter presents the functional and non-functional
requirements of the system, and the third chapter shows its design using annotated diagrams. As the project
focuses on demonstrating a new approach to protecting data online, there are only a few simple functional
requirements, while the non-functional requirements are the most important aspects of the project, being the
backbone definition of what should be achieved. The diagrams describe how the selected technologies can be
put together to implement a system that fulfils the defined requirements. The fourth chapter presents the
actual implementation of the system with all of its code examples and explanations.
1 DOMAIN ANALYSIS
It is clear that information technologies play an increasingly important role in our lives as time
passes. These days, information technologies are used in practically every area of society, including
sensitive areas such as state security, personal health, and the military. This means that special attention
should be paid to the security of today's information systems. This chapter explains the weaknesses of the
way data is stored today and how sensitive data can be protected on the internet.
Problem definition - Sensitive data is stored on servers belonging to third-party companies, and
these companies could use the data for their own purposes without the real owner of the data knowing.
friendly. As was mentioned, each node strives to mine more blocks, since a blockchain is a chain of
blocks mined by the nodes that run the network. The nodes are rewarded for the blocks they mine; the more
blocks a node mines, the larger its reward. In the consensus process, the nodes agree with each other on the
state of the blockchain. This means that if one node mines more blocks and has the longest chain, the other
nodes must validate that this node did its job correctly and that its blockchain state is correct. It is like
voting: if more than 50% agree that a node mined the blocks correctly, those blocks are added to the
blockchain and synchronized with the other nodes. But what if someone controls more than 50% of the nodes
(more precisely, of the network's mining power)? Then that person can decide for the rest of the miners. This
is called the 51% attack. A related attack is the Sybil attack, in which an attacker creates many fake
identities (nodes) in order to gain a disproportionately large influence over the network.
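The voting analogy above, and why a majority is dangerous, can be illustrated with a deliberately simplified model. This is a toy sketch, not how Ethereum or Bitcoin consensus actually works: real proof of work follows the chain with the most accumulated work, and influence is proportional to mining power or stake rather than to a head count of nodes. The function name acceptBlock and the one-node-one-vote rule are assumptions made only for illustration.

```javascript
// Toy model (hypothetical): each node casts one vote on whether a proposed
// block is valid. The block is accepted only if strictly more than half of
// the votes approve it.
function acceptBlock(votes) {
  const approvals = votes.filter((v) => v === true).length;
  return approvals > votes.length / 2;
}

// 10 honest nodes reject an invalid block...
const honest = Array(10).fill(false);

// ...and 9 attacker nodes approve it: 9 of 19 votes is not a majority.
const attackerMinority = Array(9).fill(true);
console.log(acceptBlock([...honest, ...attackerMinority])); // false

// With 11 attacker nodes (11 of 21 votes, more than 50%) the invalid block
// is accepted: the essence of a 51% attack.
const attackerMajority = Array(11).fill(true);
console.log(acceptBlock([...honest, ...attackerMajority])); // true

// A Sybil attacker reaches the same outcome by registering many fake
// identities instead of acquiring real resources, which is why real networks
// weight influence by work or stake instead of counting nodes.
```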
Figure 1.1 presents the structural difference between a centralized data storage system and IPFS.
More details about this comparison can be found in the Medium article "IPFS: A Complete Analysis of The
Distributed Web" [8].
Figure 1.1 - Comparing the movement of data in IPFS to centralized client-server models
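The key idea behind Figure 1.1 is content addressing: in IPFS a file is requested by an identifier derived from its content (the CID) rather than by the location of a single server, so any node holding a copy can serve it. The toy sketch below imitates this with three in-memory "nodes" and a plain SHA-256 hex digest as the identifier. Real CIDs wrap a multihash and use a different encoding (for example CIDv1 in base32), so these identifiers are not real IPFS CIDs; the names toyCid and fetchFromNetwork are hypothetical.

```javascript
const crypto = require("crypto");

// Toy content identifier: the SHA-256 digest of the bytes, in hex.
// (Real IPFS CIDs wrap a multihash and are encoded differently.)
function toyCid(bytes) {
  return crypto.createHash("sha256").update(bytes).digest("hex");
}

// Three nodes, each a Map from identifier to content, plus an "online" flag.
const nodes = [
  { name: "A", online: true, store: new Map() },
  { name: "B", online: true, store: new Map() },
  { name: "C", online: true, store: new Map() },
];

// Store (pin) the same content on nodes A and B only.
const file = Buffer.from("encrypted document bytes");
const cid = toyCid(file);
nodes[0].store.set(cid, file);
nodes[1].store.set(cid, file);

// Ask the online nodes for the identifier. The content is verified by
// re-hashing it, so a node cannot return other data under the same identifier.
function fetchFromNetwork(id) {
  for (const node of nodes) {
    if (!node.online || !node.store.has(id)) continue;
    const data = node.store.get(id);
    if (toyCid(data) === id) return { from: node.name, data };
  }
  return null; // no online node holds the content
}

nodes[0].online = false; // node A goes down
console.log(fetchFromNetwork(cid).from); // B
```

If node B also went offline, fetchFromNetwork would return null even though node C is online, which illustrates that availability in IPFS depends on the content being held by more than one node.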
of war between hackers and scientists who want to find better ways to secure the data on the Internet.
As mentioned above, blockchain will be used to secure the data on the internet. The Requirements
Specification and System Design chapters (Chapters 2 and 3) and the Implementation chapter (Chapter 4)
show how blockchain and IPFS can be used, together with encryption, to protect the data, sign it, and store
it in a decentralized way. When blockchain was invented, it was intended as a platform for payment registries
and ledger systems in the digital world. By its nature, a blockchain is immutable. All money transactions are
stored on it, and it is very difficult to tamper with the blockchain so that it shows, for example, that someone
obtained money illegally. Blockchain has a well-defined role: storing data transparently and democratically.
Democratically means that the same rules of use apply equally to everyone. But this has a cost, which is
explained in the Cost analysis section (1.4.3).
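The immutability described above comes from each block storing the hash of the previous block, so changing an old record breaks every link after it. The following is a minimal sketch of that hash-chain idea. The function names are hypothetical, and the hashing (SHA-256 over a JSON encoding) is a simplification: real Ethereum blocks use Keccak-256 over RLP-encoded headers and also depend on consensus, which is not modelled here.

```javascript
const crypto = require("crypto");

// Hash a block's contents (everything except its own `hash` field).
function hashBlock(block) {
  const { index, data, prevHash } = block;
  return crypto.createHash("sha256")
    .update(JSON.stringify({ index, data, prevHash }))
    .digest("hex");
}

// Append a block that points at the hash of the current last block.
function appendBlock(chain, data) {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "0".repeat(64);
  const block = { index: chain.length, data, prevHash };
  block.hash = hashBlock(block);
  chain.push(block);
}

// Valid only if every stored hash matches its block's contents and every
// block points at the hash of the block before it.
function isValid(chain) {
  return chain.every((block, i) =>
    block.hash === hashBlock(block) &&
    (i === 0 || block.prevHash === chain[i - 1].hash));
}

const chain = [];
appendBlock(chain, "Alice pays Bob 5");
appendBlock(chain, "Bob pays Carol 2");
console.log(isValid(chain)); // true

chain[0].data = "Alice pays Bob 500"; // tamper with an old transaction
console.log(isValid(chain)); // false
```

Recomputing chain[0].hash after the change does not help the attacker, because chain[1].prevHash still holds the old value; every later block would have to be rewritten as well.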
1. Filebase
S3-compatible object storage based on Web3. Users can store data on any of the networks Filebase supports,
including IPFS, Sia, Skynet, and Storj. Filebase lets users upload data to secure, geo-redundant blockchain
networks without having to handle contracts, SLAs, or cryptocurrencies. In Filebase, all files uploaded to
IPFS are also stored on the Sia network, which makes the storage layer of Filebase's IPFS nodes highly
available and, most notably, geo-redundant. A free tier of 5 GB is available. Filebase does not charge for
incoming (ingress) traffic or for API requests, so users do not have to worry about high overhead. There are
no minimum object size requirements or data retention policies. Its S3-compatible API works with existing
applications and tools that use AWS S3, making it easy to migrate to Filebase.
Pricing: Starting Price: $5.99 per month
Pricing Details: The minimum subscription fee of $5.99 includes up to the first 1 TB of storage and 1
TB of bandwidth. Additional storage and outgoing bandwidth are billed at $0.0059/GB.
Free Version: Free Version available.
Integration: Filebase integrates with the following services: Comet Backup, Amazon S3, NetDrive, Cy-
berduck, Couchdrop, Simplebackups, BackupSheep, Arq, Dropshare, and SnapShooter.
Ratings/Reviews: User Reviews as per Sourceforge.net
– overall 5.0 / 5;
– ease 5.0 / 5;
– features 4.5 / 5;
– design 4.5 / 5;
– support 5.0 / 5.
Features:
– blockchain;
– cloud storage;
a) encryption;
b) file sharing;
– IPFS pinning.
2. Google Drive
Google Drive lets users store, share, and access their files from any device. The first 15 GB of storage
is free with a Google Account. With Drive Enterprise, businesses pay only for the storage their employees
use. It comes with Google Docs, Sheets, and Slides and works seamlessly with Microsoft Office. Users can
keep photos, stories, designs, drawings, recordings, videos, and more, and can reach their files from any
smartphone, tablet, or computer. They can also quickly invite others to view, download, and collaborate on
any of their files, with no email attachments needed.
Starting Price: Free
Free Version: Free Version available.
Integration: Google Drive probably has the richest set of integrations. The services with which Google
Drive can be integrated are listed on the Google Drive Integrations page; some of them are Connecteam,
monday.com, Wrike, and ClickUp.
Ratings/Reviews: User Reviews as per Sourceforge.net
– overall 4.9 / 5;
– ease 4.9 / 5;
– features 5.0 / 5;
– design 5.0 / 5;
– support 4.8 / 5.
But the strong points of this system are its security and the fact that it is decentralized, with no
company controlling it, all of which is due to blockchain and IPFS.
From Table 1.1 it is noticeable that Google Drive has more advantages, but at the same time it also
has more disadvantages than Filebase. Each platform has its own strong point in a specific field: for
example, Google Drive is good for storing lots of data, while Filebase is good for storing data in a
decentralized way.
Table 1.2 below presents the advantages and disadvantages of the system described in this paper. As
in Table 1.1, advantages and disadvantages are enumerated, but this time for a system with a very narrow
use. IPFSdataStore owes its advantages to the blockchain and IPFS technologies. Blockchain is good because
it can share data quickly and securely among entities. In fact, blockchain can bring many advantages to
businesses, whether they use private or public blockchains.
The strengths of blockchain are the following: trust, improved security and privacy, decentralized
structure, reduced costs, speed, visibility and traceability, immutability, tokenization, and individual control
of data. It creates trust between entities where trust does not exist or is unproven. There is no central entity
that coordinates everything; everything is a peer-to-peer network, and democratic rules are applied to make
this system work. The data stored on the blockchain is spread across a network of computers, which makes
hacking even more difficult. Blockchains are immutable, which enables a secure and reliable audit of the
information. It is fast because many of the intermediaries that would otherwise be needed to make the
system secure are absent. Tokenization is the process by which the value of an asset (physical or digital) is
converted into a digital token, which is then recorded on and transferred through the blockchain. According
to Joe Davey, chief technology officer at global consulting firm West Monroe, tokenization has taken root
in digital art and other virtual assets, but it has broader applications that can simplify business
transactions. Utilities, for example, can use tokenization to trade carbon credits as part of carbon capping
programs.
From Table 1.2 it is noticeable that there are many disadvantages, but those disadvantages are seen
from the point of view of a traditional data storage system. The good thing is that the advantages are very
strong and difficult to achieve with a storage system held by a company.
It is convenient to present the comparison in tabular form because the differences are easy to see
side by side. There are 5 advantages and 8 disadvantages of using IPFSdataStore. Even though it has more
disadvantages than advantages, it is a recommended tool for people who want to store their data online
securely.
1.4.3 Cost analysis
Google Drive offers two kinds of plans: a monthly plan and an annual plan. Here, its monthly plan is
compared with the monthly plans of the other services. The plan as presented on Google's official site is
shown in the figure.
As can be seen from the image, Google Drive has a free plan that offers a generous amount of storage
space: 15 GB. People who want to add 100 GB to their account pay $0.99 per month. Along with these
100 GB, the user gets access to Google experts, can share the 100 GB with 5 other users, and receives some
other extra member benefits. This subscription is called Google One. To make the costs of Google Drive,
Filebase, and IPFSdataStore easier to compare, the amount one has to pay per month is used.
Filebase is free for all users who keep up to 5 GB of data on it, with no credit card needed. Beyond
5 GB, users have to upgrade to the paid subscription. The minimum subscription fee is $5.99, which includes
up to 1 TB of storage and 1 TB of bandwidth. Unlike Google Drive, Filebase also has bandwidth limits. For
additional storage capacity and transfer, the user has to pay $0.0059/GB. The subscription is renewed
monthly until it is canceled. Filebase states that it does not charge for ingress or for the number of API
requests. There is no minimum file or sector size, there are no retention issues or retrieval delays, no special
software is required, and no special skill set is needed, meaning that it is an easy-to-use service.
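The Filebase pricing described above can be turned into a small monthly-cost calculator. This is a sketch under stated assumptions: 1 TB is taken as 1,000 GB, storage and outgoing bandwidth above the included 1 TB are each billed at the quoted $0.0059/GB, and the free tier is modelled only by the 5 GB storage limit because the text does not state its bandwidth limits. Filebase's actual billing may differ in detail, and the function name is hypothetical.

```javascript
// Hypothetical monthly cost estimate for Filebase, using the prices quoted in
// the text. Assumes 1 TB = 1000 GB.
const FREE_STORAGE_GB = 5;
const BASE_FEE_USD = 5.99;        // includes 1 TB storage + 1 TB bandwidth
const INCLUDED_GB = 1000;
const OVERAGE_USD_PER_GB = 0.0059;

function filebaseMonthlyCost(storageGB, egressGB) {
  // Free tier (its bandwidth limits are not modelled).
  if (storageGB <= FREE_STORAGE_GB) return 0;
  const extraStorage = Math.max(0, storageGB - INCLUDED_GB);
  const extraEgress = Math.max(0, egressGB - INCLUDED_GB);
  const cost = BASE_FEE_USD + (extraStorage + extraEgress) * OVERAGE_USD_PER_GB;
  return Math.round(cost * 100) / 100; // round to cents
}

console.log(filebaseMonthlyCost(3, 1));       // 0
console.log(filebaseMonthlyCost(100, 50));    // 5.99
console.log(filebaseMonthlyCost(2000, 1500)); // 14.84
```

The last case pays for 1,000 GB of extra storage and 500 GB of extra traffic: 1,500 × $0.0059 = $8.85 on top of the $5.99 base fee.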
It is clear that, as in the case of Google Drive, Filebase charges a fixed price per month. Next, the
IPFSdataStore pricing is explained. Filebase, Google Drive, and most companies that offer any kind of IT
service keep the money they collect. In the case of IPFSdataStore, as there is no corporation, the money is
distributed to those who take part in the mining process. It is like a community where whoever works more
and is more trustworthy earns more.
As mentioned in the previous chapters, IPFSdataStore will not be held by any corporation, but since
nothing in life is free, the user will have to pay some money to store the data. That money goes to the
miners, who run the blockchain nodes on which the records are stored. Running a node consumes energy. One
can think of a node as a server running special software whose role is to mine blockchain blocks. As
mentioned above, there are two main ways of establishing whether a blockchain block is valid: PoW (Proof of
Work) and PoS (Proof of Stake). PoW is more expensive, while PoS is cheaper because it is based on a voting
concept and is environment-friendly.
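Why PoW costs energy can be seen from a toy version of it: a miner must try nonce after nonce until the block's hash meets a difficulty target, and every failed attempt is computation spent for nothing. In this minimal sketch the target is a number of leading zero hex digits in a SHA-256 hash. Real networks use different hash functions, block formats, and far higher difficulty; the function names mine and verify are hypothetical.

```javascript
const crypto = require("crypto");

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// Toy proof of work: search for a nonce such that sha256(data + nonce)
// starts with `difficulty` zero hex digits. Each extra zero digit makes the
// expected number of attempts about 16 times larger.
function mine(data, difficulty) {
  const prefix = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const hash = sha256(data + nonce);
    if (hash.startsWith(prefix)) return { nonce, hash, attempts: nonce + 1 };
  }
}

// Checking a claimed solution costs only a single hash.
function verify(data, nonce, difficulty) {
  return sha256(data + nonce).startsWith("0".repeat(difficulty));
}

const data = "block with IPFSdataStore records";
const result = mine(data, 3);
console.log(result.attempts, "attempts to find", result.hash);
console.log(verify(data, result.nonce, 3)); // true
```

The asymmetry between the many hashes needed by mine and the single hash needed by verify is what makes PoW secure, and also what makes it expensive in energy.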
atomic operation costs a fixed price. To be able to make the calculation, the Ethereum blockchain is
taken as an example, assuming that this blockchain is used to deploy the Smart Contract. There are many
blockchains that support Smart Contracts, such as Cardano, Solana, Avalanche, Tron, and many others. The
cryptocurrency of the Ethereum blockchain is Ether. This means that each transaction executed on the
Ethereum blockchain costs a specific amount of Ether, usually expressed in smaller units such as wei, the
smallest unit (1 ether = 10¹⁸ wei). This cost is called the gas fee.
What is gas? Gas is the unit that measures the amount of computational work needed to perform
specific operations on the Ethereum network. Since Ethereum transactions require computational resources
to complete, each transaction has a price. Gas thus refers to the cost needed to complete a transaction on
Ethereum successfully. The operations that consume gas are illustrated in Figure 1.3.
Gas fees are paid in Ethereum's native currency, ether (ETH). Gas prices are indicated in gwei, which
is itself a smaller unit of ETH: each gwei equals 0.000000001 ETH (10⁻⁹ ETH). For instance, rather than
saying that gas costs 0.000000001 ether, one can say that the gas price is 1 gwei. "Gwei" stands for
"giga-wei", and 1 gwei equals 1,000,000,000 wei. Wei (named after Wei Dai, the creator of b-money) is the
smallest unit of ETH.
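The unit relationships above (1 gwei = 10⁹ wei, 1 ether = 10¹⁸ wei) can be expressed with exact integer arithmetic. A minimal sketch using JavaScript's built-in BigInt, which avoids floating-point rounding for wei amounts; the helper names gweiToWei and weiToEtherString are hypothetical (libraries such as ethers.js ship their own conversion utilities).

```javascript
// Exact unit conversions with BigInt; wei is the smallest unit of ether.
const WEI_PER_GWEI = 10n ** 9n;
const WEI_PER_ETHER = 10n ** 18n;

// Convert a whole number of gwei to wei.
const gweiToWei = (gwei) => BigInt(gwei) * WEI_PER_GWEI;

// Format a non-negative wei amount as a decimal ether string without
// losing precision (trailing zeros of the fraction are dropped).
function weiToEtherString(wei) {
  const whole = wei / WEI_PER_ETHER;
  const fraction = (wei % WEI_PER_ETHER)
    .toString()
    .padStart(18, "0")
    .replace(/0+$/, "");
  return fraction ? `${whole}.${fraction}` : whole.toString();
}

console.log(gweiToWei(1));                          // 1000000000n
console.log(weiToEtherString(gweiToWei(1)));        // 0.000000001
console.log(weiToEtherString(2n * WEI_PER_ETHER));  // 2
```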
Why do gas fees exist? In brief, gas fees help keep the Ethereum network secure: by requiring a fee
for every computation performed on the network, they stop malicious actors from spamming it. To avoid
accidental or hostile infinite loops and other computational waste in code, each transaction must set a limit
on how many computational steps of code execution it can use. The basic unit of computation is "gas".
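Even though a transaction sets a gas limit, gas that the transaction does not use is returned to the user.
In addition, the user is refunded the difference between the maximum fee they offered and the fee actually
charged per unit of gas, $\text{refund} = \text{max fee} - (\text{base fee} + \text{tip})$. The gas refund
process is presented in Figure 1.4.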
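Under Ethereum's current fee model (EIP-1559), the sender sets a gas limit, a maximum fee per gas, and a maximum priority fee (tip) per gas. A minimal sketch of how the amount actually charged and the amount returned could be computed; the function and field names are hypothetical, all fees are in gwei per unit of gas, and the tip is capped so that base fee + tip never exceeds the max fee.

```javascript
// Sketch of EIP-1559 fee accounting (fees in gwei per gas, gas in units).
function feeBreakdown({ gasLimit, gasUsed, baseFee, maxFee, maxPriorityFee }) {
  if (gasUsed > gasLimit) throw new Error("gas used cannot exceed the gas limit");
  if (maxFee < baseFee) throw new Error("max fee below base fee: tx not includable");
  const tip = Math.min(maxPriorityFee, maxFee - baseFee); // effective priority fee
  const effectivePrice = baseFee + tip;                   // price actually paid per gas
  const reserved = gasLimit * maxFee;                     // locked when the tx is sent
  const charged = gasUsed * effectivePrice;               // base fee part is burned, tip goes to the block producer
  return {
    refundPerGas: maxFee - effectivePrice,                // max fee - (base fee + tip)
    charged,
    returnedToSender: reserved - charged,                 // unused gas + per-gas refund
  };
}

const r = feeBreakdown({ gasLimit: 50000, gasUsed: 21000, baseFee: 12, maxFee: 20, maxPriorityFee: 2 });
console.log(r); // { refundPerGas: 6, charged: 294000, returnedToSender: 706000 }
```

In this example 29,000 units of gas were never used and each of the 21,000 used units was charged 14 gwei instead of the 20 gwei reserved, so 706,000 gwei go back to the sender.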
As mentioned, each computational operation costs gas, which is paid for in wei. Solidity is one of the
most popular programming languages for writing smart contracts, and each atomic operation of the Ethereum
Virtual Machine (EVM), to which Solidity code is compiled, has a price, as shown in Figure 1.5. [16]
This is just a part of all the operations that a contract written in Solidity can use. The entire table can be
seen at the following link.
Suppose that 100,000,000 users use IPFSdataStore and each of them stores 10 documents on the
platform. The table in Figure 1.5 shows that SSTORE is the operation that saves a value in the contract's
storage on the blockchain. This operation costs 20,000 gas when a non-zero value is written to a storage
location that previously held zero. On 7 October 2022 the gas price was 13.49 Gwei, so in total it costs
20,000 × 13.49 = 269,800 Gwei. In Ether, this is equal to 0.0002698 ETH, which is about $0.36, since on
10 October 2022, when this was calculated, 1 ETH was valued at $1328.72. This means that saving a record
showing that a user holds a specific document would cost about $0.36, so 10 documents would cost about
$3.60. This would be the user's entire cost, besides spinning up their own IPFS node. In total, 100,000,000
users would spend 100,000,000 × $3.60 = $360 million if each of them stored 10 documents.
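The estimate above can be reproduced step by step. The figures (20,000 gas per SSTORE, a 13.49 gwei gas price, 1 ETH = $1328.72) are the ones quoted in the text. Like the text, this sketch counts only the SSTORE operation: it ignores the 21,000 gas base cost that every Ethereum transaction pays and any additional storage slots a record may need, so the real cost per document would be higher.

```javascript
// Reproduces the per-document storage cost estimate from the text.
const SSTORE_GAS = 20000;        // writing a new non-zero storage slot
const GAS_PRICE_GWEI = 13.49;    // 7 October 2022
const USD_PER_ETH = 1328.72;     // 10 October 2022
const GWEI_PER_ETH = 1e9;

const costGwei = SSTORE_GAS * GAS_PRICE_GWEI;   // 269,800 gwei
const costEth = costGwei / GWEI_PER_ETH;        // 0.0002698 ETH
const costUsd = costEth * USD_PER_ETH;          // about 0.3585 USD

const docsPerUser = 10;
const users = 100_000_000;
const perUserUsd = costUsd * docsPerUser;       // about 3.58 USD
const totalUsd = perUserUsd * users;            // about 358 million USD

console.log(costGwei.toFixed(0), costEth.toFixed(7), costUsd.toFixed(2)); // 269800 0.0002698 0.36
console.log((totalUsd / 1e6).toFixed(0) + " million USD");                // 358 million USD
```

Without intermediate rounding the total comes to about $358 million, in line with 100,000,000 × $3.60 = $360 million when the rounded per-document cost is used.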
2 REQUIREMENTS SPECIFICATION
In every project, setting up the requirements is an important step in bringing the project to life. If the
requirements are set up correctly, the project has a higher chance of success. If they are not defined
correctly, or not defined at all, this will cause communication problems within the team working on the
system and will inevitably cause delays.
As shown in Figure 2.1, there are different types of requirements: business requirements, user
requirements, and product requirements, and the product requirements finally lead to functional and
non-functional requirements. As the diagram shows, everything starts with the business requirements, and
each of the other types is derived from the previous one.
The functional and non-functional requirements are enumerated in more detail in the Functional
Requirements section (2.1) and the Non-functional Requirements section (2.2), respectively. The other types
of requirements should not be confused with the functional requirements, which are more technical and
more detailed, so that the developer can easily understand what to implement and how. The other types of
requirements are briefly explained here.
Business requirements are focused on business value. As shown in Figure 2.1, the level of detail in
the requirements increases from left to right. This means that the business requirements have the lowest
level of detail. This is because these requirements are most often defined by the owner of the project, who
knows in general terms what he or she wants but knows less about the steps needed to reach the final goal,
which is to get the project done correctly.
User requirements describe what the user can do on the platform. These requirements can be
represented in different types of diagrams, such as UML (Unified Modeling Language) Use Case diagrams,
or described in user stories or scenarios.
Product requirements describe what actions the system should perform in order to meet the business
and user requirements. One product requirement can be composed of a number of smaller functional
requirements. The final system is the set of implemented functional requirements, and the behavior of the
implemented system reflects the non-functional requirements.
The system described in this thesis has a small set of functional requirements, because the purpose
of this thesis is not to implement much functionality but to demonstrate that such a system, implemented on
blockchain and IPFS, can bring considerable security and performance benefits, especially for systems that
operate with sensitive data. This means that for this project the non-functional requirements are the most
important.
have a balance like any other Smart Contract deployed on a blockchain. This balance will be used to keep the donations. The contract should accept donations as simple crypto-value transfers from another contract or from a wallet held by a person. The web page should have a button that shows the total amount of donations this Smart Contract has received. The donations should be withdrawable only by the same wallet that was used to deploy the Smart Contract.
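To make the donation rules of R3 concrete, the following plain-JavaScript sketch models them outside the blockchain. It is not the Solidity code of the Ledger: the class name DonationBox, the string addresses, and the BigInt wei amounts are assumptions made for illustration, and the error message only mirrors the NotOwnerError declared in Listing 4.2.6.

```javascript
// Plain-JavaScript model of the R3 donation rules (NOT the Solidity contract).
// Addresses are plain strings; amounts are BigInt values in wei.
class DonationBox {
  constructor(deployer) {
    this.owner = deployer;      // wallet that deployed the contract
    this.balance = 0n;          // contract balance in wei
    this.donations = new Map(); // donor address -> total donated (wei)
  }

  // A donation is a plain value transfer from a wallet or another contract.
  donate(from, amountWei) {
    if (amountWei <= 0n) throw new Error("donation must be positive");
    this.balance += amountWei;
    this.donations.set(from, (this.donations.get(from) ?? 0n) + amountWei);
  }

  // Anyone may read the total (the "show donations" button on the page).
  totalDonations() {
    return this.balance;
  }

  // Only the deploying wallet may withdraw; returns the amount paid out.
  withdraw(caller) {
    if (caller !== this.owner) throw new Error("NotOwnerError");
    const amount = this.balance;
    this.balance = 0n;
    return amount;
  }
}

const box = new DonationBox("0xDeployer");
box.donate("0xAlice", 1000n);
box.donate("0xBob", 500n);
console.log(box.totalDonations()); // 1500n
```

Keeping the owner check inside withdraw, rather than in the user interface, reflects the requirement that only the deploying wallet can move the funds, whoever calls the method.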
R4 The SPA (Single Page App) should have a button that gets the list of all CIDs of the stored encrypted files. These CIDs can later be used to download the encrypted files. It should also be possible to get the list of CIDs of the files stored by another user by specifying that user's wallet address, which is derived from the wallet's public key.
R5 The user should be able to get the address of the wallet that owns a file stored on IPFS. This can be used to demonstrate who owns a file, because each distinct (encrypted) file has a unique CID, which is registered to a single owner.
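The uniqueness argument in R5 rests on content addressing: an identifier computed from the bytes themselves changes whenever the bytes change. The sketch below illustrates this with a plain SHA-256 digest from Node's node:crypto module. contentDigest is a hypothetical helper, and the digests it produces are not real IPFS CIDs, which wrap such a hash in a multihash/CID encoding and, for files, are computed over the UnixFS DAG built from the file's chunks.

```javascript
const { createHash } = require("node:crypto");

// Simplified stand-in for a CID: the hex SHA-256 digest of the content.
// Real CIDs wrap the hash in a multihash/CID encoding and, for files, hash
// the UnixFS DAG of the file's chunks, so these values are NOT real CIDs.
function contentDigest(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

const original = Buffer.from("encrypted bytes of report.pdf");
const sameCopy = Buffer.from("encrypted bytes of report.pdf");
const edited = Buffer.from("encrypted bytes of report.pdf!");

// Identical bytes always give the same identifier ...
console.log(contentDigest(original) === contentDigest(sameCopy)); // true
// ... and any change to the bytes gives a different one.
console.log(contentDigest(original) === contentDigest(edited)); // false
```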
R6 Once the user connects the SPA to a wallet provider, all the CIDs of the files already stored with that wallet should be listed below. If the user changes the connected wallet, the list of CIDs should change accordingly.
R7 The entire SPA should be deployed on IPFS and be accessible through its CID. This means that the app's code is publicly available, which gives users more reason to trust it. The app will not have a backend, and users can see all the code the app consists of.
R4 The system should be accessible from anywhere in the world with an Internet connection, as easily as a simple website opened in a browser.
R5 Since all the data is stored on IPFS and in a Smart Contract (later called the Ledger), there should be no restriction on how many users the system can hold. The capacity depends on the blockchain's throughput and on how many nodes maintain the respective blockchain, and also on how many nodes contribute to the IPFS network.
3 SYSTEM DESIGN
Each system should have a well-defined design before implementation. This helps the developers understand how to implement the system and later helps them extend it or solve issues with it. While making the design diagrams, the architect can spot issues that could appear later in the system before implementing it, because in some situations people understand things better by looking at a diagram.
This chapter presents the diagrams that explain how the system works, what its components are, and how those components interact. The platform consists of several components that come from different environments. These environments are constructed differently, and each of them has its own advantages and disadvantages in specific situations. As mentioned in the previous chapters, this platform will consist of open-source components that are not held by any third-party company. There will be a single component primarily developed within this project, but it has the responsibility of integrating all the others in order to allow the user to store their data online in a secure way.
The interacting environments are the following: an Ethereum blockchain, an IPFS network (which requires a local node), and the local environment (the user's computer), where the SPA runs. This chapter describes the use cases of the system, the high-level design showing the components, how the data is protected on IPFS, the flow of data storage, and the external components that complement the system.
Figure 3.1 presents the most important use cases of the system.
As shown in this figure, the user will be able to connect to a crypto wallet, for instance MetaMask. MetaMask is an open-source browser extension used to hold the key pairs of crypto wallets, among other things. After the SPA is connected to the wallet, the user can select a file from local storage; he/she is then asked for a key that will be used to encrypt the file, because the file will later be stored on the IPFS network, which is open to everyone: anyone can see what is stored under a given CID. The same key that was used for encryption must be used for decryption, since the encryption is based on a symmetric key algorithm, AES.
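The encrypt-with-a-user-key step can be sketched with Node's built-in node:crypto module. This is an illustration under assumptions, not the SPA's actual code: the thesis only states that AES is used, so the AES-256-GCM mode, the scrypt derivation of the AES key from the typed key, and the salt | IV | tag | ciphertext layout are all assumptions, and encryptFile and decryptFile are hypothetical names.

```javascript
const { scryptSync, randomBytes, createCipheriv, createDecipheriv } = require("node:crypto");

// Derive a 256-bit AES key from the key (passphrase) the user types in.
// Using scrypt here is an assumption; the thesis only names AES.
function deriveKey(passphrase, salt) {
  return scryptSync(passphrase, salt, 32);
}

// Encrypt before upload. Output layout: salt(16) | iv(12) | tag(16) | ciphertext.
function encryptFile(plain, passphrase) {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit IV, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), body]);
}

// Decrypt after download, using the same passphrase as for encryption.
function decryptFile(blob, passphrase) {
  const salt = blob.subarray(0, 16);
  const iv = blob.subarray(16, 28);
  const tag = blob.subarray(28, 44);
  const body = blob.subarray(44);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

const blob = encryptFile(Buffer.from("sensitive data"), "correct horse");
console.log(decryptFile(blob, "correct horse").toString()); // sensitive data
```

Because GCM authenticates the ciphertext, decrypting with a wrong key makes decipher.final() throw instead of returning corrupted bytes; choosing an authenticated mode or a non-authenticated one is a design decision this sketch makes in favor of detecting wrong keys.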
Another use case is retrieving encrypted files by their CID. After an encrypted file is downloaded, it must be decrypted using the same key that was used for encryption. Another use case is getting the owning address of the data stored under a given CID, which can be used to prove that a given CID is owned by only one person. Lastly, one of the most important use cases is getting all the CIDs of the data stored using a given wallet. This is useful because CIDs are long strings of characters that are difficult to remember; so are wallet addresses, but those are managed by a wallet such as MetaMask.
3.2 The environments involved
The platform is built of components that belong to three completely different environments: the blockchain, the InterPlanetary File System, and the user's local machine. One important component of the system is the Smart Contract, which will be deployed on the Ethereum blockchain. The data owned by the user will be stored on the IPFS network. Finally, the user's local computer will run the SPA in the browser, and the wallet management will also run in the user's browser.
Blockchain is a technology used to store transparent data in a democratic way. Data recorded on a blockchain is practically immutable, because changing it would require controlling a majority of the network's mining power or stake, which makes tampering extremely difficult. This is why blockchain has been and still is used for sensitive data, such as records of money transactions, voting systems, and so on.
IPFS is a new way of storing large amounts of data without involving third-party companies such as Google, Facebook, Amazon, or others. It is a decentralized data storage mechanism in which anyone can contribute storage and be rewarded for the storage offered; this can be achieved through Filecoin. IPFS is a peer-to-peer (P2P) protocol: it is enough to start a node, and through it the user can store any file, or even a folder, on the network.
well as each bit stored. This is why the blockchain cannot be used to store the files the user owns. Instead, it can be used to store data that does not take much space and that represents relations between entities.
Later sections describe the external components in more detail: MetaMask, IPFS Desktop, Etherscan, testnets, and others.
software level. At the hardware level, the algorithm is more performant but more difficult to set up, because special hardware may be required. For this project, the encryption will be performed at the software level, because this way the project is decoupled from the client's hardware, which means the system is not hardware-dependent.
A more detailed explanation of how symmetric and asymmetric key cryptography work, and of the differences between them, can be found on ssl2buy.com [18]. As shown in Figure 3.3 and Figure 3.4, which are fully explained on ssl2buy.com, in symmetric key encryption the two parties involved in the secure communication use the same key for encryption and decryption, while in asymmetric key encryption the two parties do not need to exchange any secret key; only public keys are shared. Asymmetric encryption is a newer technique. One of the most popular algorithms for public-private key cryptography is RSA. The algorithm was invented by Ron Rivest, Adi Shamir, and Leonard Adleman, hence its name. Like most other cryptographic techniques, RSA is based on mathematical principles: its security relies on the difficulty of factoring the product of two large prime numbers.
The two keys are generated together and are mathematically related, but the private key cannot practically be computed from the public key. The public key is used for encryption, and only the corresponding private key can decrypt the data; no other key can decrypt it.
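The public/private key relationship described above can be demonstrated with RSA in Node's node:crypto module. The 2048-bit modulus and OAEP padding with SHA-256 are common choices assumed here, not requirements of the system, and the message is a placeholder.

```javascript
const { generateKeyPairSync, publicEncrypt, privateDecrypt, constants } = require("node:crypto");

// Both keys come out of the same generation step.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const oaep = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" };

// Anyone holding the public key can encrypt ...
const message = Buffer.from("meet at noon");
const ciphertext = publicEncrypt({ key: publicKey, ...oaep }, message);

// ... but only the matching private key can decrypt.
const recovered = privateDecrypt({ key: privateKey, ...oaep }, ciphertext);
console.log(recovered.toString()); // meet at noon
```

With a 2048-bit key the ciphertext is always 256 bytes, and decrypting it with the private key of a different key pair fails with an error.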
A similar public-private key technique (elliptic-curve cryptography on the secp256k1 curve, in Ethereum's case) is used to generate blockchain wallets. The private key is used to sign transactions, and the wallet address is derived from the public key. Only the user who owns the private key of an account can send crypto value from that account. It is very easy to obtain a crypto wallet: it is enough to generate a key pair with any tool that produces addresses compatible with the ones used by the blockchain. The private key is a 256-bit number that should be chosen randomly, and the public key is then derived from it.
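The wallet key derivation can be sketched in the same way. Node's node:crypto exposes the secp256k1 curve through createECDH, so the sketch shows the private-to-public step only. The final address derivation (the last 20 bytes of the Keccak-256 hash of the public key) is described in a comment but not computed, because node:crypto does not provide Keccak-256 (its sha3-256 is a different hash). This is for understanding only; real wallets such as MetaMask handle key generation themselves.

```javascript
const { createECDH, randomBytes } = require("node:crypto");

// 1) Private key: a random 256-bit (32-byte) number. setPrivateKey below
//    throws if the value is not a valid secp256k1 private key.
const privateKey = randomBytes(32);

// 2) Public key: a point on the secp256k1 curve derived from the private key.
const ecdh = createECDH("secp256k1");
ecdh.setPrivateKey(privateKey);
const publicKey = ecdh.getPublicKey(); // Buffer, uncompressed: 0x04 | X | Y

// 3) Address (not computed here): the last 20 bytes of keccak256(X | Y).
//    node:crypto has no Keccak-256, and its "sha3-256" is a different hash.

console.log(privateKey.length); // 32
console.log(publicKey.length);  // 65
```

The derivation is deterministic: the same private key always yields the same public key, which is why keeping the private key secret is enough to control the account.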
The next section explains the entire process of how a file is encrypted, saved to IPFS, and registered in the Smart Contract.
it. Now, if the user wants to store some data on IPFS, they select a file from their local machine's hard drive, and after the file is selected, the user is asked for a key to be used for encryption. After the key is entered, the file is encrypted using the AES algorithm. To store the file on IPFS, the user presses the Store File button. After a few seconds, the user receives the response from IPFS containing the CID of the stored data.
The obtained CID is then sent to the Smart Contract using the account selected in MetaMask. This requires a transaction, and the transaction costs gas. The gas price depends on the state of the blockchain, and the selected MetaMask account is charged $\text{gasUsed} \times \text{gasPrice}$ ETH. The Smart Contract registers the CID in a map as a key whose value is the address of the wallet that was used to register it. In this way, the Smart Contract keeps track of who owns specific data on IPFS, because the CID acts like an address of the data on IPFS.
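The fee formula can be checked with a little arithmetic. The sketch below uses BigInt wei values to avoid floating-point rounding. The gas amount (100,000) and the gas price (20 gwei) are hypothetical numbers, not measurements of the Ledger, and gasPrice stands for the effective price per unit of gas (on Ethereum since EIP-1559, the base fee plus the priority tip).

```javascript
// Fee for the CID-registration transaction: gasUsed * gasPrice.
// Values are BigInt wei (1 gwei = 10^9 wei, 1 ETH = 10^18 wei).
const WEI_PER_GWEI = 10n ** 9n;
const WEI_PER_ETH = 10n ** 18n;

function transactionFeeWei(gasUsed, gasPriceGwei) {
  return gasUsed * gasPriceGwei * WEI_PER_GWEI;
}

// Format wei as a decimal ETH string without floating-point rounding.
function formatEth(wei) {
  const whole = wei / WEI_PER_ETH;
  const frac = (wei % WEI_PER_ETH).toString().padStart(18, "0").replace(/0+$/, "");
  return frac ? `${whole}.${frac}` : `${whole}`;
}

// Hypothetical example: 100,000 gas at 20 gwei per unit of gas.
const fee = transactionFeeWei(100000n, 20n);
console.log(fee);            // 2000000000000000n
console.log(formatEth(fee)); // 0.002
```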
To get the file back decrypted, the user needs to know its CID. Knowing the CID, they can download the data from IPFS. After downloading, they are asked for the same key that was used for encryption in order to decrypt the file. If the wrong key is specified, the "decrypted" data will be corrupted.
The only components developed for this project are the SPA and the Smart Contract. The single page application will be deployed on IPFS, while the Smart Contract (also named the Ledger) will be deployed on the Ethereum blockchain. The other components fulfill the remaining needs and are integrated into the whole system. The external components used are the MetaMask browser extension, the Ethereum blockchain, the IPFS network, and Etherscan. The ones not explained so far are explained in the following sections and in the Implementation chapter (Chapter 4).
3.6.1 MetaMask
MetaMask is an Ethereum-based crypto wallet manager. This wallet manager is open source. It can be used to call Smart Contracts or to send amounts of ether from one account to another, so it can be considered a gateway to the Ethereum blockchain. Figure 3.7 shows the interface of MetaMask. MetaMask is simple and easy to use. The extension can be used to swap Ethereum-based assets, to see previous transactions made with a specific wallet, and to connect easily to different blockchain networks, including testnets. It can also hold multiple wallets.
3.6.2 Etherscan
Etherscan is a block explorer for the Ethereum blockchain and its testnets. One can think of it as the Google of the Ethereum blockchain. Figure 3.8 presents one section of an Etherscan page that shows general information about the current state of the Ethereum mainnet. This information includes the current Ether price, the total number of transactions, the current gas price, the Ethereum transaction history for the last 14 days, the market cap, the number of the last finalized block, and the number of the last safe block.
Etherscan can also be used to inspect the code of a Smart Contract, because everything on the blockchain is open and transparent: the deployed bytecode is always visible, and the source code and ABI are shown once the contract's source has been verified. It is also possible to see the transactions related to a contract, its events, and so on. In general, everything related to the blockchain can be seen there.
proving the correctness of a block. Currently, the two testnets in use are Goerli and Sepolia, both based on Proof of Stake.
Testnets use fake ETH. This is useful for development, where it is risky to use real money. Testnets have their own block explorers, very similar to Etherscan for the mainnet; for example, the explorer for the Goerli testnet is goerli.etherscan.io. To get fake ETH there are faucets: simple websites that transfer fake ETH into the wallets used for development.
4 SYSTEM IMPLEMENTATION
This chapter describes the technologies used to bring the concept to life, shows code snippets from the actual implementation, and covers other aspects related to the implementation of the system. Usually, there are several ways to implement the same system, depending on the chosen techniques and technologies. The implementation also depends mostly on the defined architecture. Different technologies, techniques, philosophies, and architectures produce systems with different non-functional characteristics, while the functional requirements are satisfied and work mostly the same way. So, to get better performance, a more secure system, and other improved non-functional properties, it is important to choose the appropriate technologies and techniques and to have a well-defined system design.
In general, the technologies chosen for this system are proven to be reliable. They are also among the most popular ones, have good documentation, and are supported by many developers. Smart Contract development is no longer new for developers, and Smart Contracts are already used by many clients whose requirements, usually regarding security and transparency, fit these technologies well. IPFS is already used by artists to store their artworks, known online as NFTs (non-fungible tokens).
4.1 Technologies
There are several languages available for developing Smart Contracts. Most of the time, the language depends on the target blockchain. Some blockchains support Smart Contract development in general-purpose programming languages such as Python. Once Smart Contracts were invented, languages specific to their development appeared. One of those languages is Solidity.
4.1.1 Solidity
Solidity is a Smart Contract development language. It has all the features needed to develop feature-rich Smart Contracts. The language is high-level and object-oriented, and it allows, for example, the creation of Smart Contracts that represent sub-currencies. Each Smart Contract file has a .sol extension, and each file should declare a license identifier and the compiler version to be used to compile the Smart Contract.
Listing 4.1.1 presents a simple contract example and shows how the version and license can be specified at the top of the file. A contract is like a class in the Java programming language: it has state (attributes) and behavior (methods). Each contract resides at an address on the blockchain where it is deployed. In Listing 4.1.1, uint storedData is the state, while function set(uint x) public and function get() public view returns (uint) are methods. Methods can be called from another contract or from outside the blockchain, for example using MetaMask. Methods also have visibility specifiers such as public, private, internal, and external. There are other specifiers as well, such as payable: a payable method can receive ETH sent together with the call, while a non-payable method rejects calls that carry ETH.
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.16 <0.9.0;
contract SimpleStorage {
    uint storedData;
    function set(uint x) public {
        storedData = x;
    }
    function get() public view returns (uint) {
        return storedData;
    }
}
Solidity also supports interfaces, which help define common traits (function signatures) that contracts implement.
4.1.2 Hardhat
When developing a Smart Contract, it is not recommended to use the mainnet, because the mainnet uses real money: for each Smart Contract deployment, the developer has to pay for gas. There are testnets that can be used with fake money for testing purposes. Testnets are recommended for testing because they are very similar to the mainnet, but they are not convenient for development because testnets, like mainnets, are slow. They are slow because each transaction should be confirmed by a few mined blocks to make sure it was successfully registered in a block. This is where tools such as Hardhat, Truffle, Ganache, and others help. This project uses Hardhat because it is widely used by many developers and because it is based on JavaScript, a well-known programming language used in many different types of applications.
Hardhat has many features that are handy when developing a Smart Contract. One of the most important is the ability to run a local blockchain node for development. This local blockchain is fast, so the developer does not have to wait long for transactions to be mined and confirmed. Any valid blockchain, even the local one, can easily be added to MetaMask, because MetaMask works with any Ethereum-compatible (EVM) blockchain.
Hardhat also has many plugins; for example, there are plugins that make deployment easier and plugins that help test the Smart Contract from outside. Some Hardhat plugins are the following:
– @nomicfoundation/hardhat-toolbox;
– @nomicfoundation/hardhat-chai-matchers;
– @nomiclabs/hardhat-ethers;
– @nomiclabs/hardhat-etherscan;
– @nomiclabs/hardhat-vyper;
These are just a fraction of them; the framework has plugins for almost anything needed for Smart Contract development. The framework can be installed as an npm package and has an extension for VS Code.
4.1.3 IPFS-core
As presented in the System Design chapter (Chapter 3), the system is composed of several components, two of which are the ledger deployed on an Ethereum blockchain and a client app, which in this case is a single page application. The client app connects to the IPFS network, so it needs a way to connect to the network.
IPFS-core is a JavaScript library. It allows connecting to the IPFS network and provides useful functions to upload data to and download data from the network; it includes everything needed to integrate the app into the IPFS network.
It is easy to integrate the library into the application. As shown in Listing 4.1.2, all that is needed to get a connection to the network is to import the library and call the create() function.
import * as IPFS from 'ipfs-core' // ES module; the top-level await below needs ESM
const ipfs = await IPFS.create() // start an in-process IPFS node
const { cid } = await ipfs.add('Hello world') // add content and receive its CID
The cid is the content identifier explained in the System Design chapter 3. The library is free and open
source and dual licensed under MIT and Apache-2.0.
before AES. There is also a derivative of DES called 3DES, but because it is a derivative, the technique does not differ much from DES. AES was chosen as the symmetric key encryption technique because symmetric encryption is faster than asymmetric key encryption, so big chunks of data can be encrypted quickly using AES.
DES and 3DES are no longer considered secure enough for current systems. AES has the advantage that it allows keys of different lengths, and it also uses a more elegant mathematical approach. DES is based on a 56-bit key, while AES allows keys of 128, 192, or 256 bits, which makes the algorithm much stronger. Another disadvantage of DES is that it is efficient mainly in hardware, whereas AES was designed to be efficient in both hardware and software. Figure 4.1 presents a summarized comparison of the two encryption algorithms.
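To put the key lengths in perspective, the sketch below asks OpenSSL, through Node's crypto.getCipherInfo (available since Node.js 15), for the key sizes of the three AES variants, and compares the size of a 128-bit key space with the 56-bit DES one. The CBC mode in the cipher names is only used to look the ciphers up; the key sizes are the same for every AES mode.

```javascript
const { getCipherInfo } = require("node:crypto");

// Key and block sizes of the three AES variants, as reported by OpenSSL.
for (const name of ["aes-128-cbc", "aes-192-cbc", "aes-256-cbc"]) {
  const info = getCipherInfo(name);
  console.log(`${name}: ${info.keyLength * 8}-bit key, ${info.blockSize * 8}-bit block`);
}

// Number of possible keys a brute-force attacker would have to search.
const keySpace = (bits) => 2n ** BigInt(bits);

// A 128-bit AES key space is 2^72 times larger than the 56-bit DES one.
console.log(keySpace(128) / keySpace(56)); // 4722366482869645213696n
```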
When the user wants to upload a file to the IPFS network, he/she is first asked for the encryption key. After the user provides it, the file is encrypted and sent to the network in encrypted form. The encryption takes place at the software level; more precisely, in this case it takes place in the browser.
When the user wants to download the encrypted file, he/she has to provide the same key that was used when the file was encrypted and saved in the IPFS network. The file is first downloaded, and then, as with encryption, the decryption takes place at the software level, in the browser. There will be no error if the user specifies the wrong decryption key: the client app will still try to decrypt the file, but the resulting data will not be the original; in fact, it will be a file consisting of bytes that do not represent any valid data format.
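The behavior described above, where a wrong key produces unreadable bytes rather than an error, is what a non-authenticated AES mode gives. The sketch below assumes AES-256-CTR (the thesis does not name the mode) and a random IV stored next to the ciphertext; the ctr helper and the sample bytes are hypothetical. Authenticated modes such as AES-GCM would instead reject a wrong key with an error during decryption.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require("node:crypto");

// AES-256-CTR has no integrity check: decrypting with a wrong key
// completes without an exception and silently returns unrelated bytes.
function ctr(key, iv, data, decrypt = false) {
  const c = decrypt
    ? createDecipheriv("aes-256-ctr", key, iv)
    : createCipheriv("aes-256-ctr", key, iv);
  return Buffer.concat([c.update(data), c.final()]);
}

const iv = randomBytes(16); // not secret; stored next to the ciphertext
const rightKey = randomBytes(32);
const wrongKey = randomBytes(32);
const original = Buffer.from("%PDF-1.7 patient report"); // starts like a PDF

const stored = ctr(rightKey, iv, original);      // what would go to IPFS
const good = ctr(rightKey, iv, stored, true);    // same key: original bytes
const garbage = ctr(wrongKey, iv, stored, true); // wrong key: no error thrown

console.log(good.equals(original));    // true
console.log(garbage.equals(original)); // false: the PDF header is gone
```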
called faucets, which allow a developer to get a certain amount of crypto to test an application that is in the development phase. Since these faucets exist, the money on testnets has no value at all. Testnets were invented for development and testing because it is not practical to use the mainnet for those purposes: using the mainnet for testing and development would lead to the loss of a lot of money, because the process of developing any kind of software is based on a trial-and-error approach.
Testnets, which act almost identically to the mainnet, are sometimes also slow. The more users a testnet has, the slower transactions are processed, because more transactions compete for the limited space in each block. There are also other kinds of blockchains used for testing and development. Although on the surface they act like a mainnet, under the hood they are implemented very differently. This difference is intentional, because these blockchains are made to be fast in order to allow a fast development cycle. As with the testnets mentioned earlier, these blockchains do not use real money. Such a blockchain is usually integrated into a tool used for developing Smart Contracts, and this tool comes with many other features useful for Smart Contract development. An example of such a tool, Hardhat, was presented in one of the previous sections of this chapter.
As for testnets, there is not just one: each mainnet usually has several. The Ethereum mainnet, for instance, had the following testnets: Ropsten, Rinkeby, Kovan, and others. "Had" because these are now deprecated. They were deprecated because in 2022 Ethereum switched to a different block approval approach: previously the PoW (Proof of Work) approach was used, but now the PoS (Proof of Stake) approach is used. After the Ethereum mainnet switched to PoS, the two testnets kept in use, both running proof of stake, are Goerli and Sepolia. These two testnets are very similar to the actual Ethereum mainnet, and it makes no noticeable difference which of them is chosen. For this project, the Goerli testnet was chosen. This means that the Ledger Smart Contract will be deployed on the Goerli testnet and all the transactions will be registered there. This applies only to the testing and development phases; in production, the mainnet will be used.
Each mainnet has one or more block explorers. Block explorers are interfaces (web, mobile, or other) that follow all blockchain interactions and keep track of what happened on the blockchain. For the Ethereum mainnet, the best-known block explorer is Etherscan. Testnets can also have their own block explorers; for the Goerli testnet, for instance, it is https://goerli.etherscan.io/. On this website one can see all the blocks that have been approved, which transactions have been approved and which are pending, all the information about the transactions, and much other useful information.
4.2 The Smart Contract
The Smart Contract, also called the Ledger, is responsible for maintaining the relationship between the wallet address with which a file was uploaded to IPFS and the CID of the uploaded file. The Ledger works like a key-value database: it basically consists of a hash map in which the key is the CID of the file and the value is the address of the wallet. In this way, the Ledger keeps track of the owner of each piece of data stored on the IPFS network.
Smart Contracts are agreements between the users of the platform: everyone using a Smart Contract agrees to its rules. Everyone can see the contract's rules because contracts are deployed on a blockchain, and everything on a blockchain is publicly available, so everyone can inspect the code of a Smart Contract. Smart Contracts are computer programs written either in languages designed specifically for them or in general-purpose programming languages. A contract is executed automatically when a transaction is sent to its address, either from a wallet or from another contract. When a transaction is sent to a Smart Contract, the contract has all the information about the transaction and even has access to the funds sent with it.
For this platform, besides registering the owner of each piece of data stored on the IPFS network, the Smart Contract does a little more: for example, it can receive donations, and it returns all the CIDs of the data a user owns.
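The Ledger's bookkeeping can be modelled in plain JavaScript to show how the two lookups work. This in-memory class mirrors the s_cidToAddress and s_addressToOwnedCids mappings declared in Listing 4.2.6, but it is an illustrative sketch, not the contract: LedgerModel and its method names are hypothetical, msgSender stands in for the transaction sender, and rejecting an already registered CID is an assumption.

```javascript
// In-memory model of the Ledger's bookkeeping (NOT the Solidity contract).
// It mirrors two mappings: CID -> owner address, owner -> list of CIDs.
class LedgerModel {
  constructor() {
    this.cidToAddress = new Map();
    this.addressToOwnedCids = new Map();
  }

  // msgSender stands in for the wallet that sent the transaction.
  registerCid(msgSender, cid) {
    // Assumption of this sketch: a CID can be registered only once.
    if (this.cidToAddress.has(cid)) throw new Error("CID already registered");
    this.cidToAddress.set(cid, msgSender);
    const owned = this.addressToOwnedCids.get(msgSender) ?? [];
    owned.push(cid);
    this.addressToOwnedCids.set(msgSender, owned);
  }

  // "Who owns this CID?" (undefined if the CID was never registered).
  ownerOf(cid) {
    return this.cidToAddress.get(cid);
  }

  // All CIDs registered by one wallet, returned as a copy.
  cidsOf(address) {
    return [...(this.addressToOwnedCids.get(address) ?? [])];
  }
}

const ledger = new LedgerModel();
ledger.registerCid("0xAlice", "cid-1");
ledger.registerCid("0xBob", "cid-2");
console.log(ledger.ownerOf("cid-2"));  // 0xBob
console.log(ledger.cidsOf("0xAlice")); // [ 'cid-1' ]
```

One difference from Solidity: reading a missing key from a Solidity mapping returns the zero address, whereas this model returns undefined.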
npm install --save-dev hardhat
After running this command, the developer is asked whether he/she wants to develop with JavaScript or TypeScript. When the execution finishes, the package.json file is created, containing information about the project, as shown in Listing 4.2.2. Two of the most important dependencies are @nomiclabs/hardhat-ethers and ethers; these two libraries allow interaction with the blockchain. chai is used for testing the Smart Contract. hardhat-deploy is a Hardhat plugin used to deploy the Smart Contract on the blockchain. hardhat-gas-reporter is a Hardhat plugin that creates reports of the gas used by the Smart Contract's deployment and function calls during tests. solidity-coverage is a library that calculates the coverage percentage of the tests. These are the most important libraries used by this Hardhat project.
{
  ...
  "devDependencies": {
    "@chainlink/contracts": "^0.3.1",
    "@nomiclabs/hardhat-ethers": "npm:hardhat-deploy-ethers@^0.3.0-beta.13",
    "@nomiclabs/hardhat-etherscan": "^3.0.0",
    "@nomiclabs/hardhat-waffle": "^2.0.2",
    "chai": "^4.3.4",
    "dotenv": "^14.2.0",
    "ethereum-waffle": "^3.4.0",
    "ethers": "^5.5.3",
    "hardhat": "^2.8.3",
    "hardhat-deploy": "^0.9.29",
    "hardhat-gas-reporter": "^1.0.7",
    "prettier-plugin-solidity": "^1.0.0-beta.19",
    "solidity-coverage": "^0.7.18"
  },
  ...
}
Once the basic Hardhat project structure is created, the project contains a file called hardhat.config.js. This file holds all the configuration Hardhat needs to work with testnets, or even with the mainnet, although the latter is not recommended.
As shown in Listing 4.2.3, hardhat.config.js imports some constants that should not be visible to the public: private keys of the wallets that are going to be used to deploy the contract, or keys to some APIs.
const PRIVATE_KEY = process.env.PRIVATE_KEY
const ETHERSCAN_API_KEY = process.env.ETHERSCAN_API_KEY
These environment variables can be defined in a .env file or as environment variables at the operating system level. Their values should not be hard-coded in the project, so that they can be changed easily without affecting the system's functionality.
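A small guard makes missing secrets fail loudly instead of letting an undefined value reach the network configuration. requireEnv below is a hypothetical helper (not part of Hardhat or dotenv) that could be used in hardhat.config.js in place of reading process.env directly, for values such as PRIVATE_KEY and ETHERSCAN_API_KEY; the placeholder value set in the example is only for demonstration.

```javascript
// Hypothetical helper (not part of Hardhat or dotenv): read a required
// value from the environment and fail fast if it is missing or blank.
function requireEnv(name) {
  const value = process.env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}; define it in .env or the shell`);
  }
  return value;
}

// In hardhat.config.js this could replace the direct reads, e.g.
//   const PRIVATE_KEY = requireEnv("PRIVATE_KEY")
// Demonstration with a placeholder value:
process.env.ETHERSCAN_API_KEY = "placeholder-key";
console.log(requireEnv("ETHERSCAN_API_KEY")); // placeholder-key
```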
The imported variables are used to configure Hardhat components such as the networks, the gas reporter, the accounts, the Etherscan integration, the Solidity compiler version, and others. Listing 4.2.4 shows how the networks are configured to be used by Hardhat.
module.exports = {
  defaultNetwork: "hardhat",
  networks: {
    hardhat: {
      // from hardhat
      // port 8545
      chainId: 31337,
      // gasPrice: 130000000000,
    },
    ganache: {
      url: GANACHE_RPC_URL,
      chainId: 1337,
    },
    goerli: {
      url: GOERLI_RPC_URL,
      accounts: [PRIVATE_KEY],
      chainId: 5,
      blockConfirmations: 3,
    },
  },
  ...
  namedAccounts: {
    deployer: {
      default: 0,
      1: 0, // mainnet
      5: 0,
    },
  },
  ...
}
As shown in Listing 4.2.4, three blockchains are configured for development and testing. The first one, hardhat, comes by default with Hardhat. It is not a testnet, but it behaves like one; it is very fast, which is convenient for the developer. The second one, ganache, is also not a real testnet but a local simulation of a blockchain; under the hood it does not work like a real blockchain. It is also useful for development, and it has a GUI (Graphical User Interface) that shows all the blocks, accounts, transactions, and other useful information about the blockchain. The last one is goerli, a real testnet: a real blockchain that is very similar to the mainnet, except that it does not use real money. The Goerli testnet is used only for development and testing.
Each blockchain, whether it is the mainnet, a testnet, or a simulated blockchain for development, has a chainId, which is used to differentiate between the many available blockchains. In the previous configuration, the chainId is specified alongside a URL, which is the access point to the blockchain through RPC (Remote Procedure Call). The configuration can also specify the accounts (through their private keys) to be used with the blockchain and the number of blocks to wait for transaction confirmation. A transaction is considered confirmed once the required number of blocks (three for goerli in Listing 4.2.4) has been mined, counting from the block that includes it.
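The chainId and confirmation rules above can be expressed in a few lines. The chain table repeats the IDs from Listing 4.2.4 and adds the Ethereum mainnet (1) and Sepolia (11155111); the helper names are hypothetical, and counting the including block as the first confirmation follows the convention used by ethers.js.

```javascript
// Chain IDs from Listing 4.2.4, plus Ethereum mainnet and Sepolia.
const CHAINS = new Map([
  [1, "mainnet"],
  [5, "goerli"],
  [1337, "ganache"],
  [31337, "hardhat"],
  [11155111, "sepolia"],
]);

// Confirmations of a transaction included in block txBlock, counting the
// including block itself as the first confirmation.
function confirmations(txBlock, latestBlock) {
  return latestBlock < txBlock ? 0 : latestBlock - txBlock + 1;
}

// Is the transaction confirmed under a blockConfirmations setting?
function isConfirmed(txBlock, latestBlock, required) {
  return confirmations(txBlock, latestBlock) >= required;
}

console.log(CHAINS.get(5));            // goerli
console.log(isConfirmed(100, 101, 3)); // false (2 confirmations)
console.log(isConfirmed(100, 102, 3)); // true (3 confirmations)
```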
Once the Hardhat project has been configured, it is possible to write the code for the Smart Contract. The project has the file structure presented in Listing 4.2.5:
- artifacts/
- contracts/
- deploy/
- deployments/
- node_modules/
- scripts/
- test/
- utils/
- .env
- package.json
- hardhat.config.js
...
The contracts folder contains the Solidity files in which the Smart Contracts are developed. Inside it there is another folder called test, which holds all the tests for the Solidity code: the Smart Contract can be unit tested at the Solidity code level, and these tests are located in the test directory under the contracts directory. It is also possible to test the Smart Contracts in integration (integration tests). Integration tests can be written in any programming language that has the libraries needed to interact with a Smart Contract already deployed on a testnet. These tests are located in the test directory under the project's root directory; alternatively, scripts placed in the scripts folder under the root directory can be used to test the Smart Contract from the outside. The deploy folder contains the JavaScript files used to deploy the Smart Contract on a testnet. These deployment scripts are similar to database migrations written in SQL. The deployments directory contains the artifacts that result from deploying a Smart Contract on a testnet. The .env file specifies all the required properties that are injected into the JavaScript scripts. These properties are used to configure the deployment of the Smart Contract, or to configure any library used to get information about the contract after deployment, for example how much gas was spent on the deployment. The package.json file specifies all the needed libraries and their versions and, as explained earlier, the hardhat.config.js file specifies the configuration of the Hardhat project.
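The injection of .env properties can be illustrated with a minimal parser for the KEY=VALUE format. In Node.js projects this job is commonly done by a package such as dotenv, which fills process.env; the parser below is only a simplified stand-in, and the variable names GOERLI_RPC_URL and PRIVATE_KEY, as well as their values, are hypothetical examples rather than the project's actual settings.

```javascript
// Simplified .env parser: one KEY=VALUE pair per line, "#" starts a comment line,
// and optional single or double quotes around the value are stripped.
function parseEnv(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // ignore malformed lines
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1); // strip matching quotes
    result[key] = value;
  }
  return result;
}

// Hypothetical .env content; a real private key must never be committed.
const env = parseEnv(`
# RPC endpoint of the Goerli testnet (hypothetical value)
GOERLI_RPC_URL="https://goerli.example/rpc"
PRIVATE_KEY=0xabc123
`);

// The parsed values can then be used to build a network entry of the configuration.
const goerliNetwork = {
  url: env.GOERLI_RPC_URL,
  accounts: env.PRIVATE_KEY ? [env.PRIVATE_KEY] : [],
  chainId: 5,
};

console.log(goerliNetwork.url); // https://goerli.example/rpc
```

Keeping such values in .env, and out of version control, is what allows the same scripts to be used with different keys and endpoints.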
As mentioned, the Smart Contract is developed in a language called Solidity. There are many programming languages for developing Smart Contracts, and some of them build on well-known general-purpose languages such as Python and Rust. Solidity, in contrast, is a niche programming language used only for Smart Contract development. Listing 4.2.6 presents the Solidity code of the Smart Contract used in this project, which is explained in more detail below.
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

error NotOwnerError();

contract Ledger {
    address public immutable i_owner;
    address[] public users;
    mapping(address => uint256) public s_donatorToDonatedAmounts;
    mapping(string => address) public s_cidToAddress;
    mapping(address => string[]) public s_addressToOwnedCids;

    event NewCidRegistered(address ownerAddress, string cid);
    event DonationsWithdrawal();
    event NewDonation();

    constructor() {
        i_owner = msg.sender;
    }

    // methods to perform actions on the smart contract
    ...
}
The full code of this contract can be found in Appendix A.
Each Smart Contract should specify its license identifier. For this project, the MIT license was chosen; a later section explains why this license was chosen and which other licenses are available. In Solidity, the license is specified on the first line of the file as a comment of the form // SPDX-License-Identifier: MIT. After the license, the version of the compiler used to compile the Smart Contract is specified with pragma solidity followed by the version. For this project, ^0.8.7 was used, which accepts any 0.8.x compiler from version 0.8.7 onwards.
Solidity also supports custom error types. An error type is defined with the keyword error followed by the error name, which by convention starts with a capital letter. This contract uses NotOwnerError to revert the execution of the contract when an action reserved for the owner is attempted by someone else. Recall that the owner of the contract is the wallet address that was used to deploy it. As shown later, there is a function used to withdraw all the funds received as donations, and only the owner of the contract can call it.
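The owner check can be mirrored off-chain to follow its control flow. The JavaScript sketch below models the behaviour of the contract's onlyOwner modifier and the NotOwnerError custom error: the owner is fixed when the object is created, as the constructor stores msg.sender, and any other caller is rejected before the withdrawal logic runs. The class names and addresses are hypothetical, and the addresses are compared in lower case only because, as strings, hexadecimal addresses may be written in mixed case. On-chain, a revert additionally undoes every state change made by the call.

```javascript
// Mirrors the Solidity custom error `error NotOwnerError();`.
class NotOwnerError extends Error {
  constructor() {
    super("NotOwnerError");
    this.name = "NotOwnerError";
  }
}

// Illustrative model of the Ledger's ownership rule (not the deployed contract).
class LedgerOwnership {
  constructor(deployer) {
    this.owner = deployer.toLowerCase(); // like `i_owner = msg.sender` in the constructor
    this.balance = 0n;                   // donated funds held by the contract, in wei
  }

  // Same check as the onlyOwner modifier: reject any sender other than the owner.
  requireOwner(sender) {
    if (sender.toLowerCase() !== this.owner) throw new NotOwnerError();
  }

  // Same rule as withdraw(): only the owner can take the whole balance.
  withdraw(sender) {
    this.requireOwner(sender);
    const amount = this.balance;
    this.balance = 0n;
    return amount;
  }
}

const OWNER = "0x" + "Aa".repeat(20); // hypothetical deployer address
const OTHER = "0x" + "bb".repeat(20); // hypothetical other wallet

const ledger = new LedgerOwnership(OWNER);
ledger.balance = 1000n;
try {
  ledger.withdraw(OTHER);
} catch (e) {
  console.log(e instanceof NotOwnerError); // true
}
console.log(ledger.withdraw(OWNER)); // 1000n
```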
Each contract starts with the contract keyword followed by the contract's name, which by convention starts with a capital letter, much like class names in the Java programming language. A contract is similar to a class in Java: it has state and behavior. Mutating the state of the contract requires gas to be burned, so it costs fees. Just as Java classes can implement interfaces, Solidity contracts can also implement interfaces. Implementing an interface allows the creation of contracts that share similar behaviors. For example, the ERC20 specification is used to create Ethereum-based tokens, and it is simple to create your own token by implementing the ERC20 interface.
The state of the contract is one of its most important parts, because the contract is used to hold information. This information is visible to everyone and cannot be corrupted. The Ledger Smart Contract uses a few mappings to hold the information about who owns each CID of the files stored on the IPFS network. i_owner is an immutable attribute that holds the address of the wallet that was used to deploy the contract. The users attribute is an array that holds the wallet addresses that stored a CID through this Smart Contract. s_cidToAddress is a storage attribute that records who owns a given CID. The other attributes serve similar purposes or keep track of donations.
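One property of Solidity mappings matters for how the Ledger is read: every key implicitly exists and maps to the default value of its type until something is written, so looking up a CID that was never registered returns the zero address instead of failing. The class below is a hypothetical JavaScript model of that behaviour for the three mappings of the contract; it says nothing about how the EVM actually lays out storage.

```javascript
const ZERO_ADDRESS = "0x" + "0".repeat(40);

// Hypothetical model of a Solidity mapping: missing keys yield a default value.
class SolidityMapping {
  constructor(makeDefault) {
    this.makeDefault = makeDefault; // factory, so array defaults are never shared
    this.entries = new Map();
  }
  // Reads never fail; an unknown key returns a fresh default value.
  get(key) {
    return this.entries.has(key) ? this.entries.get(key) : this.makeDefault();
  }
  // Writes must go through set() to be remembered.
  set(key, value) {
    this.entries.set(key, value);
  }
}

// The three mappings declared in the Ledger contract.
const s_donatorToDonatedAmounts = new SolidityMapping(() => 0n); // address => uint256
const s_cidToAddress = new SolidityMapping(() => ZERO_ADDRESS);  // string => address
const s_addressToOwnedCids = new SolidityMapping(() => []);      // address => string[]

console.log(s_cidToAddress.get("QmUnknownCid") === ZERO_ADDRESS); // true
console.log(s_donatorToDonatedAmounts.get("0x" + "12".repeat(20))); // 0n
```

In practice this means a client has to treat the zero address as "no owner registered" rather than as a real owner.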
Solidity also supports emitting events. Events are used to log what happened in the Smart Contract; for example, it is possible to log all CID registrations, all donations, and so on. This contract defines three events: NewCidRegistered, DonationsWithdrawal and NewDonation. Events can also carry data: when an event is emitted, its data is stored in the logs of the transaction, where it can be seen, for example, in a block explorer.
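A client application can also react to these events. With ethers v5, which the client uses, a listener is registered with contract.on(eventName, listener). So that it runs without a blockchain connection, the sketch below instead simulates the emit/subscribe flow with a hypothetical in-memory EventLog class: emitted events are appended to a log, delivered to listeners, and can be queried afterwards by name.

```javascript
// Hypothetical in-memory stand-in for the transaction logs where events end up.
class EventLog {
  constructor() {
    this.entries = [];   // append-only, like logs stored in transaction receipts
    this.listeners = {}; // eventName -> array of callbacks
  }
  // Plays the role of `emit NewCidRegistered(msg.sender, cid);`
  emit(name, args) {
    const entry = { name, args, index: this.entries.length };
    this.entries.push(entry);
    for (const listener of this.listeners[name] ?? []) listener(args);
  }
  // Similar in spirit to ethers v5 `contract.on("NewCidRegistered", listener)`.
  on(name, listener) {
    (this.listeners[name] ??= []).push(listener);
  }
  // Filter past events by name, e.g. to list every CID registration.
  query(name) {
    return this.entries.filter((e) => e.name === name).map((e) => e.args);
  }
}

const log = new EventLog();
const seen = [];
log.on("NewCidRegistered", ({ ownerAddress, cid }) => seen.push(cid));
log.emit("NewCidRegistered", { ownerAddress: "0x" + "11".repeat(20), cid: "QmFirst" });
log.emit("NewDonation", {});
log.emit("NewCidRegistered", { ownerAddress: "0x" + "22".repeat(20), cid: "QmSecond" });

console.log(seen); // [ 'QmFirst', 'QmSecond' ]
console.log(log.query("NewCidRegistered").length); // 2
```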
Similar to object-oriented programming languages, Solidity Smart Contracts have constructors, which run automatically, once, when the contract is deployed. The constructor should be used to initialize the state of the contract; in this case, it sets the owner of the Smart Contract. Only the owner can withdraw the donations from the Smart Contract.
Listing 4.2.7 presents the function used to register a new CID in the Smart Contract's state on the blockchain.
function publishCid(string memory cid) public {
    s_cidToAddress[cid] = msg.sender;
    s_addressToOwnedCids[msg.sender].push(cid);
    users.push(msg.sender);
    emit NewCidRegistered(msg.sender, cid);
}
Solidity functions have visibility specifiers. This function is public because it should be callable by anyone. The other specifiers are external, internal and private. private means that the function is visible only inside the contract where it is defined. external means that the function can only be called from outside the contract, either by a transaction or by another contract (contracts can also call other contracts), and not directly from inside the contract itself. Finally, internal means that the function can be called within the contract or from a contract that inherits from the contract in which the function is defined.
The function takes a string, which is the content identifier (CID) of the data stored on the InterPlanetary File System network. The CID is stored in the s_cidToAddress mapping to keep track of its owner. Every function call has access to a global variable called msg, which acts like an object holding information about the current call: the address of the wallet that called the function (msg.sender), the value sent with the call, if any (msg.value), and other useful data. After the caller's address is saved in the contract's state, the CID is appended to s_addressToOwnedCids. As explained earlier, this attribute stores the CIDs owned by a specific wallet/user. Then the caller's address is appended to the users array, recording that this user has used the contract to store data. Finally, the NewCidRegistered event is emitted.
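The effect of publishCid on the contract's state can be traced with a small JavaScript model. It reproduces the steps of the function as written, which also makes two behaviours visible: a CID that is already registered is silently reassigned to the next caller (while the previous owner's list still contains it), and users receives one entry per call, so the same address can appear several times. The guarded variant publishCidOnce is a hypothetical alternative shown only for comparison; it is not part of the project's contract.

```javascript
const ZERO_ADDRESS = "0x" + "0".repeat(40);

// Hypothetical JavaScript model of the Ledger state touched by publishCid.
function createLedgerState() {
  return {
    users: [],                     // address[] users
    cidToAddress: new Map(),       // s_cidToAddress
    addressToOwnedCids: new Map(), // s_addressToOwnedCids
    events: [],                    // emitted NewCidRegistered events
  };
}

// Same steps as the Solidity publishCid(cid), with `sender` playing msg.sender.
function publishCid(state, sender, cid) {
  state.cidToAddress.set(cid, sender); // overwrites any previous owner
  const owned = state.addressToOwnedCids.get(sender) ?? [];
  owned.push(cid);
  state.addressToOwnedCids.set(sender, owned);
  state.users.push(sender); // one entry per call, duplicates included
  state.events.push({ ownerAddress: sender, cid }); // emit NewCidRegistered(sender, cid)
}

// Hypothetical guarded variant: refuse a CID that already has an owner.
function publishCidOnce(state, sender, cid) {
  const current = state.cidToAddress.get(cid) ?? ZERO_ADDRESS;
  if (current !== ZERO_ADDRESS) throw new Error("CID already registered");
  publishCid(state, sender, cid);
}

const alice = "0x" + "a1".repeat(20); // hypothetical wallets
const bob = "0x" + "b2".repeat(20);
const state = createLedgerState();
publishCid(state, alice, "QmDoc");
publishCid(state, bob, "QmDoc"); // as written, bob now appears as the owner
console.log(state.cidToAddress.get("QmDoc") === bob); // true
```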
Listing 4.2.8 presents the function that can be used to verify who owns a given CID.
This function is as simple as it looks: it only returns the address of the wallet that was used to store the given CID in the Smart Contract. Compared with the previous function, there are some differences. This function returns a value, an address; Solidity has a dedicated data type called address to represent wallet addresses. The function also has a new modifier, view, which marks a function that only reads the state of the Smart Contract and does not change it. Therefore, calling it from outside the blockchain, without sending a transaction, costs nothing. Anyone can call this function because it is public, and checking who owns a given CID is free.
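On the client side, the address returned by this lookup still has to be interpreted. A minimal sketch, assuming the value arrives as a hexadecimal address string, as ethers v5 returns it: an unregistered CID yields the zero address (the mapping default), and addresses are compared case-insensitively because the checksummed form returned by ethers uses mixed case. The function ownershipStatus and its result labels are hypothetical.

```javascript
const ZERO_ADDRESS = "0x" + "0".repeat(40);

// Hypothetical helper: interpret the owner address read from the contract
// for a CID, relative to the wallet that claims to own it.
function ownershipStatus(ownerFromContract, claimedAddress) {
  const owner = ownerFromContract.toLowerCase();
  if (owner === ZERO_ADDRESS) return "unregistered"; // mapping default: nobody stored this CID
  return owner === claimedAddress.toLowerCase() ? "owned-by-claimant" : "owned-by-someone-else";
}

const mixedCase = "0x" + "Ab".repeat(20); // hypothetical address written in mixed case
console.log(ownershipStatus(ZERO_ADDRESS, mixedCase)); // unregistered
console.log(ownershipStatus(mixedCase, mixedCase.toLowerCase())); // owned-by-claimant
```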
This folder is called artifacts and it contains two folders: build-info and contracts. The build-info folder contains a JSON file with a lot of useful information about the build. The contracts folder contains two files for each compiled contract. In this project there is only one contract, called Ledger, so there are two files: Ledger.dbg.json and Ledger.json. Ledger.dbg.json is just a JSON file that points to the file in the build-info directory. The most important file is Ledger.json, a shortened part of which is shown in Listing 4.2.10.
{
    "contractName": "Ledger",
    "sourceName": "contracts/Ledger.sol",
    "abi": [
        {
            "inputs": [],
            "stateMutability": "nonpayable",
            "type": "constructor"
        },
        {
            "inputs": [],
            "name": "NotOwnerError",
            "type": "error"
        },
        ...
    ],
    "bytecode": "0x60a0604052fff6040c57830008070033",
    ...
}
This JSON file contains all the information resulting from the compilation of the contract: the contract name, the source Solidity file, the bytecode, and other useful data. The most important part is the ABI (Application Binary Interface). The ABI is a JSON array that describes all the components of the Smart Contract, such as its functions, constructor, events and errors. In general, the ABI is what any caller, whether a client application or another Smart Contract, needs in order to call the Smart Contract. Listing 4.2.10 shows two entries; most of the entries were removed from the ABI to keep the example simple. The first entry describes the constructor: it has 0 input parameters and its stateMutability is nonpayable. Payable functions are the ones that can receive funds (Ether) sent along with the call, whereas a nonpayable function rejects any call that carries a value. The bytecode is sent to the blockchain when the contract is deployed; executing it stores the contract's runtime code on the blockchain, and this code is run by the EVM.
4.2.3 Deployment
After the contract is compiled and its ABI is obtained, the contract can be deployed on a local development network, a testnet, or even the mainnet. Deployment first requires a deploy script. A convenient Hardhat plugin called hardhat-deploy makes the deployment very easy. The deployment scripts are like database migrations and should be placed in the deploy folder in the root directory of the Hardhat project. In the case of Hardhat, the scripts are written in JavaScript and their file names should follow a certain pattern; for example, the first script, which deploys the Ledger, is named 01-deploy-ledger. The idea is to start each script name with a number that represents the order in which the scripts should run. hardhat-deploy also keeps track of existing deployments, so a contract that has already been deployed and has not changed is not deployed again.
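The naming convention can be illustrated with a small migration-style selection of scripts. A minimal sketch, assuming the deploy scripts are identified by file name and that the names of already executed scripts are remembered somewhere: files are sorted by name, which is why the numeric prefix is zero-padded, and only the scripts not executed before are returned. The file names other than 01-deploy-ledger and the .js extension are hypothetical, and this models the idea rather than the internal implementation of hardhat-deploy.

```javascript
// Hypothetical migration-style selection of deploy scripts.
// Plain string sorting is used, so "02-..." must be zero-padded to sort before "10-...".
function pendingScripts(fileNames, alreadyRun) {
  const done = new Set(alreadyRun);
  return fileNames
    .filter((name) => /^\d+-.+\.js$/.test(name)) // only numbered JavaScript files
    .sort() // lexicographic order = execution order
    .filter((name) => !done.has(name)); // skip scripts that ran before
}

const files = ["10-setup-donations.js", "01-deploy-ledger.js", "02-verify-ledger.js", "README.md"];
console.log(pendingScripts(files, ["01-deploy-ledger.js"]));
// [ '02-verify-ledger.js', '10-setup-donations.js' ]
```

Without the zero padding, a script named 2-x.js would run after 10-y.js, because the names are compared character by character.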
Listing 4.2.11 presents the script used to deploy the Ledger Smart Contract.
// ...
const { network } = require("hardhat")

module.exports = async ({ getNamedAccounts, deployments }) => {
    const { deploy, log } = deployments
    const { deployer } = await getNamedAccounts()
    const chainId = network.config.chainId

    log("Deploying Ledger and waiting for confirmations...")
    const ledger = await deploy("Ledger", {
        from: deployer,
        args: [],
        log: true,
        // we need to wait if on a live network so we can verify properly
        waitConfirmations: network.config.blockConfirmations || 1,
    })
    log(`Ledger deployed at ${ledger.address}`)
}
// ...
This small script deploys the contract compiled from the Solidity file in the contracts folder, identified by its name (Ledger). The deployer variable identifies the wallet used to deploy the Smart Contract. It comes from hardhat.config.js, where the wallet used to deploy the contract on each network is specified, as shown in Listing 4.2.12.
...
namedAccounts: {
    deployer: {
        default: 0,
        1: 0, // mainnet
        5: 0, // Goerli
    },
},
...
As the listing above shows, it is possible to specify which wallet is used when the deployment command is run. On the left side is the chainId and on the right side is the index of the wallet in the list of wallets defined for that blockchain network. For any network without its own entry, the default entry applies, which here is the first wallet in the list.
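The lookup described above can be sketched as follows. resolveNamedAccount is a hypothetical, simplified model of how a namedAccounts entry becomes a wallet address: the entry for the current chain ID is used if present, otherwise the default entry, and the resulting number is an index into the network's list of accounts. The real hardhat-deploy plugin supports more forms (for example network names as keys and literal addresses as values), which are left out here, and the accounts list is made up.

```javascript
// Same shape as the namedAccounts entry from hardhat.config.js.
const namedAccounts = {
  deployer: {
    default: 0,
    1: 0, // mainnet
    5: 0, // Goerli
  },
};

// Hypothetical helper: pick the wallet for a named account on a given chain.
function resolveNamedAccount(name, chainId, accounts) {
  const spec = namedAccounts[name];
  if (spec === undefined) throw new Error(`unknown named account: ${name}`);
  // "??" rather than "||", so an explicit index 0 is not replaced by the default.
  const index = spec[chainId] ?? spec.default;
  if (index === undefined || index >= accounts.length) {
    throw new Error(`no account configured for ${name} on chain ${chainId}`);
  }
  return accounts[index];
}

// Hypothetical accounts list, e.g. derived from the keys configured for the network.
const accounts = ["0x" + "01".repeat(20), "0x" + "02".repeat(20)];
console.log(resolveNamedAccount("deployer", 5, accounts) === accounts[0]); // true
console.log(resolveNamedAccount("deployer", 31337, accounts) === accounts[0]); // true (default)
```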
To implement the SPA, SvelteKit was used, because it is a JavaScript framework that is easy to use and solves many problems of older libraries and frameworks such as React and Angular, some of them related to component state management. The AES encryption is implemented with a JavaScript library called crypto-js. The package.json file, which lists all the libraries used, is presented in Listing 4.3.1.
...
"devDependencies": {
    "@neoconfetti/svelte": "^1.0.0",
    "@sveltejs/kit": "next",
    ...
    "svelte": "^3.46.0",
    "vite": "^3.1.0"
},
"type": "module",
"dependencies": {
    "@sveltejs/adapter-static": "^1.0.0-next.48",
    "bootstrap": "^5.2.2",
    "buffer": "^6.0.3",
    "crypto-js": "^4.1.1",
    "encrypt-uint8array": "^1.0.0",
    "ethers": "^5.7.2",
    "ipfs-core": "^0.17.0",
    "ipfs-http-client": "^59.0.0"
}
These are some of the most important libraries used to create the SPA. adapter-static is used to build static files from the Svelte source code. These static files can be uploaded to the IPFS network, and since the whole SPA consists of pre-built static files, the application can be accessed directly from the IPFS network. The buffer library is used to convert any file into a buffer of bytes; these bytes are then encrypted with AES and uploaded to the IPFS network. The ethers library is one of the most important: it is used to interact with the Smart Contract, that is, to call the contract's functions. Finally, ipfs-core is used to interact with the IPFS network in order to upload files to it and download files from it.
The client application is simple and provides everything that is needed. It offers the possibility to upload a file to the IPFS network, obtain the CID of that file, and register the CID in the Ledger. The client app also offers the ability to download a file from the IPFS network by its CID. These are the two main functionalities, but there are more. Listing 4.3.2 presents the function that takes a file specified by the user, encrypts it and stores it on the IPFS network.
const publishFile = async () => {
    // validations ...
    const result = await ipfs.add(encrypted)
    const cid = result.path
    // ...
}
The encrypt function converts the file into a buffer of bytes and then encrypts those bytes using CryptoJS.AES.encrypt(wordArray, key).toString(). Finally, the encrypted result is converted into a string (Base64 by default), which is then saved to the IPFS network.
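The encrypt-then-upload step, and its inverse after a file is downloaded by its CID, can be sketched with Node's built-in crypto module. Note that this swaps in a different construction from the project's: crypto-js, when given a string key, treats it as a passphrase, derives the AES key from it and uses CBC mode by default, whereas the sketch below derives a 256-bit key with scrypt and uses AES-256-GCM, which also detects tampering on decryption. The function names and the salt/IV/tag layout of the output are assumptions for illustration; the resulting string is what would be passed to ipfs.add.

```javascript
const crypto = require("node:crypto");

// Derive a 256-bit AES key from a user passphrase (scrypt with a random salt).
function deriveKey(passphrase, salt) {
  return crypto.scryptSync(passphrase, salt, 32);
}

// Encrypt file bytes; output = base64(salt | iv | authTag | ciphertext).
function encryptFile(bytes, passphrase) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = crypto.createCipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(bytes), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([salt, iv, tag, ciphertext]).toString("base64");
}

// Inverse of encryptFile; throws if the data was altered or the passphrase is wrong.
function decryptFile(encoded, passphrase) {
  const data = Buffer.from(encoded, "base64");
  const salt = data.subarray(0, 16);
  const iv = data.subarray(16, 28);
  const tag = data.subarray(28, 44);
  const ciphertext = data.subarray(44);
  const decipher = crypto.createDecipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const original = Buffer.from("sensitive document contents");
const uploaded = encryptFile(original, "correct horse battery staple");
const restored = decryptFile(uploaded, "correct horse battery staple");
console.log(restored.equals(original)); // true
```

Because a fresh salt and IV are generated on every call, encrypting the same file twice gives different ciphertexts and therefore different CIDs on IPFS.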
4.4 License
There are many types of licenses that can be used to protect a project, and they can be divided into two categories. The first category consists of proprietary licenses, which apply to proprietary products and mostly protect a company's rights to its products. The second category consists of licenses used for open-source projects.
There are many licenses for open-source projects; the best known and most widely used include the following:
– GNU Library or ”Lesser” General Public License (LGPL);
– MIT license;
This project is licensed under the MIT license. This license is one of the least restrictive and gives others many possibilities: the source code is open to everyone and free to use, and it can even be incorporated into proprietary software, provided that the copyright and license notice is preserved.
Implementation Conclusions
This chapter explained how the system was implemented as a prototype demonstrating that it is possible to have an independent platform on which people can store sensitive data without fear of losing it or of someone else obtaining it. Most of the technologies used to implement the system were listed, together with the reasons for choosing them.
Solidity is the language of choice for writing Smart Contracts: it was created for contract development and has all the features needed to write a contract. JavaScript has libraries for almost anything, and it was one of the first languages to get tools that facilitate contract development. JavaScript was also used on the client side, since it is the dominant language for writing single-page applications for the browser. Hardhat was the framework of choice because it has a built-in local blockchain that makes testing the Smart Contract easier.
CONCLUSIONS
The internet plays an increasingly important role in people's lives, and a lot of sensitive data is stored on proprietary platforms. This means that users have to entrust their data to the companies behind those platforms. Unfortunately, there have already been cases in which people's data was sold, which can cause serious problems related to the violation of human rights.
This thesis project demonstrates with a functional prototype that it is possible to store sensitive data on an independent platform that ensures the data will not be lost. A system that is decentralized and open-source increases people's trust in it. A decentralized platform is hosted and run collaboratively by those who use it or who want to contribute to the longevity of the project. Over approximately the last 10 years, technologies have appeared that are especially useful for such cases of data protection, equality, and transparency. Among them are blockchain and IPFS (InterPlanetary File System). The blockchain is used for decentralized computation and for keeping immutable data, while IPFS is used for storing large files on a decentralized network.
During the work on this thesis, a prototype was implemented that allows users to upload encrypted files to the IPFS network and to download them. To use it, a user needs a crypto wallet managed through a wallet manager such as MetaMask. The prototype also offers the possibility to check who owns a given piece of data and which data is owned by a given crypto wallet address. Even though it is possible to see which data a user has uploaded to the IPFS network, it is not possible to read that data, because the client application encrypts it before uploading it.
The goal of the project is to demonstrate an idea that works. It is licensed under the MIT license, which means that it is open source and free for everyone. Anyone is allowed to take this prototype and extend it into a real product that protects people's data.
Bibliography
1. 51% attack. [online] [accessed 10.10.2022].
Available: https://en.bitcoinwiki.org/wiki/51%25_attack.
15. Smart Contracts [online] [accessed 11.10.2022].
Available: https://en.wikipedia.org/wiki/Smart_contract.
20. The Complete Open-Source and Business Software Platform [online] [accessed 12.11.2022].
Available: https://sourceforge.net/.
Appendix A
The Smart Contract
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

error NotOwnerError();

contract Ledger {
    address public immutable i_owner;
    address[] public users;
    mapping(address => uint256) public s_donatorToDonatedAmounts;
    mapping(string => address) public s_cidToAddress;
    mapping(address => string[]) public s_addressToOwnedCids;

    event NewCidRegistered(address ownerAddress, string cid);
    event DonationsWithdrawal();
    event NewDonation();

    constructor() {
        i_owner = msg.sender;
    }

    function publishCid(string memory cid) public {
        s_cidToAddress[cid] = msg.sender;
        s_addressToOwnedCids[msg.sender].push(cid);
        users.push(msg.sender);
        emit NewCidRegistered(msg.sender, cid);
    }

    function getPublishedCids() public view returns (string[] memory) {
        return s_addressToOwnedCids[msg.sender];
    }

    function getPublishedCidsByUser(address userAddress)
        public
        view
        returns (string[] memory)
    {
        return s_addressToOwnedCids[userAddress];
    }

    function withdraw() public onlyOwner {
        emit DonationsWithdrawal();
        (bool success, ) = payable(msg.sender).call{
            value: address(this).balance
        }("");
        require(success, "Call failed!");
    }

    // called when the contract receives Ether with empty calldata
    receive() external payable {
        s_donatorToDonatedAmounts[msg.sender] += msg.value;
        emit NewDonation();
    }

    // called when the calldata does not match any function
    fallback() external payable {
        if (msg.value > 0) {
            revert("Fallback");
        }
    }

    modifier onlyOwner() {
        if (msg.sender != i_owner) {
            revert NotOwnerError();
        }
        _;
    }
}