| Name | Student Number | Email Address |
|---|---|---|
| Swapnil Patel | 99728870 | Swap.patel@mail.utoronto.ca |
| Hanxiao Chang | 1006341709 | Hanxiao.chang@mail.utoronto.ca |
| Mohammad Hooman Keshvari | 1011293869 | Hooman.keshvari@mail.utoronto.ca |
- Swapnil Patel: Chunk server, cluster management, documentation
- Hanxiao Chang: Client utils, Heartbeat, Interfaces, Demo
- Mohammad Hooman Keshvari: Master Node Logic and Implementation, Benchmarking
Our Distributed File System (DFS) allows multiple computers to share a common file system, making data accessible and manageable across a network of interconnected machines. It provides a way to store, access, and manage files across various servers or nodes in a distributed manner. The main objectives of a DFS include:
1. Centralized Management: Despite data being distributed, the DFS offers a unified view of files, allowing users and applications to interact with them as if they were on a local machine.
2. Scalability: It can scale easily by adding more servers or nodes, improving performance and fault tolerance.
3. Redundancy and Fault Tolerance: Files can be replicated across multiple servers to enhance reliability and availability. In case of a server failure, other copies can be accessed.
4. Efficiency: It optimizes data storage, access speed, and resource utilization by distributing files across multiple nodes.
DFS is commonly used in cloud storage, big data processing, and content delivery networks, where access to large volumes of data from multiple locations is essential.
The project focuses on designing a distributed file storage system to deliver reliable, efficient, and secure data services, essential for modern data-driven applications. Inspired by the large-scale data management needs of internet corporations and the rapid growth of big data and AI technologies, the system addresses the limitations of traditional databases while aligning with contemporary trends.
- Reliability: The system ensures high availability through data replication across multiple nodes, enabling quick fault recovery and efficient load balancing, critical for mission-critical applications.
- Flexibility: Leveraging a master-slave architecture, the system supports horizontal scalability and diverse data formats (structured, semi-structured, and unstructured), making it adaptable to a wide range of applications.
- Efficiency: Optimized data I/O operations minimize latencies and balance network workloads, meeting the needs of resource-constrained applications.
- Security: Distributed architecture reduces risks of breaches while advanced encryption and access control mechanisms further safeguard data integrity and reliability.
- Big Data and Machine Learning: The system supports large-scale data analysis and incorporates machine learning for fault tolerance, data recovery, and anomaly detection, enhancing system resilience and security.
- Importance of Rust: Unlike traditional implementations in Java or C++ (e.g., HDFS, GFS), Rust provides superior performance, memory safety, and thread safety. Its ownership model, async runtime support, and efficient concurrency management make it particularly well suited to distributed systems.
By addressing the challenges of data consistency and development complexity, the proposed system aims to fill a critical gap in Rust’s ecosystem and contribute to advancing distributed storage technologies.
Our DFS is inspired by the Google File System (GFS) architecture, which is a distributed file system designed to store and manage large volumes of data across multiple servers. The GFS architecture consists of three main components:
1. Master Server: The master server is responsible for managing the metadata of the file system, such as file locations, access permissions, and replication policies. It keeps track of the file system's state and coordinates operations across multiple servers.
2. Chunk Servers: The chunk servers store the actual data in the form of fixed-size chunks. Each chunk is replicated across multiple chunk servers to ensure data availability and reliability. The chunk servers are responsible for storing, replicating, and serving data to clients. (in progress)
3. Client: The client parses user input to correspondingly perform file and directory operations including reading, writing, updating, and deleting files. The client communicates with the master server for chunk information, and then interacts with the chunk servers with binary data.
The code structure for the above binaries is shown below:
```
├── launch_dfs.sh
├── Metadata_Example
├── PROPOSAL.md
├── README.md
└── src
    ├── chunk
    │   ├── chunk_manager.rs
    │   ├── chunk.rs
    │   └── heartbeat_manager.rs
    ├── client
    │   ├── chunk_client.rs
    │   ├── main.rs
    │   └── master_client.rs
    ├── lib.rs
    ├── master
    │   ├── chunk_manager.rs
    │   ├── heartbeat_manager.rs
    │   ├── log_manager.rs
    │   ├── main.rs
    │   ├── namespace_manager.rs
    │   └── safe_map.rs
    └── shared
        ├── log_manager.rs
        ├── master_chunk_utils.rs
        ├── master_client_utils.rs
        └── mod.rs
```

Building the project will generate three binaries:
- master: The master server that manages the metadata of the file system.
- chunk_server: The chunk server that stores and serves data chunks.
- client: The client application that interacts with the master server and chunk servers to perform file operations.
Currently, the client lets users interact with the DFS through a command-line interface. As future work, the client could be wrapped in a GUI to provide a more user-friendly experience.
The core features of the DFS include:
1. File Operations: The DFS supports basic file operations such as creating, reading, writing, and deleting files. Clients can interact with the file system using a REST API.
2. Chunk Management: The DFS stores data in fixed-size chunks and replicates them across multiple chunk servers to ensure data availability and reliability.
3. Fault Tolerance: The DFS is designed to handle server failures gracefully by replicating data across multiple servers and maintaining multiple copies of each chunk. (in progress)
4. Scalability: The DFS can scale horizontally by adding more chunk servers to store additional data and improve performance.
Build the project in release mode:

```
cargo build --release
```

The DFS can be launched on a local machine for testing and development purposes. The launch_dfs.sh script starts the master server and the specified number of chunk servers:

```
./launch_dfs.sh <number_of_nodes>
```

Once the cluster is up and running, you can interact with the DFS using the client application.
The client uses a command-line interface. Below is the general syntax:

```
./client --username <USERNAME> --password <PASSWORD> --target <TARGET> --action <ACTION> [--local-path <LOCAL_PATH>] [--remote-path <REMOTE_PATH>]
```
| Parameter | Description |
|---|---|
| `--username` | Username for authentication. |
| `--password` | Password for authentication. |
| `--target` | Target type for the operation: file or directory (short forms: f, d). |
| `--action` | Action to perform: create, read, update, or delete (short forms: c, r, u, d). |
| `--local-path` | Path to a local file or directory (used for create or update operations). |
| `--remote-path` | Path to a remote file or directory on the Master Server. |
Every command requires valid user credentials (--username and --password). The client will authenticate the user with the Master Server before executing any operation.
- Description: Uploads a local file to the Master Server and distributes its chunks to Chunk Servers.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target file --action create --local-path <LOCAL_PATH> --remote-path <REMOTE_PATH>
  ```

- Example:

  ```
  ./client --username alice --password secure123 --target file --action create --local-path ./example.txt --remote-path /remote/example.txt
  ```
- Description: Downloads a file from the Chunk Servers and saves it locally.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target file --action read --local-path <LOCAL_PATH> --remote-path <REMOTE_PATH>
  ```

- Example:

  ```
  ./client --username alice --password secure123 --target file --action read --local-path ./downloaded.txt --remote-path /remote/example.txt
  ```
- Description: Replaces the contents of a remote file with a local file.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target file --action update --local-path <LOCAL_PATH> --remote-path <REMOTE_PATH>
  ```

- Example:

  ```
  ./client --username alice --password secure123 --target file --action update --local-path ./updated.txt --remote-path /remote/example.txt
  ```
- Description: Deletes a remote file.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target file --action delete --remote-path <REMOTE_PATH>
  ```
- Description: Creates a new directory on the Master Server.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target directory --action create --remote-path <REMOTE_PATH>
  ```

- Example:

  ```
  ./client --username alice --password secure123 --target directory --action create --remote-path /remote/new_directory
  ```
- Description: Lists the contents of a remote directory.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target directory --action read --remote-path <REMOTE_PATH>
  ```

- Example:

  ```
  ./client --username alice --password secure123 --target directory --action read --remote-path /remote/new_directory
  ```
- Description: Deletes a remote directory.
- Command:

  ```
  ./client --username <USERNAME> --password <PASSWORD> --target directory --action delete --remote-path <REMOTE_PATH>
  ```
The DFS provides a REST API for clients to interact with the system. The API allows clients to perform various operations such as uploading, downloading, deleting, and listing files. The API is implemented using the Rocket framework, which is a lightweight, high-performance web framework for Rust.
- Method: POST
- Description: Registers a new user by adding their credentials to the system. If the user already exists, an error response is returned.
- Parameters:
  - `user`: A JSON object containing the new user's details.
    - `username`: A string representing the username of the new user.
    - `password`: A string representing the password for the new user.
- Request Example:

  ```
  curl -X POST "http://<base_url>/user/register" -H "Content-Type: application/json" -d '{"username": "exampleuser", "password": "securepassword"}'
  ```

- Success Response:
  - Code: 200
- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 409 Conflict: If the user already exists in the system.
- Method: POST
- Description: Authenticates a user by checking the provided credentials against the stored user data. If the user is not found or the password does not match, an error response is returned.
- Parameters:
  - `user`: A JSON object containing the user's credentials.
    - `username`: A string representing the username of the user attempting to log in.
    - `password`: A string representing the password of the user attempting to log in.
- Request Example:

  ```
  curl -X POST "http://<base_url>/user/login" -H "Content-Type: application/json" -d '{"username": "exampleuser", "password": "securepassword"}'
  ```

- Success Response:
  - Code: 200
  - Content: A JSON object containing the user's authentication token.
- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 401 Unauthorized: If the user is not found or the password does not match.
- Method: POST
- Description: Creates a new file at the specified path in the system. If the file already exists or an error occurs during creation, an appropriate error response is returned.
- Parameters:
  - `path`: A string representing the path where the new file should be created.
- Request Example:

  ```
  curl -X POST "http://<base_url>/file/create?path=/path/to/file" -H "Content-Type: application/json"
  ```
- Method: GET
- Description: Reads a specific chunk of a file from the system. The function retrieves the chunk information based on the file path.
- Parameters:
  - `path`: A string representing the path to the file.
- Request Example:

  ```
  curl -X GET "http://<base_url>/file/read?path=/path/to/file"
  ```
- Method: GET
- Description: Reads all chunks of a file from the system. The function retrieves the chunk information based on the file path.
- Parameters:
  - `path`: A string representing the path to the file.
- Request Example:

  ```
  curl -X GET "http://<base_url>/file/read/all?path=/path/to/file"
  ```
- Method: POST
- Description: Updates an existing file in the system by writing new data chunks. This function is used to modify the file's content or append new data. It returns the updated list of data chunks.
- Parameters:
  - `path`: A string representing the path to the file.
  - `size`: An integer indicating the size of the file after the update.
- Request Example:

  ```
  curl -X POST "http://<base_url>/file/update?path=/path/to/file&size=1024"
  ```
- Method: GET
- Description: Deletes a file from the system at the specified path. This action removes the file and all its associated data permanently.
- Parameters:
  - `path`: A string representing the path to the file that needs to be deleted.
- Request Example:

  ```
  curl -X GET "http://<base_url>/file/delete?path=/path/to/file"
  ```
- Method: POST
- Description: Creates a new directory at the specified path in the file system. If a directory with the given path already exists, an error response will be returned.
- Parameters:
  - `path`: A string representing the path where the new directory should be created.
- Request Example:

  ```
  curl -X POST "http://<base_url>/dir/create?path=/path/to/directory"
  ```
- Method: GET
- Description: Reads and returns the list of directories and files at the specified path in the file system.
- Parameters:
  - `path`: A string representing the path of the directory to read.
- Request Example:

  ```
  curl -X GET "http://<base_url>/dir/read?path=/path/to/directory"
  ```
- Method: POST
- Description: Deletes a directory at the specified path from the file system.
- Parameters:
  - `path`: A string representing the path of the directory to delete.
- Request Example:

  ```
  curl -X POST "http://<base_url>/dir/delete?path=/path/to/directory"
  ```
- Description: Adds a chunk to the chunk manager. This endpoint expects a POST request with binary data as the body of the request and allows specifying a UUID to associate with the chunk.
- Parameters:
  - `chunk_id`: The UUID of the chunk to be added.
  - `data`: The binary data of the chunk.
- Example Request:

  ```
  curl -X POST "http://127.0.0.1:8100/add_chunk?id=<UUID>" \
    -H "Content-Type: application/octet-stream" \
    --data-binary @example.bin
  ```

- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 413 Payload Too Large: If the chunk size exceeds the maximum allowed size.
- Description: Updates a chunk in the chunk manager. This endpoint expects a POST request with binary data as the body of the request and allows specifying a UUID to associate with the chunk.
- Parameters:
  - `chunk_id`: The UUID of the chunk to be updated.
  - `data`: The binary data of the chunk.
- Example Request:

  ```
  curl -X POST "http://127.0.0.1:8100/update_chunk?id=<UUID>" \
    -H "Content-Type: application/octet-stream" \
    --data-binary @example.bin
  ```

- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 413 Payload Too Large: If the chunk size exceeds the maximum allowed size.
- Description: Retrieves a chunk from the chunk manager. This endpoint expects a GET request with the UUID of the chunk to be retrieved.
- Parameters:
  - `chunk_id`: The UUID of the chunk to be retrieved.
- Example Request:

  ```
  curl -X GET "http://127.0.0.1:8100/get_chunk?id=<UUID>" --output chunk_output.bin
  ```

- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 404 Not Found: If the chunk with the specified UUID does not exist.
- Description: Retrieves a list of all chunks stored in the chunk manager. This endpoint expects a GET request without any parameters.
- Example Request:

  ```
  curl -X GET "http://127.0.0.1:8100/get_chunk_list"
  ```

- Success Response:
  - Code: 200
  - Content: A JSON array containing the UUIDs of all chunks stored in the chunk manager, for example:

    ```
    ["UUID1", "UUID2", "UUID3"]
    ```

- Error Responses:
  - 400 Bad Request: If the request is malformed or contains invalid parameters.
- Description: Deletes a chunk from the chunk manager. This endpoint expects a DELETE request with the UUID of the chunk to be deleted.
- Parameters:
  - `chunk_id`: The UUID of the chunk to be deleted.
- Example Request:

  ```
  curl -X DELETE "http://127.0.0.1:8100/delete_chunk?id=<UUID>"
  ```

- Error Responses:
  - 400 Bad Request: If the request is malformed or missing required parameters.
  - 404 Not Found: If the chunk with the specified UUID does not exist.
The project includes a comprehensive benchmarking tool for measuring and visualizing the performance characteristics of the DFS. This tool helps in understanding the system's behavior under various concurrent load conditions and can be used to optimize system configuration.
- Concurrent performance testing with configurable thread counts
- Multiple test repetitions for statistical significance
- Automated test file generation and cleanup
- Performance visualization through graphs
- Detailed CSV outputs with raw data and statistical summaries
- Support for both read and write operations testing
- Python 3.8+
- Required Python packages:

  ```
  pip install pandas matplotlib
  ```
Run the benchmark with default settings:

```
python dfs_benchmark.py \
  --username your_username \
  --password your_password \
  --client-path /path/to/dfs/client
```

Configure the benchmark with additional parameters:

```
python dfs_benchmark.py \
  --username your_username \
  --password your_password \
  --client-path /path/to/dfs/client \
  --max-threads 16 \
  --file-size 1 \
  --duration 30 \
  --repeats 3 \
  --output custom_benchmark_name
```

| Parameter | Description | Default |
|---|---|---|
| `--username` | Username for DFS authentication | Required |
| `--password` | Password for DFS authentication | Required |
| `--client-path` | Path to DFS client binary | Required |
| `--max-threads` | Maximum number of concurrent threads to test | 16 |
| `--file-size` | Size of test files in MB | 1 |
| `--duration` | Duration of each test in seconds | 30 |
| `--repeats` | Number of times to repeat each test | 3 |
| `--output` | Prefix for output files | dfs_benchmark |
The benchmark generates several output files:

- Raw Data: `{prefix}_raw_data.csv` contains all individual test results, with columns: threads, operation, repeat, throughput_mbps, latency_ms, success_rate, total_operations.
- Summary Statistics: `{prefix}_summary.csv` is a statistical summary of all tests, including mean, standard deviation, min, and max values, grouped by thread count and operation type.
- Performance Graphs:
  - `{prefix}_throughput_mbps.png`: Throughput over thread counts
  - `{prefix}_latency_ms.png`: Latency over thread counts
  - `{prefix}_success_rate.png`: Success rate over thread counts
  - `{prefix}_total_operations.png`: Total operations over thread counts
- Throughput: measured in MB/s; higher values indicate better performance. Look for the point where throughput plateaus to identify the optimal thread count.
- Latency: measured in milliseconds (ms); lower values indicate better performance. Watch for sharp increases, which might indicate system saturation.
- Success Rate: measured as a percentage (%); higher values indicate better reliability. Sharp drops might indicate system overload.
- Total Operations: the raw count of operations completed; higher values indicate better performance and should scale with thread count until system saturation.
- "File already exists" error: the tool automatically creates unique file names for each operation; check for leftover test files from previous runs.
- Missing columns in output: verify that the DFS client is outputting metrics in the expected format, and check the debug output for parsing errors.
- Connection errors: ensure the DFS cluster is running and accessible, and verify that the client path and credentials are correct.
The graphs show the results for our filesystem with 1, 2, and 4 concurrent threads in read and write mode. As the graphs indicate, the filesystem scales for both read and write operations.
The project's progress has exceeded our expectations. It successfully demonstrated the feasibility and significant potential of building a distributed file system in an emerging programming language like Rust. The system's architecture, inspired by the Google File System, combines centralized metadata management with distributed file storage, delivering a scalable and robust solution. Looking ahead, several aspects require improvement:
- System fault-tolerance: Further persistent storage is needed across all three components to ensure safe recovery from system failures.
- Data transmission: Despite the convenience of REST APIs, JSON serialization is less efficient than alternatives such as gRPC, which currently lacks official Rust support.
- Centralization: The DFS currently heavily relies on the client to relay communication between the master and chunk servers. The master would have better control of the system by instructing the chunk servers directly.
1. Ghemawat, S., Gobioff, H., & Leung, S. T. (2003). The Google File System. In ACM SIGOPS Operating Systems Review (Vol. 37, No. 5, pp. 29-43).
2. Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6 (pp. 137-150).
3. Rust official documentation: https://doc.rust-lang.org/
4. Rocket framework for Rust: https://rocket.rs/



