Lab Sheet 06 - Introduction To NoSQL Databases Using MongoDB
MongoDB
The objective of this lab is to introduce some features of non-relational (NoSQL) databases
using MongoDB. MongoDB stores data as JSON-like objects, which it calls documents, and uses
its own query language instead of SQL.
Installation
Option 1:
1. Set up a free cluster on MongoDB Atlas by following the instructions here:
https://docs.atlas.mongodb.com/tutorial/deploy-free-tier-cluster/
2. Install the mongo shell on your machine and connect to your cluster following the
instructions on the dashboard.
Option 2:
1. Download and set up MongoDB for your OS by following these instructions:
https://www.mongodb.com/download-center/community
2. Start the server. Then use the mongo shell to connect to your server:
https://docs.mongodb.com/manual/mongo/
Preparation
The instructions below assume you have MongoDB installed and started on your local
machine. For Atlas, create a database and a user by following the instructions on the
dashboard.
1. Once MongoDB is installed and running, log in to your server as root:
mongo -u root
2. Switch to a new database with the command below; MongoDB creates it when data is first stored in it.
use mongolab
3. Now create a new user to access the database.
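The step above does not give a command, so here is a minimal sketch of one way to do it with db.createUser, run after "use mongolab". The user name is the e14xxx placeholder from the mongoimport command later in this sheet, the password is a placeholder, and the readWrite role is an assumption about the access you need.

```javascript
// Sketch only: user name and password are placeholders, replace them with your own.
var userSpec = {
  user: "e14xxx",                                  // placeholder user name
  pwd: "changeMe",                                 // placeholder password
  roles: [{ role: "readWrite", db: "mongolab" }]   // assumed role: read/write on the lab database
};

// `db` is defined only inside the mongo shell, so guard the call
// to keep this snippet loadable in other JavaScript environments.
if (typeof db !== "undefined") {
  db.createUser(userSpec);
}
```

You can then reconnect as this user instead of root for the rest of the lab.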
Useful references
1. This shows the mapping between SQL and MongoDB concepts and syntax:
https://docs.mongodb.com/manual/reference/sql-comparison/
2. If you need more help there are plenty of introductory tutorials available like this one:
https://www.sitepoint.com/an-introduction-to-mongodb/
3. If you prefer a guided introduction you can go through the material in this free course:
MongoDB Basics | M001
Exercises
We'll use an e-commerce business with 3 main entities whose definitions are given below.
Entity: Customer (represents a person making a purchase)
Example:
    {
        "id": "1",
        "firstName": "Aruna",
        "lastName": "Silva",
        "phoneNumber": "555-555-1212",
        "email": "xyz@abc.io"
    }
Entity: Product (represents an item being sold)
Example:
    {
        "id": "1",
        "name": "Chocolate",
        "price": 2.99,
        "productId": "123abc"
    }
Entity: Transaction (represents the purchase of a number of products by a customer)
Example:
    {
        "id": "1",
        "productId": "1",
        "customerId": "1",
        "payment": "cash",
        "amount": 20.00
    }
To create the customers collection with a custom data validator, enter the following
code.
db.createCollection("customers", {
    validator: {
        $and: [
            {
                "firstName": { $type: "string", $exists: true }
            },
            {
                "lastName": { $type: "string", $exists: true }
            },
            {
                "phoneNumber": {
                    $type: "string",
                    $exists: true,
                    $regex: /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/
                }
            },
            {
                "email": {
                    $type: "string",
                    $exists: true
                }
            }
        ]
    }
})
Note how the data validator specifies the rules for each field using the MongoDB query
syntax (the _id field is added to each document automatically).
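To see the validator in action, the sketch below tries one document that should pass and one that should fail. The two sample documents are illustrative (the first reuses the Customer example above); the phone pattern is copied from the validator so that it can also be checked outside the shell.

```javascript
// The same pattern the validator applies to phoneNumber.
var phonePattern = /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/;

// Illustrative documents: the second one has a phone number without dashes.
var goodCustomer = { firstName: "Aruna", lastName: "Silva", phoneNumber: "555-555-1212", email: "xyz@abc.io" };
var badCustomer = { firstName: "Aruna", lastName: "Silva", phoneNumber: "5555551212", email: "xyz@abc.io" };

var goodPhoneOk = phonePattern.test(goodCustomer.phoneNumber); // matches the pattern
var badPhoneOk = phonePattern.test(badCustomer.phoneNumber);   // does not match the pattern

// `db` is defined only inside the mongo shell.
if (typeof db !== "undefined") {
  db.customers.insertOne(goodCustomer);    // expected to be accepted
  try {
    db.customers.insertOne(badCustomer);   // expected to be rejected by the validator
  } catch (e) {
    print("Rejected: " + e.message);
  }
}
```

The same accept/reject test is a convenient way to check the validators you write for the other collections.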
Now write code to create collections for Product and Transaction implementing the following
validation rules.
1. Product: all fields are required and must have the proper type. Price must be greater
than 0.
2. Transaction: all fields are required and must have the proper type. Payment should
be one of “cash_on_delivery”, “credit_card” or “debit_card”. Amount must be greater
than 0.
For this exercise, first import the data provided in exercise2.json to a collection named
blog using the following command.
mongoimport -u e14xxx -d mongolab -c blog --jsonArray exercise2.json
The data consists of a set of blog post metadata: title, author, date, likes and tags. Let’s say
we want to aggregate the total likes that each author has. We can do this using MongoDB
map-reduce as follows:
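db.blog.mapReduce(
    // map: emit one (author, likes) pair per post
    function() { emit(this.author, this.likes); },
    // reduce: sum all likes emitted for the same author
    function(author, likes) { return Array.sum(likes); },
    // options: write the results to the total_likes collection
    { out: "total_likes" }
)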
To use map-reduce we pass three arguments.
1. The map function, which is applied to each input document. Inside the function, this
refers to the document currently being processed. We call emit to output a key-value
pair.
2. The reduce function, which takes two parameters: a key and an array of values. The key
is used to group records, and the array holds all the values emitted for that key. The
reduce function returns a single value computed by aggregating the array.
3. An options document. Here out specifies the name of the output collection; an optional
query field can be added to filter the input documents.
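The three arguments described above can be traced with a small in-memory simulation in plain JavaScript. This is only a sketch of the idea, not how MongoDB implements map-reduce: the sample posts are hypothetical, the emit function is a stand-in for the one MongoDB provides, and Array.sum (a mongo shell helper) is replaced by a standard reduce call.

```javascript
// Hypothetical sample posts; the real data comes from exercise2.json.
var posts = [
  { title: "A", author: "alice", likes: 3 },
  { title: "B", author: "bob", likes: 5 },
  { title: "C", author: "alice", likes: 4 }
];

// Stand-in for MongoDB's emit: collect each emitted value under its key.
var groups = {};
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(groups, key)) {
    groups[key] = [];
  }
  groups[key].push(value);
}

// Same map function as in the lab command: `this` is the current document.
var mapFn = function () { emit(this.author, this.likes); };

// Same idea as the lab's reduce function, written without the shell-only Array.sum.
var reduceFn = function (author, likes) {
  return likes.reduce(function (a, b) { return a + b; }, 0);
};

// Map phase: run the map function once per document.
posts.forEach(function (doc) { mapFn.call(doc); });

// Reduce phase: one output record per key, shaped like the output collection.
var totalLikes = Object.keys(groups).map(function (key) {
  return { _id: key, value: reduceFn(key, groups[key]) };
});
// totalLikes is [{ _id: "alice", value: 7 }, { _id: "bob", value: 5 }]
```

After running the real command, db.total_likes.find() lists the results in the same _id/value shape. In MongoDB itself the reduce function may be invoked more than once for a key and is skipped for keys with a single value, so it should return a value of the same type as the emitted values.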