MOnocular Depth Estimation and SegmenTation (MODEST) Museum Dataset

Link to Dataset: https://www.kaggle.com/shanwizard/modest-museum-dataset

Directory Structure

The dataset comprises seven parts, which are stored inside the root directory.

bg: (d) Background images
fg: (d) Foreground images
fg_mask: (d) Mask of foreground images
bg_fg: (d) Images where foregrounds are overlayed on top of backgrounds
bg_fg_mask: (d) Mask of overlayed foreground-background images
bg_fg_depth_map: (d) Depth maps of overlayed foreground-background images
file_map.txt: (f) Relations between files across different parts of the dataset
bbox.txt: (f) Bounding box of foreground in each background

*(d): Directory, (f): File

dataset
    ├── bg
    |   ├── bg001.jpeg
    |   ├── bg002.jpeg
    |   ├── ...
    |   ├── bg100.jpeg
    ├── fg
    |   ├── fg001.png
    |   ├── fg002.png
    |   ├── ...
    |   └── fg100.png
    ├── fg_mask
    |   ├── fg001_mask.png
    |   ├── fg002_mask.png
    |   ├── ...
    |   └── fg100_mask.png
    ├── bg_fg
    |   ├── bg001_fg001_01.jpeg
    |   ├── bg001_fg001_02.jpeg
    |   ├── ...
    |   └── bg100_fg100_40.jpeg
    ├── bg_fg_mask
    |   ├── bg001_fg001_01_mask.jpeg
    |   ├── bg001_fg001_02_mask.jpeg
    |   ├── ...
    |   └── bg100_fg100_40_mask.jpeg
    ├── bg_fg_depth_map
    |   ├── bg001_fg001_01_depth_map.jpeg
    |   ├── bg001_fg001_02_depth_map.jpeg
    |   ├── ...
    |   └── bg100_fg100_40_depth_map.jpeg
    ├── file_map.txt
    └── bbox.txt

Content Description

bg

This directory contains images of museum interiors. These are called background images.

Image Size: 224x224x3
Number of Images: 100
Naming Convention: bg001.jpeg, bg002.jpeg, ..., bg100.jpeg

fg

This directory contains images of humans with transparent background. These are called foreground images.

Image Height: 108 (width will depend upon the aspect ratio of each image)
Number of Channels: 4
Number of Images: 100
Naming Convention: fg001.png, fg002.png, ..., fg100.png

fg_mask

This directory contains mask of foregrounds. These are called foreground masks.

Image Height: 108
Number of Channels: 1
Number of Images: 100
Naming Convention: fg001_mask.png, fg002_mask.png, ..., fg100_mask.png
- Example: The mask fg001_mask.png corresponds to the foreground fg001.png

bg_fg

This directory contains images where foregrounds are overlayed on backgrounds. These are called background-foreground images.

Image Size: 224x224x3
Number of Images: 400,000
Naming Convention: bg001_fg001_01.jpeg, bg001_fg001_02.jpeg, ..., bg100_fg100_40.jpeg
- Example: The images ranging from bg001_fg001_01.jpeg to bg001_fg001_40.jpeg correspond to all the variations of the foreground fg001.png overlayed on background bg001.jpeg

bg_fg_mask

This directory contains mask of background-foreground images. These are called background-foreground masks.

Image Size: 224x224x1
Number of Images: 400,000
Naming Convention: bg001_fg001_01_mask.jpeg, bg001_fg001_02_mask.jpeg, ..., bg100_fg100_40_mask.jpeg
- Example: The mask bg001_fg001_01_mask.jpeg corresponds to the background-foreground bg001_fg001_01.jpeg

bg_fg_depth_map

This directory contains depth map of background-foreground images. These are called background-foreground depth maps.

Image Size: 224x224x1
Number of Images: 400,000
Naming Convention: bg001_fg001_01_depth_map.jpeg, bg001_fg001_02_depth_map.jpeg, ..., bg100_fg100_40_depth_map.jpeg
- Example: The depth map bg001_fg001_01_depth_map.jpeg corresponds to the background-foreground bg001_fg001_01.jpeg

file_map.txt

This file contains the mapping between images across different parts of the dataset.

Each line in the file contains 4 entries separated by \t where entries are ordered as background, background-foreground, background-foreground mask, background-foreground depth map.
Example: The line
bg029 bg029_fg010_18 bg029_fg010_18_mask bg029_fg010_18_depth_map
would mean
- Background: bg029
- Background-Foreground: bg029_fg010_18
- Background-Foreground Mask: bg029_fg010_18_mask
- Background-Foreground Depth Map: bg029_fg010_18_depth_map

To see an overview of the contents of the file, go here.

bbox.txt

This file contains the bounding box (bbox) top-left coordinate and bbox size of the foreground in each background image.

Each line in the file contains 5 entries separated by \t where entries are ordered as background-foreground, bbox top left x-coordinate, bbox top left y-coordinate, bbox width, bbox-height.
Example: The line
bg050_fg084_17.jpeg 90 70 39 108
would mean
- Background-Foreground: bg029_fg010_18
- bbox top left x-coordinate: 90
- bbox top left y-coordinate: 70
- bbox top left width: 39
- bbox top left height: 108

To see an overview of the contents of the file, go here.

Statistics

Dataset Size: 6.6 GB
Number of Text Files: 2
Number of Images: 1,200,100
Image Types and their Statistics
- Backgrounds
  - Image Size: 224x224x3
  - Number of Images: 100
  - Mean: (0.40086, 0.46599, 0.53281)
  - Standard Deviation: (0.25451, 0.24249, 0.23615)
- Background-Foregrounds
  - Image Size: 224x224x3
  - Number of Images: 400,000
  - Mean: (0.41221, 0.47368, 0.53431)
  - Standard Deviation: (0.25699, 0.24577, 0.24217)
- Background-Foreground Masks
  - Image Size: 224x224x1
  - Number of Images: 400,000
  - Mean: 0.05207
  - Standard Deviation: 0.21686
- Background-Foreground Depth Maps
  - Image Size: 224x224x1
  - Number of Images: 400,000
  - Mean: 0.2981
  - Standard Deviation: 0.11561

Dataset Creation

The dataset was created as follows

Download background and foreground images
Resizing backgrounds
Background removal from foregrounds
Mask creation for foregrounds
Resizing foregrounds
Overlaying foregrounds on backgrounds
Depth map
Resizing the entire dataset
Dump file relations
Dump bounding box data

Download background and foreground images

Create the directories bg and fg inside the root directory.
Search and download 100 images from the web showing museums from the inside. Crop these images in an aspect ratio of 1:1 (square). Keep these images in the bg directory.
Search and download 100 images from the web containing humans. (It is preferable to download images with a solid background. This will make the background removal from these images easier). Keep these images in the fg directory.

Resizing backgrounds

All the background images are resized into a fixed size of 704x704. This is done for the following reasons

If the size of backgrounds is fixed, then the foregrounds which will be overlayed on top of the backgrounds can be of fixed size as well. This will eliminate the need for specifying different foreground size for each background.
The DenseDepth model (which will be used later) requires all the input images to be of same dimension.

Reason for choosing initial background size of 704x704:
This shape was chosen because during our trials we found that this was the minimum shape at which DenseDepth gave very accurate depth maps.
Later, after all the images and depth maps are created, all the images will be resized to a size of 224x224.

Sample background images

Background removal from foregrounds

All the downloaded foreground images should have a transparent background in order to overlay them on top of background images.

For removing backgrounds, the open-source software GIMP - GNU Image Manipulation Program can be used. Steps for removing background using GIMP has been shown below:

Images are exported as PNG because it allows the image to have a transparent background.

Sample foreground images

Mask creation for foregrounds

Concept of alpha channel

There is an alpha channel in the in foreground images which specifies the opacity for a color. After adding transparent backgrounds to the images in GIMP, the alpha parameter ranges from 0 (fully transparent) to 255 (fully opaque).
The alpha channel in foreground images has pixel value set to 0 wherever transparency is present.
After adding transparency to images in GIMP, the background color of the image is set to white (i.e. pixels values in RGB channel are equal to 255) which is hidden with the help of the alpha channel.

Creating mask

The pixels in the foreground image are set to 255 (white) where the object is present and rest of the pixels (background) are set to 0 (black).
The non-zero values in the alpha channel are set to 255, this ensures full opaqueness of the object mask in the image.

Sample foreground masks

Resizing foregrounds

All the foreground images are resized to have a fixed height of 340.
Since the background image is of fixed dimension 704x704, the foreground that has to be overlayed on top of it should be in proportion. Also in order to maintain the aspect ratio of the foreground, we find the corresponding width with respect to a height of 340 and then resize it to (340, new_width).

Reason for choosing foreground height as 340:
After many attempts of overlaying different foregrounds on different backgrounds, we found 340 to be the most optimum height for the foregrounds with respect to the background size of 704x704.

Overlaying foregrounds on backgrounds

Foregrounds are overlayed on backgrounds to create background-foreground images. A corresponding mask of background-foreground image will also be created.

Creating a single image and its mask

Find a random location (x,y) on the background image. Make sure that x ranges between [0, background_height-foreground_height] and y ranges between [0, background_width-foreground_width]. This ensures that foreground is always completely inside the background.
Place the foreground image on top of the background image with (x,y) as the top left corner. This will be the background-foreground image.
Calculate the bounding box values for this image
- Current image size is 704x704 and the final image size will be 224x224, so the bounding box values will be calculated with respect to the final image size
- Calculate the scale ratio = 224 / 704
- Multiply the x, y, foreground width and height values with the scale ratio to obtain the bounding box values
Place the mask of the foreground on a black image that has same shape as that of the background at (x,y) as the top left corner. This will be the mask for the background-foreground image.
Since the mask is in grayscale, the number of channels in these images is reduced to 1 to save storage space.

Dataset Creation

Now for each background-foreground pair:

Repeat the steps above for 20 different locations.
Flip the foreground and its mask horizontally and again repeat the steps above for 20 different locations.
Store all the bounding box values for each image in a list and dump them in a file.

Images created in this step

Number of foreground images = 100
Number of background images = 100
Overlaying foreground on 20 different locations: 100x100x20 = 200,000
Overlaying horizontally flipped foreground on 20 different locations: 100x100x20 = 200,000
Total overlayed images = 400,000

Each of background-foreground, background-foreground mask and background-foreground depth map will have 400,000 images.

Thus, total number of images in the dataset is 1,200,000.

Sample background-foreground images

Sample background-foreground masks

Depth map

To create the monocular depth estimation map of the background-foreground images, we use pretrained DenseNet-201. Implementation for the model inference was referenced from this repository.

Since the depth maps are in grayscale, the number of channels for these images is reduced to 1 to save storage space.

Note: Since we don't have a DepthCam, we rely on a pretrained DenseNet-201 model to generate depth maps.

Sample background-foreground depth maps

Final resize of the dataset

Since the dimension 704x704 is quite big and it will take huge amount of time to train so we resize the entire dataset which consists of background images, background-foreground images, background-foreground mask and background-foreground depth maps to 224x224 which is big enough to get a good accuracy as well as train faster.

The background images have been resized by a factor of 3.1428 (704 / 224), so the foreground images would have to be resized as well. The new size of foreground images becomes 108 (340 / 3.1428).

Dump file relations

The relationships between various parts of the dataset are dumped in a file named file_map.txt. The contents of the file are as follows:

bg001 bg001_fg001_01  bg001_fg001_01_mask bg001_fg001_01_depth_map
bg001 bg001_fg001_02  bg001_fg001_02_mask bg001_fg001_02_depth_map
...
bg029 bg029_fg010_18  bg029_fg010_18_mask bg029_fg010_18_depth_map
...
bg100 bg100_fg100_40  bg100_fg100_40_mask bg100_fg100_40_depth_map

Dump bounding box data

The bounding box data for each background-foreground image is dumped in a file named bbox.txt. The contents of the file are as follows:

bg001_fg001_01.jpeg     123     113     35      108
bg001_fg001_02.jpeg     101     81      35      108
...
bg050_fg084_17.jpeg     90      70      39      108
...

The steps for calculation of bounding box for each image is described in this section.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
_layouts		_layouts
densedepth		densedepth
images		images
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
MODEST_Museum_Dataset_Creation.ipynb		MODEST_Museum_Dataset_Creation.ipynb
MODEST_Museum_Dataset_Statistics.ipynb		MODEST_Museum_Dataset_Statistics.ipynb
README.md		README.md
_config.yml		_config.yml

License

shan18/MODEST-Museum-Dataset

Folders and files

Latest commit

History

Repository files navigation

MOnocular Depth Estimation and SegmenTation (MODEST) Museum Dataset

Directory Structure

Content Description

bg

fg

fg_mask

bg_fg

bg_fg_mask

bg_fg_depth_map

file_map.txt

bbox.txt

Statistics

Dataset Creation

Download background and foreground images

Resizing backgrounds

Sample background images

Background removal from foregrounds

Sample foreground images

Mask creation for foregrounds

Concept of alpha channel

Creating mask

Sample foreground masks

Resizing foregrounds

Overlaying foregrounds on backgrounds

Creating a single image and its mask

Dataset Creation

Images created in this step

Sample background-foreground images

Sample background-foreground masks

Depth map

Sample background-foreground depth maps

Final resize of the dataset

Dump file relations

Dump bounding box data

Contributors

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages