Abstract
Mining existing image datasets with rich information can help advance knowledge across domains in the humanities and social sciences. In the past, extracting this information was often prohibitively expensive and labor-intensive. AI can provide an alternative, making it possible to speed up the labeling and mining of large and specialized datasets via active learning (AL), a human-in-the-loop method. Although AL methods are helpful in certain scenarios, they have limitations when the set of classes is not known before labeling (i.e., an open-ended set) and the distribution of objects across classes is highly unbalanced (i.e., a long-tailed distribution). To address these limitations in object detection scenarios, we propose a multi-step approach consisting of 1) object detection of a generic “object” class, and 2) image classification with an open class set and a long-tailed distribution. We apply our approach to recognizing stamps in a large compendium of historical documents from the Japanese company Mitsui Mi’ike Mine, one of the largest business archives in modern Japan, which spans half a century, includes tens of thousands of documents, and has been widely used by labor historians, business historians, and others. To test our approach, we produce and make publicly available the novel, expert-curated MiikeMineStamps dataset. This dataset consists of 5056 images of 405 different Japanese stamps and is, to the best of our knowledge, the first published dataset of historical Japanese stamps. We hope that the MiikeMineStamps dataset will become a useful tool to further explore the application of AI methods to the study of historical documents in Japan and throughout the world of Chinese characters, as well as serve as a benchmark for image classification algorithms with an open-ended and highly unbalanced class set.
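The second step described above, classification with an open class set under human review, can be illustrated with a minimal sketch of one labeling round. This is not the authors' implementation: it assumes a generic least-confidence selection rule, and the function names, crop identifiers, class names, and counts are all hypothetical.

```python
from collections import Counter


def select_for_labeling(predictions, budget):
    """Pick the `budget` least-confident crops for human review.

    `predictions` maps a crop id to (predicted_class, confidence).
    Least-confidence sampling is an assumed, generic selection rule.
    """
    ranked = sorted(predictions, key=lambda crop_id: predictions[crop_id][1])
    return ranked[:budget]


def apply_human_labels(class_counts, human_labels):
    """Merge expert labels into the running per-class counts.

    Returns the set of class names that were unknown before this round,
    which is how an open-ended class set grows during labeling.
    """
    new_classes = set()
    for label in human_labels.values():
        if label not in class_counts:
            new_classes.add(label)
        class_counts[label] += 1
    return new_classes


# Hypothetical classifier outputs for four detected stamp crops.
predictions = {
    "crop_a": ("stamp_01", 0.97),
    "crop_b": ("stamp_01", 0.41),
    "crop_c": ("stamp_02", 0.88),
    "crop_d": ("stamp_02", 0.35),
}
# Hypothetical long-tailed counts: one frequent class, one rare class.
class_counts = Counter({"stamp_01": 120, "stamp_02": 3})

to_review = select_for_labeling(predictions, budget=2)
# Hypothetical expert answers; "stamp_03" has not been seen before.
human_labels = {"crop_d": "stamp_03", "crop_b": "stamp_01"}
new_classes = apply_human_labels(class_counts, human_labels)
```

In this sketch the expert is free to assign a label outside the current class list, so the class set and its (typically unbalanced) counts are updated after every round rather than fixed in advance.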
References
Aghdam, H.H., González-García, A., van de Weijer, J., López, A.M.: Active learning for deep detection neural networks. In: ICCV, pp. 3671–3679 (2019)
Beluch, W.H., Genewein, T., Nurnberger, A., Kohler, J.M.: The power of ensembles for active learning in image classification. In: CVPR, pp. 9368–9377 (2018). https://doi.org/10.1109/CVPR.2018.00976
Buitrago, P.A., Nystrom, N.A.: Neocortex and bridges-2: a high performance AI+HPC ecosystem for science, discovery, and societal good. In: Nesmachnow, S., Castro, H., Tchernykh, A. (eds.) High Performance Computing, pp. 205–219. Springer International Publishing, Cham (2021)
Clanuwat, T., Lamb, A., Kitamoto, A.: KuroNet: pre-modern Japanese Kuzushiji character recognition with deep learning. In: ICDAR, pp. 607–614 (2019)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Precup, D., Teh, Y.W. (eds.) ICML, vol. 70, pp. 1126–1135 (2017)
Gal, Y., Islam, R., Ghahramani, Z.: Deep Bayesian active learning with image data. ICML 70, 1183–1192 (2017)
Geifman, Y., El-Yaniv, R.: Deep active learning over the long tail (2017)
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. CalTech Report, March 2007
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Kao, C.-C., Lee, T.-Y., Sen, P., Liu, M.-Y.: Localization-aware active learning for object detection. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 506–522. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_32
Krishna, R., et al.: The visual genome dataset v1.0 + v1.2 images. https://visualgenome.org/
Krishnamurthy, A., Agarwal, A., Huang, T.K., Daumé, H., III, Langford, J.: Active learning for cost-sensitive classification. JMLR 20(65), 1–50 (2019)
Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-100 (Canadian Institute for Advanced Research)
Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2999–3007 (2017)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)
Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., Yu, S.X.: Large-scale long-tailed recognition in an open world. In: CVPR (2019)
Nystrom, N.A., Levine, M.J., Roskies, R.Z., Scott, J.R.: Bridges: a uniquely flexible HPC resource for new communities and data analytics. In: XSEDE 2015: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (2015). https://doi.org/10.1145/2792745.2792775
Qu, Z., Du, J., Cao, Y., Guan, Q., Zhao, P.: Deep active learning for remote sensing object detection (2020)
Roy, S., Unmesh, A., Namboodiri, V.: Deep active learning for object detection. In: BMVC (2019)
Russell, B., Torralba, A., Murphy, K., Freeman, W.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77, 157–173 (2008)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. CoRR abs/1503.03832 (2015)
Sener, O., Savarese, S.: Active learning for convolutional neural networks: a core-set approach. In: ICLR (2018)
Sinha, S., Ebrahimi, S., Darrell, T.: Variational adversarial active learning. In: ICCV, pp. 5971–5980 (2019). https://doi.org/10.1109/ICCV.2019.00607
Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. NIPS 30, 4077–4087 (2017)
Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: CVPR, pp. 1701–1708 (2014). https://doi.org/10.1109/CVPR.2014.220
Toropov, E., Buitrago, P.A., Moura, J.M.F.: Shuffler: A large scale data management tool for machine learning in computer vision. In: PEARC (2019)
Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G.D., Roskies, R., Scott, J., Wilkins-Diehr, N.: XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16(05), 62–74 (2014). https://doi.org/10.1109/MCSE.2014.80
Villalonga, G., Lopez, A.M.: Co-training for on-board deep object detection (2020)
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) NIPS, vol. 29, pp. 3630–3638 (2016)
Wang, K., Zhang, D., Li, Y., Zhang, R., Lin, L.: Cost-effective active learning for deep image classification. IEEE Trans. Circ. Syst. Video Technol. 27(12), 2591–2600 (2017). https://doi.org/10.1109/TCSVT.2016.2589879
Wang, Y., Yao, Q., Kwok, J., Ni, L.: Few-shot learning: a survey. arXiv preprint arXiv:1904.05046 (2019)
Xia, G., et al.: DOTA: a large-scale dataset for object detection in aerial images. In: CVPR, pp. 3974–3983 (2018). https://doi.org/10.1109/CVPR.2018.00418
Xu, H., Gao, Y., Yu, F., Darrell, T.: End-to-end learning of driving models from large-scale video datasets. In: CVPR, pp. 3530–3538 (2017)
Yoo, D., Kweon, I.S.: Learning loss for active learning. In: CVPR, pp. 93–102 (2019). https://doi.org/10.1109/CVPR.2019.00018
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection. In: CVPR, pp. 4457–4465 (2017). https://doi.org/10.1109/CVPR.2017.474
Acknowledgements
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges and Bridges-2 systems, which are supported by NSF award numbers ACI-1445606 and ACI-1928147, at the Pittsburgh Supercomputing Center (PSC) [3, 20, 30]. The work was made possible through the XSEDE Extended Collaborative Support Service (ECSS) program.
We are grateful to the Mitsui Archives for giving us permission to reproduce their documents and publish the stamps.
Finally, this work would not have been possible without the expert labeling and assistance of Ms. Mieko Ueda.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Buitrago, P.A., Toropov, E., Prabha, R., Uran, J., Adal, R. (2021). MiikeMineStamps: A Long-Tailed Dataset of Japanese Stamps via Active Learning. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_1
DOI: https://doi.org/10.1007/978-3-030-86334-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science, Computer Science (R0)