VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph

Yuan, Jicheng; Le-Tuan, Anh; Nguyen-Duc, Manh; Tran, Trung-Kien; Hauswirth, Manfred; Le-Phuoc, Danh

doi:10.1007/978-3-031-60635-9_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14665)

Included in the following conference series:

European Semantic Web Conference

Abstract

The availability of vast amounts of visual data with diverse and rich features is a key factor in developing, verifying, and benchmarking advanced computer vision (CV) algorithms and architectures. Most visual datasets are created and curated for specific tasks, or cover a limited data distribution for narrow fields of interest, and there is no unified approach to managing and accessing them across diverse sources, tasks, and taxonomies. This not only creates unnecessary overhead when building robust visual recognition systems, but also introduces biases into learning systems and limits the capabilities of data-centric AI. To address these problems, we propose the Vision Knowledge Graph (VisionKG), a novel resource that interlinks, organizes, and manages visual datasets via knowledge graphs and Semantic Web technologies. It can serve as a unified framework that facilitates simple access to and querying of state-of-the-art visual datasets, regardless of their heterogeneous formats and taxonomies. A key difference between our approach and existing methods is that VisionKG is not based on metadata alone: it also uses a unified data schema and external knowledge bases to integrate, interlink, and align visual datasets. It enriches the semantic descriptions and interpretation at both the image and instance levels, and offers data retrieval and exploratory services via SPARQL and via natural language, powered by Large Language Models (LLMs). VisionKG currently contains 617 million RDF triples that describe approximately 61 million entities, which can be accessed at https://vision.semkg.org and through APIs. With the integration of 37 datasets and four popular computer vision tasks, we demonstrate its usefulness across various scenarios when working with computer vision pipelines.
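To make the SPARQL-based retrieval mentioned in the abstract more concrete, the following is a minimal sketch of how a client might prepare a query for the public endpoint. It only builds the query text and the request URL and does not contact the server. The endpoint address is the one given in the chapter's notes. The `ex:` prefix and the terms `ex:Image`, `ex:hasAnnotation`, and `ex:hasLabel` are hypothetical placeholders chosen for illustration, not VisionKG's actual schema, and the helper names `build_query` and `build_request_url` are likewise assumptions.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint as listed in the chapter's notes; that it accepts this exact
# request shape is an assumption.
ENDPOINT = "https://vision.semkg.org/sparql"

# Hypothetical vocabulary: the prefix and terms below are placeholders,
# NOT VisionKG's real schema.
QUERY_TEMPLATE = """\
PREFIX ex: <http://example.org/vision#>
SELECT ?image WHERE {{
  ?image a ex:Image ;
         ex:hasAnnotation ?annotation .
  ?annotation ex:hasLabel "{label}" .
}}
LIMIT {limit}
"""


def build_query(label: str, limit: int = 10) -> str:
    """Fill the template, rejecting labels that could escape the string literal."""
    if '"' in label or "\\" in label or "\n" in label:
        raise ValueError("label must not contain quotes, backslashes, or newlines")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return QUERY_TEMPLATE.format(label=label, limit=limit)


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a SPARQL 1.1 Protocol GET request (`query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL a client would fetch; no network request is made here.
    print(build_request_url(build_query("person", limit=5)))
```

The resulting URL could be passed to any HTTP client. Before running it against the live service, the placeholder vocabulary would need to be replaced with the terms of the published VisionKG schema.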

Notes

1. https://huggingface.co/docs/datasets/index
2. https://github.com/opendatalab/opendatalab-python-sdk
3. https://paperswithcode.com/datasets
4. https://vision.semkg.org
5. https://github.com/cqels/vision
6. https://vision.semkg.org/sparql
7. List of dataset licenses in VisionKG: http://vision.semkg.org/licences.html
8. https://creativecommons.org/licenses/by/4.0/
9. https://github.com/cqels/vision
10. https://github.com/cqels/vision

Acknowledgements

This work is supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) under grant number 453130567 (COSMO), by the Horizon Europe Research and Innovation Actions under grant number 101092908 (SmartEdge), by the Federal Ministry for Education and Research, Germany, under grant number 01IS18037A (BIFOLD), and by the Horizon Europe Research and Innovation programme under grant agreement number 101079214 (AIoTwin).

Author information

Authors and Affiliations

Open Distributed Systems, Technical University of Berlin, Berlin, Germany
Jicheng Yuan, Anh Le-Tuan, Manh Nguyen-Duc, Manfred Hauswirth & Danh Le-Phuoc
Bosch Center for Artificial Intelligence, Renningen, Germany
Trung-Kien Tran
Fraunhofer Institute for Open Communication Systems, Berlin, Germany
Manfred Hauswirth & Danh Le-Phuoc

Corresponding author

Correspondence to Jicheng Yuan.

Editor information

Editors and Affiliations

King’s College London, London, UK
Albert Meroño Peñuela
KU Leuven, Sint-Katelijne-Waver, Belgium
Anastasia Dimou
EURECOM, Biot, France
Raphaël Troncy
Linköping University, Linköping, Sweden
Olaf Hartig
Technical University of Munich, Heilbronn, Germany
Maribel Acosta
Polytechnic Institute of Paris, Palaiseau, France
Mehwish Alam
University of Mannheim, Mannheim, Germany
Heiko Paulheim
EURECOM, Biot, France
Pasquale Lisena

About this paper

Cite this paper

Yuan, J., Le-Tuan, A., Nguyen-Duc, M., Tran, TK., Hauswirth, M., Le-Phuoc, D. (2024). VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14665. Springer, Cham. https://doi.org/10.1007/978-3-031-60635-9_5

DOI: https://doi.org/10.1007/978-3-031-60635-9_5
Published: 19 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60634-2
Online ISBN: 978-3-031-60635-9
eBook Packages: Computer Science, Computer Science (R0)
