research-article

FarsBase: The Persian knowledge graph

Published: 01 January 2019

Abstract

Over the last decade, extensive research has been done on the automatic construction of knowledge graphs from Web resources, resulting in a number of large-scale knowledge graphs such as YAGO, DBpedia, BabelNet, and Wikidata. Although some of these knowledge graphs are multilingual, they contain little or no linked data in Persian and provide no tools for extracting knowledge from Persian information sources. FarsBase (available at http://farsbase.net/about) is the first Persian multi-source knowledge graph, designed specifically for semantic search engines to support Persian knowledge. FarsBase uses a diverse set of hybrid and flexible techniques to extract and integrate knowledge from various sources, such as Wikipedia, Web tables, and unstructured text. It also supports entity linking, which allows integration with other knowledge graphs. To maintain high accuracy for triples, we adopt a low-cost mechanism in which human experts verify candidate knowledge, and the candidates are prioritized for verification using different heuristics. FarsBase is used as the semantic-search system of a Persian search engine and efficiently answers hundreds of semantic queries per second.

Cited By

  • (2022) Answer selection in community question answering exploiting knowledge graph and context information. Semantic Web 13:3 (339-356). DOI: 10.3233/SW-222970. Online publication date: 1-Jan-2022

Published In

Semantic Web, Volume 10, Issue 6
Knowledge Graphs: Construction, Management and Querying
2019
234 pages
ISSN: 1570-0844
EISSN: 2210-4968

Publisher

IOS Press, Netherlands

Author Tags

1. Semantic Web
2. Linked Data
3. Persian
4. knowledge graph
