Abstract
In bibliometric research, keyword analysis of publications is an effective way both to investigate the knowledge structure of research domains and to explore developing trends within them. Many approaches have been proposed to identify the most representative keywords. Most of them rely on statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for domain analysis. In this paper, we argue that domain knowledge is reflected by the semantic meanings behind keywords rather than by the keywords themselves. We apply the Google Word2Vec model, a model of word distributions using deep learning, to represent the semantic meanings of the keywords. Based on this work, we propose a new domain knowledge approach, the Semantic Frequency-Semantic Active Index (SF-SAI), analogous to Term Frequency-Inverse Document Frequency (TF-IDF), to link domain and background information and to identify infrequent but important keywords. We measure semantic similarity before the statistical computation, so that we compute the frequencies of “semantic units” rather than keyword frequencies. Semantic units are generated by word vector clustering, and the Inverse Document Frequency is extended to a semantic inverse document frequency, in which only words in the inverse documents that reach a certain similarity are counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare the proposed method with existing methods and discuss its advantages and disadvantages, finding that introducing the semantic meaning of the keywords supports more effective domain knowledge analysis.
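As an illustration of counting "semantic units" rather than individual keywords, the following is a minimal Python sketch, not the authors' implementation. It assumes toy two-dimensional vectors in place of trained Word2Vec embeddings, a hypothetical cosine-similarity threshold of 0.9, a simple greedy clustering rule, and a smoothed log(N / (1 + n)) form for the semantic inverse document frequency. None of these specifics are given in the abstract, and all function names are hypothetical. The Semantic Active Index is not sketched because its definition is not stated here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_units(vectors, threshold):
    """Greedy clustering (an assumption): a keyword joins the first unit
    whose seed keyword is at least `threshold` similar, else starts a new unit."""
    units = []
    for word in sorted(vectors):
        for unit in units:
            if cosine(vectors[word], vectors[unit[0]]) >= threshold:
                unit.append(word)
                break
        else:
            units.append([word])
    return units


def semantic_frequency(unit, keyword_counts):
    """Frequency of a semantic unit: summed counts of its member keywords."""
    return sum(keyword_counts[w] for w in unit)


def semantic_idf(unit, background_docs, vectors, threshold):
    """Smoothed IDF (an assumption): a background document counts as a hit
    if any of its keywords is similar enough to any member of the unit."""
    hits = 0
    for doc in background_docs:
        if any(
            w in vectors and cosine(vectors[w], vectors[m]) >= threshold
            for w in doc
            for m in unit
        ):
            hits += 1
    return math.log(len(background_docs) / (1 + hits))


# Toy stand-ins for Word2Vec embeddings (hypothetical values).
vectors = {
    "gis": (1.0, 0.0),
    "participatori_gis": (0.9, 0.1),
    "flood": (0.0, 1.0),
    "flash_flood": (0.1, 0.9),
}
domain_docs = [["gis", "flood"], ["participatori_gis", "gis"], ["flash_flood"]]
background_docs = [["flood"], ["flash_flood"], ["gis"], ["drought"]]

counts = Counter(w for doc in domain_docs for w in doc)
units = semantic_units(vectors, threshold=0.9)
for unit in units:
    sf = semantic_frequency(unit, counts)
    sidf = semantic_idf(unit, background_docs, vectors, threshold=0.9)
    print("; ".join(unit), sf, round(sf * sidf, 4))
```

With these toy inputs, gis and participatori_gis fall into one unit with a semantic frequency of 3, and only one of the four background documents matches that unit.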
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 41371372). We thank Mr. Stephen C. McClure for helping us with the English revisions.
Appendices
Appendix 1: Top 99 selected keywords from the TF, TF-IDF and TF-KAI methods (the unique keywords are in bold)
Rank | TF | Freq. | TF-IDF | Score | TF-KAI | Score |
---|---|---|---|---|---|---|
1 | gis | 94 | gis | 306 | geograph_inform_system_gis | 5,934,569 |
2 | natur_hazard | 64 | geograph_inform_system | 216 | geograph_inform_system | 4,634,698 |
3 | geograph_inform_system | 47 | natur_hazard | 158 | gis | 2,444,443 |
4 | landslid | 43 | geograph_inform_system_gis | 150 | malaysia | 1,139,113 |
5 | vulner | 37 | landslid | 130 | spatial_analysi | 999,469.3 |
6 | geograph_inform_system_gis | 27 | remot_sens | 109 | prison | 979,072 |
7 | hazard | 26 | remot_sens | 109 | disast_risk_assess | 925,529 |
8 | remot_sens | 26 | natur_disast | 83 | geograph_variat | 925,529 |
9 | flood | 22 | risk_assess | 67 | frequenc_ratio | 920,429.7 |
10 | natur_disast | 22 | flood | 60 | remot_sens | 790,241.5 |
11 | risk | 22 | earthquak | 51 | urban_geolog | 697,015.1 |
12 | risk_assess | 20 | landslid_suscept | 51 | analyt_hierarchi_process_ahp | 674,641.8 |
13 | disast | 18 | social_vulner | 49 | coastal_vulner_index | 674,641.8 |
14 | earthquak | 16 | disast | 46 | gis_model | 674,641.8 |
15 | climat_chang | 13 | climat_chang | 45 | pca | 674,641.8 |
16 | resili | 11 | malaysia | 44 | activ_learn | 573,675 |
17 | landslid_suscept | 10 | resili | 40 | anthropogen_intervent | 573,675 |
18 | social_vulner | 10 | bangladesh | 39 | catastroph_theori | 573,675 |
19 | drought | 9 | risk | 39 | climat_variat | 573,675 |
20 | risk_manag | 9 | drought | 38 | cross_applic | 573,675 |
21 | bangladesh | 7 | frequenc_ratio | 38 | cumulonimbus_convect | 573,675 |
22 | environ | 7 | logist_regress | 37 | debris_laden_slope | 573,675 |
23 | hurrican | 7 | hazard | 35 | decis_tree_dt | 573,675 |
24 | logist_regress | 7 | risk_manag | 35 | digit_cartographi | 573,675 |
25 | malaysia | 7 | hazard_map | 32 | earth_fractur | 573,675 |
26 | adapt | 6 | hurrican | 31 | geo_informat | 573,675 |
27 | flash_flood | 6 | flash_flood | 30 | geograph_analysi | 573,675 |
28 | frequenc_ratio | 6 | land_us_plan | 30 | geograph_weight_regress | 573,675 |
29 | geomorpholog | 6 | mass_movement | 29 | hazard_cours | 573,675 |
30 | hazard_map | 6 | topographi | 29 | human_settlement | 573,675 |
31 | itali | 6 | ahp | 28 | multi_criteria_decis_analysi_mcda | 573,675 |
32 | model | 6 | itali | 28 | multicriteria_analysi | 573,675 |
33 | monitor | 6 | seismic_hazard | 28 | orograph_forc | 573,675 |
34 | seismic_hazard | 6 | spatial_analysi | 28 | participatori_map | 573,675 |
35 | sustain | 6 | sustain_develop | 28 | physic_geographi | 573,675 |
36 | ahp | 5 | sea_level_rise | 27 | primari_school | 573,675 |
37 | debri_flow | 5 | urban_geolog | 27 | river_murray | 573,675 |
38 | disast_manag | 5 | disast_manag | 26 | scenario_model | 573,675 |
39 | environment_hazard | 5 | environment_hazard | 26 | seismic_microzon | 573,675 |
40 | eros | 5 | land_cover | 26 | torrenti_rainfal | 573,675 |
41 | fire | 5 | sustain | 26 | urban_form | 573,675 |
42 | indic | 5 | geomorpholog | 25 | casualti | 562,201.5 |
43 | insur | 5 | monitor | 25 | parallel_comput | 562,201.5 |
44 | land_us_plan | 5 | analyt_hierarchi_process | 24 | support_vector_machin_svm | 562,201.5 |
45 | mass_movement | 5 | artifici_neural_network | 24 | area_with_geograph_specif | 489,536 |
46 | risk_percept | 5 | bushfir | 24 | multi_criteria_method | 489,536 |
47 | sea_level_rise | 5 | coastal_manag | 24 | dewey_john | 489,536 |
48 | suscept | 5 | flood_disast | 24 | european_polici | 489,536 |
49 | sustain_develop | 5 | land_use_plan | 23 | evidence_bas_polici | 489,536 |
50 | topographi | 5 | debri_flow | 22 | gloss_srtm_data | 489,536 |
51 | tsunami | 5 | disast_risk_assess | 22 | natur_killer_t_cel | 489,536 |
52 | urban | 5 | geograph_variat | 22 | orissa_india | 489,536 |
53 | analyt_hierarchi_process | 4 | insur | 22 | participatori_plan | 489,536 |
54 | artifici_neural_network | 4 | slope_stabil | 22 | spatial_heterogen | 489,536 |
55 | bushfir | 4 | adapt | 21 | spatial_homogen | 489,536 |
56 | china | 4 | analyt_hierarchi_process_ahp | 21 | white_gilbert | 489,536 |
57 | coastal_manag | 4 | casualti | 21 | land_cover | 470,065.8 |
58 | flood_disast | 4 | coastal_vulner_index | 21 | landslid_suscept | 452,160.2 |
59 | flood_hazard | 4 | environ | 21 | flood_suscept | 437,085.7 |
60 | groundwat | 4 | gis_model | 21 | global_posit_system | 437,085.7 |
61 | himalaya | 4 | himalaya | 21 | analyt_hierarch_process | 430,256.3 |
62 | infrastructur | 4 | iran | 21 | crowdsourc | 430,256.3 |
63 | iran | 4 | parallel_comput | 21 | decis_support_system_dss | 430,256.3 |
64 | land_cover | 4 | pca | 21 | eman_coeffici | 430,256.3 |
65 | land_use_plan | 4 | risk_percept | 21 | fire_weather | 430,256.3 |
66 | rainfal | 4 | support_vector_machin_svm | 21 | flood_prepared | 430,256.3 |
67 | rockfal | 4 | suscept | 21 | fuzzi_relat | 430,256.3 |
68 | slope_stabil | 4 | tsunami | 21 | human_environ_relat | 430,256.3 |
69 | spatial_analysi | 4 | environment_justic | 20 | landslid_hazard_map | 430,256.3 |
70 | turkey | 4 | flood_hazard | 20 | multi_hazard_assess | 430,256.3 |
71 | uncertainti | 4 | flood_suscept | 20 | probabl_model | 430,256.3 |
72 | urban_geolog | 4 | global_posit_system | 20 | radon_mass_exhal_rate | 430,256.3 |
73 | wildfir | 4 | hazus | 20 | | 430,256.3 |
74 | analyt_hierarchi_process_ahp | 3 | iczm | 20 | vancouv | 430,256.3 |
75 | caribbean | 3 | indic | 20 | volunt_geograph_inform | 430,256.3 |
76 | casualti | 3 | natur_disturb | 20 | bangladesh | 401,176.9 |
77 | climat_chang_adapt | 3 | rockfal | 20 | land_us_plan | 382,450 |
78 | coastal_eros | 3 | serbia | 20 | landslid | 353,191.5 |
79 | coastal_vulner_index | 3 | turkey | 20 | social_vulner | 346,514.1 |
80 | damag | 3 | environment_chang | 19 | environment_justic | 339,955.6 |
81 | databas | 3 | estuari | 19 | hazus | 339,955.6 |
82 | dea | 3 | fire | 19 | iczm | 339,955.6 |
83 | decis_support_system | 3 | infrastructur | 19 | natur_disturb | 339,955.6 |
84 | disast_mitig | 3 | ndvi | 19 | serbia | 339,955.6 |
85 | disast_risk_assess | 3 | photogrammetri | 19 | natur_disast | 311,798.6 |
86 | ecosystem_servic | 3 | rainfal | 19 | logist_regress | 307,984.7 |
87 | emerg_manag | 3 | urban | 19 | mass_movement | 306,324.2 |
88 | emerg_respons | 3 | china | 18 | topographi | 306,324.2 |
89 | environment_chang | 3 | eros | 18 | dea_model | 299,840.8 |
90 | environment_justic | 3 | groundwat | 18 | exceed_probabl | 299,840.8 |
91 | epidemiolog | 3 | peru | 18 | landslid_monitor | 299,840.8 |
92 | estuari | 3 | uncertainti | 18 | riverbank_stabil | 299,840.8 |
93 | exposur | 3 | wildfir | 18 | urban_system | 299,840.8 |
94 | flood_suscept | 3 | caribbean | 17 | natur_hazard | 297,429.3 |
95 | gender | 3 | coastal_eros | 17 | artifici_neural_network | 293,721.6 |
96 | geograph_variat | 3 | decis_support_system | 17 | bushfir | 293,721.6 |
97 | gis_model | 3 | disast_mitig | 17 | coastal_manag | 293,721.6 |
98 | global_posit_system | 3 | ecosystem_servic | 17 | photogrammetri | 276,128.9 |
99 | hazus | 3 | liquefact | 17 | analyt_hierarchi_process | 259,166.1 |
Appendix 2: Top 99 semantic units in the semantic based results (SF, SF-SIDF, SF-SAI)
Rank | SF | Freq. | SF-SIDF | Score | SF-SAI | Score |
---|---|---|---|---|---|---|
1 | gis; participatori_gis | 95 | gis; participatori_gis | 350.5554 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard; | 8,329,761 |
2 | geograph_inform_system; geograph_inform_system_gis | 74 | geograph_inform_system; geograph_inform_system_gis | 341.2582 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 3,059,600 |
3 | natur_hazard | 64 | natur_hazard | 163.5489 | geograph_inform_system; geograph_inform_system_gis | 551,130.6 |
4 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 52 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 171.622 | gis; participatori_gis; | 361,425.3 |
5 | landslid; landslid_inventori | 45 | landslid; landslid_inventori | 151.6579 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 73,343.6 |
6 | vulner_matrix; vulner; port_vulner; differenti_vulner | 40 | vulner_matrix; vulner; port_vulner; differenti_vulner | 138.9708 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 65,562.86 |
7 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | 33 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | 295.0969 | landslid; landslid_inventori | 58,894.39 |
8 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 30 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 128.6511 | natur_hazard | 52,744.62 |
9 | flash_flood; flood | 28 | flash_flood; flood | 93.62943 | vulner_matrix; vulner; port_vulner; differenti_vulner; | 51,638.82 |
10 | latent_risk; risk; predat_risk; risk_zonat | 25 | latent_risk; risk; predat_risk; risk_zonat | 86.54228 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 51,418.28 |
11 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | 24 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | 85.94491 | frequenc_ratio; likelihood_frequenc_ratio | 31,233.42 |
12 | natur_disast; natur_disast_prepared | 23 | natur_disast; natur_disast_prepared | 90.27616 | landslid_suscept; landslid_suscept_ls | 28,922.78 |
13 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | 22 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | 77.67505 | malaysia | 28,830.85 |
14 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | 22 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | 78.47514 | natur_disast; natur_disast_prepared | 26,796.83 |
15 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 20 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 178.8466 | geograph_inform; volunt_geograph_inform | 24,476.8 |
16 | climat_chang; climat_chang_adapt | 16 | climat_chang; climat_chang_adapt | 56.49095 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map; | 24,476.8 |
17 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 11 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 66.57154 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 22,947 |
18 | landslid_suscept; landslid_suscept_ls | 11 | landslid_suscept; landslid_suscept_ls | 60.24254 | flash_flood; flood; | 22,210.43 |
19 | resili | 11 | resili | 44.24584 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess; | 20,684.62 |
20 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | 10 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | 46.51871 | land_cover | 20,397.33 |
21 | hurrican; hurrican_mitch; hurrican_katrina | 10 | hurrican; hurrican_mitch; hurrican_katrina | 46.65664 | latent_risk; risk; predat_risk; risk_zonat; | 19,919.27 |
22 | optim_risk_manag; risk_manag | 10 | optim_risk_manag; risk_manag | 42.97939 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura; | 17,483.43 |
23 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | 9 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | 47.05882 | casualti | 17,210.25 |
24 | drought | 9 | drought | 41.04273 | disast_risk_assess | 17,210.25 |
25 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 9 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 45.64016 | geograph_variat | 17,210.25 |
26 | environ; geo_environ | 8 | environ; geo_environ | 38.65165 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared; | 17,139.43 |
27 | urban; urban_geographi; urban_terror; urban_abandon | 8 | urban; urban_geographi; urban_terror; urban_abandon | 43.32776 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak; | 16,527.3 |
28 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | 7 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | 40.06418 | land_plan; land_us_plan | 15,298 |
29 | bangladesh | 7 | bangladesh | 39.52545 | hazus; hazus_mh | 15,298 |
30 | bayesian_analysi; multicriteria_analysi; spatiotempor_analysi; discours_analysi; meta_analysi; 3d_visual_analysi | 7 | bayesian_analysi; multicriteria_analysi; spatiotempor_analysi; discours_analysi; meta_analysi; 3d_visual_analysi | 20.23169 | urban_geolog | 15,298 |
31 | binghamton_geomorpholog_symposium; geomorpholog | 7 | binghamton_geomorpholog_symposium; geomorpholog | 35.07353 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey; | 15,111.44 |
32 | frequenc_ratio; likelihood_frequenc_ratio | 7 | frequenc_ratio; likelihood_frequenc_ratio | 45.20196 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain; | 14,992.04 |
33 | logist_regress | 7 | logist_regress | 35.4979 | topographi | 14,709.62 |
34 | malaysia | 7 | malaysia | 44.64167 | urban; urban_geographi; urban_terror; urban_abandon; | 14,398.12 |
35 | northeast_china; fujian_china; china; northern_china | 7 | northeast_china; fujian_china; china; northern_china | 33.05976 | bangladesh | 13,881.52 |
36 | semi_natur_habitat; natur_disturb; natur_geohazard; natur_calam; natur_conserv | 7 | semi_natur_habitat; natur_disturb; natur_geohazard; natur_calam; natur_conserv | 34.80427 | rainfal; torrenti_rainfal | 13,768.2 |
37 | sustain; sustain_tourism | 7 | sustain; sustain_tourism | 36.10698 | gis_model | 13,768.2 |
38 | artifici_neural_network; neural_network | 6 | artifici_neural_network; neural_network | 33.45021 | pca | 13,768.2 |
39 | asset_manag; habitat_manag; stock_manag; wildlif_manag; self_manag; manag | 6 | asset_manag; habitat_manag; stock_manag; wildlif_manag; self_manag; manag | 29.60998 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment; | 13,598.22 |
40 | big_data; data_fusion; multisensor_data_fusion; srtm_data; lidar_data; administr_data | 6 | big_data; data_fusion; multisensor_data_fusion; srtm_data; lidar_data; administr_data | 31.5207 | natur_disturb_regim; natur_disturb | 13,598.22 |
41 | cardiovascular_diseas; coronari_diseas; diarrheal_diseas; hiv_diseas_progress; diseas_progress; coronari_heart_diseas | 6 | cardiovascular_diseas; coronari_diseas; diarrheal_diseas; hiv_diseas_progress; diseas_progress; coronari_heart_diseas | 15.01788 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 12,907.69 |
42 | fire; grassland_fire | 6 | fire; grassland_fire | 28.70068 | estuari; meghna_estuari; | 12,238.4 |
43 | hazard_map | 6 | hazard_map | 34.84102 | wildland_fire_risk; fire_risk; forest_fire_risk; | 12,238.4 |
44 | land_plan; land_us_plan | 6 | land_plan; land_us_plan | 36.31175 | hazard_map; | 11,972.35 |
45 | monitor | 6 | monitor | 31.5207 | coastal_vulner_index | 11,473.5 |
46 | rainfal; torrenti_rainfal | 6 | rainfal; torrenti_rainfal | 35.67959 | dea; | 11,473.5 |
47 | sea_level; sea_level_rise | 6 | sea_level; sea_level_rise | 33.45021 | disast_risk_index; ecolog_disast_risk_index; grassland_snow_disast_risk_index; | 11,473.5 |
48 | seismic_hazard | 6 | seismic_hazard | 30.30306 | coastal_manag; urban_manag | 11,248.53 |
49 | slope; slope_movement; slope_stabil | 6 | slope; slope_movement; slope_stabil | 30.81401 | hurrican; hurrican_mitch; hurrican_katrina; | 10,623.61 |
50 | social_media; social_geographi; social_disadvantag; social; social_contact | 6 | social_media; social_geographi; social_disadvantag; social; social_contact | 32.85957 | landslid_inventori_map; landslid_map; landslid_suscept_map; | 10,623.61 |
51 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 6 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 38.74454 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc; | 10,478.08 |
52 | arctic_region; region; himalayan_region; arid_region; mediterranean_region | 5 | arctic_region; region; himalayan_region; arid_region; mediterranean_region | 27.70566 | spatial; spatial_smooth; spatial_heterogen; spatial_homogen | 10,198.67 |
53 | coastal_manag; urban_manag | 5 | coastal_manag; urban_manag | 30.54558 | activ_learn | 10,198.67 |
54 | debri_flow | 5 | debri_flow | 22.99262 | anthropogen_intervent | 10,198.67 |
55 | decis_support_system; decis_support_system_dss | 5 | decis_support_system; decis_support_system_dss | 29.25644 | catastroph_theori | 10,198.67 |
56 | disast_manag | 5 | disast_manag | 27.22911 | claim; claim_payout; | 10,198.67 |
57 | eros | 5 | eros | 25.67834 | climat_variat | 10,198.67 |
58 | flash_flood_hazard; flood_hazard | 5 | flash_flood_hazard; flood_hazard | 27.70566 | communic_satellit; satellit_communic; | 10,198.67 |
59 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | 5 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | 29.73299 | crowdsourc | 10,198.67 |
60 | himalaya; kashmir_himalaya | 5 | himalaya; kashmir_himalaya | 27.70566 | cumul_fire_risk_index; fire_risk_index; | 10,198.67 |
61 | indic | 5 | indic | 29.03418 | cumulonimbus_convect | 10,198.67 |
62 | insur | 5 | insur | 25.56844 | earth_fractur | 10,198.67 |
63 | landslid_inventori_map; landslid_map; landslid_suscept_map | 5 | landslid_inventori_map; landslid_map; landslid_suscept_map | 30.25979 | flood_prepared | 10,198.67 |
64 | mass_movement | 5 | mass_movement | 29.98946 | geo_informat | 10,198.67 |
65 | po_river; river_murray; river; yangtz_river_delta | 5 | po_river; river_murray; river; yangtz_river_delta | 23.54112 | geograph_analysi | 10,198.67 |
66 | risk_percept | 5 | risk_percept | 21.88991 | geograph_weight_regress | 10,198.67 |
67 | sustain_develop | 5 | sustain_develop | 28.61727 | geospati; | 10,198.67 |
68 | topographi | 5 | topographi | 31.8869 | hazard_cours | 10,198.67 |
69 | tsunami | 5 | tsunami | 21.94227 | hazard_zone; multi_hazard_zone; | 10,198.67 |
70 | alp; apuan_alp; swiss_alp | 4 | alp; apuan_alp; swiss_alp | 23.05711 | human_environ_relat | 10,198.67 |
71 | bushfir | 4 | bushfir | 25.21309 | land_us; | 10,198.67 |
72 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | 4 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | 26.98042 | offshor_taiwan; onshor_taiwan; | 10,198.67 |
73 | central_america; central_greec; central_liaon_provinc | 4 | central_america; central_greec; central_liaon_provinc | 21.54793 | physic_geographi | 10,198.67 |
74 | coastal_area; coastal_urban_area; urban_area | 4 | coastal_area; coastal_urban_area; urban_area | 24.20783 | primari_school | 10,198.67 |
75 | cultur; cultur_geographi; cultur_heritag | 4 | cultur; cultur_geographi; cultur_heritag | 23.78639 | prison | 10,198.67 |
76 | eco_friend_method; interdisciplinari_method; geostatist_method; semi_qualit_method | 4 | eco_friend_method; interdisciplinari_method; geostatist_method; semi_qualit_method | 21.0138 | scenario_model | 10,198.67 |
77 | environment_anthropolog; environment_justic | 4 | environment_anthropolog; environment_justic | 22.30014 | | 10,198.67 |
78 | estuari; meghna_estuari | 4 | estuari; meghna_estuari | 26.55898 | urban_form | 10,198.67 |
79 | flood_disast | 4 | flood_disast | 24.43647 | vancouv | 10,198.67 |
80 | forest; minudasht_forest; oak_forest | 4 | forest; minudasht_forest; oak_forest | 22.73693 | mass_movement | 10,064.47 |
81 | fujita_scale; scale | 4 | fujita_scale; scale | 25.50952 | digit; digit_cartographi | 9834.429 |
82 | geograph_inform; volunt_geograph_inform | 4 | geograph_inform; volunt_geograph_inform | 29.33157 | flood_suscept | 9834.429 |
83 | geolog_hazard; geomorpholog_hazard | 4 | geolog_hazard; geomorpholog_hazard | 22.4405 | global_posit_system | 9834.429 |
84 | groundwat | 4 | groundwat | 20.12123 | iczm | 9834.429 |
85 | hazus; hazus_mh | 4 | hazus; hazus_mh | 27.45155 | photogrammetri | 9834.429 |
86 | human_capit; human_settlement; human_biomonitor | 4 | human_capit; human_settlement; human_biomonitor | 23.22734 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc; | 9561.25 |
87 | infrastructur | 4 | infrastructur | 24.20783 | artifici_neural_network; neural_network | 9495.31 |
88 | integr_vulner_assess; vulner_assess | 4 | integr_vulner_assess; vulner_assess | 22.73693 | sea_level; sea_level_rise; | 9495.31 |
89 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | 4 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | 29.33157 | fujita_scale; scale; | 9414.154 |
90 | iran | 4 | iran | 21.43525 | climat_chang; climat_chang_adapt; | 8741.714 |
91 | land_cover | 4 | land_cover | 28.60228 | bushfir | 8741.714 |
92 | land_use_plan | 4 | land_use_plan | 23.05711 | decis_support_system; decis_support_system_dss | 8692.045 |
93 | natur_break; natur_geohazard; natur_calam; socio_natur | 4 | natur_break; natur_geohazard; natur_calam; socio_natur | 18.44639 | social_media; social_geographi; social_disadvantag; social; social_contact; | 8605.125 |
94 | natur_disturb_regim; natur_disturb | 4 | natur_disturb_regim; natur_disturb | 26.98042 | landsat; landsat_tm; | 8605.125 |
95 | nival_process; poisson_process; torrenti_process; geochem_process | 4 | nival_process; poisson_process; torrenti_process; geochem_process | 20.81864 | serbia | 8605.125 |
96 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | 4 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | 27.98568 | sustain; sustain_tourism; | 8518.205 |
97 | rockfal | 4 | rockfal | 20.63256 | indic; | 8314.13 |
98 | satellit_imag; satellit_imageri; satellit_imag_classif | 4 | satellit_imag; satellit_imageri; satellit_imag_classif | 22.58597 | environ; geo_environ; | 8025.18 |
99 | soil; soil_termiticid; soil_suction | 4 | soil; soil_termiticid; soil_suction | 16.68658 | logist_regress | 7808.354 |
Appendix 3: All 137 keywords including 33 unique TF-KAI keywords, 38 unique SF-SAI keywords, and 66 overlapping keywords
Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | cross_applic | 23 | fuzzi_relat | 46 | xiangxi_catchment | 69 | geo_environ | 92 | earth_fractur | 115 | |
1 | debris_laden_slope | 24 | landslid_hazard_map | 47 | meghna_estuari | 70 | dea | 93 | geo_informat | 116 | vancouv |
2 | decis_tree_dt | 25 | multi_hazard_assess | 48 | wildland_fire_risk | 71 | geograph_inform_system_gis | 94 | geograph_analysi | 117 | volunt_geograph_inform |
3 | human_settlement | 26 | probabl_model | 49 | hazard_map | 72 | geograph_inform_system | 95 | geograph_weight_regress | 118 | bangladesh |
4 | multi_criteria_decis_analysi_mcda | 27 | radon_mass_exhal_rate | 50 | disast_risk_index | 73 | gis | 96 | hazard_cours | 119 | land_us_plan |
5 | multicriteria_analysi | 28 | environment_justic | 51 | hurrican_mitch | 74 | malaysia | 97 | participatori_map | 120 | landslid |
6 | orograph_forc | 29 | exceed_probabl | 52 | landslid_inventori_map | 75 | spatial_analysi | 98 | physic_geographi | 121 | social_vulner |
7 | river_murray | 30 | landslid_monitor | 53 | gargano_promontori_southern_itali | 76 | prison | 99 | primari_school | 122 | hazus |
8 | seismic_microzon | 31 | riverbank_stabil | 54 | claim_payout | 77 | disast_risk_assess | 100 | scenario_model | 123 | iczm |
9 | parallel_comput | 32 | urban_system | 55 | satellit_communic | 78 | geograph_variat | 101 | torrenti_rainfal | 124 | natur_disturb |
10 | support_vector_machin_svm | 33 | hydrometeorolog_hazard | 56 | cumul_fire_risk_index | 79 | frequenc_ratio | 102 | urban_form | 125 | serbia |
11 | area_with_geograph_specif | 34 | participatori_gis | 57 | geospati | 80 | remot_sens | 103 | casualti | 126 | natur_disast |
12 | multi_criteria_method | 35 | vulner_matrix | 58 | multi_hazard_zone | 81 | urban_geolog | 104 | spatial_heterogen | 127 | logist_regress |
13 | dewey_john | 36 | intrins_vulner_map | 59 | land_us | 82 | analyt_hierarchi_process_ahp | 105 | spatial_homogen | 128 | mass_movement |
14 | european_polici | 37 | flash_flood | 60 | onshor_taiwan | 83 | coastal_vulner_index | 106 | land_cover | 129 | topographi |
15 | evidence_bas_polici | 38 | multi_risk_assess | 61 | guangdong_provinc | 84 | gis_model | 107 | landslid_suscept | 130 | dea_model |
16 | gloss_srtm_data | 39 | latent_risk | 62 | sea_level_rise | 85 | pca | 108 | flood_suscept | 131 | natur_hazard |
17 | natur_killer_t_cel | 40 | vulnerabilidad_de_la_infraestructura | 63 | fujita_scale | 86 | activ_learn | 109 | global_posit_system | 132 | artifici_neural_network |
18 | orissa_india | 41 | presidenti_disast_declar | 64 | climat_chang_adapt | 87 | anthropogen_intervent | 110 | analyt_hierarch_process | 133 | bushfir |
19 | participatori_plan | 42 | wenchuan_earthquak | 65 | social_media | 88 | catastroph_theori | 111 | crowdsourc | 134 | coastal_manag |
20 | white_gilbert | 43 | istanbul_turkey | 66 | landsat_tm | 89 | climat_variat | 112 | decis_support_system_dss | 135 | photogrammetri |
21 | eman_coeffici | 44 | changbai_mountain | 67 | sustain_tourism | 90 | cumulonimbus_convect | 113 | flood_prepared | 136 | analyt_hierarchi_process |
22 | fire_weather | 45 | urban_terror | 68 | indic | 91 | digit_cartographi | 114 | human_environ_relat | | |
Appendix 4: Experimental materials for testing the efficiency of the SF-SAI and TF-KAI methods
Rank | SF-SAI semantic units | SF-SAI keywords | TF-KAI unique keywords |
---|---|---|---|
1 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | hydrometeorolog_hazard | cross_applic |
2 | gis; participatori_gis | participatori_gis | debris_laden_slope |
3 | vulner_matrix; vulner; port_vulner; differenti_vulner | vulner_matrix | decis_tree_dt |
4 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | intrins_vulner_map | human_settlement |
5 | flash_flood; flood | flash_flood | multi_criteria_decis_analysi_mcda |
6 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | multi_risk_assess | multicriteria_analysi |
7 | latent_risk; risk; predat_risk; risk_zonat | latent_risk | orograph_forc |
8 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | vulnerabilidad_de_la_infraestructura | river_murray |
9 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | presidenti_disast_declar | seismic_microzon |
10 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | wenchuan_earthquak | parallel_comput |
11 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | istanbul_turkey | support_vector_machin_svm |
12 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | changbai_mountain | area_with_geograph_specif |
13 | urban; urban_geographi; urban_terror; urban_abandon | urban_terror | multi_criteria_method |
14 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | xiangxi_catchment | dewey_john |
15 | estuari; meghna_estuari | meghna_estuari | european_polici |
16 | wildland_fire_risk; fire_risk; forest_fire_risk | wildland_fire_risk | evidence_bas_polici |
17 | hazard_map | hazard_map | gloss_srtm_data |
18 | disast_risk_index; ecolog_disast_risk_index; grassland_snow_disast_risk_index | disast_risk_index | natur_killer_t_cel |
19 | hurrican; hurrican_mitch; hurrican_katrina | hurrican_mitch | orissa_india |
20 | landslid_inventori_map; landslid_map; landslid_suscept_map | landslid_inventori_map | participatori_plan |
21 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | gargano_promontori_southern_itali | white_gilbert |
22 | claim; claim_payout | claim_payout | eman_coeffici |
23 | communic_satellit; satellit_communic | satellit_communic | fire_weather |
24 | cumul_fire_risk_index; fire_risk_index | cumul_fire_risk_index | fuzzi_relat |
25 | geospati | geospati | landslid_hazard_map |
26 | hazard_zone; multi_hazard_zone | multi_hazard_zone | multi_hazard_assess |
27 | land_us | land_us | probabl_model |
28 | offshor_taiwan; onshor_taiwan | onshor_taiwan | radon_mass_exhal_rate |
29 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | guangdong_provinc | environment_justic |
30 | sea_level; sea_level_rise | sea_level_rise | exceed_probabl |
31 | fujita_scale; scale | fujita_scale | landslid_monitor |
32 | climat_chang; climat_chang_adapt | climat_chang_adapt | riverbank_stabil |
33 | social_media; social_geographi; social_disadvantag; social; social_contact | social_media | urban_system |
34 | landsat; landsat_tm | landsat_tm | |
35 | sustain; sustain_tourism | sustain_tourism | |
36 | indic | indic | |
37 | environ; geo_environ | geo_environ | |
38 | dea | dea | |
Cite this article
Hu, K., Wu, H., Qi, K. et al. A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model. Scientometrics 114, 1031–1068 (2018). https://doi.org/10.1007/s11192-017-2574-9
DOI: https://doi.org/10.1007/s11192-017-2574-9