
A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model

Scientometrics

Abstract

In bibliometric research, keyword analysis of publications provides an effective way not only to investigate the knowledge structure of research domains, but also to explore the developing trends within domains. To identify the most representative keywords, many approaches have been proposed. Most of them focus on using statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for the domain analysis. In this paper, we argue that the domain knowledge is reflected by the semantic meanings behind keywords rather than the keywords themselves. We apply the Google Word2Vec model, a model of a word distribution using deep learning, to represent the semantic meanings of the keywords. Based on this work, we propose a new domain knowledge approach, the Semantic Frequency-Semantic Active Index, similar to Term Frequency-Inverse Document Frequency, to link domain and background information and identify infrequent but important keywords. We adopt a semantic similarity measuring process before statistical computation to compute the frequencies of “semantic units” rather than keyword frequencies. Semantic units are generated by word vector clustering, while the Inverse Document Frequency is extended to include the semantic inverse document frequency; thus only words in the inverse documents with a certain similarity will be counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare and discuss the advantages and disadvantages of the proposed method in relation to existing methods, finding that by introducing the semantic meaning of the keywords, our method supports more effective domain knowledge analysis.
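To make the idea of counting "semantic units" rather than individual keywords concrete, the following Python sketch groups keywords whose vectors are close and sums their frequencies. It is an illustration only, not the authors' implementation. The three-dimensional vectors, the 0.9 cosine threshold, the greedy grouping rule, the smoothed inverse-document-frequency formula, the background documents, and all function names are assumptions; a real pipeline would use trained Word2Vec vectors and the paper's own clustering procedure. The keyword counts are taken from the TF column of Appendix 1.

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
VECTORS = {
    "flood": (0.90, 0.10, 0.00),
    "flash_flood": (0.85, 0.20, 0.05),
    "landslid": (0.10, 0.90, 0.10),
    "gis": (0.00, 0.10, 0.95),
}

# Keyword frequencies in the domain corpus (TF column of Appendix 1).
COUNTS = {"flood": 22, "flash_flood": 6, "landslid": 43, "gis": 94}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_semantic_units(keywords, vectors, threshold=0.9):
    """Greedy grouping (an assumption): a keyword joins the first unit whose
    first member is at least `threshold` similar, else it starts a new unit."""
    units = []
    for kw in keywords:
        for unit in units:
            if cosine(vectors[kw], vectors[unit[0]]) >= threshold:
                unit.append(kw)
                break
        else:
            units.append([kw])
    return units


def semantic_frequency(unit, counts):
    """Frequency of a semantic unit: the sum of its members' frequencies."""
    return sum(counts[kw] for kw in unit)


def semantic_idf(unit, background_docs, vectors, threshold=0.9):
    """Smoothed IDF (assumed formula): a background document is counted if it
    holds any keyword similar enough to some member of the unit."""
    def matches(doc):
        return any(
            kw in vectors and cosine(vectors[kw], vectors[member]) >= threshold
            for kw in doc
            for member in unit
        )
    doc_freq = sum(1 for doc in background_docs if matches(doc))
    return math.log((1 + len(background_docs)) / (1 + doc_freq))


units = build_semantic_units(["flood", "flash_flood", "landslid", "gis"], VECTORS)

# Hypothetical background corpus: each document is a list of keywords.
background = [["flood", "gis"], ["landslid"], ["flash_flood"], ["gis"]]

for unit in units:
    sf = semantic_frequency(unit, COUNTS)
    sidf = semantic_idf(unit, background, VECTORS)
    print("; ".join(unit), sf, round(sf * sidf, 3))
```

With these toy inputs, flood and flash_flood fall into one unit with a semantic frequency of 22 + 6 = 28, which matches the value listed for the "flash_flood; flood" unit in Appendix 2. A purely keyword-based count would treat the two separately.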



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 41371372). We thank Mr. Stephen C. McClure for his help with the English revisions.

Author information

Corresponding author

Correspondence to Kai Hu.

Appendices

Appendix 1: Top 99 selected keywords from the TF, TF-IDF and TF-KAI methods (the unique keywords are in bold)

| Rank | TF | Freq. | TF-IDF | Freq. | TF-KAI | Freq. |
|---|---|---|---|---|---|---|
| 1 | gis | 94 | gis | 306 | geograph_inform_system_gis | 5,934,569 |
| 2 | natur_hazard | 64 | geograph_inform_system | 216 | geograph_inform_system | 4,634,698 |
| 3 | geograph_inform_system | 47 | natur_hazard | 158 | gis | 2,444,443 |
| 4 | landslid | 43 | geograph_inform_system_gis | 150 | malaysia | 1,139,113 |
| 5 | vulner | 37 | landslid | 130 | spatial_analysi | 999,469.3 |
| 6 | geograph_inform_system_gis | 27 | remot_sens | 109 | prison | 979,072 |
| 7 | hazard | 26 | vulner | 107 | disast_risk_assess | 925,529 |
| 8 | remot_sens | 26 | natur_disast | 83 | geograph_variat | 925,529 |
| 9 | flood | 22 | risk_assess | 67 | frequenc_ratio | 920,429.7 |
| 10 | natur_disast | 22 | flood | 60 | remot_sens | 790,241.5 |
| 11 | risk | 22 | earthquak | 51 | urban_geolog | 697,015.1 |
| 12 | risk_assess | 20 | landslid_suscept | 51 | analyt_hierarchi_process_ahp | 674,641.8 |
| 13 | disast | 18 | social_vulner | 49 | coastal_vulner_index | 674,641.8 |
| 14 | earthquak | 16 | disast | 46 | gis_model | 674,641.8 |
| 15 | climat_chang | 13 | climat_chang | 45 | pca | 674,641.8 |
| 16 | resili | 11 | malaysia | 44 | activ_learn | 573,675 |
| 17 | landslid_suscept | 10 | resili | 40 | anthropogen_intervent | 573,675 |
| 18 | social_vulner | 10 | bangladesh | 39 | catastroph_theori | 573,675 |
| 19 | drought | 9 | risk | 39 | climat_variat | 573,675 |
| 20 | risk_manag | 9 | drought | 38 | cross_applic | 573,675 |
| 21 | bangladesh | 7 | frequenc_ratio | 38 | cumulonimbus_convect | 573,675 |
| 22 | environ | 7 | logist_regress | 37 | debris_laden_slope | 573,675 |
| 23 | hurrican | 7 | hazard | 35 | decis_tree_dt | 573,675 |
| 24 | logist_regress | 7 | risk_manag | 35 | digit_cartographi | 573,675 |
| 25 | malaysia | 7 | hazard_map | 32 | earth_fractur | 573,675 |
| 26 | adapt | 6 | hurrican | 31 | geo_informat | 573,675 |
| 27 | flash_flood | 6 | flash_flood | 30 | geograph_analysi | 573,675 |
| 28 | frequenc_ratio | 6 | land_us_plan | 30 | geograph_weight_regress | 573,675 |
| 29 | geomorpholog | 6 | mass_movement | 29 | hazard_cours | 573,675 |
| 30 | hazard_map | 6 | topographi | 29 | human_settlement | 573,675 |
| 31 | itali | 6 | ahp | 28 | multi_criteria_decis_analysi_mcda | 573,675 |
| 32 | model | 6 | itali | 28 | multicriteria_analysi | 573,675 |
| 33 | monitor | 6 | seismic_hazard | 28 | orograph_forc | 573,675 |
| 34 | seismic_hazard | 6 | spatial_analysi | 28 | participatori_map | 573,675 |
| 35 | sustain | 6 | sustain_develop | 28 | physic_geographi | 573,675 |
| 36 | ahp | 5 | sea_level_rise | 27 | primari_school | 573,675 |
| 37 | debri_flow | 5 | urban_geolog | 27 | river_murray | 573,675 |
| 38 | disast_manag | 5 | disast_manag | 26 | scenario_model | 573,675 |
| 39 | environment_hazard | 5 | environment_hazard | 26 | seismic_microzon | 573,675 |
| 40 | eros | 5 | land_cover | 26 | torrenti_rainfal | 573,675 |
| 41 | fire | 5 | sustain | 26 | urban_form | 573,675 |
| 42 | indic | 5 | geomorpholog | 25 | casualti | 562,201.5 |
| 43 | insur | 5 | monitor | 25 | parallel_comput | 562,201.5 |
| 44 | land_us_plan | 5 | analyt_hierarchi_process | 24 | support_vector_machin_svm | 562,201.5 |
| 45 | mass_movement | 5 | artifici_neural_network | 24 | area_with_geograph_specif | 489,536 |
| 46 | risk_percept | 5 | bushfir | 24 | multi_criteria_method | 489,536 |
| 47 | sea_level_rise | 5 | coastal_manag | 24 | dewey_john | 489,536 |
| 48 | suscept | 5 | flood_disast | 24 | european_polici | 489,536 |
| 49 | sustain_develop | 5 | land_use_plan | 23 | evidence_bas_polici | 489,536 |
| 50 | topographi | 5 | debri_flow | 22 | gloss_srtm_data | 489,536 |
| 51 | tsunami | 5 | disast_risk_assess | 22 | natur_killert_cel | 489,536 |
| 52 | urban | 5 | geograph_variat | 22 | orissa_india | 489,536 |
| 53 | analyt_hierarchi_process | 4 | insur | 22 | participatori_plan | 489,536 |
| 54 | artifici_neural_network | 4 | slope_stabil | 22 | spatial_heterogen | 489,536 |
| 55 | bushfir | 4 | adapt | 21 | spatial_homogen | 489,536 |
| 56 | china | 4 | analyt_hierarchi_process_ahp | 21 | white_gilbert | 489,536 |
| 57 | coastal_manag | 4 | casualti | 21 | land_cover | 470,065.8 |
| 58 | flood_disast | 4 | coastal_vulner_index | 21 | landslid_suscept | 452,160.2 |
| 59 | flood_hazard | 4 | environ | 21 | flood_suscept | 437,085.7 |
| 60 | groundwat | 4 | gis_model | 21 | global_posit_system | 437,085.7 |
| 61 | himalaya | 4 | himalaya | 21 | analyt_hierarch_process | 430,256.3 |
| 62 | infrastructur | 4 | iran | 21 | crowdsourc | 430,256.3 |
| 63 | iran | 4 | parallel_comput | 21 | decis_support_system_dss | 430,256.3 |
| 64 | land_cover | 4 | pca | 21 | eman_coeffici | 430,256.3 |
| 65 | land_use_plan | 4 | risk_percept | 21 | fire_weather | 430,256.3 |
| 66 | rainfal | 4 | support_vector_machin_svm | 21 | flood_prepared | 430,256.3 |
| 67 | rockfal | 4 | suscept | 21 | fuzzi_relat | 430,256.3 |
| 68 | slope_stabil | 4 | tsunami | 21 | human_environ_relat | 430,256.3 |
| 69 | spatial_analysi | 4 | environment_justic | 20 | landslid_hazard_map | 430,256.3 |
| 70 | turkey | 4 | flood_hazard | 20 | multi_hazard_assess | 430,256.3 |
| 71 | uncertainti | 4 | flood_suscept | 20 | probabl_model | 430,256.3 |
| 72 | urban_geolog | 4 | global_posit_system | 20 | radon_mass_exhal_rate | 430,256.3 |
| 73 | wildfir | 4 | hazus | 20 | twitter | 430,256.3 |
| 74 | analyt_hierarchi_process_ahp | 3 | iczm | 20 | vancouv | 430,256.3 |
| 75 | caribbean | 3 | indic | 20 | volunt_geograph_inform | 430,256.3 |
| 76 | casualti | 3 | natur_disturb | 20 | bangladesh | 401,176.9 |
| 77 | climat_chang_adapt | 3 | rockfal | 20 | land_us_plan | 382,450 |
| 78 | coastal_eros | 3 | serbia | 20 | landslid | 353,191.5 |
| 79 | coastal_vulner_index | 3 | turkey | 20 | social_vulner | 346,514.1 |
| 80 | damag | 3 | environment_chang | 19 | environment_justic | 339,955.6 |
| 81 | databas | 3 | estuari | 19 | hazus | 339,955.6 |
| 82 | dea | 3 | fire | 19 | iczm | 339,955.6 |
| 83 | decis_support_system | 3 | infrastructur | 19 | natur_disturb | 339,955.6 |
| 84 | disast_mitig | 3 | ndvi | 19 | serbia | 339,955.6 |
| 85 | disast_risk_assess | 3 | photogrammetri | 19 | natur_disast | 311,798.6 |
| 86 | ecosystem_servic | 3 | rainfal | 19 | logist_regress | 307,984.7 |
| 87 | emerg_manag | 3 | urban | 19 | mass_movement | 306,324.2 |
| 88 | emerg_respons | 3 | china | 18 | topographi | 306,324.2 |
| 89 | environment_chang | 3 | eros | 18 | dea_model | 299,840.8 |
| 90 | environment_justic | 3 | groundwat | 18 | exceed_probabl | 299,840.8 |
| 91 | epidemiolog | 3 | peru | 18 | landslid_monitor | 299,840.8 |
| 92 | estuari | 3 | uncertainti | 18 | riverbank_stabil | 299,840.8 |
| 93 | exposur | 3 | wildfir | 18 | urban_system | 299,840.8 |
| 94 | flood_suscept | 3 | caribbean | 17 | natur_hazard | 297,429.3 |
| 95 | gender | 3 | coastal_eros | 17 | artifici_neural_network | 293,721.6 |
| 96 | geograph_variat | 3 | decis_support_system | 17 | bushfir | 293,721.6 |
| 97 | gis_model | 3 | disast_mitig | 17 | coastal_manag | 293,721.6 |
| 98 | global_posit_system | 3 | ecosystem_servic | 17 | photogrammetri | 276,128.9 |
| 99 | hazus | 3 | liquefact | 17 | analyt_hierarchi_process | 259,166.1 |

  1. All keywords were stemmed to reduce them to their base linguistic forms.

Appendix 2: Top 99 semantic units in the semantic based results (SF, SF-SIDF, SF-SAI)

| Rank | SF | Freq. | SF-SIDF | Freq. | SF-SAI | Freq. |
|---|---|---|---|---|---|---|
| 1 | gis; participatori_gis | 95 | gis; participatori_gis | 350.5554 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | 8,329,761 |
| 2 | geograph_inform_system; geograph_inform_system_gis | 74 | geograph_inform_system; geograph_inform_system_gis | 341.2582 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 3,059,600 |
| 3 | natur_hazard | 64 | natur_hazard | 163.5489 | geograph_inform_system; geograph_inform_system_gis | 551,130.6 |
| 4 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 52 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 171.622 | gis; participatori_gis | 361,425.3 |
| 5 | landslid; landslid_inventori | 45 | landslid; landslid_inventori | 151.6579 | socioeconom_vulner; socio_demograph_vulner; differenti_vulner; vulner; social_vulner; port_vulner | 73,343.6 |
| 6 | vulner_matrix; vulner; port_vulner; differenti_vulner | 40 | vulner_matrix; vulner; port_vulner; differenti_vulner | 138.9708 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 65,562.86 |
| 7 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | 33 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | 295.0969 | landslid; landslid_inventori | 58,894.39 |
| 8 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 30 | hyperspectr_remot_sens; remot_sens_rs; remot_sens; satellit_remot_sens | 128.6511 | natur_hazard | 52,744.62 |
| 9 | flash_flood; flood | 28 | flash_flood; flood | 93.62943 | vulner_matrix; vulner; port_vulner; differenti_vulner | 51,638.82 |
| 10 | latent_risk; risk; predat_risk; risk_zonat | 25 | latent_risk; risk; predat_risk; risk_zonat | 86.54228 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 51,418.28 |
| 11 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | 24 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | 85.94491 | frequenc_ratio; likelihood_frequenc_ratio | 31,233.42 |
| 12 | natur_disast; natur_disast_prepared | 23 | natur_disast; natur_disast_prepared | 90.27616 | landslid_suscept; landslid_suscept_ls | 28,922.78 |
| 13 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | 22 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | 77.67505 | malaysia | 28,830.85 |
| 14 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | 22 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | 78.47514 | natur_disast; natur_disast_prepared | 26,796.83 |
| 15 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 20 | car_model; multiscal_model; dea_model; loglinear_model; model; ensembl_model; nois_model; traffic_model; model_chain; hidden_markov_model; mathemat_model; geochem_model; bayesian_hierarch_model; probabilist_model | 178.8466 | geograph_inform; volunt_geograph_inform | 24,476.8 |
| 16 | climat_chang; climat_chang_adapt | 16 | climat_chang; climat_chang_adapt | 56.49095 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | 24,476.8 |
| 17 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 11 | analyt_hierarch_process; analyt_hierarchi_process; analyt_hierarch_process_ahp; analyt_hierarchi_process_ahp; fuzzi_analyt_hierarchi_process_fahp | 66.57154 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 22,947 |
| 18 | landslid_suscept; landslid_suscept_ls | 11 | landslid_suscept; landslid_suscept_ls | 60.24254 | flash_flood; flood | 22,210.43 |
| 19 | resili | 11 | resili | 44.24584 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | 20,684.62 |
| 20 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | 10 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | 46.51871 | land_cover | 20,397.33 |
| 21 | hurrican; hurrican_mitch; hurrican_katrina | 10 | hurrican; hurrican_mitch; hurrican_katrina | 46.65664 | latent_risk; risk; predat_risk; risk_zonat | 19,919.27 |
| 22 | optim_risk_manag; risk_manag | 10 | optim_risk_manag; risk_manag | 42.97939 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | 17,483.43 |
| 23 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | 9 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | 47.05882 | casualti | 17,210.25 |
| 24 | drought | 9 | drought | 41.04273 | disast_risk_assess | 17,210.25 |
| 25 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 9 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 45.64016 | geograph_variat | 17,210.25 |
| 26 | environ; geo_environ | 8 | environ; geo_environ | 38.65165 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | 17,139.43 |
| 27 | urban; urban_geographi; urban_terror; urban_abandon | 8 | urban; urban_geographi; urban_terror; urban_abandon | 43.32776 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | 16,527.3 |
| 28 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | 7 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | 40.06418 | land_plan; land_us_plan | 15,298 |
| 29 | bangladesh | 7 | bangladesh | 39.52545 | hazus; hazus_mh | 15,298 |
| 30 | bayesian_analysi; multicriteria_analysi; spatiotempor_analysi; discours_analysi; meta_analysi; 3d_visual_analysi | 7 | bayesian_analysi; multicriteria_analysi; spatiotempor_analysi; discours_analysi; meta_analysi; 3d_visual_analysi | 20.23169 | urban_geolog | 15,298 |
| 31 | binghamton_geomorpholog_symposium; geomorpholog | 7 | binghamton_geomorpholog_symposium; geomorpholog | 35.07353 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | 15,111.44 |
| 32 | frequenc_ratio; likelihood_frequenc_ratio | 7 | frequenc_ratio; likelihood_frequenc_ratio | 45.20196 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | 14,992.04 |
| 33 | logist_regress | 7 | logist_regress | 35.4979 | topographi | 14,709.62 |
| 34 | malaysia | 7 | malaysia | 44.64167 | urban; urban_geographi; urban_terror; urban_abandon | 14,398.12 |
| 35 | northeast_china; fujian_china; china; northern_china | 7 | northeast_china; fujian_china; china; northern_china | 33.05976 | bangladesh | 13,881.52 |
| 36 | semi_natur_habitat; natur_disturb; natur_geohazard; natur_calam; natur_conserv | 7 | semi_natur_habitat; natur_disturb; natur_geohazard; natur_calam; natur_conserv | 34.80427 | rainfal; torrenti_rainfal | 13,768.2 |
| 37 | sustain; sustain_tourism | 7 | sustain; sustain_tourism | 36.10698 | gis_model | 13,768.2 |
| 38 | artifici_neural_network; neural_network | 6 | artifici_neural_network; neural_network | 33.45021 | pca | 13,768.2 |
| 39 | asset_manag; habitat_manag; stock_manag; wildlif_manag; self_manag; manag | 6 | asset_manag; habitat_manag; stock_manag; wildlif_manag; self_manag; manag | 29.60998 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | 13,598.22 |
| 40 | big_data; data_fusion; multisensor_data_fusion; srtm_data; lidar_data; administr_data | 6 | big_data; data_fusion; multisensor_data_fusion; srtm_data; lidar_data; administr_data | 31.5207 | natur_disturb_regim; natur_disturb | 13,598.22 |
| 41 | cardiovascular_diseas; coronari_diseas; diarrheal_diseas; hiv_diseas_progress; diseas_progress; coronari_heart_diseas | 6 | cardiovascular_diseas; coronari_diseas; diarrheal_diseas; hiv_diseas_progress; diseas_progress; coronari_heart_diseas | 15.01788 | geotechn_microzon_map; neotecton_map; krige_map; participatori_map; map; geomorpholog_map | 12,907.69 |
| 42 | fire; grassland_fire | 6 | fire; grassland_fire | 28.70068 | estuari; meghna_estuari | 12,238.4 |
| 43 | hazard_map | 6 | hazard_map | 34.84102 | wildland_fire_risk; fire_risk; forest_fire_risk | 12,238.4 |
| 44 | land_plan; land_us_plan | 6 | land_plan; land_us_plan | 36.31175 | hazard_map | 11,972.35 |
| 45 | monitor | 6 | monitor | 31.5207 | coastal_vulner_index | 11,473.5 |
| 46 | rainfal; torrenti_rainfal | 6 | rainfal; torrenti_rainfal | 35.67959 | dea | 11,473.5 |
| 47 | sea_level; sea_level_rise | 6 | sea_level; sea_level_rise | 33.45021 | disast_risk_index; ecolog_disast_risk_index; grassland_snow_disast_risk_index | 11,473.5 |
| 48 | seismic_hazard | 6 | seismic_hazard | 30.30306 | coastal_manag; urban_manag | 11,248.53 |
| 49 | slope; slope_movement; slope_stabil | 6 | slope; slope_movement; slope_stabil | 30.81401 | hurrican; hurrican_mitch; hurrican_katrina | 10,623.61 |
| 50 | social_media; social_geographi; social_disadvantag; social; social_contact | 6 | social_media; social_geographi; social_disadvantag; social; social_contact | 32.85957 | landslid_inventori_map; landslid_map; landslid_suscept_map | 10,623.61 |
| 51 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 6 | spatial_analysi; spatial_multi_criteria_analysi; tempor_analysi | 38.74454 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | 10,478.08 |
| 52 | arctic_region; region; himalayan_region; arid_region; mediterranean_region | 5 | arctic_region; region; himalayan_region; arid_region; mediterranean_region | 27.70566 | spatial; spatial_smooth; spatial_heterogen; spatial_homogen | 10,198.67 |
| 53 | coastal_manag; urban_manag | 5 | coastal_manag; urban_manag | 30.54558 | activ_learn | 10,198.67 |
| 54 | debri_flow | 5 | debri_flow | 22.99262 | anthropogen_intervent | 10,198.67 |
| 55 | decis_support_system; decis_support_system_dss | 5 | decis_support_system; decis_support_system_dss | 29.25644 | catastroph_theori | 10,198.67 |
| 56 | disast_manag | 5 | disast_manag | 27.22911 | claim; claim_payout | 10,198.67 |
| 57 | eros | 5 | eros | 25.67834 | climat_variat | 10,198.67 |
| 58 | flash_flood_hazard; flood_hazard | 5 | flash_flood_hazard; flood_hazard | 27.70566 | communic_satellit; satellit_communic | 10,198.67 |
| 59 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | 5 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | 29.73299 | crowdsourc | 10,198.67 |
| 60 | himalaya; kashmir_himalaya | 5 | himalaya; kashmir_himalaya | 27.70566 | cumul_fire_risk_index; fire_risk_index | 10,198.67 |
| 61 | indic | 5 | indic | 29.03418 | cumulonimbus_convect | 10,198.67 |
| 62 | insur | 5 | insur | 25.56844 | earth_fractur | 10,198.67 |
| 63 | landslid_inventori_map; landslid_map; landslid_suscept_map | 5 | landslid_inventori_map; landslid_map; landslid_suscept_map | 30.25979 | flood_prepared | 10,198.67 |
| 64 | mass_movement | 5 | mass_movement | 29.98946 | geo_informat | 10,198.67 |
| 65 | po_river; river_murray; river; yangtz_river_delta | 5 | po_river; river_murray; river; yangtz_river_delta | 23.54112 | geograph_analysi | 10,198.67 |
| 66 | risk_percept | 5 | risk_percept | 21.88991 | geograph_weight_regress | 10,198.67 |
| 67 | sustain_develop | 5 | sustain_develop | 28.61727 | geospati | 10,198.67 |
| 68 | topographi | 5 | topographi | 31.8869 | hazard_cours | 10,198.67 |
| 69 | tsunami | 5 | tsunami | 21.94227 | hazard_zone; multi_hazard_zone | 10,198.67 |
| 70 | alp; apuan_alp; swiss_alp | 4 | alp; apuan_alp; swiss_alp | 23.05711 | human_environ_relat | 10,198.67 |
| 71 | bushfir | 4 | bushfir | 25.21309 | land_us | 10,198.67 |
| 72 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | 4 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | 26.98042 | offshor_taiwan; onshor_taiwan | 10,198.67 |
| 73 | central_america; central_greec; central_liaon_provinc | 4 | central_america; central_greec; central_liaon_provinc | 21.54793 | physic_geographi | 10,198.67 |
| 74 | coastal_area; coastal_urban_area; urban_area | 4 | coastal_area; coastal_urban_area; urban_area | 24.20783 | primari_school | 10,198.67 |
| 75 | cultur; cultur_geographi; cultur_heritag | 4 | cultur; cultur_geographi; cultur_heritag | 23.78639 | prison | 10,198.67 |
| 76 | eco_friend_method; interdisciplinari_method; geostatist_method; semi_qualit_method | 4 | eco_friend_method; interdisciplinari_method; geostatist_method; semi_qualit_method | 21.0138 | scenario_model | 10,198.67 |
| 77 | environment_anthropolog; environment_justic | 4 | environment_anthropolog; environment_justic | 22.30014 | twitter | 10,198.67 |
| 78 | estuari; meghna_estuari | 4 | estuari; meghna_estuari | 26.55898 | urban_form | 10,198.67 |
| 79 | flood_disast | 4 | flood_disast | 24.43647 | vancouv | 10,198.67 |
| 80 | forest; minudasht_forest; oak_forest | 4 | forest; minudasht_forest; oak_forest | 22.73693 | mass_movement | 10,064.47 |
| 81 | fujita_scale; scale | 4 | fujita_scale; scale | 25.50952 | digit; digit_cartographi | 9,834.429 |
| 82 | geograph_inform; volunt_geograph_inform | 4 | geograph_inform; volunt_geograph_inform | 29.33157 | flood_suscept | 9,834.429 |
| 83 | geolog_hazard; geomorpholog_hazard | 4 | geolog_hazard; geomorpholog_hazard | 22.4405 | global_posit_system | 9,834.429 |
| 84 | groundwat | 4 | groundwat | 20.12123 | iczm | 9,834.429 |
| 85 | hazus; hazus_mh | 4 | hazus; hazus_mh | 27.45155 | photogrammetri | 9,834.429 |
| 86 | human_capit; human_settlement; human_biomonitor | 4 | human_capit; human_settlement; human_biomonitor | 23.22734 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | 9,561.25 |
| 87 | infrastructur | 4 | infrastructur | 24.20783 | artifici_neural_network; neural_network | 9,495.31 |
| 88 | integr_vulner_assess; vulner_assess | 4 | integr_vulner_assess; vulner_assess | 22.73693 | sea_level; sea_level_rise | 9,495.31 |
| 89 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | 4 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | 29.33157 | fujita_scale; scale | 9,414.154 |
| 90 | iran | 4 | iran | 21.43525 | climat_chang; climat_chang_adapt | 8,741.714 |
| 91 | land_cover | 4 | land_cover | 28.60228 | bushfir | 8,741.714 |
| 92 | land_use_plan | 4 | land_use_plan | 23.05711 | decis_support_system; decis_support_system_dss | 8,692.045 |
| 93 | natur_break; natur_geohazard; natur_calam; socio_natur | 4 | natur_break; natur_geohazard; natur_calam; socio_natur | 18.44639 | social_media; social_geographi; social_disadvantag; social; social_contact | 8,605.125 |
| 94 | natur_disturb_regim; natur_disturb | 4 | natur_disturb_regim; natur_disturb | 26.98042 | landsat; landsat_tm | 8,605.125 |
| 95 | nival_process; poisson_process; torrenti_process; geochem_process | 4 | nival_process; poisson_process; torrenti_process; geochem_process | 20.81864 | serbia | 8,605.125 |
| 96 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | 4 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | 27.98568 | sustain; sustain_tourism | 8,518.205 |
| 97 | rockfal | 4 | rockfal | 20.63256 | indic | 8,314.13 |
| 98 | satellit_imag; satellit_imageri; satellit_imag_classif | 4 | satellit_imag; satellit_imageri; satellit_imag_classif | 22.58597 | environ; geo_environ | 8,025.18 |
| 99 | soil; soil_termiticid; soil_suction | 4 | soil; soil_termiticid; soil_suction | 16.68658 | logist_regress | 7,808.354 |

  1. All keywords were stemmed to reduce them to their base linguistic forms.

Appendix 3: All 137 keywords including 33 unique TF-KAI keywords, 38 unique SF-SAI keywords, and 66 overlapping keywords

| Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword | Id | Keyword |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cross_applic | 23 | fuzzi_relat | 46 | xiangxi_catchment | 69 | geo_environ | 92 | earth_fractur | 115 | twitter |
| 1 | debris_laden_slope | 24 | landslid_hazard_map | 47 | meghna_estuari | 70 | dea | 93 | geo_informat | 116 | vancouv |
| 2 | decis_tree_dt | 25 | multi_hazard_assess | 48 | wildland_fire_risk | 71 | geograph_inform_system_gis | 94 | geograph_analysi | 117 | volunt_geograph_inform |
| 3 | human_settlement | 26 | probabl_model | 49 | hazard_map | 72 | geograph_inform_system | 95 | geograph_weight_regress | 118 | bangladesh |
| 4 | multi_criteria_decis_analysi_mcda | 27 | radon_mass_exhal_rate | 50 | disast_risk_index | 73 | gis | 96 | hazard_cours | 119 | land_us_plan |
| 5 | multicriteria_analysi | 28 | environment_justic | 51 | hurrican_mitch | 74 | malaysia | 97 | participatori_map | 120 | landslid |
| 6 | orograph_forc | 29 | exceed_probabl | 52 | landslid_inventori_map | 75 | spatial_analysi | 98 | physic_geographi | 121 | social_vulner |
| 7 | river_murray | 30 | landslid_monitor | 53 | gargano_promontori_southern_itali | 76 | prison | 99 | primari_school | 122 | hazus |
| 8 | seismic_microzon | 31 | riverbank_stabil | 54 | claim_payout | 77 | disast_risk_assess | 100 | scenario_model | 123 | iczm |
| 9 | parallel_comput | 32 | urban_system | 55 | satellit_communic | 78 | geograph_variat | 101 | torrenti_rainfal | 124 | natur_disturb |
| 10 | support_vector_machin_svm | 33 | hydrometeorolog_hazard | 56 | cumul_fire_risk_index | 79 | frequenc_ratio | 102 | urban_form | 125 | serbia |
| 11 | area_with_geograph_specif | 34 | participatori_gis | 57 | geospati | 80 | remot_sens | 103 | casualti | 126 | natur_disast |
| 12 | multi_criteria_method | 35 | vulner_matrix | 58 | multi_hazard_zone | 81 | urban_geolog | 104 | spatial_heterogen | 127 | logist_regress |
| 13 | dewey_john | 36 | intrins_vulner_ma | 59 | land_us | 82 | analyt_hierarchi_process_ahp | 105 | spatial_homogen | 128 | mass_movement |
| 14 | european_polici | 37 | flash_flood | 60 | onshor_taiwan | 83 | coastal_vulner_index | 106 | land_cover | 129 | topographi |
| 15 | evidence_bas_polici | 38 | multi_risk_assess | 61 | guangdong_provinc | 84 | gis_model | 107 | landslid_suscept | 130 | dea_model |
| 16 | gloss_srtm_data | 39 | latent_risk | 62 | sea_level_rise | 85 | pca | 108 | flood_suscept | 131 | natur_hazard |
| 17 | natur_killer_t_cel | 40 | vulnerabilidad_de_la_infraestructura | 63 | fujita_scale | 86 | activ_learn | 109 | global_posit_system | 132 | artifici_neural_network |
| 18 | orissa_india | 41 | presidenti_disast_declar | 64 | climat_chang_adapt | 87 | anthropogen_intervent | 110 | analyt_hierarch_process | 133 | bushfir |
| 19 | participatori_plan | 42 | wenchuan_earthquak | 65 | social_media | 88 | catastroph_theori | 111 | crowdsourc | 134 | coastal_manag |
| 20 | white_gilbert | 43 | istanbul_turkey | 66 | landsat_tm | 89 | climat_variat | 112 | decis_support_system_dss | 135 | photogrammetri |
| 21 | eman_coeffici | 44 | changbai_mountain | 67 | sustain_tourism | 90 | cumulonimbus_convect | 113 | flood_prepared | 136 | analyt_hierarchi_process |
| 22 | fire_weather | 45 | urban_terror | 68 | indic | 91 | digit_cartographi | 114 | human_environ_relat | | |
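
The three groups named in the appendix title partition the keyword list: a keyword is selected by TF-KAI only, by SF-SAI only, or by both. The sketch below illustrates that bookkeeping with set operations on a few keywords taken from the appendices. The variable names are hypothetical, and the membership of the overlapping sample (`gis`, `landslid`, `twitter`) is inferred by elimination, since those keywords appear in Appendix 3 but not in either unique-keyword column of Appendix 4.

```python
# Small samples copied from Appendices 3 and 4; the full groups contain
# 33, 38, and 66 keywords. Variable names are illustrative only.
tf_kai_unique = {"cross_applic", "debris_laden_slope", "decis_tree_dt"}
sf_sai_unique = {"hydrometeorolog_hazard", "participatori_gis", "vulner_matrix"}
overlapping = {"gis", "landslid", "twitter"}

# Each method's selection is its unique keywords plus the shared ones.
tf_kai_selected = tf_kai_unique | overlapping
sf_sai_selected = sf_sai_unique | overlapping

# Set difference and intersection recover the three groups again.
assert tf_kai_selected - sf_sai_selected == tf_kai_unique
assert sf_sai_selected - tf_kai_selected == sf_sai_unique
assert tf_kai_selected & sf_sai_selected == overlapping

# The same arithmetic with the full counts from the appendix title.
n_tf_unique, n_sf_unique, n_overlap = 33, 38, 66
total = n_tf_unique + n_sf_unique + n_overlap
print(total)  # 137, matching Ids 0 to 136
```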

  

Appendix 4: Experimental materials for testing the efficiency of the SF-SAI and TF-KAI methods

| Rank | SF-SAI semantic units | SF-SAI keywords | TF-KAI unique keywords |
|---|---|---|---|
| 1 | fluvial_hazard; hydrometeorolog_hazard; hazard_geographi; geoenvironment_hazard; multi_hazard; hazard_ontolog; hazard_informat; hazard | hydrometeorolog_hazard | cross_applic |
| 2 | gis; participatori_gis | participatori_gis | debris_laden_slope |
| 3 | vulner_matrix; vulner; port_vulner; differenti_vulner | vulner_matrix | decis_tree_dt |
| 4 | intrins_vulner_map; map_vulner; vulner_map; specif_vulner_map | intrins_vulner_ma | human_settlement |
| 5 | flash_flood; flood | flash_flood | multi_criteria_decis_analysi_mcda |
| 6 | multi_risk_assess; risk_assess; participatori_risk_assess; ecolog_risk_assess; individu_risk_assess | multi_risk_assess | multicriteria_analysi |
| 7 | latent_risk; risk; predat_risk; risk_zonat | latent_risk | orograph_forc |
| 8 | nord_pas_de_calai; vulnerabilidad_de_las_instalacion_critica; santiago_de_chile; vulnerabilidad_de_la_infraestructura | vulnerabilidad_de_la_infraestructura | river_murray |
| 9 | presidenti_disast_declar; manmad_disast; disast; post_disast_reconstruct; disast_prepared | presidenti_disast_declar | seismic_microzon |
| 10 | chamoli_earthquak; earthquak; haiti_earthquak; chi_chi_earthquak; wenchuan_earthquak; devast_earthquak | wenchuan_earthquak | parallel_comput |
| 11 | ayvalik_turkey; turkey; findik_turkey; egirdir_turkey; yenisehir_turkey; istanbul_turkey | istanbul_turkey | support_vector_machin_svm |
| 12 | alborz_mountain; fagara_mountain; apuseni_mountain; changbai_mountain; mountain | changbai_mountain | area_with_geograph_specif |
| 13 | urban; urban_geographi; urban_terror; urban_abandon | urban_terror | multi_criteria_method |
| 14 | catchment; saratel_catchment; xiangxi_catchment; loess_catchment | xiangxi_catchment | dewey_john |
| 15 | estuari; meghna_estuari | meghna_estuari | european_polici |
| 16 | wildland_fire_risk; fire_risk; forest_fire_risk | wildland_fire_risk | evidence_bas_polici |
| 17 | hazard_map | hazard_map | gloss_srtm_data |
| 18 | disast_risk_index; ecolog_disast_risk_index; grassland_snow_disast_risk_index | disast_risk_index | natur_killer_t_cel |
| 19 | hurrican; hurrican_mitch; hurrican_katrina | hurrican_mitch | orissa_india |
| 20 | landslid_inventori_map; landslid_map; landslid_suscept_map | landslid_inventori_map | participatori_plan |
| 21 | gargano_promontori_southern_itali; southern_itali; itali; western_jilin_provinc; central_liaon_provinc | gargano_promontori_southern_itali | white_gilbert |
| 22 | claim; claim_payout | claim_payout | eman_coeffici |
| 23 | communic_satellit; satellit_communic | satellit_communic | fire_weather |
| 24 | cumul_fire_risk_index; fire_risk_index | cumul_fire_risk_index | fuzzi_relat |
| 25 | geospati | geospati | landslid_hazard_map |
| 26 | hazard_zone; multi_hazard_zone | multi_hazard_zone | multi_hazard_assess |
| 27 | land_us | land_us | probabl_model |
| 28 | offshor_taiwan; onshor_taiwan | onshor_taiwan | radon_mass_exhal_rate |
| 29 | guangdong_provinc; shandong_provinc; hebei_provinc; jilin_provinc; western_jilin_provinc | guangdong_provinc | environment_justic |
| 30 | sea_level; sea_level_rise | sea_level_rise | exceed_probabl |
| 31 | fujita_scale; scale | fujita_scale | landslid_monitor |
| 32 | climat_chang; climat_chang_adapt | climat_chang_adapt | riverbank_stabil |
| 33 | social_media; social_geographi; social_disadvantag; social; social_contact | social_media | urban_system |
| 34 | landsat; landsat_tm | landsat_tm | |
| 35 | sustain; sustain_tourism | sustain_tourism | |
| 36 | indic | indic | |
| 37 | environ; geo_environ | geo_environ | |
| 38 | dea | dea | |

  1. All keywords have been stemmed to reduce them to their basic linguistic forms.
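
As a rough illustration of how the semantic units in Appendix 4 could be used, the sketch below builds a reverse index from stemmed keywords to the rank of their unit and then counts occurrences per unit rather than per keyword. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `semantic_units`, `keyword_to_rank`, and `unit_frequencies` are hypothetical, only three units are copied from the table, and the `sample` keyword list is invented for the example and is not data from the study.

```python
from collections import Counter

# A few semantic units copied from Appendix 4:
# rank -> (member keywords, SF-SAI keyword listed for the unit).
semantic_units = {
    2: (["gis", "participatori_gis"], "participatori_gis"),
    5: (["flash_flood", "flood"], "flash_flood"),
    15: (["estuari", "meghna_estuari"], "meghna_estuari"),
}

# Reverse index: stemmed keyword -> rank of the unit that contains it.
keyword_to_rank = {
    member: rank
    for rank, (members, _keyword) in semantic_units.items()
    for member in members
}


def unit_frequencies(keywords):
    """Count keyword occurrences per semantic unit, skipping unknown keywords."""
    return Counter(keyword_to_rank[k] for k in keywords if k in keyword_to_rank)


# Hypothetical stemmed author keywords (invented for illustration).
sample = ["gis", "flood", "participatori_gis", "flash_flood", "gis", "tsunami"]
freq = unit_frequencies(sample)
for rank, count in sorted(freq.items()):
    print(rank, semantic_units[rank][1], count)
```

The reverse index assumes each keyword belongs to exactly one unit. In the full table, `western_jilin_provinc` appears in the units at ranks 21 and 29, so a complete version would need to map a keyword to a list of ranks.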

About this article

Cite this article

Hu, K., Wu, H., Qi, K. et al. A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model. Scientometrics 114, 1031–1068 (2018). https://doi.org/10.1007/s11192-017-2574-9

  • DOI: https://doi.org/10.1007/s11192-017-2574-9

Keywords