Study of violence against women and its characteristics through the application of text mining techniques

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

The Internet provides a wide variety of information that can be collected and studied, creating a massive data repository. Among these data are articles about Violence against Women (VAW) published in the digital press, which are of great societal interest. In this work, we used Web scraping techniques to gather VAW-related news from the Internet and applied Text Mining techniques to study VAW and its characteristics. Our work comprises an exploratory analysis and the application of Topic Modelling to VAW events to identify latent topics and their semantic structures. We also applied classification algorithms to a set of VAW press articles to determine the type of violence they refer to, namely physical, psychological, sexual, or a combination of these. We proposed two methodologies to label the data: the first is based on dictionaries of VAW types, while the second extends the first by using the predominant type of violence to identify other associated types. Furthermore, we implemented two feature selection techniques: TF-IDF and \(\chi^{2}\). We then applied Support Vector Machine, Decision Tree, Bayesian Networks, XGBoost Classifier, Random Forest, and Artificial Neural Networks. The results showed that the classifiers performed better when using \(\chi^{2}\). The XGBoost Classifier demonstrated the best performance, followed by Random Forest.
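To make the dictionary-based labelling and the chi-square scoring mentioned in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the keyword dictionaries, the sample sentences, and the function names `label_article` and `chi_square` are all hypothetical and chosen only for illustration. The chi-square score uses the standard 2x2 contingency form, \(\chi^{2} = N(AD - BC)^{2} / ((A+B)(C+D)(A+C)(B+D))\), where A counts documents containing the term in the target class, B those containing it outside the class, C those in the class without the term, and D the rest.

```python
import re

# Hypothetical keyword dictionaries, one per violence type.
# The dictionaries actually used in the paper are not reproduced here.
VAW_DICTIONARIES = {
    "physical": {"beat", "hit", "stabbed", "injured"},
    "psychological": {"threatened", "insulted", "harassed", "humiliated"},
    "sexual": {"rape", "raped", "abused", "assault"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_article(text, dictionaries=VAW_DICTIONARIES):
    """Return the sorted violence types whose dictionary matches a token."""
    tokens = set(tokenize(text))
    return sorted(kind for kind, words in dictionaries.items() if tokens & words)


def chi_square(term, target, docs, labels):
    """Chi-square statistic of the 2x2 table (term present) x (target label)."""
    a = b = c = d = 0
    for text, doc_labels in zip(docs, labels):
        has_term = term in set(tokenize(text))
        in_class = target in doc_labels
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means the term or the class is constant: no signal.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Invented sample sentences standing in for scraped press articles.
docs = [
    "He hit her and she was injured",
    "He threatened and insulted his partner",
    "The woman was raped and injured",
    "She was harassed at work",
]
labels = [label_article(doc) for doc in docs]
score = chi_square("injured", "physical", docs, labels)
```

In this toy corpus the third sentence receives two labels, which mirrors the "combination" case described above, and terms with a higher chi-square score for a class would be the ones retained as features for the classifiers.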

Availability of data and materials

To access the item, go to https://doi.org/10.6084/m9.figshare.21252987.v1.

Abbreviations

ANN: Artificial neural networks
BC: Boost classifier
BN: Bayesian networks
DT: Decision tree
VAW: Violence against women
LDA: Latent Dirichlet allocation
NLP: Natural language processing
RF: Random forest
SVM: Support vector machine

Funding

We acknowledge financial support from the Ministerio de Ciencia e Innovación (Spain) (Research Project PID2020-112495RB-C21) and the I+D+i FEDER 2020 project B-TIC-42-UGR20.

Author information

Contributions

All authors have contributed equally to this work.

Corresponding author

Correspondence to M. C. Pegalajar.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Stephanie, E.M.A., Ruiz, L.G.B., Vila, M.A. et al. Study of violence against women and its characteristics through the application of text mining techniques. Int J Data Sci Anal 18, 35–48 (2024). https://doi.org/10.1007/s41060-023-00448-y

  • DOI: https://doi.org/10.1007/s41060-023-00448-y

Keywords