Abstract
Quantitative ethnography approaches are often used to analyze large-scale qualitative data. Manually coding such data is expensive and time-consuming, if not impractical or impossible. In contrast, machine learning algorithms can code virtually unlimited amounts of data once a model has been created. However, machine learning approaches lack transparency and rely on large amounts of training data. An alternative automated coding approach using regular expressions minimizes the required training data while providing transparency. However, manually creating regular expressions during the coding process can be very challenging for many researchers. One potential solution to this challenge is automatic regular expression generation. Unfortunately, existing algorithms all rely on large pre-coded training data, which is often unavailable in quantitative ethnography tasks. In this paper, we present a lightweight, interactive algorithm that actively constructs regular expression-based coding classifiers together with the researcher. We use a simulation on an education dataset to show that the proposed algorithm is promising.
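To make the idea of a regular expression-based coding classifier concrete, the following minimal sketch shows one common way such a classifier can work: a code is applied to an utterance whenever any expression in a list matches. This is not the algorithm presented in the paper. The code name, the patterns, the example utterances, and the function `code_utterance` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical regex list for an illustrative code, "DESIGN_REASONING".
# Neither the code name nor the patterns come from the paper.
PATTERNS = [r"\btrade-?offs?\b", r"\bbecause\b.*\bcost\b"]


def code_utterance(text, patterns=PATTERNS):
    """Return 1 if any pattern matches the text (case-insensitive), else 0."""
    return int(any(re.search(p, text, flags=re.IGNORECASE) for p in patterns))


# Made-up utterances, used only to exercise the classifier.
utterances = [
    "We chose this material because it lowers the cost.",
    "See you at the meeting tomorrow.",
    "There is a tradeoff between reliability and price.",
]
codes = [code_utterance(u) for u in utterances]
print(codes)
```

Because the classifier is only a list of readable patterns, a researcher can inspect exactly why each utterance was or was not coded, which is the transparency advantage the abstract refers to.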
Acknowledgements
This work was funded in part by the National Science Foundation (DRL-2100320, DRL-2201723, DRL-2225240), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, Z., Marquart, C., Eagan, B., Xiao, Y., Williamson Shaffer, D. (2023). A Lightweight Interactive Regular Expression Generator for Qualitative Coding in Quantitative Ethnography. In: Arastoopour Irgens, G., Knight, S. (eds) Advances in Quantitative Ethnography. ICQE 2023. Communications in Computer and Information Science, vol 1895. Springer, Cham. https://doi.org/10.1007/978-3-031-47014-1_31
DOI: https://doi.org/10.1007/978-3-031-47014-1_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47013-4
Online ISBN: 978-3-031-47014-1
eBook Packages: Computer Science, Computer Science (R0)