
From Wordle to Insights: Using Tailored Clustering and CART to Forecast Difficulty Levels

  • Conference paper
  • First Online:
Proceedings of Innovative Computing 2024 Vol. 1 (IC 2024)

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 1214))


Abstract

Wordle, a popular daily puzzle in the New York Times, has garnered significant attention with its unique challenge. Players must decipher a five-letter word in six attempts or fewer, receiving feedback after each guess. Rooted in information theory and pragmatics, Wordle offers a valuable platform for exploration. Our research proposes an innovative approach consisting of two components. The first utilizes K-means clustering with tailored parameters: the forward difficulty evaluation index (Dforward) and the reverse difficulty evaluation index (Dreverse). Dforward is calculated by weighting factors such as Nrepeat (the normalized count of repeated letters), word frequency (F), and the number of vowels (Nvowel) in the word. Dreverse is obtained by normalizing the predicted number of successful guesses. The second component applies a CART decision tree model with difficulty-level labels to predict the complexity of future solutions. By analyzing linguistic features of five-letter words, our study constructs a model that accurately determines word difficulty using statistical knowledge and machine learning techniques. Additionally, this model facilitates related linguistic analyses.
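The two-stage pipeline described in the abstract might be sketched as follows. This is a minimal illustration only: the weights in Dforward, the word list, the frequency values, and the Dreverse numbers are all invented for the example, not values from the paper.

```python
# Illustrative sketch: K-means on (Dforward, Dreverse)-style features,
# then a CART decision tree trained on the resulting cluster labels.
# Weights and data are assumptions, not the paper's actual parameters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def n_repeat(word):
    # normalized count of repeated letters: repeats / word length
    return (len(word) - len(set(word))) / len(word)

def n_vowel(word):
    # normalized vowel count
    return sum(c in "aeiou" for c in word) / len(word)

def d_forward(word, freq, w=(0.4, 0.3, 0.3)):
    # weighted combination of Nrepeat, word frequency F, and Nvowel;
    # rarer words and fewer vowels are treated as harder here
    return w[0] * n_repeat(word) + w[1] * (1 - freq) + w[2] * (1 - n_vowel(word))

# toy data: (word, normalized frequency F, normalized Dreverse,
# i.e. predicted share of successful guesses)
data = [("eerie", 0.2, 0.30), ("crane", 0.9, 0.90),
        ("nymph", 0.1, 0.20), ("slate", 0.8, 0.85)]
X = np.array([[d_forward(w, f), 1 - dr] for w, f, dr in data])

# stage 1: K-means clustering assigns difficulty-level labels
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# stage 2: CART (Gini-based decision tree) learns to predict those labels
cart = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, labels)
pred = cart.predict([[d_forward("queue", 0.15), 1 - 0.25]])
```

Once trained, the tree can score a future solution word from its linguistic features alone, without re-running the clustering step.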




Author information

Corresponding author

Correspondence to Xinyi Xu.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xu, X., Huang, J. (2024). From Wordle to Insights: Using Tailored Clustering and CART to Forecast Difficulty Levels. In: Pei, Y., Ma, H.S., Chan, YW., Jeong, HY. (eds) Proceedings of Innovative Computing 2024 Vol. 1. IC 2024. Lecture Notes in Electrical Engineering, vol 1214. Springer, Singapore. https://doi.org/10.1007/978-981-97-4193-9_17


  • DOI: https://doi.org/10.1007/978-981-97-4193-9_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-4192-2

  • Online ISBN: 978-981-97-4193-9

  • eBook Packages: Computer Science (R0)
