
From Wordle to Insights: Using Tailored Clustering and CART to Forecast Difficulty Levels

  • Conference paper
  • First Online:
Proceedings of Innovative Computing 2024 Vol. 1 (IC 2024)

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 1214))


Abstract

Wordle, a popular daily puzzle in the New York Times, has garnered significant attention with its unique challenge. Players must decipher a five-letter word in six attempts or fewer, receiving feedback after each guess. Rooted in information theory and pragmatics, Wordle offers a valuable platform for exploration. Our research proposes an innovative approach consisting of two components. The first utilizes K-means clustering with tailored parameters: the forward difficulty evaluation index (Dforward) and the reverse difficulty evaluation index (Dreverse). Dforward is calculated by weighting factors such as Nrepeat (the normalized count of repeated letters), word frequency (F), and the number of vowels (Nvowel) in the word. Dreverse is obtained by normalizing the predicted number of successful guesses. The second component applies a CART decision tree model with difficulty-level labels to predict the complexity of future solutions. By analyzing linguistic features of five-letter words, our study constructs a model that accurately determines word difficulty using statistical knowledge and machine learning techniques. Additionally, this model facilitates related linguistic analyses.
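The two-stage pipeline described in the abstract might be sketched as follows. This is a minimal illustration only: the weights in Dforward, the word list, the frequency values, and the Dreverse numbers are all invented for the example, not values from the paper.

```python
# Illustrative sketch: K-means on (Dforward, Dreverse)-style features,
# then a CART decision tree trained on the resulting cluster labels.
# Weights and data are assumptions, not the paper's actual parameters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def n_repeat(word):
    # normalized count of repeated letters: repeats / word length
    return (len(word) - len(set(word))) / len(word)

def n_vowel(word):
    # normalized vowel count
    return sum(c in "aeiou" for c in word) / len(word)

def d_forward(word, freq, w=(0.4, 0.3, 0.3)):
    # weighted combination of Nrepeat, word frequency F, and Nvowel;
    # rarer words and fewer vowels are treated as harder here
    return w[0] * n_repeat(word) + w[1] * (1 - freq) + w[2] * (1 - n_vowel(word))

# toy data: (word, normalized frequency F, normalized Dreverse,
# i.e. predicted share of successful guesses)
data = [("eerie", 0.2, 0.30), ("crane", 0.9, 0.90),
        ("nymph", 0.1, 0.20), ("slate", 0.8, 0.85)]
X = np.array([[d_forward(w, f), 1 - dr] for w, f, dr in data])

# stage 1: K-means clustering assigns difficulty-level labels
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# stage 2: CART (Gini-based decision tree) learns to predict those labels
cart = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, labels)
pred = cart.predict([[d_forward("queue", 0.15), 1 - 0.25]])
```

Once trained, the tree can score a future solution word from its linguistic features alone, without re-running the clustering step.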




Author information

Corresponding author

Correspondence to Xinyi Xu.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xu, X., Huang, J. (2024). From Wordle to Insights: Using Tailored Clustering and CART to Forecast Difficulty Levels. In: Pei, Y., Ma, H.S., Chan, YW., Jeong, HY. (eds) Proceedings of Innovative Computing 2024 Vol. 1. IC 2024. Lecture Notes in Electrical Engineering, vol 1214. Springer, Singapore. https://doi.org/10.1007/978-981-97-4193-9_17


  • DOI: https://doi.org/10.1007/978-981-97-4193-9_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-4192-2

  • Online ISBN: 978-981-97-4193-9

  • eBook Packages: Computer Science (R0)
