Dementia, primarily caused by neurodegenerative diseases such as Alzheimer’s disease (AD), affects millions of people worldwide, making its detection and monitoring crucial. To support these tasks, we propose encoding in-text pauses and filler words (i.e., “uh” and “um”) in text-based language models, and we thoroughly evaluate their effect on performance. Additionally, we propose using contrastive learning within a multi-task framework to further improve performance. Our results demonstrate the effectiveness of these approaches, with the model achieving 87% accuracy and an 86% F1-score. Compared to the state of the art, our approach achieves similar performance with significantly fewer parameters. These findings highlight the importance of pause and filler word encoding for the detection of dementia.