Nowadays, broadcasting news on social media and websites has grown at a swifter pace, which has had negative impacts on both the general public and governments; hence, this has urged us to build a fake news detection system. Contextualized word embeddings have achieved great success in recent years due to their power to embed both syntactic and semantic features of textual contents. In this article, we aim to address the problem of the lack of fake news datasets in Persian by introducing a new dataset crawled from different news agencies, and propose two deep models based on the Bidirectional Encoder Representations from Transformers model (BERT), which is a deep contextualized pre-trained model for extracting valuable features. In our proposed models, we benefit from two different settings of BERT, namely pool-based representation, which provides a representation for the whole document, and sequence representation, which provides a representation for each token of the document. In the former one, we connect a Single Layer Perceptron (SLP) to the BERT to use the embedding directly for detecting fake news. The latter one uses Convolutional Neural Network (CNN) after the BERT’s embedding layer to extract extra features based on the collocation of words in a corpus. Furthermore, we present the TAJ dataset, which is a new Persian fake news dataset crawled from news agencies’ websites. We evaluate our proposed models on the newly provided TAJ dataset as well as the two different Persian rumor datasets as baselines. The results indicate the effectiveness of using deep contextualized embedding approaches for the fake news detection task. We also show that both BERT-SLP and BERT-CNN models achieve superior performance to the previous baselines and traditional machine learning models, with 15.58% and 17.1% improvement compared to the reported results by Zamani et al. [
30
], and 11.29% and 11.18% improvement compared to the reported results by Jahanbakhsh-Nagadeh et al. [
9
].