The automatic detection of emotions in the textual parts of social media sites such as Facebook and Twitter has applications in business development, user interface design, content creation, and emergency response, among others. Current research has shown that emotions can be detected in English content. To our knowledge, however, there have been only a few attempts for Arabic content: there is neither an Arabic corpus with instances labeled for emotions nor a study on detecting emotions in Arabic microblog content. We therefore collected Arabic text messages from the social networking site Twitter during January and February 2011, and human annotators labeled them with the corresponding emotions. Working with that corpus, our experiments show that emotions can be automatically detected from tweets after Arabic-specific preprocessing. Our contribution consists of added preprocessing steps that improve the classification results by 4.4% compared to the original Khoja stemmer. In addition, we extracted a sample word-emotion lexicon from the corpus; our experiments demonstrate that this lexicon improves emotion detection results by 22.27% compared to SMO classification using the train/test option. Finally, we show that the communication style used by the writer correlates significantly with the emotion expressed in the text.
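The lexicon-based step can be illustrated with a minimal sketch: a tweet's tokens are looked up in a word-emotion lexicon and the majority emotion wins. The lexicon entries below are illustrative placeholders, not entries from the paper's annotated corpus.

```python
from collections import Counter

# Toy word-emotion lexicon; the entries are illustrative placeholders,
# not drawn from the paper's annotated Twitter corpus.
LEXICON = {
    "فرح": "joy",
    "سعادة": "joy",
    "حزن": "sadness",
    "خوف": "fear",
}

def detect_emotion(tokens):
    """Return the majority emotion among lexicon hits, or None."""
    votes = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    return votes.most_common(1)[0][0] if votes else None
```

In practice such lexicon scores would be combined with a trained classifier rather than used alone.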
Concept extraction can help in building ontologies, which are the main component of the Semantic Web. Ontologies are used not only in the Semantic Web but also in other fields, such as Information Retrieval, to improve retrieval effectiveness. In this work, an Automatic Concept Extractor that processes Arabic text is presented. Its algorithm tags the words in the text, finds the pattern of each noun, and outputs only those nouns whose patterns match one of the concept patterns in the extraction rules. The result of each rule was evaluated individually to find the rules with the highest precision. Two datasets were crawled from the Web and converted to XML, and each rule was tested twice, once with each dataset as input. The average precision of the rules showed that the rules with the patterns "Tafe'el" (تفعيل) and "Fe'aleh" (فعالة) achieved high precision.
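As a rough sketch of the pattern-matching stage, the two high-precision templates can be checked against a noun's surface form with simple regular expressions. The tagging step is assumed to have already produced the noun list, and the regexes below are a simplification of the paper's rules, not their exact form.

```python
import re

# Simplified stand-ins for the extraction rules: each template fixes the
# pattern letters and leaves the root-consonant slots as wildcards.
PATTERNS = {
    "Tafe'el (تفعيل)": re.compile(r"ت..ي."),   # ت + C1 C2 + ي + C3
    "Fe'aleh (فعالة)": re.compile(r"..ا.ة"),   # C1 C2 + ا + C3 + ة
}

def extract_concepts(nouns):
    """Return (noun, pattern-name) pairs for nouns matching a template."""
    return [(w, name)
            for w in nouns
            for name, pat in PATTERNS.items()
            if pat.fullmatch(w)]
```

A real extractor would also normalize orthography and handle affixes before matching.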
This paper exploits the existence of the redundant Arabic extension character, the Kashida. We propose to use pointed letters in Arabic text followed by a Kashida to hold the secret bit ‘one’ and un-pointed letters followed by a Kashida to hold ‘zero’. The method can be classified among feature-coding secrecy methods, since it hides secret information bits within the letters by exploiting their inherent points. This watermarking technique is also attractive for other languages whose scripts resemble Arabic, such as Persian and Urdu.
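A minimal sketch of the scheme, under simplifying assumptions: letter connectivity and the typographic validity of the inserted Kashida are ignored, and the pointed/un-pointed letter sets are abbreviated.

```python
# Sketch of the pointed-letter Kashida scheme: a Kashida after a pointed
# letter encodes bit '1', after an un-pointed letter bit '0'.
KASHIDA = "\u0640"
POINTED = set("بتثجخذزشضظغفقني")    # letters carrying dots (abbreviated)
UNPOINTED = set("احدرسصطعكلمهو")    # dot-free letters (abbreviated)

def embed(cover: str, bits: str) -> str:
    out, i = [], 0
    for ch in cover:
        out.append(ch)
        if i < len(bits) and ch in POINTED | UNPOINTED:
            # Insert a Kashida only where the letter's pointedness
            # matches the next secret bit.
            if (ch in POINTED) == (bits[i] == "1"):
                out.append(KASHIDA)
                i += 1
    return "".join(out)

def extract(stego: str) -> str:
    bits = []
    for prev, ch in zip(stego, stego[1:]):
        if ch == KASHIDA:
            bits.append("1" if prev in POINTED else "0")
    return "".join(bits)
```

Since the Kashida is a legitimate justification character, removing it restores the original cover text, which is what makes the carrier redundant.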
Arabic diacritic marks are efficient carriers for hiding information in plain text. Their ability to be invisibly superimposed on one another when typed multiple times in succession makes them well suited to robust data-hiding applications. In this paper, we propose two algorithms that map secret messages onto repeated diacritics in a non-wasteful fashion, with the number of extra diacritics defined in either a fixed-size or a variable-size manner; the size of the output text is thus determined by the encoding's flexibility. Both steganographic algorithms offer several advantages over their existing counterparts. Finally, we provide a detailed performance analysis of both algorithms in terms of embedding capacity, robustness, and file size.
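A hedged sketch of the fixed-size idea: at each Fatha in a diacritized cover text, the mark is repeated so that the run length encodes the next fixed-width chunk of the secret. The two-bit chunk width and the use of the Fatha alone are assumptions for illustration, not the paper's exact encoding.

```python
# Fixed-size variant sketch: each Fatha run of length (1 + v) carries the
# 2-bit chunk with value v. Chunk width and diacritic choice are assumed.
FATHA = "\u064E"

def embed(cover: str, bits: str, width: int = 2) -> str:
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    out = []
    for ch in cover:
        out.append(ch)
        if ch == FATHA and chunks:
            v = int(chunks.pop(0), 2)
            out.append(FATHA * v)  # extra invisible repeats
    return "".join(out)

def extract(stego: str, width: int = 2) -> str:
    bits, run = [], 0
    for ch in stego + " ":          # sentinel flushes the final run
        if ch == FATHA:
            run += 1
        elif run:
            bits.append(format(run - 1, f"0{width}b"))
            run = 0
    return "".join(bits)
```

This toy extractor assumes every Fatha in the cover carries a chunk; a practical scheme must signal the message length so that un-used diacritics are not decoded as zeros.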