2021 12th International Conference on Information and Communication Systems (ICICS), 2021
In this paper, we introduce both a Multi-Label Classification (MLC) method to determine all the emotions expressed in an Arabic tweet and a Multi-Target Regression (MTR) method to determine the emotions’ intensities. MLC involves the prediction of zero or more classes per sample. It is an active research topic in Natural Language Processing (NLP), especially for the Arabic language, where related work is scarce. MTR is a harder task than MLC, but it lends itself as a suitable representation for Emotion Analysis (EA), which is gaining interest due to the increasing use of social media and its wide range of applications. This work introduces a new study on the use of Deep Learning (DL) techniques for emotion classification and quantification in Arabic tweets.
Metamorphic malware can change its appearance to evade detection by traditional anti-malware software. One way to help mitigate the threat of new metamorphic malware is to determine its origin, i.e., the family to which it belongs. This type of metamorphic malware analysis is not typically handled by commercial software. Moreover, existing works rely on analyzing the op-code sequences extracted from the Assembly files of the malware. Very few papers have tried to analyze the binary files of the malware, and those focused on the simple binary problem of differentiating between a certain malware family and benign files. In this work, we address the more difficult problem of determining the origin of a new metamorphic malware by measuring its similarity to hundreds of variants taken from 13 families of real malware. To address this problem, we use a compression-based classification approach. We experiment with two such approaches: AMDL and BCN. T...
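The general idea behind compression-based classification can be illustrated with a minimal sketch. It is not the paper's AMDL or BCN implementation: it assumes a generic "compression cost" criterion (how many extra compressed bytes an unknown sample adds to a family's corpus) and uses Python's standard `zlib` as a stand-in compressor. The function names and the toy byte strings are hypothetical.

```python
import zlib
from typing import Dict, List


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def classify_by_compression(unknown: bytes, families: Dict[str, List[bytes]]) -> str:
    """Pick the family whose corpus absorbs the unknown sample most cheaply.

    The cost for a family is the growth in compressed size when the unknown
    sample is appended to the concatenation of that family's samples.
    """
    best_family, best_cost = "", None
    for name, samples in families.items():
        corpus = b"".join(samples)
        cost = compressed_size(corpus + unknown) - compressed_size(corpus)
        if best_cost is None or cost < best_cost:
            best_family, best_cost = name, cost
    return best_family


# Toy, made-up byte strings standing in for malware binaries (assumption).
families = {
    "family_a": [b"mov eax, ebx; add eax, 1; " * 20],
    "family_b": [b"push ebp; call printf; ret; " * 20],
}
unknown = b"mov eax, ebx; add eax, 1; " * 5
print(classify_by_compression(unknown, families))
```

A sample that shares long byte patterns with a family compresses well against that family's corpus, so no hand-crafted features are needed; the quality of the result depends on the compressor and its window size.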
Quranic Recitation Rules (Ahkam Al-Tajweed) are the articulation rules that should be applied properly when reciting the Holy Quran. Most current automatic Quran recitation systems focus on the basic aspects of recitation, which concern the correct pronunciation of words, and neglect the advanced Ahkam Al-Tajweed related to the rhythmic and melodious way of recitation, such as where to stop and how to “stretch” or “merge” certain letters. The only existing works on the latter are limited in terms of the rules they consider or the parts of the Quran they cover. This paper fills these gaps. It addresses the problem of identifying the correct usage of Ahkam Al-Tajweed in the entire Quran. Specifically, we focus on eight Ahkam Al-Tajweed faced by early learners of recitation. Popular audio processing techniques for feature extraction (such as LPC, MFCC and WPD) and classification (KNN, SVM, RF, etc.) are tested on an in-house dataset. Moreover, we stud...
International Journal of Web Information Systems, 2016
Purpose: Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for reasons such as the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform the multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. ...
Abstract: The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically, quickly and reliably. In this paper, we propose and implement a method to automatically create an index for books written in the Arabic language. The process depends largely on text summarization and abstraction processes to collect the main topics and statements in the book. The process is developed in terms of accuracy and performance ...
Thesis (Ph. D.)--Illinois Institute of Technology, 1995. Includes bibliographical references (leaves 125-130). Microfilm.
This paper reports a comparative study of two machine learning methods on Arabic text categorization. Based on a collection of news articles as a training set, and another set of news articles as a testing set, we evaluated the K-nearest neighbor (KNN) algorithm, and support ...
Journal of the American Society for Information Science, 1997
We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval ...
Journal of the American Society for Information Science, Oct 1, 1997
We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval ...
There are many hot topics in the information retrieval paradigm, and one of the most important is automatic text indexing, which aims to make online document retrieval easier for web searchers. In this paper, we present a comprehensive study on indexing Arabic documents, since few works have dealt with it. The papers surveyed here address the problem from different views: some deal with single-term indexing while others deal with phrase indexing, and other researchers compared different techniques and preferred one over the others based on experimental results. On the other hand, some papers proposed new techniques or enhanced existing ones, relying on either statistical or non-statistical methods. The rest of the papers proposed tools such as key-term extractors to be used in text indexing. Till now there are no optimal suggested solutions that solve the indexing problem that could be considered ...
Stock price prediction is an interesting and challenging research topic. Developed countries' economies are measured according to their economic power. Currently, stock markets are considered to be an illustrious trading field because in many cases they give easy profits with a low-risk rate of return. The stock market, with its huge and dynamic information sources, is considered a suitable environment for data mining and business researchers. In this paper, we applied the k-nearest neighbor algorithm and a non-linear regression approach to predict stock prices for a sample of six major companies listed on the Jordanian stock exchange, to assist investors, management, decision makers, and users in making correct and informed investment decisions. According to the results, the kNN algorithm is robust with a small error ratio; consequently, the results were rational and reasonable. In addition, depending on the actual stock prices data, the prediction results were close and almost par...
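As a rough illustration of how a k-nearest neighbor predictor can be applied to a price series, the sketch below forecasts the next price as the average of the values that followed the k most similar past windows. The windowing scheme, the Euclidean distance, the function name `knn_forecast` and the toy prices are all assumptions made for illustration; the abstract does not specify the features or data the authors used.

```python
from typing import List, Sequence, Tuple


def knn_forecast(prices: Sequence[float], window: int = 2, k: int = 1) -> float:
    """Forecast the next value of a series with k-nearest neighbors.

    Each past window of `window` consecutive prices is a training point whose
    target is the price that followed it. The forecast is the mean target of
    the k windows closest (Euclidean distance) to the most recent window.
    """
    if len(prices) <= window:
        raise ValueError("need more prices than the window length")
    query = prices[-window:]
    candidates: List[Tuple[float, float]] = []
    for i in range(len(prices) - window):
        past = prices[i:i + window]
        following = prices[i + window]
        distance = sum((a - b) ** 2 for a, b in zip(past, query)) ** 0.5
        candidates.append((distance, following))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy closing prices (made up for illustration).
closing_prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(knn_forecast(closing_prices, window=2, k=1))
print(knn_forecast(closing_prices, window=2, k=2))
```

Because the forecast is an average of previously observed prices, a predictor of this kind cannot extrapolate beyond the range of its training targets, which is one reason it is often compared against a regression model.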
Abstract: Genetic algorithms are an evolutionary technique that uses selection, crossover, and mutation operators to solve optimization problems using a survival of the fittest idea. In this paper, a straightforward genetic algorithm is used to solve the Constraint Satisfaction ...
In the last few years, as Internet usage has become the main artery of daily life, the problem of spam has become very serious for the Internet community. Spam pages form a real threat for all types of users. This threat has evolved continuously with no sign of abating. Different forms of spam have witnessed a dramatic increase in both size and negative impact. A large amount of e-mails and web pages are considered spam either in Simple Mail Transfer Protocol (SMTP) or search engines. Many technical methods have been proposed to approach the problem of spam. In e-mail spam detection, Bayesian filters are widely and successfully applied to detect and eliminate spam. The assumption that each term in the document contributes to the filtering task equally to other terms, and the omission of user feedback, are major shortcomings that we attempt to overcome in this work. We propose an improved Naïve Bayes Classifier that gives weight to the information fed...
The object of this paper is to introduce a λ-type calculus which is easier to handle than the known λ-calculus. We were able to represent conditional functions and Booleans, in the λ-type calculus, in a form that is easy to understand and easy to compute.
This paper proposes a modified approach for the concept-based query expansion method, referred to as a New Approach for Automatic Query Expansion using Genetic Algorithm (NAQEGA). This technique employs the query concept to find the term most similar to the query and adds this term to the query to form a new set of query terms. This process is repeated until there are no more terms similar to the query concept. Furthermore, an automatic GA method is proposed to avoid exhaustive calculation in finding the term most similar to the query concept and adding it to the query. Also, a fitness function is defined in the GA to fulfill the need of finding and adding the most similar term to the query. The genetic operators applied in this paper are the partially matched multipoint crossover method for the crossover operations and the inversion mutation method for the mutation operations. The proposed algorithm improves the average recall rate by 9% when compared to the Concep...
Multi-label text classification (MTC) is a natural extension of traditional text classification (TC) in which a possibly large set of labels can be assigned to each document. The dimensionality of labels makes MTC difficult and challenging. Several ways have been proposed to ease the classification process, and one of them is the problem transformation (PT) method. It transforms the multi-label data into single-label data suitable for normal classification. Our paper presents a detailed study of using the supervised approach to address the MTC problem for Arabic text. Moreover, the scalability of such an approach is considered in our experiments. The MEKA system is used to convert the multi-label data into single-label data using different PT methods: LC, BR and RT. Then, different classifiers commonly used for TC, such as SVM, NB, KNN, and Decision Tree, are applied to each dataset. The results show that using SVM on the LC dataset generated the best resu...
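The problem transformation idea can be sketched in a few lines. The example below is not MEKA code; it is a plain-Python illustration, under the assumption that BR stands for binary relevance (one yes/no dataset per label) and LC for label combination (each distinct label set becomes one class). RT is not covered. The function names and toy documents are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A multi-label example: (document text, set of labels).
Example = Tuple[str, FrozenSet[str]]


def binary_relevance(data: Sequence[Example], labels: Sequence[str]) -> Dict[str, List[Tuple[str, bool]]]:
    """BR: build one binary (single-label) dataset per label."""
    return {
        label: [(text, label in tags) for text, tags in data]
        for label in labels
    }


def label_combination(data: Sequence[Example]) -> List[Tuple[str, str]]:
    """LC: map every distinct label set to a single combined class name."""
    return [(text, "+".join(sorted(tags)) or "NONE") for text, tags in data]


# Toy documents (made up for illustration).
data: List[Example] = [
    ("doc1", frozenset({"sports", "politics"})),
    ("doc2", frozenset({"sports"})),
    ("doc3", frozenset()),
]

print(binary_relevance(data, ["politics", "sports"]))
print(label_combination(data))
```

After either transformation, any ordinary single-label classifier can be trained on the result. BR ignores correlations between labels, while LC preserves them at the cost of a class count that can grow with the number of distinct label sets.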
There are several studies and procedures for classifying Arabic-language texts, based mostly on different environments. This lack of a unified standard (such as a unified dataset) makes it hard to determine the most accurate classification technique. In this paper, we study and analyze classification algorithms in a unified environment, on a different dataset and with the challenges these algorithms face, to demonstrate their effectiveness and accuracy on a large dataset.
Papers by Ismail Hmeidi