What makes scraping methodologically interesting for social and cultural research? This paper seeks to contribute to debates about digital social research by exploring how a ‘medium-specific’ technique for online data capture may be rendered analytically productive for social research. As a device that is currently being imported into social research, scraping has the capacity to re-structure social research in at least two ways. Firstly, as a technique that is not native to social research, scraping risks introducing ‘alien’ methodological assumptions into social research (such as a preoccupation with freshness). Secondly, to scrape is to risk importing into our inquiry categories that are prevalent in the social practices enabled by the media: scraping makes available already formatted data for social research. Scraped data, and online social data more generally, tend to come with ‘external’ analytics already built in. This circumstance is often approached as a ‘problem’ with online data capture, but we propose it may be turned into a virtue, insofar as data formats that have currency in the areas under scrutiny may serve as a source of social data themselves. Scraping, we propose, makes it possible to render traffic between the object and process of social research analytically productive. It enables a form of ‘real-time’ social research, in which the formats and life cycles of online data may lend structure to the analytic objects and findings of social research. By way of a conclusion, we demonstrate this point in an exercise of online issue profiling, and more particularly, by relying on Twitter to profile the issue of ‘austerity’. Here we distinguish between two forms of real-time research, those dedicated to monitoring live content (which terms are current?) and those concerned with analysing the liveliness of issues (which topics are happening?).
International Conference on NLP, Data Mining and Machine Learning (NLDML 2022) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Natural Language Computing, Data Mining and Machine Learning.
This article describes our rule-based Arabic Named Entity recognition (NER) and classification system. It is based on lists of classified proper names (PN) and particularly on syntactico-semantic patterns resulting in fine classification of Arabic NE. These patterns use syntactico-semantic combinations of morpho-syntactic and syntactic entities. The system also uses lexical classification of trigger words and NE extensions. These linguistic data are essential not only to named entity extraction but also to the taxonomic classification and to determining the NE frontiers. Our method is also based on contextualisation and on the notion of NE class attributes and values. Inspired by X-bar theory and immediate constituents, we built a rule-based NER system composed of five levels of syntactico-semantic combination. We also show how the fine NE annotations in our system output (XML database) are exploited in information retrieval and information extraction.
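The trigger-word idea mentioned above can be sketched in a few lines. This is only an illustrative toy, not the authors' system: it uses a small English lexicon instead of Arabic resources, the names TRIGGERS and tag_entities are hypothetical, and the single rule shown (a trigger word followed by capitalised tokens) stands in for the much richer five-level pattern hierarchy the abstract describes.

```python
# Hypothetical trigger-word lexicon (trigger -> entity class); not from the paper.
TRIGGERS = {"Dr.": "PERSON", "President": "PERSON", "Mount": "LOCATION"}


def tag_entities(text):
    """Return (entity_text, entity_class) pairs found by a trigger-word rule.

    Rule: a trigger word classifies the run of capitalised tokens that
    follows it; the entity frontier is the first non-capitalised token
    or the first token carrying trailing punctuation.
    """
    tokens = text.split()
    entities = []
    i = 0
    while i < len(tokens):
        entity_class = TRIGGERS.get(tokens[i])
        if entity_class is None:
            i += 1
            continue
        j = i + 1
        name = []
        while j < len(tokens):
            word = tokens[j].strip(".,;:")
            if not word[:1].isupper():
                break  # frontier: token is not capitalised
            name.append(word)
            j += 1
            if tokens[j - 1] != word:
                break  # frontier: trailing punctuation closes the entity
        if name:
            entities.append((" ".join(name), entity_class))
        i = j
    return entities


result = tag_entities("Yesterday Dr. Ahmed Hassan met President Macron near Mount Everest, he said.")
```

Here the trigger both detects the entity and assigns its class, which is the role the abstract attributes to lexically classified trigger words; determining where the name ends corresponds to the "NE frontiers" problem.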
This paper describes an information extraction and content analysis system. The proposed system is based on a conditional random field algorithm and is intended to extract aspect terms mentioned in the text. We used a set of morphological features for machine learning. The system was used for automatic extraction of explicit aspects and for automatic extraction of all aspects (explicit, implicit and sentiment facts), and was tested on two domains: restaurants and automobiles. We show that our system achieves a fairly high level of precision, which means that it recognizes aspect terms rather accurately. Our results show that even a small set of features for the conditional random field algorithm performs competitively.
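As a rough illustration of what "a small set of morphological features" for a conditional random field might look like, the sketch below builds one feature dictionary per token, which is the input shape commonly expected by CRF toolkits. The feature names and the functions token_features and sentence_features are assumptions made for this example; the abstract does not list the actual feature set, and no CRF training is performed here.

```python
def token_features(tokens, i):
    """Build a hypothetical morphological feature dict for tokens[i]."""
    word = tokens[i]
    feats = {
        "lower": word.lower(),          # normalised surface form
        "suffix3": word[-3:].lower(),   # crude stand-in for inflectional ending
        "prefix2": word[:2].lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Neighbouring tokens give the sequence model local context.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features(["The", "pasta", "was", "great"])
```

In an aspect-extraction setting, each feature dictionary would be paired with a label marking whether the token belongs to an aspect term (for example "pasta" in a restaurant review), and the sequence model would be trained on those pairs.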
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate languages that humans use naturally to address computers.
Semantically, objects in an unstructured document are related to each other to form a certain entity relation, such as a drug-drug interaction through their compounds or a buyer-seller relationship through goods or services. Motivated by this kind of interaction, this study proposes a method to extract those objects and their interactions. A general framework for object-interaction mining of large corpora is presented. The framework starts with an initial step that extracts a single object from the unstructured document. In this study, the initial step is a pattern learning method that is applied to drug-label documents to extract drug names. To perform the pattern learning process, we use existing external knowledge to identify certain regular expressions surrounding the targeted object, together with the probabilities of those regular expressions. The performance of this pattern learning approach is promising for application in relation extraction. As presented in the results of this study, the best F-score of this method is 0.78. By adjusting some parameters and/or improving the method, the performance can potentially be improved.
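The pattern learning step described in this abstract can be approximated with a very small sketch. It assumes that the "external knowledge" is simply a list of known drug names and that a "pattern" is just the single word to the left of a name; both are simplifications chosen for illustration, and learn_patterns, extract_candidates and the sample sentences are hypothetical rather than taken from the study.

```python
import re
from collections import Counter


def learn_patterns(sentences, known_names, min_prob=0.5):
    """Estimate P(next word is a known name | left-context word).

    Returns a dict mapping each left-context word to that probability,
    keeping only contexts whose probability is at least min_prob.
    """
    known = {name.lower() for name in known_names}
    hits, totals = Counter(), Counter()
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())
        for left, word in zip(tokens, tokens[1:]):
            totals[left] += 1
            if word in known:
                hits[left] += 1
    return {
        left: hits[left] / totals[left]
        for left in hits
        if hits[left] / totals[left] >= min_prob
    }


def extract_candidates(sentence, patterns):
    """Apply each learned context as a regular expression to find new names."""
    found = []
    for left, prob in patterns.items():
        regex = r"\b" + re.escape(left) + r"\s+([A-Za-z]+)"
        for match in re.finditer(regex, sentence, flags=re.IGNORECASE):
            found.append((match.group(1).lower(), prob))
    return found


# Toy corpus and seed list, invented for this example.
corpus = [
    "Patients taking warfarin should avoid aspirin.",
    "Patients taking ibuprofen may report nausea.",
    "Avoid taking risks.",
]
patterns = learn_patterns(corpus, ["warfarin", "ibuprofen"])
candidates = extract_candidates("She is taking metformin daily.", patterns)
```

In this toy corpus the context word "taking" precedes a known drug name in two of its three occurrences, so it is kept as a pattern with probability 2/3 and then proposes "metformin" as a new candidate. A realistic system would use richer contexts on both sides of the target and a larger seed vocabulary.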
Essay explains how today's education system and societal expectations require "educated" individuals to have strong computer and information analysis skills.