The long-term dynamics of nighttime lights over land in the visible range of the spectrum correlate with population density, GDP, and technological progress in general. Large archives of satellite imagery are now available that allow a retrospective analysis of this correlation over the past 30 years.
Flaring of associated petroleum gas (APG) is an economically and environmentally unsound way to dispose of it. The availability of APG flaring data varies by country; statistics are often limited or irrelevant. At present, satellite monitoring is the only instrumental method for measuring gas flaring volumes that is independent of the activities of oil and gas companies. However, developing a monitoring methodology requires processing large volumes of data and introducing an algorithm for estimating the volume of gas burned. This paper describes an algorithm developed for the VIIRS sensor and analyzes the data obtained for the territory of Russia over the observation period.
The explosive growth in the volume of satellite data produced by modern sensors has become a major obstacle to their use for manual detection of fishing vessels by fisheries agencies and other organizations. This made it necessary to develop an algorithm, and an automatic system implementing it, for detecting nighttime ship lights in satellite data and analyzing their distribution. This paper presents an algorithm for detecting nighttime ship lights in data from the VIIRS sensor, describes a software system that implements the algorithm, describes the methods and tools developed for analyzing the distribution of nighttime ship lights, and presents the results of testing these methods.
A growing number of information extraction (IE) systems use ontologies for extraction tasks. These systems use knowledge representation techniques to extract information from unstructured or semi-structured domains more efficiently. The main advantages of this approach are higher-quality IE templates, reusability, and maintainability. Ontologies in IE may provide new techniques for supporting open tasks of semantic analysis, for instance temporal analysis, resolution of contradictions, ...
The brightness distribution of nighttime lights (NTL) at the Earth's surface in the visible band of the electromagnetic spectrum is a new and promising data source for socio-economic studies. Visual and statistical analysis of this distribution in time and space requires new mathematical and geo-informational methods for jointly processing many raster images and vector data (geographical maps) together with socio-economic analytics. The current research develops new means of spatiotemporal analysis, reveals basic problems of applied monitoring, and outlines forward-looking approaches to their
1 Executive Summary / Introduction. The "Social Semantic Desktop" project NEPOMUK, which ran for three years (2006–2008), has now reached its end. This deliverable reports the achievements of the third and final year of NEPOMUK's work package 1, "Knowledge Articulation and Visualisation". Previous work in this work package has been described in the deliverables D1. Bogdan 2008) and many additional publications referred to in these deliverables. The multitude of tools described in D1.2, "Conceptual Data Structure Tools", has been integrated more tightly, both with one another and with the NEPOMUK back-end services: some have been integrated into the rich and monolithic NEPOMUK application framework PSEW, and some have been fused into small stand-alone applications that all connect to the NEPOMUK back-end to integrate and share their contents with other NEPOMUK components. PSEW/...
Social networking tools, blogs and microblogs, user-generated content sites, discussion groups, problem reporting, and other social services have transformed the way people communicate and consume information. Yet managing this information is still a very onerous activity for both the consumer and the provider, and the information itself remains passive. Traditional methods of keyword extraction from text based on predefined codified knowledge are not well suited for use in such empirical environments, and as such do little to make this information a more active part of the processes to which it may otherwise belong. In this paper we analyse various use cases of real-time context-sensitive keyword detection methods, using IBM LanguageWare applications as an example. We present a general high-performance method for exploiting ontologies to automatically generate semantic metadata for text assets, and demonstrate examples of how this method can be implemented to bring commercial an...
Our study focuses on the demography of VK, the largest European social network, and on how representative the VK population sample is of real-world state demography. The relationships between variables such as region code, settlement type, age, and gender are explored. A special-purpose tool has been developed for ethnic group labeling; it performs the classification given the user's forename, patronymic, and/or surname and achieves 99.2% accuracy. The analysis of the considered variables is helpful in finding a solution to the cold start problem in recommender systems. Keywords: internet sociology; internet demography; internet surveys; social network analysis.
... In both cases there is a need to operate with Clouds (fuzzy sets of PIMO nodes): Clouds describe topicality of documents in terms of PIMO as we described ... The Emergence of Multidimensional Networks.” Retrieved February 13, 2010, from http://www.hctd.net/newsletters/fall2007 ...
Spreading Activation is a family of graph-based algorithms widely used in areas such as information retrieval, epidemic models, and recommender systems. In this paper we introduce a novel Spreading Activation (SA) method that we call Vectorised Spreading Activation (VSA). VSA algorithms, like “traditional” SA algorithms, iteratively propagate the activation from the initially activated set of nodes to the other nodes in a network through outward links. The level of a node's activation can be used as a centrality measure, in accordance with the dynamic model-based view of centrality that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges. Representing the activation by vectors allows the use of information about the various dimensionalities of the flow and the dynamics of the flow. In this capacity, VSA algorithms can model a multitude of complex multidimensional network flows. We present the results of numerical simulations o...
Finite-state processing is typically based on structures that allow for efficient indexing and sequential search. However, this “rigid” framework has several disadvantages when used in natural language processing, especially for non-alphabetical languages. The solution is to systematically introduce polymorphic programming techniques that are adapted to particular cases. In this paper we describe the structure of a morphological dictionary implemented with finite-state automata using variable or polymorphic node formats. Each node is assigned a format from a predefined set reflecting its utility in corpora processing as measured by a number of graph theoretic metrics and statistics. Experimental results demonstrate that this approach permits a 52 % increase in the performance of dictionary look-up.
Finite-state devices are widely used in natural language processing, yet little if anything is known about the metrics and topology of finite-state transition graphs. Here we study numerically the structure of directed state transition graphs for several types of finite-state devices representing the morphology of 16 languages. In all experiments we have found that the distribution of incoming and outgoing links is highly skewed and is modeled well by a power law, not by the Poisson distribution typical of classical random graphs. The power-law form of the degree distribution is regarded as a signature of self-organizing systems, and it has previously been found for numerous real-world networks in communication, biology, social sciences, and economics.
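As a rough illustration of the kind of measurement this abstract describes, the sketch below tallies in-degree and out-degree histograms for a directed transition graph and computes a simple power-law exponent estimate. The triple format `(source, label, target)`, the function names, and the choice of estimator (a standard continuous approximation to the maximum-likelihood exponent for discrete data) are assumptions made for illustration; the abstract does not say how the authors fitted their distributions.

```python
import math
from collections import Counter


def degree_histograms(transitions):
    """Return (in_hist, out_hist): how many states have each in/out degree.

    `transitions` is an iterable of (source, label, target) triples
    (a hypothetical input format). States with degree zero are not counted.
    """
    in_deg, out_deg = Counter(), Counter()
    for src, _label, dst in transitions:
        out_deg[src] += 1
        in_deg[dst] += 1
    return Counter(in_deg.values()), Counter(out_deg.values())


def powerlaw_exponent(degrees, k_min=1):
    """Approximate MLE of the exponent for degrees >= k_min.

    Uses alpha = 1 + n / sum(ln(k / (k_min - 0.5))), an approximation
    that is only reasonable as a first look, not a rigorous fit.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Toy automaton with four states; real morphological automata are far larger.
toy = [(0, "a", 1), (0, "b", 2), (1, "c", 2), (2, "d", 3)]
in_hist, out_hist = degree_histograms(toy)
print(in_hist, out_hist)
```

On a real transition graph, a heavy tail in these histograms (many states with few links, a few with very many) is what distinguishes a power-law-like distribution from a Poisson-like one.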
In this paper, we show how to represent social context for formal reasoning and how to model it as knowledge, using network models to aggregate heterogeneous information. We show how social context can be efficiently used for well-understood tasks in natural language processing (such as context-dependent, automated, large-scale semantic annotation, term disambiguation, and search for similar documents), as well as for novel applications such as social recommender systems, which aim to alleviate information overload for social media users by presenting the most attractive and relevant content. We present the algorithms and the architecture of a hybrid recommender system in the activity-centric environment Nepomuk-Simple (EU 6th Framework Project NEPOMUK): recommendations are computed on the fly by network flow methods operating on the unified multidimensional network of concepts from the personal information management ontology augmented with concepts extracted from the documents pertaining to ...
Many applications of the semantic web and Web 2.0 aim to empower the knowledge worker. These applications, however, do not allow users to combine all of their social and semantic information into a single resource that allows data to be processed, managed, and enhanced automatically. In our demo we will present a number of applications based on Galaxy, IBM's ontological network miner, which was designed to work with such resources to enhance the capabilities of a number of applications in social semantic computing. Galaxy is a highly efficient, scalable system that can be easily built into an application and can be optimised to suit a user's preferences or to take into account the needs of a particular task or application.
This talk is based on IBM's experiences in the Nepomuk project. Nepomuk aims to build a socio-semantic desktop based on semantic web technologies centred around a personal information management ontology, PIMO, which represents each user's perspective on his/her data, the concepts they refer to and how they all relate to each other and the real world. However, for most applications to benefit from the use of semantic web technologies, the user needs to manually annotate his/her data (emails, documents etc.) with information (from his/her own perspective) as to what they are about, how they relate to each other and to topics/concepts in the ontology. Fortunately, this need not be the case. PIMO is a kind of ontology, and can be used within text analysis to facilitate concept detection, generalisation, disambiguation, automatic content labelling and much more. But how? Firstly we must consider how most approaches tend to treat lexical entries in ontologies. Typically this follow...
Big data frequently come in tabular form: rows and columns of numbers, special codes, and short textual descriptions, in strict, structured, disciplined formats generated by a variety of transactional and operational business systems. In this paper we discuss the advantages of modeling heterogeneous data by multidimensional networks, in line with the concept known as “Graph databases”. Graph-based methods provide a powerful abstraction for mining such data; however, it is hard to achieve good results in mining using off-the-shelf methods. In this paper we show how empirical methods of fuzzy logic could be injected into abstract graph-based methods to achieve desirable results. We outline the wide range of applications of such modeling and mining, and present our results on the use of our methods for processing customs declarations for commercial goods. We examine several use cases, including recommendations to customs officers and participants of the internati...
The proliferation of Web 2.0 and Enterprise 2.0 technologies has led to the emergence of massive networks connecting people and various digital artefacts. The efficiency of human navigation in such networks depends on the availability of suitable user interfaces powered by an "intelligent" backend which provides guidance and recommendations. In this paper we describe how the "pile" based GUI (Graphical User Interface) called Nepomuk Simple and the IBM graph mining library Galaxy can be used for such guided navigation through the Personal Information Model (PIMO) ontology in the scenario of the social semantic desktop, as pertaining to the EU 6th Framework project Nepomuk. Firstly, we describe a method for graph-based related item recommendation. The initial data for recommendations that allow browsing from one single thing to another (one-to-one correspondences) might properly be treated as an egocentric query. Following this logic, initial data for recommendati...
To graduate from a university and receive a diploma, a student must follow a curriculum, have a good command of certain topics, and pass certain tests and exams. All of these artifacts of educational processes can be viewed as nodes in a large network where nodes of various kinds are connected by typed arcs, indicating, for instance, that knowledge of a particular book or research paper is required in a particular item of a particular curriculum, or that before enrolling for a particular examination one needs to pass particular tests. In this paradigm the process of education becomes navigation from the initial nodes, corresponding to the student's knowledge and qualifications, to the nodes that represent her goals. For some students the goal could be just one node representing the diploma; for other students, especially self-motivated life-long learners, the goal is a set of nodes. In this paper we present the initial results in modeling educational process as th...
Social tagging systems present a new challenge to researchers working on recommender systems. The presence of tags, which uncover the reasons for users' interest in tagged items, opens a way to increase the quality of recommendations. Yet there is no common agreement on how the power of tags can be harnessed for recommendation. In this paper we argue for the use of a spreading activation approach for building tag-aware recommender systems and suggest a specific version of this approach adapted to the multidimensional nature of social tagging networks. We introduce an asymmetric measure of relevancy (proximity) of two nodes on a multidimensional network as the cumulative strength of (weighted) multiple connections between the two nodes, which includes paths and graph structures connecting the nodes. This metric is also applicable to measuring the relevancy of two sub-graphs. Spreading activation methods (SAM), which usually employ breadth-first search, are an efficient way to define and comput...
A mining method for egocentric and polycentric queries in multidimensional networks is proposed. The method allows fast search for objects in sufficient proximity to other object(s), where proximity is defined in terms of multiple relationships between objects. The method uses the spreading activation technique. Other potential uses of the spreading activation technique are also outlined and, in particular, include applications to collaborative filtering (community detection based on tag recommendations, expertise location, etc.). Moreover, the spreading activation technique is combined with so-called ambient navigation. The advantages of such an approach are high performance and high scalability with respect to the size of the multidimensional network. The proposed method is practical and is implemented in IBM LanguageWare software products.
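To make the idea of an egocentric or polycentric query concrete, here is a minimal sketch of spreading activation over a network with typed links. It is not the method implemented in IBM LanguageWare, whose details the abstract does not give; the function name, the per-type weights, the decay factor, the step limit, and the pruning threshold are all assumptions chosen for illustration. One seed node gives an egocentric query; several seeds give a polycentric one.

```python
from collections import defaultdict


def spread_activation(edges, seeds, type_weights, decay=0.5, steps=2,
                      threshold=0.01):
    """Propagate activation from `seeds` along typed, directed edges.

    `edges` is an iterable of (source, target, edge_type) triples and
    `type_weights` maps each edge type to a non-negative weight (both
    hypothetical formats). Returns a dict of node -> accumulated activation.
    """
    out_links = defaultdict(list)
    for src, dst, etype in edges:
        out_links[src].append((dst, type_weights.get(etype, 0.0)))

    activation = defaultdict(float)
    frontier = {node: 1.0 for node in seeds}
    for node, value in frontier.items():
        activation[node] += value

    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            links = out_links.get(node, [])
            total = sum(weight for _, weight in links)
            if total == 0:
                continue  # dead end, or only zero-weight link types
            for dst, weight in links:
                # Split the decayed activation in proportion to link weight.
                passed = value * decay * weight / total
                if passed >= threshold:  # prune negligible flow
                    next_frontier[dst] += passed
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


edges = [
    ("alice", "paper1", "authored"),
    ("paper1", "ontology", "tagged"),
    ("paper1", "bob", "authored_by"),
    ("bob", "paper2", "authored"),
]
weights = {"authored": 1.0, "tagged": 0.5, "authored_by": 1.0}
scores = spread_activation(edges, ["alice"], weights)
ranked = sorted((n for n in scores if n != "alice"), key=scores.get, reverse=True)
print(ranked)
```

Ranking the non-seed nodes by accumulated activation gives the objects "in sufficient proximity" to the query. The step limit and threshold keep the search local, which is one plausible reading of why such methods scale to large networks.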
Techno-social systems generate data that are rather different from the data traditionally studied in social network analysis and other fields. In massive social networks, agents simultaneously participate in several contexts and in different communities. Network models of many real data sets from techno-social systems reflect the various dimensionalities and rationales of actors' actions and interactions. The data are inherently multidimensional, where “everything is deeply intertwingled”. The multidimensional nature of Big Data and the emergence of typical network characteristics in Big Data make it reasonable to address the challenges of structure detection in network models, including a) the development of novel methods for local overlapping clustering with outliers, b) with near-linear performance, c) preferably combined with the computation of the structural importance of nodes. In this chapter the spreading connectivity based clustering method is introduced. The viability of the approach and...
This article presents a comprehensive study of issues connected with text processing on social media. First, the standpoint of so-called ordinary language philosophy is examined. It holds that the meaningfulness of words in sentences is primarily determined by the ways in which they are put to use in practical activity and by the role they play in the broader context of a lifestyle. The contemporary lifestyle is inherent in social media with its virtual discourse, which is determined not only by the words used but also by the relationships between the actors who generate the content. Second, the linguistic aspects of this issue are discussed. Models covering the structure of social media as well as users' texts are examined with regard to computational approaches to broadening the concept of context in Frege's contextual principle. Examples of successful mining of such a model both for traditional linguistic t...
Texts in virtual social networks differ fundamentally from reviewed and edited publications: they are in effect the material of unmoderated chat dialogue, with all its syntactic features, and in many cases they are hypertexts whose boundaries, as subjects of analysis, are quite relative. Instead of texts, virtual discourse, an object of a new type, is analyzed. Under these conditions, linguistic analysis is transformed into preliminary linguistic processing of the texts, and the analysis of texts into the analysis of networks…
Constructing a traditional heavyweight ontology that precisely describes a specific area is a time-consuming task. Moreover, in constantly changing, dynamic areas like the Web it is impossible to produce a complete ontology that accurately reflects any particular domain of interest. If information in the domain is changing rapidly, the corresponding ontology should be constantly enriched with newly emerged concepts and relations. The problem of ontology enrichment has become even more crucial with the emergence of social networking services and e-learning domains with highly dynamic content. In this paper we propose an experiment aimed at constructing an ontology of interests based on the data provided by the Delicious online social service. This ontology will then be used as raw material for our main goal: addressing the challenge of improving or enriching the ontological structure by developing techniques and mechanisms for capturing and representing the "hidd...
In our increasingly globalised world, the study of impediments to international trade is of interest to the field of international economics. This paper focuses on the particular problem of speedy and accurate processing of customs declarations. We present a novel use of a graph-based spreading activation algorithm for the automated processing of customs declarations for commercial goods, based on supervised learning. This method allows us to build recommender systems for use by customs officers, traders, carriers, and insurers. We examine the particular use case of recommending whether or not to assign an armed escort to a shipping vehicle in cases of elevated risk of theft. In contrast to the usual risk-based approach, this algorithm is trained solely on shipment data rather than on traditional risk indicators. This is useful because the recommendation to customs officials can be explained in terms of the make-up of a shipment and can be verified in real time. The feasibility of the approach was tested by application to 2500 customs records collected during a continuous period of one month at eight border checkpoints between the Russian Federation and two EU countries. The algorithm achieved 100% accuracy under experimental conditions.
Spreading activation (also known as spread of activation) is a method for searching associative networks, neural networks or semantic networks. The method is based on the idea of quickly spreading an associative relevancy measure over the network. The goal is to give an expanded introduction to the method. The authors will demonstrate and describe in sufficient detail that this method can be applied to very diverse problems and applications. They present the method as a general framework. First they will present this method as a ...