The paper considers one of the first steps in the implementation of the Russian Virtual Observatory Information Infrastructure (RVOII): the organization of a Community centre at IPI RAS (Moscow) to support scientific astronomical problem solving over distributed repositories of astronomical information. As motivation, the trends in distributed infrastructure development for e-science are presented. The information infrastructure of the RVO, aimed at satisfying International Virtual Observatory Alliance standards, is briefly introduced. The structure of the AstroGrid system, considered as the core of the RVOII, is presented.
Data-intensive research is increasingly dependent on the explicit use of hypotheses, simulations and computational models. This paper is devoted to the development of an infrastructure for the explicit management of virtual experiments and research hypotheses. In particular, issues of hypothesis lattice construction are considered. Basic concepts for working with research hypotheses are provided, such as hypothesis structure, its basic properties, and the causal correspondence of equations and variables over the defined structures. The notion of a hypothesis lattice is presented as a graph whose vertices are hypotheses and whose edges represent the derived-by relationship between hypotheses. An algorithm for constructing hypothesis lattices in virtual experiments is presented, together with a proof of a proposition on the algorithm's complexity. The developed method for constructing hypothesis lattices is implemented as a program component in Python 3.
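To make the construction concrete, the following Python 3 sketch builds a lattice under the simplifying assumption that each hypothesis is identified by the set of equations it comprises and that an edge A -> B means B is derived from A by extending A's equation set; the function and hypothesis names are illustrative, not the paper's actual component.

# Minimal sketch of hypothesis-lattice construction (illustrative names,
# not the paper's API). A hypothesis is identified by the frozenset of
# equation identifiers it comprises; an edge A -> B means B is derived
# from A, i.e. A's equation set is a proper subset of B's.
from itertools import combinations

def build_lattice(hypotheses):
    edges = {name: [] for name in hypotheses}
    # Pairwise subset tests: the O(n^2) comparisons behind a complexity
    # bound of the kind the abstract mentions.
    for (a, eqs_a), (b, eqs_b) in combinations(hypotheses.items(), 2):
        if eqs_a < eqs_b:
            edges[a].append(b)   # b refines a
        elif eqs_b < eqs_a:
            edges[b].append(a)   # a refines b
    return edges

lattice = build_lattice({
    "H0": frozenset({"eq1"}),
    "H1": frozenset({"eq1", "eq2"}),
    "H2": frozenset({"eq1", "eq3"}),
})
print(lattice)  # {'H0': ['H1', 'H2'], 'H1': [], 'H2': []}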
Efficient and timely fault detection is a significant problem due to the intensifying use of modern technological solutions in machine condition monitoring. This work is carried out as part of a project aimed at the development of software solutions for a housing and utility condition monitoring system. An experimental setup was designed and assembled to study the operating modes of basic housing infrastructure elements. The setup includes electric pumps, power transformers, ventilation and air conditioning (HVAC) systems, heaters and electric boilers. Every element is equipped with various sensors. Sensor readings were gathered, processed and analyzed, and the resulting dataset was used to fit statistical and probabilistic models such as linear regression and hidden Markov models in order to classify regular and faulty operating modes of equipment. Nine classes of equipment malfunction were modeled; these models are intended to serve as a theoretical basis for the design of industrial housing and utility condition monitoring systems.
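As an illustration of the probabilistic modeling step, the sketch below fits a two-state Gaussian hidden Markov model to a synthetic one-dimensional sensor signal; the hmmlearn library and all parameter choices are assumptions made for the example, as the abstract does not specify the implementation.

# Hedged sketch: separating regular and faulty operating modes with a
# Gaussian HMM fitted to sensor readings. hmmlearn and the two-state
# setup are assumptions for illustration only.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Synthetic 1-D sensor signal: a regular regime followed by a noisier,
# shifted faulty regime.
regular = rng.normal(loc=0.0, scale=0.5, size=(200, 1))
faulty = rng.normal(loc=3.0, scale=1.5, size=(50, 1))
readings = np.vstack([regular, faulty])

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(readings)
states = model.predict(readings)   # hidden state per sample
print(states[:5], states[-5:])     # contiguous runs mark operating modes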
The modern IT world requires data integration systems to deal with a large number of heterogeneous data sources. Such systems should perform not only data extraction but also schema alignment, entity resolution and data fusion. In the world of big data, with its large number of heterogeneous data sources, a number of methods address various aspects of integration so as to make such systems automatic and less user-dependent. This work proposes an extensible approach to the development of a data integration system performing materialized integration of heterogeneous sources in a distributed computation environment. A prototype of the system implementing advanced methods for big data integration has been developed. The system is applied in the e-commerce domain.
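A minimal sketch of the entity resolution step such a system performs is shown below, using only the Python standard library; the helper name, the similarity threshold and the toy records are illustrative assumptions, not the prototype's actual code.

# Toy entity-resolution step of the kind such a pipeline performs:
# pair records from two sources whose normalized string similarity
# exceeds a threshold. All names and values here are illustrative.
from difflib import SequenceMatcher

def match_entities(records_a, records_b, threshold=0.85):
    matches = []
    for a in records_a:
        for b in records_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

print(match_entities(["Apple iPhone 12 64GB"], ["apple iphone 12 64 gb"]))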
Functional magnetic resonance imaging (fMRI) is widely used to study the human brain. Many urgent problems in neurophysiology are solved by applying machine learning classification methods to fMRI data. The paper compares state-of-the-art approaches to the classification of human brain states in real time, including neural ordinary differential equations.
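A hedged sketch of a neural-ODE classifier over fMRI feature vectors is given below, assuming the torchdiffeq package as the ODE solver; the layer sizes and class count are invented for the example and are not taken from the paper.

# Sketch of a neural-ODE classifier over flattened fMRI feature vectors,
# assuming torchdiffeq; dimensions and names are illustrative only.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, t, h):              # dh/dt = f(h, t)
        return self.net(h)

class ODEClassifier(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.func = ODEFunc(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        t = torch.tensor([0.0, 1.0])
        h = odeint(self.func, x, t)[-1]   # hidden state at t = 1
        return self.head(h)

model = ODEClassifier(dim=64, n_classes=2)    # e.g. two brain states
logits = model(torch.randn(8, 64))            # batch of 8 feature vectors
print(logits.shape)                           # torch.Size([8, 2])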
Recent developments in approaches to data collection, storage and analysis make data-driven condition monitoring techniques a powerful instrument in housing and utility infrastructure maintenance. Advances in software development and sensor construction have led to the spread of the "Internet of Things" concept, which suggests equipping devices with various sensors producing large amounts of data. The paper introduces an architecture of an information system for predictive maintenance in housing and utility infrastructure based on scalable distributed computing and on data mining methods for fault detection and prognostics. The data mining methods featured in the information system are compared and analyzed.
In the areas of the Semantic Web and data integration, ontology matching is one of the important steps in resolving semantic heterogeneity. Manual ontology matching is very labor-intensive, time-consuming and prone to errors, so the development of automatic or semi-automatic ontology matching methods and tools is quite important. This paper applies machine learning to ontology matching, using different similarity measures between ontology elements as features. An approach combining string-based, language-based and structure-based similarity measures with machine learning techniques is proposed. Logistic Regression, Random Forest and Gradient Boosting classifiers are used as machine learning methods. The approach is evaluated on two datasets of the Ontology Alignment Evaluation Initiative (OAEI).
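The feature-based setup can be sketched as follows: each candidate pair of ontology elements is described by similarity features, and a classifier decides whether the pair is a correct correspondence. Only one string-based feature and a toy training set are shown; real runs would add language- and structure-based features and take labels from the OAEI reference alignments.

# Illustrative sketch of feature-based ontology matching; the feature
# set, pairs and labels are invented for the example.
import numpy as np
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def string_sim(a, b):
    # One string-based measure; language- and structure-based measures
    # would be added as further feature columns.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [("Author", "author"), ("hasTitle", "title"), ("Review", "Car")]
labels = [1, 1, 0]  # 1 = matching elements (toy training data)
X = np.array([[string_sim(a, b)] for a, b in pairs])

clf = LogisticRegression().fit(X, labels)
print(clf.predict([[string_sim("Person", "person")]]))  # likely [1]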
According to the Open Science paradigm, data sources are to be concentrated within research data infrastructures intended to support the whole cycle of data management and processing. The recently developed and announced FAIR data management and stewardship principles state that data within a data infrastructure have to be findable, accessible, interoperable and reusable. Note that data sources can be quite heterogeneous and represented using very different data models. The variety of data models includes the traditional relational model and its object-relational extensions, array and graph-based models, semantic models like RDF and OWL, and models for semi-structured data like NoSQL, XML and JSON. This paper overviews data model unification techniques considered as a formal basis for (meta)data interoperability, integration and reuse within FAIR data infrastructures. These techniques are intended to deal with the heterogeneity of the data models and data manipulation languages used to represent data and provide access to it in data sources. General principles of data model unification, the languages and formal methods required, and the stages of data model unification are considered and illustrated by examples. Application of the techniques to data integration within FAIR data infrastructures is discussed.
Nowadays data sources within data infrastructures are quite heterogeneous; they are represented using very different data models, ranging from the relational model to the NoSQL zoo of data models. A prerequisite for (meta)data interoperability, integration and reuse within a data infrastructure is the unification of source data models and their data manipulation languages. A unifying data model (called canonical) has to be chosen for the data infrastructure. Every source data model has to be mapped into the canonical model, and the mapping should be formalized and verified. The paper overviews data unification techniques developed during recent years and discusses the application of these techniques to data integration within FAIR data infrastructures.
This paper focuses on the field of sentiment analysis of natural language texts, specifically short texts extracted from social networks. At present two groups of applied methods can be distinguished in this field: machine learning methods and methods based on sentiment lexicons. The paper reviews the principal methods in this field and proposes an approach for short text sentiment analysis that combines sentiment lexicons with a blend of machine learning algorithms for the problem of three-class text classification. Various formulas for determining the weights of words in the vector representation of texts are considered as well. The approach is applied to a dataset of 10,000 manually labeled posts extracted from the VKontakte social network. The class distribution of dataset objects in the studied area often tends to be unbalanced; standard models show moderate quality, but only because the majority of data points are classified as belonging to the dominant class.
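The blending scheme can be sketched as follows: texts are vectorized with TF-IDF word weights and the class probabilities of two base classifiers are averaged; the tiny corpus, the choice of base models and all names are illustrative assumptions, not the paper's configuration.

# Hedged sketch of probability blending over TF-IDF features for
# three-class sentiment; corpus and model choices are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

texts = ["great service", "awful delays", "opens at nine",
         "love it", "hate it", "regular schedule"]
labels = [2, 0, 1, 2, 0, 1]   # 0 negative, 1 neutral, 2 positive

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

clf_a = LogisticRegression(max_iter=1000).fit(X, labels)
clf_b = MultinomialNB().fit(X, labels)

x_new = vec.transform(["great schedule"])
blended = (clf_a.predict_proba(x_new) + clf_b.predict_proba(x_new)) / 2
print(blended.argmax(axis=1))   # blended three-class prediction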
VARIABLES m_directed, m_restricted, m_startVertexType, m_endVertexType, isValidEdge
INVARIANT ...
The INVARIANT section contains a formula consisting of predicates that type the state variables and impose various joint constraints on the variables and constants. The predicates are joined by conjunction. Thus, it is declared that c_edges and c_vertices are indeed names of classes:
c_edges: classNames & c_vertices: classNames
Here classNames is the set containing the names of all classes of the database [22]. The metainformation associated with the type of the instances of an edge class is represented by the variables m_directed (whether the edge is directed), m_restricted (whether the edge's vertex types are constrained), m_startVertexType (the type of the source vertex) and m_endVertexType (the type of the target vertex):
m_directed: subclasses(c_edges) --> BOOL &
m_restricted: subclasses(c_edges) --> BOOL &
m_startVertexType: subclasses(c_edges) --> subclasses(c_vertices) &
m_endVertexType: subclasses(c_edges) --> subclasses(c_vertices)
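For illustration, the total-function typing used in these predicates can be mirrored in Python 3 as a runtime check; the helper name, toy class names and checking code below are assumptions made for this sketch, not part of the specification.

# Python 3 analogue of the B typing invariant above. In B, f: S --> T
# states that f is a total function from S to T. Toy data only.
def is_total_function(mapping, domain, codomain):
    return set(mapping) == set(domain) and all(v in codomain for v in mapping.values())

edge_classes = {"Knows", "Owns"}        # plays the role of subclasses(c_edges)
vertex_classes = {"Person", "Thing"}    # plays the role of subclasses(c_vertices)
m_directed = {"Knows": True, "Owns": True}
m_startVertexType = {"Knows": "Person", "Owns": "Person"}

assert is_total_function(m_directed, edge_classes, {True, False})
assert is_total_function(m_startVertexType, edge_classes, vertex_classes)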
The paper considers an approach to the specification of data integration rules using the RIF-BLD logic dialect, a W3C recommendation. This allows entities defined in different collections, represented using different data models, to be referenced in the same rule. The logical semantics of RIF-BLD also provides an unambiguous interpretation of data integration rules. The paper proposes an approach to implementing RIF-BLD rules using the HIL language; thus data integration rules are compiled into MapReduce programs and can be executed over Hadoop-based distributed infrastructures.
Integration of large heterogeneous data collections, usually gathered for making decisions on information security (IS) management issues, requires a preliminary step: unification of their data models. It is provided by mapping the source data models into the canonical information model. The semantics of the data definition languages (DDL) and of the operations of the data manipulation languages (DML) have to be preserved by the mapping. This research is devoted to the unification of graph data models (GDM), an important kind among the existing variety of data models. The distinguishing features of modern GDM are discussed, as well as their application in the information security area. The issues of proving that the mapping of GDM into the object-frame canonical model preserves DDL and DML semantics are briefly considered. Future steps in applying the research results to different IS management areas are indicated in the conclusion.
The organization and management of virtual experiments in data-intensive research has been widely studied in the past several years. The authors survey existing approaches to dealing with virtual experiments and hypotheses, and analyze virtual experiment management in a real astronomy use case. Requirements for a system organizing virtual experiments in data-intensive domains have been gathered, and the overall structure and functionality of a system running virtual experiments are presented. The relationships between hypotheses and models in a virtual experiment are discussed. The authors also illustrate how to conceptually model virtual experiments and the respective hypotheses and models in the provided astronomy use case. Potential benefits and drawbacks of the approach are discussed, including maintenance of experiment consistency and shrinkage of the experiment space. Overall, an infrastructure for managing virtual experiments is presented.
Developing methods for analyzing and extracting information from modern sky surveys is a challenging task in astrophysical studies and is important for many investigations of galactic and extragalactic objects. We have designed a method for the determination of stellar parameters and interstellar extinctions from multicolor photometry. This method was applied to objects drawn from modern large photometric surveys. In this work, we give a review of the surveys and discuss problems of cross-identification, paying particular attention to the information flags contained in the surveys. We have also determined new statistical relations for estimating stellar atmospheric parameters using the MK spectral classification.
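For context, the standard extinction relations that such photometric methods rest on can be shown as a worked example; the snippet below illustrates the conventional relations E(B-V) = (B-V)_observed - (B-V)_intrinsic and A_V = R_V * E(B-V) with the usual Galactic value R_V = 3.1, and is not the paper's actual pipeline.

# Worked example of the standard interstellar-extinction relations
# (illustration only): colour excess is observed minus intrinsic colour,
# and visual extinction follows with the conventional R_V = 3.1.
def visual_extinction(bv_observed, bv_intrinsic, r_v=3.1):
    e_bv = bv_observed - bv_intrinsic   # colour excess E(B-V)
    return r_v * e_bv                   # A_V = R_V * E(B-V)

# A G2V star (intrinsic B-V about 0.65) observed at B-V = 0.95:
print(visual_extinction(0.95, 0.65))    # A_V close to 0.93 mag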
An approach is presented for applying a combination of semantically different rule-based languages for interoperable conceptual programming over various rule-based systems (RS), relying on the logic program transformation technique recommended by the W3C Rule Interchange Format (RIF). This approach is coherently combined with heterogeneous database integration applying semantic rule mediation. The basic functions of the infrastructure implementing the multi-dialect conceptual specifications by the interoperable RS and mediator programs are defined. References to a detailed description of the application of the infrastructure to solving a complex combinatorial problem are given. The research results show the usability of the approach and of the infrastructure for declarative, resource-independent and reusable data analysis in various application domains.
The results presented in this paper contribute to techniques for the conceptual representation of data analysis algorithms and processes, specifying data and behavior semantics in one paradigm. An investigation of a novel approach is extended here: applying a combination of semantically different, platform-independent rule-based languages (dialects) for interoperable conceptual specifications over various rule-based systems (RSs), relying on the rule-based program transformation technique recommended by the W3C Rule Interchange Format (RIF). The approach is also coupled with facilities for the mediation of heterogeneous information resources. This paper extends previous research of the authors [1] in the direction of workflow modeling for the definition of compositions of algorithmic modules in a process structure. A capability for multi-dialect workflow support is presented, specifying the tasks in the semantically different languages best suited to each task's orientation. A practical workflow use case is introduced whose interoperating tasks are specified in several rule-based languages (RIF-CASPD, RIF-BLD, RIF-PRD). In addition, OWL 2 is used for the conceptual schema definition, and RIF-PRD is also used for workflow orchestration. The use case implementation infrastructure includes a production rule-based system (IBM ILOG), a logic rule-based system (DLV) and a mediation system.
An approach for the rule-based specification of data integration using the RIF-BLD logic dialect, a W3C recommendation, is presented. The approach allows entities defined in different sources, represented in different data models (relational, XML, graph-based, document-based), to be combined in the same rule. The logical semantics of RIF-BLD provides for the unambiguous interpretation of data integration rules. The paper also proposes an approach to the implementation of RIF-BLD rules using the IBM High-level Integration Language (HIL); thus data integration rules can be compiled into MapReduce programs and executed over Hadoop-based distributed infrastructures.
The paper presents an approach for the registration of heterogeneous information sources at subject mediators. Information source registration is considered as a process of compositional information systems development. The method is applicable to a wide class of source specification models representable in the hybrid semistructured/object canonical mediator model.
This position paper provides a short summary of the results obtained so far on an application-driven approach to mediation-based EIS development. This approach has significant advantages over the conventional, information-source-driven approach. Basic methods for the application-driven approach are discussed, including methods for the synthesis of canonical information models unifying the languages of various kinds of heterogeneous information sources in one extensible model, methods for the identification of sources relevant to an application and their registration at the mediator applying GLAV techniques, and methods for the reconciliation of ontological contexts. The methodology of EIS application development according to the approach is briefly discussed, emphasizing the importance of a mediator consolidation phase by the respective community, the formulation of application problems in the canonical model, and their rewriting into requests to the registered information sources. The technique presented is planned to be used…
The paper considers the middleware architecture of subject mediators in the hybrid grid infrastructure of the Russian Virtual Observatory (RVO) for scientific problem solving over a set of heterogeneous distributed information resources (such as databases, services and ontologies) integrated by the mediators. The RVO hybrid infrastructure is constructed as a merge of the AstroGrid VO system developed in the UK and the middleware supporting subject mediators developed at the Institute of Informatics Problems of RAS. An example implementation in the hybrid architecture of a subject mediator supporting the distant galaxy discovery problem is presented.