IDEAS'12 is the sixteenth meeting in a series of annual meetings addressing issues in the engineering and application of databases. The series was inaugurated in 1997 and has been held in North America, Europe, and Asia. Following the IDEAS tradition, these proceedings include quality papers presenting original ideas and new findings on applied technological and theoretical aspects of information technology. The Program Committee and the referees have ensured this quality through careful refereeing of the submitted papers. Submissions went through three phases of review: screening for suitability, reviews by four members of the program committee, and a double-blind debate among reviewers in the case of controversial papers. The program reflects a high quality, as indicated by a 30% acceptance rate, and consists of 17 full (18%), 9 short (11% of the remaining), and 3 poster (3% of the remaining) papers.
Evolving social data mining and affective analysis methodologies, framework and applications
Social networks drive today's opinions and content diffusion. Large-scale, distributed, and unpredictable social data streams are produced, and this evolving data production provides the ground for data mining and analysis tasks. Such social data streams ...
XML query processing: efficiency and optimality
XML (Extensible Mark-up Language) is a well established format which is often used for modeling of semi-structured data. XPath and XQuery are de facto standards among XML query languages and searching for occurrences of a twig pattern query (TPQ) in an ...
A constrained frequent pattern mining system for handling aggregate constraints
Frequent pattern mining searches data for sets of items that frequently co-occur. Most algorithms find all the frequent patterns. However, there are many real-life situations in which users are interested in only some small portions ...
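As a minimal illustration of what "sets of items that frequently co-occur" means, the sketch below counts itemset support by brute force. It is not the constrained mining system described in this paper; the function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) that occur in at least
    `min_support` transactions. Brute-force counting, for illustration only."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical sample data: four shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk"],
]
frequent = frequent_itemsets(baskets, min_support=3)
```

With these baskets, only `bread`, `milk`, and the pair `(bread, milk)` reach the support threshold of 3. Enumerating every subset grows exponentially with transaction size, which is one reason practical miners prune the search space.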
TEEPA: a timely-aware elastic parallel architecture
Parallel Shared-Nothing architectures are frequently used to handle large star-schema Data Warehouses (DW). The continuous increase in data volume and the star-schema storage organization introduce severe limitations to scalability due to the well-known ...
Autonomous database partitioning using data mining on single computers and cluster computers
One of the most important metrics in measuring the performance of a database system is query response time, which is composed of I/O time and CPU time. I/O time is decided by the amount of data read/write from/to disks and how the data is located on ...
Schematron schema inference
In this paper we introduce a method to infer a Schematron schema from a set of XML documents. We analyze different aspects of Schematron schema generation. Since the automatic inferring of XML documents is not a new problem, we will introduce only a ...
Partitioning XML documents for iterative queries
This paper presents an XML partitioning technique that allows main-memory query engines to process a class of XQuery queries, which we dub iterative queries, on arbitrarily large input documents. We provide a static analysis technique to recognize these ...
Evaluation of data reduction techniques for vehicle to infrastructure communication saving purposes
In this paper we investigate the employment of different data reduction techniques to minimize V2I communication in an Intelligent Transportation System (ITS). We consider the context of the PEGASUS Project, where vehicles are equipped with sensor-based ...
DYMOND: an active system for dynamic vertical partitioning of multimedia databases
In recent years, vertical partitioning techniques have been employed in multimedia databases to achieve efficient retrieval of multimedia objects. These techniques are static because the input to the partitioning process, which includes queries ...
CMOA: continuous moving object anonymization
This paper proposes a continuous anonymization method for a trajectory stream. In today's mobile environment, positions of moving objects are frequently sensed and collected. For real-time movement pattern analyses of people and automobiles, trajectory ...
The QOL approach for optimizing distributed queries without complete knowledge
This paper concerns the integration of the Case Based Reasoning (CBR) paradigm in query processing, providing a way to optimize queries when there is no prior knowledge on queried data sources and certainly no related metadata such as data statistics. ...
A cooperative scheme to aggregate spatio-temporal events in VANETs
Today, thanks to vehicular networks, drivers may receive useful information produced or relayed by neighboring sensors or vehicles (e.g., the location of an available parking space, of a traffic congestion, etc.). In this paper, we address the problem ...
Efficient graph management based on bitmap indices
- Norbert Martínez-Bazan,
- M. Ángel Águila-Lorente,
- Victor Muntés-Mulero,
- David Dominguez-Sal,
- Sergio Gómez-Villamor,
- Josep-L. Larriba-Pey
The increasing amount of graph-like data from social networks, science, and the web has spurred interest in analyzing the relationships between different entities. New specialized solutions in the form of graph databases, which are generic and able to ...
Sample-based forecasting exploiting hierarchical time series
Time series forecasting is challenging as sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then ...
Stream-join revisited in the context of epoch-based SQL continuous query
The current generation of stream processing systems is in general built separately from the query engine thus lacks the expressive power of SQL and causes significant overhead in data access and movement. This situation has motivated us to leverage the ...
Resource allocation algorithm for a relational join operator in grid systems
Grid systems have become very popular during the last decade because of their rapidly increasing computational capabilities. On the other hand, advances in different domains have caused an enormous increase in the scale of the manipulated data. This issue ...
Incrementally maintaining run-length encoded attributes in column stores
Run-length encoding is a popular compression scheme that is used extensively to compress the attribute values in column stores. Out-of-order insertion of tuples potentially degrades the compression achieved using run-length encoding and consequently, ...
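The effect of out-of-order insertion on run-length encoding can be sketched as follows. This is a naive decode, insert, and re-encode baseline, not the incremental maintenance technique this paper proposes; the helper names (`rle_encode`, `rle_decode`, `naive_insert`) and the sample column are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full value list."""
    return [value for value, length in runs for _ in range(length)]


def naive_insert(runs, position, value):
    """Insert `value` at tuple offset `position` by fully decoding and
    re-encoding the column. Correct but costly; shown only as a baseline."""
    values = rle_decode(runs)
    values.insert(position, value)
    return rle_encode(values)


# Hypothetical sorted column: two runs of three values each.
column = ["A", "A", "A", "B", "B", "B"]
runs = rle_encode(column)

# Inserting "B" in the middle of the "A" run splits it and adds runs.
after = naive_insert(runs, 1, "B")
```

Here the column starts as two runs, and a single misplaced value leaves it with four, which is the kind of degradation in compression the abstract refers to.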
XML class outlier detection
XML (eXtensible Markup Language) has in recent years become the new standard for data representation and exchange on the WWW. This has resulted in a great need for data cleaning techniques in order to identify outlying data. In this paper, we present a ...
Continuous queries on trajectories of moving objects
Since navigation systems and tracking devices are becoming ubiquitous in our daily life, the development of efficient methods for processing massive sets of mobile objects is of utmost importance. Although future routes of mobile objects are often ...
Dealing with inconsistencies in linked data mashups
- Eveline R. Sacramento,
- Marco A. Casanova,
- Karin K. Breitman,
- Antonio L. Furtado,
- José Antonio F. de Macêdo,
- Vânia M. P. Vidal
Data mashups constructed from independent sources may contain inconsistencies, puzzling the user that observes the data. This paper formalizes the notion of consistent data mashups and introduces a heuristic procedure to compute such mashups.
Mining GPS traces to recommend common meeting points
Scheduling a meeting is a difficult task for people who have overbooked calendars and many constraints. The complexity increases when the meeting is to be scheduled between parties who are situated in geographically distant locations of a city and have ...
UBIQUEST, for rapid prototyping of networking applications
- Ahmad Ahmad-Kassem,
- Christophe Bobineau,
- Christine Collet,
- Etienne Dublé,
- Stéphane Grumbach,
- Fuda Ma,
- Lourdes Martinez,
- Stéphane Ubéda
A UBIQUEST system provides a high-level programming abstraction for rapid prototyping of heterogeneous and distributed applications in a dynamic environment. Such a system is perceived as a distributed database and the applications interact through ...
A mediator-based system for distributed semantic provenance management systems
Today, most applications that exchange and process documents on the web or in clouds are becoming provenance-aware and provide heterogeneous, decentralized, and non-interoperable provenance data. Provenance is becoming a key metadata for assessing ...
Mining probabilistic datasets vertically
As frequent pattern mining plays an important role in various real-life applications, it has been the subject of numerous studies. Most of the studies mine transactional datasets of precise data. However, there are situations in which data are ...
Differential evolution versus genetic algorithms: towards symbolic aggregate approximation of non-normalized time series
Differential evolution (DE) is a very powerful search method for solving many optimization problems. In this paper we present a new scheme (DESAX) based on differential evolution to localize the breakpoints utilized with the symbolic aggregate ...
Efficient MD5 hash reversing using D.E.A. framework for sharing computational resources
The recent advances in computing technology lead to the availability of a huge number of computational resources that can be easily connected through network infrastructures. Indeed, a really small fraction of the available computing power is fully ...
ECTree: an extended tree index for attributed subgraph queries
Graphs are popular data structures for modeling complex data types. There is a need for managing such graph data and providing efficient querying tools. In the graph mining realm, the problem lies in indexing a large number of graphs for fast retrieval. ...
On the semantics of ST4SQL, a multidimensional spatio-temporal query language
Pozzani and Combi proposed ST4SQL, an SQL-based query language extending SQL with new constructs for querying spatio-temporal data. In particular, ST4SQL deals with different temporal and spatial semantics, allowing one to specify how the system has ...
Transaction processing using thread-to-metadata
In distributed transactional database systems, there are a large number of concurrent transactions. Each transaction executes independently on a separate thread; this is known as thread-to-transaction. The lock manager is hence responsible for ...
A stream query language TPQL for anomaly detection in facility management
In facility management for plants and buildings, there is an increasing need for facility diagnosis that saves energy or facility management costs by analyzing time series data from sensors on equipment in facilities. This paper proposes a relation-based ...
- Proceedings of the 16th International Database Engineering & Applications Symposium