For the 4th time, the International Conference on Information and Knowledge Management (ACM CIKM) hosts a workshop for Ph.D. students: PIKM 2011. The goal of this workshop is two-fold. First, a Ph.D. workshop gives doctoral students an opportunity to present their work at an early stage to a global audience. This allows the students not only to crystallize their ideas into a scientific article and to practice scientific presentation, but also to receive feedback from reviewers, from fellow students, and from the general CIKM audience. Second, we believe that the research community, too, benefits from such a workshop: Ph.D. theses are the grassroots of research. They point out new research avenues and indicate current promising topics. They provide fresh viewpoints from the researchers of tomorrow. Beyond these two goals, we hope that the interaction with other researchers at the workshop itself, across all levels of seniority, will help propel science forward.
The PIKM workshop covers topics in all core areas of the general CIKM conference: information retrieval (IR), databases (DB), and knowledge management (KM). This includes subjects as diverse as resource monitoring, semantic search, pattern recognition, data mining, and data warehousing.
This diversity of topics was reflected in the submissions we received. The call for papers attracted 18 submissions from nearly all continents. Of these, 9 were accepted as full papers and a further 4 as poster papers. The papers cover proposals at various stages of the dissertation, from early outlines of research plans to in-depth investigations of acute questions and mid-term reports of work in progress. The dissertations touch all main areas of PIKM, including, for example, work on user interaction and ranking, as well as research on workflow management. As in past PIKM workshops, the best submission will receive a best paper award. This year's award will go to Minsuk Kahng, Sangkeun Lee and Sang-Goo Lee for their paper "Ranking Objects by Following Paths in Entity-Relationship Graphs".
As a special highlight, this year's PIKM features a keynote talk by Prof. Dr. Felix Naumann from the Hasso-Plattner-Institute, Potsdam, Germany. Prof. Naumann will talk about the challenges of "Extreme Web Data Integration" -- a task that becomes ever more challenging with the relentless growth of the Web.
Proceeding Downloads
A user interaction model based on the principle of polyrepresentation
Recently, the cognitively motivated principle of polyrepresentation has been shown to correlate with quantum mechanics-inspired IR models. The principle's core hypothesis is that a document is defined by different representations such as low-level ...
Ranking objects by following paths in entity-relationship graphs
In this paper, we propose an object ranking method for search and recommendation. By selecting schema-level paths and following them in an entity-relationship graph, it can incorporate diverse semantics existing in the graph. Utilizing this kind of ...
Online conversation mining for author characterization and topic identification
The increasing popularity of online-based services (Twitter, Facebook, IRC, Myspace, blogs, to mention just a few) results in the production of a huge amount of novel documents. These documents present properties that cannot be found in standard ...
Pattern recognition in multivariate time series: dissertation proposal
Nowadays computer scientists are faced with fast growing and permanently evolving data, which are represented as observations made sequentially in time. A common problem in the data mining community is the recognition of recurring patterns within ...
Resource monitoring in industrial production with knowledge-based models and rules
The manufacturing domain currently experiences a significant increase in resource expenses for industrial plants. However, the implementation of systems to monitor the resource consumption in such complex plants requires high investment concerning time ...
Towards a version control model with uncertain data
Content-based online collaborative platforms and office applications are widely used for collaborating and exchanging data, in particular in the form of XML-based electronic documents. Usually, a version control system is built into these applications ...
Aggregation strategies for columnar in-memory databases in a mixed workload
The recent trend towards analytics on operational data has led to an approach of reunifying online transactional processing and online analytical processing in one single database. The advent of columnar in-memory databases makes this viable and ...
E-ETL: framework for managing evolving ETL processes
External data sources (EDSs) being integrated in a data warehouse (DW) frequently change their data structures (schemas). As a consequence, in many cases, an already deployed ETL workflow executes with errors. Since structural changes of EDSs are ...
Minimal data sets vs. synchronized data copies in a schema and data versioning system
In this paper, we describe a key component of our proposed database schema and data versioning system, ScaDaVer. The versioning system is based on common practices used to manage source code changes in software development. It allows users of a data-...
Utilizing sub-topical structure of documents for information retrieval
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. For ...
Optimizing the cost of information retrieval test collections
We consider the problem of optimally allocating limited resources to construct relevance judgements for a test collection that facilitates reliable evaluation of retrieval systems. We assume that there is a large set of test queries, for each of which a ...
Towards semantic methodologies for automatic regulatory compliance support
Businesses and organizations must comply with requirements and expectations such as regulations, policies, mandates and guidelines to meet public standards and avoid hefty penalties. Checking compliance manually is a laborious, extensive and error-prone ...
RW.KNN: a proposed random walk KNN algorithm for multi-label classification
Multi-label classification refers to the problem of predicting, for each instance, one or more labels from a set of associated labels. It is common in many real-world applications such as text categorization, functional genomics and semantic scene ...
Index Terms
- Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management