Vangelis Banos


    ABSTRACT We present an Open Cultural Digital Content Infrastructure, a platform providing a coherent suite of loosely-coupled services that aim to promote metadata quality in repositories and facilitate the reuse of metadata and digital content. The key functions of the infrastructure are the aggregation of metadata and digital files and the automatic validation of metadata records and digital material for compliance with desired quality specifications. The system, which has recently moved to production, is currently being employed to ensure the quality standards of the output of more than 70 projects that support Greek cultural heritage organisations and are funded by the European Union structural funds. These projects are expected to produce more than 1.5 million digitised and born-digital items accompanied by detailed metadata. The validation is based on a set of quality and interoperability specifications that have been developed for this purpose. In this paper, we focus on the Validator and Aggregator components and present experimental results on their scalability.
    ABSTRACT Blogs are a dynamic communication medium which has been widely established on the web. The BlogForever project has developed an innovative system to harvest, preserve, manage and reuse blog content. This paper presents a key component of the BlogForever platform, the web crawler. More precisely, our work concentrates on techniques to automatically extract content such as articles, authors, dates and comments from blog posts. To achieve this goal, we introduce a simple and robust algorithm to generate extraction rules based on string matching, using the blog's web feed in conjunction with the blog's hypertext. This approach leads to a scalable blog data extraction process. Furthermore, we show how we integrate a web browser into the web harvesting process in order to support data extraction from blogs with JavaScript-generated content.
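The idea of deriving extraction rules by matching feed values against the page markup can be illustrated with a minimal Python sketch. It is not the BlogForever crawler's algorithm, only one possible reading of the abstract: it assumes a rule is just a (tag, class) pair, that similarity is scored with the standard library's difflib, and that all function names (text_nodes, learn_rule, apply_rule) and the sample pages are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextNodes(HTMLParser):
    """Collect (rule, text) pairs, where a rule is a (tag, class) pair."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, class) pairs
        self.nodes = []  # every non-empty text node with its enclosing rule

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.nodes.append((self.stack[-1], text))


def text_nodes(page_html):
    parser = _TextNodes()
    parser.feed(page_html)
    parser.close()
    return parser.nodes


def learn_rule(feed_value, page_html):
    """Return the rule of the text node most similar to the feed value."""
    def score(node):
        return SequenceMatcher(None, feed_value, node[1]).ratio()
    return max(text_nodes(page_html), key=score)[0]


def apply_rule(rule, page_html):
    """Extract every text node on a page that sits under the given rule."""
    return [text for node_rule, text in text_nodes(page_html) if node_rule == rule]


# Hypothetical data: one feed entry and the post page it points to.
feed_entry = {"title": "Hello World", "author": "Alice"}
post_page = (
    '<html><body><h1 class="site">My Blog</h1>'
    '<h2 class="post-title">Hello World</h2>'
    '<span class="author">Alice</span>'
    '<div class="body">First post text.</div></body></html>'
)
# A second post from the same blog that is no longer present in the feed.
older_page = (
    '<html><body><h1 class="site">My Blog</h1>'
    '<h2 class="post-title">Second Post</h2>'
    '<span class="author">Bob</span>'
    '<div class="body">More text.</div></body></html>'
)

rules = {field: learn_rule(value, post_page) for field, value in feed_entry.items()}
older_title = apply_rule(rules["title"], older_page)
older_author = apply_rule(rules["author"], older_page)
```

The feed supplies known-good values for a few recent posts, so rules can be learned without manual annotation and then reused on pages the feed no longer covers. A production crawler would need a more selective rule format (for example a full element path) and handling for void elements such as br, which this sketch ignores.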
    ABSTRACT Social media content and user participation have increased dramatically since the advent of Web 2.0. Blogs have become relevant to every aspect of business and personal life. Nevertheless, we do not have the right tools to aggregate and preserve blog content correctly, or to manage blog archives effectively. Given the rising importance of blogs, it is crucial to build systems that facilitate blog preservation, safeguarding an essential part of our heritage that will prove valuable for current and future generations. In this paper, we present our work in progress towards building a novel blog preservation platform featuring robust digital preservation, management and dissemination facilities for blogs. This work is part of the BlogForever project, which aims to make an impact on the theory and practice of blog preservation by creating guidelines and software that any individual or organization could use to preserve their blogs.
    ABSTRACT Blogging is yet another popular and prominent application in the era of Web 2.0. According to recent measurements, often considered conservative, there are currently more than 152 million blogs worldwide, with content spanning every aspect of life and science, necessitating long-term blog preservation and knowledge management. In this work, we present a range of issues that arise when facing the task of blog preservation. We argue that current web archiving solutions are not able to capture the dynamic and continuously evolving nature of blogs, their network and social structure, or the exchange of concepts and ideas that they foster. Furthermore, we provide directions and objectives that could be pursued to realize robust digital preservation, management and dissemination facilities for blogs. Finally, we introduce the EC-funded BlogForever project, its main motivation and its findings towards widening the scope of blog preservation.