Vangelis Banos

ABSTRACT We present an Open Cultural Digital Content Infrastructure, a platform providing a coherent suite of loosely-coupled services that aim to promote metadata quality in repositories and facilitate the reuse of metadata and digital content. The key functions of the infrastructure are the aggregation of metadata and digital files and the automatic validation of metadata records and digital material for compliance with desired quality specifications. The system, which has recently moved to production, is currently being employed to ensure the quality standards of the output of more than 70 projects that support Greek cultural heritage organisations and are funded by the European Union structural funds. These projects are expected to produce more than 1.5 million digitised and born-digital items accompanied by detailed metadata. The validation is based on a set of quality and interoperability specifications that have been developed for the purpose. In this paper we focus on the Validator and Aggregator components and present experimental results on their scalability.

Publication Date: 2014

Publication Name: IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications

Publication Date: 2014

Publication Name: IEEE/ACM Joint Conference on Digital Libraries

ABSTRACT Blogs are a dynamic communication medium which has been widely established on the web. The BlogForever project has developed an innovative system to harvest, preserve, manage and reuse blog content. This paper presents a key component of the BlogForever platform, the web crawler. More precisely, our work concentrates on techniques to automatically extract content such as articles, authors, dates and comments from blog posts. To achieve this goal, we introduce a simple and robust algorithm to generate extraction rules based on string matching using the blog's web feed in conjunction with blog hypertext. This approach leads to a scalable blog data extraction process. Furthermore, we show how we integrate a web browser into the web harvesting process in order to support data extraction from blogs with JavaScript-generated content.
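The idea of deriving extraction rules by matching feed values against blog hypertext can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the rule is taken to be the chain of (tag, class) pairs leading to the first text node that exactly equals a value known from the feed, and the names `TextNodeCollector`, `learn_rule` and `apply_rule` are hypothetical, not part of the BlogForever crawler.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are kept off the path stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextNodeCollector(HTMLParser):
    """Collect (path, text) pairs; a path is a tuple of (tag, class) steps."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, class)
        self.nodes = []  # (path, text) for every non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text:
            self.nodes.append((tuple(self.stack), text))


def collect(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_rule(feed_value, post_html):
    """Return the path of the first text node equal to the feed value, else None."""
    target = " ".join(feed_value.split())
    for path, text in collect(post_html):
        if text == target:
            return path
    return None


def apply_rule(rule, post_html):
    """Return the text of the first node located at the learned path, else None."""
    for path, text in collect(post_html):
        if path == rule:
            return text
    return None


# Hypothetical pages: the title of the first post is known from the feed.
known_post = ('<html><body><div class="post"><h1 class="title">Hello World</h1>'
              '<p>Body</p></div></body></html>')
other_post = ('<html><body><div class="post"><h1 class="title">Second Post</h1>'
              '<p>More</p></div></body></html>')

rule = learn_rule("Hello World", known_post)
title = apply_rule(rule, other_post)
```

Here a rule learned from one post whose title appears in the feed is reused on another post of the same blog that shares the template. Exact matching is the simplest choice; a real crawler would likely need approximate matching to cope with truncated or reformatted feed values.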

Publication Date: 2014

Publication Name: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14) - WIMS '14

ABSTRACT Social media content and user participation have increased dramatically since the advent of Web 2.0. Blogs have become relevant to every aspect of business and personal life. Nevertheless, we do not have the right tools to aggregate and preserve blog content correctly, or to manage blog archives effectively. Given the rising importance of blogs, it is crucial to build systems to facilitate blog preservation, safeguarding an essential part of our heritage that will prove valuable for current and future generations. In this paper, we present our work in progress towards building a novel blog preservation platform featuring robust digital preservation, management and dissemination facilities for blogs. This work is part of the BlogForever project, which aims to make an impact on the theory and practice of blog preservation by creating guidelines and software that any individual or organization could use to preserve their blogs.

Publication Date: 2013

Publication Name: World Wide Web

Research Interests:
Information Systems, Distributed Computing, and World Wide Web

ABSTRACT Blogging is yet another popular and prominent application in the era of Web 2.0. According to recent measurements, often considered conservative, there are currently more than 152 million blogs worldwide, with content spanning every aspect of life and science, necessitating long-term blog preservation and knowledge management. In this work, we present a range of issues that arise when facing the task of blog preservation. We argue that current web archiving solutions are not able to capture the dynamic and continuously evolving nature of blogs, their network and social structure, or the exchange of concepts and ideas that they foster. Furthermore, we provide directions and objectives that could be pursued to realize robust digital preservation, management and dissemination facilities for blogs. Finally, we introduce the EC-funded BlogForever project, its main motivation and its findings towards widening the scope of blog preservation.

Publication Date: 2013

Publication Name: Lecture Notes in Business Information Processing

Publication Date: 2012

Publication Name: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics - WIMS '12

Research Interests:
Web Technology, Large Scale, and Data Extraction

Publication Date: 2011

Publication Name: Lecture Notes in Computer Science

Publication Date: 2014

Publication Name: IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications

Publication Date: 2014

Publication Name: IEEE/ACM Joint Conference on Digital Libraries

Publication Date: 2013

Publication Name: International Journal of Metadata, Semantics and Ontologies

Research Interests:
Data Format

Publisher: Springer Berlin/Heidelberg

Publication Date: 2011

Publication Name: Research and Advanced Technology for Digital Libraries

Location: Biblioteca Nacional de Portugal, Lisbon, PT

Publisher: Proceedings of the 10th International Conference on Preservation of Digital Objects (iPRES2013), eds. José Borbinha, Michael Nelson, and Steve Knight.

Publication Date: Sep 2, 2013

Conference End Date: Sep 6, 2013

Conference Start Date: Sep 2, 2013

Research Interests:
Digital Libraries, Digital Humanities, Digital Curation, Digital Media, Digital Preservation, and Digital Library
