The document discusses institutional repositories and their role in managing and disseminating digital materials created by academic institutions and their communities. It provides definitions of institutional repositories by Clifford Lynch, describing them as services offered by universities to manage and distribute digital materials created by the institution and its members with a commitment to long-term preservation. Institutional repositories are centered around academic institutions and contain scholarly works of varying degrees produced by that institution. They aim to maximize access to and preservation of an institution's research outputs.
1 of 83
More Related Content
Core webinar updated 30-05-2020
1. Dr Mayank Trivedi
University Librarian & Senate Member
CORE Ambassador
The Maharaja Sayajirao University of Baroda
Date : 30th May, 2020
Email : librarian-hml@msubaroda.ac.in
2. Open access digital archive on open source software.
A managed, persistent way of making research,
learning and teaching content with continuing value
both discoverable and accessible.
Repositories can be subject or institutional in their
focus.
Putting content into an institutional repository
enables staff and institutions to manage and preserve
it, and therefore derive maximum value from it.
A repository can support research, learning, and
administrative processes.
The university is building a repository to house the
scholarly output of staff called DORA De Montfort
Open Research Archive. Alternatively a suitable
subject based repository may be available.
3. Clifford Lynch, Executive Director, Coalition for Networked
Information, stated.
“In my view, a university-based institutional
repository is a set of services that a university
offers to the members of its community for the
management and dissemination of digital materials
created by the institution and its community
members. It is most essentially an organizational
commitment to the stewardship of these digital
materials, including long-term preservation where
appropriate, as well as organization and access or
distribution.”
ARL: A Bimonthly Report, no. 226 (February 2003)
4. Centered around a university (other academic
institution) and contain items which are the
scholarly output of that institution.
A collection of (digital) objects, in a variety of
formats Include works of various degrees of
scholarly authority and from various stages in the
process of scholarly inquiry. In addition to
published works, an IR may include preprints,
theses & dissertations, images, data sets,
working papers, course material, or anything else
a contributor deposits.
Typically motivated by a commitment to open
access.
5. Institutions are logical implementers of repositories because they can
take responsibility for:
– Centralizing a distributed activity
– Framework and Infrastructure
– Permanence that can sustain changes
– Stewardship of Digital assets
– Preservation policy for long term access
– Provide central digital showcase for the research, teaching and
scholarship of the institution
Institutional repositories were originally developed to provide a
solution for the collection, preservation and dissemination of the
research output created at universities and research institutions.
The importance of such repositories was quickly recognized since they
provided the means for the institutions to showcase their academic
work.
Using open source software gave the ability to developers
worldwide to build custom features and functionalities for the
systems.
6. Institutional Repositories
Are organized around a
particular institutional
community
Often are dependent upon the
voluntary contribution of
materials by scholars for the
content in their collection
Are mainly repositories and
therefore may only offer
limited user services
E.g. Dspace@MIT, IR@MSU
Digital Libraries
May be built around any
number of organizing
principles (often topic, subject,
or discipline)
Are the product of a deliberate
collection development policy
Typically include an important
service aspect (reference and
research assistance,
interpretive content, or special
resources.)
E.g. NDLI
7. Content is deposited in a repository – by
content creator, owner etc.
Repository architecture manages the content
and the metadata
Repository software offers a minimum set of
basic services – put, get, search
Repository must be sustainable, trusted, well
supported and well-managed
8. Faculty
Pre-prints, post-prints, research
findings, working papers, technical
reports, conference papers
Multimedia, videos, teaching materials,
learning objects
Data sets (scientific, demographic, etc.)
and other ancillary research material
Web-based presentations, exhibits, etc.
Journals and journal articles
Books and book chapters
Technical reports
Working papers
Standards
Theses and dissertations
Reviews
Components (e.g. graphs, figures,
audio, video files)
Departmental and research center
newsletters and bulletins
Papers in support of grant applications
Statistical reports
Surveys
Students
Theses and dissertations
Projects and portfolios
Awarded research
Performances and recitals
University
Annual Reports
Old Question Papers
Curriculum
Course Modules
9. What is open access (OA)?
Many definitions – a report from the Joint Information
Systems Committee (JISC) in the UK of 2006 stated:
The Open Access research literature is composed of free,
online copies of peer-reviewed journal articles and
conference papers as well as technical reports, theses and
working papers. In most cases there are no licensing
restrictions on their use by readers. They can therefore be
used freely for research, teaching and other purposes.
(http://www.jisc.ac.uk/publications/publications/pub_ope
naccess_v2.aspx)
An Open access institutional repository is that repository
where are contents are freely available for use.
10. There are various misunderstandings about
Open Access. It is not self-publishing, nor a
way to bypass peer-review and publication,
nor is it a kind of second-class, cut-price
publishing route. It is simply a means to
make research results freely available online
to the whole research community.
11. Gold OA - uses a funding model that does not
charge readers or their institutions for access
e.g. Ariadne, Dlib Magazine and First Monday
Green OA - authors publish papers in one of
the 25,000 or so refereed journals in all
disciplines and then self archive these papers
in open access/digital/institutional
repositories.
14. An institutional repository is a tangible indicator of
research output of a university – thus increasing its
visibility, prestige and public value
Repository content is readily searchable – both locally and
globally
Can be used as a marketing tool for the institution
Allows an institution to manage its Intellectual Property
Rights appropriately
An institutional repository benefits the institution, the
researcher and anyone interested in scholarly outputs.
From an institutional perspective it provides a record of
scholarly activity taking place within the university.
For a researcher, it creates stable and reliable records of
your work, managed and stored in ways which meet
international technical standards.
Each item in the repository has a unique Internet address
(called a Handle) and it can be found easily on major
search engines.
And finally, increased access to scholarly knowledge is a
benefit to all.
15. Scholarly communication
Storing learning materials and coursework
Managing collections of research documents
Preserving digital materials for the long term
Knowledge management
Electronic publishing
Research assessment exercise
Collaboration tool
16. For researchers
Showcase your institute’s
output
Increases citation for authors
24-hour access through any
web-enabled device
Life’s work in one location
Satisfies funder’s mandates
Persistent URLs
For librarians
Provides new ways for
archiving & preserving valuable
work
Time-saving and cost-effective
Help to identify trends
Reduce duplication of records
For the university
An effective marketing tool
Increase the visibility,
reputation and prestige
Greater interdisciplinary
research
Enhanced funding
Facilitates gathering data such
as publications for
Assessments
For the global community
Free access of scholarly
information
Taxpayers fund a large amount
of scientific research
Developing countries
Increase public knowledge
Gain access to a wide variety of
materials
17. Increase in citations, impact and usage
(useful for research evaluations)
Increase in public research profile – both for
the individual as well as the institution
Preservation of research outputs from the
institution
18. Steps to Building an IR
1. Justify the relevance to the institution and
contributors
2. Develop a policy framework. How will we find this
content and what will we do with it?
3. Build the infrastructure
Bonus: Get institutional support and a mandate.
IR Technology
IR software(Dspace)
OAI-PMH harvesting protocol/software (Free)
Intel/Pentium servers for IR
Linux/Red Hat OS, MySQL/PostGre DBMS,
Apache/Tomcat web server, Perl/Java (Free)
20. Scope : Reinforce the repository’s active support for the
institution’s mission, values and goals
- Identify/build a context in which the repository is
necessary
- Multidiscipline / single subject /Entire research
output/database for each functional unit
Types of documents
◦ Single database for different types /single one
Software: OSS like DSpace or GNU Eprints or develop own
Research Deposit Types: Thesis, Journal articles, Preprints,
Reports, Conference papers, Book Chapter, etc
Resources: Human, IT, Funding
Stake holders: Library, Each Department, Institute as a whole
Services : Focus on building services not collections
21. Deposit options
- Researcher self deposit and /or assisted deposit
Metadata quality
- Ensuring quality and rich metadata is labour intensive
Digitization : Born digital / Scanning
File formats : Accept all, Only PDF and/or other, Conversion
Only full text database and/or Bibliographic
Copyright : RoMeO Publishers Copyright policies
Quality assurance: Peer review, Editing
Deposit Agreement and Use Agreement
- Depositor’s declaration: Non-exclusive license
- Copyright/Patent/Trademarks
- Repository’s rights and responsibilities: Distribute, Store,
Migrate, Copy Rearrange, Remove
- Use Agreement: Copy, Distribute, Display, Share, Author
credit
22. Advocacy
- Sensitive to organizational culture and
background
- Community size
- Strategy: stakeholders, management
committees
Copyright
- Concern of researchers, Legal department
Positioning
- Library/Institute Website
23. Produced by Berkeley Electronic Press
(bepress), focused on maintaining
scholarly output. Not open source.
Developed at the University of Southampton (UK).
Widely considered to be the least complex of the
major repository software platforms.
Developed at Cornell and University of
Virginia. Based on a framework known as
the Flexible Extensible Digital Object and
Repository Framework.
Designed by MIT and Hewlett-Packard to manage
the intellectual output of research institutions and
provide for long-term preservation.
24. Significant subject repositories include many using e-Prints or DSpace
software:
ArXiv - http://www.arxiv.cornell.edu/ (physics, mathematics, nonlinear
science and computer science)
Cogprints - http://cogprints.ecs.soton.ac.uk/ (Cognitive sciences
including psychology, neuroscience, linguistics and other related areas)
CiteSeer - http://citeseer.nj.nec.com/cs (computer science)
HTP Prints - http://htpprints.yorku.ca/ (History and theory of psychology)
PubMedCentral - http://www.pubmedcentral.nih.gov/ (US National
Library of Medicine's digital archive of life sciences journal literature)
PhilSci Archive - http://philsci-archive.pitt.edu/ (philosophy of science)
E-LIS - http://eprints.rclis.org/ (library and information science)
RePEc (Research Papers in Economics)
28. The OpenDOAR service provides a quality-
assured listing of open access repositories
around the world.
OpenDOAR staff harvest and assign metadata to
allow categorisation and analysis to assist the
wider use and exploitation of repositories.
Each of the repositories has been visited by
OpenDOAR staff to ensure a high degree of
quality and consistency in the information
provided.
OpenDOAR is maintained by SHERPA consortium
staff at the University of Nottingham, UK
35. Aims to monitor overall growth in the number
of eprint archives and to maintain a list of GNU
EPrints sites (http://roar.eprints.org)
Available from Southampton University, UK
Data gathered automatically via OAI-PMH
Also ROAR Materials Archiving Policies –
ROARMAP - 181 Institutional repositories
(including The M S University, Baroda,
INFLIBNET, Bharathidasan University in India)
50. Preserving natural/cultural heritages
For promoting academic research
Enabling public access to legacy collections
Technological change
Siginificant increase in the overall volume of research
Increasing need for archival and access to unpublished information
Increasing demand to access knowledge objects from anywhere at
anytime (24x7)
Increasing uncertainty over who will handle the preservation and
archiving of digital scholarly research materials.
Born digital Content(e-only)
Research Data Management(RDM)
Need for Digital Preservation
51. The maintenance of digital materials over the
long-term with a view to ensuring its
continued accessibility.
It ensures that the digital resources are
stored correctly and maintained adequately in
the online world, such that they are available
consistently for use over time.
“Long-term” includes timescales of decades
or even centuries.
52. Primary Sources of
Information
Electronic Conferences
Courseware/Tutorials/G
uides/Manuals
Electronic Journals
Patents
Electronic Preprints and
E-prints
Projects (Ongoing and
Completed)
Science/Research News
Software
Standards
Technical Reports
Electronic Theses and
Dissertation
Secondary source of Information
-Databases, Data sets and
Collections
- Abstracting and Indexing
Databases (Bibliographic
Databases)
- Citation Databases
- Digital Collections (Images,
Audio, Video)
- Equipment/Product Catalogues
- Scientific Data sets (Numeric,
Property and Structural
Databases)
- Library Catalogues (including
Union Catalogues)
- Virtual Libraries
53. Tertiary Source of Information
◦ Electronic Books, Online Book Selling and Print-on-Demand,
Pocket PCs
◦ E-Books on the web
◦ Reference Sources
◦ Dictionaries
◦ Electronic Encyclopaedia
◦ Biographies
◦ Acronyms and Abbreviations
◦ Thesauri and Subject Headings
◦ Handbooks and Manuals
◦ Maps
◦ Employment / Career Sources
◦ Funding / Grants Sources
◦ People/ Experts/Scientist Directories
55. • An aggregation service with more than a thousand data providers
• CORE aggregates open access journals and repositories (subject
and institutional)
• The collected data are available to everyone via the CORE search
engine
• Supports text and data mining of the aggregated content via the
API and the Data Dumps
• Provides a recommendation tool for repositories and journal
software
• Seamless access to the world’s open access research papers
• Aggregate all open access articles across relevant data sources
worldwide, enrich this content and provide seamless access to it
through a set of data services …
56. World’s largest open access articles collection
49TB of data
3,648 Data providers
~ 50k Journals
> 131,302,881 Metadata
> 93,348,838 Abstracts
> 11,250,039 Hosted full texts
> 78,781,218 Links to full text
57. Institutional repositories (IRs) serve a number
of purposes; such collecting and curating
digital outputs, providing statistics, research
excellence, etc.
The primary goal of repositories is to open
and disseminate research outputs to a
worldwide audience (Crow, 2002) – SPARC’s
position paper on the case for institutional
repositories.
58. “For the repository to provide access to the broader
research community, users outside the university must be
able to find and retrieve information from the repository.
Therefore, institutional repository systems must be able to
support interoperability in order to provide access via
multiple search engines and other discovery tools.
An institution does not necessarily need to implement
searching and indexing functionality to satisfy this
demand:
◦ it could simply maintain and expose metadata, allowing other
services to harvest and search the content. This simplicity lowers
the barrier to repository operation for many institutions, as it only
requires a file system to hold the content and the ability to create
and share metadata with external systems.”
59. “Each individual repository is of limited value for
research: the real power of Open Access lies in the
possibility of connecting and trying together
repositories, which is why we need interoperability.
In order to create a seamless layer of content through
connected repositories from around the world, Open
Access relies on interoperability, the ability for
systems to communicate with each other and pass
information back and forth in a usable format.
Interoperability allows us to exploit today's
computational power so that we can aggregate, data
mine, create new tools and services, and generate
new knowledge from repository content.’’ [COAR
manifesto]
60. • OAI-PMH
• ResourceSync
• Bespoke APIs of Publisher
CORE Pipeline
• Metadata download, extraction and harmonisation
• Full text download
• Text extractions and sections extraction
• Metadata validation
• Thumbnails generation
• References and citation contexts extraction
• API enrichment (e.g. finding DOIs, linking to other systems)
• Document type classification
• Deduplication
• Indexing
• Exposing (data dumps, API, ResourceSync)
61. OAI-PMH supports representing metadata in
multiple formats, but at a minimum
repositories must be able to return records
with metadata expressed in the Dublin Core
format (OAI-PMH v2.0, 2008)
If repositories want to satisfy the SPARC
guidelines (Crow, 2002), they must provide a
link to the content as part of the exposed
metadata.
62. The OAI-PMH specification states on this topic
that: “The nature of a resource identifier is
outside the scope of the OAIPMH.
To facilitate access to the resource associated
with harvested metadata, repositories should use
an element in metadata records to establish a
linkage between the record (and the identifier of
its item) and the identifier (URL, URN, DOI, etc.)
of the associated resource.
The mandatory Dublin Core format provides the
identifier element that should be used for this
purpose.”
63. OAI-PMH metadata harvesting
Locating full-text
Focused crawling (to locate full-texts)
Focused crawling (driven by citation analysis)
Aggregations need access to content, not just
metadata!
Certain metadata types can be created only at the
level of the aggregation
Certain metadata can be changing in time
Ensuring content:
◦ accessibility • availability • validity • quality
64. CORE Portal
CORE Mobile
CORE Plugin
CORE API
Repository Analytics
65. • Content
• 7M records
• 230 repositories
• 402k full-texts
• 1TB of data
• 40GB large index
• 35 million RDF triples in the CORE LOD repository
• Started: February 2011
• Budget: 140k£
66. Link to the CORE search engine from your library
Search the CORE engine to help library users
Let your users know about CORE
Discover content in the CORE search engine
Use the CORE API https://core.ac.uk/services#api
and Dataset to apply TDM practices
https://core.ac.uk/services#dataset
AS A REPOSITORY MANAGER
◦ install the CORE Recommender and register for the
Repository Dashboard to manage your content in CORE
AS A DEVELOPER OR COMPANY
◦ subscribe to our API, Dataset or FastSync services
67. CORE has already more than 1,000 data providers
https://core.ac.uk/dataproviders
Join CORE as a data provider https://core.ac.uk/join
Ask your library to list the CORE search engine in
the library pages
When you become a CORE data provider request
access to the CORE Dashboard
https://core.ac.uk/services#dashboard
Install the CORE recommender in your repository
https://core.ac.uk/services#reccomender
68. Extend the current metadata enrichment service of the CORE
system
Classify research papers according to their subject
Development of APIs to increase interoperability and reusability
Develop good practice for publishing content in OARs
Clustering content based on different semantic relations,
associating research areas/topics with each cluster
Developing citation networks and analysing relationships with
semantic similarity of resources
Developing formal methods for analysing the impact of
publications and authors
Mining trends as temporal relations of topics
Analysing inter-topic relationships
To provide a platform for the delivery of Open Access content
aggregated from multiple sources and to deliver a wide range of
services on top of this aggregation.
A nation-wide aggregation system that will improve the
discovery of publications stored in British Open Access
Repositories (OARs).
69. Objective :
… the COnnecting REpositories (CORE) project aims
to facilitate the access and navigation across relevant
scientific papers, stored in British Open Access
repositories using Natural Language Processing and
Linked Data …
- 2.5 Million RDF triples and growing - 143 British
Open Access repositories (more than 1,900
worldwide) - More than 50k full-text papers and
growing (more than 3 million worldwide)
Schema
- MuSim ontology (similarity) - Bibo ontology - OAI
Linked Data repository
70. Recommender :A leading research papers recommendation
Solution - A plugin for repositories, journal systems and web
interfaces to suggest similar articles, supporting users in
discovering similar articles
Discovery : One-click access to free copies of research papers
whenever you hit the paywall- CORE Discovery assists users in
finding freely accessible copies of research papers that are
behind a paywall. Access millions of full text open access papers.
Repository Dashboard : Manage your repository outputs and
ensure they are visible to the world. A service designed for our
data providers. It provides an online interface offering valuable
technical information and statistics to content providers.
With its analytical tools, CORE aims to contribute to the cultural change
and promotion of open access, supporting a variety of stakeholders,
such as researchers, libraries, software developers, funders and more.
71. Faceted searching
What you find is what you can download
Host the full text content
CORE is not here to replace Google Scholar, It
serve a different community
>22 million yearly unique visitors
The world’s largest aggregator of open
access research papers
CORE provides free and seamless access to
millions of research articles aggregated from
thousands of Open Access data providers,
such as repositories and journals.
>60 registered repositories
72. Recommending relevant content to users
from across the CORE’s content
Recommender plugin for repositories
15 repositories using it, including Cambridge
University
73. Free online availability of research outputs (not metadata of
research outputs) is the main goal of open access (OA).
Repositories are one of the recommended ways to achieve this
(BOAI,2002).
Despite large amount of OA content already available online
(Laakso & Bjork, 2012), OA content is not necessarily easily
discoverable (Morrisson, 2012; Konkiel, 2012).
Often available, but difficult to find …
Inhibiting the OA impact, accessibility, discoverability, reuse
Discoverability of OA content on the Web can be dramatically
increased by adopting two simple principles!
From Open Access Metadata to Open Access Content
Two Principles for Increased Visibility of Open Access Content
74. It is beyond human capacities to read all
scientific literature
Example use cases in which CORE is applied:
- Improving discovery
- Plagiarism detection
-Question answering in science
-Literature based discovery
-Fact checking and detection of misinformation
-Analysing research trends
-Finding experts in a particular domain
-Research evaluation and scientometrics
-Exploratory and visual search
75. Main source of data for Naver Academic
IRIS.AI: A start up of 20 people entirely
developing tools to help researchers be more
productive. Highly dependent on CORE API.
ProQuest: CORE supplies data about OA
publications to improve discovery of OA
content in Primo and Summon (new)
additional discovery collaborations on the
way.
Providing a key API for OA Button
77. The institutional repository is like a roach motel. Data goes in, but it doesn’t come
out.” (Salo, 2008)
Lower accessibility of papers (we have them, but cannot find them)
Text-mining, Cannot monitor growth
Loosing a strong argument for the adoption of OA!
Repository systems are currently highly used in education and research
environments; hence they provide a solid base for a global infrastructure of
research.
However most of the platforms used in the repositories are using out of date
technology which was designed two decades ago.
For this reason, repositories have not yet reached their full potential and they are
mainly used for publishing the research output of the institutions.
Working Group to “identify the core functionalities for the next generation of
repositories. These functionalities include more web-friendly architectures,
embedding repositories into the workflow of researchers, open peer review and
quality assessment of content, and better impact and usage measures.”
78. According to COAR the exact vision for Next Generation
Repositories is
“to position repositories as the foundation for a distributed,
globally networked infrastructure for scholarly communication,
on top of which layers of value-added services will be deployed,
thereby transforming the system, making it more research-
centric, open to and supportive of innovation, while also
collectively managed by the scholarly community”.
These new functionalities aim to transform repositories into
the foundation for a distributed, globally networked
infrastructure for scholarly communication.
Layers of added value services can be deployed on this
infrastructure which make a repository more research
oriented, innovative, while providing open peer review and
quality assessment of the content and better impact and
usage measures
82. Proportion of OA content that can be harvested is fairly low in comparison to
metadata
Inhibiting the benefits of OA - Interoperability
Two principles:
1) Dereferencable identifiers - Open Repositories provide open access to content
and not just meatadata
2) Machine access – Open Repositories should provide free access to content (for
anybody, but mainly researchers)
• Compliance validation tools are needed
The rapid growth of OA content provides great opportunity for text-mining.
Aggregations need to aggregate content, not just metadata.
Aggregations should serve the needs of different user groups including researchers
who need access to data. CORE aims to support them.
Global world’s largest aggregator of open access papers
A suite of data services for a variety of stakeholders
Large user base
Moving fast from research to a product, Always innovate.
Excellent value for money.
83. ……..Thank You
Stay Safe and Take Care…
PPTs will be available on :
https://www.slideshare.net/DrTrivedi1
https://www.slideshare.net/mayanktrivedi2
1/presentations