DAF Implementation Guide
October 2009
Acknowledgement
This guide is based on the lessons learned through DAF pilot projects and early exemplars. We’re
very grateful to those groups for sharing their experiences with us to help refine the methodology
and assist future users. They were:
We’re also indebted to the JISC, which has supported this research.
CONTENTS
BACKGROUND
BACKGROUND
What is DAF?
The Data Asset Framework is a set of methods to:
- find out what data assets are being created and held within institutions;
- explore how those data are stored, managed, shared and reused;
- identify any risks e.g. misuse, data loss or irretrievability;
- learn about researchers’ attitudes towards data creation and sharing;
- suggest ways to improve ongoing data management.
Originally called the Data Audit Framework, the tool is being renamed in light of user feedback. Some
pilots found the term audit could be off-putting to researchers and misrepresented the survey process,
which focuses more on uncovering researchers’ data needs and concerns than auditing assets.
The DAF methodology is written for information professionals. It is envisaged the person undertaking a
survey would have either a qualification in library, archive or information management, or significant
experience working with data. Such skills are needed to understand the information lifecycle and
identify risks in existing research workflows and data management practices.
The DAF survey process should involve a variety of stakeholders, for example senior managers,
University services such as IT support or repositories, and most importantly researchers.
[Figure: DAF stakeholders, including senior management / research funders and researchers]
WHY USE DAF?
Encouraging participation
Researcher participation is crucial for success, as researchers create the data and take many
decisions affecting long-term curation and reuse. It helps to make the benefits of taking part clear.
Some methods used by pilot studies to encourage participation were:
HOW TO USE DAF?
DAF methodology
The DAF model suggests an incremental, four-step approach to undertaking data surveys. These stages
can be applied flexibly to suit the specific context and needs. Depending on the survey aims you may
wish to focus efforts more in one area than another, or conduct the main stages in a different order.
- Rethinking classification
Some pilots encountered challenges when classifying data as the process is based on value
judgements. The criteria, however, do not need to be expressed in terms of value. The
classification could be more directly linked to the survey aim (e.g. basing it on the potential for
deposit in a repository building exercise) or reflect other approaches such as risk-analysis.
The four stage descriptions that follow focus on practical implementation lessons noted by the pilot
studies. More detailed descriptions of each survey stage with activities that may be relevant are
available in the DAF methodology.
Planning
Preparing as much as possible in advance helps to make sure the data survey runs smoothly. One key
aspect to cover is when the survey should take place – arrange a convenient time for the survey so it’s
easy for people to participate. Pilot studies found that participation was affected by annual leave,
exam board meetings, fieldwork and other major commitments.
Elapsed time between meetings, questionnaires and interviews can be significant. Wherever possible
conduct background research in advance and set up appointments to speak with researchers early on
in the process to ensure the survey runs smoothly. Collecting information is very time consuming,
particularly in interviews, so having a clear aim and tight focus is crucial.
Defining aims
Being clear about the aims of the data survey from the outset helps to define the scope. It also means
you can provide clarity for researchers about what will be achieved and the benefits of taking part.
There were many aims behind pilot project data surveys, including:
− Scoping researchers’ requirements to inform the development of new systems
− Performing service gap analysis to see where services should be developed / brought together
− Capacity planning exercises to inform future storage needs
− Responding to identified issues e.g. improving archiving workflow
The information you decide to collect and the approach you adopt to do so will vary according to the
survey’s underlying aims. It may be worth looking at the customisations on page 4 to consider how you
will tweak the approach to meet your context and researcher needs.
TIP
An initial meeting with researchers or the Head of Department is a useful way to find out what they
want and to set the aims and scope of the survey to meet this.
Setting the scope
With so much data being created and used by researchers, the pilot projects found it crucial to scope
surveys tightly to ensure it was feasible to meet survey aims. Some approaches used that may be of
help are:
− Limiting the time period being covered e.g. only data from the last three years
− Excluding certain types of data e.g. forensic archaeology data due to sensitivities
− Focusing on certain research groups or staff e.g. full-time academics not fellows
− Selecting examples of each type of data or project
− Working with projects at different stages of the lifecycle
− Snowball sampling e.g. interviewing research group leaders then others as directed
You may wish to review the scope mid-survey in light of how the information exercise is progressing or
new needs that arise, so it is useful to adopt a flexible approach.
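The scoping approaches listed above amount to filters applied to a list of candidate data assets. As a
minimal sketch, assuming a hypothetical Asset record and illustrative field names (none of which come
from the DAF methodology), limiting the time period, excluding data types and focusing on certain
staff could look like this:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    """Hypothetical record for a candidate data asset (field names are illustrative)."""
    name: str
    data_type: str
    created: date
    owner_role: str


def in_scope(asset, today, max_age_years=3,
             excluded_types=frozenset(), included_roles=None):
    """Return True if the asset falls inside the survey scope."""
    # Limit the time period covered (a year is approximated as 365 days).
    cutoff = today - timedelta(days=365 * max_age_years)
    if asset.created < cutoff:
        return False
    # Exclude certain types of data, e.g. because of sensitivities.
    if asset.data_type in excluded_types:
        return False
    # Focus on certain staff groups; None means no restriction.
    if included_roles is not None and asset.owner_role not in included_roles:
        return False
    return True


# Made-up example assets, purely for illustration.
assets = [
    Asset("Survey wave 3", "survey", date(2008, 5, 1), "academic"),
    Asset("Excavation scans", "forensic archaeology", date(2009, 1, 15), "academic"),
    Asset("Legacy transcripts", "interview", date(2001, 3, 9), "fellow"),
]
scoped = [a.name for a in assets
          if in_scope(a, today=date(2009, 10, 1),
                      excluded_types={"forensic archaeology"},
                      included_roles={"academic"})]
print(scoped)
```

Keeping the scope as parameters rather than hard-coded rules makes it easier to revise mid-survey.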
INFORMATION COLLECTING EXERCISE – STAGES 2 & 3
From the aim and scope defined at the planning stage you should have a clear idea about the kind of
information you want to collect. The next phase of the survey is to undertake this work. This is covered
by stages 2 and 3. These stages overlap significantly so they’ll be discussed collectively here. Some
pilots found it useful to run them concurrently.
Collecting information
There are various ways to collect information in data surveys. The pilot studies found a combination
of approaches worked best. Questionnaires were found to be the most useful means of collecting basic
contributions from a wide range of stakeholders, while interviews were useful for more detailed,
qualitative information on data management and user needs.
TIP
Questionnaires are a useful way to identify researchers willing to participate in more detailed
interviews.
Desk-based research
Advantages:
− Good to collate background information
− Research articles often provided details of how the data were created
Disadvantages:
− Remote access to data may not be granted
− Hard to understand local filing / naming systems
Questionnaires or wiki for researchers to fill in
Advantages:
− Good for collecting basic overview
− Allows wide participation
− Wiki approach lets researchers adapt survey to add fields relevant to them
Disadvantages:
− Response rate can be low due to survey fatigue – best if pushed by internal advocate
− Need to make sure software meets your needs – Bristol Online Surveys found to work best
Interviews
Advantages:
− Provide high quality information
− Can develop questions to tease out points
− Help gauge awareness of data issues
Disadvantages:
− Requires significant input from researchers
− Can be hard to schedule
− Very time consuming – can be useful to have two surveyors: one to interview, one to note-take
The DAF online tool is a place where information collected on data assets can be stored and shared. It
mirrors the four stage approach and provides survey forms for completion. These could be filled in by
the survey organisers or participating researchers – multiple logins with different levels of access can
be assigned. Basic analysis tools are provided, and import and export facilities are planned.
Before completing the inventory you’ll need to define what you mean by data assets. Will this include
software, non-digital items such as lab notebooks that are integral for interpretation, or third-party
data for which you do not have curatorial control? The definition has varied in pilot surveys depending
on what is important for the discipline being surveyed. Most have viewed ‘data’ as encompassing:
The level of granularity to adopt also needs to be defined. Are assets recorded as single files, as
datasets, or in terms of collections or projects? The answer will vary in each case depending on the
scope set to ensure that meeting the survey aims is feasible. Pilots did not always encounter
well-documented, homogeneous datasets - often there were just ad hoc collections of data and resources
used to support particular research, which could be difficult to interpret or group.
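One way to keep the definition and granularity decisions explicit is to record them alongside the
inventory itself. The sketch below is illustrative only: the granularity levels mirror the options
named above, while the class names, fields and flags are assumptions and are not taken from the DAF
online tool.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    FILE = "file"
    DATASET = "dataset"
    COLLECTION = "collection"
    PROJECT = "project"


@dataclass(frozen=True)
class AssetDefinition:
    """Survey-level decision on what counts as a data asset (hypothetical flags)."""
    include_software: bool = False
    include_non_digital: bool = False
    include_third_party: bool = False


@dataclass
class InventoryEntry:
    """One inventory row (illustrative fields only)."""
    name: str
    granularity: Granularity
    is_software: bool = False
    is_digital: bool = True
    is_third_party: bool = False


def counts_as_asset(entry, definition):
    """Apply the survey's definition of 'data asset' to one entry."""
    if entry.is_software and not definition.include_software:
        return False
    if not entry.is_digital and not definition.include_non_digital:
        return False
    if entry.is_third_party and not definition.include_third_party:
        return False
    return True


# Example: a survey that includes lab notebooks but not software or third-party data.
definition = AssetDefinition(include_non_digital=True)
entries = [
    InventoryEntry("Lab notebooks 2007-2009", Granularity.COLLECTION, is_digital=False),
    InventoryEntry("Analysis scripts", Granularity.PROJECT, is_software=True),
    InventoryEntry("Licensed census extract", Granularity.DATASET, is_third_party=True),
    InventoryEntry("Sensor readings", Granularity.DATASET),
]
inventory = [e for e in entries if counts_as_asset(e, definition)]
by_level = Counter(e.granularity for e in inventory)
print([e.name for e in inventory], dict(by_level))
```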
Ensuring the sustainability of the inventory was raised as a concern in some pilot studies. The
inventory could be used to start active data management, for example as a prompt to deposit in
repositories. Embedding the inventory in the work of the department so it becomes a local data
tracking and management tool was suggested. Using wikis and encouraging a researcher-led survey, as
was done with climate change researchers at Monash University, could be a useful way to achieve this.
Classifying data
The DAF methodology suggests data is classified in order to restrict the scope when moving from the
wide, shallow inventory to the more detailed assessment of data management. In practice this was not
always feasible or necessary; pilot studies found inventories were often representative samples rather
than comprehensive registers, so the scope did not need to be restricted.
Some criteria for classifying data suggested by pilots that may be of use were:
• National Science Foundation data categories [1]
• Risk-assessment e.g. data most at risk of loss or cases with penalties for misuse
• Institutional responsibility e.g. not third party data where HEI does not have curatorial control
• Potential return e.g. if data could be ingested into repository for data sharing
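As an illustration of how such criteria might be combined, the sketch below sorts inventory entries
into those worth a detailed assessment and those that are not. The field names, the ordering of the
rules and the category labels are all assumptions made for this example; a real survey would
substitute criteria tied to its own aims.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical inventory entry carrying the criteria suggested by pilots."""
    name: str
    at_risk_of_loss: bool        # risk-assessment criterion
    misuse_penalties: bool       # risk-assessment criterion
    institution_controls: bool   # institutional responsibility criterion
    repository_candidate: bool   # potential return criterion


def classify(asset):
    """Return an illustrative category label for one asset."""
    # Third-party data outside the institution's curatorial control is set aside first.
    if not asset.institution_controls:
        return "out of scope"
    # Data at risk of loss, or with penalties for misuse, is prioritised.
    if asset.at_risk_of_loss or asset.misuse_penalties:
        return "assess: high risk"
    # Data that could be ingested into a repository offers a potential return.
    if asset.repository_candidate:
        return "assess: potential deposit"
    return "inventory only"


# Made-up examples, purely for illustration.
examples = [
    Asset("Patient interviews", False, True, True, False),
    Asset("Licensed market data", False, False, False, False),
    Asset("Published survey data", False, False, True, True),
    Asset("Working notes", False, False, True, False),
]
labels = {a.name: classify(a) for a in examples}
print(labels)
```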
Data interviews
Interviews were found to be the best way to elicit information on how data are being managed as there
was an opportunity to build rapport with the researcher and gauge their understanding of potential
issues. Simply asking pertinent questions was found to be useful to raise awareness of good curation
practice, as it made people reflect on their approach to data management more critically.
TIP
The data lifecycle model was found to be a useful way to frame interviews, as activities were familiar
to researchers. It helped to introduce the range of stakeholders too. (see p1)
Interviews were sometimes used to gather all the information for stages 2 and 3
at once. One approach used was:
1. going through the interviewee’s personal drives (and, where appropriate, shared drives) to
determine which collections of data constituted data assets;
2. recording names, descriptions, statements of responsibility and locations;
3. discussing the importance of the asset in terms of current and future research;
4. recording additional information about file formats, software requirements, derived reports/
papers, dates of creation and update, etc.;
5. discussing how the interviewee managed the data.
Example interview frameworks used by the pilot studies are available at the end of the guide.
[1] NSF, Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century,
Appendix D. Digital Data Collections by Categories. http://www.nsf.gov/pubs/2005/nsb0540/nsb0540_11.pdf
STAGE 4 / NEXT STEPS
Reporting
The final stage is to collate findings and report back with recommendations on how data management
practices could be improved. The findings from the pilot implementations were broadly aligned and
echoed the conclusions of other case studies conducted recently, such as those done by the UKRDS. As
the data landscape was found to be common across institutions and research areas, the issues and
recommendations below may help inform your survey. More discussion on findings from early DAF
audits can be found in an IJDC paper [2].
Where to go next?
Further information on using DAF is offered through the DCC. We can provide guidance on how the
methodology has been implemented in different contexts and run training courses for those wishing
to conduct data surveys. If you would like more information, get in touch with us at: info@dcc.ac.uk
Example questionnaires and interview frameworks from DAF exemplars are provided in the pages that
follow. If you would like to repurpose these please acknowledge the source institutions as noted.
Additional information on the lessons learned by the DAF exemplars, for which these questionnaires
and interviews were developed, is provided in their final project reports on the DAF website.
[2] Ball, Alexander, Ekmekcioglu, Cuna & Jones, Sarah, ‘The Data Audit Framework: a first step in the
data management challenge’ in the International Journal of Digital Curation, Vol 3, No.2, 2008,
available at: http://www.ijdc.net/index.php/ijdc/article/viewFile/91/62
PRACTICAL EXAMPLES
4. Project details
5. Description of the data
6. Ownership: who owns the data?
7. Characteristics of the data (select all that apply)
a. Observational
b. Experimental
c. Reference
d. Derived
e. Simulated
f. Not Applicable
11. Retention period
a. Only over the project period
b. Up to 5 years
c. Up to 10 years
d. More than 10 years
e. Don't know
12. How frequently do you update your data over the project period?
a. Never
b. Daily
c. Weekly
d. Monthly
e. Annually
f. Don't know
14. Do you currently have a formal Research Data Management Plan in place in your school/centre?
a. Yes
b. No
c. Don't know
15. Who is currently responsible for managing the data? (select all that apply)
a. Research project manager
b. Designated person on project
c. External project partners
d. IT staff within your school, centre or research institute
e. Research assistant
f. Yourself
g. National data centre or data archive
h. Nobody
i. Don't know
j. Other (please specify):
Imperial College London DAF Survey Questionnaire
Welcome Note
The purpose of this survey is to build a better understanding of research data held in your
Department, to inform strategic planning for data management at Imperial College and to help inform
the wider UK data management community.
This survey consists of 15 questions split across two pages and should take no more than 20 minutes
to complete.
6. Who is responsible for managing your electronic research data? [Mandatory: Y (multi answer)]
• Yourself
• Research Project Manager
• Research Assistant
• Research Technician
• PhD Student
• Other Designated person in Research Group
• Departmental IT Officer
• Central ICT
• Local Data Centre
• National data centre / data archive
• International data centre / data archive
• Don't know
• No one
• Other (please specify)
7. Please confirm where your electronic research data is primarily stored? [Mandatory: Y (multi answer)]
• Hard disk drive of instrument/sensor which generates data
• Hard disk drive of PC
• External hard drive
• Local server
• ICT server
• Third party
• CD/DVD
• USB/Flash drive
• Other (please specify)
8. Is your data backed up regularly? [Mandatory: Y]
• Yes
• No
• Don't know
a. If yes, how frequently is it backed up? [Mandatory: Y, subject to 8]
• Daily
• Weekly
• Monthly
• Ad hoc
• Don't know
• Other (please specify)
b. What data tends to be backed up? [Mandatory: Y, subject to 8]
• Everything
• Data critical to project
• Data required for publication
• Don't know
c. Where is it backed up? [Mandatory: Y (multi answer), subject to 8]
• CD / DVD
• USB/Flash Drive
• External Hard Drive
• Tape
• Local server
• ICT server
• Third party
• Other (please specify)
9. Do you currently have a data management plan for your research data (for example, data
preservation policy, record management policy, data disposal strategy)? [Mandatory: Y]
• Yes
• No
a. If yes, what was the main driver for developing your strategy? [Mandatory: Y (multi answer), subject to 9]
• Research requirement to access/analyse/annotate others' data
• Requirement of project funder
• Size of project team (i.e. multiple data creators)
• Volume of data associated with project
• Complexity of data associated with project (e.g. multiple formats)
• Absence of College data management policy
• Other (please specify)
b. If no, please confirm why [Mandatory: Y (multi answer), subject to 9]
• Not required / appropriate to field of research or research group
• Not required by project funder
• Time and effort required
• Lack of training / expertise within research group
• Lack of local support / guidance (e.g. Central Library, ICT)
• Absence of College data management policy
• Don't know
• Other (please specify)
10. Do you currently allow others to access your research data? [Mandatory: Y]
• Yes
• No
a. If yes, who to? [Mandatory: Y (multi answer), subject to 10]
• Students / Colleagues in Department
• Students / Colleagues within Imperial
• Research Group
• Other Institutions
• As supporting evidence to publication
• General public
• Other (please specify)
b. If no, what access issues are of concern to you? [Mandatory: Y (multi answer), subject to 10]
• Confidentiality / IPR
• Commercial value of data
• Possible misinterpretation of data
• Time/effort required
• Other (please specify)
11. Have you ever been asked to make your electronic research data openly available outside of a
publication (e.g. required by project funder)? [Mandatory: Y]
• Yes
• No
a. If yes, please supply high level details [Mandatory: Y, subject to 11]
Free text
Your Data Assets
In this section we would like you to provide details of electronic research data you consider critical to
your own work or that of your Research Group/Department.
For example, if a Research Council were to ask you to safeguard your data for future re-use or if you were to
leave College, what data should be preserved? Alternatively, please provide examples of data which you
consider critical to your own work or that of your Research Group/Department. Including datasets and
information systems that:
12. Please provide the following high level information for each data asset
a) Brief Description [Mandatory: N]
Free text
b) Principal Data Type [Mandatory: N]
• Raw data generated by program
• Raw data from instrument
• Images, scans or x-rays
• Digital audio
• Digital video
• Database (e.g. MySQL, Oracle)
• Text document (e.g. Word, PDF)
• Spreadsheet (e.g. Excel)
• Other proprietary format
• Software
• Lab notes
• Patient data
• Other
c) Effort Associated with Creation of data [Mandatory: N]
• Hours
• Days
• Weeks
• Months
• Years
• Other
d) Planned Retention Period [Mandatory: N]
• < 1 year
• 1 - 2 years
• 2 - 5 years
• 5 - 10 years
• 10 - 20 years
• 20 - 100 years
• 100+ years
• Indefinitely
• Don't know
• Other
e) Frequency of Use [Mandatory: N]
• Daily
• Weekly
• Monthly
• Yearly
• Reference Only
• Other
f) Estimated ‘final’ Size of Data [Mandatory: N]
• < 1 GB
• 1 - 50 GB
• 50 - 100 GB
• 100 - 500 GB
• 500 GB - 1 TB
• Multiple TB
Final Page
Thank you for completing this survey. Your contribution is very much appreciated.
If you have any questions relating to this survey or if you would like to contribute to the formation of a
research data management strategy, please click the 'Contact Us' button at the bottom of this screen.
University of Southampton Questionnaire
You may re-use or adapt this documentation for research or private study with acknowledgement to
McGowan, T. & Gibbs, T. A. (2009) Southampton Data Survey: Our Experiences & Lessons Learned
[unpublished]. University of Southampton: UK.
Thank you for participating in this survey which aims to find out about research data held by staff in
the School of Social Sciences and improve our understanding of the data management processes you
employ.
For the purpose of this study 'research data' is data that you currently hold that has been collected
and/or used in the course of your research at the University of Southampton. Research data can be
primary data collected by you or your research group or secondary data provided by a third party. It
may be quantitative or qualitative e.g. survey results, interview transcripts, databases compiled from
documentary sources, images or audiovisual files.
Data that you 'currently hold' is all the research data that you currently store anywhere. For
example, in your 'My Documents' folder, on the shared 'R' drive, a PC or laptop, on portable media such
as CDs or memory sticks, or on paper.
It would help us greatly if you respond to this questionnaire even if you do not currently hold any
research data (you will only be required to answer 2 questions).
The questionnaire is a maximum of 25 questions and should take no more than 10 minutes to
complete.
Please read the following statements carefully before agreeing to take part in this study;
I have read and understood the participant information sheet (attached to the email in which you
received this link).
I understand that;
• All results from this study will be anonymous. Information extracted from this questionnaire
and any subsequent interview will not, under any circumstances, contain names or identifying
characteristics of participants.
• I am free to withdraw from this study at any time without penalty.
• I am free to decline to answer particular questions.
• Whether I participate or not there will be no effect on my progress in employment in any way.
I consent to take part in this study on the terms described above; Yes No
1. Do you currently hold any research data?
Yes
No [GO TO END]
2. Thinking about the primary data you hold, what type of data is it? [Please select all that apply]
I don't hold any primary data [GO TO QUESTION 4]
Cross sectional survey data
Longitudinal survey data
Interview/focus group transcripts
Database compiled from documentary sources
Image files
Audio files
Audio-visual files
Other
If other, please specify
3. Who funded the collection of the primary data you hold? [Please select all that apply]
ESRC
EU-EDULINK
Leverhulme Trust
Nuffield Foundation
UK Government department
Wellcome Trust
Other
If other, please specify
4. Thinking about the secondary data you hold, who collected this data? [Please select all that apply]
I don't hold any secondary data [GO TO QUESTION 6]
Datastream (Thomson Reuters)
Eurostat
International Labour Organization (ILO)
Measure DHS
Organisation for Economic Co-operation and Development (OECD)
Office for National Statistics (ONS)
US Census Bureau
World Bank
World Health Organization (WHO)
Other
If other, please specify
5. What type of secondary data is it? [Please select all that apply]
Cross sectional survey data
Longitudinal survey data
Interview/focus group transcripts
Database compiled from documentary sources
Image files
Audio files
Audio-visual files
Macro-economic time series data
Stock market data
Company level data
Other
If other, please specify
6. The remaining questions relate to all the data you currently hold, both primary and secondary;
When using or creating this data, did you collaborate with anyone else?
Yes
No [GO TO QUESTION 9]
7. How did you share data when you were collaborating? [Please select all that apply]
By emailing files to colleagues
Using a shared storage facility
Using portable storage such as CDs, DVDs, memory sticks etc
Other
If other, please specify
8. Did you encounter any practical problems when you were collaborating? [Please select all that
apply]
No
Finding suitable shared storage space
Lack of file naming conventions made it difficult to identify files
Lack of version control caused confusion
Legal issues arising from international transfer of data
Problems establishing ownership of data
Other
If other, please specify
9. Where do you store your data (excluding back up copies)? [Please select all that apply]
On paper
My Documents
Shared drive (R-drive)
Hard drive of office PC
Hard drive of laptop PC
Memory stick/USB/Flash drive
CD/DVD
External hard drive
Other
If other, please specify
10. Have you ever experienced any problems storing your research data due to the size of the files?
Yes
No [GO TO QUESTION 12]
what problems
11. How did you overcome these storage problems? [Please select all that apply]
Requested additional storage space from iSolutions
Purchased an external hard drive
Saved to portable media
Other
If other, please specify
Yes, all of it is
Yes, some of it is
No, none of it is [GO TO QUESTION 14]
14. Do you deposit your data with a data service, such as the UK Data Archive?
15. Do you think that any of your data needs to be preserved by the University for your own use or
that of others?
Yes
No [GO TO QUESTION 17]
16. If you would like someone from the University Library to contact you about preserving your data
please enter your name below;
17. Thinking about your data that is not deposited with a data service, could any of this data be re-
used by others?
18. Thinking about your data that can't be re-used or shared, please tell us why [Please select all that
apply];
Confidentiality or data protection issues
Licence agreements prohibit sharing
The data is not fully documented
The data is in a format that is no longer widely readable [IF SELECTED GO TO QUESTION 19,
OTHERWISE GO TO QUESTION 20]
Other
If other, please specify;
19. Please provide brief details of the data you have that is no longer widely readable (e.g. what
software/hardware the data is on, its age etc);
20. Would you like to receive any additional support with managing your data? [Please select all that
apply]
Training
Written guidance
Help with writing data management plans for research bids
Additional personal storage
Additional shared storage
None
Other
If other, please specify
21. Which Division do you work in?
Economics
Gerontology
Politics and International Relations
Sociology and Social Policy
Social Statistics
Social Work Studies
22. Would you be prepared to participate in a follow up interview to explore data management
issues in more depth (max. 1 hr)?
Yes
No
If yes, please provide your name and email address so that we can contact you;
23. If you would like to expand on any of your above answers or make further comment, please do so
here;
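If a questionnaire like the one above is moved into an online survey tool, the bracketed routing
instructions (for example "[GO TO QUESTION 9]") need to be encoded somewhere. The following is a
minimal sketch of one way to do that, not a description of how the Southampton survey was built. The
function and variable names are hypothetical, and only routes whose question number is visible above
are included.

```python
END = "END"

# (question number, selected answer) -> next question, copied from the bracketed instructions.
ROUTES = {
    (1, "No"): END,
    (2, "I don't hold any primary data"): 4,
    (4, "I don't hold any secondary data"): 6,
    (6, "No"): 9,
    (10, "No"): 12,
    (15, "No"): 17,
}

UNREADABLE = "The data is in a format that is no longer widely readable"


def next_question(current, answers):
    """Return the next question number given the answers selected for `current`."""
    # Question 18 goes to 19 only if the 'no longer widely readable' option is selected.
    if current == 18:
        return 19 if UNREADABLE in answers else 20
    for answer in answers:
        if (current, answer) in ROUTES:
            return ROUTES[(current, answer)]
    # Default: continue to the next question in sequence.
    return current + 1


print(next_question(1, ["No"]))            # respondent holds no research data
print(next_question(6, ["Yes"]))           # collaborated, so continue in sequence
print(next_question(18, [UNREADABLE]))     # asked for details of unreadable data
```

Keeping the routes in a single table makes it straightforward to check them against the paper version
of the questionnaire.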
University of Southampton Generic Interview Schedule
You may re-use or adapt this documentation for research or private study with acknowledgement to
McGowan, T. & Gibbs, T. A. (2009) Southampton Data Survey: Our Experiences & Lessons Learned
[unpublished]. University of Southampton: UK.
Introduction
INTRODUCE
My name is Teresa McGowan and this is Harry Gibbs. Harry is the School of Social Sciences librarian
and I am a research assistant here in the School.
RESEARCH
We are working together on a project funded by the Joint Information Systems Committee (JISC). JISC
has developed a framework methodology aimed at helping institutions find out what research data
they hold, where it’s located and who is responsible for it. We are using an adaptation of that
framework today to test its usefulness and to help the School of Social Sciences find out more about
data management and what can be done to aid staff in the use and management of their data.
THANKS
Thank you for agreeing to take part in this interview. Based on the information that we receive we will
produce two reports, one for JISC simply discussing how we used and modified their framework, and a
second for the University which we hope will be used to improve data management in the school.
We would like to record our discussion as it is so difficult to write down everything that is said, and we
don’t want to miss anything. What you say in this interview will be anonymous – your names will not
be recorded on the transcripts and only me, Harry and one transcriber will have access to the
recording and notes. No reports or publications that are produced will identify you in any way.
WANT TO KNOW
Thank-you for taking part in the questionnaire, the purpose of this interview is to find out more about
the data you hold that has been collected or used in the course of your research at this University and
your experience of managing this data. There are no right or wrong answers, we are just interested
in what you have done and how you did it.
We want this to be more like a discussion than a question and answer session. We have a list of x
things we are interested in but it is important to us that you tell us about what is important to you.
If there is anything I ask that you don’t understand please tell us and we can explain further. If there is
anything you want to ask us you can do that too. (If they ask questions that anticipate later
discussions, ask if it’s OK to leave it until later)
2 We can see from the questionnaire that you hold xxx data, please could you give
us some more details about the xxx data that you compiled from documents?
Data Holdings
Name of Interviewee: [interviewee’s name]
Primary Data: [list of primary data types held by individual, as reported in questionnaire]
Secondary Data: [list of secondary data types held by individual, as reported in questionnaire]
(for social surveys this will often be the same as the time period covered)
Sample size & description: The number of individuals surveyed and characteristics
Current location: Path or www address where the data can be found
Format: Physical formats of dataset, including file format information
Size: Size of the data in MB/GB
Restrictions: Access restrictions placed on user of secondary data or restrictions owner would place
on reuse of primary data
Documentation available: Documentation that is available (e.g. user manuals, code books), including
references to its location
Retention period: Planned retention period for the data & ideal retention period
OR
3 Can you tell us about your experience of data management or data sharing
plans?
Prompts Tick
YES Which funder?
4 TO COLLABORATORS:
In the questionnaire you mentioned you had some problems collaborating, could you
tell us some more about this?
OR
Please can you tell us about one experience of collaborating?
Problems collaborating
[name] [list of problems reported in questionnaire]
Prompts Tick
Who, what, where?
• Who with?
• Where were they geographically?
OR
4 TO NON COLLABORATORS:
Prompts Tick
Version control
5 BACK UP
You told us you use xxx methods to store and back up data, can you tell us why you
chose these methods?
Storage location
[name]
Main: [list of main storage methods reported in questionnaire]
Back up: [list of back up methods reported in questionnaire]
Prompts Tick
What affects your choices?
• Anticipated lifespan
• Importance
• Confidentiality
• Physical space
• File size
AND
You said you don’t back up all your data, can you explain to us why
you don’t?
6 You mentioned that you had data storage problems and [list of problems reported in
questionnaire], can you tell us a bit more about what happened?
Tick
Was the unforeseen expense a problem for the project?
7 In the questionnaire you said you did not have anything that should be preserved by
the university, do you have anything that you think should be preserved by yourself
or anyone else?
Prompts Tick
YES
• How could it be best preserved?
• Could you tell us about it?
• Why should it be preserved?
NO
Why doesn’t it need preserving?
• Already preserved? (BY UKDA)
• Data not reusable [why?]
• Time
• Money
OR
7 You mentioned in the questionnaire that you have something that you think the
university should preserve for the future, could you tell us about it now?
Prompts Tick
How could it be best preserved?
8 What support is available to you to help you manage your data?
OR
8 In the questionnaire you said you would like some additional support in carrying out
data management, what support is available to you now?
Prompt Tick
How sufficient is it?
9 (Summarise what’s been said, then;) Is there anything you can think of I haven’t
asked or anything you wanted to say that has not been covered?
INVITE HARRY TO ASK ANY QUESTIONS ON ANYTHING SHE WOULD LIKE TO FOLLOW
UP/CLARIFY
Closure
Thank you for allowing us to talk to you today. It has been very interesting to
listen to your views. We will email you to let you know when the results are
available.
Thanks again for coming today, we are very grateful for your help.
University of Oxford Interview Framework
The following framework is based on the interview frameworks developed for the IBVRE[3] and eIUS[4]
projects, with some changes to adjust it to the aim and objectives of this scoping study.
Introduction
Give a brief introduction to Scoping Digital Repositories Services for Research Data Management,
including its overall aim and objectives. Provide an overview of the questions that will follow and remind
the interviewee about the nature of the semi-structured interview and the intention to take notes,
record the interview (with permission) and publish the findings.
Interview
1. Could you briefly explain your area of research and the types of research questions, with examples,
that you try to answer?
2. I am interested in learning more about the research tasks that involve some form of data
management that you carry out in order to help you move forward with your research agenda. I’m
interested in doing this by going through one of your research projects in the context of a generic
“research life-cycle”, from funding application, data collection/processing, all the way to publishing, in
order to understand to what extent the following elements fit in your average working day.
a. The funding application – increasingly funding agencies require data management and data
sharing plans as part of the funding application.
- When applying for funding how do you decide that new data will need to be collected
and how do you go about providing a plan for this?
With this question I want to learn more about how researchers think about data at this stage, why
they decide that data needs to be collected, how they ensure that this data has not been created
already and how they go about making data management plans.
[3] The integrative biology virtual research environment project: http://www.vre.ox.ac.uk/ibvre/
[4] The e-Infrastructure use cases and service usage models: http://www.eius.ac.uk/
b. Data collection –
- Could you please explain what sorts of data (primary, secondary, experimental,
simulation) you collect and provide details about the process of collection?
In this part my aim is to engage in conversation to find out about data collection methods, the types of data
produced, the instruments and software used to do this and whether the data could be helpful to
others. I will also ask about secondary data to find out where and how they are found and accessed. Finally I
will explore why the collection of data happens in the way described (is it common practice in the
discipline or department?).
c. Processing of data –
- Once the data have been collected could you describe how they get processed, i.e. how
they get annotated, where they are stored, what security measures are taken to
preserve confidentiality or integrity, etc?
d. Publishing – the publication of the research outputs is the end of this generic “research life-cycle”.
What happens to the data after this, i.e. are they published or deposited somewhere, do you need to
destroy them, etc?
In this part of the life cycle I want to find out whether deposit in an archive occurs and, if not, I will
attempt to find out the reasons that stop researchers doing so (data need to be destroyed, researcher does
not want to share initially or at all, no place to deposit, etc) and where the data will be stored.
3. How are researchers supported, at either local or institutional level, in carrying out all the
data management required?
With this question I will attempt to figure out how support for data management across the generic
life cycle occurs (researchers help each other at local level, departmental guidelines, etc).
4. What are your challenges and worries when managing research data and what services would help
you do this work more effectively?
With this question I will attempt to identify the top 3 requirements for services that would be most useful to
researchers.
De-Brief
7. What are the benefits you believe you get from participating?
8. Could you suggest anyone you know that could participate in these interviews?
University of Glasgow digital preservation study[5]: interview template
It is expected that interviews will take between 30 minutes and 1 hour. Ideally these would be recorded then
transcribed, with the text sent back for approval. An overview of the topics to be discussed will be
circulated in advance to allow the interviewee to prepare ideas.
At the start of the interview, details of the preservation study and explanation of terms will be
provided. Scoping interviews will be semi-structured to allow free-flowing discussion. The
questions provided below are indicative of the topics that may be discussed. Each interview will
cover five themes:
Working practices
Discuss what happens in terms of digital curation i.e. creating, maintaining and preserving
electronic records. Are there set procedures? What role does each person play…
Individual
− How do you create electronic records? – naming conventions, filing rules…
− Where do you store files? Do you back them up or is this done centrally?
− How do you manage digital files e.g. do you sort through and weed them?
− What happens in terms of email? Do you save or print certain messages?
− Do you work differently on research projects due to funding body requirements?
Departmental
− Are there departmental guidelines, policies or procedures you follow?
− Who is responsible for digital material? What role does each person play?
− What happens in terms of legacy material i.e. files created by former staff?
− Do you know when, how and what is backed up centrally?
− Who can access electronic material? How is this controlled? Explain restrictions
[5] See: http://www.gla.ac.uk/departments/hatii/research/digitalpreservationpolicystudy
Digital preservation issues
Continue the discussion to ascertain whether any issues have been encountered when creating and
using electronic material, in order to identify areas where practices could improve.
− Have you ever lost digital files or found it hard to find the right ones?
− Are there version control issues when working with colleagues?
− Is it difficult to understand other people’s systems on the shared drive?
− Have you struggled to use older files? e.g. obsolete format, outdated disk…
− Do you have enough storage space? If not, where do you keep material?
Access
− Could your electronic material be reused or repurposed by others?
− Are there any sensitivity or confidentiality restrictions?
− Would other people understand your material - is it documented?
Preservation
− Does all digital material or just a subset need to be preserved in the long-term?
− Who would know what to keep and for how long? Who makes the decision?
− Is there a place where your digital material can be preserved?
Service requirements
Ask where the interviewee currently gets advice and support and what else s/he would like to see
provided by the University. The key thing is to gauge the desire for a preservation policy, its suggested
coverage and any supplementary support needed to implement it.
− Have you used the records management service, archive or Enlighten? Are you aware of
what these services can offer?
− Where do you currently get advice and support?
− What would help you create and manage your electronic files better?
− Who should be responsible for / fund digital preservation?
− Would you welcome a University-wide policy on digital preservation? If so, what should it
cover?