What are the Drivers?
…and how do we intend to meaningfully respond to them?
David P. Shorthouse
Canadian Museum of Nature
Agriculture & Agri-Food Canada (April 1)
“Is it possible that the lack of recognition in the academic assessment
system of these forms of productivity has contributed to the diminished
status—indeed even the near disappearance from many academic
departments—of traditional systematics…”
Collecting
Curating
Identifying
Naming
Natural History Museums Desperately Want
Brand Awareness
Meaningful Measures of Impact
“…trust in an aggregator is not just a feature of the data signal quality provided by the
sources to the aggregator, but also a consequence of the social design of the
aggregation process and the resulting power balance between individual data
contributors and aggregators.”
How Do We Fix This?
recognition for taxonomist
recognition for host institution
recognition for taxonomists’ institution
Fully automated
Quantifiable
Ingredients to Make This Happen
Newly digitized specimen
IRI identifiedBy
http://rs.tdwg.org/dwc/iri/identifiedBy
https://orcid.org/0000-0001-9144-2848
institutionCode
ORCID: ringgold, GRID
dateIdentified
ORCID: employment/education
start/end date
GRBIO ?
not sameAs
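The ingredients above suggest one way to credit a taxonomist's institution automatically: read dateIdentified from the specimen record, then look up which employment/education entry in the identifier's ORCID record covers that date. The following is a minimal sketch under stated assumptions. The `Affiliation` class, the `institutions_at` function, and all sample data are hypothetical, and no real ORCID API call is made.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Affiliation:
    """One employment/education entry, as it might be read from an ORCID record."""
    organization: str
    start: date
    end: Optional[date] = None  # None means the affiliation is ongoing


def institutions_at(affiliations: List[Affiliation], date_identified: date) -> List[str]:
    """Return the organizations whose start/end range contains date_identified."""
    return [
        a.organization
        for a in affiliations
        if a.start <= date_identified and (a.end is None or date_identified <= a.end)
    ]


# Hypothetical data for illustration only.
affiliations = [
    Affiliation("Institution A", date(2005, 1, 1), date(2012, 12, 31)),
    Affiliation("Institution B", date(2013, 1, 1)),
]
specimen = {
    "institutionCode": "XYZ",  # host institution; not necessarily the taxonomist's
    "identifiedBy": "https://orcid.org/0000-0001-9144-2848",
    "dateIdentified": date(2010, 6, 15),
}

print(institutions_at(affiliations, specimen["dateIdentified"]))
```

The host institution (institutionCode) and the taxonomist's institution are kept as separate values here, in line with the "not sameAs" note.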
Are There Other Drivers?
Newly digitized specimen
IRI identifiedBy
http://rs.tdwg.org/dwc/iri/identifiedBy
https://orcid.org/0000-0001-9144-2848
Shorthouse - Authority Management of People Names Workshop
https://bloodhound-tracker.net
For the Deceased
For the Living
Cautionary Tale
Retrospective & Prospective Approaches
Retrospective Approach…Layers of Dirt
• Strings to things
• Parsing, e.g. Ruby gems Namae, DwcAgent
• Entity extraction, e.g. Rosette (https://www.rosette.com/), Watson Natural Language
• Similarity scoring, e.g. R.D.M. Page <=> Roderic Page <=> Roderic D.M. Page
• Search logic
• Disambiguation
• Co-author, co-collector networks
• Collector codes
• Hand-crafted heuristics, e.g. birth/death/collection dates, taxa, places
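The similarity-scoring step in the list above can be illustrated with a small sketch. It assumes names are already parsed into "given ... family" order and are non-empty. The functions `name_key` and `similarity` and the scoring weights are hypothetical choices for illustration, not the method used by Namae, DwcAgent, or any other tool named here.

```python
def name_key(name):
    """Split 'Given ... Family' into (family, [given tokens]), lowercased."""
    tokens = name.replace(".", ". ").split()
    family = tokens[-1].lower()
    given = [t.rstrip(".").lower() for t in tokens[:-1]]
    return family, given


def similarity(a, b):
    """Score two names between 0.0 and 1.0.

    Family names must match exactly. Given-name tokens are compared in order,
    and an initial is treated as compatible with any name it begins.
    """
    fam_a, giv_a = name_key(a)
    fam_b, giv_b = name_key(b)
    if fam_a != fam_b:
        return 0.0
    for x, y in zip(giv_a, giv_b):
        if not (x.startswith(y) or y.startswith(x)):
            return 0.0
    longest = max(len(giv_a), len(giv_b))
    if longest == 0:
        return 1.0
    compared = min(len(giv_a), len(giv_b))
    # Compatible names start at 0.5 and gain credit for each token compared.
    return 0.5 + 0.5 * compared / longest


for other in ("Roderic Page", "Roderic D.M. Page", "David Page"):
    print(other, similarity("R.D.M. Page", other))
```

A score like this only ranks candidates. "R. Page" is equally compatible with Roderic and Robert, which is why the disambiguation steps (co-collector networks, dates, taxa, places) are still needed.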
Prospective Approach…Clean Dirt
RDA / TDWG Metadata Standards for
attribution of physical and digital collections stewardship
Chairs: Anne Thessen, Matt Woodburn, Dimitris Koureas
Final Recommendations: https://github.com/tdwg/attribution/blob/master/RDA_recommendations.md
What “Actions” Do We Care About?
• authored
• borrowed
• catalogued
• collected
• conserved
• contributed
• created
• curated
• …
• georeferenced
• reviewed
https://github.com/tdwg/attribution/issues/5
Wishlist
• Test suite for parsing lists of names: text file with expected JSON response
Charles R. Darwin Esq.
[{"family": "Darwin", "given": "Charles R.", "title": "Esq."}]
leg. A. Chuvilin
[{"family": "Chuvilin", "given": "A."}]
N. Navarro, G. Gómez y A Ferreira
[{"family": "Navarro", "given": "N."},
{"family": "Gómez", "given": "G."},
{"family": "Ferreira", "given": "A"}]
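A shared test suite of this kind could be driven by a very small harness: pairs of raw string and expected JSON, run against any parser. The sketch below is one possible shape. `run_cases` and `toy_parse` are hypothetical names, and `toy_parse` is a stand-in that handles only these three cases; a real parser would be plugged in its place. The family name in the second case is taken from the input string.

```python
import json
import re

# (raw string, expected JSON) pairs, as they might appear in a shared text file.
CASES = [
    ("Charles R. Darwin Esq.",
     '[{"family": "Darwin", "given": "Charles R.", "title": "Esq."}]'),
    ("leg. A. Chuvilin",
     '[{"family": "Chuvilin", "given": "A."}]'),
    ("N. Navarro, G. Gómez y A Ferreira",
     '[{"family": "Navarro", "given": "N."},'
     ' {"family": "Gómez", "given": "G."},'
     ' {"family": "Ferreira", "given": "A"}]'),
]


def toy_parse(raw):
    """Hypothetical stand-in parser, only good enough for the cases above."""
    raw = re.sub(r"^leg\.\s*", "", raw)  # drop the collector prefix
    people = []
    for part in re.split(r",\s*|\s+y\s+", raw):  # split on commas and Spanish "y"
        tokens = part.split()
        person = {}
        if tokens[-1] == "Esq.":
            person["title"] = tokens.pop()
        person["family"] = tokens[-1]
        person["given"] = " ".join(tokens[:-1])
        people.append(person)
    return people


def run_cases(parse, cases):
    """Return (raw, expected, actual) for every case where the parser disagrees."""
    failures = []
    for raw, expected_json in cases:
        expected = json.loads(expected_json)
        actual = parse(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


failures = run_cases(toy_parse, CASES)
print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

Because the harness takes the parser as an argument, the same case file could be reused across implementations in different projects.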
Wishlist
• Common, consistent way to handle search
• Elasticsearch, Solr plugin
• Services
• Input: raw string of name(s), optional parameters
• Output: parsed name, identifiers, likelihood score
• Actions for inclusion in a DwC extension
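As a sketch of what such a service might return, the structure below pairs the raw input with a parsed name and a ranked list of candidate identifiers. Every field name here (`raw`, `parsed`, `candidates`, `identifier`, `score`) is an assumption for illustration, and the identifier and score values are placeholders, not real data.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    """One possible match for the name, with a likelihood score in [0, 1]."""
    identifier: str
    score: float


@dataclass
class NameResponse:
    """Hypothetical response for one raw name string."""
    raw: str
    parsed: Dict[str, str]
    candidates: List[Candidate] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses, so candidates serialize cleanly.
        return json.dumps(asdict(self), ensure_ascii=False)


response = NameResponse(
    raw="R.D.M. Page",
    parsed={"family": "Page", "given": "R.D.M."},
    candidates=[Candidate("https://orcid.org/XXXX-XXXX-XXXX-XXXX", 0.9)],  # placeholder
)
print(response.to_json())
```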
