Additional Wikidata tab on leaves's description #863

oolonek · 2024-07-09T17:35:17Z

This would be a nice addition: being able to access the Wikidata page of a given taxon when clicking on its leaf.

For example for https://www.onezoom.org/life/@Aloe_ferox=608115 one could reach https://www.wikidata.org/wiki/Q1194889

Using QLever, all pairs of Open Tree of Life IDs and Wikidata QIDs can be retrieved in milliseconds: https://qlever.cs.uni-freiburg.de/wikidata/MjoDT0?exec=true. This currently yields 2,034,851 pairs.
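To reproduce the pairs outside the QLever UI, a query along the following lines should return the same kind of result. This is a sketch: the linked query itself is not reproduced here, so the query text below is an assumed equivalent built on property P9157, and `parse_ott_qid_pairs` is a hypothetical helper that reads the standard SPARQL JSON results format.

```python
from __future__ import annotations

import json

# Lists every Wikidata item that carries an Open Tree of Life ID (property P9157).
# The explicit PREFIX is included in case the endpoint does not predefine wdt:.
OTT_QID_QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?ott WHERE {
  ?item wdt:P9157 ?ott .
}
"""


def parse_ott_qid_pairs(results_json: str) -> dict[int, str]:
    """Convert a SPARQL 1.1 JSON result into an {ott_id: "Q..."} dictionary."""
    data = json.loads(results_json)
    mapping: dict[int, str] = {}
    for row in data["results"]["bindings"]:
        # Item URIs look like http://www.wikidata.org/entity/Q1194889
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        ott_text = row["ott"]["value"]
        if not ott_text.isdigit():
            continue  # skip malformed identifier values
        # If an OTT is attached to several items, keep the first one seen
        mapping.setdefault(int(ott_text), qid)
    return mapping
```

The JSON could come from any SPARQL endpoint that accepts the query and returns `application/sparql-results+json`.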

hyanwong · 2024-07-09T17:44:46Z

We have the Wikidata ID anyway, in the ordered_leaves table, so we don't need to use the QLever site (although I'm intrigued by how that site works).
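Since the QID is already stored per leaf, the tab itself could be little more than a link built from it. A minimal sketch, assuming the stored value is either a bare integer or a `Q`-prefixed string (how OneZoom actually stores it is not shown in this thread); `wikidata_url` is a hypothetical helper.

```python
from __future__ import annotations


def wikidata_url(qid: int | str | None) -> str | None:
    """Return the Wikidata page URL for a leaf's QID, or None if there is none.

    Accepts a bare number (e.g. 1194889) or a prefixed string ("Q1194889").
    Treating 0 or an empty value as "no QID" is an assumption of this sketch.
    """
    if qid is None:
        return None
    text = str(qid).strip()
    if text[:1] in ("Q", "q"):
        text = text[1:]
    if not text.isdigit() or int(text) == 0:
        return None
    return f"https://www.wikidata.org/wiki/Q{int(text)}"
```

For the Aloe ferox example above, `wikidata_url(1194889)` gives https://www.wikidata.org/wiki/Q1194889.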

davidebbo · 2024-07-09T17:45:58Z

As a workaround, note that from the Wikipedia page, you can choose Tools / Wikidata item to go to that. So it's indirect, but there is a path to it...

hyanwong · 2024-07-09T17:46:22Z

Aha, of course, the OTT IDs are now on Wikidata (they used not to be; I argued for their introduction), so we can find the mapping using a SPARQL query. Neat.

davidebbo · 2024-07-09T17:49:45Z

the OTT IDs are now on wikidata

Oh, I didn't know that! It's P9157.

hyanwong · 2024-07-09T18:06:34Z

Yes, I noticed it the other day. It's new, I think (created 2021)

hyanwong · 2024-07-09T18:08:49Z

It could be that this is a better way to get the mappings now, rather than going via the ncbi IDs etc.

We could probably check how accurate and comprehensive our mapping is, versus the one on wikidata. If we can simply move to using wikidata, it would probably simplify the code considerably. However, my suspicion is that there are lots of OTT taxa that have NCBI / GBIF ids but which aren't currently on wikidata.

davidebbo · 2024-07-09T18:19:35Z

It could be that this is a better way to get the mappings now, rather than going via the ncbi IDs etc.

We could probably check how accurate and comprehensive our mapping is, versus the one on wikidata. If we can simply move to using wikidata, it would probably simplify the code considerably. However, my suspicion is that there are lots of OTT taxa that have NCBI / GBIF ids but which aren't currently on wikidata.

Yes, that was my first thought when I saw that. It has the potential to simplify things a lot. For now, it would be easy to add some instrumentation that checks whether the QID we find via other paths maps back to the same ott.
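A minimal sketch of that check, with hypothetical names, assuming our mapping and the Wikidata OTT values (from property P9157) have already been loaded into dictionaries keyed by integer IDs. A set is used per item because a Wikidata item can carry more than one value for a property.

```python
from __future__ import annotations

from collections import Counter


def check_round_trip(
    our_qid_by_ott: dict[int, int],
    wd_otts_by_qid: dict[int, set[int]],
) -> Counter:
    """Compare our OTT -> QID mapping with the OTT IDs Wikidata lists on each item.

    Each mapped OTT falls into one bucket:
      "match"    the Wikidata item lists our OTT
      "mismatch" the item lists one or more OTTs, none of them ours
      "missing"  the item lists no OTT at all
    """
    tally: Counter = Counter()
    for ott, qid in our_qid_by_ott.items():
        wd_otts = wd_otts_by_qid.get(qid, set())
        if not wd_otts:
            tally["missing"] += 1
        elif ott in wd_otts:
            tally["match"] += 1
        else:
            tally["mismatch"] += 1
    return tally
```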

Anyway, we're digressing a bit from @oolonek's request 😄

oolonek · 2024-07-10T06:38:58Z

It could be that this is a better way to get the mappings now, rather than going via the ncbi IDs etc.

We could probably check how accurate and comprehensive our mapping is, versus the one on wikidata. If we can simply move to using wikidata, it would probably simplify the code considerably. However, my suspicion is that there are lots of OTT taxa that have NCBI / GBIF ids but which aren't currently on wikidata.

It would be interesting to find out which are missing. Do you expect taxa to have no WD entry at all, or rather to be present on WD but simply lack the OTT ID on their WD page? In either case it would be worth finding out, and eventually working on pushing the missing info to WD. I will look into this on my side too. Thanks for the quick feedback :)

davidebbo · 2024-07-10T07:31:39Z

It's probably going to be a combination of things.

Looking at the DB, out of 2,235,475 leaf taxa, 403,072 don't have a WD entry that we were able to locate (18%)
When we have WD entries, some may be missing an OTT
Some may have an OTT that is different from the OTT we have for the taxon

But for the last two, we really don't know right now because we've never looked at the WD OTT field. But it would be interesting to get that data.

hyanwong · 2024-07-10T08:08:34Z

Good summary. Thanks @davidebbo . And yes, it would be interesting to see how this compares to what wikidata think is the correct mapping.

mdrishti · 2024-07-11T13:15:04Z

Hi,

I have also been working on getting the taxonomic IDs from OTT, and from 11 other taxonomic databases (GBIF, NCBI, EOL, ITIS, etc.), corresponding to Wikidata IDs. I found that ~2,032,649 WD IDs have an OTT ID and 1,435,238 WD IDs don't; the latter map to other databases.
On the other hand, out of a total of 4,528,302 OTT IDs, 2,530,549 don't have WD IDs.

There are 3,826,740 OTT IDs at either species or strain level. I was wondering about the criteria used for keeping an OTT ID in OneZoom. Also, do all 2,235,475 leaf taxa in OneZoom have an OTT ID?

Too many numbers above! Sorry!

hyanwong · 2024-07-11T13:58:28Z

I was wondering about the criteria used for keeping the ott id in OneZoom

We tend to retain all the OTTs that are present in the synthetic OpenTree (give or take some that differ because of using bespoke trees in particular areas of the tree, mostly mammals / birds)

davidebbo · 2024-07-11T14:44:19Z

3,826,740 - 2,235,475 = 1,591,265. That's a huge number of species otts that are not in the OneZoom tree. But I do see the same thing if I filter taxonomy.tsv for only species.

I guess that means that all these are incertae sedis, and hence not in the synthetic tree?
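One way to test the incertae sedis guess is to count species-rank rows in taxonomy.tsv and tally the flags on them. A sketch, assuming the usual OTT column layout (uid, parent_uid, name, rank, sourceinfo, uniqname, flags) and that exclusion from synthesis might show up in flags such as `incertae_sedis`; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def parse_taxonomy_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield rows of an OTT taxonomy.tsv as dicts keyed by the header names.

    Assumes the OTT layout: fields separated by "\\t|\\t", header on line 1.
    Splitting on "\\t|" and stripping works whether or not lines end in a tab.
    """
    it = iter(lines)
    header = [field.strip() for field in next(it).split("\t|")]
    for line in it:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("\t|")]
        yield dict(zip(header, fields))


def species_flag_counts(lines: Iterable[str]) -> Counter:
    """Count species-rank taxa and how often each flag appears among them."""
    tally: Counter = Counter()
    for row in parse_taxonomy_rows(lines):
        if row.get("rank") != "species":
            continue
        tally["species"] += 1
        for flag in filter(None, row.get("flags", "").split(",")):
            tally[flag] += 1
    return tally
```

Since a file object iterates over lines, something like `species_flag_counts(open("taxonomy.tsv", encoding="utf-8")).most_common(10)` would show which flags dominate among species.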

davidebbo · 2024-07-11T21:03:25Z

I did some instrumentation. Out of 1,817,682 OneZoom otts that we are mapping to a Wikidata item:

1,607,691 (~88%) have an ott in Wikidata, and it matches our ott
2,893 (<1%) have an ott in Wikidata that does not match our ott
207,098 (~11%) don't have an ott in Wikidata
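As a sanity check on the arithmetic, the three buckets sum exactly to the total, and the percentages round as quoted. A trivial sketch; `shares` is just an illustrative helper.

```python
from __future__ import annotations


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Express each bucket as a percentage of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts reported above for the mapped OneZoom OTTs
counts = {"match": 1_607_691, "mismatch": 2_893, "missing": 207_098}
print(sum(counts.values()))  # 1817682
print(shares(counts))  # {'match': 88.4, 'mismatch': 0.2, 'missing': 11.4}
```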

hyanwong · 2024-07-11T21:56:36Z

Nice. Thanks @davidebbo. It's good there aren't many wrong matches. Seems like we could switch at some point to using wikidata to provide all our mapping then. What we would be missing is data to do with other identifiers, like NCBI, which we get automatically from the opentree.

However, I think it would be fine to omit all the ncbi -> wikidata mapping, and just go straight to mapping OTT from the wikidata JSON dump to the WD qID.
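Going straight from the dump could be as simple as streaming entities and reading their P9157 claims. A sketch, assuming the dump follows the usual Wikidata JSON layout (one entity per line inside a single JSON array, with values under claims/P9157/mainsnak/datavalue/value); `ott_qid_pairs_from_dump` is a hypothetical name.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator

OTT_PROPERTY = "P9157"  # Open Tree of Life ID


def ott_qid_pairs_from_dump(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (ott_id, qid) pairs from lines of a Wikidata JSON dump.

    Each line is assumed to hold one complete entity followed by a comma,
    with the opening "[" and closing "]" on lines of their own.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        for claim in entity.get("claims", {}).get(OTT_PROPERTY, []):
            # "novalue"/"somevalue" snaks have no datavalue, so .get() yields None
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, str) and value.isdigit():
                yield int(value), entity["id"]
```

With a compressed dump, the lines could come from `gzip.open(path, "rt", encoding="utf-8")` (or `bz2.open` for the .bz2 dump), so the file is read one line at a time rather than loaded whole.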

oolonek · 2024-07-12T06:50:54Z

I did some instrumentation. Out of 1,817,682 OneZoom otts that we are mapping to a Wikidata item:

1,607,691 (~88%) have an ott in Wikidata, and it matches our ott

2,893 (<1%) have an ott in Wikidata that does not match our ott

207,098 (~11%) don't have an ott in Wikidata

Hi @davidebbo, are these files somewhere in the OneZoom repo, or were they generated elsewhere? Would you mind sharing them? Also, I guess this is the case, but just to be sure, could you confirm it's OTT 3.6 you are using?

oolonek · 2024-07-12T07:04:24Z

Nice. Thanks @davidebbo. It's good there aren't many wrong matches. Seems like we could switch at some point to using wikidata to provide all our mapping then. What we would be missing is data to do with other identifiers, like NCBI, which we get automatically from the opentree.

However, I think it would be fine to omit all the ncbi -> wikidata mapping, and just go straight to mapping OTT from the wikidata JSON dump to the WD qID.

Why not also rely on WD to retrieve the NCBI IDs?
WD could be the single source for all taxon IDs; that way there would be a single place to work on improving mappings.

See https://qlever.cs.uni-freiburg.de/wikidata/QObdaz?exec=true
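A query for several identifiers at once could use OPTIONAL clauses so that items lacking, say, an NCBI ID are still returned. This is a sketch rather than the linked query: it assumes P685 (NCBI taxonomy ID) and P846 (GBIF taxon ID) alongside P9157, and `build_taxon_id_query` is a hypothetical helper.

```python
from __future__ import annotations

# Wikidata properties for a few taxon identifiers
ID_PROPERTIES = {"ott": "P9157", "ncbi": "P685", "gbif": "P846"}


def build_taxon_id_query(properties: dict[str, str]) -> str:
    """Build a SPARQL query: every item with an OTT ID, plus any other listed IDs."""
    lines = [
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT ?item " + " ".join(f"?{name}" for name in properties) + " WHERE {",
        # The OTT ID is required; everything else is optional
        f"  ?item wdt:{properties['ott']} ?ott .",
    ]
    for name, pid in properties.items():
        if name != "ott":
            lines.append(f"  OPTIONAL {{ ?item wdt:{pid} ?{name} . }}")
    lines.append("}")
    return "\n".join(lines)
```

One caveat of this shape: an item with two NCBI IDs and two GBIF IDs produces four rows, so results would need de-duplicating (or a `GROUP_CONCAT` per property) before being treated as a one-to-one mapping.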

davidebbo · 2024-07-12T07:31:43Z

Why not also rely on WD to retrieve the NCBI IDs?
WD could be the single source for all taxon IDs; that way there would be a single place to work on improving mappings.

Yes, that would be a good end state if the data quality is sufficient. In such a world, we may not need to use the OpenTree taxonomy file at all. We could also do away with all the EOL logic.

Basically, we'd have:

A Newick tree that includes OTTs
We'd use WD to map each OTT to a WD item, and to all the other sources
We'd also use WD for all media
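The first two steps of that pipeline might look like the following. A sketch only: it assumes OTT IDs appear in Newick labels as an `_ott<number>` suffix and that an OTT-to-QID dictionary has already been built from Wikidata; both function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumes node labels end in "_ott<digits>", e.g. "Aloe_ferox_ott608115",
# followed by a Newick delimiter (comma, paren, colon, semicolon, comment) or end.
OTT_LABEL = re.compile(r"_ott(\d+)(?=[,():;\[]|$)")


def otts_in_newick(newick: str) -> list[int]:
    """Return every OTT ID found in node labels, in order of appearance."""
    return [int(m.group(1)) for m in OTT_LABEL.finditer(newick)]


def map_tree_to_wikidata(newick: str, qid_by_ott: dict[int, str]) -> dict[int, str | None]:
    """Look up a QID for each OTT in the tree; None marks OTTs without one."""
    return {ott: qid_by_ott.get(ott) for ott in otts_in_newick(newick)}
```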

If we went in that direction, we should probably do a rewrite of the tree building logic, rather than iteratively move it in that direction.

I don't think we're quite ready for that yet, but it is a direction.

hyanwong · 2024-07-12T11:22:57Z

Why not also rely on WD to retrieve the NCBI IDs?

I'm not sure that's so sensible, because the OTT IDs are based on NCBI, GBIF, etc. So the OTT taxonomy.tsv file is the canonical source of the NCBI IDs that go into generating an OTT.

I.e. the mappings in the taxonomy.tsv file are the definition of an OTT, for a given OpenTree release.
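The source IDs behind each OTT are recorded in the `sourceinfo` column of taxonomy.tsv, assumed here to be comma-separated `source:id` pairs such as `ncbi:1234,gbif:5678`. A sketch of pulling them out under that assumption; `parse_sourceinfo` is a hypothetical helper.

```python
from __future__ import annotations


def parse_sourceinfo(sourceinfo: str) -> dict[str, list[str]]:
    """Split an OTT sourceinfo value such as "ncbi:1234,gbif:5678" into {source: [ids]}."""
    ids: dict[str, list[str]] = {}
    for item in filter(None, (part.strip() for part in sourceinfo.split(","))):
        source, _, ident = item.partition(":")
        # Skip entries with no ID part, or that look like URLs rather than source:id
        if not ident or source in ("http", "https"):
            continue
        ids.setdefault(source, []).append(ident)
    return ids
```

For a parsed row, `parse_sourceinfo(row["sourceinfo"]).get("ncbi", [])` would give the NCBI IDs that went into that OTT.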

oolonek mentioned this issue Jul 9, 2024: Additional iNaturalist tab on leave's description #864 (Open)

jrosindell added the EMI proposal label Jul 19, 2024

jrosindell mentioned this issue Jul 19, 2024: Tidy up tabs and add links #865 (Open)
