Wikidata:Requests for permissions/Bot/DifoolBot 5


DifoolBot
Operator: Difool

Task/s: Change reference URLs into the related ID property and merge references with the same ID property.

Code: at GitHub

Function details:

This task is based on a request by @Jahl de Vautban. The script iterates through the pages returned by a search query and examines all references on each page. It follows these steps:

  1. Change a reference URL (P854) into the related ID property plus stated in (P248). For example, reference URL (P854) https://www.idref.fr/149649045 is changed into IdRef ID (P269) 149649045. The related ID property is determined from data on Wikidata, namely from property pages that have applicable 'stated in' value (P9073) and URL match pattern (P8966). Here is an example edit.
  2. Merge references with the same ID property. Example edit.
  3. Change references with a reference URL (P854) that has an archive URL to use archive URL (P1065). Example edit.
  4. If the references of a claim are changed, remove references with an imported from Wikimedia project (P143) or Wikimedia import URL (P4656), but only if the claim contains another reference with a stated in (P248). Example edit.
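Steps 1, 2 and 4 can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the dict-based reference model, the function names, the `Q_IDREF_PLACEHOLDER` and `Q_SOME_WIKIPEDIA` item IDs, and the IdRef regular expression are all assumptions made for the example. The real bot derives its patterns from URL match pattern (P8966) and applicable 'stated in' value (P9073) on Wikidata.

```python
import re

# Hypothetical lookup table: (ID property, 'stated in' item, URL match pattern).
# The item ID and the regular expression are placeholders for illustration.
PATTERNS = [
    ("P269", "Q_IDREF_PLACEHOLDER",
     re.compile(r"^https?://(?:www\.)?idref\.fr/(\d{8}[\dX])")),
]
ID_PROPS = {prop for prop, _, _ in PATTERNS}


def url_to_id_reference(reference):
    """Step 1: replace P854 with the matching ID property plus P248."""
    url = reference.get("P854")
    if url is None:
        return reference
    for prop, stated_in, pattern in PATTERNS:
        match = pattern.match(url)
        if match:
            converted = {k: v for k, v in reference.items() if k != "P854"}
            converted[prop] = match.group(1)
            converted["P248"] = stated_in
            return converted
    return reference  # no known pattern: leave the reference untouched


def merge_references(references):
    """Step 2: merge references that share the same ID property and value."""
    merged, result = {}, []
    for ref in references:
        key = next(((p, ref[p]) for p in ID_PROPS if p in ref), None)
        if key is None:
            result.append(ref)
        elif key in merged:
            # Keep existing values, add only properties not yet present.
            for k, v in ref.items():
                merged[key].setdefault(k, v)
        else:
            merged[key] = dict(ref)
            result.append(merged[key])
    return result


def drop_import_references(references):
    """Step 4: drop P143/P4656 references if some reference has P248."""
    if any("P248" in ref for ref in references):
        return [r for r in references if "P143" not in r and "P4656" not in r]
    return references


refs = [
    {"P854": "https://www.idref.fr/149649045"},
    {"P269": "149649045", "P813": "2020-01-01"},
    {"P143": "Q_SOME_WIKIPEDIA"},
]
cleaned = drop_import_references(
    merge_references([url_to_id_reference(r) for r in refs])
)
```

In this toy run the first two references collapse into a single IdRef reference, and the "imported from" reference is dropped because a stated in (P248) reference now exists.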

Example search queries: idref.fr (80,000 pages), rkd.nl (185,000 pages) and bnf.fr (180,000 pages).

More example edits can be found here.

--Difool (talk) 02:06, 19 July 2024 (UTC)

 Strong support: clearly needed maintenance, especially useful for making items more readable and reducing their size by cutting only redundant data. Epìdosis 06:39, 19 July 2024 (UTC)
 Support thanks for taking care of it! --Jahl de Vautban (talk) 11:53, 19 July 2024 (UTC)
 Comment @Difool: how about also adding to the bot's tasks the case of Bibliothèque nationale de France ID (P268), in cases like this? It would be very useful. --Epìdosis 16:45, 19 July 2024 (UTC) P.S. Reading point 1 again, I guess it's probably already included, but I just want to be sure. --Epìdosis 16:46, 19 July 2024 (UTC)
@Epìdosis: no, it hasn't been included yet: the page Property:P268 contains a URL match pattern (P8966) with a similar regular expression ^https?:\/\/(?:data|catalogue)\.bnf\.fr\/\w\w\/(\d{8,9}). However, this pattern doesn't match the URL http://data.bnf.fr/ark:/12148/cb12197229. Although I use custom regular expressions to match older URLs, I decided not to do so in this case because the Bibliothèque nationale de France ID (P268) link leads to the 'catalogue' page rather than the 'data' BnF page. Some people may prefer to keep it that way. If there are no objections, I can include a custom regular expression for it. Difool (talk) 18:29, 19 July 2024 (UTC)
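The mismatch described above can be checked directly. In the sketch below, `PUBLISHED` is the pattern quoted from Property:P268, while `CUSTOM` is a hypothetical custom expression written only for this illustration (it is not the bot's actual regex), and the `/fr/` URL is an assumed example of a URL the published pattern accepts. Any post-processing needed to turn the captured digits into a valid P268 value is not shown.

```python
import re

# URL match pattern (P8966) as quoted from Property:P268.
PUBLISHED = re.compile(r"^https?:\/\/(?:data|catalogue)\.bnf\.fr\/\w\w\/(\d{8,9})")

# Hypothetical custom pattern for the older 'ark' style URLs (assumption).
CUSTOM = re.compile(r"^https?://data\.bnf\.fr/ark:/12148/cb(\d{8,9})")

ark_url = "http://data.bnf.fr/ark:/12148/cb12197229"
lang_url = "https://data.bnf.fr/fr/12197229"  # assumed example URL

# The published pattern expects two word characters and a slash after the
# host, so "ark:/" fails at the third character.
published_on_ark = PUBLISHED.match(ark_url)
published_on_lang = PUBLISHED.match(lang_url)
custom_on_ark = CUSTOM.match(ark_url)

print(published_on_ark)            # None
print(published_on_lang.group(1))  # 12197229
print(custom_on_ark.group(1))      # 12197229
```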
I know that data.bnf.fr and catalogue.bnf.fr are effectively different sites (which is often a bit confusing). Of course I would understand the reasons for potential objections from people who prefer to keep them as they are now. However, since they in fact just display the same data in different ways, I would personally support adding a custom regular expression for them. Epìdosis 18:32, 19 July 2024 (UTC)
 Support - Mbch331 (talk) 09:41, 23 July 2024 (UTC)
 Comment One reason I can think of not to do this is that the original reference URL is lost when the formatter URL pattern changes. I very much like the idea, but I think that the original URL needs to be archived on the Internet Archive when the change is made (for similar reasons to why we use "object named as"). Maybe the archiving is already done by a second bot, but then I would like the two to work together. This is not a blocker. Egon Willighagen (talk) 06:30, 25 July 2024 (UTC)
Yes, I've had some uncertainty about whether to retain the reference URL or omit it when the bot includes the related ID property. The related ID property includes a link to the current URL, so the only real reason for keeping a reference URL would be to dig up old data from a web archive. But IMO the page associated with the external ID should contain information that enables you to construct that 'old' URL (if the reference also has a 'retrieved' or 'publication date' property).
Note that Help:Sources#Databases also states that you don't need to include a reference URL for a reference to an "internet accessible database". Difool (talk) 06:55, 26 July 2024 (UTC)
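As a rough illustration of constructing such an 'old' URL: given a formatter URL that was valid at the time and the reference's 'retrieved' date, a Wayback Machine lookup URL can be assembled. The function name and the formatter string `https://www.idref.fr/$1` are assumptions for this example; the sketch only builds the URL and does not check whether a snapshot exists.

```python
from datetime import date


def archive_lookup_url(formatter_url, external_id, retrieved):
    """Build a Wayback Machine URL for an ID as it was linked on a date.

    formatter_url: formatter with "$1" as the ID placeholder (assumed input)
    external_id:   the external ID value from the reference
    retrieved:     the 'retrieved' date of the reference
    """
    page_url = formatter_url.replace("$1", external_id)
    # The Wayback Machine accepts a partial timestamp such as YYYYMMDD and
    # resolves it to the closest available snapshot.
    timestamp = retrieved.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


url = archive_lookup_url("https://www.idref.fr/$1", "149649045", date(2024, 7, 19))
print(url)  # https://web.archive.org/web/20240719/https://www.idref.fr/149649045
```

If the formatter URL has changed since the reference was created, the older formatter would have to be supplied instead of the current one.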