
Welcome to Wikidata, Peter F. Patel-Schneider!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards! --Tobias1984 (talk) 15:54, 19 October 2015 (UTC)Reply

Further reading

You might also be interested in some of the RfCs (both commenting on new ones and reading the old ones): Wikidata:Requests_for_comment. --Tobias1984 (talk) 15:51, 21 October 2015 (UTC)Reply

@Tobias1984: Thanks. I have been reading them and have already made some comments.
I just noticed and was reading some of them. It seems you already have a lot of background knowledge and will transition quickly into this ecosystem. Did you also look at the Wikidata:Property proposal subpages? There are always some good discussions about modelling going on there. --Tobias1984 (talk) 18:50, 21 October 2015 (UTC)Reply

Ships and boats

Hi Peter, last week you made several changes to ship-related items like inland waterway vessel (Q863970) and museum ship (Q575727). As a result, inland cargo vessels like c:File:Hannover binnenschiff leer 02.jpg are now derived from boat instead of ship (Q11446). In German, inland waterway vessel (Q863970) refers to "Binnenschiff", which means "inland vessel", so I think the former subclass-of-ship relationship is better than deriving it from boat. Should we create a new class "Inland vessel", or should we use inland waterway vessel (Q863970) with a broader English description of "inland vessel" (like the other languages), or ...? -- Gerd Fahrenhorst (talk) 10:07, 13 August 2023 (UTC)

Hi Peter, you have now created a new class inland waterway vessel (Q121365935) and modified Europaship (Q1375735). This is not a good solution because all the interwiki links to river ships (= inland vessels) now remain on inland waterway vessel (Q863970), and it is unclear what the difference from inland waterway vessel (Q121365935) is. Please let us discuss a proper solution before continuing! Gerd Fahrenhorst (talk) 12:37, 13 August 2023 (UTC)
The problem with inland waterway vessel (Q863970) (which used to be called riverboat) was that it was a subclass of ship (Q11446) but it appeared that most of its subclasses were actually classes of boat (Q35872). So, as part of improving the ship/boat situation in general, it was changed to be a subclass of boat (Q35872). See the discussion in https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ships#problems_with_ship_types and subsequent topics. An alternative way to go would be to generalize the class to a watercraft type (inland waterway vessel) and create new subclasses for river boats and river ships. But creating a class that is a subclass of both boat and ship isn't a workable solution, as the two classes are mostly disjoint. @Vicarage Peter F. Patel-Schneider (talk) 16:42, 14 August 2023 (UTC)Reply
Works for me Vicarage (talk) 17:06, 14 August 2023 (UTC)Reply

metaclasses documentation

Hello, Peter. English is not my native language, so I have some difficulty understanding. Could you please elaborate on what you mean by "These implicit instance of (P31) statements are not all observable in Wikidata."? Infovarius (talk) 19:10, 24 August 2023 (UTC)

@Infovarius It's similar to the implicit instance of (P31) relationships that come from subclass of (P279) links, e.g., Douglas Adams (Q42) is implicitly an instance of mammal (Q110551885) even though the relationship is not (directly) observable in Wikidata. The difference is that the instance of (P31) relationships between, for example, human (Q5) and class (Q16889133) can't even be found by following links that exist in Wikidata so they are in some sense less observable. Nonetheless the relationship follows from the meanings of human (Q5) and class (Q16889133). Peter F. Patel-Schneider (talk) 19:22, 24 August 2023 (UTC)Reply
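
The first kind of implicit relationship described above, the one that follows from subclass of (P279) links, can be checked with a property path. This is a minimal sketch, assuming the standard Wikidata prefixes and using the item IDs quoted in the comment; whether it returns true depends on the live data.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Is Douglas Adams (Q42) an instance of mammal (Q110551885), either directly
# or through a chain of subclass of (P279) links?
ASK { wd:Q42 wdt:P31/wdt:P279* wd:Q110551885 . }
```

Per the comment, no comparable path query recovers the relationship between human (Q5) and class (Q16889133), because the statements that such a path would follow are not present in Wikidata.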

Sulfoxides

Could you elaborate on this edit? You did not add any reason for deprecated rank (P2241) qualifier, and AFAIK only compounds with a sulfinyl group substituted with two organyl groups are considered sulfoxides, while compounds with other groups are classified in different classes. Wostr (talk) 23:58, 14 December 2023 (UTC)

@Wostr I was working from octasulfur monoxide (Q73246974) which, as far as I can see, is not organic. This causes a disjointness violation between organic compound and inorganic compound. See chemical compound for the disjointness.
As well, the English description of sulfinyl (Q1747806) is "chemical compound containing the sulfinyl group" and sulfinyl is "functional group consisting of sulfur double-bonded to oxygen". So carbon is not mentioned at all.
So, two indications that the relationship is incorrect. I suppose I should have just deleted it, but I decided to only deprecate it.
If the description is incorrect then it needs to be changed and the incorrect subclasses need to be fixed up.
What reason for deprecated rank could be used for situations like this? Peter F. Patel-Schneider (talk) 00:12, 15 December 2023 (UTC)Reply
Thanks for the quick response and this explanation. I'll check it later with the sources, but right now: (1) I believe that octasulfur monoxide (Q73246974) subclass of (P279) sulfoxide (Q408395) is not correct; maybe there is some class that would better describe such an entity, but I can't come up with it quickly; (2) all sulfoxides have sulfinyl (Q1747806), but not all compounds with this group are sulfoxides; usually inorganic compounds with this group are called 'thionyl compounds', like thionyl chloride (Q409171); however, after this edit, we have 'thionyl compounds' as an alias of 'sulfoxides', while at the same time we have this separately (cf. Q8852386) in some Wikimedia projects; (3) I believe that sulfoxides are exclusively organosulfur compounds and that the deprecated statement had a correct 'normal' rank, but I won't revert this now; I'll check whether there are any other problems with subclasses of sulfoxide (Q408395) and try to fix this later; (4) I didn't know the reason for your edit and this deprecation, so I couldn't propose any reason for deprecated rank (P2241) value for this situation; given your explanation I'd say that possibly invalid entry requiring further references (Q35779580) or does not always apply (Q90177495) could be okay here (probably with an explanation on a discussion page) – but as I said, I will check this later and try to fix this issue, as it can be a problem with many more items. Wostr (talk) 00:41, 15 December 2023 (UTC)

Invitation to participate in the WQT UI requirements elicitation online workshop

Dear Peter_F._Patel-Schneider,

I hope you are doing well,

We are a group of researchers from King’s College London working on developing WQT (Wikidata Quality Toolkit), which will support a diverse set of editors in curating and validating Wikidata content.

We are inviting you to participate in an online workshop aimed at understanding the requirements for designing effective and easy-to-use user interfaces (UI) for three tools within WQT that can support the daily activities of Wikidata editors: recommending items to edit based on their personal preferences, finding items that need better references, and generating entity schemas automatically for better item quality.

The main activity during this workshop will be UI mockup sketching. To facilitate this, we encourage you to attend the workshop using a tablet or laptop with PowerPoint installed or any other drawing tools you prefer. This will allow for a more interactive and productive session as we delve into the UI mockup sketching activities.

Participation is completely voluntary. You should only take part if you want to and choosing not to take part will not disadvantage you in any way. However, your cooperation will be valuable for the WQT design. Please note that all data and responses collected during the workshop will be used solely for the purpose of improving the WQT and understanding editor requirements. We will analyze the results in an anonymized form, ensuring your privacy is protected. Personal information will be kept confidential and will be deleted once it has served its purpose in this research.

The online workshop, which will be held on April 5th, should take no more than 3 hours.

If you agree to participate in this workshop, please either contact me at kholoud.alghamdi@kcl.ac.uk or use this form to register your interest https://forms.office.com/e/9mrE8rXZVg Then, I will contact you with all the instructions for the workshop.

For more information about my project, please read this page: https://king-s-knowledge-graph-lab.github.io/WikidataQualityToolkit/

If you have further questions or require more information, don't hesitate to contact me at the email address mentioned above.

Thank you for considering taking part in this project.

Regards Kholoudsaa (talk) 17:00, 19 March 2024 (UTC)Reply

Community Wishlist is back open...

And I've submitted this wish that you endorsed on the old wishes sandbox. Your support may help, although I don't really know what to expect from the new "Focus Area" system. Thanks! Swpb (talk) 18:40, 15 July 2024 (UTC)Reply

I'll see what I can do to endorse the new wish. Peter F. Patel-Schneider (talk) 21:18, 15 July 2024 (UTC)Reply

Modeling of object and agent relations

Hey Peter - I'm coming around to your position that selectional restrictions and roles should be handled separately for objects and agents, and I have a not-yet-public proposal I'd like your reaction to (here please) before I post it to the proposals list. That can serve as a model for agents.

Now as far as we handle agents, I don't think splitting selectional restrictions from roles forces us to split instances from classes; I can see one property that takes either particular agents or classes to which those agents must belong, and another, "agents of action have role" for roles. But, I recognize that a few people's sentiments go the other way, so I'm ok with a three-way division if it gets us over the finish line. If "objects of action have role" succeeds, I will put together a corresponding three-property proposal for agents: "agent of action"/"agent class of action"/"agents of action have role", and suggest that Lectrician1 withdraw Wikidata:Property proposal/agent of action, which seems to be heading down anyway.

Now, I suppose this opens the door for such three-way splits of a lot of the other properties I mapped to semantic roles on Wikidata:Property proposal/has semantic role (2nd proposal), but I don't think I'm ready to jump into that prospect with both feet. E.g., I think if we were to split uses (P2283) into "uses specific item", "uses items of the class", and "objects used have role", we'd end up with so many erroneous statements by confused editors that we'd be worse off than now – and anyway, the latter can usually be handled by qualifying with object of statement has role (P3831). Swpb (talk) 18:11, 15 August 2024 (UTC)Reply

@Swpb I think that any property that is used in the same way as object of occurrence (P12912) or "agent of action" will need the same treatment. So maybe the best approach is to use a qualifier, although that has the problem of repeating the property each time a different role is wanted. It might even be possible to use (abuse?) object of statement has role (P3831) for this purpose, although then semantic roles for which there is no thematic property might be clumsy. The point is to have a solution that does not require lots of properties but that can nicely specify all arguments of an action, the selectional preference for arguments, and the role of arguments. This is what the "has semantic role" proposal was driving towards as part of https://www.wikidata.org/wiki/Wikidata:WikiProject_Events_and_Role_Frames.
I'm in favour of revising the agent of action proposal if that is reasonable and possible, instead of starting from scratch. Peter F. Patel-Schneider (talk) 19:07, 15 August 2024 (UTC)Reply
Re the current agent of action proposal, it seems Lectrician1 has made the question moot.
When you talk about "semantic roles for which there is no thematic property", I have to say again that I don't think there are any such semantic roles, at least major ones, except for agent and maybe experiencer. If you think there are other unmapped roles, please share them!
I understand what the intent of "has semantic role" was, but it doesn't solve the problem so much as turn it into a different, much bigger one: the classic problem I keep bringing up, but which no one wants to acknowledge, of generic properties being magnets for abuse and requiring constant cleanup. If you're serious about separately expressing action arguments, selection classes, and the (non-semantic) roles taken on by those arguments, I think the only way to do it right is with properties that are semantic-role-specific.
However, that brings us to object of statement has role (P3831) - using it to indicate the (non-semantic) role of an action argument (or of its selection class) would not be an abuse, it would be a sub-case of exactly what that qualifier was intended for – as long as the action argument is the object of a main statement and not a qualifier. That mostly* obviates the need for a bunch of separate properties for non-semantic roles of action arguments. *(Where the value of the main statement is the action itself, rather than an argument of that action, is where you could need properties like "objects of action have role" [see examples 4 & 5], but outside of the object and agent semantic roles, I'm not convinced there's much need for that kind of statement. Some SPARQL querying could shed light on that.)
Which leaves the instance vs. (selectional) class distinction. I'm ok (not thrilled, but that's fine) with separate properties for those when it comes to the object and agent roles. For other semantic roles, I think we'd need to consider each one, and whether there is more to be gained from such separation than will be lost to added complexity and editor confusion. I still think it's more or less always inferable whether the object is an instance or selection class. But that's a conversation that can continue another day; it shouldn't impede getting properties made for the agent role. Swpb (talk) 23:36, 15 August 2024 (UTC)Reply
Addendum: here is a SPARQL query of cases where object of statement has role (P3831) might be being misused to indicate the role of an action argument that is not the main statement value. No more than 15 such statements in all of Wikidata. More could be lurking using of (P642), which is much harder to query, but I suspect if there are, the vast majority will be expressing the roles of agents, or to a lesser extent undergoers, vs. any of the other major semantic roles. Swpb (talk) 16:58, 16 August 2024 (UTC)Reply
@Swpb I think that query is missing a * for the P279. With the * it runs out of time in the WDQS but returns 6830 matches in the QLever Wikidata query service (with appropriate changes for labels). Most of them are due to errors in the Wikidata ontology but there are a bunch for conflict and some other properties that look relevant. Peter F. Patel-Schneider (talk) 14:17, 20 August 2024 (UTC)Reply
Ok, so more work needed to determine the need for role-of-action-argument properties for non-object/agent semantic roles. But are we in agreement on the three-way approach for objects and agents of actions? Swpb (talk) 14:23, 20 August 2024 (UTC)Reply
@Swpb Yes.
I put together my guesses as to how some of the PropBank frames would be represented and how some existing and proposed Wikidata action classes and instances might have their thematic arguments represented in User:Peter_F._Patel-Schneider/propbank frames. This is very much a work in progress. Peter F. Patel-Schneider (talk) 15:31, 23 August 2024 (UTC)Reply
Ok, then could you support Wikidata:Property proposal/objects of action have role? I'm still putting together the three-property proposal for agency, will let you know when that's up.
Regarding your mapping of PropBank frames, I think there needs to be more clarity on how your proposed properties will relate to the existing properties that cover those semantic roles:
Existing | Proposed (instance-valued) | Proposed (class-valued)
uses (P2283) | "instrument of action" | "instrument class of action"
source of transfer (P12693) | "source of action" | "source class of action"
destination of transfer (P12694) | "destination of action", "recipient of action" | "destination class of action", "recipient class of action"
has effect (P1542) | "result of action" | "result class of action"
has goal (P3712) | "goal of action" | "goal class of action"
location (P276) | "location of action" | "location class of action"
Noting that the existing properties are mostly not limited to the domain of actions, it appears your proposed properties would all be sub-properties of the existing ones. Do you intend to migrate statements using the existing properties to your proposed ones? Given the long history of these existing properties being "overloaded" in the sense of accepting both an instance-valued use case and a class-valued one (and, IMO, a lack of problems caused by this), I think I will not be the only one pushing back (once you get to creating proposals) against the idea that it is necessary or desirable to split the properties in this way.
I would also suggest that in some cases where you have object of statement has role (P3831)="??", there simply isn't a context-specific "role" for the statement object, and the qualifier isn't needed:
Some other observations (recognizing I'm critiquing a work in progress): What is the "recipient" of "creation", as distinct from the "thing created"? Also, it looks like you have erroneous Q-items in a few places (canon (Q53831), meal (Q6460735)). Cheers, Swpb (talk) 15:25, 26 August 2024 (UTC)Reply
Let me mull this over a bit longer. Peter F. Patel-Schneider (talk) 01:02, 27 August 2024 (UTC)Reply
Ok – in the meantime, I've put up the agent proposal: Wikidata:Property proposal/agent of action & agent class of action & agents of action have role. Swpb (talk) 19:30, 27 August 2024 (UTC)Reply
It's taken a longer time than I expected to respond. In general, I agree with the idea of using existing properties as much as possible for thematic relations and creating new ...class... properties. I'm still ambivalent about whether it is better to create the third set of properties but I'm OK with it. So, count me, belatedly, in. Peter F. Patel-Schneider (talk) 13:33, 13 September 2024 (UTC)Reply
Thanks. I thought you were in favor of the "...has role" properties, at least for agents and objects; is it just on other relations that you're not sure about them? At some point I will try to compile a list of all the current properties that are technically overloaded with respect to taking instances, selection classes, and roles, and we can see where it may be prudent to think about splitting. Swpb (talk) 15:40, 13 September 2024 (UTC)Reply
I think either way is fine, so long as the extra properties go through.
Would it be worthwhile to create a class with all these properties as instances? Peter F. Patel-Schneider (talk) 15:49, 13 September 2024 (UTC)Reply

SPARQL and "mul"

Hi, I'm curious (and maybe I could help) about what you said on Help_talk:Default_values_for_labels_and_aliases#SPARQL_querying. What queries do you run exactly? Also, what endpoint do you use that is not using Blazegraph? (The only two really working endpoints I know of use Blazegraph, which is a big problem in itself as Blazegraph is basically dead...)

Cheers, VIGNERON (talk) 14:36, 22 August 2024 (UTC)Reply

I have been running a lot of queries. Here is one
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?first ?firstLabel WHERE {
{ ?i wdt:P31 ?first. } UNION { ?sub wdt:P279 ?first . } UNION { ?first wdt:P279 ?super . }
OPTIONAL { ?first rdfs:label ?firstLabel . FILTER ( lang(?firstLabel) = 'en' ) }
}
I run this on https://qlever.cs.uni-freiburg.de/wikidata which runs queries against the latest Wikidata RDF dumps, updated weekly (mostly). The SPARQL engine is QLever. The above query runs in about 23s on the QLever service but times out in BlazeGraph.
But QLever does not have the non-standard BlazeGraph extensions so you need the OPTIONAL stuff for each label. With mul these bits become a lot larger. Peter F. Patel-Schneider (talk) 15:02, 22 August 2024 (UTC)Reply
Ok.
First, QLever does use Blazegraph as the SPARQL engine and yes, QLever never had the SERVICE wikibase:label, so labels need to be retrieved as in normal SPARQL. If I understand your need and your query correctly, you are not really affected by the "mul" label, so why not just keep the query as it is?
Cheers, VIGNERON (talk) 15:28, 22 August 2024 (UTC)Reply
I'm pretty sure that the QLever service does not use BlazeGraph. I've been talking with the QLever people and they have been making changes to the underlying SPARQL server. What evidence do you have that QLever uses BlazeGraph? Peter F. Patel-Schneider (talk) 15:49, 22 August 2024 (UTC)
Indeed, I mixed things up, sorry.
Anyway, in your example above, you don't need to change anything. Do you have any other examples I could look at?
Cheers, VIGNERON (talk) 16:07, 22 August 2024 (UTC)Reply
The above query needs to be changed because the idea behind mul, as I understand it, is to remove the en labels if they are the same as the mul label. So the OPTIONAL line will have to be changed to the three lines suggested elsewhere. I think I have some queries online already - I'll look for them and get back to you later today or tomorrow. Peter F. Patel-Schneider (talk) 17:50, 22 August 2024 (UTC)
True, but "mul" will by nature be used for instances, not classes (as instances can have redundant labels while classes rarely do), and your query is about classes. Unless there is something I am missing, this specific query doesn't need to be changed. No hurry, I'll wait for more examples to give more answers (because my three-line suggestion is just a suggestion and not always the best solution, depending on the case). Cheers, VIGNERON (talk) 06:38, 23 August 2024 (UTC)
Not necessarily. Consider biology taxa.
In any case, lots of my queries that expect to end up with classes do end up with individuals.
You can look at https://www.wikidata.org/wiki/User:Peter_F._Patel-Schneider/ontology_cleaning_project_class_order for some examples of the queries I am running.
I also write novel queries just about any day I am working with Wikidata and many of them require QLever. Having to add the one-line OPTIONAL is a pain. Having to write the three lines will be a royal pain. Peter F. Patel-Schneider (talk) 14:53, 23 August 2024 (UTC)
True but why go through a "royal pain" if it's not needed in 99 % of the cases?
Also, it seems that your problem is with QLever more than with "mul"... I can't really help with that.
Cdlt, VIGNERON (talk) 09:44, 24 August 2024 (UTC)Reply
The problem is not with QLever, it is with the interrelationship between SPARQL and how Wikidata will provide labels for items using the mul language. BlazeGraph has a special, non-standard construct that makes this somewhat easier. Peter F. Patel-Schneider (talk) 11:22, 24 August 2024 (UTC)Reply
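
To make the label change discussed in this thread concrete: the three-line suggestion referred to above is not reproduced on this page, so the following is only one plausible form of it, assuming that an item may carry an English label, a mul label, or both. It replaces the single OPTIONAL line of the query quoted earlier with a fallback from en to mul; the variable names ?enLabel and ?mulLabel are illustrative.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?first ?firstLabel WHERE {
  { ?i wdt:P31 ?first . } UNION { ?sub wdt:P279 ?first . } UNION { ?first wdt:P279 ?super . }
  # Assumed fallback: take the English label if present, otherwise the mul label.
  OPTIONAL { ?first rdfs:label ?enLabel . FILTER ( lang(?enLabel) = 'en' ) }
  OPTIONAL { ?first rdfs:label ?mulLabel . FILTER ( lang(?mulLabel) = 'mul' ) }
  BIND ( COALESCE(?enLabel, ?mulLabel) AS ?firstLabel )
}
```

If neither label exists, ?firstLabel simply stays unbound, which matches the behaviour of the original single OPTIONAL.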

measuring/calculating geographic distribution/density of information in wikidata

Hi Peter F. Patel-Schneider! It was nice to meet you and learn a bit about your work at the last meeting. As you likely noted, I am focused on content gaps, so I am also curious about gaps in Wikidata. Would you be interested, maybe next Tuesday, in thinking about how to approach measuring the geographic distribution/density of information in Wikidata? My intuition is that basic geographic data and related items are more or less well distributed, but cultural (institutions and individuals) and social data (organizations and events) might be hyper-dense in some areas (especially around bigger urban areas where Wikimedia affiliates exist), while in others there might be very few or they might be missing altogether. I would love to check this in relation to contemporary times and the territory of the former SFR Yugoslavia (Macedonia, Slovenia, Serbia and Republika Srpska have affiliates, while Croatia, Bosnia & Herzegovina and Montenegro do not). Is this something that can be done (at least in a fuzzy and quick-and-dirty way) within 30 minutes? I could prepare some categories to look into in advance from Wikipedias, Commons and ...Wikicite? @Millodarka might be able to join us. Zblace (talk) 19:51, 29 August 2024 (UTC)

@Zblace
Tuesday might be more devoted to the PropBank stuff (but maybe not). I think there are tools that show where things are in Wikidata so one of these might be able to nicely show geographic distribution (so long as location information is present). Peter F. Patel-Schneider (talk) 12:28, 30 August 2024 (UTC)Reply
@Peter F. Patel-Schneider sure... sorry if I abused their no-show last time :-) If you think it is inadequate to do something like this within the next sessions all together (I promise no hard feelings), I am also happy to try establishing a separate dedicated session later in the week... would you consider joining such a setup also? -- Zblace (talk) 18:08, 1 September 2024 (UTC)
The discussion last week was good, I think. I'm happy having a talk towards the end of the week. How about Thursday morning (any time before noon)? Peter F. Patel-Schneider (talk) 22:22, 1 September 2024 (UTC)Reply

Where the graph model suffers

Hello. I noticed your comment in the talk with the Search team but I was too late to give any input. I believe your particular example was P279*. Blazegraph is actually nice enough to aggressively parallelize that part of a query for us. But we do take a performance hit for every step we follow along a graph, so when those chains get long we have a problem, one that could potentially be fixed by limiting the length of those chains. Another property that is notoriously problematic is P131*. Since this can go all the way from state to neighbourhood, we have a problem of long chains, imposed solely by our data model. Queries that employ P131* already frequently break, so the WMF's move to segregate scholarly articles was actually overdue, as it will relieve a lot of the pressure built up over recent years. Infrastruktur (talk) 17:04, 10 September 2024 (UTC)

@Infrastruktur:
My experience is that it doesn't take much depth to kill Blazegraph. Here are queries for the subclasses of gene (Q7187).
SELECT ?c WHERE { ?c wdt:P279* wd:Q7187 . }
  WDQS: after quite a bit of time I get "Server error: JSON.parse: unterminated string at line 5019004 column 56 of the JSON data"
  QLever: 1,008,545 lines found in 260ms, after clearing cache
SELECT ?c WHERE { ?c wdt:P279 wd:Q7187 . }
  WDQS: 453791 results in 1861 ms
SELECT ?c WHERE { ?c wdt:P279/wdt:P279 wd:Q7187 . }
  WDQS: 996082 results in 5626 ms
SELECT ?c WHERE { ?c wdt:P279/wdt:P279/wdt:P279 wd:Q7187 . }
  WDQS: 22 results in 16212 ms
SELECT ?c WHERE { ?c wdt:P279/wdt:P279/wdt:P279/wdt:P279 wd:Q7187 . }
  WDQS: 6 results in 8438 ms
SELECT ?c WHERE { ?c wdt:P279* wd:Q7187 . OPTIONAL { ?c rdfs:label ?cLabel . FILTER ( lang(?cLabel) = 'en' ) } }
  WDQS: Query timeout reached
  QLever: 1,008,545 lines found in 1,253ms, after clearing cache Peter F. Patel-Schneider (talk) 19:14, 10 September 2024 (UTC)
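
A variation on the queries above, offered only as a sketch: counting on the server avoids returning roughly a million rows to the client (the WDQS error above looks like a truncated JSON result), and a bounded path shows how much of the hierarchy sits within the first few levels. The prefixes are spelled out because not every endpoint predefines them, and no timings are claimed for either service.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Count subclasses of gene (Q7187) reachable in at most three P279 steps
# (zero steps are included, so Q7187 itself is counted).
SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE {
  ?c wdt:P279?/wdt:P279?/wdt:P279? wd:Q7187 .
}
```

Replacing the bounded path with wdt:P279* gives the count for the full transitive closure.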

RfC

See https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/object_vs_design_class_vs_functional_class_for_manufactured_objects, I tried to ping you there. Vicarage (talk) 17:08, 20 September 2024 (UTC)Reply

@Vicarage I was thinking of restricting this to physical objects created by humans. I think that that would better focus the discussion, as I can envision some thinking about how Mount Everest (for example) would fit into the discussion.
Should I just edit the RfC? Peter F. Patel-Schneider (talk) 18:40, 20 September 2024 (UTC)Reply
I'd restrict it to manufactured items and their designs, so not one-offs like artworks (which I only mentioned to show it wouldn't be affected). Go ahead and modify the RFC. Vicarage (talk) 18:46, 20 September 2024 (UTC)Reply
@Vicarage OK, I did that. I also rewrote a bunch of the first half of the RfC as I found it confusing. Take a look. Peter F. Patel-Schneider (talk) 20:30, 20 September 2024 (UTC)Reply
Thanks for doing that. So do you think it's a workable plan, and can be applied to watercraft/military stuff (what I really care about), if not the world? I wonder what pushback we will get from people who like ship/ship_class/ship_type triplets, and don't understand you only need ship. Vicarage (talk) 20:41, 20 September 2024 (UTC)
@Vicarage I've only really completely looked at the first section. The rest may have to wait for tomorrow. (I may have to do some things for my brother shortly.) But I may also be able to take a look at it this evening. Peter F. Patel-Schneider (talk) 20:44, 20 September 2024 (UTC)Reply
@Vicarage OK, I think I understand better after modifying the problem statement and then re-reading the solution.
As far as I can see there are two problems:
  • Designs and models are being placed as P31 to functional classes.
  • Categorizing designs and models is difficult because they need to be placed in separate P279 hierarchies that are generally only partial reflections of the functional classes. (I have to rewrite the last bit of the introduction to better state this.)
The proposal as I see it is to
  • Have functional class P279 hierarchies over manufactured objects, probably a single hierarchy rooted in something like manufactured object.
    • No functional class is P279 a design class (or a manufactured object, but there shouldn't be any of these currently in Wikidata).
    • No physical objects are P279 any functional class.
    • The only P31 a functional class are actual manufactured objects.
      • This implies that no design class is P31 to any functional class
  • Collapse the various design class hierarchies into subsidiary P279 hierarchies.
    • Each design class can be a P279 to other design classes.
    • Each design class is a (potentially indirect?) subclass of at least one functional class.
    • Distinguish between models, families, series, etc by creating a single class for each. (Maybe only model and family?)
      • Create a class (potentially just called design) that is a superclass of all these classes. (I added this, see below.)
    • No "higher" design class can be P279 a "lower" design class, e.g., no family can be P279 a model.
  • Use a new property to relate manufactured objects to their design class
    • This would best be a model, but might be a family or series or ....
I fully agree with the first part. This is the only way to go. Any violation of this currently in Wikidata is just an error.
The second part has positives and negatives. On the positive side it allows for a simpler setup, particularly if there is the extra class as well. This would allow, for example, Spitfire to just be categorized as an instance of design if the creator did not need it to be a family or series or whatever. On the negative side it is harder to state what relationships are not allowed between design classes. There also may be a loss in expressive power but I'm not finding anything lost just yet.
I'm not so keen on the third part. It might make things clearer, but if a model is a subclass of a functional class then it seems to me that the model should have as instances the instances of the functional class that belong to the model. I know that the ship domain uses vessel class, and that vessel class is a subproperty of P31. I think I could only be convinced to support this if Pdesign was a subproperty of P31. This part has, I think, the biggest effect on users of Wikidata. (I note that it might be possible to take over Pvessel class and rename it.)
I'm going to fix up the first part of the RfC to better match my current understanding. Peter F. Patel-Schneider (talk) 22:05, 20 September 2024 (UTC)
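
As an illustration only, the first two parts of the proposal summarized above can be read as checks over P31 and P279 statements. The following is a minimal sketch on a tiny in-memory graph, not a tool used in this discussion; the function names, the plain-text labels, and the example airframe are all hypothetical, and the sets of functional classes, design classes, and physical objects are assumed to be known in advance.

```python
def superclasses(cls, p279):
    """All direct and indirect P279 superclasses of cls (cls itself excluded)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in p279.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def check_rules(p31, p279, functional, design, objects):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    # No functional class is P279 a design class.
    for cls in sorted(functional):
        for parent in p279.get(cls, ()):
            if parent in design:
                problems.append(f"{cls}: functional class is P279 design class {parent}")
    # No physical object is P279 any functional class.
    for obj in sorted(objects):
        for parent in p279.get(obj, ()):
            if parent in functional:
                problems.append(f"{obj}: physical object is P279 functional class {parent}")
    # Only physical objects are P31 a functional class.
    for item in sorted(p31):
        for cls in p31[item]:
            if cls in functional and item not in objects:
                problems.append(f"{item}: not a physical object but P31 functional class {cls}")
    # Each design class is a direct or indirect subclass of a functional class.
    for cls in sorted(design):
        if not superclasses(cls, p279) & functional:
            problems.append(f"{cls}: design class has no functional superclass")
    return problems


# Hypothetical example data (labels instead of QIDs).
functional = {"aircraft", "fighter aircraft"}
design = {"Spitfire", "Spitfire Mk V"}
objects = {"Spitfire airframe X"}
p279 = {
    "fighter aircraft": ["aircraft"],
    "Spitfire": ["fighter aircraft"],
    "Spitfire Mk V": ["Spitfire"],
}
p31 = {"Spitfire airframe X": ["fighter aircraft"]}

print(check_rules(p31, p279, functional, design, objects))

# A design placed as P31 of a functional class is reported as a violation.
bad_p31 = dict(p31, Spitfire=["fighter aircraft"])
print(check_rules(bad_p31, p279, functional, design, objects))
```

The sketch deliberately leaves out the third part (the new property relating objects to their design) and the ordering between "higher" and "lower" design classes, since those points are still under discussion here.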
I think that's a good interpretation of my idea. I'll try to incorporate it in the RfC, but I'm going out tomorrow. As far as part 3 goes, I agree that from a purist standpoint an object merely needs to be an instance of a design, but when that design label is "Project 17", as the Russians call things, it makes it much more readable to have a P31 of the functional class there as the most important property any object has. I think we are spoilt with ship classes that have the form Flower-class corvette (one task I plan is to bulk replace all P31 ship with a P31 of the ship's vessel_class P279, which shows the redundancy). Too many design classes are obscure or have different conventions, like adding manufactured for aircraft, but never fighter or bomber. And for every Sony Walkman we have many CDP-34A. Vicarage (talk) 22:27, 20 September 2024 (UTC)
I fixed up (I think) the last bit of the first part and the first bit of the second part. I'm not sure that the rest exactly follows and I may be able to work it over tomorrow. Peter F. Patel-Schneider (talk) 22:30, 20 September 2024 (UTC)
I like the idea of broadening vessel class (P289), perhaps just calling it "design". My plan of merging all the *_model items might be scuppered by their Wikipedia articles, but there is plenty of work just applying the rules we've agreed to existing designs; for example, many of the 3000 results here are wrong:
SELECT DISTINCT ?item ?itemLabel ?article WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item ?article WHERE {
      { ?item wdt:P279* wd:Q10929058 }
      UNION
      { ?item wdt:P279* wd:Q811701 }
    }
  }
}
Once we've fixed the designs, giving objects a P31 P289 pair should be straightforward. For the X_model/family entries, I'd flatten it out so they were all just P31 design_family/model, assuming people select on the P279 tree. Vicarage (talk) 07:52, 21 September 2024 (UTC)
@Vicarage Yes, I noticed the problem with items that cannot be removed because they have Wikipedia entries. I have a potential solution for that that I'll put in a separate thread, as this one is getting too deep. Peter F. Patel-Schneider (talk) 12:42, 21 September 2024 (UTC)
Indeed there are many cases where design classes are subclasses of the design metaclasses. I noticed this with aircraft model (and it also appears that there are some model aircraft classes intruding). Peter F. Patel-Schneider (talk) 12:50, 21 September 2024 (UTC)


Here is another thing that the RfC should mention:

  • No design class can be a subclass of the new classes.

(Look at ATL-90 Accountant (Q790829) and several other subclasses of aircraft model (Q15056995) for examples of current similar problems.)

There is one serious problem with the approach as I see it - it will have the effect of removing information from existing classes in Wikidata or even eliminating some classes entirely. Consider aircraft model (Q15056995). Either it will be removed or it will be separated from items in Wikidata that are aircraft models, like the Spitfire models. There might be queries that have this item in them and such queries will likely produce little or none of the results that they currently return. It also might not be possible to remove some of these items because they have Wikipedia entries, as product model (Q10929058) does.

The right solution to this problem is to have "defined" classes in Wikidata - classes whose instances are given by formulae. Then it is not necessary to state that some item is an instance of aircraft model (Q15056995) because instances of aircraft model (Q15056995) are defined as items that are both instances of Q"model" and subclasses of airplane (Q197). Wikidata already has a property whose purpose appears to be close to this - Wikidata SPARQL query equivalent (P3921). Could it be used here? See Category:1897 films (Q6274641) for an example of how this might work (although the query times out for me).
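
To make the "defined class" idea concrete, here is a minimal sketch, assuming a tiny in-memory graph rather than Wikidata itself. Membership of the defined class is computed from a formula (P31 of a "model" metaclass and P279* of a root class) instead of being stated item by item. The function names and the plain-text labels are hypothetical stand-ins for real items.

```python
def subclass_closure(cls, p279):
    """Return cls plus every direct or indirect P279 superclass of it."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in p279.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def defined_instances(p31, p279, meta, root):
    """Items that are P31 `meta` and P279* `root` (the defining formula)."""
    return sorted(
        item
        for item, classes in p31.items()
        if meta in classes and root in subclass_closure(item, p279)
    )


# Hypothetical example data (labels instead of QIDs).
p31 = {"Spitfire": ["model"], "Walkman": ["model"], "fighter aircraft": []}
p279 = {
    "Spitfire": ["fighter aircraft"],
    "fighter aircraft": ["airplane"],
    "Walkman": ["audio player"],
}

# "aircraft model" is never stated explicitly; it is derived on demand.
print(defined_instances(p31, p279, "model", "airplane"))
```

With this reading, no item needs an explicit statement linking it to the defined class, which is the point of the suggestion; whether Wikidata SPARQL query equivalent (P3921) could carry such a definition remains the open question above.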

I've been looking at classes like vehicle component (Q60673395) and aviation equipment (Q4055832). They have some of the characteristics of design classes. (There are certainly lots of problems there, at the least.) But they also have some differences. I don't think that it is worthwhile to further broaden the RfC to include them but I might put something together as a separate notion.

Regarding ?item wdt:P279 aircraft model (Q15056995): I saw those and skipped over fixing them. I think they should be renamed to make clear they are P31 of a model_class. A model class should only be for functional classes, not designs. Removing aircraft model (Q15056995) will break queries, but a new query can be trivially written using P31 class P279* aircraft, and it was always arbitrary whether something was filed under it or combat aircraft model (Q124054999). I really don't understand the defined classes idea, but formula-driven classes seem a step too far, especially if it's slow.
One tidy-up we can do is to ensure that all P31 of X_model are P279 of X (or a more specific subclass, as in aircraft_model and land_based_fighter_monoplane). More generally we should define what an X_model contains; currently every one seems to have a different combination of class, instance, metaclass and the like. Vicarage (talk) 21:07, 21 September 2024 (UTC)
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item wdt:P31 wd:Q15142889
      MINUS {wd:Q728 ^wdt:P279* ?item}
      
    }
  }
}

is interesting: many things called weapon_families end up under the military vehicle tree, and you also get gun mounts, functional classes of weapon in local cultures and other clutter. So while you might think a query for P31 weapon_family is useful, it's actually a lot less useful than a P279 of weapon. Vicarage (talk) 21:36, 21 September 2024 (UTC)

@Vicarage Yes, it is likely that different kinds of things end up being P31 to the X_model, X_family, etc., classes because of their sparsity. Think of someone who is creating these things and trying to find where to classify them. There is very little guidance and it is very hard to navigate around and find the right place. Peter F. Patel-Schneider (talk) 11:10, 22 September 2024 (UTC)
Do you think we are ready to advertise the RfC now? I've created a Queries subpage to collate examples of use and problems. Vicarage (talk) 08:27, 26 September 2024 (UTC)
@Vicarage I tidied up the example section and collected the bits that talk about the disposition of the metaclass hierarchies. I think it is all ready to go now. Peter F. Patel-Schneider (talk) 14:00, 26 September 2024 (UTC)
Well, that didn't go well. People seemed to misread my plan, and wanted the very granular muddle I wanted to remove. As I already work round the mess, I don't plan to do any more in this area. Vicarage (talk) 18:48, 2 October 2024 (UTC)
@Vicarage Indeed. I'm not sure what the way forward should be. Peter F. Patel-Schneider (talk) 18:52, 2 October 2024 (UTC)

model series *2 , proposed aircraft

Can you look at model series (Q31836768) and model series (Q811701) and decide if they can be merged? I've fixed a lot of the glaring errors in my SPARQL report above, but there are some subtleties, like proposed aircraft (Q15061018), that I'd appreciate your opinion on. I don't think it should be a *_model thing. Vicarage (talk) 20:35, 21 September 2024 (UTC)

@Vicarage As far as I can tell by translating the German Wikipedia entries, model series (Q811701) is a series of models, such as different models of a car (by year) or different sizes of a small part. model series (Q31836768) is used for actual sequences of physical objects, perhaps the actual nails in a display of the different sizes of nails in a product line. So the former is relevant to our discussions, with the latter much less so. I've updated the description of model series (Q31836768). Peter F. Patel-Schneider (talk) 11:06, 22 September 2024 (UTC)

former entity

User talk:Infovarius#former_entity Andres Ollino (talk) 23:41, 21 September 2024 (UTC)

@Andres Ollino I replied there. Peter F. Patel-Schneider (talk) 11:33, 22 September 2024 (UTC)

Military matters

I've fixed weapons and military vehicles, of which there are very few physical objects, and had a good go at aircraft, though someone has added 3000-odd individual aircraft, which I can't face fixing. I use this query to spot them, as I guessed anything with a location or serial number was an object:

SELECT DISTINCT ?item ?itemLabel ?instanceLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item ?instance WHERE {
      ?item wdt:P31/wdt:P279* wd:Q216916.
      ?item wdt:P31 ?instance. 
      ?item wdt:P31/wdt:P279* wd:Q11436.
      MINUS {?item wdt:P276 ?location}
      MINUS {?item wdt:P131 ?loc1}
      MINUS {?item wdt:P426 ?reg}
      MINUS {?item wdt:P195 ?collection}
      MINUS {?item wdt:P625 ?loc2}
      MINUS {?item wdt:P2598 ?serial}
      MINUS {?item wdt:P576 ?destroyed}
    }
  }
}

Vicarage (talk) 16:09, 22 September 2024 (UTC)

What I do in these cases is write the query, look at the results, and then massage the output into a QuickStatements file to fix the noticed problems. Peter F. Patel-Schneider (talk) 18:45, 22 September 2024 (UTC)
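
The "massage the output into a QuickStatements file" step can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes the query results have already been reduced to (item, wrong class, right class) triples of entity URIs, and it emits QuickStatements V1 lines (tab-separated, with a leading "-" to remove a statement). The function names are hypothetical, and the QIDs in the example are placeholders chosen only to show the format (the Wikidata sandbox item and two classes that appear in the queries above).

```python
def qid(uri):
    """Strip the entity URI prefix that query results carry, leaving e.g. 'Q42'."""
    return uri.rsplit("/", 1)[-1]


def quickstatements(rows, prop="P31"):
    """Build QuickStatements V1 text that swaps one `prop` value for another.

    rows: iterable of (item, old_value, new_value), as URIs or bare QIDs.
    """
    lines = []
    for item_uri, old_uri, new_uri in rows:
        item, old, new = qid(item_uri), qid(old_uri), qid(new_uri)
        lines.append(f"-{item}\t{prop}\t{old}")  # remove the wrong statement
        lines.append(f"{item}\t{prop}\t{new}")   # add the corrected statement
    return "\n".join(lines)


# Placeholder rows for illustration only.
rows = [
    (
        "http://www.wikidata.org/entity/Q4115189",
        "http://www.wikidata.org/entity/Q11436",
        "http://www.wikidata.org/entity/Q216916",
    ),
]
print(quickstatements(rows))
```

Reviewing the generated text before pasting it into QuickStatements keeps a human check between the query results and the edits.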
@Vicarage I turned this around to find actual planes - almost all of them are part of what appears to be a complete list of planes in the US Navy, with only the call number and type of plane given as useful information. If this is prevalent in Wikidata there is a reason why it is bursting at the seams.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?item ?itemLabel ?classLabel WHERE {
  {
    SELECT DISTINCT ?item ?class WHERE {
      ?item wdt:P31/wdt:P279* wd:Q216916.
      ?item wdt:P31 ?class.
      ?item wdt:P31/wdt:P279* wd:Q11436.
    }
  }
  { ?item wdt:P276 ?location }
  UNION { ?item wdt:P131 ?loc1 }
  UNION { ?item wdt:P426 ?reg }
  UNION { ?item wdt:P195 ?collection }
  UNION { ?item wdt:P625 ?loc2 }
  UNION { ?item wdt:P2598 ?serial }
  UNION { ?item wdt:P576 ?destroyed }
  OPTIONAL { ?item rdfs:label ?itemLabel . FILTER ( lang(?itemLabel) = 'en' ) }
  OPTIONAL { ?class rdfs:label ?classLabel . FILTER ( lang(?classLabel) = 'en' ) }
}
Peter F. Patel-Schneider (talk) 18:58, 22 September 2024 (UTC)
Indeed, but the person who did it, Joshbaumgartner, responded to your aircraft engine query and otherwise seems sensible. Odd things people. Vicarage (talk) 19:19, 22 September 2024 (UTC)

Products with only one made

How do you think we should handle manufactured classes when only one example was produced, which is quite likely to be notable and so preserved? We want uniformity of query with multi-item product lines, but understandably there will be a single Wikipedia article for the design. For ships, I've been using vessel class (P289) <no value>; we also have total produced (P1092) and even unique aircraft model (Q118984909) (which I'd not want repeated).

Should a one-off battleship be both instance of (P31) and subclass of (P279) battleship (Q182531), and instance of (P31) ship class (Q559026) (or eventually design_model), and have total produced (P1092) 1? It makes sense in query terms, but reads oddly. However we write the label and description it will look odd in one context or another. Vicarage (talk) 07:07, 23 September 2024 (UTC)

If it is necessary to have both a physical object and the design, then make two items. But I don't know how often this happens. I think that in many cases your no value solution will suffice and in many cases there is no need to have any design mentioned at all (i.e., just don't have a vessel class (P289) claim). Peter F. Patel-Schneider (talk) 11:15, 23 September 2024 (UTC)
A good reason not to have 2 items is that the Wikipedia entry can't be assigned to both. Unique objects are not common in absolute terms, but they need to appear in _model lists, and if they are prototypes for a new sort of object, they are certainly important. Later items that were only made once when their rivals were made in dozens are just footnotes in history. I am inclined to overload them with properties of object and class, but perhaps instead of vessel class (P289) no_value we have instance of (P31) unique object (Q1411738). I did create unique ship (Q974686) some time back, but I've gone off these alternate hierarchy objects, so would use unique object (Q1411738) now. Vicarage (talk) 13:07, 25 September 2024 (UTC)
@Vicarage I prefer the instance method.
Why can there be only one Wikidata item for each Wikipedia page? There are lots of Wikipedia pages that are about more than one thing. Even DBpedia allows multiple entries for a page. Peter F. Patel-Schneider (talk) 14:05, 25 September 2024 (UTC)
I agree it should be allowed, but it isn't, such that there is a software lock that prevents the adding of an existing WP reference to a second WD item. I had a look round the project pages, but there isn't a clear explanation of why. Vicarage (talk) 14:19, 25 September 2024 (UTC)

Selecting a product model, then a functional class worryingly slow using WDQS

SELECT DISTINCT ?item ?itemLabel ?instanceLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item ?instance WHERE {
      {?item wdt:P31/wdt:P279 wd:Q10929058}
      hint:Prior hint:runFirst true.
      {?item wdt:P279* wd:Q1184840}
    }
  }
}

times out, which is worrying, as it would be the way to select product_models if we get rid of the specific ones. It is also strange, as there are only 20000 _models that it needs to do a full P279 tree scan against. I've tried the ^ and WITH optimisations, but they don't help. This is very disappointing given QLever can do it in half a second:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?item ?itemLabel WHERE {
   {?item wdt:P31/wdt:P279 wd:Q10929058}
   {?item wdt:P279* wd:Q1184840}
 OPTIONAL { ?item rdfs:label ?itemLabel . FILTER ( lang(?itemLabel) = 'en' ) }
}

Vicarage (talk) 09:12, 23 September 2024 (UTC)

Excellent, and depressing, example. I'm going to remember it. I also put something in a split page, crediting you. See https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service/WDQS_graph_split#How_much_time_does_the_split_actually_give_us Peter F. Patel-Schneider (talk) 16:51, 23 September 2024 (UTC)

Adding constraints to nudge people into correct behaviour

I've updated country (P17) (see end) to encourage people to use country of origin (P495) rather than country (P17) for weapons, but we could broaden that to all manufactured object designs, as only physical objects have country (P17), an abused term across WD. We could also do location (P276), coordinate location (P625) etc. Vicarage (talk) 09:32, 23 September 2024 (UTC)

Good. Peter F. Patel-Schneider (talk) 11:18, 23 September 2024 (UTC)

development status of X_model

We have prototype aircraft model (Q15126161), prototype (Q207977) and prototype aircraft (Q76379517). It seems sensible to me that the prototype quality should be assigned with instance of (P31) to both X_model classes and X objects, and the specialist entries deprecated. This would fit in better with other P31 properties like abandoned project (Q21514702), and avoid yet another duplicated hierarchy. I'd do the same with aircraft conversion (Q17910379) but not experimental aircraft (Q1384417) (the latter is a common term). Similarly proposed aircraft (Q15061018) is a P279 of proposed entity (Q64728694) and aircraft model (Q15056995) and should be discouraged, as it could easily be misused as a functional class. It seems much better to have P31 concentrate on the development features of the project, and P279 on the functional ones. Vicarage (talk) 08:52, 25 September 2024 (UTC)

@Vicarage There certainly should be some common way to do things like this. I'm not sure exactly what the best way to do this should be, though. Peter F. Patel-Schneider (talk) 12:52, 25 September 2024 (UTC)
We also have ship project (Q16214696). I've restricted its use to ship projects that are ongoing, not abandoned. Vicarage (talk) 08:31, 26 September 2024 (UTC)
@Vicarage I don't know whether restricting to ongoing projects is the best approach. There are lots of instances of generalizations of this class that are completed projects. Peter F. Patel-Schneider (talk) 13:14, 26 September 2024 (UTC)
The term "ship project" is meaningless in itself. What we want is a set of generic project management terms that can be applied to any manufactured project's development lifecycle. I guess we could use significant event, but from the outside it would be so hard to record them, so it might be easier to create new terms such as "pre-production", "under development", or "ongoing", to flag that some key features of a manufactured project, like its service entry or total produced, are subject to change. But whatever we choose, using P31 rather than P279 seems preferable, to split the development of something from what is developed, and we don't want them to be parochial with ship_ prefixes. Vicarage (talk) 14:10, 26 September 2024 (UTC)
@Vicarage Agreed. "ship project" is a subclass of "engineering project", which makes it something more general, I guess, and not just the development of a ship. I'm not sure what the best way forward would be here. Peter F. Patel-Schneider (talk) 14:44, 26 September 2024 (UTC)

QLever database age

Hi Peter, I'm trying to use QLever for some queries that time out in WDQS, and a lot of the results are out of date. Since you wrote a report on QLever back in January, I thought you might know:

  1. Is there a way to see the date that the Wikidata database was last updated on QLever?
  2. Is there a way to trigger a full or partial database update?
  3. Do you know if other external query tools might have a more up to date database?

Thanks! Swpb (talk) 17:07, 2 October 2024 (UTC)

@Swpb Yes, click on "Index Information" to see something like: Full Wikidata dump from https://dumps.wikimedia.org/wikidatawiki/entities (latest-all.ttl.bz2 and latest-lexemes.ttl.bz2, version 26.09.2024) + English Wikipeda abstracts (version 27.09.2024, available via schema:description).
There is no way for us to update this, as QLever takes the most recent Wikidata RDF dump and loads it. This happens weekly so QLever is between a few days and almost two weeks behind. If Wikidata produced dumps more often QLever could load them as the load process takes a bit over 1/2 day.
Virtuoso is generally much further behind. I don't know about MilleniumDB.
If the WDQS made an RDF change feed available QLever might be able to set up a nearly-real-time service. Peter F. Patel-Schneider (talk) 17:58, 2 October 2024 (UTC)
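
For anyone wanting to check staleness mechanically, here is a minimal sketch that reads the dump date out of an "Index Information" string like the one quoted above and reports how many days old it is. It assumes the text keeps the "version DD.MM.YYYY" form and that the first such date is the Wikidata dump; both are assumptions based only on the quoted example, and the function names are hypothetical.

```python
import re
from datetime import date


def dump_date(index_info):
    """Return the first 'version DD.MM.YYYY' date found in the index text."""
    match = re.search(r"version (\d{2})\.(\d{2})\.(\d{4})", index_info)
    if match is None:
        raise ValueError("no 'version DD.MM.YYYY' date found")
    day, month, year = map(int, match.groups())
    return date(year, month, day)


def staleness_days(index_info, today):
    """Number of days between the dump date and `today`."""
    return (today - dump_date(index_info)).days


info = (
    "Full Wikidata dump from https://dumps.wikimedia.org/wikidatawiki/entities "
    "(latest-all.ttl.bz2 and latest-lexemes.ttl.bz2, version 26.09.2024) + "
    "English Wikipeda abstracts (version 27.09.2024, available via schema:description)"
)
print(dump_date(info))
print(staleness_days(info, date(2024, 10, 2)))
```

The date of this reply is passed in explicitly so the result is reproducible; in practice date.today() would be used instead.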
Thanks, great info! Swpb (talk) 18:30, 2 October 2024 (UTC)
@Swpb The time for QLever to load Wikidata on high-end, but consumer grade, hardware is actually only about 4 hours. This appears to have remained roughly constant over a few years - as Wikidata grows CPUs and SSDs have increased in speed. Peter F. Patel-Schneider (talk) 11:16, 10 November 2024 (UTC)
Interesting...that may come in handy. Thanks! Swpb (talk) 23:06, 13 November 2024 (UTC)

Benchmarking

Hi Peter, thanks for starting Wikidata:Scaling Wikidata/Benchmarking. You might be interested in related work at a recent hackathon and a follow-up one on Nov 15-16. Daniel Mietchen (talk) 03:25, 10 November 2024 (UTC)

@Daniel Mietchen Thanks for the pointers. I would like to include whatever sets of queries were tested in my effort. Is that possible? Or are they all set up for the split version of the WDQS, in which case they are much less interesting to me because I'm not planning on working with a split Wikidata?
I have a test harness that runs queries against all the known Wikidata SPARQL endpoints - four currently. I've just (yesterday) run a first set of queries through the harness, with some interesting results that I plan on writing up in a day or two.
I'm interested in your next hackathon. Do you mind if I register? Peter F. Patel-Schneider (talk) 11:14, 10 November 2024 (UTC)
There are three main sets of queries that we are working with: https://github.com/JervenBolleman/wikibase-sparql-examples as well as the JSON files in https://github.com/WolfgangFahl/snapquery/tree/main/snapquery/samples and the .sparql files in https://github.com/WDscholia/scholia/tree/master/scholia/app/templates. Note that the latter are usually parametrized with one or more target variables. Yes, please feel welcome to join the hackathon! --Daniel Mietchen (talk) 18:32, 11 November 2024 (UTC)
@Daniel Mietchen Thanks. A lot of the Scholia queries include the WDQS label service, which makes them somewhat less than ideal. I'll probably work on including them as well, though. Peter F. Patel-Schneider (talk) 19:53, 11 November 2024 (UTC)
Many of the Wikidata queries make use of Blazegraph-specific features. There are efforts to rewrite them into standard SPARQL; see the first of the three links above. --Daniel Mietchen (talk) 12:20, 15 November 2024 (UTC)
@Daniel Mietchen I forgot to register for the hackathon until now. Can you send me participation information? Thanks. Peter F. Patel-Schneider (talk) 12:23, 15 November 2024 (UTC)
Sure, will send in a moment. We will also continue work on our local QLever endpoint(s) that you are welcome to include in your test runs (sample query). --Daniel Mietchen (talk) 12:30, 15 November 2024 (UTC)