Wikidata talk:WikiProject Names


Norwegian citizen Last names


I have had an excellent query made by @Tagishsimon giving me a list of all last names of Norwegian citizens. I now have an Excel spreadsheet containing (only) last names, about 12,500 different entries, for Norwegian political prisoners during WW2. How can I compare these last names and eventually have new last names added to Wikidata? Breg Pmt (talk) 20:01, 24 October 2023 (UTC)

Depends on your needs and knowledge. OpenRefine certainly seems like a good start for this project. --Emu (talk) 07:42, 25 October 2023 (UTC)
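
For a first pass before (or alongside) OpenRefine, the comparison can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the spreadsheet column and the query results have each been exported to a plain list of strings, and the function names and sample names below are hypothetical. It matches on exact spelling after trimming, Unicode normalization and case folding, so spelling variants would still need manual review.

```python
import unicodedata


def normalize(name: str) -> str:
    """Trim whitespace, compose Unicode (NFC) and casefold for comparison."""
    return unicodedata.normalize("NFC", name.strip()).casefold()


def missing_names(spreadsheet_names, wikidata_names):
    """Return spreadsheet names that have no match among the Wikidata labels."""
    known = {normalize(n) for n in wikidata_names}
    seen = set()
    missing = []
    for name in spreadsheet_names:
        key = normalize(name)
        # Skip empty cells, names already in Wikidata, and repeated rows.
        if key and key not in known and key not in seen:
            seen.add(key)
            missing.append(name.strip())
    return sorted(missing)


# Hypothetical sample data standing in for the two exports.
spreadsheet = ["Hansen", "Ødegård ", "hansen", "Bjørnstad"]
wikidata = ["Hansen", "Olsen", "Ødegård"]
print(missing_names(spreadsheet, wikidata))
```

The resulting list is only a set of candidates; each one would still need checking before a new surname item is created.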

Should Japanese names in name in native language (P1559) be specified with the language code "ja-hani" or simply "ja"?


Looking through the results of this query, Japanese names in name in native language (P1559) are specified with the language code "ja-hani" (Japanese (Kanji script)). To native Japanese speakers it sounds rather odd to deliberately specify "Japanese (Kanji script)" instead of "日本語 (ja)". Although it is accurate to apply "ja-hani" to Japanese names that are written only in Kanji, using these separate language codes could cause query results to differ between "ja" and "ja-hani" names, which could be detrimental to information retrieval. Therefore, I'd like to ask that all of them be changed to "ja" as the standard language code for Japanese names. Doraemonplus (talk) 11:30, 6 November 2023 (UTC)

I think ja is the right code to use. Script, country, etc. subtags are only intended to be used when they're actually necessary, not just because you can. Name items should have writing system (P282) with the writing system and name in kana (P1814) as a qualifier (for names in kanji), so I don't see why it would be necessary to use ja-hani. - Nikki (talk) 21:41, 29 December 2023 (UTC)
A bit late but I agree. --Data Consolidation Officer (talk) 16:02, 13 April 2024 (UTC)
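
The retrieval concern raised above can be illustrated with a small Python sketch. It assumes monolingual values are available as (text, language code) pairs; the helper name and the sample values are hypothetical. An exact match on "ja" misses values tagged "ja-hani", while comparing only the primary subtag finds both.

```python
def primary_language(tag: str) -> str:
    """Return the primary language subtag, e.g. 'ja' for 'ja-hani'."""
    return tag.split("-", 1)[0].lower()


# Hypothetical monolingual values as (text, language code) pairs.
values = [("山田太郎", "ja-hani"), ("鈴木一郎", "ja"), ("Anna", "de")]

# Exact match on the code: values tagged "ja-hani" are missed.
exact = [text for text, lang in values if lang == "ja"]

# Match on the primary subtag: both Japanese values are found.
by_primary = [text for text, lang in values if primary_language(lang) == "ja"]

print(exact)
print(by_primary)
```

Anyone querying has to remember the second form for as long as both codes are in use, which is one practical argument for standardizing on "ja".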

Reducing redundancy


Items for names take up a lot of space in Wikidata. For example, there are just under 600,000 items for surnames. That is only around 0.5% of all items, but the labels on these items which are the same as the English label account for 10% of all labels in Wikidata. The aliases which are the same as the English label account for a third of all aliases.

The size of Wikidata is causing problems, most notably for the query service, which is likely to stop working at some point in the next few years (see Wikidata:SPARQL query service/WDQS backend update) if we continue the way we are.

The developers are working on adding support for using the language code "mul" on labels (phab:T285156), designed to be used for things like this instead of copying the same label to hundreds of languages (and I hope they will also work on adding some simple dynamically generated descriptions after that - phab:T303677).

I think we can reduce the amount of redundancy on items for names before then though:

  • We could remove labels for country variants of a language, if they're the same as the first fallback language, because the fallback language is still the same language/script. This would remove at least 5 million redundant labels.
  • We could do the same for descriptions, for the same reason. This would remove at least 5 million redundant descriptions.
  • We could remove aliases which match another label and are in the wrong script, because they are not needed for searching and are entered under the wrong language anyway. It's hard to calculate using a query, but I think this would remove at least 50 million redundant aliases.
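
The first proposal could be sketched roughly as follows in Python. This is a minimal sketch, not the actual bot: the fallback map is a small illustrative subset written out by hand (the real fallback chains are defined by MediaWiki), and the function name and sample labels are hypothetical.

```python
# Illustrative subset only (an assumption): variant code -> first fallback.
FALLBACK = {
    "en-gb": "en",
    "en-ca": "en",
    "de-at": "de",
    "de-ch": "de",
    "pt-br": "pt",
}


def redundant_labels(labels):
    """Return the variant codes whose label equals the first fallback's label."""
    redundant = []
    for code, text in labels.items():
        parent = FALLBACK.get(code)
        # Redundant only if the fallback language has a label and it is identical.
        if parent is not None and labels.get(parent) == text:
            redundant.append(code)
    return sorted(redundant)


# Hypothetical labels of a surname item.
labels = {
    "en": "Smith",
    "en-gb": "Smith",
    "en-ca": "Smith",
    "de": "Smith",
    "de-at": "Smith",
    "pt-br": "Smith",  # kept: there is no "pt" label to fall back to
}
print(redundant_labels(labels))
```

The same check would apply to descriptions. The alias proposal additionally needs script detection, which is not covered here.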

If people agree, I should be able to make a bot to do this.

- Nikki (talk) 21:24, 29 December 2023 (UTC)

I usually never think about the size of Wikidata, but you're right that at a scale this big it has to be considered. I am even okay with letting a property function as a description, such as P31 being name (which would automatically give the description "name" unless it's overwritten by something else), which would also work for removing many languages at once. Also, if something has one name that applies to many languages, could there be ways of combining them?
Anyway, I support this effort and see it as vital to the sustainability of open data. Egezort (talk) 22:32, 29 December 2023 (UTC)

Double given name


I met double given name (Q1243157) for the first time today.

The context was the artist (William) Francis Marshall (Q21459938), with given name (P735) = William Francis (Q104831048) coded by @Arroser: in 2021.

This seems to me quite wrong. IMO, in English at least, William Francis (Q104831048) is just a combination of two first names, not any kind of joint name; and (at least in English) even if somebody is habitually addressed by two first names, IMO (apart from a very few exceptions) those names would not be regarded as a joint or compound first name unless they were hyphenated.

Looking at the query, it seems there are quite a lot of these.

I see William Francis (Q104831048) was created by User:Moebeus in 2021 and has a Commons category (since 2016). Even so, I believe it should be deleted as not a real thing; along with almost all other English examples of this.

Do others agree? Jheald (talk) 12:47, 2 January 2024 (UTC)

I no longer edit names (except for adding missing ones when I need them). I appreciate the ping; if you want to delete any of the ones I've created, that's okay with me. Moebeus (talk) 15:04, 2 January 2024 (UTC)
@Jheald I am not sure the double given name item is necessary, but joint names which are conventionally spelled with a space in English and not a hyphen are relatively common. Punjabi first names often consist of a first part followed by an honorific, gendered, or tribal suffix; for example, in the name Satwant Kaur (Q113570497) the Kaur part is what makes it a female name. I don't know what is included as "English names" here, as most names used by English speakers are derived from other languages (Francis from Latin, for example), but there are a number of very common Hebrew-origin names spelled with a space in English as well, such as Mary Anne and Anne Marie. I have no idea about William Francis specifically, but spaces on their own should not be treated as a reason for considering a single name to be two names. عُثمان (talk) 20:01, 2 January 2024 (UTC)

Ivan vs. Iwan


There is an article named Iwan which has a redirect from Ivan. The article is linked to Iwan (Q25342533) and the redirect to Ivan (Q830350). Now there is a complaint that it is hard to connect the German article Iwan with e.g. the English article Ivan. I proposed merging these two Wikidata items, but was told that this would be disliked here, along with a link to Wikidata:WikiProject Names. I can't see any reason there not to merge these two items, so I'm asking here whether this merge would be a problem. It's the same name: Iwan is the correct German transcription of the Russian name Иван, just as Ivan is the English transcription of it. Senechthon (talk) 22:54, 25 March 2024 (UTC)

There have been numerous discussions about this issue, even with the "Иван" example: Wikidata_talk:WikiProject_Names/Archive/1#Cyrillic_-_values_for_личное_имя_(P735), Wikidata_talk:WikiProject_Names/Archive/1#Constantin_/_Konstantin_/_Constantine_merger_at_Q7111053... The status quo is that even a difference in an accent mark warrants a new item. --Infovarius (talk) 21:31, 27 March 2024 (UTC)

Which items should be added as P735?


For example, consider a Russian person like d:Q2587276. Should it only have a name item with Cyrillic script like d:Q2253934 as given name (P735)? Or should it also have an item with Latin script like Q18130730 for the name's transcription as given name (P735)? D3rT!m (talk) 15:21, 1 April 2024 (UTC)

@D3rT!m: yes, there should be only one item, the one in the original language (as transcriptions can vary and there can be several of them). I corrected the item you linked. Cheers, VIGNERON (talk) 05:33, 9 April 2024 (UTC)
Okay, thank you! D3rT!m (talk) 09:27, 9 April 2024 (UTC)

@D3rT!m: Not an answer, but a remark: This question can be seen in the broader context of how names should be represented. I (still) have the opinion that a “clean” modelling of names would require reifying them, i.e. having one item per name (with properties like given name, surname, included honorifics etc. being properties of that item, transcriptions would be properties of the given name, surname etc.) referred to from a namebearer by something like name (P2561). But I feel that such a modelling wouldn’t be welcome (because it would be perceived as too complicated). (I also think that there are way too many items about humans – they make SPARQL queries time out too easily for them to be able to answer many interesting questions –, but that’s another can of worms.) --Data Consolidation Officer (talk) 16:00, 13 April 2024 (UTC)

@Data Consolidation Officer: I missed your answer, what do you propose exactly? (I'm not sure what your proposal would improve/solve.) Names are very hard and the current solution is not perfect, but I guess it's the best trade-off (and if anything, I would split identical names into more items, like Berger (Q1260304), which presently conflates two very different names in French and German). Cheers, VIGNERON (talk) 12:06, 21 April 2024 (UTC)
@VIGNERON: For context, last year I asked here about whether name in native language (P1559) should include honorifics. That’s one of the many questions that arise when representing names as simple (monolingual) strings, because names are complex. That’s why I think that the “clean” way of modelling names would be having one item per name, with statements encoding the properties of the name. For example, there would be an item for the name “Charles III, King of the United Kingdom” with properties describing that it consists of an individual name (Charles (Q2958359)), a generation number (not sure it’s called that in English; the III or 3, anyway), and an honorific (King of the United Kingdom (Q120643751)). Or there would be an item for the name “George Walker Bush” with properties describing that it consists of two individual names (George (Q15921732) and Walker (Q16622960), with series ordinal (P1545) qualifier), the second of which is usually abbreviated, and a family name (Bush (Q1484464)). Currently, such properties reside on the namebearer’s item (Louis XIV of France (Q7742), George W. Bush (Q207) etc.), presumably because most people only have one name; having one item per name would thus solve problems when describing someone whose name has changed. Additionally, it would make aspects of the names machine-readable, e.g. someone who wants to generate name strings with honorifics (where present) could do that automatically, and someone who wants to generate name strings without them could do so, too. (This could be useful for label generation in different languages with different naming conventions. “Queen Heonae” is a bad German or Spanish label for a Korean queen.) On the downside, this would result in every human (Q5) item having (at least) one corresponding name item, as well as added complexity. That’s probably what you mean by the current solution being “the best trade-off”. --Data Consolidation Officer (talk) 19:11, 27 April 2024 (UTC)
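
To make the idea of one structured record per name more concrete, here is a minimal Python sketch. It is purely illustrative and not an existing or proposed Wikidata data model: the class, its field names and the rendering rule are all assumptions. It only shows how name strings with or without honorifics could be generated once the parts are stored separately.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Hypothetical reified name: parts are stored separately, not as one string."""
    given: list          # individual names, in series order
    family: str = ""
    ordinal: str = ""    # e.g. "III"
    honorific: str = ""

    def render(self, with_honorific: bool = True) -> str:
        """Join the parts into a display string, optionally adding the honorific."""
        parts = list(self.given)
        if self.family:
            parts.append(self.family)
        if self.ordinal:
            parts.append(self.ordinal)
        text = " ".join(parts)
        if with_honorific and self.honorific:
            text = f"{text}, {self.honorific}"
        return text


charles = Name(given=["Charles"], ordinal="III",
               honorific="King of the United Kingdom")
bush = Name(given=["George", "Walker"], family="Bush")

print(charles.render())
print(charles.render(with_honorific=False))
print(bush.render())
```

A person whose name changed would simply be linked to several such records. Language-specific conventions (abbreviating the second given name, placing or translating honorifics) would need a separate rendering rule per language, which this sketch leaves out.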

Missing first names


Hi, I checked the gender of mayors against the gender of their first names. It would be helpful if you could add missing names from de:Benutzer:Herzi Pinki/Von Frauenern und Männerinnen, e.g. all the names marked as undefined in Wikidata. I suspect names like Róbertné to be the Hungarian female version of Róbert etc., but I could not get a source for that assumption. Best --Herzi Pinki (talk) 15:38, 16 June 2024 (UTC)

Kovács Róbertné means the wife of Kovács Róbert: in Hungary, it’s common that the wife takes the full name of her husband when they get married, Kis Júlia becoming Kovács Róbertné. However, her given name remains Júlia (Q19851095), so I don’t think the statement given name (P735) = Róbertné would be accurate. Unfortunately, it’s impossible to determine her given name only from the official name (this is also a problem in real-life situations, one doesn’t know how to address these -nés). So while you can assume sex or gender (P21) = female (Q6581072) (which, of course, is true only if the person is heterosexual), I don’t think you should add any given name (P735) statements.
By the way, Hungarian laws don’t allow gender-neutral given names, so when your subpage says that the mayor of Abádszalók (Q336820), Gyula Balogh, is neutral, it’s in fact clearly a “he”. --Tacsipacsi (talk) 20:30, 16 June 2024 (UTC)
Thanks, I handle Gyula as neutral, as WD is undecided: Gyula (Q9317185) vs. Gyula (Q124001429). Maybe this is a modelling flaw? Or it can be used for both genders outside of Hungary for persons that never want to enter Hungary? I just said that the given name of Gyula Balogh is neutral, not him as a person. --Herzi Pinki (talk) 07:23, 17 June 2024 (UTC)
I think the female Gyula (Q124001429) is a result of a data error. It’s used only on Gyula Kajari (Q123423697), a person born in Ősi (Q383063), Hungary (so it’s not about inside/outside Hungary), and while the item states that Kajári was a female, and has a reference for that, the reference doesn’t seem to highlight in any way why it makes this surprising statement. While (according to hu:Magyarországon anyakönyvezhető utónevek listája) the rule dictating the strict distinction between male and female names exists only since 1965 (Kajári was born in 1926), Gyula is a pretty well-known male name (originating from the Old Hungarian title of Gyula (Q933316)), so I find it unlikely that a girl is given this name (again, a nonbinary person may be a possible explanation, but I’d expect that to be explicitly stated). Maybe just some librarian pushed the wrong button.
As for “persons that never want to enter Hungary”: I’m pretty sure nothing prohibits a male Boris (Q666112) (in Hungary, only Boris (Q61356723) is allowed) or a female Robin (Q1158139) (in Hungary, it’s classified as a male name) from entering the country: such rules should only apply to citizens. --Tacsipacsi (talk) 22:02, 17 June 2024 (UTC)
gives evidence that this Gyula Kajari (Q123423697) is male. Do you have any idea how to fix the authority data and get rid of Gyula (Q124001429) here in WD? Best --Herzi Pinki (talk) 11:11, 19 June 2024 (UTC)

Should fictional names have their own items?


For example, Cogita (Q116790893). No real person has this name, and presumably none ever will, so there will presumably only ever be one fictional character with it. That doesn't justify a separate item in my opinion. Has this ever come up before? I know this is far from the only one. --Xezbeth (talk) 14:14, 29 June 2024 (UTC)

@Xezbeth: I wonder about that too. The name "Sheev" has only ever been used by the character Palpatine in Star Wars, same problem there. StarTrekker (talk) 15:12, 26 August 2024 (UTC)
I believe that we should not create items for any names which have <2 bearers, whether fictional or not. At the same time I would allow a specific class "fictional name". --Infovarius (talk) 14:14, 27 August 2024 (UTC)