Maniphest T201808

Unify separator between language and lexical category
Closed, Resolved, Public
Assigned To

Authored By

	KaMan
	Aug 13 2018, 7:58 AM

Description

Problem:
We are not consistent in how we separate language and lexical category.

Examples:
In the Lexeme tooltip, there is a comma separator between language and lexical category.

In the search results, there is no separator between language and lexical category.

There is also no separator in the entity selector.

Acceptance criteria:

We consistently separate language and lexical category with a localizable comma in all 3 places.

Details

	Subject	Repo	Branch	Lines +/-
	One comma was added to wikibaselexeme-description and respective test files, to separate language and lexical category.	mediawiki/extensions/WikibaseLexeme	master	+19 -19

Related Objects

Mentioned In: rEWLE14a41d1e1add: One comma was added to wikibaselexeme-description and respective test files, to…
rEWLE0513e0eeefdd: One comma was added to wikibaselexeme-description and respective test files, to…
rEWLE1b611a50ea3f: One comma was added to wikibaselexeme-description to separate language and…

Event Timeline

KaMan created this task. Aug 13 2018, 7:58 AM

Lea_Lacroix_WMDE removed Lea_Lacroix_WMDE as the assignee of this task. Aug 13 2018, 8:02 AM

Lea_Lacroix_WMDE added subscribers: Lydia_Pintscher, Lea_Lacroix_WMDE.

The messages are https://www.wikidata.org/wiki/MediaWiki:Wikibaselexeme-presentation-lexeme-secondary-label and https://www.wikidata.org/wiki/MediaWiki:Wikibaselexeme-description

Lydia_Pintscher moved this task from incoming to features/bugs for later releases on the Wikidata Lexicographical data board. Sep 1 2018, 6:58 PM

Lydia_Pintscher triaged this task as Medium priority. Sep 2 2018, 3:57 PM

Lydia_Pintscher moved this task from features/bugs for later releases to features/bugs for next release (Lexeme page) on the Wikidata Lexicographical data board.

Lydia_Pintscher updated the task description. Sep 2 2018, 4:09 PM

Lydia_Pintscher merged a task: T195443: Separate lang and category with a comma in Lexeme "description".

Lydia_Pintscher added a project: WMDE-Design.

Lydia_Pintscher added subscribers: VIGNERON, Aklapper, WMDE-leszek.

Restricted Application added a project: Design. Sep 2 2018, 4:10 PM

Lydia_Pintscher updated the task description. Sep 2 2018, 4:56 PM

Would that go through a function of some sort that renders the Word/Language combination? Or would the separator be defined in CSS?
In any case, I have few preferences except that consistency would be great.

Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board. Jan 4 2019, 11:33 AM

Lydia_Pintscher added a project: Wikidata-Campsite.

Editors: preferences please :) Then we can pick this up.

Lydia_Pintscher moved this task from Incoming to Needs Work on the Wikidata-Campsite board. Jan 6 2019, 6:01 PM

I would prefer a comma separator, but I work only with Latin scripts, so I don't know if it works for all languages (Chinese, Hebrew, Tamil, Japanese, etc.).

@Amire80 maybe you can give some input?

@Lydia_Pintscher , thanks a lot for asking! :)

The first example with the comma certainly looks better than the other ones with no separator.

As far as I can see, and as @Bugreporter has already written above, the message with the comma is implemented using the optional message wikibaselexeme-presentation-lexeme-secondary-label. Its default value is, as expected, "$1, $2". It's a good default. I could imagine some other clever and more generic schemes, for example using a | as a separator, but that's not really needed, and a comma is good enough. If anybody thinks that a comma is not good enough, I'd be very interested in seeing an example. Languages that need something other than a comma and a space can easily customize it by editing the translation at translatewiki.

The presentation without the comma is implemented using the message wikibaselexeme-description. Its value is "$1 $2", and it's marked as "ignored" in translatewiki, which means that it's a message for internal technical use and cannot be translated. This designation is probably incorrect.
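To make the difference between the two messages concrete, here is a minimal sketch in Python of how the positional placeholders get filled in. The render helper and the MESSAGES dictionary are hypothetical stand-ins for illustration and are not MediaWiki's message API. Only the two message keys and their values come from this thread, and the assumption that $1 is the language and $2 is the lexical category is mine.

```python
import re

# Message values as described in this thread (before the fix).
MESSAGES = {
    "wikibaselexeme-presentation-lexeme-secondary-label": "$1, $2",
    "wikibaselexeme-description": "$1 $2",
}


def render(key, *params):
    """Replace $1, $2, ... in the message with the given parameters."""
    template = MESSAGES[key]

    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave a placeholder untouched if no parameter was supplied for it.
        return params[index] if 0 <= index < len(params) else match.group(0)

    return re.sub(r"\$(\d+)", substitute, template)


# Assumption: $1 is the language, $2 is the lexical category.
print(render("wikibaselexeme-presentation-lexeme-secondary-label", "Polish", "noun"))
# prints "Polish, noun"
print(render("wikibaselexeme-description", "Polish", "noun"))
# prints "Polish noun"
```

The same two inputs produce two different strings only because the templates differ, which is why using one message everywhere (or aligning the two values) is enough to make the three places consistent.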

My immediate intuition is to do the following:

To use wikibaselexeme-presentation-lexeme-secondary-label consistently in every place where showing the lexeme and the part of speech is needed.
To examine the usage of wikibaselexeme-description. Perhaps it can be completely removed and replaced with wikibaselexeme-presentation-lexeme-secondary-label. If it's needed, then perhaps it can be changed to $1, $2, but there should be proper justification for having an identical message. If this is done, then this message should be defined as optional and not as ignored in the translatewiki configuration repo.

Having identical messages is not necessarily bad, as MediaWiki's Localisation guidelines say. I can think of at least one good justification for having two messages: one can be presented as plain text, which would be good for tooltips, and another one can be parsed with wiki syntax, for showing in contexts where HTML is available. (The messages don't have markup at the moment, but some languages may want it, for example for fixing RTL issues.) There can be other justifications. But this is really a question that people who are familiar with the code should answer.

I'll be happy to give more L10n advice if needed. I'd be happy to go deeper into lexicographical and dictionary design advice, but I don't think that it's needed here, at least for now.

Actually there is already a separator: a space.

In the ideal situation, the separator should be localizable. For example Chinese and Japanese might prefer 、(or · ) rather than a Latin comma. But this is a very technical usage (i.e. not dictated by natural language grammar) so I can't speak for the preferences of other users.
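As a rough illustration of what a localizable separator could look like, the sketch below picks the message template by interface language and falls back to English. The TRANSLATIONS table, the secondary_label function, and the single-step fallback are all assumptions made for this example: only the English value "$1, $2" is taken from the thread, the zh and ja entries merely illustrate the 、 suggestion, and real fallback chains are more involved than this.

```python
import re

# Hypothetical per-language templates. Only "en" reflects the thread;
# the "zh" and "ja" entries are illustrative, not actual translations.
TRANSLATIONS = {
    "en": "$1, $2",
    "zh": "$1、$2",
    "ja": "$1、$2",
}

FALLBACK_LANGUAGE = "en"


def secondary_label(ui_language, language_name, category_name):
    """Join language and lexical category using the localized template."""
    template = TRANSLATIONS.get(ui_language, TRANSLATIONS[FALLBACK_LANGUAGE])
    params = {"$1": language_name, "$2": category_name}
    # Single pass, so a "$2" inside a parameter is never substituted again.
    return re.sub(r"\$[12]", lambda m: params[m.group(0)], template)


print(secondary_label("zh", "波兰语", "名词"))           # prints "波兰语、名词"
print(secondary_label("de", "Polnisch", "Substantiv"))  # prints "Polnisch, Substantiv"
```

Because the separator lives inside the template rather than in the calling code, a language community can change it without any code change.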

In the short term, both comma and space would be fine. I think speakers of non-Latin languages can cope with a Latin comma as separator, much like we have put up permanently with the utterly foreign and non-localizable Latin colon for namespaces.

It's already localizable, as my comment above says. But I'm not sure why there are two messages and not one.

In English the space separator can work well if you treat it as a phrase with an adjective (language name) that describes a part of speech, e.g., "Polish noun". But it won't work in many other languages. For example, in Russian, names of parts of speech have gender, and the language-name adjective then has to be in the same gender. While it's not impossible to generate correct phrases of this kind, it's not trivial either, and it probably cannot be done with just simple messages. So it's probably better not to assume that it's a phrase. "Noun, Polish" looks more generic and is probably more easily localizable.
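The agreement problem can be sketched in a few lines of Python. The Russian word forms are ordinary dictionary forms chosen for illustration. The three functions and both lookup tables are hypothetical, and the example assumes that only one label per language (here the masculine form) is available to the message.

```python
# The adjective "Polish" in the three Russian genders.
POLISH_BY_GENDER = {
    "masculine": "польский",
    "feminine": "польская",
    "neuter": "польское",
}

# Part-of-speech names with their grammatical gender.
CATEGORY_GENDER = {
    "глагол": "masculine",        # verb
    "частица": "feminine",        # particle
    "существительное": "neuter",  # noun
}

# Assumption: a single stored language label, in the masculine form.
LANGUAGE_LABEL = "польский"


def as_naive_phrase(category):
    """Space-separated message: reuses the one label, ignoring gender."""
    return f"{LANGUAGE_LABEL} {category}"


def as_agreeing_phrase(category):
    """Grammatical phrase: needs gender data a plain message does not have."""
    return f"{POLISH_BY_GENDER[CATEGORY_GENDER[category]]} {category}"


def as_list(category):
    """Comma-separated message: two independent items, no agreement needed."""
    return f"{LANGUAGE_LABEL}, {category}"


print(as_naive_phrase("существительное"))     # польский существительное (ungrammatical)
print(as_agreeing_phrase("существительное"))  # польское существительное
print(as_list("существительное"))             # польский, существительное
```

The comma form avoids the extra gender tables entirely, which is the practical argument for treating the two values as a list rather than as a phrase.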

Alright. Then let's go with a localizable comma. Thanks for the input everyone!

• Greta_Doci_WMDE claimed this task. Jan 31 2019, 10:43 AM

Lydia_Pintscher removed • Greta_Doci_WMDE as the assignee of this task. Jan 31 2019, 10:43 AM

Lydia_Pintscher assigned this task to • Greta_Doci_WMDE.

Lydia_Pintscher updated the task description.

Lydia_Pintscher moved this task from Needs Work to Ready to estimate on the Wikidata-Campsite board.

Lydia_Pintscher added a subscriber: • Greta_Doci_WMDE.

• Greta_Doci_WMDE moved this task from Ready to estimate to Wikidata-Campsite-Iteration-∞ (On Hold) on the Wikidata-Campsite board. Jan 31 2019, 10:49 AM

• Greta_Doci_WMDE edited projects, added Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)); removed Wikidata-Campsite.

• Greta_Doci_WMDE moved this task from To Do (prioritised from top to bottom) to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Change 487346 had a related patch set uploaded (by Greta WMDE; owner: Greta Doçi):
[mediawiki/extensions/WikibaseLexeme@master] One comma was added to wikibaselexeme-description to separate language and lexical category

https://gerrit.wikimedia.org/r/487346

gerritbot added a project: Patch-For-Review. Jan 31 2019, 11:27 AM

• Greta_Doci_WMDE moved this task from Doing to Peer Review on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Jan 31 2019, 11:27 AM

Diffusion mentioned this in rEWLE1b611a50ea3f: One comma was added to wikibaselexeme-description to separate language and…. Jan 31 2019, 11:28 AM

Ladsgroup moved this task from Peer Review to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Jan 31 2019, 11:41 AM

Diffusion mentioned this in rEWLE0513e0eeefdd: One comma was added to wikibaselexeme-description and respective test files, to…. Jan 31 2019, 12:23 PM

Ladsgroup moved this task from Doing to Test (Verification) on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Jan 31 2019, 12:53 PM

Change 487346 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] One comma was added to wikibaselexeme-description and respective test files, to separate language and lexical category.

https://gerrit.wikimedia.org/r/487346

• Greta_Doci_WMDE closed this task as Resolved. Jan 31 2019, 1:24 PM

ReleaseTaggerBot added a project: MW-1.33-notes (1.33.0-wmf.16; 2019-02-05). Jan 31 2019, 2:00 PM

Addshore moved this task from Test (Verification) to Done on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Feb 4 2019, 12:13 PM

jijiki mentioned this in rEWLE14a41d1e1add: One comma was added to wikibaselexeme-description and respective test files, to…. Mar 18 2019, 4:35 PM

Aklapper removed a subscriber: Wikidata Lexicographical data. May 16 2023, 10:22 AM

Maintenance_bot removed a project: Patch-For-Review. May 16 2023, 10:36 AM