Wikidata:Property proposal/form decomposition

form decomposition

Originally proposed at Wikidata:Property proposal/Lexemes

Done: form decomposition (P12527) (Talk and documentation)

Description	form decomposition
Data type	Form
Domain	form
Example 1	nin9-zu/𒎐𒍪 (L643660-F2) → nin9/𒎐 (L643660-F1), nin9-zu/𒎐𒍪 (L643660-F2) → zu/𒍪 (L1116255-F1) (see also elaboration of Example 1)
Example 2	lugal/𒈗 (L643713-F10) → lugal[king][-ak][-ø] N.GEN.ABS (see elaboration of Example 2)
Example 3	in-pa3/𒅔𒅆𒊒 (L741253-F2) → i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P (see elaboration of Example 3)
Planned use	Linking forms to their components and attaching the grammatical role of each component
See also	combines lexemes (P5238), which allows decomposing a Lexeme into the other lexemes that are parts of it

Motivation

Sumerian, as an agglutinative language, derives its grammatical features mainly from combinations of suffixes which are attached to a Lexeme.

In Wikidata, we can already model Lexemes of the individual suffixes and we can create QIDs for the grammatical features that we need to describe a Lexeme Form.

What is missing is a way to decompose a lexeme form so as to show how its suffixes express the grammatical features which are assigned to the form.

One might argue that this is a trivial matter, as only suffixes are added and they can be described sufficiently to represent a grammatical feature.

However, in Sumerian, the interpretation of a word is usually broken down into a description of the chain of suffixes, or even vowels in suffixes, as exemplified here:

http://oracc.museum.upenn.edu/etcsri/parsing/index.html

This interpretation of a Sumerian form can become quite complex and is worth modeling in Wikidata, in my opinion.

To do that, we would need a property that allows for representing the decomposition of a form, similarly to "combines lexemes". Then, we would be able to list the individual suffixes, or parts of suffixes, in order (e.g. with "series ordinal") and so explain the decomposition of the lexeme form completely in RDF.

Usage for other languages

This property has many potential applications in other languages, such as:

Turkish and Japanese as agglutinative languages (even though maybe with a clearer representation of suffixes), e.g. all forms of 因る/よる (L11476)
Arguably Indo-European languages, e.g. German gehst (L1026-F4), which could be separated into "geh" (STEM) and "st" (second person singular present, indicative, active)
Akkadian cuneiform will need similar patterns for verbs but also involves verbal roots; the property may then also be applicable to Arabic

Elaboration on examples

This section elaborates the aforementioned three examples for Sumerian.

Example 1: ninzu (nin9-zu/𒎐𒍪 (L643660-F2))

Form: nin9-zu / 𒎐𒍪
Grammatical interpretation: nin9=HEAD.zu=2-SG-POSS

This noun carries a second person singular possessive, which is marked with the suffix zu/𒍪 (L1116255).

We would like to express that the possessive is marked with zu/𒍪 (L1116255) and that nin/𒊩𒌆 (L643660) is the HEAD and carries the meaning of the noun.

Representation in Wikidata

nin9-zu/𒎐𒍪 (L643660-F2) form decomposition nin9/𒎐 (L643660-F1)
- series ordinal (P1545) 1
- value of statement has role (P3831) word stem (Q210523)

nin9-zu/𒎐𒍪 (L643660-F2) form decomposition zu/𒍪 (L1116255-F1)
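The two statements above can be sketched as plain data. This is only an illustration, not an official Wikidata client: the Statement class and the components_of helper are hypothetical names, and only the IDs are taken from this proposal. Since the second statement carries no series ordinal here, the sketch places unqualified components after the numbered ones, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FORM_DECOMPOSITION = "P12527"  # form decomposition
SERIES_ORDINAL = "P1545"       # series ordinal
VALUE_HAS_ROLE = "P3831"       # value of statement has role


@dataclass
class Statement:
    """Hypothetical in-memory stand-in for one Wikidata statement."""
    subject: str               # form carrying the statement
    prop: str
    value: str                 # form of the component
    qualifiers: Dict[str, str] = field(default_factory=dict)


statements: List[Statement] = [
    # nin9-zu (L643660-F2) -> nin9 (L643660-F1), ordinal 1, role: word stem
    Statement("L643660-F2", FORM_DECOMPOSITION, "L643660-F1",
              {SERIES_ORDINAL: "1", VALUE_HAS_ROLE: "Q210523"}),
    # nin9-zu (L643660-F2) -> zu (L1116255-F1), no qualifiers given above
    Statement("L643660-F2", FORM_DECOMPOSITION, "L1116255-F1"),
]


def components_of(form_id: str, stmts: List[Statement]) -> List[str]:
    """Return component form IDs, numbered ones first, in ordinal order.

    Components without a series ordinal keep their listed order
    (assumption of this sketch; sorted() is stable).
    """
    own = [s for s in stmts
           if s.subject == form_id and s.prop == FORM_DECOMPOSITION]

    def sort_key(s: Statement):
        ordinal = s.qualifiers.get(SERIES_ORDINAL)
        return (ordinal is None, int(ordinal) if ordinal is not None else 0)

    return [s.value for s in sorted(own, key=sort_key)]


print(components_of("L643660-F2", statements))
```

Giving every component its own series ordinal would make the order explicit and remove the need for the fallback rule.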

Example 2: lugal (lugal/𒈗 (L643713-F10))

Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll

Genitive absolutive form of lugal (king)
r.1.4 lugal lugal[king][-ak][-ø] N.GEN.ABS

This example shows that the three forms (lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9), lugal/𒈗 (L643713-F10)) are written in the same way: "lugal". Therefore, additional elaborations on why these forms are written in this way are needed.

The genitive absolutive form of lugal/𒈗 (L643713), namely lugal/𒈗 (L643713-F10), consists of three components:

the STEM (lugal)
the particle (-ak)
the non-written marker for the absolutive case (it is always left empty)

In lugal/𒈗 (L643713-F10), the (-ak) is also not written, hence it is indistinguishable from the forms lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9) without additional context.

Hence, we would like to break down the grammatical composition with reference to the written and non-written parts of the form.

Representation in Wikidata

lugal/𒈗 (L643713-F10) form decomposition lugal/𒈗 (L643713-F7)

lugal/𒈗 (L643713-F10) form decomposition -ak/𒀝 (L1117316-F1)

lugal/𒈗 (L643713-F10) form decomposition -ø (L1117775-F1)
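A minimal sketch of why the decomposition matters for this form: it lists the three components and shows that only the stem surfaces in writing. The Component type and its "written" flag are hypothetical and exist only for this illustration; they are not Wikidata properties. The flag values follow the description above (neither -ak nor the absolutive marker is written in L643713-F10).

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    form_id: str
    transliteration: str
    written: bool  # hypothetical flag, used only in this sketch


# Components of lugal (L643713-F10) in the order listed above.
LUGAL_F10: List[Component] = [
    Component("L643713-F7", "lugal", True),   # STEM
    Component("L1117316-F1", "ak", False),    # particle, not written here
    Component("L1117775-F1", "ø", False),     # absolutive marker, never written
]


def surface(components: List[Component]) -> str:
    """Join the transliterations of the components that are actually written."""
    return "-".join(c.transliteration for c in components if c.written)


def unwritten(components: List[Component]) -> List[str]:
    """Return the form IDs of components that leave no trace in the writing."""
    return [c.form_id for c in components if not c.written]


print(surface(LUGAL_F10))    # written shape of the form
print(unwritten(LUGAL_F10))  # components recoverable only from the decomposition
```

The written shape alone is "lugal", identical to the other forms mentioned above, so the unwritten components can only be recovered from the decomposition statements.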

Example 3: pad3 (in-pa3/𒅔𒅆𒊒 (L741253-F2))

Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll

r.3.3 in-pa3 i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P

The example for inpad shows the form in-pa3/𒅔𒅆𒊒 (L741253-F2) of the Sumerian verb with the sense "to name".

The verb form marks both its subject and its direct object, each with its own grammatical parameters.

Subject: The subject is described as "third person singular finite human agent", which manifests itself in the prefix "in".
Direct Object: The direct object is described as "third person singular" and manifests itself in the non-written suffix -ø (L1117775-F1).

Representation in Wikidata

in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition in-/𒅔 (L1117776-F1)

in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition pad3/𒅆𒊒 (L741253-F1)

in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition -ø (L1117775-F1)
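As a rough check of this decomposition, the gloss from the CoNLL line can be reassembled from per-component glosses. The mapping of gloss segments to components follows the description above (prefix "in" for the finite, human agent marking; -ø for the patient); reading "V" as the gloss of the verb stem, and using the listing order as the component order, are assumptions of this sketch. The assemble_gloss helper is a hypothetical name.

```python
from typing import List, Tuple

# (component form ID, label, gloss segment) in the order listed above.
IN_PA3_F2: List[Tuple[str, str, str]] = [
    ("L1117776-F1", "in-", "FIN.3-SG-H-A"),  # subject: finite, 3rd sg. human agent
    ("L741253-F1", "pad3", "V"),             # verb stem (assumed reading of "V")
    ("L1117775-F1", "-ø", "3-SG-P"),         # direct object: 3rd sg. patient
]


def assemble_gloss(components: List[Tuple[str, str, str]]) -> str:
    """Concatenate the gloss segments of the components with '.' separators."""
    return ".".join(gloss for _form_id, _label, gloss in components)


print(assemble_gloss(IN_PA3_F2))
```

If the reassembled string matches the annotation in the corpus line (FIN.3-SG-H-A.V.3-SG-P), the component list and its order are at least consistent with that annotation.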

– The preceding unsigned comment was added by Situxx (talk • contribs) at 21:45, April 28, 2023‎ (UTC).

Discussion

Support this property is a useful and essential addition for abstracting complex linguistic issues. KaCeBe (talk) 11:53, 17 May 2023 (UTC)[reply]
Comment This seems useful but could be clarified a bit. Wouldn't value of statement has role (P3831) be more appropriate as a qualifier given that the object is the form being linked to, and the subject is the form carrying the statement? Maybe even a different property (or a new one) would make more sense here. I will try to find some examples from other languages to see if that helps clarify anything --عُثمان (talk) 17:32, 6 June 2023 (UTC)[reply]

@Situxx: OK, having thought about it a bit more I have some more specific comments: I am not sure if it is necessary to use subject stated as to qualify the statements. An issue with that property is that it does not allow specifying a language code for the string, and for languages where this information could be presented in multiple ways it is unclear how to use it. One approach I have been going with for "zero morphemes", which are common in Punjabi, is to use non-printing Unicode characters as representations of individual forms. ("Left to Right Mark" and "Arabic Letter Mark" for LTR and RTL representations respectively.) This allows attaching additional data to the zero representation, and indicating that a form is an empty string without using a qualifier. See ਇ/ءِ (L718607) for example where this verbal suffix is most often unrealized, but has different forms historically and in some dialects. Rather than using subject named as, I think it would make sense to separate forms like this and select the combining form which has the correct representative string(s).

Then, for example, it could be stated that ਉੱਠ/اُٹھّ (L689060-F14) employs the suffix ‎/؜ (L718607-F1), while ਉੱਠੀ/اُٹھّی (L689060-F15) employs ਈ/ئی (L718607-F3). It would not be clear in the second case how to represent both ਈ and ی using subject named as whereas using the linked form in both cases we can get a representation of the combining form for each language/script code -عُثمان (talk) 20:25, 7 June 2023 (UTC)[reply]

Thank you very much for your remarks. I think using zero morphemes as forms for the suffixes we have in Sumerian that can be omitted is a great idea. I will adapt that and update my proposal accordingly. As for subject has role vs. object has role I think you are right. It should be object has role as the role of the suffix is described and not the grammatical feature of the subject (the form) which is already described in the grammatical feature description. I will adapt that as well and give you a heads up once I am done. Situxx (talk) 13:47, 9 June 2023 (UTC)[reply]

@Situxx: have the adaptations you mentioned been done? Mahir256 (talk) 14:51, 24 January 2024 (UTC)[reply]

Sorry about that. I forgot about this.

@عُثمان I have used "object has role" in the property proposal now and I used the null morpheme instead of the empty string representation in the examples.

I think that improved the proposal a lot. Situxx (talk) 00:56, 29 January 2024 (UTC)[reply]

@Mahir256, عُثمان: pinging for attention. Regards, ZI Jony ^(Talk) 06:21, 29 January 2024 (UTC)[reply]

Support, an important property for lexemes.--Arbnos (talk) 20:25, 17 January 2024 (UTC)[reply]
Support --عُثمان (talk) 12:56, 29 January 2024 (UTC)[reply]
@Situxx, KaCeBe, عُثمان, Arbnos, Mahir256: Done as P12497. Don't hesitate to update/add anything if I've missed or updated wrong information! Regards, ZI Jony ^(Talk) 06:43, 25 February 2024 (UTC)[reply]
Hi ZI Jony, we tried to apply the property in an example, and we could not add values as Wikibase Forms, we are only able to add values of Wikibase Lexemes, which is not what we proposed. Is there a chance you could change that? Situxx (talk) 19:21, 28 February 2024 (UTC)[reply]
Situxx, the data type "lexeme" allows linking only to lexemes; for Wikibase Forms you were supposed to choose the data type "wikibase-form", which allows adding only Wikibase Forms. There is no way to change the data type now; the only option is to delete the existing property and create a new one. Regards, ZI Jony ^(Talk) 17:02, 29 February 2024 (UTC)[reply]
Sorry, this seems to be my bad. All examples were pointing to forms, but it seems I have indeed specified the wrong data type. Would it be possible to create a new one? I believe the property has not been used as of yet. Situxx (talk) 19:03, 29 February 2024 (UTC)[reply]
@Situxx:, just for re-confirmation! I've changed the data type to "form". Please confirm whether we should go ahead and create a new property. We'll not be able to delete and re-create again and again. Regards, ZI Jony ^(Talk) 06:57, 1 March 2024 (UTC)[reply]
@Situxx:, waiting for your reply. Regards, ZI Jony ^(Talk) 04:51, 4 March 2024 (UTC)[reply]
Thank you for pinging. I must have missed your previous message.

Yes, changing the type to form is what we intended to propose.

So, you may go ahead. Again sorry for the confusion. Situxx (talk) 12:11, 4 March 2024 (UTC)[reply]
@Situxx, KaCeBe, عُثمان, Arbnos, Mahir256: Done as form decomposition (P12527). Regards, ZI Jony ^(Talk) 12:41, 5 March 2024 (UTC)[reply]
