
Revise documentation for "time" data value in Wikibase/DataModel/JSON (timezone)
Open, Low, Public

Description

The document Wikibase/DataModel/JSON says "NOTE: The canonical copy of this document can be found in the Wikibase source code and should be edited there. Changes can be requested by filing a ticket on Phabricator" hence this ticket.

The documentation for the timezone parameter within the "time" datavalue states

"timezone: Signed integer. Currently unused, and should always be 0. In the future, timezone information will be given as an offset from UTC in minutes. For dates before the modern implementation of UTC in 1972, this is the offset of the time zone from universal time. Before the implementation of time zones, this is the longitude of the place of the event, expressed in the range −180° to 180° (positive is east of Greenwich), multiplied by 4 to convert to minutes."
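The longitude rule in the quoted text follows from the Earth turning 360° in 1440 minutes, which is 4 minutes per degree. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and rounding to a whole minute is an assumption made here only because the field is described as a signed integer.

```python
def longitude_to_offset_minutes(longitude_deg: float) -> int:
    """Convert a longitude (positive east of Greenwich) to minutes from universal time.

    360 degrees correspond to 24 h = 1440 min, so one degree is 4 minutes.
    Rounding to a whole minute is an assumption of this sketch.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within -180 to 180 degrees")
    return round(longitude_deg * 4)


# 15 degrees east is one hour ahead; 75 degrees west is five hours behind.
east = longitude_to_offset_minutes(15.0)
west = longitude_to_offset_minutes(-75.0)
```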

I believe this is unrealistic. If timezones are ever supported, it will never be possible to find enough editors to scour all the existing dates in the database to figure out what the correct time zone is, nor is it likely editors will ever be persuaded to enter the time zone at the same time they enter new dates after the hoped-for implementation of time zones. It will also be highly problematic to assign a proper time zone to dates that are imported by robots from outside databases.

It is much more realistic to acknowledge the reality that the dates in the database are, and always will be, local times. The time data value, in reality, contains no information about the time zone and that must be deduced by hints one may find in other properties (such as "place of birth") or outside the database. I therefore suggest the paragraph I quoted be revised to read

"timezone: Signed integer. Unused, and should always be 0 and should always be ignored. This parameter turned out to be impractical, and the time data value contains no information about the time zone. The time data value should be regarded as a local time. In the future a new data value might be implemented for situations where date and time of day with a precision better than 1 day is desired."

I believe the present situation may dissuade editors who only wish to add accurate information from participating as Wikidata editors, because they are unwilling to add dates as being in the UTC time zone when they know this is not true.


(added July 4, 2017): Note the timevalue also includes a "Z" that can be understood as referring to UTC (with offset 0). This despite the current maximum precision being 11 (day) and the documentation mentioning that more precise elements of the timevalue should be ignored.
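To illustrate the point about the trailing "Z", here is a small Python sketch of a consumer that reads a time data value with day precision and keeps only the year, month and day, discarding the time of day and the "Z". The sample value follows the shape described in Wikibase/DataModel/JSON; the function name and the regular expression are assumptions for illustration, not part of Wikibase.

```python
import re

# Pattern for strings such as "+2017-07-04T00:00:00Z"; the year may have
# more than four digits and carries an explicit sign.
_TIME_RE = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T\d{2}:\d{2}:\d{2}Z$")


def date_fields(value: dict) -> tuple:
    """Return (year, month, day), ignoring the time of day and the 'Z'."""
    match = _TIME_RE.match(value["time"])
    if match is None:
        raise ValueError("unexpected time string: " + value["time"])
    sign, year, month, day = match.groups()
    signed_year = -int(year) if sign == "-" else int(year)
    return (signed_year, int(month), int(day))


example = {
    "time": "+2017-07-04T00:00:00Z",
    "timezone": 0,
    "before": 0,
    "after": 0,
    "precision": 11,  # 11 = day
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
}
fields = date_fields(example)
```

Nothing in this reading treats the date as UTC; the "Z" is only matched so that the string can be validated.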

Event Timeline

I have made a parallel request at the mediawiki talk page for Wikibase/Indexing/RDF_Dump_Format. There is no mention there of needing to submit a Phabricator ticket to change that documentation.

Please tag Phabricator tasks with a project name. If it's Wikidata-related, and you don't know the exact sub-project, you can just use Wikidata.

I don't think that setting time zones should be unavailable simply because many dates won't have the data available.

Time zone data is necessary to be able to put events in proper chronological order, or to determine lengths of time between events. Without time zones, some dates may be ambiguous because of lack of clarity of the "local" time of the event in question, which may not be in a single location. (For example, the WW2 Pacific War began on December 7 in some time zones (as reflected in the English Wikipedia and some others), and December 8 in Asia/West-Pacific time zones (as reflected in the Japanese Wikipedia).) It is practically impossible to automatically determine a historical time zone from a location and date.
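The Pacific War example can be sketched in Python with fixed offsets. The clock time and the offsets below (UTC−10:30 for Hawaii in 1941, UTC+9 for Japan) are assumptions used for illustration; the point is only that one instant yields two different calendar dates.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for illustration only.
HAWAII_1941 = timezone(timedelta(hours=-10, minutes=-30))
JAPAN = timezone(timedelta(hours=9))

# Assumed local clock time of the start of the attack on Pearl Harbor.
attack_local = datetime(1941, 12, 7, 7, 48, tzinfo=HAWAII_1941)
attack_in_japan = attack_local.astimezone(JAPAN)

# Same instant, different calendar dates.
same_instant = attack_local == attack_in_japan
date_in_hawaii = attack_local.date().isoformat()
date_in_japan = attack_in_japan.date().isoformat()
```

With only a day-precision date and no offset, the two dates cannot be recognised as the same event.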

I recommend having the timezone parameter default to unspecified, and allow editors to add time zones where applicable.

Esc3300 renamed this task from Revise documentation for "time" data value in Wikibase/DataModel/JSON to Revise documentation for "time" data value in Wikibase/DataModel/JSON (timezone). Jul 4 2017, 10:31 AM
Esc3300 updated the task description.

"I recommend having the timezone parameter default to unspecified, and allow editors to add time zones where applicable."

Agree. "Z" should not be read as the timezone being specified.

Textual values should be possible for timezones.

Making the timezone default to unspecified would require deciding how "unspecified" will be represented in all possible representations, including JSON, RDF, and the internal database value. If this approach is followed, I would like to see a bot go through the database and change all the existing timezone values to unspecified. Then correct values can be added once the facility to specify them becomes available, and as editors determine which values are correct.
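As one possible shape for the JSON side of that decision, the sketch below uses a JSON null to mean "unspecified". This is a hypothetical convention, not current Wikibase behaviour, and the function name is invented for illustration. It shows why a distinct marker is needed: the present value 0 cannot be told apart from a genuine UTC offset of zero.

```python
import json
from typing import Optional


def read_timezone(value: dict) -> Optional[int]:
    """Return the offset from UTC in minutes, or None when unspecified.

    Hypothetical convention: a JSON null or a missing key means "unspecified".
    """
    return value.get("timezone")


# Today's representation: 0 is emitted whether or not UTC is meant.
current = json.loads('{"time": "+2017-07-04T00:00:00Z", "timezone": 0}')
# Hypothetical representation with an explicit "unspecified" marker.
proposed = json.loads('{"time": "+2017-07-04T00:00:00Z", "timezone": null}')

current_offset = read_timezone(current)
proposed_offset = read_timezone(proposed)
```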

As a first step, we might just consider that "unused" means "unspecified".

"As a first step, we might just consider that "unused" means "unspecified"."

I think it's safe to say the developers would be unwilling to emit contradictory information in the RDF dump format (and related RDF formats) and the JSON format. As I understand it, the external creators of these formats did not define how date/times should be represented, and the representation standards for Wikidata are extensions by Wikidata developers. Neither Wikidata standard indicates that the timezone may be omitted, and neither indicates that the Z at the end of the datetime string may be omitted. Changing these representations would be a breaking change; various parsers written both inside and outside the Wikimedia organization may be unable to read the new format.

For the more distant future, for RDF, since it tries to follow ISO 8601, there will be an issue of how to represent a date with a timezone but no time of day. For example, if I wanted to represent "today" in my time zone in ISO 8601, the nearest I can come to it is "2017-05-2017T-04:00" but I don't think there is consensus about whether this is a valid ISO 8601 representation.

Since, historically, some countries have considered the year to begin, and incremented the year number on, a date other than January 1, we should specify that we always consider the year to begin on, and always increment the year number on, January 1, for both the Gregorian and Julian calendars.