Property talk:P274
Documentation
description of chemical compound giving element symbols and counts
Description | description of chemical compound based on element symbols
---|---
Represents | chemical formula (Q83147)
Data type | String
Domain | chemical compounds (note: this should be moved to the property statements)
Allowed values | `([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[\])\|,₁₂₃₄₅₆₇₈₉₀]*(·\(?[-0-9.]*n?\)?)?)+`
Example | water (Q283) → H₂O; carbon dioxide (Q1997) → CO₂; ethylene (Q151313) → C₂H₄
Tracking: same | no label (Q28046688)
Tracking: differences | no label (Q20636209)
Tracking: usage | Category:Pages using Wikidata property P274 (Q20636211)
Tracking: local yes, WD no | Category:Chemical formula not in Wikidata, but available on Wikipedia (Q20636201)
See also | general formula (P1673)
Proposal discussion | not applicable
- Single value: List of violations of this constraint: Database reports/Constraint violations/P274#Single value, SPARQL
- Format: value must be formatted using this pattern (PCRE syntax): `([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.ₓ]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[\])|,₁₂₃₄₅₆₇₈₉₀ₓ]*(·\(?[-0-9.]*n?\)?)?)+`. List of violations of this constraint: Database reports/Constraint violations/P274#Format, SPARQL
- Entity types: List of violations of this constraint: Database reports/Constraint violations/P274#Entity types
- Scope: List of violations of this constraint: Database reports/Constraint violations/P274#Scope, SPARQL
- Note:
  - Displaying these values needs a font such as Arial Unicode MS; the subscripts 'x' and 'n' need another font.
  - Check the table on the en:Superscripts and Subscripts page.
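Candidate values can be checked against the Format constraint before they are added. The following is a minimal Python sketch. It assumes that the constraint requires the whole value to match, which is why it uses `fullmatch`. The helper name `is_valid_p274` is hypothetical, and the pattern is copied unchanged from the Format constraint.

```python
import re

# Pattern copied from the Format constraint (PCRE syntax); it also compiles
# under Python's re module. Split across two raw strings only for line length.
P274_PATTERN = re.compile(
    r"([αβγδφωλμπ]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀.ₓ]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?"
    r"[\])|,₁₂₃₄₅₆₇₈₉₀ₓ]*(·\(?[-0-9.]*n?\)?)?)+"
)


def is_valid_p274(value: str) -> bool:
    """Return True if the entire value matches (assumes an anchored check)."""
    return P274_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["H₂O", "Ca(UO₂)₂(PO₄)₂·(10-12)H₂O", "H2O", "(C12H20O29S6)n"]:
        print(value, is_valid_p274(value))
```

ASCII digits are rejected because the pattern only allows Unicode subscripts after an element symbol, so "H2O" fails while "H₂O" passes.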
Subscript and superscript
Shouldn't that be written like C<sub>2</sub>H<sub>6</sub>O? --Goldzahn (talk) 06:26, 13 March 2013 (UTC)
- Good question but I think it's the same at the end: a template can put all numbers in the right format for display. I prefer to avoid mixing data and display features in wikidata DB. Snipre (talk) 14:34, 13 March 2013 (UTC)
- Subscripts are a notation, not a display feature. /Esquilo (talk) 16:00, 15 March 2013 (UTC)
- OK, but it's a notation for HTML display, and depending on the use, this notation will be useless. Again, Wikidata provides raw data, and data users will do what they want with it according to their programming language or display features. Snipre (talk) 16:46, 15 March 2013 (UTC)
- I think it would be very difficult to write a template that will output C₂O₄²⁻ for oxalate. I suggest allowing <sub></sub> and <sup></sup> in any string. HenkvD (talk) 12:59, 16 March 2013 (UTC)
- But in Wikidata syntax, subscript is not defined by HTML format. So the HTML format has to be avoided because it is meaningless in Wikidata. I agree about the convention, but for format reasons there is no way to describe subscript in a unique format. So each time you try to use Wikidata data, you first have to convert the subscript information. So in the end it is simpler to write without subscript format, in my opinion. Snipre (talk) 11:16, 17 April 2013 (UTC)
- It does seem like a string is a too simple data format for something as complex as a chemical formula. Something like MathML would be needed that allows complex formulas to be displayed correctly. --Tobias1984 (talk) 11:34, 17 April 2013 (UTC)
- I tried H<sub>2</sub>, H₂ in the demo system, but {{#property|p274}} results in the literal text H<sub>2</sub>, H₂ instead of the rendered H₂, H₂. So that does not work either. HenkvD (talk) 08:08, 21 April 2013 (UTC)
- User:Pyfisch uses Unicode (Q3513021) --Chris.urs-o (talk) 08:17, 27 May 2013 (UTC)
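On the display question above: if values are stored with Unicode sub- and superscripts, a data user can derive either HTML markup or plain text mechanically. The following is a minimal Python sketch. The function names are hypothetical, and only digits, ⁺ and ⁻ are handled, not ₓ or n.

```python
import re

SUB_DIGITS = "₀₁₂₃₄₅₆₇₈₉"
SUP_CHARS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻"
FROM_SUB = str.maketrans(SUB_DIGITS, "0123456789")
FROM_SUP = str.maketrans(SUP_CHARS, "0123456789+-")


def to_html(formula: str) -> str:
    """Wrap each run of Unicode subscripts/superscripts in <sub>/<sup> tags."""
    formula = re.sub(f"[{SUB_DIGITS}]+",
                     lambda m: "<sub>" + m.group().translate(FROM_SUB) + "</sub>",
                     formula)
    return re.sub(f"[{SUP_CHARS}]+",
                  lambda m: "<sup>" + m.group().translate(FROM_SUP) + "</sup>",
                  formula)


def to_plain(formula: str) -> str:
    """Drop the sub/superscript notation entirely (plain ASCII digits and signs)."""
    return formula.translate(FROM_SUB).translate(FROM_SUP)


if __name__ == "__main__":
    print(to_html("C₂O₄²⁻"))   # C<sub>2</sub>O<sub>4</sub><sup>2-</sup>
    print(to_plain("C₂O₄²⁻"))  # C2O42-
```

The plain form "C2O42-" shows the problem raised for oxalate. Once the notation is dropped, the charge can no longer be told apart from the atom counts. The Unicode form keeps that information and can still be converted.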
Different way
Wouldn't it be better to construct the chemical formula from the elements? So H2SO4 would be "2 x H + 1 x S + 4 x O". That way, queries for substances containing H could be constructed, instead of queries for "H" in the chemical formula also returning "He". --Tobias1984 (talk) 08:26, 17 April 2013 (UTC)
- We are thinking about adding new properties for atom definition, which will be used to calculate the molecular mass. However, as the molecular formula is a well-known identifier for molecules, it is better for query reasons to have it already defined and to avoid decomposing the query. Snipre (talk) 11:11, 17 April 2013 (UTC)
- WolframAlpha does it pretty neatly (e.g. http://www.wolframalpha.com/input/?i=glucose): when you press on the formula, it resolves it into the number of atoms and the mass fraction. I think this is the level of database intelligence we should strive for. --Tobias1984 (talk) 11:38, 17 April 2013 (UTC)
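The "H" versus "He" problem comes from substring matching. Tokenizing the string into element symbols avoids it. The following is a minimal Python sketch that assumes a symbol is one uppercase letter optionally followed by one lowercase letter. It ignores counts, charges, Greek prefixes and the vacancy symbol ☐, and `elements_in` is a hypothetical name.

```python
import re

# One element symbol: an uppercase letter, optionally followed by a lowercase one.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def elements_in(formula: str) -> set[str]:
    """Return the set of element symbols occurring in a formula string."""
    return set(ELEMENT.findall(formula))


if __name__ == "__main__":
    print("H" in "He")               # True: naive substring search
    print("H" in elements_in("He"))  # False: symbol-level search
    print(sorted(elements_in("Ca(UO₂)₂(PO₄)₂·(10-12)H₂O")))
```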
Unique value
This property doesn't have unique values. There are minerals with the same chemical formula. A chemical formula is a summary, and for networks and chains it is only a repeating unit. --Chris.urs-o (talk) 20:11, 25 May 2013 (UTC)
- Thank you for explaining. I removed the template. Please check my edits from today in chemical properties; it is possible that they contain the same mistake. — Ivan A. Krestinin (talk) 20:28, 25 May 2013 (UTC)
- I added this constraint here as an experiment: {{Constraint:Unique value}}
I plan to remove it tomorrow. — Ivan A. Krestinin (talk) 22:05, 15 January 2014 (UTC)
format constraint
I propose to use a format constraint with the pattern (([A-Z][ub]?[a-z]?(<sub>[0-9]+</sub>)?(<sup>[0-9]+[+-]?</sup>)?)+|([A-Z][ub]?[a-z]?[₀₁₂₃₄₅₆₇₈₉]*([⁰¹²³⁴⁵⁶⁷⁸⁹]+[⁺⁻]?)?)+). These values wouldn't match it:
- Q104692 (Ca2(Mg,Fe)5[OH|Si4O11]2)
- Q231506 (Pd3Pb)
- Q1066160 (K3(UO2)4(SO4)2O3(OH)·3H2O)
- Q1069926 ((Fe,Cu)SO4·5H2O)
- Q2856361 (Na2Nb4O11·9H2O)
- Q2896295 (Ag3Bi7S12)
- Q2925579 ((Ni,Al)3(Si,Al)2O5(OH)4)
- Q3261595 (Mg3Cr2(SiO4)3)
- Q3780191 (NaCa16(Si23Al)O60(OH)8·14H2O)
- Q3827887 (Pb7F12Cl2)
is there a need to expand the pattern? --Akkakk 23:26, 5 June 2013 (UTC)
- A bot or a gadget would be helpful, and vacancies must be permitted too [ ☐ ] ;) --Chris.urs-o (talk) 11:43, 6 June 2013 (UTC)
- I stripped the HTML variant and changed it to "([([]?[A-Z][ub]?[a-z]?[₀₁₂₃₄₅₆₇₈₉]*([⁰¹²³⁴⁵⁶⁷⁸⁹]+[⁺⁻]?)?[])|,₀₁₂₃₄₅₆₇₈₉]*(·[0-9]+)?)+". Can you give an example for ☐? Is it used instead of an element symbol? The following wouldn't match. --Akkakk 13:13, 6 June 2013 (UTC)
- Q104692 (Ca2(Mg,Fe)5[OH|Si4O11]2)
- Q355615 (CrO<sub>4</sub><sup>2-</sup>, Cr<sub>2</sub>O<sub>7</sub><sup>2-</sup>)
- Q411314 (C<sub>3</sub>H<sub>3</sub>N<sub>3</sub>O<sub>3</sub>)
- Q411876 (P<sub>2</sub>O<sub>3</sub>, P<sub>4</sub>O<sub>6</sub>)
- Q422642 (H<sub>2</sub>CrO<sub>4</sub>, H<sub>2</sub>Cr<sub>2</sub>O<sub>7</sub>)
- As I understand it, <sup> and <sub> are invalid on Wikidata, right?
- tremolite (Q423051) is an example, an amphibole.
- As I understand it, sometimes a site in the unit cell of a crystal of a mineral group is empty, and the charge is compensated on another site. --Chris.urs-o (talk) 18:29, 6 June 2013 (UTC)
- I don't know of any rule that prohibits <sup> and <sub>, but we should use one form, and I think Unicode is better. I added the box as an alternative to a chemical element symbol. --Akkakk 00:06, 7 June 2013 (UTC)
- Is the vacancy symbol ☐ an official symbol for chemical formulas? When I look at the Wikipedia articles, there is no symbol like this. Snipre (talk) 00:31, 7 June 2013 (UTC)
- I don't know about the IUPAC nomenclature of inorganic chemistry (Red Book), but the scientific literature, rruff.info/ima/ and mindat.org use it throughout. If I remember correctly, mindat.org used '{}' for a vacancy a while ago.
- The chemical formula of minerals comes from rruff.info or mineralienatlas.de (a form of secondary literature). You can have a look at the end of the page of tremolite (mineralienatlas.de). --Chris.urs-o (talk) 02:02, 7 June 2013 (UTC)
- I think greek letters should be allowed too
- Belite: α-Ca2SiO4, β-Ca2SiO4, γ-Ca2SiO4; α-, β-, γ-, δ- cycloheptasulfur; φ-, ω-, λ-, μ-, π-sulfur. --Chris.urs-o (talk) 07:19, 8 June 2013 (UTC)
- Could we dump unique value? Some sources use a different format; rruff.info and mineralienatlas.de use different values. Mineralienatlas.de is more correct, but rruff.info is more up to date. --Chris.urs-o (talk) 04:22, 7 June 2013 (UTC)
changed the pattern, assuming the number after the "·" is optional. --Akkakk 23:55, 9 June 2013 (UTC)
- We have ·xH₂O and ·nH₂O, as well. --Chris.urs-o (talk) 15:12, 20 June 2013 (UTC)
- added [nx]? --Akkakk 13:41, 21 June 2013 (UTC)
- allowed ⁻ without number. --Akkakk 16:21, 21 June 2013 (UTC)
- Thx. It's tempting to use only '·nH2O', but I'm not so bold. --Chris.urs-o (talk) 18:17, 21 June 2013 (UTC)
- Note: autunite (Q407345): Ca(UO₂)₂(PO₄)₂·(10-12)H₂O, thomsonite-Sr (Q655464): NaSr₂Al₅Si₅O₂₀·(6-7)H₂O, parsonsite (Q1067103): Pb₂(UO₂)(PO₄)₂·(0-2)H₂O Regards --Chris.urs-o (talk) 05:03, 22 June 2013 (UTC)
- added --Akkakk 10:23, 22 June 2013 (UTC)
- I changed my mind, deleted [x]?, standard is [n]? now --Chris.urs-o (talk) 07:42, 26 June 2013 (UTC)
- then the [] aren't needed ;) --Akkakk 11:18, 26 June 2013 (UTC)
- changed pattern to match (KAl₃[(OH)₆(SO₄)₂]). should (Cu₄(AsO₄)₂(OH)₂·2.5H₂O) be valid? --Akkakk 10:16, 7 July 2013 (UTC)
- Yup, it should, you see it on newer formulas. --Chris.urs-o (talk) 13:26, 7 July 2013 (UTC)
Hello, the bot has stopped updating the report due to an error in the pattern. The regexp parser says "range out of order in character class". — Ivan A. Krestinin (talk) 15:14, 11 August 2013 (UTC)
- Sorry, I'll ask Akkakk to fix it. --Chris.urs-o (talk) 02:32, 12 August 2013 (UTC)
- reverted to last version by me and added . to match (Cu₄(AsO₄)₂(OH)₂·2.5H₂O) --Akkakk 10:57, 14 August 2013 (UTC)
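The thread does not say which edit triggered "range out of order in character class". This error appears when a hyphen between two characters in a bracket expression forms a range whose endpoints are reversed. A hyphen placed first, as in the hydrate part `[-0-9.]`, is taken literally. The following minimal Python sketch illustrates this. Python's `re` reports the same condition as "bad character range", and `compiles` and `HYDRATE` are hypothetical names.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Hydrate part of the current pattern: the leading '-' is a literal hyphen,
# so ranges such as (10-12) and decimals such as 2.5 are both accepted.
HYDRATE = re.compile(r"·\(?[-0-9.]*n?\)?")

if __name__ == "__main__":
    print(compiles("[9-0]"))     # False: reversed range endpoints
    print(compiles("[0-9.-+]"))  # False: ".-+" is read as a reversed range
    print(compiles("[-0-9.]"))   # True: leading hyphen is literal
    print(bool(HYDRATE.fullmatch("·(10-12)")), bool(HYDRATE.fullmatch("·2.5")))
```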
Errors
Hi, I compared chemical formula (P274) + PubChem CID (P662) with the PubChem database and generated a list of disagreements: User:Ivan A. Krestinin/Chemical compounds. It would be great if somebody could help with fixing the errors. — Ivan A. Krestinin (talk) 22:05, 31 January 2014 (UTC)
- Q419714: '(CH₃COO)₂Cd' differs from new value 'C₄H₁₀CdO₆'.
- Cadmium acetate: anhydrous CAS 543-90-8; dihydrate CAS 5743-04-4
- Q1014242: '(NH₄)₃[AlF₆]' differs from new value 'AlF₆H₁₂N₃'.
- These might not be outright errors but two versions of the same substance, so take care. Three chemical formulas for the same substance might be acceptable; don't overwrite them. The qualifiers/references are important, though. Simple empirical formulas aren't so good in organic chemistry. The PubChem ID might be wrong. The CAS hyperlink isn't working. --Chris.urs-o (talk) 04:39, 2 February 2014 (UTC)
- This is an autogenerated list, so false positives are present. The list was generated to check data consistency and for manual error fixing, not for automatically replacing anything. '(NH₄)₃[AlF₆]' was a bug in the checker; it is fixed. The CAS links were fixed. Wrong PubChem IDs need to be fixed too. — Ivan A. Krestinin (talk) 09:06, 3 February 2014 (UTC)
Multiple variants of chemical formulas in organic chemistry
Should we add chemical formulas of organic compounds as (using butyl propionate as an example):
1. molecular formula (summary), e.g. C₇H₁₄O₂
2. condensed / semi-structural formula, e.g. CH₃CH₂COO(CH₂)₃CH₃
3. both
See: Condensed formulas in organic chemistry implying molecular geometry and structural formulas. Variant 1 does not convey much information but could be useful for searching. Shouldn't we have qualifiers to show which variant of the chemical formula is used? --Pabouk (talk) 23:55, 12 February 2014 (UTC)
- I used variant 2 when possible and variant 1 in other cases. Variant 3 is redundant. If some application needs the summary formula, it can calculate it automatically. — Ivan A. Krestinin (talk) 03:48, 13 February 2014 (UTC)
- I would vote for option #1. --Leyo 23:00, 11 April 2015 (UTC)
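A summary formula can be computed from a condensed formula, as suggested above. This also explains some entries in the PubChem comparison from the Errors thread, because PubChem's values appear to follow Hill order: C, then H, then the other elements alphabetically, with all elements alphabetical when there is no carbon. The following is a minimal Python sketch. It handles nested ()/[] groups and an integer hydrate prefix such as ·5H₂O. It rejects ranges, fractional counts, charges, solid-solution sites such as (Mg,Fe) and ☐ with a ValueError. The function names are hypothetical.

```python
import re
from collections import Counter

TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
TO_SUB = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
# Element symbol | count | opening bracket | closing bracket | anything else.
TOKEN = re.compile(r"([A-Z][a-z]?)|([0-9]+)|([(\[])|([)\]])|(.)", re.S)


def element_counts(formula: str) -> Counter:
    """Count atoms in a formula with nested ()/[] groups and '·kX' parts."""
    total = Counter()
    for part in formula.translate(TO_ASCII).split("·"):
        lead = re.match(r"[0-9]*", part).group()  # e.g. the 5 in '5H2O'
        factor = int(lead) if lead else 1
        stack, last = [Counter()], None  # last = most recent element or group
        for elem, num, opening, closing, other in TOKEN.findall(part[len(lead):]):
            if other:
                raise ValueError(f"unsupported character {other!r} in {formula!r}")
            if elem:
                last = Counter({elem: 1})
                stack[-1].update(last)
            elif num:
                if last is None:
                    raise ValueError(f"count without element or group in {formula!r}")
                for symbol, n in last.items():
                    stack[-1][symbol] += n * (int(num) - 1)  # already counted once
                last = None
            elif opening:
                stack.append(Counter())
                last = None
            else:  # closing bracket: fold the finished group into its parent
                if len(stack) == 1:
                    raise ValueError(f"unbalanced brackets in {formula!r}")
                last = stack.pop()
                stack[-1].update(last)
        if len(stack) != 1:
            raise ValueError(f"unbalanced brackets in {formula!r}")
        for symbol, n in stack[0].items():
            total[symbol] += factor * n
    return total


def hill_formula(counts: Counter) -> str:
    """Write counts in Hill order with Unicode subscripts (a count of 1 is omitted)."""
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]).translate(TO_SUB) if counts[s] != 1 else "")
                   for s in order)


if __name__ == "__main__":
    for f in ["CH₃CH₂COO(CH₂)₃CH₃", "(CH₃COO)₂Cd", "(NH₄)₃[AlF₆]", "CuSO₄·5H₂O"]:
        print(f, "->", hill_formula(element_counts(f)))
```

With this sketch, (NH₄)₃[AlF₆] gives AlF₆H₁₂N₃, which matches the PubChem value. (CH₃COO)₂Cd gives C₄H₆CdO₄ rather than C₄H₁₀CdO₆, a difference of 2 H₂O. That is consistent with the anhydrous/dihydrate explanation given in the Errors thread.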
Checking for 0 or 1
A 0 or a 1 can never be alone in sub- or superscript (see example fix in Wikipedia). Is there a way to add such errors to Wikidata:Database reports/Constraint violations/P274#Format? --Leyo 23:13, 11 April 2015 (UTC)
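The rule stated above can be checked outside the format pattern. The following is a minimal Python sketch that flags a sub- or superscript 0 or 1 with no neighbouring digit of the same kind. It is not a claim about what the constraint report's regex engine supports, and `lone_zero_or_one` is a hypothetical name. A subscript next to "." is skipped so that decimal counts such as Fe₀.₅ are not flagged.

```python
import re

SUB_DIGITS = "₀₁₂₃₄₅₆₇₈₉"
SUP_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

# A ₀/₁ (or ⁰/¹) not preceded or followed by another digit of the same kind.
LONE = re.compile(
    rf"(?<![{SUB_DIGITS}.])[₀₁](?![{SUB_DIGITS}.])"
    rf"|(?<![{SUP_DIGITS}])[⁰¹](?![{SUP_DIGITS}])"
)


def lone_zero_or_one(formula: str) -> list[str]:
    """Return every lone sub/superscript 0 or 1 found in the formula."""
    return [m.group() for m in LONE.finditer(formula)]


if __name__ == "__main__":
    for f in ["H₂O₁", "Na₁₀", "Cl¹⁻", "C₂O₄²⁻"]:
        print(f, lone_zero_or_one(f))
```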
mhchem
There is a new tag on Wikipedia, <ce>, which renders input such as <ce>H2O</ce> using the syntax defined by the mhchem package. Please join the discussion at https://phabricator.wikimedia.org/T126862 to decide whether that should become a new datatype.
--Physikerwelt (talk) 17:47, 15 February 2016 (UTC)
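For comparison with the current string values, a Unicode P274 value can be rewritten as mhchem input. The following is a minimal Python sketch. It assumes the mhchem conventions that digits after a symbol or group become subscripts, that a charge is written after ^, and that * stands for the addition-compound dot. Only digit sub- and superscripts, ⁺ and ⁻ are converted, and `to_mhchem` is a hypothetical name.

```python
import re

FROM_SUB = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
FROM_SUP = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻", "0123456789+-")


def to_mhchem(formula: str) -> str:
    """Convert a Unicode formula string to mhchem-style input for <ce>...</ce>."""
    text = formula.translate(FROM_SUB).replace("·", "*")
    # A run of superscripts (a charge) becomes '^' followed by its ASCII form.
    return re.sub("[⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻]+",
                  lambda m: "^" + m.group().translate(FROM_SUP), text)


if __name__ == "__main__":
    for f in ["H₂O", "C₂O₄²⁻", "CuSO₄·5H₂O"]:
        print(f"<ce>{to_mhchem(f)}</ce>")
```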
Format Constraint
Is there a way to make the regexp constraint optional? I can't add the formula "(C12H20O29S6)n" to dextran sulfate (Q50350128)
- @Gstupp: maybe it would be better to use general formula (P1673)? dextran sulfate (Q50350128) is not a molecule where the chemical formula is precisely known, but a polymer which in fact is a mixture of macromolecules. Wostr (talk) 00:08, 6 March 2018 (UTC)
REGEX has 'ballot box' character?
At the moment, the REGEX string has the character 'BALLOT BOX' (U+2610) right after the element's first, required, uppercase A-Z character (see "([α-γδφλμπω]-)?([([]*[A-Z☐][ub]?[a-z]?[₁₂₃₄₅₆₇₈₉₀]*(\)?[¹²³⁴⁵⁶⁷⁸⁹⁰]*[⁺⁻]?)?[])|,₁₂₃₄₅₆₇₈₉₀]*(·\(?[-0-9.]*n?\)?)?)+"). Is that OK? -DePiep (talk) 17:54, 14 October 2019 (UTC)
- @DePiep: see anthophyllite (Q413322) (w:Anthophyllite) for an example.--GZWDer (talk) 16:34, 19 October 2019 (UTC)
Citation required
@Wostr: complicated chemical formulas get redefined/revised, at least in mineralogy. Some additions are very questionable. Regards --Chris.urs-o (talk) 01:49, 24 December 2021 (UTC)
- All Properties
- Properties with string-datatype
- Properties used on 1000000+ items
- Properties with single value constraints
- Properties with format constraints
- Properties with conflicts with constraints
- Properties with entity type constraints
- Properties with scope constraints
- Chemical properties
- Medical properties