Property talk:P233

From Wikidata

Documentation

canonical SMILES
Simplified Molecular Input Line Entry Specification (canonical format)
Description: Simplified Molecular Input Line Entry Specification; see simplified molecular input line entry specification (Q466769)
Represents: simplified molecular input line entry specification (Q466769)
Data type: String
Domain
According to this template: chemical substance (Q79529)
According to statements in the property:
type of chemical entity (Q113145171), chemical element (Q11344), isotope (Q25276), functional group (Q170409), structural class of chemical entities (Q47154513) or group of isomeric entities (Q15711994)
When possible, data should only be stored as statements
Allowed values
According to this template: complex text string
According to statements in the property:
[A-Za-z0-9+\-\*=#$:().>/\\\[\]%]+
When possible, data should only be stored as statements
Examples:
(±)-3-carene (Q19414): CC2(C)C\1CCC(C)/C=C/12
gold (Q897): [Au]
ethanol (Q153): CCO
Formatter URL: https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=$1&zoom=2.0&annotate=cip
https://chemapps.stolaf.edu/jmol/jmol.php?model=$1
https://cactus.nci.nih.gov/chemical/structure/$1/file?format=sdf&get3d=true
Robot and gadget jobs: Auto URL, e.g. http://chemapps.stolaf.edu/jmol/jmol.php?model=CCO
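A rough Python sketch of how a value might be substituted into the formatter URLs above. It assumes the SMILES string is fully percent-encoded before it replaces $1; the exact encoding Wikibase applies is not stated on this page, and the helper name `format_url` is hypothetical. Encoding matters because SMILES use characters such as #, =, /, [ and ] that have special meaning in URLs.

```python
from urllib.parse import quote

# Formatter URLs listed for P233; $1 is the placeholder for the property value.
FORMATTER_URLS = [
    "https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=$1&zoom=2.0&annotate=cip",
    "https://chemapps.stolaf.edu/jmol/jmol.php?model=$1",
    "https://cactus.nci.nih.gov/chemical/structure/$1/file?format=sdf&get3d=true",
]


def format_url(template: str, smiles: str) -> str:
    """Hypothetical helper: percent-encode the SMILES and substitute it for $1.

    safe="" also encodes "/", so a cis/trans bond marker cannot split the
    path segment used by the cactus.nci.nih.gov URL.
    """
    return template.replace("$1", quote(smiles, safe=""))


if __name__ == "__main__":
    for template in FORMATTER_URLS:
        print(format_url(template, "C#N"))
```

For ethanol (CCO) no character needs encoding, so the result carries the same model=CCO query as the Auto URL example; a value such as C#N becomes C%23N, since an unencoded # would otherwise start a URL fragment.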
Tracking: same: no label (Q32085237)
Tracking: differences: no label (Q20636208)
Tracking: usage: Category:Pages using Wikidata property P233 (Q20636212)
Tracking: local yes, WD no: no label (Q20636205)
See also: isomeric SMILES (P2017), SMARTS notation (P8533)
Lists
Proposal discussion: not applicable
Current uses
Total: 1,362,647
Main statement: 1,362,643 (>99.9% of uses)
Qualifier: 4 (<0.1% of uses)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P233#Type Q113145171, Q11344, Q25276, Q170409, Q47154513, Q15711994, SPARQL
Format “[A-Za-z0-9+\-\*=#$:().>/\\\[\]%]+”: value must be formatted using this pattern (PCRE syntax). (Help)
List of violations of this constraint: Database reports/Constraint violations/P233#Format, hourly updated report, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P233#Entity types
Scope is as main value (Q54828448): the property must only be used in the specified way (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P233#Scope, SPARQL
Single best value: this property generally contains a single value. If there are several, one would have preferred rank (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P233#single best value, SPARQL
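A minimal Python sketch of the single-best-value check described above. It assumes Wikibase's usual notion of best rank (preferred statements if any exist, otherwise normal ones, with deprecated statements ignored) and leaves out any refinements the real constraint may support; the function name is hypothetical and this is not the constraint checker's actual code.

```python
from typing import List, Tuple

# Each statement is modelled as (value, rank); rank is "preferred", "normal" or "deprecated".
Statement = Tuple[str, str]


def satisfies_single_best_value(statements: List[Statement]) -> bool:
    """Return True if at most one statement has the best rank."""
    preferred = [value for value, rank in statements if rank == "preferred"]
    normal = [value for value, rank in statements if rank == "normal"]
    # Best rank: preferred statements if there are any, otherwise normal ones.
    best = preferred if preferred else normal
    return len(best) <= 1
```

Two normal-rank SMILES values on one item would be reported, while marking one of them as preferred would satisfy the check.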
Values matching the pattern ^(.*@.*)$, that is, values containing the @ chirality marker, will be automatically replaced with \1 and moved to the isomeric SMILES (P2017) property.
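The automatic move described above can be sketched in Python as follows. This is an illustration of the rule under the stated pattern, not the code of whatever bot or gadget performs the move; `route_smiles` is a hypothetical name.

```python
import re
from typing import Tuple

# Pattern from the property documentation: any value containing "@".
ISOMERIC_PATTERN = re.compile(r"^(.*@.*)$")


def route_smiles(value: str) -> Tuple[str, str]:
    """Return (target property, value) for a proposed P233 value."""
    match = ISOMERIC_PATTERN.match(value)
    if match:
        # \1 is the whole value, so it is moved unchanged to isomeric SMILES.
        return "P2017", match.expand(r"\1")
    return "P233", value
```

Because the pattern only looks for @, a value whose only stereo information is a / or \ double-bond marker, such as F/C=C/F, would stay on P233.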
Testing: TODO list

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)


format constraint


Added a format constraint that only checks whether valid characters are used. --Akkakk 00:28, 26 June 2013 (UTC)
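A small Python sketch of what this character check does, using the pattern from the format constraint. It assumes the whole value must match (hence `re.fullmatch`); Python's `re` reads this particular pattern the same way as PCRE. The function name is hypothetical. Since only characters are checked, a syntactically broken SMILES can still pass.

```python
import re

# Allowed-character pattern from the P233 format constraint.
P233_FORMAT = re.compile(r"[A-Za-z0-9+\-\*=#$:().>/\\\[\]%]+")


def passes_format_constraint(value: str) -> bool:
    """True if every character of value belongs to the allowed set."""
    return P233_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    for smiles in ["CCO", "[Au]", "CC(C", "C[C@H](N)C(=O)O"]:
        print(smiles, passes_format_constraint(smiles))
```

Here the unbalanced CC(C passes, while C[C@H](N)C(=O)O fails: @ is not in the allowed set, so a value with tetrahedral stereo markers is reported as a format violation on this property.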

Multiple SMILES


Hi. It seems (in the WPEN chemboxes) that a few chemical compounds can have multiple SMILES codes. Do we have a solution in WD to represent all of them? I see only one property, "SMILES". Kelson (talk) 14:28, 24 April 2015 (UTC)

@Kelson: You can have only one SMILES per chemical compound, but depending on your chemical, you can have two notations: canonical and isomeric. Canonical is the normal representation, without indications about chiral carbons. This representation is not unique, since stereoisomers share the same canonical SMILES; the isomeric representation is unique. So normally you use canonical for simple molecules and isomeric for chiral molecules. If you have more than two, there is an error, or the article is about a mixture, or it mixes different data in the same article.
I opened a discussion in the Chemistry project. See here if you want to add your comment. Snipre (talk) 13:57, 17 June 2015 (UTC)
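To illustrate the point about chirality, here is a rough Python sketch that deletes the common stereo markers (@, @@, / and \) from a SMILES string. It is a string-level illustration only, not a canonicalization algorithm (a real toolkit would also renumber atoms and normalize bracket atoms), and it does not handle extended chirality classes such as @TH1; `strip_stereo` is a hypothetical name.

```python
import re


def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/, \\) stereo markers."""
    without_chirality = re.sub(r"@+", "", smiles)
    return without_chirality.replace("/", "").replace("\\", "")


if __name__ == "__main__":
    # The two enantiomers of alanine differ only in @ versus @@.
    enantiomers = ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]
    print({strip_stereo(s) for s in enantiomers})
```

Both alanine enantiomers collapse to the same string, N[CH](C)C(=O)O, which is why a representation without stereo markers cannot tell them apart.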

Canonicalization method is ambiguous


The current definition of the property does not specify which canonicalization method should be used. As stated on the SMILES Wikipedia page, canonicalization output depends on the algorithm or software, so without specifying it there is a risk that several alternative canonicalization methods will be used for values, reducing the overall quality of data for this property. Ungurinis (talk) 06:55, 8 October 2021 (UTC)

I wonder if this is the same canonical SMILES as described in Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI (Q28133319). Pinging Snipre, who created the property according to the property's history. Ungurinis (talk) 10:53, 10 January 2022 (UTC)
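A toy Python sketch of why the canonicalization method matters. Ethanol (Q153) has several valid SMILES spellings, and two made-up selection rules each pick a single "canonical" one, but not the same one. Real canonicalization algorithms rank atoms from graph invariants rather than choosing among finished strings, so `rule_shortest` and `rule_heteroatom_first` are hypothetical stand-ins for two different tools.

```python
from typing import List

# A few equivalent SMILES spellings of ethanol.
ETHANOL_SPELLINGS: List[str] = ["CCO", "OCC", "C(O)C"]


def rule_shortest(spellings: List[str]) -> str:
    """Hypothetical rule A: shortest string, ties broken alphabetically."""
    return min(spellings, key=lambda s: (len(s), s))


def rule_heteroatom_first(spellings: List[str]) -> str:
    """Hypothetical rule B: prefer strings not starting with carbon, then alphabetical."""
    return min(spellings, key=lambda s: (s.startswith("C"), s))


if __name__ == "__main__":
    for rule in (rule_shortest, rule_heteroatom_first):
        print(rule.__name__, rule(ETHANOL_SPELLINGS))
```

Each rule is deterministic, so each yields one string (CCO and OCC respectively), yet the two disagree; values produced by different tools can therefore differ for the same compound unless the property names the method.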

URL format


The formatter URL (P1630) value added for this property is used for every SMILES statement throughout Wikidata. While SMILES → structural formula is quite useful for Wikidata purposes (e.g. verification of stereochemistry, an easy way to get a formula and add proper chemical classes), a 3D molecule depiction is useless; it can only serve a decorative function. I oppose any changes to the default URL format without prior discussion (with notification of WikiProject Chemistry members). Wostr (talk) 22:51, 29 September 2023 (UTC)