Suggestions for a Web Based Universal Exchange and
Inference Language for Medicine. Continuity of Patient Care
with PCAST Disaggregation.
Barry Robson*, Thomas P. Caruso† and Ulysses G. J. Balis‡
Quantal Semantics Inc, Virginia, US. and also *St. Matthew's University School of Medicine,
Grand Cayman; *The Dirac Foundation, UK; *University of Wisconsin-Stout, US; University of
North Carolina, and University of Michigan, Michigan, US.
Tel: (00)1-345-3199 x 193; Fax: (001)1-345-945-3130; robsonb@aol.com
We describe here the applications of our recently proposed Q-UEL language to
continuity of patient care between physicians, specialists and institutions as mediated
via the Internet, giving examples derived from HL7 CDA and VistA of particular interest
to workflow. Particular attention is given to the Universal Exchange Language for
healthcare as requested by the US President's Council of Advisors on Science and
Technology (PCAST) in its report released in December 2010, especially in regard to
disaggregation of the patient record on the Internet. To illustrate many features and
options, one of our most elaborate configurations combining them, for disaggregation
and reaggregation, is described. The Q-UEL tags used do not physically join, but query
each other from a random mix via the application. Despite the computationally
demanding complexity of the configuration with two joining tags for each data tag and
four independently evolving keys, plus a valuable but rate limiting isomorphism test,
packets of essential clinical data for a patient could be recovered and displayed every 2
seconds for a “club” of 30,000-50,000 patients in the mix. All computation here is on a
standard laptop, but for practical use of the Internet to display downloaded data, the
above is adequate, so focus is primarily on increasing club size. In practice, it is not
necessary that a club comprise an entire nation. Assuming that one does not use purely
random assignments of patients to arbitrary clubs, there could for example be a club
comprising all schoolchildren in Scotland, or a club comprising all military veterans in
Illinois. In such cases, one is typically dealing with clubs each of the order of a mere
million patients. Using such club sizes efficiently, and in principle even a club the size of
a whole country, appears to be possible.
Keywords: Universal Exchange Language, PCAST report, Continuity of Care, Interoperability,
Electronic Health Record, Disaggregation
1. Introduction and Review.
1.1. Background
We recently made suggestions for a web-based universal exchange and inference
language (Q-UEL) for medicine [1], based on generating medical knowledge by data
mining many patient records (e.g. ref [2]) and authoritative medical text using XML-like
tags as artifacts (footnote 1). However, we see many of the same considerations in the continuity of
care (COC) for a patient, where the most important artifact is just one patient's
electronic health record (EHR), or a subset of information on it, exchanged between
stakeholders (healthcare providers, authorized players) such as the patient, physician
and pharmacist. Our particular use of the term COC in this report applies when
stakeholders are in different institutions, networks of care such as accountable care
organizations [3], and even different countries.
We focus here on COC seen as a topic within the domain of health information
exchange (HIE), so that the present report is primarily directed to workers in that
computational and communications field. However, it has been pointed out to us that
several aspects will be of direct interest to stakeholders such as physicians. For that
reason, sections and major subsections tend to start with an overview of the broader
significance of what follows, when it is of a very technical nature. Some aspects that may
have more direct impact on healthcare stakeholders are best described later below
(Section 1.13) after some review and explanatory discussion. The main motivation for
this report remains as follows. The HIE field is addressing matters that go beyond
sending medical information via a fax machine with humans doing all the information
processing. However, in the absence of strong consideration of COC so far, there has
been no strong selective pressure to inhibit appearance of multiple standards, divergent
evolution of standards, and variation in use. This presents an interoperability challenge
with opportunities for basic research in the computational sciences, and researchers
interested in several disciplines [4-10] (footnote 2) may also be interested in the present report. It is
not least a challenge as COC is envisioned in the following US Federal report.
Footnote 1. “Artifact” (or “exchange artifact” or “communication artifact”) is a term increasingly used in the emerging
medical IT interoperability field, and especially the continuity of care field, for any construction that carries and
represents a packet of transmitted medical information. Traditionally, for data from one patient, it is a
transmission in some kind of specialist messaging language or an XML document. XML documents
contain tags describing, containing, and delimiting text as information. Such tags comprise strings within
angular brackets <…>. Q-UEL also uses these, though also because, by a remarkable coincidence, they
form the basis for packets of information in a notation used in physics [1], as also described below.
Footnote 2. HIE between systems with differing vocabularies, ontological structures and worldviews presents
significant semantic challenges. “Big data” mining remains directly important for
assessing the quality of COC [4]. Selecting best diagnosis and therapy for a patient is the primary use
case of COC, and clinical decision support systems (CDSS) use challenging and usually probabilistic
concepts that along with semantics relate to artificial intelligence (e.g. Refs. [5-8]). Our Q-UEL approach
[1,8] is rooted in quantum mechanics (QM) [9,10]. Not least, information exchange in COC increases the
demand for innovation in security and privacy, one of several controversial aspects discussed in Section
1.2.
1.2. The PCAST Report.
Recent US government emphasis has been on ensuring that diverse EHR
implementations have at least common required capabilities called meaningful use [11],
but this seems a considerable compromise after the more radical proposals in a
controversial report by the President's Council of Advisors on Science and Technology
(PCAST) in December 2010 [12]. It was concerned with the diversity of entrenched
standards for representation of the EHR and expressly proposed a single universal
exchange language (UEL) for healthcare. PCAST proposed that “…existing standards
groups would publish mappings of existing vocabularies and content standards … into
the adopted markup language. This straightforward step immediately expands the
semantically meaningful realm of tagged data exchanges to include data that are coded
in these existing standards.” PCAST did not require that the UEL be expressed in any
existing standard, speaking of UEL as “XML-like”, and stating “We believe that the
natural syntax for such a universal exchange language will be some kind of extensible
markup language (an XML variant, for example) capable of exchanging data from an
unspecified number of (not necessarily harmonized) semantic realms. Such languages
are structured as individual data elements, together with metadata that provide an
annotation for each data element” (our italics). A strong feature of the PCAST proposal
was the desire for a UEL to make the patient record available anywhere, anytime, via
the Internet, to authorized persons for the benefit of the patient. To that end they
proposed a mechanism for added security, privacy and granularity as disaggregation “of
complex records into the smallest possible data elements”.
1.3. Activity since PCAST: Reactions of Established Standards Organizations.
PCAST asserted that developing a UEL “incorporates these standards into the new
architecture, leveraging the work done by thousands of people for decades”, but the
implications are controversial. Once a sufficiently powerful UEL is consolidated and
stable, there seem no overwhelming reasons why EHRs and messaging artifacts should
not stay in a UEL form, effectively displacing the standards, except that whole
communities and many applications are built around each standard. However, PCAST
was suggesting that a UEL may be the route of overall least effort as a universal second
language. Q-UEL positions itself that way. Not least, bidirectional communication
between N distinct standards and implementations of them requires developing 2N
conversion procedures if a UEL is used as a single hub language, but N(N - 1) are
required if a hub language is not used. Despite that, standards bodies still appear to feel
that replacement by a UEL remains a possibility and threatens their authority over their
domains. Certainly it is the case that the major standards bodies are placing their efforts
on extending their standards to provide COC and strengthening interoperability between
centers of care and variants of their standard (Sections 1.8, 1.9), as well as on
satisfying meaningful use [11].
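The counting argument for a hub language can be checked with a few lines of code. The following is only an illustrative sketch; the function names are ours and are not part of Q-UEL or of any standard. It counts one-directional conversion procedures for N standards with and without a hub.

```python
def converters_with_hub(n: int) -> int:
    """Each of n standards needs one converter into the hub and one out of it."""
    return 2 * n


def converters_pairwise(n: int) -> int:
    """Without a hub, every ordered pair of distinct standards needs a converter."""
    return n * (n - 1)


if __name__ == "__main__":
    # Tabulate both counts for a few illustrative values of N.
    for n in (2, 3, 5, 10, 20):
        print(n, converters_with_hub(n), converters_pairwise(n))
```

The two counts coincide at N = 3 (six procedures each), and the hub approach requires fewer procedures for every N greater than 3.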
1.4. Activity Since PCAST: A Healthcare Role for the Semantic Web.
The Semantic Web (SW) [13-15] seeks to go beyond the current web that links web
pages by linking all data and knowledge through a hub of common meaning, accessed
by URLs (links), i.e. by the RDF method [14]. PCAST also stated, “The physician would
be able to securely search for, retrieve, and display these privacy-protected data
elements in much the way that web surfers retrieve results from a search engine when
they type in a simple query.” Consequently, we argued [1] that development of the
medical SW would best satisfy the PCAST proposal. It would also enable clinical
decision support systems (CDSS) that have been developing for many years, though
traditionally as “off line” expert systems using human experts to type in rules held locally
[16]. It would allow such knowledge to be pooled and shared as it also satisfied growing
demands for patient health information integration at a more nationwide level [17]. It
would ideally need to be a broader WW4, a Thinking Web based on probabilistic
semantics [18]. There is an essential absence of even basic features of the SW and its
RDF based approach [15] in the current EHR and HIE standards, so our proposal of Q-UEL was at that time unusual, but in 2013 the Yosemite Manifesto [19] proposed that
“electronic healthcare information should be exchanged in a format that either: (a) is an
RDF format directly; or (b) has a standard mapping to RDF”. We signed this because
Q-UEL has such a mapping.
1.5. Other Efforts After or Relevant to PCAST.
The first steps of development of Q-UEL as a PCAST-like UEL solution had rather little
to draw on except the preexisting standards and the PCAST report itself. Growing calls
for a universal EHR prior to PCAST (e.g. Ref. [17]) were still very recent. Probabilistic
semantics is still not a settled discipline (e.g. Ref. [18]), and the particular probabilistic
theory used in Q-UEL only goes back to 2007 [20], albeit based directly on mathematics
developed by Dirac in the 1920s and 1930s [9,10]. There are, however, several recent
efforts in COC that come close to a UEL in their potential effect. For example, EPIC is
very active [6], and in a sense does pursue the position of the lingua franca of EHRs as
well as the end point for all health information. The Health Record Bank Alliance [21]
promotes another solution which allows health record archives to aggregate data for an
individual. There is much effort that reflects the trend in meaningful use [11] to allow
patients and health-conscious individuals variously to access, interact with, generate, or
control use of their data (footnote 3). The relevance here is that such efforts are increasingly
making use of JSON (JavaScript Object Notation) [22], in some sense a UEL as
discussed in Section 1.12. It probably only comes closest to being a web-based UEL,
however, in still rather special forms like JSON-LD [23], which link web data. There
are recent efforts to develop extensions to existing EHR standards for COC and
interoperability [24, 25], but they are not truly a UEL because focus is on communication
within a standard and between variants of it. Statistical overview of the quality of future
COC is seen as very important, as is the quality of the clinical data itself (e.g., the
National Quality Forum and the Quality Data model [26]). There is growing interest in
data mining HIE [4] and in web based probabilistic methods (e.g. Refs. [27-29]). Q-UEL
owes a considerable debt to many efforts concerning biomedical and general biological
information and the SW [31-52], and will benefit from relevant advances in high
performance “big data processing” (e.g. ref [53]), though it is notable that this large body
of work rarely touches upon probability.
1.6. Data Quality Representation and Controversial Probabilistic Aspects.
Probability does not immediately spring to mind as an issue for patient records. PCAST
did not address it, except implicitly in regard to data mining, and most stakeholders take
patient records as fact, albeit with a cautious eye for any possible irregularities.
However, Q-UEL is based on Dirac notation and algebra for quantum mechanics, so it
is natural for us to consider the uncertainties inherent in observations and
measurements and the probabilistic inference from them that characterize that
discipline. A physician comparing data for a patient with that from a population is not in
a fundamentally different position from a physicist comparing experiment and theory.
For example, diagnosis should ideally be based on comparing the joint distributions of
data dispersions for the single patient with those for the same kinds of measures from
populations, in order to compute the probability that the patient is, or is not, in a normal
state of health. The same idea is seen (albeit usually using rather extreme simplifying
assumptions of normal distribution and independence) in the routine use of the “normal
range” of each clinical value, but it should properly consider dependence upon other
measurements and denominators such as sex, location, ethnicity, genetics, personal
and family history, comorbidities present, and lifestyle. Hence Q-UEL also allows for
representations of more complicated probability distributions as vectors and matrices
[1]. That for inference purposes Q-UEL interprets these as algebraic values of tags
treated as Dirac algebraic entities may look controversial outside of physics, but that
such information should be conveyed in some kind of way is essentially classical
statistics. The real controversy is about how much extra detail of this probabilistic nature
any kind of UEL effort should compute and put on its tags or other artifacts for
specific patients before it appears to be imposing its worldview. Our preferred methods
for computing probabilistic quantities must stand alongside many diverse offerings in
CDSS that have traditionally been innovative and controversial in theory and method
from the outset [16], and those few SW efforts that are probabilistic can have different
kinds of probabilistic measure as input and output, e.g. Refs. [27-29]. Not surprisingly,
medical record standards bodies consider probabilistic interpretations as out of scope
and matters for developers of end-user analytical applications, who agree. Any further
debate on this may even be premature. All players await widespread automated feeds
from laboratory information management systems to provide greater mention of data
quality in source clinical documents. Q-UEL has a mechanism for deferring probabilistic
aspects (Section 2.2).
Footnote 3. They will be reviewed elsewhere, but they include, for example, BlueButton, Validic, FitBit, Jawbone,
Moves, and Withings, the Patient-Powered Research Networks, and the Self-Generated Health
Information Exchange (SGHIx).
1.7. Conversion to Q-UEL: Use of Ontology and Nomenclature Standards.
The focus in this report is on the appearance of Q-UEL tags for COC and on their
disaggregation, but comparison of Q-UEL with major standards efforts in the following
Sections is most usefully done in the light of studies of how readily it interconverts with
them. While the sense in the field is that standards interconversion requires
sophisticated semantic methods, it seems much easier when a UEL is used and used in
the manner that PCAST described (Section 1.3). One can use brute force and write
specific converters, one per standard. For Q-UEL, it involves “hand crafting” tags within
Q-UEL's Perl-based applications [1] which are then generalized to become match-and-edit instructions (“regular expressions”) in converters. Most effort goes not into
expressing and using the important clinical information but into preserving information
about the complex semantic and ontological structure of the source in order to convert
records back into that source (although that is not in principle essential for a UEL; Section 1.3). It is desirable to think of any UEL as going beyond a simple “drop box” for
content from different standards. Q-UEL already described data in many
nomenclatures, e.g. representing molecular formulae of drugs to interact with chemoinformatics applications, and in an ontological graph structured way [1]. This used Q-UEL's XML-like attributes extended by attribute metadata language (AML). It relates to
PCAST's “structured as individual data elements, together with metadata that provide
an annotation” (Section 2.1). It also relates to the SW, because Q-UEL tag structure as
semantic triple [15] expressions with AML-based attributes as arguments allows tags to
carry the relevant cross-language dictionary and “grammar” with RDF references where
necessary. By AML, Q-UEL's simple methods for specifying sources and nomenclature
are essentially the same: data elements can be like current children in a family tree that
threads back by line of descent, specifying the nomenclature used and then the source
that used it, highlighted by a CODE attribute. For example, we might see the string
SOURCE:=CODE:='HL7 C-CDA(R1.1 CCD)':=…CODE:= 'SNOMED (RF2 ICD 10)':=....
where the ellipsis '…' implies other metadata, brackets (…) for graph structure, RDF
links, and/or values [1]. The current use of AML in COC is described in Section 3.2.
1.8. Comparison of Q-UEL with Continuity of Care Efforts: HL7 CDA/CCD.
There are currently several large-scale COC interoperability efforts such as the
Standards and Interoperability (S&I) open government initiative [24] noted above, which
we monitor and in which we in some cases participate. The S&I efforts primarily focus on HL7's use
of XML as Clinical Document Architecture (CDA) [54, 55]. It replaced the older HL7 V2
messaging, which was not XML-based but is still widely used because updating installations
to CDA is not trivial [56]. Of particular interest to us has been the eHealth effort [25]. It is
technically part of S&I, but tackles the integration of European and US healthcare that
we presumed would be a challenging case for COC. The challenge for HL7
interoperability in general is that CDA versions have shown considerable evolution [57-59] and each implementation in an institution can differ from others, because COC
outside its boundaries has previously been seen as less of an issue. Matters are harder
still between countries. HL7's CCD [59] seen as a messaging artifact is a restricted
subset of CDA that is strongly directed to US healthcare practices and processes. This
made it far from trivial for eHealth [25] to convert to, or interoperate with, CDA as used in
other countries, such as in the European epSOS initiative [60]. Both use the HL7 CDA
R2.0 basis, but the matter is specifically one between HL7 C-CDA R1.1 CCD and epSOS PS
v1.4. Faced with restrictive deadlines, the eHealth goal is, at this moment of writing,
essentially a whitepaper demonstrating what appears feasible. Q-UEL has focused on
epSOS CDA [60] because it might reveal new perspectives by not being US-oriented.
Indeed, epSOS documents are fairly rich in specifying stakeholders including those
managing the documents, all of which can provide information for provenance, workflow
management, and triggering events in COC (see Sections 1.9, 4.2).
1.9. Comparison of Q-UEL with Continuity of Care Efforts: VistA.
The US Department of Veterans Affairs VistA system [61] is widely used in the US and
at the time of writing is being considered for deployment in other countries, notably the
United Kingdom. VistA content is attractive for considering workflow because it tends to
provide very detailed accounts of “triggered” events involving, for example, prescriptions
(Section 4.3). VistA is also well known to vary in implementations and there has
similarly been a significant rise in VistA activity since PCAST. VistA is not based on
XML but on the old but ingenious MUMPS or “M” programming language, originally
developed for medical applications in the 1960s [62]. Its features encourage a
programming style where expressions resulting in data base entries can be little
ontologies very like attributes using Q-UEL's AML (Section 1.7, and Ref. [1]). One can
copy MUMPS source code and automatically edit it by, say, a Perl script to surround
data written to a database with Q-UEL tag features, so writing Q-UEL tags directly to a
database. Some VistA implementation features obscure the Q-UEL affinity so that it is
often as easy to intercept input and output, but its structure is that which the MUMPS
code implies. The overall effect is a rather granular one, emphasized in the following.
1.10. Comparison of Q-UEL with Continuity of Care Efforts: EAV Models.
Granularity is a notion highlighted by design in the Entity-Attribute-Value or Event-Attribute-Value (EAV) model [63, 64]. EAV is not a specific standard or language, but it
has long been used as a concept in an on-going effort by many data miners to convert
XML documents and, in particular, HL7 CDA, into a more accessible interoperable form,
at least as far as data mining is concerned [1]. In effect, the EAV approach starts with
unstructured data mining including use of text-analytic and SW tools to convert source
into a more granular structured form. What is usually meant by EAV is a table with data
recorded as just three specific columns: the entity or event such as the patient, the
attribute or parameter such as “pulse” usually linked to a table of attribute definitions
and other information, and the value of the attribute, such as the pulse rate as a
number. Q-UEL's AML could in principle similarly have 'patient X':=pulse(beats/minute):=80
or 'patient X':=pulse(beats/minute):=www.qexl.org/pulse:=80. However, starting
only with descriptions like that, one would obviously lose ontological structure as the
organization of the data of the patient as a whole that other sources like XML provide.
Compared with XML, EAV is generally criticized as not being relational. Q-UEL is a
relational EAV model, with a relationship or predicate as in a semantic triple [15]. More
usually, though, when Q-UEL uses or creates EAV structures, “relationships” means an
XML-like ontological structure that AML can encode, as described in Section 3.2. The
following also have EAV flavor.
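To make the three-column EAV view concrete, the two AML-like strings quoted above can be split into table rows. This is only a minimal sketch under our own simplifying assumptions: the function parse_eav and the row type EAVRow are hypothetical names introduced for illustration, and splitting on ':=' is not the full AML grammar of Ref. [1], which also carries graph structure and other metadata.

```python
import re
from typing import NamedTuple, Optional


class EAVRow(NamedTuple):
    entity: str               # e.g. the patient
    attribute: str            # e.g. "pulse"
    unit: Optional[str]       # text inside the brackets, if any
    reference: Optional[str]  # optional link to an attribute definition
    value: str                # the recorded value, kept as text


def parse_eav(expression: str) -> EAVRow:
    """Split a simplified 'entity:=attribute(unit)[:=link]:=value' string."""
    parts = [p.strip() for p in expression.split(":=")]
    if len(parts) not in (3, 4):
        raise ValueError("expected 3 or 4 ':='-separated parts")
    match = re.fullmatch(r"([^()]+)(?:\((.*)\))?", parts[1])
    if match is None:
        raise ValueError("unrecognized attribute form: " + parts[1])
    reference = parts[2] if len(parts) == 4 else None
    return EAVRow(parts[0].strip("'"), match.group(1).strip(),
                  match.group(2), reference, parts[-1])


if __name__ == "__main__":
    print(parse_eav("'patient X':=pulse(beats/minute):=80"))
    print(parse_eav("'patient X':=pulse(beats/minute):=www.qexl.org/pulse:=80"))
```

Rows of this kind recover the entity, attribute and value, but, as noted above, they carry none of the ontological structure of the patient record as a whole.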
1.11. Comparison of Q-UEL with Continuity of Care Efforts: HL7 Pipe-hat.
HL7's older V2 in many respects follows the EAV model as a messaging system, often
called “pipe-hat” due to its pipe '|' and hat '^' delimiter characters. Pipe-hat could not be
described as an ongoing effort because further development of it is deemphasized by
HL7 in favor of the XML-based approach, and so our own efforts here are preliminary.
However, it will become important for pipe-hat users to convert to another
representation, and ironically, while pipe-hat conversion to XML is somewhat difficult
[59], conversion to Q-UEL is relatively easy because of the granular EAV nature. A
distinct string in a pipe-hat message describes, say, the reaction to codeine (an allergy
or AL message string found in, e.g., HL7 V2.5.1 pipe-hat), which can become a Q-UEL
attribute. Metadata are consistent but variations in expressing content are considerable.
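The granular nature of pipe-hat can be seen by splitting a segment on its two delimiters. The sketch below is an assumption-laden illustration and not the Q-UEL converter: the example segment is made up and only loosely modeled on an allergy segment, the positional labels (segment name, field number, component number) and the ':=' output form are our own choices, and escape sequences, repetitions, subcomponents and the special header segment are ignored.

```python
from typing import List, Tuple


def flatten_segment(segment: str) -> List[Tuple[str, str]]:
    """Split one pipe-hat segment into (position label, value) pairs."""
    fields = segment.strip().split("|")
    name = fields[0]
    rows = []
    for i, field in enumerate(fields[1:], start=1):
        for j, component in enumerate(field.split("^"), start=1):
            if component:  # skip empty components
                rows.append((f"{name}-{i}.{j}", component))
    return rows


def to_attribute_strings(rows: List[Tuple[str, str]]) -> List[str]:
    """Render each pair in an attribute-like 'label:=value' form."""
    return [f"{label}:={value}" for label, value in rows]


if __name__ == "__main__":
    # Hypothetical segment for illustration only.
    example = "AL1|1|DA|^CODEINE|MO|NAUSEA"
    for line in to_attribute_strings(flatten_segment(example)):
        print(line)
```

Each non-empty component becomes one entity-attribute-value style item, which is why the conversion is mostly a matter of relabeling positions with agreed attribute names.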
1.12. Comparison of Q-UEL with Continuity of Care Efforts: JSON
Overall, we see JSON [22] as a newly arising challenge to Q-UEL although it has been
developing since about 2001. JSON is currently mainly used to transmit data in general
between a server and web applications. What is transmitted is not confined to medical
data, but it already has the considerable advantage over Q-UEL of having a significant
number of medical installations. In principle, JSON poses little more competition to Q-UEL than does traditional XML. Where JSON is argued to be superior to XML, the
arguments include those which Q-UEL uses to indicate its own superiority to XML, i.e.
that XML is comparatively verbose, hard to read in practice, slower to process,
inefficient at handling of certain types of data, has a need to “escape” many characters
that have special reserved functions, and lacks association with any family of
programming languages. Advocates of XML argue that compared with JSON it is no less easy to
read if formatted correctly, and is extensible and flexible. It could be argued that Q-UEL
satisfies both, primarily by building on XML towards a programming language form (and
retaining miscibility with, but not absolute dependence on, Perl [1]). Nonetheless,
JSON's features, such as attribute-value pairs, are already Q-UEL-like and give JSON
some status as a UEL with EAV flavor and make conversion to Q-UEL relatively easy.
Consistent use of JSON for the EHR is not yet established, so Q-UEL interconversion
with JSON is preliminary. There are significant variations in implementation because
new organizations often use JSON as their first entry into encoding patient data. In
practice, JSON's recently rising popularity as a candidate for the EHR is probably
largely because of low development and maintenance costs compared to existing XML-based solutions. It is also seen as compatible with NoSQL type databases which allow
well-structured and efficient storage at typically less cost than relational data base
solutions. JSON is also exchangeable with web-based technology (like Node.js on the
server side and HTML/JavaScript), although as noted above, JSON itself arguably really
only comes close to a web-based UEL in forms like JSON-LD [23]. This allows an
application to start at one piece of linked data, and follow embedded links to others
hosted on different sites across the Web. However, Q-UEL is not technically behind
here: for example, Q-UEL's X-tract tags gather their own data content [1] by
automatically surfing the web to spawn new X-tract tags, each containing a canonical
rephrasing of a chunk of the source text in a way that maps to semantic triples. JSON
may also be popular simply because there are many programmers familiar with the
Java-like family of languages. Like Q-UEL, JSON essentially represents a programming
language, in JSON's case by being developed from JavaScript, but JSON as usually
applied is simply concerned with emulating XML layout in JavaScript format. To our
perception it is not a desirable format change, considering the useful similarity between
XML tags and the Dirac notation that provides a system of probabilistic algebraic
objects. JSON makes no such claims.
We do not really have a UEL for healthcare if we actually use a different language like
JSON for each different aspect of it, but even confining debate to the COC domain, Q-UEL was designed to support several features directly important to COC at a
fundamental level. These include disaggregation, a powerful AML attribute structure,
and RDF references. Also, arguably important in the future is Q-UEL's treatment of
measurement uncertainty and its power as a probabilistic algebra (Section 1.6).
1.13. Features of the Present Report of Interest to Stakeholders in Healthcare.
The journal invited us to describe how the present work could have impact on the work
of stakeholders such as physicians (Section 1.1), who could therefore also be interested
readers. In general, overcoming the concerns expressed in the 2010 PCAST report
would, of course, be beneficial, and stakeholders are referred to that report [12]. The
following points may be particularly borne in mind.
a. Physician Authority. One aspect of PCAST 2010 that is controversial for physicians is
that disaggregation and increased granularity also appear to have been favored by
PCAST [12] because each stakeholder need only see what he or she directly needs to
know (Sections 2.4 and 5), in contravention of traditional physician authority and holistic
principles of medicine. However, these matters are up to future public opinion and
legislature; they are not an automatic consequence of disaggregation. The technology
described below allows flexibility. It could put more control in the hands of patients, or of
personal physicians, or both in collaboration, by means of a fine grained consent
language written into the source record (Section 3.6).
b. Simplicity of Applying Disaggregation. When a patient is travelling abroad and the latest
EHR or medical summary is needed, it can be quickly obtained from the Internet, by
those authorized and consented to see it. No special server configuration is required; a
simple website would suffice (indeed, a patient or traveler in general could simply
distribute disaggregated data as attachments across email providers and drop-boxes, as
a method of keeping copies of important documents that might be lost while traveling). If
the reaggregation application that reassembles the disaggregated data is not at hand to
the physician, a basic one could be downloaded freely from the web. Armed with a
reaggregation application, all that is required to obtain the medical data is, even in the
rather elaborate setup described in this report, either some four passwords or keys, or a
digital certificate and a “job number”. It could be simplified to a single password. If a
legitimate medical worker in a foreign country lacks recognized authority, then that can
be assigned as a kind of separate key.
c. The Patient Record as Query to the Web. In the longer term, a UEL with such a
semantic structure will be able to interact fluidly with a future medical Semantic Web
particularly designed to help physicians and patients. It is particularly well seen as an
aspect of COC when a single specific patient record is transmitted and reaggregated
and plays a central role in clinical decision support: in effect it can be, in whole or part,
the query: “What does this predict for the patient, what can be best done to enhance the
probability of a favorable outcome and future, and what gaps in knowledge about the
patient need to be filled?”
d. Medical Notation and Educational Value of Q-UEL. Perhaps least obviously, Q-UEL also
aspires to be a medical notation, so there is perhaps more interest to be found by a
physician or medical student directly looking at (unencrypted or decrypted) Q-UEL than
is typically the case with other computer languages. Originally, it was felt that the ability
of medical personnel to read directly and understand tags or similar transmitted artifacts
might be a lifesaver when the infrastructure fails or becomes potentially overloaded, as
in New York on 9/11, but when texting, faxing, or couriers are still possible [1]. Q-UEL
stripped of web management details (which are nonetheless not too distracting) seems
to provide an intuitive notation despite its origins in theoretical physics. It is capable of
describing patient data, evidence based medicine (EBM) and epidemiological measures,
medical knowledge, and healthcare workflows. Aspects of it are currently being taught to
medical students in an EBM course by author BR.
It remains that the larger part of the present report will be of more interest to systems
designers, algorithmists, and developers in the HIE domain. To aid such persons, an
established mathematical paradigm, such as the Dirac notation in the present case, is
always a useful guideline as follows.
2. Theory
2.1. Q-UEL Medical Notation.
Many of the considerations specifically required for COC might not be expected to be
very theoretical, but the tag structure and algebraic interpretation of Q-UEL based on
Dirac's QM does make it well suited to the PCAST view of COC. As for Q-UEL in
general, the Dirac notation determines the format of what is yet to be disaggregated,
what disaggregated encrypted and consented tags look like, and summary information
from data mining all these. The commonest Q-UEL tags express Semantic Web
semantic triples in the form of Dirac's bra-operator-ket. They are formally probability
duals (Pfwd, Pbwd) [1,8].
< subject expression Pfwd:=x | relationship expression | object expression Pbwd:=y>
Tag value attributes Pfwd and Pbwd represent the tag's algebraic value, the h-complex
probability dual (Pfwd, Pbwd) [1, 5]. For example, Pfwd = P(“Type 2 diabetes causes
obesity”), but Pbwd = P(“Obesity causes type 2 diabetes”). Pfwd and Pbwd are not of
further concern here for reasons in Section 1.6, and the implied dual is (1,1) = 1.
However, Pbwd:=0 will be seen to indicate irreversibility, as in workflow. The default
value for any quantity not mentioned, including a probability, is 1. Q-UEL's deferment
“mechanism” (Section 1.6) is that probability 1 does not necessarily indicate “100% true”
but can mean ignorance about probability or deliberate choice not to convey it, lack of
communicated information I = -log(P) = -log(1) = 0, a statement posited awaiting any
possible refutation according to Popper's principle, and lack of quantitative impact in a
purely multiplicative inference net [1,8] as if it were absent.
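As a minimal illustration of this default convention, the following Python sketch shows that an unstated probability of 1 carries zero information and leaves a purely multiplicative product unchanged, while Pbwd:=0 marks irreversibility. It is not part of any Q-UEL implementation: the class name Tag, its fields, and the function net_product are hypothetical, and ordinary real-valued probabilities are assumed in place of the h-complex dual.

```python
import math
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical stand-in for a tag < subject | relationship | object >."""
    subject: str
    relationship: str
    obj: str
    pfwd: float = 1.0  # the default for any quantity not mentioned is 1
    pbwd: float = 1.0

    def information(self):
        """Return (I_fwd, I_bwd) with I = -log(P); P = 0 gives infinite I."""
        def info(p):
            return math.inf if p == 0 else math.log(1 / p)
        return info(self.pfwd), info(self.pbwd)

    def is_irreversible(self):
        """Pbwd := 0 is read as 'cannot run backwards', as in workflow."""
        return self.pbwd == 0


def net_product(tags):
    """Forward value of a purely multiplicative chain of tags (real-valued)."""
    return math.prod(t.pfwd for t in tags)


silent = Tag("patient arbitrary ID", "consented", "clinical factor")
stated = Tag("clinical factors", "when", "clinical factors", pfwd=0.8, pbwd=0.5)
event = Tag("prescription", "triggered", "dispensing", pbwd=0.0)

print(silent.information())                                   # (0.0, 0.0)
print(net_product([stated, silent]) == net_product([stated]))  # True
print(event.is_irreversible())                                # True
```

A tag that states no probability therefore behaves in the product exactly as if it were absent, which is the sense of deferment described above.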
Although explicit probabilities Pfwd and Pbwd are deemphasized in this report, the
same controversial probabilities for single patients (Section 1.6) appear in the theory for
disaggregation. Without disaggregation, a particular combination of attribute values can
have a very small joint probability, and that could uniquely pinpoint one record out of
millions. It might perhaps ultimately identify, to a dedicated, unauthorized and malicious
person, the one patient who has that combination. Conversely, with disaggregation, we
must design matters so that if an unauthorized person might be able to decrypt every
shred of data from many patients in a mix, they still could not, even with substantial
knowledge of a patient, see which pieces join up with any high probability. The
probabilities of greatest interest now become those that relate to “bounds” on these, i.e.,
reflecting the fact that it is good to make illegal access more improbable than seems to
be required, but not less so. Notwithstanding that, they must also represent a balance,
the probabilities with which an authorized person can correctly reaggregate the
attributes in reasonable time, versus the probabilities that a malicious, determined, and
computationally well-equipped unauthorized agent might also do so. Note that between
those two, any disaggregation solution inevitably presents a notion of error, of a certain
probability that the tags joined do not belong to the same patient. Equivalently, with
appropriate verification mechanisms, that will manifest less dangerously as the problem
of persistently failing to find tags that do join.
Evidently, disaggregation systems must be designed so that probabilities of undesirable
effects are astronomically small. This is primarily a matter of matching between strings
carried on disaggregated tags and strings generated by the reaggregation application.
That is, the above kinds of probabilities are not written explicitly on tags in Pfwd and
Pbwd attributes, but are embodied in seemingly random character strings, called join strings, carried as values
of other special “tag value” attributes. They proactively determine, not passively report,
the probabilities involved. In effect, it is as if an unauthorized person has to guess a new
password to rejoin each and every data element while also guessing which tags belong
to the relevant patient, and the system can be designed so that these two are not the
same thing.
2.2. Q-UEL Tags in Continuity of Care.
The tag examples most prevalent for COC applications and that also use the default of
probability 1 are as follows, with the exception of (4) that is a statistical summary over
many patients, and (5) that displays Pbwd = 0 to indicate irreversibility. Types (6), (7)
carry the seemingly random join strings mentioned in Section 2.1 that enable their
reaggregation, but perhaps surprisingly (8), which is the only tag that carries reversibly
encrypted clinical data, does not (see discussion below).
(1) < patient ID and demographic data | has | stakeholders and clinical factors >
(2) < patient arbitrary ID | consented | clinical factor >
(3) < patient arbitrary ID | consented jointly | clinical factors >
(4) < clinical factors | when | clinical factors >
(5) < patient ID and/or demographic data and/or stakeholders and/or clinical factors
| triggered | stakeholders and/or clinical factors > (or “should trigger” etc., see Section 3.5)
(6) < irreversibly encrypted mapping to tag type (7) and isomorphic mapping to other tag
types (6) that come from same record |
(7) | irreversibly encrypted mapping from tag type (6) and to next tag type (6) in sequence ><
irreversibly encrypted mapping from encrypted data element(s) on tag (8) |
(8) | reversibly encrypted clinical data element(s) and optionally authority >
The functions of these tags are as follows. Tag type (1) is essentially the medical record or
sub-record as a selected part of it. Tag types (2) and (3) are only released if consented
to be so, subject to fine-grained consent instructions in the source record. Simply by
containing more than one clinical factor, type (3) poses more risks to privacy than type
(2). Usually having visible content, they are freely available for data mining for research
including quality control. The results that produces, as statistical summaries over many
patients, are represented by tag type (4) described in Ref. [1]. A type (5) tag records a
workflow event that can be used to update tag (1). Types (1) and (5) generate many
encrypted tags of form (6), (7), and (8), and optionally many of the tags consented for
research. Use of tags (6), (7), (8) is the case considered in this report. Three
components are required to store and transmit a single data item d, so this is called a
triple shred of conceptual form <…| x |…><…| x |…> where the x represents joining (see footnote 4).
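The three-component layout can be pictured with the following Python sketch. It is an illustration only: the class and field names are hypothetical, truncated HMAC-SHA256 is merely an assumed stand-in for the irreversible mappings, the isomorphism link between bra tags is omitted, and the ciphertext is assumed to have been produced elsewhere by a reversible cipher.

```python
import hashlib
import hmac
from dataclasses import dataclass


def irreversible(key: bytes, label: str, width: int = 12) -> str:
    """Assumed stand-in for an irreversible mapping: truncated HMAC-SHA256."""
    return hmac.new(key, label.encode(), hashlib.sha256).hexdigest()[:width]


@dataclass(frozen=True)
class Bra:                # tag type (6)
    to_ketbra: str        # join string pointing at the ketbra


@dataclass(frozen=True)
class KetBra:             # tag type (7)
    from_bra: str         # matches Bra.to_ketbra
    to_next_bra: str      # leads on to the next bra in sequence
    from_ket_data: str    # irreversible mapping of the data on the ket


@dataclass(frozen=True)
class Ket:                # tag type (8): carries data only, no join string
    ciphertext: bytes


def triple_shred(key: bytes, index: int, ciphertext: bytes):
    """Build the bra, ketbra and ket that together carry one data item."""
    link = irreversible(key, f"bra-ketbra:{index}")
    bra = Bra(to_ketbra=link)
    ketbra = KetBra(
        from_bra=link,
        to_next_bra=irreversible(key, f"bra-ketbra:{index + 1}"),
        from_ket_data=hashlib.sha256(ciphertext).hexdigest()[:12],
    )
    return bra, ketbra, Ket(ciphertext=ciphertext)


key = b"illustrative record key"
bra0, ketbra0, ket0 = triple_shred(key, 0, b"<ciphertext of data item d>")
bra1, ketbra1, ket1 = triple_shred(key, 1, b"<ciphertext of next item>")
print(bra0.to_ketbra == ketbra0.from_bra)    # True: bra joins to ketbra
print(ketbra0.to_next_bra == bra1.to_ketbra)  # True: ketbra leads to next bra
```

In this sketch the ket is located only through the digest of its own ciphertext held on the ketbra, which mirrors the statement that tag type (8) carries no join string.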
2.3. Disaggregation into Bra, Ketbra, and Ket Tags.
While placing patient data on the Internet seems risky (Section 1.13 (a)), the
disaggregation approach aspires to turn this risk on its head, and to advantage. The real
impact of disaggregation as a security feature is in the role of an addition to encryption
when disaggregated data elements are shreds mixed with shreds from hundreds of
thousands or millions of other patient records. The additional security feature this
provides is really the challenge of finding several specific needles in a haystack of
needles that all look like equal candidates. Dirac's notation, and the physics it reflects,
seem helpful by providing two particular principles as a conceptual framework for this
challenge. First, Dirac notation carries an inherent notion of aggregation. Entities <A|
and |B> are representations of physical states analogous to nouns or noun phrases in a
natural language but aggregated forms as products <A|B> (shorthand for <A| times |B>)
and <A| R |B> also exist as single algebraic objects. They describe the relationship or
transformation between the states, and are analogous to a simple sentence. Second,
there is the notion of an entity called an index as describing what is essentially the same
state irrespective of how we measure or name it, so that we might write say |3>, and the
notion of operators increasing or decreasing an index, say of converting state |2> to
state |3> or vice versa respectively. The aggregation <1|2><2|3> makes physical sense
by the “chain rule” [8], an expression that estimates <1|3> with certain independency
assumptions, but makes no sense seen as a vector or matrix product of <1|2> with
<2|3> because each is scalar (though there is a relationship, making indices the more
general method, as described below). In practice, the index must relate much less
obviously to the join string (Section 2.1), including by increasing the index, and the way
of doing this is explained below (Section 2.5).
Footnote 4. Contrast the simplest possible disaggregation case of encrypted data elements d1, d2, d3 etc. carried in
corresponding tags <…d1…> x <…d2…> x <…d3…>; this constitutes a single shred, meaning a single
shred per data item. One can also have a double shred using only tag types (6) and (8) and of conceptual
form <…| x |…>, and a multiple shred using tag types (6), (8) and multiple tag types (7), of final form <…|
x |…><…| x |…><…| x |…><…| x |…>….<…| x |…>.
The reason why the above tags (6)-(8) had the forms of bra <…| (a row vector), of
ketbra |…><…| (a matrix) and of ket |…> (a column vector) respectively is that in Dirac
notation <A| R |B> is also an expression as a non-commutative (order dependent)
product, reversal of which can be seen as an aspect of disaggregation or shredding,
i.e., <A| R |B> → (<A|) (R) (|B>) with R representable by some product |X><Y|. It
represents the disaggregation of each data element B from a record, A and R having
linking functions. Recall that <A|B> and <A| R |B>, although products of vectors and
matrices, are not themselves vectors or matrices but simply scalars (usually, however,
complex scalars, i.e. with an imaginary part). Because the scalar value contains less
information than the vectors and matrices that each represent arrays of many such
scalar-like elements, it is aggregation, not disaggregation, that is associated with
information loss. This is contrary to what we would normally think of in regard to entropy
increase in a document shredding process. One might imagine an algorithm in which the
available vectors and matrices are generated so that only particular ones will aggregate
to give an object with a particular required scalar value. In practice, this is complicated
and inefficient, and when we first disaggregate a large <…|…|…> form into many
smaller <…|…|…> it seems a less appropriate model. Indices become useful as more
generally applicable. An index approach is also a general kind of symbolic
representation of a vector-matrix approach, because in the above example <1|2><2|3>
one could have considered it as meaning <1| R |3> where <1| and |3> are vectors and
R = |2><2| is a matrix.
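The identity used in the last sentence can be checked numerically. The sketch below uses plain Python lists of complex numbers (the vectors are arbitrary illustrative values, not Q-UEL data, and the function names are our own) to confirm that <1|2><2|3> equals <1| R |3> with R = |2><2|, and that the aggregated form is a single scalar.

```python
def braket(a, b):
    """<a|b>: conjugate the bra components, multiply, and sum to a scalar."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def ketbra(a, b):
    """|a><b|: the outer product, a matrix given as a list of rows."""
    return [[x * y.conjugate() for y in b] for x in a]


def sandwich(a, r, b):
    """<a| R |b>: row vector times matrix times column vector, a scalar."""
    return sum(
        a[i].conjugate() * r[i][j] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )


v1 = [1 + 2j, 0j, 1j]
v2 = [2 + 0j, 1 - 1j, 0j]
v3 = [1j, 1 + 0j, 3 + 0j]

chain = braket(v1, v2) * braket(v2, v3)   # <1|2><2|3>
via_r = sandwich(v1, ketbra(v2, v2), v3)  # <1| R |3> with R = |2><2|

print(chain)           # (14+2j)
print(chain == via_r)  # True
```

Three vectors of three components each have collapsed into one complex number, which illustrates why aggregation, rather than disaggregation, is the step associated with information loss.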
Theoretically, what disaggregation is doing is adding a dimension of entropy protection
on top of encryption. It is as if an encrypted patient record on paper is shredded by an
office shredding machine into a trash hopper containing encrypted shreds from many
other patient records, yet can be re-aggregated from the mix on demand by
authorized persons with appropriate keys and/or digital certificate. The hopper mix in Q-UEL jargon is a “tag soup”, and a basic Q-UEL principle is that all Q-UEL tags, for any
purpose, can reside in this soup, recoverable by queries. Fig. 1 exemplifies how this
happens by joining just two clinical data elements that correspond to QM measurements
5 and 10 in the figure, the values being reversibly encrypted character strings. These
are the only two real observations or measurements. Joining these and the other
“virtual” measurements as character strings makes use of the idea of features that are
mutually degenerate after certain index operations based on evolving keys, i.e. closely
but obscurely related (Section 2.5). The essential “marriages” of degenerate
measurements are (2,3), (4,5), (2,7), (7,8) and (9,10), in practice in that order. (1,2),
(3,4), (6,7) and (8,9) are permanently joined. Remarkably, reversibly encrypted real
data need be all that each ket carries, by use of a further irreversible encryption to a
string on the ketbra tag (7). Bra tags and also ketbra tags have rejoining functions to
rebuild the source record. (4,5) and (9,10) link ketbras to kets, and (2,3) and (7,8) bras
to ketbras. It is natural to think of (1,6) as joining bra to bra, and this is so, but
theoretically (2,7) would suffice. Join (1,6) has a verification function, but it becomes
essential as the information contents of measurements 2, 3, 7 and 8 are reduced by
taking small substrings of them to enhance security. The relationship between strings 1
and 6 is that they encode two graphs isomorphic to each other, i.e., they are really the
same graph expressed differently. Note that for security the entities represented by
disaggregated tags never actually physically rejoin in reaggregation: a tag initiates an
automatic query for the next via a receiving application.
Fig. 1.
2.4. Disaggregation and Reaggregation by Data Elements.
The implications of this aspect of the 2010 PCAST report are perhaps less obvious than
may first appear. One has, perhaps, a mental picture of clinical data written on pieces of
a child's construction set like Lego pieces, that can be plugged into each other and then
unplugged. Strictly speaking, this is closer to granularity, which has many advantages
but may or may not have something to do with disaggregation as a security feature. In
practice, the principles that Q-UEL uses are also applicable to shredding and
unshredding documents arbitrarily, without worrying what each shred represents. That
could work for COC, and it does not require meaningful granularity. That is fortunate
when one comes to shred, for example, medical images. However, apart from other
advantages of granularity, arbitrary cut-points in shredding that ignore the granular
nature of the content can sometimes be a bad security idea. If the shreds are illicitly
decrypted, they can then start to look like pieces of a solvable jigsaw puzzle. By way of
simple example, disaggregated shreds of data such as 'systolic BP (mmHg)':=160 and
'diastolic BP (mmHg)':=85 that keep integrity of data representation are (if decrypted)
actually much harder to see as belonging to the same patient than arbitrary shreds
“'systolic BP (mm” and “Hg)':=160 diastolic B” and “P (mmHg)':=85”. Much smaller
shreds say of one character would get round that, but be inefficient! The PCAST report
is often quoted as requesting “disaggregation by metadata”, which would translate as
disaggregation by Q-UEL attributes, though the term actually used was disaggregation
by data elements. Either way, the PCAST motivation for disaggregation coupled to
granularity is apparently partly because the same granular structure can then be used to
provide some authorities with subsets of all the data elements on a need-to-know basis.
One need only, in that viewpoint, reaggregate what is required.
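The jigsaw effect can be made concrete with a short Python sketch. The function names are hypothetical, and the test for a "dangling" fragment (unbalanced parentheses or quotes) is only one simple heuristic an attacker might use, chosen here for illustration.

```python
def shred_by_element(attributes):
    """One shred per data element, keeping metadata and value together."""
    return [f"'{name}':={value}" for name, value in attributes.items()]


def shred_arbitrary(text, width):
    """Cut the text every `width` characters, ignoring its structure."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def is_self_contained(shred):
    """True if parentheses and quotes balance, i.e. no dangling fragment."""
    depth = 0
    for ch in shred:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and shred.count("'") % 2 == 0


record = {"systolic BP (mmHg)": 160, "diastolic BP (mmHg)": 85}
granular = shred_by_element(record)
arbitrary = shred_arbitrary(" ".join(granular), 16)

print(granular)      # two complete attributes
print(arbitrary[0])  # 'systolic BP (mm
print(all(is_self_contained(s) for s in granular))  # True
print(is_self_contained(arbitrary[0]))              # False
```

A granular shred gives no syntactic hint about what it was cut from, whereas the broken edge of an arbitrary shred advertises that a matching neighbour exists.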
This above issue of partitioning of data on a “need-to-know” basis touches upon a
controversial area (Section 1.13) but gives some impression of what a “data element”
needs sometimes to be, i.e., in practice, a block of related knowledge. A ket data tag,
say |B>, will often contain just one data element (though there may be an authority
attribute and additional web management features). Nonetheless, Q-UEL's AML also
allows this one attribute to contain several metadata and values, perhaps in a
hierarchic, ontological structure that could map to at least a part of an XML document
[1], as a “bundle” of data elements. Attributes to be disaggregated from each other in Q-UEL documents are those separated by a logical and operator (or implicitly so, since
and is the default). Note that immediately prior to disaggregation, non-identifying clinical
attributes can be moved into the ket part. Strictly speaking, a disaggregation as an
equation |A, B, C,…> = |A>|B>|C>… implies a dimensional product sometimes
attributed to Grassmann [11], not random association between A, B, C,... That implies the
need for additional information, as follows.
2.5. Disaggregation Indices and Formal Irreversibility of Disaggregation.
Indices lie behind the essential mechanism used for reaggregation. A disaggregation-reaggregation model based on multiplying vectors and matrices was discarded above
both as inefficient and as a QM analogy that is not always so well justified (Section 2.3).
There was the more general notion of indices. In QM, indices are numbers that label
energy states uniquely, rather than referring to them by descriptions in terms of
measurements of physical values such as position or momentum. The term “mutually
degenerate” in Fig. 1 is the term as used in information theory. The QM terminology
would be that the measurements joined are the same state with same index as seen in
a different measurement representation, after the index describing the energy level of
one of them is raised by one. For example, simply using brakets, we might say that
<1|2> joins to <3|4> because we raise the |2> state to give |3>, so allowing the
aggregation <1|3><3|4>.
In practice, we obscure such obvious relationships in several ways resulting in a method
that is, admittedly, complicated overall. These ways are primarily (a) by working with
functions of the indices, say f(3), as the “measurements”, (b) never actually allowing
tags to join up, and (c) never actually having the function of a raised index represented
on a tag but issued as a query for a partial match via the reaggregating application. The
“measurements” of Fig. 1 are in practice the join strings as functions of indices mapping
to strings called evolving keys that represent the queries via the application. At each
stage of evolution these keys partially match (see below) the “measurements” on the
tags, say key k(i) matches to f(i) where i is, or relates to, the index. Q-UEL developer
jargon is often rather lax in distinguishing entwined terms such as “indices”,
“measurements”, “join strings” and “keys”. “Disaggregation indices” relates to both
evolving keys as queries and the “measurements” matched.
As noted above, the matches between join strings and keys are partial. It is important
that the evolving key strings are usually considerably longer than the join strings on the
tags. In other words, as an added security feature, we depart from reversibility in QM,
and the string to match on any tag is a small substring of the evolving key (though it is
not so for measurements 4, 9 of Fig. 1 in this report). Counter to first intuition, the
algorithm for disaggregation can be made deliberately irreversible by several such
means. What actually happens in reaggregation is that we simulate disaggregation, as
far as indices are concerned. The disaggregator requires no information to know what
the next tag is, since it is “fed it”, but the reaggregator has to be continuously fed
sufficient information by queries to overcome entropy protection and recover the tags
from the tag soup. On top of encryption, entropy protection requires approximately
n log2(N) bits of information to locate n data items for one patient from N for many
patients.
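The interplay of indices, evolving keys and partial matches can be sketched in a few lines of Python. This is a toy under stated assumptions, not the QuantalSHRED algorithm: HMAC-SHA256 is an assumed way of deriving the key k(i) from a secret, the join string f(i) is taken as a fixed substring of k(i), a single key system replaces the four used in this report, the data are left unencrypted, and the verification mechanisms of Fig. 1 are omitted. All function names are hypothetical.

```python
import hashlib
import hmac
import math
import random


def evolving_key(secret: bytes, i: int) -> str:
    """k(i): a long key string for index i (HMAC-SHA256 is an assumption)."""
    return hmac.new(secret, str(i).encode(), hashlib.sha256).hexdigest()


def join_string(secret: bytes, i: int, start: int = 8, width: int = 10) -> str:
    """f(i): the short string written on a tag, a small substring of k(i)."""
    return evolving_key(secret, i)[start:start + width]


def disaggregate(secret: bytes, items):
    """Emit one (join string, data) tag per item; the index never appears."""
    return [(join_string(secret, i), item) for i, item in enumerate(items)]


def reaggregate(secret: bytes, soup, n: int):
    """Replay the index sequence, querying the soup for a partial match."""
    found = []
    for i in range(n):
        key = evolving_key(secret, i)
        matches = [data for js, data in soup if js in key]
        if len(matches) != 1:
            raise LookupError(f"step {i}: {len(matches)} candidate tags")
        found.append(matches[0])
    return found


def entropy_bits(n: int, total: int) -> float:
    """Approximate information needed to pick n items out of `total`."""
    return n * math.log2(total)


secret_a, secret_b = b"key material, patient A", b"key material, patient B"
items_a = ["'systolic BP (mmHg)':=160", "'diastolic BP (mmHg)':=85"]
items_b = ["'systolic BP (mmHg)':=120", "'diastolic BP (mmHg)':=80"]
soup = disaggregate(secret_a, items_a) + disaggregate(secret_b, items_b)
random.shuffle(soup)  # order in the soup carries no information

print(reaggregate(secret_a, soup, len(items_a)) == items_a)
print(round(entropy_bits(20, 1_000_000), 1))  # 398.6
```

The reaggregator simulates disaggregation by regenerating k(0), k(1), … from the secret, which is the sense in which it must be "fed" the information that the disaggregator had for free. For 20 data items in a mix of one million tags the estimate is about 399 bits on top of whatever the encryption provides.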
3. Methods
3.1. The Experimental System in Overview.
The focus in Methods is partly on what unencrypted Q-UEL tags for COC look like, but
even more so on disaggregation and reaggregation since in Q-UEL systems the EHRs
etc. really spend most of their lives, stored as well as transmitted, in the encrypted and
disaggregated form. The Q-UEL system helps to explore many possible working
configurations. We here describe a configuration for which reaggregation is not very
efficient because it uses, and hence illustrates, all the security features described in the
present report, namely a full “triple shred” into bra, ketbra and ket for each data element
and several systems of evolving keys. We used the QuantalCloud system configuration
in Fig. 1 of Ref. [1] as the test-bed. Fig. 2 below fills in that picture as required for
the present report. The system is really a flexible research toolkit that can accomplish
tasks in various different ways. For the kind of use described in this report, an EHR or
part of it, say tag type (1) in the classification of Section 2.2, a medical exchange artifact
say of tag type (5), an AML file, or any arbitrary document or image can be
disaggregated by an application such as QuantalSHRED. It is disaggregated into tags
of type (6), (7), and (8), but also tags (2) and (3) of visible content can be consented for
research purposes and released by the same application. Note that only if medical data
is in Q-UEL format or converted to it will shredding normally be by the data elements
belonging to attributes. Otherwise, shredding will still occur but is arbitrary, or more
correctly stated, the shredding is by attributes with arbitrary metadata names and values
that are small chunks of the document or image. The application will also release a
digital certificate, job ID number, substitution keys that can override entries on the digital
certificate, and an authority key, all of which can be held by and/or received by an
authorized person using QuantalSHRED in order to re-aggregate tag types (6),(7),(8).
The arrow pointing back from the digital certificate to QuantalSHRED is because
QuantalSHRED could in turn shred that, generating yet again tags of type (6),(7) and (8)
and the associated information required including another digital certificate.
For the present studies, we used real patient data like that of Ref. [2] and a collection of
example medical records in various standards, as well as records converted to AML file
format and some data from public health studies as described below (Section 3.2). Data
emulated as consented were released (deidentified, and still confidentially within our
“laboratories”) at the same time with “plaintext” data, i.e. not encrypted. Reaggregation
requires a system of keys or passwords most of which are normally (but not
necessarily) carried on the digital certificate. When transmitted by other means they are
substitution keys which the receiver can use to overwrite decoys in the digital certificate.
3.2. Interconversion of Health Records with Q-UEL.
A great deal of work for the present report was done using EHRs and medical data
exchange artifacts written in Q-UEL or a corresponding tabular AML format, not least
because these were specifically designed to allow disaggregation in the PCAST sense,
while other representations were not. However, in order to get real patient data in the
absence of any true medical installations of Q-UEL at the time of writing, converting
source medical data from other standards to Q-UEL is a necessary first step. A caveat
on that is that we have substantial deidentified data directly obtained from some 2,000
volunteers in a public health study that “historically” represents the first instances of
patient data going straight into Q-UEL format, as will be described elsewhere. Even so,
we wish to tackle the standards interconversion problem.
Fig. 2.
Attribute metadata language AML plays a persistent major role in interconversion
(Section 1.7). In designing any Q-UEL tag layout, truly scientifically meaningful
ontologies of data that cannot be expressed at the same rank level are captured in
attributes using AML. Data that are similar but do not exactly describe the same thing,
alternative representations from different sources of the same data elements, or
obtained at a different time as indicated by timestamp, can all appear within one
attribute, as in the simple example metadata1:= (metadata2:=value2, metadata3:=value3).
The Q-UEL general specification does not insist that, say, an EHR be put on
one tag with many attributes, but it is an interesting finding that it can be done because
overall ontological structure of a source document often reflects rather arbitrary
administrative matters in the philosophy of a standard. However, in order to facilitate
converting Q-UEL back to source ontology, one may pre-append each value by its line
of descent, a single path, metadata by metadata, from root node to the leaf node value
being considered: metadata1:=metadata2:= metadata3:= …. :=value. We approach
conversion of an EHR standard by first writing an AMLF or AML file. The columns
contain the values, and there is one row per patient. The metadata is the first row of the
file (column headings), and that will be, especially for XML source documents, a fairly
long string describing the line of descent applicable to all values below it in the column.
This AML file is used to write the Q-UEL tags, but to avoid visual clutter on those tags, a
format such as metadata:=www.qexl.org /AMLF/QXML5/Header6/:='systolic blood
pressure (mmHg)':= 140 is used, where Header6 means the heading of the sixth
column of the AMLF called Q5XML with the last metadata name (directly preceding the
value) reproduced. It is subject to the encryption rules in Section 3.5. Finally, for
commonly used tag types, attributes are rationalized (combined and simplified) in
various ways to satisfy Q-UEL human-readable style [1] (see footnote 5).
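The line-of-descent convention and the tabular layout can be illustrated as follows. This is a sketch only: the nested dictionaries stand in for a parsed source document, the metadata names are invented examples, and the functions are hypothetical rather than part of any Q-UEL converter.

```python
def leaf_paths(node, path=()):
    """Yield (path, value) for every leaf of a nested dict."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from leaf_paths(child, path + (name,))
    else:
        yield path, node


def line_of_descent(path, value):
    """Write metadata1:=metadata2:=...:=value for one leaf."""
    return ":=".join(path + (str(value),))


def to_amlf(records):
    """Header row of line-of-descent headings, then one row per patient."""
    columns = []
    for record in records:
        for path, _ in leaf_paths(record):
            heading = ":=".join(path)
            if heading not in columns:
                columns.append(heading)
    rows = []
    for record in records:
        values = {":=".join(p): v for p, v in leaf_paths(record)}
        rows.append([values.get(heading, "") for heading in columns])
    return columns, rows


records = [
    {"vital signs": {"systolic blood pressure (mmHg)": 140,
                     "diastolic blood pressure (mmHg)": 85}},
    {"vital signs": {"systolic blood pressure (mmHg)": 128},
     "laboratory": {"blood glucose (mg/dL)": 160}},
]

columns, rows = to_amlf(records)
print([line_of_descent(p, v) for p, v in leaf_paths(records[0])])
print(columns)
print(rows)
```

Because each heading records the full path from root to leaf, the table can in principle be folded back into the nested source structure, which is the purpose of prefixing values with their line of descent.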
3.3. Tag Status.
As indicated above, Q-UEL is being used directly in at least one health study, and Q-UEL tags also get interchanged in collaborations with other workers involved with real,
but sometimes contrived, patient data. There is always a small risk that data on such
tags, once out of our hands, could be directly or indirectly accidentally used for a real
patient. There is also perhaps a larger risk that directly or indirectly such data is
included in the data mining of what is supposed to be all real and reliable patient data.
Conversely, we can also foresee cases in which, analogously to the findings of another
interconversion study [25], real medical data is only partially converted but still
considered usable for the patient. For all such reasons, Q-UEL tags carry annotation as
to their status and whether they can be used for medical purposes. All Q-UEL tags,
including design suggestions and simplified examples are executable, and status is
indicated as an appendage to the tag name (the tag name is seen as just a special kind
of attribute). It shows provenance, including say, whether HL7 CDA/CCD or VistA was
the source (should there be multiple sources, the SOURCE metadata name is also
associated with the data in attributes: see Section 1.7). There, “handcrafted” will
indicate that the converter was at least in a state of transitioning from specification to
actual use. Exception comments, i.e., in {!...!} brackets elsewhere in the tags, clarify that
or indicate deliberate omission of data, and often imply that Q-UEL is being created but
that reversal is a problem. Q-UEL binding variables, each a $ character followed by one
or more upper case letters, are used to substitute for confidential, irrelevant, or
unknown information. Known information can be replaced.
Footnote 5. Notably, reserved words are implemented that can stand for the same idea in different source
vocabularies, and for common RDF references. The reserved words can be functions of tag name and
SOURCE (Section 1.7), and their meaning therefore resides in converter subroutines, not directly in RDF
references. This distinction, however, is not absolute in Q-UEL's design. Specification allows that these
subroutines can be referenced by RDF links on tags, including as downloadable code [1], although this
has yet to be fully implemented in the COC context. The mechanism is similar to that by which <A | R |B>
is seen as a dyadic function and R invokes executable code [1]. One purpose of such execution is to
redisplay the tag with implicit meanings revealed.
3.4. Workflow and Job ID Number.
In COC, the correct sequence of events, and proper completion of it, is typically
important. Fulfilling a prescription is an example given below. In any PCAST UEL-style
scenario, there are two particularly practicable possible approaches to this, as well as
variants in-between. One can either put information onto tags to control flow, or have
each step of tag use initiate the release of a further kind of key, a workflow “number”
that has to exist and be used by a subsequent application in order that the
subsequent step can be accomplished. As Fig. 2 indicates, Q-UEL uses the second
method. It is potentially more flexible, more secure, and allows a process to be halted
or withdrawn by destruction of the key. It is easily extended to a more managed system
such as when a QuantalMASTER application [1] is responsible for release and
management of keys. One can have some processes for which either one of two (or
more) keys from different sources might suffice (OR logic in workflow), and some for
which at least two keys are required (AND logic in workflow).
When any source is used to generate Q-UEL, a job ID or “job ID number” is also
released. It is a particular case of a more general workflow number in a Q-UEL system,
required to be available to a task in order to trigger it or authorize its use. It is an integer
typically of about 14 digits that is (usually randomly) generated by the preceding task.
Used rather like a transient PIN, chance duplication in the system causes no conflicts. It
does not usually appear on tags, since having to know it is the means of controlling use.
A job ID is generated by disaggregating applications for encrypting and decrypting the
keys on the certificate. The job ID is not required if all the keys are not on the certificate
but transmitted unencrypted by other means, but workflow control can be retained by
using it as a key that irreversibly encrypts data on the ketbra join tag.
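A workflow number of this kind, and the AND/OR logic mentioned above, might be sketched as follows. The function names, the representation of a requirement as a list of alternative key sets, and the prescription example are all assumptions made for illustration. The fixed 14-digit integers stand in for values that would in practice come from new_job_id.

```python
import secrets


def new_job_id(digits: int = 14) -> int:
    """Random integer with exactly `digits` decimal digits."""
    low = 10 ** (digits - 1)
    return low + secrets.randbelow(9 * low)


def may_run(requirement, held):
    """requirement: list of alternatives (OR), each a set of keys (AND)."""
    return any(alternative <= held for alternative in requirement)


print(len(str(new_job_id())))  # 14

# Illustrative fixed values so that the example is reproducible.
prescriber_key = 11111111111111
pharmacy_key = 22222222222222
master_key = 33333333333333

# Dispensing needs both the prescriber and pharmacy keys, or a master key.
dispense = [{prescriber_key, pharmacy_key}, {master_key}]

held = {prescriber_key}
print(may_run(dispense, held))   # False: AND branch incomplete
held.add(pharmacy_key)
print(may_run(dispense, held))   # True
held.discard(prescriber_key)     # destroying a key halts the process
print(may_run(dispense, held))   # False
print(may_run(dispense, {master_key}))  # True: OR branch
```

Because the key is held by the application and not written on the tag, withdrawing it is enough to stop the next step, without touching any tag already in the mix.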
3.5. Transformation of Q-UEL Tags by Encryption Operations.
In a workflow, the tag released as output from a process in workflow may look quite
unlike that which was input to that process. It is not practicable in general to have the
detailed instructions of the transformation encoded in the workflow number, or by
releasing a new kind of “instruction tag”. It is also not always prudent to have it fully in
the hands of the receiving application, but rather to let the incoming tag have
considerable say as to what transformation is intended. The following is an important
aspect for managing this, intrinsic to AML. It is a notation in tag attributes that tells an
encrypting application, or part of one, what to encrypt in that tag.
When a tag is constructed from source data, what constitutes metadata and what
constitutes an encryptable attribute value is adjustable in tags optionally initially
generated. If for any attribute on a tag there is no string := (the metadata operator), the
usual default is that the whole attribute is encrypted to, for example, 'zC7g9FqY22M7
(encrypted attribute)'. It does not apply to a reserved set of words that are considered
as operators, not attributes. Otherwise, if := is present, only the rightmost terminal node
or “leaf” items in the attribute expressed in AML are considered as values for the
purpose of encryption. Such a “leaf” item is the string of characters to the right of the
last metadata operator := in the branch. When replacing any := by a single equal sign =,
that := rule is still followed, forcing some metadata to be seen as part of the value and
become encrypted. So sometimes the uninformative attribute metadata:='encrypted
material (encrypted data)' may be seen. In Q-UEL, tag names are special cases of
attributes and may contain := and/or = operators. Similar rules apply, but QUEL-X
where X is an alphanumeric string is usually not encrypted.
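The selection rule for what gets encrypted can be expressed compactly. In the Python sketch below the function names are hypothetical, the reserved-word set is an assumed example (only and is named in this report), and base64 is used purely as a visible placeholder for ciphertext: it is not encryption and must not be used as such.

```python
import base64

RESERVED = frozenset({"and", "or", "not"})  # assumed operator words


def placeholder_cipher(text: str) -> str:
    """NOT encryption: base64 only marks where real ciphertext would go."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encrypt_attribute(attribute: str, cipher=placeholder_cipher) -> str:
    """Apply the := rule: encrypt the leaf, or the whole attribute if no :=."""
    if attribute.strip() in RESERVED:
        return attribute  # operators are never encrypted
    head, operator, leaf = attribute.rpartition(":=")
    if not operator:
        return f"'{cipher(attribute)} (encrypted attribute)'"
    return f"{head}:='{cipher(leaf.strip())} (encrypted data)'"


print(encrypt_attribute("and"))
print(encrypt_attribute("history of falls"))
print(encrypt_attribute("vitals:='systolic blood pressure (mmHg)':=140"))
# A single = in place of the last := pulls the metadata name into the value:
print(encrypt_attribute("vitals:='systolic blood pressure (mmHg)'=140"))
```

In the last call only vitals remains readable, which corresponds to the case in which some metadata is deliberately forced to be seen as part of the value.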
3.6. Data Extension and Fine Grained Consent.
If an application were to report to a stakeholder that a patient has a blood glucose
concentration of 160 mg/dL it may be the most important piece of information about that
measurement, but it is certainly not the only important one. The context is important, but
even more intimately involved with that number is when it was measured, the reliability
of the measurement that reflects the technique used, and (considering the fundamental
importance that Q-UEL places on consent) whether the patient consented to use of the
information for research purposes, etc. Q-UEL tries as much as possible to deduce and
secure from sources such extra “dimensions” of data qualification that it considers
important. A fuller format is metadata(units):=value+/-dispersion(time)(consent consent).
For quantitative data the data value is formally an expected, most likely, or mean value,
and data quality is usually an optional degree of dispersion such as in „systolic blood
pressure (mmHg)‟:= 142+/-9CI. The time stamp is usually in Unix/POSIX readable date
format for GMT, as in „142+/-9CI(Sat Feb 8 23:56:44 2014 GMT)‟. Since it could help
link data ket tags that are illicitly decrypted, it can also be suppressed and presented in
a separate time attribute in the same tag, or as a relative attribute time to last such time
attribute, e.g. 142+/-9CI(+4:20:00 2014 RAT). By mechanisms used for consent, any of
the level of detail of data encrypted or exposed may be modified. Consent is recognized
by (consented …) inserted after the data element in the source document. In
disaggregated tags one could see, if decrypted, „systolic blood pressure (mmHg)‟:=„142
+/-9CI(Sat Feb 8 23:56:44 2014 GMT) (consented nearest 5 and year)‟. If consented
visibly it appears as „systolic blood pressure (mmHg)(nearest 5)‟:=140 +/-9CI(2014)
because “nearest 5” is considered as of the nature of units and distinct from the
dispersion to subsequent advanced statistical analysis. Note that GMT is the default.
This is growing to a rather flexible consent language that can express otherwise,
necessarily so because there are subtle pitfalls to both patient privacy and analytics.
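The “nearest 5 and year” re-expression in the example above can be illustrated with a short Python sketch. It is not a Q-UEL parser: the regular expression covers only the value+/-dispersion(time) pattern of the example, and the function name consent_to_nearest_and_year is a hypothetical choice made here for illustration.

```python
import re

# Pattern for 'value+/-dispersion(time)', e.g.
# '142+/-9CI(Sat Feb 8 23:56:44 2014 GMT)'. Illustrative assumption only,
# not a full Q-UEL grammar.
VALUE_RE = re.compile(
    r"^\s*(?P<value>-?\d+(?:\.\d+)?)\s*\+/-\s*(?P<disp>[^()\s]+)\s*"
    r"\((?P<time>[^()]*)\)\s*$"
)


def consent_to_nearest_and_year(value_part: str, nearest: int) -> str:
    """Round the value to the nearest multiple and keep only the year."""
    match = VALUE_RE.match(value_part)
    if match is None:
        raise ValueError(f"unrecognised value part: {value_part!r}")
    # Note: Python's round() rounds exact halves to the even neighbour.
    rounded = nearest * round(float(match["value"]) / nearest)
    year = re.search(r"\b(\d{4})\b", match["time"])
    if year is None:
        raise ValueError("no four-digit year in time stamp")
    return f"{rounded}+/-{match['disp']}({year.group(1)})"


print(consent_to_nearest_and_year("142+/-9CI(Sat Feb 8 23:56:44 2014 GMT)", 5))
# -> 140+/-9CI(2014)
```

The dispersion is passed through unchanged, in keeping with the remark that the rounding resolution is treated like units rather than as part of the dispersion.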
3.7. Information Required for Reaggregation.
It is evident so far that a significant amount of separated information is required to
reaggregate a disaggregated record or other clinical artifact. Again, the following is for
the “triple shred” approach. Use of the above information provides significant
flexibility as to how a system may be set up, and finer control of what the patient and
physician would like to happen. For example, a government data miner might be given
keys allowing the ability to decrypt and data mine private data tags, but not necessarily
to be able to reaggregate them and tell which record they came from. Further refinements
such as alert and backtrack mechanisms, discussed briefly below, represent a separate
“layer” and depend on release of additional kinds of tags. In general and most
fundamentally, use of the patient's data is controlled by the following.
(1) The Keys and their Roles. Whether or not they are carried on a digital certificate, the
following keys are required. PW1, the patient password key, represents an irreversible
encryption of the password assigned to, or chosen by, the patient and, like the other keys on
the digital certificate, it is further reversibly encrypted, to be decrypted by the Job ID. It
contrasts with the other keys on the certificate in that they are usually randomly assigned by
the disaggregation software, though this can be overridden. PW1 forms the seed for a
sequence in which a new key is evolved from it at each step; that key is required to query down
the next chunkID or bra tag that will enable the data or ket tag to be incorporated into the
reaggregating record. As a seed, PW1 requires PW2, the patient password evolver key, as a
particular catalyst key to do that irreversible encryption. PW3 is the decrypting key, really
two keys entangled: PW3a reversibly encrypts and decrypts the chunkID attribute value, a
string isomorphic to the chunkID values on the other tags, and PW3b reversibly encrypts
and decrypts the clinical data. PW4 is the starter tag. It is not required in variants of the
method where substrings to match on tags are long enough to be unique. It can be
envisaged as a dummy tag that queries for the first true tag in the re-aggregation process.
(2) The Authority Code Key. Authority code can be a further attribute value slot on the digital
certificate to assign authority for the first time or temporarily, but usually it is absent because
it is held for a prolonged period by the person or persons it authorizes. This key is seen as
weak security, because the same authorization can be assigned to a large number of
stakeholders with the same role. However, it could be unique, and can be in part an IP
address for the re-aggregator‟s machine. Visible or irreversibly encrypted authority codes
can appear in tag attributes, but can also optionally be encrypted along with the encoding of
the graph for the isomorphism test in the chunk ID attribute value, or not be present on the
tag at all but still required like any other key.
(3) The Club. The club is a large group of patients to which a patient has been assigned, such
as New York State Veteran, or an arbitrary club. Its primary function is to regulate the speed
of reaggregation indirectly by restricting the number of tags to be queried. It is a visible or
encrypted value of the club attribute. This club is often known to the authorized stakeholder
but can also be passed externally. A patient travelling abroad can be assigned to a
temporary club. A club name is an integer, or implies one. In the present report, assigning a
patient to club n picks out an nth digit of a digit sequence, starting a run of digits that
mathematically determines a very large family of possible graphs for that club. One member
of that graph family is randomly generated and assigned to a disaggregation task, not a patient.
(4) Encryption Level is the level of intensity of encryption applied to all the keys as they are
applied successively in the disaggregation and re-aggregation process. There are currently
four levels, of which level 1 is least secure and fastest; level 3 is the one currently normally
applied. The level is regarded as a fixed feature of an installation, though it can be reset.
3.8. Disaggregation and Reaggregation by Evolving Keys.
The following gives details of the evolving keys and how they work. The disaggregation
and reaggregation routines comprise a toolkit and any specific implementation is called
the shred configuration. As noted above, the approach used in the present study is the
most complicated and slowest configuration that has been explored, but illustrates the
broad range of optional features. Bra, ketbra, and ket tags are used (again, the so-called
“triple shred”) with the full four separate systems of keys, corresponding to the
four quadrants, “key cycles” Ai, Bi, B’i, and Ci, shown in Fig. 3.
The upper half of Fig. 3 purely involves reversible encryption RE while the lower half
involves irreversible encryption IE. The transformation steps are applied, i.e. “the cycle
turns once”, every time a bra, ketbra and ket are conceptually joined. Recall that these
components never physically meet up, but query each other via the receiving
application. The order of application used here is Ai → B’i → Bi → Ci, repeated for each
data ket tag and its associated bra and ketbra. In describing them, however, it is helpful
to take them in order of increasing complexity, as follows.
(1) The use of cycle Ai at the lower left of Fig. 3 is relatively straightforward as the prototype
principle of the disaggregation method. Here, PW1, the encrypted patient's password, is the
initial string, i.e. seed, acting as a “plaintext” message to which a chain of irreversible
encryption steps is iteratively applied, one further encryption of the message per cycle. It
is the message string evolving from PW1 that queries for a match with a substring on tags.
Such substrings arising from irreversible encryption are values of join attributes on tags, and
they can be very short substrings, especially of the bra tag, such that illicit reaggregation is
confounded by mismatches unless supported by cycle B’i. In reaggregation, the join
attributes with their short strings as values are the first “text” hunted by queries in each cycle
of searching in an archive of Q-UEL tags. Cycle Ai conceptually joins bra and ketbra tags,
since the bra tag effectively queries for the ket tag via the application. However, since the
index used to match the string on the ketbra is then processed to match a substring on the
next bra in sequence, it also conceptually joins bra to bra. A reaggregating application
progressively transforms the message from PW1 and requires PW2 as key for the
encryption steps, a key of that kind being dubbed a catalyst key. The above is already a
viable algorithm if the catalyst key, here PW2, is unchanging. However, all the cycles in
Fig. 3 are also described as “seeded by” catalyst keys, because catalyst keys can
themselves evolve. This makes unauthorized attempts at reaggregation a little more difficult
while adding negligibly to the time taken for the process. In the present study, the further key
required to do this was simply built from substrings of the “message” evolving from PW1 and
the string derived from PW2 from the previous cycle.
Fig. 3.
(2) At lower right of Fig. 3, cycle Ci joins ketbra to ket via strings as data attribute values on
these, though the value of such on the ketbra is irreversibly encrypted. The data ket tag
need only contain reversibly encrypted data with its attribute. The match to the
corresponding irreversibly encrypted string on the ketbra is a matter of irreversibly
encrypting the reversibly encrypted data, essentially the method used to verify passwords
behind secure websites without having the original password held on the server. It is
essentially like (1) above but starts with a string which is the reversibly encrypted data on
the data ket tag. The transformation of this string requires a catalyst key resembling PW2
but normally built from authority code, job ID, and club. This key evolves as does the
catalyst key in cycle Ai. Note that at each cycle of querying down the next data ket tag, the
string value of the reversibly encrypted data will, of course, change because the data is
generally different, but also because the catalyst key cycle Bi, shown above it in Fig. 3, is changing.
(3) In the Bi cycle at the upper right of Fig. 3, the first string is the reversible encryption of the
source “plaintext” clinical data on the ket data tag attribute, and the key for each reversible
encryption step in cycle Bi is part of PW3, which is really two keys merged. Reversible
encryption uses somewhat novel methods, not Perl encryption routines. In “Chaotic + XOR”
in Fig. 3, “XOR” actually means a bit (“binary digit”) shuffling algorithm that requires part of
PW3 (PW3b) as a catalyst key, this shuffling being combined with an XOR algorithm proper.
After each bit shuffle the exclusive disjunction operation using the bitwise logical XOR
(“exclusive OR”) operator is applied between every bit of every character in the message
evolving from the data on the ket tag and every bit of every character of a specified template
key string. So 01 or 10 with 1 in one string corresponding to 0 in the other gives 1, but 00
and 11 give 0. Both bit shuffling and XOR algorithm are reversible, i.e. allow decryption of
the data. “Chaotic” signifies the following. The template string is in the present study a
compound of authority code, job ID, and club with an integer generated by the Chaotic
procedure in that particular cycle. “Chaotic” means that a Chaotic process is emulated but
using integers. It is not fundamentally different from generating a chain of pseudorandom
numbers. These integers are used in the simulation of disaggregation that enables
reaggregation: the template evolves, but what it was at any step can be computed. The
“Chaotic + XOR” algorithm is also used in the following.
(4) The B’i cycle at upper left of Fig. 3 is essentially the same as for Bi (3), but in this case with a
reversible encryption PW4 of the encoding of the graph as a seed, and part of PW3 (PW3a)
used as the key to evolve it in the cycle. However, the “plaintext” data revealed by
decryption is not (as is the case for Bi) clinical data of interest, but a dummy ID called the
Chunk ID, the value of the Chunk ID attribute, different on every bra tag of the record or
“chunk” of it selected for transmission. Cycle B’i can be described as helping join each bra
tag to the next bra tag to form the “spine” of the record. More precisely, it provides
verification that what is being reaggregated by the Ai cycle does indeed belong to the same
record or chunk of such that was disaggregated, but this can be essential to reaggregation if
the join value on the bra tag is a very short string. Such verification is not dependent on the
order of reassembly. Recall that the Chunk ID value is a string encoding a graph that is
isomorphic to the graphs of the dummy IDs on the other bra tags for the source record, and
the graph implied by the dummy ID can belong to a family of graphs determined by the club:
see Section 3.7 (3). The bit shuffling algorithm in this case is supplemented with a
procedure that shuffles parts of a graph, in such a way as to provide an interesting feature.
Illicit attempts to decrypt the Chunk ID will, at various stages, reveal decoy solutions that
look like valid “plaintext”, meaning here not natural language, but rather well-formed valid
graphs for that club even though they are not valid for the specific patient.
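The evolving-key idea behind cycles Ai and Ci, and the reversibility of an XOR step as in cycle Bi, can be sketched as follows. This is a hedged illustration, not the method of the present study: HMAC-SHA-256 stands in for the unspecified irreversible encryption, a plain repeating-key XOR stands in for “Chaotic + XOR” (with no bit shuffling and no Chaotic integer generator), and all function names, key strings and the six-character join length are hypothetical.

```python
import hashlib
import hmac


def evolve(message: bytes, catalyst: bytes) -> tuple:
    """One turn of an Ai-like cycle (illustrative stand-in cipher).

    The message is irreversibly transformed under the catalyst key, and the
    catalyst key itself evolves from the new message and the old catalyst.
    """
    new_message = hmac.new(catalyst, message, hashlib.sha256).digest()
    new_catalyst = hashlib.sha256(new_message[:8] + catalyst).digest()
    return new_message, new_catalyst


def join_value(message: bytes, length: int = 6) -> str:
    """Short substring of the evolved message, used as a join attribute value."""
    return message.hex()[:length]


def xor_with_template(data: bytes, template: bytes) -> bytes:
    """Reversible step: XOR every byte with a repeating template key."""
    return bytes(b ^ template[i % len(template)] for i, b in enumerate(data))


def join_sequence(seed: bytes, catalyst: bytes, count: int) -> list:
    """Join values for `count` successive tags, starting from seed and catalyst."""
    joins = []
    for _ in range(count):
        seed, catalyst = evolve(seed, catalyst)
        joins.append(join_value(seed))
    return joins


# Hypothetical stand-ins for PW1 (seed) and PW2 (catalyst key).
pw1, pw2 = b"seed-from-patient-password", b"catalyst-key"

# Disaggregation and reaggregation derive the same joins from the same keys.
assert join_sequence(pw1, pw2, 3) == join_sequence(pw1, pw2, 3)
# A different catalyst key gives a different first message.
assert evolve(pw1, pw2)[0] != evolve(pw1, b"wrong-key")[0]

# Ci-like check: the ketbra would carry an irreversible digest of the
# reversibly encrypted data held on the ket, as in password verification.
template = b"authority|job|club|17"            # hypothetical template key
ciphertext = xor_with_template(b"example clinical value", template)
ketbra_digest = hashlib.sha256(ciphertext).hexdigest()
assert hashlib.sha256(ciphertext).hexdigest() == ketbra_digest
assert xor_with_template(ciphertext, template) == b"example clinical value"
```

Because the join values are short substrings, accidental matches are possible by design in such a scheme, which is why the text backs the join with the isomorphism test of cycle B’i; that test is not sketched here.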
4. Results
4.1. A Simple Tag Output Example.
Several types of tag were written by the above system, the most important being types
(1)-(8) as described in Section 2.2. As an aid to understanding tag notation it is helpful to
start with one of the simplest. Personal medical data is usually stored and transmitted
in disaggregated and encrypted form. Some common principles may be made visible by
a simple unencrypted consented tag that appears where patients chose to consent
certain data.
<Q-UEL-CONSENTED-EXTRACT patient:=„7FNcZZ6c(random)' club:=1 | consented jointly | male
age:=35(2012) 'BMI(nearest 10)':=20 (2012) „blood pressure‟:=(systolic :=125+/-10CI (2012), diastolic
:=70+/-8CI (2012)) „Fat(%)(nearest 2)‟ :=10%(2012) Q-UEL-CONSENTED-EXTRACT>
String patient:=„7FNcZZ6c(random)' is probably least obvious. It relates to an alert and
track-back mechanism optionally consented by the patient, and requiring a special
Q-UEL_ALERT tag as will be described elsewhere. Without that consent and tag, it carries
no meaningful information save that the authorization came from a patient. Otherwise,
the interpretation should be intuitive, as Q-UEL was designed to be readily readable by
humans. Note that single quotes round a metadata or value string are not required
unless there is embedded whitespace. That just the year was consented is obvious
without further annotation, but for statistical purposes data miners should be advised as
to the resolution to which data is re-expressed by the consent.
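The quoting convention just noted (quotes only where there is embedded whitespace) can be sketched as a small Python helper. The function names are hypothetical, and straight single quotes are assumed as the quote character.

```python
def quote_if_needed(text: str) -> str:
    """Wrap in single quotes only when the string contains whitespace."""
    return f"'{text}'" if any(ch.isspace() for ch in text) else text


def format_attribute(metadata: str, value: str) -> str:
    """Join metadata and value with the metadata operator ':='."""
    return f"{quote_if_needed(metadata)}:={quote_if_needed(value)}"


print(format_attribute("age", "35(2012)"))        # -> age:=35(2012)
print(format_attribute("BMI(nearest 10)", "20"))  # -> 'BMI(nearest 10)':=20
```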
4.2. Patient Record or Summary Tag with Stakeholders.
In contrast to the above example, an EHR or extensive patient summary can be
represented by an extremely large tag, so only example content of such a tag,
highlighting features that are more peculiar to Q-UEL, is shown below. It is also true
that an EHR can be portrayed as a list of smaller tags each like that in the above
example, or as an XML-like hierarchic structure that is still valid Q-UEL [1], but these are
not favored formats. Though personal medical data is usually stored and transmitted
disaggregated and encrypted, exceptions are allowed in developer applications. The
term “stakeholder” (Section 1.1.) is used as attribute metadata, although the most common
and important stakeholders, e.g., patient, physician, and pharmacist, are recognized as
nominal-categorical data that can stand without it. Stakeholder attributes also reveal the
artifact's past and future workflow, and extend the idea from the human players involved to
the software applications and data involved. epSOS HL7 CDA source documents [59] are
conveniently rich in stakeholder information (except for the curious omission of the
physician in the source document at time of writing). Although the example below is of
patient summary type, ultimately the identifying and stakeholder information will be split
away from the patient clinical summary (Sections 1.9, 4.2). The emphasis below is on
the less familiar stakeholder content. Stakeholder information can be particularly
problematic in conserving privacy: a legal guardian is more revealing than a glucose
level. Generally, unless perhaps when one is absolutely sure that an example of patient
data is contrived, one should guard against risk to patient privacy and other rights, even
of a developer. To compare source and the Q-UEL rendering of its stakeholder content,
one may (at time of writing this) request authority to access the original source epSOS
document, at the S&I eHealth website [25] and should register first with the general S&I
initiative [24]. For present reading, Q-UEL masks potentially risky data in such a way
that the tags can still be used by the system, and the protections even reversed if
authorized, by comments indicated by {!...!} and by replacing data with Q-UEL variables
$... Key details below have been de-identified as $AAAAA, $BBBBB etc., Q-UEL
variables capable of being reassigned actual or mythical values when authorized. If the
default operator and is specified, in some applications it optionally triggers a layout that
aids reading when there are many attributes, as follows.
<Q-UEL-EHR:=‟Patient Summary‟:=(meaning:=www.qexl.org/patient_summary_4/, source:=‟epSOS PS XML‟:=
„http://www.google.com/url?q=https://drive.google.com/file/d/0ByAfdYPeAnMejNEeEFZNWhqRjg/edit%3Fusp%3Dsharing &usd=2&usg=ALhdy2_JxPqGnmlQPdD7FbBWi4K8EottYA/‟,
„detected source title‟:= Slovenian:=code:=sl-SI:=‟Povzetek pacientovih osebnih podatkov‟,
Referrer:=‟Standards and Interoperability Initiative‟:=( http://www.siframework.org, EU-US eHealth‟:= „Work Group
Activities‟:= „http://wiki.siframework.org/Interoperability+of+EHR+Work+Group/), author:=‟Barry Robson(Jan 19
10:50:18 2014 GMT)‟ :=(http://www.qexl.org/Barry_Robson_1/, telephone:=(code:=US#):=1-345-945-1082,
email:=robsonb@aol.com), comment:=English:=‟Example transcription of epSOS PS XML‟, „Q-UEL
words‟:=English:=domain:=(EHR, demographic, stakeholder, LOINC, histories, complaints, diagnoses, prescriptions,
procedures, chemistry), „content words‟:=Slovenian:=code:=sl-SI, warning:=nonuse:=(example, handcrafted,
unencrypted))
patient:=name(„given then family‟):=„ $AAAAA‟:=‟http://www.qexl.org/SI_Patient_Reg$BBBBBB/‟
and address:=(‟physical address‟:=(country:=SI):=(city:=Ljublijana):=((street:=$CCCCC):=(residence:=$DDDDD),
postcode:=1000), telephone:=(+$EEEEE, use:=MV), email:=$FFFFFF)
and male
and birthdate:= „$GGGGG GMT‟
and speaks:= Slovenian:=code:=sl-SI
| has:=‟http://www.qexl.org/has_3/ |
stakeholder:=person:=‟primary physician‟ := {! „source data not detected‟ !}
and stakeholder:=person:=custodian:= („http://www.qexl.org/SI_ Custodian-Organization_Reg44444/‟,
address:=(‟physical address‟:=(country:=SI):=(city:=$HHHHH):=((street:=‟$IIIII‟):=(residence:=10),(postcode:={!likely
source error!}), telephone:=(+$JJJJJ, use:=MC)):=person:=name(„initial then family‟):=„$KKKKK ($LLLLL GMT)‟
:=(‟http://www.qexl.org/SI_Doccument-Author_Reg$MMMMM‟/
and stakeholder:=person:=„source document author‟:= organization:=‟ ZD Ljubljana ($NNNNN GMT)‟ :=(„http://
www.qexl.org/SI_Organization_Reg$OOOOOO/‟, address:=(‟physical
address‟:=(country:=SI):=(city:=Ljublijana):=((street:=‟ Neka ulica v ljubljani‟):=(residence:= 3 $PPPPP),
postcode:=1000), telephone:=(+$QQQQQQ, use:=WP), email:=$RRRRR‟}):=person:=name(„title then given then
family‟):=„$SSSSS:=‟http://twww.qexl.org/SI_Doccument-Author_Reg$TTTTTTT‟/
and stakeholder:=person:=„source legal authenticator‟:= organization:=‟ ZD Ljubljana (March {! „source data not
interpretable‟:= 2013033000107-sic !} 2013 GMT)‟ :=(„http://www.qexl.org/SI_Organization_Reg340008204600048/‟,
address:=(‟physical address‟:=(country:=SI):=(city:=Ljublijana):=((street:=‟ Neka ulica v ljubljani‟):=(residence:= 30b),
postcode:=1000), telephone:=(+386557925143, use:=WP), email:= {! „source data missing‟:= UNK-sic
!}):=person:=name(„title then given then family‟):=„Dr. Stefan Pregl‟ :=‟http://www.qexl.org/SI_LegalAuthenticator_Reg540008204600049‟/
and stakeholder:= organization:=‟scoping organization‟:=‟ National institute of public health, Republic of
Slovenia‟:=(„http://www.qexl.org/SI_Organization_Reg340008204600048/‟, address:=(‟physical
address‟:=(country:=SI):=(city:=Ljublijana):=((street:= Trubarjeva):=(residence:= 2), postcode:= {! „source data
missing‟:= UNK-sic !}), telephone:=( +38612441597, use:=WP), email:=mailto:epsos@ivz-rs.si)
and stakeholder:=data:=„source document‟:=
„http://www.google.com/url?q=https://drive.google.com/file/d/0ByAfdYPeAnM-ejNEeEFZNWhqRjg/edit%3
Fusp%3Dsharing &usd=2&usg=ALhdy2_JxPqGnmlQPdD7FbBWi4K8EottYA/‟:=specifications:= code:=‟XML to QUEL converted‟:=(„xml version‟=1.0, encoding=UTF-8):=(‟ClinicalDocument‟:= moodCode=EVN,
classCode=DOCCLIN, xsi:schemaLocation=urn:hl7-org:v3 CDA.xsd, xmlns=urn:hl7-org:v3, xmlns:epsos=urn:epsosorg:ep:medication, (xmlns:xsi:=http://www.w3.org/2001/XMLSchema-instance/):=(typeId
extension:=POCD_HD000040, root:=$SSSSSS, „templateId root‟:=$TTTTT, „templateId root‟:=$UUUUU, „id
extension‟:=($VVVVV, root:=$WWWW):=(„code displayName‟:=‟Patient Summary‟, codeSystemName=LOINC
codeSystem=$XXXXX, code=$YYYYY))
and stakeholder:=data:=‟previous document to source document‟:=‟($ZZZZZ GMT)‟:= code:=XFRM:=
„http://www.qexl.org/SI_Organization_Reg$AAAAB‟/
{! A great deal of clinical data here !}
and tagtime:=„Oct 10 12:43:20 2002 GMT‟
Q-UEL-PATIENT-SUMMARY>
4.3. A Prescription Artifact.
The following example is a detailed record of a prescription event for a (definitely
contrived) patient, from a MUMPS/VistA source used as an example by the US
Department of Veterans Affairs. The relator is triggered. This gives the “richest” example:
prior to the prescription being fulfilled, the Q-UEL formalism allows just the bra part <…| to
carry the triggering event, here a prescription request, although in practice to do that it
uses bra-relationship-ket forms with, for example, will trigger, should trigger, could
trigger, and would trigger with distinct meanings, as will be described elsewhere.
<Q-UEL-PRESCRIPTION:=„order entry and results reporting‟:=(meaning:=www.qexl.org/prescription_3/,
source:=‟VistA FMQL‟:= http://vista.caregraf.info/fmql/:= referrer :=‟Tom Munnecke‟:=www.osehra.org/users/tommunnecke/, author:=‟Barry Robson (Sep 21 10:01:18 2013 GMT)‟:= www.qexl.org/Barry_Robson_1/,
comment:=‟example transcription of example VistA FMQL entry‟, warning:= (example, handcrafted,
unencrypted):=comment:=‟Do not use as input. Hand-crafted for discussion, specification, example, research,
development and test purposes only. May contain errors. This example contains RDF-style definitions and above tagname qualification features not in the original source.‟)
patient:=„John Smith‟:=www.qexl.org/US_Patient_Reg189958822/
and provider:=(center:=„Outpatient Site FMQL Clinic‟:= www.qexl.org/US_MedCenter_Reg8411/, (physician,
prescriber):=‟ James Kildare‟ :=www.qexl.org/US_MD _Reg74356/)
and Rx:=(simvastatin:=code:=(NDC:=000006-0749-54, VA:=4010153) :=www.qexl.org/simvastatin/,
tablets(number):=90, tablet(mg):=40, „prescriber instruction‟:= (literally:=„Take one tablet by mouth every evening‟,
formally:=(tablets:=1 by:=mouth with:=water(presumed) „when (local patient time 24 hour clock)‟:=19.00+/-4) ))
and Rx#:=„800018 (Mar 5 09:11:03 2002 local)‟
and fills:=(„earliest possible‟:= „Mar 5 09:11:03 2002 local „, „next possible‟:= „Apr 5 24:00:00 2002 local‟, „last
possible‟:= „March 6 24:00:00 2003 local‟)
and „patient status‟:=code:=SC:=(„not exempt from copayment‟, „days supply‟:=30 refills:=11,
renewable):=www.qexl.org/Verify_Status_US_Patient_Reg189958822_ US_MD _Reg74356_Rx#:=800018/
and order:=initiated
and „prescribing status‟:=expired
and „GMT minus local time(hours)‟:=7 and zone:=constant
| triggered:=www.qexl.org/ triggered_3/ |
dispensing:=(ordered:=10, „unit price($)‟:=0.80, available, delivery:=‟window pickup‟) and times:=(login:=‟ Mar 5
13:50:17 2002 local‟, fill:= „Mar 5 13:51:02 2002 local‟, „last dispensed‟:= „Mar 5 14:13:17 2002 local)‟, „label:=
„Mar 5 13:50:27 2002 local‟, release:= „Mar 5 13:50:42 2002 local‟))
and copies:=1
and counseling:=(given, understood)
and (pharmacist, enterer, printer, counselor):=‟Nancy Devillers‟:=www.qexl.org/ US_Pharmacist_ Reg101740/,
and order:=converted
and „dispensing status‟:=expired
and „refill status‟:=open
and „GMT minus local time(hours)‟:=7 and zone:=constant
and tagtime:=„Mar 5 20:50:43 2002 GMT‟
Pbwd:=0:=comment:=„Process is not reversible, and forward direction is certain as a matter of record (Pfwd:=1 is the
default)‟
Q-UEL-PRESCRIPTION>
4.5. Example Output: Typical Disaggregated and Transmitted Forms.
We are now equipped to see what transmitted tags look like, and to give further details
based on these. Note below the recently included tagtypecheck attribute as a double
verification of interpretation, and the day attribute, which allows an optional clearance
cycle for the tags without giving much away as to timestamp. Tags with Tue will be
cleared from the Cloud next Monday (though the day can be back or forward dated),
and so on, after backing up the last week's cycle of tags on a secure archive. Previously,
an optional day could, to the same effect, be added to the tag name as, e.g.,
<Q-UEL-SHREAD1Tue. The first tag type of the “triple shred” method is Q-UEL-SHRED1, where
the 1 relates to the particular process, not the first tag from a disaggregated record.
<Q-UEL-SHREAD1 day:=Tue tagtypecheck:=pseudovector:=bra:=chunkID:=‟record spine‟
chunkID:='303e3e3e3031383e3e373e3e3c31313c3c32333c3c30383e323e3e313c363e3c2 d3
93c30323c30333e30313c35303c32303c32323632353e3c3e3c31303c3139303e3c313c3c3238
3e3e3c31373e3c303e3c30343532313e3e31363c34343e3e323e3e3c3c353c313c3e3c32323e3
c303e3c31323931333e37(encrypted chunkID)' club:=1 authority:='bkXodmTAB6ogDsWYIGoPt
c (encrypted authority)' join:='jwIqry(encrypted join)' |
The remarkably short string in the join attribute is a (usually arbitrary) substring of
the actual index, i.e. the evolved key with which it is matched. It is efficient as the first test,
but it is backed up by the isomorphism test. In contrast, the chunkID above is
remarkably long and repetitive, with common symbols. With our current routines that
also impedes hacking. When decrypted, it is currently formally a symbolic
representation of a graph that is isomorphic to other graphs on chunkIDs for the same
record, and it could be set to be a subgraph of such (a slower test). The set of valid
isomorphic tree structures is a directed graph randomly derived from a fully connected
graph assigned to a club, which at present is not secure when the club is a unique
integer because that graph is defined by the run of digits that the integer indicates.
Hence the decryption and isomorphism apply to the scope of that club, and would not
conflict with keys etc. assigned to other clubs. Recall that there are also false or decoy
solutions for graph representations generated en route in disaggregation. The second
type of tag required for the “triple shred” is the Q-UEL-SHRED2 tag.
| tagtypecheck:=pseudomatrix:=ketbra:=operator:='join data' join:='5lpgWRM(encrypted join)'
to chunkID in club:=1 Q-UEL-SHRED2><Q-UEL-SHRED2 day:=Tue metadata:='History of
diabetes':=„ lnCLA FXgw 4.s3D NTSy7p2 2YPKyjpR/UlQ2fBt6fnLJ8Qq9BXHegd0gMrE5Am
UlcZuYDU2hVIaOu1Ul8sw Gd jT18YKTgC7.3GPpUCi5JLgNhtxgRfpDVTbGVngBZlT/CM
nhAsserL.1mPT2UWBmF87LWl 1QMnCuTw8xcAbBP7ToyzF7wd41jhp8WmOkChebkWpX
LDkQhwmqfIxhjYMTOtH9pNexMO 9/5F5LVsFQ0IxKGyrKMrUHckjNBl0H3k (encrypted data)' |
These tags do not necessarily carry secure personal health information that would
identify the patient because the patient‟s ability to retrieve it is proof of ownership. The
third tag type required for the “triple shred” is Q-UEL-SHRED3. The authority key is
optional, but it can control access to authorized data miners who can analyze these
data tags without having ability to reaggregate any one record.
| tagtypecheck:=pseudovector:=ket:=data day:=Tue metadata:='Total cholesterol':=
'21b41023b2110191a10981a1d999c1d181a10191898199414b237321032b2b139bc383ab2321
032b03ab09284374716274702465636279707475646024616471692137393831333935333334
383d3132323028267(encrypted data)' club:=1 authority:='YJt/4aDCOcEsSjxUBp4zhonCeQuo
ScZxYFDLhS0vv7zEpgX0TZkLa/Q (encrypted authority)' Q-UEL-SHRED3>
4.6. Disaggregation and Reaggregation Performance.
Table 1 reports performance results for methods and settings stated in the above text
as “usually” or “typically” applied, etc. These settings are very demanding on computation,
as discussed in regard to Table 2 later below. Table 1 is for the full “triple shred” into
bra, ketbra, and ket with extensive validation checks including the rate limiting graph
isomorphism test relating to the Chunk ID. Column 2 shows the time taken for the overall
disaggregation and reaggregation process when fully automated. Note that these times
depend on the hardware, which is specified in the last column. Disaggregation is slowest; recall that
medical data is kept in the disaggregated state. Even for disaggregation, the slowest
process is “administrative”, being the high degree of randomization of order of the tags
in the tag soup (a file in this case), and was done for every new record in order to
provide fair bench marks. Three types of computer were used, an HP Compaq 6200
Pro 64 bit OS with i7-2600 8 x 3.40GHz processors and 4 GB RAM (memory), a
DELL Vostro 320 with E5300 2 x 2.6 GHz and 2 GB RAM, and a T340 Thinkpad 64bit
OS with i3-2350M 2 x 2.30 GHz and 6 GB RAM. These correspond to H, D, and T in
the last column, and that order also reflects the age of the models (with H, an old HP
server, as oldest), not each manufacturer's best current machine. There were also
problems in balancing the Perl computation across the 8 processors of the old HP
server, so ratings give a rather unfair impression of performance on that particular
machine. As expected the more recent machines performed better, and one may have
good expectations for future improvement as speed evidently does tend to increase with
new generations of processors. The rate of generation on latest standard laptops,
including the Thinkpad in Table 1, seems generally to be scalable at about 2 seconds
per shred (i.e. block of data associated with a metadata item) per 100,000 tags in the
mix. The receiving system can be set so that the data elements associated with each
attribute are displayed as the record aggregates, so the time to obtain essential emergency
information, which can be placed in the early attributes, is important. The time of appearance of the first
attribute is shown in Column 3. The benchmarks have been done for tags with a variety
of amounts of data in the attributes; on average each disaggregated tag requires about
400 bytes of file space. High resolution DICOM medical images can take some 10-30
times longer to assemble than a typical text record but that depends on the extent to
which they are arbitrarily disaggregated, i.e. the bytes per “shred” and number of
shreds, as will be discussed elsewhere.
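The approximate figures quoted above (about 2 seconds per shred per 100,000 tags in the mix, and about 400 bytes per disaggregated tag) lend themselves to a rough estimate. The following Python sketch assumes, purely for illustration, that both figures scale linearly with the number of tags; the constant and function names are hypothetical.

```python
SECONDS_PER_SHRED_PER_100K_TAGS = 2.0  # approximate laptop rate quoted in the text
BYTES_PER_TAG = 400                    # average file space per disaggregated tag


def seconds_per_shred(tags_in_mix: int) -> float:
    """Estimated display time per shred, assuming linear scaling with mix size."""
    return SECONDS_PER_SHRED_PER_100K_TAGS * tags_in_mix / 100_000


def storage_megabytes(tags_in_mix: int) -> float:
    """Estimated file space for the mix, in megabytes (10**6 bytes)."""
    return BYTES_PER_TAG * tags_in_mix / 1_000_000


for tags in (100_000, 1_000_000, 3_216_347):
    print(tags, round(seconds_per_shred(tags), 1), round(storage_megabytes(tags), 1))
```

Such an estimate is only as good as the linear assumption; the measured values in Table 1 vary with platform and mix size.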
Table 1.
Bench Marks of the Full Method Described in the Text.
Columns:
(1) Number of tags to search in soup.
(2) Total time (seconds) for disaggregation, shuffling of tags in soup (time consuming),
    automated immediate search, and reaggregation of summary record of 27 metadata items.
(3) Manually requested aggregation time (seconds) to show values associated with first
    metadata block.
(4) Overall process: seconds per actionable (readable) metadata displayed per 100,000
    tags in same club.
(5) Manually requested reaggregation: seconds per actionable (readable) metadata block
    displayed per 100,000 tags in same club.
(6) Platform (hardware etc.).

(1)          (2)     (3)    (4)    (5)    (6)
81           2       <1     142    <1     H
81           3       <1     142    <1     T
81           3       <1     142    <1     D
402          3       <1     29     <1     T
671          3       <1     34     <1     D
697          2       <1     17     <1     H
1,064        4       <1     6      <1     D
4,589        16      <1     3      <1     D
26,113       3       <1     8      <1     T
119,847      45      <1     3      <1     D
129,862      232     <1     15     6      H
259,297      7946    19     15     9      H
359,416      142     8      2      2      D
469,284      97      8      2      2      T
599,022      254     21     2      3      D
1,987,820    400     27     2      2      T
3,216,347    684     30     2      2      T
Table 2 compares the above method as “usually” applied with departures from it. The
findings are necessarily preliminary because of the large number of combinations to be
explored to establish what affects what, and how the effects interact. In practice, use of the
Internet, especially for emergency services, quickly becomes the rate limiting step if
attributes arriving as sets of data elements each correspond to the refreshing of a web
page. It is the number of patients in a club that can be managed that is important.
Table 2. Maximum Club Sizes (Numbers of Patients in Clubs) with Modifications that
Approximately Maintain Current Reaggregation Rates, Based on Preliminary Studies.

Columns:
(1) Q-UEL Method Name.
(2) Methods used on current hand-held or laptop devices using Internet, to maintain time-to-first data element of less than 30 seconds and subsequent displays of elements at less than three seconds each. Divide values in last column by the average number of data elements per record to obtain club size meeting the above requirements for whole records.
(3) Club Sizes (in thousands of patients), expressed this way as practical Internet use can become rate limiting. n = number of attributes pooled per ket data tag, p = number of parallel searches.

Method A.
Method: Method described in text and Table 1, using one clinical data item per metadata attribute.
Club size: 30-50

Method B.
Method: Pool data into clinical attributes as “shreds” with an average of N items per attribute. To right, n is still about 95% of N because of increase in encryption time, because the graph isomorphism test is rate limiting.
Club size: n x 30-50

Method C.
Method: Bypass of graph isomorphism test compensated by longer strings on joint attributes produces variously 4-10 times speed improvement. Separation of bra, ketbra and ket, and clubs for each, into separate archives (no great loss to security). Includes estimated effect based on comparison of current Q-UEL experimental encryption methods written in Perl and industrial encryption subroutine performance.
Club size: n x 1000-1500

Method D.
Method: Parallel querying of p archives. Clubs, bras, ketbras, kets, and first to Mth parts of record are also put on separate “parallel archives” with M as circa 5. Evolving keys testable as isomorphic in simpler way as graph test, so that assembly order is rendered unimportant: metadata put in required order after reaggregation. Querying for families of isomorphisms, and then using evolving keys to query these for the specific record.
Club size: Estimated p x n x 15,000 at least, assuming reasonable parallelization.

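The club-size expressions in the last column of Table 2 can be evaluated for chosen values of n and p. The following Python sketch is a minimal illustration, assuming the expressions are read literally in thousands of patients. The function names are hypothetical, and the "at least" of method D is represented as an open upper bound.

```python
def pooled_n(items_per_attribute, efficiency=0.95):
    """Effective n for N pooled items; Table 2 puts n at about 95% of N."""
    return efficiency * items_per_attribute


def club_size_range(method, n=1.0, p=1):
    """Return (low, high) club size in thousands of patients for methods A to D."""
    if method == "A":
        return (30.0, 50.0)
    if method == "B":
        return (n * 30.0, n * 50.0)
    if method == "C":
        return (n * 1000.0, n * 1500.0)
    if method == "D":
        # Table 2 gives only a lower estimate ("at least") for method D.
        return (p * n * 15000.0, float("inf"))
    raise ValueError("unknown method: " + str(method))


def whole_record_range(size_range, elements_per_record):
    """Divide by the average number of data elements per record (Table 2, column 2)."""
    low, high = size_range
    return (low / elements_per_record, high / elements_per_record)


if __name__ == "__main__":
    n = pooled_n(5)  # hypothetical example: N = 5 items pooled per attribute
    for method in "ABCD":
        print(method, club_size_range(method, n=n, p=5))
```

With N = 5 and p = 5, the lower estimate for method D is of the order of several hundred million patients, which is the scale referred to in the Discussion.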
5. Discussion and Conclusions
Since records are stored and transmitted in the disaggregated state, reaggregation
efficiency is important for all COC functionalities of Q-UEL, at least in the current setup.
The elaborate method described above with benchmarks as in Table 1 allowed a club of
50,000 patients to be queried to obtain a basic patient summary of some 100 data
elements (clinical factors), bundled into an average of 5 per attribute (“one shred” of
record), in 1 minute. User controllable simplifications of the method raise the club size to
1-2 million patients. However, reasonable assertions in Table 2 imply that the same
record could be obtained in the same time, by plausible modifications, for a club the
approximate size of the US. Note that these estimates are still for an elaborate triple
shred bra-ketbra-ket, rather than a double shred bra-ket or single shred braket, model,
and performed on a standard laptop not interacting with a server. But however
achieved, all these assertions depend, of course, on scalability. The current rate
(serially querying) is scalable up to at least 3-4 million emulated patient tags (Table 1),
and earlier similar configurations reached 7-9 million on 2009 generation Thinkpads
though requiring extensive additional machine memory (RAM). Querying, associating,
graph theoretic analysis, and processing of somewhat similar computational order on 6.7
million chemical structure patent records had similar performance on 2006 generation
ThinkPads [65]. It is not hard to think of plausible improvements, but the tempting one of
using very high performance supercomputers as servers to reaggregate and transmit
encrypted results to portals would lose the “entropy protection” in transmission. This
brings to mind that even PCAST-style disaggregation is not perfectly secure if tags
transmitted to a reaggregating portal are somehow monitored, but that in turn suggests
using the last feature of method D (Table 2). Here, an “isomorphism family” is
transmitted as a kind of club, and the document of interest is reaggregated at the portal
from that club.
Because disaggregation has been made highly automatic, demonstrations are rather
unimpressive. In effect, one takes a document, places it in a black box, closes the lid,
opens the lid and takes out the document, albeit with a key or keys. To this may be
added that any document can be shredded by splitting it into arbitrary elements,
including images and scans of documents, spreadsheets, PowerPoint files, and so on.
Since one need not look in the black box, it may be asked why one
disaggregates data elements by metadata (attributes in AML) at all. The problem of
interconverting between standards vanishes, at least as far as protection by
disaggregation is concerned. The reason why it is good to “shred by metadata” is that it allows
pieces of medical data to be released to authorized stakeholders on a need-to-know basis,
and allows data mining of the private encrypted data by authorized persons. These reflect
interpretations of PCAST requirements. At this point, the question might well be raised
as to what, if any, of all the above is what PCAST actually wanted. We believe that Q-UEL
is compliant with what PCAST did want, but it is indeed a matter of interpretation.
Our interpretations do not seem far removed from those of other observers on various
blogging sites, but there has been little in depth formal discussion outside of Q-UEL
itself. Even the Yosemite Manifesto [19] was rather vague, and although very general
“roadmaps” for implementing its proposal have appeared, the status of any subsequent
progress regarding them is rather unclear to us at the time of writing [66]. Indeed, the
work done, and still to be done, is needed because PCAST did not define a UEL in detail, essentially
saying that “there are ways of doing this” [12].
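The black-box picture described above (shred a document by metadata, mix the shreds with those of other records, and recover the document with a key) can be illustrated with a deliberately simplified Python sketch. This is not the Q-UEL method: it has no bra, ketbra, or ket joining tags, no encryption of values, and no graph isomorphism test. The key derivation from SHA-256, the function names, and the sample records are all assumptions made for illustration.

```python
import hashlib
import random


def shred_key(secret, index):
    """Derive the lookup key of the index-th shred from a private secret (toy scheme)."""
    return hashlib.sha256(f"{secret}:{index}".encode()).hexdigest()


def disaggregate(record, secret):
    """Split a record into one shred per metadata attribute."""
    return [
        {"key": shred_key(secret, i), "attribute": name, "value": value}
        for i, (name, value) in enumerate(record.items())
    ]


def reaggregate(soup, secret):
    """Recover a record by querying the soup for each successive key."""
    by_key = {shred["key"]: shred for shred in soup}
    record = {}
    index = 0
    while (key := shred_key(secret, index)) in by_key:
        shred = by_key[key]
        record[shred["attribute"]] = shred["value"]
        index += 1
    return record


# Hypothetical sample data: two records mixed in one shuffled soup.
patient = {"blood_type": "O+", "allergies": "penicillin", "medication": "warfarin"}
other = {"blood_type": "A-", "allergies": "none"}
soup = disaggregate(patient, "key-1") + disaggregate(other, "key-2")
random.shuffle(soup)
recovered = reaggregate(soup, "key-1")
```

Because each shred is found by its own derived key, the early attributes of a record can be displayed as soon as they are located, without waiting for the rest.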
References
1. B. Robson, T. P. Caruso and U. G. J. Balis, Suggestions for a Web Based Universal
Exchange and Inference Language for Medicine, Computers in Biology and
Medicine, 43(12) 2297 (2013).
2. I. M. Mullins, M.S. Siadaty, J. Lyman, K. Scully, G.T. Garrett, G. Miller, R.
Muller, B. Robson, C. Apte, S. Weiss, I. Rigoutsos, D. Platt, and S. Cohen, Data
mining and clinical data repositories: Insights from a 667,000 patient data set,
Computers in Biology and Medicine, 36(12) 1351 (2006).
3. http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/ACO/
4. N. Li, A. F. Laine, H. Jianying, W. Fei , S. Jimeng and S. Ebadollahi, Mining Electronic
Medical Records to Explore the Linkage between Healthcare Resource Utilization
and Disease Severity in Diabetic Patients, Healthcare Informatics, Imaging and
Systems Biology (HISB), IEEE International Conference, 250 (2011).
5. R. A. Greenes (Ed.), Clinical Decision Support, Academic Press (2006).
6. http://www.epic.com/software-intelligence.php (last accessed 2/10/2014).
7. J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Francisco CA: Morgan
Kaufmann (1985).
8. B. Robson, Hyperbolic Dirac Nets for Medical Decision Support. Theory, Methods,
and Comparison with Bayes Nets, Computers in Biology and Medicine, 51, 183
(2013).
9. P. A. M. Dirac, The Principles of Quantum Mechanics, Oxford University Press, Oxford (1930).
10. R. Penrose, The Road to Reality. A Complete Guide to the Laws of the Universe,
Jonathan Cape, Random House, London (2004).
11. http://www.healthit.gov/policy-researchers-implementers/meaningful-use-regulations
12. http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-health-it-report.pdf
13. http://en.wikipedia.org/wiki/Semantic_Web (last accessed 3/30/2013).
14. http://en.wikipedia.org/wiki/Resource_Description_Framework (last accessed
4/10/2013).
15. http://en.wikipedia.org/wiki/Triplestore (last accessed 6/5/2013).
16. B. Buchanan, E.H. Shortliffe, Rule Based Expert Systems. The Mycin Experiments
of the Stanford Heuristic Programming Project, Addison-Wesley: Reading,
Massachusetts (1982).
17. A. Sninsky, Developing Universal Electronic Medical Records, Gastroenterol.
Hepatol. (N Y) 4(3) 193 (2008),
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088297/ (last accessed 5/11/2014).
18. N. D. Goodman and D. Lassiter, Probabilistic Semantics and Pragmatics, The
Handbook of Contemporary Semantic Theory, Second Edition, Eds. S. Lapin, C.
Fox, Chapter 21, Wiley (in production) (2015).
19. http://yosemitemanifesto.org/ (last accessed 7/5/2014).
20. B. Robson, The New Physician as Unwitting Quantum Mechanic: Is Adapting Dirac's
Inference System Best Practice for Personalized Medicine, Genomics and
Proteomics?, J. Proteome Res. (Am. Chem. Soc.), Vol. 6, No. 8: 3114 (2007).
21. http://www.healthbanking.org/index2.html (last accessed 7/25/2014).
22. S. S. Siparasa, JavaScript and JSON Essentials, Packt Publishing (2013).
23. http://json-ld.org/ (last accessed 9/17/2014)
24. http://www.siframework.org/ (last accessed 3/29/2014).
25. http://wiki.siframework.org/Interoperability+of+EHR+Work+Group (last accessed
5/16/2014).
26. http://www.qualityforum.org/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=68545
(last accessed 3/29/2014).
27. http://semanticweb.org/wiki/Bayes_OWL (last accessed 7/3/2013).
28. http://www.pr-owl.org/basics/bn.php (last accessed 1/25/2014).
29. H. Nottelmann, N. Fuhr, pDAML+OIL: A probabilistic extension to DAML+OIL,
http://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate5571/Nottelmann_Fuhr_04a.pdf (last accessed 7/28/2013)
30. R. D. Appel, H. J. Komorowski, C. E. Barr, R. A. Greenes, Intelligent
Focusing in Knowledge Indexing and Retrieval: The Relatedness Tool, Proc. of the
Ann. Symp. on Computer Application in Medical Care 152-157 (1988).
31. S. Meystre, P. J. Haug, Medical Problem and Document Model for Natural
Language Understanding, AMIA Annual Symposium Proceedings 2003:455-459
(2003).
32. B. Robson, R. Mushlin, Genomic Messaging System for Information-Based
Personalized Medicine with Clinical and Proteome Research Applications, J.
Proteome Res. (Am. Chem. Soc.) 3(5) 930-948 (2004).
33. B. Robson and R. Mushlin, The Genomic Messaging System Language Including
Command Extensions for Clinical Data Categories, J. Proteome Res. (Am. Chem.
Soc.) 4(2) 275-299 (2005).
34. Y. Park, R. Yu, L. Rang, W. Hye Won, J. H. Kim, Integrating Microarray Gene
Expression Object Model and Clinical Document Architecture for Cancer Genomics
Research, AMIA Annual Symposium Proceedings 2005:1073 (2005)
35. Y. R. Park, R. Yu Rang, H. W. Lee, J. H Kim, H. Ju, Integrating Microarray Gene
Expression Object Model and Clinical Document Architecture for Cancer Genomics
Research, AMIA Annual Symposium Proceedings 2005:1074 (2005).
36. M. Popescu, G. Arthur, OntoQuest: A Physician Decision Support System based on
Ontological Queries of the Hospital Database, AMIA Annual Symposium
Proceedings 2006:639-643(2006).
37. L. Robu, V. Robu, B. Thirion, An introduction to the Semantic Web for health
sciences librarians, J. Medical Library Association, 94(2):198-205 (2006).
38. Q. Xu, Y. Shi, Q. Lu, G. Zhang, Q. Luo, Y. Li,
GORouter: an RDF model for providing semantic query and inference
services for Gene Ontology and its associations, BMC Bioinformatics 9(Suppl 1):S6
(2008).
39. C. Tao, W-Q. Wei, H. R. Solbrig, G. Savova, C. G. Chute, CNTRO: A Semantic Web
Ontology for Temporal Relation Inferencing in Clinical Narratives, AMIA Annual
Symposium Proceedings 2010:787-791 (2010).
40. B. Chisham, B. Wright, T. Le, T. C. Son, E. Pontelli, CDAO-Store:
Ontology-driven Data Integration for Phylogenetic Analysis, BMC
Bioinformatics 12:98 (2011).
41. S. Liu, B. Zhou, G. Xie, J. Mei, H. Liu, C. Changsheng, L. Qi,
Beyond Regional Health Information Exchange in China: A Practical and
Industrial-Strength Approach, AMIA Annual Symposium Proceedings 2011:824-833 (2011).
42. S. Heymans, M. McKennirey, J. Phillips, Semantic validation of the use of
SNOMED CT in HL7 clinical documents, J. of Biomedical Semantics 2:2 (2011).
43. C. Tao, H. R. Solbrig, C. G. Chute, CNTRO 2.0: A Harmonized Semantic Web
Ontology for Temporal Relation Inferencing in Clinical Narratives, AMIA Summits on
Translational Science Proceedings 2011:64-68 (2011).
44. G. Jiang, H. R. Solbrig, C. G. Chute, ADEpedia: A Scalable and Standardized
Knowledge Base of Adverse Drug Events Using Semantic Web Technology, AMIA
Annual Symposium Proceedings 2011:607-616 (2011).
45. A. Callahan, M. Dumontier, N. M. Shah, HyQue: evaluating hypotheses using
Semantic Web technologies, J. Biomedical Semantics 2(Suppl 2):S3 (2011).
46. V. Mironov, N. Seethappan, W. Blondé, E. Antezana, A. Splendiani, M.
Kuiper, Gauging triple stores with actual biological data, BMC Bioinformatics
13(Suppl 1):S3 (2012).
47. M-F. Sy, S. Ranwez, J. Montmain, A. Regnault, M. Crampes,
V. Ranwez, User centered and ontology based information retrieval system for life
sciences, BMC Bioinformatics 13(Suppl 1):S4 (2012).
48. I. Sim, S. Carini, S. W. Tu, L. T. Detwiler, J. Brinkley, S. A. Mollah, K.
Burke, H. P. Lehmann, S. Chakraborty, K. M. Wittkowski, B. H. Pollock, T. M.
Johnson, V. Huser, Ontology-Based Federated Data Access to Human Studies
Information, AMIA Annual Symposium Proceedings 2012:856-865 (2012).
49. B. Chen, Y. Ding, D. J. Wild, Improving integrative searching of systems chemical
biology data using semantic annotation, J. of Cheminformatics 4:6 (2012).
50. J. F. Brinkley, L. T. Detwiler, A Query Integrator and
Manager for the Query Web, J. Biomedical Informatics 45(5):975-991 (2012).
51. M. E. Holford, J. P. McCusker, K-H. Cheung, M. Krauthammer, A semantic web
framework to integrate cancer omics data with biological knowledge, BMC
Bioinformatics 13(Suppl 1):S10 (2012).
52. L. J. Garcia Castro, C. McLaughlin, A. Garcia, Biotea: RDFizing PubMed Central in
support for the paper as an interface to the Web of Data, J. Biomedical Semantics
4(Suppl 1):S5 (2013).
53. J. Kepner, W. Arcand, D. Bestor, B. Bergeron, C. Byun, V. Gadepally, M. Hubbell, P.
Michaleas, J. Mullen, A. Prout, A. Reuther, A. Rosa, C. Yee, Achieving 100,000,000
database inserts per second using Accumulo and D4M, IEEE High Performance
Extreme Computing (HPEC), in press (2014).
54. R. H. Dolin, L. Alschuler, S. Boyer, C. Beebe, An update on HL7's XML-based
document representation standards, Proceedings of the AMIA Symposium
2000;190-194.
55. R. H. Dolin, L. Alschuler, C. Beebe, P. V. Biron, S. L. Boyer, D. Essin, E.
Kimber, T. Lincoln, J. E. Mattison, The HL7 Clinical Document Architecture,
J. Am. Med. Informatics Association, JAMIA 8(6):552-569 (2001).
56. https://www.progress.com/products/data-integration-suite/data-integration-suitedeveloper-center/data-integration-suite-tutorials/healthcare-applications/convertingfrom-hl7-2x-to-hl7-3x (last accessed 10/22/2014).
57. R. H. Dolin, L. Alschuler, S. Boyer, C. Beebe, F. M. Behlen, P. V. Biron,
A. Shabo (Shvo), HL7 Clinical Document Architecture, Release 2, J. of the Am. Med. Informatics
Association, JAMIA 13(1):30-39 (2006).
58. https://www.hl7.org/documentcenter/public_temp_425ACAEF-1C23-BA170C6AF8FA2C5E1E32/wg/inm/Acf302.pdf (last accessed 10/22/2014).
59. http://www.hl7.org/implement/standards/product_brief.cfm?product_id=6 (last
accessed 10/22/2014).
60. http://www.epsos.eu/ (last accessed 2/10/2014).
61. http://en.wikipedia.org/wiki/VistA (last accessed 2/10/2014).
62. K. C. O'Kane, The Mumps Programming Language, CreateSpace Independent
Publishing Platform (2008).
63. V. Dinu and P. Nadkarni, Guidelines for the Effective Use of Entity-Attribute-Value
Modeling for Biomedical Databases, Int J Med Inform. 76(11-12): 769–779 (2007).
64. C. Lovis, A. Lamb, R. Baud, A. M. Rassinoux, P. Fabry, A. Geissbühler, Clinical
Documents: Attribute-Values Entity Representation, Context, Page Layout And
Communication, AMIA Annual Symposium Proceedings 2003:396-400 (2003).
65. B. Robson, R. Dettinger, A. Peters, and S.K.P. Boyer, Drug discovery using very
large numbers of patents: general strategy with extensive use of match and edit
operations, J. Computer Aided Molecular Design 25(5):427-41 (2011).
66. http://www.dataversity.net/semantic-interoperability-future-healthcare-data/ (last
accessed 10/22/2014).