RELREC - SDTM Programmer's Bermuda Triangle
ABSTRACT
The CDISC Study Data Tabulation Model (SDTM) provides a framework for organizing and converting clinical
trial data into standard formats. This supports easy interpretation and consistency across trials.
It is sometimes vital to establish relationships between records or datasets in SDTM to facilitate linking
at the time of conversion. The logic of a relationship is identified either by the profile/outliers in the data (e.g., PC
and PP) or by identifying the data link between domains so that associated information from individual
domains can be examined collectively (e.g., TU, TR and RS).
Related Records (RELREC), a Special Purpose Relationship Domain, can capture these explicit and
implicit relationships to aid further in-depth exploration of the data collected during trials. Often perceived
as a challenging zone, it is yet to be explored to its maximum potential.
This paper details the process to standardize relationships within and between SDTM domains by using
the concept of Group Identifier (--GRPID - a variable used to link a block of related records within a subject
in a domain), distinct requirements for assigning RELID, appropriate usage of RELTYPE and best variables
to be considered for populating IDVAR in the following scenarios:
i. The relationship between an intervention and its findings related to the efficacy endpoints of the
clinical study
ii. The relationship and control of event record over intervention, exposure and disposition of the
subject involved in the trial
iii. The relationship between oncology specific domains and
iv. The relationship between pharmacokinetics domains
INTRODUCTION
The Clinical Data Interchange Standards Consortium (CDISC) is a non-profit organization, started in
2000, to develop global and platform-independent standards for clinical trial data, to improve its data quality
and to accelerate the product development. It has established standards to support the acquisition,
exchange, submission and archive of clinical research data and metadata.
The CDISC Study Data Tabulation Model (SDTM) provides a framework for organizing and converting clinical
trial data into standard formats. It is usually described as the study source data, i.e., the contents and structure
of data collected during a clinical trial. SDTM provides a standardized, platform-independent mechanism for
representing all the essential information collected in a clinical trial so that it is easy to interpret, understand
and navigate. The purpose of SDTM is to provide regulatory reviewers with a clear description of the
structure, attributes and contents of each dataset and the variables submitted as part of a product application.
In some circumstances, during data conversion to SDTM standards, it becomes vital to establish
relationships between records or datasets in SDTM. The logic of these relationships is
identified either by the profile/outliers in the data (e.g., PC and PP) or by identifying the data link between domains
so that associated information from individual domains can be examined collectively (e.g., TU, TR and RS).
The CDISC SDTM provides several ways to relate records within and between SDTM domains. Records
within a domain can be related by assigning them the same value of --GRPID. The --GRPID also supports
relationships within and between domains by serving as an ideal identifying variable (IDVAR) in RELREC. The
RELREC dataset can be used to relate multiple records in multiple domains. The types of relationships that
can be established using SDTM are:
➢ Record to record relationship
• --GRPID
The --GRPID has no inherent meaning across subjects or domains in the study. All observations in the
same domain with the same --GRPID value form a group of records within a USUBJID. The --GRPID can
be assigned in a logical and sequential manner, during or after data collection. It is not subject
to SDTM controlled terminology. The grouping variable comes in handy when relating
peer records in the collected data. It also allows repeated events/assessments to be grouped logically for
analysis. For example, grouping retests done on a few parameters in the LB domain allows for separate
analyses of these tests.
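As a minimal sketch of the LB example, the SAS steps below assign an LBGRPID to tests that were repeated within a visit. The sample records, the rule used to detect a retest (more than one record per test and visit) and the RETEST-<test code> naming are illustrative assumptions, not requirements of the SDTMIG.

```sas
/* Hypothetical LB sample: ALT was measured twice at WEEK1 */
data lb;
  length usubjid $40 lbtestcd $8 visit $20;
  input usubjid $ lbtestcd $ visit $ lbseq;
  datalines;
12345 ALT WEEK1 1
12345 ALT WEEK1 2
12345 AST WEEK1 3
;
run;

proc sort data=lb;
  by usubjid lbtestcd visit lbseq;
run;

/* Assumed rule: a test with more than one record in a visit is a retest group */
data lb_grp;
  set lb;
  by usubjid lbtestcd visit;
  length lbgrpid $20;
  if not (first.visit and last.visit) then
    lbgrpid = catx("-", "RETEST", lbtestcd);
run;
```

In this sketch both ALT records receive the same LBGRPID, while the single AST record is left ungrouped.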
Another example is the TU domain, where TUGRPID can be used to identify the
'parent' tumor of split tumors and the 'parents' of merged tumors, as in the sample TU dataset
shown in Table 2 below.
Usage of RELTYPE
The RELTYPE variable is populated only when two entire datasets are related. It
identifies the type of relationship between the datasets. The permissible values for
RELTYPE are ONE and MANY. This information defines how to merge/join the data and what
the outcome of the merge/join would be.
Subject-level information such as USUBJID is not provided when RELTYPE is populated, as it implies that
entire datasets are linked to get complete information. The possible combinations are:
ONE and ONE
This combination indicates that there is no hierarchical relationship between the datasets and the records
in the datasets. Only one record from each dataset will potentially have the same value of the IDVAR within
USUBJID.
ONE and MANY
This combination indicates that there is a hierarchical (parent/child) relationship between the datasets. One
record within USUBJID in the dataset identified by RELTYPE=ONE will potentially have the same value of the IDVAR
as many (one or more) records in the dataset identified by RELTYPE=MANY.
MANY and MANY
This combination is unusual and challenging to manage in a merge/join.
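One way to decide between ONE and MANY is to count how many records share an IDVAR value within a subject. The sketch below does this for a hypothetical TR sample keyed by TRLNKID; the dataset contents and the macro variable name reltype_tr are assumptions made for illustration.

```sas
/* Hypothetical TR sample: tumor T01 has two measurement records */
data tr;
  length usubjid $40 trlnkid $8;
  input usubjid $ trlnkid $ trseq;
  datalines;
12345 T01 1
12345 T01 2
12345 T02 3
;
run;

/* Count records per subject and IDVAR value; more than one suggests MANY */
proc sql noprint;
  select case when max(n) > 1 then "MANY" else "ONE" end
    into :reltype_tr trimmed
    from (select usubjid, trlnkid, count(*) as n
            from tr
           group by usubjid, trlnkid);
quit;

%put NOTE: Suggested RELTYPE for TR by TRLNKID is &reltype_tr..;
```

With the sample data, T01 has two records for the same subject, so the check suggests MANY. Running the same check on both datasets of a pair shows which of the three combinations applies.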
Assigning RELID
The RELREC dataset has a variable named RELID (Relationship Identifier), which is identical for
all related records. The value of RELID can be customized by the programmer. Ideally, RELID is set to a
meaningful value that allows the related records to be traced.
A good practice for creating RELIDs is to use the related domain abbreviations as the first four characters
of the RELID and to add a sequential number if more than one relationship exists. For instance, the
RELIDs for the records linked between AE and CM will be AECM001, AECM002, and so on. The sequential
number can be created programmatically after the records are linked in the initial stages.
The number suffixed to the RELID can be sequential, or it can use values like 1,
10, 100, and so on for better clarity. The suffix can also be a Roman numeral like I, II, III, IV, V, X and so on.
The choice of numbering scheme is left to the SDTM programmer.
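The SAS code below is a sample for creating RELID programmatically:
/* AE and CM are assumed sorted by USUBJID and the linking key AEID */
data ae_cm;
  merge ae (in=a) cm (in=b);
  by usubjid aeid;
  if a and b;                          /* keep only linked records */
  length relid $10;
  n + 1;                               /* running count of linked records */
  relid = cats("AECM", put(n, z3.));   /* AECM001, AECM002, ... */
run;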
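The numbering styles described above can be sketched side by side. The step below builds a zero-padded sequential suffix and a Roman numeral suffix for the first three AE-CM relationships; the dataset and variable names are hypothetical, and the hyphen before the Roman numeral is only one possible convention.

```sas
/* Two illustrative RELID styles for relationships 1 to 3 */
data relid_styles;
  length seq_style roman_style $10;
  do n = 1 to 3;
    seq_style   = cats("AECM", put(n, z3.));          /* zero-padded number   */
    roman_style = catx("-", "AECM", put(n, roman.));  /* Roman numeral suffix */
    output;
  end;
run;
```

Whichever style is chosen, applying it consistently across all relationships in a study keeps the RELID values easy to trace.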
CASE STUDIES
INTERVENTION RECORD RELATED TO FINDINGS RECORDS
PR related to FA
Consider an oncology study that collects prior radiotherapy to assess the medical condition of all subjects
participating in the clinical trial. The eCRF page captures the following:
1. Location of the radiotherapy
2. Start and end of radiotherapy
3. Type of radiotherapy
4. The patient’s best response for that radiotherapy
5. Further chemotherapy taken
6. If progression occurred
PR 9999-1212 1 DAY2-018 RADIO THERAPY 2012-08-22 DAY2 Y
Table 3. SDTM.PR dataset
As per the SDTMIG, points 1-3 can be standardized under the PR domain, as they concern therapeutic and
diagnostic procedures. Points 4-6 are questions related to response or progression, which are
significant for analysis in an oncology study. Therefore, for better utilization of the data, they are captured in
FA.
To perform further analysis, it is important to know the best response corresponding to each PR record.
So, a dataset-level relationship between PR and FA is established.
IDVAR:
The implicit relationship between PR and FA can be established using --SPID. --SPID
is populated with a database-generated unique identifier for each iteration of this form, with which we
can relate the PR and FA records. IDVARVAL will not be populated with the value of --SPID, as the
relationship exists for all values of the IDVAR.
RELTYPE:
In this case study, all the subjects participating in the trial have prior radiotherapy records. Therefore,
RELTYPE will be populated. There is a hierarchical (parent/child)
relationship between the datasets, that is, a 'One to Many' relationship.
RELID:
A single value of RELID establishes the complete relationship in this scenario. The ideal RELID is the
concatenation of the domain abbreviations, i.e., PRFA.
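The dataset-level PR-FA relationship described above could be written to RELREC roughly as follows. The STUDYID value and the variable lengths are placeholders, and treating PR as the ONE side and FA as the MANY side is an assumption based on each radiotherapy record having several findings-about records.

```sas
/* Dataset-level relationship: USUBJID and IDVARVAL stay blank */
data relrec_prfa;
  length studyid $20 rdomain $2 usubjid $40 idvar $8
         idvarval $40 reltype $4 relid $10;
  studyid  = "ABC12345";   /* hypothetical study identifier */
  usubjid  = "";
  idvarval = "";
  relid    = "PRFA";
  rdomain = "PR"; idvar = "PRSPID"; reltype = "ONE";  output;  /* parent (assumed) */
  rdomain = "FA"; idvar = "FASPID"; reltype = "MANY"; output;  /* child (assumed)  */
run;
```

The two output rows share the same RELID, which is what ties the PR and FA datasets together.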
EVENT RECORD RELATED TO INTERVENTION RECORDS (CM & EX) AND DISPOSITION
OF THE SUBJECT
AE related to CM, EX and DS
A clinical study is conducted to assess the safety and efficacy of the administered study drug. The
following are captured in the eCRF:
a) Concomitant Medication Page
b) Start Date
The ‘Reason for Medication’ field collects the reason for administering the concomitant medication. Usually,
concomitant medications are administered to treat the adverse events observed during a study. The
concomitant medications may have substantial efficacy implications for the study drug. For this purpose,
the relationships are established between AE and CM records, concentrating on serious adverse
events (SAEs).
c) Disposition Page
The protocol may have a series of checks for certain non-compliances at every visit. Thus, certain AEs can
make a subject non-compliant and result in termination from the study. Since an AE is the main cause of
termination, it is beneficial to relate it to the disposition data.
d) Exposure Page
During a trial, the study drug dose is adjusted for various reasons, including the impact of multiple lab test
values, the duration of administration or investigator decision. The dose is also adjusted after the occurrence
of an AE. Establishing this relationship shows the effect of the AE on the exposure to the study drug.
Prolonged or serious AEs may also lead to discontinuation of the study drug. Tables 6, 7, 8 and 9 show
sample data in the AE, CM, DS and EX domains.
DOMAIN USUBJID AESEQ AESPID AETERM
AE 12345 20 15 ACUTE VIRAL NASOPHARYNGITIS
Table 6. SDTM.AE dataset
DOMAIN USUBJID CMSEQ CMSPID CMTRT CMINDC
CM 12345 10 2 AVAMYS AE
Table 7. SDTM.CM dataset
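Using the sample values from Tables 6 and 7, a record-level AE-CM relationship could be derived along these lines. This is a sketch under stated assumptions: the linked pair is supplied in a hypothetical dataset aecm_link, --SEQ is used as the IDVAR, and the STUDYID value is a placeholder.

```sas
/* One linked AE-CM pair, taken from the sample rows in Tables 6 and 7 */
data aecm_link;
  length usubjid $40;
  usubjid = "12345"; aeseq = 20; cmseq = 10;
run;

/* Record-level relationship: two RELREC rows per pair, RELTYPE left blank */
data relrec_aecm (keep=studyid rdomain usubjid idvar idvarval reltype relid);
  length studyid $20 rdomain $2 usubjid $40 idvar $8
         idvarval $40 reltype $4 relid $10;
  set aecm_link;
  by usubjid;
  if first.usubjid then n = 0;   /* restart numbering for each subject */
  n + 1;
  studyid = "ABC12345";          /* hypothetical study identifier */
  reltype = "";
  relid   = cats("AECM", put(n, z3.));
  rdomain = "AE"; idvar = "AESEQ"; idvarval = strip(put(aeseq, best.)); output;
  rdomain = "CM"; idvar = "CMSEQ"; idvarval = strip(put(cmseq, best.)); output;
run;
```

The same pattern extends to AE-DS and AE-EX pairs by changing the domain abbreviations and the IDVAR names.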
standard structures, the data from independent vendors can be collaborated efficiently and the standardized
data can be utilized directly for analysis.
IDVAR:
The three oncology domains mentioned above are closely associated and together form a complete package of
disease response data. The --LNKID/--LNKGRP variables provide a unique code for each identified tumor,
and for each response and its associated tumor measurements/assessments. This helps build a
comprehensive relationship between the three oncology domains and preserves their integrity. --LNKID
acts as a point-to-point link, whereas --LNKGRP links the records having RSTEST = "Overall
Response" in RS to TR as a one-to-many link.
RELTYPE:
RELTYPE is not populated for a subject-level relationship. In this oncology study example,
all the subjects are considered to have this relationship. Therefore, RELTYPE is populated.
RELID:
The RELID will be the domain abbreviations followed by a sequential number/Roman numeral.
The RELREC dataset for this example is shown in Table 11.
STUDYID   RDOMAIN  USUBJID  IDVAR     IDVARVAL  RELTYPE  RELID
ABC12345  TU                TULNKID             ONE      TUTR-I
ABC12345  TR                TRLNKID             MANY     TUTR-I
ABC12345  TR                TRLNKGRP            MANY     TRRS-II
ABC12345  RS                RSLNKGRP            ONE      TRRS-II
ABC12345  PR                PRREFID             ONE      PRTU-I
ABC12345  TU                TUREFID             MANY     PRTU-I
ABC12345  PR                PRLNKGRP            MANY     PRRS-III
ABC12345  RS                RSLNKGRP            ONE      PRRS-III
Table 11. Oncology domains linked among themselves and with PR which captures the Procedure
methods
The mock SAS code for deriving the relationships mentioned above:
****Creating TU & TR rel data*****;
proc sort data=sdtm.tr out=tr;
  by usubjid trlnkid;
run;
proc sort data=sdtm.tu out=tu;
  by usubjid tulnkid;
run;
/* Keep the TU records that have a matching TR record; only TRLNKID is
   taken from TR to avoid duplicate-column warnings */
proc sql;
  create table in1_2 as
  select a.*, b.trlnkid
  from tu as a inner join tr as b
  on a.usubjid = b.usubjid and a.tulnkid = b.trlnkid;
quit;
proc sort data=in1_2 out=in1_2_;
  by usubjid tulnkid;
run;
/* Number the linked records within each subject and build RELID */
data rel_tr_tu;
  set in1_2_;
  by usubjid tulnkid;
  length relid $10;
  seq + 1;
  if first.usubjid then seq = 1;
  relid = compress("TUTR"||put(seq, best.));
run;
/*APPENDING TR TO TU*/
data tr_tu_fin;
  length rdomain $2 idvar $8 idvarval $40;
  set rel_tr_tu (in=a)
      rel_tr_tu (in=b);
  if a then do;                /* first pass: TU side of each pair */
    idvar = 'TULNKID';
    idvarval = tulnkid;
    rdomain = 'TU';
  end;
  else if b then do;           /* second pass: TR side of each pair */
    idvar = 'TRLNKID';
    idvarval = trlnkid;
    rdomain = 'TR';
  end;
run;
RELID:
The RELID can be the domain abbreviations followed by a sequential number. It can also be a combination
of the domain abbreviations with letter suffixes for better clarity.
Table 12 shows the RELREC dataset for the linked pharmacokinetics domains.
CONCLUSION
The RELREC domain provides a flexible method to link data points. RELREC creation can be a herculean
task, considering the whirlpool of intertwining data relationships one needs to keep in mind while
programming. It is easy to lose track of the end objective while working on RELREC, especially when the
data relationships are highly complex. That is why I chose the term ‘Bermuda Triangle’: the
whirlpool of data relationships keeps intertwining until one can easily lose track.
RELREC can be used to establish relationships between records of a subject and between different
SDTM domains. An additional review of the established relationships is good practice, so that
unwanted/mismatched relationships of little value can be avoided in the CSR.
This paper is an attempt to help programmers with a sequence of considerations for deriving the key
variables in the RELREC domain, through commonly recurring scenarios in which relationships need to be
established in a clinical trial.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to the management of my organization – “Ephicacy” for the
encouragement and support in helping me with all the necessary facilities to write this paper.
I would also like to thank members of the Ephicacy family for their encouragement, insightful comments
and hard questions.
My sincere thanks to my manager Mr. Tyagrajan Swaminathan and our India center head Mr. Siva
Ramamoorthy for their unwavering support.
RECOMMENDED READING
Study Data Tabulation Model Implementation Guide: Human Clinical Trials, Version 3.2, CDISC
Submission Data Standards Team (November 26, 2013).
Study Data Tabulation Model, Version 1.4, CDISC Submission Data Standards Team (November 26,
2013)
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Charumathy Sreeraman
Enterprise: Ephicacy Lifescience Analytics Pvt. Ltd., India
Address: No.6, 2nd Floor, 2nd Main Rd, Arekere, Off. Bannerghatta Road
City, State ZIP: Bangalore, Karnataka 560076
E-mail: charumathy.sreeraman@ephicacy.in
Web: www.ephicacy.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.