Abstract
Large collections of electronic patient records provide a vast but still underutilised source of information on the real-world use of medicines. They are maintained primarily for the purpose of patient administration, but contain a broad range of clinical information highly relevant for data analysis. While they are a standard resource for confirmatory epidemiological studies, their use in exploratory data analysis is still limited. In this paper, we present a framework for open-ended pattern discovery in large patient record repositories. At its core is a graphical statistical approach to summarising and visualising the temporal association between the prescription of a drug and the occurrence of a medical event. The graphical overview contrasts the observed and expected numbers of occurrences of the medical event in different time periods both before and after the prescription of interest. To screen effectively for important temporal relationships, we introduce a new measure of temporal association, which contrasts the observed-to-expected ratio in a time period immediately after the prescription with the observed-to-expected ratio in a control period 2 years earlier. An important feature of both the observed-to-expected graph and the measure of temporal association is statistical shrinkage towards the null hypothesis of no association, which protects against highlighting spurious associations. We demonstrate the usefulness of the proposed pattern discovery methodology with a set of examples from a collection of over two million patient records in the United Kingdom. The identified patterns include temporal relationships between drug prescriptions and medical events suggestive of persistent and transient risks of adverse events, possible beneficial effects of drugs, periodic co-occurrence, and systematic tendencies of patients to switch from one medication to another.
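As a rough illustration of the idea of a shrunk observed-to-expected contrast, the following Python sketch may help. The abstract does not give the formula, so everything specific here is an assumption made for illustration: the function names, the base-2 logarithm, and the additive pseudo-count of 0.5 used as the shrinkage term. It should not be read as the authors' exact measure.

```python
import math


def shrunk_log_ratio(observed: float, expected: float, shrinkage: float = 0.5) -> float:
    """Log2 observed-to-expected ratio, pulled towards 0 (no association).

    Adding the same pseudo-count to numerator and denominator is an
    assumed, illustrative form of shrinkage: with little data the ratio
    stays close to 1, so its logarithm stays close to 0.
    """
    return math.log2((observed + shrinkage) / (expected + shrinkage))


def temporal_association(
    obs_after: float,
    exp_after: float,
    obs_control: float,
    exp_control: float,
    shrinkage: float = 0.5,
) -> float:
    """Contrast the period right after the prescription with a control period.

    Positive values mean the event is relatively more frequent after the
    prescription than in the earlier control period (hypothetical scoring).
    """
    after = shrunk_log_ratio(obs_after, exp_after, shrinkage)
    control = shrunk_log_ratio(obs_control, exp_control, shrinkage)
    return after - control


if __name__ == "__main__":
    # Hypothetical counts: 20 events observed vs 10 expected after the
    # prescription, 10 observed vs 10 expected in the control period.
    print(temporal_association(20, 10, 10, 10))
```

With very small counts the shrunk ratio stays near zero even when the raw ratio is extreme, which is the protective behaviour against spurious associations that the abstract describes.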
Additional information
Responsible editors: R. Bharat Rao and Romer Rosales.
Cite this article
Norén, G.N., Hopstadius, J., Bate, A. et al. Temporal pattern discovery in longitudinal electronic patient records. Data Min Knowl Disc 20, 361–387 (2010). https://doi.org/10.1007/s10618-009-0152-3
DOI: https://doi.org/10.1007/s10618-009-0152-3