GraPhys

Ana Appel

GraPhys

Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion

GraPhys: Understanding Health Care Insurance Data through Graph Analytics. Luis G. Moyano, Ana Paula Appel, Vagner F. de Santana, Marcia Ito, Thiago D. dos Santos IBM Research, Brazil {lmoyano, apappel, vagsant, marciaito, thiagodo}@br.ibm.com ABSTRACT of doctors, patients, healthcare providers, and other stakeholders. This has been the final objective of whole areas such as Business Intelligence and Business Analytics [2]. Traditional Business Analytics regards data querying and aggregation as the main building blocks of information processing. While this approach is fruitful and necessary, there are data structures that contain additional useful information: relationship data in the form of metadata. By relationship data we mean data structures that relate two or more entities in a precise, quantifiable way, such as a phone call connecting two people or two computers connected by a physical link. This is the case in healthcare insurance data, where patients relate to healthcare professionals by making consultation visits, performing exams, among other medical procedures. This relationship metadata may be presented in the form of a graph, where nodes correspond to patients and healthcare professionals (in this case nodes are not all of the same type), and links corresponds to any medical procedure that has involved a given pair of patient and doctor [10, 4]. Once it has been mapped to a graph structure, graph analytics techniques can be applied in order to better understand patterns in the data and extract useful insights for business decision making [9, 7, 3, 6, 5]. The main contribution of this work is the provision of a platform for graph analytics in healthcare insurance claims data, allowing for better visualization, exploration and analysis of the data. Our goal with this platform is to take advantage of the relationship data in a systematic way, exploring graph analytical techniques to better understand doctors, patients and other stakeholders as well as their practices. Our platform has been tested with real claims data from a major Brazilian healthcare insurance company. The data corresponds to more than 18 months of claims transactions nation-wide, totaling more than 2.1 million patients and more than 220.000 doctors and medical professionals. Healthcare insurance data represent a rich source of information and has the potential to contribute significantly in guiding business decision making. In this work we present GraPhys, a Graph Analysis platform designed for exploration, visualization and analysis of healthcare insurance data and its corresponding metadata. By taking advantage of relationships contained in healthcare claims data, we are able to apply Graph Analytics methods and algorithms in order to devise useful business metrics to guide data analysis and exploration. Our tool focuses in better understanding physicians, patients and their practices. We illustrate our approach by demonstrating two use cases where we show how graph analytics metrics, combined with other data, may lead to useful insights not directly available to traditional Business Analytics. Keywords graph mining, health, visual mining 1. INTRODUCTION Healthcare insurance companies generate large amounts of data as a product of their daily operations. These vast amount of data reflects the complexity in their operations, as it demands precise tracking of multiple pieces of information mainly associated to accurately account for costs and expenses of all medical services provided by healthcare professionals to patients. These transactional data are very rich, as it contains spatial temporal coordinates as well as information about the parties involved, plus many more variables, from disease codes to diagnostics in the form of unstructured, free text. The information contained in these data can be critical for decision making on several levels, from making better medical procedures to improving business processes. Indeed, it is easy to realize that such rich source of data may contain clues to better understanding patterns in the behavior 2. RELATIONSHIP EXPLORATION AND VISUALIZATION Health care insurance companies and other medical insurance providers have a wealth of data at their disposal. Operations in this area produces an amount of information, specially transactional, and also other, more persistent, forms of data, such as demographics, healthcare provider location, and other essential data to accurately carry forward all business processes. Copyright is held by the author/owner(s). WWW ’16 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04. http://dx.doi.org/10.1145/2872518.2890544 227 An important piece of transactional data are claims, which for the present work may be regarded as a report from the physician or healthcare provider to the insurance company. A claim informs all the details of a patient’s visit or medical procedure. Even though claims data may vary, it generally contains an ID of the healthcare professional involved in the procedure (it may also be a group of professionals), and ID of the patient who was treated and a timestamp corresponding to the moment the event took place. Further vital information is usually added, such as which types of health services were delivered, and the associated costs owed for the insurance company to process, among others. Claims data may be mapped in graph form by considering the healthcare professional as a node (of a certain type), patients also as nodes (of a different type), and establishing a link whenever there is a claim containing both. The resulting graph is an example of bipartite graph, where there are several types of nodes and there are no links between nodes of the same type. Moreover, additional bits of information in the data may be included in the graph in the form of weights in the links (such as the timestamp, the service provided or the expense amount), or as attributes in the nodes (as the patient demographics information or the medical specialty of the healthcare professional). Both weights and attributes may be represented in a graph layout by mapping their values to colors or sizes in the case of nodes, or colors and widths in the case of links. Up to now, we have described a mapping of claims data into graph form, which is particularly adept to being represented visually for exploration. Additionally, the GraphPhys platform contains additional metrics, computed internally, designed to explore other aspects of interest in the healthcare claims data. These metrics were designed specifically taking into consideration the interconnected nature of the data, exploiting the possibilities of graph analysis. The new metrics can be combined with the original data contained in the dataset. The combination of both sources of information may yield improved analysis and alternative exploration patterns, potentially leading to new insights. We will discuss specific examples in Secs. 4.1.1 and 4.1.2. Figure 1: GraPhys platform architecture. involves, one for links and another one for nodes. Each one of these database has views and search indexes with a description of all methods through which the application gets the data. Application Node.js Server (App.js). This component is responsible for the server description, as well as the authentication configuration parameters to access the Cloudant Database. Queries to the database solicited by the web page are done through this component. Graph Analytics. The Graph Analytics contains the Graph Layout submodule and the Analytics submodule. The Graph Layout module uses the D3 library to render the graph layout. When receiving the web page request with the filters chosen by the user, the data is obtained from the database through a request to the server. With the data received to render the graph layout as a starting point, the Analytics component computes all necessary metrics, statistics and distributions (for instance, degree and other centrality measures), using the D3 library. Visualization. The platform has been implemented as an HTML web page with CSS design. It uses jQuery and Ajax to interact with the user, to make HTTP requests and to update content. The graph is designed using the D3 library, through data obtained by requests to the database, and filtered according to the different user selections. 3. ARCHITECTURE The GraphPhys platform was develop using IBM Bluemix platform (Platform as a Service - PaaS)1 , IBM’s open cloud platform that provides mobile and Web developers access to IBM software for integration, security, transaction, and other key functions, as well as software from business partners. In Figure 1 we can see the different components that cover our platform, including a Data Storage component, an Application, a Graph layout component, a Histogram component, and a Visualization component. Data source. Data are extracted from a designated source file and loaded to a JSON file, structured as nodes and links. The data are saved to the database platform through a javascript application. Data Storage. We chose Cloudant DB 2 [8] as the database platform to store all healthcare claims data, as it is compatible with IBM Softlayer, the cloud platform utilized for the development of the demo. There are two databases 1 2 4. DEMO USE CASES Conference participants will be able to interact directly with the GraphPhys platform. There will be a test dataset containing anonymized real data to explore different visualizations, layers of aggregation as well as filtering options. The test dataset is derived from real healthcare claims data from a major Brazilian healthcare insurance company. The demonstration will contain all the main components of the platform, as described in Sec. 3. As first use case, we showcase two specific graph analytics metrics, retention and reciprocity, which we will describe in Section 4.1.1 and 4.1.2. We describe how these metrics fit in the exploration capabilities of the platform. As a second use case , we describe the possibility of exploring and visualizing, always from a graph perspective, different aggregation alternative in the data, explained in http://bluemix.net/ https://cloudant.com 228 In many cases, doctors and other healthcare professionals interact directly or indirectly with other members of the healthcare system. A doctor may recommend another physician or healthcare provider to initiate or continue treatment, or even refer the patient for treatment with doctors from other specialties are a better fit. This practice can be part of the normal workflow patients undergo and, depending on many factors, may occur with more or less frequency. On the other hand, there could be the case where two or more doctors share patients with frequencies off the typical rates found in similar cases [1]. These cases are interesting for the insurance company to understand further, and check if the atypical relationship is indeed justified or a case that should be rectified. Our platform computes a special dedicated metric, which we call reciprocity metric in order to quantify and detect this kind of situation. The reciprocity metric is built over the concept of mutual degree, i.e. the sum of the number of directed links shared by two given nodes in the two possible ways, back (wij ) and forth (wji ). We penalize this sum by subtracting the absolute value of the difference (|wij − wji |), in order to reveal asymmetries in both directions. Properly normalized, this metric captures pairs of doctors that share an unusual large number of patients, in particular the case where this is true in both directions. Figure 2: Front-end interface of the GraPhys platform. The use case shows a doctor-doctor graph, where nodes are color-coded proportionally to the retention metric. more detail in Sec. 4.2. 4.1 Graph Analytics metrics 4.2 4.1.1 Retention Aggregated Graph Data Exploration Healthcare data, as most types of data, has a layered structure which is essential to be taken into account into any analysis. These layers of information allow for various ways of data aggregation, which in turn leads to useful information and insights. For example, is to consider aggregating data by specialties, providing information about, for instance, all cardiologists, or all physicians that work in a given hospital, and so on. Our graph analytics approach naturally combines graph data structures with any kind of variable aggregation, by transforming the graph appropriately. To illustrate this feature we may start considering claims in a given time window. As described previous, doctors and patients are mapped into nodes (yielding a bipartite graph or network) and links represent claims, which in turn stand for visits, exams, hospitalization and other medical procedures. We could now want to visualize a more summarized version of the same data by aggregating doctors by healthcare provider (hospital, clinic, first aid post, etc.). This is achieved in our platform by selecting the appropriate variable for aggregation, in this case “provider”, which replaces individual doctor nodes by provider nodes, and rewires links appropriately. Each node will thus represent a healthcare provider, also standing for all doctors that work in that provider. To include this information, the tool has the option to resize nodes proportionally to the number of doctors associated to the node. There could be the case of a patient visiting many doctors in a hospital. In this case, links can also account vary their width proportionally to the number of claims. We could now want to explore these data in an even higher level scale, say by specialty, maintaining its graph form. The user selects the appropriate tool function to aggregate both doctors and patients by the “specialty” variable, and repeat the procedure described previous, resulting in another graph. The first example that will be presented is related to the concept of retention, which aims at capturing the ability of a given healthcare professional to retain a patient who visits her with some frequency. We compute the number of links (visits) of a given patient to all visited doctors, normalizing to obtain a relative measure. Combining the relative frequency of visits with the degree (i.e., the absolute number of visits) of each patient, we can filter and choose those patients that for some reason or another have a preferred doctor. Finally, we quantify how many of such patients any doctor has, allowing us to find those physicians that present an above-average number of such patients, which is a strong indication of a higher capacity of retaining patients. This metric may be an indirect proxy for patient loyalty and allows, through the use of the tool and in combination with other data, to pinpoint interesting cases for the user of the platform to a more extended analysis. In Figure 2 we can see our platform illustrating a doctordoctor graph, i.e. all nodes are of the same type. The central layout renders the graph structured data, and nodes connected to each other correspond to doctors with patients in common. The color of the nodes is proportional to the retention variable previous explained, which is selected by the user from the drop-down menu in the top of the interface. In this setting, it is straightforward to see which are the doctors with the highest value of the metric, and how they relate to each other. Further information about the resulting graph can be found at both sides of the central layout, from the total number of nodes and links in the graph, to different histograms and figures that characterize different metrics of interest for the specific subset of doctors. 4.1.2 Reciprocity 229 [2] H. Chen, R. H. Chiang, and V. C. Storey. Business intelligence and analytics: From big data to big impact. MIS quarterly, 36(4):1165–1188, 2012. [3] A. Chmiel, P. Klimek, and S. Thurner. Spreading of diseases through comorbidity networks across life and gender. New Journal of Physics, 16(11):115013, 2014. [4] P. B. Jensen, L. J. Jensen, and S. Brunak. Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13(6):395–405, 2012. [5] P. Klimek, A. Kautzky-Willer, A. Chmiel, I. Schiller-Früwirth, and S. Thurner. Quantifying age-and gender-related diabetes comorbidity risks using nation-wide big claims data. arXiv preprint arXiv:1310.7505, 2013. [6] B. E. Landon, J.-P. Onnela, N. L. Keating, M. L. Barnett, S. Paul, A. J. OâĂŹMalley, T. Keegan, and N. A. Christakis. Using administrative data to identify naturally occurring networks of physicians. Medical care, 51(8):715, 2013. [7] K. D. Mandl, K. L. Olson, D. Mines, C. Liu, and F. Tian. Provider collaboration: cohesion, constellations, and shared patients. Journal of general internal medicine, 29(11):1499–1505, 2014. [8] G. Mone. Beyond hadoop. Communications of the ACM, 56(1):22–24, 2013. [9] S. K. Sauter, L. M. Neuhofer, G. Endel, P. Klimek, and G. Duftschmid. Analyzing healthcare provider centric networks through secondary use of health claims data. In Biomedical and Health Informatics (BHI), 2014 IEEE-EMBS International Conference on, pages 522–525. IEEE, 2014. [10] F. Wang, U. Srinivasan, S. Uddin, and S. Chawla. Application of network analysis on healthcare. In Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, pages 596–603. IEEE, 2014. Figure 3: Front-end interface of the GraPhys platform. The use case shows a doctor-doctor graph aggregated by provider and specialty. Node size and link width are resized as explained in the text. This second use case is described in Figure 3. We show a similar case as before, but doctors have been aggregated both by provider and specialty. Nodes and links have been resized as explained before. The user may also select a node or link, which is highlighted while related information is collected in the side panel, including statistics of the elements contained in the aggregated node or link. 5. CONCLUSION AND FUTURE WORK We have presented our Graph Analytics platform for health insurance claims data. The tool was designed for exploration, visualization and analysis of claims data, leveraging the relationship metadata contained in the original data, by means of graph analytics. This data structure allows to naturally define new metrics based in graph theory algorithms and concepts which may be explored on its own, as well as in combination with other variables in the dataset. The combination of graph analysis and exploration opens up a wide range of applications, for instance, rapidly detecting pairs of doctors that share patients in abnormal levels, or ranking doctors with the highest level of patient loyalty compared to same specialty, among many other examples. GraPhys is evolving quickly in many fronts. One particular extension we are working on is adding new graphtheoretical metrics that address other aspects of the healthcare business. There are additional challenges ahead to achieve scalability for large (tens and hundreds of thousands of nodes) and very large datasets (millions of nodes). Finally, new functionalities from the analytics and reporting capabilities are being considered as further improvements. 6. REFERENCES [1] A. Beutel, L. Akoglu, and C. Faloutsos. Fraud detection through graph-based user behavior modeling. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1696–1697. ACM, 2015. 230

Log In

GraPhys

Related papers

Related papers