Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Portal-based Support for Mental Health Research David Paul1 , Frans Henskens1 Patrick Johnston2 and Michael Hannaford 1 1 School of Electrical Engineering & Computer Science, The University of Newcastle, N.S.W. 2308, Australia 2 Centre for Mental Health Studies, The University of Newcastle, N.S.W. 2308, Australia Abstract. This paper describes experiences with the use of the Globus toolkit and related technologies for development of a secure portal that allows nationally-distributed Australian researchers to share data and application programs. The portal allows researchers to access infrastructure that will be used to enhance understanding of the causes of schizophrenia and advance its treatment, and aims to provide access to a resource that can expand into the world’s largest on-line collaborative mental health research facility. Since access to patient data is controlled by local ethics approvals, the portal must transparently both provide and deny access to patient data in accordance with the fine-grained access permissions afforded individual researchers. Interestingly, the access protocols are able to provide researchers with hints about currently inaccessible data that may be of interest to them, providing them the impetus to gain further access permissions. 1 Introduction Schizophrenia is a brain disease that affects approximately 0.6-1.5% of the population, with an incidence of 18 - 20 cases per 100,000 per year [9]. Although prevalence is low, the burden of the illness upon society and upon sufferers and their families is extremely high. The World Health Organisation, for example, rates schizophrenia amongst the ten leading causes of disease burden. The disorder involves severe cognitive, affective and perceptual dysfunctions, which, at an overt behavioural level, manifest themselves in terms of delusional beliefs and disorganised behaviours; perceptual disturbances including, particularly, auditory hallucinations; and lack of motivation, and general decline in personal and social functioning. Consequently, it is a disease associated with very high costs to government (AUD35,000 per patient per year) [1] and extremes of social impoverishment and economic disadvantage [10]. Recent scientific advances have le d to a model of schizophrenia that recognises the role of abnormal neuro-developmental and/or neurodegenerative processes in altering the structure and function of the brain. Until relatively recently detailed images of cerebral morphology could only be obtained from postmortem tissue. The limitations of the traditional tissue -based approach to neuropathology can potentially be overcome through the use of neuroimaging technologies. Neuroimaging techniques offer the potential for in vivo studies of brain structure as well as func tion, thus overcoming problems relating to tissue degeneration postmortem, invariably small samples of post-mortem brains and, of course, the obvious fact that the tissue is derived from deceased persons. Moreover, techniques such as magnetic resonance imaging (MRI) allow for repeated testing of the same individuals, and thus longitudinal studies may be undertaken. A further advantage of MRI is that it may be employed to produce high-resolution threedimensional digital representations of brain structure. This approach lends itself more easily to sharing and distribution of the primary source data (i.e. digital images) among research teams than does traditional approaches in neuropathology (i.e. where the brain tissue itself is the primary source data). It also supports the This research is supported by The Australian Research Council (ARC) grant SR0566756 (2005-2006). On-going work is supported by the National Health & Medical Research Council (NHMRC) grant AIP/ERP #1679 (2006-2010), and by a grant from the Pratt Foundation (2007-2011). 1 application of computational image processing techniques for the precise definition, localisation and measurement of brain structures. brain activity such as functional MRI (fMRI) or event-related potentials (ERPs). A further example of the significant impact of data-access-enabling infrastructure on research was the National Institute for Schizophrenia and Allied Disorders (NISAD) [15] Schizophrenia Research Register. It was intended that the Virtual Brain Bank would act as the foundation to which could later be added putative endophenotype measurements derived from the Schizophrenia Register participants and other neurocognitive studies of schizophrenia as well as genetic information derived from the DNA Bank and the Laboratory of Neuro Imaging (LONI) [13]. Such integrative strategies that combine various methodological approaches have been shown to considerably further the understanding of the pathology of schizophrenia. The recently established Australian Schizophrenia Research Bank (ASRB) builds on and extends the ideas of such previous facilities to create a nationally accessible resource for schizophrenia researchers in Australia and beyond. The heritability of schizophrenia is of the order of 70-80%. However, the inheritance pattern is not the cla ssical Mendelian type. As with other complex diseases (eg, diabetes, cardiovascular disease), it is believed to involve a number of contributing genes, each of small effect, interacting with each other and with environmental factors. With this in mind, traditional genetic research approaches based on the diagnostic category of schizophrenia need to be modified if we are to further our understanding of the genetic basis of this disease. A more recent approach in schizophrenia research has been to investigate discrete neurobiological or neurocognitive characteristics that may be more closely linked to a particular gene [8, 12] rather than the clinical syndrome diagnosed as schizophrenia. These characteristics, known as endophenotypes, can assist researchers in unravelling the complex genetic causality of schizophrenia and help to identify individuals who carry the genetic trait for these discrete deficits [20]. In this paper we describe and discuss issues in the use of primarily Globus -based [4] technology to build a grid [3] that allows geographically distributed researchers to contribute to initially the NISAD/LONI Virtual Brain Bank, and now the encompassing ASRB’s collection of schizophrenia -related data and software resources in the quest for knowledge on the reasons for and treatment of schizophrenia. The NISAD/LONI Virtual Brain Bank [14] primarily consists of a large distributed database of high resolution 3D computer representations of the brains of approximately 250 schizophrenia patients and age/gender-matched healthy control subjects, derived from structural MRI images and transformed into a standardised spatial coordinate system. The purpose of this bank is to provide a resource for the analysis of subtle structural variations between the brains of schizophrenia patients and healthy controls, and to map brain changes that occur as a result of variables such as age, gender, duration of illness and duration of untreated psychosis. The brain bank also provides the opportunity to explore associations between brain structure and clinical or neurocognitive measures, gene expression or genetic linkage data, and functional measures of 2 The ASRB Grid A major issue for schizophrenia research is the expense of the collection of patient data (e.g. MRI brain scans, tissue samples) needed for analysis. The ASRB will have a major impact on schizophrenia research in Australia because it will amortise the high cost and the significant time involved in obtaining data across the 2 national body of researchers. As schizophrenia is likely to involve multiple genes of small effect, access to large sample sizes is a key to undertaking studies of sufficient statistical power. With its cross-referenced data in clinical, cognitive, neuroanatomical and genetic domains, the ASRB will make a huge contribution to schizophrenia research on a national scale, enabling multiple research questions to be addressed relatively easily in a large sample that would otherwise be inaccessible or prohibitively expensive for independent investigators to acquire. This large data set will be formed by merging existing data held by groups around the country, and supplementing it with data obtained by a concerted recruitment and collection process. potentially patients. beneficial treatments for those As the ASRB Grid contains personal patient information, security is of vital importance. Typical Grids require strong security to determine whether a user should have access to a given system, or set of systems, without the need for any fine-grained security; a user is either allowed to access the system, or they are not. The ASRB Grid is different because users have different access rights to the resources provided by the Grid, even those on an individual component system. Further, a researcher should be able to perform a preliminary query on data for which they are not currently authorised, allowing them to identify data of interest as a pre-cursor to a request for access to it. For example, it should be possible for the researcher to search for scans exhibiting particular features to determine if there are sufficient samples to justify their requesting access to them. If there were insufficient data items that match their query, it would be a waste of time and resources to request access to the data. If, on the other hand, it was found that there was a sufficiently large extant data set (albeit currently unavailable to the individual researcher), it is likely that a request for access to that existing data would be significantly easier (and less expensive) to achieve than collection of new data. Notwithstanding, it is essential that certain aspects of the data, especially information that can identify patients, be inaccessible to any user who has not been given specific rights to access it. Ethics approvals are necessarily associated with the collection of data and samples from live patients. Such ethics approvals typically specify the project for which data is to be used, and limit the group of researchers who can access the data to, for example, those at a particular institution, or in a particular research group. It is also common that most researchers permitted to use and analyse patient data are prevented from being able to identify patients from their data (i.e. the data is de-identified). The extant data collections currently held at the disparate Australian member sites are all subject to existing ethics approvals. Access to the new patient data for which collection has been funded by the NHMRC, is similarly controlle d. Thus, a major and important aim of the ASRB Grid is to provide controlled access to the data available to each particular user of the Grid. The most obvious need is to allow all authorized users to access the newly collected data, but it is also important to allow access to any other data collections for which the user has approval, either through their institution, research group, or personally. A further consideration is that it should be possible for selected personnel to identify patients from the ir data in the circumstance that analysis has discovered Once the researcher has the data needed for their experiment, they typically would execute computer programs to analyse this data. At present this can involve manually collecting the data into a compressed archive, sending it to, for example, Los Angeles via FTP, and waiting for the results to be returned. At the remote processing site, a user must extract the data, 3 schedule it for analysis, collect the results and then return them to the initiating researcher. Other less compute-intensive tasks can be controlled by a single user, though these still require manual scheduling on computers in Australia, which can be time consuming, increasing the time needed by the researcher to do their job. It is intended that by utilizing compute servers in the ASRB Grid, this handson approach to computer-based analysis can be reduced, with researchers simply submitting the job to the Grid, after which the Grid automatically schedules and runs the job, collects the results, and returns them to the researcher, with no further human interaction required. after a request has been made the service can later be queried to obtain updated information about the task. As data access is a very important part of the ASRB Grid, two important components of the Globus Toolkit for this project are GridFTP [5] and OGSA-DAI (Open Grid Services Architecture Data Access and Integration) [17]. GridFTP is an extension to regular FTP that supports using Globus credentials for authorization and authentication. It has been extended in Globus Toolkit 4 with the Reliable File Transfer service, which is a Web service for managing secure third-party GridFTP transfers. OGSA-DAI is middleware designed to give secure access to data stores such as relational databases, as well as to integrate data from different sources via the Grid. It allows the access of relational databases using the WSRF, giving the ability to securely access them via Web services. A final and important requirement of the ASRB Grid is that it should be easy to use, and provide reasonable performance and feedback. If the user interface to the new infrastructure is too complex, or if the performance is pedestrian, users will prefer to continue using the familiar old methods, with all their problems. Thus, use of the new system must be as intuitive as possible, and should hide or abstract over all unnecessary complexity. This means that sensible defaults should be chosen for all options, and a consistent interface should be provided to enable the researchers to concentrate on their research rather than being caught up dealing with the vagaries of the computer support. It was decided that a Web portal should be used to access the Grid systems, as this will eliminate the need for researchers to install special software on their machines, providing flexibility with respect to client location and host computer. The portal framework chosen is Gridsphere [16], with GridPortlets [19] used to access the Grid. Gridsphere is an open-source portal framework completely compliant with the JSR 168 specifications, so that any standardscompliant portlet can be used by Gridsphere. GridPortlets are a set of portlets for Gridsphere that allow access to Grid resource and user credential management, as well as GridFTP operations, and many other useful Grid activities. The GT4Portlets extension to this allows the execution of jobs on remote Globus Toolkit 4 systems, and further enhances GridPortlet’s compatibility with the newest version of Globus. 3 Support for Fine-grained Security To make the ASRB Grid as accessible as possible, it was decided at an early stage that Web services should be used wherever possible . It was also a preference of the Australian Research Council that the Globus Toolkit 4 [4] be used. Thus Globus was chosen as the software to provide the grid framework. Version 4 of the Globus toolkit is mainly built on the Web Service Resource Framework (WSRF), which allows Web services to have state , so that 4 In order to supply users with credentials to access ASRB Grid resources, a SimpleCA certificate authority is being established. To further facilitate the researcher’s use of the system, PURSe portlets [2] are used to eliminate the user’s need to knowingly interact with this system. Using these portlets, a user fills in a Web-based form to request an account. The user is then sent an email to verify their request and an administrator is informed of the request. The administrator can accept or reject the user, and has the capability to provide the user with access to an account on the Grid; ultimately the user is informed by email of the result. Provided the user is accepted, appropriate Grid credentials are automatically created for the user and a proxy certificate stored for them in a MyProxy server. The user can then log in to the Web portal, using a password supplied by them in their initial request, and a proxy certificate is automatically retrieved from the MyProxy server. This proxy certificate will then be available for access by the portlets in the Web portal. The portlets use these credentials to authenticate with any Grid resources in a manner that is completely transparent to the user. is needed, and the complexities of these relationships can best be handled by the users themselves. GridFTP is the only component of the Globus Toolkit that supports CAS out of the box, though OGSA-DAI can be extended to support CAS with very little impact on performance [18]. Much of the Globus Toolkit is currently accessible only through the use of command-line statements. Technologies such as the CoG Kits [22] and GridPortlets make access to Globus Grids much easier, but the CAS technologies that we have chosen to use have really only been usable from the command line. Thus, one of the first things needed by this project is portlets for accessing CAS. A portlet that allows authorized users to manage CAS entities has been created. With this facility users with the correct CAS permissions are able to view, create, and delete CAS entities, such as groups or service actions. In addition the portlet provides the ability to grant and revoke rights to groups and services. CAS will thus also be used by administrators to grant access to various database tables, through OGSA-DAI. Since identified patient data will be stored on the ASRB Grid, it is vitally important that researchers are restricted to access only that data for which they are approved (resultant from ethics approval, or otherwise). As a result it is required that users be given different levels of access to resources based on both their own identity, and the groups to which they belong. The Globus Toolkit includes a component that can be used for this purpose: the Community Authorization Service (CAS) [6] (which is not to be confused with JA-SIG’s Central Authentication Service [11]). CAS allows resource providers to give course-grained access to various systems, handing finer-grained access control management to the community of users. This is important for the ASRB Grid because there are very complex levels of access for different data resources, so fine-grained control 4 Future Work Development of the ASRB Grid is very much an on-going project, and there are a number of parallel development tasks in progress, as described in the following sub-sections. 4.1 Description of Patient Data The above security framework is designed to provide tightly controlled access to resources such as data and computation. To date much of the extant patient data has not been available online; rather the data are stored on CDs or DVDs in researcher’s offices, and these must be moved to on-line storage subsystems so they can be 5 accessed using the Grid. A further issue is the existence of aggressive firewalls that have been used to protect confidentiality of patient data at some of the host sites. The recently-funded collection of substantial quantities of new patient data has not yet begun but is imminent, so provision of infrastructure for storage and processing of that new data is a priority. In parallel it is necessary to finalise meta-data description of the heterogeneous extant (and the homogeneous future) data that will be accessible through the ASRB Grid. Until this significant task is completed no specific tools development can take place. 4.3 Abstraction over Distributed File Storage A service that allows users to create logical folders, providing a window onto data on all the different GridFTP servers to which they have access, will also be integrated into the system. Thus, users will simply see a familiar folder-like structure containing sub-folders and files. This is achieved using the Globus Replica Location Service (RLS) [7] and a system to map a set of logical files to a set of logical folders; the actual files in the folders may be stored on any of the GridFTP servers available to the user of the Grid. The various locations of the data available to the user will be abstracted away by this service, allowing users to simply see their data without regard for the location at which it is stored. 4.2 Extension of Portlet Support To date, there has be en a paucity of reported development of portlets to access OGSA-DAI resources, especially for OGSA-DAI secured by CAS. While some OGSA-DAI portlets have been developed, they currently do not provide the level of support for security required by this implementation, and so must be extended to provide the necessary security. It will also be necessary to create or modify some GridFTP portlets to include CAS functionality so that researchers are able to easily share their data with groups to which they wish to provide such access. It is also planned to create a new PURSe Portlets registration module to automatically enrol users in various CAS groups when their account is created. This will include placing them in a group over which they have complete control, as well as giving them exclusive access to space on a GridFTP server. Users will then be able to create their own selfcontrolled groups, allowing them to share their data with authorised users while asserting as much fine-grained control as is necessary. There would be no requirement for administrator intervention in the establishment and control of such groups. 4.4 Access to Data Processing and Analysis Facilities The ultimate aim of the ASRB Grid infrastructure is to provide researchers with the ability to analyse (subsets of) the data collection, leading to advances in the understanding and treatment of schizophrenia. While it will be possible (subject to access rights) for researchers to download data to their own machines to perform analysis, there will be tasks which will benefit from access to the parallel resources of the Grid. For example, the data associated with a single MRI scan can exceed one gigabyte, and transfer of such quantities of data across the Internet is expensive with respect to time (noting that some of the member sites are up to 4,000 kilometres apart). Analysis of such data is more efficiently performed by positioning the computation close to the data source, with high bandwidth data path(s) joining them. Unfortunately, automatically executing a task on a set of remote machines is difficult. Projects such as GT4Portlets allow the execution of jobs on a single remote machine, and projects such as the Gridbus Broker [21] automatically 6 allocate tasks to servers, but the interfaces to these are very general. Thus a further task for this project is to create a portlet wizard that allows the easy creation of a portlet to execute a particular application. It is envisaged that these portlets will be based on the Gridbus Broker, but will enable researchers to choose input files and set parameters using a simple, easily understandable Web form. The provision of a wizard will make it easy for developers to create portlets for many different programs. If a specific program has special needs, however, developers will still have access to the full source code so that the portlet can be modified as needed. This will enable researchers and developers to use the processing capabilities of the distributed compute servers much more easily than is currently possible. and remotely stored data; intuitive wizard-based access to distributed compute servers and application programs; the ability for users to provide individuals and/or groups with controlled access to their personal data store. 6 References 1. Carr, V., Lewin, T., Neil, A., Halpin, S., and Holmes, S., Premorbid, psychosocial and clinical predictors of the costs of schizophrenia and other psychoses. British Journal of Psychiatry, 2004. 184: p. 517-525. 2. Christie, M., PURSe Portlets Website, http://www.extreme.indiana.edu/portals/purse-portlets. 3. Foster, I. and Kesselman, C., The Grid: Blueprint for a New Computing Infrastructure. 1999: Morgan Kaufmann. 4. Foster, I. Globus Toolkit Version 4: Software for Service-Oriented Systems. in IFIP International Conference on Network and Parallel Computing. 2005: SpringerVerlag. 5. Globus, GT 4.0 GridFTP, http://www.globus.org/toolkit/docs/4.0/data/gridftp/. 6. Globus, GT 4.0: Security, http://www.globus.org/toolkit/docs/4.0/security/. 7. Globus. RLS: Replica Location Service, http://www.globus.org/rls/. 8. Gottesman, I.I., McGuffin, P., and Farmer, A.E., Clinical genetics as clues to the real genetics of schizophrenia (a decade of modes t gains whilst playing for time). Schizophrenia Bulletin, 1987. 13(1): p. 23-47. 9. Gureje, O. and Bamidele, R.W., Gender and schizophrenia: association of age at onset with antecedent, clinical and outcome features. Australia and New Zealand Journal of Psychiatry, 1998. 32(3): p. 415-423. 10. Jablensky, A., Epidemiology of schizophrenia: the global burden of disease and disability. European Archives of Psy chiatry and Clinical Neuroscience, 2000. 250(6): p. 274-285. 11. JA-SIG, JA-SIG Central Authentication Service, http://www.ja-sig.org/products/cas. 12. Kremen, W.S., Faraone, S.V., and Seidman, L.J., Neuropsychological risk indicators for schizophrenia: a preliminary study of female relatives of schizophrenic and bipolar probands. Psychiatric Research, 1998. 79(3): p. 227-240. 13. LONI, Laboratory of Neuro Imaging, http://www.loni.ucla.edu/. 5 Conclusion This paper introduces a project that uses the Globus toolkit and related technologies that allows Australian Mental Health researchers to share data and application programs in their quest for understanding of schizophrenia and ultimately improvements in its treatment. A web services portal that provides fine-grained control over user access to resources is described. This portal simultaneously provides simple authentication-based access for users and certificate-based access to sub-sets of the entire resource collection. Users are unaware of host network boundaries and the need for separate authentication for the disparate sites and servers; these requirements are abstracted away by the portal. The ASRB Grid is very much a work in progress. On-going development of abstractions over distributed data storage, remote compute services and portal development are also presented. These facilities will result in: nested folders that provide consistent access to locally 7 14. NISAD, The NISAD/LONI Virtual Brain Bank , http://www.nisad.org.au/newsEvents/resNews/wwwscz res.asp. 15. NISAD, http://www.nisad.org.au/. 16. Novotny, J., Russell, M., and Wehrens, O., Gridsphere: A Portal Framework for Building Collaborations, Gridsphere Project Website. 17. OGSA -DAI, OGSA-DAI Software, http://www.ogsadai.org.uk/index.php. 18. Pereira, A., Muppavarapu, V., and Chung, C., RoleBased Access Control for Grid Database Services Using the Community Authorization Service. IEEE Trans. on Dependable and Secure Computing, 2006. 3(2): p. 156-166. 19. Russell, M., Novotny, J., and Wehrens, O., The Grid Portlets Web Application: A Grid Portal Framework, Gridsphere Project Website. 20. Trillenberg, P., Lencer, R., and Heide, W., Eye Movements and psychiatric disease. Current Opinion in Neurology, 2004. 17(1): p. 43-47. 21. Venugopal, S., Buyya, R., and Winton, L., A Grid Service Broker for Scheduling e-Science Applications on Global Data Grids. Concurrency and Computation: Practice and Experience, (accepted Jan 2005). 22. von Laszewski, G., Gawor, J., Lane, P., Rehn, N., Russell, M., and Jackson, K., Features of the Java Commodity Grid Kit. Concurrency and Computation: Practice and Experience, 2002. 14: p. 1045-1055. 8