
    Patrick Fuhrmann

    In early 2020, the EOSC Community took another crucial step on the road to the development and implementation of the European Open Science Cloud, as seven key EOSC-related Horizon 2020 projects signed a Collaboration Agreement in support of the EOSC Governance. The Agreement, which involves all the projects supported within the INFRAEOSC-05-2018-2019 call, provides a useful framework for all parties to collaborate on a wide range of topics, in order to enhance synergies in all mutual activities related to the EOSC. The projects also agreed on a Joint Activity Plan, which will guide them towards the first iteration of EOSC. Overlaps and complementarities among projects were identified, as well as specific areas for potential cooperation, ultimately aimed at the development of a common strategy to synchronise activities with the EOSC Working Groups. Between April and May 2020, EOSCsecretariat.eu collected the position papers on EOSC compiled by the INFRAEOSC 5b projects, the su...
    Distributed dCache instances become increasingly attractive for geographically distributed installations. One of the aspects of such deployments is securing the inter-component communication. We will present how to run dCache in a distributed manner over wide area networks.
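    The abstract does not spell out the security mechanism; as a minimal, generic sketch of securing inter-component traffic over a wide area network, the Python snippet below builds a TLS context that verifies the remote peer and presents its own certificate (mutual TLS). The host name and certificate paths are placeholders, not dCache configuration.

        import socket
        import ssl

        # Placeholder paths and endpoint; not actual dCache settings.
        CA_BUNDLE = "/etc/grid-security/ca-bundle.pem"
        HOST_CERT = "/etc/grid-security/hostcert.pem"
        HOST_KEY  = "/etc/grid-security/hostkey.pem"
        REMOTE    = ("pool.remote-site.example.org", 11111)

        # Client-side context: verify the remote component against the CA bundle
        # and authenticate ourselves with our own certificate and key.
        ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=CA_BUNDLE)
        ctx.load_cert_chain(certfile=HOST_CERT, keyfile=HOST_KEY)

        with socket.create_connection(REMOTE, timeout=10) as raw:
            with ctx.wrap_socket(raw, server_hostname=REMOTE[0]) as tls:
                print("connected using", tls.version())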
    The eXtreme DataCloud (XDC) project is aimed at developing data management services capable of coping with very large data resources, enabling future e-infrastructures to address the needs of the next generation of extreme-scale scientific experiments. Started in November 2017, XDC combines the expertise of 8 large European research organisations. The project aims at developing scalable technologies for federating storage resources and managing data in highly distributed computing environments. The project is use-case driven with a multidisciplinary approach, addressing requirements from research communities belonging to a wide range of scientific domains: Life Science, Biodiversity, Clinical Research, Astrophysics, High Energy Physics and Photon Science, which together represent a good indicator of data management needs in Europe and worldwide. The use cases proposed by the different user communities are addressed by integrating different data management services ready to manage an incre...
    The ability to share information is crucial to scientific progress, yet it is not easy to manage and share the vast amounts of data generated at Photon and Neutron Research Infrastructures (PaN RIs). We spoke to Professor Dr Patrick Fuhrmann and Dr Sophie Servan about the work of the ExPaNDS project in creating a framework for effective data management and supporting open science.
    This document summarises work in ExPaNDS Data Analysis Services work package (WP4) for the preparation and publication of Photon and Neutron reference data sets to be used to adapt, align and validate the project's Prototype Data Analysis Services.
    International Symposium on Grids and Clouds 2016, ISGC 2016, Taipei, Taiwan, 13 Mar 2016 - 18 Mar 2016; (2016).
    As we demonstrated during the Barcelona workshop, dCache has a new RESTful interface, providing direct access to the Chimera namespace. This time we will present a set of new API calls to move files from/to a tape backend.
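    The exact endpoints are not given here; purely as an illustrative sketch, the following uses Python's requests library to call a hypothetical REST endpoint that asks the storage system to recall a file from the tape backend. The base URL, path, payload fields and token are assumptions for illustration, not the documented dCache API.

        import requests

        # Placeholder values; adjust to the actual REST interface and credentials.
        BASE_URL  = "https://dcache.example.org:3880/api/v1"
        FILE_PATH = "/data/experiment/run042/raw.dat"
        TOKEN     = "change-me"   # e.g. a bearer token or macaroon

        # Hypothetical call requesting that the file be brought back from tape.
        resp = requests.post(
            f"{BASE_URL}/namespace{FILE_PATH}",
            json={"action": "qos", "target": "disk+tape"},
            headers={"Authorization": f"Bearer {TOKEN}"},
            verify="/etc/grid-security/ca-bundle.pem",
        )
        resp.raise_for_status()
        print(resp.status_code, resp.json())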
    Over 140 participants, including IT professionals, researchers and managers from the photon and neutron communities, and representatives of the EOSC landscape, attended the 1st online Photon and Neutron EOSC Symposium, jointly organised by PaNOSC and ExPaNDS on 9 November 2020. This booklet contains all the presentations; the recordings are also available online.
    This document presents the progress of the ExPaNDS project after its first 18 months of activities, spanning from September 2019 to February 2021. It reproduces the explanation of the work carried out by the ExPaNDS partners as provided to the European Commission in the first periodic report of the project.
    Documentation is an essential component of any software product. Unfortunately, it always lags behind the actual development and quickly becomes out of date. To fix that issue, we decided to switch to an easy-to-use toolchain that allows internal and external contributors to update and publish the 'dCache Admin Book'. We'll present the tool and its workflow.
    Starting with version 3.0, dCache supports high availability deployments. This allows central services to be scaled horizontally and provides fault tolerance for central services as well as zero-downtime staged upgrades. Such a setup typically involves a redundant setup of ZooKeeper and PostgreSQL in addition to dCache, and typically relies on a redundant load balancer such as HAProxy.
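    As a small sketch of the coordination layer such a deployment relies on, the snippet below uses the kazoo client to confirm that a redundant ZooKeeper ensemble is reachable. The host names are placeholders and the check is generic, not dCache-specific.

        from kazoo.client import KazooClient

        # Placeholder addresses of a three-node ZooKeeper ensemble.
        ZK_HOSTS = "zk1.example.org:2181,zk2.example.org:2181,zk3.example.org:2181"

        zk = KazooClient(hosts=ZK_HOSTS)
        zk.start(timeout=10)            # raises if no ensemble member answers in time
        try:
            print("connected, root znodes:", zk.get_children("/"))
        finally:
            zk.stop()
            zk.close()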
    One of the ways to increase data resiliency in dCache is to keep multiple copies of precious files. In dCache this feature used to be provided by the Replica Manager. Based on a large set of new requirements from our customers, it became necessary to redesign and improve this component, now called the Resilient Manager. We will demonstrate new and exciting features of this subsystem and present a typical migration path from its predecessor.
    The anticipated increase in storage requirements for the forthcoming HL-LHC data rates is not matched by a corresponding increase in budget. This results in a shortfall in available resources if the computing models remain unchanged. Therefore, effort is being invested in looking for new and innovative ways to optimise the current infrastructure, so minimising the impact of this shortfall. In this paper, we describe an R&D effort targeting "Quality of Service" (QoS), as a working group within the WLCG DOMA activity. The QoS approach aims to reduce the impact of the shortfall and involves developing a mechanism that allows sites to reduce the cost of their storage hardware, with a corresponding increase in storage capacity, while also supporting innovative deployments with radically reduced cost or improved performance. We describe the strategy this group is developing to support these innovations, along with the current status and plans for the future.
    DESY manages not only one of the largest Tier-2 sites with about 18 500 CPU cores for Grid workloads but also about 8000 CPU cores for interactive user analyses. In this presentation, we recapitulate the consolidation of the batch systems in a common HTCondor-based setup and the lessons learned, as both use cases differ in their goals. We then give an outlook on future developments. While for Grid jobs startup latencies are negligible and the primary focus is on an optimal utilization of the resources, users of the National Analysis Factory for interactive analyses prefer a high responsiveness of the batch system as well as the storage. In the ongoing evolution of the batch system, we are exploring two different approaches to abstract the batch node's host OS from the actual job OS. For Grid jobs we are running legacy workloads in lightweight Singularity containers deployed via CVMFS. For interactive NAF jobs, we move towards Docker containe...
    The eXtreme-DataCloud (XDC) is an EU H2020 funded project aimed at developing scalable technologies for federating storage resources and managing data in highly distributed computing environments. The XDC software stack is based on existing tools of proven technical maturity, which the project enriches with new functionalities and plugins to address real-life requirements from user communities belonging to a variety of scientific domains: Life Science, Astrophysics, High Energy Physics, Photon Science and Clinical Research. The XDC toolbox includes services like ONEDATA (AGH), dCache (DESY), EOS (CERN), the INDIGO-PaaS Orchestrator (INFN) and DYNAFED (CERN). The targeted platforms are the current and next generation e-Infrastructures deployed in Europe, such as the European Open Science Cloud (EOSC), the European Grid Infrastructure (EGI), the Worldwide LHC Computing Grid (WLCG) and the computing infrastructures funded by other public and academic initiatives. The focus of the project is on policy-driven data management, on orchestration based on Quality of Service and on pre-processing of ingested data with user-defined applications. Smart caching and storage federation technologies for the creation of the so-called “DataLakes” are also important topics for XDC. In this manuscript, we present the recent advancements in the project services architecture and the developments carried out in preparation of the first XDC release, planned for the end of 2018.
    The Sustainability Policy report describes the mechanisms and policies the ExPaNDS collaboration is envisioning to make the achievements of their activities available after the end of the project funding period.
    Within the DOMA working group, the QoS activity is looking at how best to describe innovative technologies and deployments. One scenario that has emerged is providing storage that uses end-of-warranty disks: the cheap (almost free) nature of this storage is offset by a much larger likelihood of data loss. In some situations, this trade-off is acceptable, provided the operational overhead of handling this data loss is not excessive. In this paper, we present a model where dCache provides access to this data. Improvements within the dCache administrative interface allow for almost no operational overhead when handling such storage. The storage events concept allows experiment data-management frameworks, such as Rucio, to learn of any data loss in a robust and fully automated fashion. These frameworks can then follow strategies to recover from these problems; for example, by copying the lost data back into dCache.
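    As one hedged illustration of how a data-management framework could consume such storage events (assuming they are published as JSON messages to a Kafka topic; the broker address, topic name and message fields are assumptions, not the dCache event schema):

        import json
        from kafka import KafkaConsumer   # kafka-python

        # Broker, topic and field names are illustrative assumptions only.
        consumer = KafkaConsumer(
            "storage-events",
            bootstrap_servers="broker.example.org:9092",
            group_id="data-loss-handler",
            value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        )

        for message in consumer:
            event = message.value
            if event.get("type") == "data-loss":
                # A framework such as Rucio could react here, e.g. by scheduling
                # a transfer that copies the lost replica back into dCache.
                print("lost replica reported for:", event.get("path"))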
    The ExPaNDS project aims at deploying data catalogues and data analysis services into EOSC. This document describes the proposed architecture for these services and how they are planned to integrate into EOSC. The first part elaborates on our plan to abstract the data and analysis services behind common APIs in order to make them available to portals and processing pipelines without knowing where the actual facility data, analysis stacks and ICT infrastructure reside. Then, as this is one of ExPaNDS' high-level objectives, the document describes the relationship between services provided by ExPaNDS RIs and the EOSC, as well as their interdependencies.
    • Currently data lives on islands of storage: catalogues are the maps, FTS/gridFTP are the delivery companies, experiment frameworks populate the islands, and jobs are directed to places where the needed data is (or should be).
    • Almost all data lives on more than one island. This relies on the assumptions of perfect storage (unlikely to impossible) and perfect experiment workflows and catalogues (unlikely).
    • Strict locality has limitations: a single missing file can derail a whole job or series of jobs; failover to data on another island could help.
    • Replica catalogues impose limitations too: synchronization is difficult, and so is performance.
    • Hence the quest for direct, Web-like forms of data access. A great plus: other use cases may be fulfilled as well, e.g. site caching or sharing storage amongst sites.
    • Storage federations: the goal is to make different storage clusters be seen as one and to make global file-based data access seamless.
    • How should this be done? Dynamically: easy to set up and maintain, no complex metadata persistency, no DB babysitting (keep it for the experiment's metadata), and no replica catalogue inconsistencies, by design; light configuration constraints on participating storage; using standards, so there are no strange APIs, everything looks familiar, and there is global direct access to global data.
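    The dynamic-federation idea sketched above can be illustrated in a few lines of Python: several independent storage endpoints are probed in parallel and the union of the answers forms the federated view, with no replica catalogue involved. The endpoint URLs and the probed path are hypothetical.

        import concurrent.futures
        import requests

        # Hypothetical endpoints of three independent storage clusters ("islands").
        ENDPOINTS = [
            "https://storage.site-a.example.org/data",
            "https://storage.site-b.example.org/data",
            "https://storage.site-c.example.org/data",
        ]

        def locate(path):
            """Ask every endpoint whether it holds `path`; no central replica catalogue."""
            def probe(base):
                url = base + path
                try:
                    if requests.head(url, timeout=5).status_code == 200:
                        return url
                except requests.RequestException:
                    pass        # an unreachable island simply drops out of the view
                return None

            with concurrent.futures.ThreadPoolExecutor() as pool:
                return [u for u in pool.map(probe, ENDPOINTS) if u]

        # The federation presents one logical namespace: any returned URL is directly readable.
        print(locate("/lhc/run3/file-001.root"))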
    In 2007, the most challenging high energy physics experiment ever, the LHC at CERN, will go online and will produce a sustained stream of data in the order of the content of one ordinary CD every two seconds, which would result in a stack of CDs as high as the Eiffel Tower once per week. For various political and technical reasons, this data is, while being produced, distributed to and stored persistently on tape at several dozen sites around the world, forming the LHC data grid. Those sites are expected to provide the necessary middleware, abstractly called a Storage Element, speaking the agreed protocols to receive the data and store it in the site-specific Hierarchical Storage Systems. One implementation of a Storage Element, installed at the majority of sites, is dCache/SRM. The dCache/SRM system has been designed and implemented in close collaboration between the Deutsches Elektronen-Synchrotron in Hamburg and the Fermi National Accelerator Laboratory near Chicago. Besi...
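    The comparison can be checked with simple arithmetic, assuming roughly 700 MB and 1.2 mm per CD and an Eiffel Tower height of about 330 m (all approximate assumptions):

        # Back-of-the-envelope check of the "Eiffel Tower of CDs per week" comparison.
        CD_CAPACITY_MB   = 700        # assumed capacity of one ordinary CD
        CD_THICKNESS_MM  = 1.2        # assumed thickness of one disc
        EIFFEL_TOWER_M   = 330        # approximate height of the Eiffel Tower
        SECONDS_PER_WEEK = 7 * 24 * 3600

        cds_per_week   = SECONDS_PER_WEEK / 2                    # one CD every two seconds
        stack_height_m = cds_per_week * CD_THICKNESS_MM / 1000.0
        rate_mb_per_s  = CD_CAPACITY_MB / 2

        print(f"{cds_per_week:.0f} CDs per week, a stack of about {stack_height_m:.0f} m "
              f"(Eiffel Tower: about {EIFFEL_TOWER_M} m), sustained rate about {rate_mb_per_s:.0f} MB/s")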
    Within the Federated Identity Management work package of DSIT we analysed the requirements of our users regarding federated authentication and authorization components. Based on these components an integrative architecture was developed. Several pilots have been implemented to demonstrate the feasibility and general usefulness of the proposed framework. The LSDMA AAI includes bridging between SAML, OIDC and X.509 infrastructures as well as support for console access for traditionally web-oriented protocols like SAML and OIDC.
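    Console access for web-oriented protocols is mentioned without implementation detail; one standard way to obtain an OIDC token from a terminal is the OAuth 2.0 device authorization grant (RFC 8628). The sketch below assumes a hypothetical issuer and client; it is not the LSDMA AAI itself.

        import time
        import requests

        # Hypothetical issuer endpoints and client identifier (placeholders).
        DEVICE_ENDPOINT = "https://aai.example.org/oauth/device"
        TOKEN_ENDPOINT  = "https://aai.example.org/oauth/token"
        CLIENT_ID       = "console-client"

        # Step 1: request a device code and tell the user where to authenticate.
        dev = requests.post(DEVICE_ENDPOINT,
                            data={"client_id": CLIENT_ID, "scope": "openid"}).json()
        print(f"Visit {dev['verification_uri']} and enter the code {dev['user_code']}")

        # Step 2: poll the token endpoint until the user has approved in a browser.
        while True:
            time.sleep(dev.get("interval", 5))
            tok = requests.post(TOKEN_ENDPOINT, data={
                "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
                "device_code": dev["device_code"],
                "client_id": CLIENT_ID,
            }).json()
            if "access_token" in tok:
                print("Token obtained; it can now be used from the console or bridged to X.509.")
                break
            if tok.get("error") not in ("authorization_pending", "slow_down"):
                raise RuntimeError(tok)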
    After the successful implementation and deployment of the dCache system over the last years, one of the additionally required services, the namespace service, has faced additional and completely new requirements. Most of them are caused by the scaling of the system, the integration with Grid services and the need for redundant (high availability) configurations. The existing system, having only an NFSv2 access path, is easy to understand and well accepted by the users. This single access path limits data management tasks to the use of classical tools like 'find', 'ls' and others. This is intuitive for most users, but fails when dealing with millions of entries (files) and more sophisticated organisational schemes (metadata). The new system should support a native programmable interface (deeply coupled, yet fast), a 'classical' NFS path (now at least version 3), a dCache-native access path and an SQL path allowing any type of metadata to be used in complex ...
    DESY is one of the world's leading centres for research with particle accelerators and synchrotron light. At the hadron-electron collider HERA, three experiments are currently taking data and will be operated until 2007. Since the end of August 2004, the DESY Production Grid has been operated on the basis of the recent LCG-2 release. Its Grid infrastructure is used for all DESY Grid activities, including national and international Grid projects. The HERA experiments are adapting their Monte Carlo production schemes to the Grid.
    The LHC needs to achieve reliable high-performance access to vastly distributed storage resources across the network. USCMS has worked with Fermilab-CD and DESY-IT on a storage service that was deployed at several sites. It provides Grid access to heterogeneous mass storage systems and synchronization between them. It increases resiliency by insulating clients from storage and network failures, and facilitates file sharing and network traffic shaping. This new storage service is implemented as a Grid Storage Element (SE). It consists of dCache, jointly developed by DESY and Fermilab, as the core storage system and an implementation of the Storage Resource Manager (SRM), which together allow both local and Grid-based access to the mass storage facilities. It provides advanced methods for accessing and distributing collaboration data. USCMS is using this system both as Disk Resource Manager at the Tier-1 center and at multiple Tier-2 sites, and as Hierarchical Resource Manager with Enstore as tape...
    This document describes the various architectures of the three middlewares that comprise the EMI software stack. It also outlines the common efforts in the security area that allow interoperability between these middlewares. The assessment of the EMI Security presented in this document was performed internally by members of the Security Area of the EMI project.
    The high-level objective of the data area within LSDMA is the provisioning of tools for conveniently storing, federating, accessing and sharing huge quantities of data. The resulting toolbox mainly targets scientific communities that are not willing or not able to develop their entire data management framework themselves. The selection of services and products within that toolbox is based on their usage of Open Standards and their availability on the Open Source market. Even more importantly, significant focus has been put on the evaluation of the potential self-sustainability of components, due to an active user community or due to the commitment of the product teams to further maintain their products, independently of LSDMA funding. Besides integrating well-established and sustainable data management components, LSDMA evaluated gaps in existing data management procedures and, in response, either established working groups in international scientific organizations, like RDA or j...
