The ESCAPE Data Infrastructure for Open Science (DIOS) team has integrated the first 10 storage endpoints, contributed by most of the ESCAPE partners, into its datalake platform. This allows ESCAPE to run its first working pilot and to begin ambitious data transfer and management tests.
The ESCAPE DIOS is a federated infrastructure for open-access data that follows the FAIR data management principles. It enables large national research data centres to work together to build a datalake that is flexible in terms of data storage, security, safety and transfer, and that can curate data and scale up to their multi-exabyte needs.

 

A datalake with different storage technologies in a common service layer for all ESFRIs

After defining the datalake architecture based on the data management needs of the ESFRIs (European Strategy Forum on Research Infrastructures) involved in ESCAPE, the project completed the integration of 10 storage endpoints into the ESCAPE DIOS. The endpoints come from most of the ESCAPE partners (INFN-CNAF, INFN-ROMA, INFN-Napoli, DESY, SURF-SARA, IN2P3-CC, CERN, IFAE-PIC, CNRS-LAPP and GSI) and have been populated with moderate volumes of real data from several experiments: LOFAR, LSST, ATLAS and CMS.
To meet different needs, the ESCAPE DIOS harnesses a variety of storage technologies through the orchestration layer: dCache, DPM, XRootD, EOS and StoRM. These cover distributed and federated storage systems as well as traditional, local installations. For the datalake working pilot, the orchestration layer has been consolidated and a dedicated RUCIO instance for data management has been created. RUCIO is open-source software with a proven track record in managing billions of files across over 100 data centres and is the key data management element for the ESCAPE DIOS.
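The idea of one orchestration layer placing data across endpoints that run different storage technologies can be illustrated with a toy model. The sketch below is not the RUCIO API: the endpoint names (SITE-A and so on), their free capacities, the pairing of sites with technologies and the `place_replicas` helper are all hypothetical, chosen only to show how a single placement request can be resolved against heterogeneous storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A hypothetical storage endpoint registered in the datalake."""
    name: str
    technology: str  # e.g. "dCache", "EOS", "StoRM"
    free_tb: float   # free capacity in terabytes (illustrative values)


def place_replicas(endpoints, copies, size_tb, technology=None):
    """Choose `copies` distinct endpoints that can hold a dataset of `size_tb`.

    Endpoints with the most free space are preferred; ties are broken by name.
    If `technology` is given, only endpoints running it are considered.
    """
    candidates = [
        e for e in endpoints
        if e.free_tb >= size_tb
        and (technology is None or e.technology == technology)
    ]
    if len(candidates) < copies:
        raise ValueError("not enough suitable endpoints for the request")
    candidates.sort(key=lambda e: (-e.free_tb, e.name))
    return [e.name for e in candidates[:copies]]


# Hypothetical endpoints; they do not describe the real ESCAPE sites.
ENDPOINTS = [
    Endpoint("SITE-A", "dCache", 50.0),
    Endpoint("SITE-B", "EOS", 120.0),
    Endpoint("SITE-C", "StoRM", 10.0),
    Endpoint("SITE-D", "dCache", 80.0),
]

# Two copies of a 20 TB dataset, any technology.
print(place_replicas(ENDPOINTS, copies=2, size_tb=20.0))
# One copy, restricted to dCache endpoints.
print(place_replicas(ENDPOINTS, copies=1, size_tb=20.0, technology="dCache"))
```

The caller states what it wants (number of copies, optional constraints) and the layer decides where the data goes, which is the property that lets experiments ignore the technology behind each endpoint.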
As the datalake grows, the ESCAPE DIOS needs to foster knowledge transfer to sites and experiments. This is done mostly through live monitoring of data transfers, data volumes and numbers of files transferred, and network status, together with a site reporting tool.

Figure 1 - Snapshot of one of our ESCAPE DIOS dashboards (still under development).
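As a rough illustration of the kind of figures such a dashboard reports, the sketch below aggregates transfer events into per-site file counts, data volumes and failure counts. The event format, the site names and the `summarise_transfers` helper are assumptions made for this example and do not reflect the actual ESCAPE monitoring stack.

```python
from collections import defaultdict


def summarise_transfers(events):
    """Aggregate transfer events per destination site.

    Each event is a dict with the hypothetical keys:
    "dst" (destination site), "bytes" (file size) and "ok" (success flag).
    Only successful transfers count towards files and bytes.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "failed": 0})
    for event in events:
        site = summary[event["dst"]]
        if event["ok"]:
            site["files"] += 1
            site["bytes"] += event["bytes"]
        else:
            site["failed"] += 1
    return dict(summary)


# Hypothetical sample events, not real ESCAPE transfer records.
EVENTS = [
    {"dst": "SITE-A", "bytes": 2_000_000_000, "ok": True},
    {"dst": "SITE-A", "bytes": 1_000_000_000, "ok": True},
    {"dst": "SITE-B", "bytes": 500_000_000, "ok": False},
    {"dst": "SITE-B", "bytes": 500_000_000, "ok": True},
]

for site, stats in sorted(summarise_transfers(EVENTS).items()):
    print(site, stats)
```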

Regarding data, content delivery and caching, the prototyping phase is well on track: three sites have deployed a proof of concept to evaluate the XCache technology that will be used to integrate computing resources with the ESCAPE DIOS. So far, real data processing from and to the datalake has been demonstrated while addressing the ESFRIs’ requirements. Furthermore, a centralised information system has been set up, as well as the ESCAPE Authentication, Authorization and Identity Management (AAI).


Figure 2 - Caching infrastructure and the two main benefits of operating caches: latency hiding and file reusability.
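The file reusability benefit can be made concrete with a small simulation. The sketch below is a toy model, not XCache itself: the `FileCache` class, its capacity counted in files and the least-recently-used eviction policy are assumptions chosen for illustration. It counts how many reads are served locally rather than fetched from the datalake; latency hiding is not modelled.

```python
from collections import OrderedDict


class FileCache:
    """A toy file cache with least-recently-used eviction (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of cached files
        self.files = OrderedDict()  # file name -> True, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, name):
        """Return where the file was served from: "cache" or "datalake"."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return "cache"
        self.misses += 1
        self.files[name] = True
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict least recently used file
        return "datalake"


cache = FileCache(capacity=2)
sources = [cache.read(name) for name in ["a", "b", "a", "a", "c", "a"]]
print(sources)
print("hits:", cache.hits, "misses:", cache.misses)  # hits: 3 misses: 3
```

In this access pattern, file "a" is fetched from the datalake once and then served from the cache three times, which is the reuse effect the figure refers to.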

 

What lies ahead?

The next challenges for the ESCAPE DIOS are to test and understand the capabilities of the pilot, as well as the suitability of this technology for the different use cases and data storage paradigms that the experiments in ESCAPE require. In the coming months, work on the ESCAPE DIOS workflow and pipeline implementation to test data access will progress. The implementation of storage Quality of Service across the ESCAPE DIOS is also in place and data lifecycles are being implemented.

 

READ THE REPORT “MILESTONE 8: INITIAL PILOT DATALAKE WITH AT LEAST 3 CORE DATA CENTRES” TO KNOW MORE  
 

 
