The ESCAPE Science Analysis Platform (SAP) aims to provide a gateway capable of accessing and combining data from multiple collections and stages for onward processing and analysis. Researchers from the European Open Science Cloud (EOSC) will be able to identify and stage existing data collections for analysis and tap into a wide range of software tools, packages and workflows developed by the European Strategy Forum on Research Infrastructures (ESFRIs). Researchers can bring their own customised workflows to the platform and take advantage of both High Performance Computing (HPC) and High Throughput Computing (HTC) infrastructures to access the data and execute the workflows.
ESCAPE SAP will provide a set of functionalities from which various communities and ESFRIs can assemble a science analysis platform geared to their specific needs, rather than attempting to provide a single, integrated platform to which all researchers must adapt. ESCAPE SAP users will have easy access to research data and to the available compute and storage infrastructure, and will be able to seamlessly publish their advanced data products and software/workflows. In addition, they will be able to easily share and query published data and obtain its provenance.
ESCAPE SAP addresses different aspects of data management by providing, in one single place, components that meet different needs:
When developing a service, it is crucial to identify what the future users are looking for. ESCAPE SAP will support its users in searching for data, selecting data, software/workflows and compute resources, processing data, and publishing/sharing data, software, and research objects. To enable this, a single sign-on mechanism, giving seamless access to all integrated services, is required. But how should this be accomplished?
To accurately identify the ESFRIs' needs, ESCAPE organised a use-case requirements workshop and launched a survey to identify the services and components for ESCAPE SAP in terms of data properties, computing system properties, and software properties.
Inputs were collected from ESFRIs spanning astronomical observatories (EGO - Virgo, ESO - Paranal, EST, LOFAR, VLA, CTA), particle colliders (FAIR, HL-LHC) and astroparticle physics instruments (KM3NeT). ESFRIs that correlate data from multiple observatories (JIVE), multiple observations (asteroseismology) and citizen science (Zooniverse, where ESCAPE Citizen Science projects are hosted) also came on board.
From this detailed analysis, it was clear that ESFRIs need ESCAPE SAP to:
With this, ESCAPE was able to define the functionalities of each service component for ESCAPE SAP, listed in the table below.
The ESCAPE SAP architectural design was defined to decouple the development of the different components, allowing functionalities to be added or replaced without compromising the platform's robustness.
The ESCAPE SAP platform will be built following a microservice architectural design: each functionality will be implemented as an independent, interconnected service. Independence means that if one service is removed, the others will still be able to process Application Programming Interface (API) requests, and the platform will continue to work, albeit with reduced functionality. The services are interconnected through the API Gateway and communicate with each other through exposed API endpoints to create, read, process and delete resources.
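The graceful-degradation property described above can be illustrated with a minimal sketch. This is not the actual ESCAPE SAP implementation; the service names and handlers are hypothetical, and real services would communicate over HTTP rather than in-process calls. The point is only that the gateway keeps routing requests to the remaining services when one is removed.

```python
# Minimal sketch of API-gateway routing with graceful degradation.
# Hypothetical service names and handlers, for illustration only.

class ApiGateway:
    """Routes requests to registered services by name."""

    def __init__(self):
        self.services = {}  # service name -> handler callable

    def register(self, name, handler):
        self.services[name] = handler

    def unregister(self, name):
        # Removing a service must not affect the others.
        self.services.pop(name, None)

    def request(self, name, action, resource):
        handler = self.services.get(name)
        if handler is None:
            # Missing service: reduced functionality, not total failure.
            return {"status": 503, "error": f"service '{name}' unavailable"}
        return {"status": 200, "result": handler(action, resource)}


# Two independent example services exposing create/read-style operations.
def data_service(action, resource):
    return f"data-service handled: {action} {resource}"


def compute_service(action, resource):
    return f"compute-service handled: {action} {resource}"


gateway = ApiGateway()
gateway.register("data", data_service)
gateway.register("compute", compute_service)

print(gateway.request("data", "read", "catalogue/123")["status"])   # 200
gateway.unregister("compute")
print(gateway.request("compute", "create", "job/7")["status"])      # 503
print(gateway.request("data", "read", "catalogue/123")["status"])   # 200, still served
```

The sketch shows the design choice behind the architecture: because clients only ever talk to the gateway, an individual service can be replaced or withdrawn without the rest of the platform noticing anything beyond the loss of that one capability.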
More information about the development of ESCAPE SAP is available in “D5.1 Preliminary Report on Requirements for ESFRI Science Analysis Use-cases” and “D5.2 Detailed Project Plan”.
An initial science platform prototype of ESCAPE SAP, with discovery and data-sharing features, is targeted for July 2020, with a first set of ESFRI software deployed on this prototype by September 2020. A workshop in November of the same year will test the performance of this prototype.
Data volumes are increasing rapidly, making it ever more difficult for users to process, analyse and visualise data. ESCAPE SAP will provide the infrastructure to support discovery, processing and analysis of data from research infrastructures in a transparent and user-friendly way.