Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15



THE 4th EDITION OF BIG DATA IN Oct 2015 WAS A RESOUNDING SUCCESS.

EUCLID & BIG DATA FROM DARK SPACE

Thursday 15th

from 18:30 to 19:15

Room 25

Technical

Euclid is a high-precision survey mission developed within the framework of ESA's Cosmic Vision Program to study dark energy and dark matter. Its Science Ground Segment (SGS) will have to handle around 175 PB of data coming from the Euclid satellite, complex pipeline processing, external ground-based observations and simulations, and will produce an output catalog describing around 10 billion objects with hundreds of attributes each. The implementation of the SGS is therefore a real challenge in terms of architecture and organization. This talk describes the challenges of the Euclid project, the foreseen architecture, the ongoing proof-of-concept challenges and the plan for the future.

GROUND SEGMENT ARCHITECTURE

The Euclid SGS development is therefore a real challenge in terms of architecture design (storage, network, processing infrastructure) and of organization. On the architecture side, 9 Euclid Science Data Centers (SDCs) will have to be federated, with data storage and processing distributed optimally among them and with sufficient network interconnection and bandwidth. On the organization side, more than 14 countries will be involved in the project, and hundreds of people who are not necessarily co-located will have to work together on scientific, engineering or IT aspects.

In particular, the reference architecture currently proposed for the SGS will be based on:

  • A single metadata repository that inventories, indexes and locates the huge amount of distributed data: the Euclid Archive System (EAS),
  • A Monitoring & Control (M&C) Service for monitoring the status of the SGS as a whole or at SDC level,
  • A COmmon ORchestration System (COORS) managing the distribution of the storage and the processing among the different SDCs (ensuring the best compromise between data availability and data transfers): "move the process not the data",
  • A Distributed Storage System (DSS) providing a unified view of the SDCs distributed storage and managing the data transfers between SDCs,
  • A set of services that allows loose coupling between SGS components, e.g. metadata query and access, data localization and transfer, data storage, data processing orchestration and M&C,
  • An Infrastructure Abstraction Layer (IAL) allowing the data processing software to run on any SDC independently of the underlying IT infrastructure, and simplifying the development of the processing software itself. It exposes generic interfaces to the processing software and isolates it from the “plumbing” (e.g. it gathers input data and publishes output data on behalf of the processing software).
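
The "move the process not the data" principle can be illustrated with a minimal sketch. It is not the actual COORS or EAS design: the `DataProduct` record, the SDC names, the file sizes and the `choose_sdc` function are all hypothetical, and the only criterion assumed here is the volume of data that would have to be transferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """One entry of a hypothetical metadata inventory (the role played by the EAS)."""
    name: str
    size_tb: float
    location: str  # name of the SDC that currently stores the product


def transfer_cost_tb(inputs, candidate_sdc):
    """Volume in TB that would have to be moved if the job ran at candidate_sdc."""
    return sum(p.size_tb for p in inputs if p.location != candidate_sdc)


def choose_sdc(inputs, sdcs):
    """Pick the SDC that minimizes the data to transfer (the role played by COORS)."""
    return min(sdcs, key=lambda sdc: transfer_cost_tb(inputs, sdc))


# Illustrative values only: names and sizes are assumptions, not Euclid figures.
inventory = [
    DataProduct("vis_frames", 40.0, "SDC-A"),
    DataProduct("nir_frames", 25.0, "SDC-A"),
    DataProduct("ground_obs", 10.0, "SDC-B"),
]
sdcs = ["SDC-A", "SDC-B", "SDC-C"]

best = choose_sdc(inventory, sdcs)
print(best, transfer_cost_tb(inventory, best))  # SDC-A 10.0
```

In this toy case the job is sent to the SDC that already holds most of the input volume, so only the smaller ground-based product has to travel. A real orchestrator would presumably also weigh CPU availability, storage quotas and network bandwidth.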

This architecture concept has already been validated through "SGS Challenges", which made it possible, in particular, to distribute and execute the first simulation prototypes on any of the SDCs thanks to IAL and EAS prototypes. The first outcomes will be presented. This challenge-based approach allows working prototypes to be deployed at early stages and is a great source of motivation for the teams spread across different laboratories and computing centers around Europe.
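
The role of the IAL in such a prototype can be sketched as follows. This is only an illustration of the abstraction idea, not the Euclid interface: the `Infrastructure` class, the `InMemorySite` back end and the `run_task` function are hypothetical names, and a dictionary stands in for a real storage system.

```python
from abc import ABC, abstractmethod


class Infrastructure(ABC):
    """Hypothetical site-specific back end hidden behind the abstraction layer."""

    @abstractmethod
    def fetch(self, name: str) -> bytes:
        """Return the content of an input product."""

    @abstractmethod
    def publish(self, name: str, payload: bytes) -> None:
        """Store an output product."""


class InMemorySite(Infrastructure):
    """Stand-in for one SDC; a real back end would talk to the local storage."""

    def __init__(self, files):
        self.files = dict(files)

    def fetch(self, name):
        return self.files[name]

    def publish(self, name, payload):
        self.files[name] = payload


def run_task(site, task, input_names, output_name):
    """Gather inputs, run the processing function, publish its output."""
    inputs = [site.fetch(n) for n in input_names]
    result = task(inputs)
    site.publish(output_name, result)
    return result


def concatenate(inputs):
    """Toy processing step: it only sees bytes, never the infrastructure."""
    return b"".join(inputs)


site = InMemorySite({"a.dat": b"abc", "b.dat": b"def"})
run_task(site, concatenate, ["a.dat", "b.dat"], "out.dat")
print(site.files["out.dat"])  # b'abcdef'
```

The processing function never touches the storage layer, so the same code could in principle run at another site by swapping in a different `Infrastructure` subclass.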

VIRTUALIZATION

Another potential source of complexity is that most of the SDCs rely on existing computing centers, which usually do not share the same infrastructure and operating system. Rather than having to set up, test and maintain different targets for the Euclid software, the choice has been made to rely on virtualization, so that the same guest operating system and the same Euclid software distribution can be deployed on any of the 10 Euclid SDCs, whatever the host operating system and infrastructure.

This virtual processing node image, also called “EuclidVM”, will greatly simplify both the development of the Euclid processing software and its deployment. We are currently studying the CernVM ecosystem (µCernVM, CernVM-FS, elastic virtual clusters based on OpenStack) with the support of CERN, which developed it. This technology seems relevant for the Euclid EC SGS and could be applied with few adaptations, thus avoiding having to “reinvent the wheel”.

Guillermo Buenadicha

ESA, Euclid Science Operations Center System Engineer