Hybrid Data Infrastructure

A Hybrid Data Infrastructure (HDI) is a new type of data infrastructure specifically conceived to deal with data-intensive science (see also e-Science). In this domain, datasets (potentially large-scale ones) come in all forms and shapes, ranging from huge international experiments to cross-laboratory and single-laboratory studies, or even a multitude of individual observations. Managing and processing such datasets is beyond the capacity of traditional technological approaches based on local, specialized data facilities. Such data are characterized by the well-known "three Vs": (i) Volume: the amount of data, measured in bytes, is huge; (ii) Velocity: data collection, processing and consumption must happen at demanding speeds; and (iii) Variety: the data are highly heterogeneous in terms of the data types and data sources that must be integrated.


A Hybrid Data Infrastructure is based on the assumption that several technologies, including Grid computing and private and public clouds, can be integrated to provide elastic access to, and use of, data and data-management capabilities. It must also be equipped with a rich array of mediator services for interfacing with existing data sources and repositories.
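The role of the mediator services can be illustrated with a minimal sketch of the mediator pattern. It assumes two hypothetical existing sources, one serving comma-separated "key,value" lines and one serving dictionaries with its own field names, and wraps each in an adapter that converts its native format into a common Record type. All class and field names (Record, DataSource, CsvRepository, JsonLikeArchive, Mediator) are illustrative assumptions, not part of any actual HDI implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical uniform record type exposed by the mediator."""
    source: str
    key: str
    value: float


class DataSource(Protocol):
    """Hypothetical interface every wrapped source must satisfy."""
    name: str

    def fetch(self) -> Iterable[Record]: ...


class CsvRepository:
    """Adapter for a repository serving comma-separated 'key,value' lines."""

    def __init__(self, name: str, lines: List[str]) -> None:
        self.name = name
        self._lines = lines

    def fetch(self) -> Iterable[Record]:
        for line in self._lines:
            key, value = line.split(",")
            yield Record(self.name, key.strip(), float(value))


class JsonLikeArchive:
    """Adapter for an archive serving dicts with its own field names."""

    def __init__(self, name: str, docs: List[Dict[str, object]]) -> None:
        self.name = name
        self._docs = docs

    def fetch(self) -> Iterable[Record]:
        for doc in self._docs:
            # Map the archive's native fields onto the common Record type.
            yield Record(self.name, str(doc["id"]), float(doc["measurement"]))


class Mediator:
    """Offers a single query interface over heterogeneous sources."""

    def __init__(self, sources: List[DataSource]) -> None:
        self._sources = sources

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        results: List[Record] = []
        for source in self._sources:
            results.extend(r for r in source.fetch() if predicate(r))
        return results


mediator = Mediator([
    CsvRepository("lab-a", ["t1, 3.5", "t2, 7.0"]),
    JsonLikeArchive("obs-b", [{"id": "t3", "measurement": 9}]),
])
high = mediator.query(lambda r: r.value > 5)
print([(r.source, r.key) for r in high])  # [('lab-a', 't2'), ('obs-b', 't3')]
```

The design choice here is that consumers never see a source's native format: adding a new repository only requires a new adapter, while queries against the mediator stay unchanged.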


Overall, its goal is to enable a delivery model for data-management capabilities in which computing, storage, data and software are made available by the infrastructure as a service. It may also include a service supporting the dynamic creation of Virtual Research Environments (VREs), which can be seen conceptually as applications tailored to a specific need whose constituents are acquired from the HDI.
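The idea of a VRE as an application assembled from constituents acquired from the HDI can be sketched as follows. This is a minimal illustration under stated assumptions: the infrastructure is modeled as a hypothetical catalog of named offerings, each of one of the four kinds the text lists (computing, storage, data, software), and a VRE is created on demand by selecting offerings from that catalog. The names Offering, HybridDataInfrastructure and create_vre, and the example offerings, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    """The four kinds of capability an HDI offers as a service."""
    COMPUTING = "computing"
    STORAGE = "storage"
    DATA = "data"
    SOFTWARE = "software"


@dataclass(frozen=True)
class Offering:
    name: str
    kind: Kind


@dataclass
class VirtualResearchEnvironment:
    purpose: str
    constituents: List[Offering] = field(default_factory=list)


class HybridDataInfrastructure:
    """Hypothetical catalog of offerings from which VREs are assembled."""

    def __init__(self, offerings: List[Offering]) -> None:
        self._by_name: Dict[str, Offering] = {o.name: o for o in offerings}

    def create_vre(self, purpose: str, wanted: List[str]) -> VirtualResearchEnvironment:
        # Refuse to build a VRE from constituents the infrastructure does not offer.
        missing = [n for n in wanted if n not in self._by_name]
        if missing:
            raise LookupError(f"not offered by the infrastructure: {missing}")
        return VirtualResearchEnvironment(purpose, [self._by_name[n] for n in wanted])


hdi = HybridDataInfrastructure([
    Offering("cloud-vm-pool", Kind.COMPUTING),
    Offering("object-store", Kind.STORAGE),
    Offering("ocean-observations", Kind.DATA),
    Offering("stats-toolkit", Kind.SOFTWARE),
])
vre = hdi.create_vre("sea-temperature study",
                     ["ocean-observations", "stats-toolkit", "cloud-vm-pool"])
print(sorted(c.kind.value for c in vre.constituents))  # ['computing', 'data', 'software']
```

In this sketch a VRE owns no resources of its own; it is only a purpose plus references to offerings already provided by the infrastructure.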

