Semantics For Big Data Integration

November 2016 - present
52.500 €
Funding organization: 

Italian Ministry of Education, University and Research (PhD scholarship)

Person(s) in charge: 
Executive summary: 

Every day, 2.5 quintillion bytes of data are created, and about 75% of this data is unstructured, coming from sources such as text, voice, and video (“Bringing Big Data to Enterprise”, IBM). In response, many research efforts aim to give this information a structure by integrating and linking data from diverse sources.

A general approach and method must be defined to develop a software architecture that combines the state of the art of open technologies in this field. Such an architecture can enable enterprises, public administrations, and other kinds of institutions to create their own knowledge graphs and improve data-driven processes, transparency, business opportunities, and quality of services.


The expression Big Data refers to datasets that are so large or complex that traditional data processing applications are inadequate. This inadequacy is connected to specific features of the data, the so-called 5 Vs: volume, velocity, value, veracity, and variety. As noted by Pascal Hitzler and Krzysztof Janowicz (“Linked Data, Big Data, and the 4th Paradigm”, Semantic Web – Interoperability, Usability, Applicability), “the Big Data notion of variety seems the most intriguing one for the Semantic Web research community”. Variety can be considered “a generalization of semantic heterogeneity”, for which Linked Data can act as a driver of integration and linking.

For these reasons, experiments on Linked Data and Big Data frameworks can represent a real field of application, not just from a knowledge organization perspective (see also: Shiri, A. (2014). “Linked Data Meets Big Data: A Knowledge Organization Systems Perspective”. Advances In Classification Research Online, 24(1). doi:10.7152/acro.v24i1.14672). The conditions now exist to design and develop an open architecture and a knowledge graph that demonstrate the real potential of combining Big Data and Linked Data approaches.


Last Update: 2017-03-15; Next Expected Update: TODO

This research activity is based on a PhD research program and is driven by two research questions:

  • RQ1: How can the technological foundations of Linked Data and Big Data be combined to create an open software architecture for a multi-thematic and multi-perspective knowledge graph built from heterogeneous sources?
  • RQ2: What are the features of a research method to meet and evaluate the scalability, performance, and interoperability of the software architecture mentioned in RQ1? And how can we measure the quality of the knowledge graph produced with this software architecture?


Currently, the research activity is at an exploratory stage and follows three different directions:

  • Automatically learning the semantics of structured data sources
  • High-level models to semantically describe data derived from multiple heterogeneous sources
  • Large-scale (and cloud-based) data management systems for storing and processing (RDF) information
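
To make the second direction more concrete, the following is a minimal sketch, not the project's actual implementation, of how a semantic description of heterogeneous sources could drive the production of RDF triples. Everything in it is an assumption made for illustration: the `http://example.org/resource/` namespace, the source names `source_a` and `source_b`, the column names, and the helper `rows_to_ntriples`. The schema.org properties are used only as a familiar shared vocabulary. In the first direction listed above, a mapping such as `SEMANTIC_MODELS` would be learned automatically rather than written by hand.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this illustration.
EX = "http://example.org/resource/"
SCHEMA = "http://schema.org/"

# Hypothetical semantic models: for each source, the column that identifies
# an entity and the ontology property that each remaining column maps to.
SEMANTIC_MODELS = {
    "source_a": {
        "id_column": "id",
        "properties": {"name": SCHEMA + "name", "city": SCHEMA + "addressLocality"},
    },
    "source_b": {
        "id_column": "code",
        "properties": {"company": SCHEMA + "name", "town": SCHEMA + "addressLocality"},
    },
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(source_name, csv_text):
    """Turn the rows of a CSV source into N-Triples lines using its semantic model."""
    model = SEMANTIC_MODELS[source_name]
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject URI is built from the identifier column of the source.
        subject = "<" + EX + quote(row[model["id_column"]]) + ">"
        for column, prop in model["properties"].items():
            value = row.get(column)
            if value:
                triples.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return triples


# Two sources with different schemas that partly describe the same entity.
SOURCE_A = "id,name,city\n42,ACME,Turin\n"
SOURCE_B = "code,company,town\n42,ACME,Turin\n7,Beta Ltd,Milan\n"

# A set of triples acts as a tiny graph: identical statements coming from
# different sources collapse into one.
graph = set(rows_to_ntriples("source_a", SOURCE_A)) | set(
    rows_to_ntriples("source_b", SOURCE_B)
)

for triple in sorted(graph):
    print(triple)
```

Because both sources are mapped to the same vocabulary and the same URI scheme, the entity with identifier 42 is described only once in the resulting graph, even though the two sources name their columns differently. A real deployment would hand these triples to a large-scale RDF store, as in the third direction, rather than keep them in memory.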

Related Publications:

Giuseppe Futia, Alessio Melandri, Antonio Vetrò, Federico Morando, and Juan Carlos De Martin
28 May - 1 June 2017
14th European Semantic Web Conference

Giuseppe Futia, Federico Morando, Alessio Melandri, Lorenzo Canova, Francesco Ruggiero
19 November 2016
Third Workshop on Legal Knowledge and the Semantic Web (LK&SW-2016)

Project news can be found on the following channel: GitHub

semantic-for-big-data-integration commits feed

12/04/2017 - 11:01
Add S4BDI logo
15/02/2017 - 16:26
Add source data
15/02/2017 - 16:20
Algorithm skeleton
15/02/2017 - 13:01
Add ontologies