The DIDA Platform aims to provide a series of interoperable open-source tools (from the FIWARE and Apache communities) that can be adopted to lower barriers during the development and deployment phases. Starting from the bottom, the Smart Field and External Systems block is a physical layer that hosts, for instance, industrial devices, machines, actuators, sensors, wearable devices, and robots, as well as the interfaces used to collect data and communicate with IIoT systems. The External Systems block is composed of all the IT systems that support industrial processes. Custom interfaces and system wrappers are a crucial part of this component; they share data using smart data models to represent information.
Data Ingestion provides a bridge between the physical layer and the data brokering layer: data from the devices are shared with the broker in a standardized structure, making the information available to the tools that will analyse it.
- IDAS Agents: The IoT Agent component makes it possible to connect objects in order to gather data from them or interact with them, as in typical IoT use case scenarios. It is needed when connecting IoT devices or gateways to FIWARE-based ecosystems. IoT Agents translate IoT-specific protocols into the NGSI context information protocol, which is the FIWARE standard data exchange model. The IoT Agent for OPC UA, the IoT Agent for JSON, and the IoT Agent for Ultralight are some of the IDAS Agents in the FIWARE Catalogue.
- Custom Agents: Any other custom agent can be developed on the basis of the same standard and features as the IDAS Agents.
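The protocol translation performed by an IoT Agent can be sketched as follows. This is only an illustration, assuming an Ultralight 2.0 style payload of `key|value` pairs separated by `|` and an NGSI-LD style target entity; the function names, the attribute mapping, and the device identifier are hypothetical and are not taken from the IDAS code base.

```python
from typing import Any, Dict

# Hypothetical mapping from short Ultralight keys to context attribute names.
ATTRIBUTE_MAP = {"t": "temperature", "h": "humidity"}


def parse_value(raw: str) -> Any:
    """Convert a raw measurement to a number when possible, else keep the text."""
    try:
        return float(raw)
    except ValueError:
        return raw


def ultralight_to_ngsi_ld(device_id: str, entity_type: str, payload: str) -> Dict[str, Any]:
    """Translate a payload such as 't|21.5|h|40' into an NGSI-LD style entity."""
    fields = payload.split("|")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating key|value fields")
    entity: Dict[str, Any] = {
        "id": f"urn:ngsi-ld:{entity_type}:{device_id}",
        "type": entity_type,
    }
    # Pair every key (even positions) with the value that follows it.
    for key, raw in zip(fields[0::2], fields[1::2]):
        name = ATTRIBUTE_MAP.get(key, key)
        entity[name] = {"type": "Property", "value": parse_value(raw)}
    return entity


if __name__ == "__main__":
    print(ultralight_to_ngsi_ld("sensor001", "Device", "t|21.5|h|40"))
```

A custom agent for another protocol would keep the same output shape and replace only the parsing step.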
The Data Brokering sublayer is in charge of feeding the persistence and processing phase; its main actors are the Orion Context Broker and Apache Kafka. The FIWARE Orion Context Broker is an implementation of the Publish/Subscribe Broker Generic Enabler, able to manage the entire lifecycle of context information including updates, queries, registrations, and subscriptions. It is based on an NGSI-LD server implementation to manage context information and its availability. This GE makes it possible to create context elements and manage them through updates and queries, and to subscribe to context information so as to receive a notification when a condition is satisfied, for example when the context changes. Apache Kafka is a widely used event streaming platform able to publish (write) and subscribe to (read) streams of events, including continuous import/export of data from other systems; to store streams of events durably and reliably for as long as needed; and to process streams of events as they occur or retrospectively.
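The subscription mechanism of the context broker can be illustrated with a minimal sketch that builds an NGSI-LD style subscription body. It assumes the subscription would be sent with a POST to the broker's `/ngsi-ld/v1/subscriptions` resource, with the `@context` supplied separately; the entity type, attribute name, condition, and notification endpoint are hypothetical values chosen for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def build_subscription(
    entity_type: str,
    watched: List[str],
    endpoint: str,
    query: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a subscription that notifies `endpoint` when watched attributes change."""
    subscription: Dict[str, Any] = {
        "type": "Subscription",
        "entities": [{"type": entity_type}],
        "watchedAttributes": watched,
        "notification": {
            "attributes": watched,
            "format": "normalized",
            "endpoint": {"uri": endpoint, "accept": "application/json"},
        },
    }
    if query is not None:
        # Optional condition, e.g. notify only when the temperature exceeds 30.
        subscription["q"] = query
    return subscription


if __name__ == "__main__":
    body = build_subscription(
        entity_type="Machine",                      # hypothetical entity type
        watched=["temperature"],                    # hypothetical attribute
        endpoint="http://consumer.example/notify",  # hypothetical consumer URL
        query="temperature>30",
    )
    print(json.dumps(body, indent=2))
```

Connectors that persist context data typically rely on a subscription of this kind to receive the updates they store.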
Data Persistence and Processing
The core part of the platform stores the collected data and processes them. The main FIWARE and Apache open-source components are listed below.
- Cygnus: is a connector that persists context data sources into third-party databases and storage systems, creating a historical view of the context. It is based on Apache Flume, a distributed service for collecting, aggregating, and moving large volumes of streaming data through configurable sources, channels, and sinks.
- Quantum Leap: is a Generic Enabler focused on persisting historical context data into time-series databases such as CrateDB, with the aim of maintaining a scalable architecture and compatibility with visualization tools such as Grafana.
- Draco: is a connector used to persist context data sources into other third-party databases and storage systems, creating a historical view of the context. Based on Apache NiFi, a popular framework for data management and processing from multiple sources, it connects the Orion Context Broker to a wide range of external systems such as MySQL and MongoDB. Draco can also be used to filter context data and repost them back into Orion.
- Cosmos: is a FIWARE Generic Enabler for big data analysis. It is composed of a set of tools (Orion-Flink Connector, Orion-Spark Connector, Apache Flink Processing Engine, Apache Spark Processing Engine, Streaming processing examples using Orion Context Broker) that support stream and batch processing of context data.
- Apache Livy: is a service that enables easy interaction with a Spark cluster over a REST interface. It supports the submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers.
- Apache Spark: is an open-source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads. It is part of a larger set of tools for the analytics community, including Apache Hadoop and other open-source resources, and can therefore be regarded as a cluster computing tool for data analytics. It can be used with the Hadoop Distributed File System (HDFS), the Hadoop component that provides distributed storage for large files.
- Apache StreamPipes: is a self-service (Industrial) IoT toolbox that enables non-technical users to connect, analyze, and explore IoT data streams. StreamPipes has an exchangeable runtime execution layer and executes pipelines using one of the provided wrappers, e.g. standalone or distributed in Apache Flink. Pipeline elements in StreamPipes can be installed at runtime, and the built-in SDK makes it easy to implement new pipeline elements according to specific needs. Pipeline elements are standalone microservices that can run anywhere: centrally on a server, in a large-scale cluster, or close to the edge.
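The REST interaction described for Apache Livy in the list above can be sketched with the Python standard library. The example only builds the HTTP requests and does not send them; it assumes a Livy server on its default port 8998 on a local host, and the session ID, code snippet, and application file path are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumption: local Livy server, default port


def livy_request(path: str, body: Dict[str, Any]) -> Request:
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return Request(
        url=f"{LIVY_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind: str = "pyspark") -> Request:
    """Request an interactive session of the given kind."""
    return livy_request("/sessions", {"kind": kind})


def submit_statement(session_id: int, code: str) -> Request:
    """Submit a snippet of code to an existing interactive session."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


def submit_batch(file: str, args: List[str]) -> Request:
    """Submit a packaged Spark application as a batch job."""
    return livy_request("/batches", {"file": file, "args": args})


if __name__ == "__main__":
    for req in (
        create_session(),
        submit_statement(0, "spark.range(100).count()"),       # hypothetical snippet
        submit_batch("hdfs:///jobs/analysis.py", ["2024-01"]),  # hypothetical path
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing one of these objects to `urllib.request.urlopen` would send it; the server replies with a JSON document describing the session, statement, or batch, which the client can then poll for results.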
Once the data have been stored and processed, the results of the analysis need to be visualized. Data visualization plays an important role because, in this phase, it is fundamental to have a clear idea of what the information means; giving it visual context through maps or graphs makes the data easier for the human mind to comprehend. The platforms listed here are powerful tools that do this in a simple way and are also compatible with the most common data sources.
- WireCloud: offers a platform aimed at allowing end users without programming skills to easily create web applications and dashboards/cockpits. The purpose is to integrate heterogeneous data, application logic, and UI components (widgets) sourced from the Web to create new coherent and value-adding composite applications.
- Grafana: is a web application for analytics and interactive visualization. It provides charts, graphs, and alerts for the web when connected to supported data sources (MySQL, PostgreSQL, …). It is expandable through a plug-in system. End users can create complex monitoring dashboards using interactive query builders.
- Knowage: offers a complete set of tools for analytics, with particular attention to data visualization for the most common data sources and big data. It has many modules (Big Data, Smart Intelligence, Enterprise Reporting, Location Intelligence, Performance Management, Predictive Analysis) to fit the needs of its users.
- Apache Superset: is a data exploration and visualization platform designed to be visual, intuitive, and interactive. It allows users to analyse data using its SQL editor and to easily create charts and dashboards.
Smart Data Spaces and Applications
The final goal is the development of smart data applications. The Smart Data Spaces and Applications layer contains the system and user applications for presenting and consuming data. BI & Analytics, AR/VR, Chatbots & Virtual Assistants, Self-service Visualization, and Generic Cognitive Applications are the main application fields, and they drive the requirements for development.
Persistence is a key component of a successful platform: it is needed to store all the relevant data and to make them accessible from outside when required.
- HDFS: the Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
- Postgres: is a powerful, open-source object-relational database system. Its main features include transactions with Atomicity, Consistency, Isolation, Durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures.
- MongoDB: is a distributed, document-based, general-purpose database for modern applications and the cloud. It stores documents in a JSON-like format and supports arrays and nested objects. Its query language lets the user filter data on any key in a document, while also offering features associated with relational databases, such as ACID transactions and joins in queries.
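The atomicity property mentioned for transactions in the list above can be illustrated with a short sketch. To keep the example self-contained it uses SQLite from the Python standard library as a stand-in for Postgres; the table and column names are hypothetical, and a real deployment would use a Postgres driver instead.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

Row = Tuple[str, Optional[float]]


def store_batch(conn: sqlite3.Connection, rows: Iterable[Row]) -> bool:
    """Insert all rows in one transaction; return False if the batch was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO measurement (sensor_id, value) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_rows(conn: sqlite3.Connection) -> int:
    """Return the number of stored measurements."""
    return conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]


def new_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical measurement table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement (sensor_id TEXT NOT NULL, value REAL NOT NULL)"
    )
    return conn


if __name__ == "__main__":
    db = new_database()
    print(store_batch(db, [("s1", 20.5), ("s2", 21.0)]), count_rows(db))
    # The second row violates NOT NULL, so the whole batch is discarded.
    print(store_batch(db, [("s3", 19.0), ("s4", None)]), count_rows(db))
```

Either every row of a batch is stored or none is, so a failed write does not leave a partially stored batch behind.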
The security block defines the components for the authentication and authorization of users and systems. They also integrate modules for data protection and privacy.
- Keyrock: is the FIWARE component for Identity Management. Using Keyrock (in conjunction with other security components such as PEP Proxy and Authzforce) adds OAuth2-based authentication and authorization security to services and applications.
- Wilma: is a PEP Proxy that, in combination with the Identity Management and Authorization PDP GEs, adds authentication and authorization security to backend applications. Thus, only FIWARE users will be able to access Generic Enablers and other REST services. The PEP Proxy makes it possible to programmatically manage specific permissions and policies on resources, granting different access levels to users.
- AuthZForce: is the reference implementation of the Authorization PDP Generic Enabler. Indeed, as mandated by the GE specification, this implementation provides an API to get authorization decisions based on authorization policies, and authorization requests from PEPs. The API follows the REST architecture style and complies with XACML v3.0.
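The way the three security components cooperate can be sketched as a toy request flow. The classes below are in-memory stand-ins written for illustration only: they do not call the Keyrock, Wilma, or AuthZForce APIs, and the tokens, roles, and rules are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Rule = Tuple[str, str, str]  # (role, HTTP method, resource path)


class IdentityManager:
    """Stand-in for the identity manager: maps access tokens to user roles."""

    def __init__(self, tokens: Dict[str, FrozenSet[str]]) -> None:
        self._tokens = tokens

    def roles_for(self, token: str) -> Optional[FrozenSet[str]]:
        return self._tokens.get(token)


class PolicyDecisionPoint:
    """Stand-in for the PDP: permits a request if any role matches a rule."""

    def __init__(self, rules: Set[Rule]) -> None:
        self._rules = rules

    def decide(self, roles: FrozenSet[str], method: str, path: str) -> str:
        if any((role, method, path) in self._rules for role in roles):
            return "Permit"
        return "Deny"


class PepProxy:
    """Stand-in for the PEP: enforces the PDP decision in front of a service."""

    def __init__(self, idm: IdentityManager, pdp: PolicyDecisionPoint) -> None:
        self._idm = idm
        self._pdp = pdp

    def handle(self, token: Optional[str], method: str, path: str) -> int:
        """Return the HTTP status code the proxy would answer with."""
        roles = self._idm.roles_for(token) if token else None
        if roles is None:
            return 401  # missing or unknown token: not authenticated
        if self._pdp.decide(roles, method, path) != "Permit":
            return 403  # authenticated but not authorized
        return 200  # the request would be forwarded to the protected service


if __name__ == "__main__":
    idm = IdentityManager({"token-operator": frozenset({"operator"})})
    pdp = PolicyDecisionPoint({("operator", "GET", "/entities")})
    pep = PepProxy(idm, pdp)
    print(pep.handle("token-operator", "GET", "/entities"))
    print(pep.handle("token-operator", "DELETE", "/entities"))
    print(pep.handle(None, "GET", "/entities"))
```

Keeping the decision logic in the PDP means that policies can change without touching the proxy or the protected service.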
The Data Sovereignty block contains the components of the IDS ecosystem that exchange data securely, guaranteeing technical usage control and the implementation of data sovereignty principles.
- TRUE Connector: is one of the available open-source connectors based on IDS standards. It is a technical component that standardizes data exchange between participants in the data space.
- ENG Clearing House: is an intermediary that provides clearing and settlement services for all financial and data exchange transactions.
- IDS Services: are complementary services for deploying an IDS ecosystem. Examples are the IDS Metadata Broker, an intermediary that stores and manages information about the data sources available in the data space, and the IDS Identity Provider, which offers a service to create, maintain, manage, monitor, and validate identity information of and for participants in the data space.