ISBN: (print) 9783031416224; 9783031416231
State-of-the-art approaches for managing Big data pipelines assume that their anatomy is known by design and expressed through ad hoc Domain-Specific Languages (DSLs), with insufficient knowledge of the dark data involved in pipeline execution. Dark data is data that organizations acquire during regular business activities but do not use to derive insights or make decisions. The recent literature on Big data processing agrees that a new breed of Big data pipeline discovery (BDPD) solutions can mitigate this issue by analyzing solely the event log that keeps track of pipeline executions over time. Relying on well-established process mining techniques, BDPD can reveal fact-based insights into how data pipelines unfold and access dark data. However, to date there is no standard format for specifying the concept of a Big data pipeline execution in an event log, which makes it challenging to apply process mining to the BDPD task. To address this issue, in this paper we formalize a universally applicable reference data model to conceptualize the core properties and attributes of a data pipeline execution. We provide an implementation of the model as an extension to the XES interchange standard for event logs, demonstrate its practical applicability in a use case involving a data pipeline for managing digital marketing campaigns, and evaluate its effectiveness in uncovering dark data manipulated during several pipeline executions.
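As a minimal sketch of what analyzing an event log of pipeline executions can look like, the snippet below computes a directly-follows relation, a basic building block of process mining, over a toy log. The field names (`execution_id`, `step`, `timestamp`) and the sample steps are hypothetical illustrations; they are not the attributes of the reference data model or of the XES extension described in the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy event log: one record per executed pipeline step.
EVENT_LOG = [
    {"execution_id": "run-1", "step": "ingest", "timestamp": 1},
    {"execution_id": "run-1", "step": "clean", "timestamp": 2},
    {"execution_id": "run-1", "step": "publish", "timestamp": 3},
    {"execution_id": "run-2", "step": "ingest", "timestamp": 1},
    {"execution_id": "run-2", "step": "publish", "timestamp": 2},
]


def directly_follows(log):
    """Count how often step a is immediately followed by step b in one execution."""
    traces = defaultdict(list)
    # Order events by execution and time so each trace lists its steps in sequence.
    for event in sorted(log, key=lambda e: (e["execution_id"], e["timestamp"])):
        traces[event["execution_id"]].append(event["step"])
    counts = Counter()
    for steps in traces.values():
        counts.update(zip(steps, steps[1:]))
    return counts
```

In this toy log, the pair `("ingest", "publish")` appears only in the second execution, which is the kind of deviation a discovery technique would surface.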
This paper presents a generic and flexible reference data model meant as the blueprint of the database component of information and decision support systems related to different types of biomass-based supply chains (e.g. first to fourth generation biomass for production of bioenergy and biomaterials). The data model covers the biomass types and handling operations as characterised by their attributes and mutual relationships resulting from a life cycle inventory analysis. The data model enables the identification of the possible operation sequences in the specified chain. This functionality is demonstrated in a case study in which biomass from tall herb communities and mesotrophic grasslands is supplied for biogas or compost production. A comparative analysis has shown that the data model includes the object types required to add specific attributes of biomass supply chain simulation and optimisation models (such as spatial and temporal dimensions). © 2016 Elsevier Ltd. All rights reserved.
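The idea of identifying possible operation sequences can be illustrated with a small sketch. It assumes, purely for illustration, that each handling operation maps one input biomass state to one output state; the operation names and states below are hypothetical and are not taken from the paper's data model or case study.

```python
# Hypothetical handling operations: name -> (input biomass state, output biomass state).
OPERATIONS = {
    "mow": ("standing grass", "cut grass"),
    "bale": ("cut grass", "baled grass"),
    "digest": ("cut grass", "biogas"),
    "digest_bales": ("baled grass", "biogas"),
    "compost": ("cut grass", "compost"),
}


def operation_sequences(start, goal, operations, visited=frozenset()):
    """Return every operation sequence that turns `start` into `goal`."""
    if start == goal:
        return [[]]
    sequences = []
    for name, (source, target) in operations.items():
        # Skip states already on the current path so cyclic chains terminate.
        if source == start and target not in visited:
            for rest in operation_sequences(target, goal, operations, visited | {start}):
                sequences.append([name] + rest)
    return sequences
```

Calling `operation_sequences("standing grass", "biogas", OPERATIONS)` enumerates both the direct digestion route and the route via baling.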
ISBN: (print) 9781614994329; 9781614994312
Integration and analysis of clinical data collected in multiple data sources over a long period of time is a major challenge even when data warehouses and metadata registries are used. Since most metadata registries focus on describing data elements to establish domain-consistent data definitions and on providing item libraries, hierarchical and temporal dependencies cannot be mapped. Therefore, we developed and validated a reference data model, based on ISO/IEC 11179, which allows revision and branching control of conceptually similar data elements with heterogeneous definitions and representations.
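Revision and branching control can be pictured with a small sketch in which every version of a data element points to the version it was derived from. This is an assumed, simplified structure for illustration only; the class, field names and sample definitions are hypothetical and do not reproduce the paper's model or the ISO/IEC 11179 metamodel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementVersion:
    version_id: str
    definition: str
    parent_id: Optional[str] = None  # None marks the root revision


def lineage(version_id, versions):
    """Walk parent links from a version back to its root revision."""
    chain = []
    current = versions.get(version_id)
    while current is not None:
        chain.append(current.version_id)
        current = versions.get(current.parent_id) if current.parent_id else None
    return chain


# Hypothetical data element with one revision and one branch off the root.
VERSIONS = {v.version_id: v for v in [
    ElementVersion("bp-1", "Systolic blood pressure, mmHg"),
    ElementVersion("bp-2", "Systolic blood pressure, mmHg, seated", "bp-1"),
    ElementVersion("bp-2a", "Systolic blood pressure, kPa", "bp-1"),
]}
```

Two versions whose lineages share a root, such as `bp-2` and `bp-2a` here, can then be treated as conceptually similar even though their definitions and units differ.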
In the context of the Industry 4.0 approach, applications and solutions supporting monitoring, simulation, optimisation and decision-making in production systems are growing exponentially. These solutions are commonly built on digital twins, i.e., comprehensive, structured and effective digital representations of the production system and its entities, whose current status is constantly updated by the connected data sources. The rise of the Industry 5.0 paradigm and the established key role of workers in manufacturing require new digital twins that also represent humans. In fact, as cognitive automation becomes more pervasive and its behaviour less intelligible to humans, modelling humans as data-driven agents and representing their interaction with the factory systems becomes essential for improving performance and well-being at the same time. Currently, a standardised solution for creating digital twins is missing, forcing industrial solution architects to resort to ad hoc implementations and models. These solutions lack re-usability, scalability and extensibility, which prevents the introduction of a human digital representation into existing twins and thus hinders the complete shift to the new Industry 5.0 paradigm. In this paper, these limitations are addressed by introducing an extensible and flexible Industrial Internet of Things (IIoT)-based platform with a twofold benefit: on the one hand, to support the creation of customised data representations of production systems and their entities, including humans; on the other hand, to provide a modular infrastructure, along with its interchangeable components, for easy digital twin instantiation and ramp-up. An implementation of the platform has been tested with different applications in a laboratory setting and released as a public resource. Finally, potential future applications of the proposed digital twin are discussed, highlighting its main benefits.
The paper describes the development of reference data models for strategic key performance indicator systems specific to waste management firms, providing a new comprehensive typology of generic models for data warehouse solutions. Additionally, a development methodology for industry solutions is applied which, given the empirically founded typification process and the theoretically derived performance measurement systems, is characterized by a high degree of structure and transparency. The new approach thus systematically integrates both inductive-empirical and deductive-analytical elements.