ISBN: (print) 9781643684574; 9781643684567
The amount of research on the gathering and handling of healthcare data keeps growing. To support multi-center research, numerous institutions have sought to create a common data model (CDM). However, data quality issues continue to be a major obstacle in the development of CDM. To address these limitations, a data quality assessment system was created based on the representative data model OMOP CDM v5.3.1. Additionally, 2,433 advanced evaluation rules were created and incorporated into the system by mapping the rules of existing OMOP CDM quality assessment systems. The data quality of six hospitals was verified using the developed system and an overall error rate of 0.197% was confirmed. Finally, we proposed a plan for high-quality data generation and the evaluation of multi-center CDM quality.
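The abstract above does not say how its overall error rate is computed. As an illustration only, the sketch below assumes the rate is the share of failed row-level checks among all checks evaluated. The `Rule` class, the `overall_error_rate` function, the two example rules, and the sample rows are hypothetical and are not taken from the described system; only the `person` table and its `year_of_birth` column follow OMOP CDM naming.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Rule:
    """A hypothetical row-level evaluation rule (not the paper's rule format)."""
    name: str
    table: str
    check: Callable[[Row], bool]  # returns True when the row passes


def overall_error_rate(tables: Dict[str, List[Row]], rules: List[Rule]) -> float:
    """Assumed definition: failed checks divided by evaluated checks."""
    evaluated = 0
    failed = 0
    for rule in rules:
        for row in tables.get(rule.table, []):
            evaluated += 1
            if not rule.check(row):
                failed += 1
    return failed / evaluated if evaluated else 0.0


# Invented sample rows for the OMOP CDM `person` table.
tables = {
    "person": [
        {"person_id": 1, "year_of_birth": 1980},
        {"person_id": 2, "year_of_birth": None},   # missing value
        {"person_id": 3, "year_of_birth": 2150},   # implausible value
        {"person_id": 4, "year_of_birth": 1975},
    ]
}

rules = [
    Rule("year_of_birth is present", "person",
         lambda r: r["year_of_birth"] is not None),
    Rule("year_of_birth is plausible", "person",
         lambda r: r["year_of_birth"] is None or 1850 <= r["year_of_birth"] <= 2025),
]

rate = overall_error_rate(tables, rules)
print(f"overall error rate: {rate:.3%}")
```

Counting per check rather than per row is one possible design choice; a per-row or per-table rate would give different figures for the same data.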
In the big data era, large amounts of data are being generated and accumulated in various industries. However, users are often hindered by data quality issues when extracting value from big data. Thus, data quality issues are gaining more and more attention from data quality management analysts. Cutting-edge solutions like data ETL, data cleaning, and data quality monitoring systems have many deficiencies in capability and efficiency, making it difficult to cope with complicated situations on big data. These problems inspire us to build SparkDQ, a generic distributed data quality management model and framework that provides a series of data quality detection and repair interfaces. Users can quickly build custom data quality computing tasks for various needs by utilizing these interfaces. In addition, SparkDQ implements a set of algorithms that run in parallel with optimizations. These algorithms target various data quality goals. We also propose several system-level optimizations, including job-level optimization with multi-task execution scheduling and data-level optimization with data state caching. The experimental evaluation shows that the proposed distributed algorithms in SparkDQ run up to 12 times faster than the corresponding stand-alone serial and multi-thread algorithms. Compared with the cutting-edge distributed data quality solution Apache Griffin, SparkDQ has more features, and its execution time is only around half that of Apache Griffin on average. SparkDQ achieves near-linear data and node scalability. (C) 2021 Elsevier Inc. All rights reserved.
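To make the idea of paired detection and repair interfaces concrete, here is a minimal single-machine sketch in plain Python. It is not SparkDQ's API and does not use Spark. The function names `detect_missing`, `detect_duplicates`, and `repair`, and the sample rows, are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def detect_missing(rows: List[Row], column: str) -> List[int]:
    """Return indices of rows whose `column` is absent or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def detect_duplicates(rows: List[Row], key: str) -> List[int]:
    """Return indices of rows whose `key` value was already seen earlier."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        value = row.get(key)
        if value in seen:
            duplicates.append(i)
        else:
            seen.add(value)
    return duplicates


def repair(rows: List[Row], column: str, key: str, default: Any) -> List[Row]:
    """Drop later duplicates on `key`, then fill missing `column` values."""
    duplicate_idx = set(detect_duplicates(rows, key))
    repaired = []
    for i, row in enumerate(rows):
        if i in duplicate_idx:
            continue
        fixed = dict(row)  # copy so the input rows are left untouched
        if fixed.get(column) is None:
            fixed[column] = default
        repaired.append(fixed)
    return repaired


# Invented sample data.
rows = [
    {"id": 1, "city": "Nanjing"},
    {"id": 2, "city": None},
    {"id": 2, "city": "Suzhou"},
    {"id": 3, "city": "Wuxi"},
]

missing = detect_missing(rows, "city")
duplicates = detect_duplicates(rows, "id")
cleaned = repair(rows, "city", "id", default="unknown")
print(missing, duplicates, cleaned)
```

In a distributed setting the same checks would be expressed over partitioned data; this sketch only shows the separation between detecting a problem and repairing it.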
ISBN: (print) 9781643683898
The amount of research on the gathering and handling of healthcare data keeps growing. To support multi-center research, numerous institutions have sought to create a common data model (CDM). However, data quality issues continue to be a major obstacle in the development of CDM. To address these limitations, a data quality assessment system was created based on the representative data model OMOP CDM v5.3.1. Additionally, 2,433 advanced evaluation rules were created and incorporated into the system by mapping the rules of existing OMOP CDM quality assessment systems. The data quality of six hospitals was verified using the developed system and an overall error rate of 0.197% was confirmed. Finally, we proposed a plan for high-quality data generation and the evaluation of multi-center CDM quality.
Objectives: In the medical field, we face many challenges, including the high cost of data collection and processing, difficult standards issues, and complex preprocessing techniques. It is necessary to establish an objective and systematic data quality management system that ensures data reliability, mitigates risks caused by incorrect data, reduces data management costs, and increases data utilization. We introduce the concept of SMART data in a data quality management system and conduct a case study using real-world data on colorectal cancer. Methods: We defined the data quality management system from three aspects (Construction - Operation - Utilization) based on the life cycle of medical data. Based on this, we proposed the "SMART data" concept and tested it on real-world colorectal cancer data. Results: We define "SMART data" as systematized, high-quality data collected based on the life cycle of data construction, operation, and utilization through quality control activities for medical data. In this study, we selected a scenario using data on colorectal cancer patients from a single medical institution provided by the Clinical Oncology Network (CONNECT). As SMART data, we curated 1,724 learning data and 27 Clinically Critical Set (CCS) data for colorectal cancer prediction. These datasets contributed to the development and fine-tuning of the colorectal cancer prediction model, and it was determined that CCS cases had unique characteristics and patterns that warranted additional clinical review and consideration in the context of colorectal cancer prediction. Conclusions: In this study, we conducted primary research to develop a medical data quality management system. This will standardize medical data extraction and quality control methods and increase the utilization of medical data. Ultimately, we aim to provide an opportunity to develop a medical data quality management methodology and contribute to the establishment of a medical da