This study evaluates the data quality of financial data warehouses, taking H Company as a case. It first introduces the data content and characteristics of securities-industry data warehouses. It then builds and quantifies an evaluation system of seven first-level indicators, such as completeness and accuracy, together with their second-level indicators. Next, it describes the fuzzy analytic hierarchy process (FAHP) and the entropy weight method. FAHP determines subjective weights by constructing a hierarchical model and fuzzy judgment matrices. The entropy weight method computes objective weights through steps such as data standardization. The two sets of weights are then combined into comprehensive weights. A case study covers data sets from four subject areas of H Company, including party and transaction data, and walks through indicator quantification, weight calculation and quality assessment. The results show that the party and channel data fall short in accuracy and consistency. The study offers a systematic method, and concrete directions for improvement, for data quality management in financial data warehouses.
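The weighting steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not say which FAHP variant or combination rule is used, so the sketch makes three assumptions:

- The judgment matrix is a fuzzy complementary matrix (r_ij + r_ji = 1), ranked with the formula w_i = (sum_j r_ij + n/2 - 1) / (n(n - 1)).
- Every indicator is larger-is-better and is standardized with min-max scaling.
- Subjective and objective weights are combined linearly with a coefficient `alpha`.

All function names and sample numbers are hypothetical.

```python
import math
from typing import List, Sequence


def fahp_weights(r: Sequence[Sequence[float]]) -> List[float]:
    """Subjective weights from a fuzzy complementary judgment matrix (assumed variant).

    Requires r[i][j] + r[j][i] == 1 (hence r[i][i] == 0.5) and ranks indicators
    with w_i = (sum_j r_ij + n/2 - 1) / (n * (n - 1)); the weights sum to 1.
    """
    n = len(r)
    if n < 2:
        raise ValueError("need at least two indicators")
    for i in range(n):
        for j in range(n):
            if abs(r[i][j] + r[j][i] - 1.0) > 1e-9:
                raise ValueError("matrix is not fuzzy complementary")
    return [(sum(row) + n / 2 - 1) / (n * (n - 1)) for row in r]


def entropy_weights(x: Sequence[Sequence[float]]) -> List[float]:
    """Objective weights; rows are data sets, columns are indicators.

    Assumes every indicator is larger-is-better.
    """
    m, n = len(x), len(x[0])
    if m < 2:
        raise ValueError("need at least two data sets")
    divergence = []
    for j in range(n):
        col = [row[j] for row in x]
        lo, hi = min(col), max(col)
        if hi == lo:
            # Constant indicator: entropy is 1, so it carries no information.
            divergence.append(0.0)
            continue
        z = [(v - lo) / (hi - lo) for v in col]  # min-max standardization
        s = sum(z)
        p = [v / s for v in z]  # share of each data set in indicator j
        # Normalized entropy; terms with p == 0 contribute 0 (0 * ln 0 := 0).
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1.0 - e)  # degree of divergence d_j
    total = sum(divergence)
    if total <= 0:
        raise ValueError("no indicator varies across the data sets")
    return [d / total for d in divergence]


def combined_weights(subjective, objective, alpha=0.5):
    """Hypothetical linear rule: alpha*subjective + (1 - alpha)*objective, renormalized."""
    w = [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]
    total = sum(w)
    return [v / total for v in w]


if __name__ == "__main__":
    # Hypothetical judgment matrix for three indicators (e.g. completeness, accuracy, consistency).
    r = [[0.5, 0.6, 0.7],
         [0.4, 0.5, 0.6],
         [0.3, 0.4, 0.5]]
    # Hypothetical indicator scores for four data sets (one row per subject area).
    x = [[0.95, 0.90, 0.80],
         [0.98, 0.97, 0.92],
         [0.99, 0.95, 0.90],
         [0.93, 0.88, 0.85]]
    ws = fahp_weights(r)
    wo = entropy_weights(x)
    w = combined_weights(ws, wo)
    # Weighted quality score for each data set.
    scores = [sum(wi * xi for wi, xi in zip(w, row)) for row in x]
    print(ws, wo, w, scores)
```

Running the script prints the subjective, objective and combined weights, plus a weighted score for each data set. The linear rule with `alpha = 0.5` is only one common choice. A multiplicative rule (combined weight proportional to subjective × objective) is an equally plausible alternative that the abstract does not rule out.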
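In 1938 the United States enacted the first law regulating chemicals. Since then, health risk assessment has developed steadily, and other countries and regions have issued their own guidance documents, forming relatively mature assessment frameworks. Chemical health risk assessment collects and cites large amounts of data and information, so data quality evaluation is key to ensuring that assessment results are credible. To date, both the U.S. Environmental Protection Agency (EPA) and the EU Regulation concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) specify data quality evaluation methods in detail. Both combine data scoring (1–4) with weight of evidence (WOE) (1/2), but they differ in emphasis. The EU focuses on assessing and scoring the overall relevance, reliability and adequacy of the data. The U.S. EPA is more detailed and specific: it groups and analyzes data by scenario and, for each scoring domain and metric, defines criteria for high confidence, medium confidence, low confidence and unacceptable data. Based on a comparison of the EU and U.S. methods, this paper recommends that China adopt a quantitative evaluation method that combines data scoring with weight of evidence. It further recommends clearly specifying scoring rules and criteria for different data sources in different scenarios, so that data quality evaluation becomes systematic.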