ISBN: 9781467368506 (print)
When faced with dirty data fused from different sources, modern data management systems generally trace its origins using data provenance techniques so that it can be repaired at the root. Unfortunately, state-of-the-art data provenance approaches can only handle small amounts of data, relying on annotation or inverse processes. Worse, these approaches only work in stand-alone mode, which results in low efficiency. In this paper, we propose DPSM, a distributed provenance storage strategy for parallel database environments that builds on the low storage overhead of the "Provenance Tree". The model optimizes both query efficiency and storage cost by storing different kinds of provenance information separately.