ISBN: (Print) 9781450367356
We present RADD, an analytic pipeline that measures the reliability and availability of cloud-based distributed databases by leveraging the large volume of telemetry available in the cloud. RADD performs root cause analysis (RCA) to provide a minute-by-minute summary of each database's availability in near real time. On top of this data, RADD can raise alerts, analyze the stability of new versions during their deployment, and provide Key Performance Indicators (KPIs) that let us understand the stability of our system across all deployed databases. RADD implements an event correlation framework that emphasizes data compliance and uses information entropy to measure causality and filter out noisy signals. It also uses statistical modelling to analyze new versions of the product and detect potential regressions early in our software development lifecycle. We demonstrate the application of RADD on top of Azure Synapse Analytics, where the system has helped us identify top-hitting and new issues and has supported on-call teams in every aspect of database health.
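To illustrate how information entropy could help separate informative telemetry signals from noisy ones, the sketch below scores each signal by how much it reduces the entropy of a per-minute "unavailable" label (information gain). This is not RADD's actual implementation, which the abstract does not describe; the function names, the signal names (`failover_event`, `heartbeat_warn`), and the sample data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(signal, unavailable):
    """Reduction in entropy of `unavailable` once `signal` is known."""
    base = entropy(unavailable)
    total = len(unavailable)
    conditional = 0.0
    for value in set(signal):
        # Outage labels for the minutes in which the signal took this value.
        subset = [u for s, u in zip(signal, unavailable) if s == value]
        conditional += (len(subset) / total) * entropy(subset)
    return base - conditional


# Hypothetical per-minute data: 1 = database unavailable in that minute.
unavailable = [0, 0, 1, 1, 0, 0, 1, 0]
failover_event = [0, 0, 1, 1, 0, 0, 1, 0]  # fires exactly during outages
heartbeat_warn = [1, 0, 1, 0, 1, 0, 1, 0]  # fires regardless of outages

gain_failover = information_gain(failover_event, unavailable)
gain_heartbeat = information_gain(heartbeat_warn, unavailable)
print(f"failover_event gain: {gain_failover:.3f} bits")
print(f"heartbeat_warn gain: {gain_heartbeat:.3f} bits")
```

In this toy data the failover signal removes all uncertainty about the outage label, while the heartbeat warning removes almost none, so a threshold on the gain would discard the latter as noise. Information gain measures statistical association only; treating it as evidence of causality requires further assumptions that this sketch does not model.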