ISBN (print): 9781665440899
The domestic mass data processing system in the aerospace field uses a simple big data sampling algorithm for data reduction in the data preprocessing stage. This paper analyzes the data curve distortion caused by that algorithm and proposes an optimization method. A big data sampling algorithm based on peak detection is then adopted so that the complete picture of massive historical data can be viewed quickly and with high fidelity, while ensuring the correctness of data interpretation after preprocessing. Verification with real test data shows that, in the data preprocessing stage of the domestic mass data processing system, the peak-detection-based sampling algorithm yields a high-fidelity data curve after sampling.
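For orientation, here is a minimal sketch of one common way to realize peak-preserving downsampling: split the series into buckets and keep each bucket's local minimum and maximum so spikes survive the reduction. The function name, the bucketing scheme, and the min/max rule are illustrative assumptions; the abstract does not specify the paper's actual peak-detection algorithm.

```python
import numpy as np

def peak_preserving_sample(values, n_buckets):
    """Downsample a 1-D series by keeping the local minimum and maximum of
    each bucket, so peaks in the curve survive the reduction.
    Hypothetical sketch; the paper's exact peak-detection rules may differ."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(0, len(values), n_buckets + 1, dtype=int)
    kept = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi <= lo:
            continue
        bucket = values[lo:hi]
        # keep both extrema of the bucket, preserving original order
        i_min = lo + int(np.argmin(bucket))
        i_max = lo + int(np.argmax(bucket))
        kept.extend({i_min, i_max})
    kept = sorted(set(kept))
    return kept, values[kept]

# usage: reduce a long telemetry curve to ~2000 points while keeping spikes visible
# idx, sampled = peak_preserving_sample(raw_telemetry, n_buckets=1000)
```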
Big data mining concerns large-scale data analysis and faces computational-cost challenges due to the exponential growth of digital technologies. Classical data mining algorithms suffer from deficiencies in computation, memory utilization, resource optimization, scale-up, and speed-up when applied to big data. Sampling is one of the most effective data reduction techniques: it reduces computational cost and improves scalability and computational speed for any data mining algorithm in both single- and multi-machine execution environments. This study proposes a Euclidean-distance-based method for stratum creation and a stratified random sampling-based big data mining model using the K-Means clustering (SSK-Means) algorithm in a single-machine execution environment. The SSK-Means algorithm achieves better cluster quality, speed-up, scale-up, and memory utilization than random-sampling-based K-Means and classical K-Means, as measured by the silhouette coefficient, Davies-Bouldin index, Calinski-Harabasz index, execution time, and speed-up ratio.
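A minimal sketch of the stratified-sampling-plus-K-Means workflow this abstract outlines, using NumPy and scikit-learn. The stratum rule shown here (equal-width bins over Euclidean distance to the global mean), the function name, and all parameters are assumptions for illustration; the paper's exact SSK-Means stratum construction may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def stratified_sample_kmeans(X, n_strata=10, sample_frac=0.1, k=5, seed=0):
    """Sketch of stratified-sampling K-Means: form distance-based strata,
    draw a proportional random sample from each stratum, then cluster the
    sample and label the full dataset with the learned centroids."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    # assign each point to an equal-width distance stratum (assumed rule)
    cuts = np.linspace(dist.min(), dist.max(), n_strata + 1)[1:-1]
    strata = np.digitize(dist, cuts)
    sample_idx = []
    for s in np.unique(strata):
        members = np.flatnonzero(strata == s)
        n_take = max(1, int(len(members) * sample_frac))
        sample_idx.extend(rng.choice(members, size=n_take, replace=False))
    sample = X[np.array(sample_idx)]
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(sample)
    return model.predict(X), model.cluster_centers_
```

Clustering only the stratified sample keeps the K-Means cost proportional to the sample size, which is the scalability argument the abstract makes.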
Big data samples are an important source for analytics. However, their mix of relevant and irrelevant information, unspecified variables and scales, noise, null values, and so forth imposes huge challenges on the analysis of relevance, features, causes, and evaluation. This paper proposes an evidential analytics approach to disclose information buried in big data samples. Technically, it models memberships composed of relevance preferences and replaces the data with these priors. Its operations include generating analytics baselines, reducing variables, identifying sparse features, and inducing rules by taking advantage of evidence. As an illustration, a case study of semiconductor manufacturing on the UCI SECOM dataset is presented. It discloses relevant signals, key factors, variable thresholds, sparse characteristics, and the causal effect of damage buried in normal samples. The contribution of this paper not only covers these achievements but also provides prior data for inference. (C) 2019 Elsevier Inc. All rights reserved.
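The abstract stays at a high level, so the following is only a loose, hypothetical sketch of one listed operation (variable reduction via relevance "memberships"), using absolute correlation with a label as a stand-in for the paper's evidential preference; every name, score, and threshold here is an assumption.

```python
import numpy as np

def relevance_memberships(X, y, keep_frac=0.2):
    """Illustrative sketch only: score each variable with a [0, 1]
    'relevance membership' (absolute correlation with the label, an
    assumption standing in for the paper's evidential modeling), skip
    near-constant (sparse) features, and keep the top fraction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.nanstd(col) == 0:            # sparse / constant feature
            continue
        mask = ~np.isnan(col)
        c = np.corrcoef(col[mask], y[mask])[0, 1]
        scores[j] = 0.0 if np.isnan(c) else abs(c)
    n_keep = max(1, int(keep_frac * X.shape[1]))
    kept = np.argsort(scores)[::-1][:n_keep]
    return kept, scores
```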
Data accuracy is one of the main dimensions of data quality; it measures the degree to which data are correct. Knowing the accuracy of an organization's data reflects the level of reliability it can assign to them in decision-making processes. Measuring data accuracy in a big data environment involves comparing the data to be assessed with some "reference data" considered by the system to be correct. However, such a process can be complex or even impossible in the absence of appropriate reference data. In this paper, we focus on this problem and propose an approach to obtain the reference data, enabled by the emergence of big data technologies. Our approach is based on the upstream selection of a set of criteria that we define as "accuracy criteria". We furthermore use techniques such as big data sampling, schema matching, record linkage, and similarity measurement. The proposed model and experimental results give us more confidence in the importance of a data quality assessment solution and in configuring the accuracy criteria to automate the selection of reference data in a data lake.
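A small, hedged sketch of the accuracy-measurement step described above: link assessed records to reference records and count a record as accurate when its fields are sufficiently similar. The dict-based record layout, linkage by a shared key, and the similarity threshold are assumptions, not the paper's actual pipeline.

```python
from difflib import SequenceMatcher

def accuracy_against_reference(records, reference, key, fields, threshold=0.9):
    """Sketch of accuracy measurement against reference data. Assumed layout:
    both inputs are lists of dicts sharing a linkage key. A record counts as
    accurate when every assessed field is similar enough to the reference."""
    ref_index = {r[key]: r for r in reference}   # simple record linkage by key
    assessed, accurate = 0, 0
    for rec in records:
        ref = ref_index.get(rec[key])
        if ref is None:
            continue                             # no reference, cannot assess
        assessed += 1
        sims = [SequenceMatcher(None, str(rec[f]), str(ref[f])).ratio()
                for f in fields]
        if all(s >= threshold for s in sims):
            accurate += 1
    return accurate / assessed if assessed else float("nan")
```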
Particle filtering is a numerical Bayesian technique with great potential for solving sequential estimation problems involving non-linear and non-Gaussian models. Since the estimation accuracy achieved by particle filters improves as the number of particles increases, it is natural to consider as many particles as possible. MapReduce is a generic programming model that makes it possible to scale a wide variety of algorithms to big data. However, despite the application of particle filters across many domains, little attention has been devoted to implementing them with MapReduce. In this paper, we describe an implementation of a particle filter using MapReduce. We focus on the component that would otherwise be a bottleneck to parallel execution: resampling. We devise a new implementation of this component that requires no approximations, has O(N) spatial complexity and deterministic O((log N)^2) time complexity. Results demonstrate the utility of this new component and culminate in a particle filter with 2^24 particles distributed across 512 processor cores.
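For context, the sketch below shows a standard single-machine systematic resampling step; the cumulative sum it computes serially is the kind of operation a distributed implementation would need to parallelize (an assumption on our part; the abstract does not detail the paper's MapReduce construction, so this is only the serial baseline, not the O((log N)^2) algorithm).

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Serial sketch of the particle-filter resampling step: one random
    offset plus evenly spaced positions scanned against the cumulative
    weight distribution, returning indices of the surviving particles."""
    rng = rng or np.random.default_rng()
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    cumulative = np.cumsum(weights / weights.sum())
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(cumulative, positions)
```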
ISBN (print): 9783319557052; 9783319557045
Since big data contain more comprehensive probability distributions and richer causal relationships than conventional small datasets, discovering Bayesian network (BN) structure from big datasets is becoming more and more valuable for modeling and reasoning under uncertainty in many areas. Facing big data, most current BN structure learning algorithms have limitations. First, learning BN structure from big datasets is an expensive process with high computational cost, often ending in failure. Second, given any dataset as input, it is very difficult to choose one algorithm from numerous candidates that consistently achieves good learning accuracy. To address these issues, we introduce a novel approach called Adaptive Bayesian Network Learning (ABNL). ABNL begins with an adaptive sampling process that extracts a sufficiently large data partition from any big dataset for fast structure learning. Then, ABNL feeds the data partition to different learning algorithms to obtain a collection of BN structures. Lastly, ABNL adaptively chooses among these structures and merges them into a final network structure using an ensemble method. Experimental results on four big datasets show that ABNL delivers significantly better performance than whole-dataset learning and more accurate results than the baseline algorithms.
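A generic sketch of the three ABNL stages named above (sample a partition, run several structure learners, merge their outputs). The learner callables, the edge representation, and the majority-vote merge are assumptions standing in for ABNL's actual adaptive sampling and ensemble rules, which the abstract does not detail.

```python
import random

def abnl_style_ensemble(dataset, learners, sample_size, vote_threshold=0.5):
    """Illustrative sketch of an ABNL-like workflow: draw one data partition,
    run each structure learner on it (each learner is assumed to be a callable
    returning a set of directed edges as (parent, child) tuples), and keep the
    edges that enough learners agree on."""
    partition = random.sample(dataset, min(sample_size, len(dataset)))
    edge_votes = {}
    for learn in learners:
        for edge in learn(partition):
            edge_votes[edge] = edge_votes.get(edge, 0) + 1
    needed = vote_threshold * len(learners)
    return {edge for edge, votes in edge_votes.items() if votes >= needed}
```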