Search Results - Inner Mongolia University Library

Efficient iterative programs with distributed data collections

JOURNAL OF LOGICAL AND ALGEBRAIC METHODS IN PROGRAMMING, 2025, Vol. 144

Authors: Chlyah, Sarah; Gesbert, Nils; Geneves, Pierre; Layaida, Nabil. Affiliation: Univ Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, F-38000 Grenoble, France

Big data programming frameworks have become increasingly important for developing applications in which performance and scalability are critical. In such complex frameworks, optimizing code by hand is hard and time-consuming, which makes automated optimization particularly necessary. A prerequisite for automating optimization is finding suitable abstractions to represent programs; for instance, algebras based on monads or monoids can represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way that allows them to be analyzed or rewritten. In this paper, we extend a monoid algebra with a fixpoint operator that represents recursion as a first-class citizen, and show how it enables new optimizations. Experiments with the Spark platform illustrate the performance gains brought by these systematic optimizations.

Keywords: Fixpoint operator; Distributed data; Rewrite rules; Optimization
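
The abstract does not give the algebra's syntax, so the following is only a minimal single-machine Python sketch of the kind of recursive computation a fixpoint operator captures: transitive closure over a collection of edges, evaluated semi-naively (each round joins only the newly derived facts with the base relation, a standard optimization for recursive queries). The names `fixpoint` and `transitive_closure` are hypothetical; this is not the paper's monoid algebra, its rewrite rules, or its Spark implementation.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]


def fixpoint(step: Callable[[Set[Edge]], Set[Edge]], seed: Set[Edge]) -> Set[Edge]:
    """Iterate until no new facts appear, feeding `step` only the latest delta."""
    result: Set[Edge] = set(seed)
    delta: Set[Edge] = set(seed)
    while delta:
        # Keep only facts not seen in earlier rounds; an empty delta means
        # the fixpoint has been reached.
        delta = step(delta) - result
        result |= delta
    return result


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    # Index the base edges by source node so each join is a dictionary lookup.
    by_src: Dict[str, Set[str]] = {}
    for src, dst in edges:
        by_src.setdefault(src, set()).add(dst)

    def step(delta: Set[Edge]) -> Set[Edge]:
        # Extend every newly found path (a, b) by one base edge (b, c).
        return {(a, c) for a, b in delta for c in by_src.get(b, ())}

    return fixpoint(step, edges)
```

For example, `transitive_closure({("a", "b"), ("b", "c")})` returns the two input edges plus `("a", "c")`. Re-joining only the delta, rather than the whole accumulated result, avoids recomputing facts already derived in earlier rounds, which is the kind of saving a first-class fixpoint makes visible to an optimizer.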

A hydrological knowledge-informed LSTM model for monthly streamflow reconstruction using distributed data: Application to typical rivers across the Tibetan plateau

JOURNAL OF HYDROLOGY, 2025, Vol. 649

Authors: Hou, Shengling; Wei, Jiahua; Hou, Minglei; Xu, Jiaqi; Han, Lu. Affiliations: Qinghai Univ, Sch Civil Engn & Water Resources, State Key Lab Plateau Ecol & Agr, Lab Ecol Protect & High Qual Dev Upper Yellow River, Xining 810016, Peoples R China; Tsinghua Univ, State Key Lab Hydrosci & Engn, Beijing 100084, Peoples R China

Long-term streamflow data are essential for water resources planning and management, cascade reservoir scheduling, and understanding how water resources respond to climate change and human activities. Streamflow reconstruction can effectively "fill in" gaps in runoff records. However, given the scarcity of observational monitoring stations and the limitations of distributed hydrological models, reconstructing long-term runoff time series under varying surface and climatic conditions remains a challenge. Here, we propose a hydrological knowledge-informed Long Short-Term Memory (LSTM) model (Hydro-LSTM) for monthly streamflow reconstruction using open-access distributed data. Hydrological knowledge was derived from the hydrological governing equations and parameters of each independent water cycle component. The Hydro-LSTM addresses both the lack of physical consistency inherent in data-driven models and the problem of missing observations. The approach was applied to simulate the monthly runoff of representative rivers on the Tibetan Plateau (TP) from 1980 to 2018. The results show that the streamflow reconstructions at the eight stations performed well; in both the training and test periods, the trends in dynamic change and the range of runoff are consistent with the measured values. Values of the Nash-Sutcliffe efficiency (NSE), correlation coefficient (CC), and Kling-Gupta efficiency (KGE) range from 0.715 to 0.968, 0.847 to 0.985, and 0.786 to 0.969, respectively. The influence of hydrological expertise and distributed data on the model is discussed. Introducing hydrological knowledge gives the driving variables hydrological meaning, which improves the physical consistency and interpretability of the Hydro-LSTM model. The proposed Hydro-LSTM is expected to (1) achieve accurate and efficient reconstructions of long-term runoff time series using open-access distributed data and limited observations and (2) provide a new perspective for runoff reconstruction and prediction, with promising application prospects.

Keywords: Long short-term memory network (LSTM); Hydrological knowledge-informed LSTM (Hydro-LSTM); Monthly streamflow reconstruction; Distributed data; Data-driven model
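
For reference, the three scores quoted in the abstract can be computed as below. This is a plain-Python sketch using the standard textbook definitions: NSE = 1 - Σ(o - s)² / Σ(o - ō)², Pearson correlation for CC, and the original 2009 Kling-Gupta form KGE = 1 - √((r - 1)² + (α - 1)² + (β - 1)²) with α = σ_s/σ_o and β = μ_s/μ_o. Whether the paper uses exactly these variants (rather than, for example, a modified KGE) is an assumption.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = mean(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mo) ** 2 for o in obs)
    return 1.0 - err / var


def cc(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (2009 form): correlation, variability and bias terms."""
    r = cc(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

For a simulation that is a constant offset of the observations, such as `obs = [1.0, 2.0, 3.0, 4.0]` and `sim = [2.0, 3.0, 4.0, 5.0]`, CC is 1 (up to floating-point rounding) while NSE drops to 0.2 and KGE to 0.6, because both of the latter penalize bias.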

Composite quantile regression for a distributed system with non-randomly distributed data

STATISTICAL PAPERS, 2025, Vol. 66, No. 1, pp. 1-30

Authors: Jin, Jun; Hao, Chenyan; Chen, Yewen. Affiliations: Yangzhou Univ, Coll Math Sci, Yangzhou, Peoples R China; Univ Georgia, Coll Publ Hlth, Athens, GA, USA

The composite quantile regression estimator is widely acknowledged for its robustness and efficiency, offering a compelling alternative to both the ordinary least squares and quantile regression estimators in linear models. However, when data are not randomly distributed across workers in distributed settings, existing methods for composite quantile regression become statistically inefficient. To address this limitation, we present a novel one-step upgraded pilot composite quantile regression method. Our approach involves two essential steps. In the first step, we obtain a pilot estimator from a small random sample collected from the different workers. In the second step, we perform a one-step update based on the pilot estimator, which involves summarizing sample moment quantities on each worker. The resulting estimator is proven to be as statistically efficient as the composite quantile regression estimator computed on the entire sample, without relying on restrictive assumptions about randomness. Furthermore, it inherits the robustness and efficiency advantages of the composite quantile regression estimator, while also being computationally efficient in terms of communication cost and storage usage. To validate the practical performance of the proposed method, we conduct numerical studies using simulated and real data, demonstrating its effectiveness in real-world scenarios.

Keywords: Communication efficiency; Distributed system; Distributed data; Robust estimation; Statistical efficiency
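
The composite quantile regression (CQR) objective behind both steps can be written down directly. The plain-Python sketch below evaluates the standard composite quantile loss Σ_k Σ_i ρ_τk(y_i - b_k - x_iᵀβ), with one intercept b_k per quantile level and a slope vector β shared across levels, and shows that it splits into a sum of per-worker terms. It does not implement the paper's pilot estimator or its one-step update; the function names and the list-of-lists data layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Shard = Tuple[List[List[float]], List[float]]  # (covariate rows, responses) on one worker


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_loss(x: Sequence[Sequence[float]], y: Sequence[float],
             intercepts: Sequence[float], beta: Sequence[float],
             taus: Sequence[float]) -> float:
    """Composite quantile loss: one intercept per level, a shared slope vector."""
    total = 0.0
    for b_k, tau in zip(intercepts, taus):
        for xi, yi in zip(x, y):
            resid = yi - b_k - sum(a * c for a, c in zip(xi, beta))
            total += check_loss(resid, tau)
    return total


def distributed_cqr_loss(shards: Sequence[Shard], intercepts: Sequence[float],
                         beta: Sequence[float], taus: Sequence[float]) -> float:
    # The loss is a sum over observations, so each worker can evaluate its own
    # share and send back a single number.
    return sum(cqr_loss(xs, ys, intercepts, beta, taus) for xs, ys in shards)
```

Because the loss decomposes this way, evaluating it across workers costs one number per worker; the harder problem the paper addresses is estimating β efficiently when the shards are not random samples of the whole data.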

Analysis of Model Merging Methods for Continual Updating of Foundation Models in Distributed Data Settings

APPLIED SCIENCES-BASEL, 2025, Vol. 15, No. 9, Article 5196

Authors: Kubota, Kenta; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki. Affiliations: Hokkaido Univ, Grad Sch Informat Sci & Technol, N-14, W-9, Kita Ku, Sapporo 0600814, Japan; Hokkaido Univ, Fac Informat Sci & Technol, N-14, W-9, Kita Ku, Sapporo 0600814, Japan

Foundation models have achieved remarkable success across various domains, but still face critical challenges such as limited data availability, high computational requirements, and rapid knowledge obsolescence. To address these issues, we propose a novel framework that integrates model merging with federated learning to enable continual foundation model updates without centralizing sensitive data. In this framework, each client fine-tunes a local model, and the server merges these models using multiple merging strategies. We experimentally evaluate the effectiveness of these methods using the CLIP model for image classification tasks across diverse datasets. The results demonstrate that advanced merging methods can surpass simple averaging in terms of accuracy, although they introduce challenges such as catastrophic forgetting and sensitivity to hyperparameters. This study defines a realistic and practical problem setting for decentralized foundation model updates, and provides a comparative analysis of merging techniques, offering valuable insights for scalable and privacy-preserving model evolution in dynamic environments.

Keywords: Model merging; Federated learning; Distributed data; Foundation model
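
The simple-averaging baseline that the abstract compares against can be sketched as an element-wise (optionally weighted) mean of the clients' parameters. In this plain-Python sketch, a model is assumed to be a dict mapping parameter names to flat lists of floats, a stand-in for real CLIP tensors; `average_models` is a hypothetical name, and the more advanced merging strategies evaluated in the paper are not shown.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, List[float]]  # parameter name -> flattened values


def average_models(models: Sequence[Params],
                   weights: Optional[Sequence[float]] = None) -> Params:
    """Merge client models by a (weighted) element-wise mean of their parameters."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(models)  # plain averaging
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    merged: Params = {}
    for name, values in models[0].items():
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) / total
            for i in range(len(values))
        ]
    return merged
```

Passing each client's dataset size as its weight gives a FedAvg-style size-weighted mean; uniform weights give the simple average. The sketch assumes all clients share the same architecture, so every model has the same parameter names and shapes.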

Least Squares Model Averaging for Distributed Data

JOURNAL OF MACHINE LEARNING RESEARCH, 2023, Vol. 24, No. 1, pp. 1-59

Authors: Zhang, Haili; Liu, Zhaobo; Zou, Guohua. Affiliations: Shenzhen Polytech Univ, Inst Appl Math, Shenzhen 518055, Peoples R China; Shenzhen Univ, Inst Adv Study, Shenzhen 518060, Peoples R China; Capital Normal Univ, Sch Math Sci, Beijing 100048, Peoples R China

The divide-and-conquer algorithm is a common strategy in big data analysis. Model averaging has a natural divide-and-conquer structure, but its theory has not been developed for big data scenarios. The goal of this paper is to fill this gap. We propose two divide-and-conquer-type model averaging estimators for linear models with distributed data. Under some regularity conditions, we show that the weights from the Mallows model averaging criterion converge in L2 to the theoretically optimal weights that minimize the risk of the model averaging estimator. We also give bounds on the in-sample and out-of-sample mean squared errors and prove the asymptotic optimality of the proposed model averaging estimators. Our conclusions hold even when the dimensions and the number of candidate models diverge. Simulation results and a real airline data analysis show that the proposed model averaging methods outperform commonly used model selection and model averaging methods in distributed data settings. Our approaches contribute to model averaging theory for distributed data and parallel computation, and can be applied in big data analysis to save time and reduce the computational burden.

Keywords: Consistency; Distributed data; Divide and conquer algorithm; Mallows' criterion; Model averaging; Optimality
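
Mallows model averaging chooses the weights by minimizing a penalized residual sum of squares, C(w) = ‖y - Σ_m w_m ŷ_m‖² + 2σ̂² Σ_m w_m k_m, where k_m is the number of parameters in candidate model m and σ̂² is estimated from the largest model. The plain-Python sketch below applies this classical single-machine criterion to two hypothetical nested candidates (intercept-only and simple linear regression) with a grid search over the weight. It does not reproduce the paper's two divide-and-conquer estimators, which combine such computations across machines; all function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def fit_intercept_only(y: Sequence[float]) -> List[float]:
    """Candidate model 1 (k = 1): predict the sample mean everywhere."""
    m = mean(y)
    return [m] * len(y)


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Candidate model 2 (k = 2): least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi for xi in x]


def mallows_weight(x: Sequence[float], y: Sequence[float], grid: int = 101) -> float:
    """Return the weight on model 2 that minimizes the Mallows criterion C(w)."""
    n = len(y)
    small, large = fit_intercept_only(y), fit_simple_ols(x, y)
    k_small, k_large = 1, 2
    # Estimate the error variance from the largest candidate model.
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, large)) / (n - k_large)
    best_w, best_c = 0.0, float("inf")
    for j in range(grid):
        w = j / (grid - 1)  # weight on the large model; 1 - w on the small one
        avg = [(1 - w) * a + w * b for a, b in zip(small, large)]
        rss = sum((yi - ai) ** 2 for yi, ai in zip(y, avg))
        c = rss + 2 * sigma2 * ((1 - w) * k_small + w * k_large)
        if c < best_c:
            best_w, best_c = w, c
    return best_w
```

When y is an exact linear function of x, all weight goes to the regression model; when the fitted slope is zero, the penalty term pushes all weight to the smaller intercept-only model.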

Optimal subsampling algorithm for composite quantile regression with distributed data

COMPUTATIONAL STATISTICS, 2024, pp. 1-36

Authors: Yuan, Xiaohui; Zhou, Shiting; Wang, Yue. Affiliation: Changchun Univ Technol, Sch Math & Stat, Changchun 130012, Jilin, Peoples R China

For massive data stored on multiple machines, we propose a distributed subsampling procedure for composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator obtained from a general subsampling algorithm, we derive the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criterion. A two-step algorithm is developed to approximate the optimal subsampling procedure. The proposed methods are illustrated through numerical experiments on simulated and real datasets.

Keywords: Composite quantile regression; Distributed data; Massive data; Optimal subsampling

Robust estimation for nonrandomly distributed data

ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2023, Vol. 75, No. 3, pp. 493-509

Authors: Li, Shaomin; Wang, Kangning; Xu, Yong. Affiliations: Beijing Normal Univ, Ctr Stat & Data Sci, 18 Jinfeng Rd, Zhuhai 519087, Peoples R China; Shandong Technol & Business Univ, Sch Stat, 191 Binhai Middle Rd, Yantai 264005, Peoples R China; Shandong Technol & Business Univ, Sch Business Adm, 191 Binhai Middle Rd, Yantai 264005, Peoples R China

In recent years, many methodologies for distributed data have been developed. However, they suffer from two problems. First, most of them require the data to be randomly and uniformly distributed across machines. Second, most of them are not robust. To solve these problems, we propose a distributed pilot modal regression (MR) estimator, which achieves robustness and adapts to data stored nonrandomly. First, we collect a random pilot sample from the different machines; then, we approximate the global MR objective function by a communication-efficient surrogate that can be evaluated efficiently using the pilot sample and the local gradients. The final estimator is obtained by minimizing the surrogate function on the master machine, while the other machines only need to compute their gradients. Theoretical results show that the new estimator is asymptotically as efficient as the global MR estimator. Simulation studies illustrate the utility of the proposed approach.

Keywords: Distributed data; Communication-efficient; Modal regression; Robustness
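
The pilot-plus-local-gradients idea can be illustrated with the generic communication-efficient surrogate: the master minimizes the pilot-sample loss plus a linear correction (∇f_global(θ₀) - ∇f_pilot(θ₀))·θ, where θ₀ is the pilot estimate and each machine sends only its local gradient at θ₀. The sketch below replaces the paper's modal regression objective with squared-error loss for a one-parameter regression through the origin, an assumption made only because the surrogate's minimizer then has a closed form (a Newton-type step using the pilot Hessian). All names are hypothetical, and the robustness that motivates modal regression is not reproduced.

```python
from typing import List, Sequence, Tuple

Shard = Tuple[List[float], List[float]]  # (x values, y values) held by one machine

# ASSUMPTION: squared-error loss for y ~ theta * x stands in for the paper's
# modal regression objective, purely to keep the surrogate step in closed form.


def local_gradient(x: Sequence[float], y: Sequence[float], theta: float) -> Tuple[float, int]:
    """Sum of per-observation gradients of 0.5 * (y - theta * x)^2, plus the count."""
    return sum(-(yi - theta * xi) * xi for xi, yi in zip(x, y)), len(x)


def surrogate_estimate(pilot: Shard, shards: Sequence[Shard]) -> float:
    px, py = pilot
    n_pilot = len(px)
    # 1. Pilot estimate: minimizer of the loss on the random pilot sample.
    sxx = sum(xi * xi for xi in px)
    theta0 = sum(xi * yi for xi, yi in zip(px, py)) / sxx
    # 2. Each machine sends back only its gradient sum and count at theta0.
    parts = [local_gradient(x, y, theta0) for x, y in shards]
    g_global = sum(g for g, _ in parts) / sum(n for _, n in parts)
    # 3. Minimize pilot loss + (g_global - g_pilot) * theta. The pilot gradient
    #    is zero at theta0 and the pilot loss is quadratic with Hessian
    #    sxx / n_pilot, so the minimizer is one Newton-type step.
    h_pilot = sxx / n_pilot
    return theta0 - g_global / h_pilot
```

Each non-master machine communicates two numbers regardless of its data size, which is the communication saving the abstract refers to; how well the step corrects the pilot estimate depends on how representative the pilot sample's curvature is of the full data.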

Federated stochastic configuration networks for distributed data analytics

INFORMATION SCIENCES, 2022, Vol. 614, pp. 51-70

Authors: Dai, Wei; Ji, Langlong; Wang, Dianhui. Affiliations: China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China; China Univ Min & Technol, Artificial Intelligence Res Inst, Xuzhou 221116, Jiangsu, Peoples R China; Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China

Stochastic configuration networks (SCNs), a class of randomized learning models, are built incrementally under a supervisory mechanism and theoretically guarantee error-free learning on training sets. This paper proposes a federated version of SCNs (FSCNs) for large-scale data that are geographically distributed among end-user clients and cannot be shared because of privacy and security concerns. Unlike centralized learning, which collects data from clients and stores them on a cloud server, FSCNs enable distributed analytics in a collaborative learning paradigm without centrally aggregating new data, thereby preventing the leakage of private information. Considering different supervisory and aggregation schemes for the model parameters, two FSC algorithms with two aggregation strategies are presented. Experimental results on both regression and classification tasks show the effectiveness and feasibility of the proposed federated learning scheme. (c) 2022 Elsevier Inc. All rights reserved.

Keywords: Stochastic configuration networks; Federated learning; Distributed data; Privacy and security

Enhancing Digital Market Research Through Distributed Data and Knowledge-Based Systems: Analyzing Emerging Trends and Strategies

23rd International Conference on Next Generation Wired/Wireless Networks and Systems (NEW2AN) / 16th Conference on Internet of Things and Smart Spaces (RuSMART)

Author: Sharopova, Nafosat. Affiliation: Tashkent State Univ Econ, Mkt Dept, Islam Karimov St 49, Tashkent 100066, Uzbekistan

ISBN: 9783031609961 (print); 9783031609978

In this era of transformation, businesses and organizations are navigating the intricate landscape of digital markets. These markets rely on data-driven insights for decision-making and success. This research paper explores market research, focusing specifically on the synergy between distributed data and knowledge-based systems. Our goal is to understand and capitalize on emerging trends by unraveling the dynamics of this combination. We begin with an exploration of methods and technologies for gathering and harmonizing data from various sources, such as social media platforms, e-commerce websites, and Internet of Things (IoT) devices. By integrating these sources, we create datasets that form the foundation of our research. Next, we turn to knowledge-based systems that use artificial intelligence and machine learning algorithms to extract valuable insights and patterns from the integrated data. These insights not only deepen our understanding of emerging market trends but also serve as a basis for developing effective digital marketing strategies and campaigns. Throughout, we also consider ethical aspects and privacy concerns, since responsible data usage is crucial in today's information age. The paper presents real-life examples and practical uses from various industries to demonstrate the advantages of our approach. Finally, we look ahead, speculating on how digital market research will evolve, mapping out paths, and highlighting areas for further exploration and innovation. This research aims to equip businesses with the insights and resources needed to navigate the changing digital market landscape.

Keywords: Digital Market Research; Distributed Data; Knowledge-Based Systems; Emerging Trends; Data Harmonization; Digital Marketing Strategies; Ethical Data Usage

Algorithm A for distributed data classification

28th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES 2024

Authors: Tetteh, Evans Teiko; Zielosko, Beata. Affiliations: Doctoral School, University of Silesia in Katowice, Bankowa 14, Katowice 40-007, Poland; Institute of Computer Science, University of Silesia in Katowice, Bedzinska 39, Sosnowiec 41-200, Poland

Knowledge discovery is one of the key areas in predictive data mining. Performing classification on a single data source with a decision tree algorithm is relatively straightforward. Complications arise, however, when data come from distributed sources that yield a set of decision trees. Classifier ensembles are often created in this case, and a decision is assigned to a new object according to some voting strategy. This article proposes a different approach: building a rule-based classifier. From a set of decision trees, a global model of decision rules is induced. It contains rules that are true for the maximum number of trees in the set. This model is verified on data corresponding to the distributed local data sources. The bootstrapping technique was used to obtain the distributed data sources. Decision trees were pruned to improve the accuracy of the rule-based classifier. The experiments confirm the validity of using algorithm A to learn decision rules from a set of decision trees. © 2024 The Authors.

Keywords: Classification; Decision rules; Decision trees; Distributed data
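
The abstract states that bootstrapping was used to obtain the distributed data sources. A minimal plain-Python sketch of that step follows, assuming each source is a bootstrap sample (drawn with replacement) of the same size as the original table; `bootstrap_sources` and its `seed` parameter are hypothetical names, and the decision-tree induction and algorithm A itself are not shown.

```python
import random
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def bootstrap_sources(rows: Sequence[Row], n_sources: int, seed: int = 0) -> List[List[Row]]:
    """Simulate n_sources local data sources, each a bootstrap sample of `rows`."""
    if not rows:
        raise ValueError("cannot bootstrap an empty table")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    # Each source draws len(rows) rows uniformly at random with replacement.
    return [[rng.choice(rows) for _ in range(len(rows))] for _ in range(n_sources)]
```

In the setup the abstract describes, each returned source would then train its own decision tree, yielding the set of trees from which the global rule model is induced.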
