ISBN (Print): 9781450355810
In the age of network sciences and machine learning, efficient algorithms are in higher demand than ever before. Big Data fundamentally challenges the classical notion of efficient algorithms: algorithms that used to be considered efficient, according to the polynomial-time characterization, may no longer be adequate for solving today's problems. It is not just desirable, but essential, that efficient algorithms be scalable: their complexity should be nearly linear or sub-linear with respect to the problem size. Thus scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this talk, using several basic tasks in network analysis, machine learning, and optimization as examples, I will highlight a family of fundamental algorithmic techniques for designing provably-good scalable algorithms.
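To make the sub-linear notion concrete, here is a minimal sketch (not from the talk; the function and data names are hypothetical) of a sampling-based estimator whose cost depends on the desired accuracy rather than on the input size:

```python
import random

def estimate_fraction(data, predicate, num_samples=2000, seed=0):
    """Estimate the fraction of items satisfying `predicate` by sampling.

    Sub-linear in spirit: the cost depends on num_samples (the accuracy
    target), not on len(data), which is the point of scalability as a
    complexity notion.
    """
    rng = random.Random(seed)
    hits = sum(predicate(data[rng.randrange(len(data))])
               for _ in range(num_samples))
    return hits / num_samples

big = list(range(10_000_000))
print(estimate_fraction(big, lambda x: x % 3 == 0))  # ~0.333 from 2000 samples
```

By standard concentration bounds, the estimation error shrinks like one over the square root of the sample count, independent of the data size.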
This editorial outlines the research context and the needs and challenges on the route to exascale. In particular, the focus is on novel mathematical methods and mathematical modeling approaches, together with scalable scientific algorithms, that are needed to enable key science applications at extreme scale. This is especially true as HPC systems continue to scale up in compute node and processor core count. These extreme-scale systems require novel mathematical methods that lead to scalable scientific algorithms which hide network and memory latency, achieve very high computation/communication overlap, require minimal communication, and have fewer synchronization points. The editorial stresses the need for scalability at all levels, from the mathematical-methods level through the algorithmic level down to the systems level, in order to achieve overall scalability. It also points out that, with the advances of Data Science in the past few years, the need for such scalable mathematical methods and algorithms, able to handle data- and compute-intensive applications at scale, becomes even more important. The papers in the special issue were selected to address one or several key challenges on the route to exascale. (C) 2016 Published by Elsevier B.V.
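As one concrete illustration of computation/communication overlap (a sketch under the assumption that mpi4py is available and the script runs on exactly two MPI ranks; it is not taken from the editorial or the special-issue papers):

```python
# Run with: mpirun -n 2 python overlap.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                       # assumes exactly two ranks

halo_out = np.full(1024, float(rank))
halo_in = np.empty(1024)

# 1) Post non-blocking sends/receives first ...
reqs = [comm.Isend(halo_out, dest=peer), comm.Irecv(halo_in, source=peer)]

# 2) ... then do interior computation that needs no remote data, so the
#    transfer proceeds "under" the arithmetic (computation/communication
#    overlap with a single synchronization point).
interior = np.random.rand(1000, 1000)
local_result = (interior @ interior.T).trace()

# 3) Synchronize only when the boundary data is actually needed.
MPI.Request.Waitall(reqs)
print(f"rank {rank}: {local_result + halo_in.sum():.3f}")
```

The pattern, post communication early, compute on local data, wait late, is exactly the latency hiding and reduced synchronization the editorial calls for.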
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees while enabling them to better capture linear relationships, which is hard for standard decision trees. However, most existing methods for fitting linear model trees are time-consuming and therefore do not scale to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable, and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an L2 boosting approach and a model selection rule for fitting the linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data are generated by a linear model, the convergence rate is polynomial.
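The abstract contains no code; the following toy sketch (hypothetical names, plain least squares, no L2 boosting or model selection rule) shows only the core idea of a greedy split whose children hold linear models rather than constants:

```python
import numpy as np

def rss_of_linear_fit(x, y):
    """Residual sum of squares of a least-squares line y ~ a + b*x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ coef) ** 2).sum())

def best_linear_split(x, y, min_leaf=5):
    """Greedy 1-D split where each child fits a line, not a constant."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (np.inf, None)
    for i in range(min_leaf, len(x) - min_leaf):
        total = rss_of_linear_fit(x[:i], y[:i]) + rss_of_linear_fit(x[i:], y[i:])
        if total < best[0]:
            best = (total, 0.5 * (x[i - 1] + x[i]))
    return best  # (total RSS, split threshold)

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 300)
y = np.where(x < 1, 2 * x, 4 - 2 * x) + rng.normal(0, 0.05, 300)  # kink at x=1
print(best_linear_split(x, y))  # threshold recovered near 1.0
```

A constant-leaf tree would need many splits to approximate each linear segment; a single linear split captures the kink exactly, which is the motivation the abstract states.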
Scalable algorithms are conceived for obtaining the sum-rate capacity of the reconfigurable intelligent surface (RIS)-aided multiuser (MU) multiple-input multiple-output (MIMO) broadcast channel (BC), in which a multi-antenna base station (BS) transmits signals to multi-antenna users with the help of an RIS equipped with a massive number of finite-resolution programmable reflecting elements (PREs). As a byproduct, scalable path-following algorithms emerge for determining the sum-rate capacity of conventional MIMO BCs, closing a long-standing open problem of information theory. The paper also develops scalable algorithms for maximizing the minimum rate (max-min rate optimization) of the users, achieved by the joint design of the RIS's PREs and the transmit beamforming for such an RIS-aided BC. The simulations provided confirm the high performance achieved by the developed algorithms despite their low computational complexity.
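The paper's path-following algorithms are not reproduced here; as background intuition for sum-rate power allocation, below is the classical single-link water-filling solution, a textbook ingredient rather than the authors' method (names and parameters are illustrative):

```python
import numpy as np

def water_filling(gains, total_power, iters=60):
    """Classical water-filling: maximize sum(log2(1 + p_i * g_i))
    subject to sum(p_i) = P, p_i >= 0, by bisection on the water level mu."""
    lo, hi = 0.0, total_power + 1.0 / gains.min()
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        p = np.maximum(0.0, mu - 1.0 / gains)
        if p.sum() > total_power:   # water level too high
            hi = mu
        else:
            lo = mu
    return p

g = np.array([2.0, 1.0, 0.25, 0.1])
p = water_filling(g, total_power=4.0)
print(p, np.log2(1 + p * g).sum())  # stronger channels receive more power
```

The MIMO BC capacity problem layered on top of RIS phase design is far harder, which is why scalable path-following methods are the paper's contribution.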
Delaunay refinement is a useful tool for sampling and meshing. Pioneered by Ruppert and Chew for piecewise-linear complexes in R^2, Delaunay refinement has been extended to myriad classes of shapes including smooth 1- and 2-manifolds, volumes bounded by smooth 2-manifolds, and piecewise-smooth complexes. Delaunay refinement algorithms often carry certain guarantees regarding the geometric and topological closeness of output to input, as well as guarantees of the quality of mesh elements, making meshes generated via Delaunay refinement a natural choice for simulation and rendering. A major shortcoming of Delaunay refinement is its scalability: as the size of the mesh grows, the data structures necessary to carry out Delaunay refinement efficiently (such as the Delaunay triangulation and its dual, the Voronoi diagram) also grow, and this incurs memory thrashing when generating dense meshes. In this dissertation, we improve Delaunay refinement in two main respects: (1) we improve the memory scalability of Delaunay refinement to allow for the generation of truly huge meshes, and (2) we improve the time scalability of Delaunay refinement by developing a novel parallel algorithm. To address the issue of memory scalability, we developed a localized refinement method embodying a divide-and-conquer paradigm for meshing smooth surfaces. The algorithm divides the sampling of the input domain via octree and meshes one node of the octree at a time, thus reducing memory requirements. Our theoretical results show that the algorithm terminates, and that the output is not only consistent, but also is geometrically close to the input and is a subcomplex of the Delaunay triangulation of the sampling restricted to the input surface. Our initial work nicely addresses the aforementioned shortcoming of Delaunay refinement, but only for smooth 2-manifolds. In later work, we extended this technique to another important input class: volumes bounded by smooth 2-manifolds. It is not immediat...
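As a rough illustration of the divide step only (a hypothetical class over point samples, with no Delaunay refinement or surface restriction), a minimal octree that subdivides until each leaf is small enough to process in-core:

```python
import numpy as np

class OctreeNode:
    """Minimal point-set octree: subdivide until each leaf holds few
    samples, mirroring the divide step that lets each node be meshed
    independently with bounded memory."""

    def __init__(self, points, center, half, max_points=32, depth=0, max_depth=8):
        self.center, self.half = center, half
        self.children = []
        if len(points) > max_points and depth < max_depth:
            for octant in range(8):
                sign = np.array([(octant >> k) & 1 for k in range(3)]) * 2 - 1
                c = center + sign * half / 2          # child box center
                mask = np.all((points >= c - half / 2) &
                              (points <  c + half / 2), axis=1)
                if mask.any():
                    self.children.append(OctreeNode(points[mask], c, half / 2,
                                                    max_points, depth + 1,
                                                    max_depth))
        else:
            self.points = points  # a leaf: small enough to mesh in-core

rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(5000, 3))
root = OctreeNode(pts, center=np.zeros(3), half=np.ones(3))
print(len(root.children))  # 8 populated octants for uniform samples
```

The dissertation's harder part, guaranteeing that per-node meshes stitch into a consistent subcomplex of the restricted Delaunay triangulation, is not captured by this sketch.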
Genome sequencing projects are rapidly contributing to the rise of high-dimensional protein sequence datasets, and extracting features from such datasets poses many challenges. Many feature extraction methods exist, but extracting features from millions of protein sequences with these approaches is impractical because they are not scalable. Therefore, to design an efficient, scalable feature extraction approach that extracts significant features, we propose two Apache Spark-based scalable feature extraction approaches that extract statistically important features from huge protein sequence collections, termed 60d-SPF (60-dimensional scalable Protein Feature) and 6d-SCPSF (6-dimensional scalable Co-occurrence-based Probability-Specific Feature). The proposed 60d-SPF and 6d-SCPSF approaches capture the statistical properties of amino acids to create a fixed-length numeric feature vector that represents each protein sequence in terms of 60-dimensional and 6-dimensional features, respectively. The preprocessed protein sequences are used as input to four clustering algorithms: scalable random sampling with iterative optimization fuzzy c-means (SRSIO-FCM), scalable literal fuzzy c-means (SLFCM), kernelized SRSIO-FCM (KSRSIO-FCM), and kernelized SLFCM (KSLFCM). We conduct extensive experiments on various soybean protein datasets to demonstrate the effectiveness of the proposed feature extraction methods, 60d-SPF and 6d-SCPSF, against existing feature extraction methods on the SRSIO-FCM, SLFCM, KSRSIO-FCM, and KSLFCM clustering algorithms. The results, reported in terms of the Silhouette index and the Davies-Bouldin index, show that the proposed 60d-SPF extraction method achieves significantly better results on all four clustering algorithms than the proposed 6d-SCPSF and the existing feature extraction approaches.
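The exact 60d-SPF and 6d-SCPSF statistics are defined in the paper; the sketch below (hypothetical names, plain 20-dimensional amino-acid frequencies only) illustrates the general Spark pattern of mapping each sequence to a fixed-length feature vector in parallel:

```python
# A minimal PySpark sketch of scalable, fixed-length sequence features.
from pyspark.sql import SparkSession

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequency_vector(seq):
    """Map one protein sequence to a fixed-length numeric feature vector."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

spark = SparkSession.builder.appName("protein-features").getOrCreate()
sequences = spark.sparkContext.parallelize(
    ["MKTAYIAKQR", "GAVLIMCFYW", "MKKLLPTAAA"])  # stand-ins for a huge dataset
features = sequences.map(frequency_vector).collect()
print(features[0])  # one 20-dimensional row per sequence
spark.stop()
```

Because each sequence maps to its vector independently, the job parallelizes trivially across a cluster, which is the scalability property the proposed approaches exploit.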
ISBN (Print): 9798400704369
Edge-computing methods allow devices to efficiently train high-performing, robust, and personalized models for predictive tasks. However, these methods suffer from privacy and scalability concerns such as adversarial data recovery and expensive model communication. Furthermore, edge-computing methods unrealistically assume that all devices train an identical model. In practice, edge devices have varying computational and memory constraints, which may not allow certain devices the space or speed to train a specific model. To overcome these issues, we propose MIDDLE, a model-independent distributed learning algorithm that allows heterogeneous edge devices to assist each other in training while communicating only non-sensitive information. MIDDLE unlocks the ability of edge devices, regardless of computational or memory constraints, to assist each other even with completely different model architectures. Furthermore, MIDDLE does not require model or gradient communication, significantly reducing communication size and time. We prove that MIDDLE attains the optimal convergence rate O(1/√(TM)) of stochastic gradient descent for convex and non-convex smooth optimization (for total iterations T and batch size M). Finally, our experimental results demonstrate that MIDDLE attains robust and high-performing models without model or gradient communication.
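MIDDLE itself is not sketched here, since the abstract does not specify what non-sensitive information is exchanged; the toy below (hypothetical names) only illustrates the O(1/√(TM)) behaviour the rate statement refers to, with the error floor of minibatch SGD shrinking as the total sample count T·M grows:

```python
import numpy as np

def minibatch_sgd(grad_oracle, x0, steps, batch_size):
    """Plain minibatch SGD on a smooth objective; averaging M noisy
    gradients per step is what the O(1/sqrt(T*M)) rate reflects."""
    x, lr = x0, 1.0 / np.sqrt(steps * batch_size)
    for _ in range(steps):
        g = np.mean([grad_oracle(x) for _ in range(batch_size)], axis=0)
        x = x - lr * g
    return x

rng = np.random.default_rng(0)
target = np.array([3.0, -1.0])
noisy_grad = lambda x: 2.0 * (x - target) + rng.normal(0.0, 2.0, size=2)

for T, M in [(100, 1), (100, 16), (1600, 16)]:
    err = np.linalg.norm(minibatch_sgd(noisy_grad, np.zeros(2), T, M) - target)
    print(f"T={T:5d} M={M:3d} error={err:.3f}")  # error falls as T*M grows
```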
ISBN (Print): 9798400704901
Today, as increasingly complex predictive models are developed, simple rule sets remain a crucial tool for obtaining interpretable predictions and driving high-stakes decision making. However, a single rule set provides only a partial representation of a learning task. An emerging paradigm in interpretable machine learning aims to explore the Rashomon set of all models exhibiting near-optimal performance. Existing work on Rashomon-set exploration focuses on exhaustive search of the Rashomon set for particular classes of models, which can be a computationally challenging task. Moreover, exhaustive enumeration introduces redundancy that is often unnecessary: a representative sample, or an estimate of the size of the Rashomon set, is sufficient for many applications. In this work, we propose, for the first time, efficient methods to explore the Rashomon set of rule-set models with or without exhaustive search. Extensive experiments demonstrate the effectiveness of the proposed methods in a variety of scenarios.
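To make the Rashomon-set notion concrete, here is a deliberately brute-force toy (hypothetical names; the paper's contribution is precisely to avoid such exhaustive search) that collects every small conjunctive rule within epsilon of the best accuracy:

```python
from itertools import combinations
import numpy as np

def rashomon_rules(X, y, epsilon=0.05, max_literals=2):
    """Brute-force Rashomon set over tiny conjunctive rules: return every
    rule whose accuracy is within epsilon of the best rule found."""
    scored = []
    for k in range(1, max_literals + 1):
        for feats in combinations(range(X.shape[1]), k):
            pred = X[:, list(feats)].all(axis=1)   # rule: AND of the literals
            scored.append((float(np.mean(pred == y)), feats))
    best = max(acc for acc, _ in scored)
    return best, sorted((a, f) for a, f in scored if a >= best - epsilon)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6)).astype(bool)
X[:, 2] = X[:, 1] ^ (rng.random(500) < 0.05)       # a near-duplicate feature
y = X[:, 0] & X[:, 1]
print(rashomon_rules(X, y))  # more than one near-optimal rule survives
```

Even in this tiny example, correlated features produce several near-optimal rules, which is why a single "best" rule set can be a misleading summary of the task.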
This paper considers a cell-free massive multiple-input multiple-output (cfm-MIMO) network with a massive number of access points (APs) distributed across an area to deliver information to multiple users. Based on local channel state information only, conjugate beamforming is used under both proper and improper Gaussian signaling. To accomplish the mission of cfm-MIMO in providing fair service to all users, the problem of power allocation to maximize the geometric mean (GM) of the users' rates (GM-rate) is considered. A new scalable algorithm, which iterates linear-complexity closed-form expressions and is thus practical regardless of the scale of the network, is developed for its solution. The problem of quality-of-service (QoS)-aware network energy efficiency is also addressed, by maximizing the ratio of the GM-rate to the total power consumption, again via iterating linear-complexity closed-form expressions. Extensive simulations demonstrate the ability of the GM-rate based optimization to achieve multiple targets simultaneously: a uniform QoS, a good sum rate, and a fair power allocation across the APs.
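The paper's closed-form iterations are not given in the abstract; as a generic stand-in, the sketch below maximizes the log of the GM-rate by projected gradient ascent over a total power budget (hypothetical names, single-antenna toy channels, arbitrary step size):

```python
import numpy as np

def project_to_budget(p, total):
    """Euclidean projection onto {p >= 0, sum(p) = total}."""
    u = np.sort(p)[::-1]
    css = np.cumsum(u) - total
    rho = np.nonzero(u > css / (np.arange(len(p)) + 1))[0][-1]
    return np.maximum(p - css[rho] / (rho + 1), 0.0)

def gm_rate_power(gains, total, steps=2000, lr=0.05):
    """Projected gradient ascent on sum(log(rate_k)), i.e. the log of the
    GM-rate; a generic stand-in for the paper's closed-form iterations."""
    p = np.full(len(gains), total / len(gains))
    for _ in range(steps):
        snr = p * gains
        grad = gains / ((1.0 + snr) * np.log1p(snr))  # d/dp log(ln(1+p*g))
        p = project_to_budget(p + lr * grad, total)
        p = np.maximum(p, 1e-6)                       # keep every rate > 0
    return p

g = np.array([4.0, 1.0, 0.2])
p = gm_rate_power(g, total=3.0)
print(p)  # all users get positive power, unlike sum-rate water-filling here
```

Because the GM vanishes if any single rate vanishes, the objective inherently protects the weakest users, which is the fairness property the paper exploits.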
ISBN (Print): 9798400701030
Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is an increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step in network understanding is graph embedding, the process of creating a continuous representation of the nodes of a graph. A continuous representation is often more amenable, especially at scale, to solving downstream machine learning tasks such as classification, link prediction, and clustering. We present a high-performance graph embedding architecture that leverages Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory, simplifies the graph embedding problem, and scales to graphs with billions of nodes and trillions of edges. We verify the quality of the embedding space on real and synthetic large-scale datasets.
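The TPU architecture itself is not reproducible from the abstract; the numpy toy below (hypothetical names) shows only the kind of embedding objective such systems scale, pulling endpoints of observed edges together and pushing negative-sampled pairs apart:

```python
import numpy as np

def embed_graph(edges, num_nodes, dim=8, epochs=200, lr=0.1, neg=5, seed=0):
    """Toy skip-gram-style node embedding with negative sampling. Production
    systems shard exactly this workload across accelerator memory."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(0, 0.1, size=(num_nodes, dim))
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    for _ in range(epochs):
        for u, v in edges:
            pairs = [(u, v, 1.0)] + [(u, rng.integers(num_nodes), 0.0)
                                     for _ in range(neg)]
            for a, b, label in pairs:
                grad = sigmoid(Z[a] @ Z[b]) - label   # logistic-loss gradient
                Z[a], Z[b] = Z[a] - lr * grad * Z[b], Z[b] - lr * grad * Z[a]
    return Z

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]  # two triangles
Z = embed_graph(edges, num_nodes=6)
print(Z[0] @ Z[1], Z[0] @ Z[4])  # within-cluster dot product comes out larger
```

The embedding table is the scaling bottleneck: with billions of nodes it no longer fits on one machine, which motivates the configurable high-bandwidth-memory design the paper describes.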