Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Yang, Rui; Kornaropoulos, Evgenios M.; Cheng, Yue. Affiliations: University of Virginia, United States; George Mason University, United States

Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS can only handle lookup operations with no support for updates, rendering it impractical to use for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, as the first and one of the representative dynamic learned index structures, enables dynamism by incorporating a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices offer improved average-case performance, the emphasis on flexibility and performance increases the attack surface by allowing adversarial behaviors that maximize ALEX’s memory space and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target the memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX’s gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan for maximizing the memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX’s catastrophic cost mitigation mechanism, causing an out-of-memory (OOM) error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions to increase the disparity between the actual key distribution and the linear models of data nodes, deteriorating the runtime performance by up to 1,641× compared to ALEX operating under legitimate workloads. Copyright © 2024, The Authors. All rights reserved.
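As a rough illustration of the lookup idea in this abstract (a model maps a key to a predicted position, and a bounded local search corrects the prediction), the following is a minimal sketch of a static, single-model learned index. It is not ALEX and omits everything that makes ALEX dynamic; the class name `LinearLearnedIndex` and its fields are hypothetical, and distinct numeric keys are assumed.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Static learned index over distinct numeric keys (illustrative only, not ALEX)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of: position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        pos = bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


uniform = LinearLearnedIndex(range(0, 1000, 10))         # keys follow the linear model
skewed = LinearLearnedIndex(i ** 3 for i in range(100))  # keys deviate from it
print(uniform.max_err, skewed.max_err)
```

In this toy setting, the search window grows with the gap between the key distribution and the fitted line, which gives some intuition for why insertions that widen that gap can slow lookups down.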

Keywords: parallel processing systems

The Complexity of Drawing Graphs on Few Lines and Few Planes

Journal of Graph Algorithms and Applications, 2023, Vol. 27, No. 6, pp. 459-488

Authors: Chaplick, Steven; Fleszar, Krzysztof; Lipp, Fabian; Ravsky, Alexander; Verbitsky, Oleg; Wolff, Alexander. Affiliations: Maastricht University, Netherlands; Institute of Informatics, University of Warsaw, Poland; Institut für Informatik, Universität Würzburg, Germany; Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, National Academy of Sciences of Ukraine, Lviv, Ukraine; Institut für Informatik, Humboldt Universität, Germany

It is well known that any graph admits a crossing-free straight-line drawing in $\mathbb{R}^3$ and that any planar graph admits the same even in $\mathbb{R}^2$. For a graph $G$ and $d \in \{2, 3\}$, let $\rho^1_d(G)$ denote the smallest number of lines in $\mathbb{R}^d$ whose union contains a crossing-free straight-line drawing of $G$. For $d = 2$, the graph $G$ must be planar. Similarly, let $\rho^2_3(G)$ denote the smallest number of planes in $\mathbb{R}^3$ whose union contains a crossing-free straight-line drawing of $G$. We investigate the complexity of computing these three parameters and obtain the following hardness and algorithmic results.
• For $d \in \{2, 3\}$, we prove that deciding whether $\rho^1_d(G) \le k$ for a given graph $G$ and integer $k$ is $\exists\mathbb{R}$-complete.
• Since $\mathrm{NP} \subseteq \exists\mathbb{R}$, deciding $\rho^1_d(G) \le k$ is NP-hard for $d \in \{2, 3\}$. On the positive side, we show that the problem is fixed-parameter tractable with respect to $k$.
• Since $\exists\mathbb{R} \subseteq \mathrm{PSPACE}$, both $\rho^1_2(G)$ and $\rho^1_3(G)$ are computable in polynomial space. On the negative side, we show that drawings that are optimal with respect to $\rho^1_2$ or $\rho^1_3$ sometimes require irrational coordinates.
• We prove that deciding whether $\rho^2_3(G) \le k$ is NP-hard for any fixed $k \ge 2$. Hence, the problem is not fixed-parameter tractable with respect to $k$ unless $\mathrm{P} = \mathrm{NP}$.
© 2023, Brown University. All rights reserved.

Keywords: parallel processing systems

Probabilistic automatic complexity of finite strings

arXiv, 2024

Author: Gill, Kenneth. Affiliation: Penn State University, United States

We introduce a new complexity measure for finite strings using probabilistic finite-state automata (PFAs), in the same spirit as existing notions employing DFAs and NFAs, and explore its properties. The PFA complexity $A_P(x)$ is the least number of states of a PFA for which $x$ is the most likely string of its length to be accepted. The variant $A_{P,\delta}(x)$ adds a real-valued parameter $\delta$ specifying a required lower bound on the gap in acceptance probabilities between $x$ and other strings. We prove $A_{P,\delta}$ is δ-computable for all $\delta$, relate $A_P$ to the DFA and NFA complexities, and obtain a complete classification of binary strings with $A_P = 2$. Finally, we discuss several other variations on $A_P$ with a view to obtaining additional desirable properties. MSC Codes: 68Q30. Copyright © 2024, The Authors. All rights reserved.
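To make the definition in this abstract concrete, here is a small sketch that computes acceptance probabilities for a PFA and checks, by brute force, whether a given string is the most likely string of its length. The function names, the two-state example automaton, and the exact treatment of the gap parameter (a strict win plus a difference of at least `gap`) are assumptions made for illustration; the paper's formal definitions may differ.

```python
from fractions import Fraction
from itertools import product


def acceptance_probability(init, trans, accept, word):
    """Probability that the PFA accepts `word`: init * M[w1] * ... * M[wn] * accept."""
    dist = list(init)
    n = len(dist)
    for sym in word:
        m = trans[sym]
        # Row vector times row-stochastic matrix.
        dist = [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]
    return sum(p for p, is_acc in zip(dist, accept) if is_acc)


def is_most_likely(init, trans, accept, word, gap=Fraction(0)):
    """True if `word` strictly beats every other string of its length by at least `gap`."""
    px = acceptance_probability(init, trans, accept, word)
    for letters in product(sorted(trans), repeat=len(word)):
        other = "".join(letters)
        if other == word:
            continue
        diff = px - acceptance_probability(init, trans, accept, other)
        if diff <= 0 or diff < gap:
            return False
    return True


# Hypothetical two-state PFA over {0, 1}; state 1 is the only accepting state.
H = Fraction(1, 2)
init = [Fraction(1), Fraction(0)]
trans = {
    "0": [[H, H], [Fraction(0), Fraction(1)]],
    "1": [[Fraction(1), Fraction(0)], [H, H]],
}
accept = [False, True]

print(acceptance_probability(init, trans, accept, "00"))  # 3/4
print(is_most_likely(init, trans, accept, "00"))          # True
```

Exact rationals are used so that comparisons between acceptance probabilities are not affected by floating-point rounding. The brute-force check enumerates all strings of the given length, so it is only practical for short strings.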

Keywords: parallel processing systems

Multi-dimensional state space collapse in non-complete resource pooling scenarios

arXiv, 2024

Authors: Cardinaels, Ellen; Borst, Sem; van Leeuwaarden, Johan S.H.

The present paper establishes an explicit multi-dimensional state space collapse (SSC) for parallel-processing systems with arbitrary compatibility constraints between servers and job types. This breaks major new ground beyond the SSC results and queue length asymptotics in the literature which are largely restricted to complete resource pooling (CRP) scenarios where the steady-state queue length vector concentrates around a line in heavy traffic. The multi-dimensional SSC that we establish reveals heavy-traffic behavior which is also far more tractable than the pre-limit queue length distribution, yet exhibits a fundamentally more intricate structure than in the one-dimensional case, providing useful insight into the system dynamics. In particular, we prove that the limiting queue length vector lives in a K-dimensional cone of which the set of spanning vectors is random in general, capturing the delicate interplay between the various job types and servers. For a broad class of systems we provide a further simplification which shows that the collection of random cones constitutes a fixed K-dimensional cone, resulting in a K-dimensional SSC. The dimension K represents the number of critically loaded subsystems, or equivalently, capacity bottlenecks in heavy traffic, with K = 1 corresponding to conventional CRP scenarios. Our approach leverages probability generating function (PGF) expressions for Markovian systems operating under redundancy policies. © 2024, CC BY.

Keywords: parallel processing systems

FastECPP over MPI

arXiv, 2024

Author: Enge, Andreas. Affiliation: INRIA, Université de Bordeaux, CNRS, CANARI, Talence 33400, France

The FastECPP algorithm is currently the fastest approach to prove the primality of general numbers, and has the additional benefit of creating certificates that can be checked independently and with a lower complexity. This article shows how by parallelising over a linear number of cores, its quartic time complexity becomes a cubic wall-clock time complexity; and it presents the algorithmic choices of the FastECPP implementation in the author’s Cm software https://***/cm/ which has been written with massive parallelisation over MPI in mind, and which has been used to establish a new primality record for the "repunit" $(10^{86453} - 1)/9$. Copyright © 2024, The Authors. All rights reserved.

Keywords: parallel processing systems

Exploring Power Usage and Inference Speed Evaluation of Low-power Clustered Many-core Platforms

ACM Transactions on Cyber-Physical Systems, 2025, Vol. 9, No. 2

Authors: Hasumi, Masahiro; Yabe, Takuma; Azumi, Takuya. Affiliation: Saitama University, Saitama, Japan

High-performance platforms capable of running deep neural network (DNN)-based applications are necessary for embedded systems such as autonomous-driving systems. These systems must be compact and power-efficient, rather than relying on computationally rich platforms such as graphics processing units (GPUs). Additionally, platforms capable of executing multiple applications in parallel in such large-scale systems are required. Clustered many-core platforms, such as the Kalray massively parallel processor array (MPPA) 3-80 Coolidge, have been designed to meet these requirements. Coolidge employs more computing cores compared with single-core or multi-core processors, allowing for reduced clock frequencies per core. Consequently, Coolidge is a high-performance platform with low power usage. Additionally, in Coolidge, cores are grouped into clusters, each of which can independently run different applications. This enables a single Coolidge platform to support multiple applications simultaneously. In the realm of cyber-physical systems (CPS), which bridge the physical and digital domains, these platforms become crucial. CPS relies on real-time embedded systems, like those in autonomous vehicles, which necessitate low-power, high-performance platforms that can perform complex computations like DNNs for object detection. In our evaluation, we examine DNN inference speed and explore the performance of the Coolidge platform. The DNN task for the evaluation employs object detection, which is commonly used in autonomous-driving systems. The acceptable speed threshold for real-time applications in object detection is 30 frames per second. Our study reveals that the "you only look once" v5 model can exceed performance benchmarks with only one cluster in Coolidge for int8 data type and two clusters for the higher precision fp16 data type. Moreover, when two clusters are utilized for DNN tasks, the remaining clusters are available for other non-DNN applications, underlining the

Keywords: parallel processing systems

Vectorized and Parallel Computation of Large Smooth-Degree Isogenies using Precedence-Constrained Scheduling

IACR Transactions on Cryptographic Hardware and Embedded Systems, 2023, Vol. 2023, No. 3, pp. 246-269

Authors: Phalakarn, Kittiphon; Suppakitpaisarn, Vorapong; Rodríguez-Henríquez, Francisco; Hasan, M. Anwar. Affiliations: University of Waterloo, Waterloo, Canada; The University of Tokyo, Tokyo, Japan; CINVESTAV-IPN, Mexico City, Mexico; Technology Innovation Institute, Abu Dhabi, United Arab Emirates

Strategies and their evaluations play important roles in speeding up the computation of large smooth-degree isogenies. The concept of optimal strategies for such computation was introduced by De Feo et al., and virtually all implementations of isogeny-based protocols have adopted this approach, which is provably optimal for single-core platforms. In spite of its inherent sequential nature, several recent works have studied ways of speeding up this isogeny computation by exploiting the rich parallelism available in vectorized and multi-core platforms. One obstacle to taking full advantage of this parallelism, however, is that De Feo et al.’s strategies are not necessarily optimal in multi-core environments. To illustrate how the speed of vectorized and parallel isogeny computation can be improved at the strategy level, we present two novel software implementations that utilize a state-of-the-art evaluation technique, called precedence-constrained scheduling (PCS), presented by Phalakarn et al., with our proposed strategies crafted for these environments. Our first implementation relies only on the parallelism provided by multi-core processors. The second implementation targets multi-core processors supporting the latest generation of Intel’s Advanced Vector eXtensions (AVX) technology, commonly known as AVX-512IFMA instructions. To better handle the computational concurrency associated with PCS, we equip both implementations with extensive synchronization techniques. Our first implementation outperforms the implementation of Cervantes-Vázquez et al. by yielding up to 14.36% reduction in the execution time, when targeting platforms with two- to four-core processors. Our second implementation, equipped with four cores, achieves up to 34.05% reduction in the execution time compared to the single-core implementation of Cheng et al. of CHES 2022. © 2023, Ruhr-University of Bochum. All rights reserved.

Keywords: parallel processing systems

Complexity of the LTI system trajectory boundedness problem

60th IEEE Conference on Decision and Control (CDC)

Authors: Berger, Guillaume O.; Jungers, Raphael M. Affiliation: UCLouvain, ICTEAM Inst, Louvain, Belgium

ISBN: (print) 9781665436595

We study the algorithmic complexity of the problem of deciding whether a Linear Time Invariant dynamical system with rational coefficients has bounded trajectories. Despite its ubiquitous and elementary nature in Systems and Control, it turns out that this question is quite intricate, and, to the best of our knowledge, unsolved in the literature. We show that classical tools, such as Gaussian Elimination, the Routh-Hurwitz Criterion, and the Euclidean Algorithm for GCD of polynomials indeed allow for an algorithm that is polynomial in the bit size of the instance. However, all these tools have to be implemented with care, and in a non-standard way, which relies on an advanced analysis.

Keywords: parallel processing systems

A locality-based lens for coded computation

IEEE Transactions on Information Theory, 2025

Authors: Rudow, Michael; Rashmi, K.V.; Guruswami, Venkatesan. Affiliation: Carnegie Mellon University, Pittsburgh, PA 15213, United States

Coded computation is an emerging paradigm of applying coding theory to large-scale distributed computing to provide resilience against slow or otherwise unavailable workers. We propose a new approach to view coded computation via the lens of the locality of codes. We do so by defining a new notion of locality, called computational locality, using the locality properties of an appropriately defined code for the function being computed. This notion of locality incorporates the unique aspects of locality arising in the context of coded computation. Our first major contribution is to demonstrate how to design a coded computation scheme for a function using the local recovery scheme of an appropriately defined code. The so-obtained scheme rederives the best known coded computation scheme for multivariate polynomial functions via the viewpoint of the locality of the Reed-Muller code. Our second major contribution is to show that the proposed locality-based approach enables new tradeoffs (e.g., communication bandwidth vs number of workers) compared to existing coded computation schemes. Specifically for the case when there is known linear dependence among inputs—common in many real-world applications—the proposed approach significantly reduces resource overhead (i.e., number of workers) without incurring any tradeoffs. © 1963-2012 IEEE.
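For readers unfamiliar with coded computation, the following toy sketch shows the basic idea of tolerating one slow worker when evaluating a linear function f(x) = Ax on two inputs. It is a textbook-style sum-parity illustration and not the locality-based scheme proposed in the paper; the worker names `w1`, `w2`, `w3` and all function names are hypothetical.

```python
def matvec(a, x):
    """Compute the matrix-vector product A x with plain lists."""
    return [sum(r * v for r, v in zip(row, x)) for row in a]


def encode(x1, x2):
    """Inputs for three workers: the two originals plus one parity (their sum)."""
    return {"w1": x1, "w2": x2, "w3": [u + v for u, v in zip(x1, x2)]}


def decode(results):
    """Recover (A x1, A x2) from the outputs of any two of the three workers."""
    if len(results) < 2:
        raise ValueError("need outputs from at least two workers")
    if "w1" in results and "w2" in results:
        return results["w1"], results["w2"]
    if "w1" in results:
        # w2 is missing: by linearity, A x2 = A (x1 + x2) - A x1.
        y1, y3 = results["w1"], results["w3"]
        return y1, [c - a for c, a in zip(y3, y1)]
    # w1 is missing: A x1 = A (x1 + x2) - A x2.
    y2, y3 = results["w2"], results["w3"]
    return [c - b for c, b in zip(y3, y2)], y2


A = [[1, 2], [3, 4]]
x1, x2 = [1, 0], [0, 1]
outputs = {w: matvec(A, x) for w, x in encode(x1, x2).items()}

# Simulate each worker in turn being too slow to respond.
for straggler in outputs:
    received = {w: y for w, y in outputs.items() if w != straggler}
    print(straggler, decode(received))
```

The sketch spends one extra worker to survive any single straggler, which is the kind of resource overhead (number of workers) that the abstract discusses trading off.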

Keywords: parallel processing systems

Explicit Constructions of Capacity-Achieving T-PIR Schemes over Small Fields via Generalized Minor Matrices

IEEE Transactions on Information Theory, 2025

Authors: Xu, Jingke; Fang, Weijun. Affiliations: Shandong Agricultural University, School of Information Science and Engineering, Tai’an 271017, China; Shandong University, State Key Laboratory of Cryptography and Digital Economy Security, Qingdao 266237, China; Ministry of Education, Shandong University, Key Laboratory of Cryptologic Technology and Information Security, Qingdao 266237, China; Shandong University, School of Cyber Science and Technology, Qingdao 266237, China

Suppose a distributed storage system containing $M$ files is replicated across $N$ servers, and a user wants to privately retrieve one file by accessing the servers such that the identity of the retrieved file is kept secret from any subset of up to $T$ servers, where each file can be viewed as a vector over the $q$-ary finite field $\mathbb{F}_q$. A scheme designed for this purpose is called a T-private information retrieval (T-PIR) scheme. We consider the problem of explicitly constructing capacity-achieving T-PIR schemes over small finite fields. In this paper, we first provide a general framework for constructing explicit capacity-achieving T-PIR schemes for all parameters, which only relies on an MDS array matrix with a special information set. To construct such an MDS array matrix, we propose a new family of matrices over finite fields, called the generalized minor matrices of the Moore matrix, and establish a series of key identities. By combining favourable properties of generalized minor matrices with our framework, we construct an explicit capacity-achieving T-PIR scheme with optimal sub-packetization over the field $\mathbb{F}_q$, as small as possible, for three classes of parameters $N, T, M \ge 3$. Specifically, the first class of construction works for all $N = d(2t - 1)$, $T = dt$, and the field size $q$ is the least prime power satisfying $q^{t-1} \ge N$. Moreover, this construction generalizes the scheme proposed by Xu and Wang in 2022, which only considers the case of $N = d(2t - 1)$, $T = dt$ with $2^{t-1} \ge N$. For all $N = d(2t + 1)$, $T = dt$, our second T-PIR scheme is the first explicit construction, and the field size $q$ is the least prime power satisfying $q^t \ge N$, which is the smallest field size among all known explicit capacity-achieving T-PIR schemes. Particularly, when $2^t \ge N$, the field size of such constructions can be reduced to 2. In the case of $N = 4s$ and $T = 2s + 1$, our scheme is the first one to reduce the field size to $q = 2$. Compared with all known explicitly capacity-achieving T-PIR schemes, the

Keywords: parallel processing systems
