Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carr...
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center. the papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. they reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. their inclusion in this publication does not necessarily constitute endorsement by the editors or the Institute of Electrical and Electronics Engineers, Inc.
Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision as well. In their usage, data security has become a crucial concern whe...
详细信息
Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision as well. In their usage, data security has become a crucial concern when deploying deep learning services on cloud platforms. To address these security concerns, Multi-party computation (MPC) is employed to prevent data and model leakage during the inference process. However, Transformer model introduces several challenges for MPC computation, including the time overhead of the Softmax (normalized exponential) function, the accuracy issue caused by the "dynamic range" of approximated division and exponential, and the high memory overhead when processing long sequences. To overcome these challenges, we propose MLformer, an MPC-based inference framework for transformer models based on Crypten Knott et al. (Adv Neural Inf Process Syst 34: 4961-4973, 2021), a secure machine learning framework suggested by Facebook AI Research group, in the semi-honest adversary model. In this framework, we replace the softmax attention with linear attention, which has linear time and memory complexity with input length. the modification eliminates the softmax function entirely, resulting in lower time and memory overhead. To ensure the accuracy of linear attention, we propose the scaled linear attention to address the dynamic range issue caused by the MPC division used and a new approximate division function is proposed to reduce the computational time of the attention block. Furthermore, to improve the efficiency and accuracy of MPC exponential and reciprocal which are commonly used in transformer model, we propose a novel MPC exponential protocol and first integrate the efficient reciprocal protocol Bar-Ilan and Beaver (in Proceedings of the 8thannualacmsymposium on principles of distributed computing, pp. 201-209, 1989) to our framework. Additionally, we optimize the computation of causal linear attention, which is utilized in private in
Two novel variations on sample sort, one using only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and another using regular sam...
详细信息
Two novel variations on sample sort, one using only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and another using regular sampling for choosing splitters, were studied. the two were coded in Split-C and were run on a variety of platforms. Results were consistent withtheoretical analysis and illustrated the scalability and efficiency of the algorithms.
Scheduling problems that are critical and prevalent in practical parallel computing are computed. A polynomial time makespan algorithm that produces a schedule of length O(V+Φ log T), which is therefore an O(log T) a...
详细信息
ISBN:
(纸本)9780897918091
Scheduling problems that are critical and prevalent in practical parallel computing are computed. A polynomial time makespan algorithm that produces a schedule of length O(V+Φ log T), which is therefore an O(log T) approximation is presented to solve these problems. the makespan algorithm can be extended to minimize the weighted average completion time over all the jobs to the same approximation factor of O(log T).
A library, called PAD, of basic parallelalgorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM ...
详细信息
A library, called PAD, of basic parallelalgorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM as a practical programming model, and to provide an organized collection of basic PRAM algorithms for the SB-PRAM under completion at the University of Saarbruecken. We give a brief survey of Fork95, and describe the main components of PAD. Finally we report on the status of the language and library and discuss further developments.
We describe a randomized CRCW PRAM algorithm that finds a minimum spanning forest of an n-vertex graph in O(log n) time and linear work. this shaves a factor of 2log* n off the best previous running time for a linear-...
详细信息
ISBN:
(纸本)9780897918091
We describe a randomized CRCW PRAM algorithm that finds a minimum spanning forest of an n-vertex graph in O(log n) time and linear work. this shaves a factor of 2log* n off the best previous running time for a linear-work algorithm. the novelty in our approach is to divide the computation into two phases, the first of which finds only a partial solution. this idea has been used previously in parallel connected components algorithms.
We analyze universal routing protocols, that is, protocols that can be used for any communication pattern in any network, under a stochastic model of continuous message generation, In particular, we present two univer...
详细信息
ISBN:
(纸本)9780897918091
We analyze universal routing protocols, that is, protocols that can be used for any communication pattern in any network, under a stochastic model of continuous message generation, In particular, we present two universal protocols, a store-and-forward and a wormhole routing protocol, and characterize their performance by the following three parameters: the maximum message generation rate for which the protocol is stable, the expected delay of a message from generation to service, and the time the protocol needs to recover from worst-case scenarios. Both protocols yield significant performance improvements over all previously known continuous routing protocols. In addition, we present adaptations of our protocols to continuous routing in node-symmetric networks, butterflies, and meshes.
暂无评论