Search Results - Inner Mongolia University Library

31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

Authors: Chadha, Mohak; Wieland, Paul; Gerndt, Michael. Affiliation: Technische Universität München, Computer Architecture and Parallel Systems, Garching, Germany

ISBN: (print) 9798350330663

With the advent of AWS Lambda in 2014, Serverless Computing, particularly Function-as-a-Service (FaaS), has witnessed growing popularity across various application domains. FaaS enables an application to be decomposed into fine-grained functions that are executed on a FaaS platform. It offers several advantages such as no infrastructure management, a pay-per-use billing policy, and on-demand fine-grained autoscaling. However, despite its advantages, developers today encounter various challenges while adopting FaaS solutions that reduce productivity. These include FaaS platform lock-in, support for diverse function deployment parameters, and diverse interfaces for interacting with FaaS platforms. To address these challenges, we present gFaaS, a novel framework that facilitates the holistic development and management of functions across diverse FaaS platforms. Our framework enables the development of generic functions in multiple programming languages that can be seamlessly deployed across different platforms without modifications. Results from our experiments demonstrate that gFaaS functions perform similarly to native platform-specific functions across various scenarios. A video demonstrating the functioning of gFaaS is available from https://***/STbb6ykJFf0. © 2024 IEEE.

Keywords: Interoperability

Real-Time Capability of DLR's Beamforming Synthetic Aperture Radar Processing Architecture

2023 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2023

Authors: Schlemon, Maron; Schulz, Martin; Scheiber, Rolf; Jasger, Marc; Oliva, Joel Amao. Affiliations: Germany; Technical University of Munich, Computer Architecture and Parallel Systems, Germany

Synthetic Aperture Radar (SAR) enables the generation of realistic and high-resolution 2D or 3D representations of landscapes. Typically, radar instruments are deployed in specially equipped, low-flying aircraft that capture a significant amount of raw data, necessitating image reconstruction processing. However, the aircraft's limited onboard processing capabilities (power, size, weight, cooling, and communication bandwidth to ground stations) and the need to generate multiple SAR products, such as slant-range and geo-coded images during a single flight, require efficient onboard processing and transmission to the ground station. This paper outlines the processing architecture of the digital beamforming SAR (DBFSAR) employed by the German Aerospace Center (DLR) and the specific measures implemented to enable onboard processing. We elucidate the essential software optimizations and their integration into the SAR onboard routines, facilitating (near) real-time capability under certain conditions. Furthermore, we share the insights gained from our work and discuss their applicability to other processing scenarios with limited resource availability. © 2023 IEEE.

Keywords: Synthetic aperture radar

What’s Missing in Agile Hardware Design? Verification!

Journal of Computer Science & Technology, 2023, Vol. 38, No. 4, pp. 735-736

Author: Babak Falsafi. Affiliation: Parallel Systems Architecture Laboratory, Institute of Computer and Communication Sciences, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland

Agile hardware design is an approach to developing hardware systems that draws inspiration from the principles and practices of agile software *** emphasizes collaboration, flexibility, iterative development, and quick adaptation to changing *** agile hardware design, the focus is on delivering functional hardware systems in shorter development cycles while maintaining high-quality and customer *** particular, agile hardware design is of great interest in the open-source hardware ***-source hardware development, such as RISC-V, is at the forefront of initiatives to democratize hardware and drive innovation in chip design *** design is instrumental for the RISC-V community because it supports rapid iteration, accommodates the evolving RISC-V standard and the addition of custom extensions, improves community collaboration and time-to-market, and addresses the design challenges associated with complex architectural features.

Keywords: hardware; agile; architectural

FedLesScan: Mitigating Stragglers in Serverless Federated Learning

2022 IEEE International Conference on Big Data, Big Data 2022

Authors: Elzohairy, Mohamed; Chadha, Mohak; Jindal, Anshul; Grafberger, Andreas; Gu, Jianfeng; Gerndt, Michael; Abboud, Osama. Affiliations: Computer Architecture and Parallel Systems, Germany; Huawei Technologies, Munich, Germany

ISBN: (print) 9781665480451

Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful, always-running components, recent work has shown that components in an FL system can greatly benefit from the usage of serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., slow clients due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold-starts, performance variations, and the ephemeral stateless nature of the function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy, specifically tailored for serverless FL. FedLesScan dynamically adapts to the behavior of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using the 2nd generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20% respectively while utilizing clients better, with an average increase in the effective update ratio of 17.75%. © 2022 IEEE.

Keywords: Deep learning

Design of an FPGA-Based Neutral Atom Rearrangement Accelerator for Quantum Computing

2025 Design, Automation and Test in Europe Conference, DATE 2025

Authors: Guo, Xiaorang; Winklmann, Jonas; Stober, Dirk; Elsharkawy, Amr; Schulz, Martin. Affiliation: Technical University of Munich, Computer Architecture and Parallel Systems, Garching, Germany

ISBN: (print) 9783982674100

Neutral atoms have emerged as a promising technology for implementing quantum computers due to their scalability and long coherence times. However, the execution frequency of neutral atom quantum computers is constrained by image processing procedures, particularly the assembly of defect-free atom arrays, which is a crucial step in preparing qubits (atoms) for execution. To optimize this assembly process, we propose a novel quadrant-based rearrangement algorithm that employs a divide-and-conquer strategy and also enables the simultaneous movement of multiple atoms, even across different columns and rows. We implement the algorithm on Field Programmable Gate Arrays (FPGAs) to handle each quadrant independently (hardware-level optimization) while maximizing parallelization. To the best of our knowledge, this is the first hardware acceleration work for atom rearrangement, and it significantly reduces the processing time. This achievement also contributes to the ongoing efforts of tightly integrating quantum accelerators into High-Performance Computing (HPC) systems. Tested on a Zynq RFSoC FPGA at 250 MHz, our hardware implementation is able to complete the rearrangement process of a 30 × 30 compact target array, derived from a 50 × 50 initial loaded array, in approximately 1.0 μs. Compared to a comparable CPU implementation and to state-of-the-art FPGA work, we achieved about 54x and 300x speedups in the rearrangement analysis time, respectively. Additionally, the FPGA-based acceleration demonstrates good scalability, allowing for seamless adaptation to varying sizes of the atom array, which makes this algorithm a promising solution for large-scale quantum systems. © 2025 EDAA.

Keywords: Qubits

Realistic Neutral Atom Image Simulation

4th IEEE International Conference on Quantum Computing and Engineering, QCE 2023

Authors: Winklmann, Jonas; Tsevas, Dimitrios; Schulz, Martin. Affiliations: Computer Architecture and Parallel Systems, Technical University of Munich, Munich, Germany; Max Planck Institute of Quantum Optics, Quantum Many-Body Systems Division, Garching, Germany

ISBN: (print) 9798350343236

Neutral atom quantum computers require accurate single atom detection for the preparation and readout of their qubits. This is usually done using fluorescence imaging. The occupancy of an atom site in these images is often somewhat ambiguous due to the stochastic nature of the imaging process. Further, the lack of ground truth makes it difficult to rate the accuracy of reconstruction algorithms. We introduce a bottom-up simulator that is capable of generating sample images of neutral atom experiments from a description of the actual state in the simulated system. Possible use cases include the creation of exemplary images for demonstration purposes, fast training iterations for deconvolution algorithms, and generation of labeled data for machine-learning-based atom detection approaches. The implementation is available through our GitHub as a C library or wrapped Python package. We show the modeled effects and implementation of the simulations at different stages of the imaging process. Not all real-world phenomena can be reproduced perfectly. The main discrepancies are that the simulator allows for only one characterization of optical aberrations across the whole image, supports only discrete atom locations, and does not model all effects of complementary metal-oxide-semiconductor (CMOS) cameras perfectly. Nevertheless, our experiments show that the generated images closely match real-world pictures to the point that they are practically indistinguishable and can be used as labeled data for training the next generation of detection algorithms. © 2023 IEEE.

Keywords: Atoms

Proximity-based service discovery for distributed digital twin systems

Discover Internet of Things, 2025, Vol. 5, No. 1, pp. 1-24

Authors: Rothermel, Kurt; Herzog, Otthein; Zhiqiang, Siegfried Wu. Affiliations: Institute of Parallel and Distributed Systems, University of Stuttgart, Universitaetsstr. 38, Stuttgart 70569, Germany; College of Architecture and Urban Planning, Tongji University, 1239 Siping Road, Shanghai 200092, China; Department of Mathematics and Computer Science, University of Bremen, Bibliothekstrasse 5, Bremen 23884, Germany

Over the past decade, there has been a significant increase in interest in digital twin (DT) technology in a variety of domains. While research on DTs of single assets was initially prevalent, there has been a notable shift towards distributed systems of DTs, which connect to each other to collaborate. Typically, collaboration is enabled by DTs providing services that can be consumed by other DTs. In service-oriented systems, a service is typically identified by type information. However, this is not sufficient in distributed DT systems, where DTs associated with different physical entities may provide the same type of service. Consequently, selecting the appropriate service depends not only on the service type, but also on the associated physical entity. However, requiring DTs to know the mapping of services to their physical environment is not feasible for large dynamic systems. This paper presents a novel proximity-based service discovery method that allows DTs to select services based on service type and their proximity to other objects. That is, service specifications are fully abstracted from the mapping of services to physical objects, relieving DTs from maintaining information about this mapping. Furthermore, service discovery is robust to changes in the physical environment and service population. The proposed service discovery method has been implemented on top of a spatial DBMS. We argue that this implementation is optimal in terms of network utilization and latency, and perform comprehensive evaluations to show the performance of discovery queries as a function of their complexity. © The Author(s) 2025.

Keywords: Location based services
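
The selection rule described in the abstract above (choose a service by its type and by its proximity to a reference object) can be illustrated with a minimal sketch. This is not the paper's implementation, which is built on a spatial DBMS; it is an in-memory linear scan under stated assumptions: 2D coordinates, Euclidean distance, and hypothetical names (`Service`, `discover`, `max_distance`) that do not come from the paper.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Service:
    """Hypothetical record: a service offered by a digital twin,
    tagged with the position of its associated physical entity."""
    name: str
    service_type: str
    x: float
    y: float


def discover(services, service_type, near, max_distance):
    """Return services of the requested type whose physical entity lies
    within max_distance of the point `near`, nearest first."""
    nx, ny = near
    hits = []
    for s in services:
        if s.service_type != service_type:
            continue  # type information alone is the first filter
        d = hypot(s.x - nx, s.y - ny)
        if d <= max_distance:
            hits.append((d, s))
    hits.sort(key=lambda pair: pair[0])  # order by distance only
    return [s for _, s in hits]
```

The caller never names a physical entity: it supplies only a type and a location, so the mapping from services to physical objects stays inside the discovery layer, which is the abstraction the abstract emphasizes.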

Leveraging Hybrid Classical-Quantum Methods for Efficient Load Rebalancing in HPC

2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024

Authors: Zawalska, Justyna; Chung, Minh; Rycerz, Katarzyna; Schulz, Laura; Schulz, Martin; Kranzlmuller, Dieter. Affiliations: AGH University of Krakow, Institute of Computer Science, Krakow, Poland; CYFRONET AGH, Academic Computer Center, Krakow, Poland; Garching bei München, Germany; Chair for Computer Architecture and Parallel Systems, Germany; MNM-Team, Germany

ISBN: (print) 9798350355543

Load imbalance is a challenge for parallel applications in High Performance Computing (HPC). It is caused by processes having different execution times or load values, leading to idle or wait times at synchronization points, where faster processes must wait for the slowest process to catch up. To mitigate this issue, applications can employ load balancing (LB) strategies, which migrate load between processes to even out load. This is often referred to as the Load Rebalancing Problem (LRP). While many approaches solving the LRP exist, they can only be heuristics and hence further optimization potential exists. In our work, we turn to a novel approach by using hybrid classical-quantum approaches and present two versions of the constrained quadratic model for solving the LRP; the two differ in how they balance the number of qubits required with the types of applied constraints. We compare the quantum-based methods with classical methods using the heuristic algorithms Greedy, Karmarkar-Karp (KK), and ProactLB. We evaluate our approaches using imbalance ratio and speedup as metrics, as well as the number of migrated tasks to indicate overhead caused by migrations. Our results show that the quantum-based methods outperform the classical methods. For example, we need only 1/4 of the number of migrated tasks in a realistic use case compared with classical methods, particularly Greedy and KK, to balance the load. © 2024 IEEE.

Keywords: CQM; HPC; HPCQC integration; Load Rebalancing; Quantum Computing; task migration
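
To make the metrics named in the abstract above concrete, the following minimal sketch shows a classical greedy baseline together with an imbalance ratio and a migration count. It rests on assumptions the abstract does not state: the imbalance ratio is taken as (max load - mean load) / mean load, "Greedy" is taken to be the common largest-task-first, least-loaded-process heuristic, and all function names are hypothetical. It does not reproduce the paper's constrained quadratic models.

```python
import heapq


def imbalance_ratio(loads):
    """Assumed definition: (max - mean) / mean; 0.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean


def greedy_rebalance(tasks, num_procs):
    """Assign tasks, largest first, to the currently least-loaded process.

    tasks maps task id -> load; returns process id -> list of task ids.
    """
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, process id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    for task_id, load in sorted(tasks.items(), key=lambda kv: -kv[1]):
        total, p = heapq.heappop(heap)
        assignment[p].append(task_id)
        heapq.heappush(heap, (total + load, p))
    return assignment


def count_migrations(original_owner, assignment):
    """Number of tasks whose new process differs from their original one."""
    return sum(
        1
        for p, task_ids in assignment.items()
        for t in task_ids
        if original_owner[t] != p
    )
```

This greedy variant ignores where tasks currently reside, so it can balance the load perfectly while still moving many tasks. That trade-off is why the abstract reports the number of migrated tasks alongside the imbalance ratio.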

VersaSlot: Efficient Fine-grained FPGA Sharing with *** Slots and Live Migration in FPGA Cluster

arXiv, 2025

Authors: Gu, Jianfeng; Wang, Hao; Guo, Xiaorang; Schulz, Martin; Gerndt, Michael. Affiliation: Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Munich, Germany

As FPGAs gain popularity for on-demand application acceleration in data center computing, dynamic partial reconfiguration (DPR) has become an effective fine-grained sharing technique for FPGA multiplexing. However, current FPGA sharing encounters partial reconfiguration contention and task execution blocking problems introduced by the DPR, which significantly degrade application performance. In this paper, we propose VersaSlot, an efficient spatio-temporal FPGA sharing system with novel *** slot architecture that can effectively resolve the contention and task blocking while improving resource utilization. For the heterogeneous *** architecture, we introduce an efficient slot allocation and scheduling algorithm, along with a seamless cross-board switching and live migration mechanism, to maximize FPGA multiplexing across the cluster. We evaluate the VersaSlot system on an FPGA cluster composed of the latest Xilinx UltraScale+ FPGAs (ZCU216) and compare its performance against four existing scheduling algorithms. The results demonstrate that VersaSlot achieves up to 13.66x lower average response time than the traditional temporal FPGA multiplexing, and up to 2.19x average response time improvement over the state-of-the-art spatio-temporal sharing systems. Furthermore, VersaSlot enhances the LUT and FF resource utilization by 35% and 29% on average, respectively. Copyright © 2025, The Authors. All rights reserved.

Keywords: Scheduling algorithms

Design of an FPGA-Based Neutral Atom Rearrangement Accelerator for Quantum Computing

Design, Automation and Test in Europe Conference and Exhibition

Authors: Xiaorang Guo; Jonas Winklmann; Dirk Stober; Amr Elsharkawy; Martin Schulz. Affiliation: Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Garching, Germany

ISBN: (digital) 9783982674100

ISBN: (print) 9798331534646

Keywords: computers; Schedules; Quantum system; Quantum computing; Scalability; Qubit; Atoms; Central Processing Unit; Field programmable gate arrays; Assembly
