Cloud computing covers both the online storage of digital assets and the online use of applications. Distributed information systems store and transmit data over a computer network, and both the volume of data and the effort needed to study and track outcomes have grown, so cloud-based parallel-processing techniques must be improved. Our research speeds up the completion of interactive composite jobs. In this architecture, mobile phones participate as servers in distributed parallel processing combined with cloud computing, an update that users with restricted availability may find useful for completing demanding jobs. The proposed strategy is backed by cloud computing, which can scale out to an effectively unlimited number of web servers. Worker servers register automatically with the coordinating web server, which mediates between clients and cloud storage. To ensure timely completion, the server distributes matrix-algebra tasks to all subscribed workstations, uploads the results to the cloud, and then returns them to the user. The user requested a large number of pairs of two-dimensional matrices, and response times were compared against the number of available servers.
ISBN: (print) 9783031856372
The proceedings contain 24 papers. The special focus in this conference is on Parallel and Distributed Processing Techniques. The topics include: Parallel N-Body Performance Comparison: Julia, Rust, and More; REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments; An Efficient Data Provenance Collection Framework for HPC I/O Workloads; Using Minicasts for Efficient Asynchronous Causal Unicast and Byzantine Tolerance; A Comparative Study of Two Matrix Multiplication Algorithms Under Current Hardware Architectures; Is Manual Code Optimization Still Required to Mitigate GPU Thread Divergence? Applying a Flattening Technique to Observe Performance; Towards Automatic, Predictable and High-Performance Parallel Code Generation; Attack Graph Generation on HPC Clusters; Analyzing the Influence of File Formats on I/O Patterns in Deep Learning; Inference of Cell–Cell Interactions Through Spatial Transcriptomics Data Using Graph Convolutional Neural Networks; Natural Product-Like Compound Generation with Chemical Language Models; Improved Early-Modern Japanese Printed Character Recognition Rate with Generated Characters; Improved Method for Similar Music Recommendation Using Spotify API; Reconfigurable Virtual Accelerator (ReVA) for Large-Scale Acceleration Circuits; Building Simulation Environment of Reconfigurable Virtual Accelerator (ReVA); Vector Register Sharing Mechanism for High Performance Hardware Acceleration; Efficient Compute Resource Sharing of RISC-V Packed-SIMD Using Simultaneous Multi-threading; Introducing Competitive Mechanism to Differential Evolution for Numerical Optimization; Hyper-heuristic Differential Evolution with Novel Boundary Repair for Numerical Optimization; Jump Like a Frog: Optimization of Renewable Energy Prediction in Smart Grid Based on Ultra Long Term Network; Vision Transformer-based Meta Loss Landscape Exploration with Actor-Critic Method; Fast Computation Method for Stopping Condition of Range Restricted
With the continuous development of smart cars, many high-precision cameras have been fitted to vehicles. As a result, more video is processed over the in-vehicle computer network, which raises the risk of information leakage. To address this issue, a video desensitization model is proposed that desensitizes license-plate and facial information in high-definition video, maximizing privacy protection without losing video clarity or vehicle-recognition accuracy. The model is an improvement and upgrade of YOLOv7-tiny: GSConv replaces the original convolution to reduce model parameters and increase running speed, and the original neck structure is redesigned as a proposed GSN framework that replaces YOLOv7-tiny's ELAN structure, further improving model accuracy.
Artificial intelligence (AI) model inference in browsers is performance-constrained, while transmitting data to a cloud server consumes substantial transfer time. In this paper, we investigate the status quo of cloud and browser processing and explore methods for partitioning model computation. Our study is rooted in WebGPU and employs the *** framework, covering seven AI models that span the computer vision, natural language processing, and automatic speech recognition domains. Leveraging the characteristics of neural network layers, we find that partitioning AI models at layer granularity yields a significant performance boost. We design a system called WebInf that partitions AI models at layer granularity between the browser and the server, using adaptive model partitioning for faster inference. WebInf supports diverse hardware, wireless networks, neural network structures, and servers, and adapts the partition for optimal inference performance. We evaluate WebInf on two laptops and servers, demonstrating inference-time improvements of 30% and 52%, respectively, compared with running inference entirely on the server or entirely in the browser; the improvements peak at 33% and 69%, respectively.
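The layer-granularity partitioning idea can be illustrated with a small sketch (the timing numbers and function name are made up for illustration, not WebInf's actual interface): run the first k layers in the browser, ship the intermediate activation, and run the remaining layers on the server, choosing the cut that minimizes total latency.

```python
# Illustrative layer-granularity partition search. All timings are
# hypothetical placeholders, not measurements from the paper.

def best_split(browser_ms, server_ms, transfer_ms):
    """Return (cut_index, total_ms) minimizing end-to-end latency.

    browser_ms[i] / server_ms[i]: per-layer times on each side;
    transfer_ms[k]: cost of sending the activation produced after
    layer k (transfer_ms[0] is the cost of sending the raw input).
    """
    n = len(browser_ms)
    best = None
    for k in range(n + 1):            # k layers in browser, n-k on server
        total = sum(browser_ms[:k]) + transfer_ms[k] + sum(server_ms[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best

browser = [4.0, 6.0, 9.0]             # browser is slower per layer
server  = [1.0, 2.0, 3.0]
xfer    = [20.0, 5.0, 4.0, 12.0]      # activations shrink mid-network
print(best_split(browser, server, xfer))   # cut after layer 1
```

An adaptive system would re-run this search whenever the measured per-layer times or network bandwidth change.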
Traditional kernel network processing suffers from high delay and overhead, which has become the bottleneck of high-speed networks. A natural way to accelerate packet processing is to bypass the kernel network stack and process packets directly in user space, e.g., with DPDK. However, because many network functions are implemented in the kernel network stack, bypassing it means the required functions must be redesigned elsewhere, leading to poor compatibility. One promising technology that addresses this problem is the eXpress Data Path (XDP), which supports high-performance packet processing while preserving the kernel stack. Existing solutions, however, mainly run XDP in software mode, resulting in relatively poor packet-processing performance. Fortunately, with the development of programmable hardware, running XDP in hardware mode is a more promising approach. Thus, in this paper, we design and implement OXDP, the first-of-its-kind work on accelerating packet processing by offloading XDP to SmartNICs. Since today's SmartNICs are still subject to the limitations of a rigid runtime environment, offloading XDP to them is nontrivial. To address this, OXDP performs best-effort offloading based on primitive packet operations, maximizing the use of the SmartNIC's resources. Specifically, OXDP splits the forwarding function into two parts: one offloaded to the SmartNIC as hardware XDP and the other deployed on the host. We evaluate the efficiency of OXDP with comprehensive experiments. The results show that OXDP's forwarding rate reaches 18.7 Mpps, a 30× improvement over the single-core performance of software XDP.
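The best-effort split the abstract describes can be sketched abstractly: functions built only from primitives the SmartNIC supports go to hardware, everything else stays on the host. The primitive set and rule format below are assumptions for illustration, not OXDP's actual interface.

```python
# Hypothetical sketch of best-effort offload partitioning. The primitive
# names and rule structure are illustrative only.

NIC_PRIMITIVES = {"parse_eth", "match_ipv4", "rewrite_mac", "redirect"}

def split_rules(rules):
    """Partition (name, ops) rules into (offloaded, host) lists."""
    offloaded, host = [], []
    for name, ops in rules:
        (offloaded if set(ops) <= NIC_PRIMITIVES else host).append(name)
    return offloaded, host

rules = [
    ("l2_forward", ["parse_eth", "rewrite_mac", "redirect"]),
    ("ipv4_route", ["parse_eth", "match_ipv4", "redirect"]),
    ("deep_inspect", ["parse_eth", "payload_scan"]),  # needs host CPU
]
print(split_rules(rules))
```

In this toy model, packets whose processing needs only offloaded rules never touch the host CPU, which is where the hardware-mode speedup comes from.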
Convolutional neural networks (CNNs) have attracted increasing attention and are widely used in image processing, bioinformatics, and other fields. As cloud computing and multiparty computation boom, the training and inference data of a CNN often come from diverse users. These users want to perform the computation jointly but are reluctant to share their original data with others. Multi-key fully homomorphic encryption (MKFHE) supports homomorphic computation on ciphertexts encrypted with different keys, which makes it especially suitable for this scenario. In this paper, we first propose secure convolution, matrix multiplication, comparison, and maximum protocols based on MKFHE. We then design secure CNN training and inference frameworks, outsourcing almost all computation to the cloud server. To improve efficiency, we use the key-switching technique for ciphertext transformation. We prove that the proposed frameworks are secure and feasible, and theoretical and experimental analysis shows that our framework achieves a trade-off between security, efficiency, and scalability.
ISBN: (print) 9781665408783
The proceedings contain 117 papers. The topics discussed include: a novel iBeacon deployment scheme for indoor pedestrian positioning; near-optimal resource allocation and virtual network function placement at network edges; predicting downside in stock market using knowledge and news data; two-layer traffic signal optimization: an EDGE-assisted pressure balance approach based on cooperative game; passenger payment willingness prediction by static and dynamic multi-dimensional ticket attributes fusion; towards network-accelerated ML-based distributed computer vision systems; ATO-EDGE: adaptive task offloading for deep learning in resource-constrained edge computing systems; anti-replay: a fast and lightweight voice replay attack detection system; an electromagnetic covert channel based on neural network architecture; choosing appropriate AI-enabled edge devices, not the costly ones; and BRNN-GAN: generative adversarial networks with bi-directional recurrent neural networks for multivariate time series imputation.
Neural Architecture Search (NAS), the process of automatic network architecture design, has enabled remarkable progress over the last years on Computer Vision tasks. In this paper, we propose a novel and efficient NAS...
With large-scale access of distributed photovoltaics, controllable loads, and energy-storage devices to the low-voltage distribution network, the requirements on the transmission quality and processing efficiency of power line communication data are rising rapidly. However, a high proportion of power-electronic components introduces colored noise, which strongly affects the channel environment of power line communication. This paper first constructs a multi-layer hierarchical power line communication networking architecture. Then, an optimal subcarrier selection method based on dynamic reliability perception is proposed to meet the high-speed transmission requirements of renewable power system services. Simulation results show that the algorithm performs well both in the probability of selecting the optimal subcarrier over iterations and in data transmission efficiency.
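The abstract does not spell out the selection algorithm, so the sketch below uses a generic epsilon-greedy estimator as a stand-in for reliability-aware subcarrier selection: track each subcarrier's observed success rate and usually pick the best one, occasionally exploring. All names are illustrative.

```python
# Generic epsilon-greedy stand-in for reliability-aware subcarrier
# selection (the paper's actual algorithm is not described here).
import random

def select_subcarrier(successes, trials, eps=0.1, rng=random):
    """Explore a random subcarrier with probability eps, else exploit."""
    if rng.random() < eps:
        return rng.randrange(len(successes))
    # Untried subcarriers default to rate 1.0 so they get sampled early.
    rates = [s / t if t else 1.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=lambda i: rates[i])

def record(successes, trials, idx, ok):
    """Update the reliability estimate after a transmission attempt."""
    trials[idx] += 1
    successes[idx] += 1 if ok else 0

# With eps=0.0 the choice is fully deterministic: highest success rate.
print(select_subcarrier([45, 18, 30], [50, 50, 50], eps=0.0))  # index 0
```

As channel conditions drift, the estimates adapt with each `record` call, which mirrors the "dynamic perception" aspect of the proposed method at a very coarse level.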
ISBN: (print) 9798350364613; 9798350364606
This study focuses on implementing a real-time control system for a particle accelerator facility that performs high-energy physics experiments. A critical operating parameter in this facility is beam loss: the fraction of particles deviating from the accelerated proton beam into a cascade of secondary particles. Accelerators employ a large number of sensors to monitor beam loss. The data from these sensors is watched by human operators, who estimate the relative contribution of different sub-systems to the beam loss and engage control interventions accordingly. In this paper, we present a controller that tracks this phenomenon in real time using edge machine learning (ML) and supports control with low latency and high accuracy. We implemented the system on an Intel Arria 10 SoC, with optimizations at the algorithm, high-level synthesis, and interface levels to improve latency and resource usage. Our design implements a neural network that predicts the main source of beam loss (between two possible causes) at up to 575 frames per second (fps), with an average latency of 1.74 ms. The deployed system is required to operate at 320 fps with a 3 ms latency budget, requirements that our design meets.
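The quoted figures are easy to sanity-check: at 575 fps the per-frame budget is 1000/575 ≈ 1.74 ms, matching the reported average latency. A tiny sketch (the helper name is illustrative):

```python
# Sanity check of the throughput/latency figures quoted above.

def meets_requirements(fps, latency_ms, req_fps=320, req_latency_ms=3.0):
    """True if the design satisfies both the fps and latency targets."""
    return fps >= req_fps and latency_ms <= req_latency_ms

print(round(1000 / 575, 2))           # per-frame time in ms at 575 fps
print(meets_requirements(575, 1.74))  # True: 575 >= 320 and 1.74 <= 3.0
```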