Modern computing has rapidly progressed with ultra-compact devices using the von Neumann architecture, which separates memory from processing. Yet computation-in-memory (CIM) is bridging this separation for effic...
ISBN:
(Print) 9798350391558; 9798350379990
The on-demand provision of computing resources as services is known as cloud computing. The distributed denial of service (DDoS) attack is a major security risk that affects cloud services. Because of the computational complexity that must be handled, detecting DDoS attacks is a very difficult operation for cloud computing. Hence, this research focuses on developing an efficient classifier model using an optimized long short-term memory (LSTM) network based on the partial opposition-based elephant herding optimization (POEHO) method, called POEHO-LSTM, for detecting DDoS attacks. The experimental findings showed that, on two DDoS datasets, NSL-KDD and ISCXIDS-2012, the proposed POEHO-LSTM produced exceptional results.
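The abstract does not spell out the optimizer's internals, but the clan-update step of standard elephant herding optimization (the base of POEHO, shown here without the partial-opposition extension) can be sketched in Python; all names and parameter values below are illustrative, not the paper's:

```python
import random

def eho_minimize(f, dim, n_clans=3, clan_size=5, iters=50,
                 alpha=0.5, beta=0.1, lo=-5.0, hi=5.0, seed=0):
    """Minimal elephant herding optimization sketch (illustrative only)."""
    rng = random.Random(seed)
    clans = [[[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(clan_size)] for _ in range(n_clans)]
    for _ in range(iters):
        for clan in clans:
            clan.sort(key=f)                      # matriarch = clan[0]
            best = clan[0]
            center = [sum(x[d] for x in clan) / len(clan) for d in range(dim)]
            for i in range(1, len(clan) - 1):     # followers move toward matriarch
                r = rng.random()
                clan[i] = [clan[i][d] + alpha * (best[d] - clan[i][d]) * r
                           for d in range(dim)]
            clan[0] = [beta * c for c in center]  # matriarch moves to scaled center
            clan[-1] = [rng.uniform(lo, hi) for _ in range(dim)]  # separating operator
    best = min((x for clan in clans for x in clan), key=f)
    return best, f(best)
```

In POEHO-LSTM, the position vector would encode LSTM hyperparameters and f would be a validation-loss style objective; the partial-opposition step would additionally evaluate opposite candidates on a subset of dimensions.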
ISBN:
(Print) 9798350386066; 9798350386059
Nowadays, data center switches employ an on-chip shared buffer to absorb bursts and avoid packet loss during transient congestion. However, as the buffer-per-port-per-Gbps in production data centers decreases, it becomes more challenging to provide efficient buffer management that meets the requirements of heterogeneous traffic. We observe that typical shared-buffer management policies take two steps: first, they identify short flows arriving at ports, and then they allocate more buffer room to those ports. Unfortunately, the lack of isolation between long and short flows leads to increased queue buildup and even packet loss for short flows. To address this limitation, we propose DT2, which uses different queue-length thresholds for long and short flows. Specifically, we first design a compact data structure to distinguish between long and short flows. Then, when the two kinds of flows coexist at the same port, the threshold for long flows is decreased to absorb the bursty short flows. We implement DT2 on a P4-programmable switch and in large-scale simulations. The results demonstrate that DT2 reduces the average and tail flow completion times (FCT) of short flows by up to 29% and 62%, respectively, compared with state-of-the-art policies.
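Per-class threshold schemes like this one build on the classic Dynamic Threshold rule, which caps each queue at a multiple of the remaining free buffer. A minimal sketch follows; the two-alpha split for long versus short flows is our illustration of the idea, not the paper's exact mechanism:

```python
def dt_threshold(alpha, total_buffer, occupied):
    """Classic Dynamic Threshold: a queue may grow to alpha * free buffer."""
    return alpha * (total_buffer - occupied)

def admit(packet_len, queue_len, alpha, total_buffer, occupied):
    """Accept the packet only if the queue stays under its DT threshold."""
    return queue_len + packet_len <= dt_threshold(alpha, total_buffer, occupied)
```

In the spirit of the abstract, a port could assign short flows a larger alpha than long flows, and shrink the long-flow alpha further while a short-flow burst is being absorbed.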
ISBN:
(Print) 9783031506833; 9783031506840
In recent years, one-sided communication has emerged as an alternative to message-based communication to improve the scalability of distributed programs. Decoupling communication and synchronization in such programs allows for more asynchronous execution of processes, but introduces new challenges for ensuring program correctness and efficiency. The concept of memory access diagrams presented in this paper opens up a new analysis perspective to the programmer. Our approach visualizes the interaction of synchronous, asynchronous, and remote memory accesses. We present an interactive tool that can be used to perform a postmortem analysis of a distributed program execution. The tool supports hybrid parallel programs, shared MPI windows, and GASPI communication operations. In two application studies taken from the European aerospace industry, we illustrate the usefulness of memory access diagrams for visualizing and understanding the logical causes of programming errors and performance flaws, and for finding optimization opportunities.
ISBN:
(Print) 9798350386066; 9798350386059
The Industrial Edge Computing (IEC) network has recently received considerable attention; in it, industrial devices offload their computation-intensive and delay-sensitive tasks to servers located at the network edge. Task offloading scheduling is a fundamental problem in IEC networks for achieving satisfactory quality of service. Many prior efforts have been devoted to scheduling task offloading for networks with complete information, yet complete information is hard or even infeasible for the scheduler to acquire. Therefore, their performance degrades in IEC networks with incomplete information. Scheduling task offloading for IEC networks with incomplete information is urgent and presents great technical challenges. This paper proposes a group-centric task offloading framework tailored for IEC networks with incomplete information, and models the minimum-delay scheduling problem as a Partially Observable Markov Decision Process. Then, the SGOS algorithm, which integrates Long Short-Term Memory with Soft Actor-Critic networks in reinforcement learning, is proposed to devise online task offloading schedules for IEC networks with incomplete information. Extensive experimental results verify that the SGOS algorithm achieves the best performance compared with baseline schemes on major metrics, including convergence, delay, and workload balance.
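A common way to handle partial observability, plausibly similar in spirit to SGOS's LSTM component, is to fold the observation history into a fixed-size hidden state before the actor scores offloading targets. This numpy sketch uses made-up shapes and random weights, not the paper's architecture:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h.size
    z = W @ x + U @ h + b                         # stacked gate pre-activations
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c_new = sig(f) * c + sig(i) * np.tanh(g)      # forget old state, write new
    h_new = sig(o) * np.tanh(c_new)
    return h_new, c_new

def encode_history(obs_seq, W, U, b, H):
    """Fold a sequence of partial observations into a fixed-size state."""
    h, c = np.zeros(H), np.zeros(H)
    for x in obs_seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

def offload_probs(h, Wa):
    """Softmax policy head over candidate edge servers (actor sketch)."""
    logits = Wa @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The actor would sample an offloading target from these probabilities, while a critic trained with the Soft Actor-Critic objective estimates the delay-related return.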
ISBN:
(Print) 9798350386066; 9798350386059
Machine Learning as a Service (MLaaS) has paved the way for numerous applications for resource-limited clients, such as IoT/mobile users. However, it raises a great challenge for privacy, covering both the data privacy of clients and the model privacy of the server. While there have been extensive studies on privacy-preserving MLaaS, a direct adoption of current frameworks leads to an intractable efficiency bottleneck for MLaaS with resource-constrained clients. In this paper, we focus on MLaaS with resource-constrained clients and propose a novel privacy-preserving framework called SPOT to address a unique challenge: the memory constraint of such clients, such as IoT/mobile devices, which results in significant computation stalls at the server in privacy-preserving MLaaS. We develop 1) a novel structure patching scheme to enable independent computations for sequential inputs at the server, eliminating the computation stall, and 2) a patch overlap tweaking scheme to minimize the overlapped data between adjacent patches and thus enable more efficient computation with flexible cryptographic parameters. SPOT demonstrates significant improvement in computation efficiency for MLaaS with IoT/mobile clients. Compared with the state-of-the-art framework for privacy-preserving MLaaS, SPOT achieves up to a 2x memory utilization boost and a speedup of up to 3x in computation time for modern neural networks such as ResNet and VGG.
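SPOT's patching schemes are not specified in the abstract, but the basic bookkeeping of covering an input with fixed-size, minimally overlapping patches can be sketched as follows; this is a hypothetical helper, not SPOT's actual code:

```python
def make_patches(n, patch, overlap):
    """Cover [0, n) with fixed-size patches; adjacent patches share
    `overlap` items so each patch can be processed independently."""
    assert patch <= n and 0 <= overlap < patch
    step = patch - overlap
    starts = list(range(0, n - patch + 1, step))
    if starts[-1] + patch < n:
        starts.append(n - patch)      # one extra patch to cover the tail
    return [(s, s + patch) for s in starts]
```

Minimizing `overlap` while keeping patches valid for the model's receptive field is, roughly, the trade-off the abstract's "patch overlap tweaking" addresses.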
ISBN:
(Print) 9798350364613; 9798350364606
We introduce a distributed-memory parallel algorithm for force-directed node embedding that places the vertices of a graph into a low-dimensional vector space based on the interplay of attraction among neighboring vertices and repulsion among distant vertices. We develop our algorithms using two sparse matrix operations, SDDMM and SpMM. We propose a configurable pull-push-based communication strategy that optimizes memory usage and data transfers based on the available computing resources, and asynchronous MPI communication to overlap communication and computation. Our algorithm scales up to 256 nodes on distributed supercomputers while surpassing the performance of state-of-the-art algorithms.
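A single shared-memory update step of such a force-directed embedding can be sketched with numpy; the paper's distributed SDDMM/SpMM formulation and sampled repulsion are replaced here by dense stand-ins:

```python
import numpy as np

def force_step(A, X, lr=0.05):
    """One force-directed update on embedding X (n, d) for adjacency A (n, n).
    Attraction is an SpMM-like product with A; repulsion is computed densely
    over all pairs (real implementations sample distant vertices instead)."""
    deg = A.sum(axis=1, keepdims=True)
    attract = A @ X - deg * X                        # pull toward neighbors
    diff = X[:, None, :] - X[None, :, :]             # pairwise differences
    dist2 = (diff ** 2).sum(-1, keepdims=True) + 1e-9
    repel = (diff / dist2).sum(axis=1)               # push away from all others
    return X + lr * (attract + repel)
```

In the distributed setting, the attraction term is where SpMM applies, and the pairwise similarity scores needed for weighted repulsion are where SDDMM applies.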
ISBN:
(Print) 9798350364613; 9798350364606
Deploying Deep Learning (DL) models on edge devices presents several challenges due to the limited set of processing and memory resources and the bandwidth constraints, all while ensuring performance and energy requirements. In-memory computing (IMC) represents an efficient way to accelerate the inference of data-intensive DL tasks on the edge. Recently, several analog, digital, and mixed digital-analog memory technologies have emerged as promising solutions for IMC. Among them, digital SRAM IMC exhibits deterministic behavior and compatibility with advanced technology scaling rules, making it a viable path for integration with hardware accelerators. This work focuses on discussing the potentially powerful aspects of digital IMC (DIMC) on edge System-on-Chip (SoC) devices. The limitations and open challenges of DIMC are also discussed.
ISBN:
(Print) 9798400708893
The primary targets for improving efficiency in large-scale matrix factorization are reducing synchronization, overlapping communication and computation, and improving load balance. In recent years, tiled algorithms with task parallelism on multicore shared-memory systems have become well established as efficient methods for conducting fine-grained computations on smaller tiles. Moreover, they give a runtime system flexible execution orders in many situations. However, traditional hybrid programs with MPI and OpenMP for distributed-memory systems use a fork-join model for the threads in each process, which leads to thread-parallel computation phases alternating with sequential communication phases. In this paper, we incorporate task parallelism and low-rank approximation into a hybrid task-based Cholesky factorization in a distributed environment and propose several low-rank variants. We evaluate the performance of our programs on both full-rank and low-rank inputs and report the pros and cons of the proposed programs.
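The tile-task sequence such algorithms build on (POTRF, TRSM, SYRK, GEMM per tile) can be sketched serially with numpy; a task runtime would instead schedule these kernels by their data dependencies, and the low-rank variants are omitted:

```python
import numpy as np

def tiled_cholesky(A, nb):
    """Right-looking tiled Cholesky sketch (lower factor), tile size nb."""
    n = A.shape[0]
    assert n % nb == 0
    T = n // nb
    A = A.copy()
    blk = lambda i, j: (slice(i * nb, (i + 1) * nb), slice(j * nb, (j + 1) * nb))
    for k in range(T):
        A[blk(k, k)] = np.linalg.cholesky(A[blk(k, k)])        # POTRF task
        Lkk = A[blk(k, k)]
        for i in range(k + 1, T):                              # TRSM tasks
            A[blk(i, k)] = np.linalg.solve(Lkk, A[blk(i, k)].T).T
        for i in range(k + 1, T):                              # trailing update
            Lik = A[blk(i, k)]
            A[blk(i, i)] -= Lik @ Lik.T                        # SYRK task
            for j in range(k + 1, i):
                A[blk(i, j)] -= Lik @ A[blk(j, k)].T           # GEMM task
    return np.tril(A)
```

The TRSM and update tasks within one panel are independent of each other, which is exactly the parallelism a task runtime exploits and a fork-join MPI+OpenMP program struggles to overlap with communication.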
ISBN:
(Print) 9798350391961; 9798350391954
This paper presents a comprehensive study on optimizing resource allocation in cloud computing environments using an ensemble of machine learning techniques and optimization algorithms. We developed a multifaceted approach, integrating Long Short-Term Memory (LSTM) networks for forecasting resource demands, Particle Swarm Optimization (PSO) for initial resource allocation, Q-learning for dynamic resource adjustment, and Linear Regression (LR) for predicting energy consumption. Our LSTM model demonstrated high accuracy in demand forecasting, with detailed performance metrics indicating its effectiveness in diverse scenarios. The PSO algorithm significantly enhanced the efficiency of resource distribution, evidenced by a reduction in the number of utilized units. Q-learning contributed to the system's adaptability, optimizing resource allocation based on changing demands in real time. The LR model accurately predicted energy consumption, aligning closely with observed data and highlighting the potential for energy-efficient cloud management.
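As an illustration of the PSO stage, a minimal particle swarm minimizing a generic allocation cost can be sketched as follows; the cost function, bounds, and hyperparameters are placeholders, not the paper's:

```python
import random

def pso_minimize(f, dim, n=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 lo=0.0, hi=10.0, seed=0):
    """Minimal PSO sketch: positions = candidate resource allocations,
    f = cost to minimize (e.g. predicted SLA violations plus energy)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]                       # per-particle best positions
    g = min(P, key=f)[:]                        # swarm-wide best position
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (P[i][d] - X[i][d])
                           + c2 * rng.random() * (g[d] - X[i][d]))
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))  # clamp to bounds
            if f(X[i]) < f(P[i]):
                P[i] = X[i][:]
                if f(P[i]) < f(g):
                    g = P[i][:]
    return g, f(g)
```

In the paper's pipeline, the LSTM forecast would feed the cost function, PSO would produce the initial allocation, and Q-learning would then adjust it online as demand drifts.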