ISBN: (print) 9798350377217; 9798350377200
Multi- and many-core processors based on a network-on-chip (NoC) interconnect are pervasive in computing platforms ranging from server farms to embedded systems. Such complex systems often make wide use of third-party intellectual property elements from untrusted organizations. This manuscript proposes a methodology that combines on-chip traffic monitoring, through the insertion of lightweight counters in the NoC routers, with on-chip analysis, through machine-learning techniques, into a blue-team approach that detects the execution of unintended applications with an average accuracy of 89% and limited overheads in terms of area, power, performance, and timing.
ISBN: (print) 9798350354416; 9798350354409
With the transformation towards industrial intelligence, multi-core processors are increasingly being applied in real-time networked control systems to ensure secure execution of sensing, computing, and actuating tasks under time constraints. However, existing scheduling methods result in either low CPU utilization or many missed task deadlines in dynamic systems. In this paper, we propose a two-layer scheduling architecture to address this issue by fully exploring the complex dependencies between real-time tasks. Specifically, the local layer determines task execution priorities, considering both the dependencies between tasks and their deadline constraints, by using a reinforcement learning approach. Moreover, to better utilize the parallel capabilities of multi-core processors and reduce temporal collisions, this paper minimizes the requested core count for the task set with a greedy strategy. The global layer uses a preemption-based scheduling algorithm and provides a schedulability analysis of multiple task sets. Experimental results validate the correctness of the proposed scheduling approach, and its efficiency is demonstrated through comparisons with a baseline method.
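The abstract does not state which greedy rule is used to minimize the core count. As a rough illustration only, the sketch below assumes each task has a fixed execution window `(start, end)` and applies the classic interval-partitioning greedy: tasks are taken in order of start time and placed on any core that is already free, and a new core is opened only when none is. The function name `min_cores` and the task representation are hypothetical and do not come from the paper.

```python
import heapq

def min_cores(tasks):
    """Greedy interval partitioning (an assumed stand-in for the paper's strategy).

    tasks: list of (start, end) execution windows.
    Returns (core_count, assignment), where assignment[i] is the core of task i.
    """
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    busy = []  # min-heap of (time the core becomes free, core id)
    assignment = [None] * len(tasks)
    cores = 0
    for i in order:
        start, end = tasks[i]
        if busy and busy[0][0] <= start:
            # The earliest-finishing core is free again: reuse it.
            _, core = heapq.heappop(busy)
        else:
            # Every core is still busy at `start`: request one more.
            core = cores
            cores += 1
        assignment[i] = core
        heapq.heappush(busy, (end, core))
    return cores, assignment

count, assignment = min_cores([(0, 3), (1, 4), (3, 5), (4, 6)])
```

For non-preemptive tasks with fixed windows this greedy is known to use the minimum possible number of cores. Dependencies and deadlines, which the paper handles in its local layer, are outside the scope of this sketch.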
In order to help user terminal devices (UTDs) efficiently handle computation-intensive and delay-sensitive tasks, the use of multi-access edge computing (MEC) has been proposed. However, owing to the performance limit...
ISBN: (print) 9798350393613
To find an efficient distributed offloading solution to the multi-task multi-helper (MTMH) problem in fog computing networks, we model it as a matching game between a set of task nodes (TNs) that have task computation needs and a set of helper nodes (HNs) that have available computing resources. However, the uncertainty in the availability of HN computing resources, together with the dynamics of the tasks' QoS requirements, leaves the TN side without well-defined preferences, which poses a critical challenge to obtaining a stable and reliable matching outcome. To address this challenge, we apply multi-armed bandit (MAB) learning with a Thompson sampling (TS) mechanism to achieve a better exploration-exploitation trade-off, allowing TNs to match with their corresponding HNs efficiently. Based on this, the paper proposes an efficient bandit learning based matching (BLM) scheme for distributed task offloading in fog computing networks. Extensive simulation results demonstrate the potential advantages of the TS-type algorithm over the epsilon-greedy and UCB-based offloading algorithms.
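To make the Thompson sampling idea concrete, here is a minimal sketch for a single TN choosing among several HNs. It assumes a Bernoulli reward (an offloaded task either meets its QoS requirement or not) and a Beta(1, 1) prior per HN; both are assumptions made for illustration, and the matching game between multiple TNs and HNs described in the abstract is not modeled. The function name and parameters are hypothetical.

```python
import random

def thompson_sampling(success_prob, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling for one task node.

    success_prob[k] is the (unknown to the learner) probability that helper
    node k completes an offloaded task successfully.
    Returns how many times each helper node was selected.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    wins = [0] * n    # observed successes per helper node
    losses = [0] * n  # observed failures per helper node
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one plausible success rate per helper from its Beta posterior.
        samples = [rng.betavariate(wins[k] + 1, losses[k] + 1) for k in range(n)]
        # Offload to the helper whose sampled rate is highest.
        k = max(range(n), key=samples.__getitem__)
        if rng.random() < success_prob[k]:
            wins[k] += 1
        else:
            losses[k] += 1
        pulls[k] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
```

Exploration comes from the randomness of the posterior draws: a helper with few observations has a wide posterior and is still selected occasionally, while selections concentrate on the best helper as evidence accumulates.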
As multi-core systems continue to grow in complexity, Network-on-Chip (NoC) architectures have emerged as a scalable and efficient solution for managing on-chip communication. However, ensuring reliable communication ...
ISBN: (print) 9798350373769; 9798350373752
Important modern applications such as machine learning, deep learning, graph processing, and databases (among many others) are memory-bound. This creates a bottleneck caused by the movement of large amounts of data between the main memory and the CPU. Processing-in-Memory (PiM) is currently viewed as a useful new paradigm to alleviate such bottlenecks by processing the data where it resides, i.e., in memory itself. Our goal is to analyze the potential of modern general-purpose PiM architectures to accelerate neural networks (NNs), which constantly consume high volumes of new data (i.e., exhibit low data reuse) and are ideal for in-memory processing. We selected the UPMEM system as it is the first commercially available general-purpose PiM architecture. In this work, we implemented a multi-layer perceptron and evaluated the implementation in real use cases. We compared the PiM implementation with a sequential version running on an Intel Xeon Silver 4215 CPU. The UPMEM implementation achieved up to a 260× speedup for inference by exploiting the available PiM memory.
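For readers unfamiliar with the workload, the sketch below shows what sequential multi-layer perceptron inference looks like in plain Python. It is not the authors' code and does not use the UPMEM SDK; the ReLU hidden activation, the linear output layer, and the function names are all assumptions chosen for illustration.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [v if v > 0.0 else 0.0 for v in values]

def dense(weights, biases, inputs):
    """One fully connected layer: weights[j] holds the input weights of neuron j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def mlp_forward(layers, inputs):
    """Sequential MLP inference.

    layers: list of (weights, biases) pairs. Hidden layers use ReLU and the
    last layer is left linear (both are assumptions of this sketch).
    """
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(weights, biases, activations)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer with 2 neurons
    ([[1.0, 1.0]], [-0.5]),                   # output layer with 1 neuron
]
output = mlp_forward(layers, [2.0, 1.0])
```

Each weight is read once per input and then not reused, which is the low-reuse access pattern the abstract identifies as a good fit for in-memory processing.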
ISBN: (print) 9798350393613
When using FPGA clusters for multi-access edge computing (MEC), some tasks need multiple FPGA boards, so each task must be allocated to several boards that are not currently in use on the FPGA cluster. To maintain constant communication bandwidth and latency, the boards should be allocated in contiguous areas of a group of FPGAs. The current task allocation method for our FPGA clusters searches for Maximum Empty Rectangles (MER) based on the Scan Line Algorithm (SLA). However, this algorithm restricts the allocation target to a rectangular shape, which makes cluster utilization inefficient. In this paper, we devise a more flexible algorithm called Non-Rectangular (NR) allocation to improve allocation efficiency. NR allocation provides a hybrid method that chooses between column-major and row-major allocation. With this hybrid method, non-rectangular areas can be allocated while keeping the region contiguous, by considering the boards' positions after allocation. Simulation results show a 42% improvement in total waiting time compared to the conventional method.
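The rectangular baseline can be illustrated with a small sketch that finds the largest empty rectangle on a 2D grid of boards. This uses the standard row-by-row histogram technique with a stack, which is not necessarily the Scan Line Algorithm used by the authors, and it does not implement their NR allocation. The grid encoding (`True` means the board is in use) and the function name are assumptions.

```python
def largest_empty_rectangle(grid):
    """Find the largest all-free rectangle in an occupancy grid.

    grid[r][c] is True when board (r, c) is in use.
    Returns (area, top_row, left_col, height, width); area 0 if no board is free.
    """
    if not grid or not grid[0]:
        return (0, 0, 0, 0, 0)
    cols = len(grid[0])
    heights = [0] * cols  # consecutive free boards ending at the current row
    best = (0, 0, 0, 0, 0)
    for r, row in enumerate(grid):
        for c in range(cols):
            heights[c] = 0 if row[c] else heights[c] + 1
        stack = []  # column indices whose heights are strictly increasing
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                top_h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if top_h * width > best[0]:
                    best = (top_h * width, r - top_h + 1, left, top_h, width)
            stack.append(c)
    return best

grid = [
    [True,  False, False, False],
    [False, False, False, True],
    [False, False, False, False],
]
result = largest_empty_rectangle(grid)
```

In this example 10 of the 12 boards are free, but the largest rectangle covers only 6 of them, so a task that needs 7 boards would have to wait under a purely rectangular policy. This is the kind of inefficiency that motivates non-rectangular allocation.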
Managing cache coherence is essential for optimizing computing performance and energy efficiency in modern multi-/many-core systems. Enabling cache coherence in scalable agile hardware architectures, particularly with...
ISBN: (print) 9798350377859; 9798350377842
Layout Optimization (LO) is gaining increasing attention as competition among enterprises intensifies and the demand for cost reduction and efficiency improvement continues to grow. LO poses its own peculiar challenges: many dimensions, many constraints, and many optima. Although many approaches have been proposed for various LO applications, they focus only on the diversity and convergence of the objective space and neglect the diversity of solutions, which is detrimental to solving potential issues. Therefore, this paper proposes a constrained multimodal multiobjective optimization (CMMO) framework based on dual-population multi-task collaborative evolution. This framework assists in solving the workshop layout problem by creating an auxiliary task population and transferring knowledge to support the optimization process. Additionally, the dynamic narrowing of constraint boundaries and the gradual expansion of the contrast neighborhood for the auxiliary tasks ensure a high degree of correlation with the main task, continuously providing supplementary evolutionary directions. Ultimately, compared to other constrained multimodal multiobjective algorithms and two classic genetic algorithms, the proposed algorithm proves to be more competitive in practical layout optimization problems.
The emergence of many-core processors presents significant opportunities for large-scale multithreading. Exploiting these intensive computing resources poses an urgent challenge for data processing systems. Although c...