With the rapid development of the tourism industry, traditional tourism methods are undergoing significant transformation, and online tourism is gradually becoming a new highlight in the market. However, faced with th...
With the continuous development of digital image processing algorithms, their application scenarios have expanded from the simple study of a single image and a single algorithm to a multi-algorithm fusion anal...
Multivariate time series anomaly detection (MTAD) poses a challenge due to temporal and feature dependencies. The critical aspects of enhancing the detection performance lie in accurately capturing the dependencies be...
Self-supervised time series anomaly detection (TSAD) demonstrates remarkable performance improvement by extracting high-level data semantics through proxy tasks. Nonetheless, most existing self-supervised TSAD techniq...
B-mode ultrasound tongue imaging is a non-invasive and real-time method for visualizing vocal tract deformation. However, accurately extracting the tongue's surface contour remains a significant challenge due to t...
DIMM-based near-memory processing (NMP) architectures address the 'memory wall' problem by incorporating near-memory accelerators (NMAs) into main memory devices for high memory bandwidth and low energy consum...
In the era of big data, efficiently processing and retrieving insights from unstructured data presents a critical challenge. This paper introduces a scalable leader-worker distributed data pipeline designed to handle ...
ISBN: 9798331506476 (print)
Near DRAM processing (NDP) architectures have emerged as a promising solution for commercializing in-memory computing and addressing the 'memory wall' problem, especially for memory-intensive machine learning (ML) workloads. In NDP architectures, the processing units (PUs) are distributed next to different memory units to exploit the high internal bandwidth. Therefore, to fully utilize the bandwidth advantage of NDP architectures for ML applications, meticulous evaluation and optimization of data placement in DRAM and workload scheduling among different PUs are required. However, existing simulation and compilation tools face two insuperable obstacles to achieving these targets. On the one hand, tools for traditional von Neumann architectures focus only on the data access behavior between the host and DRAM and treat DRAM as a single monolithic unit, so they cannot support NDP architectures with multiple independent processing and memory units working simultaneously. On the other hand, existing NDP simulators and compilers are designed for a specific DRAM technology and NDP architecture, and lack compatibility with other NDP architectures. To overcome these challenges and optimize data mapping and workload scheduling for different NDP architectures, we propose UniNDP, a unified NDP compilation and simulation tool for ML applications. Firstly, we propose a unified tree-based NDP hardware abstraction and the corresponding instruction set, enabling support for various NDP architectures based on different DRAM technologies. Secondly, we design a cycle-accurate and instruction-driven NDP simulator to evaluate hardware performance by accurately tracking the working status of memory elements and PUs. The accurate simulation can provide effective guidance for compilation. Thirdly, we design an NDP compiler that optimizes data partition, mapping, and workload scheduling in different DRAM hierarchies. Furthermore, to enhance the compilation efficiency, we propo