ISBN (Digital): 9798350317152
ISBN (Print): 9798350317169
In key-value storage scenarios where storage space is at a premium, our focus is on a class of solutions that store only the value, which is highly space-efficient. While these solutions have proven their worth in distributed storage, networking, and bioinformatics, they still face two significant issues: first, their space cost could be further reduced; second, they are vulnerable to update failures, which can necessitate a complete table reconstruction. To address these issues, we introduce VisionEmbedder, a compact key-value embedder with constant-time lookup, fast dynamic updates, and a near-zero risk of reconstruction. VisionEmbedder cuts the storage requirement from 2.2L bits to just 1.6L bits per key-value pair with an L-bit value, and it reduces the chance of update failures by a factor of $n$, where $n$ is the number of keys (for instance, 1 million or more). The trade-off is a minor reduction in query throughput at certain data sizes. The improvements offered by VisionEmbedder have been theoretically validated and hold for any dataset. Additionally, we have implemented VisionEmbedder on both FPGA and CPU platforms, and all code is available as open source.
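For readers unfamiliar with value-only key-value structures, the sketch below illustrates the general idea in the family VisionEmbedder belongs to: each key hashes to a few L-bit cells, a lookup XORs those cells, and only the cells (never the keys) are stored. The three-hash layout, the peeling-based build, the 1.6-cells-per-key budget (chosen only to mirror the 1.6L-bits-per-key figure above), and all names are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of a value-only XOR-table embedder (Bloomier-filter family).
# Illustrative only; NOT VisionEmbedder's actual layout, hashing, or update path.
import random

class XorValueTable:
    def __init__(self, keys, values, value_bits, cells_per_key=1.6, seed=1):
        self.mask = (1 << value_bits) - 1          # values are L-bit integers
        self.m = max(8, int(cells_per_key * len(keys)) + 3)
        self.seed = seed
        self.cells = [0] * self.m                  # only L-bit cells are stored
        self._build(keys, values)

    def _slots(self, key):
        # Three distinct cell indices per key. Python's built-in hash is stable
        # only within one process, which is fine for a sketch.
        rnd = random.Random(hash((key, self.seed)))
        s = set()
        while len(s) < 3:
            s.add(rnd.randrange(self.m))
        return sorted(s)

    def _build(self, keys, values):
        slots = {k: self._slots(k) for k in keys}
        incident = [set() for _ in range(self.m)]
        for k in keys:
            for c in slots[k]:
                incident[c].add(k)
        # Hypergraph peeling: repeatedly remove a key that owns a singleton cell.
        order, stack = [], [c for c in range(self.m) if len(incident[c]) == 1]
        while stack:
            c = stack.pop()
            if len(incident[c]) != 1:
                continue
            (k,) = incident[c]
            order.append((k, c))
            for c2 in slots[k]:
                incident[c2].discard(k)
                if len(incident[c2]) == 1:
                    stack.append(c2)
        if len(order) != len(keys):
            raise RuntimeError("peeling failed; rebuild with a different seed")
        # Assign cells in reverse peel order so each key's three-cell XOR equals its value.
        val = {k: v & self.mask for k, v in zip(keys, values)}
        for k, c in reversed(order):
            self.cells[c] = val[k]
            for c2 in slots[k]:
                if c2 != c:
                    self.cells[c] ^= self.cells[c2]

    def lookup(self, key):
        # Constant time: read three cells and XOR them. Keys that were never
        # inserted return an arbitrary value, as in any value-only structure.
        a, b, c = self._slots(key)
        return (self.cells[a] ^ self.cells[b] ^ self.cells[c]) & self.mask
```

With three hash positions per key, peeling is known to succeed with high probability once the table has roughly 1.23 or more cells per key, which is why the 1.6-cells-per-key budget above typically builds on the first attempt.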
In a Loss of Coolant Accident (LOCA), reactor core temperatures can rise rapidly, leading to potential fuel damage and radioactive material release. This research presents a groundbreaking method that combines the pow...
ISBN (Print): 9798350342543
Understanding program behavior is crucial in computer architecture research, but the growing size of benchmarks makes analyzing and simulating entire programs increasingly challenging. In practice, researchers often select representative program intervals for analysis and testing; each interval is a contiguous section of a program's execution. SimPoint is a well-known method for selecting representative intervals using hardware-independent information. However, when focusing on a specific microarchitecture study, it is desirable to select intervals that are more relevant to that study. For instance, intervals with more branch mispredictions are more appropriate for branch prediction studies. We refer to these intervals as "tailored intervals" for branch prediction studies.

This paper presents Multi-level Behavior Analysis guided Program Interval Selection (MBAPIS) for selecting tailored intervals. For a given microarchitecture study, the first level of MBAPIS uses hardware performance counters to prioritize intervals that exhibit clearer microarchitectural characteristics relevant to that study. The second level analyzes processor performance bottlenecks to further select the intervals where the concerned microarchitecture design more strongly impacts performance. Finally, MBAPIS performs clustering analysis on the basic block information of the intervals selected by the first two levels and picks representative intervals among them while preserving diverse software behavior. Additionally, we present a general and extensible interval-replaying design to accurately re-execute the selected intervals.

The SPEC CPU2006 and CPU2017 benchmarks are used for evaluation. The results demonstrate that MBAPIS can select representative and tailored intervals for two typical microarchitecture studies and deliver accurate estimates of the concerned hardware events for all tailored intervals in each benchmark, with an average error rate of le
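As a rough illustration of the flow this abstract describes (not the authors' code), the sketch below filters intervals with a study-specific counter metric, using an assumed branch-MPKI threshold as a stand-in for level 1, skips the level-2 bottleneck analysis, and then runs a plain k-means over basic-block vectors to keep one representative interval per cluster, SimPoint-style. The function name, threshold, and use of numpy are all assumptions.

```python
import numpy as np

def select_tailored_intervals(bb_vectors, mpki, mpki_threshold=5.0, k=8, iters=50, seed=0):
    """bb_vectors: (num_intervals, num_basic_blocks) basic-block execution counts.
    mpki: per-interval branch mispredictions per kilo-instruction from counters
    (an assumed stand-in for the level-1 metric)."""
    candidates = np.where(np.asarray(mpki) >= mpki_threshold)[0]   # level 1: counter filter
    if len(candidates) == 0:
        return []
    X = np.asarray(bb_vectors, dtype=float)[candidates]
    X /= np.maximum(X.sum(axis=1, keepdims=True), 1e-12)           # normalize BBVs
    k = min(k, len(candidates))
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):                                         # plain k-means clustering
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    reps = []
    for j in range(k):                                             # representative = closest to centroid
        members = np.where(labels == j)[0]
        if len(members):
            best = members[((X[members] - centers[j]) ** 2).sum(axis=-1).argmin()]
            reps.append(int(candidates[best]))
    return sorted(set(reps))
```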
ISBN (Digital): 9798331509712
ISBN (Print): 9798331509729
General Matrix Multiplication (GEMM) is a critical computational operation in scientific computing and machine learning. While traditional GEMM performs well on large matrices, it is inefficient in terms of data transfer and computation for small matrices. Many high-performance computing (HPC) tasks can be decomposed into large batches of small matrix multiplications, and multi-core Digital Signal Processors (DSPs) are commonly used to accelerate such workloads. We present a design for batched fusion small matrix multiplication (BFMM) tailored to multi-core DSP architectures. To address the inefficiency and redundancy in the storage and computational operations associated with batched small matrix multiplication, we design a matrix fusion concatenation strategy, an access coordination mechanism, and a fragment aggregation mechanism. BFMM supports an efficient K-dimension multi-core parallelization strategy, and its parameter constraint model makes it highly portable. BFMM also includes a performance evaluation model that facilitates assessment and verification. Experimental results demonstrate that BFMM outperforms both traditional GEMM (TGEMM) on multi-core DSP and traditional GEMM with concatenated data access (TGEMM Op). For large batches of small matrices, our design achieves 1.21x to 18x higher performance than TGEMM Op on a single-core DSP, and 1.14x to 18.1x on a multi-core DSP.
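To make the fusion idea concrete, here is a toy sketch (numpy on a CPU, not the DSP implementation): many tiny GEMMs that share one right-hand matrix are concatenated along the M dimension so that a single larger GEMM replaces the whole batch. The shared-B restriction, the function names, and the sizes are illustrative assumptions; BFMM's access coordination, fragment aggregation, and K-dimension multi-core split are not modeled here.

```python
import numpy as np

def batched_gemm_naive(As, B):
    # One tiny GEMM per matrix: poor data reuse, many small kernel invocations.
    return [A @ B for A in As]

def batched_gemm_fused(As, B):
    # Fuse the batch by concatenating the A_i along the M dimension, run a
    # single larger GEMM, then split the result back into per-matrix tiles.
    rows = [A.shape[0] for A in As]
    fused = np.concatenate(As, axis=0)               # shape (sum(m_i), K)
    C = fused @ B                                    # one large GEMM
    return np.split(C, np.cumsum(rows)[:-1], axis=0)

# Example: 1024 matrices of size 8x16 multiplied by a shared 16x8 matrix.
As = [np.random.rand(8, 16) for _ in range(1024)]
B = np.random.rand(16, 8)
assert all(np.allclose(x, y) for x, y in zip(batched_gemm_naive(As, B),
                                             batched_gemm_fused(As, B)))
```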
Traditional unlearnable strategies have been proposed to prevent unauthorized users from training on 2D image data. With more 3D point cloud data containing sensitive information, unauthorized usage of this new ...
Basic recursive summation and the common dot product algorithm have backward error bounds that grow linearly with the vector dimension. Blanchard [1] proposed a class of fast and accurate summation and dot product algor...
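For reference, the linear-in-$n$ growth this preview alludes to can be written out from the classical rounding-error analysis (standard textbook material, not taken from the paper itself), with $u$ the unit roundoff: recursive summation computes $\hat{s} = \sum_{i=1}^{n} x_i (1+\theta_i)$ with $|\theta_i| \le \gamma_{n-1} := \frac{(n-1)u}{1-(n-1)u}$, hence $|\hat{s} - \sum_{i=1}^{n} x_i| \le \gamma_{n-1} \sum_{i=1}^{n} |x_i|$. The common dot product algorithm satisfies the analogous bound with $\gamma_n$ in place of $\gamma_{n-1}$; in both cases the error constant grows linearly with the dimension $n$.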
ISBN (Print): 9781665443326
With the increasing adoption of graph neural networks (GNNs) in the graph-based deep learning community, various graph programming frameworks and models have been developed to improve the productivity of GNNs. Current GNN frameworks rely on the GPU as an essential tool to accelerate GNN training. However, it is still challenging to train GNNs on large graphs with limited GPU memory. Unlike traditional neural networks, generating mini-batch data by sampling in GNNs requires complicated tasks such as traversing the graph to select neighboring nodes and gathering their features. This process takes up most of the training time, and we find that the main bottleneck comes from transferring node features from the CPU to the GPU over limited bandwidth. In this paper, we propose Reusing Batch Data, a method that addresses this data transmission problem by exploiting the similarity between adjacent mini-batches to reduce repeated data transfers from CPU to GPU. Furthermore, to reduce the overhead introduced by this method, we design a fast GPU-based algorithm to detect repeated node data, achieving shorter additional computation time. Evaluations on three representative GNN models show that our method reduces transmission time by up to 60% and speeds up end-to-end GNN training by up to 1.79× over state-of-the-art baselines. In addition, Reusing Batch Data effectively saves GPU memory footprint by about 19% to 40% while still reducing training time compared to a static cache strategy.
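A minimal sketch of the reuse idea follows (plain numpy, with the CPU-to-GPU transfer only simulated; the paper's GPU-side duplicate-detection algorithm and cache policy are not reproduced, and all names here are assumptions): features of nodes that also appeared in the previous mini-batch are served from that batch's device-resident tensor, and only the remaining nodes' features are copied from host memory.

```python
import numpy as np

class BatchFeatureReuser:
    def __init__(self, cpu_features):
        self.cpu_features = cpu_features           # (num_nodes, feat_dim) host-side feature table
        self.prev_ids = np.empty(0, dtype=np.int64)
        self.prev_feats = None                     # "device-resident" features of previous batch

    def gather(self, batch_ids):
        batch_ids = np.asarray(batch_ids, dtype=np.int64)
        # Nodes already present in the previous mini-batch can be reused in place.
        reused_mask = np.isin(batch_ids, self.prev_ids)
        feats = np.empty((len(batch_ids), self.cpu_features.shape[1]),
                         dtype=self.cpu_features.dtype)
        if reused_mask.any():
            # Map reused node ids to their rows in the cached tensor.
            pos = {int(i): r for r, i in enumerate(self.prev_ids)}
            rows = [pos[int(i)] for i in batch_ids[reused_mask]]
            feats[reused_mask] = self.prev_feats[rows]
        # Only the missing nodes cross the (simulated) CPU-to-GPU link.
        feats[~reused_mask] = self.cpu_features[batch_ids[~reused_mask]]
        self.prev_ids, self.prev_feats = batch_ids, feats
        return feats, int(reused_mask.sum())       # features + count of reused rows
```

The larger the overlap between consecutive sampled mini-batches, the more rows are served from the cached tensor and the less data crosses the host-to-device link, which is the effect the transmission-time reduction above relies on.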
In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Vario...
Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promising prospect, the vulnerabilities of SAM, especially to universal adversar...