Underwater target recognition is one of the most challenging tasks in underwater signal processing. Previous deep learning methods have relied on fusing more acoustic features, ignoring the rich information contained ...
In recent years, with the development of machine learning, large amounts of personal data have been used to train models, which causes severe privacy leakage in the field. Current regulations mandate...
ISBN: (print) 9781450384414
Graph analysis now percolates society, with applications ranging from advertising and transportation to medical research. The structure of graphs is becoming more complex every day while the graphs themselves are getting larger. The increasing size of graph networks has made many of the classical algorithms slow. Fortunately, CPU architectures have evolved to adjust to new and more complex problems in terms of core-level parallelism and vector-level (SIMD) parallelism. In this paper, we explore how the modern vector architecture of CPUs can help with community detection, partitioning, and coloring kernels by studying two representative algorithms. We consider the Intel SkylakeX and Cascade Lake architectures, which support gather and scatter instructions on 512-bit vectors. The existing vectorized algorithms for classic graph problems, such as BFS and PageRank, do not apply well to community detection; we show that support for gather and scatter is necessary, in particular for implementing reduce-scatter patterns. We evaluate the performance achieved on the two architectures and conclude that good hardware support for scatter instructions is necessary to fully leverage vector processing for graph partitioning problems.
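The reduce-scatter pattern mentioned in this abstract can be illustrated with a small scalar emulation. This is a minimal sketch and not the paper's implementation: the function names, the 16-lane width (512-bit vectors of 32-bit elements), and the per-community weight accumulation used as the example kernel are all assumptions made for illustration. It shows why a plain gather, add, scatter sequence loses updates when several lanes target the same index, and why lanes must be reduced before the scatter.

```python
LANES = 16  # assumed: 512-bit vector / 32-bit elements


def gather(table, indices):
    """Emulate a vector gather: read table[i] for every lane index i."""
    return [table[i] for i in indices]


def naive_scatter_add(acc, indices, values):
    """Gather, add, scatter with no conflict handling.

    Every lane reads the old accumulator value first, so when two lanes
    share an index only the last lane's write survives.
    """
    old = gather(acc, indices)
    new = [o + v for o, v in zip(old, values)]
    for i, n in zip(indices, new):
        acc[i] = n


def reduce_scatter_add(acc, indices, values):
    """Combine lanes that share an index (reduce), then write once per index."""
    partial = {}
    for i, v in zip(indices, values):
        partial[i] = partial.get(i, 0.0) + v
    for i, s in partial.items():
        acc[i] += s


def community_weights(neighbors, weights, community, num_communities, scatter_add):
    """Sum edge weights per neighboring community, one vector of lanes at a time."""
    acc = [0.0] * num_communities
    for start in range(0, len(neighbors), LANES):
        nbr = neighbors[start:start + LANES]
        w = weights[start:start + LANES]
        comm = gather(community, nbr)  # community id of each neighbor
        scatter_add(acc, comm, w)      # accumulate weight into that community
    return acc


community = [0, 0, 1, 1, 2]          # hypothetical vertex -> community map
neighbors = [0, 1, 2, 3, 4]          # hypothetical neighbor list of one vertex
weights = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical edge weights

correct = community_weights(neighbors, weights, community, 3, reduce_scatter_add)
lossy = community_weights(neighbors, weights, community, 3, naive_scatter_add)
```

In this example two neighbors fall in community 0 and two in community 1, so the naive version drops one contribution to each, while the reduce-then-scatter version keeps all of them. On real hardware the reduction step would be done with vector instructions rather than a dictionary; the dictionary here only stands in for that step.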
Stencil computation is widely adopted in scientific applications as one of the most significant computation patterns. Although there are various optimizations proposed to accelerate the stencil computation, the low-or...
Amidst global population growth and escalating food demands, real-time agricultural monitoring is crucial for ensuring food security. During the initial stages of crop growth, however, it faces significant challenges ...
ISBN: (digital) 9781665467179
ISBN: (print) 9781665467179
In recent years, various hardware platforms have been developed, each suited to different kinds of applications. Platforms based on FPGAs, DSPs, GPUs, Single Board Computers, and microcontrollers extend processing capabilities and functionality in comparison with traditional personal computers based on a single CPU. Furthermore, co-design combines the advantages of different types of processing units, making such architectures more attractive to researchers. In this paper, we accelerate image processing algorithms using a hardware platform based on a Raspberry Pi Single Board Computer and a custom-designed FPGA HAT (Hardware Attached on Top) for the RPi. The FPGA HAT consists of a Cyclone 10LP device. The FPGA undertakes computationally demanding loads, such as robotic vision algorithms that exploit parallelism, while the RPi handles higher-level operations such as running ROS (Robot Operating System). To overcome the bottleneck in exchanging data between the RPi and the FPGA, a customized 16-bit parallel protocol was developed from scratch. The achieved transfer rate was about 50 Mbytes/sec when multi-threaded software was implemented for the RPi. An image edge detector was implemented to verify the system performance. When only the RPi was used, the processing rate was 48 fps for images with a resolution of 512x512 pixels. The RPi and FPGA co-design achieved a processing rate of 170 fps for images of the same resolution, roughly 3.5 times the RPi-only rate (about 350%). The proposed system was also evaluated in terms of power consumption.
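The figures quoted in this abstract can be checked with a short calculation. This is a rough sketch under stated assumptions: the abstract does not give the pixel format, so 8-bit grayscale (1 byte per pixel) is assumed, only the input frames are counted as link traffic, and 1 Mbyte is taken as 10^6 bytes. The helper names are hypothetical.

```python
def frame_bytes(width, height, bytes_per_pixel=1):
    """Size of one frame in bytes (assumed 8-bit grayscale by default)."""
    return width * height * bytes_per_pixel


def required_rate_mb_s(fps, width, height, bytes_per_pixel=1):
    """Link throughput in Mbytes/sec (10^6 bytes) needed to feed frames at `fps`."""
    return fps * frame_bytes(width, height, bytes_per_pixel) / 1e6


def speedup(accelerated_fps, baseline_fps):
    """Ratio of accelerated to baseline processing rate."""
    return accelerated_fps / baseline_fps


rate_codesign = required_rate_mb_s(170, 512, 512)  # about 44.6 Mbytes/sec
rate_rpi_only = required_rate_mb_s(48, 512, 512)   # about 12.6 Mbytes/sec
ratio = speedup(170, 48)                           # about 3.54
```

Under these assumptions, streaming 512x512 frames at 170 fps needs about 44.6 Mbytes/sec, which fits within the reported transfer rate of about 50 Mbytes/sec, and 170/48 gives the factor of roughly 3.5 behind the "about 350%" figure. A different pixel format, or counting returned results as well, would change the required rate.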
Speaker verification is an essential task in speech processing. This task is implemented using convolutional neural networks. Several key metrics were evaluated, including equal error rate and precision top-K, and ...
Data prefetching is a widely used technique to alleviate the "memory wall" problem by fetching, in advance, data that may be touched in the near future. Generally, data prefetching is classified into hardware p...
With the rapid development of neural networks and deep learning, speech synthesis technology has been significantly improved. End-to-end speech synthesis systems based on deep learning have been able to synthesize...
This paper explores the problem of boundary data classification ambiguity that arises when machine learning techniques are applied in the field of intrusion detection. The features and attributes of the boundary data ...