Image annotation generates a set of semantic labels that describe the contents of an input *** deep learning techniques have achieved significant success in many areas of image *** this paper, we present a multi-label image annotation method that combines unsupervised object hypothesis generation and deep neural *** an image, object hypotheses are generated in an unsupervised *** we extract the image features for each hypothesis with a deep neural network *** combining the features of all hypotheses, we get the features of the entire ***, we calculate for each label the probability that the label is correlated with the given *** can be trained in an end-to-end way using the standard backward propagation *** results on multiple benchmark datasets show that our method outperforms the state-of-the-art ones.
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behaviors with different network structures and low-power modes. The results are consistent with the theoretical analyses.
On virtualization platforms, peak memory demand caused by hotspot applications often triggers page swapping in the guest OS, causing performance degradation inside and outside of the virtual machine (VM). Even when the host holds sufficient memory pages, the guest OS cannot use free pages in the host directly because of the semantic gap between the virtual machine monitor (VMM) and the guest operating system (OS). Our work aims to use the free memory scattered across multiple hosts in a virtualization environment to improve the performance of guest swapping in a transparent and implicit way. Based on an analysis of the behavioral characteristics of guest swapping, we design and implement a distributed and scalable framework, HybridSwap. It dynamically constructs virtual swap pools using various policies, and builds a synthetic swapping mechanism in a peer-to-peer way, which can adaptively choose among different virtual swap pools. We implement a prototype of HybridSwap and evaluate it with several benchmarks in different scenarios. The evaluation results demonstrate that our solution improves guest swapping efficiency and doubles performance in some cases. Even in the worst case, the system overhead introduced by HybridSwap is acceptable.
Convolutional neural networks (CNNs) are widely used in many computer vision applications. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of the conventional convolution algorithm restricts the performance of accelerators and significantly increases the difficulty of design. It has been shown that the Winograd algorithm can effectively reduce the computational complexity of CNNs. Although a few FPGA approaches based on the Winograd algorithm have been implemented, they lack an evaluation of performance for different tile sizes of the Winograd algorithm. In this work, we focus on exploring the possibility of using the Winograd algorithm to accelerate CNNs on FPGA. First, we propose an accelerator architecture that applies to both convolutional layers and fully connected layers. Second, we use a high-level synthesis tool to expediently implement our design. Finally, we evaluate our accelerator with different tile sizes in terms of resource utilization, performance, and efficiency. On the VUS440 platform, we achieve an average of 943 GOPS for the overall VGG16 under low resource utilization, which is more efficient than state-of-the-art works on FPGAs.
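The arithmetic saving behind Winograd-based convolution can be seen in the smallest one-dimensional case, usually written F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The sketch below uses the standard minimal-filtering identities; the function names are hypothetical, and it illustrates the general technique only, not the accelerator architecture or the tile sizes evaluated in the abstract.

```python
def direct_conv_2x3(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 element-wise multiplications.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend
    only on the weights, so they can be precomputed once per filter.
    """
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2

    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]

    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]   # input tile
    g = [1.0, 0.0, -1.0]       # filter taps
    print(direct_conv_2x3(d, g))
    print(winograd_f23(d, g))
```

Larger tiles reduce the multiplication count further but need more additions and wider transform constants, which is one reason tile size is a meaningful design parameter to evaluate.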
The global data center market has grown explosively in recent years. As the market grows, the demand for fast provisioning of virtual resources to support elastic, manageable, and economical computing over the cloud has increased. Fast provisioning of large-scale virtual machines (VMs), in particular, is critical to guarantee quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them into three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.
Image diffusion plays a fundamental role in image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. Howeve...
In the Internet of Things, it is important to detect the various relations among objects for mining useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for...
In this paper, an improved algorithm is proposed for the reconstruction of singularity connectivity from the available pairwise connections during the preprocessing phase. To evaluate the performance of our algorithm, an ...
Bloom filters are frequently used to check the membership of an item in a set. However, Bloom filters face a dilemma: the transmission bandwidth and the accuracy cannot be optimized simultaneously. This dilemma is ...
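As background for the membership check mentioned above, here is a minimal sketch of a plain Bloom filter. The class name, the default sizes, and the SHA-256 double-hashing scheme are illustrative assumptions; this is the textbook structure, not the method the paper proposes. It makes the trade-off concrete: a larger bit array costs more bytes to transmit but lowers the false-positive rate.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit array is what would be transmitted; its size is the bandwidth cost.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one digest via double hashing (an assumption).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly in the set"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter()
    for word in ["alpha", "beta", "gamma"]:
        bf.add(word)
    print("alpha" in bf)   # an added item is always reported as present
    print(len(bf.bits), "bytes to transmit")
```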