ISBN: (print) 9780769556970
Accelerated architectures such as GPUs (Graphics Processing Units) and MICs (Many Integrated Cores) have been proven to increase the performance of many algorithms compared to their CPU counterparts, and they are widely available in local, campus-wide and national infrastructures. However, their utilization is not keeping pace with their deployment. The reasons for this underutilization lie partly on the software side, with proprietary and complex interfaces for development and usage. A common API providing an extra layer that abstracts the differences and specific characteristics of these architectures would deliver a far more portable interface for application developers. This cloud challenge proposal presents such an API, which addresses these issues using a container-based approach. The resulting environment provides Docker-based containers for deploying accelerator libraries, such as the CUDA Toolkit, OpenCL and OpenACC, onto a wide variety of platforms and operating systems. By leveraging the container approach, we can overlay accelerator libraries onto the host without needing to be concerned about the intricacies of the host's underlying operating system. Docker therefore has the advantage of being easily applicable on diverse architectures, virtualizing the necessary environment and packaging libraries as well as applications in a standardized way. The novelty of our approach is this extra layer for utilization and device discovery, which improves the usability and uniform development of accelerated methods while providing direct access to resources.
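The abstract does not describe how its device-discovery layer works. As a rough illustration of the general idea, the sketch below scans the host for accelerator device nodes and turns them into `docker run --device` flags, which is one common way to expose host accelerators to a container. The glob patterns and the function names `discover_devices` and `docker_device_flags` are assumptions for illustration, not the authors' API.

```python
import glob
from typing import Callable, Dict, List

# Hypothetical default patterns: NVIDIA driver nodes and Linux DRM render
# nodes (often used by OpenCL on Intel/AMD GPUs). Adjust for the actual host.
DEVICE_PATTERNS: Dict[str, List[str]] = {
    "nvidia": ["/dev/nvidia[0-9]*", "/dev/nvidiactl", "/dev/nvidia-uvm"],
    "dri_render": ["/dev/dri/renderD*"],
}


def discover_devices(
    patterns: Dict[str, List[str]] = DEVICE_PATTERNS,
    glob_fn: Callable[[str], List[str]] = glob.glob,
) -> Dict[str, List[str]]:
    """Return {accelerator kind: sorted device paths} for kinds present on the host."""
    found: Dict[str, List[str]] = {}
    for kind, pats in patterns.items():
        # Deduplicate across patterns and sort for a stable result.
        paths = sorted({p for pat in pats for p in glob_fn(pat)})
        if paths:
            found[kind] = paths
    return found


def docker_device_flags(found: Dict[str, List[str]]) -> List[str]:
    """Build '--device PATH' arguments that pass each node into a container."""
    flags: List[str] = []
    for kind in sorted(found):
        for path in found[kind]:
            flags.extend(["--device", path])
    return flags


if __name__ == "__main__":
    devices = discover_devices()
    print(devices)
    # IMAGE is a placeholder for the accelerator-library container image.
    print(" ".join(["docker", "run"] + docker_device_flags(devices) + ["IMAGE"]))
```

Injecting `glob_fn` keeps the discovery logic testable without real hardware. A production layer would also need to match user-space driver libraries inside the container to the host driver version, which this sketch ignores.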
Quantum computer simulation (QCS) provides an effective platform for the development and validation of quantum algorithms. The exponential runtime overhead limits the simulation scale on classical computers which make...
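The exponential overhead comes from the state-vector representation: an n-qubit state holds 2^n complex amplitudes, and applying even a single-qubit gate touches every one of them. The sketch below shows this for a plain state-vector simulator. The function names and the convention that qubit 0 is the most significant bit of the basis index are assumptions for illustration, not details taken from the truncated abstract.

```python
import math
from typing import List, Sequence


def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for 2**n amplitudes (16 bytes each for complex128)."""
    return bytes_per_amplitude * (1 << n_qubits)


def apply_single_qubit_gate(
    state: Sequence[complex],
    gate: Sequence[Sequence[complex]],
    target: int,
    n_qubits: int,
) -> List[complex]:
    """Apply a 2x2 gate to qubit `target` (qubit 0 = most significant bit)."""
    dim = 1 << n_qubits
    if len(state) != dim:
        raise ValueError("state length must be 2**n_qubits")
    bit = 1 << (n_qubits - 1 - target)
    out = list(state)
    # Every amplitude is visited once: O(2**n) work per gate.
    for i in range(dim):
        if i & bit == 0:
            j = i | bit  # partner index with the target bit set
            a0, a1 = state[i], state[j]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[j] = gate[1][0] * a0 + gate[1][1] * a1
    return out


if __name__ == "__main__":
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    print(apply_single_qubit_gate([1, 0, 0, 0], hadamard, 0, 2))
    print(state_vector_bytes(30) / 2**30, "GiB for 30 qubits")
```

Each added qubit doubles both the memory and the work per gate. At 30 qubits the complex128 state vector alone occupies 16 GiB.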
R is a widely used statistical programming language in the data science community. However, in the big data era, R faces challenges from large-scale data analysis tasks. It lacks the ability of distributed linear ...
ISBN: (print) 9781467391528
Image super-resolution (SR) based on self-similarity has achieved impressive results by exploiting local similarity across images at various scales. However, it is time-consuming, far from meeting the requirements of practical applications, and prone to unnatural visual artifacts such as facets and enhanced noise. In this paper, an accelerated parallel implementation based on the NVIDIA CUDA architecture is proposed, leveraging the power of general-purpose GPUs. Denoising is then combined with the SR process and performed during it, using the similarity information from patch matching, which makes the images clearer and cleaner. In addition, subpixel accuracy in interpolation and searching is applied to improve the similarity matching process, which makes the upscaled images look more natural. Experimental results demonstrate that the proposed method achieves an impressive speedup and produces images of improved visual quality compared with other state-of-the-art SR methods.
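The abstract does not give its matching kernel. The sketch below shows the generic building blocks behind self-similarity SR: a sum-of-squared-differences (SSD) patch search within a window, plus a common parabola-fit refinement that turns three neighbouring integer costs into a subpixel offset. The function names and the parabola-fit choice are assumptions, not the paper's CUDA implementation. On a GPU, one natural mapping would assign each candidate offset's SSD to its own thread, although the paper's actual mapping is not described.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def ssd(img: Image, y0: int, x0: int, patch: Image) -> float:
    """Sum of squared differences between `patch` and img at top-left (y0, x0)."""
    return sum(
        (img[y0 + i][x0 + j] - patch[i][j]) ** 2
        for i in range(len(patch))
        for j in range(len(patch[0]))
    )


def best_match(
    img: Image, patch: Image, cy: int, cx: int, radius: int
) -> Optional[Tuple[float, int, int]]:
    """Exhaustive search in a (2*radius+1)^2 window; returns (cost, y, x)."""
    ph, pw = len(patch), len(patch[0])
    h, w = len(img), len(img[0])
    best = None
    for y in range(max(0, cy - radius), min(h - ph, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(w - pw, cx + radius) + 1):
            cost = ssd(img, y, x, patch)
            if best is None or cost < best[0]:
                best = (cost, y, x)
    return best


def subpixel_offset(c_minus: float, c0: float, c_plus: float) -> float:
    """Vertex of the parabola through costs at -1, 0, +1 (offset in [-0.5, 0.5] at a minimum)."""
    denom = c_minus - 2.0 * c0 + c_plus
    if denom <= 0.0:  # flat or not a minimum: keep the integer position
        return 0.0
    return (c_minus - c_plus) / (2.0 * denom)
```

Applying `subpixel_offset` separately along y and x around the integer best match gives a fractional displacement, which can then drive interpolation of the matched patch.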
We address the task of parsing semantically indeterminate expressions, for which several correct structures exist that do not lead to differences in meaning. We present a novel non-deterministic structure transfer method that accumulates all structural information based on cross-lingual word distance derived from parallel corpora. Our system's output is a ranked list of trees. To evaluate our system, we adopted common IR metrics. We show that our system outperforms previous cross-lingual structure transfer methods significantly. In addition, we illustrate that tree accumulation can be used to combine partial evidence across languages to form a single structure, thereby making use of sparse parallel data in an optimal way.
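The abstract does not name the IR metrics it uses. Two standard choices for a ranked output where several answers can be correct (as with semantically indeterminate expressions) are mean reciprocal rank (MRR) and precision at k. The sketch below computes both. Representing trees as bracketed strings is an assumption for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first correct tree, or 0.0 if none is correct."""
    for rank, tree in enumerate(ranked, start=1):
        if tree in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked output, set of correct trees) pairs."""
    scores: List[float] = [reciprocal_rank(r, g) for r, g in runs]
    return sum(scores) / len(scores)


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k trees that are correct (divides by k, not by list length)."""
    return sum(1 for tree in ranked[:k] if tree in gold) / k


if __name__ == "__main__":
    runs = [
        (["(a (b c))", "((a b) c)"], {"((a b) c)"}),
        # Indeterminate case: both bracketings are correct.
        (["((a b) c)", "(a (b c))"], {"((a b) c)", "(a (b c))"}),
    ]
    print(mean_reciprocal_rank(runs))
```

Because `gold` is a set, an expression with several equally correct structures is handled naturally: any of them counts as a hit.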
GPUs are the mainstream co-processors in heterogeneous computer architectures. Parallel graph algorithms are fundamental to many data-driven applications that are solved on heterogeneous clusters. SSSP (Single Source Short...
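Because the abstract is truncated, its SSSP method is unknown. For orientation, the sketch below shows frontier-based Bellman-Ford, the textbook formulation that data-parallel GPU implementations commonly build on. Each round relaxes only the out-edges of vertices whose distance changed in the previous round. On a GPU, a round would typically map to one kernel launch with atomic minimum updates, whereas here it is a sequential loop. The function name is hypothetical, and the sketch assumes no negative cycles.

```python
import math
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (source vertex, destination vertex, weight)


def sssp_frontier(n: int, edges: List[Edge], source: int) -> List[float]:
    """Frontier-based Bellman-Ford; assumes no negative cycles."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))

    dist: List[float] = [math.inf] * n
    dist[source] = 0
    frontier: Set[int] = {source}
    while frontier:
        next_frontier: Set[int] = set()
        # Data-parallel step: every frontier vertex relaxes its out-edges.
        for u in frontier:
            for v, w in adj[u]:
                candidate = dist[u] + w
                if candidate < dist[v]:  # a GPU version would use atomicMin here
                    dist[v] = candidate
                    next_frontier.add(v)
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
    print(sssp_frontier(5, edges, 0))  # vertex 4 is unreachable
```

Compared with Dijkstra's algorithm, this formulation does more total work but exposes far more parallelism per step, which is why it suits GPUs.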
Image processing algorithms are widely used in the automotive field for ADAS (Advanced Driver Assistance Systems) purposes. To embed these algorithms, semiconductor companies offer heterogeneous architectures composed of different processing units, often including massively parallel computing units. However, embedding complex algorithms on these SoCs (Systems on Chip) remains difficult because of their heterogeneity: it is not easy to decide how to allocate the parts of a given algorithm to the processing units of a given SoC. To help the automotive industry embed algorithms on heterogeneous architectures, we propose a novel approach to predict the performance of image processing algorithms on the different computing units of a given heterogeneous SoC. Our methodology predicts an execution-time interval, whose width varies, with a degree of confidence, using only a high-level description of the algorithms to embed and a few characteristics of the computing units.
Today, millions of legacy programs are awaiting their parallelization. For this reason, the automatic discovery of parallelism in sequential programs is now receiving considerable attention. However, past efforts main...
Floating-point computing capability is an important concern in high-performance scientific and engineering computing. Although it is a fundamental operation, floating-point division (or reciprocal) has long been...
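The truncated abstract does not say which division algorithm it studies. One widely used way to compute a reciprocal, and hence a division a/d = a * (1/d), is Newton-Raphson iteration x ← x(2 − m·x), which roughly squares the relative error at each step. The sketch below is a software illustration of that general technique, not the paper's method. It scales d into [0.5, 1) with `math.frexp` and starts from the classic linear estimate 48/17 − (32/17)·m, whose maximum relative error is 1/17. The function names are hypothetical.

```python
import math


def reciprocal_newton(d: float, iterations: int = 4) -> float:
    """Approximate 1/d by Newton-Raphson; assumes d is finite and non-zero."""
    if d == 0.0 or not math.isfinite(d):
        raise ValueError("d must be finite and non-zero")
    # |d| = m * 2**e with 0.5 <= m < 1, so one initial guess covers all inputs.
    m, e = math.frexp(abs(d))
    # Linear initial estimate of 1/m; relative error at most 1/17 on [0.5, 1].
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # If x = (1 - eps)/m, the update gives (1 - eps**2)/m: quadratic convergence.
        x = x * (2.0 - m * x)
    # Undo the scaling, 1/|d| = (1/m) * 2**(-e), and restore the sign of d.
    return math.copysign(math.ldexp(x, -e), d)


def divide(a: float, d: float) -> float:
    """Division as multiplication by the Newton-Raphson reciprocal."""
    return a * reciprocal_newton(d)


if __name__ == "__main__":
    print(reciprocal_newton(3.0), divide(10.0, -4.0))
```

Starting from an error of 1/17, four iterations drive the error to about 2e-20, below double-precision rounding. The result is therefore accurate to a few units in the last place but not guaranteed to be correctly rounded, which hardware dividers must additionally ensure.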
ISBN: (print) 9783319231082; 9783319231075
The microscopic information processing machinery of biological cells provides inspiration for the field of molecular computation, and for the use of synthetic DNA to store and process information and instructions. A single microlitre of solution can contain billions of distinct DNA sequences and consequently DNA computation offers huge potential for parallel processing. However, conventional data readout systems are complex, and the methods used are not well-suited for combination with mainstream computer circuits. Immobilisation of DNA machines on surfaces may allow integration of molecular devices with traditional electronics, facilitating data readout and enabling low-power massively parallel processing. Here we outline a general framework for hybrid bio-electronic systems and proceed to describe the results of our preliminary experiments on dynamic DNA structures immobilised on a surface, performed using QCM-D (quartz crystal microbalance with dissipation monitoring), which involves the use of acoustic waves to probe a molecular layer on a gold-coated quartz sensor.