ISBN: 9781424449231 (print)
This paper studies the loose integration of application accelerators, each consisting of an array of tightly coupled lightweight reconfigurable processors, into a system-on-a-chip. To explore a multitude of design variations, a C++ simulation model of the accelerator has been integrated with a system-on-a-chip environment consisting of a general-purpose processor, a DMA controller, an interrupt controller and a memory module. Depending on the application, different kinds of I/O buffers are designed around the processor array, and the effects of buffer size on overall execution time are evaluated. The evaluations are based on new mathematical estimation models derived from the system and application constraints. The estimations are validated against experimental results with an error of less than 1%. Exploring several design points shows that our architecture, with suitable buffer sizes, can improve system execution time by one to two orders of magnitude for the selected algorithms.
In this paper, a pixel-parallel image sensor/processor architecture with a fine-grain massively parallel SIMD analogue processor array is overviewed and the latest VLSI implementation, SCAMP-3 vision chip, comprising ...
ISBN: 9783319654829; 9783319654812 (print)
Analysis of urban traffic data has attracted great attention in recent years. In urban traffic data processing, batch computing based on historical data and stream computing based on real-time data are isolated, and the two computing frameworks are not synergized. Therefore, a method of urban traffic data processing based on batch and stream collaborative computing is proposed. Batch computing has the advantage of high throughput, so it is better suited to in-depth computation over historical urban traffic data and over the results of stream computing. Stream computing, with the advantage of low latency, can be used to process traffic data in real time; combined with the results of batch computing, the conclusions of urban traffic data processing are more comprehensive and accurate.
This paper describes the principles and architectures of a one- and a two-dimensional optical content addressable memory (OCAM). These architectures are based on optical matrix multipliers which use free space intercon...
Author: Kim, H (Korea Univ, Grad Sch Informat Secur, Seoul 136701, South Korea)
ISBN: 3540240136 (print)
We present parallel algorithms for the biorthogonal wavelet transform (BWT). We have constructed processing elements (PEs) for the decomposition and reconstruction of the BWT to minimize computational operations. They can be performed using only integer shift and superposition operations; therefore, they may be applied to the implementation of image compression standards based on the BWT, such as JPEG2000.
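As an illustration of a biorthogonal wavelet step built only from integer shifts and additions, the sketch below implements one level of the reversible 5/3 lifting transform used by JPEG2000. It is not the PE design from the paper, whose details the abstract does not give. Reading "superposition" as addition, the symmetric boundary handling and the function names are assumptions.

```python
def lift_53_forward(x):
    """One level of the reversible 5/3 lifting transform.

    x is an even-length list of ints. Returns (s, d): the low-pass
    (approximation) and high-pass (detail) halves. Only integer adds,
    subtracts and arithmetic shifts are used.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict: detail = odd sample minus the floored mean of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Symmetric extension at the right edge (assumed boundary rule).
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update: approximation = even sample plus a rounded quarter of adjacent details.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # symmetric extension at the left edge
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Exactly undo lift_53_forward by reversing the two lifting steps."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Because each lifting step is undone with the same integer arithmetic, reconstruction is exact, which is what makes this form suitable for lossless coding.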
ISBN: 9781538685174 (digital); 9781538685174 (print)
This paper proposes a novel configuration data compression technique for coarse-grained reconfigurable architectures (CGRAs). The proposed technique is based on a multicast configuration technique called RoMultiC, which reduces the configuration time by multicasting the same data to multiple PEs (processing elements) with two bit-maps. Scheduling algorithms for optimizing the order of multicasting have been proposed. In general, configuration data for CGRAs can be divided into fields, like machine code formats. The proposed scheme confines multicasting to a subset of the fields, which increases the chance of multicasting to more PEs. This paper analyzes algorithms to find a configuration pattern which maximizes the number of multicasted PEs. We implemented the proposed scheme on CMA (Cool Mega Array), a straightforward CGRA, as a case study. Experimental results show that the proposed method achieves a configuration up to 40.0% smaller for an image processing application. Furthermore, it achieves a 35.6% reduction in configuration power consumption with negligible area overhead.
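The two-bit-map multicast idea can be sketched as follows. The abstract does not say what the two bit-maps encode. This sketch assumes one bit-map selects rows and the other selects columns of the PE array, so a single write reaches every PE at a selected row/column intersection. The function names and the example words are hypothetical.

```python
def multicast_targets(row_bitmap, col_bitmap, rows, cols):
    """Return the set of (row, col) PEs selected by the two bit-maps.

    Assumption: bit r of row_bitmap selects row r, and bit c of
    col_bitmap selects column c.
    """
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if (row_bitmap >> r) & 1 and (col_bitmap >> c) & 1
    }


def configure(rows, cols, writes):
    """Apply multicast writes (row_bitmap, col_bitmap, word) in order.

    Later writes overwrite earlier ones, so the order of multicasting matters.
    """
    config = [[None] * cols for _ in range(rows)]
    for row_bitmap, col_bitmap, word in writes:
        for r, c in multicast_targets(row_bitmap, col_bitmap, rows, cols):
            config[r][c] = word
    return config


# A 2x4 array: broadcast "ADD" everywhere, then patch column 0 with "MUL".
# Two multicast writes replace eight per-PE writes.
example = configure(2, 4, [(0b11, 0b1111, "ADD"), (0b11, 0b0001, "MUL")])
```

Under this model, restricting a multicast to one field of the configuration word makes it more likely that many PEs share the same value, which is the effect the abstract describes.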
ISBN: 9783642152900 (print)
Time series motifs are an integral part of diverse data mining applications including classification, summarization and near-duplicate detection. They are used across a wide variety of domains such as image processing, bioinformatics, medicine, extreme weather prediction, the analysis of web log and customer shopping sequences, the study of XML query access patterns, electroencephalograph interpretation and entomological telemetry data mining. Exact motif discovery in soft real-time over 100K time series is a challenging problem. We present novel parallel algorithms for soft real-time exact motif discovery on multi-core architectures. Experimental results on a large-scale P6 SMP system, using real-life and synthetic time series data, demonstrate the scalability of our algorithms and their ability to discover motifs in soft real-time. To the best of our knowledge, this is the first such work on parallel scalable soft real-time exact motif discovery.
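For orientation, the problem itself can be stated as a sequential brute-force baseline. This is not the parallel algorithm from the paper. The sketch assumes a common definition that the abstract does not spell out: the exact motif is the closest pair of z-normalized series under Euclidean distance. The function names are hypothetical.

```python
import math


def znorm(ts):
    """Z-normalize a series; a constant series maps to all zeros."""
    mean = sum(ts) / len(ts)
    std = math.sqrt(sum((v - mean) ** 2 for v in ts) / len(ts))
    if std == 0:
        return [0.0] * len(ts)
    return [(v - mean) / std for v in ts]


def exact_motif(series):
    """Return (distance, i, j) for the closest pair of equal-length series.

    Brute force over all pairs: O(n^2) distance computations, which is the
    cost that parallel or pruned algorithms aim to reduce.
    """
    normed = [znorm(ts) for ts in series]
    best = (math.inf, -1, -1)
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            dist = math.dist(normed[i], normed[j])
            if dist < best[0]:
                best = (dist, i, j)
    return best
```

The outer loop iterations are independent apart from the shared best-so-far value, which is one reason the problem lends itself to multi-core execution.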
ISBN: 9789897580277 (print)
The exponential increase in the amount of data available in several domains, and the need to process such data, makes problems computationally intensive. Consequently, sequential analysis is infeasible, hence the need for parallel processing. Over the last few years, the widespread deployment of multicore architectures, accelerators, grids, clusters, and other powerful architectures such as FPGAs and ASICs has encouraged researchers to write parallel algorithms using available parallel computing paradigms to solve such problems. The major challenge now is to take advantage of these architectures irrespective of their heterogeneity, because designing an execution model that can unify all computing resources is still very difficult. Moreover, scheduling tasks to run efficiently on heterogeneous architectures still needs a lot of research. Existing solutions tend to focus on individual architectures or deal with heterogeneity among CPUs and GPUs only, but in reality, heterogeneous systems often exist. Up to now, very cumbersome manual adaptation has been required to take advantage of these heterogeneous architectures. The aim of this paper is to propose a functional-level design of a multiagent-based framework to deal with the heterogeneity of hardware architectures and parallel computing paradigms deployed to solve those problems. Bioinformatics is selected as a case study.
ISBN: 9783540729044 (print)
Separated grid systems are becoming the new information islands as more and more grid systems are deployed. Grid interoperation is one direction for solving that problem. This paper introduces the implementation of data interoperation between ChinaGrid and SRB. The data interoperation is divided into two parts: data access from SRB to ChinaGrid and from ChinaGrid to SRB. The paper also considers performance optimization. The optimization measures yield satisfactory experimental results.