ISBN: 9783540744658 (print)
In this paper we describe the parallelization of two nearest neighbour classification algorithms. Nearest neighbour methods are well-known machine learning techniques that have been successfully applied to the text categorization task. Based on standard parallel techniques, we propose two versions of each algorithm for message-passing architectures. We also include experimental results on a cluster of personal computers using a large text collection. Our algorithms attempt to balance the load among the processors, are portable, and obtain very good speedups and scalability.
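The abstract does not spell out the two algorithms, so the following is only a minimal sketch of one standard way to parallelize nearest neighbour classification: split the training set evenly across workers, let each worker find its local k nearest neighbours, then merge the partial results and vote. Threads stand in for the message-passing processes of the paper, and all names (`parallel_knn_classify`, `local_k_nearest`) are hypothetical.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_k_nearest(shard, query, k):
    """Return the k closest (distance, label) pairs found in one shard."""
    return heapq.nsmallest(
        k, ((squared_distance(vec, query), label) for vec, label in shard)
    )


def parallel_knn_classify(training, query, k=3, workers=2):
    """Classify `query` by majority vote among its k nearest neighbours.

    Assumption: a round-robin split gives every worker roughly the same
    number of training examples, which is a simple form of load balancing.
    """
    shards = [training[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: local_k_nearest(s, query, k), shards))
    # Merge step: the global k nearest are among the local k nearest.
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"), ((0.2, 0.1), "a"),
]
print(parallel_knn_classify(training, (0.0, 0.1)))
```

Because each worker returns at most k candidates, the merge step stays cheap regardless of the training set size.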
ISBN: 9781424411610 (print)
A highly parallel architecture for local histogram equalisation is studied. Three different approaches to the parallel architecture are considered in this paper: 1) Module-level, which focuses on processing as much data as possible within a single module. 2) 1D, in which several modules simultaneously conduct histogram equalisation on partially overlapping (either horizontally or vertically) frames. 3) 2D, which uses the same approach as 1D but in two dimensions. Besides the above-mentioned solutions, differential processing of overlapping frames is also considered. At the end of the paper, the optimal proportion of the above-mentioned solutions is studied and implementation results are given.
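The abstract gives no detail on how overlapping frames are processed differentially. One common software interpretation, shown here as a sketch only, is a sliding-window histogram: when the window moves one column to the right, the column that leaves is subtracted and the column that enters is added, instead of rebuilding the histogram. The function names and the clipped-window border handling are assumptions, and the code says nothing about the hardware architecture itself.

```python
def window_histogram(image, row, col, radius, levels):
    """Histogram of the (clipped) square window centred on (row, col)."""
    hist = [0] * levels
    height, width = len(image), len(image[0])
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            hist[image[r][c]] += 1
    return hist


def equalise_pixel(hist, value, levels):
    """Map a pixel through the cumulative histogram of its window."""
    cdf = sum(hist[: value + 1])
    return (levels - 1) * cdf // sum(hist)


def local_equalise_naive(image, radius, levels):
    """Reference version: rebuild the histogram for every pixel."""
    return [
        [
            equalise_pixel(window_histogram(image, r, c, radius, levels),
                           image[r][c], levels)
            for c in range(len(image[0]))
        ]
        for r in range(len(image))
    ]


def local_equalise_differential(image, radius, levels):
    """Update the histogram incrementally as the window slides right."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for row in range(height):
        r0, r1 = max(0, row - radius), min(height, row + radius + 1)
        hist = window_histogram(image, row, 0, radius, levels)
        for col in range(width):
            if col > 0:
                leaving, entering = col - radius - 1, col + radius
                for r in range(r0, r1):
                    if leaving >= 0:
                        hist[image[r][leaving]] -= 1
                    if entering < width:
                        hist[image[r][entering]] += 1
            out[row][col] = equalise_pixel(hist, image[row][col], levels)
    return out


image = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 0, 3, 3], [1, 2, 2, 1]]
print(local_equalise_differential(image, radius=1, levels=4))
```

The differential version touches two columns per step rather than the whole window, which is the saving that overlapping frames make possible.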
ISBN: 9783540729044 (print)
Many approaches have been proposed to improve the efficiency of interrupt handling, most of which target single-processor systems. The traditional model of interrupt management has been used for several decades in parallel computing environments. It works well in most cases, even in real-time environments, but it often cannot provide the reliability and temporal predictability demanded by hard real-time systems. Existing solutions, such as In-line interrupt handling and Predictable interrupt management, each apply only to specific fields. In this paper we propose an algorithm that schedules interrupts according to their deadlines on multiprocessor systems. Hard priorities of IRQs are still left to the hardware; we manage only those interrupts that are visible to the kernel. Each interrupt is scheduled only once, before its first execution, according to its arrival time and deadline, hence the name lazy Earliest-Deadline-First algorithm. The scheme tries to let as many ISRs as possible finish their work within the time limit. Finally, experiments using task simulation showed a substantial improvement in interrupt management.
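As an illustration of deadline-ordered dispatch, the sketch below simulates a generic non-preemptive Earliest-Deadline-First policy on several processors: an interrupt is placed once, when a processor becomes free, and then runs to completion. It is not the authors' kernel implementation; the tuple layout `(name, arrival, duration, deadline)` and the function names are assumptions made for the example.

```python
import heapq


def lazy_edf_dispatch(interrupts, processors=1):
    """Return {name: finish_time} for non-preemptive EDF dispatch.

    `interrupts` is a list of (name, arrival, duration, deadline) tuples.
    """
    pending = sorted(interrupts, key=lambda it: it[1])  # by arrival time
    free_at = [0] * processors  # min-heap of processor free times
    heapq.heapify(free_at)
    ready = []  # min-heap keyed by deadline
    finish = {}
    now = 0
    i = 0
    while i < len(pending) or ready:
        # Advance the clock to the moment a processor is available.
        now = max(now, heapq.heappop(free_at))
        if not ready:
            # Nothing to run yet: idle until the next interrupt arrives.
            now = max(now, pending[i][1])
        while i < len(pending) and pending[i][1] <= now:
            name, arrival, duration, deadline = pending[i]
            heapq.heappush(ready, (deadline, arrival, name, duration))
            i += 1
        # Scheduling decision is made once, just before first execution.
        deadline, arrival, name, duration = heapq.heappop(ready)
        finish[name] = now + duration
        heapq.heappush(free_at, now + duration)
    return finish


def deadlines_met(interrupts, finish):
    """Count the interrupts whose handlers finished by their deadline."""
    return sum(finish[name] <= deadline for name, _, _, deadline in interrupts)


interrupts = [("A", 0, 2, 10), ("B", 0, 2, 3)]
finish = lazy_edf_dispatch(interrupts, processors=1)
print(finish, deadlines_met(interrupts, finish))
```

In this small case, serving the handlers in arrival order would make `B` miss its deadline, whereas deadline order lets both finish in time.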
ISBN: 9781450378345 (print)
In this paper we present a new approach for generating high-speed, optimized, event-driven instruction set level simulators for adaptive massively parallel processor architectures. The simulator generator is part of a methodology for the systematic mapping, evaluation, and exploration of massively parallel processor architectures that are designed for special-purpose applications in the world of embedded computers. The generation of high-speed cycle-accurate simulators is of utmost importance here, because they are used directly both for debugging and evaluating parallel processor architectures and during time-consuming architecture/compiler co-exploration. We developed a modeling environment which automatically generates a C++ simulation model either from a graphical input or directly from an XML-based architecture description. Here, we focus on the underlying event-driven simulation model and present our modeling environment, in particular the features of the graphical parallel processor architecture editor and the automatic instruction set level simulator generator. Finally, in a case study, we demonstrate the pertinence of our approach by simulating different processor arrays. The superior performance of the generated simulators compared to existing simulators and simulator generation approaches is shown.
ISBN: 9783540729044 (print)
The co-allocation architecture was developed to enable the parallel download of datasets/servers from selected replica servers, and bandwidth performance is the main factor that affects the internet transfer between the client and the server. Therefore, it is important to reduce the differences in finish time among replica servers and to cope with changing network performance during the transfer. In this paper, we propose an Anticipative Recursively-Adjusting Co-Allocation scheme that adjusts the workload of each selected replica server and handles unanticipated variations in the network performance of those servers. The algorithm uses the previous finished rate of the assigned transfer size to anticipate the bandwidth status in the next section, adjusts the workload accordingly, and thereby reduces file transfer time in a grid environment. Our approach is useful in unstable grid environments: it reduces the idle time wasted waiting for the slowest server and decreases file transfer completion time.
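The adjustment step can be illustrated with a small sketch. It assumes, beyond what the abstract states, that each section of the file is split among servers in proportion to their predicted rates, and that the prediction for the next section is the previous rate scaled by the fraction of the assignment that actually finished. `split_workload` and `anticipate_rates` are hypothetical names, not the authors' API.

```python
def split_workload(size, predicted_rates):
    """Split `size` bytes among servers in proportion to predicted rates.

    Rounding leftovers go to the server with the highest predicted rate,
    so the shares always add up to `size`.
    """
    total = sum(predicted_rates)
    shares = [int(size * rate / total) for rate in predicted_rates]
    fastest = predicted_rates.index(max(predicted_rates))
    shares[fastest] += size - sum(shares)
    return shares


def anticipate_rates(previous_rates, assigned, finished):
    """Scale each server's rate by the fraction of its share it finished."""
    return [
        rate * done / asked if asked else rate
        for rate, asked, done in zip(previous_rates, assigned, finished)
    ]


rates = [1.0, 3.0]
first = split_workload(100, rates)          # shares for the first section
# Assumed measurement: the second server delivered only 50 of its 75 bytes.
rates = anticipate_rates(rates, first, [25, 50])
second = split_workload(90, rates)          # shares for the next section
print(first, rates, second)
```

Shifting work away from a server that fell behind is what shortens the wait for the slowest server at the end of each section.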
ISBN: 9783540729044 (print)
Data cubes play an essential role in OLAP (online analytical processing). The pre-computation of data cubes is critical for improving the response time of OLAP systems. However, as the size of a data cube grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In high-dimensional OLAP, it might not be practical to build all these cuboids and their indices. In this paper, we propose a parallel hierarchical cubing algorithm based on an extension of the previous minimal cubing approach. The algorithm has two components: decomposition of the cube space based on multiple dimension attributes, and an efficient OLAP query engine based on a prefix bitmap encoding of the indices. This method partitions the high-dimensional data cube into low-dimensional cube segments. Such an approach permits a significant reduction of CPU and I/O overhead for many queries by restricting the number of cube segments to be processed for both the fact table and the bitmap indices. The proposed data allocation and processing model supports parallel I/O and parallel processing, as well as load balancing for disks and processors. Experimental results show that the proposed parallel hierarchical cubing method is significantly more efficient than other existing cubing methods.
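To make the idea of cube segments concrete, here is a minimal sketch that partitions the dimensions into low-dimensional segments, keeps one bitmap of row ids per value combination in each segment, and answers a query by intersecting bitmaps from only the segments it touches. Plain Python integers serve as bitmaps; the prefix bitmap encoding and the parallel data allocation of the paper are not reproduced, and all names are hypothetical.

```python
from collections import defaultdict


def build_indices(rows, segments):
    """Build one {value-combination: bitmap} index per dimension segment."""
    indices = {}
    for segment in segments:
        index = defaultdict(int)
        for row_id, row in enumerate(rows):
            key = tuple(row[dim] for dim in segment)
            index[key] |= 1 << row_id  # set the bit for this row
        indices[segment] = dict(index)
    return indices


def query(rows, indices, conditions):
    """Return ids of rows matching every `dimension == value` condition."""
    result = (1 << len(rows)) - 1  # start with all rows selected
    for segment, index in indices.items():
        wanted = {d: conditions[d] for d in segment if d in conditions}
        if not wanted:
            continue  # this segment is not needed for the query
        bitmap = 0
        for key, bits in index.items():
            if all(key[segment.index(d)] == v for d, v in wanted.items()):
                bitmap |= bits
        result &= bitmap
    return [row_id for row_id in range(len(rows)) if result >> row_id & 1]


rows = [
    {"city": "A", "year": 1, "item": "x"},
    {"city": "A", "year": 2, "item": "y"},
    {"city": "B", "year": 1, "item": "x"},
    {"city": "A", "year": 1, "item": "y"},
]
indices = build_indices(rows, [("city", "year"), ("item",)])
print(query(rows, indices, {"city": "A", "item": "y"}))
```

A query that constrains only `item` never reads the `("city", "year")` segment, which is the kind of saving the segment decomposition aims for.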
ISBN: 9783540712671 (print)
In this paper, we introduce a reconfigurable processor optimized for the implementation of Forward Error Correction (FEC) algorithms and provide implementation results for the Viterbi and Turbo decoding algorithms. In this architecture, an array of processing elements is employed to perform the required operations in parallel. Each processing element encapsulates multiple functional units which are highly optimized for FEC algorithms. A data buffer coupled with a high-bandwidth interconnection network facilitates pumping the data to the array and collecting the results. A processing element controller orchestrates the operation and the data movement. Different FEC algorithms such as Viterbi, Turbo, Reed-Solomon and LDPC are widely used in digital communication and could be implemented on this architecture. Unlike traditional approaches to programmable FEC architectures, this architecture is instruction-level programmable, which results in the ultimate flexibility and programmability.
Block-based motion estimation is widely used in video compression applications to remove temporal redundancy in video. In this paper we have implemented the six-level nested Do-loop FullSearch B...
The paper discusses the choice of an appropriate software architecture with regard to the specifications of embedded applications as information systems, particularly those used in radio and television broadcast audio program processing. The main requirement of these embedded systems is an extremely good real-time response and minimized latency between the input signal and the output signal after processing. The requirements imposed on the software architecture by the methods and algorithms which process the signal are in our case subject to the minimal input-output latency requirement and to the specifics of the hardware architecture and data structures used in embedded systems. The software architecture components and options are discussed and reviewed with their assets and limitations.
ISBN: 0662478304 (print)
In this paper, we consider a parallel distributed detection network consisting of a fusion center and N sensors. We assume that the observations at different sensors are conditionally dependent, and optimize the system performance under the Neyman-Pearson criterion. Unlike previous papers dealing with the optimal N-P detection problem, we allow the sensor decision rules to be randomized, and obtain the necessary conditions for the optimal fusion rule and sensor decision rules without making any assumptions on the joint density functions of the sensor observations. The optimality conditions are obtained using an important property of points on the overall ROC curve that is established in the paper. A sufficient condition that guarantees that the optimal sensor decision rules are deterministic is also presented.