Multi-core platforms seem to be the way towards increasing processor performance. As multi-cores become the de facto processors, the need for new scheduling and resource sharing protocols has arisen. However, before such technology can be taken into an industrial setting, it must be evaluated so that appropriate scheduling, synchronization and partitioning algorithms can be selected. In this paper we present our ongoing work on a tool for investigating and evaluating different approaches to scheduling, synchronization and task allocation on multi-core platforms. Our tool allows different approaches to be compared with respect to a number of parameters, such as the number of schedulable systems and the number of processors required for scheduling. The output of the tool includes a set of information and graphs that facilitate evaluation and comparison of the different approaches.
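The task-allocation side of such an evaluation can be illustrated with a minimal sketch (the function name and the use of utilization-based first-fit decreasing with the partitioned-EDF bound of 1.0 per processor are assumptions for illustration, not the tool's actual algorithm):

```python
# Hypothetical sketch: partitioned allocation via first-fit decreasing.
# Each task is represented only by its utilization; a processor accepts a
# task as long as its total utilization stays at or below 1.0 (the EDF bound).

def first_fit_partition(utilizations):
    """Return a list of processor bins, each a list of task utilizations."""
    processors = []
    for u in sorted(utilizations, reverse=True):  # decreasing order
        for bin_ in processors:
            if sum(bin_) + u <= 1.0:
                bin_.append(u)  # fits on an existing processor
                break
        else:
            processors.append([u])  # open a new processor
    return processors

bins = first_fit_partition([0.6, 0.5, 0.4, 0.3, 0.2])
print(len(bins))  # number of processors this allocation requires
```

Counting the processors returned by such an allocator over many generated task sets is one way to produce the "number of processors required for scheduling" metric the abstract mentions.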
We analyze a technique for reducing the complexity of entropy coding that consists of grouping the source alphabet symbols a priori and dividing the coding process into two stages: first coding the number of the symbol's group with a more complex method, then coding the symbol's rank inside its group with a less complex method, or simply using its binary representation. Because this method has proved quite effective it is widely used in practice, and it is an important part of standards such as MPEG and JPEG. However, a theory to fully exploit its effectiveness had not been sufficiently developed. In this work, we study methods for optimizing the alphabet decomposition, and prove that a necessary optimality condition eliminates most of the possible solutions and guarantees that dynamic programming solutions are optimal. In addition, we show that the data used for optimization have useful mathematical properties, which greatly reduce the complexity of finding optimal partitions. Finally, we extend the analysis and propose efficient algorithms for finding min-max optimal partitions for multiple data sources. Numerical results show the difference in redundancy for single and multiple sources.
We propose a method for designing a partitioning clustering algorithm from reusable components that is suitable for finding the appropriate number of clusters (K) in microarray data. The proposed method is evaluated on 10 datasets (4 synthetic and 6 real-world microarrays) by considering 1008 reusable-component-based algorithms and four normalization methods. The best-performing algorithm was reported for every dataset, and rules were identified for designing microarray-specific clustering algorithms. The obtained results indicate that in the majority of cases a data-tailored clustering algorithm design outperforms the results reported in the literature. In addition, data normalization can have an important influence on algorithm performance. The method proposed in this paper offers insights into the design of divisive clustering algorithms that can reveal the optimal K in a microarray dataset.
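The K-selection idea can be sketched with a toy example (a deterministic 1-D k-means and a simple elbow rule are used here purely for illustration; the paper's component-based algorithms and validity criteria are more elaborate):

```python
# Run a base clustering for a range of K and pick the K at which the
# within-cluster scatter (WCSS) stops improving markedly.

def kmeans_1d(data, k, iters=20):
    centers = sorted(data)[::max(1, len(data) // k)][:k]  # spread-out init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def wcss(centers, clusters):
    return sum((x - c) ** 2 for c, cl in zip(centers, clusters) for x in cl)

data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]  # three clear groups
scores = {k: wcss(*kmeans_1d(data, k)) for k in range(1, 5)}
best_k = max(range(2, 5), key=lambda k: scores[k - 1] / scores[k])
print(best_k)  # the K with the sharpest drop in scatter
```

On this toy data the sharpest WCSS drop occurs at K = 3, matching the three visible groups.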
Generation of test vectors for the VLSI devices used in contemporary digital systems is becoming much more difficult as these devices increase in size and complexity. Automatic Test Pattern Generation (ATPG) techniques are commonly used to generate these tests. Since ATPG is an NP-complete problem with complexity exponential in circuit size, the application of parallel processing techniques to accelerate the process of generating test vectors is a promising area of research. The simplest approach to parallelization of the test generation process is to simply divide the processing of the fault list across multiple processors. Each individual processor then performs the normal test generation process on its own portion of the fault list, typically without interaction with the other processors. The major drawback of this technique, called fault partitioning, is that the processors perform redundant work generating test vectors for faults covered by vectors generated on another processor. This problem has been solved with the introduction of dynamic load balancing and detected fault broadcasting. Previous research has indicated that algorithmic fault partitioning moderately improves the performance of fault partitioned ATPG without detected fault broadcasting by reducing redundant work. However, algorithmic fault partitioning can add significant preprocessing time to the ATPG process. This paper presents results that show that algorithmic partitioning is unnecessary prior to fault partitioned parallel ATPG using detected fault broadcasting and dynamic load balancing. Considering preprocessing time, random fault partitioning is shown to be the most efficient technique for partitioning faults prior to fault partitioned ATPG.
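The effect of detected fault broadcasting can be shown with a toy simulation (the `covers` relation standing in for fault simulation, and the round-robin processor model, are invented for illustration):

```python
# Toy fault-partitioned ATPG with detected-fault broadcasting: each
# "processor" targets faults from its own partition, and every detected
# fault is broadcast so no other processor wastes a generation call on it.

def parallel_atpg(partitions, covers):
    """partitions: one fault list per processor; covers[f] -> the set of
    faults that the vector generated for f also detects."""
    detected = set()
    work = 0
    pending = [list(p) for p in partitions]
    while any(pending):
        for plist in pending:               # round-robin over processors
            while plist and plist[0] in detected:
                plist.pop(0)                # skip broadcast-detected faults
            if plist:
                target = plist.pop(0)
                work += 1                   # one test-generation call
                detected |= covers[target]  # broadcast all new detections
    return detected, work

covers = {f: {f} for f in "abcdef"}
covers["a"] |= {"d", "e"}                   # a's vector also detects d and e
det, work = parallel_atpg([["a", "b"], ["c", "d"], ["e", "f"]], covers)
print(work)  # fewer than 6 generation calls thanks to broadcasting
```

Because the vector for fault `a` already covers `d` and `e`, the processors holding those faults never run test generation for them, which is precisely the redundant work that broadcasting eliminates.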
Iterative improvement partitioning algorithms such as those due to Fiduccia and Mattheyses (1982) and Krishnamurthy (1984) exploit an efficient gain bucket data structure in selecting modules that are moved from one partition to the other. In this paper, we investigate three gain bucket implementations and their effect on the performance of the Fiduccia-Mattheyses partitioning algorithm. Surprisingly, selection from gain buckets maintained as Last-In-First-Out (LIFO) stacks leads to significantly better results than selection from gain buckets maintained randomly or as First-In-First-Out (FIFO) queues. Our experiments show that LIFO buckets result in a 35% improvement over random buckets and a 42% improvement over FIFO buckets. Furthermore, eliminating randomization from the bucket selection is of greater benefit to Fiduccia-Mattheyses performance than adding the Krishnamurthy gain vector. By combining insights from the LIFO gain buckets with those of Krishnamurthy's original work, a new higher-level gain formulation is proposed. This alternative formulation results in a further 16% reduction in the average cut cost when compared directly to the Krishnamurthy formulation for higher-level gains, assuming LIFO organization for the gain buckets.
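The data structure under study can be sketched as follows (a minimal sketch only: real FM implementations index buckets by gain offset and use doubly linked lists for O(1) removal; the class and method names here are invented):

```python
# An FM-style gain bucket array with LIFO selection: within the
# highest-gain bucket, the most recently inserted cell is moved first.
from collections import defaultdict

class GainBuckets:
    def __init__(self):
        self.buckets = defaultdict(list)   # gain -> stack of cells
        self.max_gain = None

    def insert(self, cell, gain):
        self.buckets[gain].append(cell)    # push: newest on top
        if self.max_gain is None or gain > self.max_gain:
            self.max_gain = gain

    def pop_best(self):
        # refresh max_gain if its bucket has been emptied
        while self.max_gain is not None and not self.buckets[self.max_gain]:
            gains = [g for g, b in self.buckets.items() if b]
            self.max_gain = max(gains) if gains else None
        if self.max_gain is None:
            return None
        return self.buckets[self.max_gain].pop()  # LIFO: last in, first out

b = GainBuckets()
for cell, gain in [("c1", 2), ("c2", 3), ("c3", 3)]:
    b.insert(cell, gain)
print(b.pop_best())  # 'c3' -- the newest cell among those of maximum gain
```

Swapping the final `pop()` for `pop(0)` would give the FIFO discipline the paper finds inferior; the LIFO stack keeps recently touched cells moving together, which is the locality effect the results suggest.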
In this paper, batch processing partitioning parameter identification algorithms are obtained using the "partitioning" approach to estimation. The algorithms, herein denoted the GPIA's, are applicable to linear as well as nonlinear systems and are derived by a natural application of the generalized partitioned algorithms (GPA's) of Lainiotis; namely, by selecting a natural partitioning of the augmented state vector (the system state and unknown parameters); by linearization of the model equations; and then by using, in an iterative fashion, the GPA algorithms for the augmented state. The relationships between the GPIA's and maximum-likelihood identification methods, which employ gradient-based numerical techniques to obtain a solution, are also established. An example of the application of the GPIA to aircraft parameter identification from actual flight test data is presented, as well as a direct comparison with the results obtained using an iterated extended Kalman filter algorithm.
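The state augmentation the derivation starts from can be sketched as follows (the notation is assumed for illustration, not taken from the paper):

```latex
% Augmented state: system state x_k stacked with the constant unknown
% parameters \theta, giving the natural partition the GPIA exploits.
z_k = \begin{bmatrix} x_k \\ \theta \end{bmatrix},
\qquad
z_{k+1} = \begin{bmatrix} f(x_k, \theta) \\ \theta \end{bmatrix}
        + \begin{bmatrix} w_k \\ 0 \end{bmatrix}
```

Linearizing f about the current estimates and iterating a partitioned filter over this augmented vector yields the identification scheme; the parameter block has no process noise, which is what makes the partition "natural".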
Some pedagogical aspects pertaining to the algorithms that are used to compute the discrete Fourier and the Hadamard transforms are considered. Elementary matrix partitioning techniques are used to illustrate the manner in which these algorithms work and how they are related. It is felt that this approach can be used to good advantage to introduce the student to this class of algorithms before proceeding with more rigorous developments.
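For the Hadamard case, the block-partitioned view is the Sylvester recursion H_2N = [[H_N, H_N], [H_N, -H_N]], which is exactly the structure a fast transform exploits. A small sketch (function name assumed):

```python
# Build the order-n Sylvester-Hadamard matrix from 2x2 block partitions.

def hadamard(n):
    """Hadamard matrix of order n, where n is a power of two."""
    if n == 1:
        return [[1]]
    h = hadamard(n // 2)
    top = [row + row for row in h]                    # [ H  H ]
    bottom = [row + [-x for x in row] for row in h]   # [ H -H ]
    return top + bottom

for row in hadamard(4):
    print(row)
```

Each halving of the order reuses the smaller matrix twice, so applying the transform via this partition costs O(n log n) operations instead of O(n^2), the same divide-and-conquer idea that underlies the FFT.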
ISBN: (Print) 9781538610428
The architectural trend towards heterogeneity has pushed heterogeneous computing to the fore of parallel computing research. Heterogeneous algorithms, often carefully hand-crafted, have been designed for several important problems from parallel computing such as sorting, graph algorithms, matrix computations, and the like. A majority of these algorithms follow a work partitioning approach where the input is divided into appropriately sized parts so that individual devices can process the "right" parts of the input. However, arriving at a good work partitioning is usually non-trivial and may require extensive empirical search. Such an extensive empirical search can potentially offset any gains accrued from heterogeneous algorithms. Other recently proposed approaches are, in general, also inadequate. In this paper, we propose a simple and effective technique for work partitioning in the context of heterogeneous algorithms. Our technique is based on sampling and can therefore adapt to both the algorithm used and the input instance. Our technique is generic in its applicability, as we demonstrate in this paper. We validate our technique on three problems: finding the connected components of a graph (CC), multiplying two unstructured sparse matrices (spmm), and multiplying two scale-free sparse matrices. For these problems, we show that using our method, we can find the required threshold to within 10% of the best possible thresholds.
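The core of the sampling idea can be sketched in a few lines (the device timings are simulated and the function name is invented; the paper's method operates on real device runs of the actual algorithm):

```python
# Run a small sample of the work on each device, estimate per-item
# throughput, and split the full input in proportion to the measurements.

def estimate_split(sample_time_dev_a, sample_time_dev_b, sample_size):
    rate_a = sample_size / sample_time_dev_a   # items per second on A
    rate_b = sample_size / sample_time_dev_b   # items per second on B
    return rate_a / (rate_a + rate_b)          # fraction of input for A

# Simulated sample timings: device A processed 1000 items in 0.2 s,
# device B processed 1000 items in 0.8 s.
frac_a = estimate_split(0.2, 0.8, 1000)
print(frac_a)  # 0.8 -> device A gets 80% of the items
```

Because the sample is drawn from the actual input and run through the actual algorithm, the estimated split adapts to both, which is the source of the input-sensitivity the abstract claims.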