Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data are frequently transferred among processing elements through the network-on-chip ...
详细信息
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data are frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on the three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination; thus it can transfer data with multiple destinations in a single transfer. Moreover, the router adopts output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Based on the basic theory and deficient points of Model-free Learning Adaptive Control (MFLAC) method, the Multi-innovation (MI) theory and Particle Swarm Optimization (PSO) algorithm were introduced into the improved...
详细信息
The challenge for on-chip networks is to provide low latency communication in a very low power budget. To reduce the latency and keep the simplicity of a mesh network, torus network is proposed. As torus networks have...
详细信息
The problem of selecting the patterns to be learned by any model is usually not considered by the time of designing the concrete model but as a preprocessing step. Information theory provides a robust theoretical fram...
详细信息
ISBN:
(纸本)2930307099
The problem of selecting the patterns to be learned by any model is usually not considered by the time of designing the concrete model but as a preprocessing step. Information theory provides a robust theoretical framework for performing input variable selection thanks to the concept of mutual information. Recently the computation of the mutual information for regression tasks has been proposed so this paper presents a new application of the concept of mutual information not to select the variables but to decide which prototypes should belong to the training data set in regression problems. The proposed methodology consists in deciding if a prototype should belong or not to the training set using as criteria the estimation of the mutual information between the variables. The novelty of the approach is to focus in prototype selection for regression problems instead of classification as the majority of the literature deals only with the last one. Other element that distinguishes this work from others is that it is not proposed as an outlier identificator but as an algorithm that determines the best subset of input vectors by the time of building a model to approximate it. As the experiment section shows, this new method is able to identify a high percentage of the real data set when it is applied to a highly distorted data sets.
Efficiency of batch processing is becoming increasingly important for many modern commercial service centers, e.g., clusters and cloud computing datacenters. However, periodical resource contentions have become the ma...
详细信息
Efficiency of batch processing is becoming increasingly important for many modern commercial service centers, e.g., clusters and cloud computing datacenters. However, periodical resource contentions have become the major performance obstacles for concurrently running applications on mainstream CMP servers. I/O contention is such a kind of obstacle, which may impede both the co-running performance of batch jobs and the system throughput seriously. In this paper, a dynamic I/O-aware scheduling algorithm is proposed to lower the impacts of I/O contention and to enhance the co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in Dawning Linux Cluster. Fifteen workloads ranging from 8 jobs to 256 jobs are evaluated. Our experimental results show significant improvements on the throughputs of the workloads, which range from 7% to 431%. Meanwhile, noticeable improvements on the slowdown of workloads and the average runtime for each job can be achieved. These results show that a well-tuned dynamic I/O-aware scheduler is beneficial for batch-mode services. It can also enhance the resource utilization via throughput improvement on modern service platforms.
The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector ma...
详细信息
A trend of distance underestimations in Virtual Reality (VR) is well documented, but the reason still remains unclear. Therefore, this paper investigates the effect of differently sized Virtual Environments (VEs) on e...
详细信息
Digitized substation is the substation automation technology of the popular development direction, compared with the traditional transformer substation equipment, equipments and information system has had a great chan...
详细信息
Today's approaches towards heterogeneous computing rely on either the programmer or dedicated programming models to efficiently integrate heterogeneous components. In this work, we propose an adaptive cost-aware f...
详细信息
ISBN:
(纸本)9781450302418
Today's approaches towards heterogeneous computing rely on either the programmer or dedicated programming models to efficiently integrate heterogeneous components. In this work, we propose an adaptive cost-aware function-migration mechanism built on top of a light-weight hardware abstraction layer. With this mechanism, the highly dynamic task of choosing the most beneficial processing unit will be hidden from the programmer while causing only minor variation in the work and program flow. The migration mechanism transparently adapts to the current workload and system environment without the necessity of JIT compilation or binary translation. Evaluation shows that our approach successfully adapts to new circumstances and predicts the most beneficial processing unit (PU). Through fine-grained PU selection, our solution achieves a speedup of up to 2.27 for the average kernel execution time but introduces only a marginal overhead in case its services are not required. Copyright 2011 ACM.
This paper discusses software pipelining for a new class of architectures that we call transport-triggered. These architectures reduce the interconnection requirements between function units. They also exhibit code sc...
详细信息
暂无评论