ISBN: 9781424464432 (print)
As the Memory Wall remains a bottleneck for Chip Multiprocessors (CMPs), effective management of CMP last-level caches becomes of paramount importance in minimizing expensive off-chip memory accesses. For CMPs with private last-level caches, Cooperative Caching (CC) has been proposed to enable capacity sharing among private caches by spilling an evicted block from one cache to another. But this eviction-driven CC does not necessarily improve cache performance, since it implicitly favors applications with many block evictions regardless of their real capacity demand. The recent Dynamic Spill-Receive (DSR) paradigm improves CC by prioritizing, when spilling blocks, the applications that benefit more from extra capacity. However, the DSR paradigm only exploits the coarse-grained application-level difference in capacity demand, making it less effective because the non-uniformity exists at a much finer level. This paper (i) highlights the observation of cache set-level non-uniformity of capacity demand, and (ii) presents a novel L2 cache design, named SNUG (Set-level Non-Uniformity identifier and Grouper), that exploits this fine-grained non-uniformity to further enhance the effectiveness of cooperative caching. By utilizing a per-set shadow tag array and saturating counter, SNUG can identify whether a set should spill or receive blocks; by using an index-bit flipping scheme, SNUG can group peer sets for spilling and receiving in a flexible way, capturing more opportunities for cooperative caching. We evaluate our design through extensive execution-driven simulations on quad-core CMP systems. Our results show that for 6 classes of workload combinations our SNUG cache can improve CMP throughput by up to 22.3%, with an average of 13.9%, over the baseline configuration, while the state-of-the-art DSR scheme achieves an improvement of only up to 14.5%, and 8.4% on average.
ISBN: 9781424465330 (print)
Contemporary processors are becoming wider and more parallel, so developers must work hard to extract performance gains. An alternative computing paradigm is to use FPGA technology in a reconfigurable computing environment, where both software and hardware can be specified. This has the potential to realise substantial performance gains in a variety of applications; however, it is a daunting task, as hardware development is required to harness the benefits. In this research the acceleration of common data structures, with the priority queue (PQ) as a case study, has been explored in the context of such a reconfigurable computing environment. A Java-based hybrid hardware/software PQ has been developed that is a 'drop-in' replacement for a software implementation; this is achieved by strictly adhering to the same programming interface. The accelerated PQ has demonstrated up to 3x speedup when performing a minimum spanning tree graph computation. Taking this further, a suite of accelerated data structures represents an attractive way for developers to harness the potential of reconfigurable computing in the future across a wide gamut of application domains.
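The 'drop-in' idea can be illustrated with a minimal sketch. It is written in Python rather than the paper's Java, and every name in it (`HeapPQ`, `SortedListPQ`, `mst_weight`, the `push`/`pop` interface) is a hypothetical stand-in, not the authors' API. The point is only that a minimum spanning tree routine coded against one queue interface runs unchanged when the queue implementation behind that interface is swapped, which is what would allow a hardware-backed queue to replace a software one.

```python
import bisect
import heapq


class HeapPQ:
    """Software priority queue backed by a binary heap."""

    def __init__(self):
        self._heap = []

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, item))

    def pop(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


class SortedListPQ:
    """Alternative implementation exposing the same interface (sorted list)."""

    def __init__(self):
        self._items = []

    def push(self, priority, item):
        bisect.insort(self._items, (priority, item))

    def pop(self):
        return self._items.pop(0)

    def __len__(self):
        return len(self._items)


def mst_weight(graph, start, pq_factory=HeapPQ):
    """Total weight of a minimum spanning tree (Prim's algorithm).

    `graph` maps each node to a list of (neighbour, weight) pairs and is
    assumed to be connected and undirected.
    """
    visited = set()
    pq = pq_factory()
    pq.push(0, start)
    total = 0
    while len(pq) and len(visited) < len(graph):
        weight, node = pq.pop()
        if node in visited:
            continue  # stale entry: node was already reached more cheaply
        visited.add(node)
        total += weight
        for neighbour, edge_weight in graph[node]:
            if neighbour not in visited:
                pq.push(edge_weight, neighbour)
    return total


TOY_GRAPH = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 4), ("b", 2)],
}
```

Calling `mst_weight(TOY_GRAPH, "a", pq_factory=SortedListPQ)` exercises the same algorithm with the second queue; the caller's code does not change.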
Summary form only given. To effectively manage large-scale data centers and utility clouds, operators must understand current system and application behaviors. This requires continuous monitoring along with online analysis of the data captured by the monitoring system. As a result, there is a need to move to systems in which both tasks can be performed in an integrated fashion, so that they can better drive online system management. Coining the term 'monalytics' to refer to the combined monitoring and analysis systems used for managing large-scale data center systems, this talk articulates principles for monalytics systems, describes software approaches for implementing them, and provides experimental evaluations justifying the principles and the implementation approach. Specific technical contributions include consideration of scalability across both 'space' and 'time', the ability to dynamically deploy and adjust monalytics functionality at multiple levels of abstraction in target systems, and the capability to operate across the range of layers, from application to hypervisor, present in large-scale data center or cloud computing systems. Our monalytics implementation targets virtualized systems and cloud infrastructures, via the integration of its functionality into the Xen hypervisor.
ISBN: 9781617820199 (print)
Micro-electrical and micro-mechanical systems for manipulating small components have become more and more important in recent years. Against this background, a small parallel robot was developed at the Institute for Microtechnology in Braunschweig. The "microrobot" is fabricated on a ceramic wafer using key microtechnologies. It is driven by three Lorentz-force actuators, which are arranged radially around a delta-shaped end effector [1]. Besides the three actuators, the parallel structure of the microrobot is composed of several microsprings, microjoints, couplers, and the end effector in the center. The position of each actuator is acquired with comb-shaped capacitive sensors. For fast and precise measurement, the sensors are integrated directly into the drivetrain. Each sensor consists of two fixed electrodes with an SU-8 insulation layer and a flexible electrode in between. In this context, one important point is the precise positioning of the end effector. For this reason the inverse kinematics of the entire system has to be solved; that is, the displacement of each actuator has to be calculated for a predefined end effector position. The calculation is based on the model of an ideal robotic system. For use in robotic applications the end effector has to be displaced continuously. Hence, a fast and precise control algorithm is necessary. This paper reports on the development and realization of this closed-loop control as well as the analysis of the sensor signals. Furthermore, the software algorithms for a LabVIEW®-based program are described.
Summary form only given: Apache Hadoop has become the platform of choice for developing large-scale data-intensive applications. In this tutorial, we will discuss the design philosophy of Hadoop and describe how to design and develop Hadoop applications and higher-level application frameworks to crunch several terabytes of data, using anywhere from four to 4,000 computers. We will discuss solutions to common problems encountered in maximizing Hadoop application performance. We will also describe several frameworks and utilities developed using Hadoop that increase programmer productivity and application performance.
This paper defines switched systems and the structure of their solutions. Using railway signal systems as the setting, we propose a switched computation method for railway routes. The macro route and cent route are computed from the switched matrices or transfer matrices. Traditionally, routes are found by depth-first or breadth-first search of the graph. The switched route computation method can be implemented conveniently with Matlab programming and matrix operations. A practical instance of a railway station and its experimental data are analyzed with this computation algorithm.
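The abstract does not give the switched or transfer matrices themselves, so the following is only a generic sketch of how matrix operations can stand in for graph search when enumerating routes. The track layout `TOY_ADJ` and the function names are hypothetical. It relies on the standard fact that entry (i, j) of the k-th power of an adjacency matrix counts the walks of exactly k steps from node i to node j.

```python
def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def walks_of_length(adj, length):
    """Return adj raised to the power `length`.

    Entry [i][j] is the number of routes from node i to node j that
    traverse exactly `length` track sections.
    """
    n = len(adj)
    # Start from the identity matrix (zero-length walks).
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        result = mat_mul(result, adj)
    return result


# Hypothetical layout: node 0 branches to nodes 1 and 2, which both
# rejoin at node 3. Rows are "from", columns are "to".
TOY_ADJ = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
```

For this layout, `walks_of_length(TOY_ADJ, 2)[0][3]` is 2, reflecting the two alternative routes from node 0 to node 3.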
Today's commercial off-the-shelf computer systems are multicore computing systems that combine CPUs, graphics processors (GPUs) and custom devices. In comparison with CPU cores, graphics cards are capable of running hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the target architecture. In this paper we report our experience in implementing a solution to the N-Queens puzzle on GPUs using Nvidia's CUDA (Compute Unified Device Architecture) technology. Using the example of memory usage and memory access, we demonstrate that optimizations of CUDA programs may have contrary results on different CUDA architectures. The evaluation results show that it is not sufficient to use new programming languages or compilers to achieve the best results with emerging graphics card computing.
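For readers unfamiliar with the puzzle, a minimal sequential sketch of an N-Queens solution counter follows. It is plain Python, not the paper's CUDA code, and the function name `count_n_queens` and the bitmask formulation are assumptions chosen for illustration. Each recursive call places one queen per row; the choices for the first row lead to independent subproblems, which is the kind of structure a parallel port could distribute across compute units.

```python
def count_n_queens(n: int) -> int:
    """Count placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # bitmask with one bit per column

    def place(cols: int, diag_left: int, diag_right: int) -> int:
        # cols: columns already occupied; diag_*: squares in the current
        # row attacked along the two diagonal directions.
        if cols == full:
            return 1  # every row has a queen
        total = 0
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            total += place(
                cols | bit,
                ((diag_left | bit) << 1) & full,
                (diag_right | bit) >> 1,
            )
        return total

    return place(0, 0, 0)
```

For the classic 8 x 8 board, `count_n_queens(8)` returns 92.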
The performance analysis of distributed software systems is a challenging task in which the assessment of performance measures is a vital step. Due to its versatility, the concept of software performance engineering (SPE) has been advocated as a promising solution towards realizing that step. This paper illustrates how, using our recently proposed Model-Driven SPE (MDSPE) approach, one can design annotated UML performance models for the performance analysis of distributed software systems, based on the UML profile for Schedulability, Performance and Time. A case study of a business system is used to validate the stated goal.
ISBN: 9781424465330 (print)
We present Deetoo, an algorithm to perform completely general queries, for instance high-dimensional proximity queries or regular expression matching, on a P2P network. Deetoo is an efficient unstructured query system on top of existing structured P2P ring topologies. Deetoo provides a reusable search tool to work alongside a DHT; thus, it provides new capabilities while reusing existing P2P models and software. Since our algorithm is for unstructured search, there is no structural relationship between the queries and the network topology and hence no need to provide a mapping of queries onto a fixed DHT structure. Deetoo is optimal in terms of the trade-off between querying and caching cost. For networks of size N, O(√N) cost for both caching and querying is required to achieve a constant (in N) search success probability. Queries execute in O(log² N) time.
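One way to see why √N is the natural scale for this trade-off is a simplified random-sampling model, which is an assumption made here for illustration and not a description of Deetoo's actual mechanism on the ring. Suppose an object is cached on c of the N nodes and a query inspects q distinct nodes chosen uniformly at random. The sketch below computes the exact miss probability and shows that with c = q = √N the success probability stays roughly constant (about 1 - 1/e) as N grows. The function names are hypothetical.

```python
import math


def miss_probability(n: int, cached: int, queried: int) -> float:
    """Probability that none of `queried` uniformly chosen distinct nodes
    is among the `cached` nodes holding the object, out of n nodes."""
    if cached + queried > n:
        return 0.0  # pigeonhole: the query must hit a caching node
    # Hypergeometric miss probability: C(n - cached, queried) / C(n, queried)
    return math.comb(n - cached, queried) / math.comb(n, queried)


def success_at_sqrt_scale(n: int) -> float:
    """Search success probability when both costs equal floor(sqrt(n))."""
    k = math.isqrt(n)
    return 1.0 - miss_probability(n, k, k)
```

Under this model, `success_at_sqrt_scale(10_000)` and `success_at_sqrt_scale(1_000_000)` are both close to 0.63, whereas shrinking either cost well below √N drives the success probability toward zero.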
ISBN: 9781424474226; 9780769540887 (print)
In this paper, we consider a methodology that utilizes qualitative expert knowledge for inference in a Bayesian network. The decision-making assumptions and the mathematical equation for Bayesian inference are derived based on data and knowledge obtained from experts. A detailed method to transform knowledge into a set of qualitative statements and an "a priori" distribution for Bayesian probabilistic models is proposed. We also propose a simplified method for constructing the "a priori" model distribution. Each statement obtained from the experts is used to constrain the model space to the subspace that is consistent with the statement provided. Finally, we present qualitative knowledge models and then show a full formalism of how to translate a set of qualitative statements into probability inequality constraints.