ISBN: (print) 9781538658154
The emergence of 5G networks and real-time applications across networks has a strong impact on the performance requirements of IP lookup engines. These engines must support not only high-bandwidth but also low-latency lookup operations. This paper presents the hardware architecture of a low-latency IPv6 lookup engine capable of supporting the bandwidth of current Ethernet links. The engine implements the SHIP lookup algorithm, which exploits prefix characteristics to build a compact and scalable data structure. The proposed hardware architecture leverages the characteristics of the data structure to support low-latency lookup operations while making efficient use of memory. The architecture is described in C++, synthesized with a high-level synthesis tool, then implemented on a Virtex-7 FPGA. Compared to the proposed IPv6 lookup architecture, other well-known approaches use at least 87% more memory per prefix, while increasing the lookup latency by a factor of 2.3x.
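The abstract does not describe SHIP's internals, so as an illustration only, here is a minimal binary-trie longest-prefix-match (LPM) lookup: the generic operation any IP lookup engine performs. The trie layout, bit-at-a-time traversal, and next-hop names are assumptions for this sketch, not the SHIP data structure.

```python
# Minimal longest-prefix-match (LPM) over a binary trie.
# Illustrates the generic IP lookup problem; NOT the SHIP structure,
# whose details are not given in the abstract.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # 0-bit and 1-bit branches
        self.next_hop = None          # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop   # remember the longest match so far
        node = node.children[b]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, [0, 0, 1], "hop-A")      # hypothetical prefix 001/3
insert(root, [0, 0, 1, 1], "hop-B")   # hypothetical prefix 0011/4
print(lookup(root, [0, 0, 1, 1, 0]))  # hop-B: the longer match wins
```

A hardware engine like the one described pipelines this traversal so that one lookup completes per cycle; the software loop above only shows the matching logic.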
ISBN: (print) 9781605585871
As the computing ability of high-performance computers is improved by increasing the number of computing elements, how to utilize the available computing resources becomes an important issue. Different strategies to solve a problem on a multi-processing system can bring about distinct performance. In this paper, we propose a method to predict the performance of parallel applications. The method describes the parallel features of multi-processing systems in a hierarchical way and evaluates candidate solutions based on that description. In this way, programmers can find the better solution for an application before actually programming it.
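The abstract does not specify the prediction model. As a hedged stand-in, the sketch below uses Amdahl's law, the simplest way to compare candidate parallel solutions before writing any code; the serial fractions and processor count are hypothetical.

```python
# Amdahl's law: a stand-in performance predictor (NOT the paper's model).
# Speedup is bounded by the serial fraction of the work.

def amdahl_speedup(serial_fraction, n_processors):
    """Predicted speedup of a program whose serial part is
    `serial_fraction` of the total work, on `n_processors`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Comparing two hypothetical solutions on a 64-processor system:
print(round(amdahl_speedup(0.05, 64), 2))  # ~15.42
print(round(amdahl_speedup(0.20, 64), 2))  # ~4.71
```

Even this crude estimate shows why evaluating a solution's parallel structure before programming it, as the paper proposes, can steer programmers away from designs with a large serial fraction.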
ISBN: (print) 9781467308243; 9781467308267
Intelligently partitioning the last-level cache within a chip multiprocessor can bring significant performance improvements. Resources are given to the applications that can benefit most from them, restricting each core to a number of logical cache ways. However, although overall performance is increased, existing schemes fail to consider energy savings when making their partitioning decisions. This paper presents Cooperative Partitioning, a runtime partitioning scheme that reduces both dynamic and static energy while maintaining high performance. It works by enforcing cached data to be way-aligned, so that a way is owned by a single core at any time. Cores cooperate with each other to migrate ways between themselves after partitioning decisions have been made. Upon access to the cache, a core needs only to consult the ways that it owns to find its data, saving dynamic energy. Unused ways can be power-gated for static energy savings. We evaluate our approach on two-core and four-core systems, showing that we obtain average dynamic and static energy savings of 35% and 25% compared to a fixed partitioning scheme. In addition, Cooperative Partitioning maintains high performance while transferring ways five times faster than an existing state-of-the-art technique.
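The way-aligned ownership idea can be sketched in a few lines: each way belongs to exactly one core, a core probes only the ways it owns, and unowned ways could be power-gated. This is a toy software model with illustrative names and a trivial placement policy, not the paper's hardware mechanism.

```python
# Toy model of way-aligned LLC ownership (illustrative, not the paper's
# implementation). Assumes an 8-way cache shared by two cores.

class WayPartitionedCache:
    def __init__(self, n_ways=8):
        self.owner = [None] * n_ways                 # which core owns each way
        self.data = [dict() for _ in range(n_ways)]  # way -> {tag: value}

    def assign(self, way, core):
        """Migrate a way to a core (cooperative repartitioning).
        Simplification: the way is flushed on migration."""
        self.owner[way] = core
        self.data[way].clear()

    def ways_of(self, core):
        return [w for w, o in enumerate(self.owner) if o == core]

    def probe(self, core, tag):
        # A core consults only the ways it owns: fewer way lookups
        # per access is the source of the dynamic-energy saving.
        for w in self.ways_of(core):
            if tag in self.data[w]:
                return self.data[w][tag]
        return None

    def fill(self, core, tag, value):
        ways = self.ways_of(core)
        if not ways:
            raise RuntimeError("core owns no ways")
        self.data[ways[0]][tag] = value  # trivial placement policy

cache = WayPartitionedCache()
for w in range(6):
    cache.assign(w, 0)   # core 0 gets 6 ways
cache.assign(6, 1)       # core 1 gets 2 ways
cache.assign(7, 1)
cache.fill(0, "addr42", "payload")
print(cache.probe(0, "addr42"))  # payload
print(cache.probe(1, "addr42"))  # None: core 1 never probes core 0's ways
```

Any way owned by no core would be a candidate for power-gating, which is where the static-energy saving described in the abstract comes from.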
ISBN: (print) 9781538677698
Fog computing extends the Cloud computing paradigm to the edge of the network, developing a decentralized infrastructure in which services are distributed to the locations that best meet the needs of the applications, such as low communication latency, data caching or confidentiality. P2P-based platforms are good candidates to host Fog computing, but they usually lack important features such as control over where data is stored and who handles the computing tasks. As a consequence, controlling where the data is stored becomes as important as controlling who handles it. In this paper, we propose different techniques to reinforce data locality for P2P-based middlewares, and study how these techniques can be implemented. Experimental results demonstrate the benefit of data locality for data access performance.
ISBN: (print) 9781479987818
Recent Graphics Processing Units (GPUs) have employed cache memories to boost performance. However, cache memories are well known to be harmful to time predictability for CPUs. For high-performance real-time systems using GPUs, it remains unknown whether or not cache memories should be employed. In this paper, we quantitatively compare the performance of GPUs with and without caches, and find that GPUs without the cache actually achieve better average-case performance, with higher time predictability. However, we also study a profiling-based cache bypassing method, which can use the L1 data cache more efficiently to achieve better average-case performance than that without the cache. Therefore, it seems still beneficial to employ caches for real-time computing on GPUs.
ISBN: (print) 9781479945917
Multi-level cell Spin-Transfer Torque RAM (MLC STT-RAM) greatly suffers from significantly degraded operation reliability and high programming cost. In this paper, a novel MLC design, namely ternary-state MLC (TS-MLC STT-RAM), is proposed for highly reliable, high-performance memory systems by leveraging a cross-layer solution set. Based on this structure, several circuit and architecture schemes are proposed to improve both the reliability and the access latency of the memory cells.
ISBN: (print) 0780350049
Middleware promotes interoperability as well as provides transparent location of servers in heterogeneous client-server environments. Although a number of benefits are provided by middleware, careful consideration of system architecture is required to achieve high performance. Based on implementation and measurements made on a network of workstations running a commercial CORBA-compliant ORB called ORBeline, this paper is concerned with the impact of client-agent-server interaction architecture on performance. The paper reports on the relative performance of three interaction architectures under different workload conditions. In particular, the impact of inter-node delays, message size, and request service times on the latency and scalability attributes of these architectures is analyzed. A method called agent cloning, and how it can be used to improve system performance, is described.
ISBN: (print) 9781467308243; 9781467308267
Current Chip Multiprocessors (CMPs) consist of several cores, cache memories and interconnection networks in the same chip. Private last-level cache (LLC) configurations assign a static portion of the LLC to each core. This provides lower latency and isolation, at the cost of depriving the system of the possibility of reassigning underutilized resources. A way of taking advantage of underutilized resources in other private LLCs in the same chip is to use the coherence mechanism to determine the state of those caches and spill lines to them. Also, it is well known that memory references are not uniformly distributed across the sets of a set-associative cache. Therefore, applying a uniform spilling policy to all the sets in a cache may not be the best option. This paper proposes Adaptive Set-Granular Cooperative Caching (ASCC), which measures the degree of stress of each set and performs spills between spiller and potential receiver sets, while it tackles capacity problems. Also, it adds a neutral state to prevent sets from being either spillers or receivers when that could be harmful. Furthermore, we propose Adaptive Variable-Granularity Cooperative Caching (AVGCC), which dynamically adjusts the granularity at which these policies are applied. Both techniques have a negligible storage overhead and can adapt to many-core environments using scalable structures. AVGCC improved average performance by 7.8% and reduced average memory latency by 27% relative to a traditional private LLC configuration in a 4-core CMP. Finally, we propose an extension of AVGCC to provide Quality of Service that increases the average performance gain to 8.1%.
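The spiller/receiver/neutral classification can be illustrated with a toy per-set policy. The stress metric (miss count plus occupancy) and the thresholds below are assumptions for illustration, not the ones used by ASCC.

```python
# Toy per-set classification in the spirit of set-granular cooperative
# caching. The metric and thresholds are illustrative assumptions.

def classify_sets(misses, occupancy, assoc, hi=8, lo=2):
    """misses[i]: misses observed at set i over an interval;
    occupancy[i]: resident lines in set i; assoc: cache associativity."""
    labels = []
    for m, occ in zip(misses, occupancy):
        if m >= hi and occ == assoc:
            labels.append("spiller")   # stressed: push victims to other sets
        elif m <= lo and occ < assoc:
            labels.append("receiver")  # underused: accept spilled lines
        else:
            labels.append("neutral")   # neither role would clearly help
    return labels

print(classify_sets(misses=[12, 1, 5], occupancy=[8, 3, 8], assoc=8))
# ['spiller', 'receiver', 'neutral']
```

The neutral label corresponds to the abstract's point that forcing a set to be a spiller or a receiver can be harmful; a set in between is simply left alone.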
ISBN: (print) 9781467308243; 9781467308267
Lowering supply voltage is one of the most effective techniques for reducing microprocessor power consumption. Unfortunately, at low voltages, chips are very sensitive to process variation, which can lead to large differences in the maximum frequency achieved by individual cores. This paper presents Booster, a simple, low-overhead framework for dynamically rebalancing performance heterogeneity caused by process variation and application imbalance. The Booster CMP includes two power supply rails set at two very low but different voltages. Each core can be dynamically assigned to either of the two rails using a gating circuit. This allows cores to quickly switch between two different frequencies. An on-chip governor controls the timing of the switching and the time spent on each rail. The governor manages a "boost budget" that dictates how many cores can be sped up (depending on the power constraints) at any given time. We present two implementations of Booster: Booster VAR, which virtually eliminates the effects of core-to-core frequency variation in near-threshold CMPs, and Booster SYNC, which additionally reduces the effects of imbalance in multithreaded applications. Evaluation using PARSEC and SPLASH2 benchmarks running on a simulated 32-core system shows an average performance improvement of 11% for Booster VAR and 23% for Booster SYNC.
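The "boost budget" idea can be modeled in a few lines: at most `budget` cores sit on the fast rail at any step, and the governor rotates the boost among cores. The round-robin policy and fixed step granularity are illustrative assumptions, not the governor described in the paper.

```python
# Toy model of a boost budget: at most `budget` cores on the fast rail
# at once, rotated round-robin. Policy and granularity are illustrative.

def rotate_boost(n_cores, budget, steps):
    """Return, per time step, the set of cores on the fast rail."""
    schedule = []
    for t in range(steps):
        boosted = {(t + i) % n_cores for i in range(budget)}
        schedule.append(boosted)
    return schedule

sched = rotate_boost(n_cores=4, budget=2, steps=4)
for t, cores in enumerate(sched):
    print(f"step {t}: fast rail -> cores {sorted(cores)}")
# Under round-robin, every core is boosted budget/n_cores of the time:
time_boosted = [sum(c in s for s in sched) for c in range(4)]
print(time_boosted)  # [2, 2, 2, 2]
```

A variation-aware governor like Booster VAR would instead bias the schedule toward the slowest cores, spending more of the budget where process variation cost the most frequency.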
The Advanced Telecommunications Computing Architecture (ATCA) and Micro Telecommunications Computing Architecture (μTCA) standards, intended for high-performance applications, offer an array of features that are comp...