ISBN: 9781424437511 (print)
Nonblocking collective communication operations are currently being considered for inclusion into the MPI standard and are an area of active research. The benefits of such operations are documented by several recent publications, but so far research concentrates on InfiniBand clusters. This paper describes an implementation of nonblocking collectives for clusters with the Scalable Coherent Interface (SCI) interconnect. We use synthetic and application kernel benchmarks to show that performance enhancements can be achieved on SCI systems with nonblocking functions for collective communication. Our results indicate that, for the implementation of these nonblocking collectives, data transfer methods other than those usually used for the blocking version should be considered to realize such improvements.
ISBN: 9781424437511 (print)
Many intrusion detection and tolerance systems have been proposed to detect attacks in both wired and wireless networks. Although these solutions have shown some efficiency in detecting a set of complex attacks in wireless environments, they are unable to detect attacks that use transaction-based traffic in wireless environments. In this context, we propose an intrusion detection and tolerance scheme that is able to monitor heterogeneous traffic and to detect and tolerate attacks targeting transaction-based applications interoperating in wireless environments. A case study illustrates the capabilities of the proposed system against a complex attack scenario targeting a multi-player wireless gaming service.
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
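The quoted line rate follows directly from the packet rate and the minimum packet size. The short check below uses only the figures stated in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Figures taken from the abstract; names are illustrative.
packets_per_second = 10_000_000_000   # 10 billion packets per second
min_packet_bytes = 40                 # minimum-size packet
bits_per_byte = 8

# Line rate in bits per second for back-to-back minimum-size packets.
throughput_bps = packets_per_second * min_packet_bytes * bits_per_byte

# Convert to decimal terabits per second (1 Tbps = 10**12 bit/s).
throughput_tbps = throughput_bps / 10**12

print(f"{throughput_tbps:.1f} Tbps")
```

Averaged over the 8 pipelines, this corresponds to 1.25 billion lookups per second per pipeline, assuming the traffic is spread evenly.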
ISBN: 9780769538686 (print)
Interest management, which seeks to filter irrelevant messages on the network, is essential for real-time large-scale distributed virtual environments (DVEs). Many existing interest management schemes, such as HLA DDM, focus on providing precise message filtering mechanisms. However, this leads to a second problem: the computational overhead of the interest matching process. If the CPU cost of interest matching is too high, the scheme is unsuitable for real-time applications such as multiplayer online games, for which runtime performance is important. This paper evaluates the performance of existing interest matching algorithms and proposes a new algorithm based on parallel processing. The new algorithm is expected to have better computational efficiency than existing algorithms while maintaining the same accuracy of message filtering. Experimental evidence shows that our approach works well in practice.
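To make the cost of interest matching concrete, the following is a minimal sketch of a brute-force region-overlap test of the kind used in region-based schemes such as HLA DDM. It is a baseline illustration only, not the parallel algorithm proposed in the paper; the `Region` type, the 2D layout, and the function names are assumptions made for this example.

```python
from itertools import product
from typing import NamedTuple


class Region(NamedTuple):
    """Axis-aligned 2D extent (hypothetical layout for illustration)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def overlaps(a: Region, b: Region) -> bool:
    # Two extents overlap only if their ranges intersect in every dimension.
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max)


def match_all(updates, subscriptions):
    # Brute force: test every (update, subscription) pair, n * m tests in total.
    return [(i, j)
            for (i, u), (j, s) in product(enumerate(updates),
                                          enumerate(subscriptions))
            if overlaps(u, s)]


updates = [Region(0, 2, 0, 2), Region(5, 6, 5, 6)]
subscriptions = [Region(1, 3, 1, 3), Region(10, 12, 10, 12)]
matches = match_all(updates, subscriptions)
print(matches)
```

Each pair test is independent of the others, which is one reason the matching step is a natural candidate for parallel processing.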
ISBN: 9780769538686 (print)
The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.
ISBN: 9781424437511 (print)
Automatic differentiation is the primary means of obtaining analytic derivatives from a numerical model given as a computer program. Therefore, it is an essential productivity tool in numerous computational science and engineering domains. Computing gradients with the adjoint (also called reverse) mode via source transformation is a particularly beneficial but also challenging use of automatic differentiation. To date, only ad hoc solutions for adjoint differentiation of MPI programs have been available, forcing automatic differentiation tool users to reason about parallel communication dataflow and dependencies and to develop adjoint communication code manually. Using the communication graph as a model, we characterize the principal problems of adjoining the most frequently used communication idioms. We propose solutions to cover these idioms and consider the consequences for the MPI implementation, the MPI user, and MPI-aware program analysis. The MIT general circulation model serves as a use case to illustrate the viability of our approach.
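The core difficulty can be seen in the simplest idiom: in adjoint mode the data flow of a send/receive pair is reversed. The sketch below simulates this in a single process, with a `deque` standing in for a point-to-point message channel. It illustrates the general principle only and is not the paper's implementation; the two "ranks" A and B, the channel, and all variable names are assumptions for this example.

```python
from collections import deque

channel = deque()  # stand-in for a point-to-point channel (assumption, not MPI)

# Forward sweep: rank A sends x; rank B receives it into v and computes y = 3 * v.
x = 2.0
channel.append(x)            # A: send(x)
v = channel.popleft()        # B: recv(v)
y = 3.0 * v

# Reverse sweep: adjoints travel in the opposite direction.
y_bar = 1.0                  # seed for dy/dy
v_bar = 3.0 * y_bar          # adjoint of y = 3 * v
channel.append(v_bar)        # B: the adjoint of recv(v) is a send of v_bar
v_bar = 0.0                  # v was overwritten by the receive, so reset its adjoint

x_bar = 0.0
x_bar += channel.popleft()   # A: the adjoint of send(x) is a receive that increments x_bar

print(x_bar)                 # dy/dx accumulated on rank A
```

Here y = 3 * x, so the accumulated adjoint `x_bar` equals the derivative dy/dx. Nonblocking and collective idioms require more care than this blocking pair, which is the kind of problem the abstract describes.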
ISBN: 9780769536804 (print)
With the ever-increasing demand for high-quality 3D image processing in markets such as cinema and gaming, the capabilities of graphics processing units (GPUs) have shown tremendous advances. Although GPU-based cluster computing, which uses GPUs as the processing units, is one of the most promising high performance parallel computing platforms, there is currently no programming environment, interface, or library designed to use these multiple computing resources to compute tasks in parallel. This paper proposes CaravelaMPI, a new message passing interface targeted at GPU cluster computing, which provides a unified and transparent interface to manage both communication and GPU execution. Experimental results show that the transparent interface of CaravelaMPI allows GPU-based clusters to be programmed efficiently, not only decreasing the required programming effort but also increasing the performance of GPU-based cluster computing platforms.
In this paper, a new signal detection scheme using both parallel interference cancellation (PIC) and equalization for the efficient joint distributed space-time coding is proposed to suppress the impact of imperfect s...
Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size ...
ISBN: 9781424449262 (print)
Network simulation faces an increasing demand for highly detailed simulation models, which in turn require efficient handling of their inherent computational complexity. This demand for detailed models includes both accurate estimations of processing time and in-depth modeling of wireless technologies. For instance, one might want to investigate whether a particular device can incorporate a computationally complex radio transmission technology while meeting the deadlines of a multimedia streaming application such as VoIP.