The current trend toward heterogeneous architectures motivates us to reconsider current software and hardware paradigms. The focus is centered around new parallel programming models, compiler design, and runtime resou...
This paper describes the incorporation of the IEEE-TCPP Curriculum Initiative into CS 2 at the University of Illinois at Urbana-Champaign. With control over only one course that requires a semi-rigid curriculum, we de...
Network signature matching is an important task in many applications such as network security and traffic analysis, which generally rely on a flexible signature matching system to extract important information from each processed packet. The task is computation- and data-intensive, and requires significant processing time when performed sequentially. To accelerate signature matching of gigabit network traffic, we exploit its inherent parallelism using graphics processing units (GPUs). In this paper, we present a detailed analysis of signature matching along with a system design for GPUs. The proposed signature matching scheme is based on port matching and keyword matching in each packet header. A real system was implemented on GPUs to evaluate the efficacy of our design. Experimental results show that signature matching can be performed efficiently on GPUs.
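The abstract names port matching and keyword matching as the two checks applied to each packet. The following is a minimal sequential sketch of that idea in Python, not the paper's GPU implementation. The `Signature` and `Packet` types, their field names, and the rule that both checks must succeed are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signature:
    # Hypothetical rule layout: a name, a destination port, and a keyword.
    name: str
    port: int
    keyword: bytes


@dataclass(frozen=True)
class Packet:
    # Hypothetical packet view: destination port plus the bytes to search.
    dst_port: int
    header: bytes


def match_packet(packet: Packet, signatures: List[Signature]) -> List[str]:
    """Return the names of signatures whose port and keyword both match."""
    return [
        sig.name
        for sig in signatures
        if sig.port == packet.dst_port and sig.keyword in packet.header
    ]


def match_all(packets: List[Packet], signatures: List[Signature]) -> List[List[str]]:
    # Each packet is matched independently of the others, which is the
    # kind of parallelism a one-thread-per-packet GPU mapping could exploit.
    return [match_packet(p, signatures) for p in packets]


if __name__ == "__main__":
    rules = [Signature("http-get", 80, b"GET"), Signature("ssh", 22, b"SSH-")]
    traffic = [Packet(80, b"GET /index.html"), Packet(22, b"hello")]
    print(match_all(traffic, rules))
```

Because no packet's result depends on another packet, the outer loop in `match_all` is the natural candidate for parallel execution.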
Porting software to different platforms can require modifications of the application. One of the issues is that the targeted hardware supports another memory consistency model. As a consequence, the completion order o...
The problem size of the stencil computation on GPU is limited by the GPU memory capacity, which is typically smaller than that of host memory. This paper proposes and evaluates a multi-level optimization method for st...
This paper presents a high performance GPU accelerated implementation of 2-opt local search algorithm for the Traveling Salesman Problem (TSP). GPU usage significantly decreases the execution time needed for tour opti...
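For readers unfamiliar with 2-opt, the following is a minimal sequential sketch of the standard local search in Python. It is not the GPU implementation the paper describes; the function names and the first-improvement strategy are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour: Sequence[int], pts: Sequence[Point]) -> List[int]:
    """Repeat improving 2-opt moves until no move shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city, nothing to swap
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the edges applies the move.
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    best = two_opt([0, 1, 2, 3], square)
    print(best, tour_length(best, square))
```

Each candidate pair `(i, j)` is evaluated with a constant number of distance computations, and the evaluations within one pass do not depend on each other, which is what makes the move evaluation a plausible target for parallel hardware.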
High content throughput imaging systems must apply time-consuming, complex image processing algorithms to multiple biomedical image *** systems are typically designed to use parallel resources in order to achieve results in reasonable time *** paper presents the design of a distributed framework that separates the largely orthogonal parallelisation from the domain image processing algorithm *** allows reuse and pluggable extension of parallelising patterns, as well as providing for extensibility of domain image processing.
In recent years, heterogeneous clusters using accelerators have been widely used in high performance computing systems. In such clusters, inter-node communication among accelerators requires several memory copies via ...
ISBN: (Print) 9781467360661
The performance gap between computing power and the I/O system is ever increasing, while more and more High Performance Computing (HPC) applications are becoming data intensive. This study describes an I/O data replication scheme, named the Pattern-Direct and Layout-Aware (PDLA) data replication scheme, to alleviate this performance gap. The basic idea of PDLA is to replicate the data of identified access patterns and to save these reorganized replicas with data layouts optimized on the basis of access cost analysis. A runtime system is designed and developed to integrate the PDLA replication scheme with the existing parallel I/O system; a prototype of PDLA is implemented under the MPICH2 and PVFS2 environments. Experimental results show that PDLA is effective in improving the data access performance of parallel I/O systems.
ISBN: (Print) 9781479913725
Wireless Mesh Networks (WMNs) provide a promising foundation for a flexible and reliable communication infrastructure in industrial environments. Meeting the QoS demands of real-time applications, though, requires the deployment of various advanced mechanisms. Compared to wired networks, applications face higher packet loss rates in wireless networks due to the inherent unreliability of wireless communication. Furthermore, if mobile stations are involved, links that fail due to node movement frequently cause packet losses. In this paper, we present an approach to tolerate those specific losses by locally recovering lost packets and transiently re-routing them over an alternative path. The evaluation in real-world experiments shows that we can completely prevent packet loss without significantly increasing the end-to-end latency. This allows the deployment of WMNs for real-time applications without explicitly considering the increased error-proneness of wireless communication and station mobility.