The heterogeneous communication characteristics of clustered SMP systems create great potential for optimizations which favor physical locality. This paper describes a novel technique for automating such optimizations, applied to barrier operations. Portability poses a challenge when optimizing for locality, as costs are bound to variations in platform topology. This challenge is addressed through representing both platform structure and barrier algorithms as input data, and altering the algorithm based on benchmark results which can be easily obtained from a given platform. Our resulting optimization technique is empirically tested on two modern clusters, up to eight dual quad-core nodes on one, and up to ten dual hex-core nodes on another. Included test results show that the method captures performance advantages on both systems without any explicit customization, and produces specialized barriers of superior performance to a topology-neutral implementation.
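To make the locality idea concrete, here is a minimal sketch of a two-level barrier in which threads first synchronize inside their node and only one thread per node takes part in the inter-node step. It is not the paper's technique, which derives barriers from platform descriptions and benchmark data. The class name `HierarchicalBarrier`, the fixed two-level topology, and the use of Python threads in place of cluster processes are all assumptions made for illustration.

```python
import threading


class HierarchicalBarrier:
    """Two-level barrier: sync inside each node first, then across node leaders.

    Hypothetical illustration; the topology is hard-coded as nodes x cores.
    """

    def __init__(self, nodes, cores_per_node):
        self.local = [threading.Barrier(cores_per_node) for _ in range(nodes)]
        self.leaders = threading.Barrier(nodes)

    def wait(self, node):
        # Phase 1: cheap intra-node arrival; each thread gets a unique index.
        index = self.local[node].wait()
        # Phase 2: only one thread per node pays for the inter-node step.
        if index == 0:
            self.leaders.wait()
        # Phase 3: intra-node release, completed once the leader returns.
        self.local[node].wait()


def run_demo(nodes=2, cores_per_node=4):
    """Return the arrival count each thread observes after the barrier."""
    barrier = HierarchicalBarrier(nodes, cores_per_node)
    lock = threading.Lock()
    arrived = 0
    seen = []

    def worker(node):
        nonlocal arrived
        with lock:
            arrived += 1
        barrier.wait(node)
        with lock:
            seen.append(arrived)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(nodes) for _ in range(cores_per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

If the barrier is correct, every thread leaves it only after all threads have arrived, so each value returned by `run_demo` equals the total thread count.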
The size and complexity of large-scale distributed embedded systems such as automotive and process controls have increased recently. Sophisticated systems that are safe and environmentally friendly require numerous types of sensor data, which are collected from various devices and sent to computers through networks. To develop such large-scale distributed embedded systems with high dependability and productivity, we have developed a virtual execution environment platform. This platform integrates numerous CPU simulators and various device simulators through the network and provides network-wide simulation functionality. In this paper, we describe a fast CPU simulator and controlled object simulation for testing control software in a virtual software execution environment. The virtual environment integrates a CPU simulator and a controlled object simulator in order to test the functional behavior of embedded control software. The environment enables the developer to test control software at the same execution rate as a real system without the source code. This is very helpful because, in this industry, not all of the source code is provided.
The Xilinx Partial Reconfiguration Early Access software Tools for ISE 9.2i has been an instrumental package for performing a wide variety of research on Xilinx FPGAs, and is now superseded by the corresponding non-free add-on for ISE 12.3. The original package was developed and offered by Xilinx as a downloadable add-on package to the Xilinx ISE 9.2 tools. The 9.2i toolkit provided a methodology for creating rectangular partial reconfiguration (PR) modules that could be swapped in and out of a static baseline design with one or more PR slots. This paper presents a new PR toolkit called Open PR that, for starters, provides functionality similar to that of the Xilinx PR Toolkit, yet is extendable to explore other modes of partial reconfiguration. The distinguishing feature of this toolkit is that it is being released as open source, and is intended to be extended to meet the needs of individual researchers.
RFID (Radio Frequency Identification) is a promising automatic identification technology that uses communication via radio waves to identify and track moving objects. RFID systems, however, are susceptible to a variety of security problems, and fundamental solutions to RFID security vulnerabilities have not yet been proposed. This paper proposes an RFID tag mutual authentication scheme that resists security threats posed by inherent characteristics of low-cost RFID systems. The proposed scheme utilizes the modulo operation to improve the efficiency of tag retrieval in the back-end server and employs a hash function to provide enhanced security.
Many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, have been revolutionized by Fourier methods. The fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. The emerging class of high performance computing architectures, such as GPUs, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to programmers. However, the complexity of GPU programming poses a significant challenge for programmers. In this paper, based on the Kronecker product form of multi-dimensional FFTs, we propose an automatic performance tuning framework for various OpenCL GPUs. Several key techniques of GPU programming on AMD and NVIDIA GPUs are also identified. Our OpenCL FFT library achieves up to 1.5 to 4 times, 1.5 to 40 times and 1.4 times the performance of clAmdFft 1.0 for 1D, 2D and 3D FFT respectively on an AMD GPU, and the overall performance is within 90% of CUFFT 4.0 on two NVIDIA GPUs.
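The Kronecker product form mentioned above can be illustrated with a small pure-Python sketch: the 2D DFT of an m-by-n array equals the Kronecker product of the m-point and n-point DFT matrices applied to the row-major flattening of the array. This is a standard identity, not the paper's tuned OpenCL code. The function names are illustrative, and the dense matrices cost O((mn)^2), so the sketch only checks the algebra and says nothing about performance.

```python
import cmath


def dft_matrix(n):
    """Dense n-point DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def dft2d_kron(x):
    """2D DFT computed as (F_rows kron F_cols) applied to row-major vec(x)."""
    rows, cols = len(x), len(x[0])
    flat = [v for row in x for v in row]
    out = matvec(kron(dft_matrix(rows), dft_matrix(cols)), flat)
    return [out[r * cols:(r + 1) * cols] for r in range(rows)]


def dft2d_direct(x):
    """2D DFT computed straight from the defining double sum, for comparison."""
    rows, cols = len(x), len(x[0])
    return [[sum(x[j][l] * cmath.exp(-2j * cmath.pi * (u * j / rows + v * l / cols))
                 for j in range(rows) for l in range(cols))
             for v in range(cols)]
            for u in range(rows)]
```

Both functions should agree up to floating-point rounding on any rectangular input, for example `[[1, 2, 3], [4, 5, 6]]`. An FFT library exploits the same factorization without ever forming the dense matrices.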
The paper suggests an attack-tree-based approach to security analysis of information systems. The approach considers both software-technical and social engineering attacks. It extends the approach to network security analysis based on software-technical attacks which was suggested earlier by the authors of this paper. The main difference lies in generalizing the suggested approach to information systems and in the use of different concepts, models and frameworks related to social engineering attacks. In particular, we define the concepts of legitimate users and control areas. In addition, social engineering attacks and attacks that require physical access to control areas are included in the attack trees used for security analysis. The paper also describes a security analysis toolkit based on the suggested approach and experiments with it to determine the security level of an information system.
In heterogeneous multi-core systems, such as the Cell BE processor, each accelerator core has its own fast local memory without hardware-supported coherence, and the software is responsible for dynamically transferring data between the fast local memory and the slow global memory. The data can be transferred through either a software-controlled cache or a direct buffer. The software-controlled cache maintains correctness for arbitrary access patterns, but introduces the extra overhead of cache lookup. A direct buffer is efficient for regular accesses, but requires precise analysis, detailed modeling of execution, and significant code generation. In this paper we present the design and implementation of DMATiler, which combines compiler analysis and runtime management to optimize local memory performance via automatic loop tiling and buffer optimization techniques. The DMATiler chooses a data-transfer-friendly loop order and, using an empirically validated DMA performance model, formulates and solves a convex optimization problem to determine globally optimal tile sizes. Further, the DMATiler applies optimization techniques such as compressed data transfers and DMA commands to achieve the best DMA performance for a given loop nest. We have implemented the DMATiler in the IBM XL Single Source Compiler (SSC), and have conducted experiments with a set of loop nest benchmarks. The results show that the DMATiler is much more efficient than a software-controlled cache (average speedup of 9.8x) and single-level loop blocking (average speedup of 6.2x) on the Cell BE processor.
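For readers unfamiliar with loop tiling, the following sketch shows the basic transformation on a matrix multiplication: the iteration space is split into blocks so that each block touches only a small working set, which on a Cell-like system could be held in local memory. It does not model DMA transfers and does not reproduce the DMATiler's loop-order choice or tile-size optimization. The function names and the fixed `tile` parameter are hypothetical.

```python
def matmul_naive(a, b):
    """Reference matrix product using the plain triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product with all three loops blocked into tile-sized chunks.

    `tile` is a hand-picked assumption here; the paper instead derives tile
    sizes from a DMA performance model.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # loops over tile origins
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Work inside one tile; min() handles ragged edge tiles.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

Tiling changes only the order in which the multiply-add operations run, so for integer inputs `matmul_tiled` returns exactly the same result as `matmul_naive` for any positive tile size.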
The emergence of multi-core processor architectures and of diverse parallel computing paradigms has permeated the area of mainstream computing. In this paper we present various parallelization approaches to High Dynamic Range image creation, a rising technology employed in the field of image manipulation and processing. OpenMP and Pthreads implementation details are provided, and the performance and load-balancing capabilities of each approach are discussed, together with a scheme for overlapping I/O-intensive regions with CPU-bound sections of the code, on three relevant datasets with resolutions ranging from 1.2 to 20 megapixels.
Gene expression data is a very complex kind of data set, characterised by an abundant number of features but a low number of observations. However, only a small number of these features are relevant to an outcome of interest. With this kind of data set, feature selection becomes a real prerequisite. This paper proposes a methodology for feature selection on imbalanced leukaemia gene expression data based on the random forest algorithm. It presents the importance of feature selection in terms of reducing the number of features, enhancing the quality of machine learning and providing better understanding for biologists in diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Moreover, experiments are carried out using imbalanced leukaemia gene expression data, and a special measure is used to evaluate the quality of feature selection and the performance of classification.
Autonomic computing systems promise to manage themselves according to a set of basic rules specified in terms of higher-level objectives. One of the challenges in making this possible is dependable collaboration among peers in a large-scale network. Given the ever-increasing scale of next-generation distributed systems, such as clouds and second-generation grids, their effective maintenance will be nearly impossible without autonomic computing. In addition, because autonomous clouds tend to form administrative boundaries, dependable collaboration becomes a much harder problem. Employing information proxies may help improve such collaboration in the presence of administrative boundaries. Although a general proxy definition can refer to many contexts, we focus on proxies for dependable collaboration in distributed resource scheduling. Our definition of information proxies, and the particular areas in which we make use of them, mainly contribute to the self-configuring and self-optimizing fundamentals of the autonomic computing paradigm. By simulation, we show that information proxies help improve resource scheduling decisions that support large-scale autonomic computing systems.