High-performance processor architectures are moving toward the integration of more processing cores on a single chip, so PCs and parallel computer systems will offer higher computing performance. This is very meaningful for power system simulation, online stability assessment, and control. However, traditional serial and parallel software cannot fully exploit multi-core capability without a parallelizing restructure. In this paper, parallel simulation algorithms are analyzed and software restructuring strategies are studied for multi-core processor platforms. The research indicates that: (1) for PCs, an implicit parallel strategy with OpenMP is easier to implement, while an explicit parallel strategy with MPI or Pthreads is more adaptable to large-scale system simulation; (2) for clusters and distributed systems, MPI+OpenMP hybrid programming is convenient and multilevel partitioning is promising. Both approaches provide good speedup.
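To make the recommended structure concrete, below is a minimal hybrid MPI+OpenMP sketch in the style the abstract describes for clusters: MPI ranks own subsystems, and OpenMP threads share each rank's integration loop. The placeholder dynamics, array layout, and the barrier standing in for a real boundary-variable exchange are illustrative assumptions, not the paper's code (compile with, e.g., mpicc -fopenmp).

#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N_LOCAL 4096                       /* buses per rank; illustrative */

static void integrate_bus(double *state, int i, double dt) {
    state[i] += dt * -state[i];            /* placeholder dynamics */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *state = malloc(N_LOCAL * sizeof *state);
    for (int i = 0; i < N_LOCAL; ++i)
        state[i] = rank + 1.0;             /* each rank owns one subsystem */

    for (int step = 0; step < 100; ++step) {
        /* Explicit parallelism between ranks, implicit within a rank. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N_LOCAL; ++i)
            integrate_bus(state, i, 1e-3);
        /* Stand-in for the boundary-variable exchange between subsystems. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    free(state);
    MPI_Finalize();
    return 0;
}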
Today, multi-core processors occupy an ever larger share of the market, and programmers must face the upheaval brought by the multi-core revolution. Semiconductor scaling limits and the associated power and thermal challenges constrain performance growth for single-core microprocessors, which has led many microprocessor vendors to turn instead to multi-core chip organizations. Having the programmer or compiler explicitly parallelize software is therefore the key to improving performance on multi-core chips, and parallel processing is both an opportunity and a challenge. In this paper, we ask whether there is an effective way to reduce the time spent rewriting programs, or to parallelize programs automatically for multiprocessing and speed up execution. We discuss tools that can automatically generate OpenMP directives from serial C/C++ code, comparing them with each other and with plain C/C++ code on a general-purpose computer and on an embedded system. We also compare tools specifically designed to extract most of the data parallelism from C and FORTRAN kernels and translate them into NVIDIA CUDA or OpenCL, to measure how much faster the result runs.
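The kind of transformation such tools perform is illustrated below: a serial loop whose iterations are provably independent, annotated with the OpenMP directive an auto-parallelizer might emit. The saxpy kernel is an illustrative example, not one of the benchmarks from the paper.

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    /* An auto-parallelizer proves the iterations independent and
     * inserts a directive like this one automatically: */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; ++i)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
    return 0;
}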
A new framework for designing evolved program execution control in distributed programs is discussed in this paper. The framework provides an infrastructure for designing distributed program control based on the monitoring of global application states. Global control constructs are proposed which logically bind distributed program modules and define the flow of control depending on the monitoring of global application states. Such control can be organized in programs at the process and thread levels. Special processes and threads called synchronizers collect state information from application modules, construct strongly consistent global states, and evaluate control predicates on those global states. Based on this evaluation, control signals are sent to processes and threads to define the inter-module flow of control and to influence internal module behavior. The proposed constructs are incorporated into the framework as a graphical API which is compiled into C/C++ programs using the MPI-2, pthreads, and OpenMP libraries for communication.
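A minimal sketch of the synchronizer pattern at the process level, assuming rank 0 plays the synchronizer: it gathers each module's local state, evaluates a global predicate, and broadcasts a control signal back. The state encoding and the predicate are illustrative assumptions, not the framework's generated code.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_state = rank % 2;       /* placeholder module state */
    int *states = (rank == 0) ? malloc(size * sizeof(int)) : NULL;
    int signal = 0;

    /* The synchronizer (rank 0) collects one consistent global state. */
    MPI_Gather(&local_state, 1, MPI_INT, states, 1, MPI_INT,
               0, MPI_COMM_WORLD);
    if (rank == 0) {
        int sum = 0;
        for (int i = 0; i < size; ++i) sum += states[i];
        signal = (sum > size / 2);    /* control predicate on the global state */
        free(states);
    }
    /* Modules branch on the signal to steer their flow of control. */
    MPI_Bcast(&signal, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}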
Recently, several formal approaches have been presented to address the problem of schedulability analysis of real-time systems with varieties of timed automata, e.g., UPPAAL and TIMES. In this paper, we consider a more general and complicated formal computational model for distributed systems. To analyze the schedulability of tasks within this model by automata theory, we present action automata, a class of suspension automata, to describe the execution semantics of tasks; we also define an environment model, environment automata, to describe the arrival patterns of tasks. One main result gives the scheduling policies under which schedulability can be analyzed correctly by our method. To achieve this result, we translate the schedulability analysis into a reachability analysis of the network of action automata and environment automata. Another main result of this paper is therefore a proof that reachability of action automata is decidable. Based on these conclusions, we implement a prototype tool for schedulability analysis and test its performance under the EDF policy.
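For orientation only: the simplest classical schedulability result that the automata-based analysis generalizes is the uniprocessor EDF utilization test for independent, implicit-deadline periodic tasks (schedulable iff total utilization is at most 1). The sketch below implements that baseline; it is not the paper's method, which targets a far richer distributed model.

#include <stdio.h>

struct task { double wcet, period; };

static int edf_feasible(const struct task *ts, int n) {
    double u = 0.0;
    for (int i = 0; i < n; ++i)
        u += ts[i].wcet / ts[i].period;   /* utilization C_i / T_i */
    return u <= 1.0;
}

int main(void) {
    struct task ts[] = { {1.0, 4.0}, {2.0, 6.0}, {1.0, 8.0} };
    printf("EDF-feasible: %s\n",
           edf_feasible(ts, 3) ? "yes" : "no");  /* U = 0.708 -> yes */
    return 0;
}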
CARS (Computational Architecture for Reflective Systems) is a low-cost test bed for studying self-organization and real-time distributed behavior, using cars with on-board computers as autonomous agents in an uncontrolled and largely unpredictable environment. This paper describes the software infrastructure for CARS, based on our Wrapping approach to knowledge-based integration. It allows us to share code between simulations for algorithm development and instrumented experiments with the real cars in a real environment. It also allows us to use many computational resources during algorithm development, and then to "compile out" all resources that will not be needed, and all decision processes that have only one choice, in a given real environment. The instrumented experiment is run in parallel with the simulation, and the differences can be used to adjust the models. We describe the autonomic agent infrastructure, i.e., the "enabling software" processes: health and status, local activity maintenance, and fault management. These processes can be very resource-hungry in any agent, and our use of simulations allows us to study trade-offs between safety and capability in the agents directly, to tune the trade-off at deployment time based on what we know or expect of the environment, and to monitor and change those assumptions when necessary.
Abstract State Machines (ASM) are a mathematically defined environment for high-level system design, verification, and analysis. This paper presents a hybrid approach to the specification, analysis, and testing of stateful grid services using ASM. The approach allows easy integration of the created specification of the developed middleware with existing components of grid systems. An important advantage of this approach is automatic testing of the implementation, following the model-based testing approach. This allows a smooth transition from the specification stage to the implementation stage, as well as investigation of features of the specification and implementation at every stage of their development. A software environment has also been developed which implements the defined approach.
The rapid diversification and evolution of wireless and multimedia standards have changed the flexibility of embedded processors from an option to a must. Coarse-Grained Reconfigurable Architectures (CGRAs), which strike a good trade-off between low-power, non-programmable ASICs and high-power, flexible DSPs, are becoming more and more popular. Mapping applications to CGRAs is the key to achieving high computational throughput. Because there is a huge design space to explore on CGRAs, compilers must map programs not only with high effectiveness but also with high efficiency. However, many sorts of constraints arise during mapping, and overestimating or underestimating them leads to either low schedule quality or low efficiency. To meet this challenge, we propose an accurate constraint-aware modulo scheduling approach: based on a co-analysis of the architecture and the application, the compiler starts scheduling with enough critical resources reserved and strictly makes retries follow the correct order. Experiments on wireless baseband programs show that compilation can be sped up by 300%.
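For background, modulo schedulers search for an initiation interval (II) starting from the standard lower bound MII = max(ResMII, RecMII) and retry upward until a legal schedule is found. The sketch below computes that bound; the input numbers are illustrative, and the paper's constraint model is considerably more detailed than these two terms.

#include <stdio.h>

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* ResMII: the most heavily used resource class bounds the II. */
static int res_mii(const int *uses, const int *avail, int n_classes) {
    int mii = 1;
    for (int c = 0; c < n_classes; ++c) {
        int bound = ceil_div(uses[c], avail[c]);
        if (bound > mii) mii = bound;
    }
    return mii;
}

/* RecMII: each dependence cycle needs latency/distance slots per iteration. */
static int rec_mii(const int *lat, const int *dist, int n_cycles) {
    int mii = 1;
    for (int k = 0; k < n_cycles; ++k) {
        int bound = ceil_div(lat[k], dist[k]);
        if (bound > mii) mii = bound;
    }
    return mii;
}

int main(void) {
    int uses[]  = {8, 4};   /* ops needing ALUs, multipliers        */
    int avail[] = {4, 2};   /* PEs of each class on the CGRA        */
    int lat[]   = {6};      /* total latency around one recurrence  */
    int dist[]  = {2};      /* loop-carried dependence distance     */
    int r = res_mii(uses, avail, 2), q = rec_mii(lat, dist, 1);
    printf("MII = %d\n", r > q ? r : q);   /* max(2, 3) = 3 */
    return 0;
}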
In this paper, various motion vector predictors are analyzed and a selection method for them is proposed for 3D video coding. In a ubiquitous multimedia system, video compression is expected to be an important element since the available bandwidth is very limited. In the proposed method, several motion vector predictors compete with each other under a slightly modified rate-distortion criterion. Spatial, temporal, and inter-view predictors are considered as motion vector predictors to reduce spatial, temporal, and inter-view redundancies, respectively. The proposed method increases motion vector coding efficiency by selecting the best motion vector predictor among them. Accordingly, overall bit rates are reduced by 5.2% on average, and by up to 6.1%, compared to the reference software JMVC 6.0 in terms of the Bjontegaard metric.
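A hedged sketch of the predictor-competition idea: choose the spatial, temporal, or inter-view predictor whose motion vector difference is cheapest under a Lagrangian rate-distortion cost. The bit-cost proxy below is a crude stand-in for real entropy coding, not the paper's modified criterion.

#include <stdio.h>
#include <stdlib.h>

struct mv { int x, y; };

/* Rough rate proxy: larger MV differences cost more bits. */
static int mvd_bits(struct mv mv, struct mv pred) {
    return abs(mv.x - pred.x) + abs(mv.y - pred.y);
}

int best_predictor(struct mv mv, const struct mv *preds, int n,
                   const int *dist, double lambda) {
    int best = 0;
    double best_cost = 1e300;
    for (int i = 0; i < n; ++i) {
        double cost = dist[i] + lambda * mvd_bits(mv, preds[i]);
        if (cost < best_cost) { best_cost = cost; best = i; }
    }
    return best;   /* index of the predictor signalled to the decoder */
}

int main(void) {
    struct mv mv = {5, -2};
    struct mv preds[] = {{4, -2}, {0, 0}, {5, -1}}; /* spatial, temporal, inter-view */
    int dist[] = {100, 100, 100};                   /* equal distortion: rate decides */
    printf("chosen predictor: %d\n",
           best_predictor(mv, preds, 3, dist, 4.0));
    return 0;
}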
In peer-to-peer networks (P2Ps), many autonomous peers without preexisting trust relationships share resources with each other. Because of this open environment, P2Ps usually employ reputation systems to provide guidance in selecting trustworthy resource providers for high reliability and security. However, node collusion impairs the effectiveness of reputation systems in trustworthy node selection. Although some reputation systems have mechanisms to counter collusion, those mechanisms are not sufficiently effective. In this paper, we leverage social networks to enhance the capability of reputation systems to combat collusion. We first analyze a real trace of the reputation system in the Overstock online auction platform, which incorporates a social network. The analysis reveals the important impact of the social network on user purchasing and reputation rating patterns. We then identify suspicious collusion behavior patterns and propose a social network based mechanism, SocialTrust, to counter collusion. SocialTrust adaptively adjusts the weight of ratings based on the social distance and interest relationship between peers. Experimental results show that SocialTrust can significantly strengthen the capability of current reputation systems to combat collusion.
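A hedged sketch of the re-weighting idea: ratings from peers that are socially very close to the ratee and share its interests are discounted, since colluders tend to cluster. The exact weight function below is an assumption for illustration, not the formula from the paper.

#include <stdio.h>

double rating_weight(int social_distance, double interest_similarity) {
    /* Base weight grows with social distance (hops in the social graph). */
    double w = (double)social_distance / (social_distance + 1.0);
    /* Strong interest overlap with the ratee further discounts the rating. */
    return w * (1.0 - 0.5 * interest_similarity);
}

int main(void) {
    /* A direct friend with identical interests vs. a distant stranger. */
    printf("close, similar : %.3f\n", rating_weight(1, 1.0)); /* 0.250 */
    printf("distant, novel : %.3f\n", rating_weight(4, 0.0)); /* 0.800 */
    return 0;
}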
Summary form only given. Reaching an ExaScale computer by the end of the decade, and enabling continued performance scaling of smaller systems, requires significant research breakthroughs in three key areas: power efficiency, programmability, and execution granularity. Building an ExaScale machine within a power budget of 20 MW requires a 200-fold improvement in energy per instruction, from 2 nJ to 10 pJ. Only 4x is expected from improved technology; the remaining 50x must come from improvements in architecture and circuits. Programming a machine of this scale requires more productive parallel programming environments that make parallel programming as easy as sequential programming is today. Finally, problem size and memory size constraints prevent the continued use of weak scaling, requiring these machines to extract parallelism at very fine granularity, down to the level of a few instructions. This talk discusses these challenges and current approaches to addressing them.
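The quoted figures are mutually consistent; as a check, the factorization below restates the talk's numbers, while the instruction-rate consequence on the right is our own arithmetic from the stated budget:

\[
\frac{2\,\mathrm{nJ}}{10\,\mathrm{pJ}} = 200 = \underbrace{4}_{\text{technology}} \times \underbrace{50}_{\text{architecture + circuits}},
\qquad
\frac{20\,\mathrm{MW}}{10\,\mathrm{pJ/instruction}} = 2\times 10^{18}\ \mathrm{instructions/s}.
\]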