Existing Web services composition technology lacks support for service selection. To address the global optimization of QoS-aware service selection, this paper proposes a particle swarm optimization (PSO) algorithm based on multi-objective optimization strategies. The global optimization problem of QoS-aware Web service selection is transformed into a QoS-based multi-objective constrained optimization problem. Using multi-objective PSO, several QoS parameters are optimized simultaneously, ultimately producing a set of Pareto-optimal solutions that satisfy the constraints. Experimental results show the feasibility and efficiency of the algorithm.
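The abstract does not give the particle encoding, leader selection or constraint handling, so the following is only a minimal sketch of a multi-objective PSO for QoS-aware selection under stated assumptions: each coordinate of a particle's position is decoded to a candidate-service index for one abstract task, the objectives are total response time, total cost and negated availability (all minimized), a hypothetical cost budget is the global constraint, and feasible non-dominated compositions are kept in an external Pareto archive. The QoS table, `BUDGET` and all function names are illustrative.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical QoS data: for each abstract task, the candidate services given
# as (response_time_ms, cost, availability).
CANDIDATES = [
    [(120, 5.0, 0.99), (80, 9.0, 0.97), (200, 2.0, 0.95)],
    [(50, 4.0, 0.98), (90, 1.5, 0.99), (40, 8.0, 0.90)],
    [(300, 3.0, 0.96), (150, 7.0, 0.99), (220, 4.5, 0.98)],
]
BUDGET = 20.0  # hypothetical global constraint: total cost must not exceed this

Choice = Tuple[int, ...]
Objectives = Tuple[float, float, float]


def decode(position: List[float]) -> Choice:
    """Map each continuous coordinate to a candidate index for that task."""
    return tuple(min(int(x), len(c) - 1) for x, c in zip(position, CANDIDATES))


def objectives(choice: Choice) -> Objectives:
    """Objectives to minimize: total time, total cost, negated availability."""
    time = sum(CANDIDATES[t][i][0] for t, i in enumerate(choice))
    cost = sum(CANDIDATES[t][i][1] for t, i in enumerate(choice))
    avail = 1.0
    for t, i in enumerate(choice):
        avail *= CANDIDATES[t][i][2]
    return (time, cost, -avail)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: Dict[Choice, Objectives], choice: Choice) -> None:
    """Keep only feasible, mutually non-dominated compositions."""
    f = objectives(choice)
    if f[1] > BUDGET or choice in archive:
        return
    if any(dominates(g, f) for g in archive.values()):
        return
    for c in [c for c, g in archive.items() if dominates(f, g)]:
        del archive[c]
    archive[choice] = f


def mopso(n_particles=20, iters=50, w=0.5, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dims = [len(c) for c in CANDIDATES]
    pos = [[rng.uniform(0, d) for d in dims] for _ in range(n_particles)]
    vel = [[0.0] * len(dims) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    archive: Dict[Choice, Objectives] = {}
    for p in pos:
        update_archive(archive, decode(p))
    for _ in range(iters):
        for k in range(n_particles):
            # Leader: center of a randomly chosen archive member's cells.
            if archive:
                target = [i + 0.5 for i in rng.choice(list(archive))]
            else:
                target = pbest[k]
            for d, dim in enumerate(dims):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (target[d] - pos[k][d]))
                # Clamp so the coordinate always decodes to a valid index.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], 0.0), dim - 1e-9)
            choice = decode(pos[k])
            update_archive(archive, choice)
            # Personal best moves only when the new position dominates it.
            if dominates(objectives(choice), objectives(decode(pbest[k]))):
                pbest[k] = pos[k][:]
    return archive
```

Calling `mopso()` returns a dictionary from compositions (one candidate index per task) to their objective vectors; that set approximates the constrained Pareto front. Random leader selection from the archive is the simplest choice; crowding-distance or grid-based leader selection is a common refinement.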
To ensure the correctness and reliability of OWL-S-based Web services composition, the interaction protocol of the Web services must be verified. A three-layer architecture is proposed: the OWL-S composite service is the top layer, a GA model is the intermediate model, and a Promela model is the verification model, with SPIN as the model checker. The OWL-S composite Web service is transformed top-down, through a conversation protocol, into a GA model; the WAST tool then converts the GA model into a Promela model, and SPIN analyzes and verifies the structure and performance of the composite service. The method is flexible and scalable, and it provides a solution for verifying Web services composition models.
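SPIN verifies a Promela model by exhaustively exploring its reachable state space. The toy explorer below illustrates that idea in Python rather than Promela: it is not SPIN, not the WAST tool and not the paper's GA model. It assumes two peers modeled as plain finite automata that exchange messages through bounded FIFO queues, searches the composed system breadth-first, and reports deadlocks (global states with no enabled move that are not a clean final state). The peer format, state names and `qbound` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# A peer is a finite automaton: state -> list of (kind, message, next_state).
# kind "!" sends the message to the other peer; "?" receives it from the
# peer's own FIFO queue. Every peer starts in state "s0".
Peer = Dict[str, List[Tuple[str, str, str]]]
Global = Tuple[str, str, Tuple[str, ...], Tuple[str, ...]]


def explore(a: Peer, b: Peer, a_final: Set[str], b_final: Set[str],
            qbound: int = 2) -> Tuple[int, List[Global]]:
    """Breadth-first search over (state_a, state_b, queue_to_a, queue_to_b).

    Returns the number of reachable global states and the deadlocks found:
    states with no enabled move that are not "both peers final, queues empty".
    """
    peers = (a, b)
    start: Global = ("s0", "s0", (), ())
    seen = {start}
    frontier = deque([start])
    deadlocks: List[Global] = []
    while frontier:
        g = frontier.popleft()
        sa, sb, qa, qb = g
        succs = []
        for who in (0, 1):
            state = (sa, sb)[who]
            own_q = (qa, qb)[who]     # messages waiting for this peer
            other_q = (qb, qa)[who]   # where this peer's sends are appended
            for kind, msg, nxt in peers[who].get(state, []):
                if kind == "!" and len(other_q) < qbound:
                    new_own, new_other = own_q, other_q + (msg,)
                elif kind == "?" and own_q and own_q[0] == msg:
                    new_own, new_other = own_q[1:], other_q
                else:
                    continue  # move not enabled in this global state
                if who == 0:
                    succs.append((nxt, sb, new_own, new_other))
                else:
                    succs.append((sa, nxt, new_other, new_own))
        done = sa in a_final and sb in b_final and not qa and not qb
        if not succs and not done:
            deadlocks.append(g)
        for s in succs:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return len(seen), deadlocks
```

For a client that sends `order` and waits for `confirm`, paired with a server that expects `payment` instead of `order`, the search stops in the global state where the client waits and `order` sits unread in the server's queue, and reports it as a deadlock. A real SPIN run would additionally check LTL properties and assertions.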
With virtual machine technology, distributed services deployed across multiple cooperating virtual machines, such as multi-tier Web services, may reside on one physical machine. This situation requires an efficient inter-domain communication channel that also guarantees transparency and security, because diverse existing distributed applications are running on many machines. In this paper, we implement SChannel, a highly efficient inter-domain communication channel on the Xen platform that is fully transparent to both user applications and the network protocol stack and provides security between guest domains. Between two co-resident domains, SChannel establishes a two-way shared memory channel of elastic size, set up with a static shared memory mechanism instead of costly dynamic shared memory. Furthermore, SChannel avoids one additional copy out of the shared data channel on the receiving domain's side. In our evaluation with a number of standard benchmarks, SChannel achieves 5 times the throughput of the standard inter-domain mechanism offered by the hypervisor. Compared with other typical transparent inter-domain communication mechanisms, SChannel improves throughput by approximately 44.5% and saves more than 3500 CPU cycles per packet.
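A two-way shared memory channel is commonly built from two single-producer/single-consumer rings, one per direction. The sketch below models that structure in plain Python under stated assumptions: a `bytearray` stands in for the shared pages, monotonically increasing `head`/`tail` counters stand in for the indices both domains can see, and messages carry a 4-byte length prefix. It uses no Xen grant-table or event-channel API, runs single-threaded (real shared memory would also need memory barriers and notifications), and copies on receive, so it does not reproduce SChannel's copy avoidance. `RingChannel` and `DuplexChannel` are hypothetical names.

```python
import struct
from typing import Optional


class RingChannel:
    """One direction of a shared-memory channel (single producer/consumer)."""

    HDR = struct.Struct("<I")  # 4-byte little-endian length prefix

    def __init__(self, size: int):
        self.buf = bytearray(size)  # models the shared pages
        self.size = size
        self.head = 0  # total bytes consumed; advanced only by the receiver
        self.tail = 0  # total bytes produced; advanced only by the sender

    def _put(self, data: bytes) -> None:
        start = self.tail % self.size
        first = min(len(data), self.size - start)
        self.buf[start:start + first] = data[:first]
        self.buf[:len(data) - first] = data[first:]  # wrap-around part
        self.tail += len(data)

    def _get(self, n: int) -> bytes:
        start = self.head % self.size
        first = min(n, self.size - start)
        out = bytes(self.buf[start:start + first]) + bytes(self.buf[:n - first])
        self.head += n
        return out

    def send(self, msg: bytes) -> bool:
        need = self.HDR.size + len(msg)
        if need > self.size - (self.tail - self.head):
            return False  # ring full: the caller would retry or fall back
        self._put(self.HDR.pack(len(msg)) + msg)
        return True

    def recv(self) -> Optional[bytes]:
        if self.tail == self.head:
            return None  # ring empty
        (length,) = self.HDR.unpack(self._get(self.HDR.size))
        return self._get(length)


class DuplexChannel:
    """Two rings, one per direction, between two co-resident domains."""

    def __init__(self, size: int):
        # The ring size is fixed at setup, standing in for "elastic" sizing.
        self.a_to_b = RingChannel(size)
        self.b_to_a = RingChannel(size)
```

Because each index is written by only one side, the sender and receiver never update the same counter, which is what lets such a ring work over shared memory without locks.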
ISBN: 9781424437696 (print)
As design complexity increases dramatically, the results of functional simulation are usually checked through only a subset of signals during design verification. It is therefore important to consider the observability of internal signals for effective checking. This paper proposes a static observability analysis method that automatically selects internal observation signals, improving the quality of functional verification. A series of formulas is defined to evaluate the observability of internal signals, and an algorithm is proposed to locate the sources of low observability. Such sources, rather than hard-to-observe signals in general, are the desirable internal observation signals. Experimental results indicate that signals selected by this method improve design observability more than signals randomly selected from among hard-to-observe signals.
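The paper's own observability formulas and its source-locating algorithm are not given in the abstract. As a stand-in, the sketch below uses the classic SCOAP combinational measures: controllabilities CC0/CC1 are computed forward, observability CO is computed backward from the observed outputs (a fanout stem takes the minimum over its branches), and signals with high CO are flagged as candidate observation points. The netlist format, the example circuit and `hard_to_observe` are hypothetical.

```python
from typing import Dict, List, Tuple

# Netlist: output signal -> (gate type, input signals), in topological order.
Netlist = Dict[str, Tuple[str, List[str]]]


def controllability(net: Netlist, inputs: List[str]):
    """SCOAP CC0/CC1; primary inputs cost 1 to set to either value."""
    cc0 = {s: 1 for s in inputs}
    cc1 = {s: 1 for s in inputs}
    for out, (kind, ins) in net.items():
        if kind == "AND":
            cc0[out] = min(cc0[i] for i in ins) + 1
            cc1[out] = sum(cc1[i] for i in ins) + 1
        elif kind == "OR":
            cc0[out] = sum(cc0[i] for i in ins) + 1
            cc1[out] = min(cc1[i] for i in ins) + 1
        elif kind == "NOT":
            cc0[out] = cc1[ins[0]] + 1
            cc1[out] = cc0[ins[0]] + 1
        else:
            raise ValueError(f"unsupported gate {kind}")
    return cc0, cc1


def observability(net: Netlist, inputs: List[str], observed: List[str]):
    """SCOAP CO: cost to propagate a signal's value to an observed signal."""
    cc0, cc1 = controllability(net, inputs)
    co = {s: float("inf") for s in list(inputs) + list(net)}
    for s in observed:
        co[s] = 0
    # Reverse topological order: every consumer of `out` is already done.
    for out, (kind, ins) in reversed(list(net.items())):
        for k, i in enumerate(ins):
            others = ins[:k] + ins[k + 1:]
            if kind == "AND":    # side inputs must be 1
                cost = co[out] + sum(cc1[j] for j in others) + 1
            elif kind == "OR":   # side inputs must be 0
                cost = co[out] + sum(cc0[j] for j in others) + 1
            else:                # NOT
                cost = co[out] + 1
            co[i] = min(co[i], cost)  # fanout stem keeps its easiest branch
    return co


def hard_to_observe(co, candidates, threshold):
    """Candidate observation points, hardest to observe first."""
    return sorted((s for s in candidates if co[s] > threshold),
                  key=lambda s: -co[s])


# Hypothetical example: a chain of AND gates feeding an OR output.
NET: Netlist = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("AND", ["n1", "c"]),
    "n3": ("AND", ["n2", "d"]),
    "y": ("OR", ["n3", "a"]),
}
```

Running `observability(NET, ["a", "b", "c", "d"], ["y"])` shows CO growing with depth along the AND chain (n3, then n2, then n1), which is the kind of ranking an observation-point selector starts from; the paper's method goes further by tracing such signals back to the sources of the low observability.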
Heterogeneity is considered a way for supercomputers to scale to petascale. Many systems composed of general-purpose CPUs and special processing units such as Cell processors, GPGPUs and FPGAs have been implemented. In these systems, the CPU needs to interact with the special processing units to process data together, so communication between these heterogeneous processing units becomes a key problem, and the communication subsystem should provide low latency and high bandwidth. In this paper, we propose HPP-Controller, which is designed to connect two different types of CPUs (AMD and Loongson) in one node. It connects the heterogeneous CPUs on top of a non-coherent HyperTransport (HT) fabric and supports a Global Physical Address Space. We implement an FPGA-based prototype and evaluate it experimentally. Initial results show that HPP-Controller achieves a low latency of 0.75 µs and a bandwidth close to that of the HT links.
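The abstract does not describe how HPP-Controller lays out the Global Physical Address Space. One common way to realize such a space is a table of address windows, each mapping a range of global addresses onto one processor's local memory; the sketch below shows that decoding step only. The window layout (AMD DRAM at the bottom, Loongson memory behind a 4 GiB window at 256 GiB), the `Window` type and the `route` and `check_disjoint` functions are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Window:
    base: int        # first global physical address covered by the window
    size: int        # window length in bytes
    target: str      # CPU side that owns the memory behind the window
    local_base: int  # matching address in the target's local address space


# Hypothetical layout, not taken from the paper.
GIB = 1 << 30
WINDOWS = [
    Window(base=0, size=16 * GIB, target="amd", local_base=0),
    Window(base=256 * GIB, size=4 * GIB, target="loongson", local_base=0),
]


def check_disjoint(windows: List[Window]) -> None:
    """Reject layouts in which two windows claim the same global address."""
    ws = sorted(windows, key=lambda w: w.base)
    for prev, cur in zip(ws, ws[1:]):
        if prev.base + prev.size > cur.base:
            raise ValueError(f"windows overlap at {cur.base:#x}")


def route(addr: int, windows: List[Window] = WINDOWS) -> Tuple[str, int]:
    """Translate a global physical address to (owning CPU, local address)."""
    for w in windows:
        if w.base <= addr < w.base + w.size:
            return w.target, w.local_base + (addr - w.base)
    raise ValueError(f"address {addr:#x} is not mapped")
```

In hardware, a request that decodes to the other CPU's window would be forwarded over the non-coherent HT link instead of being served locally.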
Computing is entering a new phase in which CPU improvements are driven by the addition of multiple cores on a single chip rather than by higher frequencies. Parallel processing on these systems is in a primitive s...
With the wide application of EDA techniques, the development cycle of electronic products has been shortened. EDA allows hardware design to be carried out in software, which reduces costs. Based on an analysis of the principles of digital logic analyzer circuits, this paper discusses the working principle of the analyzer's flip-flop circuit module and its FPGA implementation, and presents the program design and simulation results for parts of the circuit.
This paper describes the design-for-testability (DFT) features and low-cost testing solutions of a general-purpose microprocessor. The optimized DFT features are presented in detail. A hybrid scan compression structure was implemented and achieved a compression ratio of more than ten. Memory built-in self-test (BIST) circuitry was designed with scan collars instead of bitmaps to reduce area overhead and to improve test and debug efficiency. The implemented DFT framework also uses internal phase-locked loops (PLLs) to provide complex at-speed test clock sequences. Because this DFT design still has limitations, the test strategy for this case is quite complex, with complicated automatic test pattern generation (ATPG) and debugging flows. Sample test results are given in the paper. All the DFT methods discussed are prototypes for a high-volume manufacturing (HVM) DFT plan that aims to meet high test-quality goals while keeping test power consumption and cost low.
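The abstract does not say which march algorithm the memory BIST controllers run. March C-, a standard 10N-operation algorithm that detects stuck-at, transition and many coupling faults, is shown here as a representative example in a software model: a bit-wide memory with injectable stuck-at faults and a loop that applies each march element in its address order. The fault model, `FaultyMemory` and `run_march` are hypothetical; real BIST hardware would report failures through the scan collars mentioned above rather than returning a list.

```python
from typing import Dict, List, Optional


class FaultyMemory:
    """Bit-wide memory model with optional stuck-at faults: addr -> value."""

    def __init__(self, n: int, stuck_at: Optional[Dict[int, int]] = None):
        self.cells = [0] * n
        self.stuck = stuck_at or {}

    def write(self, addr: int, v: int) -> None:
        # A stuck cell ignores the written value and keeps its stuck value.
        self.cells[addr] = self.stuck.get(addr, v)

    def read(self, addr: int) -> int:
        return self.cells[addr]


# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


def run_march(mem: FaultyMemory, n: int, algorithm=MARCH_C_MINUS) -> List[int]:
    """Apply a march test; return the sorted list of failing addresses."""
    failures = set()
    for order, ops in algorithm:
        addrs = range(n - 1, -1, -1) if order == "down" else range(n)
        for a in addrs:
            for op in ops:
                val = int(op[1])
                if op[0] == "w":
                    mem.write(a, val)
                elif mem.read(a) != val:  # "r" op: compare with expected
                    failures.add(a)
    return sorted(failures)
```

For example, `run_march(FaultyMemory(8, {3: 1, 5: 0}), 8)` flags addresses 3 and 5, while a fault-free memory yields an empty list.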
Chip multiprocessors (CMPs) have become the mainstream microprocessor architecture. In a CMP, the cache, especially the last-level cache, is critical to performance and has become a focus of current research. CMP caches face the conflicting requirements of low latency and high capacity, and must trade off between techniques that reduce off-chip misses and those that reduce cross-chip misses. A private cache design minimizes cache access latency but reduces the total effective cache capacity. A shared cache design maximizes effective cache capacity but incurs long hit latency. This paper proposes TCLC (tradeoff cache between latency and capacity), a hybrid private/shared CMP cache design. TCLC dynamically identifies the sharing type of each cache block and optimizes each type separately: private blocks through a migration policy, shared read-only blocks through a replication policy, and shared read-write blocks through a center placement policy. TCLC aims to bring cache access latency close to that of a private design and effective cache capacity close to that of a shared design, which mitigates the impact of wire delay and reduces average memory access latency. Experimental results indicate that the proposal performs 13.7% better than a private cache and 12% better than a shared cache.
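The per-type policies described above can be summarized as a small decision table. The sketch below is a software model under stated assumptions, not TCLC's hardware mechanism: it records which cores have touched each block and whether any write occurred, classifies the block as private, shared read-only or shared read-write, and maps each type to the corresponding action. It assumes bank i is the bank nearest core i and that `center_bank` is the bank nearest the middle of the chip; it ignores reclassification over time, which a real design would need. `SharingClassifier` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class BlockInfo:
    sharers: Set[int] = field(default_factory=set)  # cores that touched it
    written: bool = False                           # any write observed


class SharingClassifier:
    """Hypothetical model of per-block sharing classification."""

    def __init__(self, center_bank: int):
        self.center_bank = center_bank  # assumed bank nearest the chip center
        self.blocks: Dict[int, BlockInfo] = {}

    def access(self, block: int, core: int, is_write: bool) -> None:
        info = self.blocks.setdefault(block, BlockInfo())
        info.sharers.add(core)
        info.written = info.written or is_write

    def classify(self, block: int) -> str:
        info = self.blocks[block]
        if len(info.sharers) == 1:
            return "private"
        return "shared-rw" if info.written else "shared-ro"

    def policy(self, block: int, requester: int) -> Tuple[str, str, int]:
        """Return (type, action, target bank) for a request from `requester`."""
        kind = self.classify(block)
        if kind == "private":
            return kind, "migrate", requester    # move the single copy closer
        if kind == "shared-ro":
            return kind, "replicate", requester  # add a read-only copy nearby
        return kind, "place", self.center_bank   # one copy, central location
```

Shared read-write blocks are the only ones kept in a single central location: replicating them would require invalidating every copy on each write, and migrating them would favor one writer over the others.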
In processor architectures such as MIPS, ALPHA, SPARC and PowerPC, global and static variables are accessed through indirect addressing. Because the addresses of these variables and their values reside in different data sections of the binary file, the program's data locality is poor. As a result, loading the read-only addresses of these variables on every access tends to cause a non-trivial number of redundant memory accesses that miss in the data cache. Moreover, this indirect addressing mode generates two sequential load instructions with a data dependence between them, which reduces the program's instruction-level parallelism (ILP). The authors present a feedback-based address register promotion method (ARPF) to solve these problems. The ARPF algorithm reduces redundant accesses to the read-only addresses of global and static variables, increases the program's ILP, and avoids the performance loss that the extra register pressure introduced by register promotion would otherwise cause. The algorithm has been implemented in the Loongson compiler for the MIPS architecture. Experiments on the SPEC CPU2000 INT benchmarks show that ARPF improves the performance of every benchmark by 1%-6%.
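To make the transformation concrete, here is a toy model of feedback-guided address register promotion on a loop body; it is not ARPF's actual heuristic, which the abstract does not detail. Assumptions: instructions are tuples in a hypothetical mini-IR where `("la", dst, sym)` loads a global's address from the GOT and `("lw", dst, addr_reg)` loads the value through it; the loop body is in SSA-like form (each temporary is defined once); `profile` is feedback data giving each global's dynamic access count; and `free_regs` caps how many registers may be spent, standing in for the register-pressure check. Hot globals get their address load hoisted into the loop preheader, and the dependent `lw` instructions are rewritten to use the promoted register.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-IR: ("la", dst, sym) loads sym's address from the GOT;
# ("lw", dst, addr_reg) loads the value through that address; other opcodes
# pass through unchanged. Temporaries are assumed to be defined only once.
Instr = Tuple[str, ...]


def promote(body: List[Instr], profile: Dict[str, int],
            threshold: int, free_regs: int):
    """Hoist GOT address loads of hot globals out of a loop body.

    Returns (preheader, new_body).
    """
    # Hottest globals first, limited by the registers we can afford.
    hot = sorted((g for g, n in profile.items() if n >= threshold),
                 key=lambda g: -profile[g])[:free_regs]
    promoted = {g: f"$p{k}" for k, g in enumerate(hot)}
    preheader = [("la", reg, g) for g, reg in promoted.items()]
    new_body: List[Instr] = []
    alias: Dict[str, str] = {}  # temp holding an address -> promoted register
    for ins in body:
        if ins[0] == "la" and ins[2] in promoted:
            alias[ins[1]] = promoted[ins[2]]  # drop the redundant address load
            continue
        if ins[0] == "lw" and ins[2] in alias:
            ins = ("lw", ins[1], alias[ins[2]])
        new_body.append(ins)
    return preheader, new_body
```

After promotion, each access to a hot global inside the loop is a single `lw` whose address register is already available, so the dependent load pair and the repeated GOT access both disappear from the loop; cold globals, or all globals when `free_regs` is 0, are left untouched to avoid raising register pressure.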