Developing parallel applications on heterogeneous processors faces the 'memory wall' challenge, caused by the limited capacity of local storage, limited bandwidth, and long memory access latency. To address this problem, a parallelization approach with six memory optimization schemes is proposed for CG, four of which target all kinds of sparse matrix-vector multiplication (SPMV) operations. Evaluated on an IBM QS20, the approach reaches speedups of up to 21 and 133 times for sizes A and B, respectively, compared with a single Power Processor Element. The conclusions are that the peak memory access bandwidth of the Cell BE can be attained in SPMV, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local storage access latency when executing scalar operations on SIMD cores.
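For readers unfamiliar with the SPMV kernel named in this abstract, the following is a minimal sketch of the operation in compressed sparse row (CSR) form. The storage format, the function name `spmv_csr`, and the sample matrix are assumptions made for illustration; the abstract does not say which format or which of the six optimization schemes the authors used.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  -- nonzero entries of A, row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row_ptr[i]:row_ptr[i + 1] is the slice of `values` for row i
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # The indirect read x[col_idx[k]] is the irregular memory access
        # that makes SPMV bandwidth-bound rather than compute-bound.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix (not from the paper):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each nonzero contributes one multiply-add but requires loading a value, a column index, and an element of `x`, which is why the kernel is usually limited by memory bandwidth.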
The widening gap between processor and memory speeds makes the cache an important issue in computer system design. Compared with the working set of programs, cache resources are often scarce, so it is very important for a computer system to use its cache efficiently. Targeting DOOC (Data-Object Oriented Cache), a recently proposed dynamically reconfigurable cache, this paper proposes a quantitative framework for analyzing the cache requirements of data-objects, covering cache capacity, block size, associativity, and coherence protocol. A graph coloring algorithm that handles the competition between data-objects in the DOOC is also proposed. Finally, we apply our approaches to the compiler management of DOOC and test them on both a single-core platform and a four-core platform. Compared with traditional caches, the DOOC achieves average miss-rate reductions of 44.98% and 49.69% on the two platforms, respectively, and its performance is very close to that of the ideal optimal cache.
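The graph coloring idea mentioned in this abstract can be illustrated with a generic greedy coloring of a conflict graph. This is a sketch under assumptions, not the paper's algorithm: the abstract does not describe how conflicts are defined or how colors are chosen. Here nodes stand for data-objects, an edge means two objects compete for cache at the same time, and a color stands for a hypothetical cache partition.

```python
def greedy_coloring(conflicts):
    """Assign each node the smallest color not used by its neighbors.

    conflicts -- dict mapping a node to the set of nodes it conflicts with
                 (assumed symmetric).
    Returns a dict mapping node -> color index.
    """
    # Color high-degree nodes first; ties are broken by name so the
    # result is deterministic.
    order = sorted(conflicts, key=lambda n: (-len(conflicts[n]), n))
    color = {}
    for node in order:
        used = {color[nb] for nb in conflicts[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


# Hypothetical conflict graph among four data-objects.
conflicts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coloring = greedy_coloring(conflicts)
```

Objects that never compete, such as `A` and `D` here, can share a color, which is the sense in which coloring reduces the number of partitions needed.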
The pull-based development model, widely used by distributed software teams in open source communities, can efficiently gather the wisdom of crowds. Instead of sharing access to a central repository, contributors create a fork, update it locally, and request to have their changes merged back, i.e., submit a pull-request. On the one hand, this model lowers the barrier to entry for potential contributors, since anyone can submit pull-requests to any repository; on the other hand, it increases the burden on integrators, who are responsible for assessing the proposed patches and integrating suitable changes into the central repository. The role of integrators in pull-based development is crucial. They must not only ensure that pull-requests meet the project's quality standards before being accepted, but also finish their evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides extra evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study that tries to discover which factors affect the pull-based development process, including acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects that deploy the Travis-CI service, we find that the evaluation process is a complex issue, requiring many independent variables to explain adequately. In particular, CI is a dominant factor in the process: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
Upsets in flip-flops induced by internal single-event transients (SETs) are becoming significant as operating frequencies increase. However, the conventional soft error rate (SER) evaluation approach can only approximately predict the upsets caused by internal SETs. In this paper, we propose an improved SER evaluation approach based on Monte Carlo simulation. A novel SET-based upset model is implemented in the proposed approach to accurately predict upsets caused by internal SETs. A test chip was fabricated in a commercial 65 nm bulk process to validate the accuracy of the improved SER evaluation approach. The predicted single-event upset cross-sections are consistent with the experimental data.
Distributed software systems are becoming more and more complex *** is easy to find a huge number of computing nodes in a nationwide or global information *** example, WeChat (Weixin), a well-known mobile application in China, reached a record of 650 million monthly active users in the third quarter of *** the same time, researchers are starting to talk about software systems which have billions of lines of code [1] or can last one hundred years.
Virtual Machine (VM) allocation for multiple tenants is an important and challenging problem in providing efficient infrastructure services in cloud data centers. Tenants run applications on their allocated VMs, and the network distance between a tenant's VMs may considerably impact the tenant's Quality of Service (QoS). In this study, we define and formulate the multi-tenant VM allocation problem in cloud data centers, considering the VM requirements of different tenants and introducing the allocation goal of minimizing the sum of the network diameters of all tenants' VMs. We then propose a Layered Progressive resource allocation algorithm for multi-tenant cloud data centers based on the Multiple Knapsack Problem (LP-MKP). The LP-MKP algorithm uses a multi-stage layered progressive method for multi-tenant VM allocation and efficiently handles unprocessed tenants at each stage. This reduces resource fragmentation in cloud data centers, decreases the differences in QoS among tenants, and improves tenants' overall QoS. We perform experiments to evaluate the LP-MKP algorithm and demonstrate that it can provide significant gains over other allocation algorithms.
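The allocation goal stated in this abstract, minimizing the sum of the network diameters of all tenants' VMs, can be made concrete with a small sketch of the objective function. It does not implement LP-MKP. The three-level topology, the hop counts in `hop_distance`, and the `(pod, rack, host)` placement tuples are all assumptions for illustration; the abstract does not define the distance model.

```python
from itertools import combinations


def hop_distance(a, b):
    """Hypothetical hop count between two placements (pod, rack, host)."""
    if a == b:
        return 0  # both VMs on the same physical host
    if a[:2] == b[:2]:
        return 2  # same rack, via the top-of-rack switch
    if a[0] == b[0]:
        return 4  # same pod, via an aggregation switch
    return 6      # different pods, via the core layer


def tenant_diameter(placements):
    """Largest pairwise distance among one tenant's VM placements."""
    return max(
        (hop_distance(a, b) for a, b in combinations(placements, 2)),
        default=0,  # a tenant with a single VM has diameter 0
    )


def total_diameter(allocation):
    """Objective value: sum of per-tenant diameters."""
    return sum(tenant_diameter(p) for p in allocation.values())


# Hypothetical allocation: tenant t1 fits in one rack, t2 spans two pods.
allocation = {
    "t1": [(0, 0, 0), (0, 0, 1)],
    "t2": [(0, 0, 2), (1, 0, 0)],
}
objective = total_diameter(allocation)
```

Under this model an allocator lowers the objective by packing each tenant's VMs into as few racks and pods as capacity allows.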
The graph is an important data structure that describes the relationships between entries. Many real-world application domains depend heavily on graph data. However, graph applications differ vastly from traditional applications, and running them on general-purpose platforms is inefficient, which has motivated research on dedicated graph processing platforms. In this survey, we systematically categorize graph workloads and applications, and provide a detailed review of existing graph processing platforms by dividing them into general-purpose and specialized systems. We thoroughly analyze their implementation technologies, including programming models, partitioning strategies, communication models, execution models, and fault tolerance strategies. Finally, we analyze recent advances and present four open problems for future research.
Internet-scale open source software (OSS) production in various communities generates abundant reusable resources for software developers. However, finding the desired, mature software among a considerable number of candidates with keyword queries is a significant challenge, especially for newcomers, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from the STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions in four experiments to evaluate the effectiveness of the STDB and investigate the performance of CBSS. The experimental results show that the proposed CBSS responds effectively to keyword-based software searches and significantly outperforms existing search services at finding mature software.
ISBN: (print) 9783540752905
ETLs are temporal logics employing ω-automata as temporal connectives. This paper presents sound and complete axiom systems for ETL_l, ETL_f, and ETL_r, respectively. Axioms and rules reflecting temporal behaviors of looping, finite and repeating automaton connectives are provided. Moreover, by encoding temporal operators into automaton connectives and instantiating the axioms and rules relating to automaton connectives, one may derive axiom systems for given ETL fragments.
In data center networks, workload-based resource allocation, which maps a large number of workloads submitted by cloud users/tenants onto the substrate network offered by cloud providers, is an effective way to allocate infrastructure resources to diverse cloud applications and satisfy users' quality of service. Although existing heuristic approaches are able to find a feasible solution, the quality of that solution is not guaranteed. To address this issue, this paper models resource allocation as a distributed constraint optimization problem whose objective is the minimum mapping cost. An efficient approach is then proposed to solve this problem, aiming to find a feasible solution while ensuring its optimality. Finally, theoretical analysis and extensive experiments demonstrate the effectiveness and efficiency of the proposed approach.