Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t...
详细信息
Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.
The pull-based development model, widely used in distributed software teams on open source communities, can efficiently gather the wisdom from crowds. Instead of sharing access to a central repository,contributors cre...
详细信息
The pull-based development model, widely used in distributed software teams on open source communities, can efficiently gather the wisdom from crowds. Instead of sharing access to a central repository,contributors create a fork, update it locally, and request to have their changes merged back, i.e., submit a pull-request. On the one hand, this model lowers the barrier to entry for potential contributors since anyone can submit pull-requests to any repository, but on the other hand it also increases the burden on integrators, who are responsible for assessing the proposed patches and integrating the suitable changes into the central repository. The role of integrators in pull-based development is crucial. They must not only ensure that pull-requests should meet the project’s quality standards before being accepted, but also finish the evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration(CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides extra evidences relating to the quality of pull-requests, which would help integrators to make final decision(i.e., accept or reject). In this paper, we present a quantitative study that tries to discover which factors affect the process of pull-based development model, including acceptance and latency in the context of CI. Using regression modeling on data extracted from a sample of Git Hub projects deploying the Travis-CI service, we find that the evaluation process is a complex issue, requiring many independent variables to explain adequately. In particular, CI is a dominant factor for the process, which not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
As a solution to growing global wire delay, non-uniform cache architecture (NUCA) has already been a trend in large cache designs. The access time of NUCA is determined by the distance between the cache bank containin...
详细信息
distributed software systems are becoming more and more complex *** is easy to find a huge amount of computing nodes in a nationwide or global information *** example,We Chat(Wei Xin),a well-known mobile application i...
详细信息
distributed software systems are becoming more and more complex *** is easy to find a huge amount of computing nodes in a nationwide or global information *** example,We Chat(Wei Xin),a well-known mobile application in China,has reached a record of 650 million monthly active users in the third quarter of *** the same time,researchers are starting to talk about software systems which have billions of lines of codes[1]or can last one hundred years.
Virtual Machine(VM) allocation for multiple tenants is an important and challenging problem to provide efficient infrastructure services in cloud data centers. Tenants run applications on their allocated VMs, and th...
详细信息
Virtual Machine(VM) allocation for multiple tenants is an important and challenging problem to provide efficient infrastructure services in cloud data centers. Tenants run applications on their allocated VMs, and the network distance between a tenant's VMs may considerably impact the tenant's Quality of Service(Qo S). In this study, we define and formulate the multi-tenant VM allocation problem in cloud data centers, considering the VM requirements of different tenants, and introducing the allocation goal of minimizing the sum of the VMs' network diameters of all tenants. Then, we propose a Layered Progressive resource allocation algorithm for multi-tenant cloud data centers based on the Multiple Knapsack Problem(LP-MKP). The LP-MKP algorithm uses a multi-stage layered progressive method for multi-tenant VM allocation and efficiently handles unprocessed tenants at each stage. This reduces resource fragmentation in cloud data centers, decreases the differences in the Qo S among tenants, and improves tenants' overall Qo S in cloud data centers. We perform experiments to evaluate the LP-MKP algorithm and demonstrate that it can provide significant gains over other allocation algorithms.
Internet-scale open source software (OSS) pro- duction in various communities generates abundant reusable resources for software developers. However, finding the de- sired and mature software with keyword queries fr...
详细信息
Internet-scale open source software (OSS) pro- duction in various communities generates abundant reusable resources for software developers. However, finding the de- sired and mature software with keyword queries from a considerable number of candidates, especially for the fresher, is a significant challenge because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlationbased software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions in four experiments, respectively, to evaluate the effectiveness of the STDB and investigate the performance of the CBSS. The experiment results show that the proposed CBSS can effectively respond to keyword-based software searches and significantly outperforms other existing search services at finding mature software.
Based on 3 D-TCAD simulations, single-event transient(SET) effects and charge collection mechanisms in fully depleted silicon-on-insulator(FDSOI) transistors are investigated. This work presents a comparison between28...
详细信息
Based on 3 D-TCAD simulations, single-event transient(SET) effects and charge collection mechanisms in fully depleted silicon-on-insulator(FDSOI) transistors are investigated. This work presents a comparison between28-nm technology and 0.2-lm technology to analyze the impact of strike location on SET sensitivity in FDSOI devices. Simulation results show that the most SET-sensitive region in FDSOI transistors is the drain region near the gate. An in-depth analysis shows that the bipolar amplification effect in FDSOI devices is dependent on the strike locations. In addition, when the drain contact is moved toward the drain direction, the most sensitive region drifts toward the drain and collects more charge. This provides theoretical guidance for SET hardening.
FinFET technologies are becoming the mainstream process as technology scales down. Based on a 28-nm bulk p- FinFET device, we have investigated the fin width and height dependence of bipolar amplification for heavy-io...
详细信息
FinFET technologies are becoming the mainstream process as technology scales down. Based on a 28-nm bulk p- FinFET device, we have investigated the fin width and height dependence of bipolar amplification for heavy-ion-irradiated FinFETs by 3D TCAD numerical simulation. Simulation results show that due to a well bipolar conduction mechanism rather than a channel (fin) conduction path, the transistors with narrower fins exhibit a diminished bipolar amplification effect, while the fin height presents a trivial effect on the bipolar amplification and charge collection. The results also indicate that the single event transient (SET) pulse width can be mitigated about 35% at least by optimizing the ratio of fin width and height, which can provide guidance for radiation-hardened applications in bulk FinFET technology.
Single event upset (SEU) is one of the most important origins of soft errors in aerospace *** technology scales down persistently, charge sharing is playing a more and more significant effect on SEU of flip-flop. Char...
详细信息
Single event upset (SEU) is one of the most important origins of soft errors in aerospace *** technology scales down persistently, charge sharing is playing a more and more significant effect on SEU of flip-flop. Charge sharing can often bring about multi-node charge collection in storage nodes and non-storage nodes in a flip-flop. In this paper, multi-node charge collection in flip-flop data input and flip-flop clock signal is investigated by 3D TCAD mixed-mode simulations, and the simulate results indicate that single event double transient (SEDT) in flip-flop data input and flip-flop clock signal can also cause a SEU in flip-flop. This novel mechanism is called the SEDT-induced SEU, and it is also verified by heavy-ion experiment in 65 nm twin-well process. The simulation results also indicate that this mechanism is closely related with the well-structure,and the triple-well structure is more effective to increase the SEU threshold of this mechanism than twin-well structure.
In data center networks, resource allocation based on workload is an effective way to allocate the infrastructure resources to diverse cloud applications and satisfy the quality of service for the users, which refers ...
详细信息
In data center networks, resource allocation based on workload is an effective way to allocate the infrastructure resources to diverse cloud applications and satisfy the quality of service for the users, which refers to mapping a large number of workloads provided by cloud users/tenants to substrate network provided by cloud providers. Although the existing heuristic approaches are able to find a feasible solution, the quality of the solution is not guaranteed. Concerning this issue, based on the minimum mapping cost, this paper solves the resource allocation problem by modeling it as a distributed constraint optimization problem. Then an efficient approach is proposed to solve the resource allocation problem, aiming to find a feasible solution and ensuring the optimality of the solution. Finally, theoretical analysis and extensive experiments have demonstrated the effectiveness and efficiency of our proposed approach.
暂无评论