Sparse coding has shown great potential in learning image feature representations. Recently developed methods such as group sparse coding discover the group relationships among examples and have achieved state-of-the-art results in image classification. However, they suffer from poor robustness in practice. This paper proposes a robust weighted supervised sparse coding method (RWSSC) to address this ***, RWSSC distinguishes different classes' contributions to the sparse coding by a novel weighting strategy and removes outliers by imposing l1-regularization over the noisy entries. Benefiting from these strategies, RWSSC can effectively boost the performance of sparse coding in image ***, we developed a block coordinate descent algorithm to optimize it, and proved its *** results of image classification on two popular datasets show that RWSSC quantitatively outperforms representative sparse coding methods.
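The abstract does not give the RWSSC update rules. As a minimal sketch of why l1-regularization can zero out small noisy entries while keeping large outliers in a separate term, the scalar problem min over e of 0.5*(r - e)^2 + lam*|e| has the well-known soft-thresholding solution. The function name and the parameter `lam` are illustrative assumptions, not taken from the paper.

```python
import math

def soft_threshold(r, lam):
    """Closed-form minimizer of 0.5*(r - e)**2 + lam*abs(e) over e.

    Illustrative only: the abstract does not state that RWSSC uses
    exactly this update inside its block coordinate descent.
    """
    magnitude = max(abs(r) - lam, 0.0)   # shrink toward zero by lam
    return math.copysign(magnitude, r)   # keep the sign of the residual

# Residuals smaller than lam are set to zero; larger ones are shrunk.
residuals = [3.0, -0.5, 0.2, -3.0]
noise_estimate = [soft_threshold(r, 1.0) for r in residuals]
```

Under this assumption, only entries whose residual exceeds `lam` in magnitude are treated as noise, which is what makes the noise term sparse.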
To utilize the shared last-level cache (LLC) in chip multi-processors (CMP) more efficiently, the partitioning of LLC resources among all cores should offer low access latency, fine migration granularity, and low hardware complexity. This paper proposes a dynamic LLC management scheme to achieve these goals. The proposed scheme migrates cache resources among different cores at the granularity of cache blocks instead of ways. The number of victim cache blocks that each victim core can migrate to other target cores is related to an eviction probability, which is calculated according to the performance goal. The victim cache blocks for a target core are then chosen from the nearest victim core that has a non-zero eviction probability, using an innovative E-Table structure introduced into the CMP. The eviction probabilities are updated periodically. With the help of E-Tables, the proposal achieves low-latency accesses by always keeping the required cache blocks near the target cores, and fine granularity is guaranteed by maintaining an eviction probability for each core. In addition, only minor additional hardware changes to the traditional cache structure are required. Simulation results suggest significant performance improvements of 6.8% to 22.7% over related work.
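The selection rule described above (take blocks from the nearest victim core with a non-zero eviction probability) can be sketched in software as follows. This is only an illustration: the abstract does not describe the E-Table layout, so the dictionary of probabilities, the 2D-mesh core numbering, and the Manhattan distance are all assumptions.

```python
def mesh_distance(a, b, width):
    """Manhattan distance between cores a and b on an assumed 2D mesh."""
    ax, ay = divmod(a, width)
    bx, by = divmod(b, width)
    return abs(ax - bx) + abs(ay - by)

def nearest_victim_core(target, eviction_prob, width):
    """Return the closest core (other than target) whose eviction
    probability is non-zero, or None if no core can give up blocks.

    eviction_prob is a hypothetical stand-in for the per-core
    probabilities that the paper keeps in its E-Tables.
    """
    candidates = [c for c, p in eviction_prob.items()
                  if c != target and p > 0.0]
    if not candidates:
        return None
    # Ties on distance are broken by the lower core id.
    return min(candidates, key=lambda c: (mesh_distance(target, c, width), c))

# Four cores on a 2x2 mesh; cores 2 and 3 may give up blocks.
probabilities = {0: 0.0, 1: 0.0, 2: 0.3, 3: 0.5}
victim = nearest_victim_core(0, probabilities, width=2)
```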
ISBN: 9781479919352 (print)
Performance problems of software systems are receiving more and more attention. Among the various diagnosis methods based on system traces, principal component analysis (PCA) based methods are widely used because their diagnosis results are highly accurate and they require no specific domain knowledge. However, our experiments reveal several shortcomings of PCA-based methods: they require traces with the same call sequence, they are inefficient when the traces are long, and they miss some performance problems. To cope with these issues, we introduce a segmentation-based online diagnosis method in this poster.
ISBN: 9781479981120 (print)
The Message Passing Interface (MPI) plays an important role in parallel computing, and many parallel applications are implemented as MPI programs. Existing bug detection methods for MPI programs fall short of providing both input coverage and non-determinism coverage, which leads to missed bugs. In this paper, we employ symbolic execution to ensure input coverage, and propose an on-the-fly scheduling algorithm that reduces the interleavings explored for non-determinism coverage while ensuring soundness and completeness. We have implemented our approach as a tool, called MPISE, which can automatically detect deadlocks and runtime bugs in MPI programs. Experiments on benchmark programs and real-world MPI programs indicate that MPISE finds bugs effectively and efficiently. In addition, the tool provides diagnostic information and a replay mechanism to help understand bugs.
ISBN: 9781467365994 (print)
Data clustering is usually time-consuming because it must iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast, quality-assured results. In this paper, we propose applying approximation techniques to data clustering to trade off clustering efficiency against result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Extensive evaluations show that IBL provides a 2x speed-up over the state-of-the-art solution with the same error bound.
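To make the idea of bootstrap trials concrete, the sketch below estimates an aggregate (here a mean) from a sample and uses resampling with replacement to attach an error interval to it. It is a generic percentile bootstrap in plain Python, not the IBL API or its Spark implementation; the function name and parameters are assumptions for illustration.

```python
import random
import statistics

def bootstrap_mean_interval(sample, trials=1000, confidence=0.95, seed=0):
    """Return (estimate, lower, upper) for the mean of `sample`.

    Each trial resamples the data with replacement and recomputes the
    mean; the spread of those means approximates the estimation error.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(trials):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lower_idx = int((1 - confidence) / 2 * trials)
    upper_idx = min(trials - 1, int((1 + confidence) / 2 * trials))
    return statistics.fmean(sample), means[lower_idx], means[upper_idx]

# A sample standing in for a small fraction of a large dataset.
estimate, lower, upper = bootstrap_mean_interval(list(range(100)))
```

In an iterative algorithm such as clustering, an interval of this kind could be checked at each step to decide whether the current sample is large enough for the requested error bound.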
ISBN: 9781467381741 (print)
In recent years, many companies have embraced the Hadoop MapReduce system for large-scale data processing with completion time constraints. However, existing Hadoop schedulers still suffer from the reducer load imbalance problem. In this paper, we present a novel run-time load balancing method for MapReduce. Our approach predicts the workload of each reduce task at run time and dynamically assigns the reduce tasks to specific machines based on the estimated workload. As a result, our approach can achieve load balance among machines. The experimental results show that our approach predicts the workload of reduce tasks with high accuracy and improves the job completion time by up to 23.15%.
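The abstract does not specify the assignment policy. One simple way to turn per-task workload estimates into a balanced placement is the classic greedy "largest task first, least-loaded machine" heuristic, sketched below. This is a swapped-in textbook technique used for illustration, and the task names and load values are hypothetical.

```python
import heapq

def assign_reduce_tasks(estimated_load, num_machines):
    """Greedy placement: heaviest task first, onto the least-loaded machine.

    estimated_load maps a task id to its predicted workload.
    Returns a dict mapping machine index to the list of task ids.
    """
    heap = [(0.0, m) for m in range(num_machines)]  # (current load, machine)
    heapq.heapify(heap)
    assignment = {m: [] for m in range(num_machines)}
    by_load = sorted(estimated_load.items(), key=lambda kv: kv[1], reverse=True)
    for task, load in by_load:
        total, machine = heapq.heappop(heap)   # least-loaded machine so far
        assignment[machine].append(task)
        heapq.heappush(heap, (total + load, machine))
    return assignment

# Hypothetical predicted workloads for five reduce tasks.
loads = {"r0": 8, "r1": 7, "r2": 6, "r3": 5, "r4": 4}
placement = assign_reduce_tasks(loads, num_machines=2)
totals = [sum(loads[t] for t in placement[m]) for m in sorted(placement)]
```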
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
Heavy ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has about a 92% decrease in SEU cross section compared to the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows a much stronger frequency dependency than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in the SEU performance of the TMR design may warrant reconsideration of its use in hardened designs.
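For readers unfamiliar with triple modular redundancy (TMR), the sketch below shows the majority vote that lets three redundant storage copies mask a single upset, and converts the reported 92% cross-section decrease into a hardness factor so that it can be compared with the 3.2× figure. The function names are illustrative assumptions; the abstract does not describe the TMRFF circuit beyond its name.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)

def hardness_factor(cross_section_decrease):
    """Ratio of reference to hardened cross section for a given decrease."""
    return 1.0 / (1.0 - cross_section_decrease)

stored = 0b1010
single_upset = majority_vote(stored, stored ^ 0b0010, stored)            # one copy flipped
double_upset = majority_vote(stored ^ 0b0010, stored ^ 0b0010, stored)   # two copies flipped

# A 92% decrease in static mode corresponds to roughly 12.5x,
# compared with the 3.2x reported at 160 MHz.
static_factor = hardness_factor(0.92)
```

A single flipped copy is outvoted, whereas two copies flipped in the same bit defeat the voter, which is the basic limitation of any 2-of-3 scheme.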
The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-based reliable end-to-end communication. The design of a low-level message passing infrastructure and an upper-level message passing service is also proposed. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.
ISBN: 9781479986989 (print)
Nonnegative matrix factorization (NMF) is a powerful technique for dimensionality reduction. Conventional NMF algorithms usually keep the matrices W and H nonnegative while iterating. However, to obtain the NMF of a matrix, it is unnecessary to force the intermediate solutions to be nonnegative. In this paper, we propose a two-stage approach to NMF. In the relaxation stage, the nonnegativity constraint on the intermediate solutions is relaxed and a real-valued matrix factorization is generated. In the constraint stage, the real-valued matrix factorization is transformed into a nonnegative matrix factorization by an invertible linear transformation. Based on this approach, we study exact nonnegative matrix factorization for rank 2. We prove that, given two real-valued matrices of rank 2, there exists an invertible linear transformation that transforms them into nonnegative matrices while leaving their product unchanged. We propose an algorithm to find this transformation. When the rank is higher than 2, such a transformation may not exist. Experiments show that this approach can reach a nonnegative matrix factorization with lower reconstruction error than conventional methods, and that the technique for rank-2 exact NMF works well.
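The constraint stage relies on the identity (A T)(T^-1 B) = A B for any invertible T. The sketch below checks this on a hand-picked rank-2 example in which a real-valued factor with a negative entry is mapped to nonnegative factors. It does not implement the paper's algorithm for finding T: the matrices A, B and the transformation T are chosen by construction purely for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse_2x2(t):
    """Exact inverse of a 2x2 matrix using rational arithmetic."""
    (a, b), (c, d) = t
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must be invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def is_nonnegative(m):
    return all(x >= 0 for row in m for x in row)

# Real-valued rank-2 factors (A contains a negative entry).
A = [[1, -1], [1, 0], [0, 2]]
B = [[1, 3, 3], [0, 1, 3]]

# Hand-picked invertible transformation (an assumption for this example).
T = [[1, 1], [0, 1]]

W = matmul(A, T)               # transformed left factor
H = matmul(inverse_2x2(T), B)  # transformed right factor
```

Here W and H are both nonnegative and reproduce the same product as A and B, which is the property the rank-2 result guarantees can always be achieved.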