Semi-supervised clustering aims at boosting the clustering performance on unlabeled samples by using labels from a few labeled samples. Constrained NMF (CNMF) is one of the most significant semi-supervised clustering ...
详细信息
ISBN:
(纸本)9781479914821
Semi-supervised clustering aims at boosting the clustering performance on unlabeled samples by using labels from a few labeled samples. Constrained NMF (CNMF) is one of the most significant semi-supervised clustering methods, and it factorizes the whole dataset by NMF and constrains those labeled samples from the same class to have identical encodings. In this paper, we propose a novel soft-constrained NMF (SCNMF) method by softening the hard constraint in CNMF. Particularly, SCNMF factorizes the whole dataset into two lower-dimensional factor matrices by using multiplicative update rule (MUR). To utilize the labels of labeled samples, SCNMF iteratively normalizes both factor matrices after updating them with MURs to make encodings of labeled samples close to their label vectors. It is therefore reasonable to believe that encodings of unlabeled samples are also close to their corresponding label vectors. Such strategy significantly boosts the clustering performance even when the labeled samples are rather limited, e.g., each class owns only a single labeled sample. Since the normalization procedure never increases the computational complexity of MUR, SCNMF is quite efficient and effective in practices. Experimental results on face image datasets illustrate both efficiency and effectiveness of SCNMF compared with both NMF and CNMF.
To reduce the access latencies of end hosts,latency-sensitive applications need to choose suitably close service machines to answer the access requests from end *** K nearest neighbor search locates K service machines...
详细信息
To reduce the access latencies of end hosts,latency-sensitive applications need to choose suitably close service machines to answer the access requests from end *** K nearest neighbor search locates K service machines closest to end hosts,which can efficiently optimize the access latencies for end *** work has weakness in terms of the accuracy and *** to the scalable and accurate K nearest neighbor search problem,we propose a distributed K nearest neighbor search method called DKNNS in this *** machines are organized into a locality-aware multilevel *** first locates a service machine that starts the search process based on a farthest neighbor search scheme,then discovers K nearest service machines based on a backtracking approach within the proximity region containing the target in the latency *** analysis,simulation results and deployment experiments on the PlanetLab show that,DKNNS can determine K approximately optimal service machines,with modest completion time and query ***,DKNNS is also quite stable that can be used for reducing frequent searches by caching found nearest neighbors.
Projective non-negative matrix factorization (PNMF) projects a set of examples onto a subspace spanned by a non-negative basis whose transpose is regarded as the projection matrix. Since PNMF learns a natural parts-ba...
详细信息
ISBN:
(纸本)9781479914821
Projective non-negative matrix factorization (PNMF) projects a set of examples onto a subspace spanned by a non-negative basis whose transpose is regarded as the projection matrix. Since PNMF learns a natural parts-based representation, it has been successfully used in text mining and pattern recognition. However, it is non-trivial to analyze the convergence of the optimization algorithms for PNMF because its objective function is non-convex. In this paper, we propose a Box-constrained PNMF (BPNMF) method to overcome this deficiency of PNMF. In particular, BPNMF introduces an auxiliary variable, i.e., the coefficients of examples, and incorporates the following two types of constraints: 1) each entry of the basis is non-negative and upper-bounded, i.e., box-constrained, and 2) the coefficients equal to the projected points of the examples. The first box constraint makes the basis to be bound and the second equality constraint keeps its equivalence to PNMF. Similar to PNMF, BPNMF is difficult because the objective function is non-convex. To solve BPNMF, we developed an efficient algorithm in the frame of augmented Lagrangian multiplier (ALM) method and proved that the ALM-based algorithm converges to local minima. Experimental results on two face image datasets demonstrate the effectiveness of BPNMF compared with the representative methods.
Non-negative matrix factorization (NMF) is an efficient dimension reduction method and plays an important role in many pattern recognition and computer vision tasks. However, conventional NMF methods are not robust si...
详细信息
Non-negative matrix factorization (NMF) is an efficient dimension reduction method and plays an important role in many pattern recognition and computer vision tasks. However, conventional NMF methods are not robust since the objective functions are sensitive to outliers and do not consider the geometric structure in datasets. In this paper, we proposed a correntropy graph regularized NMF (CGNMF) to overcome the aforementioned problems. CGNMF maximizes the correntropy between data matrix and its reconstruction to filter out the noises of large magnitudes, and expects the coefficients to preserve the intrinsic geometric structure of data. We also proposed a modified version of our CGNMF which construct the adjacent graph by using sparse representation to enhance its reliability. Experimental results on popular image datasets confirm the effectiveness of CGNMF.
The concerns of data-intensiveness and energy awareness are actively reshaping the design of high-performance computing (HPC) systems nowadays. The Graph500 is a widely adopted benchmark for evaluating the performance...
详细信息
ISBN:
(纸本)9781479947638
The concerns of data-intensiveness and energy awareness are actively reshaping the design of high-performance computing (HPC) systems nowadays. The Graph500 is a widely adopted benchmark for evaluating the performance of computing systems for data-intensive workloads. In this paper, we introduce a data-parallel implementation of Graph500 on the Intel Single-chip Cloud computer (SCC). The SCC features a non-coherent many-core architecture and multi-domain on-chip DVFS support for dynamic power management. With our custom-made shared virtual memory programming library, memory sharing among threads is done efficiently via the shared physical memory (SPM) while the library has taken care of the coherence. We conduct an in-depth study on the power and performance characteristics of the Graph500 workloads running on this system with varying system scales and power states. Our experimental results are insightful for the design of energy-efficient many-core systems for data-intensive applications.
parallel programs face a new security problem - concurrency vulnerability, which is caused by a special thread scheduling instead of inputs. In this paper, we propose to automatically fix concurrency vulnerabilities b...
详细信息
parallel programs face a new security problem - concurrency vulnerability, which is caused by a special thread scheduling instead of inputs. In this paper, we propose to automatically fix concurrency vulnerabilities by reducing thread scheduling space. Our method is based on two observations. First, most concurrency vulnerabilities are caused by atomicity violation errors. Second, reducing thread scheduling space does not harm the correctness of the original program. We designed a prototype runtime system shield using deterministic multithreading techniques. Shield is designed to transparently run parallel programs and schedule threads in large instruction blocks to prevent atomicity violation at best effort. In case some concurrency vulnerabilities cannot be fixed by shield's scheduling reducing scheme, we also provide a remedy strategy by integrating shield with record&replay function, so that it can help programmers to analyze attacker's behavior for manually fixing.
Maximum common sub-graph isomorphism (MCS) is a famous NP-hard problem in graph processing. The problem has found application in many areas where the similarity of graphs is important, for example in scene matching, v...
详细信息
Maximum common sub-graph isomorphism (MCS) is a famous NP-hard problem in graph processing. The problem has found application in many areas where the similarity of graphs is important, for example in scene matching, video indexing, chemical similarity and shape analysis. In this paper, a novel algorithm Qwalk is proposed for approximate MCS, utilizing the discrete-time quantum walk. Based on the new observation that isomorphic neighborhood group matches can be detected quickly and conveniently by the destructive interference of a quantum walk, the new algorithm locates an approximate solution via merging neighborhood groups. Experiments show that Qwalk has better accuracy, universality and robustness compared with the state-of-the-art approximate MCS methods. Meanwhile, Qwalk is a general algorithm to solve the MCS problem approximately while having modest time complexity.
Cloud computing provides a new paradigm for resource utilization and sharing. However, the reliability problems, like system failures, often happen in cloud systems and bring enormous loss. Trace-oriented monitoring i...
详细信息
Cloud computing provides a new paradigm for resource utilization and sharing. However, the reliability problems, like system failures, often happen in cloud systems and bring enormous loss. Trace-oriented monitoring is an important runtime method to improve the reliability of cloud systems. In this paper, we propose to bring runtime verification into trace-oriented monitoring, to facilitate the specification of monitoring requirements and to improve the efficiency of monitoring cloud systems. Based on a data set collected from a cloud storage system in a real environment, we validate our approach by monitoring the critical properties of the storage system. The preliminary experimental results indicate the promise of our approach.
Cyber Movement Organization (CMO) is a special kind of social movement organization on the Web. In this paper, we propose a model to simulate the mobilizing process of CMO, which consists of the individual unit, organ...
详细信息
Link partition clusters edges of a complex network to discover its overlapping communities. Due to Its effectiveness, link partition has attracted much attentions from the network science community. However, since lin...
详细信息
Link partition clusters edges of a complex network to discover its overlapping communities. Due to Its effectiveness, link partition has attracted much attentions from the network science community. However, since link partition assigns each edge of a network to unique community, it cannot detect the disjoint communities. To overcome this deficiency, this paper proposes a link partition on asymmetric weighted graph (LPAWG) method for detecting overlapping communities. Particularly, LPAWG divides each edge into two parts to distinguish the roles of connected nodes. This strategy biases edges to a specific node and helps assigning each node to its affiliated community. Since LPAWG introduces more edges than those in the original network, it cannot efficiently detect communities from some networks with relative large amount of edges. We therefore aggregate the line graph of LPAWG to shrink its scale. Experimental results of community detection on both synthetic datasets and the realworld networks show the effectiveness of LPAWG comparing with the representative methods.
暂无评论