The detection of community structure in complex networks is crucial since it provides insight into the substructures of the whole network. Spectral clustering algorithms that employ the eigenvalues and eigenvectors of...
详细信息
The detection of community structure in complex networks is crucial since it provides insight into the substructures of the whole network. Spectral clustering algorithms that employ the eigenvalues and eigenvectors of an appropriate input matrix have been successfully applied in this field. Despite its empirical success in community detection, spectral clustering has been criticized for its inefficiency when dealing with large scale data sets. This is confirmed by the fact that the time complexity for spectral clustering is cubic with respect to the number of instances;even the memory efficient iterative eigensolvers, such as the power method, may converge slowly to the desired solutions. In efforts to improve the complexity and performance, many non-traditional spectral clustering algorithms have been proposed. Rather than using the real eigenvalues and eigenvectors as in the traditional methods, the non-traditional clusterings employ additional topological structure information characterized by the spectrum of a matrix associated with the network involved, such as the complex eigenvalues and their corresponding complex eigenvectors, eigenspaces and semi-supervised labels. However, to the best of our knowledge, no work has been devoted to comparison among these newly developed approaches. This is the main goal of this paper, through evaluating the effectiveness of these spectral algorithms against some benchmark networks. The experimental results demonstrate that the spectral algorithm based on the eigenspaces achieves the best performance but is the slowest algorithm;the semi-supervised spectral algorithm is the fastest but its performance largely depends on the prior knowledge;and the spectral method based on the complement network shows similar performance to the conventional ones.
We combine the replica approach of statistical physics with a variational technique to make it applicable for the analysis of machine learning algorithms on real data. The method is applied to Gaussian process models ...
详细信息
We combine the replica approach of statistical physics with a variational technique to make it applicable for the analysis of machine learning algorithms on real data. The method is applied to Gaussian process models and their relative, the support vector machine. We discuss the quality of our theoretical results in comparison to experiments. As a key result, we apply our theory on real world benchmark data and show its potential for practical applications by deriving approximate expressions for data averaged performance measures which hold for general data distributions and allow us to optimize the performance of the learning algorithm.
To any sequence of real numbers (n >= 0), we can associate another sequence (s >= 0), which Knuth calls its binomial transform. This transform is defined through the rule as = Bsan = Sigma(n) (-1)(n) ((s)(n)) a(...
详细信息
To any sequence of real numbers < a(n)>(n >= 0), we can associate another sequence < a(s)>(s >= 0), which Knuth calls its binomial transform. This transform is defined through the rule as = Bsan = Sigma(n) (-1)(n) ((s)(n)) a(n). We study the properties of this transform, obtaining rules for its manipulation and a table of transforms, that allow us to invert many transforms by inspection. We use these methods to perform a detailed analysis of skip lists, a probabilistic data structure introduced by Pugh as an altemative to balanced trees. In particular, we obtain the mean and variance for the cost of searching for the first or the last element in the list (confirming results obtained previously by other methods), and also for the cost of searching for a random element (whose variance was not known). We obtain exact solutions, although not always in closed form. From them we are able to find the corresponding asymptotic expressions. (c) 2005 Elsevier B.V. All rights reserved.
Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-...
详细信息
Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries ("ternary search tries"). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely, size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
We introduce a version of the cavity method for diluted mean field spin models that allows the computation of thermodynamic quantities similar to the Franz-Parisi quenched potential in sparse random graph models. This...
详细信息
We introduce a version of the cavity method for diluted mean field spin models that allows the computation of thermodynamic quantities similar to the Franz-Parisi quenched potential in sparse random graph models. This method is developed in the particular case of partially decimated random constraint satisfaction problems. This allows us to develop a theoretical understanding of a class of algorithms for solving constraint satisfaction problems, in which elementary degrees of freedom are sequentially assigned according to the results of a message passing procedure (belief propagation). We confront this theoretical analysis with the results of extensive numerical simulations.
This paper demonstrates that the potential of intrinsic parallelism in Monte Carlo methods, which has remained essentially untapped so far, can be exploited to implement these methods efficiently on SIMD and MIMD comp...
详细信息
This paper demonstrates that the potential of intrinsic parallelism in Monte Carlo methods, which has remained essentially untapped so far, can be exploited to implement these methods efficiently on SIMD and MIMD computers. Two basic static and dynamic computation assignment schemes are proposed for assigning the primary estimate computations (PECs) to processors in a parallel computer. These schemes can be used to design parallel Monte Carlo algorithms for many applications. The time complexity analyses of static computation assignment (SCA) schemes are carried out using some results from order statistics, whereas those of dynamic computation assignment (DCA) schemes are carried out using results from order statistics, renewal and queueing theories. It is shown that for smaller number of processors, linear speedup can be achieved with the SCA schemes and the speedup almost equal to the number of processors can be achieved with the DCA schemes. Some computational results for Monte Carlo solutions of Laplace’s equation are given to illustrate the performance of the various SCA and DCA schemes.
Over the past decade, tag search protocols have been suggested to efficiently acquire a specific RFID tag among a large group of tags by an RFID reader. For instance, in a warehouse, where there are thousands of packa...
详细信息
Over the past decade, tag search protocols have been suggested to efficiently acquire a specific RFID tag among a large group of tags by an RFID reader. For instance, in a warehouse, where there are thousands of packages each having an RFID tag attached, staffs may find specific packages using a reader that employs a tag search protocol. Although tag search protocols promise convenience, most of them can threaten the privacy of RFID tags in different ways. For instance, an attacker can impersonate a tag to replace it with another tag or can find the identity of a tag to track it. Recently, Sundaresan et al. have proposed an RFID tag search protocol based on 128-bit pseudo random number generators and exclusive-or operations which both can be easily implemented on low-cost RFID passive tags in EPC global Class-I Gen-2 standard even for large-scale implementations. They claim that their protocol not only offers anonymity, location privacy and forward secrecy for the reader and the tag, but also resists against de-synchronization, replay and impersonation attacks. In this paper, we analyze the security of their proposed tag search protocol and show that the protocol is vulnerable to de-synchronization and impersonation attacks and also cannot provide location privacy for the tag. (C) 2016 Elsevier B.V. All rights reserved.
This paper presents moment analyses and characterizations of limit distributions for the construction cost of hash tables under the linear probing strategy. Two models are considered, that of full tables and that of s...
详细信息
This paper presents moment analyses and characterizations of limit distributions for the construction cost of hash tables under the linear probing strategy. Two models are considered, that of full tables and that of sparse tables with a fixed tilling ratio strictly smaller than one. For full tables, the construction cost has expectation O(n(3/2)), the standard deviation is of the same order, and a limit law of the Airy type holds. (The Airy distribution is a semiclassical distribution that is defined in terms of the usual Airy functions or equivalently in terms of Bessel functions of indices -1/3, 2/3.) For sparse tables, the construction cost has expectation O(n), standard deviation O(root n), and a limit law of the Gaussian type. Combinatorial relations with other problems leading to Airy phenomena (like graph connectivity, tree inversions, tree path length, or area under excursions) are also briefly discussed.
BucketSort is a new external sorting algorithm for very large files that is a substantial improvement over merge sorting with disks. BucketSort requires an associative secondary storage device, which can be realized b...
详细信息
BucketSort is a new external sorting algorithm for very large files that is a substantial improvement over merge sorting with disks. BucketSort requires an associative secondary storage device, which can be realized by large disk drives with logic-per-track capabilities or by magnetic bubble memory (MBM). This paper describes and analyzes a hypothetical Bucket-Sort implementation that uses bubble memory. A new software marking technique is introduced that reduces the effective time for an associative search.
This paper extends the analysis of extendible hashing to cover partial expansions with elastic buckets. Although previous studies of elastic buckets can be adapted to extendible hashing, the approach taken here provid...
详细信息
This paper extends the analysis of extendible hashing to cover partial expansions with elastic buckets. Although previous studies of elastic buckets can be adapted to extendible hashing, the approach taken here provides another view to the problem. We provide a correspondence between fixed bucket capacities and elastic buckets. Furthermore, the results are based on easy and straightforward approximations.
暂无评论