We propose a set of highly scalable algorithms for the combinatorial data analysis problem of seriating similarity matrices. Seriation consists of finding a permutation of data instances, such that similar instances a...
详细信息
We propose a set of highly scalable algorithms for the combinatorial data analysis problem of seriating similarity matrices. Seriation consists of finding a permutation of data instances, such that similar instances are nearby in the ordering. Applications of the seriation problem can be found in various disciplines such as in bioinformatics for genome sequencing, data visualization and exploratory dataanalysis. Our algorithms attempt to minimize certain p-SUM objectives, which also arise in the problem of envelope reduction of sparse matrices. In particular, we present a set of graduated non-convexity algorithms for vector-based relaxations of the general p-SUM problem for p that can scale to very large problem sizes. Different choices of p emphasize global versus local similarity pattern structure. We conduct a number of experiments to compare our algorithms to various state-of-the-art combinatorial optimization methods on real and synthetic datasets. The experimental results demonstrate that compared to other approaches, the proposed algorithms are very competitive and scale well with large problem sizes.
We present a procedure (RAUS) for residual analysis of a dissimilarity matrix whereby unidimensional scaling is successively applied to the absolute value of residuals. A key advantage of RAUS is that the efficient De...
详细信息
We present a procedure (RAUS) for residual analysis of a dissimilarity matrix whereby unidimensional scaling is successively applied to the absolute value of residuals. A key advantage of RAUS is that the efficient Defays formulation of unidimensional scaling can be used for the fitting of each scale. An example using U.S. Supreme Court voting data is provided to illustrate the interpretation of successive scales. A simulation study was performed to evaluate RAUS.
We consider the problem of recovering a circular arrangement of data instances with respect to some proximity measure, such that nearby instances are more similar. Applications of this problem, also referred to as cir...
详细信息
We consider the problem of recovering a circular arrangement of data instances with respect to some proximity measure, such that nearby instances are more similar. Applications of this problem, also referred to as circular seriation, can be found in various disciplines such as genome sequencing, data visualization and exploratory dataanalysis. Circular seriation can be expressed as a quadratic assignment problem, which is in general an intractable problem. Spectral-based approaches can be used to find approximate solutions, but are shown to perform well only for a specific class of data matrices. We propose a bilevel optimization framework where we employ a spherical embedding approach together with a spectral method for circular ordering in order to recover circular arrangements of the embedded data. Experiments on real and synthetic datasets demonstrate the competitive performance of the proposed method. (C) 2019 Elsevier Ltd. All rights reserved.
The problem of comparing the agreement of two n x n matrices has a variety of applications in experimental psychology. A well-known index of agreement is based on the sum of the element-wise products of the matrices. ...
详细信息
The problem of comparing the agreement of two n x n matrices has a variety of applications in experimental psychology. A well-known index of agreement is based on the sum of the element-wise products of the matrices. Although less familiar to many researchers, measures of agreement based on within-row and/or within-column gradients can also be useful. We provide a suite of MATLAB programs for computing agreement indices and performing matrix permutation tests of those indices. Programs for computing exact p-values are available for small matrices, whereas resampling programs for approximate p-values are provided for larger matrices.
This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data, so that more similar samples or objects appear closer...
详细信息
This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data, so that more similar samples or objects appear closer together, whereas dissimilar ones are further apart. Despite the large number of current algorithms to realize such re-sequencing, there has not been a systematic way for analyzing the resulting sequences, comparing them, or fusing them to obtain a single unifying one. We propose a new positional proximity measure that evaluates the similarity of two arbitrary sequences based on their agreement on pairwise positional information of the sequenced objects. Furthermore, we present various statistical properties of this measure as well as its normalized version modeled as an instance of the generalized correlation coefficient. Based on this measure, we define a new procedure for consensus seriation that fuses multiple arbitrary sequences based on a quadratic assignment problem formulation and an efficient way of approximating its solution. We also derive theoretical links with other permutation distance functions and present their associated combinatorial optimization forms for consensus tasks. The utility of the proposed contributions is demonstrated through the comparison and fusion of multiple seriation algorithms we have implemented, using many real-world datasets from different application domains.
Over three decades ago an algebraic factoring algorithm was created as part of active logic synthesis efforts in the semiconductor industry. It initiated logic synthesis successes and steered synthesis in the directio...
详细信息
Over three decades ago an algebraic factoring algorithm was created as part of active logic synthesis efforts in the semiconductor industry. It initiated logic synthesis successes and steered synthesis in the direction of practical design tools: they can now take a design from its initial state to the final chip that meets a complex set of constraints. The evolved computing platforms today stimulate new exploratory avenues to extend the functionality of fundamental synthesis algorithms. We reexamine logic factoring in an effort to establish a stronger connection to the functional intent rather than structural implications of a design description within synthesis. A scalable factoring algorithm that reduces its dependence on the two-level minimization is described to improve the area of synthesized circuits. It is implemented using a new distributed environment that harnesses the key-value concept to find solutions through systematic transformations of a data set.
Dynamic programming methods for matrix permutation problems in combinatorial data analysis can produce globally-optimal solutions for matrices up to size 30x30, but are computationally infeasible for larger matrices b...
详细信息
Dynamic programming methods for matrix permutation problems in combinatorial data analysis can produce globally-optimal solutions for matrices up to size 30x30, but are computationally infeasible for larger matrices because of enormous computer memory requirements. Branch-and-bound methods also guarantee globally-optimal solutions, but computation time considerations generally limit their applicability to matrix sizes no greater than 35x35. Accordingly, a variety of heuristic methods have been proposed for larger matrices, including iterative quadratic assignment, tabu search, simulated annealing, and variable neighborhood search. Although these heuristics can produce exceptional results, they are prone to converge to local optima where the permutation is difficult to dislodge via traditional neighborhood moves (e.g., pairwise interchanges, object-block relocations, object-block reversals, etc.). We show that a heuristic implementation of dynamic programming yields an efficient procedure for escaping local optima. Specifically, we propose applying dynamic programming to reasonably-sized subsequences of consecutive objects in the locally-optimal permutation, identified by simulated annealing, to further improve the value of the objective function. Experimental results are provided for three classic matrix permutation problems in the combinatorial data analysis literature: (a) maximizing a dominance index for an asymmetric proximity matrix;(b) least-squares unidimensional scaling of a symmetric dissimilarity matrix;and (c) approximating an anti-Robinson structure for a symmetric dissimilarity matrix.
Multiobjective programming, a technique for solving mathematical optimization problems with multiple conflicting objectives, has received increasing attention among researchers in various academic disciplines. A summa...
详细信息
Multiobjective programming, a technique for solving mathematical optimization problems with multiple conflicting objectives, has received increasing attention among researchers in various academic disciplines. A summary of multiobjective programming techniques and a review of their applications in quantitative psychology are provided. (C) 2011 Elsevier Inc. All rights reserved.
We propose a hybrid heuristic procedure based on scatter search and tabu search for the problem of clustering objects to optimize multiple criteria. Our goal is to search for good approximations of the efficient front...
详细信息
We propose a hybrid heuristic procedure based on scatter search and tabu search for the problem of clustering objects to optimize multiple criteria. Our goal is to search for good approximations of the efficient frontier for this class of problems and provide a means for improving decision making in multiple application areas. Our procedure can be viewed as an extension of SSPMO (a scatter search application to nonlinear multiobjective optimization) to which we add new elements and strategies specially suited for combinatorial optimization problems. Clustering problems have been the subject of numerous studies;however, most of the work has focused on single-objective problems. Clustering using multiple criteria and/or multiple data sources has received limited attention in the operational research literature. Our scatter tabu search implementation is general and tackles several problems classes within this area of combinatorial data analysis. We conduct extensive experimentation to show that our method is capable of delivering good approximations of the efficient frontier for improved analysis and decision making. Journal of the Operational Research Society (2011) 62, 2034-2046. doi:10.1057/jors.2010.180 Published online 29 December 2010
Establishing blockmodels for one- and two-mode binary network matrices has typically been accomplished using multiple restarts of heuristic algorithms that minimize functions of inconsistency with an ideal block struc...
详细信息
Establishing blockmodels for one- and two-mode binary network matrices has typically been accomplished using multiple restarts of heuristic algorithms that minimize functions of inconsistency with an ideal block structure. Although these algorithms likely yield exceptional performance, they are not assured to provide blockmodels that optimize the functional indices. In this paper, we present integer programming models that, for a prespecified image matrix, can produce guaranteed optimal solutions for matrices of nontrivial size. Accordingly, analysts performing a confirmatory analysis of a prespecified blockmodel structure can apply our models directly to obtain an optimal solution. In exploratory cases where a blockmodel structure is not prespecified, we recommend a two-stage procedure, where a heuristic method is first used to identify an image matrix and the integer program is subsequently formulated and solved to identify the optimal solution for that image matrix. Although best suited for ideal block structures associated with structural equivalence, the integer programming models have the flexibility to accommodate functional indices pertaining to regular equivalence. Computational results are reported for a variety of one- and two-mode matrices from the blockmodeling literature. (C) 2009 Elsevier Inc. All rights reserved.
暂无评论