The advent of high-throughput technologies, in particular microarrays, for biological research has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although central for statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of prediction, but the slowest in terms of time. It is very unfortunate that as fascinating and classic an area of statistics as model selection, with important practical applications, has received very little attention in terms of algorithmic design and engineering. In this paper, in order to partially fill this gap, we make the following contributions: (A) the first general algorithmic paradigm for stability-based methods for model selection; (B) reductions showing that all of the known methods in this class are instances of the proposed paradigm; (C) a novel algorithmic paradigm for the class of stability-based methods for cluster validity, i.e., methods assessing how statistically significant a given clustering solution is; (D) a general algorithmic paradigm that describes heuristic and very effective speed-ups known in the literature for stability-based model selection methods. Since the performance evaluation of model selection algorithms is mainly experimental, we offer, for completeness and without even attempting to be exhaustive, a representative synopsis of known experimental benchmarking results that highlight the ability of stability-based methods for model selection and the computational resources that they require for the task. As a whole, the contributions of this paper generaliz
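To make the idea of stability-based model selection concrete, the following is a minimal sketch and not the paradigm proposed in the paper. It assumes 1-D data, a toy k-means with quantile initialisation, and pairwise co-membership agreement between clusterings of two random subsamples as the stability score; the names `kmeans_1d`, `agreement`, and `stability` are hypothetical.

```python
import random
from itertools import combinations

def kmeans_1d(points, k, iters=50):
    """Toy Lloyd's algorithm on 1-D data with deterministic quantile initialisation."""
    pts = sorted(points)
    centers = [pts[int((i + 0.5) * len(pts) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers

def nearest(p, centers):
    """Index of the center closest to p (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda c: abs(p - centers[c]))

def agreement(labels_a, labels_b, shared):
    """Fraction of point pairs on which two clusterings agree about co-membership."""
    pairs = list(combinations(shared, 2))
    same = sum((labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
               for x, y in pairs)
    return same / len(pairs)

def stability(points, k, rounds=20, frac=0.8, seed=0):
    """Average agreement between clusterings of two random subsamples (assumed score)."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(rounds):
        idx_a, idx_b = rng.sample(range(n), m), rng.sample(range(n), m)
        ca = kmeans_1d([points[i] for i in idx_a], k)
        cb = kmeans_1d([points[i] for i in idx_b], k)
        labels_a = {i: nearest(points[i], ca) for i in idx_a}
        labels_b = {i: nearest(points[i], cb) for i in idx_b}
        shared = sorted(set(idx_a) & set(idx_b))
        scores.append(agreement(labels_a, labels_b, shared))
    return sum(scores) / len(scores)

# Two well-separated groups: k = 2 should be perfectly stable.
data = [i * 0.1 for i in range(10)] + [10 + i * 0.1 for i in range(10)]
scores = {k: stability(data, k) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
```

Under these assumptions, the number of clusters with the highest score is taken as the estimate. The repeated clustering of subsamples is also where the running time goes, which is consistent with the abstract's remark that stability-based methods are the slowest.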
This paper investigates how to efficiently maintain a dynamic ordered set of bit strings, an important problem in information search and information processing. Generally, a dynamic ordered set is required to support five essential operations: search, insertion, deletion, max-value retrieval, and next-larger-value retrieval. Building on previous research, we present an advanced data structure named rich binary tree (RBT), which satisfies both the binary-search-tree property and the digital-search-tree property. In addition, every key K keeps the most significant difference bit (MSDB) between itself and the next larger value among K's ancestors, as well as that between itself and the next smaller one among its ancestors. With the new data structure, we can maintain a dynamic ordered set in O(L) time. Since computers represent objects in binary, our method has considerable potential for application. In fact, RBT can be viewed as a general-purpose data structure for problems concerning order, such as search, sorting, and maintaining a priority queue. For example, when RBT is applied to sorting, we get an algorithm that is linear in the number of keys and whose performance is far better than quick-sort. Beyond what quick-sort offers, RBT also supports constant-time dynamic insertion/deletion.
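The five operations listed above can be illustrated with a plain binary trie. This is a sketch under stated assumptions and not the RBT structure from the paper: keys are assumed to be unsigned integers of a fixed bit length L, and the class name `BitTrie` and its method names are hypothetical. Each operation walks at most L nodes.

```python
class BitTrie:
    """Plain binary trie over L-bit unsigned integers (an illustration, not RBT)."""

    def __init__(self, L):
        self.L = L
        self.root = [None, None]  # node = [child for bit 0, child for bit 1]

    def _bits(self, key):
        # Most significant bit first.
        return [(key >> i) & 1 for i in range(self.L - 1, -1, -1)]

    def insert(self, key):
        node = self.root
        for b in self._bits(key):
            if node[b] is None:
                node[b] = [None, None]
            node = node[b]

    def search(self, key):
        node = self.root
        for b in self._bits(key):
            node = node[b]
            if node is None:
                return False
        return True

    def delete(self, key):
        path, node = [], self.root
        for b in self._bits(key):
            if node[b] is None:
                return False  # key not present
            path.append((node, b))
            node = node[b]
        # Prune bottom-up until a node with another child is reached.
        for parent, b in reversed(path):
            parent[b] = None
            if parent[1 - b] is not None:
                break
        return True

    def _extreme(self, node, depth, prefix, prefer):
        # Descend to a leaf, taking the preferred bit whenever that child exists.
        for _ in range(depth, self.L):
            b = prefer if node[prefer] is not None else 1 - prefer
            prefix = (prefix << 1) | b
            node = node[b]
        return prefix

    def maximum(self):
        if self.root[0] is None and self.root[1] is None:
            return None
        return self._extreme(self.root, 0, 0, prefer=1)

    def successor(self, key):
        """Smallest stored key strictly greater than key, or None."""
        node, prefix, best = self.root, 0, None
        for depth, b in enumerate(self._bits(key)):
            if b == 0 and node[1] is not None:
                # Deepest point so far where a larger key branches off.
                best = (node[1], depth + 1, (prefix << 1) | 1)
            node = node[b]
            if node is None:
                break
            prefix = (prefix << 1) | b
        if best is None:
            return None
        return self._extreme(*best, prefer=0)

trie = BitTrie(L=4)
for key in (3, 9, 12):
    trie.insert(key)
```

A plain trie like this stores one node per bit of every key, so it only illustrates the O(L) operation bound; it does not capture the MSDB bookkeeping described in the abstract.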
Cai and Schieber (1997) proved that bipartite graphs plus one edge can be recognized in linear time. We extend their result to bipartite graphs plus two edges. Our algorithm works on a depth-first-search spanning tree. This gives, as a byproduct, also a simplified solution to the one-edge case. It is a notoriously open question whether recognizing bipartite graphs plus k edges is a fixed-parameter tractable problem. The present result might support the affirmative conjecture. (C) 2002 Elsevier Science B.V. All rights reserved.
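As a point of reference for the problem statement, here is a brute-force sketch that reads "bipartite plus k edges" as "becomes bipartite after deleting at most k edges". It is not the linear-time algorithm of the paper or of Cai and Schieber: it simply tries every set of up to k edges, and the function names are hypothetical.

```python
from itertools import combinations

def is_bipartite(n, edges):
    """2-colour the graph with an iterative depth-first traversal."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle exists
    return True

def is_bipartite_plus_k(n, edges, k):
    """True if deleting at most k edges leaves a bipartite graph (exponential in k)."""
    for r in range(k + 1):
        for removed in combinations(range(len(edges)), r):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if is_bipartite(n, kept):
                return True
    return False

triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(a, b) for a, b in combinations(range(4), 2)]
```

The brute force tries on the order of m^k edge sets for a graph with m edges, which is the cost that a fixed-parameter tractable algorithm would avoid.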
We study combinatorial and algorithmic questions around minimal feedback vertex sets (FVS) in tournament graphs. On the combinatorial side, we derive upper and lower bounds on the maximum number of minimal FVSs in an n-vertex tournament. We prove that every tournament on n vertices has at most 1.6740^n minimal FVSs, and that there is an infinite family of tournaments, all having at least 1.5448^n minimal FVSs. This improves and extends the bounds of Moon (1971). On the algorithmic side, we design the first polynomial space algorithm that enumerates the minimal FVSs of a tournament with polynomial delay. The combination of our results yields the fastest known algorithm for finding a minimum-sized FVS in a tournament.
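For small tournaments, the objects being counted can be listed by exhaustive search. The sketch below is a brute-force illustration, not the polynomial-delay enumeration algorithm of the paper. It relies on the standard fact that a tournament is acyclic exactly when it contains no directed triangle; the function names and the arc-list representation are assumptions.

```python
from itertools import combinations

def is_acyclic(arcs, vertices):
    """A tournament is acyclic iff it has no directed 3-cycle."""
    for a, b, c in combinations(sorted(vertices), 3):
        forward = (a, b) in arcs and (b, c) in arcs and (c, a) in arcs
        backward = (a, c) in arcs and (c, b) in arcs and (b, a) in arcs
        if forward or backward:
            return False
    return True

def minimal_fvs(n, arcs):
    """List all minimal feedback vertex sets of an n-vertex tournament (exponential)."""
    arcs = set(arcs)
    everything = set(range(n))

    def is_fvs(s):
        return is_acyclic(arcs, everything - s)

    found = []
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            s = set(subset)
            # Supersets of an FVS are FVSs, so minimality only needs
            # a check that dropping any single vertex breaks the property.
            if is_fvs(s) and all(not is_fvs(s - {v}) for v in s):
                found.append(frozenset(s))
    return found

cyclic_triangle = [(0, 1), (1, 2), (2, 0)]
transitive_triangle = [(0, 1), (0, 2), (1, 2)]
```

On the cyclic triangle each single vertex is a minimal FVS, while the transitive triangle is already acyclic, so its only minimal FVS is the empty set.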
Affix trees are a generalization of suffix trees that are based on the inherent duality of suffix trees induced by the suffix links. An algorithm is presented that constructs affix trees on-line by expanding the underlying string in both directions and that is the first known algorithm to do this with linear time complexity.
In aerodynamic shape optimization, traditional static geometry control methods can produce suboptimal performance by introducing performance tradeoffs at various stages of optimization, enforcing arbitrary constraints on open-ended optimization, and necessitating foreknowledge of problem behavior to design an effective control scheme. These shortcomings can be mitigated through dynamic geometry control, which partly automates the geometry control design process by refining the geometry control topology throughout optimization. Such refinement can occur in a predetermined fashion (as in progressive control) or more automatically using sensitivity information to guide refinement (as in adaptive control). Both progressive control and adaptive control are implemented in the context of axial and free-form deformation geometry control, and novel contributions are made to the adaptive algorithm, including the treatment of active constraints and several novel "potential indicators" to rank candidate refinements. Application to a wide suite of aerodynamic shape optimization problems demonstrates that dynamic geometry control is effective, producing lower final drag than well-designed static schemes while reducing required iterations to convergence by 50% or more, and simultaneously reducing labor requirements on the user. These benefits are demonstrated across a wide variety of problems, representative of detailed and exploratory problems often encountered in both academia and industry.
Increased space sensing enables new measurements of a wide range of Earth science phenomena including volcanism, flooding, wildfires, and weather. Large-scale observation constellations of hundreds of assets have already been deployed (for example, Planet Labs's Dove satellites), and several constellations of tens of thousands of assets are planned. New challenges exist to rapidly assimilate available data and to optimize measurements by directing spacecraft assets to best observe complex Earth science phenomena. Centralized approaches to managing request allocation in these large constellations are constrained by 1) the need to assign/elect a central node to assign requests to spacecraft and 2) reliance on a single agent communicating with potentially thousands of dependent agents. On the other hand, entirely decentralized approaches to request allocation and observation are prone to oversatisfaction of some requests and undersatisfaction of others due to a lack of communication among agents. In large constellations, an intermediary method is necessary to solve the request allocation problem in a distributed manner. We present distributed artificial intelligence/multiagent methods that leverage existing work on distributed constraint optimization to allocate observations in a satellite constellation. We compare their performance to centralized and highly decentralized approaches using realistic orbits and observation request distributions. Our distributed algorithms can find approximate solutions to the large-scale constellation request allocation problem with low data volume for agent coordination and extend to continuous planning problems with varying request sets and availability of spacecraft agents.