The field of multi-stage stochastic programming provides a rich modelling framework to tackle a broad range of real-world decision problems. In order to numerically solve such programs - once they get reasonably large...
Algorithms for preprocessing databases with incomplete and imprecise data are seldom studied. For the most part, we lack numerical tools to quantify the mutual information between fuzzy random variables. Therefore, these algorithms (discretization, instance selection, feature selection, etc.) have to use crisp estimates of the interdependency between continuous variables, whose application to vague datasets is questionable. In particular, when we select features for use in fuzzy rule-based classifiers, we often rank the relevance of inputs by their mutual information. But, with either crisp or fuzzy data, fuzzy rule-based systems route the input through a fuzzification interface. The fuzzification process may alter this ranking, as the partition of the input data need not be optimal. In our opinion, to discover the most important variables for a fuzzy rule-based system, we should compute the mutual information between the fuzzified variables, and we should not assume that the ranking between the crisp variables is the best one. In this paper we address these problems and propose an extended definition of the mutual information between two fuzzified continuous variables. We also introduce a numerical algorithm for estimating the mutual information from a sample of vague data. We will show that this estimate can be included in a feature selection algorithm, and also that, in combination with a genetic optimization, the same definition can be used to obtain the most informative fuzzy partition for the data. Both applications will be exemplified with the help of some benchmark problems. © 2008 Elsevier Inc. All rights reserved.
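To make the idea of mutual information between fuzzified variables concrete, here is a minimal sketch. It is not the extended definition or the estimation algorithm proposed in the paper, and it handles crisp samples only. It assumes evenly spaced triangular fuzzy sets whose membership degrees sum to 1, and it estimates the joint probability of a pair of labels as the average product of membership degrees. The function names `triangular_partition` and `fuzzy_mutual_information` are hypothetical.

```python
import math

def triangular_partition(x, lo, hi, n_labels):
    """Membership degrees of x in n_labels evenly spaced triangular fuzzy
    sets on [lo, hi]. Assumes hi > lo and n_labels >= 2; degrees sum to 1."""
    x = min(max(x, lo), hi)              # clip to the domain
    step = (hi - lo) / (n_labels - 1)    # distance between adjacent peaks
    pos = (x - lo) / step
    left = min(int(pos), n_labels - 2)   # index of the peak at or left of x
    frac = pos - left
    degrees = [0.0] * n_labels
    degrees[left] = 1.0 - frac
    degrees[left + 1] = frac
    return degrees

def fuzzy_mutual_information(xs, ys, n_labels=3):
    """Mutual information (in nats) between the fuzzified xs and ys,
    using an assumed product-of-memberships estimate of the joint."""
    lo_x, hi_x = min(xs), max(xs)
    lo_y, hi_y = min(ys), max(ys)
    n = len(xs)
    joint = [[0.0] * n_labels for _ in range(n_labels)]
    for x, y in zip(xs, ys):
        mx = triangular_partition(x, lo_x, hi_x, n_labels)
        my = triangular_partition(y, lo_y, hi_y, n_labels)
        for i in range(n_labels):
            for j in range(n_labels):
                joint[i][j] += mx[i] * my[j] / n
    px = [sum(row) for row in joint]                      # marginal of X labels
    py = [sum(joint[i][j] for i in range(n_labels))       # marginal of Y labels
          for j in range(n_labels)]
    mi = 0.0
    for i in range(n_labels):
        for j in range(n_labels):
            if joint[i][j] > 0.0:
                mi += joint[i][j] * math.log(joint[i][j] / (px[i] * py[j]))
    return mi
```

Under these assumptions, ranking candidate inputs by `fuzzy_mutual_information(feature, target)` can give a different order than a crisp estimate, because the result depends on the chosen partition.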
We develop and apply a previously undescribed framework that is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of coordinates in Euclidean space that attempts to respect the information available while controlling for complexity of the kernel. The resulting set of coordinates is highly appropriate for visualization and as input to classification and clustering algorithms. The framework is formulated in terms of a class of optimization problems that can be solved efficiently by using modern convex cone programming software. The power of the method is illustrated in the context of protein clustering based on primary sequence data. An application to the globin family of proteins resulted in a readily visualizable 3D sequence space of globins, where several subfamilies and subgroupings consistent with the literature were easily identifiable.
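The statement that a positive definite kernel defines a consistent set of distances can be illustrated with the standard identity d(i, j)^2 = K[i][i] + K[j][j] - 2 K[i][j]. The sketch below only evaluates that identity. It does not fit a kernel to noisy dissimilarities and does not involve the convex cone programming step described in the abstract. The helper names are hypothetical, and the linear kernel is used only to produce a small matrix that can be checked by hand.

```python
def linear_kernel(points):
    """Gram matrix of inner products between points (a simple valid kernel)."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def squared_distances_from_kernel(K):
    """Squared distances induced by a kernel matrix K:
    d(i, j)^2 = K[i][i] + K[j][j] - 2 * K[i][j]."""
    n = len(K)
    return [[K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n)]
            for i in range(n)]

# Example: three points in the plane; the induced distances match the
# ordinary Euclidean ones because the kernel is the plain inner product.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
D = squared_distances_from_kernel(linear_kernel(points))
```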
We present a multistage stochastic programming model for mean-risk optimization of electricity portfolios containing physical components and energy derivative products. We consider a medium-term time horizon of up to ...
ISBN: 3540259201 (print)
In this paper, the concept of statistical bandwidth of multi-access systems is studied and extended to the case of unknown statistical descriptors. The results can improve the statistical characterization of the tail distribution of the aggregated load presented to a multi-access system, which is traditionally based on the logarithmic moment generating function (LMGF) [1]. In the paper, an extended moment generating function is introduced for calculating the statistical bandwidth, and as a result a novel admission algorithm is presented. To further maximize the load admitted into the multi-access system, the free parameter of the extended statistical bandwidth is optimized based on the geometrical optimization of polygonal surfaces. In this way, the system utilization can be near-optimal.
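For orientation, the traditional LMGF-based characterization that the abstract refers to can be sketched with a Chernoff bound on the overflow probability of the aggregated load. This is not the extended moment generating function or the admission algorithm of the paper. The on/off source model (a source sends at `peak` with probability `p_on`, otherwise nothing), the grid search over the free parameter `s`, and all function names are assumptions made for illustration.

```python
import math

def lmgf_on_off(s, peak, p_on):
    """Logarithmic moment generating function of one on/off source."""
    return math.log(1.0 - p_on + p_on * math.exp(s * peak))

def chernoff_overflow_bound(sources, capacity, s_grid):
    """Upper bound on P(total load > capacity) for independent sources:
    min over s > 0 of exp(sum of LMGFs - s * capacity)."""
    best = 1.0  # s = 0 gives the trivial bound 1
    for s in s_grid:
        exponent = sum(lmgf_on_off(s, peak, p) for peak, p in sources)
        exponent -= s * capacity
        best = min(best, math.exp(exponent))
    return best

def admit(sources, new_source, capacity, epsilon, s_grid):
    """Admit new_source only if the bound stays at or below epsilon."""
    bound = chernoff_overflow_bound(sources + [new_source], capacity, s_grid)
    return bound <= epsilon

# Example: 19 identical sources already admitted, one more requesting access.
grid = [0.1 * k for k in range(1, 51)]
current = [(1.0, 0.1)] * 19
decision = admit(current, (1.0, 0.1), capacity=10.0, epsilon=1e-3, s_grid=grid)
```

The bound is conservative, so a rule like this may reject loads that the system could actually carry. A tighter tail characterization would admit more load.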
ISBN: 0819457353 (print)
Currently, overlay measurements are characterized by a "recipe", which defines both physical parameters, such as focus and illumination, and software parameters, such as the algorithm to be used and the regions of interest. Setting up these recipes requires both engineering time and wafer availability on an overlay tool, so reducing these requirements will result in higher tool productivity. One of the significant challenges to automating this process is that the parameters are strongly correlated in complex ways. At the same time, a high level of traceability and transparency is required in the recipe creation process, so a technique that maintains its decisions in terms of well-defined physical parameters is desirable. Running time should be short, given that the system (automatic recipe creation) is being implemented to reduce overheads. Finally, a failure of the system to determine acceptable parameters should be obvious, so a certainty metric is also desirable. The complex, nonlinear interactions make solution by an expert system difficult at best, especially in the verification of the resulting decision network. The transparency requirements tend to preclude classical neural networks and similar techniques. Genetic algorithms and other "global minimization" techniques require too much computational power (given system footprint and cost requirements). A Bayesian network, however, provides a solution to these requirements. Such a network, with appropriate priors, can be used during recipe creation/optimization not just to select a good set of parameters, but also to guide the direction of search, by evaluating the network state while only incomplete information is available. As a Bayesian network maintains an estimate of the probability distribution of nodal values, a maximum-entropy approach can be utilized to obtain a working recipe in a minimum or near-minimum number of steps. In this paper we discuss the potential use of a Bayesian network in such a capacity, r
ISBN: 8854801224 (print)
The paper presents INFOMIX, a novel system which supports powerful information integration, utilizing advanced reasoning capabilities. While INFOMIX is based on solid theoretical foundations, it is a user-friendly system, endowed with graphical user interfaces for both the average database user and the administrator. The main features of the INFOMIX system are: (i) a comprehensive information model, through which the knowledge about the integration domain can be declaratively specified, (ii) the capability of dealing with data that may be incomplete and/or inconsistent with respect to global integrity constraints (ICs), (iii) advanced information integration algorithms, which reduce (in a sound and complete way) query answering to cautious reasoning on disjunctive Datalog programs, (iv) sophisticated optimization techniques guaranteeing the effectiveness of query evaluation in INFOMIX, (v) a rich data acquisition and transformation framework for accessing heterogeneous data in many formats including relational, XML, and HTML data.