We investigate a corpus of geographical distributions of 17,126 Finnish dialect words. Our goal is to automatically find sets of words characteristic of geographical regions. Though our approach is related to the problem of dividing the investigation area into linguistically (and geographically) relatively coherent dialect regions, we do not aim to construct more or less questionable dialect regions. Instead, we let the boundaries of the regions overlap to gain insight into the degree of lexical change between adjacent areas. More concretely, we study the applicability of data clustering approaches to find sets of words with tight spatial distributions, and to cluster the extracted distributions according to their distribution areas. The extracted words belonging to the same cluster can then be used to characterize the lexicon of the region. We also automatically pick out words with occurrences in two or more areas that are geographically far from each other. These words may give valuable insight into, e.g., the study of cultural history and the history of settlement.
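As a loose illustration of the clustering step described above (not the authors' implementation), the sketch below groups words by the similarity of their geographical occurrence vectors. The binary word-by-municipality matrix `occurrences`, the Jaccard distance, and the choice of ten average-linkage clusters are all illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_words, n_areas = 200, 50
# Toy stand-in for the corpus: each row is one word's 0/1 occurrence vector
# over the municipalities of the investigation area.
occurrences = rng.random((n_words, n_areas)) < 0.2

# Jaccard distance: words appearing in roughly the same municipalities are close.
condensed = pdist(occurrences, metric="jaccard")
tree = linkage(condensed, method="average")
labels = fcluster(tree, t=10, criterion="maxclust")   # cut the tree into 10 clusters

for c in range(1, 11):
    members = np.where(labels == c)[0]
    print(f"cluster {c}: {len(members)} words")
```

Words sharing a cluster label would then be inspected together as a candidate characterization of one region's lexicon; words whose occurrence vectors split into distant areas could be flagged separately.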
We propose a method that takes observations of a random vector as input and learns to segment each observation into two disjoint parts. We show how to use the internal coherence of segments to learn to segment almost any random variable. Coherence is formalized using the principle of autoprediction, i.e., two elements are similar if the observed values are similar to the predictions the elements give for each other. To obtain a principled model and method, we formulate a generative model and show how it can be estimated in the limit of zero noise. The resulting method is an abstract, adaptive (learning) generalization of well-known methods for image segmentation. It enables segmentation of random vectors in cases where the intuitive prior information necessary for conventional segmentation methods is not available.
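The following toy sketch is only a crude stand-in for the coherence idea, not the paper's generative model or estimation procedure: it bipartitions the components of a random vector into two internally coherent groups, using pairwise correlation across observations as a rough proxy for mutual predictability, and a spectral cut as the segmentation heuristic. All names and the spectral heuristic are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_dims = 500, 20
# Toy data: the first 10 components share one latent source, the rest another.
z1, z2 = rng.normal(size=(n_obs, 1)), rng.normal(size=(n_obs, 1))
X = np.hstack([z1 + 0.3 * rng.normal(size=(n_obs, 10)),
               z2 + 0.3 * rng.normal(size=(n_obs, 10))])

C = np.abs(np.corrcoef(X, rowvar=False))   # |correlation| between components
L = np.diag(C.sum(axis=1)) - C             # Laplacian of the similarity graph
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                    # second-smallest eigenvector
segment = (fiedler > 0).astype(int)        # two disjoint parts
print("segment labels per component:", segment)
```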
The utilization of image enhancement algorithms could improve the performance of computer-aided detection (CAD) systems. This work is an improvement stage for our previously developed CAD system. The CAD system is...
Microarray technology is a powerful tool for analyzing the expression of a large number of genes in parallel. A typical microarray image consists of a few thousands of spots which determine the level of gene expressio...
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, as such the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension counts the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
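The sketch below estimates a correlation-dimension style quantity for 0/1 data, in the spirit of (but not identical to) the fractal dimension discussed above: it counts pairs of rows within Hamming radius r and fits the slope of log(pair count) against log(r). The toy dataset and the choice of scaling region are assumptions, and the normalization step is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
D = (rng.random((300, 30)) < 0.3).astype(int)    # toy sparse 0/1 dataset

# Pairwise Hamming distances between rows.
diff = (D[:, None, :] != D[None, :, :]).sum(axis=2)
iu = np.triu_indices(len(D), k=1)
dists = diff[iu]

radii = np.arange(1, D.shape[1] + 1)
counts = np.array([(dists <= r).sum() for r in radii])
mask = (counts > 0) & (counts < len(dists))      # keep the scaling region only
slope, _ = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
print("estimated fractal (correlation) dimension:", round(slope, 2))
```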
In this work, three different methodologies for creating fuzzy expert systems are compared: a well-known neuro-fuzzy approach, a knowledge-based approach, and a novel methodology based on rule extraction. The adaptive neuro-fuzzy inference system (ANFIS) is used to automatically generate a fuzzy expert system. In the knowledge-based approach and the rule-extraction methodology, the idea is to start with a model described by crisp rules, provided by medical experts in the first case or extracted using data mining techniques in the second, and then to transform them into a set of fuzzy rules, creating a fuzzy model. In either case, the adjustment of the model's parameters is performed via a stochastic global optimization procedure. All three approaches are applied to a medical domain problem, cardiac arrhythmic beat classification. The ability to interpret the decisions made by the created fuzzy expert systems is a major advantage compared to other "black box" approaches.
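As a purely illustrative sketch (not ANFIS and not the clinical rule base used in the work), the snippet below shows a two-rule fuzzy classifier with Gaussian membership functions whose parameters are tuned by naive random search, standing in for the stochastic global optimization mentioned above. The single feature ("RR interval"), the rules, and all parameter ranges are invented assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy beats: longer RR intervals for normal beats, shorter for arrhythmic ones.
rr = np.concatenate([rng.normal(0.8, 0.05, 100), rng.normal(0.5, 0.05, 100)])
label = np.concatenate([np.zeros(100), np.ones(100)])   # 0 = normal, 1 = arrhythmic

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def classify(rr, params):
    c_norm, s_norm, c_arr, s_arr = params
    # Rule 1: IF RR is "normal-length" THEN beat is normal.
    # Rule 2: IF RR is "short"         THEN beat is arrhythmic.
    return (gauss(rr, c_arr, s_arr) > gauss(rr, c_norm, s_norm)).astype(float)

best, best_acc = None, 0.0
for _ in range(2000):                      # naive stochastic search over parameters
    params = rng.uniform([0.6, 0.01, 0.3, 0.01], [1.0, 0.2, 0.7, 0.2])
    acc = (classify(rr, params) == label).mean()
    if acc > best_acc:
        best, best_acc = params, acc
print("best accuracy:", round(best_acc, 3), "params:", np.round(best, 3))
```

Unlike a black-box classifier, the fitted rules can be read back directly (e.g., "IF RR is short THEN arrhythmic"), which is the interpretability advantage the abstract emphasizes.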
When a brief current pulse is incident on excitable cells in cardiac and other nervous tissue, a change in the phase of the cell's response is usually observed. In cardiac tissue, the cells are exposed to external stimulation by generally positive currents, which depolarize the cells. In this paper an overview of the application of the phase resetting technique (PRT) in several cardiac models is presented. We discuss the effects of external stimuli in several cardiac cell models and provide the phase transition curves (PTCs) resulting from the application of the PRT to the Zhang et al. sinoatrial node model.
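To illustrate what a PTC is, the toy sketch below applies the phase resetting idea to the classic radial isochron clock (Poincaré oscillator), not to the Zhang et al. sinoatrial node model used in the paper: a brief horizontal pulse of amplitude b displaces the state on the limit cycle, and the phase read off afterwards, as a function of the old phase, is the phase transition curve. The pulse amplitudes are arbitrary examples of weak (type 1) and strong (type 0) resetting.

```python
import numpy as np

def ptc(old_phase, b):
    """New phase after a horizontal pulse of amplitude b applied at old_phase (phases in [0, 1))."""
    x = np.cos(2 * np.pi * old_phase) + b   # state immediately after the pulse
    y = np.sin(2 * np.pi * old_phase)
    return np.mod(np.arctan2(y, x) / (2 * np.pi), 1.0)

phases = np.linspace(0.0, 1.0, 8, endpoint=False)
for b in (0.5, 1.5):                        # weak (type 1) vs strong (type 0) resetting
    print(f"b={b}: old phases {np.round(phases, 2)} -> new phases {np.round(ptc(phases, b), 2)}")
```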
This paper analyzes the reflections of an agile team developing a large-scale project in an industry setting. The team uses an iteration summary meeting practice, which includes four elements: the customer's summary, a formal presentation of the system, a review of metrics, and a reflection. The technique for the entire meeting, and for the reflection element in particular, is described, and empirical evidence is given to show that it is assessed as highly effective, achieving its intended goals and increasing team satisfaction. Further, the proposed practice supports tracking past decisions. The practice is shown to be valuable for stabilizing a new project as well as serving as a continuous-improvement forum for a stable project. It also incurs a lower overhead than existing alternative reflection practices.
Identification of gene subsets responsible for discerning between available samples of gene microarray data is an important task in bioinformatics. Due to the large number of genes in the samples, there is an exponentially large search space of solutions. The main challenge is to reduce or remove the redundant genes without affecting discernibility between objects. Reducts, from rough set theory, correspond to a minimal subset of essential genes. We present an algorithm for generating reducts from gene microarray data. It proceeds by preprocessing the gene expression data, discretizing real-valued attributes into categorical ones, and then applying a positive-region-based approach for reduct generation. For comparison, different approaches to reduct generation are also discussed. Results on benchmark gene expression datasets demonstrate more than 90% reduction of redundant genes.
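The following is a hedged sketch of a greedy, positive-region-based attribute reduction, illustrating the rough-set idea above rather than the paper's exact algorithm. The toy expression matrix, the tertile discretization, and the greedy forward selection are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, n_genes = 60, 12
expr = rng.normal(size=(n_samples, n_genes))            # toy expression matrix
decision = (expr[:, 0] + expr[:, 3] > 0).astype(int)    # toy class labels

# Discretize each gene into three categories (low / medium / high) by tertiles.
cats = np.column_stack([np.digitize(expr[:, j], np.quantile(expr[:, j], [1/3, 2/3]))
                        for j in range(n_genes)])

def positive_region_size(cols):
    """Count objects whose equivalence class w.r.t. `cols` is decision-consistent."""
    if not cols:
        return n_samples if len(set(decision)) == 1 else 0
    keys = [tuple(row) for row in cats[:, cols]]
    size = 0
    for k in set(keys):
        idx = [i for i, key in enumerate(keys) if key == k]
        if len(set(decision[idx])) == 1:
            size += len(idx)
    return size

full = positive_region_size(list(range(n_genes)))
reduct = []
while positive_region_size(reduct) < full:              # greedy forward selection
    candidates = [j for j in range(n_genes) if j not in reduct]
    gains = [positive_region_size(reduct + [j]) for j in candidates]
    reduct.append(candidates[int(np.argmax(gains))])
print(f"reduct uses {len(reduct)} of {n_genes} genes: {sorted(reduct)}")
```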
A three-stage method for fetal heart rate extraction from abdominal ECG recordings is proposed. In the first stage, the maternal R-peaks and fiducial points (QRS onset and offset) are detected using time-frequency analysis, and the maternal QRS complexes are eliminated. The second stage locates the positions of the candidate fetal R-peaks using complex wavelets and pattern-matching techniques. In the third stage, the fetal R-peaks that overlap with the maternal QRS complexes are found. The method is validated on a dataset of 4 long-duration recordings, and the obtained results indicate a high detection ability (96% accuracy).
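The sketch below is a simplified stand-in for the first two stages of such a pipeline (no wavelets or time-frequency analysis): detect the large maternal R-peaks, blank a window around each, then search the residual for the smaller, faster fetal R-peaks. The synthetic signal and all thresholds are illustrative assumptions; the third stage (recovering fetal peaks hidden under maternal complexes) is omitted.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 500                                         # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)

def qrs_train(rate_bpm, amp, width=0.02):
    """Toy QRS train: a Gaussian bump per beat at the given rate."""
    period = 60.0 / rate_bpm
    sig = np.zeros_like(t)
    for beat in np.arange(0.1, t[-1], period):
        sig += amp * np.exp(-0.5 * ((t - beat) / width) ** 2)
    return sig

# Synthetic abdominal lead: large maternal beats, small fast fetal beats, noise.
ecg = (qrs_train(75, 1.0) + qrs_train(140, 0.25)
       + 0.02 * np.random.default_rng(5).normal(size=t.size))

# Stage 1: maternal R-peaks (large amplitude), then blank a window around each.
m_peaks, _ = find_peaks(ecg, height=0.6, distance=int(0.4 * fs))
residual = ecg.copy()
for p in m_peaks:
    residual[max(0, p - int(0.05 * fs)): p + int(0.05 * fs)] = 0.0

# Stage 2: candidate fetal R-peaks in the residual (smaller, closer together).
f_peaks, _ = find_peaks(residual, height=0.12, distance=int(0.25 * fs))
print(f"maternal beats: {len(m_peaks)}, candidate fetal beats: {len(f_peaks)}")
fetal_hr = 60.0 * fs / np.median(np.diff(f_peaks))
print(f"estimated fetal heart rate: {fetal_hr:.0f} bpm")
```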