ISBN: 9788001057872 (print)
We consider the problem of jumbled matching, where the objective is to find all permuted occurrences of a pattern in a text. Besides exact matching, we study approximate matching, where each occurrence is allowed to contain at most k wrong or superfluous characters. We present online algorithms that apply bit-parallelism to both types of jumbled matching. Most of our algorithms are variations of earlier algorithms. We show by practical experiments that our algorithms are competitive with the previous solutions.
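To make the exact variant of the problem concrete, here is a minimal sketch that reports every window of the text that is a permutation of the pattern. It uses a plain sliding window of character counts, not the bit-parallel algorithms of the paper, and the function name `jumbled_occurrences` is an assumption made for illustration.

```python
from collections import Counter


def jumbled_occurrences(text, pattern):
    """Return the start index of every window of `text` that is a
    permutation of `pattern` (exact jumbled matching)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    need = Counter(pattern)          # character counts required by the pattern
    window = Counter(text[:m])       # counts in the current window
    hits = [0] if window == need else []
    for i in range(m, n):
        window[text[i]] += 1         # character entering the window
        out = text[i - m]            # character leaving the window
        window[out] -= 1
        if window[out] == 0:
            del window[out]          # keep the counter free of zero entries
        if window == need:
            hits.append(i - m + 1)
    return hits


print(jumbled_occurrences("cbaebabacd", "abc"))
```

Each step updates two counts, so the scan touches every text character once; the cost of each comparison is bounded by the alphabet size.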
ISBN: 9781467374880 (print)
The paper presents the results of an investigation of the integrated performance evaluation method for edge detection by I. Boaventura and A. Gonzaga [1-2], obtained using the stochastic simulation software package "CS sF" [3]. Stochastic simulation methods were used in the experiments, and the reference images were approximated by a two-dimensional renewal stream [4-7]. The performance of contour detection was evaluated by the Boaventura and Gonzaga method for three edge detection algorithms ("Canny", "Marr-Hildreth" and "ISEF") at different levels of peak signal-to-noise ratio. The results are presented as the dependence of the estimated probability of correct edge detection, and of the type 1 and type 2 errors, on the signal-to-noise ratio. Based on this evaluation, the performance of the three algorithms is analysed for images generated from morphology types "A" and "F".
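The quantities named in this abstract can be illustrated with a short sketch. It is not the Boaventura and Gonzaga method itself. It assumes that images are flat lists of pixel values, that edge maps are flat lists of 0/1 flags, that a type 1 error is a non-edge pixel marked as an edge, and that a type 2 error is a missed edge pixel. The function names are hypothetical.

```python
import math


def psnr(reference, noisy, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, noisy)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def edge_error_rates(truth, detected):
    """Compare a detected binary edge map with the reference edge map.

    Assumed definitions (not taken from the paper):
      correct: fraction of true edge pixels that were detected
      type1:   fraction of non-edge pixels wrongly marked as edges
      type2:   fraction of true edge pixels that were missed
    """
    tp = sum(1 for t, d in zip(truth, detected) if t and d)
    fp = sum(1 for t, d in zip(truth, detected) if not t and d)
    fn = sum(1 for t, d in zip(truth, detected) if t and not d)
    edges = tp + fn
    non_edges = len(truth) - edges
    return {
        "correct": tp / edges if edges else 0.0,
        "type1": fp / non_edges if non_edges else 0.0,
        "type2": fn / edges if edges else 0.0,
    }


rates = edge_error_rates([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
print(rates, psnr([0, 255], [0, 245]))
```

Evaluating these rates for a detector at several noise levels gives curves of the kind the abstract describes.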
ISBN: 9781479932795 (print)
A new text classification algorithm based on the basic support vector machine (SVM) algorithm is put forward. The proposed SVM-KNN algorithm for text classification combines the SVM algorithm with the KNN algorithm. It can improve classifier performance through feedback on, and improvement of, the classification prediction probability. The practical effect of the SVM-KNN algorithm is tested, and its performance is demonstrated, in a Chinese web page classification test system.
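The abstract does not say how the two classifiers are combined. One common way to combine them, sketched here purely as an assumption, is to trust the SVM when its prediction probability is high and to fall back to a k-nearest-neighbour vote otherwise. The callable `svm_proba`, the threshold of 0.7, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority label among the k training vectors closest to x.
    `train` is a list of (vector, label) pairs."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def svm_knn_predict(svm_proba, train, x, threshold=0.7, k=3):
    """Hybrid rule (an assumption, not the paper's stated procedure):
    keep the SVM label when its probability reaches `threshold`,
    otherwise defer to the KNN vote."""
    probs = svm_proba(x)  # hypothetical callable returning {label: probability}
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else knn_predict(train, x, k)


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
unsure = lambda x: {"a": 0.55, "b": 0.45}   # stand-in for a trained SVM
print(svm_knn_predict(unsure, train, (5, 5.5)))
```

In practice `svm_proba` would wrap a trained SVM with probability outputs; the stand-in lambda only serves to exercise the fallback path.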
ISBN: 9781479945306 (print)
The essay outlines one possible way of efficiently evaluating the performance of edge detection algorithms. Three well-known published algorithms (Canny, Marr, Shen) were analysed as examples. The analysis is based on two-dimensional signals generated by a two-dimensional semi-Markov model and then corrupted with additive Gaussian noise. Five quality metrics allow an objective comparison of the algorithms.
A new text classification algorithm has been put forward based on basic support vector machine algorithm. The SVM-KNN algorithm for text classification has been proposed which combined SVM algorithm and KNN algorithm. The SVM-KNN algorithm can improve the performance of classifier by the feedback and improvement of classifying prediction probability. The actual effect of SVM-KNN algorithm is tested and the performance is proved in related Chinese web page classification test system.
ISBN: 9781467361415 (print)
The paper addresses the development of the special-purpose stochastic modeling software "CS sF". It gives a short description of the software features, a generalized block diagram of "CS sF", and a series of morphologies of the generated space-time signals (STS) together with their bitmap images. Numerical simulation results from the statistical processing of the generated STS are presented. Directions for future research are formulated. The possibility of an information-theoretic analysis of the STS produced by this software is shown. The results of a set of numerical experiments based on the stochastic simulation package "CS sF" are given.
This paper outlines a particular method of modelling stochastic intensity fields by isotropic, one-step Markov chains. The field characteristics are related to one another by "Palma formulas", whereas the correlation characteristics of the generated random fields depend only on the mosaic grating structure. The alphabet (ABC) selection based on the grating structure (the morphology of the field) is made manually by the operator in advance. The presented method allows the generation of different types of grating structures with horizontal, vertical, and diagonal elements in an 8-adjacency.
ISBN: 9787900719706 (print)
The Assignment Problem (AP) has been well studied over the past 50 years and is of great value in operations research and engineering. The Hungarian Method is one of the effective algorithms for the assignment problem. Although the problem is well studied and a variety of algorithms have been proposed, the efficiencies of these algorithms have never been completely compared in the literature. In this paper, we summarize the properties of the assignment problem, survey its research history, add two new algorithms to the existing classification, and compare the performance of these two algorithms with that of the Hungarian Method. The computational results show that the Hungarian Method can solve larger balanced assignment problems in engineering and is more efficient than the other two algorithms, whereas the bidding algorithm is superior for solving unbalanced assignment problems.
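For readers unfamiliar with the problem, the sketch below states a balanced assignment problem and solves a tiny instance by exhaustive search. It is a brute-force reference with factorial running time, useful only for checking small cases. It is not the Hungarian Method or the bidding algorithm compared in the paper, and the function name and cost matrix are made up for illustration.

```python
from itertools import permutations


def solve_assignment_bruteforce(cost):
    """Return (minimum total cost, assignment) for a square cost matrix.

    assignment[i] is the task given to worker i. Every one-to-one
    assignment is tried, so this is only practical for very small n.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)


# Illustrative 3x3 instance: cost[i][j] is the cost of worker i doing task j.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
print(solve_assignment_bruteforce(cost))
```

A polynomial-time method such as the Hungarian Method must return the same optimal cost on such instances, which makes the exhaustive version a convenient correctness check.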
When facing a typical pattern recognition task, one usually comes up with a number of so-called features: properties that describe the objects to be recognised. Based on these features, the task of the classifier building algorithm is to find useful rules that are suitable for the recognition of new objects.
Feature selection is a process where one tries to identify the useful features from among a potentially large set of candidates. The task is notoriously hard, and researchers have been tackling it already for decades. Solving the problem properly might today be more important than ever before, because in many applications, dataset sizes seem to grow faster than does the processing power of computers. For example, in the domain of genetic microarray data, there can easily be thousands of features.
Several research groups have published comparisons aiming to identify the feature selection method that is universally the best. Unfortunately, too often the way that such comparisons are done is just plain wrong. Based on the results of such studies, the computationally intensive search algorithms seem to perform much better than the simple approaches. However, it is shown in this thesis that when the comparison is done properly, it very often turns out that the simple and fast algorithms give results that are just as good, if not even better.
In addition, many studies suggest that excluding some of the features is much more useful than it actually is. This observation is relevant in practice, because the selection process typically takes a lot of time and computing resources - therefore, it would be very convenient not to have to carry it out at all. This thesis shows that the benefits obtained may be negligible compared to what has been presented previously, provided that they are measured correctly.
Moreover, the thesis presents a better-performing approach for accuracy estimation in case the amount of data is small. Further, extensions are discussed from feature se
ISBN: 9781920682415 (print)
Predictive accuracy claims should give explicit descriptions of the steps followed, with access to the code used. This allows referees and readers to check for common traps, and to repeat the same steps on other data. Feature selection and/or model selection and/or tuning must be independent of the test data. For use of cross-validation, such steps must be repeated at each fold. Even then, such accuracy assessments have the limitation that the target population, to which results will be applied, is commonly different from the source population. Commonly, it is shifted forward in time, and it may differ in other respects also. A consequence of source/target differences is that highly sophisticated modeling may be pointless or even counter-productive. At best, model effects in the target population may be broadly similar. Investigation of the pattern of changes over time is required. Such studies are unusual in the data mining literature, in part because relevant data have not been *** recent investigations are noted that shed interesting light on the comparison between observational and experimental studies, with particular relevance when there is an interest in giving parameter estimates a causal *** mining activity would benefit from wider co-operation in the development and deployment of computing tools, and from better integration of those tools into the publication process.
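The requirement that feature selection be repeated at each fold can be sketched as follows. This is a minimal illustration and not code from the paper. The helper names and the three callables `select_features`, `fit`, and `predict` are assumptions. The point it shows is that the selection step only ever sees the training rows of the current fold.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k interleaved folds of n rows."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [i for i in range(n) if i not in held], held_out


def cross_validated_accuracy(X, y, k, select_features, fit, predict):
    """Estimate accuracy with feature selection redone inside every fold.

    select_features(X_train, y_train) -> list of column indices
    fit(X_train_selected, y_train)    -> model
    predict(model, row_selected)      -> label
    (All three callables are hypothetical placeholders.)
    """
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        cols = select_features(X_train, y_train)   # held-out rows are never seen
        model = fit([[row[j] for j in cols] for row in X_train], y_train)
        for i in test_idx:
            correct += predict(model, [X[i][j] for j in cols]) == y[i]
    return correct / len(X)


# Toy data in which column 0 equals the label and column 1 is noise.
X = [[0, 9], [1, 3], [0, 7], [1, 1], [0, 5], [1, 8]]
y = [0, 1, 0, 1, 0, 1]
acc = cross_validated_accuracy(
    X, y, 3,
    select_features=lambda Xt, yt: [0],
    fit=lambda Xt, yt: None,
    predict=lambda model, row: row[0],
)
print(acc)
```

Selecting features once on the full data set before splitting would let the held-out rows influence the selection, which is the kind of trap the abstract warns against.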