The partition-and-recur (PAR) method is a simple and useful formal method for designing and proving algorithmic programs. In this paper, we show that PAR is an effective formal method for solving combinatorics problems. We formally derive algorithms for combinatorics problems with the PAR method, which not only simplifies the design of algorithmic programs and the verification of their correctness, but also improves the automation, standardization and correctness of algorithm design by turning much creative labor into mechanized labor. Finally, we develop typical algorithms for a combinatorics problem instance, the knapsack problem, and obtain accurate running results from the RADL algorithmic program derived with PAR, which can be transformed into C++ programs by the automatic program-transformation system of the PAR platform.
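The abstract does not reproduce the derived RADL program; as a language-neutral illustration of the recurrence underlying the knapsack instance it mentions, here is a standard 0/1 knapsack dynamic program in Python (an assumed textbook formulation, not the authors' PAR derivation):

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack: maximum total value within the weight capacity.

    dp[c] holds the best value achievable with capacity c using the
    items processed so far; items are considered one at a time.
    """
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]
```

For example, with weights (2, 3, 4), values (3, 4, 5) and capacity 5, the optimum packs the first two items for a value of 7.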
Publishing organizational data for a variety of purposes risks leaking individuals' sensitive information. Although organizations guard against this by removing or encrypting explicit identifiers, valuable information may still be leaked through quasi-identifiers in the released data. The concept of k-anonymity was introduced to handle this problem, and several algorithms in this direction have been proposed by different researchers [1, 2, 3, 4, 5, 6, 7, 8, 11]. However, k-anonymity is susceptible to two types of attacks, which motivated a stronger privacy-preserving notion, l-diversity [6]. In [14], a third phase is added to the two-phase clustering-based k-anonymisation algorithm OKA [4] to achieve l-diversity. Recently, the clustering stage of the algorithm was improved in [14] and the diversity stage in [16], yielding a fast l-diversity algorithm that handles a single sensitive attribute in a relational table. Our main contribution in this paper is an l-diversity algorithm that handles multiple sensitive attributes in databases. We also improve the adjustment-stage algorithm to make it more efficient, and we analyse why, although the second and third stages of the algorithm are unnecessary in most cases, they cannot be avoided in some cases.
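A minimal sketch of the property the paper targets, assuming the simplest (distinct) form of l-diversity and extending it naively to several sensitive attributes; the paper's actual algorithm and data model may differ:

```python
def is_l_diverse(groups, sensitive_indices, l):
    """Check distinct l-diversity for every equivalence class.

    groups: equivalence classes, each a list of records (tuples);
    sensitive_indices: positions of the sensitive attributes.
    Every class must exhibit at least l distinct values for each
    sensitive attribute (a simple multi-attribute generalisation).
    """
    for group in groups:
        for i in sensitive_indices:
            if len({record[i] for record in group}) < l:
                return False
    return True
```

A class of two records with sensitive values "flu" and "hiv" is distinct 2-diverse but not 3-diverse.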
Automatic methods for discovering program runtime and proving program termination have long been challenging problems in computer science. We present a novel and systematic approach for calculating an upper bound on the maximum runtime of functions for a non-trivial class of programs. The proof is based on induction over a tree of execution traces, a new mathematical data structure; as a consequence, the approach also proves termination for these functions. Using symbolic-numeric algorithms over this data structure, it systematically finds the maximum runtime for a wide class of functions.
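The paper's trace trees are constructed symbolically; as a purely dynamic, illustrative analogue (an assumption of this note, not the authors' method), one can record a recursive function's execution as a tree of calls and take the node count as a concrete runtime measure:

```python
def trace_fib(n):
    """Build the execution-trace tree of a naive Fibonacci call.

    Each node is (argument, children); the size of the tree counts
    the recursive calls made, i.e. a concrete runtime measure.
    """
    if n < 2:
        return (n, [])
    return (n, [trace_fib(n - 1), trace_fib(n - 2)])

def tree_size(node):
    _, children = node
    return 1 + sum(tree_size(c) for c in children)
```

The symbolic version would bound tree_size for all n at once (here it satisfies T(n) = T(n-1) + T(n-2) + 1), rather than measuring one execution.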
The two-dimensional entropic segmentation method has been widely developed because of its high segmentation accuracy and good stability, but it suffers from an exponential increase in computation time compared with the traditional one-dimensional histogram-partitioning technique. To solve this problem, this paper presents a new image-thresholding method that combines the artificial immune algorithm (AIA) with two-dimensional entropy techniques. Exploiting the AIA's intelligent computation, adaptive evolution and global optimization, the method effectively reduces computation time and avoids getting trapped in local optima of the threshold. Test results show that the method is effective and practical.
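For orientation, a simplified one-dimensional sketch of the entropy criterion (a Kapur-style exhaustive search over a grey-level histogram, assumed here as a stand-in for the paper's two-dimensional version); this exhaustive loop is exactly the kind of search the AIA is introduced to speed up:

```python
import math

def max_entropy_threshold(hist):
    """Maximum-entropy threshold over a grey-level histogram.

    Returns the threshold t maximising the sum of the entropies of
    the background (levels <= t) and foreground (levels > t)
    distributions; heuristics such as the AIA replace this
    exhaustive scan over all candidate thresholds.
    """
    total = sum(hist)
    probs = [h / total for h in hist]

    def entropy(ps):
        mass = sum(ps)
        if mass == 0:
            return 0.0
        return -sum(p / mass * math.log(p / mass) for p in ps if p > 0)

    best_t, best_h = 0, float("-inf")
    for t in range(len(hist) - 1):
        h = entropy(probs[:t + 1]) + entropy(probs[t + 1:])
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

On a bimodal histogram the selected threshold falls between the two modes.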
To address the deficiencies of the traditional load model used for reliability assessment, this paper proposes a new integrative K-means clustering algorithm that incorporates load-characteristic information. Based on this clustering algorithm, the model adopts a multi-dimensional correlated sampling technique, considering bus-load correlation and load-forecast uncertainty, to determine the bus loads over the evaluation period. Finally, the effects of the number of clusters, bus-load correlation and load-forecast uncertainty are discussed and compared through case studies, demonstrating the feasibility and reasonableness of the proposed algorithm.
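As a baseline for the "integrative" variant, a plain Lloyd-style K-means on load curves (equal-length vectors); the deterministic first-k initialisation is an assumption for this sketch, not the paper's scheme:

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd K-means on load curves (lists of equal length).

    Initial centroids are the first k points (deterministic for the
    sketch); each iteration reassigns points to the nearest centroid
    and recomputes centroids as coordinate-wise means.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid unchanged
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, clusters
```

The paper's contribution would replace the plain Euclidean assignment with one informed by load characteristics.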
We consider the problem of aligning multiple protein sequences with the goal of maximizing the SP (sum-of-pairs) score when the number of sequences is large. The QOMA (quasi-optimal multiple alignment) algorithm addressed this problem for small numbers of sequences; as the number of sequences increases, however, QOMA becomes impractical. This paper develops a new algorithm, QOMA2, which optimizes the SP score of the alignment of an arbitrarily large number of sequences. Given an initial (potentially sub-optimal) alignment, QOMA2 selects short subsequences from it by placing windows on the alignment. It quickly estimates the improvement obtainable by optimizing the alignment of the subsequences within each short window; this estimate is called the SW (sum-of-weights) score. A dynamic programming algorithm then selects the set of window positions with the largest total expected improvement. Within each window, QOMA2 partitions the subsequences into clusters small enough to be aligned optimally within a given time, choosing the clusters so that the optimal alignment of their subsequences yields the highest expected SP score. Experimental results show that QOMA2 produces high SP scores quickly even for large numbers of sequences, and that the SW score and the resulting SP score are highly correlated. This makes optimizing the SW score a promising objective, since it is much cheaper than aligning multiple sequences optimally. The software and the benchmark data set are available from the authors on request.
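The SP objective QOMA2 optimises can be stated compactly; a minimal scorer over aligned rows, with an illustrative match/mismatch/gap scheme standing in for a real substitution matrix such as BLOSUM:

```python
def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment (rows of equal length).

    For every column and every pair of rows, add the pairwise score;
    a pair of gaps contributes nothing. Real scorers would use a
    substitution matrix (e.g. BLOSUM) instead of match/mismatch.
    """
    rows, length = len(alignment), len(alignment[0])
    score = 0
    for col in range(length):
        for i in range(rows):
            for j in range(i + 1, rows):
                a, b = alignment[i][col], alignment[j][col]
                if a == "-" and b == "-":
                    continue
                if a == "-" or b == "-":
                    score += gap
                elif a == b:
                    score += match
                else:
                    score += mismatch
    return score
```

For instance, the three rows "AC-", "AC-", "AGA" score 3 - 1 - 4 = -2 under this scheme.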
Clustering has for many years been one of the most complex and most studied problems in data mining. Until now, the most commonly used algorithm for finding groups of similar objects in large databases has been CURE. The main advantage of CURE over other clustering algorithms is its ability to identify non-spherical or rectangular-shaped clusters. In this paper we present a new algorithm called CUZ (Clustering Using Zones). The main innovation of CUZ lies in the technique it uses to calculate the representatives, which overcomes the problem of identifying clusters with non-convex shapes. Experimental results show that CUZ is a generally competitive technique, and that it is particularly well suited to clusters that do not have convex shapes.
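For context on what "calculating the representatives" means, a CURE-style baseline (well-scattered points shrunk toward the centroid); CUZ's zone-based calculation differs and follows the paper:

```python
def representatives(cluster, n_rep=3, shrink=0.5):
    """CURE-style representatives: well-scattered points shrunk toward
    the centroid by `shrink` (0 = centroid, 1 = original point).

    Greedily pick the point farthest from those already chosen, so
    the representatives cover the cluster's extent.
    """
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    chosen = [max(cluster, key=lambda p: dist2(p, centroid))]
    while len(chosen) < min(n_rep, len(cluster)):
        chosen.append(max(cluster,
                          key=lambda p: min(dist2(p, c) for c in chosen)))
    return [[cx + shrink * (px - cx) for px, cx in zip(p, centroid)]
            for p in chosen]
```

Representatives, rather than a single centroid, are what let such algorithms follow non-spherical cluster boundaries.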
ISBN: 9781509059119 (print)
In direct marketing, to increase the return rate of a marketing campaign, a massive customer dataset must be analyzed in order to make the best product offers to customers through the most suitable channels. This problem is very challenging because positive returns are usually received from only a very small portion of the whole dataset. This paper studies this problem for bank product marketing. The proposed approach is a two-layer system that first clusters the customers and then constructs a classification model for product and communication-channel offers. Experimental analysis on a real-life banking campaign dataset shows promising results.
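The two-layer idea can be sketched with toy components (both layers here are assumptions for illustration: a caller-supplied cluster assignment, and a majority-vote label per cluster standing in for a real classifier):

```python
from collections import Counter, defaultdict

def two_layer_model(customers, labels, assign_cluster):
    """First layer: cluster customers; second layer: per-cluster model.

    The per-cluster 'classifier' is just the majority offer label seen
    in that cluster -- a stand-in for a real classification model.
    Returns a predictor mapping a new customer to an offer label.
    """
    per_cluster = defaultdict(list)
    for customer, label in zip(customers, labels):
        per_cluster[assign_cluster(customer)].append(label)
    majority = {c: Counter(ls).most_common(1)[0][0]
                for c, ls in per_cluster.items()}

    def predict(customer):
        return majority[assign_cluster(customer)]
    return predict
```

With customers bucketed by age and labels recording past offer take-up, the predictor returns each age bucket's dominant offer.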
We present an unsupervised segmentation algorithm comprising an annealing process to select the maximum a posteriori (MAP) realization of a hierarchical Markov random field (MRF) model. The algorithm consists of a sampling framework which unifies the processes of model selection, parameter estimation and image segmentation, in a single Markov chain. To achieve this, reversible jumps are incorporated into the Markov chain to allow movement between model spaces. By using partial decoupling to segment the MRF it is possible to generate jump proposals efficiently while providing a mechanism for the use of deterministic methods, such as Gabor filtering, to speed up convergence.
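The full reversible-jump sampler is involved; as a much simpler illustration of MAP label updates under a Potts-style MRF prior (an assumed didactic substitute, not the paper's algorithm), an ICM (iterated conditional modes) sweep on a 1D signal:

```python
def icm_segment(signal, means, beta=1.0, sweeps=5):
    """ICM segmentation of a 1D signal under a Potts MRF prior.

    Energy per site: squared data term (signal - class mean)^2 plus
    beta for each neighbouring site with a different label; each
    sweep greedily assigns every site its locally minimal-energy
    label, starting from the nearest-mean labelling.
    """
    labels = [min(range(len(means)), key=lambda k: (x - means[k]) ** 2)
              for x in signal]
    for _ in range(sweeps):
        for i, x in enumerate(signal):
            def energy(k):
                e = (x - means[k]) ** 2
                if i > 0 and labels[i - 1] != k:
                    e += beta
                if i < len(signal) - 1 and labels[i + 1] != k:
                    e += beta
                return e
            labels[i] = min(range(len(means)), key=energy)
    return labels
```

With a suitable beta the smoothness term overrules an isolated noisy sample; the paper's annealed sampler additionally moves between model spaces (numbers of classes), which ICM cannot do.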
We present a novel method for visualizing a high-dimensional dataset as a landscape. The goal is to provide a clear and compact representation that reveals the structure of high-dimensional datasets, so that the size and distinctiveness of clusters can be easily discerned and the relationships among individual points preserved. Our method is network-based and consists of two main steps: clustering and embedding. First, a similarity graph of the high-dimensional dataset is constructed from the Euclidean distances between data points. For clustering, we propose a new network community-detection algorithm that calculates the degree to which each vertex belongs to each community. For embedding, we present a practical algorithm that produces an evenly distributed, regularly shaped layout of data points while preserving the original relationships among individual points. Finally, the landscape-like visualization is produced by assigning altitudes to data points according to their membership degrees and by inserting control points. In the resulting visualization, clusters form highlands, and border points between clusters appear as valleys; the area and altitude of a highland indicate the size and distinctiveness of the corresponding cluster, respectively.
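One simple reading of the altitude step (an assumption for illustration, not the paper's exact membership-degree formula): score each point by how many of its nearest neighbours share its community, so cores become highlands and border points valleys:

```python
def altitudes(points, communities, k=3):
    """Altitude of each point = fraction of its k nearest neighbours
    in the same community (a simple membership degree): cluster
    cores get high altitudes, border points low ones.
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alts = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist2(p, points[j]))
        same = sum(communities[j] == communities[i] for j in order[:k])
        alts.append(same / k)
    return alts
```

On a line of points split into two communities, the two points at the boundary receive altitude 0.5 while the interiors receive 1.0, producing exactly the valley-between-highlands profile the abstract describes.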