The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersection of itemsets is inefficient, which makes Eclat time-consuming, especially when the dataset has a large number of transactions. To improve efficiency, we propose an approximate Eclat algorithm named HashEclat, based on MinHash, which quickly estimates the size of the intersection set and uses the parameters k, E and minSup to trade off the accuracy of the mining results against execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm; the parameter E is the estimation error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result obtained faster may be more useful than an 'exact' result. Our theoretical analysis and experimental results demonstrate that the proposed algorithm outputs almost all of the frequent itemsets with faster speed and less memory.
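The idea of estimating an intersection size from MinHash sketches, rather than intersecting full transaction-ID lists, can be illustrated with a minimal sketch. It assumes a plain bottom-k sketch built from a single hash function, which may differ from the one-permutation top-k variant used in HashEclat; the function names (`hash_tid`, `bottom_k_sketch`, `estimate_intersection`) are hypothetical, and the parameters E and minSup are not modelled.

```python
import hashlib


def hash_tid(tid):
    """Deterministic 64-bit hash of a transaction ID (an illustrative choice)."""
    digest = hashlib.blake2b(str(tid).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(tids, k):
    """Keep the k smallest hash values of a transaction-ID set."""
    return sorted(hash_tid(t) for t in set(tids))[:k]


def estimate_intersection(sketch_a, size_a, sketch_b, size_b, k):
    """Estimate |A ∩ B| from two bottom-k sketches and the exact set sizes.

    The Jaccard similarity J is estimated on the k smallest hashes of the
    union; then |A ∩ B| = J * (|A| + |B|) / (1 + J).
    """
    union_k = sorted(set(sketch_a) | set(sketch_b))[:k]
    if not union_k:
        return 0.0
    in_both = set(sketch_a) & set(sketch_b)
    jaccard = sum(1 for v in union_k if v in in_both) / len(union_k)
    return jaccard * (size_a + size_b) / (1 + jaccard)
```

A candidate itemset would then be kept when the estimated support reaches the minimum support threshold; a larger k lowers the estimation error at the cost of longer sketches.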
As a major type of continuous spatial query, the moving k nearest neighbor (kNN) query has been studied extensively. However, most existing studies have focused only on query efficiency. In this paper, we further consider the usability of the query results, in particular the diversification of the returned data points. We thereby formulate a new type of query named the moving k diversified nearest neighbor query (MkDNN). This type of query continuously reports the k diversified nearest neighbors while the query object is moving. Here, the degree of diversity of the kNN set is defined on the distance between the objects in the kNN set. Computing the k diversified nearest neighbors is an NP-hard problem. We propose an algorithm that incrementally maintains the k diversified nearest neighbors to reduce query processing costs. We further propose two approximate algorithms that achieve even higher query efficiency with precision bounds. We verify the effectiveness and efficiency of the proposed algorithms both theoretically and empirically. The results confirm the superiority of the proposed algorithms over the baseline algorithm.
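To make the notion of a diversified kNN set concrete, the following is a minimal sketch of a static greedy heuristic. It assumes that diversity means every pair of returned points is at least `min_sep` apart, which is one possible reading of a distance-based diversity measure; it is not the incremental or approximate algorithm of the paper, and the name `greedy_diversified_knn` is hypothetical.

```python
import math


def greedy_diversified_knn(points, query, k, min_sep):
    """Scan points by increasing distance to the query and keep a point only
    if it lies at least min_sep away from every point already chosen."""
    chosen = []
    for p in sorted(points, key=lambda pt: math.dist(pt, query)):
        if all(math.dist(p, c) >= min_sep for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen
```

Since the exact problem is NP-hard, a greedy pass of this kind gives no optimality guarantee, and it would have to be re-run or updated incrementally as the query object moves.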
Gaussian kernel support vector machine recursive feature elimination (GKSVM-RFE) is a nonlinear method for feature ranking. However, GKSVM-RFE suffers from high computational complexity, which hinders its application. This paper investigates the computational complexity of GKSVM-RFE and proposes two fast versions, called fast GKSVM-RFE (FGKSVM-RFE), to speed up the recursive feature elimination procedure. For this purpose, we design two kinds of ranking scores based on first-order and second-order approximation schemes by introducing approximate Gaussian kernels. In each iteration, FGKSVM-RFE quickly calculates approximate ranking scores according to these schemes and ranks features based on them. Experimental results show that the proposed methods perform feature ranking faster than GKSVM-RFE while achieving comparable performance.
This paper considers wireless networks in which communication links are unstable and link interference makes it challenging to design high-performance scheduling algorithms. Wireless links are time-varying and are modeled by Markov stochastic processes. The problem of designing an optimal link scheduling algorithm that maximizes the expected reliability of the network is first formulated as a Markov Decision Process. The optimal solution can be obtained by the finite backward induction algorithm, but its time complexity is very high. We therefore develop an approximate link scheduling algorithm with an approximation ratio of 2(N - 1)(r(M)Delta - r(m)delta), where N is the number of decision epochs, r(M) is the maximum link reliability, r(m) is the minimum link reliability, Delta is the number of links in the largest maximal independent set and delta is the number of links in the smallest maximal independent set. Simulations are conducted in different scenarios under different network topologies.
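Finite backward induction, the exact method mentioned above, can be sketched generically for a small finite-horizon Markov Decision Process. The toy model below (a single link with states "good" and "bad", the actions "send" and "wait", and all probabilities and reliabilities) is invented purely for illustration and is not the network model of the paper; `backward_induction` is a hypothetical name.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Solve a finite-horizon MDP by sweeping from the last epoch to the first.

    transition(s, a) returns a list of (probability, next_state) pairs and
    reward(s, a) returns the immediate reward. Returns the optimal value at
    the first epoch and one decision rule (state -> action) per epoch.
    """
    value = {s: 0.0 for s in states}  # terminal value after the last epoch
    policy = []
    for _ in range(horizon):
        new_value, rule = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a in actions(s):
                q = reward(s, a) + sum(p * value[s2] for p, s2 in transition(s, a))
                if q > best_q:
                    best_action, best_q = a, q
            new_value[s], rule[s] = best_q, best_action
        value = new_value
        policy.append(rule)
    policy.reverse()  # rules were built last epoch first
    return value, policy


# Toy single-link example with made-up numbers (assumption, not from the paper).
LINK_STATES = ["good", "bad"]
RELIABILITY = {"good": 0.9, "bad": 0.2}
NEXT_STATE = {"good": [(0.8, "good"), (0.2, "bad")],
              "bad": [(0.5, "good"), (0.5, "bad")]}

toy_value, toy_policy = backward_induction(
    LINK_STATES,
    lambda s: ["send", "wait"],
    lambda s, a: NEXT_STATE[s],  # link state evolves regardless of the action
    lambda s, a: RELIABILITY[s] if a == "send" else 0.0,
    horizon=2,
)
```

The cost of each sweep grows with the number of states times the number of actions, and in a network the state and action spaces grow combinatorially with the number of links, which is why an approximate scheduler is attractive.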
The behavior of the welding pool plays an important role in determining the quality of the weld, and the surface behavior of the welding pool carries important information that can serve as feedback for adjusting welding parameters. To study the dynamic characteristics of the molten pool surface in the TIG welding process with filler wire, a grid-structured laser measurement platform, based on the principle of surface reflection, was designed in this work to observe the molten pool surface. A CCD was used to record the image on the projection screen. A new three-dimensional reconstruction algorithm is proposed for calculating the welding pool surface. This algorithm analyzes the image captured by the CCD to restore the three-dimensional topography of the fixed-point wire-filled TIG welding pool, and thus to obtain the evolution of the three-dimensional topography during the welding process. The difference between the obtained weld pool height and the experimental results is very small.
Scheduling parallel tasks in a multi-cluster grid can be seen as two interdependent problems: cluster allocation and scheduling parallel tasks on the allocated cluster. In this paper, both rigid and moldable parallel tasks are considered. We propose a theoretical model of utility-oriented parallel task scheduling in a multi-cluster grid with advance reservations. On the basis of this model, we present an approximation algorithm, a genetic algorithm based on a repair strategy, and the greedy heuristics MaxMax, T-Sufferage and R-Sufferage to solve the two interdependent problems. We compare the performance of these algorithms in terms of utility optimality and running time. Simulation results show that, on average, the (1+alpha)-approximation algorithm achieves the best trade-off between utility optimality and running time. The genetic algorithm can achieve better utility than the greedy heuristics and the approximation algorithm, but at a high time cost. The greedy heuristics do not perform equally well when adapted to different utility functions, while the approximation algorithm performs stably.
People's opinions are often affected by their social network, and misinformation on online social networks can easily mislead people's judgment and decision-making, leading them to adopt unconventional or even radical behaviors. People's decision-making behavior is influenced by their concern about the misinformation they receive. Building on this, we explore the competitive concern minimization problem, which leverages agents who post correct information to minimize users' concern about misinformation. First, considering users' concern about misinformation, this paper constructs a concern-critical competitive model and introduces Coulomb's law to quantify the dynamic evolution of users' concern during information diffusion. Second, we prove hardness results for the competitive concern minimization problem and discuss the modularity of the objective function. Then, to optimize the nonsubmodular objective function, a two-stage approximate projected subgradient algorithm with a data-dependent approximation ratio is developed using the Lovász extension and the convex envelope. Finally, experimental simulations on three real networks highlight the efficiency of the proposed approaches, which are at least 9.71% better than the other baselines in reducing misinformation concern.
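The Lovász extension mentioned above turns a set function into a function on real vectors, which is what makes subgradient methods applicable. The following minimal sketch evaluates the standard Lovász extension of an arbitrary set function F with F(∅) = 0; it does not reproduce the paper's two-stage projected subgradient algorithm or its concern objective, and the name `lovasz_extension` and the toy function used in the usage note are assumptions.

```python
def lovasz_extension(set_fn, x):
    """Evaluate the Lovász extension of set_fn at the real vector x.

    Coordinates are visited in decreasing order of x; each contributes its
    value times the marginal gain of adding its index to the growing prefix
    set: sum_i x[pi_i] * (F(S_i) - F(S_{i-1})).
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    total = 0.0
    prefix = set()
    previous = set_fn(frozenset())
    for i in order:
        prefix.add(i)
        current = set_fn(frozenset(prefix))
        total += x[i] * (current - previous)
        previous = current
    return total
```

For example, with the toy function `lambda s: min(len(s), 2)`, the extension evaluated at the indicator vector `[1, 0, 1]` equals the set-function value of {0, 2}, namely 2. The marginal gains computed along the sorted order also form a subgradient when F is submodular, which is the ingredient a projected subgradient method would use.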
Chemical Reaction Optimization (CRO) is a recently established metaheuristic for optimization, inspired by the nature of chemical reactions. A chemical reaction is a natural process that transforms unstable substances into stable ones. From a microscopic view, a chemical reaction starts with some unstable molecules with excessive energy. The molecules interact with each other through a sequence of elementary reactions. In the end, they are converted into molecules with the minimum energy needed to support their existence. This property is embedded in CRO to solve optimization problems. CRO can be applied to problems in both the discrete and continuous domains. We have successfully used CRO to solve a broad range of engineering problems, including the quadratic assignment problem, neural network training, and multimodal continuous problems. The simulation results demonstrate that CRO has superior performance when compared with other existing optimization algorithms. This tutorial aims to assist readers in implementing CRO to solve their own problems. It also serves as a technical overview of the current development of CRO and suggests potential future research directions.
The design of an OLAP system that supports real-time queries is a major research issue. One approach is to use data cubes, which are materialized, precomputed multidimensional views of data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice and exclude from consideration. We define an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating the two bounds. This is an NP-hard problem. We propose the approximate greedy algorithms GR, 2GM and 2GMM, which experiments on a census data set and a forest-cover-type data set show to be both effective and efficient.
We consider the NP-hard integer three-index axial assignment problem. Strategies for combining feasible solutions of the problem are investigated. Combining can supplement heuristic or approximate solution algorithms, replacing the generally accepted step of choosing the record (best) solution among the feasible solutions found. The results of computational experiments are presented that demonstrate the promise of the proposed approach.