ISBN: 9798331527235 (digital)
ISBN: 9798331527242 (print)
Constrained clustering algorithms integrate user knowledge as constraints in the clustering process to guide it toward a desired outcome. When interacting with users, it is essential to quickly ask simple questions that identify informative constraints capable of efficiently improving an initial partition. We propose a new active query strategy for incremental clustering that translates user feedback into interpretable decision rules and identifies relevant points to query using rule-based heuristics. Experiments on benchmark datasets highlight the benefits of our new approach, making it suitable for real-world applications.
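Constrained clustering commonly encodes user knowledge as pairwise must-link and cannot-link constraints. The abstract does not say which constraint form or query rules the authors use, so the Python sketch below assumes pairwise constraints and swaps in a simple distance heuristic (ask about the closest pair of points currently split across clusters) in place of the paper's rule-based heuristics. The function names `count_violations` and `next_query` are hypothetical.

```python
from itertools import combinations
from math import dist


def count_violations(labels, must_link, cannot_link):
    """Count pairwise constraints violated by a partition (a list of cluster labels)."""
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml + cl


def next_query(points, labels, asked):
    """Hypothetical query heuristic: return the closest pair of points that lie in
    different clusters and have not been asked about yet. Pairs near a cluster
    boundary are the ones whose answer is most likely to change the partition."""
    best, best_d = None, float("inf")
    for i, j in combinations(range(len(points)), 2):
        if labels[i] == labels[j] or (i, j) in asked:
            continue
        d = dist(points[i], points[j])
        if d < best_d:
            best, best_d = (i, j), d
    return best
```

For example, with points on a line at 0.0, 0.5, 3.0 and 3.4 split into two clusters, `next_query` proposes the boundary pair `(1, 2)`; a must-link answer on that pair would then count as one violation of the current partition.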
Feature selection is an essential technique for high-dimensional data. Traditionally, feature selection focuses on removing irrelevant features, but removing redundant features is equally important. We propose a novel feature subset selection algorithm based on the idea of consensus clustering. Our algorithm constructs a complete graph on the feature space and partitions it using various graph partitioning algorithms from social network analysis. Consensus clustering is applied to find the best partitioning, and the final feature subset is formed by selecting from each cluster the most "representative" feature, i.e., the one with the highest correlation to the target class. Classification is used for validation, and the algorithm is evaluated on benchmark datasets whose dimensionality ranges from 8 to 168 features. The results show that the proposed approach is effective at removing irrelevant and redundant features. The proposed method selects very few features, and classifier accuracies using the selected features are on par with those of the latest approaches proposed in the literature.
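The pipeline above (feature graph, partition, one representative per cluster) can be sketched in Python as follows. This is a simplification under stated assumptions: edge weights are absolute Pearson correlations, and a single threshold plus connected components stands in for the social-network partitioning algorithms and the consensus step, which are not reproduced. The names `select_features` and `threshold` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, target, threshold=0.8):
    """columns: list of feature columns. Returns sorted indices of the kept features."""
    k = len(columns)
    parent = list(range(k))  # union-find over the feature graph

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only strongly correlated edges; their connected components are the clusters.
    for i in range(k):
        for j in range(i + 1, k):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(k):
        clusters.setdefault(find(i), []).append(i)
    # From each cluster keep the feature most correlated with the target.
    return sorted(
        max(members, key=lambda i: abs(pearson(columns[i], target)))
        for members in clusters.values()
    )
```

Raising `threshold` merges fewer features into the same cluster, so more representatives survive; with two near-duplicate columns and one unrelated column, the default keeps one of the duplicates plus the unrelated feature.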
The authors present a hardware accelerator for a module placement algorithm based on the divide-and-conquer paradigm. They consider a partitioning algorithm for the approximate solution of a large placement problem. This algorithm divides the set of logic modules into small clusters and generates an optimal placement for each cluster. Finally, in a pasting step, the algorithm combines the optimal solutions for the smaller problems into a near-optimal solution for the original placement problem. The algorithm lends itself very naturally to a parallel realization, and maps nicely onto an SIMD (single-instruction, multiple data-stream) organization. Considerations such as cost-effectiveness and suitability to VLSI implementation led to the selection of the reduced array architecture as the target architecture for the placement accelerator.
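The divide-and-conquer idea can be illustrated with a minimal Python sketch under simplifying assumptions that are not from the paper: modules sit in a single row of slots, nets are two-pin, clusters are formed from consecutive modules, each cluster is placed optimally by exhaustive search, and "pasting" simply concatenates the cluster placements. The names `place` and `wirelength` are hypothetical. Because each cluster's search is independent of the others, this is the part that maps naturally onto parallel (SIMD) hardware.

```python
from itertools import permutations


def wirelength(pos, nets):
    """Total length of two-pin nets, given a module -> slot mapping."""
    return sum(abs(pos[a] - pos[b]) for a, b in nets if a in pos and b in pos)


def place(modules, nets, k=3):
    """Split modules into clusters of at most k, place each cluster optimally
    by brute force, then paste the clusters side by side in one row."""
    clusters = [modules[i:i + k] for i in range(0, len(modules), k)]
    pos, offset = {}, 0
    for cluster in clusters:  # independent subproblems: the parallelizable step
        inner = [(a, b) for a, b in nets if a in cluster and b in cluster]
        best = min(
            permutations(cluster),
            key=lambda order: wirelength({m: i for i, m in enumerate(order)}, inner),
        )
        for i, m in enumerate(best):
            pos[m] = offset + i
        offset += len(cluster)
    return pos
```

Exhaustive search costs k! per cluster, so k must stay small; nets that cross cluster boundaries are ignored inside each subproblem, which is why the pasted result is only near-optimal.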
A proximity search looks for similar complex documents, such as images, sounds, and DNA sequences, that share two or more separately matching terms within a specified distance in a large collection. Retrieving such similar complex documents is of great importance to many applications. To achieve an efficient query process, many different access methods have been proposed. Token-list-based proximity search has been shown to be a good alternative to LSH (locality-sensitive hashing) for proximity search in large databases. However, the single-token method incurs a high overhead in the result-refinement process needed to reach a required similarity. In this paper, we investigate how multi-token lists affect the performance of database proximity search. Extensive experiments show that a list of two adjacent tokens achieves the best query performance among multi-token-list-based approaches.
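To show why longer token keys reduce refinement work, here is a minimal Python sketch of inverted lists keyed by runs of n adjacent tokens. It assumes whitespace tokenization and treats any shared key as a candidate; the names `build_index` and `candidates` are hypothetical, and the refinement step that checks the required similarity is not shown.

```python
from collections import defaultdict


def build_index(docs, n=2):
    """Map each run of n adjacent tokens to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(doc_id)
    return index


def candidates(index, query, n=2):
    """Documents sharing at least one n-token run with the query. These still
    need a refinement step to verify the required similarity."""
    tokens = query.split()
    found = set()
    for i in range(len(tokens) - n + 1):
        found |= index.get(tuple(tokens[i:i + n]), set())
    return found
```

On four short DNA-like documents, the query `"C G T"` matches every document through single tokens (n=1) but only two documents through adjacent token pairs (n=2), so fewer candidates reach the refinement step.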
ISBN: 9781509006274 (print)
This paper describes a method to verbalize the trends of time-series data. As an example of time-series data, we use the Nikkei Stock Average price and develop a method to generate natural language sentences that describe how the stock price moves in the market. To produce linguistic descriptions of the stock price trends, we first classify all the time-series data, including the newly observed series to be verbalized, by spectral clustering using Dynamic Time Warping distance as its similarity metric. Second, a bi-gram language model for the newly observed data is built from the weighted bi-gram language models of the other time-series data classified in the same cluster. The weight given to each of these models is determined by the similarity between the target data and the corresponding series in the same cluster. Finally, a linguistic summary of the target data is generated by finding the most likely combination of words with dynamic programming under the weighted bi-gram model. Experiments with various numbers of clusters in spectral clustering confirm that our method generates natural language sentences that properly describe the trends of the stock price.
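Two of the building blocks above can be sketched in Python: the Dynamic Time Warping distance used to compare series, and a weighted mixture of bi-gram tables. The local cost `|x - y|` and the representation of a bi-gram model as a dictionary `{(w1, w2): p}` are assumptions; the paper's exact weighting formula is not given, so `mix_bigrams` simply takes the weights as input. Both function names are hypothetical.

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two numeric sequences,
    with |x - y| as the local cost (an assumed choice)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def mix_bigrams(models, weights):
    """Weighted average of bi-gram probability tables {(w1, w2): p};
    weights are normalized so they sum to one."""
    total = sum(weights)
    mixed = {}
    for model, w in zip(models, weights):
        for pair, p in model.items():
            mixed[pair] = mixed.get(pair, 0.0) + w * p / total
    return mixed
```

DTW lets a series with a repeated value match its shorter counterpart at zero cost, which is why it suits price curves that move at different speeds. One plausible (hypothetical) weighting is `1 / (1 + dtw(target, other))` for each series in the target's cluster.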
Effectively handling precedence constraints and resource synchronization is a challenging problem in the era of multiprocessor systems, even with massively parallel computing power. One common approach is to apply list scheduling to a given task graph with precedence constraints. However, in some application scenarios, such as the OpenMP task model and multiprocessor partitioned scheduling for resource synchronization using binary semaphores, several operations may be forced onto the same processor, which invalidates list scheduling. This paper studies a special case of this challenging scheduling problem, in which each task, consisting of at most three subtasks, is executed sequentially on the same processor, and the second subtasks of different tasks may have sequential dependencies, e.g., due to synchronization. We demonstrate the limitations of existing algorithms and provide effective heuristics that consider preemptive execution. The evaluation results show a significant improvement compared to existing multiprocessor partitioned scheduling strategies.
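For reference, the baseline mentioned above, non-preemptive list scheduling of a task graph on identical processors, can be sketched in Python as below. This is the classic greedy scheme, not the paper's heuristics: whenever a processor is idle, it starts the highest-priority task whose predecessors have all finished. Tasks may land on any processor, which is exactly the freedom the tied-subtask scenarios in the paper take away. The name `list_schedule` and its argument layout are hypothetical.

```python
def list_schedule(durations, preds, order, m):
    """Non-preemptive list scheduling on m identical processors.
    durations: {task: time}; preds: {task: set of predecessor tasks};
    order: priority list of all tasks (highest priority first).
    Returns {task: (processor, start, finish)}."""
    free_at = [0.0] * m  # time at which each processor becomes idle
    finish, schedule = {}, {}
    pending = list(order)
    time = 0.0
    while pending:
        # Ready: every predecessor has finished by the current time.
        ready = [t for t in pending
                 if all(p in finish and finish[p] <= time for p in preds.get(t, ()))]
        idle = [k for k in range(m) if free_at[k] <= time]
        if ready and idle:
            t, k = ready[0], idle[0]  # highest-priority ready task, first idle processor
            finish[t] = time + durations[t]
            schedule[t] = (k, time, finish[t])
            free_at[k] = finish[t]
            pending.remove(t)
        else:
            # Nothing can start now: jump to the next completion event.
            time = min(f for f in list(finish.values()) + free_at if f > time)
    return schedule
```

With tasks A(2), B(1), C(1), D(2), edges A→C and B→D, and two processors, D starts on the processor B frees at time 1, C follows A at time 2, and the makespan is 3.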
This paper proposes the use of a local feature selection scheme for the effective selection of relevant features when designing Genetic Fuzzy Rule-Based Classification Systems (GFRBCSs). The method relies on providing the genetic search with deterministic information about the quality of each feature with respect to its classification ability, directing the evolution toward the most useful features. To evaluate our method, we propose a learning algorithm that iteratively generates the final fuzzy rule base, extracting one rule at a time as directed by a boosting algorithm. Experimental results on a number of well-known classification datasets demonstrate the efficiency of the proposed system in dealing with high-dimensional feature spaces.
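The abstract does not name the feature-quality measure, so the following Python sketch assumes the Fisher score (between-class scatter divided by within-class scatter) as a stand-in, and normalizes the scores into probabilities that a genetic operator could use to favor useful features. Both the measure and the function names `fisher_score` and `selection_probabilities` are hypothetical.

```python
def fisher_score(values, labels):
    """Between-class scatter over within-class scatter for one feature."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        xs = [v for v, y in zip(values, labels) if y == c]
        mean = sum(xs) / len(xs)
        between += len(xs) * (mean - overall) ** 2
        within += sum((x - mean) ** 2 for x in xs)
    return between / (within + 1e-12)  # epsilon avoids division by zero


def selection_probabilities(columns, labels):
    """Normalize per-feature scores into probabilities, e.g. for sampling
    which feature a mutation should switch on (a hypothetical use)."""
    scores = [fisher_score(col, labels) for col in columns]
    total = sum(scores)
    if total == 0:  # no feature separates the classes: fall back to uniform
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

A feature whose class means are far apart relative to its spread receives almost all of the probability mass, so the search concentrates on it while still occasionally trying the others.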
Dimension reduction techniques may be employed to increase the efficiency of clustering algorithms and for visualization purposes. In this paper, our aim is to develop a simple dimension reduction technique that converts high-dimensional data into two-dimensional data, and then to apply the K-Means clustering algorithm to the converted (two-dimensional) data. We applied our technique to three real datasets to evaluate its performance and, for comparison, compared it with other existing techniques.
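The paper's reduction technique is not described in the abstract, so the Python sketch below uses principal component analysis (computed by power iteration with deflation) as a swapped-in reduction to two dimensions, followed by Lloyd's K-Means with farthest-first initialization. Everything here, including the names `pca_2d` and `kmeans`, is an illustrative assumption rather than the authors' method.

```python
import random
from math import dist


def pca_2d(data, iters=100):
    """Project rows of `data` onto their top two principal components,
    estimated by power iteration with deflation."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    C = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    rng = random.Random(0)  # fixed seed keeps the result reproducible
    comps = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:  # no variance left in the remaining directions
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflation: remove the found component before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return [tuple(sum(a * b for a, b in zip(r, c)) for c in comps) for r in X]


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with farthest-first initialization; returns cluster labels."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels
```

Usage: `kmeans(pca_2d(rows), 2)` clusters the projected rows; the two-dimensional projection can also be plotted directly, which covers the visualization purpose mentioned above.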
In this paper, we present a fast algorithm for computing the value of a spectral transform of a Boolean or multiple-valued function for a given assignment of input variables. Our current implementation is for the arithmetic transform, because our work is primarily aimed at optimizing the performance of probabilistic verification methods. However, the presented technique is equally applicable to other discrete transforms, e.g., Walsh or Reed-Muller transforms. Previous methods for computing spectral transforms used truth tables, sum-of-products expressions, or various derivatives of decision diagrams, and were fundamentally limited by the excessive memory requirements of these data structures. We present a new algorithm that partitions the computation of the spectral transform based on the dominator relations of the circuit graph representing the function to be transformed. As a result, the presented algorithm can handle larger functions than was previously possible.
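To make concrete what is being computed, here is a minimal Python sketch of the truth-table approach the abstract describes as memory-limited (the table has 2^n entries), not the dominator-based algorithm. It computes arithmetic-transform coefficients with a standard in-place butterfly and evaluates the resulting polynomial at an integer assignment. The convention that bit j of the table index is input x_j is an assumption, and the function names are hypothetical.

```python
def arithmetic_transform(truth_table):
    """Arithmetic-transform coefficients of a Boolean function.
    truth_table[i] = f(x), where bit j of i is x_j; coefficient c[s]
    multiplies the product of the x_j whose bits are set in s."""
    c = list(truth_table)
    n = len(c).bit_length() - 1
    for j in range(n):
        step = 1 << j
        for i in range(len(c)):
            if i & step:
                c[i] -= c[i ^ step]  # butterfly for x_j: (x_j = 1 part) - (x_j = 0 part)
    return c


def evaluate_at(coeffs, point):
    """Value of the arithmetic-transform polynomial at an assignment of the inputs."""
    total = 0
    for s, coef in enumerate(coeffs):
        term = coef
        for j, x in enumerate(point):
            if (s >> j) & 1:
                term *= x
        total += term
    return total
```

For example, OR yields the polynomial x0 + x1 - x0*x1 and XOR yields x0 + x1 - 2*x0*x1. Evaluated at the non-Boolean point (2, 3), OR gives -1, which is the kind of integer-point evaluation probabilistic verification relies on.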
Simulation is an important step in the design cycle of VLSI systems. The increasing size and complexity of modern systems require simulation techniques optimized for speed, and researchers are turning to parallel simulation to reduce simulation time. Logic partitioning plays an important role in parallel simulation. Two factors determine the effectiveness of partitioning: the concurrency among the partitions and the communication between them. The concurrency achieved and the communication overhead caused by signals crossing partition boundaries directly affect the speed-up achieved in the simulation. Hybrid FPGA-software simulation offers an alternative way to increase simulation speed. In addition to the above factors, the size and cost of the FPGA also determine the partitioning technique for FPGA-based emulation. This paper addresses the issues involved in hybrid FPGA-software simulation and presents a new partitioning scheme. With our approach, communication between partitions is reduced to at least 50% of that observed with the best of the other algorithms. In addition, for most of the benchmarks, only 25% of the circuit elements are placed in the FPGA partition. Presimulation is employed as an effective tool to achieve this aim.
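The communication measure discussed above can be illustrated with a small Python sketch. It assumes, hypothetically, that presimulation yields an activity count per circuit element, that the FPGA partition receives the most active elements up to a fixed capacity, and that communication is the number of nets with pins on both sides of the FPGA/software boundary. This greedy rule is a stand-in for illustration, not the partitioning scheme proposed in the paper; `partition` and `communication` are hypothetical names.

```python
def partition(activity, capacity):
    """Place the `capacity` most active elements (per presimulation counts) in the FPGA."""
    ranked = sorted(activity, key=activity.get, reverse=True)
    return set(ranked[:capacity])


def communication(nets, fpga):
    """Number of nets with pins on both sides of the FPGA/software boundary."""
    return sum(
        1 for pins in nets
        if any(p in fpga for p in pins) and not all(p in fpga for p in pins)
    )
```

With activity counts `{"g1": 90, "g2": 80, "g3": 5, "g4": 3}` and room for two elements, the FPGA receives `g1` and `g2`, and only the net joining `g2` and `g3` crosses the boundary.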