ISBN: (print) 9789811024719; 9789811024702
Most of the data available for knowledge discovery and information retrieval is prone to identity disclosure. Identity is chiefly disclosed by exploring the patterns of the attributes involved in data formation. Existing benchmark models anonymize the data by generalizing it, deleting the sensitive attributes, or adding noise. None of these approaches guarantees optimal or accurate results from the mining models applied to the anonymized data set. The resulting deviation often leads to incorrect decision-making, which is unacceptable in domains such as health mining. To fill this gap, we propose a novel hybridization of feature set partitioning and data restructuring to achieve pattern anonymization. The model specifically aims to restructure the data for supervised learning. To the best of our knowledge, pattern anonymization is the first approach of its kind that attempts to anonymize patterns rather than individual attributes. The experimental results also indicate the robustness and scalability of supervised learning on the restructured data.
ISBN: (print) 9789811016752; 9789811016745
Clustering is the procedure of grouping a set of entities so that similar entities fall in the same group. Cluster analysis is not one specific algorithm but a general task to be solved. It can be carried out by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members. The clustering problem has been addressed by researchers in many different domains, which reveals its broad scope and its importance as a step in data analysis. However, it is difficult because researchers in different contexts may make different assumptions. Clustering is one of the main approaches in data mining and a common technique for statistical data analysis. It is used in major domains such as banking, health care, robotics, and other disciplines. This paper discusses the limitations, scope, and purpose of different clustering algorithms in detail.
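To make the distance-based notion of a cluster concrete, the following is a minimal sketch of Lloyd's k-means algorithm, one common instance of that notion. It is not taken from the paper; the function name `kmeans`, the caller-supplied `initial_centroids`, and the toy points are assumptions chosen for illustration.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    `initial_centroids` is supplied by the caller (an assumption made here
    to keep the sketch deterministic).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Toy data: two visually separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Other notions of a cluster (density-based, hierarchical, model-based) would lead to different algorithms on the same data, which is the kind of variation the abstract refers to.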
Author:
Jia, Jia, Tsinghua Univ
Dept Comp Sci & Technol Beijing 100084 Peoples R China Minist Educ Beijing
Key Lab Pervas Comp Natl Res Ctr Informat Sci & Technol Beijing Peoples R China
ISBN: (print) 9780999241127
Mental health has become a general concern nowadays. It is vitally important to detect and manage mental health issues before they turn into severe problems. Traditional psychological interventions are reliable, but expensive and slow to respond. With the rapid development of social media, people are increasingly sharing their daily lives and interacting with friends online. By harvesting social media data, we comprehensively study the detection of mental wellness, taking two typical mental problems, stress and depression, as specific examples. Starting from binary user-level detection, we expand our research to multiple contexts by considering the trigger and level of mental health problems and by involving social media platforms from different cultures. We construct several real-world benchmark datasets for analysis and propose a series of multi-modal detection models, whose effectiveness is verified by extensive experiments. We also conduct an in-depth analysis to reveal the underlying online behaviors associated with these mental health issues.
ISBN: (print) 9789811024719; 9789811024702
In this paper, a wireless sensor sequence data mining model is demonstrated for smart home and Internet of Things (IoT) data analytics. We explore sensor data patterns by correlating them with the multi-stream sensor data fused from the wireless sensor network. The proposed conceptual data model shows how sensor data patterns from heterogeneous sensing systems can be effectively realized for various IoT applications. The conceptual data model includes the discovery of frequent pattern item sets using various computational archetypes. The results for the explicit patterns augmented for data analytics are encouraging, as the prototype was tested with real-time data rather than test-bed scenario data or synthetic data.
ISBN: (print) 9781538625880
Straggler tasks are commonly considered the major bottleneck in parallel data processing. Previous work mainly focuses on coarse-grained straggler detection and optimization, such as speculative scheduling. However, fine-grained root-cause analysis of straggler tasks is rarely considered. In addition, existing work depends simply on empirical analysis, which offers little useful guidance for performance optimization. In this paper, we propose a new methodology for fine-grained straggler root-cause analysis using machine learning. We collect raw metrics from the Spark event log and a hardware sampling tool, and refine them into high-level metrics for model learning. We then present the root-cause analysis of stragglers using a CART decision tree. A customized pruning method is also applied to improve analysis accuracy. From the analysis, we derive several new findings beyond the well-known causes of stragglers. Our work provides a new perspective on identifying and understanding the inefficiency in parallel data processing programs by applying machine learning techniques to fine-grained root-cause analysis of straggler tasks.
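As a rough illustration of how a CART tree can point at a root cause, the sketch below implements only the core CART step: choosing the binary split that minimizes weighted Gini impurity. It is not the authors' pipeline or their pruning method. The metric names (`gc_time_ratio`, `shuffle_read_mb`) and all values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A row goes left when row[feature_index] <= threshold, otherwise right.
    If no split improves on the unsplit impurity, feature_index is None.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical per-task metrics: (gc_time_ratio, shuffle_read_mb).
rows = [(0.05, 120), (0.07, 110), (0.40, 115), (0.45, 125), (0.06, 130), (0.50, 118)]
labels = ["normal", "normal", "straggler", "straggler", "normal", "straggler"]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

In this made-up data the chosen split is on the first metric, which is how a tree's top-level splits can be read as candidate root causes. A full CART implementation would recurse on both sides and then prune.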
ISBN: (print) 9789811038747; 9789811038730
Data stream mining is the process of extracting knowledge from continuously generated data. Since data stream processing is not a trivial task, the streams have to be analyzed with proper stream mining techniques. When processing large volumes of streaming data, stream clustering helps to find valuable hidden information. Many works have concentrated on clustering data streams using various methods, but most of those approaches lack some core tasks needed to improve cluster accuracy and processing speed. To improve cluster quality and reduce the processing time of cluster generation, the partition-based DBStream clustering method is proposed. The results have been compared with various data stream clustering methods, and the experiments show that cluster purity improves by 5% and the time taken is 10% lower than the average time taken by the other methods.
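The abstract reports cluster purity without defining it. Assuming the usual definition (the fraction of points that carry the majority label of their cluster), it can be computed as in this small sketch. The function name and the toy labels are illustrative only.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority true label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # For each cluster, count the points belonging to its most common label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


# Toy example: cluster 0 holds two "a" and one "b"; cluster 1 holds three "b".
example_purity = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(example_purity)
```

Purity rises trivially as the number of clusters grows, so comparisons between methods are most meaningful at similar cluster counts.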
The number of elderly people who prefer to live alone in their own homes in their golden years is growing exponentially. This is not an ideal way for an elderly individual to live. However, the urbanization and resu...
ISBN: (print) 9781538606070
Reconfigurable computing has been considered one of the main approaches to utilizing the billions of transistors integrated on a chip in this nanoscale era. In this paper, we introduce our adaptable VLIW processor, whose organization is reconfigurable at design time so that the processor is optimized for a particular program or application domain. The issue width, the number of registers, the instruction and data caches, and the availability of functional units are reconfigurable parameters. In all configurations, the processor can operate as a 5-stage pipeline to improve performance. Moreover, we present our design framework, which helps users determine an optimized VLIW processor configuration according to the requirements of the executed program. The design framework also allows users to implement the VLIW processor on an FPGA-based platform without knowing HDLs. We synthesize the processor in different configurations with four FPGA technologies. The synthesis results show that our processor can run at up to 174.89 MHz with the Xilinx Virtex 7 technology. Compared to the NIOS II RISC-like softcore processor at the same working frequency, our adaptable VLIW processor achieves speed-ups of up to 22.6x.
ISBN: (print) 9781509063185
Frequent pattern mining is playing an increasingly important role in a growing number of real-time data flow scenarios, such as large-scale order streams, network traffic monitoring, and web access record streams. The continuous, unbounded, and high-speed nature of massive data streams is a huge challenge for current frequent pattern mining approaches. The main challenge is that, as the data stream continuously arrives, infrequent patterns that were discarded may become frequent again. In this paper, targeting the characteristics of real-time data streams, we propose a compact data structure called the CPS-tree to maintain and operate on the full information of the data stream. Compared to current related work, our algorithm can dynamically support large-scale data streams with a one-pass scan and can be easily applied to other data stream processing environments. Moreover, load imbalance is a common problem in current frequent pattern mining. We analyze the features of the data stream and propose a depth-based strategy to solve the imbalance problem in our parallel algorithm. In conclusion, we propose the BPFPMS algorithm, a balanced parallel frequent pattern mining algorithm for massive data streams, to dynamically and efficiently mine frequent patterns over large-scale data streams. Our experiments show that our algorithm achieves a good speedup and a good degree of balance among nodes at different degrees of parallelism.
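The challenge that discarded infrequent patterns may later become frequent can be seen in a small sketch. The code below is not the CPS-tree or the BPFPMS algorithm. It is a naive one-pass counter that keeps counts for every itemset up to a small size, so nothing is ever discarded. The class name, `max_size`, and the toy stream are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


class StreamPatternCounter:
    """Naive one-pass itemset counter (illustrative, not the CPS-tree)."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size that is tracked
        self.counts = Counter()   # itemset (sorted tuple) -> occurrences
        self.seen = 0             # number of transactions processed so far

    def add(self, transaction):
        """Update counts with one transaction; each transaction is read once."""
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            for pattern in combinations(items, size):
                self.counts[pattern] += 1
        self.seen += 1

    def frequent(self, min_support):
        """Itemsets whose relative support is at least `min_support` right now."""
        threshold = min_support * self.seen
        return {p for p, c in self.counts.items() if c >= threshold}


stream = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"], ["b", "c"], ["b", "c"]]
counter = StreamPatternCounter(max_size=2)

for transaction in stream[:3]:
    counter.add(transaction)
early = counter.frequent(min_support=0.5)   # ("b", "c") is not frequent yet

for transaction in stream[3:]:
    counter.add(transaction)
late = counter.frequent(min_support=0.5)    # ("b", "c") has become frequent

print(sorted(early), sorted(late))
```

Keeping every count is what makes the late change visible, but the number of tracked itemsets grows combinatorially with transaction length. That cost is the motivation for compact structures such as the tree the abstract proposes.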
ISBN: (print) 9789811024719; 9789811024702
Owing to the rapid growth of multimedia technology, multimedia information is easily accessed by any user, and it is equally easy to create and distribute. As technology develops, the volume of multimedia information grows for a variety of reasons; for example, it can now be uploaded by non-professional users. The low quality and the large number of duplicated video files make video retrieval increasingly complex. A video segment is generally represented as a shot, which consists of a series of frames. Within this series, the frame-based shot method specifically supports searching video content: when a client provides an image as a query, the image is matched against the indexed key frames using a similarity distance. Key frame selection is therefore highly significant, and several methods are used to automate the process. This paper proposes a new technique for key frame selection. The experiments show that the proposed method performs significantly well.