ISBN: 9781538621752 (print)
YouTube is the most popular video-sharing platform in the world. As a consequence, it has become one of the channels that movie producers and studios most prefer for communicating with potential viewers by sharing trailers and teasers. Data about a movie's trailers on YouTube can provide useful insights for predicting the movie's gross income. In this paper, we prepare a dataset of 7988 movie trailers from YouTube. The dataset contains attributes such as opening income, number of views, number of likes, number of dislikes, and number of comments. We prepared two prediction models and applied four regression techniques to find the most suitable technique for predicting the gross income of a movie. The comparative analysis shows that linear regression is the most suitable method for predicting gross income from these attributes. Finally, we outline future research issues that follow from this work.
This paper introduces an example network monitoring and auditing system based on the technical requirements of network monitoring and security auditing. By sampling and analyzing network data, the system monitors the behavior of network users, records security-related behavior and raises alarms through log auditing of the host and proxy server, facilitates the publication of network information, and provides analysis results and statistical data to administrators. As a result, administrators have greatly improved the level of network security management and achieved satisfactory results. The paper first analyzes the network security audit log system. Based on the requirements for designing and building a network security audit log monitoring system, and guided by the design and implementation of a prototype hybrid B/S and C/S system, it presents the system architecture. Current audit methods generally rely on analysis and comparison. On this basis, the paper proposes a security audit engine with learning ability and gives the framework of that engine. An example analysis shows that the prototype system achieves real-time monitoring, centralized management, and intelligent auditing. The implementation of this system can provide strong support for the security audit of network systems.
Multivariate statistical process control techniques have been widely used to improve processes by reducing variation and preventing defects. In modern manufacturing, because of the complexity and variability of processes, traditional multivariate control charts such as Hotelling's T² cannot efficiently handle situations in which the patterns of process observations are nonlinear, multimodal, and time varying. In the present study, we propose a nonparametric control chart that is capable of adaptively monitoring time-varying and multimodal processes. Experiments with simulated and real process data from a thin film transistor-liquid crystal display (TFT-LCD) demonstrate the effectiveness and accuracy of the proposed method. © 2015 Elsevier Ltd. All rights reserved.
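For orientation, the traditional baseline named in this abstract can be sketched as follows. This is only the classical Hotelling's T² statistic for a two-variable process, not the nonparametric chart the authors propose; the function name `hotelling_t2` and the restriction to two variables are assumptions made to keep the sketch dependency-free.

```python
from statistics import mean


def hotelling_t2(reference, x):
    """Hotelling's T^2 distance of a 2-D observation x from a reference sample.

    reference: list of (v0, v1) in-control observations (hypothetical input).
    x: the (v0, v1) observation to be monitored.
    """
    n = len(reference)
    if n < 3:
        raise ValueError("need at least three reference observations")
    m0 = mean(r[0] for r in reference)
    m1 = mean(r[1] for r in reference)
    # Entries of the sample covariance matrix S (denominator n - 1).
    s00 = sum((r[0] - m0) ** 2 for r in reference) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in reference) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in reference) / (n - 1)
    det = s00 * s11 - s01 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d0, d1 = x[0] - m0, x[1] - m1
    # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    return (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det


if __name__ == "__main__":
    ref = [(0, 0), (2, 0), (0, 2), (2, 2)]
    print(hotelling_t2(ref, (3, 1)))
```

Because the statistic summarizes the reference data with a single mean vector and covariance matrix, it implicitly assumes one roughly elliptical cluster, which is consistent with the abstract's point that such charts struggle with multimodal or time-varying data.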
ISBN: 9781509027323 (print)
Disease prediction and the pursuit of health have always been topics of keen interest, especially as population aging intensifies. However, the medical knowledge used in existing disease prediction systems is supplied by experts, which limits timeliness and convenience. Moreover, people do not want to use costly, complex medical monitoring systems that disrupt everyday life. At the same time, the design and development of wearable biosensor systems for health monitoring has attracted considerable attention in the scientific community and industry in recent years. In this article, we use an electronic health system based on a multi-sensor fusion strategy. Without affecting daily life, a variety of sensors with sensing, computational, and data communication capabilities are worn on the body to extract targeted health parameters. We then apply data mining algorithms to the data in order to predict long-term disease tendency. Taking a stroke prediction model as an example, after analyzing the factors related to stroke, we used the multi-sensor fusion system to collect 25 health attributes which, together with a class attribute, form a training set of 125 samples with 26 attributes. On this training set, machine learning algorithms achieved a recognition rate of 94.35%. Using the classification weights, we identified the attributes most closely associated with stroke and a combination of 9 health attributes that gave the best classification accuracy, 96.77%. The results show that the system achieves good classification accuracy and verify the feasibility of the proposed stroke prediction model. Electronic health systems based on multi-sensor fusion therefore have high research value.
ISBN: 9781467398039 (print)
Recently, data mining in big data has become an important concern for researchers. Data mining, which here refers to mining the relationships between items in a dataset, has been applied by businesses to seek profitable outcomes. Association rule mining algorithms such as Apriori and FP-growth are efficient methods for discovering relations between items in large databases. To improve performance, many studies adapt the traditional methods to the MapReduce framework. In this paper, we propose an improved association rule algorithm (IPARBC) based on the MapReduce framework. Combinatorial mathematics serves as the theoretical basis of the algorithm, and we address the high-volume problem of big data in order to improve mining performance with MapReduce. Experimental results show that the proposed algorithm substantially outperforms other algorithms in terms of runtime.
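The abstract does not describe how IPARBC works, so the following is not that algorithm. It is a minimal single-process sketch of the general idea the abstract alludes to: counting the support of k-item combinations in a map step and a reduce step. The function names and the toy transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item combination in one transaction."""
    items = sorted(set(transaction))
    return [(itemset, 1) for itemset in combinations(items, k)]


def reduce_phase(pairs):
    """Sum the emitted counts per itemset, as a reducer would per key."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Return k-itemsets that occur in at least min_support transactions."""
    pairs = [pair for t in transactions for pair in map_phase(t, k)]
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    # Toy transactions, invented for this example.
    baskets = [
        ["milk", "bread"],
        ["milk", "bread", "eggs"],
        ["bread", "eggs"],
        ["milk", "bread"],
    ]
    print(frequent_itemsets(baskets, k=2, min_support=2))
```

In a real MapReduce deployment the map calls would run on separate partitions of the transaction database and the framework would group pairs by key before reduction; here both phases run in one process so the data flow is easy to follow.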
ISBN: 9781509006229 (print)
Antivirus systems have difficulty detecting polymorphic variants of known viruses without explicit signatures for such variants. Initial work investigating efficient and effective string-based approaches for automatically generating signatures that identify some or all new polymorphic variants was encouraging, but it was restricted by a number of experimental aspects. The aim of the research reported here is to examine the effects of using different substitution matrices in a string-based method for the automatic generation of signatures for detecting some or all new polymorphic variants. We establish how our proposed syntactic-based method, which uses the well-known Smith-Waterman string matching algorithm, can successfully identify the known polymorphic variants of *** virus. Our string-matching technique may metamorphose our understanding of polymorphic variant generation and may lead to a new phase of syntactic-based antiviral software.
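To make the role of the substitution matrix concrete, here is a minimal sketch of Smith-Waterman local alignment scoring with a pluggable scoring function. It is a textbook formulation with a linear gap penalty, not the authors' signature-generation pipeline; the `simple_matrix` scores and the gap value are assumptions picked for illustration.

```python
def smith_waterman(a, b, score, gap=-2):
    """Return the best local alignment score between sequences a and b.

    score: function (x, y) -> number, standing in for a substitution matrix.
    gap: linear penalty added for each inserted or deleted symbol.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # start a fresh local alignment here
                h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match or mismatch
                h[i - 1][j] + gap,  # gap in b
                h[i][j - 1] + gap,  # gap in a
            )
            best = max(best, h[i][j])
    return best


def simple_matrix(x, y):
    """Hypothetical substitution scores: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


if __name__ == "__main__":
    print(smith_waterman("XXACGTYY", "ZZACGTWW", simple_matrix))
```

Swapping `simple_matrix` for a different scoring function changes which shared regions score highly, which is the kind of effect the abstract sets out to examine.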
As a newly emerging solution to electricity problems in China, the smart grid is also the future development trend of the electric power industry worldwide. In a smart grid, the huge volume of electricity production data involves multiple types of big data technologies. Selecting suitable technologies for analyzing, computing, and managing big data is vital to uncovering the potential value, laws, and patterns of smart ***, this paper proposes an evaluation criterion and approach based on data encapsulation analysis and the characteristics of big data technologies in the smart grid, and finally explores the applicability of various big data technologies in the smart grid.
ISBN: 9783319195513; 9783319195506 (print)
Hierarchical Naive Bayes (HNB) is a multivariate classification algorithm that can be used to forecast the probability of a specific disease by analysing a set of Single Nucleotide Polymorphisms (SNPs). In this paper, we present an implementation of HNB using a parallel approach based on the Map-Reduce paradigm, built natively on the Hadoop framework and relying on the Amazon Cloud Infrastructure. We tested our approach on two GWAS datasets aimed at identifying the genetic bases of Type 1 Diabetes (T1D) and Type 2 Diabetes (T2D). Both datasets include individual-level data for 1,900 cases and 1,500 controls with approximately 420,000 SNPs. For T2D the best results were obtained using the complete set of SNPs, whereas for T1D the best performance was reached using a few SNPs selected through standard univariate association tests. Our cloud-based implementation allows genome-wide simulations to be run while cutting down computational time and overall infrastructure costs.
ISBN: 9789532330816 (print)
Demand forecasting plays a very important role in retail business. Retail information systems commonly store large amounts of data, which are subsequently used by sophisticated data mining tools to build forecasting models. The quality of these models is usually measured first by predictive accuracy, their most important property, and then by other measures such as average underestimate and overestimate costs. Even though the choice of data mining algorithm is usually paramount, training set cleansing and preparation have a significant influence on final model performance. This article discusses and analyzes the impact of training set preparation and tailoring on the performance of a final forecasting model used in a real-world example from the retail industry.
In this paper, we show that frequent closed itemset mining and biclustering, the two most prominent application fields in pattern discovery, can be reduced to the same problem when dealing with binary (0-1) data. FCPMiner, a powerful new pattern mining method, is then introduced to mine such data efficiently. What distinguishes the proposed method is its extensibility to non-binary data. The mining method is coupled with a novel visualization technique and a pattern aggregation method to detect the most meaningful, non-overlapping patterns. The proposed methods are rigorously tested on both synthetic and real data sets. © 2014 Elsevier Ltd. All rights reserved.
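The link between closed itemsets and biclusters on binary data can be illustrated with a brute-force sketch. This is not FCPMiner, whose internals the abstract does not describe; it simply enumerates itemsets and keeps those equal to their own closure, so each result pairs a set of columns with the rows that contain all of them. The function names and the toy rows are hypothetical.

```python
from itertools import combinations


def supporting_rows(rows, itemset):
    """Indices of rows (sets of column labels) that contain every item."""
    return frozenset(i for i, row in enumerate(rows) if itemset <= row)


def closed_frequent_itemsets(rows, min_support):
    """Map each closed frequent itemset to the indices of its supporting rows.

    An itemset is closed when it equals the intersection of all rows that
    contain it, i.e. no further column is shared by every supporting row.
    Exponential in the number of columns, so suitable for toy inputs only.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted(set().union(*rows))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            hit = supporting_rows(rows, itemset)
            if len(hit) < min_support:
                continue
            closure = frozenset.intersection(*(frozenset(rows[i]) for i in hit))
            if closure == itemset:
                closed[itemset] = hit
    return closed


if __name__ == "__main__":
    # Each row lists the columns holding a 1 in a binary matrix (toy data).
    matrix = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
    for cols, idx in closed_frequent_itemsets(matrix, 2).items():
        print(sorted(cols), sorted(idx))
```

Each returned pair of column set and row set describes an all-ones submatrix that cannot be extended by another column without losing a row, which is why the two problems coincide on 0-1 data.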