It is worth noting that the number of scientific researchers in China has grown rapidly. Meanwhile, the strict restriction on the total number and the position structure of researchers has exert...
Most existing anomaly detection algorithms are suitable for static data, where all data are available during detection, but cannot handle dynamic data streams. In this study, we propose an improved iLOF (incremental local outlier factor) algorithm based on the landmark window model, which provides an efficient method for anomaly detection in data streams and outperforms conventional methods. Moreover, data windows are introduced as updating units to reduce the false alarm rate, and multiple tests are applied to distinguish candidate anomalies from real anomalies. The improved iLOF shows a clear advantage in its false positive rate. Furthermore, the proposed algorithm immediately deletes the data points of identified real anomalies. We analyzed the performance of the improved algorithm and the sensitivity of certain parameters through empirical experiments on synthetic and real data sets. The experimental results demonstrate that the proposed algorithm achieves a higher detection rate and a lower false alarm rate than the original iLOF algorithm and its improved variants.
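The window-based detection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it recomputes plain LOF over every point retained since the landmark each time a full data window arrives (it does not implement iLOF's incremental update), and it assumes a "multiple test" means a point must exceed a LOF threshold in two consecutive window tests before it is confirmed and deleted. The class `LandmarkWindowDetector` and its parameters `k`, `window_size`, and `threshold` are hypothetical.

```python
import math


def lof_scores(points, k):
    """Brute-force local outlier factor (LOF) of every point; O(n^2) distances."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding itself (ties broken by index)
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))
    # LOF: mean neighbour density divided by the point's own density
    return [sum(lrd[o] for o in neigh[i]) / (k * lrd[i]) for i in range(n)]


class LandmarkWindowDetector:
    """Hypothetical sketch: LOF over all points kept since a landmark,
    re-scored once per full data window (not iLOF's incremental update)."""

    def __init__(self, k=3, window_size=5, threshold=1.5):
        self.k = k
        self.window_size = window_size
        self.threshold = threshold      # LOF above this marks a point as outlying
        self.retained = {}              # point id -> point, kept since the landmark
        self.candidates = set()         # ids flagged by the previous test
        self.anomalies = []             # confirmed (real) anomalies, in order
        self._buffer = []
        self._next_id = 0

    def add(self, point):
        self._buffer.append(point)
        if len(self._buffer) == self.window_size:
            self._process_window()

    def _process_window(self):
        for p in self._buffer:
            self.retained[self._next_id] = p
            self._next_id += 1
        self._buffer = []
        ids = list(self.retained)
        if len(ids) <= self.k:
            return
        scores = lof_scores([self.retained[i] for i in ids], self.k)
        flagged = {i for i, s in zip(ids, scores) if s > self.threshold}
        # assumption: a candidate flagged again by the next test is a real anomaly
        confirmed = flagged & self.candidates
        for i in sorted(confirmed):
            self.anomalies.append(self.retained.pop(i))  # delete it immediately
        self.candidates = flagged - confirmed
```

Feeding a small cluster plus one distant point, the distant point becomes a candidate after the first window and is confirmed and removed after the second, while the cluster points stay below the threshold.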
Many organizations recognize the necessity of using sophisticated tools and systems to protect their computer networks and reduce the risk of their information being compromised. Although many machine learning-based data classification algorithms have been proposed for the network intrusion detection problem, each has its own strengths and weaknesses. In this paper, we propose an effective intrusion detection framework that uses a new adaptive, robust, and precise optimization method, namely time varying chaos particle swarm optimization (TVCPSO), to simultaneously perform parameter setting and feature selection for multiple criteria linear programming (MCLP) and the support vector machine (SVM). In the proposed methods, a weighted objective function is provided that takes into account the trade-off between maximizing the detection rate and minimizing the false alarm rate, while also considering the number of features. Furthermore, to make the particle swarm optimization (PSO) algorithm find the optimum faster and to keep the search from being trapped in a local optimum, a chaotic concept is adopted in PSO, and a time-varying inertia weight and time-varying acceleration coefficients are introduced. The performance of the proposed methods has been evaluated through experiments on the NSL-KDD dataset, which is derived and modified from the well-known KDD Cup 99 data set. The empirical results show that the proposed method achieves a higher detection rate and a lower false alarm rate than the results obtained using all features. (C) 2016 Elsevier B.V. All rights reserved.
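The optimizer ingredients named above can be sketched as a plain-Python PSO. This is an illustrative sketch, not the paper's TVCPSO: it assumes a linearly decreasing inertia weight, linearly varying acceleration coefficients (c1 from 2.5 to 0.5, c2 from 0.5 to 2.5), and the logistic map replacing the uniform random numbers r1 and r2, which is one common way of adding chaos to PSO. The function `weighted_fitness` and its weights are hypothetical stand-ins for the paper's weighted objective; since it is maximized and `tvcpso` minimizes, one would pass its negation.

```python
import random


def logistic_map(z):
    """Chaotic logistic map z -> 4 z (1 - z) on (0, 1)."""
    return 4.0 * z * (1.0 - z)


def weighted_fitness(detection_rate, false_alarm_rate, n_selected, n_total,
                     w_dr=0.6, w_far=0.3, w_feat=0.1):
    """Hypothetical weighted objective (to maximise); the weights are assumptions."""
    return (w_dr * detection_rate + w_far * (1.0 - false_alarm_rate)
            + w_feat * (1.0 - n_selected / n_total))


def tvcpso(objective, dim, bounds, n_particles=20, iters=200,
           w_max=0.9, w_min=0.4, c1_range=(2.5, 0.5), c2_range=(0.5, 2.5), seed=0):
    """Minimise `objective` with PSO using a time-varying inertia weight and
    acceleration coefficients, and logistic-map numbers in place of uniform r1, r2."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = 0.2 * (hi - lo)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    z = rng.uniform(0.01, 0.99)  # chaotic state

    def chaos():
        nonlocal z
        z = logistic_map(z)
        if z < 1e-9 or z > 1 - 1e-9 or abs(z - 0.75) < 1e-9:
            z = rng.uniform(0.01, 0.99)  # re-seed if the orbit collapses
        return z

    for t in range(iters):
        frac = t / max(iters - 1, 1)
        w = w_max - (w_max - w_min) * frac              # time-varying inertia
        c1 = c1_range[0] + (c1_range[1] - c1_range[0]) * frac
        c2 = c2_range[0] + (c2_range[1] - c2_range[0]) * frac
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = chaos(), chaos()
                v = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Decreasing the inertia weight while shifting weight from the cognitive term (c1) to the social term (c2) favors exploration early and convergence late; the chaotic sequence is intended to diversify the search.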
We find statistically significant negative overnight returns in China's securities markets, in contrast to the previous research on the HS300 Index by He et al. (2013), and the negative overnight returns are larger in China's GEM (Growth Enterprise Market) board and SME (Small and Medium Enterprise) board than in the main boards of the Shanghai and Shenzhen securities markets. We also find that some of the SWS Primary Sectors have negative overnight returns after market effects are stripped out, which can guide investment in hedging portfolios of specific sectors.
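The quantities involved can be sketched in a few lines. This assumes the common definition of the overnight log return, r_t = ln(open_t / close_{t-1}), a one-sample t statistic for testing whether its mean is zero, and an OLS-beta adjustment as one hypothetical way of stripping out market effects; the abstract does not state the exact definitions the authors use, and the price series in the usage example are made up.

```python
import math
from statistics import mean, stdev


def overnight_returns(opens, closes):
    """Log overnight return r_t = ln(open_t / close_{t-1}) for t = 1..n-1."""
    return [math.log(o / c) for o, c in zip(opens[1:], closes[:-1])]


def t_statistic(xs):
    """One-sample t statistic for H0: mean overnight return = 0."""
    return mean(xs) / (stdev(xs) / math.sqrt(len(xs)))


def market_adjusted(sector, market):
    """Hypothetical adjustment: sector return minus beta * market return (OLS beta)."""
    mx, my = mean(market), mean(sector)
    cov = sum((x - mx) * (y - my) for x, y in zip(market, sector))
    var = sum((x - mx) * (x - mx) for x in market)
    beta = cov / var
    return [y - beta * x for x, y in zip(market, sector)]


# Made-up prices: every open is below the previous close.
opens = [10.0, 9.9, 9.8, 9.7]
closes = [10.0, 10.0, 10.0, 10.0]
r = overnight_returns(opens, closes)
print(r, t_statistic(r))
```

A significantly negative t statistic on the (possibly market-adjusted) overnight series is what "statistically significant negative overnight returns" refers to.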
Metric learning and SVM, two popular data mining methods, share an interesting and valuable internal relationship. The basic idea of metric learning is to learn a data-dependent metric, instead of the Euclidean metric, that shrinks the distances between similar points and enlarges the distances between dissimilar points. From a different viewpoint, LSSVM can reach a goal similar to that of metric learning: it finds two parallel hyperplanes that make the distances between points and their corresponding hyperplane as small as possible and the distance between the two hyperplanes as large as possible. LSSVM can therefore be viewed as a slack version of metric learning. It can then be improved by modifying how the between-class distance is measured, which leads to our novel approach, ML-LSSVM, which adds constraints on the inter-class distance to LSSVM. The alternating direction method of multipliers (ADMM) is implemented to solve ML-LSSVM efficiently, which is much faster than solving the original convex quadratic programming problem. Experiments validate the efficacy of ML-LSSVM and show that different measurements of intra-class and inter-class distance have a significant impact on classification. Finally, the relation between LMNN and ML-LSSVM is discussed to show that the local formulation of LMNN is equivalent to ML-LSSVM.
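For reference, the plain LSSVM baseline that ML-LSSVM builds on reduces training to one linear system. The sketch below assumes the standard LS-SVM classifier formulation of Suykens and Vandewalle, with the KKT system [[0, yᵀ], [y, Ω + I/γ]] [b; α] = [0; 1] and Ω_kl = y_k y_l K(x_k, x_l), a linear kernel, and a small pure-Python solver. It does not implement ML-LSSVM's inter-class distance constraints or its ADMM solver.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))


def lssvm_train(X, y, gamma=10.0, kernel=linear_kernel):
    """Standard LS-SVM classifier: solve the KKT system for bias b and multipliers alpha."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            # Omega_kl = y_k y_l K(x_k, x_l), plus I / gamma on the diagonal
            row.append(y[k] * y[l] * kernel(X[k], X[l]) + (1.0 / gamma if k == l else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def lssvm_predict(x, X, y, b, alpha, kernel=linear_kernel):
    s = sum(a * yk * kernel(x, xk) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

Because every training point enters through an equality constraint with a squared slack, all multipliers are generally nonzero; this least-squares character is what makes the distance-based reading of LSSVM in the abstract possible.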
Feature noise, namely noise on the inputs, is a long-standing plague for the support vector machine (SVM). The conventional SVM with the hinge loss (C-SVM) is sparse but sensitive to feature noise. In contrast, the pinball loss SVM (pin-SVM) enjoys noise robustness but loses sparsity completely. To bridge the gap between C-SVM and pin-SVM, we propose the truncated pinball loss SVM ($\overline{\mathrm{pin}}$-SVM) in this paper. It provides a flexible framework for trading off sparsity against insensitivity to feature noise. Theoretical properties, including the Bayes rule, misclassification error bound, sparsity, and noise insensitivity, are discussed in depth. To train $\overline{\mathrm{pin}}$-SVM, the concave-convex procedure (CCCP) is used to handle the non-convexity, and the decomposition method is used to solve the subproblem of each CCCP iteration. Accordingly, we modify the popular solver LIBSVM to conduct experiments, and the numerical results on synthetic and real-world data sets validate the properties of $\overline{\mathrm{pin}}$-SVM. (C) 2017 Elsevier Ltd. All rights reserved.
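The three losses can be compared as functions of the margin u = y f(x). The hinge and pinball losses below are the standard forms. The truncated version is an assumed form, not necessarily the paper's exact parameterization: it caps the pinball penalty for u > 1 at a level s. Written this way it is a difference of two convex functions, which is the structure CCCP exploits.

```python
def hinge(u):
    """Hinge loss: zero for u >= 1, which yields sparsity."""
    return max(0.0, 1.0 - u)


def pinball(u, tau):
    """Pinball loss: also penalises u > 1 with slope tau (noise-robust, not sparse)."""
    return max(1.0 - u, tau * (u - 1.0))


def truncated_pinball(u, tau, s):
    """Assumed truncated pinball loss: the penalty for u > 1 is capped at s."""
    return 1.0 - u if u <= 1.0 else min(tau * (u - 1.0), s)


def cap_excess(u, tau, s):
    """Convex term removed by truncation: truncated = pinball - cap_excess."""
    return max(0.0, tau * (u - 1.0) - s)


for u in (-1.0, 0.5, 1.0, 1.2, 2.0, 3.0):
    print(u, hinge(u), pinball(u, 0.5), truncated_pinball(u, 0.5, 0.2))
```

In a CCCP iteration, the concave part (the negated `cap_excess`) is replaced by its linearization at the current solution, so each subproblem is a convex pinball-type SVM problem.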
A rational translational surface is a typical modeling surface used in computer-aided design and the architecture industry. In this study, we determine whether a given algebraic surface V, defined implicitly, is a rational translational surface. This problem is reduced to finding the rational parameterizations of two space curves. More importantly, our discussion is constructive: if V is translational, we provide a parametric representation of V of the form $P(t_1, t_2) = P_1(t_1) + P_2(t_2)$. (C) 2017 Elsevier B.V. All rights reserved.
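A small instance shows what the form P(t_1, t_2) = P_1(t_1) + P_2(t_2) means. The example is chosen for illustration and does not come from the paper: the cylinder x² + y² = 1 is the sum of the rationally parameterized unit circle P_1 and the line P_2. The sketch only checks, with exact rational arithmetic, that a given translational parameterization satisfies the implicit equation. It does not implement the paper's algorithm for deciding whether an arbitrary V admits such a form.

```python
from fractions import Fraction


def P1(t1):
    """Hypothetical first curve: rational parameterization of the unit circle."""
    t = Fraction(t1)
    d = 1 + t * t
    return ((1 - t * t) / d, 2 * t / d, Fraction(0))


def P2(t2):
    """Hypothetical second curve: the z-axis line."""
    return (Fraction(0), Fraction(0), Fraction(t2))


def P(t1, t2):
    """Translational parameterization P(t1, t2) = P1(t1) + P2(t2)."""
    return tuple(a + b for a, b in zip(P1(t1), P2(t2)))


def F(x, y, z):
    """Implicit equation of V: x^2 + y^2 - 1 = 0 (a circular cylinder)."""
    return x * x + y * y - 1


print(P(Fraction(1, 2), 3), F(*P(Fraction(1, 2), 3)))
```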
The task of math word problem (MWP) generation, which generates an MWP from a given equation and relevant topic words, has increasingly attracted researchers' attention. In this work, we introduce a simple memory retrieval module that searches for related training MWPs, which are then used to augment generation. To retrieve more relevant training data, we also propose a disentangled memory retrieval module built on the simple one. Specifically, we first disentangle the training MWPs into logical descriptions and scenario descriptions and record them in separate memory modules. We then use the given equation and topic words as queries to retrieve relevant logical descriptions and scenario descriptions, respectively, from the corresponding memory modules. The retrieved results are used to complement MWP generation. Extensive experiments and ablation studies verify the superior performance of our method and the effectiveness of each proposed module. The code is available at https://***/mwp-g/MWPG-DMR.
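The disentangled retrieval step can be sketched as two separate key-value memories queried independently. This is a toy illustration, not the authors' module: it assumes bag-of-tokens cosine similarity in place of learned representations. The `Memory` class, `build_memories`, and the two training examples are all hypothetical. The logical memory is keyed by equation tokens, and the scenario memory is keyed by topic words.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-tokens counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


class Memory:
    """Hypothetical key-value memory: keys are token lists, values are text."""

    def __init__(self):
        self.entries = []

    def write(self, key_tokens, value):
        self.entries.append((key_tokens, value))

    def retrieve(self, query_tokens, top_k=1):
        ranked = sorted(self.entries, key=lambda e: cosine(query_tokens, e[0]),
                        reverse=True)
        return [value for _, value in ranked[:top_k]]


def build_memories(training_mwps):
    """Store logical descriptions under equation tokens, scenarios under topic words."""
    logical, scenario = Memory(), Memory()
    for mwp in training_mwps:
        logical.write(mwp["equation"].split(), mwp["logic"])
        scenario.write(mwp["topics"], mwp["scenario"])
    return logical, scenario


TRAIN = [  # made-up, already disentangled training MWPs
    {"equation": "x = 3 * 4", "topics": ["apple", "basket"],
     "logic": "There are 3 groups with 4 items each.",
     "scenario": "Tom packs apples into baskets."},
    {"equation": "x = 10 - 4", "topics": ["car", "parking"],
     "logic": "Some items leave a group of 10.",
     "scenario": "Cars drive out of a parking lot."},
]
logical_mem, scenario_mem = build_memories(TRAIN)
```

Querying with the equation `x = 5 * 6` and the topic words `car`, `garage` retrieves the multiplication logic from one memory and the car scenario from the other, so the two retrieved pieces can come from different training problems.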
ISBN (print): 9781450361729
Personalized search aims to adapt document ranking to a user's personal interests. Traditionally, this is done by extracting click and topical features from historical data to construct a user profile. In recent years, deep learning has been used successfully in personalized search because it can learn features automatically. However, the small amount of noisy personal data makes it difficult for deep learning models to learn the personalized classification boundary between relevant and irrelevant results. In this paper, we propose PSGAN, a Generative Adversarial Network (GAN) framework for personalized search. Through adversarial training, we force the model to pay more attention to training data that are difficult to distinguish. We use the discriminator to evaluate the personalized relevance of documents and the generator to learn the distribution of relevant documents. We test two alternative ways of constructing the generator in the framework: based on the current query, or based on a set of generated queries. Experiments on data from a commercial search engine show that our models yield significant improvements over state-of-the-art models.
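The discriminator/generator roles described above can be sketched for a single query's candidate list. This is an IRGAN-style illustration, not PSGAN itself, and several details are assumptions. The generator is a softmax over candidate documents, from which it samples hard negatives. The discriminator is trained with binary cross-entropy to separate truly relevant documents from generated ones. The generator is updated by REINFORCE, using the reward log(1 + exp(f)) where f is the discriminator score. Scores are plain floats here rather than outputs of personalized neural models.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_loss(d_scores, relevant, generated):
    """BCE: relevant documents labelled 1, generator samples labelled 0."""
    loss = 0.0
    for i in relevant:
        loss -= math.log(sigmoid(d_scores[i]))
    for j in generated:
        loss -= math.log(1.0 - sigmoid(d_scores[j]))
    return loss / (len(relevant) + len(generated))


def generator_step(g_scores, d_scores, n_samples, lr=0.1, rng=random):
    """One REINFORCE ascent step on the generator's document scores."""
    probs = softmax(g_scores)
    samples = rng.choices(range(len(g_scores)), weights=probs, k=n_samples)
    grads = [0.0] * len(g_scores)
    for j in samples:
        reward = math.log(1.0 + math.exp(d_scores[j]))  # assumed IRGAN-style reward
        # gradient of log softmax(j) w.r.t. the scores: one_hot(j) - probs
        for k in range(len(g_scores)):
            grads[k] += reward * ((1.0 if k == j else 0.0) - probs[k]) / n_samples
    new_scores = [s + lr * g for s, g in zip(g_scores, grads)]
    return new_scores, samples
```

Because the generator shifts probability toward documents that the discriminator scores as relevant, its samples become harder negatives over time. This is the mechanism by which adversarial training focuses the model on data that are difficult to distinguish.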
The over-smoothing problem is an obstacle to developing deep graph neural networks (GNNs). Although many approaches to alleviating the over-smoothing problem have been proposed, there is still a lack of comprehensive unders...
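Over-smoothing can be seen in a toy experiment. The sketch assumes a simplified GNN layer with no weights or nonlinearity: X ← D⁻¹(A + I)X, where each node averages the features of itself and its neighbors. On a connected graph, repeatedly applying this layer drives all node features toward the same vector. The graph and features are made up for illustration.

```python
def propagate(adj, features):
    """One step X <- D^{-1}(A + I) X: mean over each node and its neighbours."""
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]  # add the self-loop
        out.append([sum(features[j][f] for j in nbrs) / len(nbrs)
                    for f in range(len(features[0]))])
    return out


def spread(features):
    """Largest absolute difference between any node's features and node 0's."""
    return max(max(abs(a - b) for a, b in zip(row, features[0])) for row in features)


# Path graph 0-1-2-3 (no explicit self-loops in the adjacency matrix)
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
H = X
for layer in range(1, 101):
    H = propagate(A, H)
    if layer in (1, 10, 100):
        print(layer, spread(H))
```

The spread shrinks geometrically with depth, at a rate set by the second-largest eigenvalue of D⁻¹(A + I). After many layers, the node representations are nearly indistinguishable, which is what makes naively deep GNNs hard to train.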