With the popularity of the Internet and the growing number of netizens, effectively monitoring and predicting negative network public opinion on social media platforms is of great significance. Most mainstream early warning models for negative network public opinion focus on mining textual features while neglecting the responses and dissemination among netizens as negative public opinion ferments, which is one of the main reasons for their accuracy bottleneck. This article proposes a negative network public opinion early warning model (EFM) based on evolutionary feature mining. First, entity linking based on knowledge graphs supplements the text to be analyzed with domain prior knowledge. Then, the text sentiment features extracted by a long short-term memory network and the evolutionary features extracted by a graph convolutional network are fused through an enhanced-feature fusion method. Finally, the fused features are used for classification and early warning. Experimental results on the Weibo public dataset show that the EFM model achieves higher early warning performance for negative network public opinion than the baseline models, with an accuracy of 0.959.
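The abstract above fuses two feature views (LSTM sentiment features and GCN evolutionary features) before classification. As a minimal sketch of what such a fusion step could look like, the snippet below combines two vectors by a scalar gate and concatenation; the paper's actual enhanced-feature fusion method is not specified here, so the gate and the feature names are purely hypothetical stand-ins.

```python
import numpy as np

def fuse_features(text_feat, evo_feat):
    """Fuse a text-sentiment vector and an evolutionary vector.

    Illustrative weighted-concatenation fusion only; the paper's
    enhanced-feature fusion is not reproduced here.
    """
    # Simple sigmoid gate computed from the mean activations of both views.
    gate = 1.0 / (1.0 + np.exp(-(text_feat.mean() + evo_feat.mean())))
    return np.concatenate([gate * text_feat, (1.0 - gate) * evo_feat])

text_feat = np.array([0.2, 0.8, 0.5])   # e.g. an LSTM sentence embedding
evo_feat = np.array([0.1, 0.4])         # e.g. a GCN propagation embedding
fused = fuse_features(text_feat, evo_feat)
print(fused.shape)  # (5,)
```

The fused vector would then feed a downstream classifier for the warning decision.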
Incremental learning has emerged to address the problem of incrementally updating a classification model as the number of data classes grows. Incremental learning faces many challenges, such as catastrophic forgetting and learning efficiency. In this paper, we present a modulation recognition method based on incremental learning that allows continuous learning in a class-incremental way: new classes can be added to the existing model progressively from a sequential data stream. We conduct experiments on a modulation signal dataset characterized by constellation diagrams, and the experimental results demonstrate the feasibility of our incremental learning system. Our method performs similarly to common multi-task joint training in classification accuracy, but better in training efficiency.
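To make the class-incremental setting concrete, here is a minimal nearest-class-mean classifier whose set of classes grows as new data arrives, with no retraining on earlier classes. This only illustrates the setting the abstract describes; the paper's constellation-diagram features and its strategy against catastrophic forgetting are not reproduced, and the class labels are hypothetical.

```python
import numpy as np

class IncrementalNCM:
    """Nearest-class-mean classifier that admits new classes on the fly."""

    def __init__(self):
        self.means = {}  # class label -> prototype vector

    def add_class(self, label, samples):
        # Learn a new class from its own samples alone (class-incremental).
        self.means[label] = np.mean(samples, axis=0)

    def predict(self, x):
        # Assign x to the class with the nearest prototype.
        return min(self.means,
                   key=lambda c: np.linalg.norm(x - self.means[c]))

clf = IncrementalNCM()
clf.add_class("BPSK", np.array([[0.0, 1.0], [0.2, 0.8]]))
clf.add_class("QPSK", np.array([[1.0, 0.0], [0.8, 0.2]]))  # added later
print(clf.predict(np.array([0.1, 0.9])))  # BPSK
```

Because each prototype depends only on its own class's data, adding a class never overwrites what was learned before, which is the basic appeal of this family of incremental learners.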
ISBN (digital): 9781728107707
ISBN (print): 9781728107714
Reducing accident severity is an effective means to improve road safety. Much research has been done to identify the risky features that influence accident severity. Many such features need to be considered when building an accident severity analysis model, including driver, highway, vehicle, accident, and atmospheric factors. Some of these features are irrelevant or redundant; using them would degrade the performance of the prediction model and bring additional computational burden. However, to date there has been very little research on feature selection for the accident severity analysis problem. In this paper, we propose a particle swarm optimization (PSO) based feature selection method for accident severity analysis. The proposed method obtains a reduced feature subset from the original feature pool. To validate the method, accident data from Beijing from 2008 to 2010 are used for experiments. Experimental results show that the proposed PSO-based feature selection method can significantly reduce the number of features while improving classification accuracy. Moreover, it provides better interpretability of the accident severity analysis model.
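As a rough illustration of PSO-based feature selection, the sketch below runs a tiny binary PSO over 0/1 feature masks, where a sigmoid of the velocity gives each bit's flip probability. The fitness function here is a toy stand-in (it rewards three "risky" features and penalizes subset size); the paper's wrapper fitness, parameters, and accident data are not reproduced.

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=20, n_iter=50, seed=0):
    """Tiny binary PSO for feature-subset search (illustrative only)."""
    rng = random.Random(seed)
    # Each particle is a 0/1 mask over features, with a velocity per bit.
    swarm = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(n_iter):
        for i, p in enumerate(swarm):
            for d in range(n_features):
                # Velocity update with cognitive and social pulls, clamped.
                v = vel[i][d] + 2.0 * rng.random() * (pbest[i][d] - p[d]) \
                              + 2.0 * rng.random() * (gbest[d] - p[d])
                vel[i][d] = max(-6.0, min(6.0, v))
                # Sigmoid transfer turns velocity into a bit probability.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                p[d] = 1 if rng.random() < prob else 0
            if fitness(p) > fitness(pbest[i]):
                pbest[i] = p[:]
        gbest = max(pbest + [gbest], key=fitness)[:]
    return gbest

# Toy fitness: features 0-2 are informative, the rest are noise; a small
# penalty rewards smaller subsets, as in wrapper feature selection.
def fitness(mask):
    return sum(mask[:3]) - 0.1 * sum(mask)

best = binary_pso(6, fitness)
print(best)
```

In a real wrapper setting the fitness would instead train a classifier on the masked feature set and return its validation accuracy.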
When facing varied and massive data resources, how to effectively utilize them according to their domain is one of the core problems of institutional repository research. In this paper, we improve the Bayesian classification algorithm and propose a text classification algorithm based on domain knowledge. Key technologies, including text classification, feature selection, weight improvement, and the domain knowledge algorithm improvement, are designed and implemented. We use the widely applied IKAnalyzer method to segment Chinese words. For feature selection and weight improvement, we focus on the processing of special vocabulary in the document. In the domain application part, we introduce a domain-expanded vocabulary to assist the Bayesian formula in obtaining the final result. The experimental results show that the improved algorithm efficiently enhances classification accuracy, and the system's computation time is acceptable.
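One plausible reading of "domain knowledge assisting the Bayesian formula" is a Naive Bayes classifier in which domain-vocabulary terms carry extra weight. The sketch below implements that idea as a minimal illustration; the paper's exact weighting scheme is not reproduced, the tokens are assumed to be pre-segmented (the IKAnalyzer step is omitted), and the boost factor and example vocabulary are hypothetical.

```python
import math
from collections import Counter, defaultdict

class DomainNB:
    """Multinomial Naive Bayes with a domain-vocabulary weight boost."""

    def __init__(self, domain_vocab, boost=2.0):
        self.domain_vocab = set(domain_vocab)
        self.boost = boost                      # hypothetical boost factor
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def weight(self, word):
        # Domain terms count more than ordinary terms.
        return self.boost if word in self.domain_vocab else 1.0

    def fit(self, docs, labels):
        for tokens, y in zip(docs, labels):
            self.class_counts[y] += 1
            for w in tokens:
                self.word_counts[y][w] += self.weight(w)
                self.vocab.add(w)

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for y in self.class_counts:
            lp = math.log(self.class_counts[y] / total_docs)
            denom = sum(self.word_counts[y].values()) + len(self.vocab)
            for w in tokens:
                # Laplace-smoothed likelihood, scaled by the domain weight.
                lp += self.weight(w) * math.log(
                    (self.word_counts[y][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = y, lp
        return best

nb = DomainNB(domain_vocab={"repository", "metadata"})
nb.fit([["repository", "storage"], ["game", "score"]], ["library", "sports"])
print(nb.predict(["repository", "metadata"]))  # library
```

The boost makes domain terms dominate the posterior, which is the intended effect of folding field knowledge into the Bayesian formula.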
Private Set Intersection (PSI) is one of the most important functionalities in secure multiparty computation (MPC). PSI protocols have become a practical cryptographic primitive, and many privacy-preserving applications are based on them, such as computing advertising conversions and distributed computation. Private Set Intersection Cardinality (PSI-CA) is a useful variant of PSI. PSI and PSI-CA allow several parties, each holding a private set, to jointly compute the intersection and its cardinality, respectively, without leaking any additional information. Most PSI protocols today focus on two-party settings, whereas multiparty settings let parties share more valuable information and are thus more desirable. On the other hand, with the advent of cloud computing, delegating computation to an untrusted server has become an interesting problem. However, most existing delegated PSI protocols cannot efficiently scale to multiple clients. To solve these problems, this paper proposes MDPPC, an efficient protocol that supports scalable multiparty delegated PSI and PSI-CA operations. Security analysis shows that MDPPC is secure against semi-honest adversaries and tolerates any number of colluding clients. For 15 parties with a set size of 2^20 on the server side and 2^16 on the client side, MDPPC costs only 81 seconds for PSI and 80 seconds for PSI-CA. The experimental results show that MDPPC has high scalability.
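To make the delegated PSI/PSI-CA functionality concrete, here is a plain sketch in which clients upload salted hashes of their sets and the server intersects them. This illustrates only the *functionality* the abstract describes; it is emphatically NOT the MDPPC protocol and offers no real security (a salt shared with the server allows dictionary attacks on the items).

```python
import hashlib

def salted_hashes(items, salt):
    """Hash each item under a shared salt before uploading."""
    return {hashlib.sha256((salt + x).encode()).hexdigest() for x in items}

def delegated_psi(client_sets, salt="shared-secret"):
    # Each client uploads its hashed set; the untrusted server intersects.
    uploads = [salted_hashes(s, salt) for s in client_sets]
    inter = set.intersection(*uploads)
    return inter, len(inter)  # PSI output (hashed) and PSI-CA output

sets = [{"alice", "bob", "carol"},
        {"bob", "carol", "dave"},
        {"carol", "bob"}]
digest, cardinality = delegated_psi(sets)
print(cardinality)  # 2  ("bob" and "carol")
```

A real protocol such as MDPPC replaces the shared salt with cryptographic machinery so that the server learns nothing beyond the agreed output, even with colluding clients.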
Dangling pointer errors are pervasive in C/C++ programs and very hard to detect. This paper introduces an efficient detector for dangling pointer errors in C/C++ programs. By selectively leaving some memory accesses unmonitored, our method reduces the memory monitoring overhead and thus achieves better performance than previous methods. Experiments show that our method achieves an average speedup of 9% over a previous compiler-instrumentation-based method and more than 50% over a previous page-protection-based method.
Dynamic searchable symmetric encryption (DSSE) enables users to delegate the keyword search over dynamically updated encrypted databases to an honest-but-curious server without losing keyword privacy. This paper studi...
Super spreaders are flows that have a large number of distinct connections (also called spread), and they are related to many network threats. Estimating flow spread is the crucial step in super spreader detection. However, existing methods cannot achieve flow spread estimation that is simultaneously accurate, efficient, and reversible, all of which are highly required for high-speed network measurement. In this paper, we propose MorphSketch, a new data structure that estimates flow spread for super spreader detection with high accuracy, memory efficiency, high throughput, and reversibility. MorphSketch combines hashing with sampling to process packets in order to improve throughput. It uses a self-morph bitmap to record spread information, which can adaptively enlarge the upper bound of spread estimation under limited memory usage to ensure accuracy and memory efficiency. Moreover, MorphSketch can track candidate super spreaders by comparing the corresponding spread information, which realizes reversibility in super spreader detection. We perform a series of performance evaluations on real-world traffic traces. The experimental results demonstrate that under the same memory usage, MorphSketch significantly outperforms existing work in terms of accuracy and efficiency.
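Bitmap-based spread estimation, which MorphSketch builds on, can be illustrated with classic linear counting: hash each (flow, destination) pair into a bitmap and estimate the number of distinct destinations from the fraction of zero bits. The sketch below shows only that building block; the paper's self-morph enlargement and reversible candidate tracking are not reproduced, and the bitmap size is arbitrary.

```python
import hashlib
import math

class SpreadBitmap:
    """Linear-counting bitmap for per-flow spread estimation (sketch)."""

    def __init__(self, bits=64):
        self.bits = bits
        self.bitmap = [0] * bits

    def record(self, flow, element):
        # Hash the (flow, element) pair to a bit position and set it.
        h = hashlib.sha256(f"{flow}:{element}".encode()).digest()
        self.bitmap[int.from_bytes(h[:4], "big") % self.bits] = 1

    def estimate(self):
        zeros = self.bitmap.count(0)
        if zeros == 0:
            # Bitmap saturated; this is where MorphSketch would enlarge.
            return float("inf")
        # Linear-counting estimator: n ~ -m * ln(V), V = zero fraction.
        return -self.bits * math.log(zeros / self.bits)

bm = SpreadBitmap(bits=64)
for dst in range(20):            # 20 distinct destinations for one source
    bm.record("10.0.0.1", dst)
print(round(bm.estimate()))      # close to 20
```

The estimator saturates once the bitmap fills, which is exactly the limitation the self-morph bitmap addresses by adaptively raising the estimation upper bound.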
SMT solvers check the satisfiability of logic formulas over first-order theories and have been utilized in many critical applications, such as software verification, test case generation, and program synthesis. Bugs hidden in SMT solvers can severely mislead those applications and further cause severe consequences. Therefore, ensuring the reliability and robustness of SMT solvers is of critical importance. Although many approaches have been proposed to test SMT solvers, discovering bugs effectively remains a challenge. To tackle this challenge, we conduct an empirical study on the historical bug-triggering formulas in SMT solvers' bug tracking systems. We observe that the historical bug-triggering formulas contain valuable skeletons (i.e., core structures of formulas) as well as associated atomic formulas that can significantly affect the formulas' ability to trigger bugs. Therefore, we propose a novel approach that utilizes the skeletons extracted from the historical bug-triggering formulas and enumerates atomic formulas under the guidance of association rules derived from historical formulas. We realized our approach as a practical fuzzing tool, HistFuzz, and conducted extensive testing on the well-known SMT solvers Z3 and cvc5. To date, HistFuzz has found 111 confirmed new bugs in Z3 and cvc5, of which 108 have been fixed by the developers. More notably, 23 of the confirmed bugs are soundness bugs and invalid model bugs found in the solvers' default modes, which are essential for SMT solvers. In addition, our experiments demonstrate that HistFuzz outperforms state-of-the-art SMT solver fuzzers in terms of achieved code coverage and effectiveness.
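A toy version of the skeleton idea is easy to show on SMT-LIB text: keep the core structure of an assertion and abstract the concrete atoms into holes that a fuzzer can later refill with enumerated atomic formulas. The regex-based sketch below is a simplification for illustration only; HistFuzz's real extraction and its association-rule guidance are far more involved, and the keyword list here is partial.

```python
import re

def extract_skeleton(formula):
    """Abstract identifiers and literals in an SMT-LIB string into holes."""
    # Partial keyword list for illustration; real SMT-LIB has many more.
    keywords = {"assert", "and", "or", "not", "=", "<", ">", "<=", ">=",
                "+", "-", "*", "ite", "true", "false"}

    def repl(m):
        tok = m.group(0)
        return tok if tok in keywords else "[hole]"

    # Match identifiers, integer literals, and operator tokens.
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[=<>+*-]+", repl, formula)

skeleton = extract_skeleton("(assert (and (= x 3) (not (< y 10))))")
print(skeleton)
# (assert (and (= [hole] [hole]) (not (< [hole] [hole]))))
```

Each `[hole]` marks a slot where the fuzzer could substitute atomic formulas drawn from historical bug-triggering patterns, preserving the skeleton that made the original formula interesting.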
Learning Bayesian networks is NP-hard. Even with recent progress in heuristic and parallel algorithms, modeling capabilities still fall short of the scale of the problems encountered. In this paper, we present a massively parallel method for Bayesian network structure learning and demonstrate its capability by constructing genome-scale gene networks of the model plant Arabidopsis thaliana from over 168.5 million gene expression values. We report a strong scaling efficiency of 75% and demonstrate scaling to 1.57 million cores of the Tianhe-2 supercomputer. Our results constitute three and five orders of magnitude increases over previously published results in the scale of data analyzed and computations performed, respectively. We achieve this through algorithmic innovations, using efficient techniques to distribute work across all compute nodes, all available processors and coprocessors on each node, and all available threads on each processor and coprocessor, together with vectorization techniques to maximize single-thread performance.