ISBN (digital): 9781728143286
ISBN (print): 9781728143293
This paper presents a self-training word embedding model for text classification based on knowledge graph expansion. Current mixed word embedding methods depend too heavily on the FastText pre-trained model, and words with rich semantic information that are missing from its vocabulary remain unmapped. First, we propose a method for extracting missing nouns based on shape-near (visually similar) word filtering. Second, we design a knowledge-graph-based self-training word embedding method whose output is mixed with the pre-trained word embedding to obtain high-quality, semantically rich mixed word vectors. Third, we design a GRU model based on the improved mixed word embedding to improve the quality of text classification. Experiments on multiple text classification datasets demonstrate that our methods effectively improve text classification accuracy.
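As a rough illustration of the mixing step, the sketch below assumes the two embeddings are combined by concatenation and that a word absent from one table contributes a zero vector. The abstract does not state how the vectors are mixed, so the function name `mix_embedding`, its parameters, and the toy tables are all hypothetical.

```python
def mix_embedding(word, pretrained, self_trained, pre_dim, self_dim):
    """Concatenate the pre-trained and self-trained vectors of `word`.

    Assumption: mixing is plain concatenation. A table that lacks the word
    contributes a zero vector, so a word missing from the pre-trained
    vocabulary still gets a non-trivial representation from the
    self-trained (knowledge-graph-based) table.
    """
    pre = pretrained.get(word, [0.0] * pre_dim)
    own = self_trained.get(word, [0.0] * self_dim)
    return list(pre) + list(own)


# Hypothetical toy tables; real vectors would come from FastText and from
# the self-training step described in the abstract.
pretrained = {"bank": [1.0, 2.0]}
self_trained = {"bank": [0.5], "fintech": [0.9]}

in_both = mix_embedding("bank", pretrained, self_trained, 2, 1)
missing_pretrained = mix_embedding("fintech", pretrained, self_trained, 2, 1)
```

Here "fintech" has no pre-trained vector, yet its mixed vector still carries the self-trained component, which is the gap the abstract says the method targets.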
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
In the era of big data, HBase has been widely used in scenarios involving massive unstructured data. Given the integrity and timing characteristics of financial big data, poorly planned data storage and management usually lead to hot spots that degrade query performance. In practice, separating hot and cold financial data improves both query performance and the utilization of cluster resources. In this paper, we design a hot and cold data separation scheme that stores infrequently queried financial data in HBase and frequently queried data in Redis. The cold data is planned and managed through pre-partitioning and row key design for HBase. A hot data cache based on Redis is implemented to improve query speed and reduce the pressure on HBase. In addition, because of the shortcomings of Redis's built-in cache eviction strategy, we propose a caching strategy based on the frequencies of update and query operations. The experimental results show that the scheme effectively avoids the hot spot storage problem, improves query performance, and improves the cache hit ratio of Redis, thereby reducing the number of cold data access requests.
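To make the idea of a frequency-based caching strategy concrete, here is a minimal sketch under stated assumptions. Plain dictionaries stand in for Redis (`hot`) and HBase (`cold`), and the score "query count minus weighted update count" is a hypothetical choice; the abstract says only that the strategy uses the frequencies of update and query operations.

```python
from dataclasses import dataclass, field


@dataclass
class HotColdStore:
    """Toy two-tier store: `hot` stands in for Redis, `cold` for HBase."""
    capacity: int                 # max number of keys kept in the hot tier
    update_weight: float = 1.0    # hypothetical penalty per update
    hot: dict = field(default_factory=dict)
    cold: dict = field(default_factory=dict)
    queries: dict = field(default_factory=dict)
    updates: dict = field(default_factory=dict)
    cold_reads: int = 0           # number of requests that reached the cold tier

    def score(self, key):
        # Assumed scoring: often-queried, rarely-updated keys cache best.
        return (self.queries.get(key, 0)
                - self.update_weight * self.updates.get(key, 0))

    def put(self, key, value):
        self.cold[key] = value
        self.updates[key] = self.updates.get(key, 0) + 1
        if key in self.hot:
            self.hot[key] = value  # keep the cached copy consistent

    def get(self, key):
        self.queries[key] = self.queries.get(key, 0) + 1
        if key in self.hot:
            return self.hot[key]
        self.cold_reads += 1
        value = self.cold[key]
        self._admit(key, value)
        return value

    def _admit(self, key, value):
        if len(self.hot) < self.capacity:
            self.hot[key] = value
            return
        # Evict the lowest-scoring cached key only if the newcomer scores higher.
        victim = min(self.hot, key=self.score)
        if self.score(key) > self.score(victim):
            del self.hot[victim]
            self.hot[key] = value


store = HotColdStore(capacity=1)
store.put("a", 1)
store.put("b", 2)
for _ in range(3):
    store.get("a")
store.get("b")
```

With a capacity of one, the frequently queried key "a" stays cached and the single query for "b" does not displace it, so only two requests reach the cold tier.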
Community detection is a significant research direction in the study of social networks. To improve the quality of seed selection and expansion, we propose an influence seeds extension overlapping community detect...
ISBN (print): 9781509036202
One of the traditional ways for detecting dynamic communities is to find the communities at each interval through the static community detection ***, it usually leads to high computation *** this paper, a novel algorithm based on the MapReduce model and the label propagation process with the strategy of incremental related vertices is proposed, which is called PLPIRV (Parallel Label Propagation and Incremental Related Vertices). Based on the communities found at the previous interval, the new algorithm adjusts the communities the incremental related vertices belong *** clustering of the whole network can be avoided by incrementally analyzing the variation of the networks, so that the time cost can be greatly *** on artificial and real datasets show that the proposed algorithm performs well on dynamic community detection.
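The incremental idea can be sketched on a single machine as follows. This is not the PLPIRV algorithm itself: there is no MapReduce parallelism, and the definition of "related vertices" used here (endpoints of changed edges plus their current neighbours) is an assumption, as are the function name and the tie-breaking rule. Vertices and labels are assumed to be mutually comparable (for example, all integers), and `adj` is assumed to be symmetric.

```python
from collections import Counter


def incremental_label_propagation(adj, labels, changed_edges, max_iters=20):
    """Re-label only the vertices affected by `changed_edges`.

    adj: dict vertex -> set of neighbours at the current interval
    labels: dict vertex -> community label from the previous interval
            (updated in place and returned)
    changed_edges: iterable of (u, v) edges added or removed since then
    """
    # Assumption: related vertices are the endpoints of changed edges
    # plus their current neighbours.
    related = set()
    for u, v in changed_edges:
        for w in (u, v):
            if w in adj:
                related.add(w)
                related.update(adj[w])
    # Vertices unseen at the previous interval start as singletons.
    for v in related:
        labels.setdefault(v, v)

    for _ in range(max_iters):
        changed = False
        for v in sorted(related):
            counts = Counter(labels[n] for n in adj.get(v, ()) if n in labels)
            if not counts:
                continue
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label on ties, otherwise take the smallest.
            new = labels[v] if labels[v] in candidates else candidates[0]
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


# Two triangles from the previous interval; vertex 7 then joins the second.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2},
       4: {5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5}, 7: {4, 5}}
labels = {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}
result = incremental_label_propagation(adj, labels, [(7, 4), (7, 5)])
```

Only vertices 4 to 7 are revisited; the first triangle is never touched, which is the source of the time saving the abstract describes.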
Finding communities in networks is one of the challenging issues in complex network research. We have to deal with very large networks that contain billions of vertices, which makes community discovery a computational...
We propose two simple and effective spiking neuron models to improve the response time of the conventional spiking neural network. The proposed neuron models adaptively tune the presynaptic input current depending on ...
ISBN (print): 9781509035144
Erasure coding has been increasingly used by distributed storage systems to maintain fault tolerance with low storage redundancy. However, how to enhance the performance of degraded reads in erasure-coded storage remains a critical issue. We revisit this problem from two perspectives that are neglected by existing studies: data placement and encoding rules. To this end, we propose an encoding-aware data placement (EDP) approach that aims to reduce the number of I/Os in degraded reads during a single failure for general XOR-based erasure codes. EDP carefully places sequential data based on the encoding rules of the given erasure code. Trace-driven evaluation results show that, compared to two baseline data placement methods, EDP reduces the amount of data read from the most loaded disk by up to 37.4% and shortens read time by up to 15.4%.
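To show why degraded reads cost extra I/Os in an XOR-based code, the sketch below uses the simplest possible case: one stripe protected by a single XOR parity chunk. It does not implement EDP or any specific erasure code from the paper; the function names and the chunk-counting convention are assumptions for illustration.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def degraded_read(stripe, parity, wanted, failed):
    """Read the chunks at indices `wanted` while chunk `failed` is lost.

    Returns (data, chunks_read), where chunks_read counts every data or
    parity chunk that had to be fetched from a surviving disk.
    """
    needs_rebuild = failed in wanted
    if needs_rebuild:
        # Rebuilding needs every surviving data chunk plus the parity chunk.
        read_idx = set(range(len(stripe))) - {failed}
    else:
        read_idx = set(wanted)
    chunks_read = len(read_idx) + (1 if needs_rebuild else 0)

    rebuilt = None
    if needs_rebuild:
        survivors = [stripe[i] for i in sorted(read_idx)] + [parity]
        rebuilt = xor_blocks(survivors)
    data = [rebuilt if i == failed else stripe[i] for i in wanted]
    return data, chunks_read


stripe = [b"ab", b"cd", b"ef", b"gh"]
parity = xor_blocks(stripe)

hit_failure = degraded_read(stripe, parity, wanted=[0, 1], failed=1)
avoid_failure = degraded_read(stripe, parity, wanted=[2, 3], failed=1)
```

A two-chunk request that touches the failed chunk fetches four chunks, while one that avoids it fetches two. Which chunks a sequential request touches, and which of them share parity, depends on placement, which is the lever the abstract describes.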
Nonlocal interferometric phase filtering methods achieve excellent performance in both noise reduction and texture preservation, even in the case of complicated topography and low coherence. The main limitation of the nonlocal methods is the computational burden. This paper proposes a nonlocal phase filtering strategy for practical InSAR systems that combines the nonlocal algorithm with the traditional method to improve efficiency.