Distributed network data is both distributed and concurrent, which complicates data processing and reduces the effectiveness of security risk assessment. A security risk assessment method for distributed network data based on the simhash algorithm is therefore proposed. The actual support of the distributed network data set is reconstructed by probability distortion, and the data mining results after probability transformation are obtained using a random-perturbation data mining method. To avoid duplicate information and redundant data, duplicate distributed network data is removed by computing text similarity. Finally, the simhash algorithm is used to compute the hash values of the distributed network data before and after an attack, from which the security risk assessment value is calculated, completing the security risk assessment. Analysis of the experimental results shows that the proposed method effectively improves the reliability of risk assessment for distributed network data and reduces the communication overhead of the assessment, with the maximum communication overhead not exceeding 10 bits. The method therefore has high effectiveness and practicability.
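The hash-before-and-after-attack step can be sketched as below. This is a minimal illustration, not the paper's exact construction: the tokenisation, the MD5-derived 64-bit hashing and the normalisation of the risk score to [0, 1] are all assumptions.

```python
import hashlib

def simhash(tokens, bits=64):
    # Classic simhash: each token votes +1/-1 per bit position,
    # and the sign of each accumulated vote fixes the fingerprint bit.
    v = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    # Number of fingerprint bits that changed.
    return bin(a ^ b).count("1")

# Hypothetical records standing in for the data before and after an attack.
before = simhash("distributed network data record".split())
after = simhash("distributed network data record tampered".split())
risk = hamming(before, after) / 64  # normalised assessment value in [0, 1]
```

A larger Hamming distance between the two fingerprints then corresponds to a higher assessed risk.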
To overcome the low recall and precision of traditional English teaching information retrieval models, this paper designs a hierarchical retrieval model of digital English teaching information based on ontology. TF-IDF and the simhash algorithm are used to judge the similarity of documents in the digital English teaching database and to calculate the weights of the retrieval keywords. A semantic network diagram is built from the relationships between retrieval concepts, a hierarchical retrieval model of digital English teaching information is constructed, and the retrieval results are adjusted according to the user's interests to obtain more accurate results. The experimental results show that the recall of the model exceeds 94%, the precision exceeds 96%, and the average retrieval time is only 0.44 s, indicating that the designed model achieves higher recall and precision with shorter retrieval time.
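One plausible reading of the TF-IDF-plus-simhash combination is to use each keyword's TF-IDF weight as its vote in the simhash fingerprint. The sketch below assumes this reading; the corpus handling and the smoothing terms in the IDF are illustrative choices, not the paper's formulas.

```python
import hashlib
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus):
    # corpus: list of token lists. Returns {token: tf-idf} for one document,
    # with +1 smoothing in the IDF term (an assumption for illustration).
    tf = Counter(doc_tokens)
    n = len(corpus)
    weights = {}
    for tok, count in tf.items():
        df = sum(1 for d in corpus if tok in d)
        weights[tok] = (count / len(doc_tokens)) * math.log((n + 1) / (df + 1))
    return weights

def weighted_simhash(weights, bits=64):
    # Each token votes +w/-w per bit, so high-weight keywords
    # dominate the resulting fingerprint.
    v = [0.0] * bits
    for tok, w in weights.items():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += w if (h >> i) & 1 else -w
    return sum(1 << i for i in range(bits) if v[i] > 0)
```

Document similarity can then be judged by the Hamming distance between the weighted fingerprints.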
ISBN (print): 9798400718267
In the context of the rapidly evolving power industry, the efficient operation of a power work ticket risk prediction system is essential to ensure the safe and efficient operation of the power system. Given the need to improve response speed and handle highly concurrent requests, applying Large Language Models (LLMs) to power system work ticket risk assessment has become a necessary advancement toward more accurate risk prediction, but it also makes faster responses essential. This study proposes a similarity-matching cache, built on Memcached, that pre-processes LLM requests to meet this fast-response challenge, with the aim of optimising the speed of work ticket risk prediction in power systems. A cache of high-frequency response results is maintained on the server side, and a combination of the simhash algorithm and the cosine similarity algorithm is used to match and filter predictions from the cache in response to user requests. The results of this study show that the method significantly improves the speed of work ticket risk prediction in power systems, effectively addressing the challenges of highly concurrent processing.
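The cache-lookup step might look like the sketch below. A plain dict stands in for Memcached, and the Hamming and cosine thresholds are illustrative assumptions, not the study's tuned values.

```python
import hashlib
import math
from collections import Counter

def simhash(text, bits=64):
    # 64-bit simhash fingerprint of a whitespace-tokenised request.
    v = [0] * bits
    for tok in text.split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def cosine(a, b):
    # Cosine similarity over bag-of-words term counts.
    va, vb = Counter(a.split()), Counter(b.split())
    num = sum(va[t] * vb[t] for t in set(va) & set(vb))
    den = math.sqrt(sum(c * c for c in va.values())) * \
          math.sqrt(sum(c * c for c in vb.values()))
    return num / den if den else 0.0

class SimilarityCache:
    # A dict stands in for the Memcached store in this sketch.
    def __init__(self, max_hamming=3, min_cosine=0.8):
        self.store = {}  # fingerprint -> (request_text, cached_response)
        self.max_hamming = max_hamming
        self.min_cosine = min_cosine

    def put(self, request, response):
        self.store[simhash(request)] = (request, response)

    def get(self, request):
        fp = simhash(request)
        for cached_fp, (cached_req, resp) in self.store.items():
            # Coarse simhash filter first, then the finer cosine check.
            if bin(fp ^ cached_fp).count("1") <= self.max_hamming \
                    and cosine(request, cached_req) >= self.min_cosine:
                return resp  # cache hit: the LLM call is skipped
        return None
```

On a hit the cached prediction is returned directly; only misses are forwarded to the LLM.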
Purpose Operating wagon records are produced by distinct railway information systems, so wagon routing records with the same origin-destination (OD) pair can differ. This phenomenon has brought considerable difficulties to railway wagon flow forecasting. Some differences stem from poor data quality, which misleads prediction, while others reflect the existence of other actual wagon routings. This paper aims to find all wagon routing locus patterns in the historical records, and thus puts forward an intelligent recognition method for the actual routing locus pattern of railway wagon flow based on the SST algorithm. Design/methodology/approach Based on big data of railway wagon flow records, a routing metadata model is constructed, and historical and real-time data are fused to improve the reliability of path forecast results in railway wagon flow forecasting. Based on the division of spatial characteristics and dimension reduction at distributary stations, an improved simhash algorithm is used to calculate routing fingerprints. Combined with a squared-error adjacency matrix clustering algorithm and the Tarjan algorithm, fingerprint similarity is calculated, spatial characteristics are clustered and identified, routing locus modes are formed, and intelligent recognition of the actual wagon flow routing locus is realized. Findings This paper puts forward a more realistic railway wagon routing pattern recognition algorithm. The traditional railway wagon routing planning problem is converted into a routing locus pattern recognition problem, and the wagon routing patterns of all OD flows are mined from the historical data. The analysis covers three aspects: routing metadata, routing locus fingerprints and routing locus patterns. Then, an intelligent SST-based recognition algorithm for railway wagon routing locus patterns is proposed.
This paper intends to perform de-duplication to enhance storage optimization by utilizing similarity in mutual information. Hence, this paper contributes by proposing hybrid fingerprint extraction using the SH (simhash) and HC (Huffman coding) algorithms. Secondly, the data is clustered using a recent technique called SOMI-GO to extract the metadata. The extracted metadata is stored in a metadata server, which provides better storage optimization and de-duplication. SOMI-GO is adopted because it provides maximum second-order mutual information based on the similarity index. The proposed SOMI-GO technique is compared with existing methods such as K-means, K-mode, ED-PSO, ED-GA and ED-GWO in terms of accuracy, TPR, TNR and execution time, and the significance of the SOMI-GO method is described.
This paper intends to perform de-duplication to enhance storage optimization. Hence, this paper contributes by proposing hybrid fingerprint extraction using the simhash (SH) and Huffman coding (HC) algorithms. Secondly, the data is clustered using grey wolf optimization (GWO) to extract the metadata. The extracted metadata is stored in a metadata server, which provides better storage optimization and de-duplication. Euclidean-distance-based GWO is adopted because it yields the minimum Euclidean distance in GWO-based clustering for de-duplication. The proposed GWO-based clustering method is compared with existing methods such as k-means, k-mode, Euclidean-distance-based particle swarm optimization and a Euclidean-distance-based genetic algorithm in terms of accuracy, True Positive Rate (TPR), True Negative Rate (TNR) and execution time, and the significance of the GWO-based clustering method is described.
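Assuming integer fingerprints like those simhash produces, the duplicate test underlying such de-duplication schemes can be sketched as a Hamming-distance threshold check; the 3-bit threshold below is an illustrative assumption, not a value from either paper.

```python
def hamming_distance(fp_a: int, fp_b: int) -> int:
    # Number of differing bits between two integer fingerprints.
    return bin(fp_a ^ fp_b).count("1")

def is_duplicate(fp_a: int, fp_b: int, threshold: int = 3) -> bool:
    # Near-identical fingerprints are treated as copies of one record,
    # so only one of them needs to be kept in storage.
    return hamming_distance(fp_a, fp_b) <= threshold
```

Records flagged as duplicates would then be collapsed to a single stored copy plus metadata references.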
ISBN (print): 9781510873919
To carry out similarity retrieval over mass information accurately and efficiently, this paper proposes a similarity retrieval algorithm based on multilevel fingerprint comparison. For mass text information, firstly, the simhash algorithm is used to generate multilevel fingerprints; secondly, similar texts are selected and a comparison matrix is constructed; then, the similarity between texts is accurately marked using the comparison matrix; finally, real data from a company is used to verify the accuracy and efficiency of the proposed algorithm.
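A common way to realise a multilevel fingerprint comparison is the pigeonhole block split: a 64-bit simhash fingerprint is cut into blocks, and only pairs sharing at least one block survive the coarse level and go on to the exact comparison. The block count below is an illustrative assumption, not the paper's configuration.

```python
def split_blocks(fp: int, bits: int = 64, blocks: int = 4) -> list:
    # By the pigeonhole principle, two fingerprints differing in at most
    # blocks - 1 bits must agree on at least one block.
    width = bits // blocks
    mask = (1 << width) - 1
    return [(fp >> (i * width)) & mask for i in range(blocks)]

def candidate_pair(fp_a: int, fp_b: int) -> bool:
    # Coarse first-level filter; surviving pairs get the exact,
    # bit-by-bit second-level comparison.
    return any(a == b for a, b in zip(split_blocks(fp_a), split_blocks(fp_b)))
```

Because most dissimilar pairs share no block, the expensive exact comparison runs on only a small fraction of the corpus.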
ISBN (print): 9781479929313
Traditional web search forces developers to leave their working environments and look for solutions in web browsers. It often does not consider the context of their programming problems. The context switching between the web browser and the working environment is time-consuming and distracting, and keyword-based traditional search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that collects data from three web search APIs (Google, Yahoo, Bing) and a programming Q&A site, StackOverflow. It then provides search results within the IDE, taking into account not only the content of the selected error but also the problem context, popularity and search-engine recommendation of the result links. Experiments with 25 runtime errors and exceptions show that the proposed approach outperforms keyword-based search approaches with a recommendation accuracy of 96%. We also validate the results with a user study involving five prospective participants, obtaining a result agreement of 64.28%. While the preliminary results are promising, the approach needs further validation with more errors and exceptions, followed by a user study with more participants, to establish itself as a complete IDE-based web search solution.