PROBLEM Recent years have witnessed the rapid progress of self-supervised language models (LMs) [1], especially large language models (LLMs) [2]. LLMs have not only achieved state-of-the-art performance on many natural language processing tasks, but have also captured widespread public attention due to their great potential in a variety of real-world applications (***, search engines, writing assistants, etc.) by providing general-purpose intelligent services. A few of the LLMs are becoming foundation models, an analogy to infrastructure, that empower hundreds of downstream applications.
In order to dynamically create a sequence of textual descriptions for images, image description models often make use of the attention mechanism, which involves an automatic focus on different regions within an image....
Complex networks are becoming more complex because of the use of many components with diverse technologies. In fact, the manual configuration that makes each component interoperable has bred latent dangers to system security. There is still no comprehensive review of these studies and of the prospects for further research. Starting from the complexity of component configuration and the difficulty of security assurance in typical complex networks, this paper systematically reviews the abstract models and formal analysis methods required for the intelligent configuration of complex networks, and specifically analyzes and compares current key technologies such as configuration semantic awareness, automatic generation of security configurations, dynamic deployment, and verification and evaluation. These technologies can effectively improve the security of intelligent configuration in complex networks and reduce the complexity of operation and maintenance. This paper also summarizes the mainstream construction methods for complex network configuration and its security test environment and detection index system, which lays a theoretical foundation for building a comprehensive capability to verify the effectiveness of configuration security. The whole-lifecycle management system for the configuration security process proposed in this paper provides an important technical reference for reducing the complexity of network operation and maintenance and improving network security.
Wind field forecasting is crucial for human activities, but numerical weather prediction still has room to improve accuracy. In this paper, we formalize wind field forecast correction as a spatiotemporal sequence pred...
Underwater target detection is an important method for detecting marine organisms. However, due to the image occlusion of underwater targets, blurred water quality, poor lighting conditions, small targets, and complex...
The modern university computer lab and kindergarten through 12th grade classrooms require a centralized solution to efficiently manage a large number of desktops. Existing solutions either incur virtualization overhead at runtime or require loading a large image of over 30 GB, leading to unacceptable network latency. In this work, we propose Troy, which takes advantage of the differencing virtual hard disk techniques in Windows *** such, Troy only loads the modifications made on one machine to all other machines. Troy consists of two modules that are responsible for generating an initial image and merging a differencing image with its parent image, respectively. Specifically, we identify the key fields in the virtual hard disk image that link the differencing image and the parent image, and find the modified blocks in the differencing image that should be used to replace the blocks in the parent image. We further design a lazy copy solution to reduce the I/O burden in image merging. We have implemented Troy on bare metal machines. The evaluation results show that the performance of Troy is comparable to the native implementation in Windows, without requiring the Windows environment.
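The block-replacement idea behind merging a differencing image into its parent can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it does not parse the real virtual hard disk format, and the names `merge_differencing` and `BLOCK_SIZE`, the dictionary-of-modified-blocks representation, and the zero-fill behavior for a grown disk are all hypothetical choices made for illustration, not Troy's implementation.

```python
from typing import Dict, List

# Assumed block size in bytes; tiny on purpose so the example is readable.
BLOCK_SIZE = 4


def merge_differencing(parent: List[bytes], diff: Dict[int, bytes]) -> List[bytes]:
    """Return a merged block list in which every block recorded in `diff`
    replaces the block at the same index in `parent`.

    `parent` models the parent image as a list of fixed-size blocks.
    `diff` models the differencing image as {block_index: modified_block}.
    """
    # Copy only the list of references: unmodified blocks are reused as-is,
    # so work is proportional to the number of modified blocks.
    merged = list(parent)
    for index, block in diff.items():
        if len(block) != BLOCK_SIZE:
            raise ValueError(f"block {index} has size {len(block)}, expected {BLOCK_SIZE}")
        if index >= len(merged):
            # Assumption: a block beyond the parent's end grows the image,
            # and any gap is filled with zero blocks.
            merged.extend([bytes(BLOCK_SIZE)] * (index - len(merged) + 1))
        merged[index] = block
    return merged


parent_image = [b"AAAA", b"BBBB", b"CCCC"]
differencing_image = {1: b"bbbb", 3: b"DDDD"}
merged_image = merge_differencing(parent_image, differencing_image)
```

In this toy model only blocks 1 and 3 are touched, which mirrors the point made in the abstract that only the modifications, rather than the whole image, need to be transferred and applied.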
Constructing an effective common latent embedding by aligning the latent spaces of cross-modal variational autoencoders (VAEs) is a popular strategy for generalized zero-shot learning (GZSL). However, due to the lack of fine-grained instance-wise annotations, existing VAE methods can easily suffer from the posterior collapse problem. In this paper, we propose an innovative asymmetric VAE network by aligning enhanced feature representation (AEFR) for GZSL. In contrast to general VAE structures, we design two asymmetric encoders for visual and semantic observations and one decoder for visual reconstruction. Specifically, we propose a simple yet effective gated attention mechanism (GAM) in the visual encoder to enhance the information interaction between observations and latent variables, effectively alleviating the possible posterior collapse problem. In addition, we propose a novel distributional decoupling-based contrastive learning (D2-CL) method to guide the learning of classification-relevant information while aligning the representations at the taxonomy level in the latent representation space. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. The source code is available at https://***/seeyourmind/AEFR.
Pre-trained language models (PLMs), such as BERT, have achieved good results on many natural language processing (NLP) ***, some studies have attempted to integrate factual knowledge into PLMs to adapt to various downstream *** sentiment analysis tasks, sentiment knowledge, such as sentiment words, plays a significant role in determining the sentiment tendencies of *** Chinese sentiment analysis, historical stories and fables imbue words with richer connotations and more complex sentiments than those typically found in English, which makes sentiment knowledge injection *** clearly, this knowledge has not been fully *** this paper, we propose EKBSA, a Chinese sentiment analysis model, which is based on the K-BERT model and utilizes a sentiment knowledge graph to achieve better results on sentiment analysis *** construct a high-quality sentiment knowledge graph, we collect a large number of sentiment words by combining several existing sentiment ***, in order to understand texts better, we enhance local attention through syntactic analysis and direct EKBSA to focus more on syntactically relevant *** is compatible with BERT and existing structural *** results show that EKBSA achieves better performance on Chinese sentiment analysis *** upon EKBSA, we further change the general attention to the context attention and propose Context EKBSA, so that the model can adapt to sentiment analysis tasks in Chinese conversations and achieve good performance.
Long-term urban traffic flow prediction is an important task in the field of intelligent transportation, as it can help optimize traffic management and improve travel *** improve prediction accuracy, a crucial issue is how to model spatiotemporal dependency in urban traffic *** recent years, many studies have adopted spatiotemporal neural networks to extract key information from traffic ***, most models ignore the semantic spatial similarity between long-distance areas when mining spatial *** also ignore the impact of predicted time steps on the next unpredicted time step for making long-term ***, these models lack a comprehensive data embedding process to represent complex spatiotemporal *** paper proposes a multi-scale persistent spatiotemporal transformer (MSPSTT) model to perform accurate long-term traffic flow prediction in *** adopts an encoder-decoder structure and incorporates temporal, periodic, and spatial features to fully embed urban traffic data to address these *** model consists of a spatiotemporal encoder and a spatiotemporal decoder, which rely on temporal, geospatial, and semantic space multi-head attention modules to dynamically extract temporal, geospatial, and semantic *** spatiotemporal decoder combines the context information provided by the encoder, integrates the predicted time step information, and is iteratively updated to learn the correlation between different time steps in the broader time range to improve the model’s accuracy for long-term *** on four public transportation datasets demonstrate that MSPSTT outperforms the existing models by up to 9.5% on three common metrics.
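The idea that already-predicted time steps should inform the next unpredicted step can be illustrated with a generic iterative rollout loop. This is a minimal sketch and not the MSPSTT decoder: the function `rollout`, the stand-in predictor `mean_of_last_two`, and the scalar flow values are all hypothetical, chosen only to show how each prediction is fed back into the context before the next step is forecast.

```python
from typing import Callable, List, Sequence


def rollout(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps, appending each prediction to the context
    so that later steps can depend on earlier predicted steps."""
    context = list(history)
    predictions: List[float] = []
    for _ in range(horizon):
        next_value = predict_next(context)
        predictions.append(next_value)
        # Feed the prediction back in: this is the step that a one-shot
        # multi-step predictor would skip.
        context.append(next_value)
    return predictions


def mean_of_last_two(context: Sequence[float]) -> float:
    """Stand-in one-step predictor (an assumption, not a traffic model)."""
    return (context[-1] + context[-2]) / 2.0


observed_flow = [10.0, 20.0]
forecast = rollout(observed_flow, mean_of_last_two, horizon=3)
```

Here the second forecast value depends on the first, and the third depends on both, which is the dependency between predicted steps that the abstract says most models ignore. In a learned model, `predict_next` would be replaced by the decoder's forward pass.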
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conduct a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, on various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.