This paper presents an improved machine learning approach for predicting second-hand housing prices in Shanghai. It first builds a random forest model and an XGBoost model with Shanghai second-hand housing tra...
Much of named entity recognition (NER) research focuses on developing dataset-specific models based on data from the domain of interest, and a limited set of related entity types. This is frustrating as each new datas...
The use of cloud computing in electronic government (e-government) is growing across nations and presents a range of difficulties. Infrastructure problems, such as unreliable internet connections and insufficient device...
Modern mobile devices have access to enormous amounts of user data including text, images, speech, etc., which can be utilized to train high-performance learning models and enhance the user experience. However, access...
The cooperative global robust output regulation problem for a class of nonlinear uncertain multi-agent systems with dynamic uncertainty has been approached by a distributed state feedback control law; however, this ...
In this paper, we present an enhanced medical image segmentation approach leveraging the nnUNet framework, specifically tailored to integrate bounding box prompts for improved segmentation accuracy in resource-constra...
Convolutional neural networks are usually composed of convolutional layers and pooling layers. Pooling operations effectively control the weight update of convolutional neural networks. The existing pooling operations...
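The abstract is cut off before its proposed pooling operation is described, so the following is only a generic sketch of the two standard operations it builds on: max pooling and average pooling over non-overlapping 2×2 windows, applied to a plain nested-list feature map. It is not the paper's method; the function name `pool2d` and its parameters are hypothetical.

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """Generic 2D pooling over a nested-list feature map (no padding).

    mode="max" keeps the largest value in each window; mode="avg" keeps the mean.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            # Collect the size x size window whose top-left corner is (i, j).
            window = [x[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
print(pool2d(feature_map, mode="max"))  # [[4, 8], [12, 16]]
print(pool2d(feature_map, mode="avg"))  # [[2.5, 6.5], [10.5, 14.5]]
```

During backpropagation, max pooling passes each window's gradient only to its maximum element, while average pooling splits it equally among all elements of the window; this is one concrete way the choice of pooling shapes how the preceding convolutional weights are updated.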
Dirty data are prevalent in time series, such as energy consumption or stock data. Existing data cleaning algorithms fall short in identifying dirty data and often make unsatisfactory cleaning decisions. To address these drawbacks, we leverage the recurrent patterns inherent in time series, treat them as analogous to fixed combinations in textual data, and incorporate the concept of perplexity. The cleaning problem is thus transformed into minimizing the perplexity of the time series under a given cleaning cost, and we design a four-phase algorithmic framework to solve it. To ensure the framework's feasibility, we also briefly analyze the impact of dirty data and devise an automatic budget selection strategy. Moreover, to make the framework more generic, we introduce advanced solutions, including an improved probability calculation method based on homomorphic pattern aggregation and a greedy heuristic algorithm that saves resources. Experiments on 12 real-world datasets demonstrate the superiority of our methods.
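As a hedged illustration of the perplexity idea only (not the authors' four-phase framework or their pattern aggregation), the sketch below assumes the series has already been discretized into symbols, fits an add-one smoothed bigram model on a clean reference series, and measures perplexity as exp(-(1/N) · Σ log p(x_i | x_{i-1})) over the N bigrams. It then applies a naive greedy repair that spends a budget of single-point edits on whichever edit lowers perplexity most. The names `bigram_perplexity` and `greedy_clean`, the reference series, and the cost model (one unit per modified point) are all hypothetical.

```python
import math
from collections import Counter


def bigram_perplexity(seq, ref, alpha=1.0):
    """Perplexity of `seq` under an add-alpha smoothed bigram model fitted on `ref`.

    Assumes both sequences are already discretized into hashable symbols.
    """
    if len(seq) < 2:
        raise ValueError("need at least two symbols to form a bigram")
    vocab_size = len(set(ref)) + 1          # +1 slot reserved for unseen symbols
    context_counts = Counter(ref[:-1])      # how often each symbol starts a bigram
    bigram_counts = Counter(zip(ref[:-1], ref[1:]))
    log_prob = 0.0
    for prev, cur in zip(seq[:-1], seq[1:]):
        p = (bigram_counts[(prev, cur)] + alpha) / (context_counts[prev] + alpha * vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(seq) - 1))


def greedy_clean(seq, ref, budget):
    """Hypothetical greedy repair: up to `budget` single-point edits, each chosen
    to give the largest perplexity reduction; stops early if no edit helps."""
    seq = list(seq)
    candidates = sorted(set(ref))
    for _ in range(budget):
        best_pp, best_i, best_val = bigram_perplexity(seq, ref), None, None
        for i, original in enumerate(seq):
            for val in candidates:
                if val == original:
                    continue
                seq[i] = val                # try one replacement
                pp = bigram_perplexity(seq, ref)
                if pp < best_pp:
                    best_pp, best_i, best_val = pp, i, val
            seq[i] = original               # undo the trial edit
        if best_i is None:
            break
        seq[best_i] = best_val              # commit the best edit found
    return seq


ref = [1, 2, 3] * 4                       # clean reference with a recurring pattern
dirty = [1, 2, 3, 1, 9, 3, 1, 2, 3]       # the 9 is an injected error
print(bigram_perplexity(dirty, ref) > bigram_perplexity(ref[:9], ref))  # True
print(greedy_clean(dirty, ref, budget=2))  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

The injected value breaks two high-probability transitions, so it raises perplexity, and the single edit that restores the recurring pattern is the one the greedy step selects; the remaining budget goes unused because no further edit lowers perplexity.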
In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences. Recent studies (Li et al., 2020) have shown that the context encoder generates noise...
Identifying semantic types for attributes in relations, known as attribute semantic type (AST) identification, plays an important role in many data analysis tasks, such as data cleaning, schema matching, and keyword search in ***. However, due to a lack of unified naming standards across prevalent information systems (*** islands), AST identification remains an open problem. To tackle this problem, we propose a context-aware method to identify the ASTs of relations in this paper. We transform AST identification into a multi-class classification problem and propose a schema context aware (SCA) model that learns representations from a collection of relations associated with attribute values and schema ***. Based on the learned representation, we predict the AST for a given attribute of an underlying relation, where the predicted AST is mapped to one of the labeled ASTs. To improve AST identification, especially when the predicted semantic types of attributes are not among the labeled ASTs, we then introduce knowledge base embeddings (***) to enhance the above representation and construct a knowledge-base-enhanced schema context aware model (SCA-KB) to obtain a stable and robust ***. Experiments on real datasets demonstrate that our context-aware method outperforms state-of-the-art approaches by a large margin: up to 6.14% and 25.17% in macro average F1 score, and up to 0.28% and 9.56% in weighted F1 score, over high-quality and low-quality datasets, respectively.
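The results above are reported in both macro average F1 and weighted F1, and the two metrics can diverge sharply. As a hedged illustration (not the authors' evaluation code), the sketch below computes both from scratch for single-label multi-class predictions using the common definitions: macro F1 averages per-class F1 with equal weight, while weighted F1 weights each class by its support in the ground truth. The function name `macro_and_weighted_f1` and the toy AST labels are hypothetical.

```python
from collections import Counter


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    f1 = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    macro = sum(f1.values()) / len(labels)                             # every class counts equally
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)  # frequent classes dominate
    return macro, weighted


# Hypothetical AST labels: "name" is frequent, "city" and "zip" are rare.
y_true = ["name", "name", "name", "city", "zip"]
y_pred = ["name", "name", "city", "city", "city"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 4), round(weighted, 4))  # 0.4333 0.58
```

Here the rare "zip" type, which is never predicted correctly, drags macro F1 down far more than weighted F1, because macro averaging gives it the same weight as the frequent "name" type.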