Materials datasets usually contain many redundant (highly similar) materials due to the tinkering approach historically used in material design. This redundancy skews the performance evaluation of machine learning (ML) models when using random splitting, leading to overestimated predictive performance and poor performance on out-of-distribution samples. This issue is well known in bioinformatics for protein function prediction, where tools like CD-HIT are used to reduce redundancy by ensuring that sequence similarity among samples is no greater than a given threshold. In this paper, we survey the overestimated ML performance in materials science for material property prediction and propose MD-HIT, a redundancy reduction algorithm for material datasets. Applying MD-HIT to composition- and structure-based formation energy and band gap prediction problems, we demonstrate that with redundancy control, ML models tend to score lower on test sets than models trained and evaluated on highly redundant data, but the scores better reflect the models' true prediction capability.
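The redundancy-control idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' MD-HIT implementation: it is a generic greedy, CD-HIT-style filter, and the Euclidean distance, the toy composition vectors, the threshold value, and the function names are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_redundancy_filter(features, threshold):
    """Greedily keep samples that are at least `threshold` away from
    every sample kept so far; return the indices of the kept samples."""
    kept = []
    for i, vec in enumerate(features):
        if all(distance(vec, features[j]) >= threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical composition fractions (toy data, not from the paper).
samples = [
    [0.50, 0.50, 0.00],
    [0.51, 0.49, 0.00],  # near-duplicate of sample 0
    [0.00, 0.50, 0.50],
    [0.00, 0.52, 0.48],  # near-duplicate of sample 2
    [0.33, 0.33, 0.34],
]

kept = greedy_redundancy_filter(samples, threshold=0.1)
print(kept)  # indices of the non-redundant subset
```

With this toy data the two near-duplicates are dropped, so a random train/test split of the remaining samples can no longer place almost identical materials on both sides of the split.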
Diagnosis prediction is becoming crucial to develop healthcare plans for patients based on Electronic Health Records (EHRs). Existing works usually enhance diagnosis prediction via learning accurate disease representa...
In recent years, detecting objects in aerial images has emerged as a crucial area of study within the domain of computer vision. However, due to obstacles like the limited size of objects, dense distributions, and cla...
It is crucial for autonomous vehicles to make safe and effective decisions in real-time dynamic road environments through decision-making systems. Traditional rule-based decision-making methods struggle to handle compl...
Training deep neural networks (DNNs) is computationally expensive, which is problematic especially when performing duplicated or similar training runs in model ensemble or fine-tuning pre-trained models, for example. ...
The lower and upper approximations in rough set theory are important for dealing with uncertain knowledge. The covering rough set is an important part of the rough set theory and is suitable for dealing with numerical...
Internet of Vehicles (IoV) integrates with various heterogeneous nodes, such as connected vehicles, roadside units, etc., which establishes a distributed network. Vehicles are managed nodes providing all the services ...
Quantum-inspired models have demonstrated superior performance in many downstream language tasks, such as question answering and sentiment analysis. However, recent models primarily focus on embedding and measurement ...
With the rapid development of artificial intelligence (AI) technology, its application in the field of clinical electroencephalography (EEG) diagnosis shows remarkable prospects. It has become an urgent need to assis...
Currently, the field of individual identification utilizing coded modulation visual evoked potentials (cVEP) is gaining significant attention. However, existing methods face challenges due to the EEG signals' low ...