Search results - Inner Mongolia University Library

Shallowfake and deepfake image manipulation localization using noise and RGB-based dual branch method

SIGNAL IMAGE AND VIDEO PROCESSING, 2024, vol. 18, no. 10, pp. 7065-7077

Authors: Dagar, Deepak; Vishwakarma, Dinesh Kumar. Affiliation: Delhi Technol Univ DTU Dept Informat Technol IT Biometr Res Lab Bawana Rd Delhi 110042 India

The reliability of multimedia is being progressively tested by sophisticated image manipulation methods, which has led to the creation of the Image Manipulation Localization (IML) domain. A good manipulation localization model requires extracting non-semantic difference features between manipulated and authentic regions to exploit artifacts, which calls for explicit comparisons between the two areas. Existing models use handcrafted feature methods, convolutional neural networks (CNNs), or a combination of both. Handcrafted feature methods assume the tampering type beforehand, limiting their capabilities for diverse tampering operations, while CNNs model semantic information, which is not enough to capture manipulation artifacts. To address these limitations, we have designed a dual-branch model that combines handcrafted noise features and CNNs in an encoder-decoder (ED) architecture powered by the attention mechanism. This dual-branch model uses noise features on one branch and RGB on the other before feeding them to an ED architecture for semantic learning, with skip connections deployed to retain spatial information. Furthermore, this architecture uses channel spatial attention to further strengthen and refine the feature representation. Extensive experimentation on the shallowfake datasets (CASIA, COVERAGE, COLUMBIA, NIST16) and the deepfake dataset FaceForensics++ (FF++) demonstrates superior feature extraction capabilities and performance over various baseline models, with the AUC score even reaching 99%. Also, it is one of the first methods to perform localization on a deepfake dataset. The model is relatively light, with 38 million parameters, and easily outperforms other state-of-the-art (SoTA) models.

Keywords: Image forensics; Manipulation localization; Noise inconsistencies; Deepfake localization; Shallowfakes and deepfake localization; encoder-decoder


Unveiling the Power of Convolutional Neural Networks: A Comprehensive Study on Remote Sensing Image Captioning and encoder Selection


International Joint Conference on Neural Networks (IJCNN)

Authors: Das, Swadhin; Khandelwal, Akshat; Sharma, Raksha. Affiliations: Indian Inst Technol Comp Sci & Engn Roorkee Haridwar India; Indian Inst Technol Chem Engn Roorkee Roorkee India

ISBN: (print) 9798350359329; 9798350359312

Extracting semantic information from remote sensing (RS) images has gained attention for its wide applications in defense, disaster management, and urban planning. Captioning RS images is challenging due to intricate properties like resolutions, color bands, and object types. Generating precise captions requires domain expertise, and manual annotation is time-consuming. The common approach involves using an encoder-decoder-based framework for RS image captioning, where an input image is encoded into a feature vector and decoded into a caption. Selecting the right image encoder is vital for optimizing caption prediction systems in specific domains. While Convolutional Neural Network (CNN) based encoders are acknowledged for extracting crucial image features, it is important to assess variations in their mechanisms and architectures carefully. This paper thoroughly examines various CNNs to evaluate their effectiveness in RS image captioning. We also explore the performance of two caption generation techniques, viz., greedy search and beam search. The encoders are clustered as good, medium, and bad, with ResNet (CNN) emerging as the preferred choice in the good cluster across all considered datasets. The impact of choosing between beam search and greedy search is minimal. Additionally, we conduct a subjective evaluation of leading models to address limitations associated with purely numerical assessments. The paper is a novel contribution, providing the first-of-its-kind subjective evaluation of CNN-based encoders for the RS image captioning task.

Keywords: Remote Sensing (RS) images captioning; CNN; encoder-decoder; beam search; greedy search; subjective evaluation
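The two caption generation techniques named in this abstract, greedy search and beam search, can be illustrated with a minimal sketch. The toy decoder below is purely hypothetical (`TOY_MODEL`, `next_log_probs`, and the token names are assumptions for illustration and do not come from the paper); it only shows how the two search strategies differ in which sequence they return.

```python
import math

# Hypothetical toy "decoder": the next-token log-probabilities depend only on
# the previous token. A real captioning decoder would condition on the image
# features and the whole prefix; this table is an assumption for illustration.
TOY_MODEL = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.55), "y": math.log(0.45)},
    "b": {"x": math.log(0.9), "y": math.log(0.1)},
    "x": {"</s>": 0.0},
    "y": {"</s>": 0.0},
}


def next_log_probs(prefix):
    """Return the log-probabilities of each possible next token."""
    return TOY_MODEL[prefix[-1]]


def greedy_search(max_len=10):
    """Pick the single most probable token at every step."""
    seq, score = ["<s>"], 0.0
    while seq[-1] != "</s>" and len(seq) < max_len:
        token, log_p = max(next_log_probs(seq).items(), key=lambda kv: kv[1])
        seq.append(token)
        score += log_p
    return seq, score


def beam_search(beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences at every step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((seq, score))
                continue
            for token, log_p in next_log_probs(seq).items():
                candidates.append((seq + [token], score + log_p))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for seq, _ in beams):
            break
    return beams[0]


greedy_seq, greedy_score = greedy_search()
beam_seq, beam_score = beam_search(beam_width=2)
print("greedy:", greedy_seq, round(math.exp(greedy_score), 2))
print("beam:  ", beam_seq, round(math.exp(beam_score), 2))
```

In this toy table the greedy choice commits to the locally best first token, while the beam keeps a second hypothesis alive and ends with a more probable full sequence. Whether that difference matters in practice depends on the model; the abstract reports that the impact was minimal in its experiments.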


Multiscale Spatio-Temporal Information Cascade Single-Object Visual Tracker


IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, vol. 74

Authors: Ni, Xiaoyu; Yuan, Liang; Han, Yongming. Affiliations: Beijing Univ Chem Technol Coll Informat Sci & Technol Minist Educ Beijing 100029 Peoples R China; Hebei Univ Architecture Sch Mech Engn Zhangjiakou 075000 Hebei Peoples R China; Shanghai Jiao Tong Univ ICCI Shanghai 200240 Peoples R China; Beijing Univ Chem Technol Engn Res Ctr Intelligent PSE Minist Educ China Beijing 100029 Peoples R China

Recently, the Transformer has been widely explored in object tracking and has shown state-of-the-art (SOTA) performance compared to convolutional neural networks (CNNs). In particular, single-object trackers based on pure Transformer and "CNN+Transformer" frameworks have achieved great success in terms of accuracy and speed. However, most methods do not fully exploit the temporal and spatial information of targets. Furthermore, the potential of trackers for spatial information interaction and propagation between the search area and templates remains underexplored. These issues limit further improvement in tracking performance. To address them, we propose a multiscale cascaded single-object tracking framework based on spatio-temporal information fusion (STIF), which more comprehensively integrates the temporal and spatial information of targets and enables deeper interaction between the information in the search area and the templates. In particular, to establish extensive spatio-temporal feature correlations, the STIF network is introduced, which uses a Transformer-based encoder-decoder structure to cross-fuse the global nonlinear temporal and spatial information of the target search area with static and dynamic templates, effectively performing fusion-based propagation. To focus on rich spatial semantic information, we design a multiscale feature extraction (MFE) network and a feature cascade aggregation (FCA) module based on the encoder-decoder structure, which can effectively carry out interaction-based propagation. Finally, a bounding box prediction head and an IoU score head are used to predict the exact location of the target and update the dynamic templates, respectively. Extensive experiments demonstrate that our method attains better tracking performance than the baseline method. Meanwhile, the proposed method also obtains comparable results with other SOTA trackers on six challenging benchmarks, including GOT-10k, TrackingNet, LaSOT, UAV123, NFS30, and OTB100, while runni

Keywords: Target tracking; Transformers; Feature extraction; Object tracking; Visualization; Decoding; Cameras; Convolutional neural networks; Robot vision systems; Pipelines; encoder-decoder; single-object tracker; spatio-temporal information; Transformer; visual object tracking
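The abstract mentions an IoU score head. As background, the intersection-over-union (IoU) of two axis-aligned boxes can be computed as in the sketch below. The function name `box_iou` and the `(x1, y1, x2, y2)` corner format are assumptions for illustration; the paper's head predicts a score with a network, which this snippet does not reproduce.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption made for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```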


MF-EDNet: predicting stock market sector indices based on multi-feature fusion under emergency events


JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2025, pp. 1-22

Authors: Han, Tianjiao; Yuan, Chenxun; Wang, Pengcheng; Hao, Xingwei; Guo, Fenghua. Affiliations: UCL Math Dept Gower St London WC1E 6AE England; Shandong Univ Sch Software Jinan 250101 Peoples R China

The impact of emergency events on the stock market should not be underestimated, as their unpredictability poses significant challenges to investors' stock operations. This calls for researchers and investors to seek more effective features and reasonable methods to mitigate risks. In the context of multi-feature prediction methods, analyzing the correlation between multi-dimensional features or data has always been a challenging issue. This paper proposes a stock market index prediction framework based on an encoder-decoder architecture (MF-EDNet). The framework leverages the dynamic correlation between stock data and futures data as prior knowledge, integrating features of both internal sequences (industry indices) and external sequences (futures data) to capture the impact of emergency events on the stock market. The newly proposed Multi-Dimensional Convolutional Attention Module (MCAM) further enhances the feature extraction and attention capabilities of the attention mechanism. Experiments on multiple industry indices in the Chinese stock market demonstrate that MF-EDNet can effectively extract important features from stock and futures data, exhibiting good predictive performance under emergency events. The proposed MF-EDNet model achieved improvements of 35.8% and 22.9% in the Matthews correlation coefficient (MCC), a 3.3% increase in accuracy (ACC), and a 7.86% enhancement in profit compared to previous state-of-the-art methods.

Keywords: Emergency events; Stock trend prediction; Feature fusion; MCAM; encoder-decoder
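For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in this abstract, the sketch below computes it from binary labels using the standard confusion-matrix formula. The function name and the example labels are illustrative assumptions; they are not data from the paper.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = positive, 0 = negative).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: up/down trend labels for four trading days.
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced.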


R-UAV-Net: Enhanced YOLOv4 With Graph-Semantic Compression for Transformative UAV Sensing in Paddy Agronomy


IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2025, vol. 11, no. 2, pp. 1197-1209

Authors: Sangaiah, Arun Kumar; Anandakrishnan, Jayakrishnan; Devarapelly, Aniruth Reddy; Bin Mohamad, Muhammad Luqman Arif; Bian, Gui-Bin; Alenazi, Mohammed J. F.; AlQahtani, Salman A. Affiliations: Natl Yunlin Univ Sci & Technol Int Grad Sch AI Touliu 64002 Taiwan; Sunway Univ Sch Engn & Technol Subang Jaya 47500 Malaysia; Univ Ctr Res & Dev Chandigarh Univ Mohali 140413 India; Natl Inst Technol Puducherry Dept Comp Sci & Engn Karaikal 609605 India; Kalasalingam Acad Res & Educ Dept Comp Sci & Engn Srivilliputhur 626190 India; UPM Dept Math Fac Sci Seri Kembangan 43400 Malaysia; Chinese Acad Sci Inst Automation State Key Lab Multimodal Artificial Intelligence S Beijing 100190 Peoples R China; King Saud Univ Coll Comp & Informat Sci Comp Engn Dept Riyadh 11543 Saudi Arabia

Common leaf diseases pose severe problems to the agricultural industry, particularly for paddy rice, a staple crop consumed worldwide, making early detection and rapid prevention crucial for maintaining both quality and yield. This research focuses on an object detection framework for identifying and localising paddy leaf diseases. Future-tech Unmanned Aerial Vehicles (UAVs) offer benefits such as reduced deployment costs, increased availability, enhanced operability, and improved geographical and temporal resolution. You Only Look Once (YOLO) models excel in disease part detection but require excessive computing. A severe challenge of UAV sensing is the resource-efficient collection, transmission, and disease detection from this high-resolution ground data. This research addresses these issues by introducing a Graph-inspired encoder-decoder Semantic Compression (G-SC) coupled with an enhanced YOLOv4 architecture for disease detection in paddy agronomy. The proposed R-UAV-Net is an improved YOLOv4 architecture incorporating various spatial and channel feature extraction blocks with attention mechanisms for revolutionizing precision farming. R-UAV-Net outperformed state-of-the-art (SOTA) techniques, showing a 0.69% improvement in mean average precision (mAP) and a 0.12 increase in F1 score over the best-performing leaf detection model.

Keywords: Diseases; YOLO; Feature extraction; Autonomous aerial vehicles; Accuracy; Real-time systems; Farming; UAV remote sensing; YOLOv4; semantic compression (G-SC); encoder-decoder; precision farming; leaf disease detection


Patch Attention U-Net for knee cartilage segmentation in magnetic resonance images


BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, vol. 106

Authors: Wang, Xiang; Shi, Cao. Affiliation: Qingdao Univ Sci & Technol Sch Informat Sci & Technol Qingdao 266000 Shandong Peoples R China

Knee cartilage segmentation in magnetic resonance images is a challenging task with significant clinical implications for the diagnosis and treatment of osteoarthritis. Recent advances in deep convolutional neural networks (CNNs) have shown promise in improving the accuracy of knee segmentation. However, CNNs often struggle with small samples with irregular shapes, which may result in the omission of important features such as cartilage. In this study, we propose a novel network for knee cartilage segmentation that incorporates a Patch Attention (PA) block to improve the network's ability to detect small objects and a Feature Aggregation block to fuse the features from the same level of the encoder and the previous layer of the decoder. The PA block includes a Patch-based Channel-wise Attention block and a Patch-based Patch-wise Attention block, which capture intra-channel and intra-patch relationships, respectively. Our method is evaluated on two publicly available datasets, the 2010 Grand Challenge Knee Image Segmentation (SKI-10) dataset and the Osteoarthritis Initiative (OAI) dataset, and the results demonstrate that our approach achieves impressive performance in knee cartilage segmentation.

Keywords: Deep learning; encoder-decoder; Hybrid loss; Knee cartilage segmentation; Patch Attention


Two-Stage Residual Projection Network for Image Bit-Depth Enhancement


IEEE ACCESS, 2025, vol. 13, pp. 31215-31227

Authors: Liu, Zepeng; Wang, Yizong; Sang, Yayuan; Liu, Chao; Tian, Jiya. Affiliation: Xinjiang Inst Technol Sch Informat Engn Aksu 843100 Peoples R China

Low bit-depth (LBD) images produce stubborn false contour artifacts and make detailed information disappear, making bit-depth enhancement (BDE) a challenging task. Considering the mixture of structural distortions and real edges in LBD images, multi-scale features are crucial for the BDE tasks. However, existing CNN-based methods suffer from structural bottlenecks, which make it difficult to capture sufficient LBD features in a single-stage network. To overcome this issue, this paper proposes a two-stage residual projection network (TRPN) to explore the multi-scale features of BDE. An encoder-decoder structure based on alternating up and down sampling is proposed to learn wide context information in stage 1. In stage 2, a residual projection module based on dense connection is proposed to preserve the detailed texture as much as possible and avoid over-smoothing in non-flat regions, which is caused by alternating up and down sampling. To efficiently utilize multi-scale features, we introduce a supervised attention module that improves network ability by dynamically adjusting the attention weights within the model. Finally, extensive experiments demonstrate that our method achieves outstanding performance improvements both quantitatively and qualitatively, which illustrates its effectiveness.

Keywords: Feature extraction; Convolution; Image color analysis; Image reconstruction; Distortion; Deconvolution; Training; Signal processing algorithms; Colored noise; Visualization; Bit-depth enhancement; multi-scale features; encoder-decoder; alternating up and down sampling; residual projection
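To make the false-contour problem described in this abstract concrete, the sketch below quantizes a smooth 8-bit ramp to 4 bits and then restores it with plain zero padding. Zero padding is a naive baseline chosen here only for illustration; it is not the TRPN method, and the function names and bit depths are assumptions.

```python
def quantize(value, high_bits=8, low_bits=4):
    """Drop the least significant bits to get a low bit-depth (LBD) value."""
    return value >> (high_bits - low_bits)


def zero_pad(value, high_bits=8, low_bits=4):
    """Naive restoration: shift back and fill the lost bits with zeros."""
    return value << (high_bits - low_bits)


# A smooth horizontal gradient with every 8-bit intensity from 0 to 255.
ramp = list(range(256))
lbd = [quantize(v) for v in ramp]
restored = [zero_pad(v) for v in lbd]

# The 256 smooth levels collapse into a small number of flat steps, which
# appear as visible bands (false contours) in a real image.
print("distinct levels after restoration:", len(set(restored)))
print("largest absolute error:", max(abs(a - b) for a, b in zip(ramp, restored)))
```

A learned bit-depth enhancement model aims to fill in the lost low-order bits so that flat steps become smooth again without blurring real edges.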


Tissue segmentation for traumatic brain injury based on multimodal MRI image fusion-semantic segmentation


BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, vol. 99

Authors: Xu, Yao; Chen, Zhongmin; Wang, Xiaohui; Jiang, Shanghai; Wang, Fuping; Lu, Hong. Affiliations: Chongqing Univ Technol Sch Pharm & Bioengn Chongqing 400054 Peoples R China; Chongqing Univ Technol Peoples Hosp Chongqing 7 Cent Hosp Dept Med Imaging Chongqing 400054 Peoples R China; Univ Elect Sci & Technol China Chengdu 611731 Peoples R China; Chongqing Univ Technol Chongqing Key Lab Opt Fiber Sensor & Photoelect De Chongqing 400051 Peoples R China

Accurate segmentation of traumatic brain injury (TBI) has great significance for physicians to diagnose and assess a patient's condition. The utilization of multimodal information plays a critical role in TBI segmentation. However, most of the existing methods mainly focus on direct extraction and selection of deep semantic features, whereas in this paper, we use image fusion as an auxiliary task for feature learning based on multimodal feature extraction to achieve more sufficient fusion of multimodal features. Therefore, we design a multimodal image fusion-semantic segmentation based framework. The proposed approach mainly consists of a semantic encoder module, a semantic segmentation module and an image fusion module. The semantic encoder compresses the input image into a smaller feature space to extract semantic features. The semantic segmentation module utilizes both the detailed information extracted by the encoder and the semantic information of high-level features extracted from the semantic segmentation module to generate the segmentation results. The image fusion module fuses semantic feature information from different modalities as an auxiliary task to semantic segmentation. Furthermore, to enhance the model's performance even further, an uncertainty-based approach is employed, which dynamically adjusts the loss weights for the image fusion task and the semantic segmentation task during the model training process. The proposed method is evaluated on a private dataset, and compared with other widely recognized methods. It demonstrates outstanding performance in both Dice score and Recall metrics.

Keywords: Traumatic brain injury segmentation; Image fusion; Multimodal MRI; encoder-decoder; Deep learning
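The Dice score and Recall metrics mentioned in this abstract can be computed from binary masks as in the minimal sketch below. Masks are assumed to be flattened lists of 0/1 values, and the function name and example masks are illustrative, not taken from the paper.

```python
def dice_and_recall(pred, target):
    """Dice score and recall for two flattened binary masks of equal length.

    Dice   = 2*TP / (2*TP + FP + FN)
    Recall = TP / (TP + FN)
    Both default to 1.0 when their denominator is zero (empty masks).
    """
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    dice_denom = 2 * tp + fp + fn
    dice = 2 * tp / dice_denom if dice_denom else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, recall


# Made-up 4-pixel masks: one true positive, one false positive, one false negative.
print(dice_and_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```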


Investigation of data-driven model predictive control for liquid nitrogen cooling on helium refrigerator


FUSION ENGINEERING AND DESIGN, 2025, vol. 211

Authors: Yu, Qiang; Zhou, Zhiwei; Yuan, Kai; Li, Shanshan; Zhu, Zhigang; Zhuang, Ming. Affiliations: Chinese Acad Sci Inst Plasma Phys Hefei Inst Phys Sci Hefei 230031 Peoples R China; Univ Sci & Technol China Hefei 230026 Peoples R China

The helium refrigerator, a critical piece of infrastructure for the fusion device, must be well controlled and kept stable. During the operation of one refrigerator in the Comprehensive Research Facility for Fusion Technology, a continuous oscillation behavior was observed in the liquid nitrogen (LN2) cooling system. This paper explores a data-driven Model Predictive Control (MPC) scheme for the LN2 cooling control. Modeling the complex system dynamics under the oscillation disturbance is achieved by an encoder-decoder recurrent neural network, which provides an end-to-end implementation for multistep prediction. The data-driven MPC applies the particle swarm optimization algorithm to find the optimal control actions, in which a novel particle initialization method is adopted to improve the search efficiency. The performance of the data-driven MPC is evaluated by closed-loop simulation, and the simulation results indicate that the disturbance can be effectively restrained. The proposed scheme shows promising prospects for extension, such as smoothing the pulse heat load disturbance in the fusion cryogenic system.

Keywords: Model predictive control; encoder-decoder; Recurrent neural network; Liquid nitrogen cooling; Helium refrigerator
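The idea of using particle swarm optimization (PSO) to pick control actions over a prediction horizon can be sketched as follows. Everything concrete here is an assumption for illustration: the first-order plant in `rollout_cost` stands in for the paper's recurrent neural network predictor, the particles are initialized uniformly at random rather than with the paper's initialization method, and all parameter values are made up.

```python
import random


def rollout_cost(controls, x0=0.0, setpoint=1.0, a=0.9, b=0.5, r=0.01):
    """Cost of a control sequence on a hypothetical plant x' = a*x + b*u.

    Penalizes the squared tracking error plus a small control effort term.
    """
    x, cost = x0, 0.0
    for u in controls:
        x = a * x + b * u
        cost += (x - setpoint) ** 2 + r * u ** 2
    return cost


def pso_minimize(cost_fn, dim, bounds=(-5.0, 5.0), n_particles=30,
                 n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with uniform random initialization."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost_fn(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the allowed control range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = cost_fn(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


# Optimize a 3-step control sequence; in MPC only the first action would be
# applied before re-planning at the next time step.
best_u, best_cost = pso_minimize(rollout_cost, dim=3)
print("first control action:", best_u[0], "cost:", best_cost)
```

PSO needs only cost evaluations, not gradients, which is one reason it pairs naturally with a black-box learned predictor.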


CE-RoadNet: A Cascaded Efficient Road Network for Road Extraction from High-Resolution Satellite Images


REMOTE SENSING, 2025, vol. 17, no. 5, p. 831

Authors: Cheng, Ke-Nan; Ni, Weiping; Zhang, Han; Wu, Junzheng; Xiao, Xiao; Yang, Zhigang. Affiliations: Northwest Inst Nucl Technol Xian 710024 Peoples R China; Northwestern Polytech Univ Sch Artificial Intelligence Opt & Elect iOPEN Xian 710072 Peoples R China

The reconstruction of road networks from high-resolution satellite images is of significant importance across a range of disciplines, including traffic management, vehicle navigation and urban planning. However, existing models are computationally demanding and memory-intensive due to their high model complexity, rendering them impractical in many real-world applications. In this work, we present Cascaded Efficient Road Network (CE-RoadNet), a novel neural network architecture which emphasizes the elegance and simplicity of its design, while also retaining a noteworthy level of performance in road extraction tasks. First, a simple encoder-decoder architecture (Effi-RoadNet) is proposed, which leverages smoothed dilated convolutions combined with an attention-guided feature fusion module to aggregate features from multiple levels. Subsequently, an extended variant termed CE-RoadNet is designed in a cascaded architecture to enhance the feature representation ability of the model. Benefiting from the concise network design and the prominent representational ability of the stacking mechanism, our network can accomplish better trade-offs between accuracy and efficiency. Extensive experiments on public road datasets demonstrate that our approach achieves state-of-the-art results with lower complexity. All codes and models will be released soon to facilitate reproduction of our results.

Keywords: road extraction; remote sensing image; encoder-decoder
