The reliability of multimedia is increasingly challenged by sophisticated image manipulation techniques, which has led to the emergence of the Image Manipulation Localization (IML) domain. An effective manipulation localization model must extract non-semantic features that differ between manipulated and authentic regions in order to exploit manipulation artifacts, which calls for explicit comparisons between the two areas. Existing models use handcrafted features, convolutional neural networks (CNNs), or a combination of both. Handcrafted feature methods presuppose the type of tampering, which limits their ability to handle diverse tampering operations. CNNs model semantic information, which is insufficient for capturing manipulation artifacts. To address these limitations, we design a dual-branch model that combines handcrafted noise features and CNNs in an attention-powered encoder-decoder (ED) architecture. One branch takes noise features and the other takes the RGB image. Both are fed to the ED architecture for semantic learning, with skip connections deployed to retain spatial information. The architecture further uses channel-spatial attention to strengthen and refine the feature representation. Extensive experiments on shallowfake datasets (CASIA, COVERAGE, COLUMBIA, NIST16) and the deepfake dataset FaceForensics++ (FF++) demonstrate superior feature extraction capability and performance compared with various baseline models, with AUC scores reaching 99%. The method is also one of the first to perform localization on a deepfake dataset. The model is relatively lightweight, with 38 million parameters, and easily outperforms other state-of-the-art (SoTA) models.
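As a rough illustration of two ingredients named above, the sketch below computes a handcrafted noise residual and applies a parameter-free channel-then-spatial attention in the spirit of CBAM. Several choices here are assumptions: the residual is a 3x3 local-mean high-pass filter, because the abstract does not specify which noise filter is used. The learned MLP and convolution layers of real attention blocks are omitted. The two branches are naively concatenated, although the model itself feeds them separately into the ED network. All function names are hypothetical.

```python
import math
from typing import List

Channel = List[List[float]]  # H x W
Tensor = List[Channel]       # C x H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def noise_residual(ch: Channel) -> Channel:
    """Assumed high-pass filter: pixel minus its 3x3 neighbourhood mean (edges clamped)."""
    h, w = len(ch), len(ch[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            nb = [ch[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            row.append(ch[i][j] - sum(nb) / 9.0)
        out.append(row)
    return out


def channel_spatial_attention(x: Tensor) -> Tensor:
    """Parameter-free channel attention followed by spatial attention."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Channel attention: one weight per channel from global average + max pooling.
    ch_w = []
    for k in range(c):
        vals = [v for row in x[k] for v in row]
        ch_w.append(sigmoid(sum(vals) / len(vals) + max(vals)))
    y = [[[x[k][i][j] * ch_w[k] for j in range(w)] for i in range(h)] for k in range(c)]
    # Spatial attention: one weight per pixel from mean + max across channels.
    for i in range(h):
        for j in range(w):
            col = [y[k][i][j] for k in range(c)]
            s = sigmoid(sum(col) / c + max(col))
            for k in range(c):
                y[k][i][j] *= s
    return y


# Dual-branch input: RGB channels plus the noise residual of each channel.
rgb = [[[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]] for _ in range(3)]
noise = [noise_residual(ch) for ch in rgb]
fused = channel_spatial_attention(rgb + noise)  # naive channel concatenation (simplification)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 6 3 3
```

Because every attention weight is a sigmoid output in (0, 1), the block can only rescale features. It cannot create new ones, so what it emphasizes depends entirely on what the noise and RGB branches provide.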
ISBN (print): 9798350359329; 9798350359312
Extracting semantic information from remote sensing (RS) images has gained attention for its wide applications in defense, disaster management, and urban planning. Captioning RS images is challenging due to intricate properties such as resolution, color bands, and object types. Generating precise captions requires domain expertise, and manual annotation is time-consuming. The common approach is an encoder-decoder framework for RS image captioning, in which an input image is encoded into a feature vector and then decoded into a caption. Selecting the right image encoder is vital for optimizing caption prediction systems in specific domains. Convolutional neural network (CNN) based encoders are recognized for extracting crucial image features, but variations in their mechanisms and architectures must be assessed carefully. This paper thoroughly examines various CNNs to evaluate their effectiveness in RS image captioning. We also explore the performance of two caption generation techniques: greedy search and beam search. The encoders are clustered into good, medium, and bad groups, with ResNet emerging as the preferred choice in the good cluster across all considered datasets. The choice between beam search and greedy search has minimal impact. Additionally, we conduct a subjective evaluation of leading models to address the limitations of purely numerical assessments. This paper is a novel contribution, providing the first-of-its-kind subjective evaluation of CNN-based encoders for the RS image captioning task.
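The following minimal sketch makes the greedy/beam-search distinction concrete. It uses a hypothetical toy next-word table in place of a trained caption decoder. There is no image encoder, captions have a fixed length, and there is no end-of-sequence token. Greedy search commits to the most probable word at each step. Beam search keeps the `beam_width` best partial captions, ranked by cumulative log-probability.

```python
import math
from typing import Dict, Tuple

Seq = Tuple[str, ...]

# Hypothetical toy "decoder": next-word probabilities given the words so far.
TOY_MODEL: Dict[Seq, Dict[str, float]] = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def greedy_search(model: Dict[Seq, Dict[str, float]], steps: int) -> Seq:
    seq: Seq = ()
    for _ in range(steps):
        probs = model[seq]
        seq += (max(probs, key=probs.get),)  # keep only the single best word
    return seq


def beam_search(model: Dict[Seq, Dict[str, float]], steps: int, beam_width: int) -> Seq:
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + (word,), score + math.log(p))
            for seq, score in beams
            for word, p in model[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # keep the best `beam_width` partial captions
    return beams[0][0]


print(greedy_search(TOY_MODEL, 2))   # ('a', 'x'), joint probability 0.33
print(beam_search(TOY_MODEL, 2, 2))  # ('b', 'x'), joint probability 0.36
```

With a beam width of 1 the two procedures coincide. The abstract's finding that the choice matters little on RS data is an empirical result, not something this toy implies.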
Recently, Transformers have been widely explored in object tracking and have shown state-of-the-art (SOTA) performance compared with convolutional neural networks (CNNs). In particular, single-object trackers based on pure Transformer and "CNN+Transformer" frameworks have achieved great success in both accuracy and speed. However, most methods do not fully exploit the temporal and spatial information of targets. Furthermore, the potential of trackers for spatial information interaction and propagation between the search area and templates remains underexplored. Both issues limit further improvement in tracking performance. To address them, we propose a multiscale cascaded single-object tracking framework based on spatio-temporal information fusion (STIF). It integrates the temporal and spatial information of targets more comprehensively and enables deeper interaction between the search area and the templates. Specifically, to establish extensive spatio-temporal feature correlations, we introduce the STIF network. It uses a Transformer-based encoder-decoder structure to cross-fuse the global nonlinear temporal and spatial information of the target search area with static and dynamic templates, effectively performing fusion-based propagation. To exploit rich spatial semantic information, we design a multiscale feature extraction (MFE) network and a feature cascade aggregation (FCA) module based on the encoder-decoder structure, which together carry out interaction-based propagation. Finally, a bounding box prediction head predicts the exact location of the target, and an IoU score head is used to update the dynamic templates. Extensive experiments demonstrate that our method attains better tracking performance than the baseline method. The proposed method also obtains results comparable to other SOTA trackers on six challenging benchmarks, including GOT-10k, TrackingNet, LaSOT, UAV123, NFS30, and OTB100, while running...
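The abstract uses an IoU score head to decide when to refresh the dynamic template. Below is a minimal sketch of that idea. It assumes axis-aligned boxes in (x1, y1, x2, y2) form and a hypothetical fixed confidence threshold of 0.7, because the paper's actual update rule is not given here. `iou` is the standard intersection-over-union that such a head is trained to predict. The hypothetical `update_dynamic_template` swaps in the new template only when the predicted score is high enough.

```python
from typing import Tuple, TypeVar

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
T = TypeVar("T")


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_dynamic_template(template: T, candidate: T, predicted_iou: float,
                            threshold: float = 0.7) -> T:
    """Hypothetical rule: replace the dynamic template only when the IoU head is confident."""
    return candidate if predicted_iou >= threshold else template


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```

Gating the update this way keeps an occluded or drifted frame from overwriting a good template. The static template is never replaced.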
The impact of emergency events on the stock market cannot be underestimated, as their unpredictability poses significant challenges to investors' stock operations. Researchers and investors therefore need more effective features and sound methods to mitigate risk. For multi-feature prediction methods, analyzing the correlation between multi-dimensional features or data remains a challenging issue. This paper proposes a stock market index prediction framework based on an encoder-decoder architecture (MF-EDNet). The framework uses the dynamic correlation between stock data and futures data as prior knowledge. It integrates features of both internal sequences (industry indices) and external sequences (futures data) to capture the impact of emergency events on the stock market. The newly proposed Multi-Dimensional Convolutional Attention Module (MCAM) further enhances the feature extraction and attention capabilities of the attention mechanism. Experiments on multiple industry indices in the Chinese stock market demonstrate that MF-EDNet effectively extracts important features from stock and futures data and achieves good predictive performance under emergency events. Compared with previous state-of-the-art methods, MF-EDNet achieves improvements of 35.8% and 22.9% in the Matthews correlation coefficient (MCC), a 3.3% increase in accuracy (ACC), and a 7.86% increase in profit.
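MCC is the headline metric in the results above. For reference, here is a minimal sketch of how MCC and accuracy are computed from a binary (for example, up/down) confusion matrix. The counts are made up for illustration and are not data from the paper. Returning 0.0 when MCC is undefined is a common convention, not something the abstract specifies.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical up/down predictions for 100 trading days (not data from the paper).
print(round(mcc(40, 30, 20, 10), 4), accuracy(40, 30, 20, 10))  # 0.4082 0.7
```

MCC ranges from -1 to 1 and is less sensitive than accuracy to an imbalanced split between up and down days. This is one reason to report both metrics.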
Common leaf diseases pose severe problems for the agricultural industry, particularly for paddy rice, a staple crop consumed worldwide, making early detection and rapid prevention crucial for maintaining both quality and yield. This research focuses on an object detection framework for identifying and localising paddy leaf diseases. Unmanned Aerial Vehicles (UAVs) offer benefits such as reduced deployment costs, increased availability, enhanced operability, and improved geographical and temporal resolution. You Only Look Once (YOLO) models excel at detecting diseased regions but are computationally expensive. A key challenge in UAV sensing is the resource-efficient collection and transmission of this high-resolution ground data, and the detection of disease from it. This research addresses these issues by introducing a Graph-inspired encoder-decoder Semantic Compression (G-SC) scheme coupled with an enhanced YOLOv4 architecture for disease detection in paddy agronomy. The proposed R-UAV-Net is an improved YOLOv4 architecture that incorporates various spatial and channel feature extraction blocks with attention mechanisms to advance precision farming. R-UAV-Net outperformed state-of-the-art (SOTA) techniques, showing a 0.69% improvement in mean average precision (mAP) and a 0.12 increase in F1 score over the best-performing leaf detection model.
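The reported gains are in mAP and F1. As a hedged reference, this sketch computes F1 from precision and recall, and a per-class average precision (AP) from ranked detections. mAP is then the mean of AP over disease classes. The sketch assumes detections have already been matched to ground truth at some IoU threshold (the `is_tp` flags). It uses VOC-style all-point interpolation, which may differ from the evaluation protocol used in the paper.

```python
from typing import List


def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def average_precision(scores: List[float], is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class (VOC-style, assumed here)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):  # make precision non-increasing in recall
        mpre[k] = max(mpre[k], mpre[k + 1])
    return sum((mrec[k] - mrec[k - 1]) * mpre[k] for k in range(1, len(mrec)))


# Three hypothetical detections of one disease class, two ground-truth lesions.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```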
Knee cartilage segmentation in magnetic resonance images is a challenging task with significant clinical implications for the diagnosis and treatment of osteoarthritis. Recent advances in deep convolutional neural networks (CNNs) have shown promise in improving the accuracy of knee segmentation. However, CNNs often struggle with small, irregularly shaped targets, which may cause important structures such as cartilage to be missed. In this study, we propose a novel network for knee cartilage segmentation. It incorporates a Patch Attention (PA) block to improve the network's ability to detect small objects, and a Feature Aggregation block to fuse features from the same encoder level with those from the previous decoder layer. The PA block consists of a Patch-based Channel-wise Attention block and a Patch-based Patch-wise Attention block, which capture intra-channel and intra-patch relationships, respectively. Our method is evaluated on two publicly available datasets: the 2010 Grand Challenge Knee Image Segmentation (SKI-10) dataset and the Osteoarthritis Initiative (OAI) dataset. The results demonstrate that our approach achieves strong performance in knee cartilage segmentation.
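One plausible reading of the Patch-based Channel-wise Attention block is that channel weights are pooled per patch rather than over the whole feature map. That way, a small structure such as thin cartilage is not averaged away by a large background. The sketch below illustrates only that idea, under our own assumptions: parameter-free sigmoid weights computed from the patch mean, non-overlapping square patches, and a hypothetical function name. The paper's actual block and its patch-wise counterpart are not reproduced.

```python
import math
from typing import List

Tensor = List[List[List[float]]]  # C x H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def patch_channel_attention(x: Tensor, patch: int) -> Tensor:
    """Re-weight each channel separately inside each non-overlapping patch."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % patch == 0 and w % patch == 0, "feature map must tile into patches"
    out = [[row[:] for row in ch] for ch in x]
    for pi in range(0, h, patch):
        for pj in range(0, w, patch):
            for k in range(c):
                vals = [x[k][i][j] for i in range(pi, pi + patch) for j in range(pj, pj + patch)]
                weight = sigmoid(sum(vals) / len(vals))  # local, not global, pooling
                for i in range(pi, pi + patch):
                    for j in range(pj, pj + patch):
                        out[k][i][j] = x[k][i][j] * weight
    return out


# One channel, 4x4: a small bright structure confined to the top-left 2x2 patch.
feat = [[[4.0, 4.0, 1.0, 1.0],
         [4.0, 4.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0]]]
out = patch_channel_attention(feat, patch=2)
```

With global pooling, the top-left structure would receive the same weight as the background. Pooling per patch gives it a larger weight, sigmoid(4) instead of sigmoid(1).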
Low bit-depth (LBD) images suffer from stubborn false contour artifacts and loss of detail, making bit-depth enhancement (BDE) a challenging task. Because LBD images contain a mixture of structural distortions and real edges, multi-scale features are crucial for BDE. However, existing CNN-based methods suffer from structural bottlenecks that make it difficult to capture sufficient LBD features in a single-stage network. To overcome this issue, this paper proposes a two-stage residual projection network (TRPN) that explores multi-scale features for BDE. In stage 1, an encoder-decoder structure based on alternating up- and down-sampling learns wide context information. In stage 2, a residual projection module based on dense connections preserves as much detailed texture as possible. It also avoids the over-smoothing in non-flat regions that alternating up- and down-sampling causes. To use multi-scale features efficiently, we introduce a supervised attention module that improves the network's capability by dynamically adjusting the attention weights within the model. Finally, extensive experiments demonstrate that our method achieves outstanding performance improvements both quantitatively and qualitatively, illustrating its effectiveness.
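To see where false contours come from, the sketch below requantizes a smooth 8-bit ramp to 4 bits. It then restores the values with two classic, non-learned BDE baselines: zero padding and bit replication. The sketch only illustrates the problem that TRPN addresses. The bit depths are example values, and none of this is part of the proposed network.

```python
def quantize(v: int, high_bits: int = 8, low_bits: int = 4) -> int:
    """Drop the least-significant bits: the low bit-depth (LBD) observation."""
    return v >> (high_bits - low_bits)


def zero_padding(q: int, high_bits: int = 8, low_bits: int = 4) -> int:
    """Naive BDE baseline: shift back up and fill the missing bits with zeros."""
    return q << (high_bits - low_bits)


def bit_replication(q: int, high_bits: int = 8, low_bits: int = 4) -> int:
    """Classic BDE baseline: repeat the LBD bit pattern into the missing bits."""
    out, filled = 0, 0
    while filled < high_bits:
        shift = high_bits - filled - low_bits
        out |= (q << shift) if shift >= 0 else (q >> -shift)
        filled += low_bits
    return out


ramp = list(range(256))            # a smooth 8-bit gradient
lbd = [quantize(v) for v in ramp]  # collapses to 16 flat bands, i.e. false contours
print(len(set(lbd)), zero_padding(15), bit_replication(15))  # 16 240 255
```

Both baselines map each of the 16 bands to a single value, so the gradient remains a staircase. Learned BDE methods try to recover the missing in-between values.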
Accurate segmentation of traumatic brain injury (TBI) is of great significance in helping physicians diagnose and assess a patient's condition. The use of multimodal information plays a critical role in TBI segmentation. However, most existing methods focus on the direct extraction and selection of deep semantic features. In this paper, we instead use image fusion as an auxiliary task for feature learning on top of multimodal feature extraction, achieving a more thorough fusion of multimodal features. To this end, we design a framework that combines multimodal image fusion with semantic segmentation. The proposed approach consists mainly of a semantic encoder module, a semantic segmentation module, and an image fusion module. The semantic encoder compresses the input image into a smaller feature space to extract semantic features. The semantic segmentation module generates the segmentation results from two sources: the detailed information extracted by the encoder, and high-level semantic features extracted within the segmentation module itself. The image fusion module fuses semantic feature information from different modalities as an auxiliary task to semantic segmentation. To further improve performance, an uncertainty-based approach dynamically adjusts the loss weights of the image fusion and semantic segmentation tasks during training. The proposed method is evaluated on a private dataset and compared with other widely recognized methods, demonstrating outstanding performance in both Dice score and recall.
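The abstract does not spell out the uncertainty-based loss weighting. A common form, from homoscedastic-uncertainty multi-task learning, is sketched below as an assumption, together with the Dice and recall metrics used in the evaluation. Masks are flattened binary lists. Returning 1.0 for empty masks is our own convention.

```python
import math
from typing import List, Sequence


def uncertainty_weighted_loss(losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2) learned per task (assumed form)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def dice(pred: List[int], gt: List[int]) -> float:
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 1.0


def recall(pred: List[int], gt: List[int]) -> float:
    tp = sum(p & g for p, g in zip(pred, gt))
    positives = sum(gt)
    return tp / positives if positives else 1.0


# Hypothetical per-task losses: [image fusion, segmentation].
print(uncertainty_weighted_loss([0.5, 0.25], [0.0, 0.0]))  # equal weights: 0.75
print(dice([1, 1, 0, 0], [1, 0, 1, 0]), recall([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5 0.5
```

Minimizing the weighted loss over each s_i gives s_i = log L_i, so the effective weight exp(-s_i) becomes 1 / L_i. A task whose loss stays large is therefore automatically down-weighted.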
The helium refrigerator, a critical piece of infrastructure for fusion devices, must be well controlled and kept stable. During the operation of one refrigerator at the Comprehensive Research Facility for Fusion Technology, a continuous oscillation was observed in the liquid nitrogen (LN2) cooling system. This paper explores a data-driven Model Predictive Control (MPC) scheme for LN2 cooling control. The complex system dynamics under the oscillation disturbance are modeled by an encoder-decoder recurrent neural network, which provides an end-to-end implementation of multistep prediction. The data-driven MPC applies the particle swarm optimization (PSO) algorithm to find the optimal control actions, using a novel particle initialization method to improve search efficiency. The performance of the data-driven MPC is evaluated by closed-loop simulation, and the results indicate that the disturbance can be effectively suppressed. The proposed scheme shows promise for further applications, such as smoothing pulsed heat load disturbances in the fusion cryogenic system.
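Below is a compact sketch of the receding-horizon loop described above, under clearly stated assumptions. A hypothetical first-order linear model, `predict`, replaces the encoder-decoder RNN. The cost is squared tracking error plus a small control-effort penalty. The optimizer is a textbook global-best particle swarm optimization with input bounds. The paper's novel particle initialization is not reproduced. The optional `warm_start` particle is only a common stand-in.

```python
import random
from typing import List, Optional, Tuple


def predict(y0: float, controls: List[float], a: float = 0.8, b: float = 0.5) -> List[float]:
    """Hypothetical first-order plant standing in for the encoder-decoder RNN predictor."""
    ys, y = [], y0
    for u in controls:
        y = a * y + b * u
        ys.append(y)
    return ys


def mpc_cost(y0: float, controls: List[float], ref: float, lam: float = 0.01) -> float:
    tracking = sum((y - ref) ** 2 for y in predict(y0, controls))
    effort = lam * sum(u * u for u in controls)
    return tracking + effort


def pso_mpc(y0: float, ref: float, horizon: int = 5, u_min: float = 0.0, u_max: float = 2.0,
            n_particles: int = 20, iters: int = 100,
            warm_start: Optional[List[float]] = None, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients
    pos = [[rng.uniform(u_min, u_max) for _ in range(horizon)] for _ in range(n_particles)]
    if warm_start is not None:  # e.g. the previous step's solution (our assumption)
        pos[0] = list(warm_start)
    vel = [[0.0] * horizon for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [mpc_cost(y0, p, ref) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(horizon):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], u_min), u_max)  # respect bounds
            cost = mpc_cost(y0, pos[i], ref)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


plan, cost = pso_mpc(y0=0.0, ref=1.0)
u_now = plan[0]  # receding horizon: apply only the first move, then re-plan
```

At every control step, only the first planned input is applied. The optimization then repeats from the newly measured state, which is what lets MPC correct for model error and disturbances.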
The reconstruction of road networks from high-resolution satellite images is of significant importance across a range of disciplines, including traffic management, vehicle navigation, and urban planning. However, existing models are computationally demanding and memory-intensive due to their high complexity, rendering them impractical for many real-world applications. In this work, we present the Cascaded Efficient Road Network (CE-RoadNet), a novel neural network architecture that emphasizes simplicity of design while retaining strong performance in road extraction tasks. First, we propose a simple encoder-decoder architecture (Effi-RoadNet), which uses smoothed dilated convolutions combined with an attention-guided feature fusion module to aggregate features from multiple levels. Subsequently, an extended variant, CE-RoadNet, is designed with a cascaded architecture to enhance the feature representation ability of the model. Owing to the concise network design and the strong representational ability of the stacking mechanism, our network achieves a better trade-off between accuracy and efficiency. Extensive experiments on public road datasets demonstrate that our approach achieves state-of-the-art results with lower complexity. All code and models will be released soon to facilitate reproduction of our results.
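Dilated convolutions enlarge the receptive field but suffer from gridding: with dilation 2, each output sees inputs of only one parity. The sketch below shows this on a 1-D impulse, and shows how a small smoothing filter applied before the dilated convolution removes it. The moving-average pre-filter, with width equal to the dilation rate, is our assumption for illustration. The abstract does not specify how CE-RoadNet smooths its dilated convolutions.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN layers)."""
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[i + t * dilation] for t, k in enumerate(kernel))
            for i in range(len(x) - span)]


def smooth(x: List[float], width: int) -> List[float]:
    """'Valid' moving average: lets neighbouring inputs share information."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def smoothed_dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    return dilated_conv1d(smooth(x, dilation), kernel, dilation)


impulse = [0.0] * 12
impulse[5] = 1.0
plain = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2)
smoothed = smoothed_dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2)
print([i for i, v in enumerate(plain) if v])     # [1, 3, 5]: only odd outputs react
print([i for i, v in enumerate(smoothed) if v])  # [0, 1, 2, 3, 4, 5]
```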