Task scheduling, which is important in cloud computing, is one of the most challenging issues in this area. Hence, an efficient and reliable task scheduling approach is needed to produce more efficient resource employ...
With the continuous advancement of satellite technology, remote sensing images have been increasingly applied in fields such as urban planning, environmental monitoring, and disaster response. However, remote sensing images often feature small targets and complex backgrounds, posing significant computational challenges for object detection tasks. To address this issue, this paper proposes a lightweight object detection algorithm for remote sensing images based on YOLOv9. The proposed algorithm incorporates the SimRMB module, which reduces computational complexity while improving the efficiency and accuracy of feature extraction. Through a dynamic attention mechanism, SimRMB focuses on important regions while minimizing background interference, and by integrating residual learning and skip connections, it keeps deep networks stable. To further enhance detection performance, the FasterRepNCSPELAN4 module is introduced, which employs PConv operations to reduce computational load and memory usage. It also uses dilated convolutions and DFC attention mechanisms to strengthen feature extraction, thereby increasing the efficiency and accuracy of object detection. Additionally, this study integrates the GhostModuleV2 module, which generates core feature maps and employs lightweight operations to create redundant features, greatly reducing the computational complexity of *** results show that on the SIMD dataset, the improved YOLOv9 model has a parameter size of 167.88 MB and 208.6 GFLOPs. Compared to the baseline YOLOv9 model (parameter size: 194.57 MB, GFLOPs: 239.0), the parameter size is reduced by 13.71%, GFLOPs are reduced by 12.72%, and detection accuracy is improved by 1.4%. These results demonstrate that the proposed lightweight YOLOv9 model reduces computational overhead while maintaining excellent detection performance, providing an efficient solution for object detection tasks in resou
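The reductions reported in the abstract above can be checked from the quoted figures alone. The following is a minimal Python sketch; the helper name `percent_reduction` is an illustrative assumption and does not come from the paper.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0

# Figures quoted in the abstract (baseline YOLOv9 vs. the lightweight variant).
param_reduction = percent_reduction(194.57, 167.88)   # parameter size in MB
gflops_reduction = percent_reduction(239.0, 208.6)    # GFLOPs

print(f"parameter size reduced by {param_reduction:.2f}%")
print(f"GFLOPs reduced by {gflops_reduction:.2f}%")
```

Both values land at roughly 13.7% and 12.7%, in line with the abstract; a difference in the last digit of the parameter figure would be explained by truncation rather than rounding.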
In the realm of medical diagnostics, particularly in differential diagnosis, where differentiating between illnesses or ailments with comparable symptoms is essential, deep learning has gained importance. Recent devel...
Decentralized Anonymous Payment Systems (DAP), often known as cryptocurrencies, stand out as some of the most innovative and successful applications on the blockchain. These systems have garnered significant attention...
Dear Editor, This letter focuses on how an attacker can design a suitable improved zero-dynamics (ZD) attack signal based on state estimates of the target system. The improved ZD attack changes the zero-dynamics gain matrix of the attack signal to a matrix with determinant greater than 1.
The proliferation of cooking videos on the internet calls for converting this lengthy video content into concise text recipes. Many online platforms now host large numbers of cooking videos, and viewers find it challenging to extract complete recipes from lengthy visual content. Effective summarization is needed to translate the culinary knowledge in videos into text recipes that are easy to read and follow. This makes cooking easier for people looking for precise step-by-step instructions. Such a system serves a broad spectrum of learners while also improving accessibility and ease of use. As the need for easy-to-follow recipes derived from cooking videos grows, researchers are looking into automated summarization using advanced techniques. One such approach is presented in our work, which combines simple image-based models, audio processing, and GPT-based models to create a system that makes it easier to turn long culinary videos into in-depth recipe texts. A systematic workflow is adopted to achieve this objective. First, frame summary generation employs a combination of two convolutional neural networks and a GPT-based model. A pre-trained CNN model, Inception-V3, is fine-tuned on a food image dataset for dish recognition, and a custom CNN is built with ingredient images for ingredient recognition. A GPT-based model then combines the results of the two CNN models to produce the frame summary in the desired format. Next, audio summary generation is handled by performing speech-to-text conversion in Python. A GPT-based model is then used to summarize the resulting transcript in the desired format. Finally, to refine the summaries obtained from visual and auditory content, another GPT-based model is used
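The three-stage workflow described in this abstract (frame summary, audio summary, final refinement) can be sketched as plain function composition. This is only an illustration of the data flow, not the authors' implementation: every function name below is hypothetical, the models are replaced by toy stand-ins, and a single `llm` callable stands in for the separate GPT-based models the abstract mentions.

```python
from typing import Callable, List, Sequence

def summarize_frames(frames: Sequence[str],
                     recognize_dish: Callable[[str], str],
                     recognize_ingredients: Callable[[str], List[str]],
                     llm: Callable[[str], str]) -> str:
    """Stage 1: merge per-frame outputs of the two recognizers into one summary."""
    dishes = sorted({recognize_dish(f) for f in frames})
    ingredients = sorted({i for f in frames for i in recognize_ingredients(f)})
    return llm(f"dishes={dishes}; ingredients={ingredients}")

def summarize_audio(audio: bytes,
                    transcribe: Callable[[bytes], str],
                    llm: Callable[[str], str]) -> str:
    """Stage 2: speech-to-text, then summarize the transcript."""
    return llm(transcribe(audio))

def build_recipe(frames, audio, recognize_dish, recognize_ingredients,
                 transcribe, llm) -> str:
    """Stage 3: refine the visual and audio summaries into one recipe text."""
    frame_summary = summarize_frames(frames, recognize_dish,
                                     recognize_ingredients, llm)
    audio_summary = summarize_audio(audio, transcribe, llm)
    return llm(f"{frame_summary} | {audio_summary}")

# Toy stand-ins (assumptions) so the sketch runs without models or audio libraries.
def fake_dish(frame: str) -> str:
    return "omelette"

def fake_ingredients(frame: str) -> List[str]:
    return ["egg", "butter"] if frame == "frame1" else ["egg", "salt"]

def fake_transcribe(audio: bytes) -> str:
    return "whisk the eggs and fry in butter"

def fake_llm(prompt: str) -> str:
    return f"<{prompt}>"

recipe = build_recipe(["frame1", "frame2"], b"", fake_dish,
                      fake_ingredients, fake_transcribe, fake_llm)
print(recipe)
```

Passing the recognizers, the transcriber, and the language model in as callables keeps the stages independent, so each stand-in could be swapped for a real model without changing the flow.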
The agricultural sector contributes significantly to greenhouse gas emissions, which cause global warming and climate change. Numerous mathematical models have been developed to predict the greenhouse gas emissions fr...
In recent decades, brain tumors have been regarded as a severe illness that causes significant damage to an individual's health and can ultimately result in death. Hence, Brain Tumor Segmentation and Classification (BTSC) has gained increasing attention in the research community. BTSC is the process of locating brain tumor tissues and classifying them by tumor type. Manual tumor segmentation is error-prone and time-consuming. In this manuscript, a precise and fast BTSC model is developed based on a transfer learning-based Convolutional Neural Network (CNN). A CNN variant is used because of its strong performance across distinct tasks. In the initial phase, Magnetic Resonance Imaging (MRI) brain images are acquired from the Brain Tumor Image Segmentation Challenge (BRATS) 2019, 2020, and 2021 databases. Image augmentation is then performed on the gathered images using zoom-in, rotation, zoom-out, flipping, scaling, and shifting, which reduces overfitting in the classification model. The augmented images are segmented using the layers of the Visual Geometry Group (VGG-19) model. Feature extraction using an Attribute Aware Attention (AWA) methodology is then carried out on the segmented images, following the segmentation block in the VGG-19 model. The crucial features are selected using the attribute category reciprocal attention phase. These features are fed to the Model Agnostic Concept Extractor (MACE) to generate relevance scores between the features, which assist the final classification. The relevance scores from MACE are provided to the max-pooling layer of the VGG-19 model, and the final classified output is obtained from the modified VGG-19 architecture. The implemented relevance-score and AWA-based VGG-19 model classifies the tumor as whole tumor, enhanced tumor, or tumor core. In the classification section, the proposed
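The augmentation step named in this abstract (flipping, rotation, shifting, and so on) can be illustrated on a tiny image. The sketch below is an assumption-laden toy: it represents an image as nested Python lists and covers only three of the listed transforms. It is not the authors' pipeline, which would operate on MRI volumes with a proper imaging library.

```python
from typing import List

Image = List[List[int]]  # tiny grayscale image as nested lists (illustrative only)

def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def shift_right(img: Image, k: int, fill: int = 0) -> Image:
    """Shift pixels k columns to the right, padding the left edge with `fill`."""
    shifted = []
    for row in img:
        k_eff = min(k, len(row))  # clamp so the output keeps the original width
        shifted.append([fill] * k_eff + row[:len(row) - k_eff])
    return shifted

def augment(img: Image) -> List[Image]:
    """Return the original image plus three augmented variants."""
    return [img, flip_horizontal(img), rotate_90_clockwise(img), shift_right(img, 1)]

sample = [[1, 2],
          [3, 4]]
variants = augment(sample)
for v in variants:
    print(v)
```

Each variant keeps the label of the source image, which is what lets augmentation enlarge the training set and reduce overfitting.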
In recent years, deep learning has significantly advanced skin lesion segmentation. However, annotating medical image data is specialized and costly, while obtaining unlabeled medical data is easier. To address this c...
Image captioning is an interdisciplinary research hotspot at the intersection of computer vision and natural language processing, representing a multimodal task that integrates core technologies from both fields. This...