Search Results - Inner Mongolia University Library

Automatic summarization of cooking videos using transfer learning and transformer-based models

Discover Artificial Intelligence, 2025, Vol. 5, No. 1, pp. 1-20

Authors: Sadique, P. M. Alen; Aswiga, R.V. School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu, Chennai 600127, India

The proliferation of cooking videos on the internet calls for converting lengthy video content into concise text recipes. Many online platforms now host large numbers of cooking videos, and viewers find it challenging to extract comprehensive recipes from lengthy visual content. Effective summarization is needed to translate the abundance of culinary knowledge found in videos into text recipes that are easy to read and follow. This makes the cooking process easier for individuals who are searching for precise step-by-step cooking instructions. Such a system satisfies the needs of a broad spectrum of learners while also improving accessibility and ease of use. As there is a growing need for easy-to-follow recipes made from cooking videos, researchers are looking into automated summarization using advanced techniques. One such approach is presented in our work, which combines simple image-based models, audio processing, and GPT-based models to create a system that makes it easier to turn long culinary videos into in-depth recipe texts. A systematic workflow is adopted to achieve this objective. Initially, the focus is on frame summary generation, which employs a combination of two convolutional neural networks and a GPT-based model. A pre-trained CNN model called Inception-V3 is fine-tuned on a food image dataset for dish recognition, and a custom-made CNN is built with ingredient images for ingredient recognition. A GPT-based model is then used to combine the results produced by the two CNN models, which gives the frame summary in the desired format. Subsequently, audio summary generation is tackled by performing speech-to-text conversion in Python. A GPT-based model is then used to generate a summary of the resulting textual representation of the audio in the desired format. Finally, to refine the summaries obtained from visual and auditory content, another GPT-based model is used

Keywords: Convolutional neural networks

Energy-Efficient Dynamic Configurable Datapath Architecture for IoT Devices

Journal of Communications and Information Networks, 2024, Vol. 9, No. 3, pp. 251-261

Authors: Ruizhe Zhang; Junhui Liu; Han Wang; Li Lu. Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

This paper introduces a novel RISC-V processor architecture designed for ultra-low-power and energy-efficient applications, particularly for Internet of things (IoT) *** architecture enables runtime dynamic reconfiguration of the datapath, allowing efficient balancing between computational performance and power *** is achieved through interchangeable components and clock gating mechanisms, which help the processor adapt to varying workloads. A prototype of the architecture was implemented on a Xilinx Artix 7 field programmable gate array (FPGA). Experimental results show significant improvements in power efficiency and *** mini configuration achieves an impressive reduction in power consumption, using only 36% of the baseline ***, the full configuration boosts performance by 8% over the *** flexible and adaptable nature of this architecture makes it highly suitable for a wide range of low-power IoT applications, providing an effective solution to meet the growing demands for energy efficiency in modern IoT devices.

Keywords: dynamic reconfiguration; Internet of things (IoT); power efficiency; RISC-V

Brain tumor segmentation and classification using transfer learning based CNN model with model agnostic concept interpretation

Multimedia Tools and Applications, 2025, Vol. 84, No. 5, pp. 2509-2538

Authors: Nancy, A. Maria; Maheswari, R. School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu, Chennai 632014, India

In recent decades, brain tumors have been regarded as a severe illness that causes significant damage to an individual's health and can ultimately result in death. Hence, Brain Tumor Segmentation and Classification (BTSC) has gained more attention among research communities. BTSC is the process of finding brain tumor tissues and classifying them by tumor type. Manual tumor segmentation is error-prone and time-consuming. A precise and fast BTSC model is developed in this manuscript based on a transfer learning-based Convolutional Neural Network (CNN) model. A variant of CNN is used because of its superiority in distinct tasks. In the initial phase, Magnetic Resonance Imaging (MRI) brain images are acquired from the Brain Tumor Image Segmentation Challenge (BRATS) 2019, 2020 and 2021 databases. Image augmentation is then performed on the gathered images using zoom-in, rotation, zoom-out, flipping, scaling, and shifting methods, which effectively reduce overfitting in the classification model. The augmented images are segmented using the layers of the Visual Geometry Group (VGG-19) model. Feature extraction using an Attribute Aware Attention (AWA) methodology is then carried out on the segmented images, following the segmentation block in the VGG-19 model. The crucial features are then selected using the attribute category reciprocal attention phase. These features are input to the Model Agnostic Concept Extractor (MACE) to generate relevance scores between the features, which assist the final classification process. The relevance scores obtained from the MACE are provided to the max-pooling layer of the VGG-19 model. The final classified output is then obtained from the modified VGG-19 architecture. The implemented relevance-score and AWA-based VGG-19 model is used to classify the tumor as whole tumor, enhanced tumor, or tumor core. In the classification section, the proposed

Keywords: Magnetic resonance imaging

MVP-MCTS: Language Modelling a Flexible Multi-Agent Planning Framework

4th International Symposium on Artificial Intelligence and Intelligent Manufacturing, AIIM 2024

Authors: Li, Zhuoyang; Ji, Jianmin. School of Computer Science and Technology, University of Science and Technology, Hefei, China

ISBN: (print) 9798331541729

Large language models have come under the spotlight in recent years for their seemingly multifaceted capabilities, which extend far beyond text processing. In particular, they have been shown to possess logical and reasoning capabilities, which have been augmented in various ways via the use of inference frameworks such as reasoning trees and planning graphs. Meanwhile, some studies have explored tasks that are not limited to single-agent scenarios, focusing on tasks in multi-agent settings. However, most of these works focus on text-based role-playing games, while long-term planning games have yet to be extensively explored. In this work we extend the reasoning and planning abilities of the language model to coordinate between multiple agents which have been tasked to achieve specific goals with the least amount of resource expenditure. We devise the framework MVP-MCTS: Multi-agent Value-coordination and Prompt-based Monte-Carlo Tree Search (MCTS), which integrates the prompting of a language model with a tree search procedure without the need for additional grounding. Our method relies on task-specific prompting and the innate world model of the LLM to perform search space simulation and decision-making. Our experiments show that our framework is able to outperform current state-of-the-art LLM-driven planning frameworks. © 2024 IEEE.

Keywords: Multi agent systems

Priority-Aware Task Offloading and UAV Trajectory Optimization for Aerial Access Network - Assisted Mobile Edge Computing

17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2024

Authors: Zhou, Yuqiang; Sun, Haifeng. School of Computer Science and Technology, Southwest University of Science and Technology, China

ISBN: (print) 9798331507398

With the surge in computational data, Mobile Edge Computing (MEC) is set to become a crucial technology for reducing communication latency and congestion. However, the widespread adoption of MEC faces several challenges. Aerial Access Networks (AANs), comprising hierarchical High Altitude Platforms (HAPs) and low-altitude Unmanned Aerial Vehicles (UAVs), offer a groundbreaking framework for MEC task offloading, particularly enhancing the service experience of Internet of Things (IoT) devices in disaster zones, battlefield healthcare, or remote areas. In this paper, we propose an MEC task offloading framework supported by AANs to serve IoT devices distributed on the ground. We define system gain based on the energy consumption and latency of tasks with varying priorities. Our objective is to maximize workload fairness among UAVs and overall system gain by jointly optimizing UAVs' flight trajectories, IoT devices' computational task offloading decisions, and service fairness. We propose a multi-agent proximal policy optimization (MAPPO)-based algorithm to solve this joint optimization problem. Experimental results validate the effectiveness of the proposed approach, and numerical analysis evaluates system performance. © 2024 IEEE.

Keywords: Unmanned aerial vehicles (UAV)

Camera-Radar Fusion With Feature Alignment: Adding Camera Texture to Radar

17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2024

Authors: Jian, Xin; Gao, Xiaoming; Dong, Wanli; Zhu, Zhilei. School of Computer Science and Technology, Southwest University of Science and Technology, China

ISBN: (print) 9798331507398

Radar can enhance target sensing capability after fusion with visible light to achieve all-weather target detection and identification due to lower requirements for weather and light conditions. However, the mainstream radar and camera fusion methods now use decision-level fusion, which fuses the separately processed radar and image data detection results, and fails to take full advantage of the camera's semantic richness and radar's accurate detection distance. Based on this basic observation, we propose a novel feature-level fusion method, which first optimizes for the camera and radar feature misalignment problem by using a deformable attention mechanism to guide the camera features to offset to the corresponding radar positions and then integrates the optimized camera information into two consecutive cross-attention layers, which incorporate the camera and radar features in turn, exploiting the spatial and contextual relationships to achieve stable and efficient fusion. Extensive experimental results on the popular RADIATE dataset have shown the effectiveness of our method. Compared with the baselines, our method performs better under bad weather conditions. Moreover, the proposed method is robust against various real-world scenes such as rain, fog, and snow. © 2024 IEEE.

Keywords: Radar imaging

An Apricot Detection Algorithm in Complex Environments Based on Improved YOLOv7

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 12, pp. 2135-2144

Authors: Guo, Qiang; Ma, Chi; Hu, Hui. School of Computer Science and Software Engineering, University of Science and Technology LiaoNing, AnShan 114051, China; School of Computer Science and Engineering, Huizhou University, Huizhou 516007, China

Apricot detection is a prerequisite for counting and harvesting tasks. Existing algorithms face challenges in adapting to the impacts of complex environmental factors such as lighting variations, shadows, dense foliage, and the uneven distribution of samples in mechanized apricot harvesting. This paper proposes an enhanced model, YOLOv7-DC, based on YOLOv7, to address these challenges. YOLOv7-DC preprocesses diverse apricot tree samples to accommodate real-world harvesting detection scenarios. To improve model inference speed and detection accuracy, the detection network is redesigned with a new feature fusion method. DCNv2 is embedded within the efficient layer aggregation network (ELAN), and PConv is introduced to replace conventional convolutions, reducing the parameter impact of DCNv2. The training process incorporates the CBAM attention mechanism to enhance spatial and channel information. The ConvMixer architecture captures spatial and channel relationships transmitted to the detection head through the attention mechanism, improving the model’s detection accuracy for each specific classification sample. Experimental results show that YOLOv7-DC maintains good detection speed and recognition rates across various classification tasks. The improved model achieves a 6.2% increase in average detection accuracy compared to previous algorithms, with a 13% reduction in model parameters. YOLOv7-DC is better suited for handling imbalanced samples and complex environmental scenarios. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Apricot biloba detection; Attention mechanism; Feature fusion; YOLOv7

Research on Image Defogging Algorithm Based on Improved FFA-Net

IAENG International Journal of Computer Science, 2024, Vol. 51, No. 6, pp. 634-641

Authors: Qinrong, Li; Chi, Ma; Qiang, Guo; Hui, Hu. School of Computer Science and Software Engineering, University of Science and Technology LiaoNing, AnShan 114051, China; School of Computer Science and Engineering, Huizhou University, Huizhou 516007, China

Images captured under severe weather conditions, such as haze and fog, suffer from image quality degradation caused by atmospheric particle diffusion. This degradation manifests as color fading and reduced contrast, and it adversely affects the performance of various computer vision tasks. To address this, this paper presents an end-to-end feature fusion attention network (FFA-Net) designed to directly restore haze-free images. By incorporating the SSIM loss into the original loss function, the proposed method effectively captures the visual disparities between the estimated defogged image and the authentic haze-free image. Additionally, it mitigates the color distortion problem inherent in the original algorithm. To address the challenge of low brightness in input images, a low illumination enhancement module is introduced and integrated with the FFA-Net defogging method. Subsequently, a comparative analysis of different defogging algorithms is conducted using two distinct foggy datasets. Multiple evaluation metrics are employed to assess the performance of these algorithms. The findings indicate that our algorithm significantly outperforms others in terms of objective indicators such as PSNR and SSIM, as well as visual effects. © (2024), (International Association of Engineers). All rights reserved.

Keywords: Image enhancement
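
As an illustration of the loss described in the abstract above, the following is a minimal sketch of adding an SSIM term to a pixel-wise loss. It is not the authors' implementation. It assumes the original loss is a mean absolute error (L1), computes SSIM over a single global window rather than the usual local sliding windows, and uses a hypothetical weight `ssim_weight`; the function names are illustrative only.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equal-length lists of pixel intensities.

    Simplification: real SSIM implementations average the index over local
    windows; here the whole image is treated as one window.
    """
    c1 = (0.01 * data_range) ** 2  # standard stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def combined_loss(pred, target, ssim_weight=0.5):
    """Assumed L1 loss plus a weighted (1 - SSIM) term; the weight is hypothetical."""
    l1 = mean([abs(p - t) for p, t in zip(pred, target)])
    return l1 + ssim_weight * (1.0 - global_ssim(pred, target))


clear = [0.1, 0.4, 0.8, 0.3, 0.9, 0.2]
hazy = [0.5 * v + 0.5 for v in clear]  # toy haze: contrast loss plus airlight
loss_same = combined_loss(clear, clear)
loss_hazy = combined_loss(hazy, clear)
```

In this toy case the loss is close to zero when the prediction equals the target and grows when the prediction is washed out, because both the L1 term and the structural term penalise the lost contrast.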

Dynamic Graph Transformer for Brain Disorder Diagnosis

IEEE Journal of Biomedical and Health Informatics, 2025, Vol. 29, No. 6, pp. 4388-4400

Authors: Shehzad, Ahsan; Zhang, Dongyu; Yu, Shuo; Abid, Shagufta; Xia, Feng. Dalian University of Technology, School of Software Technology, Dalian 116620, China; Dalian University of Technology, School of Foreign Languages, School of Software Technology, Dalian 116024, China; Dalian University of Technology, School of Computer Science and Technology, Dalian 116024, China; RMIT University, School of Computing Technologies, Melbourne VIC 3000, Australia

Dynamic brain networks play a pivotal role in diagnosing brain disorders by capturing temporal changes in brain activity and connectivity. Previous methods often rely on sliding-window approaches for constructing these networks using fMRI data. However, these methods face two key limitations: a fixed temporal length that inadequately captures brain activity dynamics and a global spatial scope that introduces noise and reduces sensitivity to localized dysfunctions. These challenges can lead to inaccurate brain network representations and potential *** address these challenges, we propose BrainDGT, a dynamic Graph Transformer model designed to enhance the construction and analysis of dynamic brain networks for more accurate diagnosis of brain disorders. BrainDGT leverages adaptive brain states by deconvolving the Hemodynamic Response Function (HRF) within individual functional brain modules to generate dynamic graphs, addressing the limitations of fixed temporal length and global spatial scope. The model learns spatio-temporal local features through attention mechanisms within these graphs and captures global interactions across modules using adaptive fusion. This dual-level integration enhances the model's ability to analyze complex brain connectivity patterns. We validate BrainDGT's effectiveness through classification experiments on three fMRI datasets (ADNI, PPMI, and ABIDE), where it outperforms state-of-the-art methods. By enabling adaptive, localized analysis of dynamic brain networks, BrainDGT advances neuroimaging and supports the development of more precise diagnostic and treatment strategies in biomedical research. © 2013 IEEE.

Keywords: Functional neuroimaging
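
For context, the sliding-window baseline that the abstract above contrasts with BrainDGT can be sketched as follows. This is a generic illustration of the conventional approach, not the BrainDGT method: each window of fixed length yields one correlation matrix over brain regions. The function names, the window length, and the stride are assumptions chosen for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def sliding_window_graphs(signals, window, stride):
    """Build one region-by-region correlation matrix per window position.

    signals: list of per-region time series, all of the same length.
    The window length is fixed, which is the limitation the abstract describes.
    """
    n_regions = len(signals)
    n_time = len(signals[0])
    graphs = []
    for start in range(0, n_time - window + 1, stride):
        segment = [series[start:start + window] for series in signals]
        graphs.append([[pearson(segment[i], segment[j])
                        for j in range(n_regions)]
                       for i in range(n_regions)])
    return graphs


# Toy data: region 1 rises with region 0, region 2 falls as region 0 rises.
toy_signals = [
    [0, 1, 2, 3, 4, 5],
    [0, 2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1, 0],
]
toy_graphs = sliding_window_graphs(toy_signals, window=4, stride=2)
```

Every window here covers the same number of time points and correlates all regions with all others, which corresponds to the fixed temporal length and global spatial scope that the abstract identifies as limitations.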

Multi-View Ensemble Clustering Algorithm Weighted by Kernel Density Estimation

17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2024

Authors: Hu, Hong; Li, Xuejun; Liao, Jing. School of Computer Science and Technology, Southwest University of Science and Technology, China

ISBN: (print) 9798331507398

Most existing ensemble clustering algorithms improve performance by weighting the base clusterings to reduce the influence of low-quality base clusterings on the final clustering result. A low-quality base clustering can be understood as one that misclassifies sample points, which appear as discrete points in the co-association matrix. We therefore propose a multi-view ensemble clustering algorithm based on kernel density estimation weighting, starting from the density distribution of discrete points in the co-association matrix. First, the different sets of base clusterings are combined into a co-association matrix using the evidence accumulation model. Second, we convert the co-association matrix into a sparse matrix, calculate density distribution weights for the nonzero elements using kernel density estimation, and multiply the sparse matrix by these weights. Third, we restore the weighted sparse matrix to the shape of the initial co-association matrix to obtain the density-weighted co-association matrix. Fourth, we apply the K-means clustering algorithm to the weighted co-association matrix to obtain the final clustering results. Finally, the algorithm is evaluated in comparison and ablation experiments on five commonly used datasets. The experimental results show that the proposed ensemble clustering algorithm with kernel density estimation weighting performs better than the other algorithms compared. © 2024 IEEE.

Keywords: Matrix algebra
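
The first three steps listed in the abstract above can be sketched roughly as follows. This is a hedged illustration, not the authors' algorithm: it assumes the kernel density estimate is a one-dimensional Gaussian KDE over the nonzero co-association values, that weights are normalised by the peak density, and that `bandwidth` is a hypothetical parameter. The final K-means step is omitted, and all function names are illustrative.

```python
from math import exp, pi, sqrt


def co_association(base_clusterings):
    """Evidence accumulation: fraction of base clusterings that put i and j together."""
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def gaussian_kde(values, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from the given samples."""
    norm = 1.0 / (len(values) * bandwidth * sqrt(2 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def kde_weighted_co_association(base_clusterings, bandwidth=0.1):
    """Weight each nonzero co-association entry by its (assumed) relative density."""
    ca = co_association(base_clusterings)
    n = len(ca)
    # Sparse form: only the nonzero entries, keyed by position.
    sparse = {(i, j): ca[i][j]
              for i in range(n) for j in range(n) if ca[i][j] > 0}
    density = gaussian_kde(list(sparse.values()), bandwidth)
    dens = {v: density(v) for v in set(sparse.values())}
    peak = max(dens.values())
    # Restore the dense n-by-n shape, scaling each entry by density / peak.
    weighted = [[0.0] * n for _ in range(n)]
    for (i, j), v in sparse.items():
        weighted[i][j] = v * dens[v] / peak
    return weighted


# Three toy base clusterings of four samples (labels per sample).
toy_base = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
toy_ca = co_association(toy_base)
toy_weighted = kde_weighted_co_association(toy_base)
```

Under these assumptions, co-association values that occur rarely receive a smaller weight than values that occur often, which matches the abstract's motivation of reducing the influence of isolated entries. The rows of the weighted matrix could then be passed to any K-means implementation.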
