The proliferation of cooking videos on the internet creates a need to convert lengthy video content into concise text recipes. Many online platforms now host large numbers of cooking videos, and viewers find it challenging to extract complete recipes from lengthy visual content. Effective summarization is needed to translate the abundance of culinary knowledge found in videos into text recipes that are easy to read and follow. This makes cooking easier for individuals who are looking for precise step-by-step instructions. Such a system serves a broad spectrum of learners while also improving accessibility and ease of use. As the need for easy-to-follow recipes derived from cooking videos grows, researchers are looking into automated summarization using advanced techniques. One such approach is presented in our work, which combines simple image-based models, audio processing, and GPT-based models to create a system that makes it easier to turn long culinary videos into in-depth recipe texts. A systematic workflow is adopted to achieve this objective. Initially, the focus is on frame summary generation, which employs a combination of two convolutional neural networks (CNNs) and a GPT-based model. A pre-trained CNN model, Inception-V3, is fine-tuned on a food image dataset for dish recognition, and a custom CNN is trained on ingredient images for ingredient recognition. A GPT-based model then combines the outputs of the two CNN models to produce the frame summary in the desired format. Subsequently, audio summary generation is tackled by performing speech-to-text conversion in Python. A GPT-based model is then used to summarize the resulting transcript in the desired format. Finally, to refine the summaries obtained from visual and auditory content, another GPT-based model is used
This paper introduces a novel RISC-V processor architecture designed for ultra-low-power and energy-efficient applications, particularly for Internet of Things (IoT) [...]. The architecture enables runtime dynamic reconfiguration of the datapath, allowing efficient balancing between computational performance and power [...]. This is achieved through interchangeable components and clock gating mechanisms, which help the processor adapt to varying workloads. A prototype of the architecture was implemented on a Xilinx Artix 7 field programmable gate array (FPGA). Experimental results show significant improvements in power efficiency and [...]. The mini configuration achieves an impressive reduction in power consumption, using only 36% of the baseline [...]. [...], the full configuration boosts performance by 8% over the [...]. The flexible and adaptable nature of this architecture makes it highly suitable for a wide range of low-power IoT applications, providing an effective solution to meet the growing demands for energy efficiency in modern IoT devices.
In recent decades, brain tumors have been regarded as a severe illness that causes significant damage to an individual's health and can ultimately result in death. Hence, Brain Tumor Segmentation and Classification (BTSC) has gained increasing attention in the research community. BTSC is the process of locating brain tumor tissues and classifying them by tumor type. Manual tumor segmentation is error-prone and time-consuming. In this manuscript, a precise and fast BTSC model is developed based on a transfer-learning-based Convolutional Neural Network (CNN) model. A CNN variant is used because of its superiority in distinct tasks. In the initial phase, Magnetic Resonance Imaging (MRI) brain images are acquired from the Brain Tumor Image Segmentation Challenge (BRATS) 2019, 2020, and 2021 databases. Image augmentation is then performed on the gathered images using zoom-in, rotation, zoom-out, flipping, scaling, and shifting, which effectively reduces overfitting in the classification model. The augmented images are segmented using the layers of the Visual Geometry Group (VGG-19) model. Feature extraction using an Attribute Aware Attention (AWA) methodology is then carried out on the segmented images, following the segmentation block in the VGG-19 model. The crucial features are then selected using the attribute category reciprocal attention phase. These features are fed to the Model Agnostic Concept Extractor (MACE) to generate relevance scores between the features, which assist in the final classification. The relevance scores obtained from MACE are provided to the max-pooling layer of the VGG-19 model, and the final classified output is obtained from the modified VGG-19 architecture. The implemented relevance-score and AWA-based VGG-19 model is used to classify the tumor as whole tumor, enhanced tumor, or tumor core. In the classification section, the proposed
Large language models have come under the spotlight in recent years for their seemingly multifaceted capabilities which extend far beyond text processing. In particular, they have been shown to possess logical and rea...
With the surge in computational data, Mobile Edge Computing (MEC) is set to become a crucial technology for reducing communication latency and congestion. However, the widespread adoption of MEC faces several challeng...
Radar can enhance target sensing capability after fusion with visible light to achieve all-weather target detection and identification due to lower requirements for weather and light conditions. However, the mainstrea...
Apricot detection is a prerequisite for counting and harvesting tasks. Existing algorithms face challenges in adapting to the impacts of complex environmental factors such as lighting variations, shadows, dense foliag...
Images captured under severe weather conditions, such as haze and fog, suffer from image quality degradation caused by atmospheric particle diffusion. This degradation manifests as color fading, reduced contrast, and ...
Dynamic brain networks play a pivotal role in diagnosing brain disorders by capturing temporal changes in brain activity and connectivity. Previous methods often rely on sliding-window approaches for constructing thes...
Most of the existing ensemble clustering algorithms improve the performance by weighting the basic clusters to reduce the influence of low-quality basic clusters on the final clustering results. Low-quality base clust...