Search Results - Inner Mongolia University Library

Script-to-Storyboard: A new contextual retrieval dataset and benchmark

Computational Visual Media, 2025, Vol. 11, No. 1, pp. 103-122

Authors: Xi Tian, Yong-Liang Yang, Qi Wu. Department of Computer Science, University of Bath, Bath BA2 7AY, UK; Australian Institute for Machine Learning, School of Computer Science, The University of Adelaide, Adelaide SA 5005, Australia

Storyboards comprising key illustrations and images help filmmakers to outline ideas, key moments, and story events when filming *** by this, we introduce the first contextual benchmark dataset Script-to-Storyboard (Sc2St) composed of storyboards to explicitly express story structures in the movie domain, and propose the contextual retrieval task to facilitate movie story *** Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards, unlike existing movie *** contextual retrieval task takes as input a multi-sentence movie script summary with keyframe history and aims to retrieve a future keyframe described by a corresponding sentence to form the *** to classic text-based visual retrieval tasks, this requires capturing the context from the description (script) and keyframe *** benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrent-based framework with three variants for effective context *** experiments demonstrate that our methods compare favourably to existing methods; ablation studies validate the effectiveness of the proposed context encoding approaches.

Keywords: dataset; benchmark; text-based image retrieval; movie
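
To make the contextual retrieval setup more concrete, here is a minimal Python sketch; it is not the authors' model. A toy recurrent encoder folds the script sentences and the keyframe history into a single query vector, and candidate keyframes are then ranked by cosine similarity. The function names (`encode_context`, `rank_candidates`), the embedding size, the tanh update rule, and the random weights are all hypothetical assumptions, and none of the paper's three recurrent variants is reproduced here.

```python
import numpy as np

def encode_context(sentence_embs, keyframe_embs, W_h, W_x, W_k):
    """Toy recurrent context encoder (hypothetical, not the Sc2St framework).

    sentence_embs: (T, d) embeddings of script sentences 1..T; sentence T describes the target.
    keyframe_embs: (T - 1, d) embeddings of the keyframe history for sentences 1..T-1.
    """
    d = sentence_embs.shape[1]
    h = np.zeros(W_h.shape[0])
    for t in range(sentence_embs.shape[0]):
        # Step t sees its own sentence plus the previous keyframe (zeros at the first step).
        prev_kf = keyframe_embs[t - 1] if t > 0 else np.zeros(d)
        h = np.tanh(W_h @ h + W_x @ sentence_embs[t] + W_k @ prev_kf)
    return h

def rank_candidates(query, candidates):
    """Return candidate indices sorted by descending cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

rng = np.random.default_rng(0)
d = 8
W_h, W_x, W_k = (rng.normal(scale=0.3, size=(d, d)) for _ in range(3))
sentences = rng.normal(size=(3, d))   # three script sentences (hypothetical embeddings)
history = rng.normal(size=(2, d))     # keyframes already chosen for the first two sentences
query = encode_context(sentences, history, W_h, W_x, W_k)

candidates = rng.normal(size=(5, d))
candidates[3] = 2.0 * query           # plant one candidate aligned with the query
order = rank_candidates(query, candidates)
print(order[0])  # → 3
```

In a setup like this, Recall@K would simply check whether the ground-truth keyframe index appears in `order[:K]`.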

Facial Expression Recognition Using machine learning and Deep learning Techniques: A Systematic Review

SN Computer Science, 2024, Vol. 5, No. 4, p. 432

Authors: Mohana, M.; Subashini, P. Centre for Machine Learning and Intelligence, Department of Computer Science, Avinashilingam Institute, Coimbatore, India

In the contemporary era, Facial Expression Recognition (FER) plays a pivotal role in numerous fields owing to its wide range of applications, such as e-learning, healthcare, marketing, and psychology. Several research studies have been conducted on FER, and many reviews are available. Existing FER review papers focused on presenting a standard FER pipeline for predicting basic expressions. However, previous studies have not given adequate attention to FER datasets and their influence on FER system performance. In this systematic review, 105 papers published between 2002 and 2023 were retrieved from IEEE, ACM, ScienceDirect, Scopus, Web of Science, and Springer, following systematic review guidelines. A review protocol and research questions were also developed for the analysis of study results. The review identified that the accuracy of FER systems is affected on both controlled and spontaneous facial expression datasets, along with other challenges such as illumination, pose, and scale variation. Furthermore, this paper comparatively analyzes FER models based on both machine learning and deep learning techniques, covering face detection, pre-processing, handcrafted feature extraction, and emotion classifiers. In addition, we discuss some unresolved issues in FER and suggest solutions to further enhance FER system performance. In the future, multimodal FER systems need to be developed for real-time scenarios, taking computational efficiency into account when integrating more than one model and dataset to achieve promising accuracy and reduce error rates. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Deep learning (DL); Face detection; Facial emotion; Facial Expression Recognition (FER); Machine learning (ML); Survey

Revisiting face detection: Supercharging Viola-Jones with particle swarm optimization for enhanced performance

Journal of Intelligent and Fuzzy Systems, 2024, Vol. 46, No. 4, pp. 10727-10741

Authors: Mohana, M.; Subashini, P.; Shukla, Diksha. Centre for Machine Learning and Intelligence, Department of Computer Science, Avinashilingam Institute, Coimbatore, Tamil Nadu, India; Department of Electrical Engineering and Computer Science, University of Wyoming, Laramie, WY, United States

In recent years, face detection has emerged as a prominent research field within Computer Vision (CV) and deep learning. Detecting faces in images and video sequences remains challenging due to factors such as pose variation, varying illumination, occlusion, and scale differences. Despite the development of numerous deep learning face detection algorithms, the Viola-Jones algorithm, with its simple yet effective approach, continues to be widely used in real-time camera applications. The conventional Viola-Jones algorithm employs AdaBoost to classify faces in images and videos. The challenge lies in handling cluttered real-time facial images: when receiving features from Haar-like detectors, AdaBoost must search through all possible thresholds for all samples to find the minimum training error. This exhaustive search consumes significant time in finding the best threshold values and optimizing feature selection to build an efficient face detection classifier. In this paper, we propose enhancing the conventional Viola-Jones algorithm by incorporating Particle Swarm Optimization (PSO) to improve its predictive accuracy, particularly on complex face images. We leverage PSO in two key areas of the Viola-Jones framework. First, PSO is employed to dynamically select optimal threshold values for feature selection, thereby improving computational efficiency. Second, we adapt the AdaBoost feature selection process within the Viola-Jones algorithm, integrating PSO to identify the most discriminative features for constructing a robust classifier. Our approach significantly reduces feature selection time and search complexity compared to the traditional algorithm, particularly in challenging environments. We evaluated the proposed method on a comprehensive face detection benchmark dataset, achieving an average true positive rate of 98.73% and a 2.1% higher average prediction accura...

Keywords: Face recognition
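
The threshold search that PSO is meant to speed up can be illustrated with a minimal sketch: particles move along the value range of a single Haar-like feature to find a decision-stump threshold with low weighted error, which is the quantity AdaBoost minimizes when choosing a weak classifier. This is a generic textbook PSO, not the authors' implementation; the stump form (predict "face" when the feature value is below the threshold), the inertia and acceleration constants, and the toy data are all assumptions.

```python
import random

def stump_error(threshold, values, labels, weights):
    """Weighted error of a stump that predicts 1 (face) when value < threshold (assumed form)."""
    return sum(w for v, y, w in zip(values, labels, weights)
               if (1 if v < threshold else 0) != y)

def pso_threshold(values, labels, weights, n_particles=10, iters=30,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search a stump threshold with particle swarm optimization (generic PSO sketch)."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_err = [stump_error(p, values, labels, weights) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g], pbest_err[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity pulls each particle toward its own best and the swarm's best position.
            vel[i] = (inertia * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # stay inside the feature range
            err = stump_error(pos[i], values, labels, weights)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i], err
    return gbest, gbest_err

# Hypothetical feature values: faces (label 1) tend to have low values for this feature.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = [1 / len(values)] * len(values)
threshold, err = pso_threshold(values, labels, weights)
print(round(threshold, 3), err)
```

Inside AdaBoost, `weights` would be the current round's sample weights, so the same search would be rerun after each reweighting instead of scanning every candidate threshold.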

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Meister, Clara; Giulianelli, Mario; Pimentel, Tiago. ETH Zürich, Department of Computer Science, Institute for Machine Learning, Switzerland

ISBN (print): 9798891761643

Surprisal theory posits that the cognitive effort required to comprehend a word is determined by its contextual predictability, quantified as surprisal. Traditionally, surprisal theory treats words as distinct entities, overlooking any potential similarity between them. Giulianelli et al. (2023) address this limitation by introducing information value, a measure of predictability designed to account for similarities between communicative units. Our work leverages Ricotta and Szeidl's (2006) diversity index to extend surprisal into a metric that we term similarity-adjusted surprisal, exposing a mathematical relationship between surprisal and information value. Similarity-adjusted surprisal aligns with information value when considering graded similarities and reduces to standard surprisal when words are treated as distinct. Experimental results with reading time data indicate that similarity-adjusted surprisal adds predictive power beyond standard surprisal for certain datasets, suggesting it serves as a complementary measure of comprehension effort. © 2024 Association for Computational Linguistics.

Keywords: (none listed)
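
A minimal sketch of the relationship described above, assuming similarity-adjusted surprisal takes the form −log₂ Σ_{w′} z(w, w′) p(w′ | context), with z(w, w) = 1 and similarities in [0, 1]. This form is an assumption, chosen because it reduces to standard surprisal when z is the identity (words treated as distinct); the paper should be consulted for the exact definition. The probabilities and similarity values below are hypothetical.

```python
import math

def similarity_adjusted_surprisal(word, probs, sim):
    """Surprisal in bits, where probability mass of similar words also counts toward `word`.

    Assumed form: -log2(sum over w' of sim(word, w') * p(w' | context)).
    probs: dict mapping each candidate word to p(word | context).
    sim:   function (w1, w2) -> similarity in [0, 1], with sim(w, w) == 1.
    """
    mass = sum(sim(word, other) * p for other, p in probs.items())
    return -math.log2(mass)

probs = {"cat": 0.5, "dog": 0.25, "car": 0.25}  # hypothetical next-word distribution

def identity_sim(a, b):
    # Words treated as distinct: only the word itself contributes.
    return 1.0 if a == b else 0.0

def graded_sim(a, b):
    # Hypothetical graded similarity: "cat" and "dog" are partially similar.
    if a == b:
        return 1.0
    return 0.5 if {a, b} == {"cat", "dog"} else 0.0

print(similarity_adjusted_surprisal("dog", probs, identity_sim))  # → 2.0 (standard surprisal)
print(similarity_adjusted_surprisal("dog", probs, graded_sim))    # → 1.0
```

Under this form, graded similarity can only lower the value relative to standard surprisal, because similar words add probability mass rather than remove it.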

MCANet: Multimodal Caption Aware Training-Free Video Anomaly Detection via Large Language Model

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Dev, Prabhu Prasad; Hazari, Raju; Das, Pranesh. Machine Learning Laboratory, Department of Computer Science and Engineering, National Institute of Technology Calicut, India

ISBN (print): 9783031781247

For Video Anomaly Detection (VAD), existing methods require labor-intensive data collection and model retraining, making them costly and domain-specific. The proposed method, termed the Multi-modal Caption Aware Network (MCANet), introduces a novel paradigm that identifies anomalies in video sequences without requiring prior domain knowledge. This training-free VAD approach dynamically generates and analyzes textual descriptions of video frames using an off-the-shelf vision-language model (VLM), audio-language model (ALM), and large language model (LLM). MCANet has four primary modules. The first module uses image-text similarities to clean noisy captions generated by the image captioning model, while the second module applies audio-text similarities to refine noisy captions produced by the audio captioning model. The third module employs an LLM to consolidate scene dynamics over time. Finally, the fourth module refines the results by aggregating scores from semantically similar frames based on video-text similarity. To validate the effectiveness of the proposed method, experiments are conducted on two large-scale benchmark datasets (UCF-Crime and XD-Violence). Experimental results demonstrate that MCANet surpasses existing unsupervised and one-class approaches without requiring any training or data collection. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Semantics
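
The fourth module's idea of pooling anomaly scores across semantically similar frames can be sketched as a similarity-weighted average over each frame's k nearest neighbours. This is an illustrative assumption, not MCANet's actual aggregation rule: the frame embeddings (for example, caption embeddings), the value of `k`, and clipping negative similarities to zero are all hypothetical choices.

```python
import numpy as np

def aggregate_scores(frame_embs, scores, k=3):
    """Smooth per-frame anomaly scores using the k most similar frames (cosine similarity).

    Hypothetical aggregation rule for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    e = np.asarray(frame_embs, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    sim = e @ e.T
    out = np.empty_like(scores)
    for i in range(len(scores)):
        nn = np.argsort(-sim[i])[:k]          # most similar frames (normally includes frame i)
        w = np.clip(sim[i, nn], 0.0, None)    # ignore negatively similar frames
        out[i] = np.dot(w, scores[nn]) / w.sum()
    return out

# Frames 0 and 1 share a description, so their scores are pooled; frame 2 is unrelated.
embs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
raw = [0.0, 1.0, 0.2]
print(aggregate_scores(embs, raw, k=2))
```

Here frames 0 and 1 both end up at 0.5 while frame 2 keeps its own score, which is the smoothing effect such a module would aim for.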

Enhancing Automated Short Answer Grading with Prompt-Driven Augmentation and Prompt Adaptive Oversampling

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Afeefa, P.P.; Hazari, Raju; Das, Pranesh. Machine Learning Laboratory, Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikode, India

ISBN (print): 9783031781186

Automated Short Answer Grading (ASAG) falls under automatic answer-script evaluation, where answer length ranges from a single phrase to one paragraph. The main task in ASAG is generating good sentence embeddings for both the student and the reference answers. Existing work on embedding creation performs better when using deep-learning techniques and language models. However, the performance of deep-learning techniques depends mainly on the size and quality of the training set. Most publicly available datasets contain a limited number of reference and student answer pairs. Text augmentation techniques can be used to automate dataset expansion. Conventional methods such as back-translation, synonym replacement, and random deletion may replace important technical words with irrelevant terms, resulting in a loss of contextual meaning. We propose a new augmentation strategy for ASAG datasets using LLM (Large Language Model) prompting. The effect of the proposed strategy is analysed on sentence transformer fine-tuning. We experimented with four different sizes of augmented training sets to determine how the size of the augmented training data affects fine-tuning of the sentence transformer model. Results indicate that a sentence transformer fine-tuned on a 50% prompt-driven augmented dataset generates better embeddings. Given good embeddings, traditional classifiers can be used to classify student answers into different scores. We introduce "Prompt Adaptive Oversampling (PAO)" to address class imbalance during grade classification. The effectiveness of the proposed strategy is analysed on two public datasets: SPRAG and Mohler-ASAG. The proposed method performs better when trained on highly imbalanced datasets. The source code of this work is available here. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Deep learning
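
The abstract does not specify how PAO works, so the following is only a point of comparison: a minimal Python sketch of plain random oversampling, the simplest baseline for the class-imbalance problem mentioned above. It duplicates randomly chosen minority-grade answers until every grade is as frequent as the largest one. The function name and the toy answers and grades are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Baseline (not PAO): duplicate random minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

answers = ["a1", "a2", "a3", "a4", "a5"]  # hypothetical student answers
grades = [0, 0, 0, 1, 2]                  # grade 0 dominates
x_bal, y_bal = random_oversample(answers, grades)
print(Counter(y_bal))                     # each grade now appears 3 times
```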

Diffusion models for 3D generation: A survey

Computational Visual Media, 2025, Vol. 11, No. 1, pp. 1-28

Authors: Chen Wang, Hao-Yang Peng, Ying-Tian Liu, Jiatao Gu, Shi-Min Hu. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Machine Learning Research, Apple AI/ML, New York, USA. E-mail: jiatao@***

Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality *** the 2D image domain, they have become the state-of-the-art and are capable of generating photo-realistic images with high *** recently, researchers have begun to explore how to utilize diffusion models to generate 3D data, as doing so has more potential in real-world *** requires careful design choices in two key ways: identifying a suitable 3D representation and determining how to apply the diffusion *** this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image *** classify existing methods into three major categories: 2D space diffusion with pretrained models, 2D space diffusion without pretrained models, and 3D space *** also summarize popular datasets used for 3D generation with diffusion *** with this survey, we maintain a repository https://***/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and ***, we pose current challenges for diffusion models for 3D generation, and suggest future research directions.

Keywords: diffusion models; 3D generation; generative models; AIG

Finding the transcription factor binding locations using novel algorithm segmentation to filtration (S2F)

Journal of Ambient Intelligence and Humanized Computing, 2024, Vol. 15, No. 9, pp. 3347-3358

Authors: Theepalakshmi, P.; Srinivasulu Reddy, U. Department of Computer Science and Engineering, Gandhi Institute of Technology and Management, Bengaluru, Karnataka, India; Machine Learning and Data Analytics Lab, Center of Excellence in Artificial Intelligence, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India

The primary aim of identifying binding motifs in gene regulation is to systematically understand the molecular mechanism of transcriptional regulation. This study considers the (ℓ, d) motif search problem, which entails finding motifs of length ℓ that occur in the input sequences with at most d substitutions. However, identifying high-quality (ℓ, d) patterns is challenging. The problem is addressed through motif discovery using the proposed algorithm S2F (Segmentation to Filtration), which is based on the qPMS (quorum Planted Motif Search) algorithm model. Five percent of the entire DNA sequence data is chosen at random for the motif discovery process. This random subsegment (subseg) is split into base and sub k-mers, whose sizes (motif length ℓ) are determined by an iterative approach. According to the values of ℓ and d (mutations), k-mers are chosen to participate in the filtration step, and the base k-mer count and frequency are updated. The k-mer with the highest frequency is recognized as the motif. The algorithm's performance was evaluated on two real datasets: Escherichia coli cyclic AMP receptor protein (CRP) and mouse Embryonic Stem Cell (mESC) ChIP-seq (chromatin immunoprecipitation sequencing) data. Experimental results show that S2F can identify the motifs and runs faster than previous state-of-the-art PMS (Planted Motif Search) and qPMS algorithms. Graphical Abstract: (Figure presented.) © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Keywords: DNA sequences
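
To illustrate the (ℓ, d) motif search problem itself, here is a minimal brute-force sketch in Python: every ℓ-mer that appears in the input is a candidate, and a candidate's support is the number of sequences containing an occurrence within Hamming distance d. This is a simplified, quorum-style counting scheme written for illustration; it is not S2F, it omits the 5% random subsampling and filtration steps, and, unlike exact PMS solvers, it does not enumerate the full d-neighbourhood of each ℓ-mer. The function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif, seq, d):
    """True if some window of `seq` differs from `motif` by at most d substitutions."""
    m = len(motif)
    return any(hamming(motif, seq[i:i + m]) <= d for i in range(len(seq) - m + 1))

def most_supported_lmer(sequences, ell, d, quorum=1.0):
    """Return the ell-mer supported by the most sequences, or None if below the quorum fraction."""
    candidates = {s[i:i + ell] for s in sequences for i in range(len(s) - ell + 1)}
    best, best_support = None, -1
    for cand in sorted(candidates):  # sorted for deterministic tie-breaking
        support = sum(occurs_within(cand, s, d) for s in sequences)
        if support > best_support:
            best, best_support = cand, support
    if best_support < quorum * len(sequences):
        return None, best_support
    return best, best_support

seqs = ["TTACGTATT", "GGACGTAGG", "CACGTACCC"]  # hypothetical sequences with a planted ACGTA
print(most_supported_lmer(seqs, ell=5, d=0))    # → ('ACGTA', 3)
```

The cost of scanning every candidate against every sequence is what motivates sampling and filtration strategies such as the one described above.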

Automated Plant Disease Detection: CNN for Corn Maize, Tomato, and Potato

7th International Conference on Soft Computing: Theories and Applications, SoCTA 2023

Authors: Angeline, R.; Aruneshwaran, S. Department of Computer Science and Engineering with Specialization in Artificial Intelligence and Machine Learning, SRM Institute of Science and Technology, Ramapuram, India

ISBN (print): 9789819720880

Plant diseases present a significant challenge to global food security and the agricultural sector. Swift and precise detection of these diseases is pivotal for managing them effectively and preventing crop yield losses. Recently, advanced deep learning techniques, specifically Convolutional Neural Networks (CNNs), have shown promising results across various image recognition tasks. This work aims to develop and implement a CNN-based model that predicts plant diseases from leaf images. The proposed approach comprises three main phases: compiling and preparing the data, developing the model architecture, and assessing performance. First, an extensive dataset of plant leaf images, including leaves affected by diverse diseases, is assembled. The images are preprocessed to improve quality and remove noise, ensuring a reliable model training process. Next, a CNN is designed and trained on the dataset. The chosen CNN model follows a sequential design in which each layer has exactly one input and one output. These layers are stacked to form the entire network and include Conv2D, MaxPooling2D, Flatten, and Dense layers, enabling the network to learn features from the input images. The CNN-based model achieves a training precision of 99.65%, a testing precision of 99.44%, and a validation precision of 98.61%, identifying prevalent diseases such as common rust in corn, bacterial spot in tomato, and early blight in potato. In conclusion, the proposed CNN-based model shows promising results in accurately recognizing these diseases from leaf images. Effective application of this model can assist farmers and agricultural specialists in inform...

Keywords: Fruits
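
The abstract names the layer types but not their sizes, so the following Python sketch works through the shape arithmetic for a hypothetical Conv2D, MaxPooling2D, Conv2D, MaxPooling2D, Flatten stack, showing how the size of the Flatten output that feeds the Dense layers is determined. The 128×128×3 input, the 3×3 kernels, the filter counts (32 and 64), and the 2×2 pooling are assumptions, not the authors' configuration; the formulas follow the Keras defaults of `padding="valid"` and a pooling stride equal to the pool size.

```python
def conv2d_out(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def maxpool_out(size, pool=2):
    """Spatial size after 'valid' max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

def flattened_size(input_size, layers, input_channels=3):
    """Follow a square feature map through ('conv', filters, kernel) and ('pool', pool) layers."""
    size, channels = input_size, input_channels
    for layer in layers:
        if layer[0] == "conv":
            _, filters, kernel = layer
            size, channels = conv2d_out(size, kernel), filters
        elif layer[0] == "pool":
            size = maxpool_out(size, layer[1])
    return size * size * channels

# Hypothetical architecture applied to a 128x128 RGB leaf image.
layers = [("conv", 32, 3), ("pool", 2), ("conv", 64, 3), ("pool", 2)]
print(flattened_size(128, layers))  # → 57600 (a 30 x 30 x 64 feature map)
```

The spatial size goes 128, 126, 63, 61, 30, so the first Dense layer in this hypothetical stack would receive 57,600 inputs.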

Optimized Automated Stock Trading using DQN and Double DQN

2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems, IACIS 2024

Authors: Bharadwaj, Gurudutt S; Pratap, David; Darapaneni, Narayana. PES University, Department of Computer Science, Bengaluru, India; Great Learning, Department of Data Science and Machine Learning, Bengaluru, India

ISBN (print): 9798350360660

Stock portfolio management involves making buying, holding, and selling decisions for the various stocks in a portfolio. Prior work has applied Reinforcement Learning (RL) actor-critic methods such as Deep Deterministic Policy Gradient (DDPG) to asset allocation problems. Here, an attempt has been made to use purely critic-based value function methods, namely Deep Q-Network (DQN) and Double DQN, to estimate the Q-values of market actions. An optimized portfolio management algorithm is then designed to balance trades across a basket of stocks. Five stocks in different price ranges were chosen from the NYSE and NASDAQ stock exchanges. DQN achieved an average cumulative return of 55% on the test data, with an average Maximum Drawdown (MDD) of 2.5%. Double DQN achieved 71% on the test data, with an average MDD of 2.83%. These results were significantly better than those of a traditional buy-and-hold strategy. © 2024 IEEE.

Keywords: Reinforcement learning
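
The practical difference between the two critic-based methods lies in how the bootstrap target is formed, and MDD is the largest peak-to-trough loss of the portfolio value. The following Python sketch shows both under their standard textbook definitions; the action set (hold, buy, sell), the reward, the discount factor, and the portfolio values are hypothetical, and the code is not the authors' trading algorithm.

```python
def dqn_target(reward, next_q_target, gamma, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    return reward if done else reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

def max_drawdown(values):
    """Largest fractional drop from a running peak of portfolio values."""
    peak, mdd = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd

# Hypothetical next-state Q-values for the actions (hold, buy, sell).
q_online = [1.0, 2.0, 0.5]
q_target = [1.5, 1.0, 3.0]
print(dqn_target(1.0, q_target, gamma=0.5))                   # → 2.5
print(double_dqn_target(1.0, q_online, q_target, gamma=0.5))  # → 1.5
print(max_drawdown([100, 120, 90, 130, 117]))                 # → 0.25
```

Decoupling action selection from evaluation is what lets Double DQN avoid the upward bias of taking a max over noisy estimates, visible here as the lower target (1.5 versus 2.5).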
