Search Results - Inner Mongolia University Library

Script-to-Storyboard: A new contextual retrieval dataset and benchmark

Computational Visual Media, Vol. 11, No. 1, 2025, pp. 103-122

Authors: Xi Tian, Yong-Liang Yang, Qi Wu (Department of Computer Science, University of Bath, Bath BA2 7AY, UK; Australian Institute for Machine Learning, School of Computer Science, The University of Adelaide, Adelaide, SA 5005, Australia)

Storyboards comprising key illustrations and images help filmmakers to outline ideas, key moments, and story events when filming *** by this, we introduce the first contextual benchmark dataset Script-to-Storyboard (Sc2St), composed of storyboards to explicitly express story structures in the movie domain, and propose the contextual retrieval task to facilitate movie story *** Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards, unlike existing movie *** contextual retrieval task takes as input a multi-sentence movie script summary with keyframe history and aims to retrieve a future keyframe described by a corresponding sentence to form the *** to classic text-based visual retrieval tasks, this requires capturing the context from the description (script) and keyframe *** benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrent-based framework with three variants for effective context *** experiments demonstrate that our methods compare favourably to existing methods; ablation studies validate the effectiveness of the proposed context encoding approaches.

Keywords: dataset, benchmark, text-based image retrieval, movie
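The contextual retrieval task differs from plain text-to-image retrieval in that the query depends on history as well as on the current sentence. The toy sketch below illustrates only that difference: it folds a history of embeddings into a context vector with a fixed exponential-moving-average update, mixes it with the current sentence embedding, and ranks candidate keyframes by cosine similarity. The embeddings, the mixing weights `alpha` and `beta`, and the helper names are hypothetical; the paper's recurrent framework uses learned encoders that are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_context(context: Vector, step: Sequence[float], alpha: float = 0.5) -> Vector:
    """Hypothetical recurrent update: blend the running context with one history embedding."""
    return [alpha * c + (1.0 - alpha) * s for c, s in zip(context, step)]


def rank_candidates(sentence: Sequence[float], history: Sequence[Sequence[float]],
                    candidates: Sequence[Sequence[float]], beta: float = 0.5) -> List[int]:
    """Rank candidate keyframe embeddings against a query built from sentence + history."""
    context: Vector = [0.0] * len(sentence)
    for step in history:  # fold the history in temporal order
        context = update_context(context, step)
    # Mix the current sentence with the accumulated context to form the query.
    query = [beta * s + (1.0 - beta) * c for s, c in zip(sentence, context)]
    scores = [cosine(query, cand) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

With `sentence = [1.0, 0.0]` and candidates `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, an empty history ranks candidate 0 first, whereas a history of `[[0.0, 1.0]]` pulls the query toward the mixed candidate 2, giving the order `[2, 0, 1]`.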

Diffusion models for 3D generation: A survey

Computational Visual Media, Vol. 11, No. 1, 2025, pp. 1-28

Authors: Chen Wang, Hao-Yang Peng, Ying-Tian Liu, Jiatao Gu, Shi-Min Hu (Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Machine Learning Research, Apple AI/ML, New York, USA. E-mail: jiatao@***)

Denoising diffusion models have demonstrated tremendous success in modeling data distributions and synthesizing high-quality *** the 2D image domain, they have become the state-of-the-art and are capable of generating photo-realistic images with high *** recently, researchers have begun to explore how to utilize diffusion models to generate 3D data, as doing so has more potential in real-world *** requires careful design choices in two key ways: identifying a suitable 3D representation and determining how to apply the diffusion *** this survey, we provide the first comprehensive review of diffusion models for manipulating 3D content, including 3D generation, reconstruction, and 3D-aware image *** classify existing methods into three major categories: 2D space diffusion with pretrained models, 2D space diffusion without pretrained models, and 3D space *** also summarize popular datasets used for 3D generation with diffusion *** with this survey, we maintain a repository https://***/cwchenwang/awesome-3d-diffusion to track the latest relevant papers and ***, we pose current challenges for diffusion models for 3D generation, and suggest future research directions.

Keywords: diffusion models, 3D generation, generative models, AIG
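For readers new to the area, the denoising diffusion models referred to here are usually built on the DDPM forward process, in which x_t can be sampled from x_0 in closed form: q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), where ᾱ_t is the cumulative product of (1 − β_t). The sketch below assumes the linear schedule from 1e-4 to 0.02 over 1,000 steps popularized by Ho et al. (2020), with 0-based step indices; it is standard background, not a method taken from this survey.

```python
import math
import random
from typing import List


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances for steps 0..T-1 (assumed DDPM-style default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products: alpha_bar[t] = prod over s <= t of (1 - beta[s])."""
    out: List[float] = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(abar[t]) * x0, (1 - abar[t]) * I) in a single step."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

Because ᾱ_t shrinks toward zero as t grows, early steps barely perturb x_0 while the last steps leave almost pure Gaussian noise, which is what the learned reverse (denoising) process starts from.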

Facial Expression Recognition Using machine learning and Deep learning Techniques: A Systematic Review

SN Computer Science, Vol. 5, No. 4, 2024, p. 432

Authors: Mohana, M.; Subashini, P. (Centre for Machine Learning and Intelligence, Department of Computer Science, Avinashilingam Institute, Coimbatore, India)

In the contemporary era, Facial Expression Recognition (FER) plays a pivotal role in numerous fields due to its vast application areas, such as e-learning, healthcare, marketing, and psychology. Several research studies have been conducted on FER, and many reviews are available. Existing FER review papers focused on presenting a standard pipeline for FER to predict basic expressions. However, previous studies have not given adequate importance to FER datasets and their influence on FER system performance. In this systematic review, 105 papers retrieved from IEEE, ACM, ScienceDirect, Scopus, Web of Science, and Springer, published between 2002 and 2023, were analyzed following systematic review guidelines. A review protocol and research questions were also developed for the analysis of study results. The review identified that FER accuracy on both controlled and spontaneous facial expression datasets is affected by challenges such as illumination, pose, and scale variation. Furthermore, this paper comparatively analyzes FER models based on both machine learning and deep learning techniques, covering face detection, pre-processing, handcrafted feature extraction, and emotion classifiers. In addition, we discuss some unresolved issues in FER and suggest solutions to further enhance FER system performance. In the future, multimodal FER systems need to be developed for real-time scenarios, taking into account computational efficiency when integrating more than one model and dataset, to achieve promising accuracy and reduce error rates. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Deep learning (DL), Face detection, Facial emotion, Facial Expression Recognition (FER), machine learning (ML), Survey

Optimized Automated Stock Trading using DQN and Double DQN

2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems, IACIS 2024

Authors: Bharadwaj, Gurudutt S; Pratap, David; Darapaneni, Narayana (PES University, Department of Computer Science, Bengaluru, India; Great Learning, Department of Data Science and Machine Learning, Bengaluru, India)

ISBN (print): 9798350360660

Stock portfolio management involves making buying, holding, and selling decisions for the various stocks in a portfolio. Previous work has applied Reinforcement Learning (RL) actor-critic methods such as Deep Deterministic Policy Gradient (DDPG) to asset allocation problems. Here, an attempt is made to use purely critic-based value-function methods, Deep Q-Network (DQN) and Double DQN, to estimate the Q-values of market actions. An optimized portfolio management algorithm is then designed to balance trades across a basket of stocks. Five stocks in different price ranges were chosen from the NYSE and NASDAQ stock exchanges. DQN achieved an average cumulative return of 55% on the test data with an average Maximum Drawdown (MDD) of 2.5%, while Double DQN achieved 71% with an average MDD of 2.83%. These results were significantly better than those of a traditional Buy-and-Hold strategy. © 2024 IEEE.

Keywords: Reinforcement learning
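The abstract contrasts DQN and Double DQN only through their results. The usual difference lies in the bootstrapped target: DQN takes the maximum over the target network's next-state Q-values, while Double DQN lets the online network choose the action and the target network score it, which reduces overestimation. The sketch below shows these two standard targets in isolation; the three-action setting (for example buy, hold, sell) and the numbers are hypothetical, and the paper's state features, reward design, and portfolio-balancing algorithm are not represented.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

For a reward of 1.0, γ = 0.9, online Q-values [0.2, 0.5, 0.1] and target Q-values [0.3, 0.1, 0.4], DQN bootstraps from the target network's own maximum (target 1.36), whereas Double DQN evaluates the online network's choice, action 1, with the target network (target 1.09).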

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Meister, Clara; Giulianelli, Mario; Pimentel, Tiago (ETH Zürich, Department of Computer Science, Institute for Machine Learning, Switzerland)

ISBN (print): 9798891761643

Surprisal theory posits that the cognitive effort required to comprehend a word is determined by its contextual predictability, quantified as surprisal. Traditionally, surprisal theory treats words as distinct entities, overlooking any potential similarity between them. Giulianelli et al. (2023) address this limitation by introducing information value, a measure of predictability designed to account for similarities between communicative units. Our work leverages Ricotta and Szeidl's (2006) diversity index to extend surprisal into a metric that we term similarity-adjusted surprisal, exposing a mathematical relationship between surprisal and information value. Similarity-adjusted surprisal aligns with information value when considering graded similarities and reduces to standard surprisal when words are treated as distinct. Experimental results with reading time data indicate that similarity-adjusted surprisal adds predictive power beyond standard surprisal for certain datasets, suggesting it serves as a complementary measure of comprehension effort. © 2024 Association for Computational Linguistics.
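The reduction property stated in the abstract can be checked on a toy distribution. The sketch assumes a similarity-weighted form, −log Σ_{w'} z(w, w') p(w' | context), with z(w, w) = 1; this is one natural reading in the spirit of Ricotta and Szeidl's index, and the paper's exact definition and choice of similarity function may differ. The probabilities and similarity values are made up.

```python
import math
from typing import Dict


def surprisal(word: str, probs: Dict[str, float]) -> float:
    """Standard surprisal in nats: -log p(word | context)."""
    return -math.log(probs[word])


def similarity_adjusted_surprisal(word: str, probs: Dict[str, float],
                                  sim: Dict[str, Dict[str, float]]) -> float:
    """Assumed form: -log sum over w' of z(word, w') * p(w' | context), with z(w, w) = 1."""
    mass = sum(sim[word][other] * p for other, p in probs.items())
    return -math.log(mass)
```

With an identity similarity matrix the two quantities coincide for every word; giving "kitten" a similarity of 0.8 to "cat" lowers its adjusted surprisal from −log 0.3 to −log 0.7, since probability mass on a near-synonym now counts toward predictability.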

NOVASCORE: A New Automated Metric for Evaluating Document Level Novelty

31st International Conference on Computational Linguistics, COLING 2025

Authors: Ai, Lin; Gong, Ziwei; Deshpande, Harshsaiprasad; Johnson, Alexander; Phung, Emmy; Emami, Ahmad; Hirschberg, Julia (Machine Learning Center of Excellence, JPMorgan Chase & Co., Japan; Department of Computer Science, Columbia University, United States)

ISBN (print): 9798891761964

The rapid expansion of online content has intensified the issue of information redundancy, underscoring the need for solutions that can identify genuinely new information. Despite this challenge, the research community has seen a decline in focus on novelty detection, particularly with the rise of large language models (LLMs). Additionally, previous approaches have relied heavily on human annotation, which is time-consuming, costly, and particularly challenging when annotators must compare a target document against a vast number of historical documents. In this work, we introduce NOVASCORE (Novelty Evaluation in Atomicity Score), an automated metric for evaluating document-level novelty. NOVASCORE aggregates the novelty and salience scores of atomic information, providing high interpretability and a detailed analysis of a document's novelty. With its dynamic weight adjustment scheme, NOVASCORE offers enhanced flexibility and an additional dimension to assess both the novelty level and the importance of information within a document. Our experiments show that NOVASCORE strongly correlates with human judgments of novelty, achieving a 0.626 Point-Biserial correlation on the TAP-DLND 1.0 dataset and a 0.920 Pearson correlation on an internal human-annotated dataset. © 2025 Association for Computational Linguistics.

Keywords: Computational linguistics
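The 0.626 figure is a point-biserial correlation, i.e. the Pearson correlation between a continuous score (here, a metric value per document) and a binary label (novel or not novel). A minimal sketch of the textbook formula r_pb = (M1 − M0) / s_n · sqrt(n1·n0 / n²), with s_n the population standard deviation of the scores, is given below; the scores and labels in the test are invented and unrelated to TAP-DLND 1.0.

```python
import math
from typing import Sequence


def point_biserial(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Point-biserial correlation between continuous scores and binary (0/1) labels."""
    n = len(scores)
    group1 = [s for s, y in zip(scores, labels) if y == 1]
    group0 = [s for s, y in zip(scores, labels) if y == 0]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both label groups must be non-empty")
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # population std
    if std == 0.0:
        raise ValueError("scores must not all be equal")
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    return (m1 - m0) / std * math.sqrt(n1 * n0 / n ** 2)
```

Because it is just Pearson's r with one variable restricted to {0, 1}, a value near 1 means high metric scores line up with documents labeled novel.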

Finding the transcription factor binding locations using novel algorithm segmentation to filtration (S2F)

Journal of Ambient Intelligence and Humanized Computing, Vol. 15, No. 9, 2024, pp. 3347-3358

Authors: Theepalakshmi, P.; Srinivasulu Reddy, U. (Department of Computer Science and Engineering, Gandhi Institute of Technology and Management, Bengaluru, Karnataka, India; Machine Learning and Data Analytics Lab, Center of Excellence in Artificial Intelligence, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India)

The primary aim of identifying binding motifs in gene regulation is to systematically understand the molecular mechanism of transcriptional regulation. In this study, the (l, d) motif search problem is considered, which entails finding motifs of length l whose occurrences differ from them by at most d substitutions. However, identifying high-quality (l, d) patterns is challenging. This problem is addressed through motif discovery using the proposed algorithm S2F (Segmentation to Filtration), which is based on the qPMS (quorum Planted Motif Search) algorithm model. Five percent of the entire set of DNA sequences is chosen at random for the motif discovery process. This random sub-segment (subseg) portion is split into base and sub k-mers, whose sizes (motif length l) are determined by an iterative approach. According to the values of l and d (mutations), the k-mers that take part in the filtration techniques are chosen, and the base k-mer count and frequency are updated. The k-mer with the highest frequency is recognized as the motif. The algorithm's performance was evaluated using two real datasets: Escherichia coli cyclic AMP receptor protein (CRP) and a mouse Embryonic Stem Cell (mESC) ChIP-seq (Chromatin Immunoprecipitation sequencing) dataset. Experimental results show that S2F can identify the motifs and runs faster than the previous state-of-the-art PMS (Planted Motif Search) and qPMS algorithms. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Keywords: DNA sequences
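To make the (l, d) formulation concrete, the sketch below solves the exact planted motif problem by brute force: every valid motif must lie within Hamming distance d of some l-mer in the first sequence, so it enumerates those d-neighborhoods and keeps the candidates that have a d-close occurrence in every other sequence. This is the naive baseline that PMS-style algorithms improve on, using a full quorum rather than qPMS's partial one; it is not S2F, whose random sub-segment sampling and k-mer filtration are not reproduced. Its cost grows exponentially with d, so it only suits toy inputs.

```python
from itertools import combinations, product
from typing import List, Set

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer: str, d: int) -> Set[str]:
    """All strings within Hamming distance d of kmer (its d-neighborhood)."""
    result = {kmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(kmer)), k):
            for letters in product(ALPHABET, repeat=k):
                chars = list(kmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def planted_motifs(seqs: List[str], l: int, d: int) -> Set[str]:
    """Exhaustive (l, d) search: motifs with a d-close occurrence in every sequence."""
    first = seqs[0]
    candidates: Set[str] = set()
    for i in range(len(first) - l + 1):  # a valid motif is d-close to some l-mer of seqs[0]
        candidates |= neighbors(first[i:i + l], d)
    return {m for m in candidates if all(occurs_within(m, s, d) for s in seqs[1:])}
```

With `seqs = ["ACGTTT", "TTACGA", "GACCTT"]`, l = 4 and d = 1, the result contains "ACGT" (it occurs exactly in the first sequence and as "ACGA" and "ACCT" in the others), while d = 0 yields no motif.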

Low-Rank Optimal Transport through Factor Relaxation with Latent Coupling

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Halmos, Peter; Liu, Xinhao; Gold, Julian; Raphael, Benjamin J. (Department of Computer Science, Princeton University, United States; Center for Statistics and Machine Learning, Princeton University, United States)

Optimal transport (OT) is a general framework for finding a minimum-cost transport plan, or coupling, between probability distributions, and has many applications in machine learning. A key challenge in applying OT to massive datasets is the quadratic scaling of the coupling matrix with the size of the dataset. Forrow et al. (2019) introduced a factored coupling for the k-Wasserstein barycenter problem, which Scetbon et al. (2021) adapted to solve the primal low-rank OT problem. We derive an alternative parameterization of the low-rank problem based on the latent coupling (LC) factorization previously introduced by Lin et al. (2021) generalizing Forrow et al. (2019). The LC factorization has multiple advantages for low-rank OT including decoupling the problem into three OT problems and greater flexibility and interpretability. We leverage these advantages to derive a new algorithm Factor Relaxation with Latent Coupling (FRLC), which uses coordinate mirror descent to compute the LC factorization. FRLC handles multiple OT objectives (Wasserstein, Gromov-Wasserstein, Fused Gromov-Wasserstein), and marginal constraints (balanced, unbalanced, and semi-relaxed) with linear space complexity. We provide theoretical results on FRLC, and demonstrate superior performance on diverse applications - including graph clustering and spatial transcriptomics - while demonstrating its interpretability. © 2024 Neural information processing systems foundation. All rights reserved.
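The latent coupling (LC) factorization is what allows linear space: instead of an n × m coupling, one stores Q (n × r), T (r × r) and R (m × r). The sketch assumes the form P = Q diag(1/g_Q) T diag(1/g_R) Rᵀ, where Q has marginals (a, g_Q), R has marginals (b, g_R) and T couples g_Q to g_R, and checks that the assembled P then has marginals a and b. Sinkhorn scaling is used here only to produce valid factors from arbitrary positive matrices; FRLC's coordinate mirror descent, which optimizes the factors against a transport cost, is not shown.

```python
from typing import List

Matrix = List[List[float]]


def sinkhorn_project(K: Matrix, row: List[float], col: List[float], iters: int = 500) -> Matrix:
    """Rescale a positive matrix so its row sums approach `row` and column sums equal `col`."""
    n, m = len(row), len(col)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [row[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def scale_cols(A: Matrix, w: List[float]) -> Matrix:
    """A @ diag(w): multiply column j of A by w[j]."""
    return [[x * w[j] for j, x in enumerate(r)] for r in A]


def latent_coupling(Q: Matrix, T: Matrix, R: Matrix, g_q: List[float], g_r: List[float]) -> Matrix:
    """Assemble the full n x m coupling P = Q diag(1/g_q) T diag(1/g_r) R^T."""
    left = matmul(scale_cols(Q, [1.0 / g for g in g_q]), T)  # n x r
    left = scale_cols(left, [1.0 / g for g in g_r])  # n x r
    r_transposed = [list(col) for col in zip(*R)]  # r x m
    return matmul(left, r_transposed)  # n x m
```

For example, with a = (0.25, 0.25, 0.25, 0.25), b = (1/3, 1/3, 1/3), g_Q = (0.5, 0.5) and g_R = (0.4, 0.6), positive 4 × 2, 3 × 2 and 2 × 2 starting matrices projected with `sinkhorn_project` assemble into a 4 × 3 coupling whose row and column sums match a and b up to numerical tolerance, while only the small factors need to be stored.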

MCANet: Multimodal Caption Aware Training-Free Video Anomaly Detection via Large Language Model

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Dev, Prabhu Prasad; Hazari, Raju; Das, Pranesh (Machine Learning Laboratory, Department of Computer Science and Engineering, National Institute of Technology Calicut, India)

ISBN (print): 9783031781247

For Video Anomaly Detection (VAD), existing methods require labor-intensive data collection and model retraining, making them costly and domain-specific. The proposed method, termed Multi-modal Caption Aware Network (MCANet), introduces a novel paradigm that identifies anomalies in video sequences without requiring prior domain knowledge. This training-free VAD approach dynamically generates and analyzes textual descriptions of video frames using an off-the-shelf vision-language model (VLM), audio-language model (ALM), and large language model (LLM). MCANet has four primary modules. The first module uses image-text similarities to clean noisy captions generated by the image captioning model, while the second module applies audio-text similarities to refine noisy captions produced by the audio captioning model. The third module employs an LLM to consolidate scene dynamics over time. Finally, the fourth module enhances the results by aggregating scores from semantically similar frames based on video-text similarity. To validate the effectiveness of the proposed method, experiments are conducted on two large-scale benchmark datasets (UCF-Crime and XD-Violence). Experimental results demonstrate that MCANet surpasses existing unsupervised and one-class approaches without requiring any training or data collection. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Semantics
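The fourth module is described only as aggregating scores from semantically similar frames. One simple way to realize that idea, shown here purely as a hypothetical sketch and not as MCANet's actual module, is to replace each frame's anomaly score with the mean score of its k most similar frames (itself included) under cosine similarity of frame embeddings. The embeddings, scores, `k`, and the function names are assumptions; the paper bases similarity on video-text similarity, which this sketch does not model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate_scores(embeddings: Sequence[Sequence[float]], scores: Sequence[float],
                     k: int = 3) -> List[float]:
    """Hypothetical refinement: average each frame's score over its k most similar frames."""
    refined: List[float] = []
    for emb in embeddings:
        # Rank all frames (the frame itself included) by similarity to this frame.
        ranked = sorted(((cosine(emb, other), j) for j, other in enumerate(embeddings)),
                        reverse=True)
        top = [j for _, j in ranked[:k]]
        refined.append(sum(scores[j] for j in top) / len(top))
    return refined
```

In a four-frame example where frames 0 and 1 are near-duplicates with raw scores 0.9 and 0.1, k = 2 smooths both to 0.5, so an isolated spike is damped unless semantically similar frames agree.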

WHOLE-SONG HIERARCHICAL GENERATION OF SYMBOLIC MUSIC USING CASCADED DIFFUSION MODELS

12th International Conference on Learning Representations, ICLR 2024

Authors: Wang, Ziyu; Min, Lejun; Xia, Gus (Computer Science Department, NYU Shanghai, China; Machine Learning Department, MBZUAI, United Arab Emirates)

Recent deep music generation studies have put much emphasis on long-term generation with structures. However, we are yet to see high-quality, well-structured whole-song generation. In this paper, we make the first attempt to model a full music piece under the realization of compositional hierarchy. With a focus on symbolic representations of pop songs, we define a hierarchical language, in which each level of hierarchy focuses on the semantics and context dependency at a certain music scope. The high-level languages reveal whole-song form, phrase, and cadence, whereas the low-level languages focus on notes, chords, and their local patterns. A cascaded diffusion model is trained to model the hierarchical language, where each level is conditioned on its upper levels. Experiments and analysis show that our model is capable of generating full-piece music with recognizable global verse-chorus structure and cadences, and the music quality is higher than the baselines. Additionally, we show that the proposed model is controllable in a flexible way. By sampling from the interpretable hierarchical languages or adjusting pre-trained external representations, users can control the music flow via various features such as phrase harmonic structures, rhythmic patterns, and accompaniment texture. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Diffusion
