Search Results - Inner Mongolia University Library

CLIP-Flow: Decoding images encoded in CLIP space

Computational Visual Media, 2024, Vol. 10, No. 6, pp. 1157-1168

Authors: Hao Ma, Ming Li, Jingyuan Yang, Or Patashnik, Dani Lischinski, Daniel Cohen-Or, Hui Huang. Affiliations: Visual Computing Research Center, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; Department of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel; School of Computer Science and Engineering, the Hebrew University of Jerusalem, Jerusalem 91904, Israel

This study introduces CLIP-Flow, a novel network for generating images from a given image or *** effectively utilize the rich semantics contained in both modalities, we designed a semantics-guided methodology for image- and text-to-image *** particular, we adopted Contrastive Language-Image Pretraining (CLIP) as an encoder to extract semantics and StyleGAN as a decoder to generate images from such ***, to bridge the embedding space of CLIP and latent space of StyleGAN, real NVP is employed and modified with activation normalization and invertible *** the images and text in CLIP share the same representation space, text prompts can be fed directly into CLIP-Flow to achieve text-to-image *** conducted extensive experiments on several datasets to validate the effectiveness of the proposed image-to-image synthesis *** addition, we tested on the public dataset Multi-Modal CelebA-HQ for text-to-image *** validated that our approach can generate high-quality text-matching images and is comparable with state-of-the-art methods, both qualitatively and quantitatively.
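As a rough illustration of the flow component named in the abstract, the sketch below shows a single affine coupling layer of the kind real NVP is built from. It is not the CLIP-Flow implementation: `scale_fn` and `shift_fn` are hypothetical stand-ins for learned networks, and the modifications mentioned in the abstract (such as activation normalization) are left out.

```python
import math


def scale_fn(x1):
    # Hypothetical stand-in for a learned scale network (assumption).
    return [math.tanh(v) for v in x1]


def shift_fn(x1):
    # Hypothetical stand-in for a learned shift network (assumption).
    return [0.5 * v for v in x1]


def coupling_forward(x):
    """Affine coupling: keep the first half, transform the second half."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch expects an even-length vector")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_fn(x1), shift_fn(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # log-determinant of the Jacobian of this layer
    return x1 + y2, log_det


def coupling_inverse(y):
    """Exact inverse of coupling_forward."""
    if len(y) % 2 != 0:
        raise ValueError("this sketch expects an even-length vector")
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_fn(y1), shift_fn(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift values, which is what makes the mapping invertible without inverting the networks themselves.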

Keywords: image-to-image; text-to-image; contrastive language-image pretraining (CLIP); flow; StyleGAN

Taming diffusion model for exemplar-based image translation

Computational Visual Media, 2024, Vol. 10, No. 6, pp. 1031-1043

Authors: Ma, Hao; Yang, Jingyuan; Huang, Hui. Affiliation: Shenzhen University, Visual Computing Research Center, College of Computer Science and Software Engineering, Shenzhen, China (GRID: grid.263488.3) (ISNI: 0000 0001 0472 9649)

Exemplar-based image translation involves converting semantic masks into photorealistic images that adopt the style of a given ***, most existing GAN-based translation methods fail to produce photorealistic *** this study, we propose a new diffusion model-based approach for generating high-quality images that are semantically aligned with the input mask and resemble an exemplar in *** proposed method trains a conditional denoising diffusion probabilistic model (DDPM) with a SPADE module to integrate the semantic *** then used a novel contextual loss and auxiliary color loss to guide the optimization process, resulting in images that were visually pleasing and semantically *** demonstrate that our method outperforms state-of-the-art approaches in terms of both visual quality and quantitative metrics.

Keywords: exemplar image translation denoising diffusion probabilistic model (DDPM)

Self-Supervised Color-Concept Association via Image Colorization

IEEE Transactions on Visualization and Computer Graphics, 2023, Vol. 29, No. 1, pp. 247-256

Authors: Hu, Ruizhen; Ye, Ziqi; Chen, Bin; Van Kaick, Oliver; Huang, Hui. Affiliations: Shenzhen University, Visual Computing Research Center, China; Carleton University, School of Computer Science, Canada

The interpretation of colors in visualizations is facilitated when the assignments between colors and concepts in the visualizations match human's expectations, implying that the colors can be interpreted in a semantic manner. However, manually creating a dataset of suitable associations between colors and concepts for use in visualizations is costly, as such associations would have to be collected from humans for a large variety of concepts. To address the challenge of collecting this data, we introduce a method to extract color-concept associations automatically from a set of concept images. While the state-of-the-art method extracts associations from data with supervised learning, we developed a self-supervised method based on colorization that does not require the preparation of ground truth color-concept associations. Our key insight is that a set of images of a concept should be sufficient for learning color-concept associations, since humans also learn to associate colors to concepts mainly from past visual input. Thus, we propose to use an automatic colorization method to extract statistical models of the color-concept associations that appear in concept images. Specifically, we take a colorization model pre-trained on ImageNet and fine-tune it on the set of images associated with a given concept, to predict pixel-wise probability distributions in Lab color space for the images. Then, we convert the predicted probability distributions into color ratings for a given color library and aggregate them for all the images of a concept to obtain the final color-concept associations. We evaluate our method using four different evaluation metrics and via a user study. Experiments show that, although the state-of-the-art method based on supervised learning with user-provided ratings is more effective at capturing relative associations, our self-supervised method obtains overall better results according to metrics like Earth Mover's Distance (EMD) and Entropy Differenc

Keywords: Data mining

Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Liu, Xiangyang; He, Junliang; Qiu, Xipeng. Affiliations: School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China

ISBN: (print) 9798891761643

Large language models (LLMs) can perform complex reasoning by generating intermediate thoughts under zero-shot or few-shot settings. However, zero-shot prompting always encounters low performance, and the superior performance of few-shot prompting hinges on the manual-crafted demonstrations. In this paper, we present RoSE (Reasoning with Orchestrated Streaming Experiences), a general framework for solving reasoning tasks that can self-improve without complex external efforts. To enable RoSE, we describe an architecture that extends an LLM to store all answered questions and their thoughts in a streaming experience pool then orchestrates helpful questions from the pool to assist in answering new questions. To set up a question-aware orchestration mechanism, RoSE first calculates the similarity of each question in the pool with a new test question. Since the solution to each answered question is not always correct, RoSE will sort the questions according to their similarity with the new question, and then uniformly divide them into multiple buckets. It finally extracts one question from each bucket to make these extracted questions more diverse. To make these extracted questions help RoSE answer new questions as much as possible, we introduce two other attributes of uncertainty and complexity for each question. RoSE will preferentially select the questions with low uncertainty and high complexity from each bucket. We evaluate the versatility of RoSE in various reasoning tasks, LLMs, and CoT methods. © 2024 Association for Computational Linguistics.
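The selection procedure described in this abstract (sort by similarity, split into uniform buckets, take one question per bucket) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the authors' code: the `Experience` fields are hypothetical names, the similarity, uncertainty, and complexity values are assumed to be precomputed, and preferring low uncertainty first and high complexity second is one possible reading of "low uncertainty and high complexity".

```python
from dataclasses import dataclass


@dataclass
class Experience:
    question: str
    similarity: float   # similarity to the new question (assumed precomputed)
    uncertainty: float  # lower is preferred
    complexity: float   # higher is preferred


def select_demonstrations(pool, num_buckets):
    """Pick one stored question from each similarity bucket."""
    if not pool or num_buckets <= 0:
        return []
    # Sort the pool from most to least similar to the new question.
    ranked = sorted(pool, key=lambda e: e.similarity, reverse=True)
    num_buckets = min(num_buckets, len(ranked))
    base, extra = divmod(len(ranked), num_buckets)
    selected = []
    start = 0
    for i in range(num_buckets):
        # Spread any remainder over the first buckets so sizes differ by at most one.
        size = base + (1 if i < extra else 0)
        bucket = ranked[start:start + size]
        start += size
        # Assumption: low uncertainty first, ties broken by high complexity.
        selected.append(min(bucket, key=lambda e: (e.uncertainty, -e.complexity)))
    return selected
```

Taking one question per bucket, rather than the top few overall, is what keeps the selected demonstrations diverse in their similarity to the new question.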

Keywords: Computational linguistics

A hybrid memory architecture supporting fine-grained data migration

Frontiers of Computer Science, 2024, Vol. 18, No. 2, pp. 31-41

Authors: Ye CHI, Jianhui YUE, Xiaofei LIAO, Haikun LIU, Hai JIN. Affiliations: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Department of Computer Science, Michigan Technological University, Michigan 49931, USA

Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to take full advantage of different memory *** previous proposals usually migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM *** this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat address space physically, but manages them in a cache/memory *** the commercial NVM device, Intel Optane DC Persistent Memory Modules (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at the 256-byte size to adapt to this feature of *** design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane *** also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache ***, we exploit a utility-based caching mechanism to filter cold blocks in the NVM, and further improve the efficiency of the DRAM *** implement Mocha in an architectural *** results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), and reduce energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
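To make the granularity argument in this abstract concrete, the following sketch compares how many bytes would be moved when the same set of accessed addresses is migrated as whole 4 KB pages versus 256-byte blocks. It is only a back-of-the-envelope model with a hypothetical helper name, not a description of how Mocha is implemented.

```python
PAGE_SIZE = 4096   # 4 KB page granularity used by earlier proposals
BLOCK_SIZE = 256   # 256-byte block granularity named in the abstract


def migration_bytes(addresses, granularity):
    """Bytes moved if every touched unit of the given size is migrated once."""
    # Each address falls into exactly one unit; count the distinct units.
    touched_units = {addr // granularity for addr in addresses}
    return len(touched_units) * granularity
```

For example, the four byte addresses 0, 64, 300, and 5000 touch two pages but only three 256-byte blocks, so page-granularity migration would move 8192 bytes while block-granularity migration would move 768.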

Keywords: non-volatile memory; hybrid memory system; data migration; fine-grained caching

Crowd dynamics analysis and behavior recognition in surveillance videos based on deep learning

Multimedia Tools and Applications, 2024, pp. 1-35

Authors: Ilyas, Anum; Bawany, Narmeen. Affiliation: Center for Computing Research, Department of Computer Science and Software Engineering, Jinnah University for Women, Karachi, Pakistan

Video surveillance is widely adopted across various sectors for purposes such as law enforcement, COVID-19 isolation monitoring, and analyzing crowds for potential threats like flash mobs or violence. The vast amount of data generated daily from surveillance devices holds significant potential but requires effective analysis to extract value. Detecting anomalous crowd behavior, which can lead to chaos and casualties, is particularly challenging in video surveillance due to its labor-intensive nature and susceptibility to errors. To address these challenges, this research contributes in two key areas: first, by creating a diverse and representative video dataset that accurately reflects real-world crowd dynamics across eight different categories; second, by developing a reliable framework, ‘CRAB-NET,’ for automated behavior recognition. Extensive experimentation and evaluation, using Convolutional Long Short-Term Memory networks (ConvLSTM) and Long-Term Recurrent Convolutional Networks (LRCN), validated the effectiveness of the proposed approach in accurately categorizing behaviors observed in surveillance videos. The employed models achieved accuracy scores of 99.46% for celebratory crowd, 99.98% for formal crowd and 96.69% for violent crowd. The demonstrated accuracy of 97.20% on the comprehensive dataset achieved by the LRCN underscores its potential to revolutionize crowd behavior analysis. It ensures safer mass gatherings and more effective security interventions. Incorporating AI-powered crowd behavior recognition like ‘CRAB-NET’ into security measures not only safeguards public gatherings but also paves the way for proactive event management and predictive safety strategies. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Shellfish

Composer Classification Using Maximum Probability Partitioning Based on Compression Principles

Transactions of the Japanese Society for Artificial Intelligence, 2024, Vol. 39, No. 2, pp. 1-10

Authors: Takamoto, Ayaka; Hironaka, Shiori; Umemura, Kyoji. Affiliations: Department of Computer Science and Engineering, Toyohashi University of Technology, Japan; Academic Center for Computing and Media Studies, Kyoto University, Japan

Music classification is a fundamental task in the field of Music Information Retrieval. This paper focuses on composer classification, a specific task within music classification. Compressive techniques are commonly employed in such music classification tasks. In this study, we propose a method to apply the computing Information Quantity using Maximum Probability partitioning to music classification. To evaluate the effectiveness of our proposed method, we perform composer classification, specifically distinguishing between Haydn and Mozart, who are well-known for their stylistic similarities. The experimental results demonstrate that our proposed approach outperforms traditional compression-based classification methods. Furthermore, we compare our method with non-compressive techniques, discussing the significance of feature extraction methods. Our proposed method is a parameter-free classification approach that does not require domain-specific musical expertise or feature extraction based on such expertise. © 2024, Japanese Society for Artificial Intelligence. All rights reserved.

Keywords: Classification (of information)

Dual Encoder-Decoder Shifted Window-Based Transformer Network for Polyp Segmentation with Self-Learning Approach

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 7, pp. 3456-3469

Authors: Lijin, P.; Ullah, Mohib; Vats, Anuja; Cheikh, Faouzi Alaya; Kumar, Santhosh; Nair, Madhu S. Affiliations: Cochin University of Science and Technology, Artificial Intelligence & Computer Vision Laboratory, Department of Computer Science, Kerala, Kochi 682022, India; Norwegian University of Science and Technology, Gjovik 2815, Norway; Norwegian University of Science and Technology, Norwegian Colour and Visual Computing Laboratory, Gjovik 2815, Norway

According to WHO reports, cancer is the leading cause of death worldwide. The second most prevalent cause of cancer-related death in both men and women is colorectal cancer (CRC). One potential approach for reducing the severity of colon cancer is to utilize automatic segmentation and detection of colorectal polyps in colonoscopy videos. This technology can assist endoscopists in quickly identifying colorectal disease, leading to earlier intervention and better patient Quality of Life (QoL). In this article, we propose a self-supervised transformer based dual encoder-decoder architecture named P-SwinNet for polyps segmentation in colonoscopy images. The P-SwinNet adapts the dual encoder-decoder type of model to enhance the feature maps by sharing multiscale information from the encoder to the decoder. The proposed model uses multiple dilated convolutions to enlarge the field of view to gather more information without increasing the computational cost and the loss of spatial information. We also leverage a large-scale unlabeled dataset for training our model using the self-learning strategy of Barlow twins. Additionally, to capture the long-range dependencies in the data, we used a shift window-based approach that computes global attention. We extensively evaluate our model against state-of-the-art algorithms. The quantitative results show that the proposed P-SwinNet achieves a mean dice score of 0.87 and a mean Intersection over Union (IoU) of 0.82 on five datasets used in our study. This performance demonstrates a substantial advancement over existing similar works, highlighting the advantage and novelty of our proposed approach in the field of medical image segmentation. © 2020 IEEE.
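For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes the Dice score and Intersection over Union (IoU) for a pair of binary masks given as flat sequences of 0/1 values. It uses the standard definitions; returning 1.0 for two empty masks is a convention chosen here for illustration, not something stated in the abstract.

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    if pred_sum + target_sum == 0:
        # Both masks are empty: treat as a perfect match (assumed convention).
        return 1.0, 1.0
    union = pred_sum + target_sum - intersection
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou
```

For a single pair of masks the two values are related by dice = 2 * iou / (1 + iou), so Dice is never lower than IoU.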

Keywords: Convolution

The performance-interpretability trade-off: a comparative study of machine learning models

Journal of Reliable Intelligent Environments, 2025, Vol. 11, No. 1, pp. 1-15

Authors: Assis, André; Dantas, Jamilson; Andrade, Ermeson. Affiliations: Department of Computing, Federal Rural University of Pernambuco, Pernambuco, Recife, Brazil; Computer Science Center, Federal University of Pernambuco, Pernambuco, Recife, Brazil

Machine learning models are increasingly being integrated into various aspects of society, impacting decision-making processes across domains such as healthcare, finance, and autonomous systems. However, as these models become more complex, concerns about their transparency and interpretability have emerged. Transparent models, which provide detailed and understandable explanations, stand in contrast to opaque models, which often achieve higher accuracy but lack interpretability. This study presents a comparative analysis, examining the performance and explainability of transparent models (K-Nearest Neighbors (KNN), Decision Trees, and Logistic Regression) and opaque models (Convolutional Neural Networks (CNN), Random Forests, and Support Vector Machines (SVM)) in an intelligent environment. Our experimental evaluation explores the balance between performance (accuracy and response time) and explainability, a crucial aspect for the effective deployment of Artificial Intelligence (AI) in smart systems. Our results indicate that opaque models such as CNN, SVM, and Random Forest achieved higher accuracy (up to 98% on MNIST and 95% on Fake and Real News) compared to transparent models (up to 94% on MNIST and 92% on Fake and Real News). However, transparent models exhibited faster response times and greater interpretability, especially under high workload conditions, highlighting the trade-off between performance and interpretability. © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024.

Keywords: Convolutional neural networks

A learning automata based edge resource allocation approach for IoT-enabled smart cities

Digital Communications and Networks, 2024, Vol. 10, No. 5, pp. 1258-1266

Authors: Sampa Sahoo, Kshira Sagar Sahoo, Bibhudatta Sahoo, Amir H. Gandomi. Affiliations: Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, India; Department of Computer Science and Engineering, SRM University, Amaravati, India; Department of Computing Science, Umea University, Umea 90187, Sweden; Department of Computer Science and Engineering, NIT Rourkela, India; Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; University Research and Innovation Center (EKIK), Obuda University, 1034 Budapest, Hungary

The development of the Internet of Things (IoT) technology is leading to a new era of smart applications such as smart transportation, buildings, and smart ***, these applications act as the building blocks of IoT-enabled smart *** high volume and high velocity of data generated by various smart city applications are sent to flexible and efficient cloud computing resources for ***, there is a high computation latency due to the presence of a remote cloud *** computing, which brings the computation close to the data source, is introduced to overcome this *** an IoT-enabled smart city environment, one of the main concerns is to consume the least amount of energy while executing tasks that satisfy the delay *** efficient resource allocation at the edge is helpful to address this *** this paper, an energy and delay minimization problem in a smart city environment is formulated as a bi-objective edge resource allocation ***, we presented a three-layer network architecture for IoT-enabled smart ***, we designed a learning automata-based edge resource allocation approach considering the three-layer network architecture to solve the said bi-objective minimization *** Automata (LA) is a reinforcement-based adaptive decision-maker that helps to find the best task and edge resource *** extensive set of simulations is performed to demonstrate the applicability and effectiveness of the LA-based approach in the IoT-enabled smart city environment.
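The abstract does not say which learning automata update rule the authors use. As a generic illustration of how such an automaton adapts its choices, the sketch below applies the classic linear reward-inaction update to a probability vector over candidate edge resources; the function name and the learning rate are assumptions made for this example.

```python
def update_probabilities(probs, chosen, rewarded, learning_rate=0.1):
    """Linear reward-inaction update over action probabilities."""
    if not rewarded:
        # Inaction on penalty: the probabilities stay as they are.
        return list(probs)
    updated = []
    for i, p in enumerate(probs):
        if i == chosen:
            # Move the rewarded action's probability toward 1.
            updated.append(p + learning_rate * (1.0 - p))
        else:
            # Shrink the others proportionally so the total stays 1.
            updated.append((1.0 - learning_rate) * p)
    return updated
```

In a resource allocation setting, the reward signal could be whether an assignment met its delay and energy targets, so that edge resources that repeatedly succeed are chosen more often over time.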

Keywords: Edge computing; IoT; Learning automata; Resource allocation; Smart city
