Search Results - Inner Mongolia University Library

32nd ACM World Wide Web Conference, WWW 2023

Authors: Sun, Bingkun; Shen, Liwei; Peng, Xin; Wang, Ziming. Affiliations: Fudan University China School of Computer Science Shanghai Key Laboratory of Data Science Fudan University China Shanghai Collaborative Innovation Center of Intelligent Visual Computing China

ISBN: (print) 9781450394161

The physical world we live in is accelerating digitalization with the vigorous development of Internet of Things (IoT). Following this trend, Web of Things (WoT) further enables fast and efficient creation of various applications that perceive and act on the physical world using standard Web technologies. A popular way for creating WoT applications is Trigger-Action Programming (TAP), which allows users to orchestrate the capabilities of IoT devices in the form of "if trigger, then action". However, existing TAP approaches don't support scenario-centric WoT applications which involve abstract modeling of physical environments and complex spatio-temporal dependencies between events and actions. In this paper, we propose an approach called SCTAP which supports Scenario-Centric Trigger-Action Programming based on software-defined physical environments. SCTAP defines a structured and conceptual representation for physical environments, which provides the required programming abstractions for WoT applications. Based on the representation, SCTAP defines a grammar for specifying scenario-centric WoT applications with spatio-temporal dependencies. Furthermore, we design a service-based architecture for SCTAP which supports the integration of device access, event perception, environment representation, and rule execution in a loosely-coupled and extensible way. We implement SCTAP as a WoT infrastructure and evaluate it with two case studies including a smart laboratory and a smart coffee house. The results confirm the usability, feasibility and efficiency of SCTAP and its implementation. © 2023 ACM.
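To make the "if trigger, then action" form concrete, here is a minimal Python sketch of a rule that also carries one spatial condition (a location) and one temporal condition (a preceding event within a time window). It is not SCTAP's grammar or architecture, which the abstract does not spell out; the `Event`, `Rule`, and `run` names, the event names, and the way the dependency is encoded are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    name: str
    location: str
    time: float  # seconds since start of observation


@dataclass
class Rule:
    trigger: str                 # event name that may fire the rule
    location: str                # spatial condition: where the trigger must occur
    preceded_by: str             # temporal condition: an earlier event in the same place
    within: float                # maximum gap in seconds between the two events
    action: Callable[[], None]   # what to do when the rule fires


def run(rules: List[Rule], events: List[Event]) -> None:
    """Replay events in time order and fire every rule whose conditions hold."""
    last_seen: Dict[Tuple[str, str], float] = {}
    for ev in sorted(events, key=lambda e: e.time):
        for rule in rules:
            if ev.name == rule.trigger and ev.location == rule.location:
                earlier = last_seen.get((rule.preceded_by, rule.location))
                if earlier is not None and ev.time - earlier <= rule.within:
                    rule.action()
        last_seen[(ev.name, ev.location)] = ev.time


log: List[str] = []
rules = [Rule("person_seated", "lab", "door_opened", 60.0,
              lambda: log.append("turn_on_desk_lamp"))]
events = [
    Event("door_opened", "lab", 0.0),
    Event("person_seated", "lab", 30.0),    # fires: same place, within 60 s
    Event("person_seated", "cafe", 40.0),   # ignored: wrong location
    Event("person_seated", "lab", 300.0),   # ignored: door opened too long ago
]
run(rules, events)
```

Only the second event satisfies both conditions, so `log` ends up with a single action.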

Keywords: Internet of things

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

32nd International Joint Conference on Artificial Intelligence, IJCAI 2023

Authors: Zheng, Tianlun; Chen, Zhineng; Bai, Jinfeng; Xie, Hongtao; Jiang, Yu-Gang. Affiliations: Shanghai Collaborative Innovation Center of Intelligent Visual Computing School of Computer Science Fudan University China Tomorrow Advance Life China University of Science and Technology of China China

ISBN: (print) 9781956792034

Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a specially designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://***/simplify23/TPS PP. © 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.

Keywords: Character recognition

FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Wang, Jiayu; Yu, Yue; Chen, Jingjing; Dai, Qi; Jiang, Yu-Gang. Affiliations: Shanghai Key Lab of Intell. Info. Processing School of Computer Science Fudan University China Shanghai Collaborative Innovation Center of Intelligent Visual Computing China Microsoft Research Asia China

ISBN: (print) 157735897X

Recent advances in diffusion-based generative models have demonstrated superior performance in subject-driven image generation. Identity (ID) preserving image generation, as a subtask of subject-driven image generation, aims to generate customized images for specific human identity and has broad application potential. However, this task remains challenging due to the requirement for high ID fidelity and precise detail preservation. Additionally, generating high-quality context presents another challenge, as existing methods struggle to achieve both high ID fidelity and satisfactory context simultaneously. To address the issues of insufficient ID fidelity, we introduce a simple yet effective test-time fine-tuning approach. Specifically, we propose an attribute-driven training method that establishes global-level and local-level tasks to learn the global face feature and fine-grained attribute features, respectively. Furthermore, we introduce a novel ID-context decoupling framework that decouples image context generation from human ID generation, ensuring the quality of contextual content as well as facilitating the learning of ID information. Through extensive experiments, we demonstrate the effectiveness of the proposed method and showcase its capabilities across various applications. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

Uncer2Natural: Uncertainty-Aware Unsupervised Image Denoising

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Huang, Chenyu; Tan, Weimin; Shi, Jiaxing; Xing, Zhen; Yan, Bo. Affiliations: Shanghai Collaborative Innovation Center of Intelligent Visual Computing Fudan University School of Computer Science Shanghai Key Laboratory of Intelligent Information Processing Shanghai China

ISBN: (print) 9781728163277

Recently, unsupervised image denoising methods learning from paired noisy samples have received increasing attention. These methods build on the idea that the mean of multiple noisy images of the same scene is the ideal clean image. However, these methods ignore the effect of aleatoric uncertainty in the noisy image (e.g., pixels deviating from the expected distribution). The presence of aleatoric uncertainty causes degradation of the reconstructed target pixels, resulting in high uncertainty for these pixels (i.e., low confidence), which in turn leads to sub-optimal denoising results. To address this problem, we propose a novel uncertainty-aware unsupervised image denoising method named Uncer2Natural (U2N). It dynamically predicts the aleatoric uncertainty for each noisy sample and produces satisfactory denoising results by reducing the effect of aleatoric uncertainty. Extensive experimental results show that U2N outperforms state-of-the-art unsupervised image denoising methods in terms of both quantitative metrics and qualitative visual quality. © 2023 IEEE.
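The premise mentioned in this abstract, that averaging many noisy observations of one scene approaches the clean image, can be checked numerically. The sketch below is only an illustration of that premise under an assumed zero-mean Gaussian noise model; it is not the U2N method, and the signal, noise level, and helper names are made up for the example.

```python
import random


def noisy_copy(clean, sigma, rng):
    """Return the clean signal plus independent zero-mean Gaussian noise."""
    return [p + rng.gauss(0.0, sigma) for p in clean]


def pixelwise_mean(images):
    """Average several equally sized images pixel by pixel."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def mse(a, b):
    """Mean squared error between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)                       # fixed seed for repeatability
clean = [0.1 * i for i in range(100)]        # a flattened toy "image"
samples = [noisy_copy(clean, 0.5, rng) for _ in range(64)]

single_error = mse(samples[0], clean)                 # error of one noisy sample
mean_error = mse(pixelwise_mean(samples), clean)      # error after averaging
```

With independent zero-mean noise the error of the average shrinks roughly in proportion to the number of samples, so `mean_error` comes out far below `single_error`. Pixels that violate the noise assumption are exactly the case the abstract attributes to aleatoric uncertainty.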

Keywords: Pixels

Motion Matters: Difference-based Multi-scale Learning for Infrared UAV Detection

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: He, Ruian; Zhou, Shili; Cheng, Ri; Sun, Yuqi; Tan, Weimin; Yan, Bo. Affiliations: Shanghai Collaborative Innovation Center of Intelligent Visual Computing Fudan University School of Computer Science Shanghai Key Laboratory of Intelligent Information Processing Shanghai China

ISBN: (print) 9798350302493

Unmanned Aerial Vehicle (UAV) detection in the wild is a challenging task due to the presence of background noise and the varying size of the object. To address these obstacles, we propose a novel learning framework for robust UAV detectors, which we call Difference-based Multi-scale Learning (DML). We argue that motion information matters in UAV detection because recognition from a single frame is poor. Our method utilizes the frame difference of multiple previous frames, extracting motion information and blocking background noise. We also fuse multiple spatial-temporal scales for training and inference, enabling fusion from different sources. In addition, to better evaluate the performance of UAV detection at different scales, we propose the Multi-Scale Average Precision (MSAP) metric to aggregate the detection accuracy over multiple scales. Through extensive experiments, we demonstrate that our proposed approach improves the detection accuracy of baseline models. Notably, we achieve SOTA performance in the 3rd Anti-UAV Challenge, with 2nd place in Track 2 and 4th place in Track 1. © 2023 IEEE.
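The frame-difference idea in this abstract can be illustrated with a small Python sketch: subtracting earlier frames from the current one cancels a static background and leaves a response where something moved. How DML actually combines the differences is not stated in the abstract; averaging absolute differences, and the toy frames built by `make_frame`, are assumptions for illustration only.

```python
def frame_difference(current, previous_frames):
    """Mean absolute difference between the current frame and each previous frame."""
    height, width = len(current), len(current[0])
    diff = [[0.0] * width for _ in range(height)]
    for prev in previous_frames:
        for y in range(height):
            for x in range(width):
                diff[y][x] += abs(current[y][x] - prev[y][x])
    count = len(previous_frames)
    return [[value / count for value in row] for row in diff]


def make_frame(target_x, size=5, background=10, target=200):
    """Toy frame: uniform background with one bright pixel on the middle row."""
    frame = [[background] * size for _ in range(size)]
    frame[2][target_x] = target
    return frame


# The bright target moves one pixel to the right per frame.
previous = [make_frame(0), make_frame(1)]
current = make_frame(2)
motion = frame_difference(current, previous)
```

Background pixels get a response of 0.0, the current target position gets 190.0, and the two positions the target has left get 95.0 each, so the static background is suppressed while the motion trail stands out.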

Keywords: Antennas

Clinical Inspired MRI Lesion Segmentation

22nd IEEE International Symposium on Biomedical Imaging, ISBI 2025

Authors: Yan, Lijun; Wang, Churan; Zhong, Fangwei; Wang, Yizhou. Affiliations: School of Software and Microelectronics Peking University China School of Computer Science Nat'l Eng. Research Center of Visual Technology Peking University Center on Frontiers of Computing Studies China School of Artificial Intelligence Beijing Normal University China

ISBN: (print) 9798331520526

Magnetic resonance imaging (MRI) is a potent diagnostic tool for detecting pathological tissues in various diseases. Different MRI sequences have different contrast mechanisms and sensitivities for different types of lesions, which pose challenges to accurate and consistent lesion segmentation. In clinical practice, radiologists commonly use the subsequence feature, i.e., the difference between post-contrast-enhanced T1-weighted (post) and pre-contrast-enhanced (pre) sequences, to locate lesions. Inspired by this, we propose a residual fusion method to learn subsequence representation for MRI lesion segmentation. Specifically, we iteratively and adaptively fuse features from pre- and post-contrast sequences at multiple resolutions, using dynamic weights to achieve optimal fusion and address diverse lesion enhancement patterns. Our method achieves state-of-the-art performance on the BraTS2023 dataset for brain tumor segmentation and our in-house breast MRI dataset for breast lesion segmentation. Our method is clinically inspired and has the potential to facilitate lesion segmentation in various applications. © 2025 IEEE.

Keywords: Dynamic contrast enhanced MRI

Autoregressive Sequence Modeling for 3D Medical Image Representation

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Wang, Siwen; Wang, Churan; Gao, Fei; Su, Lixian; Zhang, Fandong; Wang, Yizhou; Yu, Yizhou. Affiliations: School of Computing and Data Science The University of Hong Kong Hong Kong Center on Frontiers of Computing Studies School of Computer Science Nat'l Eng. Research Center of Visual Technology Peking University China Deepwise AI Lab China State Key Lab of General Artificial Intelligence Inst. for Artificial Intelligence Peking University China

ISBN: (print) 157735897X

Three-dimensional (3D) medical images, such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), are essential for clinical applications. However, the need for diverse and comprehensive representations is particularly pronounced when considering the variability across different organs, diagnostic tasks, and imaging modalities. How to effectively interpret the intricate contextual information and extract meaningful insights from these images remains an open challenge to the community. While current self-supervised learning methods have shown potential, they often consider an image as a whole thereby overlooking the extensive, complex relationships among local regions from one or multiple images. In this work, we introduce a pioneering method for learning 3D medical image representations through an autoregressive pre-training framework. Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence. By employing an autoregressive sequence modeling task, we predict the next visual token in the sequence, which allows our model to deeply understand and integrate the contextual information inherent in 3D medical images. Additionally, we implement a random startup strategy to avoid overestimating token relationships and to enhance the robustness of learning. The effectiveness of our approach is demonstrated by the superior performance over others on nine downstream tasks in public datasets. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

Keywords: Supervised learning

Blocky Volume Package: a Web-friendly Volume Storage and Compression Solution

Authors: Lesar, Žiga; Bohak, Ciril; Marolt, Matija. Affiliations: University of Ljubljana Faculty of Computer and Information Science Večna pot 113 Ljubljana 1000 Slovenia King Abdullah University of Science and Technology Visual Computing Center Thuwal 23955 Saudi Arabia

The Blocky Volume Package (BVP) format is a distributed, platform-independent and API-independent format for storing static and temporal volumetric data. It is designed for efficient transfer over a network by supporting sparse volumes, multiple resolutions, random access, and streaming, as well as providing a strict framework for supporting a wide palette of encoding formats. The BVP format achieves this by dividing a volume or a volume sequence into blocks that can be compressed and reused. The metadata for the blocks are stored in separate files so that a client has all the information required for loading and decoding the blocks before the actual transmission, decoding and rendering take place. This design allows for random access and parallel loading and has been specifically designed for efficient use on the web platform by adhering to the current living standards. In the paper, we compare the BVP format with some of the most often implemented volume storage formats, and show that the BVP format supports most major features of these formats while at the same time being easily implementable and extensible. © 2023 The Author(s).
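The block-based design described here (split a volume into blocks, compress and reuse them, keep metadata separate so a client can plan loading before fetching payloads) can be sketched in a few lines of Python. This is not the BVP file layout or schema: the metadata fields (`dims`, `blockSize`, `placements`), the SHA-256 keys, and the choice of zlib are assumptions made for illustration.

```python
import hashlib
import json
import zlib


def split_into_blocks(volume, dims, block):
    """Yield (origin, bytes) for each block of a flat 8-bit volume stored x-fastest."""
    w, h, d = dims
    for z0 in range(0, d, block):
        for y0 in range(0, h, block):
            for x0 in range(0, w, block):
                data = bytearray()
                for z in range(z0, min(z0 + block, d)):
                    for y in range(y0, min(y0 + block, h)):
                        start = (z * h + y) * w + x0
                        data += volume[start:start + min(block, w - x0)]
                yield (x0, y0, z0), bytes(data)


def pack(volume, dims, block):
    """Return (metadata JSON, payload store); identical blocks are stored once."""
    store = {}        # content digest -> compressed block payload
    placements = []   # where each block goes and which payload it uses
    for origin, data in split_into_blocks(volume, dims, block):
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, zlib.compress(data))
        placements.append({"origin": origin, "block": digest})
    metadata = json.dumps({"dims": dims, "blockSize": block, "placements": placements})
    return metadata, store


def load_block(metadata, store, index):
    """Random access: decode one block from the metadata and a single payload."""
    entry = json.loads(metadata)["placements"][index]
    return entry["origin"], zlib.decompress(store[entry["block"]])


dims = (4, 4, 4)
voxels = bytearray(64)   # an almost empty 4x4x4 volume
voxels[0] = 255          # one non-zero voxel at x=0, y=0, z=0
metadata, store = pack(bytes(voxels), dims, 2)
```

The toy volume yields eight placements but only two stored payloads, because the seven empty blocks share one entry; that reuse is what makes sparse volumes cheap in a block-based scheme. A client holding only `metadata` can decide which payloads it needs and request them independently.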

Keywords: Volumetric analysis

Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences

arXiv, 2025

Authors: Liu, Xiangyang; He, Junliang; Qiu, Xipeng. Affiliations: School of Computer Science Fudan University Shanghai Collaborative Innovation Center of Intelligent Visual Computing China

Large language models (LLMs) can perform complex reasoning by generating intermediate thoughts under zero-shot or few-shot settings. However, zero-shot prompting always encounters low performance, and the superior performance of few-shot prompting hinges on manually crafted demonstrations. In this paper, we present RoSE (Reasoning with Orchestrated Streaming Experiences), a general framework for solving reasoning tasks that can self-improve without complex external efforts. To enable RoSE, we describe an architecture that extends an LLM to store all answered questions and their thoughts in a streaming experience pool and then orchestrates helpful questions from the pool to assist in answering new questions. To set up a question-aware orchestration mechanism, RoSE first calculates the similarity of each question in the pool with a new test question. Since the solution to each answered question is not always correct, RoSE will sort the questions according to their similarity with the new question, and then uniformly divide them into multiple buckets. It finally extracts one question from each bucket to make these extracted questions more diverse. To make these extracted questions help RoSE answer new questions as much as possible, we introduce two other attributes of uncertainty and complexity for each question. RoSE will preferentially select the questions with low uncertainty and high complexity from each bucket. We evaluate the versatility of RoSE in various reasoning tasks, LLMs, and CoT methods. Copyright © 2025, The Authors. All rights reserved.
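The selection procedure outlined in this abstract (rank pooled questions by similarity, split the ranking into equal buckets, take one low-uncertainty, high-complexity question per bucket) can be sketched as follows. This is a reading of the abstract, not the authors' code: the `Experience` fields are assumed to be precomputed, and breaking ties by uncertainty first and complexity second is an assumption, since the abstract does not say how the two attributes are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    question: str
    similarity: float    # similarity to the new question (assumed precomputed)
    uncertainty: float   # lower is preferred
    complexity: float    # higher is preferred


def select_demonstrations(pool: List[Experience], num_buckets: int) -> List[Experience]:
    """Pick one experience per similarity bucket, favoring low uncertainty and high complexity."""
    ranked = sorted(pool, key=lambda e: e.similarity, reverse=True)
    size, extra = divmod(len(ranked), num_buckets)
    picks, start = [], 0
    for i in range(num_buckets):
        end = start + size + (1 if i < extra else 0)   # spread any remainder evenly
        bucket = ranked[start:end]
        start = end
        if bucket:
            picks.append(min(bucket, key=lambda e: (e.uncertainty, -e.complexity)))
    return picks


pool = [
    Experience("q1", similarity=0.9, uncertainty=0.4, complexity=3),
    Experience("q2", similarity=0.8, uncertainty=0.1, complexity=5),
    Experience("q3", similarity=0.5, uncertainty=0.2, complexity=2),
    Experience("q4", similarity=0.4, uncertainty=0.2, complexity=6),
]
picks = select_demonstrations(pool, num_buckets=2)
```

Here the high-similarity bucket contributes `q2` (lowest uncertainty) and the low-similarity bucket contributes `q4` (equal uncertainty, higher complexity), so the demonstrations span the similarity range instead of clustering around the nearest neighbors.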

Keywords: Complex networks

GenRec: Unifying Video Generation and Recognition with Diffusion Models

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Weng, Zejia; Yang, Xitong; Xing, Zhen; Wu, Zuxuan; Jiang, Yu-Gang. Affiliations: Shanghai Key Lab of Intell. Info. Processing School of CS Fudan University China Shanghai Collaborative Innovation Center of Intelligent Visual Computing China Department of Computer Science University of Maryland United States

Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified framework trained with a random-frame conditioning process so as to learn generalized spatial-temporal representations. The resulting framework can naturally support generation and recognition, and more importantly is robust even when visual inputs contain limited information. Extensive experiments demonstrate the efficacy of GenRec for both recognition and generation. In particular, GenRec achieves competitive recognition performance, offering 75.8% and 87.2% accuracy on SSV2 and K400, respectively. GenRec also performs the best on class-conditioned image-to-video generation, achieving 46.5 and 49.3 FVD scores on SSV2 and EK-100 datasets. Furthermore, GenRec demonstrates extraordinary robustness in scenarios where only limited frames can be observed. Code will be available at https://***/wengzejia1/GenRec. © 2024 Neural Information Processing Systems Foundation. All rights reserved.
