ISBN:
(Print) 1577358872
Semi-supervised learning (SSL) is a powerful tool for addressing the challenge of insufficient annotated data in medical segmentation problems. However, existing semi-supervised methods mainly rely on internal knowledge for pseudo labeling, which is biased due to the distribution mismatch between the highly imbalanced labeled and unlabeled data. Segmenting the left atrial appendage (LAA) from transesophageal echocardiogram (TEE) images is a typical medical image segmentation task characterized by scarce professional annotations and diverse data distributions, for which existing SSL models cannot achieve satisfactory performance. In this paper, we propose a novel strategy to mitigate the inherent challenge of distribution mismatch in SSL by, for the first time, incorporating a large foundation model (i.e., SAM in our implementation) into an SSL model to improve the quality of pseudo labels. We further propose a new self-reconstruction mechanism that generates both noise-resilient prompts, which demonstrably improve SAM's generalization capability over TEE images, and self-perturbations, which stabilize the training process and reduce the impact of noisy labels. We conduct extensive experiments on an in-house TEE dataset; the results demonstrate that our method achieves better performance than state-of-the-art SSL models.
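To make the pseudo-label refinement concrete, here is a minimal sketch of one plausible rule for fusing an SSL teacher's prediction with a SAM-style foundation-model mask. The fusion rule and the function name are assumptions; the abstract does not specify how the two sources are combined.

```python
import torch

def refine_pseudo_label(teacher_logits, sam_mask, agree_thresh=0.5):
    """Hypothetical fusion of an SSL teacher's soft prediction with a
    SAM-style binary mask: keep pixels where the two sources agree,
    and mark the rest as ignored so they contribute no gradient."""
    teacher_prob = torch.sigmoid(teacher_logits)          # (H, W) soft foreground prob.
    teacher_mask = (teacher_prob > agree_thresh).float()  # binarized teacher prediction
    agree = teacher_mask == sam_mask                      # pixel-wise agreement map
    pseudo = torch.where(agree, teacher_mask, torch.full_like(teacher_mask, -1.0))
    return pseudo  # -1 marks pixels excluded from the unsupervised loss
```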
ISBN:
(Print) 9798350301298
We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression. We represent the different motions via disentangled latent representations and leverage an image generator to synthesize talking heads from them. To effectively disentangle each motion factor, we propose a progressive disentangled representation learning strategy that separates the factors in a coarse-to-fine manner: we first extract a unified motion feature from the driving signal, and then isolate each fine-grained motion from the unified feature. We leverage motion-specific contrastive learning and regression for the non-emotional motions, and introduce feature-level decorrelation and self-reconstruction for emotional expression, fully exploiting the inherent properties of each motion factor in unstructured video data to achieve disentanglement. Experiments show that our method provides high-quality speech and lip-motion synchronization along with precise and disentangled control over multiple extra facial motions, which can hardly be achieved by previous methods.
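The feature-level decorrelation used for the expression branch can be illustrated with a generic cross-covariance penalty. The exact loss in the paper is not given in the abstract, so the formulation below is an assumption.

```python
import torch

def decorrelation_loss(expr_feat, motion_feat, eps=1e-6):
    """Generic decorrelation term: penalize cross-correlation between
    expression features and other motion features, both (batch, dim)."""
    e = expr_feat - expr_feat.mean(dim=0, keepdim=True)
    m = motion_feat - motion_feat.mean(dim=0, keepdim=True)
    e = e / (e.std(dim=0, keepdim=True) + eps)   # standardize per dimension
    m = m / (m.std(dim=0, keepdim=True) + eps)
    cross_cov = (e.T @ m) / e.shape[0]           # (dim_e, dim_m) cross-covariance
    return cross_cov.pow(2).mean()               # drive all cross terms toward zero
```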
ISBN:
(Print) 9798350301298
The core of out-of-distribution (OOD) detection is to learn an in-distribution (ID) representation that is distinguishable from OOD samples. Previous work applied recognition-based methods to learn the ID features, which tend to learn shortcuts instead of comprehensive representations. In this work, we find, surprisingly, that simply using reconstruction-based methods can boost the performance of OOD detection significantly. We investigate the main contributors to OOD detection performance and find that reconstruction-based pretext tasks have the potential to provide a generally applicable and efficacious prior, which helps the model learn the intrinsic data distribution of the ID dataset. Specifically, we take Masked Image Modeling as the pretext task for our OOD detection framework (MOOD). Without bells and whistles, MOOD outperforms the previous state of the art on one-class OOD detection by 5.7%, multi-class OOD detection by 3.0%, and near-distribution OOD detection by 2.1%. It even beats 10-shot-per-class outlier-exposure OOD detection, although we do not include any OOD samples in our method. Code is available at https://***/lijingyao20010602/MOOD.
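As an illustration of how features from an MIM-pretrained encoder might be scored, the sketch below uses a standard Mahalanobis-distance detector. The abstract does not spell out MOOD's scoring head, so this is a generic recipe rather than the paper's exact method.

```python
import torch

def mahalanobis_ood_score(test_feats, id_feats):
    """Score OOD-ness by Mahalanobis distance to the ID feature distribution.
    Both inputs are (N, dim) feature matrices from the pretrained encoder."""
    mu = id_feats.mean(dim=0)                              # ID feature mean
    centered = id_feats - mu
    cov = centered.T @ centered / (id_feats.shape[0] - 1)  # ID covariance
    cov_inv = torch.linalg.pinv(cov)                       # pseudo-inverse for stability
    diff = test_feats - mu
    return (diff @ cov_inv * diff).sum(dim=1)              # larger = more OOD
```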
ISBN:
(Print) 9798350364200; 9798350364194
Ultrasound, as a safe, cost-effective, real-time imaging diagnostic tool, has been widely utilized in medical examinations. In recent years, research has focused on utilizing reinforcement learning (RL) and 3D vascular reconstruction methods to achieve robot-assisted ultrasound scanning. In robot-assisted scanning, predicting the ultrasound images and features corresponding to the robot's next action helps the agent make better action decisions and achieve scanning goals more efficiently. To this end, we propose an RL framework based on the Advantage Actor-Critic (A2C) algorithm to predict ultrasound images, and incorporate an LSTM module to leverage temporal information from adjacent time points. To validate the algorithm's effectiveness, we constructed virtual and real environments to collect scanning data for agent training. In ultrasound vascular scanning, the focus is often on how the vessel's position and shape in the ultrasound image change as the probe's position changes. To extract this information from ultrasound images, we employ an ellipse-fitting method for feature extraction and train a U-Net in a real environment for vessel segmentation in ultrasound images. By collecting vascular ultrasound scanning data and feeding it into the RL agent network for training, we can predict the ultrasound image information corresponding to the probe's position at the next time point, given the probe's positions and ultrasound images from the previous N time points.
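The ellipse-fitting feature extraction step can be sketched with OpenCV, assuming the vessel mask comes from the trained segmentation network; `vessel_ellipse_features` is a hypothetical helper name.

```python
import cv2
import numpy as np

def vessel_ellipse_features(mask):
    """Extract (cx, cy, major, minor, angle) of the largest vessel region.
    `mask` is a binary uint8 segmentation map, e.g. from a U-Net;
    cv2.fitEllipse needs at least 5 contour points, so tiny regions are rejected."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:
        return None
    (cx, cy), (major, minor), angle = cv2.fitEllipse(largest)
    return np.array([cx, cy, major, minor, angle], dtype=np.float32)
```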
ISBN:
(纸本)9798350318920;9798350318937
Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data remains a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data and fully supervised models trained on large amounts of real data. We approach the problem from two perspectives: 1) We introduce P3D-Diffusion, a new synthetic dataset with accurate 3D annotations generated by a graphics-guided diffusion model. 2) We propose Cross-domain 3D Consistency (CC3D) for unsupervised domain adaptation of neural mesh models. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art models using only 10% of the respective real training images, and to outperform the SOTA model by a wide margin using only 50% of the real training data. By encouraging the diversity of the synthetic data and generating the images in an OOD-aware manner, our model further demonstrates robust generalization to out-of-distribution scenarios despite being trained with minimal real data. The code is available at https://***/YangYY06/synthetic_3d.
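The contrastive scheme over mesh-surface features can be illustrated with a standard InfoNCE loss between corresponding vertex features from the synthetic and real domains. The choice of positives and negatives here is an assumption, as CC3D's exact scheme is not detailed in the abstract.

```python
import torch
import torch.nn.functional as F

def vertex_contrastive_loss(feats_syn, feats_real, temperature=0.07):
    """InfoNCE over mesh-vertex features: the i-th synthetic-domain vertex
    feature should match the i-th real-domain one and repel all others.
    Both inputs are (V, D) per-vertex feature matrices."""
    a = F.normalize(feats_syn, dim=1)            # unit-norm features
    b = F.normalize(feats_real, dim=1)
    logits = a @ b.T / temperature               # (V, V) similarity matrix
    targets = torch.arange(a.shape[0], device=a.device)
    return F.cross_entropy(logits, targets)      # diagonal entries are positives
```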
For NeRF (Neural Radiance Fields), synthesizing new views from sparse inputs poses a challenge, as too few inputs can lead to artifacts in the rendered views. Recent methods have tackled this issue by introducing extern...
In this paper, we propose a DNN-based solution to jointly remosaic and denoise camera raw data in the Quad Bayer pattern. The traditional remosaic problem can be viewed as an interpolation process that converts the Q...
Although deep-network-based methods outperform traditional 3D reconstruction methods, which require multiocular images or class labels to recover the full 3D geometry, they may produce incomplete recovery and unfaithful reconstruction of occluded parts of 3D objects. To address these issues, we propose the Depth-preserving Latent Generative Adversarial Network (DLGAN), which consists of a 3D Encoder-Decoder-based GAN (EDGAN, serving as generator and discriminator) and an Extreme Learning Machine (ELM, serving as a classifier), for 3D reconstruction from a monocular depth image of an object. First, EDGAN decodes a latent vector from the 2.5D voxel grid representation of the input image and generates an initial 3D occupancy grid under the usual GAN losses together with a latent vector loss and a depth loss. For the latent vector loss, we design a 3D deep AutoEncoder (AE) to learn a target latent vector from the ground-truth 3D voxel grid and use that vector to penalize the latent vector encoded from the input 2.5D data. For the depth loss, we use the input 2.5D data to penalize the initial 3D voxel grid from 2.5D views. Afterwards, the ELM converts the float values of the initial 3D voxel grid to binary values under a binary reconstruction loss. Experimental results show that DLGAN not only outperforms several state-of-the-art methods by a large margin on both a synthetic dataset and a real-world dataset, but also predicts occluded parts of 3D objects more accurately without class labels.
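The two auxiliary losses can be sketched as follows. The depth-axis max projection used for the depth loss is an assumption about how the 2.5D view is obtained from the predicted grid; the abstract only states that the input 2.5D data penalizes the grid from 2.5D views.

```python
import torch
import torch.nn.functional as F

def dlgan_aux_losses(pred_grid, input_25d, z_enc, z_target):
    """Sketch of DLGAN's two auxiliary terms as described in the abstract.
    pred_grid: (B, X, Y, Z) occupancy probabilities; input_25d: (B, X, Y)
    occupancy in [0, 1]; z_enc / z_target: encoder and AE target latents."""
    # Latent vector loss: pull the encoder's latent toward the AE's target latent.
    latent_loss = F.mse_loss(z_enc, z_target)
    # Depth loss: max over the depth axis approximates the visible-surface occupancy.
    projected = pred_grid.max(dim=-1).values
    depth_loss = F.binary_cross_entropy(projected.clamp(1e-6, 1 - 1e-6), input_25d)
    return latent_loss, depth_loss
```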
Author:
Pashaie, Ramin, Florida Atlantic University
Electrical and Computer Engineering Department, 777 Glades Rd., Engineering East Building, Room 325, Boca Raton, FL 33432, United States
Tomography is the process of reconstructing three-dimensional images from two-dimensional projections. In general, this process has two separate phases: data acquisition and image reconstruction. This article concentr...
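As a minimal illustration of these two phases, the sketch below simulates acquisition with the Radon transform and reconstructs via filtered back-projection, assuming scikit-image is available; the phantom and angle grid are arbitrary choices, not from the article.

```python
import numpy as np
from skimage.transform import radon, iradon

# Phase 1, acquisition: project a simple synthetic phantom at many angles.
image = np.zeros((128, 128))
image[48:80, 40:88] = 1.0                      # rectangular phantom
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(image, theta=theta)           # stack of 1D projections

# Phase 2, reconstruction: filtered back-projection recovers the slice.
recon = iradon(sinogram, theta=theta, filter_name="ramp")
```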
ISBN:
(Print) 9781665405409
Hyperspectral image (HSI) reconstruction is about recovering a 3D HSI from its 2D snapshot measurements, for which deep models have become a promising approach. However, most existing studies train deep models on large amounts of organized data, which can be difficult to collect in many applications. This paper leverages the image priors encoded in untrained neural networks (NNs) to obtain a self-supervised learning method that is free from training datasets while adapting to the statistics of each test sample. To induce better image priors and prevent the NN from overfitting to undesired solutions, we construct an unrolling-based NN equipped with fractional max pooling (FMP). Furthermore, the FMP is used with randomness to enable self-ensembling for improved reconstruction accuracy. In the experiments, our self-supervised learning approach delivers high-quality reconstructions and outperforms recent methods, including supervised ones.
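A sketch of how randomized fractional max pooling can drive a self-ensemble in PyTorch: `nn.FractionalMaxPool2d` resamples its pooling regions on every forward pass, so repeated passes yield different reconstructions that can be averaged. The block structure and the averaging rule are assumptions about the paper's design, not its exact architecture.

```python
import torch
import torch.nn as nn

class FMPBlock(nn.Module):
    """Conv block with fractional max pooling; its pooling regions are
    redrawn at random on each forward pass, which is the source of the
    stochasticity exploited for self-ensembling."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.fmp = nn.FractionalMaxPool2d(kernel_size=2, output_ratio=0.75)

    def forward(self, x):
        return self.fmp(torch.relu(self.conv(x)))

def self_ensemble(model, measurement, n_passes=8):
    """Average reconstructions over several stochastic forward passes
    (a hypothetical realization of the randomized-FMP self-ensemble)."""
    with torch.no_grad():
        outs = [model(measurement) for _ in range(n_passes)]
    return torch.stack(outs).mean(dim=0)
```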