We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography. Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion. We empirically find that naively combining existing 2D image animation and 3D photography methods leads to obvious artifacts or inconsistent animation. Our key insight is that representing and animating the scene in 3D space offers a natural solution to this task. To this end, we first convert the input image into feature-based layered depth images using predicted depth values, followed by unprojecting them to a feature point cloud. To animate the scene, we perform motion estimation and lift the 2D motion into the 3D scene flow. Finally, to resolve the problem of hole emergence as points move forward, we propose to bidirectionally displace the point cloud as per the scene flow and synthesize novel views by separately projecting them into target image planes and blending the results. Extensive experiments demonstrate the effectiveness of our method. A user study is also conducted to validate the compelling rendering results of our method.
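A minimal sketch (not the authors' code) of the bidirectional displacement idea: points are moved forward from time 0 and backward from time 1 along the estimated 3D scene flow, each displaced set is projected to the target view, and the two renderings are blended with time-dependent weights so that holes in one pass are filled by the other. The pinhole projection, naive splatting, and blending weights here are illustrative assumptions.

```python
import numpy as np

def project(points_3d, K, H, W):
    """Project 3D points (N, 3) into an HxW occupancy image with pinhole intrinsics K."""
    z = np.clip(points_3d[:, 2], 1e-6, None)
    uv = (K @ (points_3d / z[:, None]).T).T[:, :2]
    img = np.zeros((H, W))
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    keep = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    img[v[keep], u[keep]] = 1.0          # naive splat: mark occupied pixels
    return img

def render_bidirectional(points, flow, t, K, H, W):
    """Blend a forward-displaced and a backward-displaced rendering at time t in [0, 1]."""
    fwd = points + t * flow              # move forward from frame 0
    bwd = points - (1.0 - t) * flow      # move backward from frame 1
    img_fwd = project(fwd, K, H, W)
    img_bwd = project(bwd, K, H, W)
    return (1.0 - t) * img_fwd + t * img_bwd

# toy usage with a constant rightward scene flow
K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])
pts = np.random.rand(500, 3) + [0, 0, 2.0]     # points in front of the camera
flow = np.tile([[0.01, 0.0, 0.0]], (500, 1))
frame = render_bidirectional(pts, flow, t=0.5, K=K, H=128, W=128)
```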
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://***/OliverRensu/TinyMIM.
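A hedged sketch of token-relation distillation: instead of matching CLS tokens or raw features, the student mimics the teacher's token-to-token relation map, here taken as a softmax over pairwise token similarities so the loss is independent of the two models' feature widths. The relation definition, temperature, and the choice of teacher layer are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def token_relation(tokens, temperature=1.0):
    """tokens: (B, N, D) patch tokens -> (B, N, N) row-normalized relation map."""
    tokens = F.normalize(tokens, dim=-1)
    sim = tokens @ tokens.transpose(1, 2) / temperature
    return F.softmax(sim, dim=-1)

def relation_distill_loss(student_tokens, teacher_tokens):
    """KL divergence between student and teacher token-relation maps."""
    s_rel = token_relation(student_tokens)
    t_rel = token_relation(teacher_tokens)
    return F.kl_div(s_rel.clamp_min(1e-8).log(), t_rel, reduction="batchmean")

# toy usage: relation maps are N x N, so student (192-d) and teacher (768-d)
# widths never need to be aligned
student_tokens = torch.randn(2, 196, 192)   # ViT-Tiny-like student tokens
teacher_tokens = torch.randn(2, 196, 768)   # intermediate layer of a ViT-Base teacher
loss = relation_distill_loss(student_tokens, teacher_tokens)
```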
Vector font synthesis is a challenging and ongoing problem in the fields of computer vision and computer graphics. The recently proposed DeepVecFont [27] achieved state-of-the-art performance by exploiting information from both the image and sequence modalities of vector fonts. However, it has limited capability for handling long sequence data and heavily relies on an image-guided outline refinement post-processing step. Thus, vector glyphs synthesized by DeepVecFont still often contain distortions and artifacts and cannot rival human-designed results. To address the above problems, this paper proposes an enhanced version of DeepVecFont by making the following three novel technical contributions. First, we adopt Transformers instead of RNNs to process sequential data and design a relaxation representation for vector outlines, markedly improving the model's capability and stability in synthesizing long and complex outlines. Second, we propose to sample auxiliary points in addition to control points to precisely align the generated and target Bézier curves or lines. Finally, to alleviate error accumulation in the sequential generation process, we develop a context-based self-refinement module based on another Transformer-based decoder to remove artifacts in the initially synthesized glyphs. Both qualitative and quantitative results demonstrate that the proposed method effectively resolves these intrinsic problems of the original DeepVecFont and outperforms existing approaches in generating English and Chinese vector fonts with complicated structures and diverse styles.
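A hedged sketch of the auxiliary-point idea: in addition to matching the four control points of a cubic Bézier segment, points are sampled along both the predicted and target curves and compared directly, which penalizes curves whose shapes disagree even when their control points are numerically close (and vice versa). The number of samples and the loss weighting are illustrative assumptions.

```python
import torch

def cubic_bezier(ctrl, t):
    """ctrl: (..., 4, 2) control points, t: (T,) in [0, 1] -> (..., T, 2) curve points."""
    p0, p1, p2, p3 = ctrl.unbind(dim=-2)
    t = t.view(*([1] * (ctrl.dim() - 2)), -1, 1)
    u = 1.0 - t
    return (u**3 * p0.unsqueeze(-2) + 3 * u**2 * t * p1.unsqueeze(-2)
            + 3 * u * t**2 * p2.unsqueeze(-2) + t**3 * p3.unsqueeze(-2))

def curve_alignment_loss(pred_ctrl, gt_ctrl, num_aux=8):
    """L2 on control points plus L2 on auxiliary points sampled along both curves."""
    t = torch.linspace(0.0, 1.0, num_aux)
    aux_loss = (cubic_bezier(pred_ctrl, t) - cubic_bezier(gt_ctrl, t)).pow(2).mean()
    ctrl_loss = (pred_ctrl - gt_ctrl).pow(2).mean()
    return ctrl_loss + aux_loss

# toy usage on a batch of predicted vs. target cubic segments
pred = torch.rand(16, 4, 2)
gt = torch.rand(16, 4, 2)
loss = curve_alignment_loss(pred, gt)
```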
Recently, there has been significant advancement in image generation technology, known as GANs, which can easily generate realistic fake images and thus raise the risk of abuse. However, most fake image detectors suffer from sharp performance drops in unseen domains. The key to fake image detection is developing a generalized representation that describes the artifacts produced by generation models. In this work, we introduce a novel detection framework, named Learning on Gradients (LGrad), designed for identifying GAN-generated images, with the aim of constructing a detector with cross-model and cross-data generalization. Specifically, a pretrained CNN model is employed as a transformation model to convert images into gradients. Subsequently, we leverage these gradients to represent the generalized artifacts, which are fed into the classifier to ascertain the authenticity of the images. In our framework, we turn the data-dependent problem into a transformation-model-dependent problem. To the best of our knowledge, this is the first study to utilize gradients as the representation of artifacts in GAN-generated images. Extensive experiments demonstrate the effectiveness and robustness of gradients as generalized artifact representations. Our detector achieves a new state-of-the-art performance with a remarkable gain of 11.4%. The code is released at https://***/chuangchuangtan/LGrad.
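A hedged sketch of the "images to gradients" transformation: a frozen CNN maps an image to a scalar (here simply the summed logits), and the gradient of that scalar with respect to the input pixels is used as the artifact representation fed to a binary real/fake classifier. The backbone, the scalar, and the classifier head are illustrative assumptions, not the paper's exact transformation model.

```python
import torch
import torchvision

transform_model = torchvision.models.resnet18(weights=None)  # load pretrained weights in practice
transform_model.eval()
for p in transform_model.parameters():
    p.requires_grad_(False)

def image_to_gradient(images):
    """images: (B, 3, H, W) -> (B, 3, H, W) input-gradient 'artifact' maps."""
    images = images.clone().requires_grad_(True)
    score = transform_model(images).sum()
    grad, = torch.autograd.grad(score, images)
    return grad

# simple real/fake head operating on the gradient maps instead of raw pixels
classifier = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 2))

imgs = torch.rand(4, 3, 224, 224)
logits = classifier(image_to_gradient(imgs))
```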
This paper proposes a novel model for predicting body mass index and various body part sizes using front, side, and back body images. The model is trained on a large dataset of labeled images. The results show that the model can accurately predict body mass index and various body part sizes such as chest, waist, hip, thigh, forearm, and shoulder width. One significant advantage of the proposed model is that it can use multiple views of the body to achieve more accurate predictions, overcoming the limitations of models that use only a single image. The model also does not require complex pre-processing or feature extraction, making it straightforward to apply in practice. We also explore the impact of different environmental factors, such as clothing and posture, on the model's performance. The findings show that the model is relatively insensitive to posture but more sensitive to clothing, emphasizing the importance of controlling for clothing when using this model. Overall, the proposed model represents a step forward in predicting body mass index and various body part sizes from images. The model's accuracy, convenience, and ability to use multiple views of the body make it a promising tool for a wide range of applications. The proposed method is also expected to serve as a parameter for accurately sensing various vision-based, non-contact biomarkers beyond body mass index inference.
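A hedged sketch of one way such multi-view regression could be set up: front, side, and back images pass through a shared CNN encoder, the pooled features are concatenated, and a small MLP regresses BMI plus six body measurements. The backbone, feature size, and output ordering are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torchvision

class MultiViewBodyRegressor(nn.Module):
    def __init__(self, num_outputs=7):   # BMI + chest, waist, hip, thigh, forearm, shoulder width
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()       # keep the 512-d pooled features
        self.encoder = backbone
        self.head = nn.Sequential(nn.Linear(3 * 512, 256), nn.ReLU(),
                                  nn.Linear(256, num_outputs))

    def forward(self, front, side, back):
        feats = [self.encoder(v) for v in (front, side, back)]   # shared weights across views
        return self.head(torch.cat(feats, dim=1))

# toy usage on a batch of two subjects, three views each
model = MultiViewBodyRegressor()
preds = model(*[torch.rand(2, 3, 224, 224) for _ in range(3)])   # (2, 7)
```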
The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection with a divide-and-conquer strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. Specifically, we resort to rich image pretrained models, by which the point-cloud detector learns to localize objects under the supervision of predicted 2D bounding boxes from 2D pretrained detectors. Moreover, we propose a novel de-biased triplet cross-modal contrastive learning scheme to connect the modalities of image, point cloud, and text, thereby enabling the point-cloud detector to benefit from vision-language pretrained models, i.e., CLIP. This novel use of image and vision-language pretrained models for point-cloud detectors allows for open-vocabulary 3D object detection without the need for 3D annotations. Experiments demonstrate that the proposed method improves over a wide range of baselines by at least 3.03 points on ScanNet and 7.47 points on SUN RGB-D, respectively. Furthermore, we provide a comprehensive analysis to explain why our approach works. Code is available at https://***/lyhdet/OV-3DET
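A hedged sketch of triplet cross-modal contrastive learning: matched (point-cloud, image, text) region features are pulled together with a symmetric InfoNCE loss applied to each of the three modality pairs. The de-biasing scheme from the paper is omitted; the temperature and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between row-aligned feature sets a, b of shape (N, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def triplet_contrastive_loss(pc_feats, img_feats, txt_feats):
    """Sum of contrastive losses over the three cross-modal pairs."""
    return (info_nce(pc_feats, txt_feats)
            + info_nce(pc_feats, img_feats)
            + info_nce(img_feats, txt_feats))

# toy usage: 32 matched regions across modalities
pc = torch.randn(32, 512)    # pooled point-cloud box features
img = torch.randn(32, 512)   # matched 2D box features (e.g. from CLIP's image encoder)
txt = torch.randn(32, 512)   # category text embeddings (e.g. from CLIP's text encoder)
loss = triplet_contrastive_loss(pc, img, txt)
```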
In semi-supervised medical image segmentation, there exist empirical mismatch problems between the labeled and unlabeled data distributions. The knowledge learned from the labeled data may be largely discarded if labeled and unlabeled data are treated separately or in an inconsistent manner. We propose a straightforward method for alleviating the problem: copy-pasting labeled and unlabeled data bidirectionally within a simple Mean Teacher architecture. The method encourages unlabeled data to learn comprehensive common semantics from the labeled data in both inward and outward directions. More importantly, the consistent learning procedure for labeled and unlabeled data can largely reduce the empirical distribution gap. In detail, we copy-paste a random crop from a labeled image (foreground) onto an unlabeled image (background) and an unlabeled image (foreground) onto a labeled image (background), respectively. The two mixed images are fed into a Student network and supervised by the mixed supervisory signals of pseudo-labels and ground truth. We reveal that this simple mechanism of copy-pasting bidirectionally between labeled and unlabeled data is good enough, and experiments show solid gains (e.g., over 21% Dice improvement on the ACDC dataset with 5% labeled data) compared with other state-of-the-art methods on various semi-supervised medical image segmentation datasets. Code is available at https://***/DeepMed-Lab-ECNU/BCP.
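A hedged sketch of bidirectional copy-paste: a random rectangular mask pastes a labeled crop onto an unlabeled image and an unlabeled crop onto a labeled image, and the supervision is mixed with the same masks from ground truth and the teacher's pseudo-labels. The mask shape/ratio and the 2D (rather than volumetric) setting are illustrative assumptions.

```python
import torch

def random_box_mask(h, w, ratio=0.5):
    """Binary (h, w) mask with a random box of roughly `ratio` side length set to 1."""
    bh, bw = int(h * ratio), int(w * ratio)
    y = torch.randint(0, h - bh + 1, (1,)).item()
    x = torch.randint(0, w - bw + 1, (1,)).item()
    mask = torch.zeros(h, w)
    mask[y:y + bh, x:x + bw] = 1.0
    return mask

def bidirectional_copy_paste(x_l, y_l, x_u, y_pseudo):
    """x_*: (B, C, H, W) images; y_l ground truth and y_pseudo teacher predictions, (B, H, W)."""
    m = random_box_mask(x_l.shape[-2], x_l.shape[-1])
    x_in = m * x_l + (1 - m) * x_u          # labeled crop pasted onto an unlabeled image
    x_out = m * x_u + (1 - m) * x_l         # unlabeled crop pasted onto a labeled image
    y_in = m * y_l + (1 - m) * y_pseudo     # mix supervision with the same masks
    y_out = m * y_pseudo + (1 - m) * y_l
    return x_in, y_in.long(), x_out, y_out.long()

# toy usage with 4-class masks; y_pseudo would come from the Mean Teacher
x_l, x_u = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
y_l = torch.randint(0, 4, (2, 64, 64))
y_pseudo = torch.randint(0, 4, (2, 64, 64))
mixed = bidirectional_copy_paste(x_l, y_l, x_u, y_pseudo)
```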
Multi-task learning has shown considerable promise for improving the performance of deep learning-driven vision systems for the purpose of robotic grasping. However, high architectural and computational complexity can result in poor suitability for deployment on the embedded devices that are typically leveraged in robotic arms for real-world manufacturing and warehouse environments. As such, the design of highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge is highly desired for widespread adoption in manufacturing environments. Motivated by this, we propose Fast GraspNeXt, a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping. To build Fast GraspNeXt, we leverage a generative network architecture search strategy with a set of architectural constraints customized to achieve a strong balance between multi-task learning performance and embedded inference efficiency. Experimental results on the MetaGraspNet benchmark dataset show that the Fast GraspNeXt network design achieves the highest performance (average precision (AP), accuracy, and mean squared error (MSE)) across multiple computer vision tasks when compared to other efficient multi-task network architecture designs, while having only 17.8M parameters (more than 5× smaller) and 259 GFLOPs (more than 5× lower), and running more than 3.15× faster on an NVIDIA Jetson TX2 embedded processor.
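A hedged sketch of constraint-guided architecture selection: candidate networks (here, simple width-scaled convolutional stacks standing in for generated macro-architectures) are filtered by a parameter budget and ranked by a measured latency proxy. The real method uses a generative search strategy with task-specific constraints; the candidate space and budgets below are illustrative assumptions.

```python
import time
import torch
import torch.nn as nn

def make_candidate(width):
    """Toy stand-in for a generated candidate architecture, scaled by `width`."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2 * width, 10))

def measure_latency(model, x, warmup=2, runs=5):
    """Average forward-pass wall-clock time as a crude latency proxy."""
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs

x = torch.rand(1, 3, 224, 224)
param_budget = 2e5
candidates = [make_candidate(w) for w in (16, 32, 64, 128)]
feasible = [m for m in candidates
            if sum(p.numel() for p in m.parameters()) <= param_budget]   # enforce the constraint
best = min(feasible, key=lambda m: measure_latency(m, x))                # pick the fastest feasible one
```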
Zero-shot capability has been considered a new revolution in deep learning, letting machines work on tasks without curated training data. As a good start and the only existing outcome of zero-shot image captioning (IC), ZeroCap abandons supervised training and sequentially searches every word in the caption using the knowledge of large-scale pre-trained models. Though effective, its autoregressive generation and gradient-directed searching mechanism limit the diversity of captions and the inference speed, respectively. Moreover, ZeroCap does not consider the controllability issue of zero-shot IC. To move forward, we propose a framework for Controllable Zero-shot IC, named ConZIC. The core of ConZIC is a novel sampling-based non-autoregressive language model named Gibbs-BERT, which can generate and continuously polish every word. Extensive quantitative and qualitative results demonstrate the superior performance of our proposed ConZIC for both zero-shot IC and controllable zero-shot IC. In particular, ConZIC achieves about $5\times$ faster generation than ZeroCap and about $1.5\times$ higher diversity scores, with accurate generation given different control signals. Our code is available at https://***/joeyz0z/ConZIC.
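A hedged sketch of Gibbs-style non-autoregressive generation with a masked language model: every position of a fixed-length caption is repeatedly re-masked and re-sampled from BERT's conditional distribution, so any word can be revised at any sweep rather than only appended left to right. The CLIP-based image relevance term and the control signals that ConZIC adds to this distribution are omitted here; the length, number of sweeps, and sampling temperature are illustrative assumptions.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def gibbs_generate(length=8, sweeps=5, temperature=1.0):
    """Start from all [MASK] tokens and iteratively re-sample each position."""
    ids = torch.full((1, length), tokenizer.mask_token_id)
    ids = torch.cat([torch.tensor([[tokenizer.cls_token_id]]), ids,
                     torch.tensor([[tokenizer.sep_token_id]])], dim=1)
    for _ in range(sweeps):
        for pos in range(1, length + 1):              # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[0, pos] = tokenizer.mask_token_id  # re-mask the current position
            with torch.no_grad():
                logits = model(masked).logits[0, pos] / temperature
            ids[0, pos] = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
    return tokenizer.decode(ids[0, 1:-1])

print(gibbs_generate())
```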
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely video quality assessment (VQA) for enhanced videos. It uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 videos with deshaking. The challenge attracted a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were made by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, detailing the methods they used. Several methods achieved better results than the baseline methods, and the winning methods demonstrated superior prediction performance.