Search Results - Inner Mongolia University Library

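Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)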
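
As a rough illustration of the rules above, the following Python sketch assembles a search expression from field codes and operators. The helper names `term` and `combine` are hypothetical and are not part of the library's system; only the field codes and the requirement for uppercase operators with a space on each side come from the help text.

```python
# Hypothetical helpers for composing a query string.
# Only the field codes and the operator rules are taken from the help text.
FIELDS = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Format one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELDS:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join parts with an uppercase operator surrounded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    # Parentheses are added only when the caller asks for a grouped sub-expression.
    return f"({joined})" if group else joined


# Rebuild the first example from the help text.
query = combine(
    "AND",
    combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Building the expression from validated parts, rather than concatenating strings by hand, makes it harder to use a lowercase operator or drop the surrounding spaces by accident.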

Search query: "Any field = 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,891 records in total; showing records 1261-1270.

The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kang, Gi-Cheon; Kim, Sungdong; Kim, Jin-Hwa; Kwak, Donghyun; Zhang, Byoung-Tak

Affiliations: Seoul Natl Univ IPAI Seoul South Korea; AIIS Seoul South Korea; NAVER AI Lab Seongnam South Korea; NAVER Cloud CLOVA Seongnam South Korea

ISBN: 9798350301298 (print)

Visual dialog (VisDial) is the task of answering a sequence of questions grounded in an image, using the dialog history as context. Prior work has trained dialog agents solely on VisDial data via supervised learning or leveraged pre-training on related vision-and-language datasets. This paper presents a semi-supervised learning approach for visually-grounded dialog, called Generative Self-Training (GST), to leverage unlabeled images on the Web. Specifically, GST first retrieves in-domain images through out-of-distribution detection and generates synthetic dialogs regarding the images via multimodal conditional text generation. GST then trains a dialog agent on the synthetic and the original VisDial data. As a result, GST scales the amount of training data up by an order of magnitude relative to VisDial (1.2M to 12.9M QA data). For robust training on the synthetic dialogs, we also propose perplexity-based data selection and multimodal consistency regularization. Evaluation on the VisDial v1.0 and v0.9 datasets shows that GST achieves new state-of-the-art results on both datasets. We further observe the robustness of GST against both visual and textual adversarial attacks. Finally, GST yields strong performance gains in the low-data regime. Code is available at https://***/gicheonkang/gst-visdial.

Keywords: Vision, language, and reasoning

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lin, Xudong; Tiwari, Simran; Huang, Shiyuan; Li, Manling; Shou, Mike Zheng; Ji, Heng; Chang, Shih-Fu

Affiliations: Columbia Univ New York NY 10027 USA; UIUC Champaign IL USA; Natl Univ Singapore Singapore Singapore

ISBN: 9798350301298 (print)

Multi-channel video-language retrieval requires models to understand information from different channels (e.g., video+question, video+speech) to correctly link a video with a textual response or query. Fortunately, contrastive multimodal models have been shown to be highly effective at aligning entities in images/videos and text, e.g., CLIP [20]; text contrastive models have recently been studied extensively for their strong ability to produce discriminative sentence embeddings, e.g., SimCSE [5]. However, there is no clear way to quickly adapt these two lines of work to multi-channel video-language retrieval with limited data and resources. In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information. Based on a categorization of recent methods, we investigate the options of representing videos using continuous feature vectors or discrete text tokens; for the fusion method, we explore the use of a multimodal transformer or a pretrained contrastive text model. We extensively evaluate the four combinations on five video-language datasets. We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yield the best performance, which can even outperform the state of the art on the iVQA and How2QA datasets without additional training on millions of video-text data. Further analysis shows that this is because representing videos as text tokens captures the key visual information, and text tokens are naturally aligned with text models that are strong retrievers after the contrastive pretraining process. This empirical analysis establishes a solid foundation for future research on affordable and upgradable multimodal intelligence.

Keywords: Vision, language, and reasoning

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hwang, Minyoung; Jeong, Jaeyeon; Kim, Minsoo; Oh, Yoonseon; Oh, Songhwai

Affiliations: Seoul Natl Univ Elect & Comp Engn Seoul South Korea; Seoul Natl Univ ASRI Seoul South Korea; Hanyang Univ Dept Elect Engn Seoul South Korea; Seoul Natl Univ Interdisciplinary Major Artificial Intelligence Seoul South Korea

ISBN: 9798350301298 (print)

The main challenge in vision-and-language navigation (VLN) is how to understand natural-language instructions in an unseen environment. The main limitation of conventional VLN algorithms is that if an action is mistaken, the agent fails to follow the instructions or explores unnecessary regions, leading the agent to an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method deploying an exploitation policy to correct misled recent actions. We show that an exploitation policy, which moves the agent toward a well-chosen local goal among unvisited but observable states, outperforms a method which moves the agent to a previously visited state. We also highlight the demand for imagining regretful explorations with semantically meaningful clues. The key to our approach is understanding the object placements around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects. Combining the exploitation policy and SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1% and SPL by 20.6% against the state-of-the-art method on the SOON benchmark. Project page: https://***/projects/Meta-Explore/***

Keywords: Robotics

Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ju, Chen; Zheng, Kunhao; Liu, Jinxiang; Zhao, Peisen; Zhang, Ya; Chang, Jianlong; Tian, Qi; Wang, Yanfeng

Affiliations: Shanghai Jiao Tong Univ CMIC Shanghai Peoples R China; Shanghai AI Lab Shanghai Peoples R China; Huawei Cloud Shenzhen Peoples R China

ISBN: 9798350301298 (print)

Weakly-supervised temporal action localization (WTAL) learns to detect and classify action instances with only category labels. Most methods adopt off-the-shelf Classification-Based Pre-training (CBP) to generate video features for action localization. However, the different optimization objectives of classification and localization make temporally localized results suffer from a serious incompleteness issue. To tackle this issue without additional annotations, this paper proposes distilling free action knowledge from Vision-Language Pre-training (VLP), as we surprisingly observe that the localization results of vanilla VLP have an over-completeness issue, which is complementary to the CBP results. To fuse such complementarity, we propose a novel distillation-collaboration framework with two branches acting as CBP and VLP respectively. The framework is optimized through a dual-branch alternate training strategy. Specifically, during the B step, we distill the confident background pseudo-labels from the CBP branch, while during the F step, the confident foreground pseudo-labels are distilled from the VLP branch. As a result, the dual-branch complementarity is effectively fused to promote one strong alliance. Extensive experiments and ablation studies on THUMOS14 and ActivityNet1.2 reveal that our method significantly outperforms state-of-the-art methods.

Keywords: Video: Action and event understanding

IFSeg: Image-free Semantic Segmentation via Vision-Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yun, Sukmin; Park, Seong Hyeon; Seo, Paul Hongsuck; Shin, Jinwoo

Affiliations: Korea Adv Inst Sci & Technol KAIST Daejeon South Korea; Google Res Seoul South Korea; Mohamed Bin Zayed Univ Artificial Intelligence MB Abu Dhabi U Arab Emirates

ISBN: 9798350301298 (print)

Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer) across various visual tasks. However, VL-driven segmentation has been underexplored, and the existing approaches still have the burden of acquiring additional training images or even segmentation annotations to adapt a VL model to downstream segmentation tasks. In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. To tackle this challenging task, our proposed method, coined IFSeg, generates VL-driven artificial image-segmentation pairs and updates a pre-trained VL model to a segmentation task. We construct this artificial training data by creating a 2D map of random semantic categories and another map of their corresponding word tokens. Given that a pre-trained VL model projects visual and text tokens into a common space where tokens that share the semantics are located closely, this artificially generated word map can replace the real image inputs for such a VL model. Through an extensive set of experiments, our model not only establishes an effective baseline for this novel task but also demonstrates strong performance compared to existing methods that rely on stronger supervision, such as task-specific images and segmentation masks. Code is available at https://***/alinlab/ifseg.

Keywords: Segmentation, grouping and shape analysis
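
The construction of artificial training data described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `make_artificial_pair`, the independent sampling of each cell, and the use of plain category-name strings in place of real word tokens are all assumptions made for the example.

```python
import random


def make_artificial_pair(categories, height, width, seed=None):
    """Build a random 2D category map and the matching 2D map of word tokens.

    Assumption: each cell is sampled independently and uniformly, and the
    category name itself stands in for the word token fed to a VL model.
    """
    rng = random.Random(seed)
    # Segmentation target: one category index per cell.
    label_map = [
        [rng.randrange(len(categories)) for _ in range(width)]
        for _ in range(height)
    ]
    # Model input: the word for each cell's category, replacing image content.
    word_map = [[categories[index] for index in row] for row in label_map]
    return label_map, word_map


labels, words = make_artificial_pair(["sky", "road", "tree"], height=4, width=6, seed=0)
for label_row, word_row in zip(labels, words):
    print(label_row, word_row)
```

The word map plays the role of the input and the label map plays the role of the segmentation target, so no real image is needed to form a training pair.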

Learning to Segment Every Referring Object Point by Point

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Qu, Mengxue; Wu, Yu; Wei, Yunchao; Liu, Wu; Liang, Xiaodan; Zhao, Yao

Affiliations: Beijing Jiaotong Univ Inst Informat Sci Beijing Peoples R China; Beijing Key Lab Adv Informat Sci & Network Techno Beijing Peoples R China; Wuhan Univ Wuhan Peoples R China; JD Explore Acad Beijing Peoples R China; Sun Yat Sen Univ Guangzhou Peoples R China; MBZUAI Abu Dhabi U Arab Emirates

ISBN: 9798350301298 (print)

Referring Expression Segmentation (RES) can facilitate pixel-level semantic alignment between vision and language. Most of the existing RES approaches require massive pixel-level annotations, which are expensive and exhaustive. In this paper, we propose a new partially supervised training paradigm for RES, i.e., training using abundant referring bounding boxes and only a few (e.g., 1%) pixel-level referring masks. To maximize the transferability from the REC model, we construct our model based on the point-based sequence prediction model. We propose co-content teacher-forcing to make the model explicitly associate the point coordinates (scale values) with the referred spatial features, which alleviates the exposure bias caused by the limited segmentation masks. To make the most of referring bounding box annotations, we further propose a resampling pseudo-points strategy to select more accurate pseudo-points as supervision. Extensive experiments show that our model achieves 52.06% accuracy (versus 58.93% in the fully supervised setting) on RefCOCO+@testA when using only 1% of the mask annotations. Code is available at https://***/qumengxue/Partial-***.

Keywords: Vision, language, and reasoning

Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lin, Chen; Peng, Bo; Li, Zheyang; Tan, Wenming; Ren, Ye; Xiao, Jun; Pu, Shiliang

Affiliations: Hikvis Res Inst Hangzhou Peoples R China; Zhe Jiang Univ Hangzhou Peoples R China

ISBN: 9798350301298 (print)

Post-training quantization (PTQ) is an effective compression method to reduce model size and computational cost. However, quantizing a model into a low-bit one, e.g., lower than 4 bits, is difficult and often results in non-negligible performance degradation. To address this, we investigate the loss landscapes of quantized networks with various bit-widths. We show that a network with a more rugged loss surface is more easily trapped in bad local minima, which mostly appears in low-bit quantization. A deeper analysis indicates that the rugged surface is caused by the injection of excessive quantization noise. To this end, we detach from the loss a sharpness term that reflects the impact of quantization noise. To smooth the rugged loss surface, we propose to keep the sharpness term small and stable during optimization. Instead of directly optimizing the target-bit network, we design a self-adapted shrinking scheduler for the bit-width in the continuous domain, from a high bit-width down to the target, by limiting the increase of the sharpness term to a proper range. It can be viewed as iteratively adding small "instant" quantization noise and adjusting the network to eliminate its impact. Extensive experiments on classification and detection tasks demonstrate the effectiveness of the Bit-shrinking strategy in PTQ. On Vision Transformer models, our INT8 and INT6 models drop within 0.5% and 1.5% Top-1 accuracy, respectively. On traditional CNN networks, our INT4 quantized models drop within 1.3% and 3.5% Top-1 accuracy on ResNet18 and MobileNetV2 without fine-tuning, which achieves state-of-the-art performance.

Keywords: Efficient and scalable vision

Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Jinkun; Pang, Jiangmiao; Weng, Xinshuo; Khirodkar, Rawal; Kitani, Kris

Affiliations: Carnegie Mellon Univ Pittsburgh PA 15213 USA; Shanghai AI Lab Shanghai Peoples R China; Nvidia Santa Clara CA USA

ISBN: 9798350301298 (print)

Kalman filter (KF) based methods for multi-object tracking (MOT) assume that objects move linearly. While this assumption is acceptable for very short periods of occlusion, linear estimates of motion over a prolonged time can be highly inaccurate. Moreover, when there is no measurement available to update the Kalman filter parameters, the standard convention is to trust the a priori state estimates for the a posteriori update. This leads to the accumulation of errors during a period of occlusion. The error causes significant motion direction variance in practice. In this work, we show that a basic Kalman filter can still obtain state-of-the-art tracking performance if proper care is taken to fix the noise accumulated during occlusion. Instead of relying only on the linear state estimate (i.e., an estimation-centric approach), we use object observations (i.e., the measurements by an object detector) to compute a virtual trajectory over the occlusion period to fix the error accumulation of the filter parameters. This allows more time steps to correct errors accumulated during occlusion. We name our method Observation-Centric SORT (OC-SORT). It remains Simple, Online, and Real-Time but improves robustness during occlusion and non-linear motion. Given off-the-shelf detections as input, OC-SORT runs at 700+ FPS on a single CPU. It achieves state-of-the-art results on multiple datasets, including MOT17, MOT20, KITTI, head tracking, and especially DanceTrack, where the object motion is highly non-linear. The code and models are available at https://***/noahcao/OC_SORT.

Keywords: Vision applications and systems
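
The idea of a virtual trajectory over the occlusion period, mentioned in the abstract above, can be illustrated with a small sketch. It assumes straight-line interpolation between the last observation before the occlusion and the first observation after it; the abstract does not specify the interpolation scheme, and the function name `virtual_trajectory` is hypothetical, so this is not the OC-SORT code itself.

```python
def virtual_trajectory(last_obs, new_obs, t_last, t_new):
    """Return interpolated observations for the steps strictly between t_last and t_new.

    Assumption: the object moved in a straight line at constant speed between
    the two real observations. Each observation is a tuple of numbers,
    for example a box centre (x, y).
    """
    if t_new <= t_last:
        raise ValueError("t_new must be later than t_last")
    gap = t_new - t_last
    points = []
    for t in range(t_last + 1, t_new):
        alpha = (t - t_last) / gap  # fraction of the occlusion elapsed at step t
        points.append(
            tuple(a + alpha * (b - a) for a, b in zip(last_obs, new_obs))
        )
    return points


# An object last seen at (0, 0) at step 0 and re-detected at (4, 8) at step 4.
for point in virtual_trajectory((0.0, 0.0), (4.0, 8.0), t_last=0, t_new=4):
    print(point)
```

In a tracker, each interpolated point could be passed to the Kalman filter as if it were a real measurement, so the filter state is re-estimated from observations rather than from its own drifting predictions.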

Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Nguyen, Cindy M.; Chan, Eric R.; Bergman, Alexander W.; Wetzstein, Gordon

Affiliations: Stanford Univ Stanford CA 94305 USA

ISBN: 9798350318920 (print); 9798350318937

Capturing images is a key part of automation for high-level tasks such as scene text recognition. Low-light conditions pose a challenge for high-level perception stacks, which are often optimized on well-lit, artifact-free images. Reconstruction methods for low-light images can produce well-lit counterparts, but typically at the cost of high-frequency details critical for downstream tasks. We propose Diffusion in the Dark (DiD), a diffusion model for low-light image reconstruction for text recognition. DiD provides reconstructions that are qualitatively competitive with those of state-of-the-art (SOTA) methods, while preserving high-frequency details even in extremely noisy, dark conditions. We demonstrate that DiD, without any task-specific optimization, can outperform SOTA low-light methods in low-light text recognition on real images, bolstering the potential of diffusion models to solve ill-posed inverse problems. Our code and pretrained models can be found at https://***/diffusion-in-the-dark/.

Keywords: Algorithms: Computational photography, image and video synthesis; Algorithms: Generative models for image, video, 3D, etc.

Semantic Prompt for Few-Shot Image Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Wentao; Si, Chenyang; Zhang, Zhang; Wang, Liang; Wang, Zilei; Tan, Tieniu

Affiliations: Univ Sci & Technol China Hefei Peoples R China; CASIA NLPR Ctr Res Intelligent Percept & Comp Hangzhou Peoples R China; Nanyang Technol Univ Singapore Singapore; Univ Chinese Acad Sci Beijing Peoples R China

ISBN: 9798350301298 (print)

Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g., text embeddings of class names, to address the issue of rare samples by combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare support samples, resulting in limited benefits. In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively. Specifically, we design two complementary mechanisms to insert semantic prompts into the feature extractor: one enables the interaction between semantic prompts and patch embeddings along the spatial dimension via self-attention; the other supplements visual features with the transformed semantic prompts along the channel dimension. By combining these two mechanisms, the feature extractor gains a better ability to attend to the class-specific features and obtains more generalized image representations with merely a few support samples. Through extensive experiments on four datasets, the proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.

Keywords: Transfer, meta, low-shot, continual, or long-tail learning
