ISBN (print): 9798350301298
This paper, for the very first time, introduces human sketches to the landscape of XAI (Explainable Artificial Intelligence). We argue that sketch, as a "human-centred" data form, represents a natural interface to study explainability. We focus on cultivating sketch-specific explainability designs. This starts by identifying strokes as a unique building block that offers a degree of flexibility in object construction and manipulation impossible in photos. Following this, we design a simple explainability-friendly sketch encoder that accommodates the intrinsic properties of strokes: shape, location, and order. We then move on to define the first-ever XAI task for sketch, that of stroke location inversion (SLI). Just as we have heat maps for photos and correlation matrices for text, SLI offers an explainability angle to sketch by asking a network how well it can recover the stroke locations of an unseen sketch. We offer qualitative results for readers to interpret as snapshots of the SLI process in the paper, and as GIFs on the project page. A minor but interesting note is that, thanks to its sketch-specific design, our sketch encoder also yields the best sketch recognition accuracy to date while having the smallest number of parameters. The code is available at https://***.
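A minimal PyTorch sketch of the idea described above, not the authors' implementation; module names, dimensions, and the point-per-stroke representation are assumptions. It embeds stroke shape, location, and drawing order separately and adds an SLI-style head that tries to recover stroke locations.

    # Hedged sketch: stroke-level encoder (shape + location + order) with an SLI head.
    import torch
    import torch.nn as nn

    class StrokeEncoder(nn.Module):
        def __init__(self, pts_per_stroke=32, max_strokes=64, dim=128):
            super().__init__()
            self.shape_embed = nn.Linear(pts_per_stroke * 2, dim)   # stroke shape (centred points)
            self.loc_embed = nn.Linear(2, dim)                      # stroke location (centroid)
            self.order_embed = nn.Embedding(max_strokes, dim)       # drawing order
            self.mixer = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
                num_layers=2)
            self.sli_head = nn.Linear(dim, 2)                       # recover (x, y) per stroke

        def forward(self, shapes, locs):
            # shapes: (B, S, P, 2) centred stroke points; locs: (B, S, 2) stroke centroids
            B, S, P, _ = shapes.shape
            order = torch.arange(S, device=shapes.device).expand(B, S)
            tokens = (self.shape_embed(shapes.flatten(2))
                      + self.loc_embed(locs)
                      + self.order_embed(order))
            feats = self.mixer(tokens)
            return feats, self.sli_head(feats)   # features + predicted stroke locations

    enc = StrokeEncoder()
    feats, pred_locs = enc(torch.randn(2, 10, 32, 2), torch.rand(2, 10, 2))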
ISBN (print): 9798350301298
We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. First, we generate diverse features for the image-text matching (ITM) task by soft-masking the regions in an image that are most relevant to a certain word in the corresponding caption, instead of completely removing them. Since our framework relies only on image-caption pairs with no fine-grained annotations, we identify the regions relevant to each word by computing the word-conditional visual attention using a multi-modal encoder. Second, we encourage the model to focus more on hard but diverse examples by proposing a focal loss for the image-text contrastive learning (ITC) objective, which alleviates the inherent limitations of overfitting and bias issues. Last, we perform multi-modal data augmentation for self-supervised learning by mining various examples through masking texts and rendering distortions on images. We show that the combination of these three innovations is effective for learning a pretrained model, leading to outstanding performance on multiple vision-language downstream tasks.
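A hedged sketch (my own approximation, not the authors' code) of the second ingredient: a focal-style weighting applied to a standard image-text contrastive (InfoNCE) loss so that hard, low-confidence pairs contribute more to the gradient. The temperature and gamma values are assumptions.

    # Focal-weighted image-text contrastive loss (illustrative only).
    import torch
    import torch.nn.functional as F

    def focal_itc_loss(img_emb, txt_emb, temperature=0.07, gamma=2.0):
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.t() / temperature          # (B, B) similarity matrix
        targets = torch.arange(img.size(0), device=img.device)

        def focal_ce(lg):
            p = lg.softmax(dim=-1).gather(1, targets[:, None]).squeeze(1)  # prob of true pair
            return (-(1 - p) ** gamma * torch.log(p.clamp_min(1e-8))).mean()

        # symmetric image-to-text and text-to-image terms
        return 0.5 * (focal_ce(logits) + focal_ce(logits.t()))

    loss = focal_itc_loss(torch.randn(8, 256), torch.randn(8, 256))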
ISBN (print): 9798350301298
Human body trajectories are a salient cue for identifying actions in video. In sign language, such body trajectories are mainly conveyed by the hands and face across consecutive frames. However, current methods in continuous sign language recognition (CSLR) usually process frames independently, thus failing to capture the cross-frame trajectories needed to effectively identify a sign. To handle this limitation, we propose a correlation network (CorrNet) to explicitly capture and leverage body trajectories across frames to identify signs. Specifically, a correlation module is first proposed to dynamically compute correlation maps between the current frame and adjacent frames to identify trajectories of all spatial patches. An identification module is then presented to dynamically emphasize the body trajectories within these correlation maps. As a result, the generated features gain an overview of local temporal movements to identify a sign. Thanks to its special attention to body trajectories, CorrNet achieves new state-of-the-art accuracy on four large-scale datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily, and CSL. A comprehensive comparison with previous spatial-temporal reasoning methods verifies the effectiveness of CorrNet. Visualizations demonstrate the effect of CorrNet in emphasizing human body trajectories across adjacent frames.
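A hedged sketch of the correlation idea (assumptions throughout; this is not the official CorrNet code): compute dense patch-to-patch correlation maps between each frame's spatial features and those of the previous frame, then use them to re-weight moving regions.

    # Adjacent-frame correlation with a simple trajectory-emphasis gate.
    import torch
    import torch.nn.functional as F

    def adjacent_correlation(feats):
        # feats: (B, T, C, H, W) per-frame spatial features
        B, T, C, H, W = feats.shape
        f = F.normalize(feats.flatten(3), dim=2)          # (B, T, C, H*W)
        prev = torch.roll(f, shifts=1, dims=1)            # previous frame (first frame wraps; fine for a sketch)
        corr = torch.einsum('btcn,btcm->btnm', f, prev)   # (B, T, HW, HW) patch-to-patch correlation
        # crude "identification": per-patch weight = max correlation with any previous-frame patch
        attn = corr.max(dim=-1).values.view(B, T, 1, H, W).sigmoid()
        return feats * attn                               # emphasized trajectory features

    out = adjacent_correlation(torch.randn(2, 8, 64, 14, 14))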
ISBN (print): 9798350301298
In this work, we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view of analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, we first show the benefits of using 3D pose to infer actions, and study person-person interactions. Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets. As a result, our method achieves state-of-the-art performance on the AVA v2.2 dataset in both the pose-only setting and the standard benchmark setting. When reasoning about the action using only pose cues, our pose model achieves a +10.0 mAP gain over the corresponding state-of-the-art, while our fused model has a gain of +2.8 mAP over the best state-of-the-art model. Code and results are available at: https://***/LART
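A minimal sketch of the fusion idea under stated assumptions (dimensions, layer counts, and pooling are placeholders, not the paper's architecture): per-frame 3D pose parameters and appearance features along one person's tracklet are fused by a small transformer, and the action is classified from the pooled tracklet token.

    # Tracklet-level fusion of 3D pose and appearance (illustrative only).
    import torch
    import torch.nn as nn

    class TrackletActionModel(nn.Module):
        def __init__(self, pose_dim=144, app_dim=256, dim=256, num_actions=80):
            super().__init__()
            self.pose_proj = nn.Linear(pose_dim, dim)    # flattened 3D pose params (dim is a placeholder)
            self.app_proj = nn.Linear(app_dim, dim)      # contextualized appearance features
            self.temporal = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
                num_layers=2)
            self.head = nn.Linear(dim, num_actions)

        def forward(self, pose_seq, app_seq):
            # pose_seq: (B, T, pose_dim); app_seq: (B, T, app_dim) along one tracklet
            tokens = self.pose_proj(pose_seq) + self.app_proj(app_seq)
            fused = self.temporal(tokens)            # reason over the trajectory (Lagrangian view)
            return self.head(fused.mean(dim=1))      # action logits for the tracklet

    model = TrackletActionModel()
    logits = model(torch.randn(2, 16, 144), torch.randn(2, 16, 256))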
ISBN (print): 9798350301298
In the media industry, the demand for SDR-to-HDRTV up-conversion arises when users possess HDR-WCG (high dynamic range-wide color gamut) TVs while most off-the-shelf footage is still in SDR (standard dynamic range). The research community has started tackling this low-level vision task with learning-based approaches. Yet, when applied to real SDR, current methods tend to produce dim and desaturated results, offering nearly no improvement in viewing experience. Unlike other network-oriented methods, we attribute this deficiency to the training set (HDR-SDR pairs). Consequently, we propose a new HDRTV dataset (dubbed HDRTV4K) and new HDR-to-SDR degradation models, which are then used to train a luminance-segmented network (LSN) consisting of a global mapping trunk and two Transformer branches covering the bright and dark luminance ranges. We also update the assessment criteria with tailored metrics and a subjective experiment. Finally, ablation studies are conducted to verify the effectiveness of our design. Our work is available at: https://***/AndreGuo/HDRTVDM.
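A hedged sketch of the luminance-segmentation idea only. The paper's branches are Transformers; here they are plain conv blocks to keep the sketch short, and the luminance thresholds and channel counts are assumptions.

    # Global mapping trunk plus two branches gated by bright / dark luminance masks.
    import torch
    import torch.nn as nn

    class LuminanceSegmentedNet(nn.Module):
        def __init__(self, ch=3, hidden=32, thr_dark=0.25, thr_bright=0.75):
            super().__init__()
            self.thr_dark, self.thr_bright = thr_dark, thr_bright
            self.trunk = nn.Sequential(nn.Conv2d(ch, hidden, 1), nn.ReLU(), nn.Conv2d(hidden, ch, 1))
            self.bright = nn.Sequential(nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(hidden, ch, 3, padding=1))
            self.dark = nn.Sequential(nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(hidden, ch, 3, padding=1))

        def forward(self, sdr):                       # sdr: (B, 3, H, W) in [0, 1]
            luma = sdr.mean(dim=1, keepdim=True)      # crude luminance estimate
            bright_mask = (luma > self.thr_bright).float()
            dark_mask = (luma < self.thr_dark).float()
            out = self.trunk(sdr)                     # global SDR-to-HDR mapping
            out = out + bright_mask * self.bright(sdr) + dark_mask * self.dark(sdr)
            return out.clamp(0, 1)                    # predicted HDR/WCG frame (normalized)

    hdr = LuminanceSegmentedNet()(torch.rand(1, 3, 64, 64))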
ISBN (print): 9798350301298
Unified vision-language frameworks have greatly advanced in recent years, most of which adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence generation. However, existing video-language (VidL) models still require task-specific designs in model architecture and training objectives for each task. In this work, we explore a unified VidL framework, LAVENDER, where Masked Language Modeling (MLM) [13] is used as the common interface for all pre-training and downstream tasks. Such unification leads to a simplified model architecture, where only a lightweight MLM head, instead of a decoder with many more parameters, is needed on top of the multimodal encoder. Surprisingly, experimental results show that this unified framework achieves competitive performance on 14 VidL benchmarks, covering video question answering, text-to-video retrieval, and video captioning. Extensive analyses further demonstrate that LAVENDER can (i) seamlessly support all downstream tasks with just a single set of parameter values when multi-task fine-tuned; (ii) generalize to various downstream tasks with limited training samples; and (iii) enable zero-shot evaluation on video question answering tasks. Code is available at https://***/microsoft/LAVENDER.
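A hedged sketch of the "MLM as the only interface" idea; this is not the LAVENDER release, and the encoder, vocabulary size, and token layout are assumptions. Every task is cast as predicting masked text tokens, so the only task head is a light classifier over the vocabulary.

    # Unified video-language model with a single MLM head.
    import torch
    import torch.nn as nn

    class UnifiedVidLModel(nn.Module):
        def __init__(self, vocab_size=30522, dim=256, video_dim=512):
            super().__init__()
            self.text_embed = nn.Embedding(vocab_size, dim)
            self.video_proj = nn.Linear(video_dim, dim)
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
                num_layers=2)
            self.mlm_head = nn.Linear(dim, vocab_size)   # the single lightweight head

        def forward(self, video_feats, text_ids):
            # video_feats: (B, Nv, video_dim); text_ids: (B, Nt) containing [MASK] positions
            tokens = torch.cat([self.video_proj(video_feats), self.text_embed(text_ids)], dim=1)
            enc = self.encoder(tokens)
            text_part = enc[:, video_feats.size(1):]     # positions aligned with text tokens
            return self.mlm_head(text_part)              # predicting masked tokens = the task answer

    model = UnifiedVidLModel()
    logits = model(torch.randn(2, 8, 512), torch.randint(0, 30522, (2, 12)))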
ISBN (print): 9798350301298
An image is usually described by more than one attribute, such as "shape" and "color". When a dataset is biased, i.e., most samples have attributes spuriously correlated with the target label, a Deep Neural Network (DNN) is prone to making predictions based on the "unintended" attribute, especially if it is easier to learn. To improve the generalization ability when training on such a biased dataset, we propose a χ²-model to learn debiased representations. First, we design a λ-shape pattern to match the training dynamics of a DNN and find Intermediate Attribute Samples (IASs), i.e., samples near the attribute decision boundaries, which indicate how the value of an attribute changes from one extreme to another. Then we rectify the representation with a χ-structured metric learning objective. Conditional interpolation among IASs eliminates the negative effect of peripheral attributes and facilitates retaining intra-class compactness. Experiments show that the χ²-model learns debiased representations effectively and achieves remarkable improvements on various datasets. Code is available at: https://***/ZhangYikaii/chi-square
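A loose, hedged approximation of the conditional-interpolation step, not the authors' χ²-model; the loss form, interpolation range, and use of a class centre are assumptions. An anchor feature is interpolated with same-class IAS features, and the interpolated features are pulled toward the class centre to retain intra-class compactness while weakening the peripheral attribute.

    # Conditional interpolation between anchors and IAS features (illustrative only).
    import torch
    import torch.nn.functional as F

    def conditional_interpolation_loss(anchor, ias, class_center, lam_low=0.3, lam_high=0.7):
        # anchor: (B, D) features; ias: (B, D) same-class IAS features; class_center: (D,)
        lam = torch.empty(anchor.size(0), 1, device=anchor.device).uniform_(lam_low, lam_high)
        mixed = lam * anchor + (1.0 - lam) * ias                 # interpolate along the attribute axis
        return F.mse_loss(mixed, class_center.expand_as(mixed))  # intra-class compactness term

    loss = conditional_interpolation_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(128))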
ISBN (print): 9798350301298
We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures, or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open set of similarity conditions. We find that baselines built from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find that our method offers a substantial boost over the baselines on GeneCIS and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States.
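A hedged sketch of the conditional-similarity task setup only, not the paper's mining method: a simple zero-shot baseline scores gallery images against the sum of a reference-image embedding and a condition-text embedding from a CLIP-like model. The additive combination is an assumption used for illustration.

    # Zero-shot conditional image retrieval with precomputed embeddings.
    import torch
    import torch.nn.functional as F

    def conditional_retrieval(ref_img_emb, cond_txt_emb, gallery_embs):
        # ref_img_emb: (D,), cond_txt_emb: (D,), gallery_embs: (N, D)
        query = F.normalize(ref_img_emb + cond_txt_emb, dim=-1)      # condition-adapted query
        gallery = F.normalize(gallery_embs, dim=-1)
        scores = gallery @ query                                     # cosine similarity per candidate
        return scores.argsort(descending=True)                       # ranked gallery indices

    ranking = conditional_retrieval(torch.randn(512), torch.randn(512), torch.randn(100, 512))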
ISBN (print): 9798350301298
Recently, finetuning pretrained vision-language models (VLMs) has been a prevailing paradigm for achieving state-of-the-art performance in VQA. However, as VLMs scale up, tuning the full model parameters for a specific task in low-resource settings becomes computationally expensive, storage-inefficient, and prone to overfitting. Although current parameter-efficient tuning methods dramatically reduce the number of tunable parameters, a significant performance gap with full finetuning still exists. In this paper, we propose MixPHM, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. Specifically, MixPHM is a lightweight module implemented by multiple PHM-experts in a mixture-of-experts manner. To reduce parameter redundancy, we reparameterize expert weights in a low-rank subspace and share part of the weights inside and across MixPHM. Moreover, based on our quantitative analysis of representation redundancy, we propose Redundancy Regularization, which encourages MixPHM to reduce task-irrelevant redundancy while promoting task-relevant correlation. Experiments conducted on VQA v2, GQA, and OK-VQA under different low-resource settings show that MixPHM outperforms state-of-the-art parameter-efficient methods and is the only one that consistently surpasses full finetuning.
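A hedged, simplified sketch of the mixture-of-experts adapter idea. The real MixPHM experts use PHM (Kronecker-structured) weights with weight sharing and a redundancy regularizer; plain low-rank projections and soft routing are used here only to illustrate the structure.

    # Mixture of low-rank adapter "experts" on top of frozen-backbone features.
    import torch
    import torch.nn as nn

    class MixLowRankAdapter(nn.Module):
        def __init__(self, dim=768, rank=16, num_experts=4):
            super().__init__()
            self.down = nn.ModuleList([nn.Linear(dim, rank) for _ in range(num_experts)])
            self.up = nn.ModuleList([nn.Linear(rank, dim) for _ in range(num_experts)])
            self.router = nn.Linear(dim, num_experts)

        def forward(self, x):                          # x: (B, N, dim) frozen-backbone features
            gates = self.router(x).softmax(dim=-1)     # (B, N, E) soft expert routing
            expert_out = torch.stack(
                [up(torch.relu(down(x))) for down, up in zip(self.down, self.up)], dim=-1)
            return x + (expert_out * gates.unsqueeze(-2)).sum(dim=-1)   # residual adapter update

    out = MixLowRankAdapter()(torch.randn(2, 10, 768))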
ISBN (print): 9798350301298
Recent work has shown that deep vision models tend to be overly dependent on low-level or "texture" features, leading to poor generalization. Various data augmentation strategies have been proposed to overcome this so-called texture bias in DNNs. We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes for accurate prediction in an object classification setting. Our augmentations superpose edgemaps from one image onto another image with shuffled patches, using a randomly determined mixing proportion, and assign the augmented image the label of the edgemap image. To classify these augmented images, the model needs to not only detect and focus on edges but also distinguish between relevant and spurious edges. We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures. For example, for ViT-S we obtain absolute classification accuracy gains of up to 6%. We also obtain gains of up to 28% and 8.5% on natural adversarial and out-of-distribution datasets like ImageNet-A (for ViT-B) and ImageNet-R (for ViT-S), respectively. Analysis using a range of probe datasets shows substantially increased shape sensitivity in our trained models, explaining the observed improvement in robustness and classification accuracy.
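A hedged sketch of the described augmentation; function and parameter names (patch size, mixing range, the user-supplied edge detector) are assumptions. The edge map of one image is superposed onto a patch-shuffled second image with a random mixing weight, and the label of the edge-map image is kept.

    # Edge-map superposition onto a patch-shuffled background image.
    import torch

    def shuffle_patches(img, patch=56):
        # img: (C, H, W) with H and W divisible by patch
        C, H, W = img.shape
        nh, nw = H // patch, W // patch
        p = img.unfold(1, patch, patch).unfold(2, patch, patch)   # (C, nh, nw, patch, patch)
        p = p.reshape(C, nh * nw, patch, patch)
        p = p[:, torch.randperm(nh * nw)]                         # shuffle patch positions
        p = p.reshape(C, nh, nw, patch, patch)
        return p.permute(0, 1, 3, 2, 4).reshape(C, H, W)          # stitch back into an image

    def edge_superposition(edge_src, background, edge_fn, alpha_range=(0.3, 0.7)):
        # edge_src, background: (C, H, W); label of the result = label of edge_src
        alpha = torch.empty(1).uniform_(*alpha_range).item()      # random mixing proportion
        edges = edge_fn(edge_src)                                 # e.g. a Sobel / Canny edge map
        return alpha * edges + (1 - alpha) * shuffle_patches(background)

    aug = edge_superposition(torch.rand(3, 224, 224), torch.rand(3, 224, 224),
                             edge_fn=lambda x: x)                 # plug in a real edge detector here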