Search Results - Inner Mongolia University Library

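Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016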
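
The field codes and operator rules above can be sketched as a small query builder. This is only an illustration under stated assumptions: the helper names (`term`, `year_range`, `any_of`, `all_of`, `without`) are hypothetical and not part of the library's system, and the help text does not document operator precedence, so the sketch simply parenthesizes OR groups the way the two worked examples do.

```python
# Field codes as listed in the help text; the English glosses are informal.
FIELD_CODES = {
    "T": "title",
    "A": "author",
    "K": "subject term",
    "P": "publication name",
    "PU": "publisher name",
    "O": "institution",
    "L": "Chinese Library Classification number",
    "C": "discipline classification number",
    "U": "all fields",
    "Y": "year",
}


def term(field: str, value: str) -> str:
    """Build a single FIELD=value expression, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def year_range(start: int, end: int) -> str:
    """Build a Y=start-end expression, as used in the worked examples."""
    if start > end:
        raise ValueError("start year must not exceed end year")
    return f"Y={start}-{end}"


def any_of(*exprs: str) -> str:
    """Join expressions with OR; wrap in parentheses when there are several."""
    if not exprs:
        raise ValueError("any_of needs at least one expression")
    if len(exprs) == 1:
        return exprs[0]
    return "(" + " OR ".join(exprs) + ")"


def all_of(*exprs: str) -> str:
    """Join expressions with AND (uppercase, one space on each side)."""
    if not exprs:
        raise ValueError("all_of needs at least one expression")
    return " AND ".join(exprs)


def without(expr: str, excluded: str) -> str:
    """Append a NOT clause to an existing expression."""
    return f"{expr} NOT {excluded}"


# Rebuild the two worked examples from the help text.
example_1 = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    year_range(1982, 2016),
)
example_2 = all_of(
    without(
        all_of(
            term("P", "计算机应用与软件"),
            any_of(term("U", "C++"), term("U", "Basic")),
        ),
        term("K", "Visual"),
    ),
    year_range(2011, 2016),
)
print(example_1)
print(example_2)
```

Building queries from helpers like these keeps the operators uppercase and single-spaced, which is the one formatting rule the help text states explicitly.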

Search query: "Any field = IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015"

19,688 records in total; showing records 451-460

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sun, Jiakai; Jiao, Han; Li, Guangyuan; Zhang, Zhanjie; Zhao, Lei; Xing, Wei

Affiliation: Zhejiang Univ Hangzhou Peoples R China

ISBN: (print) 9798350353006

Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naive approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.

Keywords: 3D Gaussian Splatting; 3D vision; Dynamic Scene Reconstruction; Free-Viewpoint Video Streaming Media

Three Pillars improving Vision Foundation Model Distillation for Lidar

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Puy, Gilles; Gidaris, Spyros; Boulch, Alexandre; Simeoni, Oriane; Sautier, Corentin; Perez, Patrick; Bursucl, Andrei; Marlet, Renaud

Affiliations: Valeo ai Paris France; Kyutai Paris France; Univ Gustave Eiffel CNRS LIGM Ecole Ponts Marne La Vallee France

ISBN: (print) 9798350353006

Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show promising results, obtained thanks to distillation methods that keep improving. Yet, we still notice a large performance gap when measuring by linear probing the quality of distilled vs fully supervised features. In this work, instead of focusing only on the distillation method, we study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbone, and the pretraining 2D+3D dataset. In particular, thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality. This allows us to significantly reduce the gap between the quality of distilled and fully-supervised 3D features, and to improve the robustness of the pretrained backbones to domain gaps and perturbations. The code is available at https://***/valeoai/ScaLR.

Keywords: Semantic Segmentation

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Huajian; Liu, Changkun; Zhu, Yipeng; Cheng, Hui; Braud, Tristan; Yeung, Sai-Kit

Affiliations: Hong Kong Univ Sci & Technol Hong Kong Peoples R China; Sun Yat Sen Univ Guangzhou Peoples R China

ISBN: (print) 9798350353006

Portable 360° cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground truth poses for visual localization. We present a practical implementation of 360° mapping combining 360° images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries. Project Page and dataset: https://***/research/360Loc/.

Keywords: omnidirectional vision; visual localization

OpenEQA: Embodied Question Answering in the Era of Foundation Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Majumdar, Arjun; Ajay, Anurag; Zhang, Xi Aohan; Punya, Pranav; Yenamandra, Sriram; Henaff, Mikael; Silwal, Sneha; Mcvay, Paul; Maksymets, Oleksandr; Arnaud, Sergio; Yadav, Karmesh; Li, Qiyang; Newman, Ben; Sharma, Mohit; Berges, Vincent; Zhang, Shiqi; Agrawal, Pulkit; Bisk, Yonatan; Batra, Dhruv; Kalakrishnan, Mrinal; Meier, Franziska; Paxton, Chris; Sax, Alexander; Rajeswaran, Aravind

Affiliations: Georgia Tech Atlanta GA 30332 USA; MIT 77 Massachusetts Ave Cambridge MA 02139 USA; SUNY Binghamton Binghamton NY USA; Meta AI Menlo Pk CA USA; Univ Calif Berkeley Berkeley CA USA; CMU Pittsburgh PA USA; Meta Fundamental AI Res FAIR Menlo Pk CA USA

ISBN: (print) 9798350353006

We present a modern formulation of Embodied Question Answering (EQA) as the task of understanding an environment well enough to answer questions about it in natural language. An agent can achieve such an understanding by either drawing upon episodic memory, exemplified by agents on smart glasses, or by actively exploring the environment, as in the case of mobile robots. We accompany our formulation with OpenEQA, the first open-vocabulary benchmark dataset for EQA supporting both episodic memory and active exploration use cases. OpenEQA contains over 1600 high-quality human-generated questions drawn from over 180 real-world environments. In addition to the dataset, we also provide an automatic LLM-powered evaluation protocol that has excellent correlation with human judgement. Using this dataset and evaluation protocol, we evaluate several state-of-the-art foundation models including GPT-4V, and find that they significantly lag behind human-level performance. Consequently, OpenEQA stands out as a straightforward, measurable, and practically relevant benchmark that poses a considerable challenge to the current generation of foundation models. We hope this inspires and stimulates future research at the intersection of Embodied AI, conversational agents, and world models.

Keywords: Embodied AI; Embodied Question Answering; Vision-Language Models

Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Jian; Cao, Zhe; Luvizon, Diogo; Liu, Lingjie; Sarkar, Kripasindhu; Tang, Danhang; Beeler, Thabo; Theobalt, Christian

Affiliations: MPI Informat & Saarland Informat Campus Saarbrucken Germany; Google Mountain View CA USA; Univ Penn Philadelphia PA USA; Saarbrucken Res Ctr Visual Com Interact & Artific Saarbrucken Germany

ISBN: (print) 9798350353013; 9798350353006

In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are subsequently converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction. For hand tracking, we incorporate dedicated hand detection and hand pose estimation networks for regressing 3D hand poses. Finally, we develop a diffusion-based whole-body motion prior model to refine the estimated whole-body motion while accounting for joint uncertainties. To train these networks, we collect a large synthetic dataset, EgoWholeBody, comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences. Quantitative and qualitative evaluations demonstrate the effectiveness of our method in producing high-quality whole-body motion estimates from a single egocentric camera.

Keywords: Diffusion-based 3D Pose Estimation; Diffusion-Based Motion Refinement; Egocentric Human Pose; Egocentric Vision; FisheyeViT; Human Motion Capture; Whole-Body Motion Capture

HRVDA: High-Resolution Visual Document Assistant

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Liu, Chaohu; Yin, Kun; Cao, Haoyu; Jiang, Xinghua; Li, Xin; Liu, Yinsong; Jiang, Deqiang; Sun, Xing; Xu, Linli

Affiliations: Univ Sci & Technol China Sch Comp Sci & Technol Hefei Anhui Peoples R China; State Key Lab Cognit Intelligence Hefei Anhui Peoples R China; Tencent YouTu Lab Shanghai Peoples R China

ISBN: (print) 9798350353006

Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding still leaves much room for improvement. This discrepancy is primarily attributed to the fact that visual document understanding is a fine-grained prediction task. In natural scenes, MLLMs typically use low-resolution images, leading to a substantial loss of visual information. Furthermore, general-purpose MLLMs do not excel in handling document-oriented instructions. In this paper, we propose a High-Resolution Visual Document Assistant (HRVDA), which bridges the gap between MLLMs and visual document understanding. This model employs a content filtering mechanism and an instruction filtering module to separately filter out the content-agnostic visual tokens and instruction-agnostic visual tokens, thereby achieving efficient model training and inference for high-resolution images. In addition, we construct a document-oriented visual instruction tuning dataset and apply a multi-stage training strategy to enhance the model's document modeling capabilities. Extensive experiments demonstrate that our model achieves state-of-the-art performance across multiple document understanding datasets, while maintaining training efficiency and inference speed comparable to low-resolution models.

Keywords: Document understanding; Multimodal vision-language model

Seeing the Unseen: Visual Common Sense for Semantic Placement

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ramrakhya, Ram; Kembhavi, Aniruddha; Batra, Dhruv; Kira, Zsolt; Zeng, Kuo-Hao; Weihs, Luca

Affiliations: Georgia Inst Technol Atlanta GA 30332 USA; PRIOR Allen Inst AI Seattle WA USA; PRIOR AI2 Seattle WA USA

ISBN: (print) 9798350353006

Computer vision tasks typically involve describing what is present in an image (e.g. classification, detection, segmentation, and captioning). We study a visual common sense task that requires understanding 'what is not present'. Specifically, given an image (e.g. of a living room) and a name of an object ("cushion"), a vision system is asked to predict semantically-meaningful regions (masks or bounding boxes) in the image where that object could be placed or is likely to be placed by humans (e.g. on the sofa). We call this task Semantic Placement (SP) and believe that such common-sense visual understanding is critical for assistive robots (tidying a house), AR devices (automatically rendering an object in the user's space), and visually-grounded chatbots with common sense. Studying the invisible is hard. Datasets for image description are typically constructed by curating relevant images (e.g. via image search with object names) and asking humans to annotate the contents of the image; neither of those two steps is straightforward for objects not present in the image. We overcome this challenge by operating in the opposite direction: we start with an image of an object in context, which is easy to find online, and then remove that object from the image via inpainting. This automated pipeline converts unstructured web data into a dataset comprising pairs of images with/without the object. With this proposed data generation pipeline, we collect a novel dataset, containing ~1.3M images across 9 object categories. We then train an SP prediction model, called CLIP-UNet, on our dataset. The CLIP-UNet outperforms existing VLMs and baselines that combine semantic priors with object detectors, generalizes well to real-world and simulated images and exhibits semantics-aware reasoning for object placement. In our user studies, we find that the SP masks predicted by CLIP-UNet are favored 43.7% and 31.3% times when comparing against the 4 SP baselines on real and simulated

Keywords: Common Sense Reasoning; Computer Vision; Embodied AI

PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author: Khandelwal, Anant

Affiliation: Glance AI Bangalore Karnataka India

ISBN: (print) 9798350365474

The potential for zero-shot generalization in vision-language (V-L) models such as CLIP has spurred their widespread adoption in addressing numerous downstream tasks. Previous methods have employed test-time prompt tuning to adapt the model to unseen domains, but they overlooked the issue of imbalanced class distributions. In this study, we explicitly address this problem by employing class-aware prototype alignment weighted by mean class probabilities obtained for the test sample and filtered augmented views. Additionally, we ensure that the class probabilities are as accurate as possible by performing prototype discrimination using contrastive learning. The combination of alignment and discriminative loss serves as a geometric regularizer, preventing the prompt representation from collapsing onto a single class and effectively bridging the distribution gap between the source and test domains. Our method, named PromptSync, synchronizes the prompts for each test sample on both the text and vision branches of the V-L model. In empirical evaluations on the domain generalization benchmark, our method outperforms previous best methods by 2.33% in overall performance, by 1% in base-to-novel generalization, and by 2.84% in cross-dataset transfer tasks.

Keywords: Visual languages

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Tsai-Shien; Siarohin, Aliaksandr; Menapace, Willi; Deyneka, Ekaterina; Chao, Hsiang-wei; Jeon, Byung Eun; Fang, Yuwei; Lee, Hsin-Ying; Ren, Jian; Yang, Ming-Hsuan; Tulyakov, Sergey

Affiliations: Snap Inc Santa Monica CA 90405 USA; Univ Calif Merced Merced CA 95343 USA; Univ Trento Trento Italy; Snap Santa Monica CA USA

ISBN: (print) 9798350353006

The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions. Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames. Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video. Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions. We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation. The models trained on the proposed data score substantially better on the majority of metrics across all the tasks.

Keywords: Multimodal learning; Video captioning; Vision-language dataset

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lee, Sanghyeok; Choi, Joonmyung; Kim, Hyunwoo J.

Affiliation: Korea Univ Dept Comp Sci & Engn Seoul South Korea

ISBN: (print) 9798350353006

Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose a Multi-criteria Token Fusion (MCTF), that gradually fuses the tokens based on multi-criteria (i.e., similarity, informativeness, and size of fused tokens). Further, we utilize the one-step-ahead attention, which is the improved approach to capture the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy tradeoff in the image classification (ImageNet1K). Experimental results prove that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5%, and +0.3%) over the base model, respectively. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://***/mlvlab/MCTF.

Keywords: Efficient ViTs; Token Fusion; Token Merging; Token Reduction

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
