Search Results - Inner Mongolia University Library

arXiv, 2022

Authors: Yin, Junbo; Fang, Jin; Zhou, Dingfu; Zhang, Liangjun; Xu, Cheng-Zhong; Shen, Jianbing; Wang, Wenguan. Affiliations: School of Computer Science, Beijing Institute of Technology, China; Baidu Research, United States; National Engineering Laboratory of Deep Learning Technology and Application, China; SKL-IOTSC, Cis, University of Macau, China; ReLER, Aaii, University of Technology Sydney, Australia

Dominant point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on a huge amount of accurately labeled samples; however, 3D annotation in the point cloud is extremely tedious, expensive and time-consuming. To reduce the dependence on large supervision, semi-supervised learning (SSL) based approaches have been proposed. The Pseudo-Labeling methodology is commonly used in SSL frameworks; however, the low-quality predictions from the teacher model have seriously limited its performance. In this work, we propose a new Pseudo-Labeling framework for semi-supervised 3D object detection, by enhancing the teacher model to a proficient one with several necessary designs. First, to improve the recall of pseudo labels, a Spatial-temporal Ensemble (STE) module is proposed to generate sufficient seed boxes. Second, to improve the precision of recalled boxes, a Clustering-based Box Voting (CBV) module is designed to get aggregated votes from the clustered seed boxes. This also eliminates the necessity of sophisticated thresholds to select pseudo labels. Furthermore, to reduce the negative influence of wrongly pseudo-labeled samples during training, a soft supervision signal is proposed by considering Box-wise Contrastive learning (BCL). The effectiveness of our model is verified on both the ONCE and Waymo datasets. For example, on ONCE, our approach significantly improves the baseline by 9.51 mAP. Moreover, with half the annotations, our model outperforms the oracle model with full annotations on Waymo. Copyright © 2022, The Authors. All rights reserved.

Keywords: Object detection

Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking

arXiv, 2021

Authors: Jiang, Nan; Wang, Kuiran; Peng, Xiaoke; Yu, Xuehui; Wang, Qiang; Xing, Junliang; Li, Guorong; Zhao, Jian; Guo, Guodong; Han, Zhenjun. Affiliations: Beijing 101408, China; Beijing, China; Institute of North Electronic Equipment, Beijing, China; Institute of Deep Learning, Baidu Research and National Engineering Laboratory for Deep Learning Technology and Application, China

Unmanned Aerial Vehicle (UAV) offers lots of applications in both commerce and recreation. Therefore, perception of the status of UAVs is crucially important. In this paper, we consider the task of tracking UAVs, providing rich information such as location and trajectory. To facilitate research on this topic, we introduce a new benchmark, referred to as Anti-UAV, opening up a promising direction for UAV tracking in a long distance with more than 300 video pairs containing over 580k manually annotated bounding boxes. Furthermore, the advancement of addressing research challenges in Anti-UAV can help the design of anti-UAV systems, leading to better surveillance of UAVs. Accordingly, a simple yet effective approach named dual-flow semantic consistency (DFSC) is proposed for UAV tracking. Modulated by the semantic flow across video sequences, the tracker learns more robust class-level semantic information and obtains more discriminative instance-level features. Experiments show the significant performance gain of our proposed approach over state-of-the-art trackers, and the challenging aspects of Anti-UAV. The Anti-UAV benchmark and the code of the proposed approach will be publicly available at https://***/ucas-vg/Anti-UAV. Copyright © 2021, The Authors. All rights reserved.

Keywords: Unmanned aerial vehicles (UAV)

Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention

arXiv, 2021

Authors: Wu, Sitong; Wu, Tianyi; Tan, Haoru; Guo, Guodong. Affiliations: Institute of Deep Learning, Baidu Research, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computation complexity caused by the global self-attention, various methods constrain the range of attention within a local region to improve its efficiency. Consequently, their receptive fields in a single attention layer are not large enough, resulting in insufficient context modeling. To address this issue, we propose a Pale-Shaped self-Attention (PS-Attention), which performs self-attention within a pale-shaped region. Compared to the global self-attention, PS-Attention can reduce the computation and memory costs significantly. Meanwhile, it can capture richer contextual information under the similar computation complexity with previous local self-attention mechanisms. Based on the PS-Attention, we develop a general Vision Transformer backbone with a hierarchical architecture, named Pale Transformer, which achieves 83.4%, 84.3%, and 84.9% Top-1 accuracy with the model size of 22M, 48M, and 85M respectively for 224 × 224 ImageNet-1K classification, outperforming the previous Vision Transformer backbones. For downstream tasks, our Pale Transformer backbone performs better than the recent state-of-the-art CSWin Transformer by a large margin on ADE20K semantic segmentation and COCO object detection & instance segmentation. The code will be released on https://***/BRIDL/PaddleViT. © 2021, CC BY.

Keywords: Instance Segmentation

The Combined Effect of Konjac Glucomannan and Ultrasound Treatment on the Interaction of Surimi and Gluten Protein

SSRN, 2024

Authors: Cao, Geng; Yang, Zuoqian; Li, Xiangzheng; He, Xiaoyang; Song, Shuang; Wen, Chengrong. Affiliations: National Engineering Research Center for Seafood; State Key Laboratory of Marine Food Processing and Safety Control; Collaborative Innovation Center of Seafood Deep Processing; National Engineering Research Center of Seafood; National & Local Joint Engineering Laboratory for Marine Bioactive Polysaccharide Development and Application; School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China; School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China

Gluten, surimi and mixed protein were analyzed for changes in protein structure after treatment with konjac glucomannan (KGM), ultrasound (U) and konjac glucomannan-ultrasound (UKGM). In this study, molecular weight, fluorescence intensity, intermolecular force, disulfide bond, microstructure, SDS-soluble protein and free amino acid content were investigated. All treatments increased the Imax, SH groups and free amino groups of gluten, while decreasing SS bonds. KGM and U treatment decreased the SDS-soluble protein, SH groups and Imax of surimi protein while increasing SS bonds. However, the effect of UKGM treatment was the opposite. For the mixed proteins, all treatments were found to decrease the Imax and SH groups, while increasing SS bonds. KGM and UKGM treatments increased the ionic and hydrogen bond contents of the protein, while U treatment increased hydrophobic bonds. Consequently, UKGM treatment was able to improve the interaction of surimi and gluten and resulted in better structural properties of the mixed protein. © 2024, The Authors. All rights reserved.

Keywords: Proteins

Fully transformer networks for semantic image segmentation

arXiv, 2021

Authors: Wu, Sitong; Wu, Tianyi; Lin, Fangjian; Tian, Shengwei; Guo, Guodong. Affiliations: Institute of Deep Learning, Baidu Research, Beijing 100085, China; National Engineering Laboratory for Deep Learning Technology and Application, Beijing 100085, China; School of Software, XinJiang University, Urumqi, China (Shengwei Tian)

Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies. Recent progress has demonstrated that combining such Transformers with CNN-based semantic image segmentation models is very promising. However, it is not well studied yet on how well a pure Transformer based approach can achieve for image segmentation. In this work, we explore a novel framework for semantic image segmentation, which is encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features, meanwhile reducing the computation complexity of the standard Visual Transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline can achieve better results on multiple challenging semantic segmentation and face parsing benchmarks, including PASCAL Context, ADE20K, COCOStuff, and CelebAMask-HQ. The source code will be released on https://***/BR-IDL/PaddleViT. Copyright © 2021, The Authors. All rights reserved.

Keywords: Semantic Segmentation

Deep Feedforward Sequential Memory Networks Based Mispronunciation Detection for Tibetan Students' Mandarin

2nd International Conference on Information Science and Education, ICISE-IE 2021

Authors: Gan, Zhenye; Zhao, Tianqin; Yu, Xinke; Yang, Hongwu. Affiliations: College of Physics and Electronic Engineering, Northwest Normal University, Engineering Research Center of Gansu Province for Intelligent Information Technology and Application, LanZhou, China; School of Educational Technology, Northwest Normal University, National and Local Joint Engineering Laboratory for Learning Analytics Technology of Internet Education Data, LanZhou, China

ISBN: (print) 9781665438292

A computer-assisted pronunciation training (CAPT) system can detect mispronunciations produced by non-native speakers and provide positive feedback. CAPT helps L2 learners improve their pronunciation accurately. Tibetan students' Mandarin is influenced by the pronunciation habits of their native language, which makes their pronunciation noticeably different from that of standard Mandarin. This paper uses a CNN model as the foundation and introduces an acoustic model based on DFSMN and CTC. This acoustic model implements a speech recognition method for detecting mispronunciations in Tibetan students' Mandarin. To further improve the detection performance, we use extended initial final (XIF) as bias primitives and design 64 bias types. Experimental results show that the proposed method can effectively detect mispronunciation and provide correct feedback, with a DA of 88.02%, an FRR of 7.95% and an FAR of 25.74%. © 2021 IEEE

Keywords: Students
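
The abstract above reports DA, FRR and FAR without defining them. The following is a minimal sketch of how such figures are commonly computed from phone-level confusion counts. It assumes FRR is the false rejection rate, FAR is the false acceptance rate, and DA is read as overall detection accuracy; these readings, the function name `detection_metrics`, and the example counts are assumptions for illustration, not taken from the paper.

```python
from typing import Dict


def detection_metrics(ta: int, fr: int, fa: int, tr: int) -> Dict[str, float]:
    """Compute FRR, FAR and detection accuracy from phone-level counts.

    ta: correct pronunciations accepted (true acceptance)
    fr: correct pronunciations flagged as errors (false rejection)
    fa: mispronunciations accepted as correct (false acceptance)
    tr: mispronunciations flagged as errors (true rejection)
    """
    correct_total = ta + fr  # phones that were actually pronounced correctly
    error_total = fa + tr    # phones that were actually mispronounced
    if correct_total == 0 or error_total == 0:
        raise ValueError("need at least one correct and one mispronounced phone")
    return {
        "FRR": fr / correct_total,
        "FAR": fa / error_total,
        # Assumption: DA is read here as overall detection accuracy.
        "DA": (ta + tr) / (correct_total + error_total),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the paper.
    print(detection_metrics(ta=90, fr=10, fa=5, tr=15))
```

Under these definitions a low FRR and a comparatively high FAR, as in the reported figures, would mean the system rarely penalizes correct pronunciations but still lets a share of real errors through.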

Discrepancy Matters: Learning from Inconsistent Decoder Features for Consistent Semi-supervised Medical Image Segmentation

arXiv, 2023

Authors: Zeng, Qingjie; Xie, Yutong; Lu, Zilin; Lu, Mengkang; Xia, Yong. Affiliations: The National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China; The Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA 5000, Australia

Semi-supervised learning (SSL) has been proven beneficial for mitigating the issue of limited labeled data especially on the task of volumetric medical image segmentation. Unlike previous SSL methods which focus on exploring highly confident pseudo-labels or developing consistency regularization schemes, our empirical findings suggest that inconsistent decoder features emerge naturally when two decoders strive to generate consistent predictions. Based on the observation, we first analyze the treasure of discrepancy in learning towards consistency, under both pseudo-labeling and consistency regularization settings, and subsequently propose a novel SSL method called LeFeD, which learns the feature-level discrepancy obtained from two decoders, by feeding the discrepancy as a feedback signal to the encoder. The core design of LeFeD is to enlarge the difference by training differentiated decoders, and then learn from the inconsistent information iteratively. We evaluate LeFeD against eight state-of-the-art (SOTA) methods on three public datasets. Experiments show LeFeD surpasses competitors without any bells and whistles such as uncertainty estimation and strong constraints, as well as setting a new state-of-the-art for semi-supervised medical image segmentation. Code is available at https://***/maxwell0027/LeFeD © 2023, CC BY.

Keywords: Decoding

FusionPainting: Multimodal fusion with adaptive attention for 3D object detection

arXiv, 2021

Authors: Xu, Shaoqing; Zhou, Dingfu; Fang, Jin; Yin, Junbo; Bin, Zhou; Zhang, Liangjun. Affiliations: Beihang University, Beijing 100083, China; Robotics and Autonomous Driving Laboratory, Baidu Research, National Engineering Laboratory of Deep Learning Technology and Application, China; Beijing Institute of Technology, Beijing, China

Accurate detection of obstacles in 3D is an essential task for autonomous driving and intelligent transportation. In this work, we propose a general multimodal fusion framework, FusionPainting, to fuse the 2D RGB image and 3D point clouds at a semantic level for boosting the 3D object detection task. Specifically, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector. First, semantic information is obtained for the 2D image and 3D Lidar point clouds based on 2D and 3D segmentation approaches. Then the segmentation results from different sensors are adaptively fused based on the proposed attention-based semantic fusion module. Finally, the point clouds painted with the fused semantic label are sent to the 3D detector for obtaining the 3D detection results. The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark by comparing with three different baselines. The experimental results show that the fusion strategy can significantly improve the detection performance compared to methods using only point clouds and methods using point clouds painted only with 2D segmentation information. Furthermore, the proposed approach outperforms other state-of-the-art methods on the nuScenes testing benchmark. Code will be available at https://***/Shaoqing26/FusionPainting/. © 2021, CC BY-NC-ND.

Keywords: Object recognition

IAFA: Instance-aware feature aggregation for 3D object detection from a single image

arXiv, 2021

Authors: Zhou, Dingfu; Song, Xibin; Dai, Yuchao; Yin, Junbo; Lu, Feixiang; Fang, Jin; Liao, Miao; Zhang, Liangjun. Affiliations: Baidu Research, China; National Engineering Laboratory of Deep Learning Technology and Application, Beijing, China; Northwestern Polytechnical University, Xi’an, China; Beijing Institute of Technology, Beijing, China

3D object detection from a single image is an important task in Autonomous Driving (AD), where various approaches have been proposed. However, the task is intrinsically ambiguous and challenging, as single image depth estimation is already an ill-posed problem. In this paper, we propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection with the following contributions. First, an instance-aware feature aggregation (IAFA) module is proposed to collect local and global features for 3D bounding box regression. Second, we empirically find that the spatial attention module can be well learned by taking coarse-level instance annotations as a supervision signal. The proposed module has significantly boosted the performance of the baseline method on both 3D detection and 2D bird's-eye view vehicle detection across all three categories. Third, our proposed method outperforms all single image-based approaches (even those trained with depth as auxiliary inputs) and achieves state-of-the-art 3D detection performance on the KITTI benchmark. © 2021, CC BY-NC-SA.

Keywords: Object recognition

Teaching Text-to-Image Models to Communicate in Dialog

arXiv, 2023

Authors: Sun, Xiaowen; Feng, Jiazhan; Wang, Yuxuan; Lai, Yuxuan; Shen, Xingyu; Zhao, Dongyan. Affiliations: Wangxuan Institute of Computer Technology, Peking University, China; National Key Laboratory of General Artificial Intelligence, BIGAI, China; Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, China; Department of Computer Science, The Open University of China, China

A picture is worth a thousand words, thus, it is crucial for conversational agents to understand, perceive, and effectively respond with pictures. However, we find that directly employing conventional image generation techniques is inadequate for conversational agents to produce image responses effectively. In this paper, we focus on the innovative dialog-to-image generation task, where the model synthesizes a high-resolution image aligned with the given dialog context as a response. To tackle this problem, we design a tailored fine-tuning approach on the top of state-of-the-art text-to-image generation models to fully exploit the structural and semantic features in dialog context during image generation. Concretely, we linearize the dialog context with specific indicators to maintain the dialog structure, and employ in-domain data to alleviate the style mismatch between dialog-to-image and conventional image generation tasks. Empirical results on PhotoChat and MMDialog Corpus show that our approach brings consistent and remarkable improvement with 3 state-of-the-art pre-trained text-to-image generation backbones. Copyright © 2023, The Authors. All rights reserved.

Keywords: Semantics
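
The abstract above mentions linearizing the dialog context with specific indicators so that the dialog structure survives when the text is fed to a text-to-image model. A minimal sketch of that idea follows. The indicator tokens (`[SPEAKER_A]`, `[SPEAKER_B]`, `[SEP]`) and the function name `linearize_dialog` are hypothetical; the abstract does not say which indicators the authors use.

```python
from typing import List, Tuple

# Hypothetical indicator tokens; the abstract does not specify the real ones.
SPEAKER_TOKENS = {"A": "[SPEAKER_A]", "B": "[SPEAKER_B]"}
TURN_SEPARATOR = " [SEP] "


def linearize_dialog(turns: List[Tuple[str, str]]) -> str:
    """Flatten (speaker, utterance) turns into a single prompt string.

    Each utterance is prefixed with its speaker indicator and turns are
    joined with a separator, so turn order and speaker identity are kept.
    """
    parts = []
    for speaker, utterance in turns:
        parts.append(f"{SPEAKER_TOKENS[speaker]} {utterance.strip()}")
    return TURN_SEPARATOR.join(parts)


if __name__ == "__main__":
    dialog = [
        ("A", "I just got back from the beach."),
        ("B", "Nice! Can you show me a photo?"),
    ]
    # The resulting string would serve as the text prompt for the image model.
    print(linearize_dialog(dialog))
```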
