Search Results - Inner Mongolia University Library

49th Annual Conference of the IEEE Industrial Electronics Society, IECON 2023

Authors: Chen, Xinyu; Zhao, Meng; Shi, Fan; Zhang, Meng'en; He, Yu; Chen, Shengyong. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, Laboratory of Computer Vision and System of Ministry of Education, Tianjin 300384, China; Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Key Laboratory of Space Utilization, Beijing 100190, China

ISBN: (print) 9798350331820

Multimodal pre-training models have succeeded in the video-language field and in various downstream tasks, but previous models used 3D CNNs as video feature extractors, which limits how well video features interact and fuse with text features. This paper proposes a multimodal pre-training model that uses a Video-Swin-Transformer-based network to encode both video and text data and thereby achieve better performance in video understanding. The model consists of four modules (video encoder, text encoder, interact encoder, and caption decoder) that together accomplish ocean scene video captioning. A dataset of ocean scene videos, covering content types such as sea surfaces and shores, is also constructed. Training is divided into two stages: pre-training and fine-tuning. Pre-training is performed on the HowTo100M dataset so that the model learns video captions in natural scenes and completes video-language matching tasks. Fine-tuning is then performed on the Ocean1000 dataset so that the model better understands the events and content in ocean scene videos and generates captions that fit them. The model achieves satisfactory results on both the public dataset YouCook2 and the proprietary dataset Ocean1000, demonstrating its ability in video-text information fusion and interaction. © 2023 IEEE.

Keywords: Signal encoding


Learning to see speckle in the weak laser field through multimode fiber

Optoelectronics Letters, 2025

Authors: JI Yunqi; SONG Binbin; LI Xueqing; LI Yonghui. Affiliations: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), the Key Laboratory of Computer Vision and Systems (Ministry of Education), and the School of Computer Science and Engineering, Tianjin University of Technology

Multimode fibers (MMFs) have great potential for endoscopic imaging due to their high number of modes and small core diameter. Deep learning based on neural networks has received increasing attention in the field of scattering image reconstruction. However, most studies focus on designing complex network architectures to improve reconstruction, and these network models struggle to reconstruct images in a weak laser field. In this paper, a lightweight generative adversarial network model combined with a histogram specification algorithm is designed to reconstruct speckles in the weak laser field through MMF. Experimental results show that the reconstruction results of our algorithm have better metrics. Moreover, the model demonstrates excellent cross-domain generalization ability with regard to the Fashion-MNIST dataset. Notably, the speckles after inactivation still retain the ability to be reconstructed, which enhances the robustness of the model.
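The abstract does not say how histogram specification is combined with the network, so the following is only a minimal sketch of classic histogram specification (histogram matching) on its own. It assumes images are flat lists of integer gray levels in 0..255; the function names `cdf` and `specify_histogram` are illustrative and not taken from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf(pixels, levels=256):
    """Return the normalized cumulative histogram of integer gray levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    return [c / total for c in accumulate(hist)]


def specify_histogram(source, reference, levels=256):
    """Remap `source` so its gray-level distribution approximates `reference`'s."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    # For each source level, pick the smallest reference level whose
    # cumulative probability reaches the source level's cumulative probability.
    lookup = [min(bisect_left(ref_cdf, s), levels - 1) for s in src_cdf]
    return [lookup[p] for p in source]


# A dark two-level image is stretched onto the brighter reference levels.
dark = [0, 0, 1, 1]
bright = [200, 200, 250, 250]
matched = specify_histogram(dark, bright)
```

In a weak-light setting, a remapping of this kind is typically used to pull a low-contrast intensity distribution toward a better-exposed reference; whether the paper applies it before or after the network is not stated here.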


SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

arXiv, 2025

Authors: Liu, Hui; Jia, Chen; Shi, Fan; Cheng, Xu; Chen, Shengyong. Affiliations: School of Computer Science and Engineering, Tianjin University of Technology, China; Engineering Research Center of Learning-Based Intelligent System, Ministry of Education; Key Laboratory of Computer Vision and System, Ministry of Education

Pixel-level segmentation of structural cracks across various scenarios remains a considerable challenge. Current methods struggle to model crack morphology and texture effectively and to balance segmentation quality against computational cost. To overcome these limitations, we propose a lightweight Structure-Aware Vision Mamba Network (SCSegamba), capable of generating high-quality pixel-level segmentation maps by leveraging both the morphological information and texture cues of crack pixels with minimal computational cost. Specifically, we develop a Structure-Aware Visual State Space module (SAVSS), which incorporates a lightweight Gated Bottleneck Convolution (GBC) and a Structure-Aware Scanning Strategy (SASS). The key insight of GBC lies in its effectiveness in modeling the morphological information of cracks, while the SASS enhances the perception of crack topology and texture by strengthening the continuity of semantic information between crack pixels. Experiments on crack benchmark datasets demonstrate that our method outperforms other state-of-the-art (SOTA) methods, achieving the highest performance with only 2.8M parameters. On the multi-scenario dataset, our method reached 0.8390 in F1 score and 0.8479 in mIoU. The code is available at https://***/Karl1109/SCSegamba. © 2025, CC BY.

Keywords: State space methods


Harnessing Light Field Angular Cues and Spatial Geometries for Semantic Segmentation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Chen Jia; Fan Shi; Xu Cheng. Affiliations: School of Computer Science and Engineering; The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education); The Key Laboratory of Computer Vision and System (Ministry of Education); Tianjin University of Technology, Tianjin, China

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

4D light field imaging captures rich spatial-angular information, providing essential geometric cues for semantic segmentation tasks. In this paper, we introduce a novel backbone network called the Light Field Extraction Interaction Network (LFEI-Net). LFEI-Net excels in extracting global structures and multi-scale spatial-angular features, capturing feature dependencies through channel modeling and diverse feature interactions. Unlike traditional methods that depend on pyramid and dilated feature extraction, LFEI-Net integrates large-scale horizontal depth-wise convolution (HDWC) and vertical depth-wise convolution (VDWC) with interactive operations for efficient, comprehensive spatial multi-scale feature extraction. Furthermore, we present the Multi-Angular Modeling (MAM) module, which effectively captures scene angle variations from multiple perspectives and precisely delineates object boundaries, thereby improving model adaptability. Our experimental evaluations on two datasets demonstrate that LFEI-Net significantly outperforms state-of-the-art (SOTA) 2D and 4D light field semantic segmentation methods, achieving mean Intersection over Union (mIoU) of 83.72% and 86.88%, respectively.

Keywords: Geometry; Adaptation models; Convolution; Semantic segmentation; Imaging; Feature extraction; Light fields; Acoustics; Speech processing
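For readers unfamiliar with the mIoU metric quoted in the abstract above, here is a minimal sketch of how mean Intersection over Union is commonly computed. It assumes predictions and labels are flat lists of integer class ids and that classes absent from both are skipped; the paper's exact evaluation protocol is not given in this record, and `mean_iou` is an illustrative name.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(pred, target))
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:  # skip classes that appear in neither pred nor target
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```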


A Short-term Aircraft Trajectory Prediction Framework Using Conditional Generative Adversarial Network

4th IEEE International Conference on Civil Aviation Safety and Information Technology, ICCASIT 2022

Authors: Hu, Qinzhi; Huang, Guoxin; Shi, Han; Lin, Yi; Guo, Dongyue. Affiliations: National Key Laboratory of Air Traffic Control Automation System Technology, College of Computer Science, Sichuan University, Chengdu, China; National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China

ISBN: (digital) 9781665467667

ISBN: (print) 9781665467667

Short-term aircraft trajectory prediction (TP) plays an important role in current air traffic control systems. However, existing works usually perform the multi-horizon TP task in an iterated manner, which easily suffers from error accumulation. In this work, a novel short-term aircraft TP framework, called TPGAN, is proposed, which predicts the multi-horizon trajectory in a single step using a conditional generative adversarial network (CGAN). Compared with conventional approaches, the TP task is formulated as a probability distribution estimation problem by the CGAN architecture. In this framework, the generator is employed to output the predictions, while the discriminator learns the discriminative features between ground truth and predictions. The generative adversarial training strategy is applied to optimize the proposed framework. Moreover, to validate the generality and effectiveness of the proposed framework, three neural network architectures are designed to implement it: Conv1D-TPGAN, Conv2D-TPGAN, and LSTM-TPGAN. In addition, a dataset collected from real-world ATC systems is used to conduct the experiments. The experimental results demonstrate that the proposed framework achieves significant performance improvements over the baseline. © 2022 IEEE.

Keywords: Air traffic control


A Coarse to Fine Detection Method for Prohibited Object in X-ray Images Based on Progressive Transformer Decoder

32nd ACM International Conference on Multimedia, MM 2024

Authors: Ma, Chunjie; Du, Lina; Gao, Zan; Zhuo, Li; Wang, Meng. Affiliations: Shandong, Jinan, China; School of Computer Science and Technology, Shandong Jianzhu University, Shandong, Jinan, China; Faculty of Information Technology, Beijing University of Technology, Beijing, China; School of Computer Science and Information Engineering, Hefei University of Technology, Anhui, Hefei, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, China

ISBN: (print) 9798400706868

Transformer-based methods for detecting prohibited objects in X-ray images continue to appear, but they still have shortcomings such as poor performance and high computational complexity when prohibited objects are heavily occluded. Therefore, this paper proposes a coarse-to-fine detection method for prohibited objects in X-ray images based on a progressive Transformer decoder. First, a coarse-to-fine framework is proposed, which includes two stages: coarse detection and fine detection. Through adaptive inference in stages, the computational efficiency of the model is effectively improved. Then, a position and class object queries method is proposed, which improves the convergence speed and detection accuracy of the model by fusing the position and class information of prohibited objects with object queries. Finally, a progressive Transformer decoder is proposed, which distinguishes high-score and low-score queries by decreasing confidence thresholds, so that high-score queries are not affected by low-score queries in the decoding stage, and the model can focus more on decoding low-score queries, which usually correspond to prohibited objects with severe occlusion. The experimental results on three public benchmark datasets (SIXray, OPIXray, HiXray) demonstrate that, compared with the baseline DETR, the proposed method achieves state-of-the-art detection accuracy with a 21.6% reduction in model computational complexity. In particular, prohibited objects under heavy occlusion can be detected accurately. © 2024 ACM.

Keywords: Decoding
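The abstract describes separating high-score and low-score queries with decreasing confidence thresholds. The record gives no implementation details, so the following is only a rough sketch of that partitioning idea: each query is assigned to the first stage whose threshold its score meets, and only the remaining queries move on. The function name `split_by_stage`, the threshold values, and the score list are all hypothetical.

```python
def split_by_stage(scores, thresholds):
    """Partition query indices across stages with decreasing thresholds.

    Returns (stages, remaining): stages[k] holds the indices accepted at
    stage k, and remaining holds indices below every threshold.
    """
    remaining = list(range(len(scores)))
    stages = []
    for threshold in thresholds:
        accepted = [i for i in remaining if scores[i] >= threshold]
        remaining = [i for i in remaining if scores[i] < threshold]
        stages.append(accepted)
    return stages, remaining


# Hypothetical confidence scores for four object queries.
stages, leftover = split_by_stage([0.95, 0.4, 0.75, 0.6], [0.9, 0.7, 0.5])
```

Under this reading, queries accepted early are no longer mixed with the low-score ones, and later stages operate only on what is left, which is where heavily occluded objects would tend to fall.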


Knowledge Graph for Identifying Geological Disasters by Integrating Computer Vision with Ontology

Journal of Earth Science, 2023, Vol. 34, No. 5, pp. 1418-1432

Authors: Qinjun Qiu; Zhong Xie; Die Zhang; Kai Ma; Liufeng Tao; Yongjian Tan; Zhipeng Zhang; Baode Jiang. Affiliations: School of Computer Science, China University of Geosciences, Wuhan 430078, China; Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074, China; National Engineering Research Center of Geographic Information System, Wuhan 430074, China; School of Resource and Environmental Sciences, Wuhan University, Wuhan 430074, China; College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China; Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China

The occurrence of geological disasters can have a large impact on urban safety. Protecting people’s safety is the most important concern when disasters occur. Safety improvement requires a large amount of comprehensive and representative risk analysis and a large collection of information related to geological hazards, including unstructured knowledge and experience. To handle this information and support safety risk analysis, a geological hazard knowledge graph is developed automatically based on computer vision and domain-geoscience ontology to identify geological hazards from input images while obeying safety rules and regulations, even when affected by changes. In the implementation of the knowledge graph, we design an ontology schema of geological disasters based on a top-down approach; because knowledge is organized as a logical semantic expression, it can be shared using ontology technologies, which enables semantic interoperability. Computer vision approaches are then used to automatically detect a set of entities and attributes from input images, and object types and their attributes are identified so that they can be stored in Neo4j for reasoning and searching. Finally, a reasoning model for geological hazard identification is developed using the Neo4j database to create nodes, relationships, and their properties for modeling, and geological hazards in the images can be automatically identified by searching the Neo4j database. An application to geological hazards is presented. The results show the effectiveness of the proposed approach in identifying potential geological hazards and assisting in formulating targeted preventive measures.

Keywords: geological hazard; computer vision; knowledge graph; city safety; ontology


Large Model Collaborative Optimization Enhanced by Mobile Consumer Electronics Constructed Federated ML

IEEE Transactions on Consumer Electronics, 2024

Authors: Bu, Chao; Li, Jianlong; Wang, Jinsong. Affiliations: Tianjin University of Technology, Key Laboratory of Computer Vision and System of Ministry of Education, School of Computer Science and Engineering, Tianjin 300384, China; Tianjin University of Technology, Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, School of Computer Science and Engineering, Tianjin 300384, China

With the rapid development of the Internet of Everything, current Mobile Consumer Electronics (MCEs) can support complex computing thanks to their specialized data-handling capacity. Thus, federated Machine Learning (ML) and edge computing are integrated to leverage this capacity of MCEs to enhance large model training. This paper studies the motivations of all participants in large model collaborative optimization from a global perspective, and proposes a positive promotion pattern in which MCEs, Edge Computing Servers (ECSs), and the centralized computing center collaborate to support large model collaborative optimization by introducing the idea of computing economics. First, we propose a novel three-layered framework for large model collaborative optimization based on federated ML constructed from MCEs. Second, we introduce the economic factor into large model optimization by devising algorithms that assess the consumption of all participants and dynamically adjust incentive benefits according to the quality of the trained models. Finally, we design a scheme that encourages more MCEs to train local models by adaptively adjusting the ECS pricing strategy while considering the trend of the ECS's future economic benefit. Experimental results validate that the proposed approach optimizes global accuracy and improves economic benefit more efficiently than the state of the art. © 1975-2011 IEEE.

Keywords: Mobile edge computing


Stackelberg Game-Driven Computational Offloading in Edge Computing Scenarios

International Conference on Communication Software and Networks, ICCSN

Authors: Lan Zhang; Chao Bu. Affiliations: Key Laboratory of Computer Vision and System of Ministry of Education, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China

With the number of users who use mobile devices for frequent transactions increasing rapidly, guaranteeing the credibility of transactions is a great challenge. Blockchain is regarded as a practical technology for this demand; however, the limited computing capacity of each user's device becomes a bottleneck. In this paper, the edge computing pattern is introduced to support complex computing for users' mobile devices by renting computing resources from computing service providers. Considering the demands of both the user and the service provider, we propose a two-level game approach based on the Stackelberg game for the renting and pricing of computing resources among multiple users and multiple service providers. The simulation results show that the proposed mechanism is feasible and effective.
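To illustrate the leader-follower structure of a Stackelberg pricing game, here is a toy sketch with one provider (leader) and one user (follower). Everything specific in it is an assumption made for illustration: the user's utility `a*ln(1+x) - price*x`, the provider's unit cost, and the function names. The paper's actual model covers multiple users and multiple providers and is not described in this record.

```python
import math


def follower_demand(price, a):
    """User's best response: the x >= 0 maximizing a*ln(1+x) - price*x."""
    # Setting the derivative a/(1+x) - price to zero gives x = a/price - 1.
    return max(0.0, a / price - 1.0)


def leader_profit(price, a, cost):
    """Provider's profit when the user best-responds to `price`."""
    return (price - cost) * follower_demand(price, a)


def stackelberg_price(a, cost):
    """Leader's optimal price sqrt(a*cost), valid when 0 < cost < a.

    Derived by maximizing (p - cost) * (a/p - 1) over p.
    """
    return math.sqrt(a * cost)


# Hypothetical parameters: utility weight a = 16, unit cost = 4.
best_price = stackelberg_price(16.0, 4.0)
rented = follower_demand(best_price, 16.0)
```

The leader moves first by anticipating the follower's best response, which is why the price is optimized over `leader_profit` rather than chosen simultaneously with the demand.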


Dynamic Services Migration Based on Dueling Deep Q-Network in MEC

International Conference on Communication Software and Networks, ICCSN

Authors: Xinyang Zhang; Chao Bu. Affiliations: Key Laboratory of Computer Vision and System of Ministry of Education, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China

As an extension of the cloud computing paradigm, Mobile Edge Computing (MEC) facilitates real-time service responses by deploying resources near network edges. However, because most users are mobile, services must frequently migrate among multiple edge computing servers, which increases network operation costs and affects service quality. In this paper, we formulate the service migration problem as a Markov Decision Process (MDP) and introduce the dueling Deep Q-Network (DQN) to solve it, so as to reduce the network operation cost without lowering the service quality. We also propose a trajectory prediction approach to further optimize service migration. Simulation results demonstrate that the proposed mechanism achieves a lower network operation cost without reducing the service quality.
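The distinguishing step of a dueling DQN is how it combines a state value with per-action advantages into Q-values. The sketch below shows the commonly used mean-subtracted aggregation, Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)), on plain numbers instead of network outputs. Treating each action as a candidate edge server to migrate to is an assumption for illustration; the paper's state, action, and reward definitions are not given in this record.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted advantages."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical outputs of the value and advantage streams for one state,
# with three candidate edge servers as actions.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
target_server = greedy_action(q)
```

Subtracting the mean advantage keeps the decomposition identifiable: without it, a constant could be shifted between the value and advantage streams without changing Q.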
