Search Results - Inner Mongolia University Library

Stochastic normalized gradient descent with momentum for large-batch training

Science China (Information Sciences), 2024, Vol. 67, No. 11, pp. 77-91

Authors: Shen-Yi ZHAO, Chang-Wei SHI, Yin-Peng XIE, Wu-Jun LI. National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University

Stochastic gradient descent (SGD) and its variants have been the dominating optimization methods in machine learning. Compared with SGD with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs) and can reduce the number of communication rounds in distributed training settings. Thus, SGD with large-batch training has attracted considerable attention. However, existing empirical results showed that large-batch training typically leads to a drop in generalization accuracy. Hence, how to guarantee the generalization ability in large-batch training becomes a challenging task. In this paper, we propose a simple yet effective method, called stochastic normalized gradient descent with momentum (SNGM), for large-batch training. We prove that with the same number of gradient computations, SNGM can adopt a larger batch size than momentum SGD (MSGD), which is one of the most widely used variants of SGD, to converge to an ε-stationary point. Empirical results on deep learning verify that when adopting the same large batch size, SNGM can achieve better test accuracy than MSGD and other state-of-the-art large-batch training methods.

Keywords: non-convex problems; large-batch training; stochastic normalized gradient descent; momentum
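The abstract for this entry names the technique but does not spell out the update rule. As a rough illustration only, the following sketch shows one common way to combine a normalized stochastic gradient with a momentum buffer. The function name `sngm_step`, the parameters `lr`, `beta`, and `eps`, and the exact form of the update are assumptions made for this example, not details taken from the record.

```python
import math

def sngm_step(w, u, grad, lr=0.1, beta=0.9, eps=1e-12):
    """One normalized-gradient-with-momentum step on plain Python lists.

    Assumed form (not stated in the abstract):
        u_new = beta * u + grad / ||grad||
        w_new = w - lr * u_new
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Keep only the direction of the gradient; eps guards against a zero norm.
    unit = [g / (norm + eps) for g in grad]
    u_new = [beta * ui + gi for ui, gi in zip(u, unit)]
    w_new = [wi - lr * ui for wi, ui in zip(w, u_new)]
    return w_new, u_new

# Toy usage: f(w) = 0.5 * ||w||^2, whose gradient at w is w itself.
w, u = [3.0, -4.0], [0.0, 0.0]
for _ in range(200):
    w, u = sngm_step(w, u, grad=w, lr=0.01, beta=0.9)
```

Because the gradient is divided by its norm, the size of each raw step depends on `lr` and `beta` rather than on the gradient magnitude, which is the property usually cited as helpful when batch sizes (and hence gradient scales) change.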


Small object detection in diverse application landscapes: a survey

Multimedia Tools and Applications, 2024, Vol. 83, No. 41, pp. 88645-88680

Authors: Iqra; Giri, Kaisar J.; Javed, Mohammed. Department of Computer Science, Islamic University of Science & Technology, Pulwama, India; Computer Vision & Biometrics Lab, Dept. of IT, Indian Institute of Information Technology Allahabad, India

The importance of object detection within computer vision, especially in the context of detecting small objects, has notably increased. This thorough survey extensively examines small object detection across various applications, consolidating and outlining the available methodologies. Traditional papers on small object detection have focused on specific domains. However, this survey paper incorporates insights from a multitude of domains, providing a comprehensive understanding of the versatility and applicability of small object detection techniques. This paper sheds light on the key challenges faced and delves into potential solutions to address the challenges, offering insights into viable solutions to enhance small object detection performance, setting it apart from existing literature. The strategies identified in our survey encompass a spectrum of approaches, categorized as transformer-based, CNN, and traditional methods. Also, this paper collates prevalent datasets relevant to small object detection, simplifying access to these resources. Further, it provides a succinct overview of diverse evaluation metrics used for performance assessment in this field, enhancing understanding of the effectiveness and proficiency of these methods. This survey paper not only consolidates established knowledge but also highlights innovative viewpoints, providing a comprehensive and enlightening compilation that contributes to the advancement of small object detection in the field of computer vision. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Feature extraction


Dual Encoder-Decoder Shifted Window-Based Transformer Network for Polyp Segmentation with Self-Learning Approach

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 7, pp. 3456-3469

Authors: Lijin, P.; Ullah, Mohib; Vats, Anuja; Cheikh, Faouzi Alaya; Kumar, Santhosh; Nair, Madhu S. Cochin University of Science and Technology, Artificial Intelligence & Computer Vision Laboratory, Department of Computer Science, Kerala, Kochi 682022, India; Norwegian University of Science and Technology, Gjovik 2815, Norway; Norwegian University of Science and Technology, Norwegian Colour and Visual Computing Laboratory, Gjovik 2815, Norway

According to WHO reports, cancer is the leading cause of death worldwide. The second most prevalent cause of cancer-related death in both men and women is colorectal cancer (CRC). One potential approach for reducing the severity of colon cancer is to utilize automatic segmentation and detection of colorectal polyps in colonoscopy videos. This technology can assist endoscopists in quickly identifying colorectal disease, leading to earlier intervention and better patient Quality of Life (QoL). In this article, we propose a self-supervised transformer based dual encoder-decoder architecture named P-SwinNet for polyps segmentation in colonoscopy images. The P-SwinNet adapts the dual encoder-decoder type of model to enhance the feature maps by sharing multiscale information from the encoder to the decoder. The proposed model uses multiple dilated convolutions to enlarge the field of view to gather more information without increasing the computational cost and the loss of spatial information. We also leverage a large-scale unlabeled dataset for training our model using the self-learning strategy of Barlow twins. Additionally, to capture the long-range dependencies in the data, we used a shift window-based approach that computes global attention. We extensively evaluate our model against state-of-the-art algorithms. The quantitative results show that the proposed P-SwinNet achieves a mean dice score of 0.87 and a mean Intersection over Union (IoU) of 0.82 on five datasets used in our study. This performance demonstrates a substantial advancement over existing similar works, highlighting the advantage and novelty of our proposed approach in the field of medical image segmentation. © 2020 IEEE.

Keywords: Convolution
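The abstract for this entry reports a mean Dice score and a mean Intersection over Union (IoU). For readers unfamiliar with these two overlap metrics, the sketch below computes both for a pair of flat binary masks using their standard definitions. The function name `dice_and_iou`, the list-based mask representation, and the empty-mask convention are assumptions for this example and are not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention
        # chosen for this sketch, not something stated in the abstract).
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou

# Toy masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)  # dice = 4/6, iou = 2/4
```

Dice is never smaller than IoU for the same pair of masks, which is consistent with the abstract reporting a higher Dice (0.87) than IoU (0.82).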


FG-PIH: A Fusion of Fresnelet Transform and Gradient Directional Pattern for Perceptual Image Hashing

2nd International Conference on Recent Trends in Microelectronics, Automation, Computing, and Communications Systems, ICMACC 2024

Authors: Meesala, Pavani; Thounaojam, Dalton Meitei. Computer Science and Engineering, National Institute of Technology Silchar, Computer Vision Laboratory, Silchar, India

ISBN: (print) 9798350366570

Perceptual image hashing is pivotal in various image processing applications, including image authentication, content-based image retrieval, tampered image detection, and copyright protection. This paper proposes a novel approach for perceptual image hashing by combining the Fresnelet Transform with Gradient Directional Patterns. Using the FG-PIH technique, the proposed method achieves superior robustness against common image processing attacks while maintaining perceptual similarity for near-duplicate images. Experimental results on standard benchmark datasets demonstrate the effectiveness and efficiency of the proposed Fresnelet Transform-based perceptual image hashing scheme. Furthermore, comparative analysis against state-of-the-art methods underscores the competitiveness of our approach in terms of hash quality and computational complexity. © 2024 IEEE.

Keywords: Hamming distance


AmplitudeArrow: On-the-Go AR Menu Selection Using Consecutive Simple Head Gestures and Amplitude Visualization

IEEE Transactions on Visualization and Computer Graphics, 2025, Vol. 31, No. 05, pp. 3408-3417

Authors: Tian, Yang; Zhang, Youpeng; Yan, Yukang; Zhao, Shengdong; Ma, Xiaojuan; Shi, Yuanchun. Guangxi University, Department of Computer Science, Guangxi Key Laboratory of Multimedia Communications and Network Technology, China; University of Rochester, Department of Computer Science, United States; City University of Hong Kong, School of Creative Media and the Department of Computer Science, Hong Kong; Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong; Tsinghua University, Department of Computer Science and Technology, China

Heads-up computing aims to provide synergistic digital assistance that minimally interferes with users' on-the-go daily activities. Currently, the input modalities of heads-up computing are mainly voice and finger gestures. In this work, we propose and evaluate the AmplitudeArrow (AA) technique designed for on-the-go AR menu selection to demonstrate that consecutive simple head gestures can also be an effective input modality for heads-up computing. Specifically, AA arranges menu icons into one/two row(s). To select a target icon, the user first makes their head yaw to pre-select the target icon or the column containing it and then makes their head pitch to make the arrow in the target icon expand until the arrow covers the target icon completely, i.e., the pitch amplitude surpasses the selection confirmation threshold. User studies indicated that AA demonstrated robust resistance to walking-caused head perturbation and external factors such as other people/obstacles, delivering high accuracy (error rate ...) © 1995-2012 IEEE.

Keywords: Augmented reality


Static video summarization based on genetic algorithm and deep learning approach

Multimedia Tools and Applications, 2025, Vol. 84, No. 13, pp. 12487-12512

Authors: Benoughidene, Abdelhalim; Titouna, Faiza; Boughida, Adil. Computer Science Department, LaSTIC Laboratory, University Batna 2, Batna 05000, Algeria; Computer Science Department, LabSTIC Laboratory, University of 08 May 1945 Guelma, Guelma 24000, Algeria

The development of information technology has led to the rise of big data. A large portion of this big data comes in the form of video information. The automatic analysis of this exponential growth in video content has become a popular research area. This research focuses on finding a video's keyframes through a proposed static video summarization method. The method uses a deep learning-based shot boundary detection approach as a pre-processing step and exploits DBSCAN clustering to extract keyframes. A genetic algorithm is used to optimize the hyper-parameters of DBSCAN rather than having the user pre-tune them because the number of keyframes in a video can vary depending on the content of the video. The experimental results on standard databases Open Video Project (OVP) and YouTube (YT) show that the proposed method produces better results than existing methods. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Genetic algorithms


A systematic and comprehensive review on low power wide area network: characteristics, architecture, applications and research challenges

Discover Internet of Things, 2025, Vol. 5, No. 1, pp. 1-26

Authors: Diane, Ass; Diallo, Ousmane; Ndoye, El Hadji Malick. Laboratory Department of Computer Science, Assane SECK University of Ziguinchor, Ziguinchor, Senegal

The Internet of Things (IoT) has become a rapidly growing research field. This is due to the advancement of digital technologies, miniaturization, and the reduction of the cost of IoT devices and wireless connectivity, among others. Despite the plethora of technologies used for the Internet of Things, the trade-off between long data transmission range and low power consumption was not found until the advent of Low Power Wide Area Network (LPWAN) technologies. This paper reviews the main aspects of LPWANs and their technologies based on an exhaustive search in several online scientific databases, such as Springer, IEEE Xplore, the ACM digital library, and Google Scholar. This research methodology enabled us to gather recent work on LPWANs, which forms the basis of this article. It is informative and knowledge-updating support in the LPWANs’ environment that broadly covers LPWANs. This research work has developed the characteristics of LPWANs and the techniques used to achieve long-range energy efficiency, high scalability, and low cost. In addition, it presents the application areas of LPWAN technologies with use-case network architectures for each area, addresses spectrum and energy optimization, and discusses open research challenges that need to be focused to provide guidelines for further contributions. © The Author(s) 2025.

Keywords: Applications; Architecture; IoT; LPWAN; Research challenges; Standardization


Learning continuous network emerging dynamics from scarce observations via data-adaptive stochastic processes

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 240-255

Authors: Jiaxu CUI, Qipeng WANG, Bingyi SUN, Jiming LIU, Bo YANG. College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; Public Computer Education and Research Center, Jilin University; Department of Computer Science, Hong Kong Baptist University

Learning network dynamics from the empirical structure and spatio-temporal observation data is crucial to revealing the interaction mechanisms of complex networks in a wide range of domains. However, most existing methods only aim at learning network dynamic behaviors generated by a specific ordinary differential equation instance, resulting in ineffectiveness for new ones, and generally require dense *** observed data, especially from network emerging dynamics, are usually difficult to obtain, which brings trouble to model learning. Therefore, learning accurate network dynamics with sparse, irregularly-sampled, partial, and noisy observations remains a fundamental challenge. We introduce a new concept of the stochastic skeleton and its neural implementation, i.e., neural ODE processes for network dynamics (NDP4ND), a new class of stochastic processes governed by stochastic data-adaptive network dynamics, to overcome the challenge and learn continuous network dynamics from scarce observations. Intensive experiments conducted on various network dynamics in ecological population evolution, phototaxis movement, brain activity, epidemic spreading, and real-world empirical systems, demonstrate that the proposed method has excellent data adaptability and computational efficiency, and can adapt to unseen network emerging dynamics, producing accurate interpolation and extrapolation with reducing the ratio of required observation data to only about 6% and improving the learning speed for new dynamics by three orders of magnitude.

Keywords: complex networks; network dynamics; emerging spatio-temporal dynamics; neural processes


Segmentation of Head and Neck Tumors Using Dual PET/CT Imaging: Comparative Analysis of 2D, 2.5D, and 3D Approaches Using UNet Transformer

Computer Modeling in Engineering & Sciences, 2024, Vol. 141, No. 12, pp. 2351-2373

Authors: Mohammed A. Mahdi, Shahanawaj Ahamad, Sawsan A. Saad, Alaa Dafhalla, Alawi Alqushaibi, Rizwan Qureshi. Information and Computer Science Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Software Engineering Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Computer Engineering Department, College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia; Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskandar 32610, Malaysia; Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando, FL 32816, USA

The segmentation of head and neck (H&N) tumors in dual Positron Emission Tomography/Computed Tomography (PET/CT) imaging is a critical task in medical imaging, providing essential information for diagnosis, treatment planning, and outcome *** by the need for more accurate and robust segmentation methods, this study addresses key research gaps in the application of deep learning techniques to multimodal medical ***, it investigates the limitations of existing 2D and 3D models in capturing complex tumor structures and proposes an innovative 2.5D UNet Transformer model as a *** primary research questions guiding this study are: (1) How can the integration of convolutional neural networks (CNNs) and transformer networks enhance segmentation accuracy in dual PET/CT imaging? (2) What are the comparative advantages of 2D, 2.5D, and 3D model configurations in this context? To answer these questions, we aimed to develop and evaluate advanced deep-learning models that leverage the strengths of both CNNs and *** proposed methodology involved a comprehensive preprocessing pipeline, including normalization, contrast enhancement, and resampling, followed by segmentation using 2D, 2.5D, and 3D UNet Transformer *** models were trained and tested on three diverse datasets: HeckTor2022, AutoPET2023, and *** was assessed using metrics such as Dice Similarity Coefficient, Jaccard Index, Average Surface Distance (ASD), and Relative Absolute Volume Difference (RAVD). The findings demonstrate that the 2.5D UNet Transformer model consistently outperformed the 2D and 3D models across most metrics, achieving the highest Dice and Jaccard values, indicating superior segmentation *** instance, on the HeckTor2022 dataset, the 2.5D model achieved a Dice score of 81.777 and a Jaccard index of 0.705, surpassing other model *** 3D model showed strong boundary delineation performance but exhibited variability across datasets, while the ...

Keywords: PET/CT imaging; tumor segmentation; weighted fusion transformer; multi-modal imaging; deep learning; neural networks; clinical oncology


Real-Time Multi-object Tracking Using YOLOv8 and SORT on a SoC FPGA

21st International Symposium on Applied Reconfigurable Computing, ARC 2025

Authors: Danilowicz, Michal; Kryjak, Tomasz. Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Science and Technology, Krakow, Poland

ISBN: (print) 9783031879944

Multi-object tracking (MOT) is one of the most important problems in computer vision and a key component of any vision-based perception system used in advanced autonomous mobile robotics. Therefore, its implementation on low-power and real-time embedded platforms is highly desirable. Modern MOT algorithms should be able to track objects of a given class (e.g. people or vehicles). In addition, the number of objects to be tracked is not known in advance, and they may appear and disappear at any time, as well as be obscured. For these reasons, the most popular and successful approaches have recently been based on the tracking paradigm. Therefore, the presence of a high quality object detector is essential, which in practice accounts for the vast majority of the computational and memory complexity of the whole MOT system. In this paper, we propose an FPGA (Field-Programmable Gate Array) implementation of an embedded MOT system based on a quantized YOLOv8 detector and the SORT (Simple Online Realtime Tracker) tracker. We use a modified version of the FINN framework to utilize external memory for model parameters and to support operations required by YOLOv8. We discuss the evaluation of detection and tracking performance using the COCO and MOT15 datasets, where we achieve 0.21 mAP and 38.9 MOTA respectively. As the computational platform, we use an MPSoC system (Zynq UltraScale+ device from AMD/Xilinx) where the detector is deployed in reprogrammable logic and the tracking algorithm is implemented in the processor system. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: System-on-chip
