Super-resolution has advanced significantly in the last 20 years, particularly with the application of deep learning methods. One of the most important image processing methods for boosting an image's resolut...
Transformers based on the self-attention mechanism have achieved remarkable results in natural language processing, which has inspired research on applying Transformers to computer vision. Current deep hashing algorithms extract image features with convolutional neural networks (CNNs). A CNN concentrates on local information, so its features lack global dependency information, which reduces image retrieval accuracy. To remedy these defects, this paper proposes deep internally connected Transformer hashing for image retrieval (DICTH). DICTH introduces an improved Transformer block, the internally connected Transformer block (ICT). ICT applies an embedding transformation to the feature maps, concatenates the generated Keys and Queries to explore the rich context information between Query-Key pairs, and then dynamically encodes them through multi-layer convolution to learn the contextual multi-head self-attention matrix. Combining ICT with ResNet18 injects self-attention into the network, establishing long-distance dependencies in the feature space; this compensates for the shortcomings of a pure CNN during feature extraction and guides the algorithm toward more accurate hash codes. In addition, to handle the complex label information in large datasets, the paper uses an improved cross-entropy loss, the T-cross-entropy loss, which encourages the network to learn hash codes that discriminate better between classes. Extensive experiments on the CIFAR10, NUS-WIDE and MS-COCO datasets verify the performance of DICTH. (c) 2023 Elsevier B.V. All rights reserved.
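For orientation, the retrieval step that deep hashing methods of this kind rely on can be sketched as below. This is not the DICTH implementation: the function names are hypothetical, thresholding at zero is an assumed binarization rule, and the abstract does not specify either. The sketch only shows how real-valued network outputs might be turned into binary hash codes and how database items could then be ranked by Hamming distance.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Turn real-valued outputs into a binary code (1 if value >= 0, else 0).

    Thresholding at zero is an assumption made for this sketch.
    """
    return [1 if v >= 0 else 0 for v in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))
```

Because the codes are short binary vectors, comparing a query against a large database reduces to counting differing bits, which is the usual motivation for hashing-based image retrieval.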
Rapid advancements in image and video processing technologies are poised to create remarkable impacts on a wide range of industries. A significant challenge in these processing technologies resides in identifying the ...
ISBN: (print) 9798350364309; 9798350364293
Vehicle positioning algorithms are essential for improving traffic management and safety: by accurately locating vehicles in real time, they help minimize congestion and accidents. They also support the development of advanced driver assistance systems and autonomous vehicles, which rely on precise positioning data for safe navigation. One solution uses image processing algorithms, which can follow two approaches. The first approach is decentralized: each vehicle performs its own computing steps and determines its position relative to other nearby vehicles. The second approach, proposed in this paper, is centralized: each vehicle sends data to a server that uses cloud computing to process all the data in real time. With either approach, vehicles can build a more comprehensive view of the driving conditions in the area, which can help them anticipate potential hazards and make more informed decisions.
Enhancement of low-light images is a low-level visual task aimed at improving the quality of images captured under low-light conditions. In this study, a low-light image enhancement algorithm is proposed by compensati...
Detection of corrosion in moving objects like ships is challenging due to the dynamic nature of the input image. Existing machine learning techniques are suitable for static images and the algorithms suffer in perform...
The diagnosis of a range of eye disorders needs to categorize the retinal vessels. Computerized implementation of this process is becoming increasingly essential for automated screening systems for retinal diseases. T...
The 'Interactive Sign Language Learning System' is a sophisticated application designed to facilitate the learning process of sign language learners. This comprehensive system encompasses several key features,...
Recent technologies have made object detection and tracking remarkably capable. Today, video surveillance is present in most places. As the population increases, crime rates also keep rising. Identifying crimes, handling crime scenes and tracking down criminals are major tasks that require considerable manpower. A methodology for detecting and preventing crime would be very helpful to the public and to the authorities. The main objective of the proposed system is therefore to automate crime identification and crime tracking by applying deep learning algorithms to video surveillance, and to alert the public before a crime takes place in public areas. With a timely alert, further criminal activity may be avoided or the resulting loss reduced. The system also helps identify criminals and track information about crime scenes, providing the authorities with more useful information for solving cases. According to recent research, the YOLO algorithm achieves higher accuracy in multi-object detection. This paper presents a framework built on an improved YOLO algorithm. The algorithm is fine-tuned with different hyperparameters to achieve an AUC of 0.91 for detecting vandalism behavior and 0.8299 across all 14 classes of crime activity. The results of the proposed system are compared with existing systems on metrics such as training loss, testing loss, precision and F1 score.
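As a reminder of what the reported AUC figures measure, the sketch below computes the area under the ROC curve for a single class using the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. It is a generic illustration, not the paper's evaluation code; the function name and any labels or scores passed to it are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. a clip showing the target
            activity), 0 otherwise.
    scores: the detector's confidence for the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every positive example outranks every negative one, and 0.5 corresponds to random ranking. For a multi-class setting such as the 14 activity classes mentioned above, one common option is to compute this per class in a one-vs-rest fashion and average the results, although the abstract does not state which averaging was used.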
Visual Question Answering (VQA) lies at the crossroads of computer vision, natural language processing, and deep learning, captivating researchers across various AI domains. This dynamic field involves processing an i...