AI analytics enables autonomous cars to detect and recognize objects, such as other vehicles, pedestrians, traffic signs, and obstacles, in real time. Deep learning models, notably the You Only Look Once (YOLO) model,...
Author:
Ramkumar, G.
Saveetha University, Saveetha School of Engineering, Department of ECE, Chennai, India
Existing health care conditions present several needs and immense challenges that necessitate the development of a new model for blood pressure prediction in an IoT-enabled environment. Many traditional...
CNNs have traditionally been applied in computer vision. Recently, applying Transformer networks, originally a technique in natural language processing, to computer vision has received much attention and produced superior results. However, Transformers and their derivatives have the drawback that computational cost and memory usage increase rapidly with image resolution. In this paper, we propose the Laplacian Pyramid Translation Transformer (LPTT) for image-to-image translation. The Laplacian Pyramid Translation Network, a previous study on which this work builds, creates a Laplacian pyramid of the input images and processes each component with CNNs. In contrast, LPTT transforms the high-frequency components with CNNs and the low-frequency components with Axial Transformer blocks. LPTT can thus retain the Transformer's expressive power while reducing computational cost and memory usage. LPTT significantly improves the quality of generated images and the inference speed for high-resolution images over conventional methods. LPTT is the first network with a Transformer that can perform practical inference in real time on 4K resolution images. LPTT can also process 8K images in real time, depending on the model conditions and the performance of the GPU. The ablation study in this paper suggests that, even when processing high-resolution images, computing the low-resolution component with a Transformer improves performance while maintaining inference speed. LPTT improves the PSNR value by 0.41 dB on the MIT-Adobe FiveK dataset. The greater the number of layers in the Laplacian pyramid, the greater the improvement of LPTT over the Laplacian Pyramid Translation Network.
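The Laplacian pyramid decomposition that the abstract relies on can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, 2x2 average pooling and nearest-neighbor upsampling stand in for whatever filters the paper uses, and the image sides are assumed to be divisible by 2 to the power of the number of levels. It only shows how an image splits into full-resolution high-frequency residuals plus one small low-frequency component, which is the part LPTT is said to route through Transformer blocks.

```python
import numpy as np

def downsample(img):
    """Halve resolution with 2x2 average pooling (an assumed stand-in filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbor repetition (an assumed stand-in filter)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return (high-frequency residuals, finest first; low-frequency component)."""
    highs = []
    current = img.astype(np.float64)
    for _ in range(levels):
        low = downsample(current)
        # The residual holds the detail lost by going down one level.
        highs.append(current - upsample(low))
        current = low
    return highs, current

def reconstruct(highs, low):
    """Invert the decomposition by adding residuals back, coarsest first."""
    current = low
    for high in reversed(highs):
        current = upsample(current) + high
    return current

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # toy single-channel image
highs, low = build_laplacian_pyramid(image, levels=3)
restored = reconstruct(highs, low)
```

With three levels, the low-frequency component of a 64x64 input is only 8x8, so an expensive operator applied there touches 64 times fewer pixels than the original image. This is consistent with the abstract's claim that gains grow with the number of pyramid layers.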
ISBN:
(print) 9781510650817; 9781510650800
This paper assesses the efficacy of self-supervised learning on the deepDR Diabetic Retinopathy image Dataset (deepDRiD). Recently, self-supervised learning has achieved great success in the field of computer vision. In particular, self-supervised learning can effectively serve the field of medical imaging, where labeled data is usually limited. In this paper, we apply the Bootstrap Your Own Latent (BYOL) approach to grading diabetic retinopathy, the task that scores the lowest in the MedMNIST dataset. Using a model pre-trained with BYOL, we evaluate the efficacy of the BYOL approach on deepDRiD following fine-tuning protocols. Further, we compare the performance of this model with that of a model trained from scratch and demonstrate the effectiveness of BYOL on deepDRiD. Our experiment shows that BYOL can boost the performance of grading diabetic retinopathy.
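The abstract does not spell out how BYOL works. As a rough illustration, the sketch below shows its two core ingredients as generally described for the method: a loss equal to 2 minus twice the cosine similarity between the online network's prediction and the target network's projection, and a target network updated as an exponential moving average of the online weights. The function names, the `tau` default, and the toy arrays are assumptions for illustration only, not the configuration used in this paper.

```python
import numpy as np

def byol_loss(online_pred, target_proj):
    """Mean of 2 - 2 * cosine similarity over a batch of row vectors."""
    p = online_pred / np.linalg.norm(online_pred, axis=1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=1, keepdims=True)
    return float((2.0 - 2.0 * (p * z).sum(axis=1)).mean())

def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter toward its online counterpart (tau is assumed)."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# Toy embeddings for two augmented views of the same image (hypothetical values).
view_a = np.array([[1.0, 0.0], [0.0, 2.0]])
view_b = np.array([[3.0, 0.0], [0.0, 1.0]])
aligned_loss = byol_loss(view_a, view_b)    # same directions
opposed_loss = byol_loss(view_a, -view_b)   # opposite directions

new_target = ema_update([np.array([0.0, 2.0])], [np.array([2.0, 0.0])], tau=0.5)
```

In a real training loop the loss is typically symmetrized over the two views and gradients flow only through the online network. After pre-training, the encoder is fine-tuned on labeled data, which matches the fine-tuning protocol the abstract mentions.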
This paper introduces the Adaptive Resource Optimization Network (ARON), a novel AI-driven framework for strategic resource allocation and risk management in enterprise environments. ARON integrates deep reinforcement...
Target tracking technology is of great significance in football game video, and is the basis of high-level semantic tasks such as video summary generation, player motion analysis, game strategy formulation and footbal...
Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, i...
Today's leading cause of death worldwide is cardiovascular disease, which has risen to the top of the list of diseases in terms of diagnostic difficulty. Cardiovascular disease is more likely to occur in a person ...
Many deep learning-based image super-resolution models exist to effectively up-sample images, with the most notable and reliable architectures being Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), an...
The implementation of deep learning-based fault diagnosis methodologies has been increasingly observed across diverse sectors within the power industry. This is particularly relevant in contexts where power stations g...