A pyramidal multi-scale encoder-decoder network, PMED-Net, is proposed for medical image segmentation. Various encoder-decoder variants are in use for segmenting medical images, with U-Net the most widely adopted. However, existing architectures have millions of parameters and require enormous computation, making them memory- and cost-inefficient. To overcome these limitations, we train small networks in cascaded form for coarse-to-fine prediction. The proposed adaptive network extends up to six pyramid levels, and at each level features are extracted at a different scale of the input image. Each lightweight encoder-decoder network is trained independently to minimize its loss, and each succeeding network further refines the previous predictions. The architecture was evaluated and compared on four publicly available medical image segmentation datasets: the International Skin Imaging Collaboration (ISIC) 2018 challenge dataset, a brain tumor dataset, a nuclei dataset, and an X-ray dataset. PMED-Net's results are better than or on par with other state-of-the-art networks in terms of IoU, F1-score, and sensitivity. Moreover, PMED-Net is parameter-efficient: it has 21.3, 21.1, 14.0, 11.6, 11.2, 6.64, and 4.95 times fewer parameters than SegNet, U-Net, BCDU-Net, CU-Net, FCN-8s, ORED-Net, and MultiResUNet, respectively. Pre-trained models, dataset information, and implementation details are available at https://***/kabbas570/Pyramid-Based-encoder-decoder.
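A minimal sketch of the cascaded coarse-to-fine idea described above (not the authors' released code): each pyramid level is a small encoder-decoder that receives a downscaled input together with the upsampled prediction of the previous level. Channel widths, depths, and the three-level setup are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoderDecoder(nn.Module):
    """One lightweight pyramid-level network (illustrative widths)."""
    def __init__(self, in_ch, width=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(width, width * 2, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(width * 2, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 1),  # single-channel mask logits
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class CascadedPyramid(nn.Module):
    """Coarse-to-fine cascade: level k refines the upsampled prediction
    of level k-1 at a higher input resolution."""
    def __init__(self, levels=3):
        super().__init__()
        # first level sees only the image; later levels also see the prior mask
        self.nets = nn.ModuleList(
            [TinyEncoderDecoder(3)] + [TinyEncoderDecoder(4) for _ in range(levels - 1)]
        )

    def forward(self, image):
        preds, prev, n = [], None, len(self.nets)
        for k, net in enumerate(self.nets):
            scale = 2 ** (n - 1 - k)          # coarsest level first
            x = F.avg_pool2d(image, scale) if scale > 1 else image
            if prev is not None:
                prev = F.interpolate(prev, size=x.shape[-2:], mode="bilinear",
                                     align_corners=False)
                x = torch.cat([x, prev], dim=1)
            prev = net(x)
            preds.append(prev)  # each level is supervised independently
        return preds
```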
ISBN (print): 9798400704369
The automatic generation of radiological imaging reports aims to produce accurate and coherent clinical descriptions from X-ray images, easing the arduous task of report writing for clinicians and advancing clinical automation. The primary challenge lies in accurately capturing and describing abnormal regions in the images under data-bias conditions, while still generating long texts that cover image details. Existing methods mostly rely on prior knowledge such as medical knowledge graphs, corpora, and image databases to help models generate more precise descriptions, yet they still struggle to identify rare anomalies. To address this, we propose a two-stage training model, CLR2G, based on cross-modal contrastive learning. It delegates the capture of anomalies, particularly those that a generative model trained with cross-entropy loss misses under data bias, to a specialized abnormality-capture component. Specifically, we employ a semantic matching loss to train additional abnormal-image and text encoders through cross-modal contrastive learning, enabling the capture of 13 common anomalies. The anomalous image features, text features, and their confidence probabilities serve as posterior knowledge that helps the model generate accurate reports. Experimental results demonstrate state-of-the-art performance on two widely used public datasets, IU-Xray and MIMIC-CXR.
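One plausible form of the cross-modal contrastive training mentioned above is a symmetric InfoNCE objective over paired image/text embeddings; this is a generic stand-in, not necessarily the paper's exact semantic matching loss, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.
    img_emb, txt_emb: (B, D) outputs of the abnormality image/text encoders.
    Illustrative stand-in for the paper's semantic matching loss."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)       # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text  -> matching image
    return 0.5 * (loss_i2t + loss_t2i)
```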
An accurate and timely cracking assessment, covering the presence, location, and geometric features of cracks, is crucial for evaluating concrete wind towers; early crack identification is therefore a critical step in promptly assessing structural integrity. This study proposes an ad-hoc encoder-decoder network based on DeepLabv3+ with depthwise separable convolutions to automatically segment cracks in real-world images captured from various concrete wind towers. The combined advantages of the improved DeepLabv3+ and the lightweight MobileNetV2 make the pairing a suitable benchmark, given its high performance and generality. Four experiments were conducted to determine the model design and assess crack-measurement capability: (1) six parametric tests with various pre-trained base networks and optimizers, (2) the influence of complex background noise (e.g., handwritten script) on crack segmentation performance, (3) comparative studies against cutting-edge pixel-wise segmentation models, and (4) crack feature measurement (length and width). The results demonstrate that DeepLabv3+ with MobileNetV2 can be applied for efficient and accurate crack segmentation in concrete wind towers with complex backgrounds.
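The efficiency ingredient shared by DeepLabv3+ and MobileNetV2 in this abstract is the depthwise separable convolution: a per-channel spatial convolution followed by a 1x1 pointwise convolution. A minimal sketch (normalization and activation choices are illustrative):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution as used in MobileNetV2/DeepLabv3+:
    a per-channel (depthwise) KxK conv followed by a 1x1 (pointwise) conv,
    cutting parameters and FLOPs versus a standard KxK convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, dilation=1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=pad, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```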
Visual question answering (VQA) has recently been attracting attention in remote sensing. However, the proposed solutions remain limited in that existing VQA datasets address closed-ended question-answer queries, which do not necessarily reflect real open-ended scenarios. In this paper, we propose a new dataset, VQA-TextRS, built manually with human annotations and covering various forms of open-ended question-answer pairs. Moreover, we propose an encoder-decoder architecture built on transformers, whose self-attention enables relational learning across positions of the same sequence without the recurrence operations of typical sequence models. We employ vision and natural language processing (NLP) transformers, respectively, to draw visual and textual cues from the image and the corresponding question. A transformer decoder then applies cross-attention to fuse the two modalities; the fused vectors drive the answer-generation process to produce the final output. We demonstrate that plausible results can be obtained in open-ended VQA: for instance, the proposed architecture scores 84.01% accuracy on questions about the presence of objects in the query images.
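A minimal sketch of the fusion stage just described: visual patch tokens serve as decoder memory while the embedded question tokens pass through a standard transformer decoder, whose cross-attention fuses the two modalities. All dimensions and the vocabulary size are assumptions, and the causal mask a real answer generator would need is omitted for brevity.

```python
import torch
import torch.nn as nn

class CrossModalFusionDecoder(nn.Module):
    """Transformer decoder whose cross-attention fuses visual memory
    with textual queries; layer counts/dims are illustrative."""
    def __init__(self, d_model=512, nhead=8, num_layers=4, vocab_size=10000):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)  # answer-token logits

    def forward(self, text_tokens, visual_tokens):
        # text_tokens:   (B, T, d_model) embedded question/answer tokens
        # visual_tokens: (B, N, d_model) patch features from the vision encoder
        fused = self.decoder(tgt=text_tokens, memory=visual_tokens)
        return self.out(fused)
```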
Many image-to-image computer vision approaches have made great progress with end-to-end encoder-decoder frameworks. However, eye fixation prediction, though also an image-to-image task, differs from those tasks in that it focuses on salient regions rather than precise predictions for every pixel, so directly applying an end-to-end encoder-decoder to eye fixation prediction is inappropriate. In addition, although high-level features are important, the contribution of low-level features should also be kept and balanced in a computational model; yet low-level features that attract attention are easily lost while passing through a deep network. Effectively integrating low-level and high-level features to improve eye fixation prediction therefore remains challenging. In this paper, a coarse-to-fine network (CFN) comprising two pathways with different training strategies is proposed: the coarse perceiving network (CFN-Coarse), which can be a simple encoder or any existing pretrained network, captures the distribution of salient regions and generates high-quality feature maps; the fine integrating network (CFN-Fine) freezes the parameters from CFN-Coarse and combines features from deep to shallow in the deconvolution process, adding skip connections between the down-sampling and up-sampling paths to efficiently integrate deep and shallow features. The resulting saliency maps are evaluated on six standard benchmark datasets: SALICON, MIT1003, MIT300, Toronto, OSIE, and SUN500. The results demonstrate that the method surpasses the state-of-the-art accuracy of eye fixation prediction and achieves competitive performance to date under most evaluation metrics on the SALICON Saliency Prediction Challenge (LSUN2017).
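A minimal sketch of the CFN-Fine idea: a decoder that upsamples deep features and fuses them with shallower (frozen) encoder features through skip connections. The encoder feature shapes and channel widths are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipDecoder(nn.Module):
    """Decoder fusing deep-to-shallow encoder features via skip
    connections, in the spirit of CFN-Fine; widths are illustrative."""
    def __init__(self, enc_channels=(64, 128, 256)):
        super().__init__()
        c1, c2, c3 = enc_channels
        self.up2 = nn.Conv2d(c3 + c2, c2, 3, padding=1)
        self.up1 = nn.Conv2d(c2 + c1, c1, 3, padding=1)
        self.head = nn.Conv2d(c1, 1, 1)   # single-channel saliency map

    def forward(self, f1, f2, f3):
        # f1, f2, f3: shallow -> deep encoder features (frozen in CFN-Fine)
        x = F.interpolate(f3, size=f2.shape[-2:], mode="bilinear", align_corners=False)
        x = F.relu(self.up2(torch.cat([x, f2], dim=1)))
        x = F.interpolate(x, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        x = F.relu(self.up1(torch.cat([x, f1], dim=1)))
        return torch.sigmoid(self.head(x))
```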
Current SLAM (simultaneous localization and mapping) systems based on monocular cameras cannot directly obtain depth information; most suffer from scale uncertainty and require initialization. In application scenarios that require navigation and obstacle avoidance, the inability to build dense maps is a further drawback of monocular SLAM. In response, this paper proposes a monocular SLAM system that learns depth estimation with a DenseNet-based convolutional network. We use an encoder-decoder architecture based on transfer learning and convolutional neural networks to estimate depth from monocular RGB images. Combining front-end ORB feature extraction with back-end direct RGB-D bundle adjustment optimization, the system obtains accurate camera poses and achieves dense indoor mapping from the estimated depth. Experimental results show that the monocular depth estimation model achieves good results and is competitive with current popular methods. On this basis, the camera pose estimation error is also smaller than that of traditional monocular SLAM solutions, and the system can complete dense indoor reconstruction, forming a complete SLAM pipeline based on a monocular camera.
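A hedged sketch of the transfer-learning depth estimator: a pretrained DenseNet backbone as the encoder and a simple upsampling decoder regressing per-pixel depth. The DenseNet-169 choice and the decoder layout are assumptions (requires torchvision >= 0.13 for the weights API).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class DepthNet(nn.Module):
    """Encoder-decoder depth estimator: a pretrained DenseNet encoder
    (transfer learning) plus an illustrative upsampling decoder."""
    def __init__(self):
        super().__init__()
        backbone = models.densenet169(weights=models.DenseNet169_Weights.DEFAULT)
        self.encoder = backbone.features          # (B, 1664, H/32, W/32)
        self.decoder = nn.Sequential(
            nn.Conv2d(1664, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 1, 3, padding=1),       # per-pixel depth prediction
        )

    def forward(self, rgb):
        depth = self.decoder(self.encoder(rgb))
        # upsample back to input resolution for dense mapping
        return F.interpolate(depth, size=rgb.shape[-2:], mode="bilinear",
                             align_corners=False)
```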
Images captured in low-brightness environments often suffer poor visibility and exhibit artifacts such as low brightness, low contrast, and color distortion. These artifacts not only degrade human visual perception but also reduce the performance of computer vision algorithms. Existing deep learning-based enhancement studies are quite slow and usually require substantial hardware, while lightweight approaches do not match the performance of state-of-the-art methods. We therefore propose LiCENt (Light Channel Enhancement Network), a fast and lightweight deep learning algorithm that enhances low-light images using the lightness channel of Hue Saturation Lightness (HSL) color space. LiCENt combines an autoencoder and a convolutional neural network (CNN) in a unified framework that first improves illumination and then recovers the details of the low-light image. Operating on the single lightness channel 'L' of HSL, instead of the traditional RGB channels, reduces the number of learnable parameters by a factor of up to 8.92. LiCENt also benefits from Brilliance Perception Adjustment, which lets the model avoid over-enhancement and color distortion. Experimental results demonstrate that the approach generalizes well to synthetic and natural low-light images and outperforms other methods on qualitative and quantitative metrics.
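A minimal sketch of the single-channel idea: convert RGB to HSL/HLS, enhance only the lightness channel, and recombine with the untouched hue and saturation. The `enhancer` callable is a placeholder for the trained autoencoder+CNN; the gamma-curve stand-in in the usage line is purely illustrative.

```python
import cv2
import numpy as np

def enhance_l_channel(rgb_image, enhancer):
    """Enhance only the lightness channel of HLS space, keeping hue and
    saturation intact, then convert back to RGB. `enhancer` stands in for
    LiCENt's trained network: any callable mapping a float L channel in
    [0, 1] to an enhanced L channel in [0, 1]."""
    hls = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HLS).astype(np.float32)
    L = hls[..., 1] / 255.0                  # OpenCV HLS: channel 1 is lightness
    hls[..., 1] = np.clip(enhancer(L), 0.0, 1.0) * 255.0
    return cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2RGB)

# usage with a trivial stand-in enhancer (gamma curve instead of the network):
# out = enhance_l_channel(img, lambda L: np.power(L, 0.6))
```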
Dehazing refers to removing haze and restoring detail in hazy images. In this paper, we propose ClarifyNet, a novel, end-to-end trainable convolutional neural network for single-image dehazing. We note that a high-pass filter detects sharp edges, texture, and other fine details, whereas a low-pass filter captures color and contrast information. Based on this observation, our key idea is to train ClarifyNet on ground-truth haze-free images together with their low-pass and high-pass filtered versions, using a shared-encoder, multi-decoder model with interconnected parallelization. During training, the haze-free, low-pass, and high-pass images undergo multi-stage filter fusion and attention, and a weighted loss composed of SSIM loss and L1 loss extracts and propagates complementary features. We comprehensively evaluate ClarifyNet on the I-HAZE, O-HAZE, Dense-Haze, NH-HAZE, SOTS-Indoor, SOTS-Outdoor, HSTS, and Middlebury datasets using PSNR and SSIM, comparing against previous works; on most datasets ClarifyNet achieves the highest scores. With EfficientNet-B6 as the backbone, ClarifyNet has 18M parameters (a model size of ~71 MB) and a throughput of 8 frames per second on images of size 2048 x 1024.
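A hedged sketch of the weighted loss named above: a convex combination of (1 - SSIM) and L1, with SSIM computed from average-pooled local statistics. The weighting (alpha) and the 3x3 window are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM with 3x3 average-pooled statistics (inputs in [0, 1])."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).mean()

def dehaze_loss(pred, target, alpha=0.85):
    """Weighted combination of SSIM loss and L1 loss; alpha is illustrative."""
    return alpha * (1.0 - ssim(pred, target)) + (1.0 - alpha) * F.l1_loss(pred, target)
```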
Existing skin attribute detection methods usually initialize with a network pre-trained on ImageNet and then fine-tune on the medical target task. However, we argue that such approaches are suboptimal because medical datasets differ greatly from ImageNet and often contain limited training samples. In this work, we propose Task Agnostic Transfer Learning (TATL), a novel framework motivated by dermatologists' behavior in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers, one per attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows knowledge transfer among features, and compensates for the scarcity of training data for rare attributes. We conduct extensive experiments evaluating the proposed transfer learning mechanism with various neural network architectures on two popular skin attribute detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also achieves state-of-the-art performance, while keeping model and computational complexity minimal. We also provide theoretical insights into why our transfer learning framework performs well in practice.
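A minimal sketch of the transfer step described above: the attribute-agnostic segmenter's encoder weights initialize each attribute-specific model, while the decoder is re-initialized per attribute. The `.encoder`/`.decoder` attribute names and the attribute list are hypothetical, not TATL's actual interface.

```python
import copy
import torch.nn as nn

def build_attribute_models(agnostic_segmenter: nn.Module, attributes):
    """Initialize one attribute-specific model per skin attribute from the
    attribute-agnostic segmenter (illustrative TATL-style transfer).
    Assumes the segmenter exposes .encoder and .decoder submodules."""
    models = {}
    for name in attributes:
        model = copy.deepcopy(agnostic_segmenter)
        # keep the transferred encoder; re-initialize the decoder so each
        # attribute-specific head is trained for its own target
        for m in model.decoder.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
        models[name] = model
    return models

# usage sketch (attribute names are hypothetical):
# per_attr = build_attribute_models(segmenter, ["pigment_network", "streaks"])
```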
As the integration density and design intricacy of semiconductor wafers increase, so do the magnitude and complexity of their defects. Since manual inspection of wafer defects is costly, an automated artificial intelligence (AI) based computer-vision approach is highly desirable. Previous work on defect analysis has several limitations, such as low accuracy and the need for separate models for classification and segmentation; for mixed-type defects, some prior approaches require training one model per defect type, which does not scale. In this paper, we present WaferSegClassNet (WSCN), a novel network based on an encoder-decoder architecture. WSCN performs simultaneous classification and segmentation of both single and mixed-type wafer defects, using a shared encoder for both tasks, which allows training WSCN end-to-end. We use an N-pair contrastive loss to first pretrain the encoder, then a BCE-Dice loss for segmentation and a categorical cross-entropy loss for classification; the N-pair contrastive loss yields a better embedding of wafer maps in the latent space. WSCN has a model size of only 0.51 MB and needs only 0.2M FLOPs, making it much lighter than other state-of-the-art models, and it converges in only 150 epochs, compared to the 4,000 epochs needed by a previous work. We evaluate the model on the MixedWM38 dataset of 38,015 images, achieving an average classification accuracy of 98.2% and a Dice coefficient of 0.9999. We are the first to report segmentation results on MixedWM38. The source code is available at https://***/ckmvigil/WaferSegClassNet.
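A minimal sketch of the shared-encoder design with its two heads and combined losses; channel widths, head layouts, and the loss weighting are illustrative assumptions, not WSCN's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderNet(nn.Module):
    """Shared encoder feeding both a segmentation decoder and a
    classification head (illustrative stand-in for WSCN)."""
    def __init__(self, num_classes=38):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.seg_head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 1),                    # defect-mask logits
        )
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x):
        z = self.encoder(x)                         # shared representation
        return self.seg_head(z), self.cls_head(z)

def bce_dice_loss(logits, mask, eps=1.0):
    """BCE + soft-Dice loss for the segmentation branch."""
    bce = F.binary_cross_entropy_with_logits(logits, mask)
    p = torch.sigmoid(logits)
    dice = (2 * (p * mask).sum() + eps) / (p.sum() + mask.sum() + eps)
    return bce + (1.0 - dice)

# total loss per batch (after contrastive pretraining of the encoder):
# loss = bce_dice_loss(seg_logits, mask) + F.cross_entropy(cls_logits, label)
```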