In structural health monitoring applications, measured data may be temporarily or permanently lost due to sensor faults or transmission failures. Data with a high loss ratio are of limited use for modal identification and structural condition evaluation. To reconstruct lost data in the field of structural health monitoring, this study proposes a deep convolutional generative adversarial network consisting of a generator with an encoder-decoder structure and an adversarial discriminator. The proposed model must both understand the content of the complete signals and produce realistic hypotheses for the lost signals. Given data stably measured before the occurrence of data loss, the generator is trained to extract the features contained in the data set and reconstruct the lost signals using the responses of the remaining functional sensors alone. The discriminator feeds its classification results back to the generator to improve reconstruction accuracy. During training, a reconstruction loss and an adversarial loss are combined to better handle the low-frequency and high-frequency features of the signals, respectively. The effectiveness and efficiency of the proposed method are validated in two case studies. As the number of training epochs increases, the reconstructed signals capture features from low frequency to high frequency, and their amplitude gradually increases. The final reconstructed signals match the real signals well in both the time and frequency domains. To further demonstrate the applicability of the reconstructed signals in data analysis, the reconstructed acceleration data are used to accurately identify the modal parameters in the numerical case, and vehicle-induced responses are precisely decomposed from the reconstructed strain data in the field case. Finally, the reconstruction capacity of the proposed method is also investigated.
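The training objective described above can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch version of an encoder-decoder generator over multi-channel sensor signals and the combined loss; the layer sizes, kernel widths, and the weighting `lam` are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the generator and its training loss: a
# reconstruction term handles low-frequency content, while an adversarial
# term from the discriminator sharpens high-frequency detail.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, channels=8):
        super().__init__()
        # Encoder-decoder over multi-channel sensor signals (batch, channels, time)
        self.encode = nn.Sequential(
            nn.Conv1d(channels, 32, 15, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(32, 64, 15, stride=2, padding=7), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose1d(64, 32, 15, stride=2, padding=7, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, channels, 15, stride=2, padding=7, output_padding=1),
        )

    def forward(self, observed):
        return self.decode(self.encode(observed))

def generator_loss(reconstructed, target, disc_logits, lam=0.999):
    rec = nn.functional.mse_loss(reconstructed, target)     # low-frequency fit
    adv = nn.functional.binary_cross_entropy_with_logits(   # fool the discriminator
        disc_logits, torch.ones_like(disc_logits))
    return lam * rec + (1.0 - lam) * adv
```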
Currently, non-decision-level image fusion algorithms require extremely high registration precision of the images to be fused. When images are taken from different perspectives, traditional feature-based registration algorithms and learning-based methods show poor robustness and are unsuitable for large image differences because registration and fusion are performed separately. In addition, the lack of relevant datasets hinders the development of different-perspective image fusion methods. To address these problems, we collect a dataset of 5000 RGB-MONO image sets captured from different perspectives across multiple scenes to provide raw data support. We present an end-to-end learned system for fusing two photographs taken from different perspectives into a chosen target view. Cascaded feature extraction based on an encoder-decoder structure enables optical flow to be learned systematically at different feature levels. The optical flow module then allows the images to be continuously registered and optimized during the fusion process, avoiding the deviations introduced by non-end-to-end algorithms. Extensive quantitative and qualitative experiments demonstrate that the proposed system effectively fuses images from different perspectives on our self-built dataset. Compared with non-end-to-end fusion, our method achieves superior performance on several fusion evaluation indicators.
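The in-network registration rests on warping one view toward the other with a learned flow field. The sketch below shows this standard warping operation in PyTorch under assumed shapes; the function name `warp_to_target` and the pixel-unit flow convention are illustrative, not taken from the paper.

```python
# Warping a source image or feature map with a dense optical flow field,
# the operation that lets registration be refined inside the fusion
# network rather than as a separate pre-processing step.
import torch
import torch.nn.functional as F

def warp_to_target(src, flow):
    """Warp `src` (B, C, H, W) toward the target view using a dense
    optical flow field `flow` (B, 2, H, W) in pixel units (x, y)."""
    b, _, h, w = src.shape
    # Base sampling grid of pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(src.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                           # shifted coordinates
    # Normalize to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)     # (B, H, W, 2)
    return F.grid_sample(src, sample_grid, align_corners=True)
```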
Numerous people die from lung cancer every year, making it a serious public health issue. Oftentimes, the symptoms of lung cancer manifest only at a later stage, when it is difficult to treat. Pulmonary nodules are commonly found when screening the lungs with a Computed Tomography (CT) scan, and some of these nodules may be cancerous. An efficient automated pulmonary nodule segmentation system is therefore needed to isolate the nodules in the scan images. Doctors can then track the nodules that are likely to be malignant and provide early treatment if they become cancerous, improving the patient's chance of survival. The attention mechanism is a technique often used in computer vision to enhance a neural network's performance. LA-ResUNet, a pulmonary nodule segmentation model built on ResUNet with a linear attention mechanism and the Leaky ReLU activation function, is proposed. LA-ResUNet segments pulmonary nodules efficiently while its attention achieves linear time and space complexity. Residual blocks make it possible to construct a deep network without facing the vanishing gradient problem, and they also simplify training. Skip connections allow better gradient flow during training and better information flow between layers. Leaky ReLU addresses the dying ReLU scenario, in which some neurons cease to learn during training. Applied to the LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) dataset, LA-ResUNet achieved a Dice similarity coefficient (DSC) of 73.11% and an Intersection over Union (IoU) score of 60.62%.
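The linear-complexity claim comes from reordering the attention computation. The sketch below shows the general linear attention trick in PyTorch: applying a kernel feature map to queries and keys separately lets the keys-values product be computed first, giving O(N) rather than O(N^2) cost in the number of positions N. The elu+1 feature map is a common choice and an assumption here, not necessarily the paper's exact formulation.

```python
# Linear attention: compute (keys^T @ values) before multiplying by the
# queries, so cost grows linearly with the number of spatial positions.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, heads, positions, dim)."""
    q = F.elu(q) + 1.0          # positive kernel feature map
    k = F.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", k, v)                  # (B, H, D, D): O(N)
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)        # normalised output
```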
Fine-grained image captioning is a focal point of the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective attribute prediction and utilization play a crucial role in enhancing image captioning performance. Despite progress, prior attribute-related methods either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step of the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, which first predicts linguistic context-related attributes and then uses prior probabilities from the IAP module to rebalance image- and linguistic-context-related attributes, generating more robust and enhanced attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in our proposed image captioning with the enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K under cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.
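One plausible reading of the rebalancing step, sketched below for concreteness: the IAP's image-level attribute probabilities act as a prior that is mixed with the EAP's step-wise linguistic attribute probabilities. The gating form and the learned per-attribute weight `alpha` are assumptions, since the abstract does not give the exact formula.

```python
# Hypothetical rebalancing of visual-prior and linguistic attribute
# probabilities with a learned per-attribute mixing weight.
import torch
import torch.nn as nn

class AttributeRebalancer(nn.Module):
    def __init__(self, num_attributes):
        super().__init__()
        # Learned mixing weight per attribute (a modelling assumption)
        self.alpha = nn.Parameter(torch.zeros(num_attributes))

    def forward(self, p_visual, p_linguistic):
        """p_visual: (B, A) IAP prior; p_linguistic: (B, A) per-step EAP output."""
        a = torch.sigmoid(self.alpha)
        return a * p_visual + (1.0 - a) * p_linguistic
```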
Accurate building segmentation plays a crucial role in a wide range of applications such as urban planning, monitoring, and mapping. Various deep learning models have been employed for building segmentation; however, they analyze images from a single view. To overcome this limitation, we propose a novel multi-view U-Net model that enhances segmentation accuracy by incorporating multiple views of the images. We employ two pre-trained convolutional neural network architectures, MobileNetV2 and ResNet50, to extract features representing two different views of the images. By fusing these features, the proposed method effectively captures complementary information, leading to enhanced segmentation accuracy. To further improve performance, we incorporate skip connections and up-convolutional layers to ensure fine-grained feature propagation. Our experimental results on a large building dataset demonstrate a significant improvement in segmentation accuracy (91%) compared with state-of-the-art methods, highlighting the effectiveness of our multi-view fusion approach and confirming the benefit of the view-creation concept proposed in this paper. This research has the potential to redefine the landscape of building segmentation in applications such as urban planning and mapping. We also tested the method on a large study area (the city scale of Belval, Luxembourg), which demonstrates its efficiency in segmenting satellite images over a large extent and reinforces its potential for real-world applications.
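The two-backbone fusion can be outlined in a few lines of PyTorch. The sketch below concatenates the final feature maps of MobileNetV2 and ResNet50 and decodes a building mask; it collapses the paper's multi-scale skip connections into a single fusion point, so the channel sizes and the decoder are simplifying assumptions.

```python
# Two pre-trained encoders, one per view, fused by channel concatenation.
import torch
import torch.nn as nn
from torchvision import models

class TwoViewFusionSeg(nn.Module):
    def __init__(self):
        super().__init__()
        self.view_a = models.mobilenet_v2(weights="DEFAULT").features   # -> 1280 ch
        resnet = models.resnet50(weights="DEFAULT")
        self.view_b = nn.Sequential(*list(resnet.children())[:-2])      # -> 2048 ch
        self.decode = nn.Sequential(
            nn.Conv2d(1280 + 2048, 256, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 1, 1),                                       # building mask logits
        )

    def forward(self, img_a, img_b):
        fa = self.view_a(img_a)     # (B, 1280, H/32, W/32)
        fb = self.view_b(img_b)     # (B, 2048, H/32, W/32)
        return self.decode(torch.cat([fa, fb], dim=1))
```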
An attributed network is a form of data that contains rich semantic information. Many real scenarios can be modeled as attributed networks, such as social media, citation, and traffic networks. Anomaly detection in attributed networks is an interesting research topic owing to its potential in various practical applications, including spam, network intrusion, and financial fraud detection. However, attributed networks exhibit many anomaly patterns, such as structural, attribute, local, and global anomalies, making anomaly detection in attributed networks a challenging task. To address these difficulties, we designed DeepGL, a novel unsupervised deep global-local view model for anomaly detection in attributed networks. Our model is an encoder-decoder framework with multiple views that captures node attributes and network structure information from both global and local views. Specifically, it contains two encoders and four decoders: the two encoders capture network features from the local and global views, and the four decoders reconstruct the local node attribute information, local structure information, global node attribute information, and global structure information. In the encoders and decoders, we apply Laplacian sharpening and smoothing techniques to maintain the integrity of normal node features while diminishing the conspicuousness of anomalous nodes in the reconstructed information, thereby facilitating the calculation of reconstruction errors. Extensive experiments on four real-world attributed network datasets demonstrate the excellent performance of the proposed method.
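The Laplacian smoothing and sharpening operations have a compact matrix form worth spelling out. In the sketch below, smoothing (I - lam*L) averages each node with its neighbours in the encoder, while sharpening (I + lam*L) re-amplifies node-level detail in the decoder; the dense-adjacency formulation and the value of `lam` are assumptions for illustration.

```python
# Laplacian smoothing (encoder) and sharpening (decoder) propagation rules.
import torch

def normalized_laplacian(adj):
    """adj: dense (N, N) adjacency. Returns L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(dim=1)
    d_inv_sqrt = torch.where(deg > 0, deg.pow(-0.5), torch.zeros_like(deg))
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return torch.eye(adj.size(0)) - a_norm

def propagate(h, lap, weight, lam=1.0, sharpen=False):
    """One smoothing (encoder) or sharpening (decoder) layer over node
    features h (N, d) with a learnable weight matrix (d, d')."""
    sign = 1.0 if sharpen else -1.0
    mix = torch.eye(lap.size(0)) + sign * lam * lap
    return torch.relu(mix @ h @ weight)
```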
The electrocardiogram (ECG) is an affordable, non-invasive, and quick method for gaining essential information about the electrical activity of the heart. Interpreting ECGs is a time-consuming process even for experienced cardiologists, which motivates the current use of rule-based methods in clinical practice to describe ECGs automatically. However, compared with descriptions created by experts, ECG descriptions generated by such rule-based methods show considerable limitations. Inspired by image captioning methods, we instead propose a data-driven approach to ECG description generation. We introduce a label-guided Transformer model and show that it is possible to automatically generate relevant and readable ECG descriptions with a data-driven captioning model. We incorporate prior ECG labels into the model design and show that this improves the overall quality of the generated descriptions. We find that training these models on free-text annotations of ECGs, instead of the clinically used computer-generated ECG descriptions, greatly improves performance. Moreover, a human expert evaluation of our best system shows that our data-driven approach improves upon existing rule-based methods.
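One way to realise the label guidance, sketched under assumptions: embeddings of the prior ECG labels are concatenated with the encoded signal features so the caption decoder can attend to both. The dimensions and the concatenation strategy below are illustrative, not the paper's exact design.

```python
# A hypothetical label-guided caption decoder: prior label embeddings are
# appended to the encoder memory that the Transformer decoder attends to.
import torch
import torch.nn as nn

class LabelGuidedCaptioner(nn.Module):
    def __init__(self, num_labels, vocab_size, d_model=256):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, d_model)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, ecg_feats, label_ids, tgt_tokens):
        """ecg_feats: (B, T, d); label_ids: (B, K); tgt_tokens: (B, S)."""
        memory = torch.cat([self.label_emb(label_ids), ecg_feats], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        h = self.decoder(self.tok_emb(tgt_tokens), memory, tgt_mask=causal)
        return self.out(h)
```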
ISBN (digital): 9781665486743
ISBN (print): 9781665486743
As the world paces towards globalization, the demand for automatic language translators is increasing rapidly. Traditional translation systems consist of multiple steps, such as speech recognition, text-to-text machine translation, and speech generation. These systems suffer from latency due to the multiple steps and from errors that propagate from earlier steps to later ones. Another challenge is that many spoken languages have no text representation, so traditional pipelines involving speech-to-text and text-to-text translation do not work for them. In this paper, we present a recurrent neural network (RNN)-based translation system that generates the waveform of the target-language audio directly. We use the sparse coding technique for the extraction and inversion of audio features. An attention-based multi-layered sequence-to-sequence model is trained with a novel technique on a dataset of Spanish-to-English audio, with no intermediate text representation used during training or inference. We compare the performance of the proposed approaches using latency, bilingual evaluation understudy (BLEU) scores, and Perceptual Evaluation of Speech Quality (PESQ) scores. The resulting system provides very fast translation with good translation accuracy and audio quality.
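The sparse-coding feature round trip can be illustrated with scikit-learn. In the sketch below, audio frames are encoded as sparse coefficients over a learned dictionary (the features the seq2seq model would operate on) and inverted by a simple dictionary product; the lasso-based transform and all sizes are assumptions, not details from the paper.

```python
# Sparse-coding extraction and inversion of audio frame features.
import numpy as np
from sklearn.decomposition import DictionaryLearning

frames = np.random.randn(200, 320)              # stand-in for windowed audio frames
dl = DictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, max_iter=20)
codes = dl.fit_transform(frames)                # sparse features for the seq2seq model
reconstructed = codes @ dl.components_          # inversion back to waveform frames
```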
ISBN (print): 9783031226946; 9783031226953
Radiologists are required to write a descriptive report for each examination they perform, which is a time-consuming process. Deep-learning researchers are developing models to automate this process. Currently, the most researched architecture for this task is the encoder-decoder (E-D). An issue with this approach is that these models are optimised to produce output that is coherent and grammatically correct rather than clinically correct. The current study considers this and instead builds upon a more recent approach that generates reports using a multi-label classification model attached to a Template-based Report Generation (TRG) subsystem. Two TRG models that utilise either a Transformer or a CNN classifier are produced and directly compared to the most clinically accurate E-D in the literature at the time of writing. The models were trained on the MIMIC-CXR dataset, a public set of 473,057 chest X-rays and 206,563 corresponding reports. Precision, recall, and F1 scores were obtained by applying a rule-based labeller to the MIMIC-CXR reports, applying those labels to the corresponding images, and then using the labeller on the generated reports. The TRG models outperformed the E-D model in clinical accuracy, with the largest difference being the recall rate (T-TRG: Precision 0.38, Recall 0.58, F1 0.45; CNN-TRG: Precision 0.34, Recall 0.69, F1 0.42; E-D: Precision 0.38, Recall 0.14, F1 0.19). Examination of the quantitative metrics for each specific abnormality, combined with the qualitative assessment, concludes that significant progress still needs to be made before clinical integration is safe.
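The TRG idea is simple enough to sketch directly: a multi-label classifier scores each abnormality, and the report is assembled from fixed template sentences rather than decoded token by token, which is why the clinical content is easier to control. The labels, templates, and threshold below are hypothetical.

```python
# Template-based report generation from multi-label classifier outputs.
import torch

TEMPLATES = {
    "cardiomegaly": "The cardiac silhouette is enlarged.",
    "pleural_effusion": "There is a pleural effusion.",
    "no_finding": "No acute cardiopulmonary abnormality.",
}
LABELS = list(TEMPLATES)

def generate_report(logits, threshold=0.5):
    """logits: (num_labels,) raw classifier outputs for one image."""
    probs = torch.sigmoid(logits)
    positive = [LABELS[i] for i, p in enumerate(probs) if p >= threshold]
    if not positive:
        positive = ["no_finding"]
    return " ".join(TEMPLATES[name] for name in positive)
```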
ISBN (print): 9781450392983
Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions, along with suitable data fields for those functions. Many trigger-action programming platforms have emerged as its popularity grows, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing TAPs can still be challenging for end users because of the domain knowledge required and the enormous search space of trigger and action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs at fine-grained, field-level granularity from natural language descriptions. Our approach adapts pre-trained autoencoding models to warm-start the encoder of the seq2seq model and boost generation performance. We evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach for the TAP generation task. Our empirical evaluation shows that the overall improvement over the prior best results ranges from 9.5% to 26.5%, and that adopting a pre-trained autoencoding model boosts MRR@3 by a further 2.8% to 10.8%. In the field-level generation setting, RecipeGen achieves 0.591 MRR@3 and a 0.575 BLEU score.
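The warm-start idea can be sketched as follows: a pre-trained autoencoding model initialises the encoder, and a randomly initialised Transformer decoder emits the TAP as a flat sequence of trigger and action fields. The BERT checkpoint, vocabulary, and sizes below are assumptions for illustration.

```python
# Seq2seq TAP generator with a warm-started autoencoding encoder.
import torch
import torch.nn as nn
from transformers import BertModel

class RecipeSeq2Seq(nn.Module):
    def __init__(self, tap_vocab_size, d_model=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")  # warm start
        self.tap_emb = nn.Embedding(tap_vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)
        self.out = nn.Linear(d_model, tap_vocab_size)

    def forward(self, input_ids, attention_mask, tap_tokens):
        """input_ids/attention_mask: tokenised NL description; tap_tokens:
        the TAP flattened as trigger/action function and field tokens."""
        memory = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        mask = nn.Transformer.generate_square_subsequent_mask(tap_tokens.size(1))
        h = self.decoder(self.tap_emb(tap_tokens), memory, tgt_mask=mask)
        return self.out(h)   # next-field logits at each decoding step
```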