ISBN (print): 9781450396899
Image captioning is a cross-domain task involving both image processing and natural language processing. Most current models follow an encoder-decoder architecture, in which the encoder takes image feature vectors as input and the decoder generates the caption either autoregressively or non-autoregressively. However, most models extract image feature vectors from the region proposals of an object detector without considering the relative spatial relationships between objects. Moreover, autoregressive decoding generates each word conditioned on the words already generated, one step at a time, which leads to high inference latency. To address this, researchers have proposed non-autoregressive approaches that speed up inference by generating all words in parallel, but at the cost of lower caption quality. To tackle these problems, this paper proposes a semi-autoregressive Transformer model based on geometric attention. The encoder integrates the relative spatial relationships between detected objects through geometric attention and appearance attention, enhancing spatial awareness; the decoder adopts a semi-autoregressive decoding method that is sequential globally and parallel locally, enabling the model to achieve a better trade-off between decoding speed and caption accuracy. Extensive experiments and ablation studies on the MSCOCO dataset show that the model outperforms state-of-the-art models.
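The decoding trade-off described above can be illustrated with a minimal sketch of semi-autoregressive generation: groups of `k` words are produced one after another (sequential globally), while the words inside each group are predicted in a single decoder pass (parallel locally). The predictor interface, `predict_group`, and the toy caption are hypothetical stand-ins, not the paper's model; the point is only to show how the number of decoder passes shrinks as `k` grows.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (assumption): given the tokens generated so far,
# return the next `k` tokens, all predicted in one parallel decoder pass.
GroupPredictor = Callable[[List[str], int], List[str]]


def semi_autoregressive_decode(predict_group: GroupPredictor, k: int,
                               max_len: int, eos: str = "<eos>") -> Tuple[List[str], int]:
    """Generate a caption k tokens at a time; return (tokens, decoder passes).

    k = 1 reduces to ordinary autoregressive decoding, while k >= max_len
    behaves like fully non-autoregressive decoding.
    """
    if k < 1:
        raise ValueError("group size k must be >= 1")
    tokens: List[str] = []
    passes = 0
    while len(tokens) < max_len:
        group = predict_group(tokens, k)  # each group conditions on all earlier groups
        passes += 1
        for tok in group:
            if tok == eos:
                return tokens, passes
            tokens.append(tok)
            if len(tokens) >= max_len:
                return tokens, passes
    return tokens, passes


# Toy predictor (hypothetical): always "predicts" the next k words of a fixed caption.
REFERENCE = ["a", "man", "riding", "a", "wave", "on", "a", "surfboard"]


def toy_predictor(prefix: List[str], k: int) -> List[str]:
    nxt = REFERENCE[len(prefix):len(prefix) + k]
    return nxt + ["<eos>"] * (k - len(nxt))
```

With this toy predictor, an 8-word caption needs 9 decoder passes at `k = 1`, 5 at `k = 2`, and 3 at `k = 4` (each count includes the final pass that emits `<eos>`). A real model would lose some accuracy as `k` grows, which is the trade-off the abstract refers to.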
ISBN (print): 9781450396899
Bone age plays an important role in pediatrics and forensic identification. Because traditional bone age assessment is time-consuming and laborious and depends on the physician's experience, the results of manual assessment vary from assessor to assessor. This paper collects X-ray images from a Grade-A tertiary children's hospital and presents a convolutional neural network suited to China 05 bone age assessment. For the collected hand-bone images of Chinese children and adolescents aged 3 to 16, a stacked denoising autoencoder (SDAE) is combined with ResNet50 to suppress soft-tissue noise while effectively improving the model's feature extraction ability. Second, the 3×3 convolution in each ResNet50 residual block is replaced with a pyramid split attention (PSA) module, yielding a new model that fuses multi-level spatial and channel attention features to adaptively recalibrate features. Third, an adaptive dual-channel pooling layer combining max pooling and average pooling is introduced. Finally, pre-excitement is used to speed up convergence and a label-smoothing loss function is used to prevent overfitting, producing a deep learning classification model for China 05 bone age assessment. The experimental results show that the method achieves an accuracy within ±1 year of 93.22% for males and 91.71% for females, and that the mean absolute error (MAE) is also reduced.
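Two of the building blocks named above can be sketched in a few lines. The abstract does not give the exact form of the adaptive dual-channel pooling, so this sketch assumes a blend `alpha * max + (1 - alpha) * mean` with `alpha = sigmoid(theta)` and `theta` a learnable scalar; the label-smoothing target `(1 - eps) * one_hot + eps / K` is the standard formulation, with `eps = 0.1` chosen here only as an illustrative default. Function names are hypothetical, and a feature-map channel is flattened to a plain list for simplicity.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_dual_pool(channel: Sequence[float], theta: float) -> float:
    """Blend global max pooling and global average pooling of one (flattened) channel.

    Assumption: the blend weight is alpha = sigmoid(theta), where theta would be a
    learnable scalar; the paper's exact parameterization is not stated.
    """
    alpha = sigmoid(theta)
    return alpha * max(channel) + (1.0 - alpha) * (sum(channel) / len(channel))


def smoothed_targets(label: int, num_classes: int, eps: float) -> List[float]:
    """Label smoothing: keep 1 - eps on the true class, spread eps uniformly over all classes."""
    off = eps / num_classes
    return [(1.0 - eps) + off if c == label else off for c in range(num_classes)]


def label_smoothing_ce(logits: Sequence[float], label: int, eps: float = 0.1) -> float:
    """Cross-entropy between softmax(logits) and the smoothed target distribution."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

For example, with `theta = 0` the pooling weight is 0.5, so the channel `[1, 2, 3, 6]` pools to `0.5 * 6 + 0.5 * 3 = 4.5`. Because the smoothed targets never put full probability on one class, the loss keeps penalizing overconfident logits, which is how label smoothing discourages overfitting.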
ISBN (print): 9781450396899
With the continued spread of the Internet and mobile phones, people have gradually entered a participatory network era, and the rapid growth of social networks has caused an explosion of digital content. This has turned online opinions, blogs, tweets and posts into highly valuable assets from which governments and businesses can gain insights and formulate strategies. Business organizations need to process and analyze these sentiments to gain business insights. In recent years, deep learning techniques have been very successful at sentiment analysis, offering automatic feature extraction, rich representational capacity and better performance than traditional feature-based techniques. The core idea is to build deep neural networks that automatically extract complex features from large amounts of data to generate up-to-date predictions. This paper reviews deep-learning-based sentiment analysis methods for social media. First, it introduces the process of single-modal text sentiment analysis on social media. Then it summarizes multimodal sentiment analysis algorithms for social media and categorizes them, according to their fusion strategy, into feature-level fusion, decision-level fusion and linear regression models. Finally, it discusses the difficulties of deep-learning-based social media sentiment analysis and future research directions.
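The difference between the first two fusion strategies can be shown with a toy sketch. Feature-level (early) fusion concatenates the text and image feature vectors and classifies them once; decision-level (late) fusion classifies each modality separately and then combines the per-modality class probabilities, here by a weighted average. The linear classifiers, weight matrices and the 0.5 weighting are hypothetical placeholders for the deep networks a real system would use.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]  # one row of weights per sentiment class


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(w: Matrix, x: Sequence[float]) -> Vector:
    """Toy linear classifier: one logit per class (hypothetical stand-in for a deep model)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def feature_level_fusion(text_feat: Sequence[float], image_feat: Sequence[float],
                         w_joint: Matrix) -> Vector:
    """Early fusion: concatenate modality features, then classify once."""
    return softmax(linear(w_joint, list(text_feat) + list(image_feat)))


def decision_level_fusion(text_feat: Sequence[float], image_feat: Sequence[float],
                          w_text: Matrix, w_image: Matrix, text_weight: float = 0.5) -> Vector:
    """Late fusion: classify each modality separately, then average the class probabilities."""
    p_text = softmax(linear(w_text, text_feat))
    p_image = softmax(linear(w_image, image_feat))
    return [text_weight * a + (1.0 - text_weight) * b for a, b in zip(p_text, p_image)]
```

With classes `[negative, positive]`, a strongly positive text feature `[2.0]` and an uninformative image feature `[0.0]`, early fusion with weights `[[0, 0], [1, 1]]` gives a positive probability of about 0.88, while late fusion with `[[0], [1]]` per modality averages that with the image branch's 0.5 and gives about 0.69. Late fusion lets an uninformative modality dilute the decision unless its weight is tuned; early fusion lets the classifier learn cross-modal interactions but requires aligned features.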
In this paper, a new compounded direct pixel beamforming (CDPB) method is presented to remove the blurring artifacts introduced by ultrasound scan conversion. In CDPB, receive focusing is performed directly on each display pixel in Cartesian coordinates using the raw RF data from adjacent transmit firings, so that scan-conversion artifacts are removed. In addition, the energy variations caused by the distance between the transmit scanline and the display pixel are compensated using a gain factor obtained from the ultrasound beam pattern. The proposed CDPB method was evaluated using simulation and in vivo liver data acquired with a commercial ultrasound machine equipped with a research package. The experimental results showed that CDPB improved the information entropy contrast (IEC) by 23.6% compared with the conventional scan conversion method and reduced the blocking artifacts factor (BAF) by 16.4% compared with the direct pixel-based focusing method. These results indicate that the proposed direct pixel beamforming method could be used to enhance image quality in medical ultrasound imaging.
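The core idea of focusing directly on Cartesian display pixels and compounding adjacent firings can be sketched with plain delay-and-sum. Several details here are assumptions, not the paper's method: a linear array with vertical scanlines, a transmit travel time approximated as depth / c, nearest-sample interpolation, a Gaussian beam-pattern model for the gain factor (`beam_gain`), and a least-squares compensation rule `sum(g_i * v_i) / sum(g_i ** 2)`, which recovers the signal `s` exactly when each firing measures `v_i = g_i * s`. All function names are hypothetical.

```python
import math
from typing import Sequence

SOUND_SPEED = 1540.0  # m/s, a typical soft-tissue value


def das_pixel(rf: Sequence[Sequence[float]], elem_x: Sequence[float],
              px: float, pz: float, fs: float, c: float = SOUND_SPEED) -> float:
    """Delay-and-sum receive focusing directly at the Cartesian pixel (px, pz).

    rf[e][n] is sample n received by element e (at lateral position elem_x[e], depth 0)
    for one transmit firing. Assumption: transmit time ~ pz / c along a vertical scanline.
    """
    t_tx = pz / c
    total = 0.0
    for e, ex in enumerate(elem_x):
        t_rx = math.hypot(px - ex, pz) / c  # pixel-to-element return path
        n = int(round((t_tx + t_rx) * fs))  # nearest RF sample for the round-trip delay
        if 0 <= n < len(rf[e]):
            total += rf[e][n]
    return total


def beam_gain(offset: float, beam_width: float) -> float:
    """Hypothetical Gaussian model of transmit beam energy vs. lateral offset from the scanline."""
    return math.exp(-0.5 * (offset / beam_width) ** 2)


def compounded_pixel(firings: Sequence[Sequence[Sequence[float]]], scanline_x: Sequence[float],
                     elem_x: Sequence[float], px: float, pz: float, fs: float,
                     beam_width: float) -> float:
    """Compound one pixel over adjacent firings, compensating each firing's beam gain.

    Models firing i as v_i = g_i * s; sum(g_i * v_i) / sum(g_i**2) is the least-squares
    estimate of s (an assumed compensation rule, not necessarily the paper's).
    """
    num = den = 0.0
    for rf, sx in zip(firings, scanline_x):
        g = beam_gain(px - sx, beam_width)
        num += g * das_pixel(rf, elem_x, px, pz, fs)
        den += g * g
    return num / den if den > 0 else 0.0
```

Because each pixel's delays are computed from its own Cartesian position, no interpolation from a polar scanline grid is needed, which is why this style of beamforming avoids scan-conversion blurring. The least-squares weighting was chosen so that firings whose beams barely reach the pixel contribute little, rather than amplifying their noise by a large inverse gain.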