In this article, we have designed a novel variable-stiffness soft gripper that combines the advantages of both flexible and rigid grippers. It is capable of performing two different tasks: pinching and enveloping. The...
The efficiency of state-of-the-art convolutional networks trained to detect lung cancer nodules depends on their feature extraction model. Various feature extraction models have been proposed based on convolutional networks, such as VGG-Net or ResNet. Such models have been shown to extract features from objects in an image effectively. However, their efficacy is limited when the objects of interest are very small, such as lung nodules. One widely used feature extraction model for detecting small objects is the VGG16 network. With its small $3 \times 3$ kernels and suitable depth, the model can extract the features of small objects with reasonable accuracy. In this article, feature maps are created by combining the last three layers of the VGG16 network to extract features of nodules of various sizes. This study uses a Region Proposal Network (RPN) to compare the accuracy of the feature map created by the proposed method with that of the original VGG16. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which Faster R-CNN uses for detection. We select the 300, 1,000, and 2,000 regions proposed by the RPN for each method, then calculate the recall at different Intersection over Union (IoU) ratios with the ground-truth boxes. The results show that the feature map of the proposed method performs better than the feature maps of individual VGG16 layers at extracting nodules of various sizes. In addition, as the number of selected region proposals is reduced, the recall of the proposed method changes less than that of the other methods.
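The evaluation described above (recall of the top-N region proposals at a given IoU ratio) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `iou` and `recall_at_iou`, the `(x1, y1, x2, y2)` box format, and the `(score, box)` proposal format are all assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(proposals, gt_boxes, top_n, iou_threshold):
    """Fraction of ground-truth boxes covered by any of the top_n proposals.

    `proposals` is a list of (score, box) pairs (assumed format).
    """
    if not gt_boxes:
        return 0.0
    # Keep only the top_n highest-scoring proposals.
    kept = sorted(proposals, key=lambda p: p[0], reverse=True)[:top_n]
    hits = sum(
        1
        for gt in gt_boxes
        if any(iou(box, gt) >= iou_threshold for _, box in kept)
    )
    return hits / len(gt_boxes)
```

Calling `recall_at_iou` with `top_n` set to 300, 1,000, and 2,000 over a range of `iou_threshold` values would give recall curves of the kind the abstract compares.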
This paper proposes an AI-based video metadata extension model to overcome the limitations of video search and recommendation systems in the multimedia industry. Current video search and recommendation rely on pre-added metadata such as filenames, keywords, tags, and genres. As a result, direct predictions about the content of a video are impossible without pre-added metadata. These platforms also analyze users' previous search history, viewing history, and similar signals to infer their interests and serve personalized videos. This may not reflect the actual content and may raise privacy concerns. In addition, recommendation systems suffer from a cold start problem, which is the lack of an initial target, as well as a bubble effect. Therefore, this study proposes a search and recommendation system that expands video metadata using techniques such as shot boundary detection, speech recognition, and text mining. The proposed method selects the main objects required by the recommendation system based on object frequency and extracts the corresponding objects from the video frame by frame. In addition, we extract the speech from the video separately, convert it to text to obtain the script, and apply text mining techniques to the extracted script to quantify it. Then, we synchronize the object frequency and the transcript to create a single set of contextual data. After that, we group videos and clips based on the contextual data and index them. Finally, we use shot boundary detection to segment videos based on their content. To ensure that the generated contextual data is appropriate for the video, the proposed model compares the extracted script with the video's subtitle data to check and calibrate its accuracy. The model can then be fine-tuned by tuning its hyperparameters with cross-validation to improve its performance. These models can be incorporated into a variety of content discovery and recommendation platforms. By using expanded
Emotional voice conversion (EVC) involves modifying various acoustic characteristics, such as pitch and spectral envelope, to match a desired emotional state while preserving the speaker's identity. Existing EVC m...
ISBN (digital): 9798331542559
ISBN (print): 9798331542566
This research explores the application of LSTM, GRU, and Transformer models for predicting stock prices, aiming to enhance accuracy in financial forecasting. Stock price prediction is crucial for investment decision-making, yet challenging due to market volatility and complex patterns. The objectives are to evaluate the performance of the LSTM, GRU, and Transformer models using key metrics such as test loss, MAE, and MSE, and to compare their predictive capabilities. The LSTM model demonstrates robust performance with low test loss and MAE, indicating precise predictions and effective pattern recognition in financial data. The Transformer model also shows promising results with relatively low test loss and MAE, albeit with larger MSE and MAE errors. Both models highlight the potential for accurate stock price prediction, suggesting avenues for future research to optimize model performance and reliability in financial forecasting applications. The experimental results show that GRU outperformed LSTM and Transformer, with an MSE of 0.0008, an MAE of 0.0023, and a high test accuracy of 0.9833.
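For reference, the two error metrics named above have standard definitions, sketched here in plain Python. The function names and the list-based inputs are assumptions for illustration; the abstract does not say how the authors computed or normalized their values.

```python
def mean_squared_error(y_true, y_pred):
    """MSE: average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """MAE: average of absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because MSE squares each error, it penalizes occasional large misses more heavily than MAE does, which is one reason the two metrics can rank models differently.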
Personality trait recognition is an important psychological paradigm for understanding the differences in people's behavior. This paper presents a new dataset, dubbed PROPER (Personality Recognition based On Public Speaking using Electroencephalography Recordings), that connects the personality traits of an individual with public speaking activity via electroencephalography (EEG) signals. EEG data of 40 healthy individuals is recorded before, during, and after public speaking activity using a Muse headband. A score from the Big Five Personality Trait questionnaire is used to label each participant's EEG data. A statistical analysis of the EEG signals for each personality trait during the different phases of the experiment is performed. The personality recognition process involves data acquisition, pre-processing, feature extraction and selection, and classification. Five feature groups are extracted from the frequency bands of the EEG data of each channel. Feature selection is applied to the extracted features via the wrapper method. Support vector machine, naive Bayes, and multilayer perceptron (MLP) classifiers are used to classify the personality traits. With the MLP classifier, an average F1-score of 0.95 for extroversion, 0.94 for openness to experience, 0.90 for conscientiousness, 0.84 for neuroticism, and 0.85 for agreeableness is achieved using pre-stimulus, during-activity, and post-stimulus EEG data, respectively.
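The F1-score reported above is the harmonic mean of precision and recall. The sketch below computes it for a single binary label; the function name `f1_score`, the `positive` parameter, and the binary encoding of each trait are assumptions for illustration, since the abstract does not specify how trait labels were encoded or how the scores were averaged.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```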
Diagnosing choroidal nevus in color fundus images is challenging for clinicians not regularly practicing it. Machine learning (ML) has proven effective in detecting and analyzing such abnormalities with high accuracy ...
Combinatorial test suite generation is a critical aspect of software testing, particularly for systems with variable-strength interactions. Traditional optimization algorithms often struggle to efficiently generate mi...
The Fortran programming language is widely utilized in numerical computation and scientific computing. Fortran programs are prone to potential runtime errors related to numerical properties due to the large number of ...
Authors: Liu, Ziyi; Wang, Zengmao; Du, Bo
Affiliation: Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Artificial Intelligence Institute of Wuhan University, Wuhan 430072, China
Chest X-ray images have been highly involved in clinical diagnosis and treatment planning for thoracic disease. The process of medical images has attracted great attention in the machine learning community. However, t...