ISBN (digital): 9798350363104
ISBN (print): 9798350363111
The escalating demand for skilled IT professionals underscores the increasing significance of the recruitment process. Traditional methods often fall short in identifying individuals poised for success in the dynamic technology industry. This paper aims to address the challenges in recruiting proficient applicants by proposing an interpretable machine learning model for assessing IT job candidates. The study conducts experiments comparing various machine learning algorithms, including Logistic Regression, Random Forest, and Decision Tree, using data from the ‘IT job applicant’ public dataset. Evaluation methods such as the confusion matrix, accuracy, recall, and precision are employed to assess the model's performance. The findings reveal that the decision tree exhibits an accuracy exceeding 99.8%, establishing it as the top-performing model. The key contributing factor to this accuracy is identified as the 'ComputerSkills' feature. Additionally, within the decision tree, *** and Microsoft SQL Server are highlighted as influential factors positively impacting individual decision interpretation.
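As a rough illustration of the evaluation metrics named above, the sketch below derives accuracy, precision, and recall from the four confusion-matrix counts for a binary "suitable / not suitable" label. The label encoding and the sample predictions are hypothetical and are not taken from the paper's dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = candidate judged suitable, 0 = not suitable.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Reporting precision and recall next to accuracy matters when one class dominates, since a very high accuracy alone can hide poor performance on the minority class.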
ISBN (digital): 9798350366648
ISBN (print): 9798350366655
Accurate box office prediction is crucial for managing financial risks in film production. The internet has transformed consumer behavior, affecting marketing strategies. Critical online reviews, more than early revenue, predict a film's profit, establishing film critics as predictors. Hence, this study investigates the relationship between ratings and movie profit, and also seeks the best model for predicting those variables. The research was conducted through a comprehensive process encompassing data collection and processing, feature selection, prediction modeling, and a detailed discussion of the analysis. Several manual interventions were applied to the data to support further discoveries and analysis. These interventions were carried out on a solid foundation to enhance the depth of exploration and understanding within the study. The findings reveal that random forest regression emerged as the most effective model in this study, achieving a Mean Squared Error (MSE) of 768.40 on a dataset of 500 movies from ***, incorporating critics' ratings from ***. Through this research, industry professionals can not only enhance their understanding but also identify factors that can be considered to increase a movie’s revenue.
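For readers unfamiliar with the metric, the following minimal sketch shows how a Mean Squared Error is computed from actual and predicted values. The numbers are made up for illustration and are unrelated to the 768.40 reported in the abstract; the unit of the target variable is likewise an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical profits (e.g. in millions) for three movies.
actual = [100.0, 150.0, 80.0]
predicted = [110.0, 140.0, 100.0]
print(mean_squared_error(actual, predicted))  # 200.0
```

Because the errors are squared, MSE is expressed in squared units of the target, so its magnitude is only meaningful relative to the scale of the profit values being predicted.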
ISBN (digital): 9798331510732
ISBN (print): 9798331510749
Transformer models, originally successful in natural language processing, are now being applied to chemical and biological studies, excelling in areas such as molecular property prediction, material science, and drug discovery. BERT, a Transformer-based model, has become foundational in cheminformatics, particularly for QSAR (Quantitative Structure-Activity Relationship) modeling and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) evaluations in drug discovery. However, achieving higher accuracy often requires designing more complex models, which can compromise their interpretability. This poses a challenge for researchers who need to understand the reasoning behind the predictions. The trade-off between accuracy and interpretability presents a critical challenge in applying black box models to real-world problems in cheminformatics. This work compares Transformer-based models with traditional machine learning and deep learning approaches, focusing on both interpretability and performance. The goal is to highlight the strengths and limitations of each method, offering insights into their optimal use in drug discovery and material science.
ISBN (digital): 9798350363432
ISBN (print): 9798350363449
This research offers a new perspective on predicting HIV activity from the Drug Therapeutics program (DTP) Antiviral Screen by using molecular data represented in SMILES notation. The topic is significant because it addresses a major global health issue with modern computational approaches and has the potential to uncover new antiviral drug candidates, which could eventually save lives and improve public health outcomes. The study addresses the data imbalance between two classes, active and inactive, and employs the Morgan Fingerprint method for feature extraction, with the Graph Convolutional Network (GCN) and Graph Attention Network (GAT) as the baseline architectures and a fusion of the main features of GCN and GAT as the proposed architecture. The random oversampling technique is applied to alleviate the dataset imbalance. However, even though it improved the training process, the performance of the model dropped when it was evaluated on the test set. By combining the main features of GCN and GAT, the proposed model was able to perform the classification task more accurately. The attention mechanism from GAT allows the model to focus on the more relevant parts and ignore the irrelevant ones. The proposed model managed to outperform the baseline models. Despite a high overall accuracy of 94%, the fusion model exhibits significant disparities in precision, recall, and F1-score, potentially due to class imbalance. Random oversampling led to improved training but compromised model performance on the test set.
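The random oversampling step mentioned above can be sketched as follows: minority-class samples are duplicated at random until every class matches the size of the largest one. This is a minimal standard-library sketch, not the authors' implementation; the SMILES strings, labels, and the `random_oversample` helper are illustrative assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical training molecules (SMILES) with an imbalanced label set.
train_smiles = ["CCO", "CCN", "CCC", "CCCl", "c1ccccc1"]
train_labels = ["inactive", "inactive", "inactive", "inactive", "active"]

balanced_smiles, balanced_labels = random_oversample(train_smiles, train_labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only. Since the duplicated minority samples carry no new information, a model can fit them closely during training and still generalize poorly, which is consistent with the test-set drop the abstract describes.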
Box office (BO) income declined significantly, by up to 80%, in 2020 as the COVID-19 pandemic emerged. To minimize further financial risks, multiplex (multiple cinema complex) owners need to analyze their potential income for each movie, each week. Therefore, we developed a data mining strategy that allows multiplex owners to analyze and discover insights on how successful produced movies could be. The methodology comprises (1) data loading and exploration, (2) data cleaning, (3) data selection, integration, and transformation using Pentaho, (4) data mining, in which the results were stored in a MySQL database, and (5) pattern evaluation and presentation using Qlik Sense as the Business Intelligence (BI) dashboard. Based on our data mining methodology, we revealed that drama, comedy, action, and thriller are the favorite genres. We also found that DreamWorks Animation and Pixar Animation Studios are the most popular production houses, although Apatow Productions and Escape Artists still have the highest revenue on average.
ISBN (digital): 9798350376111
ISBN (print): 9798350376128
The study investigates the increasing demand for online learning as a means of addressing education issues in the context of the COVID-19 epidemic. Online learning requires several adaptations in teaching methods, learning methodologies, and device needs. The flexibility of both teachers and students is essential to these adaptations. This research uses a dataset collected from a survey about student adaptability levels in online education. Preprocessing is a challenge because the target category in this data is imbalanced. To address this problem, the research aims to categorize students’ adaptivity levels in online learning and to find approaches that can overcome imbalanced values in the dataset. The model uses ensemble methods (Bagging, Boosting, and Voting) with machine learning algorithms. Two models stand out, with Soft Voting obtaining the best performance at 90% accuracy.
Customers in the banking industry nowadays have many options when deciding where to invest their money. Customer retention and churn have thus emerged as crucial challenges for the majority of banks. This research tri...
ISBN (digital): 9798350363432
ISBN (print): 9798350363449
Indonesia's tourism sector, boosted by its captivating landscapes, has seen a rise in the popularity of Online Travel Agencies (OTAs) like Traveloka, ***, and Agoda. To effectively assist tourists, OTAs are expected to understand tourists' needs and interests, with insights gathered by scraping data from the Google Play Store. Through topic extraction from customers' reviews, this research aims to optimize the tourist experience and offer actionable insights for the tourism industry's marketing strategies. Furthermore, it endeavors to eliminate the reliance on brute-force approaches in determining the candidate number of topics. To achieve this goal, the research employs topic modeling, specifically Latent Dirichlet Allocation (LDA) enhanced through the K-means elbow method. The evaluation of the optimal topic number utilizes a coherence score, while human judgement serves as a quantitative metric for overall performance, with a remarkable validity rate across three platforms: Agoda (91%), *** (88%), and Traveloka (94%). This highlights the approach's effectiveness in accurately identifying and categorizing topics. In conclusion, the model accurately discerned essential topics, revealing that the majority of the reviews on these OTAs focus on the transaction and refund systems on each platform. Furthermore, this model offers promising recommendations to enhance the understanding of and response to customers' reviews, facilitating faster and more effective responses. These insights are invaluable for continuous development and improvement strategies amid the dynamic changes in the tourism industry across these influential platforms.
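One common way to read an elbow from a K-means inertia curve, instead of trying every candidate topic count by brute force, is to pick the k whose point lies farthest below the straight line joining the first and last points of the curve. The sketch below implements that heuristic only. It is an assumption about how an elbow might be located, not the paper's procedure, and the inertia values are invented for illustration.

```python
def elbow_k(ks, inertias):
    """Return the k whose (k, inertia) point lies farthest below the chord
    joining the first and last points of a decreasing inertia curve."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    k_first, k_last = ks[0], ks[-1]
    y_first, y_last = inertias[0], inertias[-1]
    if k_first == k_last or y_first == y_last:
        raise ValueError("curve must span a range of k and of inertia values")

    best_k, best_gap = ks[0], -1.0
    for k, inertia in zip(ks, inertias):
        # Normalize both axes to [0, 1] so the chord runs from (0, 1) to (1, 0).
        x = (k - k_first) / (k_last - k_first)
        y = (inertia - y_last) / (y_first - y_last)
        gap = 1.0 - x - y  # proportional to the distance below the chord
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Hypothetical inertia values from K-means runs with k = 2..8.
candidate_ks = [2, 3, 4, 5, 6, 7, 8]
inertia_values = [1000.0, 600.0, 400.0, 330.0, 290.0, 265.0, 250.0]
print(elbow_k(candidate_ks, inertia_values))  # 4
```

The selected k could then serve as the candidate number of LDA topics, with a coherence score used to confirm the choice as the abstract describes.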
The projected increase in PayLater utilization reaches up to five million people by 2025. To optimize the yearly profit from their PayLater service, fintech companies must examine all possible risks before a unanimous decision is taken. Therefore, we proposed a unified decision framework derived from decision theory and the Monte Carlo simulation technique. Two schemes were devised: (1) a decision-making scheme, and (2) a risk simulation scheme. In experiments, the framework was able to estimate several alternative decisions and their impacts, analyze the causes of failure and delays in the development of the PayLater service, and execute Monte Carlo simulations with up to 10,000 trials. The outputs of this study will benefit decision-makers in fintech initiatives before they launch their PayLater products.
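To make the risk simulation scheme concrete, the sketch below runs a 10,000-trial Monte Carlo simulation of yearly profit for a PayLater-style product. Every input (user counts, revenue per user, default rate, loan size, fixed cost) and both helper functions are hypothetical placeholders chosen for illustration; none of them come from the paper.

```python
import random
import statistics


def simulate_yearly_profit(trials=10_000, seed=42):
    """Draw uncertain inputs per trial and return the list of simulated profits."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        # All distributions below are illustrative assumptions.
        users = rng.triangular(50_000, 200_000, 100_000)  # low, high, mode
        revenue_per_user = rng.gauss(12.0, 2.0)           # fee income per user
        default_rate = rng.uniform(0.01, 0.08)            # share of users defaulting
        average_loan = 150.0
        fixed_cost = 500_000.0
        profit = users * revenue_per_user - users * default_rate * average_loan - fixed_cost
        profits.append(profit)
    return profits


def summarize(profits):
    """Summarize the simulated distribution for decision-makers."""
    ordered = sorted(profits)
    return {
        "mean": statistics.mean(ordered),
        "p05": ordered[int(0.05 * len(ordered))],  # pessimistic (5th percentile) outcome
        "prob_loss": sum(1 for p in ordered if p < 0) / len(ordered),
    }


results = simulate_yearly_profit()
print(summarize(results))
```

Comparing such summaries across alternative decisions (for example, different credit limits or fee levels) is one way the simulated risk could feed back into a decision-making scheme.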
ISBN (digital): 9798350353464
ISBN (print): 9798350353471
Natural disasters, including earthquakes, cyclones, floods, and wildfires, cause significant environmental damage and have emerged as a major global issue. These events can result in loss of life and disrupt communities and the economy. Effective disaster management relies on timely and accurate information. With social media becoming a primary source of real-time information during such events, many images of these disasters are shared, though some may not be accurately labeled. Therefore, the purpose of this study is to classify natural disaster images from social media, namely cyclones, wildfires, floods, and earthquakes. This paper conducts a comparative analysis of four models: VGG16, VGG19, EfficientNetB0, and ResNet-50. It uses a dataset of 4428 human-annotated images covering cyclones, earthquakes, floods, and wildfires. The results show that ResNet-50 and EfficientNetB0 give the best results, both achieving over 96% average accuracy. ResNet-50 delivers more stable and more convergent results than EfficientNetB0, but slightly lower classification accuracy. VGG16 and VGG19 are both outclassed in terms of accuracy but are better in training time, with VGG19 having a more convergent result.