The web continues to grow and attacks against the web continue to increase. This paper focuses on the literature review on scanning web vulnerabilities and solutions to mitigate web attacks. Vulnerability scanning met...
ISBN: 9781665453967 (print)
In this digital era, we are exposed to a large amount of data. This includes biological data, which stores information about living organisms, including deoxyribonucleic acid (DNA), genes, and proteins. With the development of information technology and information systems, most available biological data are stored in online public databases. Many of these databases are free to access and easy to use, which helps users, especially researchers, make use of the data. Among the known public biological databases are the University of California Santa Cruz (UCSC) Genome Browser Database and the Rat Genome Database (RGD). These two databases provide access to biological data from different organisms. This paper aims to describe the technology of public biological databases. It also elucidates the differences in features between the UCSC Genome Browser Database and the RGD. Our results show that the UCSC database contains much more biological data and more features than the RGD. However, the UCSC genome browser has a complex display, while the RGD has a simple one. Overall, the two databases give users the option to choose the one most suitable for them.
Churn prediction methods are widely used to anticipate customers leaving the services provided by a company. This study aims to develop an optimal churn prediction model based on customer data from a telecommunication company in Indonesia. The model development and evaluation processes follow the Cross-Industry Standard Process for Data Mining (CRISP-DM), which consists of business understanding, data understanding, data preparation, modelling, and evaluation. Various combinations of data preparation and modelling methods were evaluated. The evaluation results show that combining feature selection with a prediction model yields better results than a prediction model without feature selection. The highest accuracy is achieved by Random Forest at 97.82%, followed by Decision Tree at 97.06% and Naive Bayes at 90.62%. This result indicates that a prediction model can be reliably used to predict customer churn in a telecommunication company.
It is necessary to study technical factors such as bait used, oceanographic conditions of fishing areas and skipjack tuna trade patterns (Katsuwonus pelamis) as well as other factors in Sulawesi Fisheries. Supporting ...
Interactions on the Twitter social media platform are easy to carry out and can reach all levels of society, as can the conversations that users tweet. Users can easily spread information or developing issues. Conversational data from Twitter can therefore give an overview of issues that are developing, and even of those just beginning to form, in community conversations on the platform. This study is related to previous research that implemented a Twitter social media monitoring system to give journalists access to additional information. It aligns with its predecessors but uses a richer data collection mechanism, so it can gather more conversations. The problem addressed in this study is the difficulty journalists face in determining the locations of incidents from a Twitter conversation that describes the name of a place or contains the address of a location. This study also adds modifications to extract location names using a named entity recognition (NER) model customized to the structure of location names in Indonesia. From the text of a Twitter conversation, whether it contains an address or a description of a location name, the model can thus obtain a location name that refers to a place where an event may have occurred. The Indonesian-language location NER model achieves an accuracy of 97.67%, a precision of 90.57%, a recall of 59.30%, and an F1-score of 71.67%.
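The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch; the function name is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_score(0.9057, 0.5930)
print(f"{f1:.2%}")  # prints 71.67%
```

The large gap between accuracy (97.67%) and recall (59.30%) is typical of token-level NER evaluation, where most tokens are not entities, which is why F1 is usually the more informative figure.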
ISBN: 9781665401524 (print)
This paper presents a sentiment analysis of the Indonesian government's policies for overcoming COVID-19, based on Twitter data and using several classification methods, namely SVM, Naive Bayes, and LSTM. The analysis of the Twitter data found that the Twitter community in Indonesia expressed negative sentiment toward government policies for handling COVID-19. The experimental results show that SVM gave the best results compared with Naive Bayes and LSTM, with an accuracy of 88.5%.
Sarcasm is the use of words usually used to either mock or annoy someone, or for humorous purposes. Sarcasm is largely used in social networks and microblogging websites, where people mock or censure in a way that mak...
The number of findings in cancer genomics research has grown rapidly in the last decade due to the decline in the cost of human sequencing and genotyping. However, the majority of the reported significant markers associated with cancer traits are based on European and East Asian populations. Large populations such as the South Asian and South-East Asian populations are under-represented in genomics research. In this study, we explored the possibility of computing a Polygenic Risk Score (PRS) for colorectal cancer on our test sample based on reported significant Single Nucleotide Polymorphisms (SNPs). The SNPs used to compute the risk score were collected from GWAS Central and the GWAS Catalog. Significant SNPs from the IC3 study were used as a benchmark. The results show that calculating a colorectal cancer risk score using reported significant markers from a different population group is possible. The p-value of our statistical model shows a significant difference between the risk scores of the case and control groups.
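The abstract does not state how the PRS is computed. A common formulation, assumed here, is an additive score: the sum over SNPs of an effect-size weight (for example a log odds ratio from a GWAS) multiplied by the individual's risk-allele dosage (0, 1, or 2). The SNP identifiers and weights below are hypothetical placeholders, not values from the study.

```python
def polygenic_risk_score(dosages: dict, weights: dict) -> float:
    """Additive PRS: sum of weight * risk-allele dosage over SNPs present in both inputs."""
    shared = dosages.keys() & weights.keys()  # ignore SNPs missing from either side
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical effect sizes (log odds ratios) for three placeholder SNPs.
weights = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": -0.05}

# Risk-allele counts (0, 1, or 2) for one hypothetical individual.
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(individual, weights)
print(round(score, 4))
```

Scores computed this way for each individual can then be compared between case and control groups with a standard statistical test.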
Adopting a deep learning model for bird sound classification tasks has become common practice for constructing a robust automated bird sound detection system. In this paper, we employ a four-layer Convolutional Neural Network (CNN) formulated to classify different species of Indonesian scops owls based on their vocal sounds. Two widely used representations of an acoustic signal, the log-scaled mel-spectrogram and the Mel Frequency Cepstral Coefficient (MFCC), are extracted from each sound file and fed into the network separately to compare the model performance with different inputs. A more complex CNN that can simultaneously process the two extracted acoustic representations is proposed to provide a direct comparison with the baseline model. The dual-input network is the best-performing model in our experiment, achieving 97.55% Mean Average Precision (MAP). Meanwhile, the baseline model achieves a MAP score of 94.36% for the mel-spectrogram input and 96.08% for the MFCC input.
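Both the mel-spectrogram and the MFCC rest on the mel frequency scale, which spaces filter bands roughly according to perceived pitch. The sketch below shows the conversion and the resulting band edges, assuming the widely used formula mel = 2595 * log10(1 + f / 700). The abstract does not say which mel variant, frequency range, or number of bands was used, so those values are illustrative.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Edge frequencies (Hz) of n_bands filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# Illustrative settings: 8 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 8)
print([round(e) for e in edges])
```

The edges are dense at low frequencies and sparse at high ones, which is the property that distinguishes a mel-spectrogram from a linear-frequency spectrogram.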
With increasing interest in the digitization of ancient manuscripts, ancient character recognition has become one of the most important areas in automated document image analysis. In this regard, we propose a Convolutional Neural Network (CNN)-based classifier to recognize ancient Sundanese characters obtained from a digital collection of Southeast Asian palm leaf manuscripts. In this work, we utilize two different preprocessing techniques for the dataset. The first technique uses geometric transformations, background noise addition, and brightness adjustment to augment the imbalanced samples fed into the classifier. The second technique uses Otsu's threshold method to binarize the characters and applies only the usual geometric transformations for data augmentation. The proposed network is trained on the training set and tested on the testing set under each data augmentation process. The classifier using image binarization from the second technique outperforms the one using the first technique, achieving a testing accuracy of 97.74%.
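Otsu's method, used in the second preprocessing technique, picks the gray-level threshold that maximizes the between-class variance of the two resulting pixel groups. The following is a minimal pure-Python sketch for 8-bit grayscale values given as a flat list. It illustrates the general method and is not the authors' implementation.

```python
def otsu_threshold(pixels: list, levels: int = 256) -> int:
    """Return the threshold t maximizing between-class variance (pixels <= t vs. > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # all pixels are at or below t; no further split possible
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: list, threshold: int) -> list:
    """Map each pixel to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Toy image: dark pixels around 10 and bright pixels around 200.
image = [10] * 50 + [200] * 50
t = otsu_threshold(image)
binary = binarize(image, t)
```

Whether ink maps to 0 or 1 after binarization depends on whether the characters are darker or lighter than the palm leaf background, so the comparison in `binarize` may need to be inverted for a given dataset.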