ISBN: 9783031425073; 9783031425080 (print)
Identifying the possible targets of a known compound from its molecular representation alone is one of the most important tasks in drug design and development. This work proposes a methodology for target identification using supervised machine learning. To predict drug binding targets, classification models across targets were constructed with the k-NN algorithm by integrating multiple data types. Two groups of descriptors are used: 1) Morgan fingerprints and 2) general molecular properties of interest. The k-NN classification models achieved a higher F1-score with the descriptors based on molecular properties of interest (0.7) than with the Morgan fingerprint descriptors (0.57) or the fusion of both (0.58).
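As a rough illustration of k-NN target classification over fingerprint-style descriptors, the sketch below represents each compound as a set of "on" bit indices and ranks neighbours by Tanimoto similarity. The abstract does not state the distance metric, the value of k, or the fingerprint parameters, so all of these are assumptions, and the toy fingerprints and target labels (`kinase`, `gpcr`) are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query: frozenset, train: list, k: int = 3) -> str:
    """Predict a target label by majority vote among the k most similar compounds."""
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (fingerprint bits, target label). Not taken from the paper.
train = [
    (frozenset({1, 2, 3, 4}), "kinase"),
    (frozenset({1, 2, 3, 5}), "kinase"),
    (frozenset({2, 3, 4, 6}), "kinase"),
    (frozenset({10, 11, 12}), "gpcr"),
    (frozenset({10, 11, 13}), "gpcr"),
    (frozenset({11, 12, 14}), "gpcr"),
]

prediction = knn_predict(frozenset({1, 2, 3, 6}), train, k=3)
print(prediction)
```

In practice the bit sets would come from a cheminformatics toolkit, and the molecular-property descriptors would need a numeric distance (for example Euclidean distance on scaled features) instead of Tanimoto similarity.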
ISBN: 9783031509582; 9783031509599 (print)
The paper compares the reduction of binary attributes in rough set theory (RST) with the reduction of unary, or dichotomous, attributes in formal concept analysis (FCA). We present some basics of both theories, together with a brief presentation of the elements of the theory of set spaces used in the paper as a platform for the comparison. We then deliver some results on binary attribute reduction in RST and attribute reduction in FCA. We characterize independence of sets of binary attributes in RST by complete algebras of sets completely generated by completely irredundant families of sets. Next, by means of complete algebras of sets and indiscernibility relations with respect to families of sets, we investigate some families of FCA-attributes. Finally, we present a formal context for which we prove that RST-binary attribute reduction and FCA-unary attribute reduction give the same results.
ISBN: 9783031490101; 9783031490118 (print)
A large number of legal and legislative documents are generated every year, with highly specialized content and significant repercussions on society. Besides being technical, the information produced is neither semantically standardized nor structured in format. Automating document analysis, categorization, search, and summarization is therefore essential. Named Entity Recognition (NER) is one of the tools with the potential to extract information from legal documents efficiently. This paper evaluates the state-of-the-art NER models BiLSTM+CRF and BERT+Fine-Tuning, trained on Portuguese corpora and fine-tuned for the legal and legislative domains. The obtained results (F1-scores of 83.17% and 88.27%) suggest that the BERT model is superior, achieving better average results.
ISBN: 9789819958368; 9789819958375 (print)
Training large neural networks on huge amounts of data using multiple Graphics Processing Units (GPUs) became widespread with the emergence of Deep Learning (DL) technology. Training is usually carried out in datacenters featuring multiple GPU clusters, which are shared amongst users. However, different GPU architectures co-exist on the market and differ in training performance. To maximise the utilisation of a GPU cluster, the scheduler plays an important role in managing the resources by dispatching jobs to the GPUs. An efficient scheduling strategy should take into account that the training performance of each GPU architecture varies across DL models. In this work, an original model-similarity-based scheduling policy is introduced that takes into account which GPU architectures match which DL models. The results show that using the model-similarity-based scheduling policy for distributed training of a DL model with a large batch size across multiple GPUs can reduce the makespan.
ISBN: 9783031414558; 9783031414565 (print)
Recently, Deep Learning (DL)-based unmixing techniques have gained popularity owing to the robust learning of Deep Neural Networks (DNNs). In particular, the Autoencoder (AE) model, as a baseline network for unmixing, performs well in Hyperspectral Unmixing (HU) by automatically learning a new representation and recovering the original data. However, patch-wise AE-based architectures, which incorporate both spectral and spatial information through convolutional filters, may blur the abundance maps because of the fixed kernel shape of the window used. To address this issue, we propose a novel methodology based on graph DL, called DNGAE. Unlike the pixel-wise or patch-wise Convolutional AE (CAE), our proposed method incorporates complementary spatial information based on graph spectral similarity. A neighborhood graph based on band correlations is first constructed. Then, our method attempts to aggregate similar spectra from the neighboring pixels of a target pixel. This leads to better quality of both the extracted endmembers and the abundances. Extensive experiments performed on two real HSI benchmarks confirm the effectiveness of our proposed method compared to other DL models.
ISBN: 9783031434204; 9783031434211 (print)
Abstractive text summarization has been of research interest for decades. Neural approaches, specifically recent transformer-based methods, have demonstrated promising performance in generating summaries with novel words and paraphrases. Despite generating more fluent summaries, these approaches may still select summary-worthy content poorly. In these methods, extractive content selection depends largely on the reference summary, with little to no focus on identifying the summary-worthy segments (SWORTS) in a reference-free setting. In this work, we leverage three metrics, namely informativeness, relevance, and redundancy, in selecting the SWORTS. We propose a novel topic-informed and reference-free method to rank the sentences in the source document by their importance. We demonstrate the effectiveness of SWORTS selection in different settings: fine-tuning, few-shot tuning, and zero-shot abstractive text summarization. We observe that self-training and cross-training a pre-trained model with SWORTS-selected data yields performance competitive with the pre-trained model. Furthermore, a small amount of SWORTS-selected data is sufficient for domain adaptation, compared with fine-tuning on the entire training dataset with no content selection. Compared with training a model on the source dataset with no content selection, we observe a significant reduction in the time required to train a model with SWORTS, which further underlines the importance of content selection for training an abstractive text summarization model.
ISBN: 9783031433603 (digital); 9783031433597; 9783031433603 (print)
Autonomous robots require well-trained Anomaly Detection systems to detect unexpected hazardous events in unknown deployment scenarios. Such systems are difficult to train when data is scarce. During deployment, large amounts of data are collected, but it is not apparent how to use that data efficiently and effectively. We propose to use Active Learning to select samples that improve the performance of the detection system. We benchmark 8 different query strategies, of which 2 are novel, using normalizing flows over image embeddings. While our results show that our approach has the best performance overall, choosing the right query strategy strongly depends on external factors.
In the quest for artificial general intelligence (AGI), this paper presents a proposal for a symbol-based narrow AGI that uses a problem-driven mechanism within a certain domain. Using a small set of seeded ontology roots, simplified sentences can be constructed with surprising characteristics. Problem-solving graphs with a limited depth are combined to form larger graphs.
ISBN: 9789819958368; 9789819958375 (print)
Software obfuscation is a method that complicates data structures and algorithms in software to prevent the software from being analyzed. This paper proposes a method to obfuscate loop structures in LLVM IR (LLVM abbreviates "Low Level Virtual Machine" and IR abbreviates "Intermediate Representation") by applying a fixed-point combinator from the lambda calculus. LLVM IR is the intermediate representation used in LLVM. The purpose of using an intermediate representation in this paper is to handle multiple programming languages and architectures. This paper also evaluates the proposed obfuscation method experimentally. The experiments use loop programs, created artificially for this paper, and a practical program. The results show that the proposed method can obfuscate programs, but the execution times of the loop programs increase when they are obfuscated by the proposed method. However, the execution times of the practical programs do not increase under the proposed obfuscation. It follows that the proposed method is usable as an obfuscation method for practical programs.
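To make the idea of replacing a loop with a fixed-point combinator concrete, here is a minimal sketch in Python, not LLVM IR. It uses the Z combinator (the strict-evaluation variant of the Y combinator) to express a summation loop without any loop construct or named recursion. The names `sum_loop`, `step`, and `sum_fix` are illustrative assumptions; the abstract does not describe the paper's actual transformation at this level of detail.

```python
# Z combinator: a fixed-point combinator that works under strict (eager) evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


def sum_loop(n: int) -> int:
    """Plain loop: sum of the integers 1..n."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


# The same computation with the loop hidden behind a fixed point.
# The loop state is a pair (remaining, total); `self` is the "next iteration".
step = lambda self: lambda state: (
    state[1] if state[0] == 0 else self((state[0] - 1, state[1] + state[0]))
)
sum_fix = lambda n: Z(step)((n, 0))

print(sum_loop(10), sum_fix(10))
```

Each iteration in `sum_fix` costs several extra function calls, which is consistent with the abstract's observation that loop-heavy programs slow down after obfuscation. In Python the call depth also limits `n` to small values.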
ISBN: 9783031333767; 9783031333774 (print)
Offline logged data is common in many web applications such as recommendation and Internet advertising, and it offers great potential to improve online decision making. Utilizing offline logged data for online decision making is non-trivial, because the logged data is observational and may mislead online decisions. VirUCB is one of the latest notable algorithmic frameworks in this line of research. This paper studies how to extend VirUCB from upper confidence bound (UCB) based online decision making to Thompson sampling based online decision making, with the aim of improving online decision accuracy. We first show that naively applying Thompson sampling to the VirUCB framework is not effective, and we reveal fundamental insights into why. Based on these insights, we design a filtering algorithm to filter out the logged data corresponding to the optimal arm. To address the challenge that the optimal arm is unknown, we estimate it through the posterior of the reward mean. Putting these together, we obtain our VirTS-DF algorithm. Extensive experiments on two real-world datasets validate the superior performance of VirTS-DF.
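For readers unfamiliar with Thompson sampling, the sketch below shows the standard Beta-Bernoulli version for a multi-armed bandit. It is not VirUCB or VirTS-DF: it uses no logged data and no filtering step. The arm means, the uniform Beta(1, 1) priors, and the function name `thompson_sampling` are assumptions chosen only for illustration.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on simulated arms.

    Returns the per-arm pull counts and the final Beta posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1.0] * n_arms  # 1 + number of observed successes per arm
    beta = [1.0] * n_arms   # 1 + number of observed failures per arm
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible mean per arm from its posterior, then act greedily on the draws.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward; in a real system this comes from the environment.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
print(pulls, posterior_means)
```

The posterior mean of each arm, `alpha / (alpha + beta)`, is the kind of quantity the abstract refers to when it estimates the unknown optimal arm through the posterior of the reward mean.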