As an important research direction of computer vision, target detection has been widely used in face recognition, intelligent driving, robot navigation and other fields. In recent years, with the deepening research on...
The relation extraction (RE) in complex scenarios faces some challenges such as diverse relation types and ambiguous relations between entities within a single sentence, leading to the poor performance of pure "t...
ISBN: 9798350386226 (digital)
ISBN: 9798350386233 (print)
Proteins serve as the functional building blocks of life, facilitating critical tasks such as signaling, catalysis, and structural support in all living organisms. Designing proteins with targeted biological features or domains is therefore of great importance. Traditional wet-lab experiments are time-consuming and resource-intensive, which makes deep learning (DL) methods attractive alternatives. However, existing DL methods predominantly focus on generating new proteins within the same biological domain as the training data. They overlook scenarios in which designers want to combine proteins from different biological domains to create novel proteins with the features of both, which can be a better fit for practical purposes. To fill this gap, we present ComProtein, a novel framework that further exploits the potential of pre-trained protein large language models and is the first work that aims to generate innovative proteins with combined biological features from two different domains. Generation is performed by a cycle-consistent generative adversarial approach that leverages insights from the latent space. It enables the transformation of protein representations from one biological domain to another while preserving their intrinsic features. Additionally, to complement existing measures, we introduce new evaluation metrics, namely Shortest Target Neighbor Distance (STND), Mutual Root Mean Square Deviation (MRMSD), and Sequence Diversity (SD), for evaluating biological representations, protein structure, and sequence quality, respectively. Our experimental results demonstrate that the proposed method performs better and has great potential in biological representations, structure similarity, homology relationships, and sequence quality.
Online platforms have supported users in collaborating and communicating with each other distantly. Adopting online platforms interconnected with the virtual world, especially the metaverse, has fostered interactive a...
ISBN: 9798350385144 (digital)
ISBN: 9798350385151 (print)
Every year, astronomers from around the world submit research proposals to the Atacama Large Millimeter Array (ALMA), the largest radio telescope array in the world. The aim of the current work is to streamline the proposal process for astronomers submitting projects to ALMA by suggesting frequency ranges that may be relevant to their research, based on their proposal text. We introduce a pipeline of supervised and unsupervised machine learning models, each using various representations of the title and abstract of an incoming proposal. First, a logistic regression model filters out proposed projects that are not expected to need specific technical setups. Second, if a technical setup is deemed necessary, our pipeline assigns an incoming project to one of 50 "similar project" groups, defined by topics generated from Latent Dirichlet Allocation (LDA). Third, we apply Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to mine patterns in measurements ("areas of interest") made in previous projects, for each of the 50 "similar project" groups. In parallel with the topic modeling and HDBSCAN mining, we employ a Multinomial Naive Bayes classifier to predict the frequency band, that is, the broad frequency range defined by the technical limitations of ALMA, in which we expect a project to make measurements. Finally, we offer researchers a list of the mined "areas of interest" filtered by the predictions of the Multinomial Naive Bayes classifier. Ultimately, given a proposed project title and abstract, our pipeline generates several recommended "areas of interest" that one should consider measuring. Evaluating the performance of our models, we find that 67.17% of test projects match at least one of the recommended "areas of interest", with an average hit rate of 44.72% across measurements within each test project, when limiting to the top two band predictions. When we disregard band predictions, 88.81% of test projects match at least one recomm
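The two evaluation figures quoted in this abstract (the share of test projects matching at least one recommended "area of interest", and the average per-project hit rate) can be illustrated with a minimal sketch. The abstract does not define a "match" precisely, so the code below assumes a measurement matches when its frequency falls inside a recommended range; the `(low, high)` GHz tuple representation and the function names `hit_rate` and `summarize` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a recommended "area of interest" is a
# (low_GHz, high_GHz) interval. This is an assumption for illustration.
Range = Tuple[float, float]


def hit_rate(measurements: List[float], recommended: List[Range]) -> float:
    """Fraction of one project's measured frequencies that fall inside
    at least one recommended range (assumed definition of a 'hit')."""
    if not measurements:
        return 0.0
    hits = sum(
        any(low <= freq <= high for low, high in recommended)
        for freq in measurements
    )
    return hits / len(measurements)


def summarize(projects: List[Tuple[List[float], List[Range]]]) -> Tuple[float, float]:
    """Return (share of projects with at least one hit, mean hit rate).

    Each project is a pair: (measured frequencies, recommended ranges).
    """
    rates = [hit_rate(measured, recs) for measured, recs in projects]
    matched_share = sum(rate > 0 for rate in rates) / len(rates)
    mean_rate = sum(rates) / len(rates)
    return matched_share, mean_rate


# Toy data, invented purely to exercise the functions.
toy_projects = [
    ([100.0, 230.0], [(95.0, 105.0)]),   # one of two measurements matches
    ([345.0], [(200.0, 250.0)]),         # no measurement matches
]
matched_share, mean_rate = summarize(toy_projects)
```

On the toy data, one of the two projects has a matching measurement and the per-project hit rates are 0.5 and 0.0, so `summarize` returns a matched share of 0.5 and a mean hit rate of 0.25.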
IT controls in information systems play an important role for companies. One common control is the management and verification of daily logs. Text-based logs (keystrokes, communication history, and application information) are often used to verify that the system is operating properly. However, some systems can record only PC screenshot image logs, in order to prioritize stable operation. In such systems, checking the logs is time-consuming, which makes daily checks difficult. In addition, an auditor who wants to detect anomalous operations needs to know the correct operation of each system, which becomes very difficult when a large number of systems are targeted. In this study, we aim to convert user operations captured in PC screenshot images into graph structures and to use the features of those graph structures for anomaly detection. The proposed method groups image features from screenshot images based on similarity, transforms feature transitions into graph structures, and detects anomalous operations using graph autoencoder-based learning. We demonstrate that the proposed method can detect anomalous operations with a recall of over 70%.
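The step of transforming feature transitions into a graph structure can be sketched as follows. This is only an illustration under stated assumptions: each screenshot is assumed to have already been assigned a group label (the labels such as `"login"` are invented), and the unseen-edge check is a deliberately simple stand-in for the graph autoencoder-based learning the abstract describes, not a reproduction of it. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def build_transition_graph(states: List[Hashable]) -> Dict[Edge, int]:
    """Build a directed, weighted graph from a sequence of screen-group labels.

    Each consecutive pair (a, b) becomes an edge a -> b, and the weight
    counts how often that transition occurred.
    """
    return dict(Counter(zip(states, states[1:])))


def unseen_transitions(reference: Dict[Edge, int],
                       states: List[Hashable]) -> List[Edge]:
    """List transitions in `states` that never occur in the reference graph.

    Stand-in for a learned detector: the paper uses a graph autoencoder,
    which this simple lookup does not implement.
    """
    return [edge for edge in zip(states, states[1:]) if edge not in reference]


# Invented example sequences of screen-group labels.
normal_session = ["login", "menu", "report", "menu", "logout"]
graph = build_transition_graph(normal_session)

suspect_session = ["login", "report", "logout"]
flagged = unseen_transitions(graph, suspect_session)
```

Here `graph` holds four edges, each with weight 1, and `flagged` contains both transitions of the suspect session, because neither `login -> report` nor `report -> logout` appears in the reference graph.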
With the advancement of the Internet of Things (IoT) technologies, there has been a rapid increase in the volume of IoT data, leading to escalating costs in storage, transmission, and analytics. The benefits of conven...
Everyone depends on numerous sources of E-news in today's world when the internet is ubiquitous. Online content abounds, especially social media feeds, many of which are unreliable and may not always be factual. F...
This paper introduces a semi-supervised learning technique for model-based clustering. Our research focus is on applying it to matrices of ordered categorical response data, such as those obtained from the surveys wit...
ISBN: 9798331510831 (digital)
ISBN: 9798331510848 (print)
Recent advances in Large Language Models (LLMs) have enabled the semantic description of textures in natural language, aiming to capture them in richer detail. However, most methods either depend on supervised training with pairs of images and manually annotated visual attributes, which most texture datasets lack, or rely on Vision-Language Models (VLMs) such as CLIP. In this paper, we develop an encoder-agnostic Weakly supervised Texture Description Generator (WTDG) that employs a novel Scaled Ranked Kullback-Leibler divergence (SR-KL) loss between the image and text modalities. Within the SR-KL loss formulation, we leverage category information, which is always available as ground truth in all benchmark texture recognition datasets. We further extend the proposed WTDG to assist in texture recognition by using its generated texture descriptions. Thus, we develop a multimodal framework, called $\mathrm{Tex}^2$, which simultaneously generates texture descriptions and recognizes textures. Our approach exhibits promising performance in describing and recognizing textures on benchmark datasets.