We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into "cell sentences," C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when fine-tuned with C2S, can generate biologically valid cells from cell-type inputs and accurately predict cell types from cell sentences. This illustrates that language models, through C2S fine-tuning, can acquire a significant understanding of single-cell biology while maintaining robust text generation capabilities. C2S offers a flexible, accessible framework for integrating natural language processing with transcriptomics, using existing models and libraries for a wide range of biological applications. Copyright 2024 by the author(s).
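The abstract does not spell out how expression data becomes a "cell sentence." The following is a minimal sketch under one plausible assumption: gene names are listed in order of decreasing expression, and unexpressed genes are dropped. The function name `cell_to_sentence`, the `top_k` parameter, and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional


def cell_to_sentence(expression: Mapping[str, float], top_k: Optional[int] = None) -> str:
    """Turn one cell's expression profile into a space-separated "cell sentence".

    Assumption (not stated in the abstract): genes are ordered from highest to
    lowest expression, zero-expression genes are dropped, and ties are broken
    alphabetically so the result is deterministic.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression, then by gene name.
    expressed.sort(key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in expressed]
    if top_k is not None:
        genes = genes[:top_k]
    return " ".join(genes)


# Hypothetical counts for a single cell (illustrative values only).
example_cell = {"CD3E": 12.0, "MALAT1": 40.0, "GAPDH": 7.0, "CD19": 0.0, "ACTB": 12.0}
sentence = cell_to_sentence(example_cell, top_k=4)
print(sentence)  # MALAT1 ACTB CD3E GAPDH
```

A string of this form can be passed to a standard text tokenizer, which is what would let an off-the-shelf language model consume it without architectural changes.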
Learning activities are an indicator of the learner's desire to learn during the learning process. The pattern of learner action is related to learning activities. In this case, in extracting the learning process,...
Whether future AI models are fair, trustworthy, and aligned with the public’s interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data...
To determine whether Ohio college re-opening plans were effective in controlling the spread of COVID-19, cumulative case counts by county were gathered to compare various metrics related to the spread of COVID-19 cases between counties with NCAA colleges and counties without NCAA colleges. Various non-parametric statistical tests were used to determine if the samples were similar, and the analysis found the differences were statistically significant. Metropolitan and non-metropolitan groupings were also added to further subdivide the data set, but the analysis found no statistically significant differences in this case.
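The abstract does not name the non-parametric tests that were used. As one illustration of the kind of comparison described, the sketch below implements a Mann-Whitney U test (an assumed choice, not confirmed by the source) with a normal approximation for the two-sided p-value. The per-county values are invented for the example and are not the study's data.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of sample_a, two-sided p-value).

    Ties receive average ranks. The p-value uses the normal approximation
    without tie or continuity correction, so it is only rough for small samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_value


# Hypothetical cases per 1,000 residents (illustrative values only).
college_counties = [5.1, 6.3, 7.0, 6.8, 5.9]
other_counties = [3.2, 4.1, 3.9, 4.8, 4.4]
u_stat, p = mann_whitney_u(college_counties, other_counties)
print(u_stat, p)
```

With samples this small an exact test would be preferable; the approximation is kept here only to keep the sketch dependency-free.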
Laparoscopic surgery has transformed conventional open surgery. Robot-assisted laparoscopic surgery, which is minimally invasive, is effective for operations in limited space. Nevertheless, the robotic system which is u...
After Large-Scale Social Restrictions (PSBB) were established in Jakarta, citizens noted a change in air quality. Representatives of Indonesia's Agency for Meteorology, Climatology, and Geophysics (BMKG) ...
ISBN: (digital) 9798350359312
ISBN: (print) 9798350359329
This paper introduces SARA, a semantic-assisted reinforced active learning framework for enhancing entity alignment (EA) under limited supervision. SARA addresses the challenges of EA in real-world scenarios, including knowledge graph heterogeneity and limited training ground truth. By combining reinforced active learning with semantic information, SARA selects valuable entity pairs even when labeled data is limited. It uses a pair-wise language model based on Sentence-BERT to learn informative name embeddings that capture entity name semantics. These embeddings are combined with structural embeddings and trained using a novel semantic-assisted alignment loss. Extensive experiments on benchmark datasets and a real-world dataset demonstrate the superiority of SARA over existing approaches, particularly when labeled data is limited. The paper also provides insights into fine-tuning strategies, presents ablation studies, and conducts sensitivity analyses to validate the effectiveness of SARA.
In agricultural water research, the adoption of Internet of Things (IoT) technology has emerged as a pivotal approach for large-scale data collection. The availability of water of adequate quality is important for both domestic and industrial purposes. For domestic purposes, drinking water is distinguished from bathing water. In the palm oil industry, boiler filler is distinguished from additional process water (dilution water). Water quality can be assessed from turbidity and Total Dissolved Solids (TDS). Measuring these parameters separately and repeatedly with dedicated instruments requires significant energy, time, and cost. The primary objective of this research was to present a novel method for categorizing water quality using IoT sensor technology. The methodology used an integrated IoT water sensor system in conjunction with manual water categorization. The methods consist of (1) system design, (2) design and installation of the sensors and IoT-based microcontrollers, and (3) accuracy and precision testing against laboratory measurements. A dedicated sensor precision test gave an accuracy rate of 94.4% for the turbidity sensor and 97.5% for the TDS sensor. Notably, this approach successfully categorized drinking water, while other water types, including groundwater, water with tea, and water with coffee, yielded null categorization results.
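The abstract gives neither the categorization rule nor the accuracy formula. The sketch below shows one way such a rule-based categorizer and a sensor-versus-laboratory accuracy check could look. The threshold constants, the function names, and the definition of accuracy as 100% minus the absolute percentage error are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical limits for drinking water; the study's thresholds are not given.
MAX_TURBIDITY_NTU = 5.0
MAX_TDS_MG_PER_L = 500.0


def categorize_water(turbidity_ntu: float, tds_mg_per_l: float) -> Optional[str]:
    """Return "drinking water" if both readings are within the assumed limits.

    Returns None otherwise, mirroring the "null categorization" reported for
    samples that match no category.
    """
    if turbidity_ntu < 0 or tds_mg_per_l < 0:
        raise ValueError("sensor readings must be non-negative")
    if turbidity_ntu <= MAX_TURBIDITY_NTU and tds_mg_per_l <= MAX_TDS_MG_PER_L:
        return "drinking water"
    return None


def percent_accuracy(sensor_value: float, reference_value: float) -> float:
    """Accuracy as 100% minus the absolute percentage error (assumed definition)."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return 100.0 * (1.0 - abs(sensor_value - reference_value) / reference_value)


# Hypothetical readings (illustrative values only).
clear_sample = categorize_water(turbidity_ntu=1.2, tds_mg_per_l=180.0)
tea_sample = categorize_water(turbidity_ntu=35.0, tds_mg_per_l=420.0)
tds_accuracy = percent_accuracy(sensor_value=195.0, reference_value=200.0)
print(clear_sample, tea_sample, round(tds_accuracy, 1))
```

Returning `None` rather than a fallback label keeps unmatched samples visible, so they can be reviewed manually instead of being silently assigned to the wrong category.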
Cumulative COVID-19 case counts by county in Ohio were gathered and combined with population data from the Census Bureau and student enrollment by county from the Integrated Postsecondary Education Data System (IPEDS)...