With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinformatics and has successfully addressed many historical challenges, such as pre-training frameworks, model evaluation and *** demonstrate notable proficiency in managing large-scale, unlabeled datasets, because experimental procedures are costly and labor *** various downstream tasks, FMs have consistently achieved noteworthy results, demonstrating high levels of accuracy in representing biological entities. A new era in computational biology has been ushered in by the application of FMs, focusing on both general and specific biological *** this review, we introduce recent advancements in bioinformatics FMs employed in a variety of downstream tasks, including genomics, transcriptomics, proteomics, drug discovery and single-cell *** aim is to assist scientists in selecting appropriate FMs in bioinformatics, according to four model types: language FMs, vision FMs, graph FMs and multimodal *** addition to understanding molecular landscapes, AI technology can establish the theoretical and practical foundation for continued innovation in molecular biology.
We construct a Chinese Economic Event Treebank (CEETB), focusing on revealing economic and financial events and their relations. Investigating economic event relations will benefit academic research and practice not just in economics but in many other scientific areas. The characteristics of economic texts (e.g., abundant long enterprise names and terms) and of the Chinese language (e.g., component ellipsis in long sentences) make event relation extraction challenging. Existing Chinese corpora containing economic event relations mainly focus on finance (e.g., the equity market) and cover only a few event types. To support research involving economic text analysis in Chinese, our CEETB is constructed following a carefully designed process. First, based on practical and research requirements, we summarize nine types of event relations and four types of component ellipsis in economic texts. Then, an annotation scheme is presented to make the annotation model, strategy, and process transparent, followed by statistical analysis and quality evaluation of the CEETB corpus. Finally, to demonstrate the strengths of the constructed corpus in practical applications, we conduct experiments on five SOTA models for event relation extraction.
ISBN: (print) 1577358872
Existing event detection models share three limitations: (1) insufficient consideration of the structures between dependency relations, (2) limited exploration of directed-edge semantics, and (3) difficulty in strengthening the core arguments of events. To tackle these problems, we propose a dependency structure-enhanced event detection framework. In addition to the traditional token dependency parsing tree, denoted as TDG, our model treats the dependency edges in it as new nodes and constructs a dependency relation graph (DRG). DRG allows the embedding representations of dependency relations to be updated as nodes rather than edges in a graph neural network. Moreover, the levels of core argument nodes in the two graphs are adjusted according to the dependency relation types in TDG to enhance their status. Subsequently, the two graphs are further encoded and jointly trained in graph attention networks (GAT). Importantly, we design an interaction strategy of node embedding for the two graphs and refine the computation of the attention coefficients to encode the semantic meaning of directed edges. Extensive experiments are conducted to validate the effectiveness of our method, and the results confirm its superiority over the state-of-the-art baselines. Our model outperforms the best benchmark, increasing the F1 score by 3.5 and 3.4 percentage points on the ACE2005 English and Chinese corpora, respectively.
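The idea of treating dependency edges as nodes can be illustrated with a small sketch. It assumes that two relation nodes are linked whenever their underlying edges share a token (a line-graph style construction); the abstract does not state the actual linking rule, and the function name `build_relation_graph` and the toy parse are hypothetical.

```python
from itertools import combinations


def build_relation_graph(dep_edges):
    """Turn dependency edges into the nodes of a relation graph.

    dep_edges: list of (head, dependent, relation) triples from a parse tree.
    Returns (nodes, links). Each node is one triple; a link (i, j) joins two
    relation nodes whose edges share a token (an assumed rule, see lead-in).
    """
    nodes = list(dep_edges)
    links = set()
    for i, j in combinations(range(len(nodes)), 2):
        tokens_i = set(nodes[i][:2])  # head and dependent of edge i
        tokens_j = set(nodes[j][:2])  # head and dependent of edge j
        if tokens_i & tokens_j:
            links.add((i, j))
    return nodes, links


# Toy parse of "Troops attacked the city" (hypothetical example).
edges = [
    ("attacked", "Troops", "nsubj"),
    ("attacked", "city", "obj"),
    ("city", "the", "det"),
]
nodes, links = build_relation_graph(edges)
```

Here `nsubj` and `obj` become neighbours because both attach to "attacked", so a graph neural network could pass messages between relation embeddings directly instead of storing them only as edge labels.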
Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
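For readers unfamiliar with the two sampling schemes named above, the following is a minimal sketch of their textbook forms (Bernoulli sampling and reservoir sampling, Algorithm R). How the paper's ranker actually applies them is not specified in the abstract; the function names and the toy comparison stream are assumptions made for illustration.

```python
import random


def bernoulli_sample(stream, p, rng):
    """Keep each item independently with probability p."""
    return [item for item in stream if rng.random() < p]


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The n-th item replaces a stored one with probability k / n.
            slot = rng.randrange(n)  # uniform integer in [0, n)
            if slot < k:
                reservoir[slot] = item
    return reservoir


rng = random.Random(0)
# Hypothetical stream of (winner, loser) pairwise comparisons.
comparisons = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a"), ("b", "a")]
kept = reservoir_sample(comparisons, 3, rng)
```

Both schemes decide whether to keep an item without looking at its content, which is one way to see why fabricated comparisons injected into the stream are retained at the same rate as genuine ones.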
This paper presents a Scientific Literature Management Platform (SLMP, demo link1 ) based on large language models (LLMs). The platform consists of four modules: literature management, literature extraction, literatur...
ISBN: (print) 9781959429722
We solve the challenging document-level event extraction problem by proposing a joint extraction methodology that avoids the inefficiency and error propagation of classic pipeline methods. Essentially, we address three crucial limitations in existing studies. First, the autoregressive strategy of path expansion relies heavily on the order of argument roles. Second, the number of events in a document must be specified in advance. Last, unexpected errors often arise when decoding events from the entity-entity adjacency matrix. This paper designs a Token-Token Bidirectional Event Completed Graph (TT-BECG) in which the relation eType-Role1-Role2 serves as the edge type, precisely revealing which tokens play argument roles in an event of a specific event type. Exploiting the token-token adjacency matrix of the TT-BECG, we develop an edge-enhanced joint document-level event extraction model. Guided by the target token-token adjacency matrix, the predicted token-token adjacency matrix is obtained during model training. Then, the event records in a document are decoded from the predicted matrix, including the graph structure and edge-type decoding. Extensive experiments are conducted on two public datasets, and the results confirm the effectiveness of our method and its superiority over the state-of-the-art baselines.
Reference point-based environmental selection has achieved promising performance on multi-objective optimization problems. However, its performance suffers on irregular multi-objective optimization problems. This is because an irregular Pareto front is often degenerate, disconnected, inverted, or sharp-tailed, so some reference points are not located in an appropriate region, which weakens the selection pressure. Therefore, adjusting or generating some points is necessary to tackle this problem. The open questions are how to identify the region of interest and how to generate new points in the appropriate region. In this paper, a region-based reconstruction of reference points is proposed. For simplicity, the smallest region formed by M reference points (where M is the dimension of the objective space) in the hyperplane of reference points is identified as the unit region. If the vertices of a region are all active reference points, the region is identified as a region of interest and new reference points are reconstructed in it. In addition, the process is activated in the later stage of the run, when the efficiency of the search algorithm is weak. To find more valuable individuals in the neighborhood of selected individuals, the firefly algorithm is employed as the search algorithm because its search mechanism has strongly indicative features. Several experiments are designed to verify the performance of the proposed method. The experimental results show that the proposed method is effective.
Spreadsheets contain a lot of valuable data and have many practical *** key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets, e.g., identifying cell function types and discovering relationships between cell *** existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells. A few studies do, but they ignore the layout structure information of spreadsheets, which affects the performance of cell function classification and the discovery of different relationship types of cell *** this paper, we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets (HUSS). Specifically, for improving the cell function classification, we propose an error correction mechanism (ECM) based on an existing cell function classification model [11] and the layout features of *** improving the table structure analysis, we propose five types of heuristic rules to extract four different types of cell pairs, based on the cell style and spatial location *** experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms corresponding baselines.
Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security of such algorithms, in contrast to the extensive research on their computational and statistical characteristics. Driven by huge profit, a potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of rank aggregation methods is not well studied in the literature. To fully understand the possible risks, in this paper we focus on a purposeful adversary who seeks to dictate the aggregated results by modifying the pairwise data. From the perspective of dynamical systems, the attack behavior with a target ranking list is a fixed point of the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators, for which a Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary has complete information. Notably, the proposed methods also allow an adversary with only incomplete information or imperfect feedback to perform the purposeful attack. The effectiveness of the suggested targeted attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the one designated by the adversary.
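To make the attack surface concrete, here is a minimal sketch of a simplified Rank Centrality scorer together with a naive data injection that changes the leading candidate. This is not the paper's attack procedure, which the abstract does not detail; the function name `rank_centrality`, the iteration count, and the toy data are assumptions made for illustration.

```python
def rank_centrality(n_items, comparisons, iters=2000):
    """Score items from pairwise comparisons given as (winner, loser) pairs.

    Builds a random walk that moves from i to j in proportion to how often
    j beat i, then approximates its stationary distribution by power iteration.
    """
    wins = [[0] * n_items for _ in range(n_items)]
    for winner, loser in comparisons:
        wins[winner][loser] += 1

    # Largest number of distinct opponents; normalizes rows of the walk.
    d_max = max(
        sum(1 for j in range(n_items) if wins[i][j] + wins[j][i] > 0)
        for i in range(n_items)
    )
    P = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                P[i][j] = wins[j][i] / total / d_max
        P[i][i] = 1.0 - sum(P[i])  # diagonal is still 0.0 at this point

    pi = [1.0 / n_items] * n_items
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n_items)) for j in range(n_items)]
    return pi


# Toy data: item 0 usually beats 1 and 2, and item 1 usually beats 2.
clean_data = (
    [(0, 1)] * 3 + [(1, 0)] + [(0, 2)] * 3 + [(2, 0)] + [(1, 2)] * 3 + [(2, 1)]
)
# Naive manipulation (assumed): inject fabricated wins for item 2.
poisoned_data = clean_data + [(2, 0)] * 20 + [(2, 1)] * 20

clean = rank_centrality(3, clean_data)
poisoned = rank_centrality(3, poisoned_data)
leader_clean = max(range(3), key=lambda i: clean[i])
leader_poisoned = max(range(3), key=lambda i: poisoned[i])
```

On this toy data the leader moves from item 0 to item 2 after the injection. A real adversary would aim for the same outcome with far fewer and less conspicuous modifications.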
Cross-lingual image captioning, with its ability to caption an unlabeled image in a target language other than English, is an emerging topic in the multimedia field. In order to save the precious human resource from r...