With the exponential growth of biomedical knowledge in unstructured text repositories such as PubMed, it is imperative to establish a knowledge graph-style, efficiently searchable and targeted database that can support the information retrieval needs of researchers and clinicians. To mine knowledge from graph databases, most previous methods treat a triple in a graph (see Fig. 1) as the basic processing unit and embed its elements (i.e., drugs/chemicals, proteins/genes, and their interaction) as separate embedding matrices, which cannot capture the semantic correlation among triple elements. To remedy the loss of semantic correlation caused by disjoint embeddings, we propose a novel approach that learns triple embeddings by combining entities and interactions into a unified representation. Furthermore, traditional methods usually learn triple embeddings from scratch, so they cannot take advantage of the rich domain knowledge embedded in pre-trained models; this is also a significant reason why they cannot distinguish the differences implied by the same entity appearing in multiple-interaction triples. In this paper, we propose a novel fine-tuning based approach to learn better triple embeddings by creating weakly supervised signals from pre-trained knowledge graph embeddings. The method automatically samples triples from knowledge graphs and estimates their pairwise similarity from pre-trained embedding models. The triples are then fed pairwise into a Siamese-like neural architecture, where the triple representation is fine-tuned, bootstrapped by the triple similarity scores. Finally, we demonstrate that triple embeddings learned with our method can be readily applied to several downstream applications (e.g., triple classification and triple clustering). We evaluated the proposed method on two open-source drug-protein knowledge graphs constructed from PubMed abstracts, as provided by BioCreative. Our method achieves consistent improvement in both t
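A minimal sketch of the pairwise fine-tuning idea described in this abstract, assuming a PyTorch setup; the names (TripleEncoder, fine_tune_step) are illustrative, and the target similarity is assumed to come from some pre-trained KG embedding model (e.g., a TransE-style scorer), not from the authors' released code:

```python
# Hedged sketch: a Siamese-style fine-tuning step that pulls a pair of triples
# toward a target similarity score precomputed from pre-trained KG embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleEncoder(nn.Module):
    """Encodes a (head, interaction, tail) id triple into a single vector."""
    def __init__(self, num_entities, num_relations, dim=128):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        self.proj = nn.Linear(3 * dim, dim)  # fuse the three parts into one triple embedding

    def forward(self, head, rel, tail):
        x = torch.cat([self.ent(head), self.rel(rel), self.ent(tail)], dim=-1)
        return F.normalize(self.proj(x), dim=-1)

encoder = TripleEncoder(num_entities=10_000, num_relations=20)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def fine_tune_step(pair_a, pair_b, target_sim):
    """pair_a/pair_b: (head, rel, tail) LongTensors; target_sim: similarity
    score (same batch shape) taken from the pre-trained embedding model."""
    emb_a = encoder(*pair_a)
    emb_b = encoder(*pair_b)
    pred_sim = F.cosine_similarity(emb_a, emb_b, dim=-1)
    loss = F.mse_loss(pred_sim, target_sim)  # weak supervision from similarity scores
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The two branches share the same encoder weights, which is what makes the architecture Siamese-like; only the pairing of triples and the precomputed similarity targets differ between training examples.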
The implicitly coupled pressure-based algorithm is widely acknowledged for its superior convergence and robustness in solving incompressible flow problems. However, the increased expansion scale of equations and diffi...
Dear editor, Docker1), as a de-facto industry standard [1], enables the packaging of an application with all its dependencies and execution environment in a light-weight, self-contained unit, i.e., a container. By launching the container from a Docker image, developers can easily share the same operating system, libraries, and binaries [2]. As the configuration file, the Dockerfile plays an important role,
The Double Heterogeneous (DH) system, where fuel particles are randomly dispersed in the non-fissile matrix, is challenging for the reactor physics calculation. The Sanchez-Pomraning method accurately handles the DH s...
The growth of IoT and mobile devices has led to Mobile Crowdsensing (MCS), a cost-effective data collection method crucial for smart cities. While MCS outperforms wireless sensor networks, it may expose workers’ sensitive data, such as location and identity, in air quality monitoring. Traditional privacy-preserving techniques, such as location obfuscation and data perturbation, have inherent limitations in ensuring strong privacy protection. Moreover, the frequent uploading of numerical data during task execution requires a larger privacy budget, thereby increasing the risk of privacy leakage. To solve these problems, this paper proposes a key–value data collection scheme based on local differential privacy for air quality monitoring in smart cities. The proposed scheme aims to protect user privacy while ensuring data utility. It consists of two main phases: data collection and data prediction. During the data collection phase, workers locally perturb both the task location (key) and the sensed data (value), utilizing the correlation between keys and values to enhance data utility. The system subsequently aggregates the perturbed data and applies bias correction to ensure unbiased estimation. In the prediction phase, an exponential smoothing technique is introduced to mitigate the impact of privacy-preserving mechanisms on prediction accuracy. This method effectively reduces random fluctuations in the data, thereby enhancing the overall prediction performance. Experiments on real-world datasets show that the proposed scheme outperforms other privacy-preserving algorithms in efficiency while maintaining nearly the same prediction accuracy as non-privacy-preserving methods, effectively balancing privacy and data utility.
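A hedged sketch of the value-perturbation and smoothing steps this abstract describes, assuming the standard one-bit LDP mechanism for numerical values normalized to [-1, 1]; the function names and the smoothing factor are illustrative assumptions, not the paper's exact protocol (the paper additionally perturbs the key and exploits key-value correlation, which is omitted here):

```python
# Hedged sketch: worker-side one-bit perturbation of a sensed value, server-side
# unbiased aggregation, and exponential smoothing of the resulting time series.
import math
import random

def perturb_value(v, epsilon):
    """One-bit local perturbation of v in [-1, 1] (Duchi-style mechanism)."""
    p = (math.exp(epsilon) - 1) / (2 * math.exp(epsilon) + 2) * v + 0.5
    bit = 1 if random.random() < p else -1
    # scale so the report is an unbiased estimate of v
    scale = (math.exp(epsilon) + 1) / (math.exp(epsilon) - 1)
    return bit * scale

def aggregate(reports):
    """Server-side unbiased estimate of the mean sensed value for one key."""
    return sum(reports) / len(reports)

def exponential_smoothing(series, alpha=0.3):
    """Smooth the noisy per-round estimates before feeding them to the predictor."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out
```

Because each report is an unbiased estimate of the true value, averaging many reports converges to the true mean, and the smoothing pass only damps the residual random fluctuation introduced by the perturbation.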
A race condition is a common trigger for concurrency bugs. As a special case, a race condition can also occur across the kernel and user space, causing a double-fetch bug, a field that has received little research attention. In our work, we first analyzed real-world double-fetch bug cases and extracted two specific patterns for double-fetch bugs. Based on these patterns, we proposed an approach of multi-taint parallel tracking to detect double-fetch bugs. We also implemented a prototype called DFTracker (double-fetch bug tracker), and we evaluated it with our test suite. Our experiments demonstrated that it could effectively find all the double-fetch bugs in the test suite, including eight real-world cases, with no false negatives and few false positives. In addition, we tested it on the Linux kernel and found a new double-fetch bug. The execution overhead is approximately 2x for single-file cases and approximately 9x for the whole-kernel test, which is acceptable. To the best of the authors' knowledge, this work is the first to introduce multi-taint parallel tracking to double-fetch bug detection, an innovative method that is specific to double-fetch bug features, and it achieves better path coverage as well as lower runtime overhead than the widely used dynamic approaches.
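For readers unfamiliar with the bug class, here is a small illustration of the double-fetch pattern itself (the bug DFTracker targets, not its detection technique): a simulated "kernel" handler validates a user-supplied length on the first fetch but re-reads it for the actual copy, while a concurrent "user" thread enlarges it in the race window. The names and the event-based sequencing are purely illustrative and only make the race deterministic for demonstration:

```python
# Hedged illustration of the double-fetch pattern: the length is fetched twice
# from user-controlled memory, and the check and the use see different values.
import threading

user_memory = {"length": 8, "data": b"A" * 64}
KERNEL_BUF_SIZE = 16

fetched_once = threading.Event()
modified = threading.Event()

def kernel_handler():
    length = user_memory["length"]      # first fetch: used only for the size check
    assert length <= KERNEL_BUF_SIZE
    fetched_once.set()
    modified.wait()                     # deterministic stand-in for the race window
    length = user_memory["length"]      # second fetch: used for the copy (the bug)
    copied = user_memory["data"][:length]
    print(f"copied {len(copied)} bytes into a {KERNEL_BUF_SIZE}-byte buffer")

def malicious_user():
    fetched_once.wait()
    user_memory["length"] = 64          # change the value between the two fetches
    modified.set()

t1 = threading.Thread(target=kernel_handler)
t2 = threading.Thread(target=malicious_user)
t1.start(); t2.start(); t1.join(); t2.join()
```

In a real kernel the two fetches would be separate copy_from_user() calls on the same user pointer; the fix is to fetch once and reuse the validated value.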
With the rapid development of the satellite industry, the information transmission network based on communication satellites has gradually become a major and important part of the future satellite-ground integrated network. However, the low transmission efficiency of the satellite data relay-back mission has become a problem that currently constrains the construction of the system and needs to be solved urgently. Effectively planning the tasks of satellite-ground networking by reasonably scheduling resources is crucial for the efficient transmission of task data. In this paper, we aim to provide a task execution scheme that maximizes the profit of the networking task for satellite ground network planning considering feeding mode (SGNPFM). To solve the SGNPFM problem, a mixed-integer programming model with the objective of maximizing the gain of the link-building task is constructed, which considers various constraints of the satellite in the feed-switching mode. Based on the problem characteristics, we propose a distance similarity-based genetic optimization algorithm (DSGA), which considers the state characteristics between tasks and introduces a weighted Euclidean distance method to determine the similarity between tasks. To obtain more high-quality solutions, different similarity evaluation methods are designed to help the algorithm intelligently screen individuals. The DSGA also uses an adaptive crossover strategy based on the similarity mechanism, which guides the algorithm toward an efficient population search. In addition, a task scheduling algorithm considering the feed-switching mode is designed for decoding, so that the algorithm generates a high-quality scheme. The results of simulation experiments show that the DSGA can effectively solve the SGNPFM problem. Compared to other algorithms, the proposed algorithm not only obtains higher-quality planning schemes but also converges faster. The proposed algorithm improves data trans
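A small illustrative sketch of the weighted Euclidean distance similarity and the similarity-driven adaptive crossover mentioned in this abstract; the task feature vectors, weights, and probability bounds are assumptions for illustration, not the paper's actual parameters:

```python
# Hedged sketch: weighted Euclidean distance between two tasks' feature vectors,
# mapped to a similarity score, which then drives an adaptive crossover probability.
import math

def weighted_euclidean(task_a, task_b, weights):
    """task_a, task_b: equal-length feature vectors (e.g., start time, duration, priority)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(task_a, task_b, weights)))

def similarity(task_a, task_b, weights):
    """Map distance to a similarity score in (0, 1]."""
    return 1.0 / (1.0 + weighted_euclidean(task_a, task_b, weights))

def adaptive_crossover_prob(parent_a, parent_b, weights, p_min=0.4, p_max=0.9):
    """More dissimilar parents are crossed over more aggressively."""
    sim = similarity(parent_a, parent_b, weights)
    return p_min + (p_max - p_min) * (1.0 - sim)
```

The intuition is that crossing over two nearly identical individuals yields little new search information, so the crossover probability is raised as parent similarity falls.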
In large-scale distributed training, communication compression techniques are widely used to reduce the significant communication overhead caused by the frequent exchange of model parameters or gradients between train...
Reducing feature redundancy has shown beneficial effects for improving the accuracy of deep learning models, thus it is also indispensable for the models of unsupervised domain adaptation (UDA). Nevertheless, most rec...
User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate sid...