Digital technologies are gaining widespread acceptance in engineering and offer opportunities for collating and curating knowledge during and beyond the life cycle of engineering products. Knowledge is central to strategy and operations in most engineering organizations, and digital technologies have been employed in attempts to improve current knowledge management practices. A systematic literature review was undertaken to address the question: how do digital technologies influence knowledge management in the engineering sector? Twenty-seven primary studies were identified from 3097 papers on these topics within the engineering literature published between 2010 and 2022. Four knowledge management processes supported by digital technologies were recognized: knowledge creation, storage and retrieval, sharing, and application. In supporting knowledge management, digital technologies were found to act in five roles: repositories, transactive memory systems, communication spaces, boundary objects, and non-human actors. However, the ability of digital technologies to perform these roles simultaneously had not been considered, and, similarly, knowledge management had not been addressed as a holistic process. Hence, it was concluded that a holistic approach to knowledge management, combined with the deployment of digital technologies in multiple roles simultaneously, would likely yield significant competitive advantage and organizational value for organizations in the engineering sector.
Linked Open Data (LOD) is the largest collaborative, distributed, and publicly accessible Knowledge Graph (KG), uniformly encoded in the Resource Description Framework (RDF) and formally represented according to the semantics of the Web Ontology Language (OWL). LOD provides researchers with a unique opportunity to study knowledge engineering as an empirical science: to observe existing modelling practices and possibly to understand how to improve knowledge engineering methodologies and knowledge representation formalisms. Following this perspective, several studies have analysed LOD to identify (mis-)use of OWL constructs or other modelling phenomena, e.g., class or property usage, their alignment, or the average depth of taxonomies. A question that remains open is whether there is a relation between observed modelling practices and knowledge domains (natural science, linguistics, etc.): do certain practices or phenomena change as the knowledge domain varies? Answering this question requires an assessment of the domains covered by LOD as well as a classification of its datasets. Existing approaches to classifying LOD datasets provide partial and unaligned views, posing additional challenges. In this paper, we introduce a classification of knowledge domains and a method for classifying LOD datasets and ontologies based on it. We classify a large portion of LOD and investigate whether a set of observed phenomena have a domain-specific character.
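As an illustration of the kind of modelling phenomenon this abstract measures, the sketch below counts class usage in an RDF graph with rdflib. It is a minimal stand-in, not the paper's method; the tiny in-memory Turtle snippet and the ex: namespace are illustrative assumptions.

```python
from collections import Counter
from rdflib import Graph
from rdflib.namespace import RDF

# A tiny in-memory stand-in for a LOD dataset; a real analysis would
# parse a dump, e.g. g.parse("dataset.ttl", format="turtle").
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:rome a ex:City .
ex:paris a ex:City .
ex:dante a ex:Person .
""", format="turtle")

# Count how often each class appears as the object of rdf:type triples,
# one observable "class usage" statistic per dataset.
class_usage = Counter(o for _, _, o in g.triples((None, RDF.type, None)))
for cls, n in class_usage.most_common():
    print(cls, n)
```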
The automatic extraction of the chemical-disease relation (CDR) from text has become critical because extracting valuable CDRs manually takes a great deal of time and effort. Studies have shown that prior knowledge from biomedical knowledge bases is important for relation extraction, so methods that combine deep learning models with prior knowledge are worth studying. In this paper, we propose a new model called Knowledge-Guided Attention and Graph Convolutional Networks (KGAGN) for CDR extraction. First, to take full advantage of domain knowledge, we train entity embeddings as a feature representation of the input sequence, and relation embeddings to further capture weighted contextual information through the attention mechanism. Then, to take full advantage of syntactic dependency information in cross-sentence CDR extraction, we construct document-level syntactic dependency graphs and encode them using a graph convolutional network (GCN). Finally, the chemical-induced disease (CID) relation is extracted using weighted context features and long-range dependency features, both of which contain additional knowledge information. We evaluated our model on the CDR dataset published by the BioCreative-V community; it achieves an F1-score of 73.3%, surpassing other state-of-the-art methods. The code, implemented with the PyTorch 1.7.0 deep learning library, can be downloaded from GitHub: https://***/sunyi123/cdr.
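The sketch below shows one graph convolution layer over a document-level dependency graph in PyTorch, the kind of long-range encoder the abstract describes. It is not the authors' released code; the dimensions, toy adjacency, and mean-aggregation choice are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution over a token-level dependency graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_tokens, in_dim) token features;
        # adj: (num_tokens, num_tokens) dependency edges plus self-loops.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = adj @ x / deg          # mean aggregation over dependency neighbours
        return torch.relu(self.linear(h))

tokens = torch.randn(12, 64)       # toy token embeddings for one document
adj = torch.eye(12)                # self-loops; real edges come from a parser
adj[0, 3] = adj[3, 0] = 1.0        # e.g., one syntactic dependency edge
out = GCNLayer(64, 32)(tokens, adj)
print(out.shape)                   # torch.Size([12, 32])
```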
Although popularly used in big-data analytics, dimensionality reduction is a complex, black-box technique whose outcome is difficult to interpret and evaluate. In recent years, a number of quantitative and visual methods have been proposed for analyzing low-dimensional embeddings. On the one hand, quantitative methods associate numeric identifiers with qualitative characteristics of these embeddings; on the other hand, visual techniques allow users to interactively explore these embeddings and make decisions. However, in the former case, users have no control over the analysis, while in the latter case, assessment decisions depend entirely on the user's perception and expertise. To bridge the gap between the two, in this article we present VisExPreS, a visual interactive toolkit that enables a user-driven assessment of low-dimensional embeddings. VisExPreS is based on three novel techniques, namely PG-LAPS, PG-GAPS, and RepSubset, that generate interpretable explanations of the local and global structures preserved in embeddings. In the first two techniques, the VisExPreS system proactively guides users during every step of the analysis. We demonstrate the utility of VisExPreS in interpreting, analyzing, and evaluating embeddings from different dimensionality reduction algorithms using multiple case studies and an extensive user study.
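For flavor, the sketch below scores how well two embeddings preserve local structure using scikit-learn's trustworthiness metric. This is a generic quantitative check in the same spirit, not the paper's PG-LAPS/PG-GAPS/RepSubset techniques; the digits dataset and neighbor count are arbitrary assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X = load_digits().data
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}
for name, emb in embeddings.items():
    # Trustworthiness: how many low-D neighbours are true high-D neighbours.
    print(name, round(trustworthiness(X, emb, n_neighbors=10), 3))
```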
Multivariate time series data are pervasive across domains, ranging from data center supervision and e-commerce data to financial transactions. This kind of data presents an important challenge for anomaly detection due to the temporal dependency of its observations. In this article, we investigate the problem of unsupervised local anomaly detection in multivariate time series data from the perspectives of temporal modeling and residual analysis. Residual analysis has been shown to be effective in classical anomaly detection problems. However, it is a nontrivial task for multivariate time series, as the temporal dependency between observations complicates the residual modeling process. Methodologically, we propose a unified learning framework to characterize the residuals and their coherence with the temporal aspect of the whole multivariate time series. Experiments on real-world datasets show the effectiveness of the proposed algorithm.
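A minimal sketch of the residual-analysis idea, under the assumption of a simple one-step linear predictor rather than the authors' unified framework: fit a model of x_t from x_{t-1}, then flag steps whose residuals are unusually large. The synthetic random-walk data and 3-sigma threshold are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T, d = 500, 3
X = np.cumsum(rng.normal(size=(T, d)), axis=0)   # toy multivariate random walks
X[250] += 8.0                                    # injected local anomaly

model = LinearRegression().fit(X[:-1], X[1:])    # temporal model: x_t ~ x_{t-1}
resid = X[1:] - model.predict(X[:-1])            # residuals after temporal fit
score = np.linalg.norm(resid, axis=1)            # residual magnitude per step

threshold = score.mean() + 3 * score.std()
print("anomalous steps:", np.where(score > threshold)[0] + 1)
```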
Analysis pipelines commonly use high-level technologies that are popular when created but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open-source software. As a proof of concept, we introduce "Maneage" (managing data lineage), enabling cheap archiving, provenance extraction, and peer verification, which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed, and we conclude with the benefits for the various stakeholders. This article is itself a Maneage'd project (project commit 313db0b). Appendices: two comprehensive appendices that review the longevity of existing solutions are available as supplementary "Web extras" in the IEEE Computer Society Digital Library at http://***/10.1109/MCSE.2021.3072860. Reproducibility: all products are available in zenodo.4913277; the Git history of this paper's source is at ***/***, which is also archived in Software Heritage: swh:1:dir:33fea87068c1612daf011f161b97787b9a0df39f. Clicking on the SWHIDs in the digital format will provide more "context" for the same content.
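To make the "verifiable inputs and outputs" criterion concrete, here is a minimal sketch, not part of Maneage itself, that records a plain-text SHA-256 manifest for project files and verifies against it. The file names input.txt and checksums.txt are assumptions for the demo.

```python
import hashlib

def sha256(path):
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Record a plain-text checksum manifest for a toy input file...
with open("input.txt", "w") as f:
    f.write("raw data\n")
with open("checksums.txt", "w") as f:
    f.write(f"{sha256('input.txt')}  input.txt\n")

# ...then verify every tracked file against the manifest.
for line in open("checksums.txt"):
    digest, path = line.split(None, 1)
    status = "ok" if sha256(path.strip()) == digest else "MISMATCH"
    print(status, path.strip())
```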
Actionable behavioral rules suggest specific actions that may influence certain behavior in the stakeholders' best interest. In mining such rules, it was previously assumed that all attributes are categorical, with numerical attributes discretized in advance. However, this assumption significantly reduces the solution space and thus hinders the potential of mining algorithms, especially when numerical attributes are prevalent. As numerical data are ubiquitous in business applications, there is a crucial need for new mining methodologies that can better leverage such data. To meet this need, in this paper we define a new data mining problem, named behavior action mining, as a problem of continuous-variable optimization of the expected utility of an action. We then develop three approaches to solving this new problem, which use regression as their technical basis. Experimental results based on a marketing dataset demonstrate the validity and superiority of our proposed approaches.
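A hedged sketch of the regression-as-basis idea: learn expected utility as a function of continuous action variables, then search the action space for the best action. The synthetic two-variable "marketing" data, the gradient-boosting model, and the grid search are all stand-in assumptions, not the paper's three approaches.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
actions = rng.uniform(0, 10, size=(400, 2))      # e.g., discount %, contact rate
utility = (5 - (actions[:, 0] - 3) ** 2 - (actions[:, 1] - 7) ** 2
           + rng.normal(scale=0.5, size=400))    # noisy observed utility

# Regress expected utility on the continuous action variables.
model = GradientBoostingRegressor().fit(actions, utility)

# Optimize over the (continuous) action space via a dense grid search.
grid = np.stack(np.meshgrid(np.linspace(0, 10, 101),
                            np.linspace(0, 10, 101)), axis=-1).reshape(-1, 2)
pred = model.predict(grid)
best = grid[pred.argmax()]
print("recommended action:", best, "expected utility:", round(pred.max(), 2))
```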
The need to manage electronic documents is an open issue in the digital era. It becomes a challenging problem on the internet, where large amounts of data demand ever more efficient and effective methods and techniques for mining and representing information. In this context, document summarization, browsing processes, and visualization techniques have had a great impact on several dimensions of user information perception. At the same time, the use of ontologies for knowledge representation has grown rapidly in recent years across several application domains, together with social-based techniques such as tag clouds. This form of visualization is becoming particularly useful in the interaction between users and social applications, where huge amounts of data require effective and efficient interfaces. In this article, the authors propose a novel methodology based on a combination of ontologies and tag clouds for browsing and summarizing web document collections; they call this tool the Semantic Tag Cloud.
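For the tag-cloud half of the idea (the ontology-alignment step is omitted), a minimal sketch: weight candidate tags by frequency across a collection and map weights to display sizes. The toy documents and the 10pt-30pt size range are assumptions.

```python
from collections import Counter

docs = [
    "ontology driven document summarization",
    "tag cloud visualization of web documents",
    "ontology based browsing of document collections",
]
# Weight each candidate tag by its frequency across the collection.
counts = Counter(word for doc in docs for word in doc.split())

max_n = max(counts.values())
for tag, n in counts.most_common(8):
    size = 10 + 20 * n / max_n      # linear map from weight to font size
    print(f"{tag}: {size:.0f}pt")
```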
Many data exploration applications require the ability to identify the top-k results according to a scoring function. We study a class of top-k ranking problems where the top-k candidates in one dataset are scored with the assistance of another set. We call this class of workloads cross aggregate ranking. Example computation problems include evaluating the Hausdorff distance between two datasets, finding the medoid or radius within one dataset, and finding the closest or farthest pair between two datasets. In this paper, we propose a parallel and distributed solution to process cross aggregate ranking workloads. Our solution subdivides the aggregate score computation of each candidate into tasks while constantly maintaining the tentative top-k results as an uncertain top-k result set. The crux of our proposed approach lies in an entropy-based scheduling technique that determines result-yielding tasks based on their ability to reduce the uncertainty of the tentative result set. Experimental results show that our proposed approach consistently outperforms the best existing one in two different types of cross aggregate ranking workloads using real datasets.
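One of the abstract's example workloads, the Hausdorff distance, computed here naively with SciPy rather than via the paper's parallel, entropy-scheduled approach; the random point sets are toy assumptions.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 2))
B = rng.normal(loc=0.5, size=(300, 2))

# Hausdorff distance = max of the two directed distances; each directed
# distance is the largest "nearest neighbour in the other set" distance.
d = max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])
print("Hausdorff(A, B) =", round(d, 3))
```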
Someone enjoys listening to playlists while commuting. He wants a different playlist of n songs each day, always starting from Locked Out of Heaven, a Bruno Mars song. The list should progress in smooth transitions between successive, randomly selected songs until it ends at Stairway to Heaven, a Led Zeppelin song. The challenge of automatically generating random and heterogeneous playlists is to find the appropriate balance among several conflicting goals. We propose two methods for solving this problem. The first, ROPE, relies on a representation of the songs in a Euclidean space; it generates a random path through a Brownian bridge that connects any two songs selected by the user in this music space. The second, STRAW, constructs a graph representation of the music space in which nodes are songs and edges connect similar songs; it creates a playlist by traversing the graph through a steering random walk that starts at a selected song and is directed toward a target song, also selected by the user. Compared with state-of-the-art algorithms, ours are the only ones that satisfy all of the following quality constraints: heterogeneity, smooth transitions, novelty, scalability, and usability. We demonstrate the usefulness of the proposed algorithms by applying them to a large collection of songs and make a prototype available.
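A sketch in the spirit of ROPE (not the authors' prototype): pin a Brownian bridge between a start and end song in a toy 2-D "music space" and snap each waypoint to the nearest unused song. The random embeddings, playlist length, and noise scale are assumptions standing in for real song features.

```python
import numpy as np

rng = np.random.default_rng(3)
songs = rng.uniform(size=(1000, 2))     # toy song embeddings (the music space)
start, end = songs[0], songs[1]         # user-selected first and last songs

n, sigma = 10, 0.05
t = np.linspace(0, 1, n)[:, None]
steps = rng.normal(scale=sigma, size=(n - 1, 2))
W = np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])  # W(0) = 0
bridge = start + t * (end - start) + W - t * W[-1]           # pinned at both ends

# Snap each bridge waypoint to the nearest unused song: nearby waypoints
# yield nearby songs, giving the smooth transitions the paper targets.
playlist, used = [], set()
for p in bridge:
    order = np.argsort(np.linalg.norm(songs - p, axis=1))
    pick = int(next(i for i in order if i not in used))
    used.add(pick)
    playlist.append(pick)
print("playlist indices:", playlist)
```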