Linked Open Data (LOD) is the largest collaborative, distributed, and publicly accessible Knowledge Graph (KG), uniformly encoded in the Resource Description Framework (RDF) and formally represented according to the semantics of the Web Ontology Language (OWL). LOD provides researchers with a unique opportunity to study knowledge engineering as an empirical science: to observe existing modelling practices and possibly to understand how to improve knowledge engineering methodologies and knowledge representation formalisms. Following this perspective, several studies have analysed LOD to identify (mis-)use of OWL constructs or other modelling phenomena, e.g., class or property usage, their alignment, or the average depth of taxonomies. A question that remains open is whether there is a relation between observed modelling practices and knowledge domains (natural science, linguistics, etc.): do certain practices or phenomena change as the knowledge domain varies? Answering this question requires an assessment of the domains covered by LOD as well as a classification of its datasets. Existing approaches to classifying LOD datasets provide partial and unaligned views, posing additional challenges. In this paper, we introduce a classification of knowledge domains, and a method for classifying LOD datasets and ontologies based on it. We classify a large portion of LOD and investigate whether a set of observed phenomena have a domain-specific character.
The task of automatically detecting emotion in text is challenging. This is because, most of the time, textual expressions of affect are not direct (using emotion words) but result from the interpretation and assessment of the meaning of the concepts, and the interaction of concepts, described in the text. This paper presents the core of EmotiNet, a new knowledge base (KB) for representing and storing affective reactions to real-life contexts, and the methodology employed in designing, populating, and evaluating it. The basis of the design process is a set of self-reported affective situations in the International Survey on Emotion Antecedents and Reactions (ISEAR) corpus. We cluster the examples and extract triples using Semantic Roles. We subsequently extend our model using other resources, such as VerbOcean, ConceptNet, and SentiWordNet, with the aim of generalizing the knowledge contained. Finally, we evaluate the approach using the representations of other examples in the ISEAR corpus. We conclude that EmotiNet, although limited by the domain and the small quantity of knowledge it presently contains, represents a semantic resource appropriate for capturing and storing the structure and semantics of real events and for predicting the emotional responses triggered by chains of actions.
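The triple-based representation described above can be sketched as a toy store mapping (actor, action, object) triples to triggered emotions. The schema, the entries, and the last-recognized-action inference rule below are all illustrative simplifications, not EmotiNet's actual model:

```python
# Minimal sketch of an EmotiNet-style store (hypothetical schema):
# action triples (actor, action, object) mapped to a triggered emotion.
kb = {
    ("person", "fail", "exam"): "sadness",
    ("person", "win", "prize"): "joy",
    ("dog", "bite", "person"): "fear",
}

def predict_emotion(chain):
    """Return the emotion of the last recognized action in a chain of
    actions, a simple stand-in for chain-based emotional inference."""
    emotions = [kb[t] for t in chain if t in kb]
    return emotions[-1] if emotions else "unknown"

# An unknown action followed by a known one: the known triple decides.
print(predict_emotion([("person", "study", "exam"),
                       ("person", "fail", "exam")]))  # sadness
```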
Many data exploration applications require the ability to identify the top-k results according to a scoring function. We study a class of top-k ranking problems where top-k candidates in a dataset are scored with the assistance of another set. We call this class of workloads cross aggregate ranking. Example computation problems include evaluating the Hausdorff distance between two datasets, finding the medoid or radius within one dataset, and finding the closest or farthest pair between two datasets. In this paper, we propose a parallel and distributed solution to process cross aggregate ranking workloads. Our solution subdivides the aggregate score computation of each candidate into tasks while constantly maintaining the tentative top-k results as an uncertain top-k result set. The crux of our proposed approach lies in our entropy-based scheduling technique, which determines result-yielding tasks based on their abilities to reduce the uncertainty of the tentative result set. Experimental results show that our proposed approach consistently outperforms the best existing one in two different types of cross aggregate ranking workloads using real datasets.
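One member of this workload class, the directed Hausdorff distance, makes the "scored with the assistance of another set" idea concrete: each candidate's aggregate score is its nearest-neighbour distance to the other set, and the answer is the maximum such score. A minimal sequential sketch (the point sets are made up, and this brute-force loop is only the score definition, not the paper's parallel, entropy-scheduled algorithm):

```python
import math

def directed_hausdorff(a, b):
    """Directed Hausdorff distance: for each point in a, take its
    nearest-neighbour distance to b, then keep the maximum."""
    return max(
        min(math.dist(p, q) for q in b)  # aggregate score of candidate p
        for p in a
    )

# Toy datasets (illustrative only).
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(0.0, 0.1), (1.1, 0.0)]

print(directed_hausdorff(A, B))  # 0.9 (from the point (0.0, 1.0))
```

Each outer iteration is an independent aggregate score computation, which is what makes the per-candidate task subdivision in the paper natural.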
In this paper, we present the design of the ONDINE system, which allows the loading and querying of a data warehouse opened onto the Web, guided by an Ontological and Terminological Resource (OTR). The data warehouse, composed of data tables extracted from Web documents, has been built to supplement existing local data sources. First, we present the main steps of our semiautomatic method for annotating data tables driven by an OTR. The output of this method is an XML/RDF data warehouse composed of XML documents representing data tables with their fuzzy RDF annotations. We then present our flexible querying system, which allows the local data sources and the data warehouse to be simultaneously and uniformly queried using the OTR. This system relies on SPARQL and allows approximate answers to be retrieved by comparing preferences expressed as fuzzy sets with the fuzzy RDF annotations.
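One standard way to compare a fuzzy preference with a fuzzy annotation, and a plausible reading of the flexible-querying step (not necessarily ONDINE's exact measure), is the possibility degree: the supremum over the domain of the minimum of the two membership functions. A sketch over discrete fuzzy sets, with made-up values:

```python
def possibility(preference, annotation):
    """Possibility degree that a fuzzy annotation matches a fuzzy
    preference: sup over the shared domain of min(membership)."""
    domain = set(preference) | set(annotation)
    return max(min(preference.get(v, 0.0), annotation.get(v, 0.0))
               for v in domain)

# Hypothetical discrete fuzzy sets over a "temperature" attribute.
prefers_warm = {"20C": 0.3, "25C": 1.0, "30C": 0.7}
annotated    = {"25C": 0.6, "30C": 1.0}

print(possibility(prefers_warm, annotated))  # 0.7, attained at 30C
```

Answers can then be ranked by this degree, yielding the approximate (rather than all-or-nothing) matching the abstract describes.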
Digital technologies are gaining widespread acceptance in engineering and offer opportunities for collating and curating knowledge during and beyond the life cycle of engineering products. Knowledge is central to strategy and operations in most engineering organizations, and digital technologies have been employed in attempts to improve current knowledge management practices. A systematic literature review was undertaken to address the question: how do digital technologies influence knowledge management in the engineering sector? Twenty-seven primary studies were identified from 3097 papers on these topics within the engineering literature published between 2010 and 2022. Four knowledge management processes supported by digital technologies were recognized: knowledge creation, storage and retrieval, sharing, and application. In supporting knowledge management, digital technologies were found to act in five roles: repositories, transactive memory systems, communication spaces, boundary objects, and non-human actors. However, the ability of digital technologies to perform these roles simultaneously had not been considered, and, similarly, knowledge management had not been addressed as a holistic process. Hence, it was concluded that a holistic approach to knowledge management, combined with the deployment of digital technologies in multiple roles simultaneously, would likely yield significant competitive advantage and organizational value for organizations in the engineering sector.
In a competitive market, business process improvement is a requirement for any organization. This improvement can only be achieved with the support of comprehensive systems that fully monitor business processes. We propose an occurrence-based approach to business process monitoring that provides a holistic perspective of system dynamics, lending support to evolution aspects. More specifically, we present a three-dimensional artifact, called Occurrence, in which structure, behavior, and guidance are considered simultaneously. Based on this, we define more complex structures, namely Occurrence Base and Occurrence Management System, which serve as scaffolding to develop business process monitoring systems. We also present a specific occurrence-based design strategy that, using an MDE approach, has been applied by our research group for the development of successful monitoring applications.
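The three-dimensional idea can be sketched as one object holding structure, behavior, and guidance together; the field names, the event log, and the rule mechanism below are illustrative assumptions, not the paper's actual Occurrence artifact:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Occurrence:
    """Sketch of a three-dimensional monitoring artifact: structure
    (what is monitored), behavior (observed events), and guidance
    (checks that steer the process). All names are hypothetical."""
    structure: dict
    behavior: List[str] = field(default_factory=list)
    guidance: List[Callable[["Occurrence"], bool]] = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an observed event to the behavioral dimension."""
        self.behavior.append(event)

    def check(self) -> List[bool]:
        """Evaluate every guidance rule against the current state."""
        return [rule(self) for rule in self.guidance]

# Hypothetical usage: monitor an order-fulfilment process.
order = Occurrence(structure={"process": "order-fulfilment"})
order.guidance.append(lambda o: "shipped" in o.behavior)
order.record("paid")
order.record("shipped")
print(order.check())  # [True]
```

Keeping the three dimensions in one artifact is what lets a monitoring system reason about structure and behavior at the same time, as the abstract emphasizes.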
Many emerging database applications entail sophisticated graph-based query manipulation, predominantly evident in large-scale scientific applications. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitively expensive time complexity associated with exact subgraph isomorphism techniques has limited their efficacy in the application domain, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public domain databases are noisy and incomplete in nature, inexact graph matching techniques have proven to be more promising in terms of inferring knowledge from numerous structural data repositories. In this paper, we propose a novel technique called TraM for approximate graph matching that off-loads a significant amount of its processing onto the database, making the approach viable for large graphs. Moreover, the vector space embedding of the graphs and efficient filtration of the search space enable computation of approximate graph similarity at a throw-away cost. We annotate nodes of the query graphs by means of their global topological properties and compare them with neighborhood-biased segments of the data graph for proper matches. We have conducted experiments on several real datasets and have demonstrated the effectiveness and efficiency of the proposed method.
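The node-annotation idea can be sketched with plain degree as the global topological property and a brute-force mapping search. Both choices are stand-ins: TraM's actual signatures, vector-space embedding, and search-space filtration are not reproduced here.

```python
from itertools import permutations

def degree_signature(adj):
    """Annotate each node with a simple global topological property:
    its degree (a stand-in for richer topological signatures)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def mapping_cost(query_adj, data_adj, mapping):
    """Cost of mapping query nodes onto data nodes: sum of absolute
    degree differences (lower = better approximate match)."""
    qs, ds = degree_signature(query_adj), degree_signature(data_adj)
    return sum(abs(qs[q] - ds[mapping[q]]) for q in query_adj)

def best_match(query_adj, data_adj):
    """Exhaustive search over injective node mappings; illustrative
    only, since real systems filter the search space instead."""
    qnodes, dnodes = list(query_adj), list(data_adj)
    best = None
    for perm in permutations(dnodes, len(qnodes)):
        m = dict(zip(qnodes, perm))
        cost = mapping_cost(query_adj, data_adj, m)
        if best is None or cost < best[1]:
            best = (m, cost)
    return best

query = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}               # path a-b-c
data  = {"x": {"y"}, "y": {"x", "z", "w"}, "z": {"y"}, "w": {"y"}}  # star on y
mapping, cost = best_match(query, data)
print(cost)  # 1: the query hub b (degree 2) maps to y (degree 3)
```

Because signatures are compared instead of testing subgraph isomorphism exactly, near matches in noisy data still score well, which is the point of inexact matching.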
Actionable behavioral rules suggest specific actions that may influence certain behavior in the stakeholders' best interest. In mining such rules, it was previously assumed that all attributes are categorical, with numerical attributes discretized in advance. However, this assumption significantly reduces the solution space and thus hinders the potential of mining algorithms, especially when numerical attributes are prevalent. As numerical data are ubiquitous in business applications, there is a crucial need for new mining methodologies that can better leverage such data. To meet this need, in this paper we define a new data mining problem, named behavior action mining, as a problem of continuous-variable optimization of the expected utility of an action. We then develop three approaches to solving this new problem, which use regression as their technical basis. The experimental results based on a marketing dataset demonstrate the validity and superiority of our proposed approaches.
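The reformulation can be illustrated with a tiny sketch: fit a regression of the behavioral response on one numerical action attribute, then choose the action value that maximizes expected utility (benefit of the predicted response minus cost of the action). The linear model, the marketing numbers, and the utility shape are all hypothetical, not the paper's three approaches:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b over one numerical
    action attribute (e.g. discount level -> purchase rate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def best_action(xs, ys, benefit_per_response, cost_per_unit, grid):
    """Pick the action value maximizing expected utility:
    benefit * predicted response - cost of taking the action."""
    a, b = fit_linear(xs, ys)
    return max(grid,
               key=lambda x: benefit_per_response * (a * x + b)
                             - cost_per_unit * x)

# Hypothetical marketing data: discount level -> purchase rate.
discounts = [0, 5, 10, 15, 20]
rates     = [0.10, 0.14, 0.18, 0.22, 0.26]
grid = [i * 0.5 for i in range(41)]        # candidate discounts 0..20
print(best_action(discounts, rates, benefit_per_response=100,
                  cost_per_unit=0.5, grid=grid))  # 20.0
```

Searching the continuous grid, rather than a handful of pre-discretized buckets, is exactly the enlarged solution space the abstract argues for (with a linear response the optimum sits at a boundary; a curved response model would yield an interior optimum).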
Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open-source software. As a proof of concept, we introduce "Maneage" (managing data lineage), enabling cheap archiving, provenance extraction, and peer verification, which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed, and we conclude with the benefits for the various stakeholders. This article is itself a Maneage'd project (project commit 313db0b). Appendices: two comprehensive appendices that review the longevity of existing solutions are available as supplementary "Web extras" in the IEEE Computer Society Digital Library at http://***/10.1109/MCSE.2021.3072860. Reproducibility: all products are available in zenodo.4913277; the Git history of this paper's source is at ***/***, which is also archived in Software Heritage: swh:1:dir:33fea87068c1612daf011f161b97787b9a0df39f. Clicking on the SWHIDs in the digital format will provide more "context" for the same content.
We propose a hypergraph-based framework for modeling and detecting malevolent activities. The proposed model supports the specification of order-independent sets of action symbols along with temporal and cardinality constraints on the execution of actions. We study and characterize the problems of consistency checking, equivalence, and minimality of hypergraph-based models. In addition, we define and characterize the general activity detection problem, which amounts to finding all subsequences that represent a malevolent activity in a sequence of logged actions. Since the problem is intractable, we also develop an index data structure that allows the security expert to efficiently extract occurrences of activities of interest.
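The detection problem can be illustrated with a brute-force scan: given a timestamped log, find subsequences that contain a required set of action symbols, in any order, within a temporal window. This sketch omits cardinality constraints and the paper's index structure, and the log and pattern are hypothetical:

```python
def detect(log, pattern, max_span):
    """Find subsequences of a timestamped log containing every action
    in `pattern` (order-independent) within `max_span` time units.
    Brute-force scan from each start position; illustrative only."""
    hits = []
    for i in range(len(log)):
        needed, t0, acts = set(pattern), log[i][0], []
        for t, action in log[i:]:
            if t - t0 > max_span:      # temporal constraint violated
                break
            if action in needed:
                needed.discard(action)
                acts.append((t, action))
                if not needed:         # all symbols seen: report hit
                    hits.append(acts[:])
                    break
    return hits

# Hypothetical log of (timestamp, action symbol) pairs.
log = [(1, "scan"), (2, "login"), (4, "exfil"),
       (9, "login"), (10, "exfil")]
print(detect(log, {"scan", "login", "exfil"}, max_span=5))
```

Only the burst at times 1..4 satisfies both the symbol set and the window; the later login/exfil pair lacks a nearby scan. The quadratic scan is what motivates the index structure for large logs.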