We propose a hypergraph-based framework for modeling and detecting malevolent activities. The proposed model supports the specification of order-independent sets of action symbols along with temporal and cardinality constraints on the execution of actions. We study and characterize the problems of consistency checking, equivalence, and minimality of hypergraph-based models. In addition, we define and characterize the general activity detection problem, which amounts to finding all subsequences of a sequence of logged actions that represent a malevolent activity. Since the problem is intractable, we also develop an index data structure that allows a security expert to efficiently extract occurrences of activities of interest.
Estimating crowd sizes or the occupancy of buildings and skyscrapers can often be essential. However, traditional estimation methods, such as manual counting, image processing or, in the case of skyscrapers, total water usage, are awkward, inefficient and often inaccurate. Social media has developed rapidly in the last decade. In this work, we provide novel solutions for estimating the population of suburbs and skyscrapers (so-called micro-populations) through the use of social media. We develop a big data solution leveraging large-scale harvesting and analysis of Twitter data. By harvesting real-time tweets and clustering them within suburbs and skyscrapers, we show how micro-populations can be calculated. To validate this, we construct linear and spatial models for the suburbs in four cities of Australia using census data and geospatial data models (shapefiles). Our prediction of micro-populations shows that Twitter can indeed be used for population prediction with a high degree of accuracy.
ISBN: (print) 9781509035199
Current methods of surmounting the limitations of von Neumann architectures in building Big data systems do not address the core architectural problems, thereby incurring resource utilization inefficiencies. This paper proposes the replacement of von Neumann architectures with a non-volatile computer architecture that is specifically designed to scale up to Big data. This new architecture is assessed against prevalent Big data models, in the context of performance and power consumption, the results of which show that the prerequisite technology is already available for its implementation.
In a competitive market, business process improvement is a requirement for any organization. This improvement can only be achieved with the support of comprehensive systems that fully monitor business processes. We propose an occurrence-based approach to business process monitoring that provides a holistic perspective of system dynamics, lending support to evolution aspects. More specifically, we present a three-dimensional artifact, called Occurrence, in which structure, behavior, and guidance are considered simultaneously. Based on this, we define more complex structures, namely Occurrence Base and Occurrence Management System, which serve as scaffolding to develop business process monitoring systems. We also present a specific occurrence-based design strategy that, using an MDE approach, has been applied by our research group for the development of successful monitoring applications.
In this paper, we present the design of the ONDINE system, which allows the loading and querying of a data warehouse opened on the Web, guided by an Ontological and Terminological Resource (OTR). The data warehouse, composed of data tables extracted from Web documents, has been built to supplement existing local data sources. First, we present the main steps of our semiautomatic method for annotating data tables, driven by an OTR. The output of this method is an XML/RDF data warehouse composed of XML documents representing data tables with their fuzzy RDF annotations. We then present our flexible querying system, which allows the local data sources and the data warehouse to be simultaneously and uniformly queried using the OTR. This system relies on SPARQL and allows approximate answers to be retrieved by comparing preferences expressed as fuzzy sets with fuzzy RDF annotations.
In experiments designed for family-based association studies, methods such as the transmission disequilibrium test require a large number of trios to identify single-nucleotide polymorphisms associated with the disease. However, the unavailability of a large number of trios is the Achilles' heel of studies of many complex diseases, especially late-onset diseases. In this paper, we propose a novel approach to this problem by means of the Dempster-Shafer method. The simulation studies show that the Dempster-Shafer method has a promising overall performance in assigning single-nucleotide polymorphisms to the correct association class, reaching 90 percent accuracy even with 60 trios.
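For readers unfamiliar with Dempster-Shafer theory, the following is a minimal sketch of Dempster's rule of combination, the standard way of fusing two bodies of evidence in that theory. It is not the authors' procedure: the abstract does not say how evidence from trios is encoded, so the hypothesis labels (`associated`, `not_associated`) and the mass values below are purely illustrative assumptions.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the common hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence is accumulated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so that the combined masses sum to 1 again.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical frame of discernment and mass values (illustration only).
ASSOC = frozenset({"associated"})
NONE = frozenset({"not_associated"})
EITHER = ASSOC | NONE  # mass assigned here expresses ignorance

source_1 = {ASSOC: 0.6, EITHER: 0.4}
source_2 = {ASSOC: 0.5, NONE: 0.3, EITHER: 0.2}

combined = combine(source_1, source_2)
for hypotheses, mass in combined.items():
    print(sorted(hypotheses), round(mass, 4))
```

Mass left on the full set `EITHER` represents uncommitted belief, which is what distinguishes this from a plain Bayesian update.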
The task of automatically detecting emotion in text is challenging. This is because, most of the time, textual expressions of affect are not expressed directly through emotion words but result from the interpretation and assessment of the meaning of the concepts, and the interaction of concepts, described in the text. This paper presents the core of EmotiNet, a new knowledge base (KB) for representing and storing affective reactions to real-life contexts, and the methodology employed in designing, populating, and evaluating it. The design process is based on a set of self-reported affective situations in the International Survey on Emotion Antecedents and Reactions (ISEAR) corpus. We cluster the examples and extract triples using Semantic Roles. We subsequently extend our model using other resources, such as VerbOcean, ConceptNet, and SentiWordNet, with the aim of generalizing the knowledge contained. Finally, we evaluate the approach using the representations of other examples in the ISEAR corpus. We conclude that EmotiNet, although limited by the domain and small quantity of knowledge it presently contains, represents a semantic resource appropriate for capturing and storing the structure and the semantics of real events and predicting the emotional responses triggered by chains of actions.
Many emerging database applications entail sophisticated graph-based query manipulation, predominantly evident in large-scale scientific applications. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitively expensive time complexity associated with exact subgraph isomorphism techniques has limited their efficacy in the application domain, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public domain databases are noisy and incomplete in nature, inexact graph matching techniques have proven to be more promising in terms of inferring knowledge from numerous structural data repositories. In this paper, we propose a novel technique called TraM for approximate graph matching that off-loads a significant amount of its processing onto the database, making the approach viable for large graphs. Moreover, the vector space embedding of the graphs and efficient filtration of the search space enable computation of approximate graph similarity at a throw-away cost. We annotate nodes of the query graphs by means of their global topological properties and compare them with neighborhood-biased segments of the data graph for proper matches. We have conducted experiments on several real data sets and have demonstrated the effectiveness and efficiency of the proposed method.
Ontology alignment identifies semantically matching entities in different ontologies. Various ontology alignment strategies have been proposed; however, few systems have explored how to automatically combine multiple strategies to improve the matching effectiveness. This paper presents a dynamic multistrategy ontology alignment framework, named RiMOM. The key insight in this framework is that similarity characteristics between ontologies may vary widely. We propose a systematic approach to quantitatively estimate the similarity characteristics for each alignment task and propose a strategy selection method to automatically combine the matching strategies based on two estimated factors. In the approach, we consider both textual and structural characteristics of ontologies. With RiMOM, we participated in the 2006 and 2007 campaigns of the Ontology Alignment Evaluation Initiative (OAEI). Our system is among the top three performers in benchmark data sets.
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We develop several formulations of this problem. Although the problems are NP-complete, we give several exact and approximation algorithms that work well in practice. One type of exact algorithm is based on Integer Programming (IP) formulations of the problems. Another class of exact methods is based on maximal frequent item set mining algorithms. The approximation algorithms are based on greedy heuristics. A detailed performance study illustrates the benefits of our methods on real and synthetic data.
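To make the flavor of a greedy heuristic for this kind of problem concrete, here is a small sketch under one assumed formulation: each past buyer query is a set of required attributes, a product is retrieved by a query when all of the query's attributes are among those the seller highlights, and the seller may highlight at most `m` attributes. The abstract does not spell out its formulations or its heuristics, so the function names, the tie-breaking rule, and the toy query log are all illustrative assumptions rather than the paper's algorithm.

```python
def satisfied(queries, chosen):
    """Count the queries whose required attributes are all in `chosen`."""
    return sum(1 for q in queries if q <= chosen)


def greedy_select(tuple_attrs, queries, m):
    """Greedily pick up to m attributes of the new tuple to highlight.

    At each step, add the attribute that yields the most satisfied queries;
    ties are broken by how many still-unsatisfied queries mention the
    attribute, then by attribute name for determinism.
    """
    candidates = set(tuple_attrs)
    # Queries asking for attributes the tuple lacks can never be satisfied.
    relevant = [q for q in queries if q <= candidates]
    chosen = set()
    for _ in range(min(m, len(candidates))):
        def score(attr):
            pending = sum(1 for q in relevant if attr in q and not q <= chosen)
            return (satisfied(relevant, chosen | {attr}), pending, attr)

        best = max(candidates - chosen, key=score)
        chosen.add(best)
    return chosen


# Hypothetical product attributes and query log (illustration only).
product_attrs = {"ac", "gps", "sunroof", "turbo"}
query_log = [
    frozenset({"ac"}),
    frozenset({"ac", "gps"}),
    frozenset({"gps"}),
    frozenset({"turbo", "sunroof"}),
    frozenset({"leather"}),
]

highlighted = greedy_select(product_attrs, query_log, m=2)
visible_to = satisfied(query_log, highlighted)
print(sorted(highlighted), visible_to)
```

Being greedy, this sketch carries no optimality guarantee; an exact method would have to search over all size-`m` subsets or use a formulation such as the integer programs the abstract mentions.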