The global diffusion of smartphones and tablets, exceeding the market share of traditional desktops and laptops, presents investigative opportunities and poses serious challenges to law enforcement agencies and forensic professionals. Traditional Digital Forensics techniques, indeed, may no longer be appropriate for the timely analysis of digital devices found at a crime scene. Nevertheless, when dealing with specific crimes such as murder, child abduction, missing persons, or death threats, such analysis may be crucial to speed up investigations. Motivated by this, the paper explores the field of Triage, a relatively new branch of Digital Forensics intended to provide investigators with actionable intelligence through digital media inspection, and describes a new interdisciplinary approach that merges Digital Forensics techniques and Machine Learning principles. The proposed Triage methodology aims at automating the categorization of digital media on the basis of plausible connections between retrieved traces (i.e. digital evidence) and the crimes under investigation. As an application of the proposed method, two case studies about copyright infringement and child pornography exchange are then presented to prove that the idea is viable. The term "feature" is used in the paper as a quantitative measure of a "plausible digital evidence", according to Machine Learning terminology. In this regard, we (a) define a list of crime-related features, (b) identify and extract them from available devices and forensic copies, (c) populate an input matrix and (d) process it with different Machine Learning mining schemes to come up with a device classification. We perform a benchmark study of the most popular mining algorithms (i.e. Bayes Networks, Decision Trees, Locally Weighted Learning and Support Vector Machines) to find the ones that best fit the case in question. Obtained results are encouraging as we will show that, triaging a dataset of 13 digital media and 45 copyr
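The triage pipeline described above (feature extraction, input matrix, classification) can be sketched as follows. The feature names, counts, and the nearest-neighbour classifier are illustrative assumptions, standing in for the Bayes Networks, Decision Trees, and SVM miners the paper actually benchmarks:

```python
import math

# Hypothetical crime-related features per device (counts are illustrative):
# [num_p2p_clients, num_media_files, num_chat_apps]
training = [
    ([5, 900, 1], "copyright-infringement"),
    ([4, 750, 0], "copyright-infringement"),
    ([0, 30, 6], "unrelated"),
    ([1, 20, 4], "unrelated"),
]

def classify(vector, training):
    """1-nearest-neighbour over the feature matrix (a stand-in for the
    mining schemes benchmarked in the paper)."""
    return min(training, key=lambda row: math.dist(row[0], vector))[1]

seized = [6, 820, 1]   # feature vector extracted from a seized device
print(classify(seized, training))  # → copyright-infringement
```

In practice each row of the input matrix would be populated by forensic extraction tools rather than hand-coded counts.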
This paper introduces a recently published Python data mining book (chapters, topics, samples of Python source code written by its authors) to be used in data mining via the world wide web and any specific database in several disciplines (economics, physics, education, marketing, etc.). The book starts with an introduction to data mining, explaining some of the data mining tasks involved: classification, dependence modelling, clustering and discovery of association rules. The book notes that using Python in data mining has been gaining interest from the data mining community because it is an open-source, general-purpose programming and web scripting language; furthermore, it is cross-platform and can run on a wide variety of operating systems such as Linux, Windows, FreeBSD, Macintosh, Solaris, OS/2, Amiga, AROS, AS/400, BeOS, OS/390, z/OS, Palm OS, QNX, VMS, Psion, Acorn RISC OS, VxWorks, PlayStation, Sharp Zaurus, Windows CE and even PocketPC. Finally, this book can be considered a teaching textbook for data mining, in which several methods such as machine learning and statistics are used to extract high-level knowledge from real-world datasets.
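As a minimal taste of one of the tasks the book surveys, here is a one-dimensional k-means clustering in plain Python. The data points and starting centres are invented for illustration and are not from the book:

```python
def kmeans_1d(points, c1, c2, iters=10):
    """Minimal 1-D k-means with two clusters: assign each point to the
    nearer centre, then recompute each centre as its cluster's mean."""
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(a) / len(a)
        c2 = sum(b) / len(b)
    return c1, c2

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
print(kmeans_1d(data, 0.0, 5.0))  # two cluster centres, near 1.0 and 9.07
```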
The extraction of patterns and discovery of useful information from a dataset are the foremost purposes of data mining; the outcome of this process is the 'knowledge' that supports decision making. For the past decade there have been multiple attempts, and strong belief, in the development and formulation of a unified data mining framework that would answer the fundamental questions related to knowledge discovery. In this paper we present a novel unified framework for data mining conceptualized through composite functions. The framework is further illustrated with a variety of real-life datasets using different data mining algorithms.
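The idea of conceptualizing a mining workflow through composite functions can be sketched generically; the pipeline stages below (cleaning, feature extraction, a toy "mining" step) are invented placeholders, not the paper's actual operators:

```python
from functools import reduce

def compose(*fs):
    """Right-to-left function composition: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

clean   = lambda rows: [r for r in rows if r is not None]   # drop missing values
extract = lambda rows: [abs(r) for r in rows]               # toy feature transform
mine    = lambda rows: max(rows)                            # stand-in "discovery" step

pipeline = compose(mine, extract, clean)
print(pipeline([3, None, -7, 2]))  # → 7
```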
BACKGROUND: With the exception of case reports, limited data are available regarding the risk of hepatotoxicity associated with the use of telithromycin. OBJECTIVE: To detect the safety signal regarding the reporting of hepatotoxicity associated with the use of telithromycin using 4 commonly employed data mining algorithms (DMAs). METHODS: Based on the Adverse Events Reporting System (AERS) database of the Food and Drug Administration, 4 DMAs, including the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and the Gamma Poisson Shrinker (GPS), were applied to examine the association between the reporting of hepatotoxicity and the use of telithromycin. The study period was from the first quarter of 2004 to the second quarter of 2006. The reporting of hepatotoxicity was identified using the preferred terms indexed in the Medical Dictionary for Regulatory Activities. The drug name was used to identify reports regarding the use of telithromycin. RESULTS: A total of 226 reports describing hepatotoxicity associated with the use of telithromycin were recorded in the AERS. A safety problem of telithromycin associated with increased reporting of hepatotoxicity was clearly detected by the 4 algorithms as early as 2005, signaling the problem in the first quarter by the ROR and the IC, in the second quarter by the PRR, and in the fourth quarter by the GPS. CONCLUSIONS: A safety signal was indicated by the 4 DMAs suggesting an association between the reporting of hepatotoxicity and the use of telithromycin. Given the wide use of telithromycin and the serious consequences of hepatotoxicity, clinicians should be cautious when selecting telithromycin for treatment of an infection. In addition, further observational studies are required to evaluate the utility of signal detection systems for early recognition of serious, life-threatening, low-frequency drug-induced adverse events.
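Two of the frequentist DMAs mentioned above, the ROR and the PRR, are simple ratios over a 2x2 contingency table of report counts. The counts below are invented for illustration and are not AERS data:

```python
# 2x2 report table for a drug/event pair (toy counts, not AERS data):
#                     event of interest   all other events
# drug of interest            a                  b
# all other drugs             c                  d
a, b, c, d = 226, 1774, 9000, 489000

ror = (a / b) / (c / d)              # reporting odds ratio = ad / bc
prr = (a / (a + b)) / (c / (c + d))  # proportional reporting ratio
print(round(ror, 2), round(prr, 2))  # → 6.92 6.25
```

Values well above 1 (commonly with a lower confidence bound above a threshold) are flagged as disproportionality signals; the IC and GPS are Bayesian shrinkage variants of the same idea.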
Digital forensics is a growing and important field of research for current intelligence, law enforcement, and military organizations. As more information is stored in digital form, the need and ability to analyze and process this information for relevant evidence has grown in complexity. Today, analysis is reliant upon trained experts. This, compounded with the sheer volume of evidence obtained from the field, means that analysis frequently takes too long. Current forensic tools focus on decoding and visualization, not data reduction or correlation. This thesis fills an important void. The first goal is to determine whether it is possible to use file metadata accurately to ascribe ownership of files on a hard drive with multiple users. The second is to explore and validate existing algorithms that may support and aid data ascription. The last goal of this work is to compare and measure the accuracy of these algorithms. This work facilitates further research into developing an automated analysis and reporting framework for media exploitation in computer forensics.
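File metadata as an ownership signal can be illustrated with the POSIX owner UID exposed by `os.stat`; the grouping function below is a simplified sketch, not the thesis's actual ascription algorithm, which also weighs signals such as timestamps and paths:

```python
import os
import tempfile
import collections

def ascribe_owners(paths):
    """Group files by owner UID from stat metadata -- one of the
    metadata signals usable for multi-user drive ascription."""
    by_owner = collections.defaultdict(list)
    for p in paths:
        st = os.stat(p)
        by_owner[st.st_uid].append(p)
    return dict(by_owner)

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
result = ascribe_owners([path])   # one file -> one owner group
print(result)
os.unlink(path)
```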
Generalized association rule extraction is a powerful tool to discover a high level view of the interesting patterns hidden in the analyzed data. However, since the patterns are extracted at any level of abstraction, the mined rule set may be too large to be effectively exploited in the decision making process. Thus, to discover valuable and interesting knowledge a post-processing step is usually required. This paper presents the CoGAR framework to efficiently support constrained generalized association rule mining. The generalization process of CoGAR exploits a (user-provided) multiple-taxonomy to drive an opportunistic itemset generalization process, which prevents discarding relevant but infrequent knowledge by aggregating features at different granularity levels. Besides the traditional support and confidence constraints, two further constraints are enforced: (i) schema constraints and (ii) the opportunistic confidence constraint. Schema constraints allow the analyst to specify the structure of the patterns of interest and drive the itemset mining phase. The opportunistic confidence constraint, a new constraint proposed in this paper, allows us to discriminate between significant and redundant rules by analyzing similar rules belonging to different abstraction levels. This constraint is enforced during the rule generation step. Experiments performed on real datasets collected in two different application domains show the effectiveness and the efficiency of the proposed framework in mining constrained generalized association rules. (C) 2011 Elsevier Inc. All rights reserved.
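The core motivation for generalization (a specific item may be infrequent while its taxonomy ancestor is frequent) can be shown with a toy support computation. The items, taxonomy, and transactions are invented and are much simpler than CoGAR's multiple-taxonomy machinery:

```python
# Toy transactions and a one-level taxonomy (item -> parent category)
taxonomy = {"firefox": "browser", "chrome": "browser", "vlc": "media"}
transactions = [
    {"firefox", "vlc"},
    {"chrome", "vlc"},
    {"firefox"},
]

def generalize(tx):
    """Replace every item with its taxonomy ancestor."""
    return {taxonomy[i] for i in tx}

def support(itemset, txs):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in txs) / len(txs)

gen = [generalize(t) for t in transactions]
print(support({"firefox"}, transactions))   # 2/3 at the item level
print(support({"browser"}, gen))            # 3/3 once generalized
print(support({"browser", "media"}, gen))   # 2/3 -- a generalized itemset
```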
ISBN:
(Print) 9781467322522; 9781467322492
This paper presents how social attributes impact Predictive Analytics results when applied to a telecommunication industry dataset. Predictive Analytics is an emerging field in data mining, actively applied to solve multiple business questions, such as customer churn, product up-sell and cross-sell, etc. Predictive analytics models exploit patterns found in historical and transactional data to identify risks, opportunities and future events. Comparative analysis showed that, with the addition of social attributes, the efficiency and reliability of these models are greatly enhanced.
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is balancing privacy protection and data utility, which are normally considered a pair of conflicting factors. We argue that selectively preserving task/model-specific information during perturbation helps achieve both a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which is implicitly utilized by many data mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data mining models deliver a comparable level of model quality over a geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
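The geometric intuition can be demonstrated in two dimensions: a rotation perturbs the coordinate values yet preserves pairwise distances, so distance-based models (e.g. kNN) behave the same on the perturbed data. This is a minimal sketch with invented points, omitting GDP's translation and noise components:

```python
import math

def rotate(point, theta):
    """Rotate a 2-D point by angle theta (radians) about the origin."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

data = [(1.0, 2.0), (4.0, 6.0)]
perturbed = [rotate(p, 0.7) for p in data]   # coordinates change...

d_orig = math.dist(*data)
d_pert = math.dist(*perturbed)
print(round(d_orig, 6), round(d_pert, 6))    # ...but the distance does not
```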
ISBN:
(Print) 9781457704345
In order to guarantee high operational availability, modular industrial equipment requires frequent maintenance, which is oftentimes carried out by the manufacturer. Reports about service technicians' activities are stored in maintenance histories. Manufacturers of such equipment would benefit significantly from analysis of recorded maintenance and fault histories for planning maintenance activities, offering scalable service contracts and finding the causes of product faults. This paper introduces a methodology that supports the interpretation of maintenance histories, allowing manufacturers to analyze and optimize maintenance operations. The methodology interprets the maintenance histories as sequences of events containing meaningful patterns. Tailored data mining algorithms are applied that provide causality details going beyond the results of standard techniques. The paper uses maintenance reports for gas analytic equipment as an example.
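Treating maintenance histories as event sequences can be sketched by counting ordered event pairs across histories; the event names are invented and this pair-counting is only a simplified stand-in for the paper's tailored sequence mining:

```python
from collections import Counter

# Toy maintenance histories: each list is one machine's event sequence
histories = [
    ["filter-change", "sensor-fault", "sensor-replace"],
    ["filter-change", "sensor-fault", "recalibrate"],
    ["sensor-replace", "recalibrate"],
]

def ordered_pairs(seq):
    """All (earlier, later) event pairs occurring in one history."""
    return {(seq[i], seq[j]) for i in range(len(seq)) for j in range(i + 1, len(seq))}

counts = Counter(p for h in histories for p in ordered_pairs(h))
print(counts.most_common(1))  # "filter-change" before "sensor-fault" in 2 histories
```

A recurring ordered pair like this is the kind of candidate causality pattern a manufacturer would then examine.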
Efficiency in automatic web-based information retrieval has become an important issue for users. Most knowledge and materials on the Internet are in either semi-structured or unstructured hypermedia form. When users find a webpage as a search result for self-learning, they sometimes cannot easily understand the meaning of specific parts of the retrieved page. They need to spend a lot of time manually finding additional references to get a clear idea of what the original page conveys. It would be ideal if an agent could reconstruct the webpage a user is browsing by inserting links to additional self-explanatory documents at appropriate places in the original page. This research uses formal concept analysis (FCA) and association rule methodology (ARM) to develop a Keyword Association Lattice (KAL). With the KAL, the webpages accessed by users can be analyzed and reconstructed automatically. A pedagogical software agent called K-Navi is developed for users doing surveys and self-learning on the Internet.
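The FCA building blocks behind a keyword lattice are the derivation operators over a page-keyword incidence relation: an extent (pages sharing keywords) paired with its intent (keywords shared by those pages) forms a formal concept. The pages and keywords below are invented for illustration:

```python
# Toy page-keyword incidence relation, the input of formal concept analysis
incidence = {
    "page1": {"python", "mining"},
    "page2": {"python", "web"},
    "page3": {"mining", "web"},
}

def extent(keywords):
    """Pages containing every keyword in the set."""
    return {p for p, ks in incidence.items() if keywords <= ks}

def intent(pages):
    """Keywords shared by every page in the set."""
    ks = [incidence[p] for p in pages]
    return set.intersection(*ks) if ks else set()

# A formal concept pairs an extent with its intent:
e = extent({"python"})
print(sorted(e), sorted(intent(e)))  # → ['page1', 'page2'] ['python']
```

The lattice (and hence the KAL) orders such concepts by extent inclusion, which is what lets the agent pick related pages to link into the one being browsed.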