The global diffusion of smartphones and tablets, exceeding the market share of traditional desktops and laptops, presents investigative opportunities and poses serious challenges to law enforcement agencies and forensic professionals. Traditional Digital Forensics techniques, indeed, may no longer be appropriate for the timely analysis of digital devices found at a crime scene. Nevertheless, when dealing with specific crimes such as murder, child abduction, missing persons, or death threats, such analysis may be crucial to speed up investigations. Motivated by this, the paper explores the field of Triage, a relatively new branch of Digital Forensics intended to provide investigators with actionable intelligence through digital media inspection, and describes a new interdisciplinary approach that merges Digital Forensics techniques and Machine Learning principles. The proposed Triage methodology aims at automating the categorization of digital media on the basis of plausible connections between retrieved traces (i.e. digital evidence) and the crimes under investigation. As an application of the proposed method, two case studies about copyright infringement and child pornography exchange are then presented to prove that the idea is viable. The term "feature" is used in the paper as a quantitative measure of a "plausible digital evidence", according to Machine Learning terminology. In this regard, we (a) define a list of crime-related features, (b) identify and extract them from available devices and forensic copies, (c) populate an input matrix and (d) process it with different Machine Learning mining schemes to come up with a device classification. We perform a benchmark study of the most popular mining algorithms (i.e. Bayes Networks, Decision Trees, Locally Weighted Learning and Support Vector Machines) to find the ones that best fit the case in question. Obtained results are encouraging as we will show that, triaging a dataset of 13 digital media and 45 copyr
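The triage pipeline described above (feature extraction, input matrix, classification) can be sketched as follows. The feature names, counts, and the nearest-neighbour classifier are illustrative assumptions, standing in for the Bayes Networks, Decision Trees, and SVM miners the paper actually benchmarks:

```python
import math

# Hypothetical crime-related features per device (counts are illustrative):
# [num_p2p_clients, num_media_files, num_chat_apps]
training = [
    ([5, 900, 1], "copyright-infringement"),
    ([4, 750, 0], "copyright-infringement"),
    ([0, 30, 6], "unrelated"),
    ([1, 20, 4], "unrelated"),
]

def classify(vector, training):
    """1-nearest-neighbour over the feature matrix (a stand-in for the
    mining schemes benchmarked in the paper)."""
    return min(training, key=lambda row: math.dist(row[0], vector))[1]

seized = [6, 820, 1]   # feature vector extracted from a seized device
print(classify(seized, training))  # → copyright-infringement
```

In practice each row of the input matrix would be populated by forensic extraction tools rather than hand-coded counts.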
This paper introduces a recently published Python data mining book (chapters, topics, samples of Python source code written by its authors) to be used in data mining via the world wide web and any specific database in several disciplines (economics, physics, education, marketing, etc.). The book starts with an introduction to data mining, explaining some of the data mining tasks involved: classification, dependence modelling, clustering and discovery of association rules. The book notes that using Python in data mining has been gaining interest from the data mining community because it is an open-source, general-purpose programming and web scripting language; furthermore, it is cross-platform and can run on a wide variety of operating systems such as Linux, Windows, FreeBSD, Macintosh, Solaris, OS/2, Amiga, AROS, AS/400, BeOS, OS/390, z/OS, Palm OS, QNX, VMS, Psion, Acorn RISC OS, VxWorks, PlayStation, Sharp Zaurus, Windows CE and even PocketPC. Finally, this book can be considered a teaching textbook for data mining, in which several methods such as machine learning and statistics are used to extract high-level knowledge from real-world datasets.
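As a minimal taste of one of the tasks the book surveys, here is a one-dimensional k-means clustering in plain Python. The data points and starting centres are invented for illustration and are not from the book:

```python
def kmeans_1d(points, c1, c2, iters=10):
    """Minimal 1-D k-means with two clusters: assign each point to the
    nearer centre, then recompute each centre as its cluster's mean."""
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(a) / len(a)
        c2 = sum(b) / len(b)
    return c1, c2

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
print(kmeans_1d(data, 0.0, 5.0))  # two cluster centres, near 1.0 and 9.07
```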
The extraction of patterns and discovery of useful information from a dataset are the foremost purposes of data mining; the outcome of this process is the 'knowledge' that supports decision making. For the past decade there have been multiple attempts, and strong belief, in the development and formulation of a unified data mining framework that would answer the fundamental questions related to knowledge discovery. In this paper we present a novel unified framework for data mining conceptualized through composite functions. The framework is further illustrated with a variety of real-life datasets using different data mining algorithms.
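The idea of conceptualizing a mining workflow through composite functions can be sketched generically; the pipeline stages below (cleaning, feature extraction, a toy "mining" step) are invented placeholders, not the paper's actual operators:

```python
from functools import reduce

def compose(*fs):
    """Right-to-left function composition: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

clean   = lambda rows: [r for r in rows if r is not None]   # drop missing values
extract = lambda rows: [abs(r) for r in rows]               # toy feature transform
mine    = lambda rows: max(rows)                            # stand-in "discovery" step

pipeline = compose(mine, extract, clean)
print(pipeline([3, None, -7, 2]))  # → 7
```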
BACKGROUND: With the exception of case reports, limited data are available regarding the risk of hepatotoxicity associated with the use of telithromycin. OBJECTIVE: To detect the safety signal regarding the reporting of hepatotoxicity associated with the use of telithromycin using 4 commonly employed data mining algorithms (DMAs). METHODS: Based on the Adverse Events Reporting System (AERS) database of the Food and Drug Administration, 4 DMAs, including the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and the Gamma Poisson Shrinker (GPS), were applied to examine the association between the reporting of hepatotoxicity and the use of telithromycin. The study period was from the first quarter of 2004 to the second quarter of 2006. The reporting of hepatotoxicity was identified using the preferred terms indexed in the Medical Dictionary for Regulatory Activities. The drug name was used to identify reports regarding the use of telithromycin. RESULTS: A total of 226 reports describing hepatotoxicity associated with the use of telithromycin were recorded in the AERS. A safety problem of telithromycin associated with increased reporting of hepatotoxicity was clearly detected by the 4 algorithms as early as 2005, signaling the problem in the first quarter by the ROR and the IC, in the second quarter by the PRR, and in the fourth quarter by the GPS. CONCLUSIONS: A safety signal was indicated by the 4 DMAs suggesting an association between the reporting of hepatotoxicity and the use of telithromycin. Given the wide use of telithromycin and the serious consequences of hepatotoxicity, clinicians should be cautious when selecting telithromycin for treatment of an infection. In addition, further observational studies are required to evaluate the utility of signal detection systems for early recognition of serious, life-threatening, low-frequency drug-induced adverse events.
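Two of the frequentist DMAs mentioned above, the ROR and the PRR, are simple ratios over a 2x2 contingency table of report counts. The counts below are invented for illustration and are not AERS data:

```python
# 2x2 report table for a drug/event pair (toy counts, not AERS data):
#                     event of interest   all other events
# drug of interest            a                  b
# all other drugs             c                  d
a, b, c, d = 226, 1774, 9000, 489000

ror = (a / b) / (c / d)              # reporting odds ratio = ad / bc
prr = (a / (a + b)) / (c / (c + d))  # proportional reporting ratio
print(round(ror, 2), round(prr, 2))  # → 6.92 6.25
```

Values well above 1 (commonly with a lower confidence bound above a threshold) are flagged as disproportionality signals; the IC and GPS are Bayesian shrinkage variants of the same idea.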
Digital forensics is a growing and important field of research for current intelligence, law enforcement, and military organizations. As more information is stored in digital form, the need and ability to analyze and process this information for relevant evidence has grown in complexity. Today, analysis is reliant upon trained experts. This, compounded with the sheer volume of evidence obtained from the field, means that analysis frequently takes too long. Current forensic tools focus on decoding and visualization, not data reduction or correlation. This thesis fills an important void. The first goal is to determine whether it is possible to use file metadata accurately to ascribe ownership of files on a hard drive with multiple users. The second is to explore and validate existing algorithms that may support and aid data ascription. The last goal of this work is to compare and measure the accuracy of these algorithms. This work facilitates further research into developing an automated analysis and reporting framework for media exploitation in computer forensics.
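File metadata as an ownership signal can be illustrated with the POSIX owner UID exposed by `os.stat`; the grouping function below is a simplified sketch, not the thesis's actual ascription algorithm, which also weighs signals such as timestamps and paths:

```python
import os
import tempfile
import collections

def ascribe_owners(paths):
    """Group files by owner UID from stat metadata -- one of the
    metadata signals usable for multi-user drive ascription."""
    by_owner = collections.defaultdict(list)
    for p in paths:
        st = os.stat(p)
        by_owner[st.st_uid].append(p)
    return dict(by_owner)

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
result = ascribe_owners([path])   # one file -> one owner group
print(result)
os.unlink(path)
```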
Generalized association rule extraction is a powerful tool to discover a high level view of the interesting patterns hidden in the analyzed data. However, since the patterns are extracted at any level of abstraction, the mined rule set may be too large to be effectively exploited in the decision making process. Thus, to discover valuable and interesting knowledge a post-processing step is usually required. This paper presents the CoGAR framework to efficiently support constrained generalized association rule mining. The generalization process of CoGAR exploits a (user-provided) multiple-taxonomy to drive an opportunistic itemset generalization process, which prevents discarding relevant but infrequent knowledge by aggregating features at different granularity levels. Besides the traditional support and confidence constraints, two further constraints are enforced: (i) schema constraints and (ii) the opportunistic confidence constraint. Schema constraints allow the analyst to specify the structure of the patterns of interest and drive the itemset mining phase. The opportunistic confidence constraint, a new constraint proposed in this paper, allows us to discriminate between significant and redundant rules by analyzing similar rules belonging to different abstraction levels. This constraint is enforced during the rule generation step. Experiments performed on real datasets collected in two different application domains show the effectiveness and the efficiency of the proposed framework in mining constrained generalized association rules. (C) 2011 Elsevier Inc. All rights reserved.
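The core motivation for generalization (a specific item may be infrequent while its taxonomy ancestor is frequent) can be shown with a toy support computation. The items, taxonomy, and transactions are invented and are much simpler than CoGAR's multiple-taxonomy machinery:

```python
# Toy transactions and a one-level taxonomy (item -> parent category)
taxonomy = {"firefox": "browser", "chrome": "browser", "vlc": "media"}
transactions = [
    {"firefox", "vlc"},
    {"chrome", "vlc"},
    {"firefox"},
]

def generalize(tx):
    """Replace every item with its taxonomy ancestor."""
    return {taxonomy[i] for i in tx}

def support(itemset, txs):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in txs) / len(txs)

gen = [generalize(t) for t in transactions]
print(support({"firefox"}, transactions))   # 2/3 at the item level
print(support({"browser"}, gen))            # 3/3 once generalized
print(support({"browser", "media"}, gen))   # 2/3 -- a generalized itemset
```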
ISBN:
(Print) 9781467322522; 9781467322492
This paper presents how social attributes impact Predictive Analytics results when applied to a telecommunication industry dataset. Predictive Analytics is an emerging field in data mining, actively applied to solve multiple business questions, such as customer churn, product up-sell and cross-sell, etc. Predictive analytics models exploit patterns found in historical and transactional data to identify risks, opportunities and future events. Comparative analysis showed that, with the addition of social attributes, the efficiency and reliability of these models are greatly enhanced.
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is balancing privacy protection and data utility, which are normally considered a pair of conflicting factors. We argue that selectively preserving task/model-specific information during perturbation helps achieve both a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which is implicitly utilized by many data mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data mining models deliver a comparable level of model quality over a geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
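The geometric intuition can be demonstrated in two dimensions: a rotation perturbs the coordinate values yet preserves pairwise distances, so distance-based models (e.g. kNN) behave the same on the perturbed data. This is a minimal sketch with invented points, omitting GDP's translation and noise components:

```python
import math

def rotate(point, theta):
    """Rotate a 2-D point by angle theta (radians) about the origin."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

data = [(1.0, 2.0), (4.0, 6.0)]
perturbed = [rotate(p, 0.7) for p in data]   # coordinates change...

d_orig = math.dist(*data)
d_pert = math.dist(*perturbed)
print(round(d_orig, 6), round(d_pert, 6))    # ...but the distance does not
```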
ISBN:
(Print) 9781457704345
In order to guarantee high operational availability, modular industrial equipment requires frequent maintenance, which is oftentimes carried out by the manufacturer. Reports about service technicians' activities are stored in maintenance histories. Manufacturers of such equipment would benefit significantly from analysis of recorded maintenance and fault histories for planning maintenance activities, offering scalable service contracts and finding the causes of product faults. This paper introduces a methodology that supports the interpretation of maintenance histories, allowing manufacturers to analyze and optimize maintenance operations. The methodology interprets the maintenance histories as sequences of events containing meaningful patterns. Tailored data mining algorithms are applied that provide causality details going beyond the results of standard techniques. The paper uses maintenance reports for gas analytic equipment as an example.
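Treating maintenance histories as event sequences can be sketched by counting ordered event pairs across histories; the event names are invented and this pair-counting is only a simplified stand-in for the paper's tailored sequence mining:

```python
from collections import Counter

# Toy maintenance histories: each list is one machine's event sequence
histories = [
    ["filter-change", "sensor-fault", "sensor-replace"],
    ["filter-change", "sensor-fault", "recalibrate"],
    ["sensor-replace", "recalibrate"],
]

def ordered_pairs(seq):
    """All (earlier, later) event pairs occurring in one history."""
    return {(seq[i], seq[j]) for i in range(len(seq)) for j in range(i + 1, len(seq))}

counts = Counter(p for h in histories for p in ordered_pairs(h))
print(counts.most_common(1))  # "filter-change" before "sensor-fault" in 2 histories
```

A recurring ordered pair like this is the kind of candidate causality pattern a manufacturer would then examine.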
Efficiency in automatic web-based information retrieval has become an important issue for users. Most knowledge and materials on the Internet are in either semi-structured or unstructured hypermedia form. When users find a webpage as a search result for self-learning, they sometimes cannot easily understand the meaning of specific parts of the retrieved page. They need to spend a lot of time manually finding additional references to get a clear idea of what the original page conveys. It would be ideal if an agent could reconstruct the webpage a user is browsing by inserting links to additional self-explanatory documents at appropriate places in the original page. This research uses formal concept analysis (FCA) and association rule methodology (ARM) to develop a Keyword Association Lattice (KAL). With the KAL, the webpages accessed by users can be analyzed and reconstructed automatically. A pedagogical software agent called K-Navi is developed for users doing surveys and self-learning on the Internet.
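The FCA building blocks behind a keyword lattice are the derivation operators over a page-keyword incidence relation: an extent (pages sharing keywords) paired with its intent (keywords shared by those pages) forms a formal concept. The pages and keywords below are invented for illustration:

```python
# Toy page-keyword incidence relation, the input of formal concept analysis
incidence = {
    "page1": {"python", "mining"},
    "page2": {"python", "web"},
    "page3": {"mining", "web"},
}

def extent(keywords):
    """Pages containing every keyword in the set."""
    return {p for p, ks in incidence.items() if keywords <= ks}

def intent(pages):
    """Keywords shared by every page in the set."""
    ks = [incidence[p] for p in pages]
    return set.intersection(*ks) if ks else set()

# A formal concept pairs an extent with its intent:
e = extent({"python"})
print(sorted(e), sorted(intent(e)))  # → ['page1', 'page2'] ['python']
```

The lattice (and hence the KAL) orders such concepts by extent inclusion, which is what lets the agent pick related pages to link into the one being browsed.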