Domain novices learning about a new subject can struggle to find their way in large collections. Typical searching and browsing tools are better utilized if users know what to search for or browse to. In this dissertation, we present Multiple Diagram Navigation (MDN) to assist domain novices by providing multiple overviews of the content matter using multiple diagrams. Rather than relying on specific types of visualizations, MDN superimposes any type of diagram or map over a collection of documents, allowing content providers to reveal interesting perspectives of their content. Domain novices can navigate through the content in an exploratory way using three types of queries (navigation): diagram to content (D2C), diagram to diagram (D2D), and content to diagram (C2D). To evaluate the MDN user interface, we conducted a user study, which showed that users found MDN useful and easy to use in exploratory-navigation scenarios. Encouraged by these positive results, we extended the functionality of MDN to provide a ranking of collection documents for D2C queries (expressed by a selected diagram concept). We studied different elements of the ranking process. As a case study, we targeted our research towards the Wikipedia collection. With the goal of studying ranking in different types of diagrams, we introduced two diagram models: the Items-and-Attributes model and the Universal model. We also studied two ranking algorithms: Personalized PageRank (PPR), an algorithm used in similar applications; and Greedy Energy Spreading (GES), an algorithm that we designed. We also studied different approaches to computing rankings for C2D queries. Our results show encouraging performance on the ranking of D2C and C2D queries. For example, in an experiment targeting diagrams conforming to the Items-and-Attributes model, results showed reasonably high similarity between a diagram concept selected by the user and the top-ten-ranked pages. We also found that diagrams had a strong influence
In the domain of architecture and planning, the space allocation problem (SAP) is a general class of computable problems which is employed by numerous design processes to assist in the generation of spaces of a layout and simultaneously satisfy design objectives. The SAP has eluded automation due to combinatorial complexity and geometric intractability. This thesis describes a computational framework for solving the SAP across multiple scales and domains of the application using reinforcement learning algorithms that generate spatial solutions with optimal space-activity relations. In this research, a broad range of computable problems are addressed across three scales of design processes, namely, space planning, site planning, and interactive networks of city blocks. This is achieved by identifying the role of SAP in generating the spatial output of a design process and compartmentalizing the SAP into computable tasks. Each task is mapped to a spatial model that consists of a set of geometric operations driven by optimization algorithms or numerical relations. These techniques are referred to as the space allocation techniques or SAT and developed as autonomous modules. Each spatial model invokes a specific set of SAT modules, in sequence, and the models can be connected to solve the desired SAP. The spatial models are integrated into a framework after considering the exclusivity of the task accomplished by the models, common methods, data structures, and the flow of information between models. It is proposed that the spatial output of large design processes is approximated by creating workflows of connected elemental models. A workflow can be reused to solve project-specific design problems by updating the inputs such as site boundary or project requirements and bylaws. The workflows support design exploration and provide iterative user interaction such that for a given problem, it is possible to study entirely different solutions, explore the downstream propagati
In this dissertation, a theoretical framework based on concentration inequalities for empirical processes is developed to better design iterative optimization algorithms and analyze their convergence properties in the presence of complex dependence between directions and step-sizes. Based on this framework, we proposed a stochastic away-step Frank-Wolfe algorithm and a stochastic pairwise-step Frank-Wolfe algorithm for solving strongly convex problems with polytope constraints and proved that both algorithms converge linearly to the optimal solution in expectation and almost surely. Numerical results showed that the proposed algorithms are faster and more stable than most of their competitors. The framework can also be applied to designing and analyzing stochastic algorithms with adaptive step-sizes based on local curvature for self-concordant optimization problems. Notably, we proposed and analyzed a stochastic BFGS algorithm without line search and used the framework to prove that it converges linearly globally and super-linearly locally. This is the first work to analyze a fully stochastic BFGS algorithm, which also avoids time-consuming or even impossible line-search steps. A third class of problems to which the empirical-process framework applies is the optimization of compositions of stochastic functions. A multi-level Monte Carlo method for generating unbiased gradients is introduced into stochastic optimization algorithms for minimizing function compositions, so that standard stochastic optimization algorithms can be applied to these problems directly.
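For readers unfamiliar with the Frank-Wolfe family, the following is a minimal sketch of the classical deterministic Frank-Wolfe iteration over the probability simplex. It is not the stochastic away-step or pairwise-step variant proposed in the dissertation; the objective, the `target` point, and the function name `frank_wolfe_simplex` are assumptions chosen only for illustration.

```python
def frank_wolfe_simplex(grad, x0, steps=200):
    """Classical Frank-Wolfe over the probability simplex (illustrative only)."""
    x = list(x0)
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized
        # at the vertex whose gradient coordinate is smallest.
        i = min(range(len(x)), key=lambda k: g[k])
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size, no line search
        # Convex combination of the current point and the chosen vertex.
        x = [(1.0 - gamma) * xk for xk in x]
        x[i] += gamma
    return x


# Hypothetical strongly convex objective: f(x) = 0.5 * ||x - target||^2,
# with target chosen inside the simplex so that it is also the minimizer.
target = [0.2, 0.5, 0.3]
grad = lambda x: [xk - tk for xk, tk in zip(x, target)]
x_star = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], steps=2000)
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without any projection step. The plain method shown here only has a sublinear rate in general, which is the limitation that away-step and pairwise-step variants are designed to address.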
Today, many resources are freely available on the Internet as PDF documents. However, a free PDF document may not contain what the reader expects. Attackers and malware (e.g., Code Red, Melissa) have several ways to add malicious content to PDF files, which can seriously harm a user's device (e.g., by redirecting the user to a fake website, corrupting the operating system, or gaining full access to the device). This project aims to detect potentially malicious content in PDF files. There are several types of malicious content, such as executable JavaScript, shellcode, and adware. We first collect details of each PDF file and save them to a CSV file. We then pass the CSV data to our machine learning model, which is built with the Random Forest (RF) algorithm. The prediction is passed to a Chrome browser extension, which runs a detector in the background to detect malicious JavaScript in the PDF file and notifies the user during download whether the file is benign or malicious.
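The feature-collection step described above could look roughly like the sketch below, which counts a few PDF name objects in the raw bytes and writes one row per file to CSV. The keyword list, the column names, and the helper names `pdf_features` and `write_features_csv` are assumptions for illustration, not the project's actual feature set; raw byte counting is also crude, since it misses names that are hex-escaped or stored in compressed streams.

```python
import csv
import io

# Assumed keyword list: PDF name objects commonly associated with active content.
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def pdf_features(data: bytes) -> dict:
    """Return simple keyword-count features for the raw bytes of one PDF."""
    features = {
        "size_bytes": len(data),
        "has_pdf_header": int(data.startswith(b"%PDF-")),
    }
    for kw in KEYWORDS:
        # Column name is the keyword without its leading slash.
        features[kw.decode("ascii").lstrip("/")] = data.count(kw)
    return features


def write_features_csv(rows, out):
    """Write a list of feature dicts to an open text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Tiny hand-made byte string standing in for a downloaded file.
sample = b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
buffer = io.StringIO()
write_features_csv([pdf_features(sample)], buffer)
```

The resulting CSV rows would then serve as input to a classifier such as a random forest; that training step is omitted here.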
ENGLISH SUMMARY: Naïve Bayes is a well-known statistical model that is recognised by the Institute of Electrical and Electronics Engineers (IEEE) as being among the top ten data mining algorithms. It performs classification by making the strong assumption of class conditional mutual statistical independence. Although this assumption is unlikely to be an accurate representation of the true statistical dependencies, naïve Bayes nevertheless delivers accurate classification in many domains. This success can be related to that of linear regression providing reliable estimation in problems where exact linearity is not realistic. There is a rich body of literature on the topic of improving naïve Bayes. This dissertation is concerned with doing so via a projection matrix that provides an alternative representation for the data of interest. We introduce Projected Gaussian naïve Bayes and Projected Kernel naïve Bayes as naïve-Bayes-type classifiers that rely on Gaussianity and kernel density estimation, respectively. The proposed method extends the flexibility of the standard naïve Bayes. The approach maintains the simplicity and efficiency of naïve Bayes while improving its accuracy. Our method is shown to be competitive with several popular classifiers on real-world data. In particular, our method’s classification accuracy is compared to that of linear and quadratic discriminant analysis, the support vector machine and the random forest. There is a close connection between our proposal and the application of naïve Bayes to a class conditionally conducted independent component analysis. In addition to a classification accuracy improvement, the proposed method also provides a tool for visually representing data in low-dimensional space. This visualisation aspect of our method is discussed with respect to the connection to independent component analysis. Our method is shown to give a better visual representation than does linear discriminant analysis on a number of real-wo
This publication presents a co-simulation framework that enables joint simulation experiments by multiple remote laboratories for analyses of smart grid energy systems, which can also include power hardware-in-the-loop. It introduces a proof of concept in which individual parts of an example electrical grid are modelled at three geographically distributed Fraunhofer Institutes. The models differ greatly in the tools used, functionality, control algorithms, and time resolution. Real-time and non-real-time systems can be combined as well. The framework is developed within the Distributed Grid Lab of the Fraunhofer Cluster of Excellence Integrated Energy Systems (CINES). The goal is to enable laboratory collaboration that addresses the needs of users such as manufacturers, grid operators and research institutions to test their grid automation solutions before field deployment. This testing can take into account the interactions of various components and solutions at different remotely located testing facilities without requiring the same hardware and software setup. A demonstrator is developed to highlight these capabilities.
Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies have suffered from several technical challenges, including low mean expression levels in most genes and higher frequencies of missing data than bulk population sequencing technologies. Identifying functional gene sets and their regulatory networks that link specific cell types to human diseases and therapeutics from scRNA-seq profiles is a daunting task. In this study, we developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis on large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. We showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used for cell type identification with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients' scRNA-seq profiles are highly correlated with survival rate from The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients' scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses (the area under the receiver operating characteristic curve ranges from 0.728 to 0.783) in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database. In summary, COAC offers a powerful tool to identify potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles. COAC is freely available at https://***/ChengF-Lab/COAC.
The multi-armed restless bandit problem is studied in the case where the pay-off distributions are stationary phi-mixing. This version of the problem provides a more realistic model for most real-world applications, but cannot be optimally solved in practice, since it is known to be PSPACE-hard. The objective of this paper is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. Specifically, it is shown that under some conditions on the phi-mixing coefficients, a modified version of UCB can prove effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms. In particular, the phi-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of a sampling policy on the pay-off distributions. Some of the proof techniques developed in this paper can be used more generally in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis.
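As background for the UCB reference above, here is a minimal sketch of the standard UCB1 index policy in the i.i.d. Bernoulli setting. It is the textbook baseline, not the modified version analyzed in the paper for phi-mixing pay-offs; the arm means, the horizon, and the function name `ucb1` are assumptions for illustration.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Standard UCB1: play each arm once, then maximize mean + exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization round: try every arm once
        else:
            # Empirical mean plus a confidence radius that shrinks as the arm is sampled.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical i.i.d. Bernoulli arms; the restless phi-mixing setting is not modelled here.
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
counts = ucb1(lambda a: 1.0 if rng.random() < means[a] else 0.0, len(means), 5000)
```

In this i.i.d. baseline the confidence radius relies on independent samples from each arm, which is exactly the assumption that fails when pay-offs are dependent over time.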
An individual's Web search behavior can be influenced by a number of factors, including features and functions of a search engine as well as search education. In contrast to the long-lasting attention to the algorithm and interface dimensions of search, there is a lack of research concerned with the potential effects of user education on search behavior. To address this gap, we ran a three-session combined field and lab study to examine the effects of user education from two distinct sources, peer advice and cognitive authority (operationalized as video-based advice from a student and from an expert, respectively), on Web search behavior in two different search task scenarios (i.e., factual specific and factual amorphous tasks). We also tested whether these behavioral effects persist for a short period of time after the explicit search tips are removed. Using data from 185 task sessions generated by 31 participants across two field sessions and one lab session, this study demonstrates that: (1) both peer advice and cognitive authority are effective in stimulating immediate behavioral changes in Web search; (2) the immediate behavioral impact of search advice is broader in factual amorphous tasks than in factual specific tasks; (3) framing search tips as advice from a cognitive authority is more likely to generate continuing, short-term effects on Web search behaviors. This research has implications for the design of task-aware user education as well as the study of users' interactions with IR systems in general.
This paper presents a cyber-physical approach to optimize the semiactive control of a base-isolated structure under a suite of earthquakes. The approach uses numerical search algorithms to guide the exploration of the design space and real-time hybrid simulation (RTHS) to evaluate candidate designs, creating a framework for real-time hybrid optimization (RTHO). By supplanting traditional numerical analysis (i.e., finite element methods) with RTHS, structural components that are difficult to model can be represented accurately while still capturing global structural performance. The efficiency of RTHO is improved for multiple design excitations with the creation of a multi-interval particle swarm optimization (MI-PSO) algorithm. As a proof of concept, RTHO is applied to improve the seismic performance of a base-isolated structure with supplemental control. The proposed RTHO framework with MI-PSO is a versatile technique for multivariate optimization under multiple excitations. It is well suited for the accurate and rapid evaluation of structures with nonlinear experimental substructures, in particular, those that do not undergo permanent damage such as structural control devices. The RTHO framework integrates popular optimization algorithms with advanced experimental methods, creating an exciting new cyber-physical approach to design.
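To make the search component concrete, the following is a minimal sketch of a plain global-best particle swarm optimizer. It is not the MI-PSO algorithm from the paper, and the quadratic `objective` is a hypothetical stand-in for the cost that an RTHS experiment would return; the parameter values `w`, `c1`, and `c2` are common textbook choices assumed here.

```python
import random


def pso(objective, bounds, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO with positions clamped to the given bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (best_pos[i][d] - pos[i][d])
                    + c2 * r2 * (g_pos[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Hypothetical cost surface in place of an experimental evaluation.
objective = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
bounds = [(-5.0, 5.0), (-5.0, 5.0)]
g_pos, g_val = pso(objective, bounds)
```

In a hybrid-simulation setting each call to `objective` would correspond to a physical test, so the number of particles and iterations directly sets the experimental budget.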