ISBN: (print) 9781450355520
Modern interactive data analysis (IDA) platforms, such as Splunk and Tableau, are gradually replacing traditional OLAP/SQL tools, as they allow for easy-to-use data exploration, visualization, and mining, even for users lacking SQL and programming skills. Nevertheless, data analysis is still a difficult task, especially for non-expert users. To that end we present REACT, a recommender system designed for modern IDA platforms. In these platforms, analysis sessions interweave high-level actions of multiple types and operate over diverse datasets. REACT identifies and generalizes relevant (previous) sessions to generate personalized next-action suggestions for the user. We model the user's analysis context using a generic tree-based model, where the edges represent the user's recent actions, and the nodes represent their result "screens". A dedicated context similarity metric is employed for efficient indexing and retrieval of relevant candidate next-actions. These are then generalized to abstract actions that convey common fragments, then adapted to the specific user context. To prove the utility of REACT we performed an extensive online and offline experimental evaluation over real-world analysis logs from the cyber security domain, which we also publish to serve as a benchmark dataset for future work.
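The retrieval step described above can be illustrated with a deliberately simplified sketch. It is not REACT's method: the context is flattened to a set of recent actions instead of a tree, plain Jaccard similarity stands in for the dedicated context similarity metric, and the generalization and adaptation steps are omitted. The action labels and the function names `jaccard` and `suggest_next_action` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two action collections (stand-in metric, an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_next_action(current_context, logged_sessions, k=2):
    """Vote among the next-actions of the k logged contexts most similar to the current one.

    logged_sessions is a list of (context_actions, next_action) pairs.
    Returns None when there is nothing to learn from.
    """
    if not logged_sessions:
        return None
    ranked = sorted(
        logged_sessions,
        key=lambda entry: jaccard(current_context, entry[0]),
        reverse=True,
    )
    votes = Counter(next_action for _, next_action in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical session log: (recent actions, action the analyst took next).
logs = [
    (["filter:protocol", "group:src_ip"], "sort:count"),
    (["filter:protocol", "group:dst_ip"], "sort:count"),
    (["filter:time", "plot:histogram"], "zoom:range"),
]
suggestion = suggest_next_action(["filter:protocol", "group:src_ip"], logs)
print(suggestion)
```

A tree-aware metric would also compare the result screens at the nodes, which this flattened version ignores.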
Background: Sequencing data has become a standard measure of diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation sequencing (CLIP-Seq) or RNA immunoprecipitation sequencing (RIP-Seq), and DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase, or MNase sequencing libraries. The processing of these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from each genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation, and analysis. Such complex analysis often relies on multiple programs and is therefore a challenge for those without programming skills. Results: We developed DEBrowser as an R Bioconductor project to interactively visualize every step of the differential analysis, without programming. The application provides a rich and interactive web-based graphical user interface built on R's Shiny infrastructure. DEBrowser allows users to visualize data with various types of graphs that can be explored further by selecting and re-plotting any desired subset of data. Using the visualization approaches provided, users can determine and correct technical variations such as batch effects and sequencing depth that affect differential analysis. We show DEBrowser's ease of use by reproducing the analysis of two previously published data sets. Conclusions: DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge.
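To make the count table and the sequencing-depth issue concrete, here is a minimal sketch in Python. It uses counts per million (CPM), a common generic depth normalization; the abstract does not say which normalization methods DEBrowser offers, so this is an assumption for illustration only. The locus names and counts are made up.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Rescale each sample (column) so that its counts sum to one million.

    counts maps a locus name to its read counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Sequencing depth of a sample = total reads assigned across all loci.
    depths = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        locus: [
            row[j] * 1_000_000 / depths[j] if depths[j] else 0.0
            for j in range(n_samples)
        ]
        for locus, row in counts.items()
    }


# Toy count table: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = {
    "geneA": [10, 20],
    "geneB": [30, 60],
    "geneC": [60, 120],
}
cpm = counts_per_million(toy_counts)
print(cpm["geneA"])
```

After rescaling, the two samples agree at every locus, which shows that the raw differences here came from depth alone and not from biology.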
Interactive visualization can support fluid exploration but is often limited to predetermined tasks. Scripting can support a vast range of queries but may be more cumbersome for free-form exploration. Embedding interactive visualization in scripting environments, such as computational notebooks, provides an opportunity to leverage the strengths of both direct manipulation and scripting. We investigate interactive visualization design methodology, choices, and strategies under this paradigm through a design study of calling context trees used in performance analysis, a field which exemplifies typical exploratory data analysis workflows with big data and hard-to-define problems. We first produce a formal task analysis assigning tasks to graphical or scripting contexts based on their specificity, frequency, and suitability. We then design a notebook-embedded interactive visualization and validate it with intended users. In a follow-up study, we present participants with multiple graphical and scripting interaction modes to elicit feedback about notebook-embedded visualization design, finding consensus in support of the interaction model. We report and reflect on observations regarding the process and design implications for combining visualization and scripting in notebooks.
Over the past five years, the Istituto Nazionale di Geofisica e Vulcanologia (INGV) has started a technological transformation of its real-time seismic monitoring capabilities. This comprehensive restructuring initiative represents a pivotal moment in the Institute's commitment to advancing seismic research and enhancing public safety. At the heart of this transformation lies the development and deployment of the integrated system known as Caravel. Caravel stands as a testament to INGV's dedication to cutting-edge seismic monitoring technology and its mission to provide timely and accurate seismic information to researchers, emergency responders, and the general public. It represents a leap forward in real-time seismic monitoring, integrating state-of-the-art technologies and methodologies to detect, analyze, and disseminate seismic data with unprecedented efficiency and precision. This development reflects INGV's commitment to staying at the forefront of seismic research and hazard mitigation. This integrated system not only improves the accuracy of earthquake detection but also enhances our ability to rapidly assess the potential impact of seismic events, enabling more informed decision-making during emergency situations. The Seismic Intelligence Tool (SIT) emerges as a software fork of one of Caravel's components, previously known as PickFX. The reason behind this fork is to share with the scientific community a robust, multi-platform, and freely accessible data analysis tool that adheres to current standards for representing seismic data while removing all INGV-specific customizations from PickFX. The decision to fork the original software and release SIT underscores a commitment to democratizing access to advanced seismic analysis tools. By offering this resource at no cost, the scientific community gains access to a platform that is fully compatible with contemporary seismic data representation standards and that can become very powerful with time and cooperation.
The application of digitalization in manufacturing involves using sensors to collect and transmit large amounts of data in real-time. These complex, timestamped sequence data require effective analytical support to dr...
Background: The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats: normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used to identify relevant functional patterns, yet often their contextualization within the data and results at hand is not straightforward, especially if these analytic components are not combined efficiently. Results: We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses, by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. Conclusion: GeneTonic is distributed as an R package in the Bioconductor project (https://***/packages/GeneTonic/) under the MIT license. Offering both bird's-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims at simplifying the process of interpretation of comp
ISBN: (digital) 9781624106095
ISBN: (print) 9781624106095
At the AIAA SciTech 2020 conference, the Meshing, Visualization and Computational Environments Technical Committee hosted a special technical panel on In Situ/In Transit Computational Environments for Visualization and Data Analytics. The panel brought together leading experts from industry, software vendors, the Department of Energy, the Department of Defense, and the Japan Aerospace Exploration Agency (JAXA). In situ and in transit methodologies enable computational fluid dynamics (CFD) simulations to avoid the excessive overhead associated with data I/O at large scales, especially as simulations scale to millions of processors. These methods either run the data analysis/visualization pipelines in the memory space of the solver or efficiently offload the workload to alternate processors. Using these methods, simulations can scale and hold the promise of enabling the community to satisfy the Knowledge Extraction milestones envisioned by the CFD Vision 2030 study for "on demand analysis/visualization of a 100 Billion point unsteady CFD simulation". This paper summarizes the presentations, providing a discussion point for how the community can achieve the goals set forth in the CFD Vision 2030 study.
We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets. It processes the dataset in blocks and outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms.
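The block-by-block idea can be sketched as follows. This is a toy illustration of progressive mining in general, not ProSecCo itself: it enumerates only subsequences of length at most 2, keeps exact support counts for all of them (so memory is not constant), and relaxes the threshold on intermediate blocks by a fixed `slack` chosen arbitrarily, whereas ProSecCo derives its bounds from a VC-dimension analysis. All names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def patterns(sequence, max_len=2):
    """Distinct order-preserving subsequences of length 1..max_len."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(sequence)), k):
            found.add(tuple(sequence[i] for i in idx))
    return found


def progressive_frequent(dataset, min_freq, block_size, slack=0.1):
    """Yield (sequences_seen, frequent_patterns) after each block.

    Intermediate results use a relaxed threshold (min_freq - slack) to reduce
    the chance of dropping patterns that are frequent overall; the last
    result uses min_freq on the full data and is therefore exact.
    """
    support = Counter()
    n_seen = 0
    for start in range(0, len(dataset), block_size):
        block = dataset[start:start + block_size]
        for seq in block:
            support.update(patterns(seq))
        n_seen += len(block)
        is_last = n_seen == len(dataset)
        threshold = min_freq if is_last else max(min_freq - slack, 0.0)
        yield n_seen, {p for p, c in support.items() if c / n_seen >= threshold}


data = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
results = list(progressive_frequent(data, min_freq=0.75, block_size=2))
for n_seen, frequent in results:
    print(n_seen, sorted(frequent))
```

The first yielded set is only an estimate from half of the data and can contain patterns that the final, exact answer drops.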
Background: Traditional Chinese medicine (TCM) formulas are combinations of Chinese herbal medicines. Knowledge of classic medicine formulas is the basis of TCM diagnosis and treatment and is the core of TCM inheritance. The large number and flexibility of medicine formulas make memorization difficult, and understanding their composition rules is even more difficult. The multifaceted and multidimensional properties of herbal medicines are important for understanding the formula; however, these are usually separated from the formula information. Furthermore, these data are presented as text and cannot be analyzed jointly and ***. Objective: We aimed to devise a visualization method for TCM formulas that shows the composition of medicine formulas and the multidimensional properties of herbal medicines involved and supports the comparison of medicine formulas. Methods: A TCM formula visualization method with multiple linked views is proposed and implemented as a web-based tool after close collaboration between visualization and TCM experts. The composition of medicine formulas is visualized in a formula view with a similarity-based layout supporting the comparison of constituent herbs; a shared herb view complements the formula view by showing all overlaps of pairwise formulas; and a dimensionality-reduction plot of herbs enables the visualization of multidimensional herb properties. The usefulness of the tool was evaluated through a usability study with TCM experts. Results: Our method was applied to 2 typical categories of medicine formulas, namely tonic formulas and heat-clearing formulas, which contain 20 and 26 formulas composed of 58 and 73 herbal medicines, respectively. Each herbal medicine has a 23-dimensional characterizing attribute. In the usability study, TCM experts explored the 2 data sets with our web-based tool and quickly gained insight into formulas and herbs of interest, as well as the overall features of the formula groups that are difficult t
Background: The human leukocyte antigen (HLA) proteins play a fundamental role in the adaptive immune system as they present peptides to T cells. Mass-spectrometry-based immunopeptidomics is a promising and powerful tool for characterizing the immunopeptidomic landscape of HLA proteins, that is, the peptides presented on HLA proteins. Despite the growing interest in the technology, and the recent rise of immunopeptidomics-specific identification pipelines, there is still a gap in data analysis and software tools that are specialized in analyzing and visualizing immunopeptidomics data. Results: We present the IPTK library, an open-source Python-based library for analyzing, visualizing, comparing, and integrating different omics layers with the identified peptides for an in-depth characterization of the immunopeptidome. Using different datasets, we illustrate the ability of the library to enrich the result of the identified peptidomes. We also demonstrate the utility of the library in developing other software and tools by developing an easy-to-use dashboard that can be used for the interactive analysis of the results. Conclusion: IPTK provides a modular and extendable framework for analyzing and integrating immunopeptidomes with different omics layers. The library is deployed on PyPI at https://***/project/IPTKL/ and on Bioconda at https://***/bioconda/iptkl, while the source code of the library and the dashboard, along with the online tutorials, is available at https://***/ikmb/iptoolkit.