ISBN (digital): 9798331533205
ISBN (print): 9798331533212
This manuscript provides an in-depth analysis of Major League Baseball (MLB) team performance from 2005 to 2023 using data visualization in Tableau and proposes a hybrid data visualization model (VisMLB) for MLB matches. The objective is to identify key metrics, such as batting, fielding, and pitching, that influence team success, including championship wins. The authors analyze the total number of championships won by each team, compare top-performing teams across a variety of performance metrics, and explore the relationship between On-Base Percentage (OBP) and Slugging Percentage (SLG) for home and visiting teams in relation to home victories. Using Tableau, feature importance is visualized, and predictive models are evaluated for accuracy, precision, and recall. The findings offer a comprehensive overview of MLB performance trends over nearly two decades.
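The two rate statistics compared in this abstract have standard definitions. A minimal Python sketch (the season line used as input is hypothetical, chosen only for illustration):

```python
def obp(h, bb, hbp, ab, sf):
    """On-Base Percentage: times reaching base over the plate
    appearances counted by the standard OBP denominator."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    """Slugging Percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

# Hypothetical season line: 150 H (100 1B, 30 2B, 5 3B, 15 HR),
# 60 BB, 5 HBP, 500 AB, 5 SF.
print(round(obp(150, 60, 5, 500, 5), 3))   # → 0.377
print(round(slg(100, 30, 5, 15, 500), 3))  # → 0.47
```

A scatter of these two values per team-game, split by home and visiting side, is the kind of OBP-vs-SLG view the abstract describes building in Tableau.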
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Text-to-visualization (Text2Vis) aims to democratize data insights for non-expert users by transforming a natural language query (NLQ) into a visualization specification (VS). Given the heavy dependence of rule-based methods on predefined VS templates and the poor NLQ-understanding ability of small data-driven methods, recent works leverage pre-trained LLMs to perform NLQ understanding and VS generation in Text2Vis tasks via prompt-guided in-context learning. However, existing LLM-based methods still fall short of satisfactory end-to-end Text2Vis performance, primarily owing to the limited ability of pre-trained LLMs to directly retrieve and operate on NLQ-intended tabular data. Inspired by the SQL-generation ability of the latest LLMs, this paper proposes harnessing LLMs for SQL-driven retrieval and operation of visualization data. Nonetheless, there remains a non-negligible gap between data visualization queries in Text2Vis tasks and SQL retrieval queries in LLM corpora. To fill the gap, this paper proposes a visualization Query to SQL (VisQ2SQL) framework to obtain NLQ-intended data, primarily by fine-tuning LLMs through preference learning over data-retrieval SQLs induced from VSs and those generated by LLMs. We conduct extensive experiments to demonstrate the superiority of VisQ2SQL over SOTA methods, and various ablation studies to verify its efficacy.
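The core idea of SQL-driven retrieval for visualization can be illustrated with a toy example. The table, columns, and query below are hypothetical, and the paper's actual prompting and preference-learning pipeline is not reproduced here; the sketch only shows how an NLQ such as "average price by category as a bar chart" maps to an aggregation SQL whose result feeds a chart specification:

```python
import sqlite3

# Hypothetical table standing in for the NLQ-intended tabular data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A", 10.0), ("A", 14.0), ("B", 8.0)])

# A data-retrieval SQL of the kind a VisQ2SQL-style system would produce:
# group by the x-axis field, aggregate the y-axis field.
rows = conn.execute(
    "SELECT category, AVG(price) FROM sales "
    "GROUP BY category ORDER BY category"
).fetchall()

# Feed the retrieved data into a minimal bar-chart specification
# (a Vega-Lite-style dictionary, shown only as an illustration).
spec = {
    "mark": "bar",
    "data": [{"category": c, "avg_price": p} for c, p in rows],
    "encoding": {"x": "category", "y": "avg_price"},
}
print(spec["data"])
# → [{'category': 'A', 'avg_price': 12.0}, {'category': 'B', 'avg_price': 8.0}]
```

The gap the paper targets is precisely that the SQL step above must be inferred from a visualization-oriented query rather than stated explicitly.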
ISBN (digital): 9798331508456
ISBN (print): 9798331508463
Big data processing and analysis have become significant challenges in accounting management. Visualization of data using conventional methods has not fared well: it offers poor readability, which has made it ineffective for decision-making. We integrate DCGAN technology with accounting management data to strengthen visualization of the dataset. A 6,000-record dataset was assembled, cleaned, and standardized, and a DCGAN model with optimized generator and discriminator components was trained for 5,000 cycles. Based on the output, the DCGAN images retain about 92% of the actual information content compared with classical visualization techniques, while providing clearer and more intuitive views. These new visualizations support better understanding of data and better-supported decisions in accounting management. The research demonstrates the potential of DCGAN technology to transform data visualization by improving the readability and usability of accounting data for accounting decisions.
ProteoArk is a web-based tool that offers a range of computational pipelines for comprehensive analysis and visualization of mass spectrometry-based proteomics data. The application comprises four primary sections designed to address various aspects of mass spectrometry data analysis in a single platform, including label-free and labeled samples (SILAC/iTRAQ/TMT), differential expression analysis, and data visualization. ProteoArk supports postprocessing of Proteome Discoverer, MaxQuant, and MSFragger search results. The tool also includes functional enrichment analyses such as gene ontology, protein-protein interactions, pathway analysis, and differential expression analysis, which incorporate various statistical tests. By streamlining workflows and developing user-friendly interfaces, we created a robust and accessible solution for users with basic bioinformatics skills in proteomic data analysis. Users can easily create manuscript-ready figures with a single click, including principal component analysis, heatmaps (K-means and hierarchical), MA plots, volcano plots, and circular bar plots. ProteoArk is developed using the Django framework and is freely available for users [https://***/proteoark/]. Users can also download and run the standalone version of ProteoArk using Docker as described in the instructions [https://***/proteoark/dockerpage]. The application code, input data, and documentation are available online at https://***/ArokiaRex/proteoark. A tutorial video is available on YouTube: https://***/watch?v=WFMKAZ9Slq4&ab_channel=RexD.A.B.
With increasing frequency and severity, coastal cities are facing the effects of extreme weather events, such as sea-level rise, storm surges, hurricanes, and various types of flooding. Recent urban resilience scholarship suggests that responding to the cascading complexities of climate change requires an understanding of cities as social-ecological-technological systems, or SETS. Advances in data visualization, sensors, and analytics are making it possible for urban planners to gain more comprehensive views of cities. Yet addressing climate complexity requires more than deploying the latest technologies; it requires transforming the institutional knowledge systems upon which cities rely for preparation and response in a climate-changed future. While debates in the theory and practice of knowledge co-production offer a rich contextual starting point, there are few practical examples of what it means to co-produce new knowledge systems capable of steering urban resilience planning in fundamentally new directions. This paper helps address this gap by offering a case study approach to co-producing new knowledge systems for SETS data visualization in three US coastal cities. Through a series of innovation spaces - dialogues, labs, and webinars - with residents, data experts, and other city stakeholders from multiple sectors, we show how to apply a knowledge systems approach to better understand, represent, and support cities as SETS. To illustrate what a redesigned knowledge system for urban resilience planning entails, we document the key steps and activities that led to a new prototype SETS platform that works with a wider range of ways of knowing - including community-based expertise, interdisciplinary research contributions, and various municipal actors' know-how - to build anticipatory capacity for visualizing and navigating the complex dynamics of a climate-changed future. Our findings point to new roles for activity-based learning, conflict, and SETS visualization.
This article surveys and compares literature on data journalism from two areas of inquiry: journalism studies and visualization research. As digital interfaces become an important access point for news, journalism and visualization scholars have begun to share a common research interest: data journalism. Given their radically different traditions and histories, these areas follow very different rules in how the topic is approached. The result is two parallel scholarships on data journalism with few points of contact. Arguably, developing research space for encounters and exchange between the two is an opportunity for expanding the academic discourse on data journalism. This study aims to open this space of exchange through a systematic literature review. 121 articles, published between 2010 and 2023, are analyzed. Findings show that the two areas of research approach data journalism with very different aspirations; in relation to data journalism, journalism studies and visualization research can be compared through Lazarsfeld's distinction between critical and administrative research. These aspirations produce various differences at an epistemic level, namely in what, how, and when knowledge about data journalism is produced.
Visualization techniques are useful for analysis and insight generation in computing applications in science and engineering. In this article, we describe the importance of visualization to a digital twin (DT), a virtual representation of a physical object, process, or system that can be applied to different tasks, such as data-driven simulation, analysis, or monitoring. We illustrate tasks in DTs and give examples of how visualization techniques can be applied to DTs in different application areas.
Semiconductor manufacturing plays a crucial role in the world's economic growth and technology development and is the backbone of the high value-added electronic device manufacturing industry. In this paper, a new anomaly detection framework based on data visualization is proposed for semiconductor manufacturing. First, t-Distributed Stochastic Neighbor Embedding (t-SNE), an unsupervised learning method, is used to transform the high-dimensional raw trace data corresponding to normal wafers into a two-dimensional map, so that the distribution of normal wafers can be observed visually. The t-SNE algorithm cannot be used at run time for a new test sample, since it requires the whole dataset for the embedding transformation and is computationally very expensive. A Multilayer Perceptron (MLP) neural network is therefore applied as a regressor to estimate the t-SNE embedding of new test data in real time. The envelope of the t-SNE score estimates for a set of normal wafers is circumscribed and used as the 2D control boundary, based on the Delaunay Triangulation (D.T.). A new test sample whose MLP-estimated embedding points fall outside the D.T. boundary is identified as defective. Lastly, a real-world dataset from semiconductor manufacturing is used to illustrate the proposed data visualization tool for anomaly detection. The experimental results show that a multilayer perceptron in combination with t-SNE and Delaunay Triangulation performs very well for data visualization and automated detection of anomalies.
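The run-time check described above reduces to testing whether an estimated 2-D embedding falls inside the region covered by the training embeddings' triangulation. Since the Delaunay triangulation of a planar point set covers exactly its convex hull, that membership test can be sketched with the standard library alone (the wafer-embedding coordinates below are made up for illustration):

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on the hull (CCW vertex order)."""
    n = len(hull)
    for i in range(n):
        o, a = hull[i], hull[(i + 1) % n]
        if (a[0]-o[0])*(q[1]-o[1]) - (a[1]-o[1])*(q[0]-o[0]) < 0:
            return False
    return True

# Made-up 2-D embeddings of normal wafers; a test wafer whose
# MLP-estimated embedding falls outside this region is flagged.
normal = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = convex_hull(normal)
print(inside_hull(hull, (2, 1)))  # → True  (normal-looking embedding)
print(inside_hull(hull, (9, 9)))  # → False (outside boundary: candidate fault)
```

In practice one would build the boundary from the actual t-SNE scores (e.g. with `scipy.spatial.Delaunay` and its `find_simplex` test); the pure-Python version above only conveys the in/out decision rule.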
In semiconductor etching processes, fault detection monitors the quality of wafers. However, the detailed dynamics in batch data are ignored by many traditional methods. In this paper, sequential image-based data visualization and fault detection using bi-kernel t-distributed stochastic neighbor embedding (t-SNE) is proposed for semiconductor etching processes. In the proposed method, multiple modes, phases, and abnormal samples in batches are visualized in two-dimensional maps. First, the batch data are restructured into sequential images and input to a convolutional autoencoder (CAE) to learn an abstract representation. Then, bi-kernel t-SNE is applied to visualize the CAE codes in two-dimensional maps. To reduce the computational burden and overcome out-of-sample projection diffusion in bi-kernel t-SNE, data subsampling is used in the training procedure. Finally, a one-class support vector machine is employed to calculate the visualization control boundary, and a batch-wise index is presented for faulty wafer detection. To demonstrate the feasibility and effectiveness of the proposed method, it was applied to two wafer etching datasets. The results indicate that the proposed method outperforms state-of-the-art methods in data visualization and fault detection.
Visualization techniques have been front-and-center in the efforts to communicate the science around COVID-19 to the very broad audience of policymakers, scientists, healthcare providers, and the general public. In this article, I summarize and illustrate with examples how visualization can help in understanding different aspects of the pandemic.