ISBN (digital): 9798350368741
ISBN (print): 9798350368758
B-mode ultrasound tongue imaging is a non-invasive, real-time method for visualizing vocal tract deformation. However, accurately extracting the tongue’s surface contour remains a significant challenge because of the low signal-to-noise ratio (SNR) and prevalent speckle noise in ultrasound images. Traditional supervised learning models often require large labeled datasets, which are labor-intensive to produce and susceptible to noise interference. To address these limitations, we present a novel Counterfactual Ultrasound Anti-Interference Self-Supervised Network (CUAI-SSN), which integrates self-supervised learning (SSL) with counterfactual data augmentation and progressively disentangles confounding factors so that the model generalizes well across varied ultrasound conditions. Our approach leverages causal reasoning to decouple noise from relevant features, enabling the model to learn robust representations that focus on essential tongue structures. By generating counterfactual image-label pairs, our method introduces alternative, noise-independent scenarios that enhance model training. Furthermore, we introduce attention mechanisms to improve the network’s ability to capture fine-grained details even in noisy conditions. Extensive experiments on real ultrasound tongue images demonstrate that CUAI-SSN outperforms existing methods, setting a new benchmark for automated contour extraction in ultrasound tongue imaging. Our code is publicly available at https://***/inexhaustible419/CounterfactualultrasoundAI.
QR and LU decompositions are the most important matrix decomposition algorithms. Many studies accelerate these algorithms on FPGAs or ASICs on a case-by-case basis. In this paper, we propose a unified framework for matrix decomposition algorithms that combines three QR decomposition algorithms and the LU algorithm with pivoting into a unified linear array structure. The QR and LU decomposition algorithms exhibit the same two-level loop structure and the same data dependency. Exploiting these similarities, we derive a unified fine-grained algorithm for all four matrix decomposition algorithms. Furthermore, we present a unified co-processor structure with a scalable linear array of processing elements (PEs), in which the four types of PEs share the same memory-channel structure and PE connections and differ only in the internal structure of the data path. Our unified co-processor, which uses IEEE 32-bit floating-point precision, is implemented and mapped onto a Xilinx Virtex5 FPGA chip. Experimental results show that our co-processors achieve speedups of 2.3x to 14.9x over a Pentium Dual CPU with double SSE threads.
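The loop nest mentioned in the abstract can be illustrated with textbook LU decomposition with partial pivoting. The following is only a plain software sketch under that assumption: the function name `lu_partial_pivot` and the packed output layout are illustrative choices, and nothing here reproduces the paper's fine-grained algorithm or PE design. The outer loop selects one pivot column per step, and the inner loop applies the same row update to every remaining row, which is the kind of regular dependency a linear array could exploit.

```python
def lu_partial_pivot(a):
    """Textbook LU with partial pivoting (illustrative, not the paper's method).

    Returns (perm, lu). Row i of P*A is row perm[i] of A. `lu` packs the
    unit-lower-triangular L below the diagonal and U on and above it.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy of the input
    perm = list(range(n))
    for k in range(n):                  # outer loop: one pivot column per step
        # Pivot search: largest magnitude entry in column k, rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):       # inner loop: update each remaining row
            lu[i][k] /= lu[k][k]        # multiplier, stored in place as L[i][k]
            factor = lu[i][k]
            for j in range(k + 1, n):   # row update (a vector operation)
                lu[i][j] -= factor * lu[k][j]
    return perm, lu


perm, lu = lu_partial_pivot([[2.0, 1.0], [4.0, 3.0]])
print(perm, lu)
```

The QR variants replace the multiplier and row update with a rotation or reflection, but the surrounding pivot-step and row-sweep loops keep the same shape in the textbook formulations.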
Upsets in flip-flops induced by internal single-event transients (SETs) are becoming significant as operating frequencies increase. However, the conventional soft error rate (SER) evaluation approach can only produce an approximate prediction of the upsets caused by internal SETs. In this paper, we propose an improved SER evaluation approach based on Monte Carlo simulation. A novel SET-based upset model is implemented in the proposed evaluation approach to accurately predict upsets caused by internal SETs. A test chip was fabricated in a commercial 65 nm bulk process to validate the accuracy of the improved SER evaluation approach. The predicted single-event upset cross-sections are consistent with the experimental data.
The graph is a significant data structure that describes the relationships between entries. Many real-world application domains depend heavily on graph data. However, graph applications differ vastly from traditional applications. General-purpose platforms are inefficient for graph applications, which has motivated research on dedicated graph processing platforms. In this survey, we systematically categorize graph workloads and applications, and provide a detailed review of existing graph processing platforms by dividing them into general-purpose and specialized systems. We thoroughly analyze the implementation technologies, including programming models, partitioning strategies, communication models, execution models, and fault tolerance strategies. Finally, we analyze recent advances and present four open problems for future research.
Representation learning on textual networks, or textual network embedding, which leverages the rich textual information associated with the network structure to learn low-dimensional embeddings of vertices, has been useful in a variety of tasks. However, most approaches learn textual network embeddings using only direct neighbors. In this paper, we employ a powerful and spatially localized operation, personalized PageRank (PPR), to remove the restriction of using only direct connections. We also analyze the relationship between PPR and spectral-domain theory, which provides insight into the empirical performance boost. Our experiments show that the proposed method greatly improves link prediction compared with existing methods, achieving a new state of the art on several real-world benchmark datasets.
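To make the PPR operation concrete, here is a minimal power-iteration sketch of the standard personalized PageRank definition. It is not taken from the paper: the function name, the adjacency-dict input, the teleport probability `alpha=0.15`, and the handling of dangling nodes are all assumptions chosen for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Standard PPR by power iteration (illustrative sketch).

    adj maps each node to a list of out-neighbors; every neighbor must
    also be a key of adj. alpha is the probability of teleporting back
    to `source` at each step.
    """
    nodes = list(adj)
    rank = {v: 0.0 for v in nodes}
    rank[source] = 1.0                       # all mass starts at the source
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                # Spread the non-teleporting mass evenly over out-edges.
                share = (1.0 - alpha) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its walk mass back to the source.
                nxt[source] += (1.0 - alpha) * rank[u]
        # Total mass stays 1, so the teleport mass is exactly alpha.
        nxt[source] += alpha
        rank = nxt
    return rank


# Path graph a - b - c, personalized to node "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = personalized_pagerank(graph, "a")
print(scores)
```

In this toy graph, node `c` is not a direct neighbor of `a` but still receives a nonzero score, which is the property that lets PPR look beyond direct connections.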
We present a demand-driven memory leak detection algorithm based on flow- and context-sensitive pointer analysis. The detection algorithm first assumes the presence of a memory leak at some program point and then runs a backward analysis to see whether this assumption can be disproved. Our algorithm computes the memory abstraction of programs based on the points-to graph resulting from flow- and context-sensitive pointer analysis. We have implemented the algorithm in the SUIF2 compiler infrastructure and used the implementation to analyze a set of C benchmark programs. The experimental results show that the approach achieves better precision with satisfactory scalability, as expected.
The key to large-scale parallel solutions of deterministic particle transport problems is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly as feature sizes scale down in semiconductor technology. In this paper, we present a scalability investigation of one-energy-group, time-independent, deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on Intel's Many Integrated Core (MIC) architecture, which currently provides up to 62 cores with four hardware threads per core and is expected to offer up to 72 cores in the future. The OpenMP parallel programming model and vector intrinsic functions are used to exploit thread parallelism and vector parallelism, respectively, for the discrete ordinates method. The results on a 57-core MIC coprocessor show that the implementation of Sweep3D on MIC scales well. In addition, an assessment of the implementation using the Roofline model and a performance comparison between MIC and a Tesla K20C graphics processing unit (GPU) are also reported.
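For readers unfamiliar with the Roofline model mentioned above, its basic bound is that attainable performance is the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below encodes only that general formula. The function names are illustrative, and the numbers in the usage lines are placeholders, not measured MIC or K20C figures from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s under the basic Roofline bound.

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Placeholder machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
print(roofline_gflops(1000.0, 100.0, 2.0))   # memory-bound kernel
print(roofline_gflops(1000.0, 100.0, 20.0))  # compute-bound kernel
print(ridge_point(1000.0, 100.0))
```

A kernel whose intensity lies below the ridge point is limited by memory bandwidth, so more threads or wider vectors help only if they also reduce memory traffic per FLOP.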
Deep reinforcement learning (RL) has become one of the most popular topics in artificial intelligence *** has been widely used in various fields, such as end-to-end control, robotic control, recommendation systems, and natural language dialogue *** this survey, we systematically categorize the deep RL algorithms and applications, and provide a detailed review of existing deep RL algorithms by dividing them into model-based methods, model-free methods, and advanced RL *** thoroughly analyze the advances including exploration, inverse RL, and transfer ***, we outline the current representative applications, and analyze four open problems for future research.
Concurrency bugs are widespread in concurrent programs and have caused severe failures in the real world. Researchers have made significant progress in detecting concurrency bugs, which improves software reliability. In this paper, we survey the most up-to-date and well-known concurrency bug detectors. We categorize the existing detectors by the type of concurrency bug they target, and accordingly analyze data race detectors, atomicity violation detectors, order violation detectors, and deadlock detectors. We also discuss other techniques closely related to concurrency bug detection, including schedule bounding, interleaving optimization, path expansion, and deterministic replay. Additionally, we statistically analyze the reviewed detectors and report some interesting findings: for instance, nearly 86% of previous detectors focus on data races and atomicity violations, and dynamic approaches are popular (74%). We also discuss the limitations of previous detectors, finding that 91% suffer from false negatives and 64% suffer from runtime overhead. Based on the reviewed detectors and the statistical analysis, we identify future research directions concerning accuracy, performance, applicability, and integrality.
The Internet-based virtual computing environment (iVCE) has been proposed to combine data centers and other kinds of computing resources on the Internet to provide efficient and economical services. Virtual machines (VMs) have been widely used in iVCE to isolate different users and jobs and to ensure trustworthiness, but VMs traditionally take a long time to boot, which cannot meet the requirements of iVCE's large-scale and highly dynamic applications. To address this problem, we design and implement VirtMan, a fast booting system for a large number of virtual machines in iVCE. VirtMan uses the Linux Small Computer System Interface (SCSI) target to remotely mount the source image in a scalable hierarchy, and leverages the homogeneity of a set of VMs to transfer only the necessary image data at runtime. We have implemented VirtMan both as a standalone system and for OpenStack. In our 100-server testbed, VirtMan boots 1000 VMs (with a 15 GB image of Windows Server 2008) on 100 physical servers in less than 120 s, which is three orders of magnitude lower than current public clouds.