With serverless computing offering more efficient and cost-effective application deployment, the diversity of serverless platforms presents challenges to users, including platform lock-in and costly migration. Moreove...
The SYCL programming model does not guarantee performance portability across different architectures. However, the HPC community needs platform-independent, performance-portable applications more than ever. Theref...
The Distributed Shared Memory (DSM) architecture is widely used in today’s computer design to mitigate the ever-widening processing-memory gap, and it inevitably exhibits Non-Uniform Memory Access (NUMA) to shared-memory parallel *** to adapt to the NUMA effect can significantly degrade application performance, especially on today’s manycore platforms with tens to hundreds of ***, traditional approaches such as first-touch and memory policy fall short in false page-sharing, fragmentation, or ease of *** this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes *** on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3 times as more cores are used.
Time series data generated by thousands of sensors suffer from data quality problems. Traditional constraint-based techniques have greatly contributed to data cleaning applications. However, cleaning methods that su...
Message passing is a fundamental element in software development, ranging from concurrent and mobile computing to distributed services, but it suffers from communication errors such as deadlocks. Session types are a t...
In the current data-intensive landscape, B+trees are crucial data structures used across fields such as databases and web indexing. As data volumes have exploded, the demand for high-performance real-time query processing in database systems has surged. For instance, Alibaba's PolarDB and AnalyticDB systems handle massive query volumes and real-time data processing, highlighting the need for efficient solutions. Traditional approaches that use GPUs to accelerate B+trees have yielded positive results but suffer from high energy consumption, making widespread deployment in large data centers impractical. This paper introduces GreenB+Tree, an energy-efficient B+tree optimized for the PEZY-SC3s chip, which is known for its high energy efficiency and MIMD architecture and which mitigates common GPU memory and warp divergence issues. GreenB+Tree transforms the B+tree structure into two one-dimensional arrays, significantly reducing memory costs and computational overhead. It further incorporates query-agglomerated optimization (QAO) and the persistent data residency strategy (PDRS) to minimize global memory access and enhance cache efficiency. Experimental evaluations demonstrate that GreenB+Tree achieves a throughput of 62.6 Million Queries Per Second per Watt (MQPS/W), outperforming contemporary GPU-based solutions by approximately 4.5 times.
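The idea of storing a B+tree in two one-dimensional arrays can be illustrated with a minimal sketch. The abstract does not describe GreenB+Tree's actual layout, so everything below is an assumption made for illustration: a static tree, a fixed fanout, one array `keys` holding each node's keys (padded with infinity), and one array `slots` holding child node ids for internal nodes and values for leaves. The function names `build_flat_bptree` and `search` are hypothetical.

```python
from bisect import bisect_right

INF = float("inf")  # padding for partially filled nodes


def build_flat_bptree(pairs, fanout=4):
    """Pack (key, value) pairs into two flat lists (illustrative layout only).

    Node n occupies positions [n * fanout, (n + 1) * fanout) in both lists.
    Internal nodes store the minimum key of each child in `keys` and the
    child's node id in `slots`; leaves store keys and their values.
    """
    pairs = sorted(pairs)
    # Leaf level: consecutive chunks of up to `fanout` pairs.
    level = [
        ([k for k, _ in pairs[i:i + fanout]], [v for _, v in pairs[i:i + fanout]])
        for i in range(0, len(pairs), fanout)
    ]
    levels = [level]
    # Build internal levels bottom-up until a single root remains.
    while len(levels[-1]) > 1:
        below = levels[-1]
        level = []
        for i in range(0, len(below), fanout):
            group = below[i:i + fanout]
            level.append(([node[0][0] for node in group],
                          list(range(i, i + len(group)))))
        levels.append(level)
    levels.reverse()  # root first, leaves last

    # Node ids are assigned level by level, so the root is node 0.
    offsets, total = [], 0
    for level in levels:
        offsets.append(total)
        total += len(level)

    keys, slots = [], []
    for depth, level in enumerate(levels):
        is_leaf = depth == len(levels) - 1
        for node_keys, node_slots in level:
            pad = fanout - len(node_keys)
            keys.extend(node_keys + [INF] * pad)
            if is_leaf:
                slots.extend(node_slots + [None] * pad)
            else:
                base = offsets[depth + 1]
                slots.extend([base + c for c in node_slots] + [-1] * pad)
    return keys, slots, len(levels), fanout


def search(tree, key):
    """Return the value stored for `key`, or None if it is absent."""
    keys, slots, height, fanout = tree
    if not keys:
        return None
    node = 0
    for _ in range(height - 1):
        start = node * fanout
        # Last child whose minimum key is <= key, clamped to the first child.
        pos = max(bisect_right(keys, key, start, start + fanout) - 1, start)
        node = slots[pos]
    start = node * fanout
    pos = bisect_right(keys, key, start, start + fanout) - 1
    if pos >= start and keys[pos] == key:
        return slots[pos]
    return None
```

Because every node lives at a computable offset, a lookup only indexes into the two lists and never follows object pointers, which is the general property a flattened layout is meant to provide. For example, `search(build_flat_bptree([(k, k * 10) for k in range(1, 21)]), 18)` walks three levels and returns `180`.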
Since the preparation of labeled data for training semantic segmentation networks of point clouds is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to result in improvements to the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach leads to an effective method that provides improvements in scene segmentation over previous work and baselines, while requiring only a few user annotations.
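The interplay between the two components can be sketched as follows. The abstract does not state which selection criterion or pseudo-labeling rule the authors use, so this sketch assumes two common choices: prediction entropy as the active-learning score and a fixed confidence threshold for self-training. The names `select_for_annotation`, `pseudo_labels`, and the `threshold` parameter are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one point's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(probs_per_point, labeled, budget):
    """Active learning step (assumed criterion: highest entropy first)."""
    candidates = [i for i in range(len(probs_per_point)) if i not in labeled]
    candidates.sort(key=lambda i: entropy(probs_per_point[i]), reverse=True)
    return candidates[:budget]


def pseudo_labels(probs_per_point, labeled, threshold=0.9):
    """Self-training step (assumed rule: keep confident predictions only)."""
    result = {}
    for i, probs in enumerate(probs_per_point):
        if i in labeled:
            continue  # user-annotated points keep their real labels
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            result[i] = best
    return result
```

In a training loop under these assumptions, the points returned by `select_for_annotation` would be sent to the user, while the dictionary from `pseudo_labels` would supplement the sparse annotations in the next round.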
We constructed a student classroom teaching video dataset, constructed a student classroom expression dataset through video frame extraction, face detection, face alignment, and facial extraction, and constructed a st...
Generic Boundary Detection (GBD) aims at locating the general boundaries that divide videos into semantically coherent and taxonomy-free units, and could serve as an important pre-processing step for long-form video understanding. Previous works often handle these different types of generic boundaries separately, with specific designs of deep networks ranging from simple CNNs to LSTMs. Instead, in this paper, we present Temporal Perceiver, a general Transformer-based architecture offering a unified solution to the detection of arbitrary generic boundaries, from shot-level and event-level to scene-level GBDs. Our core design is to introduce a small set of latent feature queries as anchors to compress the redundant video input into a fixed dimension via cross-attention blocks. Thanks to this fixed number of latent units, the design reduces the quadratic complexity of the attention operation to one that is linear in the number of input frames. Specifically, to explicitly leverage the temporal structure of videos, we construct two types of latent feature queries: boundary queries and context queries, which handle semantic incoherence and coherence, respectively. Moreover, to guide the learning of latent feature queries, we propose an alignment loss on the cross-attention maps to explicitly encourage the boundary queries to attend to the top boundary candidates. Finally, we present a sparse detection head on the compressed representation, and directly output the final boundary detection results without any post-processing module. We test our Temporal Perceiver on a variety of GBD benchmarks. Our method obtains state-of-the-art results on all benchmarks with RGB single-stream features: SoccerNet-v2 (81.9 percent average-mAP), Kinetics-GEBD (86.0 percent average-f1), TAPOS (73.2 percent average-f1), MovieScenes (51.9 percent AP and 53.1 percent mIoU) and MovieNet (53.3 percent AP and 53.2 percent mIoU), demonstrating the generalization ability of our Temporal Perceiver. To further pursue a general GBD m
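The complexity argument for latent queries can be made concrete with a small sketch of plain cross-attention. This is not the Temporal Perceiver implementation: learned projections, multiple heads, and the boundary/context query split are all omitted, and the function name `cross_attention` is hypothetical. It only shows that K latent queries attending over N frame features need K·N score evaluations, whereas frame-to-frame self-attention would need N·N.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, frames):
    """Compress N frame features into K latent outputs.

    latents: K vectors of size d (the queries)
    frames:  N vectors of size d (used as both keys and values here)
    Cost is K * N dot products, i.e. linear in N for a fixed K.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in latents:
        scores = [sum(q * f for q, f in zip(query, frame)) / scale
                  for frame in frames]
        weights = softmax(scores)
        outputs.append([
            sum(w * frame[j] for w, frame in zip(weights, frames))
            for j in range(dim)
        ])
    return outputs
```

With K fixed, the output always has K rows regardless of video length, so later layers operate on a constant-size representation.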
Microservices architecture is a promising approach for developing reusable scientific workflow capabilities for integrating diverse resources, such as experimental and observational instruments and advanced computatio...