On-device training of deep neural networks (DNNs) has become a trend due to various user preferences and scenarios. The DNN training process consists of three phases: feedforward (FF), backpropagation (BP), and weight gradient (WG) update. WG takes about one-third of the computation in the whole training process. Current training accelerators usually ignore the special computation property of WG and process it in a way similar to FF/BP. Besides, the extensive data sparsity existing in WG, which brings opportunities to save computation, is not well explored. Nevertheless, exploiting these optimization opportunities meets three underutilization problems, caused by (1) the mismatch between WG data dimensions and hardware parallelism, (2) the full sparsity, i.e., the sparsity of the feature map (Fmap), error map (Emap), and gradient, and (3) the workload imbalance resulting from irregular sparsity. In this paper, we propose a specific architecture for sparse weight gradient (SWG) computation. The architecture is designed based on a hierarchical unrolling and sparsity-aware (HUSA) dataflow to exploit the optimization opportunities of the special computation property and full data sparsity. In the HUSA dataflow, the data dimensions are unrolled hierarchically on the hardware architecture. A valid-data trace (VDT) mechanism is embedded in the dataflow to avoid the underutilization caused by the two-sided input sparsity. The gradient is unrolled in the PE to alleviate the underutilization induced by output sparsity while maintaining data reuse opportunities. Besides, we design an intra- and inter-column balancer (IIBLC) to dynamically tackle the workload imbalance problem resulting from the irregular sparsity. Experimental results show that with the HUSA dataflow exploiting the full sparsity, SWG achieves a speedup of 12.23× over the state-of-the-art gradient computation architecture, TrainWare. SWG helps to improve the energy efficiency of the state-of-the-art training accelerator LNPU from
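As a rough illustration of why two-sided input sparsity matters for WG computation (the layer shape, variable names, and ReLU-style sparsity pattern below are assumptions for the sketch, not details from the paper): for a fully-connected layer, the weight gradient is the outer product of the input feature map and the error map, so any zero in either operand makes the corresponding partial product unnecessary.

```python
import numpy as np

def dense_weight_gradient(fmap, emap):
    """Weight gradient of a fully-connected layer: outer product of
    the input feature map (activations) and the error map."""
    return np.outer(fmap, emap)

def sparse_weight_gradient(fmap, emap):
    """Same result, but skips every multiply where either operand is zero,
    mimicking how two-sided input sparsity can be exploited in hardware."""
    grad = np.zeros((fmap.size, emap.size))
    nz_f = np.flatnonzero(fmap)   # valid (non-zero) activations
    nz_e = np.flatnonzero(emap)   # valid (non-zero) errors
    for i in nz_f:
        for j in nz_e:
            grad[i, j] = fmap[i] * emap[j]
    return grad

# ReLU activations and their error maps are typically highly sparse.
fmap = np.array([0.0, 1.2, 0.0, 0.7])
emap = np.array([0.3, 0.0, -0.5])
assert np.allclose(dense_weight_gradient(fmap, emap),
                   sparse_weight_gradient(fmap, emap))
```

In this toy case only 2 × 2 of the 4 × 3 partial products are actually computed, which is the kind of saving the VDT mechanism is designed to capture without stalling the PE array.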
Breast cancer, marked by uncontrolled cell growth in breast tissue, is the most common cancer among women and the second leading cause of cancer-related deaths. Among its types, ductal and lobular carcinomas are the most prevalent, with invasive ductal carcinoma accounting for about 70–80% of cases and invasive lobular carcinoma for about 10–15%. Accurate identification is crucial for effective treatment but can be time-consuming and prone to interobserver variability. AI can rapidly analyze pathological images, providing precise, cost-effective identification and thus reducing the pathologists’ workload. This study utilizes a deep learning framework for advanced, automatic breast cancer detection and subtype identification. The framework comprises three key components: detecting cancerous patches, identifying cancer subtypes (ductal and lobular carcinoma), and predicting patient-level outcomes from whole slide images (WSI). The validation process includes visualization using Score-CAM to highlight cancer-affected areas prominently. Datasets include 111 WSIs (85 malignant from the Warwick HER2 dataset and 26 benign from pathologists). For subtype detection, there are 57 ductal and 8 lobular carcinoma cases. A total of 28,428 annotated patches were reviewed by two expert pathologists. Four pre-trained models—DenseNet-201, MobileNetV2, an ensemble of these two, and a Vision Transformer-based model—were fine-tuned and tested on the patches. Patient-level results were predicted using a majority voting technique based on the percentage of each patch type in the WSI. The Vision Transformer-based model outperformed the other models in patch classification, achieving an accuracy of 96.74% for cancerous patch detection and 89.78% for cancer subtype classification. The majority voting method attained an F1-score of 99.06% for WSI-based cancer classification and 96.13% for WSI-based cancer subtype classification. The proposed deep learning-based framework for advanced breast cancer det
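A minimal sketch of the patient-level voting idea described above (the function name, label strings, and the 0.5 threshold are illustrative assumptions, not the paper's exact rule): each patch of a WSI receives a predicted class, and the slide-level label is derived from the class proportions.

```python
from collections import Counter

def slide_level_prediction(patch_labels, cancer_threshold=0.5):
    """Aggregate patch-level predictions into a slide-level (patient-level) label.

    patch_labels: per-patch predictions, e.g. "benign", "ductal", "lobular".
    cancer_threshold: minimum fraction of cancerous patches to call the slide
                      malignant (an assumed parameter for this sketch).
    """
    counts = Counter(patch_labels)
    total = sum(counts.values())
    cancer_fraction = (counts["ductal"] + counts["lobular"]) / total

    if cancer_fraction < cancer_threshold:
        return "benign"
    # Among cancerous patches, pick the majority subtype.
    return "ductal" if counts["ductal"] >= counts["lobular"] else "lobular"

print(slide_level_prediction(["benign"] * 30 + ["ductal"] * 50 + ["lobular"] * 20))
# -> "ductal"
```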
Predicting crimes before they occur can save lives and prevent property losses. With the help of machine learning, many researchers have studied crime prediction extensively. In this paper, we evaluate state-of-the-art cr...
The deployment of fifth-generation (5G) networks across various industry verticals is poised to transform communication and data exchange, promising unparalleled speed and capacity. However, the security concerns rela...
Over the past years, many efforts have been made to achieve fast and accurate meta-heuristic algorithms for optimizing a variety of real-world problems. This study presents a new optimization method based on an unusual geological phenomenon in nature, named the Geyser inspired Algorithm (GEA). The mathematical modeling of this geological phenomenon is carried out to obtain a better understanding of the optimization process. The efficiency and accuracy of GEA are verified using statistical examination and convergence rate comparison on numerous CEC 2005, CEC 2014, CEC 2017, and real-parameter benchmark functions. Moreover, GEA has been applied to several real-parameter engineering optimization problems to evaluate its performance. In addition, to demonstrate the applicability and robustness of GEA, a comprehensive investigation is performed for a fair comparison with other standard optimization algorithms. The results demonstrate that GEA is noticeably successful in reaching the optimal solutions with a high convergence rate in comparison with other well-known nature-inspired algorithms, including ABC, BBO, PSO, and ***. Note that the source code of the GEA is publicly available at https://***/projects/gea.
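To make the "convergence rate comparison on benchmark functions" concrete, here is a small sketch of how best-so-far convergence curves are typically collected on a classic benchmark; the optimizer shown is a plain random-search baseline used only for illustration, since the abstract does not give GEA's actual update rule, and all names and parameters are assumptions.

```python
import numpy as np

def sphere(x):
    """Classic benchmark objective (global minimum 0 at the origin)."""
    return float(np.sum(x ** 2))

def random_search(objective, dim=10, iters=500, seed=0):
    """Baseline optimizer illustrating how a best-so-far convergence curve
    is recorded; it does not reproduce GEA's update mechanism."""
    rng = np.random.default_rng(seed)
    best_x = rng.uniform(-100, 100, dim)
    best_f = objective(best_x)
    history = [best_f]
    for _ in range(iters):
        candidate = best_x + rng.normal(scale=1.0, size=dim)
        f = objective(candidate)
        if f < best_f:
            best_x, best_f = candidate, f
        history.append(best_f)            # one point per iteration
    return best_f, history                # history is the convergence curve

best, curve = random_search(sphere)
print(f"best objective after {len(curve) - 1} iterations: {best:.4f}")
```

Running several algorithms with the same budget and plotting their `history` arrays side by side is the usual basis for the kind of convergence-rate comparison the abstract refers to.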
The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks. Previous works for generating transferable adversarial examples focus on attacking given pretrained...
Large-quantity and high-quality data is critical to the success of machine learning in diverse applications. Faced with the dilemma of data silos, where data is difficult to circulate, emerging data markets attempt to break the dilemma by facilitating data exchange on the Internet. Crowdsourcing, on the other hand, is one of the important methods to efficiently collect large amounts of high-value data in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across the crowdsourcing and data markets. We propose a new metric of data value, defined as the uncertainty reduction of a Bayesian machine learning model when the data is integrated into model training. Guided by this data value metric, we design a mechanism called the Shapley Value Mechanism with Individual Rationality (SV-IR), in which we design a greedy algorithm with a constant approximation ratio to greedily select the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value, respecting the individual rationality constraints. We further propose a fair reward distribution method for the data holders with various effort levels under the charge of a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR approaches the optimal solution and outperforms state-of-the-art methods.
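A small, self-contained sketch of the Shapley-value idea behind a compensation rule like the one described above: each broker is paid its average marginal contribution over all arrival orders. The broker names, the toy coalition value function, and the overlap penalty are invented for illustration; the paper's actual value metric is the uncertainty reduction of a Bayesian model.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all
    orderings (feasible for a handful of data brokers)."""
    shapley = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            marginal = value(coalition | {p}) - value(coalition)
            shapley[p] += marginal
            coalition.add(p)
    n = len(orderings)
    return {p: v / n for p, v in shapley.items()}

# Toy value function: each broker's standalone contribution, discounted when
# brokers hold overlapping data (purely illustrative numbers).
contribution = {"broker_A": 5.0, "broker_B": 3.0, "broker_C": 2.0}

def coalition_value(coalition):
    raw = sum(contribution[p] for p in coalition)
    return raw * (0.9 ** max(0, len(coalition) - 1))   # overlap penalty

print(shapley_values(list(contribution), coalition_value))
```

The payments sum to the grand-coalition value, which is the efficiency property that makes Shapley-based compensation a natural choice for fair budget distribution.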
In preparation for the upcoming FAU Hack-a-Thon, we have implemented extensive support structures to ensure that all participating teams are thoroughly prepared for the competition. This preparation includes the provi...
Nowadays, video generation has increased dramatically due to the rapid growth of multimedia and the Internet. The need for effective ways to store, manage, and index the massive number of videos has become ...
Fog computing brings computational services near the network edge to meet the latency constraints of cyber-physical system (CPS) applications. Fog devices have limited computational capacity and energy availability, which hampers end user services. We designed a novel performance measurement index to gauge a device's resource availability. This examination addresses the offloading mechanism issues, where the end user (EU) offloads a part of its workload to a nearby edge server (ES). Sometimes, the ES further offloads the workload to another ES or a cloud server to achieve reliable performance because of limited resources (such as storage and computation). The manuscript aims to reduce the service offloading rate by selecting a potential device or server to accomplish a low average latency and service completion time, meeting the deadline constraints of sub-divided tasks. In this regard, an adaptive online status predictive model is significant for prognosticating the asset requirement of arrived services to make float decisions. Accordingly, the development of a reinforcement learning-based flexible x-scheduling (RFXS) approach resolves the service offloading issues, where x = service/resource, producing low latency and high performance of the system. The approach's theoretical bound and computational complexity are derived by formulating the system efficiency. A quadratic restraint mechanism is employed to formulate the service optimization issue according to a set of measurements, as well as the behavioural association rate and adulation rate. The system managed an average service offloading rate of 0.89%, with 39 ms of delay over complex scenarios (using three servers with a 50% service arrival rate). The simulation outcomes confirm that the proposed scheme attained low offloading uncertainty and is suitable for simulating heterogeneous CPS frameworks.
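A rough sketch of the offloading decision the abstract revolves around (the node parameters, latency model, and greedy rule below are assumptions for illustration; they are a simplified heuristic, not the paper's RFXS approach): the end user estimates the completion time locally and on each candidate edge/cloud server, then offloads only when a remote option meets the deadline at a lower cost.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cycles_per_sec: float   # available compute capacity
    uplink_mbps: float      # 0 for local execution (no transfer needed)

def completion_time(task_cycles, task_mbits, node):
    """Estimated service completion time: transfer delay + compute delay."""
    transfer = 0.0 if node.uplink_mbps == 0 else task_mbits / node.uplink_mbps
    return transfer + task_cycles / node.cycles_per_sec

def choose_node(task_cycles, task_mbits, local, servers, deadline):
    """Pick the node with the lowest estimated completion time that still
    meets the task deadline; otherwise keep the task on the end user device."""
    candidates = [local] + servers
    best = min(candidates, key=lambda n: completion_time(task_cycles, task_mbits, n))
    if completion_time(task_cycles, task_mbits, best) <= deadline:
        return best.name
    return local.name   # no node meets the deadline; execute locally

local = Node("end_user", cycles_per_sec=1e9, uplink_mbps=0)
edge = [Node("edge_1", 8e9, 100), Node("edge_2", 6e9, 80), Node("cloud", 5e10, 20)]
print(choose_node(task_cycles=4e9, task_mbits=50, local=local, servers=edge, deadline=2.0))
# -> "edge_1": transfer + compute beats both local execution and the distant cloud
```

An RL-based scheduler such as RFXS would replace this one-shot greedy rule with a learned policy that also accounts for predicted future load on each node.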