Real-time transformation was important for the practical implementation of impedance flow cytometry. A major obstacle was the time-consuming step of translating raw data to cellular intrinsic electrical properties (e.g., specific membrane capacitance C_sm and cytoplasm conductivity σ_cyto). Although optimization strategies such as neural network-aided strategies were recently reported to provide an impressive boost to the translation process, simultaneously achieving high speed, accuracy, and generalization capability is still challenging. To this end, we proposed a fast parallel physical fitting solver that could characterize single cells' C_sm and σ_cyto within 0.62 ms/cell without any data preacquisition or pretraining process. We achieved the 27,000-fold acceleration without loss of accuracy compared with the traditional method. Based on the solver, we implemented physics-informed real-time impedance flow cytometry (piRT-IFC), which was able to characterize up to 100,902 cells' C_sm and σ_cyto within 50 min in a real-time manner. Compared to the fully connected neural network (FCNN) predictor, the proposed real-time solver showed comparable processing speed but higher accuracy. Then, we used a neutrophil degranulation cell model to represent tasks testing unfamiliar samples without data for pretraining. After being treated with cytochalasin B and N-Formyl-Met-Leu-Phe, HL-60 cells underwent dynamic degranulation processes, and we characterized the cells' C_sm and σ_cyto using piRT-IFC. Compared to the results from our solver, accuracy loss was observed in the results predicted by the FCNN, revealing the advantages of high speed, accuracy, and generalizability of the proposed piRT-IFC.
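As a rough illustration of the per-cell fitting step described above, the sketch below uses a generic least-squares fit of two parameters (C_sm, σ_cyto) to multi-frequency impedance magnitudes. The shell_model function, cell radius, frequency points, and parameter bounds are placeholders, not the authors' actual physical model or solver; scipy is assumed to be available.

```python
# Hypothetical sketch: fit C_sm and sigma_cyto from multi-frequency impedance data.
# The single-shell model below is a toy placeholder, not the paper's physical model.
import numpy as np
from scipy.optimize import least_squares

def shell_model(freq_hz, c_sm, sigma_cyto, r_cell=5e-6):
    """Toy impedance magnitude: membrane capacitance in series with a cytoplasm resistance."""
    omega = 2 * np.pi * freq_hz
    z_mem = 1.0 / (1j * omega * c_sm * 4 * np.pi * r_cell**2)   # membrane shell term
    z_cyto = 1.0 / (sigma_cyto * 4 * np.pi * r_cell)            # cytoplasm term
    return np.abs(z_mem + z_cyto)

def fit_cell(freqs, measured_mag, x0=(1e-2, 0.5)):
    """Least-squares fit of (C_sm [F/m^2], sigma_cyto [S/m]) for one cell."""
    residual = lambda p: shell_model(freqs, *p) - measured_mag
    return least_squares(residual, x0, bounds=([1e-4, 1e-2], [1.0, 5.0])).x

# Example: recover parameters from synthetic, noise-free data.
freqs = np.array([0.5e6, 2e6, 10e6])
truth = shell_model(freqs, 2e-2, 0.8)
print(fit_cell(freqs, truth))
```

Per-cell fits of this kind are independent of one another, which is what makes batching them in parallel straightforward.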
The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves the computational performance and power efficiency, especially of applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated in a simultaneous manner. However, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different instances. Existing solutions include the tagged-token matching mechanism, in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism, in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area overhead. The performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without waiting. A simulation analysis based on a handshake mechanism shows that our LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time by 17.4% and an increase in the average power efficiency of dataflow processors by 22.4%. Crucially, our novel approach results in a relatively small increase in the area and power consumption of the on-chip logic of less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
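The toy model below only illustrates the intuition behind look-ahead acknowledgment: under a strict handshake every token pays the full acknowledgment round trip, whereas speculative acknowledgment hides it except for retried rejections. The latencies, retry cost, and reject rate are hypothetical numbers, not figures from the paper or its simulator.

```python
# Toy cycle-count comparison (illustrative only): a producer sends n_tokens to a consumer.
def handshake_cycles(n_tokens, ack_latency=3, send_latency=1):
    # Strict handshake: each token waits for the previous acknowledgment before sending.
    return n_tokens * (send_latency + ack_latency)

def look_ahead_cycles(n_tokens, ack_latency=3, send_latency=1, reject_rate=0.0):
    # Speculative acknowledgment: tokens stream back-to-back; only the final ack sits on
    # the critical path, plus a retry penalty for rejected speculative sends.
    retries = int(n_tokens * reject_rate)
    return n_tokens * send_latency + ack_latency + retries * (send_latency + ack_latency)

for n in (10, 100):
    print(n, handshake_cycles(n), look_ahead_cycles(n, reject_rate=0.05))
```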
This paper presents PAMPAR, a new benchmark to evaluate the performance and energy consumption of different Parallel Programming Interfaces (PPIs). The benchmark is composed of 11 algorithms implemented in PThreads, O...
Edge visual systems demand high energy-efficiency vision processors like neuromorphic hardware leveraging spike-based computations. However, their inability to directly interact with non-spike information in the real wo...
ISBN: (Print) 9781665441865
In this paper, a discrete-time chaos generator is exploited for lightweight and energy-efficient true random number generator (TRNG) implementation. The chaos generator is realized with a two-stage tent map circuit with feedback. Each stage consists of three transistors. Its transient characteristics can be tuned over a range of bias voltage to realize a family of tent map functions. The chaotic output voltage amplitude of each stage is highly sensitive to environmental noise. The difference signal between two such chaos generators at each stage is amplified to produce a random binary output. The output of the quantized sense amplifier is sampled at twice the bit rate of each stage of the tent map circuit. The proposed design was simulated using a standard 55 nm 1.2 V CMOS process. The results show that it has a very low power consumption of 10.98 μW. It has an ultra-low energy consumption of only 5.49 pJ/bit at a throughput of 2 Mbps, which is 19x lower than recently reported state-of-the-art, most power-efficient chaotic TRNGs. The random bit stream generated by the proposed TRNG has passed entropy, ACF, and NIST randomness tests.
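A purely behavioural model of the bit-generation idea (not the 55 nm circuit) can be written as two tent-map iterators perturbed by small independent noise, with the sign of their difference quantized into bits; the map parameter, noise magnitude, and seed below are arbitrary assumptions.

```python
# Illustrative software model: two noisy tent-map iterators, difference quantized to bits.
import numpy as np

def tent(x, mu=1.99):
    """Tent map on [0, 1]; mu close to 2 gives chaotic behaviour."""
    return mu * np.minimum(x, 1.0 - x)

rng = np.random.default_rng(0)
x_a, x_b, bits = 0.41, 0.43, []
for _ in range(2000):
    # Environmental noise, modelled here as tiny additive perturbations.
    x_a = np.clip(tent(x_a) + rng.normal(0, 1e-6), 0.0, 1.0)
    x_b = np.clip(tent(x_b) + rng.normal(0, 1e-6), 0.0, 1.0)
    bits.append(1 if x_a > x_b else 0)  # sense-amplifier style comparison

print(f"ones fraction: {sum(bits) / len(bits):.3f}")
```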
ISBN: (Digital) 9781728143903
ISBN: (Print) 9781728143910
Injecting faults into the system architecture layer and studying the fault tolerance of the neural network above it is difficult and time-consuming. This paper proposes an automatic method covering time and space, which can inject faults into the processor on the Simics simulation platform, simulating soft errors, and then collect the time sequence data of the system architecture layer and the observed node data of the visual convolutional neural network program layer. At the same time, combined with the relevant standards, a GAN classifier is used to calibrate the different fault models after converting the time sequence data into time sequence images. Finally, a Bayesian network is used to form the path of fault propagation from the architecture layer to the program layer and the result layer. After intensive fault injection into critical registers, the probability of neural network failure caused by soft errors is effectively estimated.
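For readers unfamiliar with register-level fault injection, the minimal sketch below shows the basic operation of flipping one randomly chosen bit of a register value; it is a standalone illustration, not the Simics-based injector described above, and the register width and value are made up.

```python
# Minimal sketch of single-bit fault injection into a register value (illustrative only).
import random

def inject_bit_flip(reg_value: int, width: int = 32) -> tuple[int, int]:
    """Flip one uniformly chosen bit of a `width`-bit register value."""
    bit = random.randrange(width)
    return reg_value ^ (1 << bit), bit

random.seed(1)
value = 0x1234ABCD
faulty, bit = inject_bit_flip(value)
print(f"original=0x{value:08X} faulty=0x{faulty:08X} flipped bit {bit}")
```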
ISBN: (Print) 9781509066261; 9781509066254
Resistive random access memory (RRAM) is a promising non-volatile memory owing to its low operation voltage, small area, and good scalability. However, the thermal crosstalk issue will significantly impact the retention properties of RRAM under high density integration and frequent writing operation, which commonly causes a series of error flips in some serious cases. In this paper, we propose an efficient parity rearrangement coding scheme, named PRCoder, to alleviate the influence of thermal crosstalk in RRAM. We implement the PRCoder at the algorithm level and simulate it at the hardware circuit level for different RRAM configurations. Results show that PRCoder can reduce the error flips caused by thermal crosstalk by about 32.7% on average with only one extra bit of redundancy per storage unit. Moreover, PRCoder incurs a negligible performance overhead of 0.3% with less than 0.008% additional RRAM area.
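The abstract does not spell out the rearrangement itself, so the sketch below only illustrates the general idea of spending one redundant bit per storage unit on parity so that a single error flip becomes detectable; it is not the PRCoder algorithm, and the 8-bit unit width is an assumption.

```python
# Generic one-parity-bit-per-unit illustration (not the actual PRCoder rearrangement).
def add_parity(word: int, width: int = 8) -> int:
    parity = bin(word & ((1 << width) - 1)).count("1") & 1
    return (word << 1) | parity                      # append parity as the lowest bit

def check_parity(coded: int, width: int = 8) -> bool:
    word, parity = coded >> 1, coded & 1
    return (bin(word & ((1 << width) - 1)).count("1") & 1) == parity

coded = add_parity(0b1011_0010)
print(check_parity(coded), check_parity(coded ^ 0b10))  # True, then False after one flip
```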
A combination of fifteen top quark mass measurements performed by the ATLAS and CMS experiments at the LHC is presented. The datasets used correspond to an integrated luminosity of up to 5 and 20 fb⁻¹ of proton-proton collisions at center-of-mass energies of 7 and 8 TeV, respectively. The combination includes measurements in top quark pair events that exploit both the semileptonic and hadronic decays of the top quark, and a measurement using events enriched in single top quark production via the electroweak t channel. The combination accounts for the correlations between measurements and achieves an improvement in the total uncertainty of 31% relative to the most precise input measurement. The result is m_t = 172.52 ± 0.14 (stat) ± 0.30 (syst) GeV, with a total uncertainty of 0.33 GeV.
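As a quick consistency check, assuming the quoted statistical and systematic components are combined in quadrature:

```latex
\sigma_{\text{tot}} = \sqrt{0.14^2 + 0.30^2}\ \text{GeV} = \sqrt{0.1096}\ \text{GeV} \approx 0.33\ \text{GeV}
```

which matches the quoted total uncertainty of 0.33 GeV.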
When the operating frequency is over several gigahertz, the effect of inductance plays an important role and should be included for accurate and fast crosstalk noise analysis. Furthermore, for new-generation IC (integrated circuit) design tools, crosstalk noise analysis should consider the influence of process variations. In this paper, we propose a coupled RLC crosstalk noise distributed-parameter model with capacitive load termination, and we develop a framework for applying the crosstalk noise model under process variations. Our results show that, compared with HSPICE, the critical data errors of the proposed model are within 1%, and the relative errors between values calculated under process variations and HSPICE Monte Carlo simulation values are less than 5%. The key features of the new model include: (1) the impact of inductance on crosstalk noise is considered; (2) the model can be used for process variation analysis; (3) the model reflects the effects of load capacitance directly; (4) numerical inversion of the Laplace transform is introduced to improve speed. Therefore, the proposed model meets the needs of future IC design in both speed and accuracy.
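The sketch below is not the paper's distributed RLC model; it only shows how a first-order, charge-sharing estimate of the victim's peak noise can be swept with Monte Carlo samples of the coupling and load capacitances to mimic process variation. All capacitance values and spreads are hypothetical.

```python
# First-order capacitive-divider crosstalk estimate under Monte Carlo process variation.
import numpy as np

def peak_noise(c_couple, c_victim, vdd=1.0):
    """Simple charge-sharing estimate of the victim's peak noise voltage."""
    return vdd * c_couple / (c_couple + c_victim)

rng = np.random.default_rng(42)
c_couple = rng.normal(10e-15, 1e-15, 10_000)   # 10 fF +/- 10% (1 sigma), hypothetical
c_victim = rng.normal(50e-15, 5e-15, 10_000)
noise = peak_noise(c_couple, c_victim)
print(f"mean = {noise.mean():.3f} V, 99.87th percentile = {np.percentile(noise, 99.87):.3f} V")
```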
It is an important task to improve performance for sparse matrix vector multiplication (SpMV), and it is a difficult task because of its irregular memory access. General purpose GPU (GPGPU) provides high computing ability and substantial bandwidth that cannot be fully exploited by SpMV due to its irregularity. In this paper, we propose two novel methods to optimize the memory bandwidth for SpMV on GPGPU. First, a new storage format is proposed to exploit the memory bandwidth of the GPU architecture more efficiently. The new storage format can ensure that there are as many non-zeros as possible in the format, which is suitable to exploit the memory bandwidth of the GPU. Second, we propose a cache blocking method to improve the performance of SpMV on the GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time to access global memory for vector x is reduced heavily. Experiments are carried out on three GPU platforms: GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods can efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU.
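The cache-blocking idea can be sketched on the CPU with SciPy as a stand-in for the GPU kernel: the matrix is split into column blocks so each block only touches a small, reusable slice of x. The block size and test matrix below are arbitrary, and the paper's custom storage format is not reproduced.

```python
# Column-blocked SpMV sketch (NumPy/SciPy stand-in for the GPU cache-blocking kernel).
import numpy as np
import scipy.sparse as sp

def blocked_spmv(A_csr, x, block_cols=256):
    y = np.zeros(A_csr.shape[0])
    for start in range(0, A_csr.shape[1], block_cols):
        end = min(start + block_cols, A_csr.shape[1])
        # Each column block only reads x[start:end], which can stay resident in cache.
        y += A_csr[:, start:end] @ x[start:end]
    return y

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
assert np.allclose(blocked_spmv(A, x), A @ x)
```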