Fault Tolerance in GPGPU


[paper report 5] studies three software approaches to GPGPU reliability, all based on redundant execution. The first approach executes the kernel twice, so its performance overhead is roughly 100 percent. The other two interleave redundant threads with the main kernel. The paper also examines the usefulness of ECC/parity bits in the memories, weighing the overhead they introduce.

The first approach, called R-Native, executes the kernel twice. One drawback is that a permanent hardware defect affects both executions in the same way and therefore cannot be detected. This can be avoided by reorganizing the input data for the redundant execution, or by providing a software interface that assigns blocks to an arbitrary SM; in addition, an offset can be applied when accessing memory. The execution time of this approach is 199% of the native execution. With ECC, it drops to 192% and 194% for Brook and CUDA, respectively.

The second approach, called R-Scatter, tries to exploit the underutilized ILP in the GPU cores. The redundant instructions are inserted into the kernel with their appropriate indexes. One problem is that some variables and computations (e.g., the loop counter) are shared between the main and redundant threads, so an error in such a shared variable corrupts both in the same way. Two alternatives can be used: reorganizing the input data (chosen for the Brook implementation) or modifying the redundant-thread indexes (the CUDA implementation). The execution time of this approach is 193% in Brook, and employing the hardware correction codes improves the overhead by 4.6%. The overhead of the CUDA implementation is worse than that of the previous approach...

[... middle of paper ...]

...entioned schemes. P-RISE takes advantage of branch divergence: when a warp diverges, a portion of it becomes inactive, so some of the SPs are idle during its execution. The idea of P-RISE is to find a warp with the same PC as the diverged warp, take a number of its threads, and execute them redundantly alongside the diverged warp; whenever that warp executes, its results are compared with the redundant execution. Experimental results show that in 48% of the cases a warp with a matching PC is found. The scheme also needs some hardware support and incurs some hardware overhead. GPGPU-Sim is used to simulate and analyze the presented scheme, and AVF is used to evaluate the efficacy of RISE; simulation results show a 43% AVF improvement. The effect of some branch-divergence techniques (e.g., PDOM and DWF) on RISE is also examined and found to be insignificant.
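To make the redundant-execution idea concrete, below is a minimal CUDA sketch of the R-Native approach under stated assumptions: the kernel (scale_kernel) and the buffer names are illustrative placeholders, not the paper's benchmarks, and the refinements described above (reorganized input, block-to-SM assignment, memory-access offsets) are omitted. The same kernel is simply launched twice into separate output buffers, and the host compares the two copies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative stand-in for an application kernel: scales each element by 2.
__global__ void scale_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}

int main()
{
    const int n = 1 << 20;
    float *in, *out1, *out2;
    cudaMallocManaged(&in,   n * sizeof(float));
    cudaMallocManaged(&out1, n * sizeof(float));
    cudaMallocManaged(&out2, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    dim3 block(256), grid((n + block.x - 1) / block.x);

    // R-Native: run the same kernel twice and compare the two result buffers.
    scale_kernel<<<grid, block>>>(in, out1, n);
    scale_kernel<<<grid, block>>>(in, out2, n);
    cudaDeviceSynchronize();

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (out1[i] != out2[i]) ++mismatches;
    printf("mismatching elements: %d\n", mismatches);  // nonzero => a fault was detected

    cudaFree(in); cudaFree(out1); cudaFree(out2);
    return 0;
}
```

Because both launches see identical inputs, addresses, and scheduling, a permanent defect in one SM can corrupt both copies identically; that is exactly the weakness the paper counters by reorganizing the input or offsetting memory accesses for the redundant run.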

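R-Scatter can be sketched in the same hedged way; the kernel below is an illustrative reconstruction, not the paper's code. The redundant computation is folded into the same kernel in separate registers, so the scheduler can interleave it with the main computation and soak up idle ILP, and a device-side flag records any mismatch. The paper's protections for shared state (reorganized input in Brook, shifted redundant indexes in CUDA) are left out for brevity, so a fault in a shared value such as the index i would still escape detection here.

```cuda
#include <cuda_runtime.h>

// Hypothetical R-Scatter-style kernel: the main and redundant computations are
// interleaved within one thread, kept in separate registers, and compared at the end.
__global__ void scale_kernel_rscatter(const float *in, float *out,
                                      int *error_flag, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a  = in[i];            // main load
        float b  = in[i];            // redundant load (a real implementation would read a
                                     // reorganized copy so the compiler cannot merge the loads)
        float r1 = 2.0f * a + 1.0f;  // main computation
        float r2 = 2.0f * b + 1.0f;  // redundant computation
        out[i] = r1;
        if (r1 != r2)                // disagreement => a fault hit one of the copies
            atomicExch(error_flag, 1);
    }
}
```

The kernel can be launched from the same kind of harness as the R-Native sketch, with error_flag allocated and zeroed on the device and read back after the launch. Only one kernel invocation is needed, and the hope is that otherwise idle instruction-level parallelism absorbs part of the redundant work, which is where R-Scatter's modest overhead advantage over R-Native comes from.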