The de facto standard seems to be CUDA GPU Memtest. As @c2h5oh mentioned, it appears to be based on the memtest86 test patterns, so I'm confident it does a good job. It runs relatively quickly on the high-end GPUs I tested (30 minutes on a Quadro 6000 and 20 minutes on a Tesla C2075). It runs inside the operating system (unlike memtest), so monitoring it is a little different. You will probably want to redirect stdout and stderr to files, so that if you lose your terminal output you can still look up what the tests reported:
cuda_memtest 2>cuda_memtest.stderr 1>cuda_memtest.stdout &
tail -f cuda_memtest.stdout &
tail -f cuda_memtest.stderr &
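One caveat with the commands above: depending on the shell, a job started with a plain `&` may still receive a hangup signal when the terminal session goes away. A minimal sketch of a wrapper that uses `nohup` to guard against that; `run_logged` and the `.pid` file are names made up for this illustration and are not part of cuda_memtest:

```shell
# Hypothetical helper: run a long command detached from the terminal,
# with stdout and stderr captured in separate files.
# Usage: run_logged <log-prefix> <command> [args...]
run_logged() {
    prefix=$1
    shift
    # nohup makes the command ignore SIGHUP; stdin is taken from /dev/null
    # so nothing in the job depends on the terminal any more.
    nohup "$@" >"$prefix.stdout" 2>"$prefix.stderr" </dev/null &
    # Remember the PID so the run can be checked or stopped later.
    echo "$!" >"$prefix.pid"
}
```

With this, `run_logged cuda_memtest cuda_memtest` starts the test, and `tail -f cuda_memtest.stdout cuda_memtest.stderr` follows both files from any terminal.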
You should also make sure that nobody else is using the system and/or the cards. You can put the GPUs into exclusive mode with:
nvidia-smi --compute-mode=EXCLUSIVE_PROCESS
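Exclusive mode stays in effect after the test finishes, so it is easy to forget to switch it back. A rough sketch of a wrapper that restores the mode afterwards; `with_exclusive_gpu` is a hypothetical name, the sketch assumes the cards were in `DEFAULT` mode beforehand, and changing the compute mode typically needs root privileges:

```shell
# Hypothetical helper: run a command with the GPUs in exclusive mode,
# then switch them back.
# Assumption: the compute mode was DEFAULT before the test started.
with_exclusive_gpu() {
    # Give up early if the mode cannot be changed (e.g. missing privileges).
    nvidia-smi --compute-mode=EXCLUSIVE_PROCESS || return 1
    "$@"
    status=$?
    # Restore shared access regardless of how the command ended.
    nvidia-smi --compute-mode=DEFAULT
    return "$status"
}
```

For example, `with_exclusive_gpu cuda_memtest` runs the test in the foreground and passes its exit status through.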
Here is some of the output from trial runs on the Quadro and the Tesla, in case you are interested in what test information is reported:
[09/07/2012 11:56:22][hydro][0]:Running cuda memtest, version 1.2.2
[09/07/2012 11:56:23][hydro][0]:Warning: Getting serial number failed
[09/07/2012 11:56:23][hydro][0]:NVRM version: NVIDIA UNIX x86_64 Kernel Module 295.41 Fri Apr 6 23:18:58 PDT 2012
[09/07/2012 11:56:23][hydro][0]:num_gpus=1
[09/07/2012 11:56:23][hydro][0]:Device name=Quadro 6000, global memory size=6441992192
[09/07/2012 11:56:23][hydro][0]:major=2, minor=0
[09/07/2012 11:56:24][hydro][0]:Attached to device 0 successfully.
[09/07/2012 11:56:24][hydro][0]:Allocated 6040 MB
[09/07/2012 11:56:24][hydro][0]:Test0 [Walking 1 bit]
[09/07/2012 11:56:30][hydro][0]:Test0 finished in 5.7 seconds
[09/07/2012 11:56:30][hydro][0]:Test1 [Own address test]
[09/07/2012 11:56:33][hydro][0]:Test1 finished in 3.5 seconds
[09/07/2012 11:56:33][hydro][0]:Test2 [Moving inversions, ones&zeros]
[09/07/2012 11:57:05][hydro][0]:Test2 finished in 32.3 seconds
[09/07/2012 11:57:05][hydro][0]:Test3 [Moving inversions, 8 bit pat]
[09/07/2012 11:57:37][hydro][0]:Test3 finished in 31.9 seconds
[09/07/2012 11:57:37][hydro][0]:Test4 [Moving inversions, random pattern]
[09/07/2012 11:57:53][hydro][0]:Test4 finished in 15.9 seconds
[09/07/2012 11:57:53][hydro][0]:Test5 [Block move, 64 moves]
[09/07/2012 11:57:59][hydro][0]:Test5 finished in 6.3 seconds
[09/07/2012 11:57:59][hydro][0]:Test6 [Moving inversions, 32 bit pat]
[09/07/2012 12:18:46][hydro][0]:Test6 finished in 1246.6 seconds
[09/07/2012 12:18:46][hydro][0]:Test7 [Random number sequence]
[09/07/2012 12:19:06][hydro][0]:Test7 finished in 19.8 seconds
[09/07/2012 12:19:06][hydro][0]:Test8 [Modulo 20, random pattern]
[09/07/2012 12:19:06][hydro][0]:test8[mod test]: p1=0x13472f5f, p2=0xecb8d0a0
[09/07/2012 12:20:34][hydro][0]:Test8 finished in 88.0 seconds
[09/07/2012 12:20:34][hydro][0]:Test10 [Memory stress test]
[09/07/2012 12:20:34][hydro][0]:Test10 with pattern=0x55f6c69858704128
[09/07/2012 12:21:11][hydro][0]:Test10 finished in 36.8 seconds
[09/07/2012 12:21:11][hydro][0]:Test0 [Walking 1 bit]
[09/07/2012 12:21:16][hydro][0]:Test0 finished in 5.8 seconds
[09/06/2012 18:49:07][hydro][0]:Running cuda memtest, version 1.2.2
[09/06/2012 18:49:10][hydro][0]:Warning: Getting serial number failed
[09/06/2012 18:49:10][hydro][0]:NVRM version: NVIDIA UNIX x86_64 Kernel Module 295.41 Fri Apr 6 23:18:58 PDT 2012
[09/06/2012 18:49:10][hydro][0]:num_gpus=1
[09/06/2012 18:49:10][hydro][0]:Device name=Tesla C2075, global memory size=5636292608
[09/06/2012 18:49:10][hydro][0]:major=2, minor=0
[09/06/2012 18:49:11][hydro][0]:Attached to device 0 successfully.
[09/06/2012 18:49:11][hydro][0]:Allocated 5273 MB
[09/06/2012 18:49:11][hydro][0]:Test0 [Walking 1 bit]
[09/06/2012 18:49:22][hydro][0]:Test0 finished in 11.1 seconds
[09/06/2012 18:49:22][hydro][0]:Test1 [Own address test]
[09/06/2012 18:49:25][hydro][0]:Test1 finished in 3.1 seconds
[09/06/2012 18:49:25][hydro][0]:Test2 [Moving inversions, ones&zeros]
[09/06/2012 18:49:52][hydro][0]:Test2 finished in 27.4 seconds
[09/06/2012 18:49:52][hydro][0]:Test3 [Moving inversions, 8 bit pat]
[09/06/2012 18:50:20][hydro][0]:Test3 finished in 27.9 seconds
[09/06/2012 18:50:20][hydro][0]:Test4 [Moving inversions, random pattern]
[09/06/2012 18:50:34][hydro][0]:Test4 finished in 13.7 seconds
[09/06/2012 18:50:34][hydro][0]:Test5 [Block move, 64 moves]
[09/06/2012 18:50:39][hydro][0]:Test5 finished in 5.5 seconds
[09/06/2012 18:50:39][hydro][0]:Test6 [Moving inversions, 32 bit pat]
[09/06/2012 19:08:34][hydro][0]:Test6 finished in 1074.9 seconds
[09/06/2012 19:08:34][hydro][0]:Test7 [Random number sequence]
[09/06/2012 19:08:51][hydro][0]:Test7 finished in 17.1 seconds
[09/06/2012 19:08:51][hydro][0]:Test8 [Modulo 20, random pattern]
[09/06/2012 19:08:51][hydro][0]:test8[mod test]: p1=0x63136646, p2=0x9cec99b9
[09/06/2012 19:10:10][hydro][0]:Test8 finished in 78.4 seconds
[09/06/2012 19:10:10][hydro][0]:Test10 [Memory stress test]
[09/06/2012 19:10:10][hydro][0]:Test10 with pattern=0x26341d134a89ac2b
[09/06/2012 19:10:39][hydro][0]:Test10 finished in 29.0 seconds
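To see where the time goes in logs like these, the "finished in ... seconds" lines can be tallied with awk. This is only a sketch that assumes the line layout shown above (timestamp, host, device index, then the message); `summarize_memtest_log` is a made-up name:

```shell
# Hypothetical helper: list the duration of each finished test and the total.
# Reads the log files given as arguments, or stdin if none are given.
summarize_memtest_log() {
    awk '$3 == "finished" && $4 == "in" {
        name = $2
        sub(/.*:/, "", name)      # strip "[time][host][gpu]:" and keep "TestN"
        total += $5
        printf "%-7s %8.1f s\n", name, $5
    }
    END { printf "%-7s %8.1f s (%.1f min)\n", "total", total, total / 60 }' "$@"
}
```

Running `summarize_memtest_log cuda_memtest.stdout` on either log above makes it obvious that Test6 (moving inversions, 32 bit pattern) accounts for most of the runtime.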