diff --git a/README.md b/README.md index e366f3b..429f43a 100644 --- a/README.md +++ b/README.md @@ -145,18 +145,17 @@ mp: 0:2610744987 p: 1050 m: 0:966616035460 ms: 67696 dur: 182 ## GPU 20250116: GPU code is CPU bound for thread processing - at 100% cpu (other process) GPU slows by 4-10 times. I need to increase the threads sent to the GPU past 64k - -### Multi Threaded : 40 bit run -#### 128 bit native -##### CUDA 12.6: CPP +### 128 bit native +#### CUDA 12.6: CPP +##### Multi Threaded : 42 bit run +- sec 13900K b 32 core RTX-4090 Ada gpu 0 single 16384 cores 50% GPU 24% TDP 35840 threads 256 threads/block no av - batch 20 +##### Multi Threaded : 40 bit run - 4232 sec 13900K b 32 core RTX-4090 Ada gpu 0 single 16384 cores 50% GPU 24% TDP 35840 threads 256 threads/block no av - batch 20 - 5347 sec 14900K c RTX-A6000 single 60% GPU 54% TDP .5g/48g - 35840k threads / 256 threads/block batch 20 no av - 5508 sec 13900KS d RTX-A4500 gpu 1 single 3.2/5.9 GHz - 20 batch - no av -- sec P1Gen6 13800H RTX-3500 Ada mobile 5120 cores 60% GPU - 20 batch, 256 7168*5 threads - with path - no av +- sec P1Gen6 13800H RTX-3500 Ada mobile 5120 cores 60% GPU - 20 batch, 256 7168*5 threads - with path - no av ### Multi Threaded : 37 bit run -#### 128 bit native -##### CUDA 12.6: CPP - 528 sec 13900K b 32 core RTX-4090 Ada single 16384 cores 50% GPU 24% TDP 35840 threads 256 threads/block no av - batch 20 - sec 13900KS d RTX-A4500 gpu 0 single 3.2/5.9 GHz - 20 batch - no av - 641 sec 13900KS d RTX-A4500 gpu 1 single 3.2/5.9 GHz - 20 batch - no av @@ -164,8 +163,6 @@ mp: 0:2610744987 p: 1050 m: 0:966616035460 ms: 67696 dur: 182 - 799 sec P1Gen6 13800H RTX-3500 Ada mobile 5120 cores 60% GPU - 20 batch, 256 7168*5 threads - with path - no av - ### Multi Threaded : 32 bit run (search 0-(2^32-1) odd integer space) -#### 128 bit native -##### CUDA 12.6: CPP - 14 sec 14900K c RTX-A6000 single 55% GPU 45% TDP .5g/48g - 32k threads / 512 threads/block - 14 sec RTX-4090 Ada single 16384 cores 48% GPU 24% TDP 40960 threads 512 threads/block 80 blocks - batch 20 - 17 sec RTX-4090 Ada single 16384 cores 48% GPU 24% TDP 35840 threads 256/512 threads/block 160 blocks - batch 20 @@ -175,7 +172,7 @@ mp: 0:2610744987 p: 1050 m: 0:966616035460 ms: 67696 dur: 182 - 20 sec RTX-5000 TU104 16g mobile P17gen1 - 24 sec P1Gen6 13800H RTX-3500 Ada mobile 5120 cores 60% GPU - 20 batch, 256 7168*5 threads - with path - no av - cold start no thermal throttling - -#### 64 bit native +### 64 bit native Sec: 4 GlobalMax: 319804831 : 1414236446719942480 last search : 1073741825 - 9 sec 14900K c RTX-A6000 8/32c single 45% GPU 24% TDP .9g/48g - 32k threads / 256 threads/block - 10 sec 13900K b RTX-4090 Ada single 45% GPU 22% TDP .9g/24g 32k threads / 256 threads/block