#1 - Update README.md
obriensystems authored Jan 17, 2025
1 parent 481d433 commit a34fed6
Showing 1 changed file (README.md) with 10 additions and 5 deletions.

## GPU
20250116: The GPU code is CPU-bound for thread processing: with another process holding the CPU at 100%, the GPU run slows down by 4-10 times. I need to increase the number of threads sent to the GPU past 64k.

### Multi Threaded : 40 bit run
#### 128 bit native
##### CUDA 12.6: CPP
- sec 13900K b 32 core RTX-4090 Ada single 16384 cores 50% GPU 24% TDP 35840 threads 256 threads/block no av - batch 20
- 5347 sec 14900K d RTX-A6000 single 60% GPU 54% TDP .5g/48g - 35840k threads / 256 threads/block batch 20 no av

### Multi Threaded : 37 bit run
#### 128 bit native
##### CUDA 12.6: CPP
- 528 sec 13900K b 32 core RTX-4090 Ada single 16384 cores 50% GPU 24% TDP 35840 threads 256 threads/block no av - batch 20
- 658 sec 14900K d RTX-A6000 single 55% GPU 45% TDP .5g/48g - 35840k threads / 256 threads/block no av - batch 20?
### Multi Threaded : 32 bit run (search 0-(2^32-1) odd integer space)
#### 128 bit native
##### CUDA 12.6: CPP
- 14 sec 14900K d RTX-A6000 single 55% GPU 45% TDP .5g/48g - 32k threads / 512 threads/block
- 14 sec RTX-4090 Ada single 16384 cores 48% GPU 24% TDP 40960 threads 512 threads/block 80 blocks - batch 20
- 17 sec RTX-4090 Ada single 16384 cores 48% GPU 24% TDP 35840 threads 256/512 threads/block 160 blocks - batch 20
- 17 sec P1Gen6 13800H RTX-3500 Ada mobile 5120 cores 60% GPU - 20 batch, 40960 threads
- 18 sec RTX-A4500 single
- 18 sec RTX-A4000 single
- 20 sec RTX-5000 TU104 16g mobile P17gen1
#### 64 bit native
`Sec: 4 GlobalMax: 319804831 : 1414236446719942480 last search : 1073741825`
- 9 sec 14900K d RTX-A6000 8/32c single 45% GPU 24% TDP .9g/48g - 32k threads / 256 threads/block
- 10 sec 13900K b RTX-4090 Ada single 45% GPU 22% TDP .9g/24g 32k threads / 256 threads/block
- 12 sec 13900K a RTX-A4000 single 45% GPU 58% TDP .9g/16g

## CPU
- 689 sec 13900K a 3.0/5.7 GHz

# Records stats
## 128 bit CUDA (5120 to 32768 cores) - RTX-A6000 or dual RTX-4090 Ada
55% GPU at 24% TDP

```
