Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ci(buildkite): debugging CUDA segfaults on CI (#937)
* ci(buildkite): add coreupload plugin
* ci(buildkite): try using the latest cuda_driver_jll
* chore: try running tests with compat=false
* chore: cleanup the PR
Showing 13 changed files with 83 additions and 10 deletions.
1b7c9a9
Lux Benchmarks
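The Ratio column can be re-derived from the two timing columns. In the results that follow, the ratios are consistent with the current time divided by the previous time. For example, 2334 / 4583 ≈ 0.51 for `Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)`. Because lower nanoseconds are better, a ratio above 1 means the current commit ran slower. The Python sketch below recomputes the ratio and flags large slowdowns. The pipe-delimited row shape, the `parse_row` and `flag_slowdowns` helpers, and the 1.5 threshold are illustrative assumptions, not part of the Lux benchmark tooling.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    name: str
    current_ns: float
    previous_ns: float

    @property
    def ratio(self) -> float:
        # Current time divided by previous time. Lower ns is better,
        # so a ratio above 1.0 means the current commit ran slower.
        return self.current_ns / self.previous_ns


def parse_row(line: str) -> BenchmarkRow:
    """Parse a (hypothetical) row shaped like '| name | 414167 ns | 414333 ns | 1.00 |'."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, current, previous = cells[0], cells[1], cells[2]
    # Drop the trailing ' ns' unit and keep only the number.
    return BenchmarkRow(name, float(current.split()[0]), float(previous.split()[0]))


def flag_slowdowns(rows: list[BenchmarkRow], threshold: float = 1.5) -> list[str]:
    """Names whose ratio exceeds a hypothetical alert threshold."""
    return [r.name for r in rows if r.ratio > threshold]


# Three results taken from the benchmark list, written as pipe-delimited rows.
TABLE = """\
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414167 ns | 414333 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 740771 ns | 447104.5 ns | 1.66 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2334 ns | 4583 ns | 0.51 |
"""

rows = [parse_row(line) for line in TABLE.splitlines() if line.strip()]
for row in rows:
    print(f"{row.ratio:.2f}  {row.name}")
print(flag_slowdowns(rows))
```

With these three rows, the recomputed ratios round to 1.00, 1.66 and 0.51, matching the listed values. Only the Enzyme `Dense(128 => 128, relu)` row on 8 threads crosses the assumed 1.5 threshold.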
| Benchmark suite | Current: 1b7c9a9 | Previous | Ratio |
| --- | --- | --- | --- |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414167 ns | 414333 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 243812.5 ns | 243166 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 243375 ns | 244458 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739750 ns | 740167 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43608.5 ns | 43790 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1274750 ns | 1313500 ns | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 1257604 ns | 1240208 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16232709 ns | 16477375 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2193229 ns | 2255000 ns | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 205508.5 ns | 208556.5 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1311791 ns | 1362770.5 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 1296000 ns | 1287854.5 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 16564750 ns | 16632958 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2236917 ns | 2226042 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1656771 ns | 1717603.5 ns | 0.96 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1101167 ns | 1031500 ns | 1.07 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1519083 ns | 1531250 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2996500 ns | 3017292 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 206771 ns | 208614 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12074917 ns | 12148333 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8846125 ns | 8837084 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9185812.5 ns | 9175417 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18620646 ns | 18614083.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1506641 ns | 1495290 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17279459 ns | 17311000 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14009229.5 ns | 13985416 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14468291.5 ns | 14512667 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21873146 ns | 21852792 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 252162083.5 ns | 252076020.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148884583 ns | 148360833 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116232875 ns | 116492833.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447534666 ns | 447245667 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5465296 ns | 5467078.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1230946875 ns | 1233021917 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 931953750 ns | 930623166 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 826867750.5 ns | 833831875 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1631748667 ns | 1634325958 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31362804 ns | 31610363 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1146184875 ns | 1149699000 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 997853916.5 ns | 996946771 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1329065916.5 ns | 1315883646 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1736617187.5 ns | 1740085708 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1111541.5 ns | 1118375 ns | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1663917 ns | 1638125 ns | 1.02 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3634917 ns | 3616250 ns | 1.01 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 788500 ns | 782250 ns | 1.01 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 262430.5 ns | 270251 ns | 0.97 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2981646 ns | 2996292 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4151854.5 ns | 4132667 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 10487312.5 ns | 10221167 ns | 1.03 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3265083 ns | 3156062 ns | 1.03 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1131749 ns | 1122893 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2342791 ns | 2273271 ns | 1.03 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1260000 ns | 1316749.5 ns | 0.96 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1539542 ns | 1552667 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4176916 ns | 4209687.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208157.5 ns | 209258 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19392625 ns | 19422291.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16105895.5 ns | 16107334 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17329250 ns | 17351937.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25905125 ns | 25921166 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1607984 ns | 1598707 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34168604 ns | 34193958 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30734292 ns | 30938000 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 30891041.5 ns | 31197500 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36714750 ns | 36682042 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4532000 ns | 4538583 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2546584 ns | 2543291 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2675583.5 ns | 2674083 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8386333 ns | 8379958 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 419971 ns | 424210 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38621250 ns | 38932625 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32144146 ns | 32157666 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32234313 ns | 32273542 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51925709 ns | 51985167 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2628667 ns | 2626449 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89245375 ns | 89065979.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 115663979 ns | 115218833 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 223717000 ns | 226131166 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74519062.5 ns | 74254146 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 270237667 ns | 269870292 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 156197542 ns | 156750750 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123423271 ns | 123574479.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485408250 ns | 485653292 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7027939 ns | 6941099.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1473080062.5 ns | 1470327334 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1168760792 ns | 1179882625 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1063953145.5 ns | 1073715271 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2006090104 ns | 2001077396 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34772934.5 ns | 34677615.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1719270959 ns | 1720676625 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1530344979 ns | 1537383438 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1879104875 ns | 1839399083 ns | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2217620458 ns | 2211950584 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2066124.5 ns | 2097166 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3080917 ns | 3024209 ns | 1.02 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 7964834 ns | 7203500 ns | 1.11 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2511771 ns | 2463583 ns | 1.02 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272286 ns | 264273 ns | 1.03 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9629792 ns | 9398666 ns | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12051208 ns | 11990396 ns | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 23782666.5 ns | 25173666.5 ns | 0.94 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11321791 ns | 11771708 ns | 0.96 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1192316.5 ns | 1166267.5 ns | 1.02 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 379182875 ns | 380060375 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 311332270.5 ns | 311100041.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 260260313 ns | 267361708.5 ns | 0.97 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 450681833 ns | 451932312.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4857816 ns | 4972294 ns | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1151703750 ns | 1154776917 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 938427709 ns | 938936416 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 943142791 ns | 971050584 ns | 0.97 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1396853084 ns | 1397053958 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 17794579 ns | 20192694 ns | 0.88 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1048833 ns | 1061792 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1655208.5 ns | 1666562 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 4851812 ns | 4995250 ns | 0.97 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1291167 ns | 1386958.5 ns | 0.93 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 278270.5 ns | 265363 ns | 1.05 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6497104 ns | 6518459 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13086396 ns | 13167979.5 ns | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 18753875 ns | 20031250 ns | 0.94 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5891208.5 ns | 6075042 ns | 0.97 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1253158.5 ns | 1210268 ns | 1.04 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70556458 ns | 70474416.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 44452167 ns | 44309437.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39837500 ns | 39939667 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132581125 ns | 132597500 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1865473 ns | 1928662.5 ns | 0.97 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356767520.5 ns | 356622333.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 272336833 ns | 272976834 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 255661771 ns | 255218042 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534829208.5 ns | 534735459 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12304649 ns | 12363809 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 395040042 ns | 396348584 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 370401500 ns | 384172750 ns | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 693812291 ns | 721077333.5 ns | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 711246750 ns | 711103834 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1188023709 ns | 1188662708 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 835256562.5 ns | 832992083.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 638885750 ns | 642434458.5 ns | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1768729250 ns | 1776768042 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12316863.5 ns | 12309666 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3627838020.5 ns | 3641771083.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2824735750 ns | 2830188042 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2694929167 ns | 2706151041 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5002434750 ns | 5031071750 ns | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49730192 ns | 49668395 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3432375.5 ns | 3440708 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2078583 ns | 2055416 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2530500 ns | 2500125 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6020833 ns | 6019375 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 339043.5 ns | 315406 ns | 1.07 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25844354 ns | 25875291 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18918770.5 ns | 19146229.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19719959 ns | 19365708.5 ns | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39362209 ns | 38324833.5 ns | 1.03 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2460010 ns | 2475375 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54493625 ns | 54514062.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84184417 ns | 82633291.5 ns | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 173059688 ns | 174671625 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45573959 ns | 45302208 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1783437.5 ns | 1792062.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1098584 ns | 1099604.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1563624.5 ns | 1542042 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3028979 ns | 3033959 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212147.5 ns | 211799 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12574667 ns | 12562750 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9223854 ns | 9226791 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9681958 ns | 9602354 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18996416 ns | 18997104.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1525057 ns | 1540683 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17650833 ns | 17671792 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14332292 ns | 14309083 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14552750 ns | 14547458 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22194208 ns | 22161375 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70637271 ns | 70504375 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 44500249.5 ns | 44105145.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 40038333 ns | 39912979 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132595500 ns | 132559791.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1878861 ns | 1936174 ns | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 361106062 ns | 359409291 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 349644938 ns | 348618104 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 304116708.5 ns | 305716167 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 723634000 ns | 724420791 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13382866.5 ns | 13389186 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 419845083.5 ns | 420157625 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 427670459 ns | 429218959 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 765524104 ns | 700058604.5 ns | 1.09 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 715822875 ns | 716102291 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1591792 ns | 1595000 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1165292 ns | 1157583.5 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1150479.5 ns | 1159667 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2435375 ns | 2459125 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 580934.5 ns | 547379.5 ns | 1.06 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8855583 ns | 8848209 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 13566583 ns | 13600625 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 33371313 ns | 33819291.5 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9856250 ns | 9846625 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1447660.5 ns | 1473567 ns | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16614333.5 ns | 16653917 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 22957687.5 ns | 22770333.5 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 45530875 ns | 47753750 ns | 0.95 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13137979 ns | 13143791.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 830833 ns | 827917 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 515458 ns | 621542 ns | 0.83 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1061583 ns | 1073833 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 723895.5 ns | 725042 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48058.5 ns | 47938 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1549792 ns | 1553521 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1043458 ns | 1054000 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1717459 ns | 1432167 ns | 1.20 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2249729 ns | 2258729 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 235968.5 ns | 240587.5 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1556416 ns | 1558291.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1068292 ns | 1087375 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1707875 ns | 1840000 ns | 0.93 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2224354 ns | 2188584 ns | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3404875 ns | 3428687.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2061708 ns | 2064583 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2526583 ns | 2512875 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6005458 ns | 6002146 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284654 ns | 287607 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24057375 ns | 24070583 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17188917 ns | 17222146 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17108854 ns | 17095666.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37589750 ns | 37568312.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2418683.5 ns | 2419447 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52962291.5 ns | 52951458.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 85344416 ns | 82721062.5 ns | 1.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 171244354 ns | 171722291 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44652208.5 ns | 44599666.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 251293750 ns | 251795458 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148493709 ns | 148535375 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116314333.5 ns | 116156833 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447949229.5 ns | 447970041.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5446386 ns | 5450674 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1103974709 ns | 1105735084 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 855630395.5 ns | 859662041.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 831750854.5 ns | 829884208 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1754110584 ns | 1759233333 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28887646 ns | 29448635.5 ns | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1030795771 ns | 1031472604 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 973527459 ns | 979007542 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1276835833 ns | 1377035458 ns | 0.93 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1741435895.5 ns | 1724417500 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1102104.5 ns | 1210666.5 ns | 0.91 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 764333 ns | 663000 ns | 1.15 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 784979 ns | 688083 ns | 1.14 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1957854 ns | 2060083 ns | 0.95 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 563252 ns | 581074 ns | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5885125 ns | 5887979 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 9085895.5 ns | 8608041 ns | 1.06 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26897042 ns | 26857833 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7099083 ns | 7102729 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1415829 ns | 1395877.5 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9699771 ns | 9702166.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 15967729 ns | 16070292 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 32771687.5 ns | 34207145.5 ns | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7633666 ns | 7622708 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 514458 ns | 521416 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 384604.5 ns | 467583.5 ns | 0.82 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 3059459 ns | 2678000 ns | 1.14 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 87833 ns | 89458 ns | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28219 ns | 28156 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 381812.5 ns | 382000 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 447750 ns | 441542 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4678459 ns | 4583458 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258375 ns | 258542 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 228924.5 ns | 225153.5 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 410916.5 ns | 411500 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 479208 ns | 471916 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4649000 ns | 4557813 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 270833 ns | 271709 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 461250.5 ns | 464666 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 322625 ns | 415562.5 ns | 0.78 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 768834 ns | 787354 ns | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 52875 ns | 54458 ns | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28278 ns | 28105 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 342333 ns | 339125 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 347625 ns | 337209 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 396687 ns | 425750 ns | 0.93 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151250 ns | 151834 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 212495 ns | 209442 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 356000 ns | 354958 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 362937.5 ns | 351542 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 740771 ns | 447104.5 ns | 1.66 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 150875 ns | 151375 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 601061209 ns | 601684667 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 430671250 ns | 429562875 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 383040583 ns | 376810833 ns | 1.02 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 870727020.5 ns | 869971062 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7032100 ns | 7028132 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2000504228.5 ns | 1996714667 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1604685125 ns | 1615544895.5 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1652458646 ns | 1563076479 ns | 1.06 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2626165250 ns | 2624412875 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25934443 ns | 26150198.5 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 526333 ns | 520854 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 400458.5 ns | 393375 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 3022187.5 ns | 2582708 ns | 1.17 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 868667 ns | 866187.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47967.5 ns | 47544 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1757062.5 ns | 1879500 ns | 0.93 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1694333 ns | 1747271 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16312334 ns | 16566729 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2651375 ns | 2650937.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 257253 ns | 248835.5 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1894750.5 ns | 1949917 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 1834625 ns | 1830604.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16537333 ns | 16534688 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2736604.5 ns | 2714312.5 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1496021 ns | 1368166.5 ns | 1.09 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 931750 ns | 967041 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1059667 ns | 933875 ns | 1.13 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2319292 ns | 2334542 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 585808.5 ns | 587807 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5882458 ns | 5905375 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 8563167 ns | 8596208 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 26031937 ns | 25859917 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7331479 ns | 7262750 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1393892 ns | 1348515 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11701667 ns | 11679812.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 18292896 ns | 18127854 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 39864875 ns | 37908583 ns | 1.05 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9527500 ns | 9569791 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2750 ns | 2583 ns | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2334 ns | 4583 ns | 0.51 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3292 ns | 3459 ns | 0.95 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2583 ns | 2458.5 ns | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24864 ns | 24305 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7041 ns | 7000 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7166 ns | 6958 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7250 ns | 7250 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7083 ns | 6959 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 216254.5 ns | 209367.5 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8250 ns | 8250 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8459 ns | 8208 ns | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8542 ns | 8542 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 5834 ns | 6000 ns | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10479.5 ns | 10521 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13062.5 ns | 13833 ns | 0.94 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10500 ns | 10437.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7500 ns | 7520.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25125 ns | 24374 ns | 1.03 |
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s)
19916
ns20000
ns1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s)
19917
ns19542
ns1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s)
20270.5
ns20291
ns1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s)
20000
ns19750
ns1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA
238014.5
ns229284
ns1.04
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)
23541
ns23459
ns1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s)
23584
ns23417
ns1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s)
23917
ns23916
ns1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s)
21333
ns21292
ns1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s)
28687.5
ns28917
ns0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s)
28458
ns29000
ns0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s)
28750
ns28417
ns1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s)
46041
ns46479.5
ns0.99
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA
26166
ns25572
ns1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s)
224416
ns223416
ns1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s)
277458
ns272291
ns1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s)
3940416
ns4265917
ns0.92
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s)
145375
ns145583
ns1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA
215900.5
ns205830.5
ns1.05
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s)
241916.5
ns241125
ns1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s)
294834
ns290042
ns1.02
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)
4072750
ns4002209
ns1.02
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s)
145500
ns145667
ns1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)
1750
ns2000
ns0.88
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)
1709
ns1959
ns0.87
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)
2833
ns2417
ns1.17
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)
1792
ns2625
ns0.68
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA
23320
ns22856
ns1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)
5250
ns5209
ns1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)
5084
ns5000
ns1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)
5375
ns5292
ns1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)
5250
ns5000
ns1.05
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA
273997
ns247823
ns1.11
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)
7500
ns7416
ns1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)
7458
ns7458
ns1
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)
7625
ns7708
ns0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)
5125
ns5084
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
79922000
ns79982958
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
48869292
ns48448125
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
43653750
ns43546250
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
151454541
ns151447125
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
2718779
ns2714961
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
663985416
ns664508792
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
413249125
ns414562750
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
397260000
ns399573833
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
684524000
ns682810167
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
14579213
ns14687395.5
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
713434583.5
ns714844958
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
675522709
ns686047458
ns0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
997663125
ns991292583
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
999548041
ns999363417
ns1.00
This comment was automatically generated by workflow using github-action-benchmark.
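In the table above, the Ratio column matches the first timing divided by the second (for example, 2334 ns / 4583 ns ≈ 0.51 for the 4-thread `Dense(16 => 16, relu)` forward pass). This is consistent with github-action-benchmark reporting current/previous for a smaller-is-better metric, so ratios above 1 point to a slowdown. The Python sketch below recomputes that ratio from pipe-delimited rows and flags candidates for a closer look. The row format, the helper names `parse_row` and `flag_regressions`, and the 1.10 threshold are illustrative assumptions, not part of github-action-benchmark or of this repository.

```python
"""Recompute benchmark ratios and flag possible regressions (illustrative sketch)."""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BenchmarkRow:
    name: str
    current_ns: float   # assumed: first timing column (this commit)
    previous_ns: float  # assumed: second timing column (previous commit)

    @property
    def ratio(self) -> float:
        # Times are in ns (smaller is better), so a ratio above 1.0 means slower.
        return self.current_ns / self.previous_ns


def parse_row(line: str) -> BenchmarkRow:
    """Parse a row such as '| name | 2334 ns | 4583 ns | 0.51 |' (hypothetical format)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    name, current, previous = cells[0], cells[1], cells[2]
    # "2334 ns".split()[0] -> "2334"
    return BenchmarkRow(name, float(current.split()[0]), float(previous.split()[0]))


def flag_regressions(
    rows: List[BenchmarkRow], threshold: float = 1.10
) -> List[Tuple[str, float]]:
    """Return (name, ratio rounded to 2 places) for rows whose ratio exceeds `threshold`."""
    return [(row.name, round(row.ratio, 2)) for row in rows if row.ratio > threshold]


# A few rows copied from the table above.
SAMPLE_ROWS = [
    "| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 3022187.5 ns | 2582708 ns | 1.17 |",
    "| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1059667 ns | 933875 ns | 1.13 |",
    "| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1652458646 ns | 1563076479 ns | 1.06 |",
    "| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2334 ns | 4583 ns | 0.51 |",
]

if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE_ROWS]
    for name, ratio in flag_regressions(rows):
        print(f"{name}: {ratio}")
```

With the default threshold, only the two 8-thread forward passes (ratios 1.17 and 1.13) are flagged. Single-run ratios like these can be noisy for tiny layers (compare the 0.51 and 1.17 entries for the small `Dense` benchmarks), so a flagged row is a prompt to re-run rather than proof of a regression.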
1b7c9a9
@JuliaRegistrator register
1b7c9a9
Registration pull request created: JuliaRegistries/General/115463
Tip: Release Notes
Did you know you can add release notes too? Just add markdown-formatted text underneath the comment, after the text "Release notes:", and it will be added to the registry PR. If TagBot is installed, it will also be added to the release that TagBot creates.
To add them here, just re-invoke the command and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag be created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed; otherwise, it can be done manually through the GitHub interface or via: