When evaluating, the main bottleneck is box decoding. Since it runs faster on the CPU, decoding is done on the CPU. (In hindsight this is obvious: GPUs excel at parallel computation, and box decoding as written is not parallel.)
If you want to speed up evaluation, you have to parallelize the box-decoding code and then run it on the GPU.
That said, even though it is not parallel, box_decode still seems to take longer than it should.
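The per-box arithmetic in SSD-style decoding is elementwise, so it vectorizes naturally (which is also what would let it run well on a GPU). A minimal NumPy sketch, assuming the usual (cx, cy, w, h) anchor encoding with variances; the function name and signature here are illustrative, not torchcv's actual API:

```python
import numpy as np

def decode_boxes(loc, anchors, variances=(0.1, 0.2)):
    """Vectorized SSD-style box decoding (a sketch, not torchcv's exact code).

    loc:     (N, 4) predicted offsets (tx, ty, tw, th)
    anchors: (N, 4) anchor boxes as (cx, cy, w, h)
    returns: (N, 4) decoded boxes as (xmin, ymin, xmax, ymax)
    """
    # Shift anchor centers by the predicted offsets, scaled by anchor size.
    cxcy = loc[:, :2] * variances[0] * anchors[:, 2:] + anchors[:, :2]
    # Scale anchor width/height by the exponentiated size offsets.
    wh = np.exp(loc[:, 2:] * variances[1]) * anchors[:, 2:]
    # Convert center-size form to corner form.
    return np.concatenate([cxcy - wh / 2, cxcy + wh / 2], axis=1)

anchors = np.array([[0.5, 0.5, 0.2, 0.2]])
loc = np.zeros((1, 4))
print(decode_boxes(loc, anchors))  # zero offsets -> the anchor itself, as xyxy
```

Since every line operates on whole (N, 4) arrays, the same code ports to torch tensors on a GPU essentially unchanged.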
Hi @kuangliu,

In the evaluation code you are decoding boxes on the CPU: https://github.com/kuangliu/torchcv/blob/master/examples/fpnssd/eval.py#L57

My question is: why not use the GPU?
UPDATE 1: the measurement below is actually incorrect, because at the beginning of the evaluation most of the time is spent waiting for images to load.

I loaded the anchor boxes onto the GPU and did box decoding on the GPU with batch_size=1. Here is the result:

CPU: 17 sec
GPU: 7 sec

This is with a batch size of 1; I expect larger batch sizes to give an even bigger speedup on the GPU.
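One way to sanity-check the "takes longer than it should" suspicion, independent of the device, is to compare a per-box Python loop against a vectorized decode on the same data. This is a hypothetical benchmark, not the torchcv implementation; the variances and array sizes are made up:

```python
import time
import numpy as np

def decode_loop(loc, anchors, v=(0.1, 0.2)):
    # One box at a time -- the kind of serial work that stays slow on any device.
    out = np.empty_like(loc)
    for i in range(len(loc)):
        cx = loc[i, 0] * v[0] * anchors[i, 2] + anchors[i, 0]
        cy = loc[i, 1] * v[0] * anchors[i, 3] + anchors[i, 1]
        w = np.exp(loc[i, 2] * v[1]) * anchors[i, 2]
        h = np.exp(loc[i, 3] * v[1]) * anchors[i, 3]
        out[i] = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return out

def decode_vec(loc, anchors, v=(0.1, 0.2)):
    # The same math over the whole (N, 4) array at once.
    cxcy = loc[:, :2] * v[0] * anchors[:, 2:] + anchors[:, :2]
    wh = np.exp(loc[:, 2:] * v[1]) * anchors[:, 2:]
    return np.concatenate([cxcy - wh / 2, cxcy + wh / 2], axis=1)

rng = np.random.default_rng(0)
loc = rng.standard_normal((20000, 4))
anchors = np.abs(rng.standard_normal((20000, 4))) + 0.1

t0 = time.perf_counter(); a = decode_loop(loc, anchors); t1 = time.perf_counter()
b = decode_vec(loc, anchors); t2 = time.perf_counter()
assert np.allclose(a, b)
print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
```

If the vectorized version is much faster on CPU already, the gap on GPU should widen further as the batch size grows, since there are more boxes to decode in parallel per step.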