Is your feature request related to a problem? Please describe.
A TFTensor object is usually created from a CPU array, which means the data must first be copied from the device (GPU) to the host (CPU). This pipeline architecture is considerably slow for large datasets (e.g. large images).
Describe the solution you'd like
My pipeline performs image processing with CUDA (via the managedCuda wrapper) and its libraries (NPP). At some point I would like to feed my CNN an image stored on the GPU as an NPP image, a CUDA array, or just a device pointer; call this d_array for the sake of convenience. Of course, one can copy it to the host to get a standard host array h_array:
d_array.CopyToHost(h_array);
and then define the usual
var tensor = new TFTensor(h_array);
Is there an option to get a device tensor d_tensor from d_array directly, avoiding the CopyToHost operation, and feed it to the CNN?