CPU works well, but when using CUDA, I get errors #126

Open
ShelbyXu9 opened this issue Jun 15, 2021 · 1 comment

@ShelbyXu9

/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
pid: 8240
WARNING:tensorflow:From /home/ray/Documents/frustum-pointnets/models/model_util.py:212: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2021-06-15 09:56:03.089481: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-15 09:56:03.190551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-15 09:56:03.190859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:26:00.0
totalMemory: 5.80GiB freeMemory: 5.61GiB
2021-06-15 09:56:03.190872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2021-06-15 09:56:03.354297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-15 09:56:03.354327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2021-06-15 09:56:03.354333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2021-06-15 09:56:03.354408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5371 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:26:00.0, compute capability: 7.5)
**** EPOCH 000 ****
2021-06-15 09:56:04.726786
2021-06-15 09:56:07.940918: E tensorflow/stream_executor/cuda/cuda_dnn.cc:3072] failed to enqueue forward batch normalization on stream: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train/train.py", line 377, in
train()
File "train/train.py", line 202, in train
train_one_epoch(sess, ops, train_writer)
File "train/train.py", line 257, in train_one_epoch
feed_dict=feed_dict)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'MatMul_1', defined at:
File "train/train.py", line 377, in
train()
File "train/train.py", line 122, in train
size_class_label_pl, size_residual_label_pl, end_points)
File "/home/ray/Documents/frustum-pointnets/models/model_util.py", line 380, in get_loss
center_label, heading_label, size_label) # (B,8,3)
File "/home/ray/Documents/frustum-pointnets/models/model_util.py", line 91, in get_box3d_corners_helper
corners_3d = tf.matmul(R, corners) # (N,3,8)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2084, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

@ShelbyXu9
Author

Can anyone give some advice?
Thanks!
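
In case it helps narrow things down: the crash happens inside tf.matmul in get_box3d_corners_helper, and both CUDNN_STATUS_EXECUTION_FAILED and "Blas xGEMMBatched launch failed" are commonly reported when the GPU is short on memory or when the installed CUDA/cuDNN build does not match the TensorFlow wheel. Below is a minimal sketch of one workaround I could try, assuming the cause is memory pressure (which is only a guess): enabling on-demand GPU memory growth on the TF 1.x session instead of letting it reserve nearly all device memory up front.

import tensorflow as tf

# Sketch (assumption: TF 1.x, as in the log above). Let TensorFlow allocate
# GPU memory on demand instead of reserving almost all of it at startup,
# which sometimes avoids Blas/cuDNN launch failures caused by memory pressure.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Or cap the fraction of GPU memory TensorFlow may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    # build the Frustum PointNets graph and run training here, as train/train.py does
    pass

If that does not help, reducing the training batch size or checking that the CUDA/cuDNN versions match the ones the TensorFlow 1.x wheel was built against would be my next guesses.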
