CPU works well, but when using CUDA, I get errors #126

Open
ShelbyXu9 opened this issue Jun 15, 2021 · 1 comment

@ShelbyXu9

/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
pid: 8240
WARNING:tensorflow:From /home/ray/Documents/frustum-pointnets/models/model_util.py:212: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2021-06-15 09:56:03.089481: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-15 09:56:03.190551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-15 09:56:03.190859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:26:00.0
totalMemory: 5.80GiB freeMemory: 5.61GiB
2021-06-15 09:56:03.190872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2021-06-15 09:56:03.354297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-15 09:56:03.354327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2021-06-15 09:56:03.354333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2021-06-15 09:56:03.354408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5371 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:26:00.0, compute capability: 7.5)
**** EPOCH 000 ****
2021-06-15 09:56:04.726786
2021-06-15 09:56:07.940918: E tensorflow/stream_executor/cuda/cuda_dnn.cc:3072] failed to enqueue forward batch normalization on stream: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train/train.py", line 377, in
train()
File "train/train.py", line 202, in train
train_one_epoch(sess, ops, train_writer)
File "train/train.py", line 257, in train_one_epoch
feed_dict=feed_dict)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'MatMul_1', defined at:
File "train/train.py", line 377, in
train()
File "train/train.py", line 122, in train
size_class_label_pl, size_residual_label_pl, end_points)
File "/home/ray/Documents/frustum-pointnets/models/model_util.py", line 380, in get_loss
center_label, heading_label, size_label) # (B,8,3)
File "/home/ray/Documents/frustum-pointnets/models/model_util.py", line 91, in get_box3d_corners_helper
corners_3d = tf.matmul(R, corners) # (N,3,8)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2084, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/ray/anaconda3/envs/frustum-pointnets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[32,3,3], b.shape=[32,3,8], m=3, n=8, k=3, batch_size=32
[[Node: MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concat_13, concat_12)]]
[[Node: gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1/_265 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4578_gradients/conv-reg1/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

@ShelbyXu9
Author

Can anyone give some advice?
Thanks!
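
In case it helps narrow things down: the crash happens inside tf.matmul in get_box3d_corners_helper, and both CUDNN_STATUS_EXECUTION_FAILED and "Blas xGEMMBatched launch failed" are commonly reported when the GPU is short on memory or when the installed CUDA/cuDNN build does not match the TensorFlow wheel. Below is a minimal sketch of one workaround I could try, assuming the cause is memory pressure (which is only a guess): enabling on-demand GPU memory growth on the TF 1.x session instead of letting it reserve nearly all device memory up front.

import tensorflow as tf

# Sketch (assumption: TF 1.x, as in the log above). Let TensorFlow allocate
# GPU memory on demand instead of reserving almost all of it at startup,
# which sometimes avoids Blas/cuDNN launch failures caused by memory pressure.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Or cap the fraction of GPU memory TensorFlow may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    # build the Frustum PointNets graph and run training here, as train/train.py does
    pass

If that does not help, reducing the training batch size or checking that the CUDA/cuDNN versions match the ones the TensorFlow 1.x wheel was built against would be my next guesses.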
