Fine-tuning on Colab #63
Comments
I also have a problem during fine-tuning in Colab. The log shows:
2019-11-28 08:04:36.283362: I tensorflow/core/common_runtime/placer.cc:54] save/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
Here is the link to my colab notebook for more info: Is it possible to somehow solve this problem with the GPU? Thank you :)
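As an aside (not from the original thread): the placer line above only shows a constant being pinned to CPU:0, which is normal even with a GPU attached, but it is worth confirming that TensorFlow actually sees the Colab GPU at all. A minimal TF 1.x check, assuming the standard Colab runtime:

```python
# Hypothetical sanity check, not part of the repo: confirm that the
# Colab runtime exposes a GPU to TensorFlow 1.x before fine-tuning.
import tensorflow as tf

print("GPU available:", tf.test.is_gpu_available())   # True if a CUDA device is visible
print("GPU device name:", tf.test.gpu_device_name())  # e.g. "/device:GPU:0", empty string if none
```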
Any update on this problem? I'm having exactly the same issue here.
I will note that I'm having trouble fine-tuning a model even using Colab Pro with the higher resource limits. I've been playing around with batch sizes and the like, and so far it's still blowing through the memory limits.
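A hedged suggestion, not from the thread: since these failures look like host-RAM exhaustion rather than GPU memory, it can help to log Colab's system memory around checkpoint loading and training steps to see how much headroom is left before the kernel gets killed. A small sketch using psutil (the helper name and call sites are made up for illustration):

```python
# Hypothetical helper for illustration: print how much of Colab's system RAM
# is in use, so you can see how close training gets to the limit.
import psutil

def log_ram(tag=""):
    vm = psutil.virtual_memory()
    print(f"[{tag}] RAM used: {vm.used / 1e9:.1f} GB of {vm.total / 1e9:.1f} GB ({vm.percent}%)")

log_ram("before restoring checkpoint")
# ... restore the model / run a training step here ...
log_ram("after restoring checkpoint")
```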
Not possible, must use a developer, costs around $65k. Stop commenting |
I am trying to fine-tune the model in Google Colab with a [Python 3 / GPU] runtime type. After launching the training, it suddenly stops and prints "^C", even though I haven't pressed Ctrl-C. The last messages before it stops are:
2019-11-28 09:49:59.717067: W tensorflow/core/framework/allocator.cc:107] Allocation of 41943040 exceeds 10% of system memory.
tcmalloc: large alloc 1262256128 bytes == 0x11ff50000 @ 0x7f3452087b6b 0x7f34520a7379 0x7f340d73b754 0x7f340d6f6c8a 0x7f340d433f11 0x7f340d4415b2 0x7f340d449dda 0x7f3416665097 0x7f3416666bee 0x7f3416666dcd 0x7f3416660a3b 0x7f341660d781 0x7f341660e164 0x7f341651d3b9 0x7f341651e72a 0x7f3416520187 0x7f3416522122 0x7f34165156d1 0x7f341651713c 0x7f34134ad211 0x7f34134af0a6 0x7f34134b0f26 0x7f34134b1654 0x7f3410dd3755 0x7f34134ee7cd 0x7f34134ef505 0x7f3410dd0b58 0x7f3410dd0c9a 0x7f3410d87f8e 0x50a84f 0x50c549
tcmalloc: large alloc 1262256128 bytes == 0x11ff50000 @ 0x7f3452087b6b 0x7f34520a7379 0x7f340d73b754 0x7f340d6f6c8a 0x7f340d433f11 0x7f340d4415b2 0x7f340d449dda 0x7f3416665097 0x7f3416666bee 0x7f3416666dcd 0x7f3416660a3b 0x7f341660d781 0x7f341660e164 0x7f341651d3b9 0x7f341651e72a 0x7f3416520187 0x7f3416522122 0x7f34165156d1 0x7f341651713c 0x7f34134ad211 0x7f34134af0a6 0x7f34134b0f26 0x7f34134b1654 0x7f3410dd3755 0x7f34134ee7cd 0x7f34134ef505 0x7f3410dd0b58 0x7f3410dd0c9a 0x7f3410d87f8e 0x50a84f 0x50c549
2019-11-28 09:50:18.233025: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Here is the link to my colab notebook for more info: https://colab.research.google.com/drive/1HZlVxvrH1JbcLKa-Z437MjsmYK9NcLwh
Let me know if you know where that comes from. Thanks a lot in advance :)
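Two hedged notes, not part of the original report: the "^C" with no key press is most likely the Colab kernel being killed when system RAM runs out (consistent with the tcmalloc allocations and the "exceeds 10% of system memory" warning above), and the XLA message is informational only. If you do want to try XLA:CPU as that warning suggests, the environment variable it names has to be set before TensorFlow is imported, for example:

```python
# Illustrative only: enable the XLA:CPU JIT that the warning above mentions.
# The flag must be set before `import tensorflow`, or passed when launching the script.
import os
os.environ["TF_XLA_FLAGS"] = "--tf_xla_cpu_global_jit"

import tensorflow as tf  # imported after setting the flag
```

Note that enabling XLA would not address the memory problem itself; reducing the batch size or model size, or using a higher-RAM runtime, is more likely to help there.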