Add platform arguments to benchmarks #654
Conversation
```diff
-case .x10: return Device.defaultXLA
+case .x10:
+  switch platform {
+  case .cpu: return Device.defaultXLA
```
In a case where this is run on a GPU-enabled system, this will still place the benchmarks on the GPU, which could be undesirable if you really did want to explicitly run benchmarks on the CPU only. I wonder if we should use the explicit filtering done for the TPU case on these other two options, so that CPU placement will work and GPU placement will fail if GPU is specified and none is available. I'd much rather have the error than think that my benchmarks were running on the GPU when they weren't (a problem I've hit with bad CUDA installations).
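The fail-fast behavior suggested above could look roughly like the following. This is a standalone sketch with stand-in types: the real `Device` and platform types in swift-apis differ, and the names `xlaDevice(for:available:)` and `DeviceError` are assumptions, not the project's API.

```swift
// Stand-in types mirroring (but not identical to) swift-apis' Device API.
enum Platform { case cpu, gpu, tpu }

struct Device: Equatable {
    enum Kind { case CPU, GPU, TPU }
    let kind: Kind
    let ordinal: Int
}

enum DeviceError: Error { case unavailable(Platform) }

/// Picks the first available device of the requested kind, failing loudly
/// (rather than silently falling back to the default device) when no match
/// exists -- e.g. GPU requested on a CPU-only host.
func xlaDevice(for platform: Platform, available: [Device]) throws -> Device {
    let wanted: Device.Kind
    switch platform {
    case .cpu: wanted = .CPU
    case .gpu: wanted = .GPU
    case .tpu: wanted = .TPU
    }
    guard let device = available.first(where: { $0.kind == wanted }) else {
        throw DeviceError.unavailable(platform)
    }
    return device
}
```

With a CPU-only device list, requesting `.gpu` throws instead of quietly running the benchmark on the CPU, which matches the "I'd much rather have the error" preference above.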
I updated this section to be a bit more explicit, but I'm seeing strange behavior trying to specify CPU with XLA on a GPU-enabled platform. I'm testing on Colab so maybe it's something weird there, but this is what I see:
```swift
let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2
```
```
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end()
*** Begin stack trace ***
	copyTensor
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end()
*** Begin stack trace ***
	copyTensor
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
*** End stack trace ***
No such device: CPU:0
Current stack trace:
	frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2
```
This doesn't happen with eager and looks like a bug in the way we're importing X10 libraries, so I'll file an issue. For this PR, which is better?
```swift
case .cpu: return Device(kind: .CPU, ordinal: 0, backend: .XLA)
case .gpu: return Device(kind: .GPU, ordinal: 0, backend: .XLA)
```

or

```swift
case .cpu: return Device.defaultXLA
case .gpu: return Device.defaultXLA
```
I think this is due to how the XRT device mapping is taking place. The default device is found and mapped correctly, but for non-default devices you have to manually specify the mapping as an environment variable. For example, on a GPU-enabled system I think you need to do the following at the command line:
```sh
export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0'
```
and then the CPU:0 device will appear. We need to have some way of doing these mappings that doesn't require specifying the above at the command line. You run into the same problem if you have multiple CPU or GPU devices and you need to access a non-zero ordinal.
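One conceivable in-process workaround is to set the variable from Swift before any tensor work begins. This is only a sketch under the assumption that setting `XRT_DEVICE_MAP` before the XLA client initializes is early enough; `mapXRTDevice` is a hypothetical helper, not part of swift-apis.

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

/// Hypothetical helper: registers an XRT device mapping from inside the
/// process so users don't have to export it at the command line. Must run
/// before the XLA client reads its environment.
func mapXRTDevice(_ device: String, to xrtAddress: String) {
    // Third argument "1" means overwrite any existing value.
    _ = setenv("XRT_DEVICE_MAP", "\(device);\(xrtAddress)", 1)
}

mapXRTDevice("CPU:0", to: "/job:localservice/replica:0/task:0/device:XLA_CPU:0")
```

This still leaves the multi-device/non-zero-ordinal problem mentioned above, since each mapping would need to be constructed and registered the same way.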
My leaning would be towards your first option, specifying the correct device even if it isn't mapped, rather than using the default device, which could be incorrect. At least the error will let you know you need to have the mappings set up to use the device you desire.
Ah, interesting - it sounds like a larger device management fix is in order. I think you're right about specifying the device, although I hate to add things that cause stack traces. In combination with the default option, it at least means someone would have to go out of their way to make it fail.
```diff
@@ -100,6 +125,7 @@ public let defaultSettings: [BenchmarkSetting] = [
   TimeUnit(.s),
   InverseTimeUnit(.s),
   Backend(.eager),
+  Platform(.cpu),
```
I wonder if we should have another state: `.default`, which would use `Device.defaultXLA`, so that we could have that here if the `.cpu` and `.gpu` cases were made more explicit. Otherwise, we'll have to modify the PerfZero Python file to explicitly provide `--gpu` to the benchmarks.
This is a great suggestion, especially in combination with the changes above. swift-apis/#1059 created for this.
Adds 3 new arguments for specifying the platform for benchmarks: `--cpu`, `--gpu`, `--tpu`. This allows us to set the proper device for connecting to TPUs outside of Colab, where `Device.defaultXLA` does not automatically pick up TPUs.
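The flag handling described above could be parsed with something like the following minimal sketch. This is not the benchmark harness's real argument parser; `parsePlatform` and the `Platform` enum are illustrative names only:

```swift
// Platforms selectable from the command line.
enum Platform: String { case cpu, gpu, tpu }

/// Scans the argument list for --cpu / --gpu / --tpu and returns the first
/// match; returns nil when none is given, so the caller can fall back to
/// the default setting (e.g. Platform(.cpu) in defaultSettings).
func parsePlatform(from arguments: [String]) -> Platform? {
    for argument in arguments where argument.hasPrefix("--") {
        if let platform = Platform(rawValue: String(argument.dropFirst(2))) {
            return platform
        }
    }
    return nil
}
```

Unknown `--` flags are simply skipped here, so platform selection composes with the harness's other options rather than rejecting them.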