
Add platform arguments to benchmarks #654

Merged: 5 commits into tensorflow:master on Aug 11, 2020

Conversation

texasmichelle (Member) commented on Aug 7, 2020:

Adds 3 new arguments for specifying the benchmark platform: --cpu, --gpu, and --tpu. This lets us set the proper device when connecting to TPUs outside of Colab, where Device.defaultXLA does not automatically pick up TPUs.
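
(As a rough sketch, the new platform flag might surface as a benchmark setting alongside the existing Backend(.eager) and TimeUnit(.s) settings; the BenchmarkSetting conformance and the PlatformKind spelling here are assumptions, not code from this PR:)

import Benchmark

// Hypothetical sketch of the new setting, in the style of the
// existing settings such as Backend(.eager).
enum PlatformKind {
  case cpu, gpu, tpu
}

struct Platform: BenchmarkSetting {
  var value: PlatformKind
  init(_ value: PlatformKind) { self.value = value }
}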

  • Lint.

- case .x10: return Device.defaultXLA
+ case .x10:
+   switch platform {
+   case .cpu: return Device.defaultXLA
Contributor:

In a case where this is run on a GPU-enabled system, this will still place the benchmarks on the GPU, which could be undesirable if you really did want to explicitly run benchmarks on the CPU only. I wonder if we should use the explicit filtering done for the TPU case on these other two options, so that CPU placement will work and GPU placement will fail if GPU is specified and none is available. I'd much rather have the error than think that my benchmarks were running on the GPU when they weren't (a problem I've hit with bad CUDA installations).
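
(For illustration, a sketch of that explicit-filtering idea: Device.allDevices is what the TPU path uses for discovery, and the error handling here is an assumption rather than code from this PR:)

import TensorFlow

// Sketch: resolve an explicit XLA device of the requested kind, failing
// loudly when none exists instead of silently falling back to the default.
func explicitXLADevice(_ kind: Device.Kind) -> Device {
  let candidates = Device.allDevices.filter {
    $0.backend == .XLA && $0.kind == kind
  }
  guard let device = candidates.first else {
    fatalError("No XLA device of kind \(kind) is available.")
  }
  return device
}

// GPU placement now fails rather than quietly running elsewhere:
let gpu = explicitXLADevice(.GPU)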

texasmichelle (Member Author):

I updated this section to be a bit more explicit, but I'm seeing strange behavior when trying to specify CPU with XLA on a GPU-enabled platform. I'm testing in Colab, so it may be something specific to that environment, but this is what I see:

let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end()
*** Begin stack trace ***
	(unresolved frames)
	copyTensor
	(unresolved frames)
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end()
(stack trace identical to the one above)
No such device: CPU:0
Current stack trace:
	frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2

This doesn't happen with the eager backend and looks like a bug in the way we're importing X10 libraries, so I'll file an issue. For this PR, which is better?

      case .cpu: return Device(kind: .CPU, ordinal: 0, backend: .XLA)
      case .gpu: return Device(kind: .GPU, ordinal: 0, backend: .XLA)

or

      case .cpu: return Device.defaultXLA
      case .gpu: return Device.defaultXLA

Contributor:

I think this is due to how the XRT device mapping is taking place. The default device is found and mapped correctly, but for non-default devices you have to manually specify the mapping as an environment variable. For example, on a GPU-enabled system I think you need to do the following at the command line:

export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0'

and then the CPU:0 device will appear. We need to have some way of doing these mappings that doesn't require specifying the above at the command line. You run into the same problem if you have multiple CPU or GPU devices and you need to access a non-zero ordinal.
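
(An in-process variant of the same workaround, as a hypothetical sketch: this assumes the variable only needs to be in the environment before the X10 runtime first initializes, which I haven't verified. The mapping string is the same one as in the export line above:)

import Foundation

// Hypothetical: install the XRT device mapping before any tensor work,
// so that CPU:0 resolves even on a GPU-enabled system.
setenv("XRT_DEVICE_MAP",
       "CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0",
       1)  // 1 = overwrite an existing value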

My leaning would be towards your first option, specifying the correct device even if it isn't mapped, rather than using the default device, which could be incorrect. At least the error will let you know you need to have the mappings set up to use the device you desire.

texasmichelle (Member Author):

Ah, interesting - it sounds like a larger device management fix is in order. I think you're right about specifying the device, although I hate to add things that cause stack traces. In combination with the default option, it at least means someone would have to go out of their way to make it fail.

@@ -100,6 +125,7 @@ public let defaultSettings: [BenchmarkSetting] = [
   TimeUnit(.s),
   InverseTimeUnit(.s),
   Backend(.eager),
+  Platform(.cpu),
Contributor:

I wonder if we should have another state, .default, that would use Device.defaultXLA, so that we could set it here if the .cpu and .gpu cases were made more explicit. Otherwise, we'll have to modify the PerfZero Python file to explicitly pass --gpu to the benchmarks.
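
(A sketch of how that .default case might slot into the backend/platform switch; the enum spellings and the TPU discovery via Device.allDevices are assumptions layered on the diffs above, not code from this PR:)

import TensorFlow

enum PlatformKind { case `default`, cpu, gpu, tpu }
enum BackendKind { case eager, x10 }

// Sketch: .default preserves today's Device.defaultXLA behavior, while
// the explicit cases construct (or discover) the requested device and
// fail loudly when it is missing.
func device(backend: BackendKind, platform: PlatformKind) -> Device {
  switch backend {
  case .eager:
    return Device.defaultTFEager
  case .x10:
    switch platform {
    case .default:
      return Device.defaultXLA
    case .cpu:
      return Device(kind: .CPU, ordinal: 0, backend: .XLA)
    case .gpu:
      return Device(kind: .GPU, ordinal: 0, backend: .XLA)
    case .tpu:
      guard let tpu = Device.allDevices.first(where: { $0.kind == .TPU }) else {
        fatalError("No TPU devices found.")
      }
      return tpu
    }
  }
}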

texasmichelle (Member Author):

This is a great suggestion, especially in combination with the changes above.

texasmichelle (Member Author):

swift-apis#1059 created to track the Device fix.

texasmichelle merged commit 3e83532 into tensorflow:master on Aug 11, 2020.
texasmichelle deleted the platform branch on August 11, 2020.