
Add platform arguments to benchmarks #654

Merged: 5 commits into tensorflow:master on Aug 11, 2020

Conversation

texasmichelle (Member) commented on Aug 7, 2020:

Adds 3 new arguments for specifying the benchmark platform: --cpu, --gpu, and --tpu. This lets us set the proper device when connecting to TPUs outside of Colab, where Device.defaultXLA does not automatically pick up TPUs.
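
(As a rough sketch, the new platform flag might surface as a benchmark setting alongside the existing Backend(.eager) and TimeUnit(.s) settings; the BenchmarkSetting conformance and the PlatformKind spelling here are assumptions, not code from this PR:)

import Benchmark

// Hypothetical sketch of the new setting, in the style of the
// existing settings such as Backend(.eager).
enum PlatformKind {
  case cpu, gpu, tpu
}

struct Platform: BenchmarkSetting {
  var value: PlatformKind
  init(_ value: PlatformKind) { self.value = value }
}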

  • Lint.

- case .x10: return Device.defaultXLA
+ case .x10:
+   switch platform {
+   case .cpu: return Device.defaultXLA
Contributor:

In a case where this is run on a GPU-enabled system, this will still place the benchmarks on the GPU, which could be undesirable if you really did want to explicitly run benchmarks on the CPU only. I wonder if we should use the explicit filtering done for the TPU case on these other two options, so that CPU placement will work and GPU placement will fail if GPU is specified and none is available. I'd much rather have the error than think that my benchmarks were running on the GPU when they weren't (a problem I've hit with bad CUDA installations).
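
(For illustration, a sketch of that explicit-filtering idea: Device.allDevices is what the TPU path uses for discovery, and the error handling here is an assumption rather than code from this PR:)

import TensorFlow

// Sketch: resolve an explicit XLA device of the requested kind, failing
// loudly when none exists instead of silently falling back to the default.
func explicitXLADevice(_ kind: Device.Kind) -> Device {
  let candidates = Device.allDevices.filter {
    $0.backend == .XLA && $0.kind == kind
  }
  guard let device = candidates.first else {
    fatalError("No XLA device of kind \(kind) is available.")
  }
  return device
}

// GPU placement now fails rather than quietly running elsewhere:
let gpu = explicitXLADevice(.GPU)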

texasmichelle (Member Author):

I updated this section to be a bit more explicit, but I'm seeing strange behavior when trying to specify CPU with XLA on a GPU-enabled platform. I'm testing in Colab, so it may be something specific to that environment, but this is what I see:

let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end()
*** Begin stack trace ***
	(unresolved frames)
	copyTensor
	(unresolved frames)
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end()
(stack trace identical to the one above)
No such device: CPU:0
Current stack trace:
	frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2

This doesn't happen with the eager backend and looks like a bug in the way we're importing X10 libraries, so I'll file an issue. For this PR, which is better?

      case .cpu: return Device(kind: .CPU, ordinal: 0, backend: .XLA)
      case .gpu: return Device(kind: .GPU, ordinal: 0, backend: .XLA)

or

      case .cpu: return Device.defaultXLA
      case .gpu: return Device.defaultXLA

Contributor:

I think this is due to how the XRT device mapping is taking place. The default device is found and mapped correctly, but for non-default devices you have to manually specify the mapping as an environment variable. For example, on a GPU-enabled system I think you need to do the following at the command line:

export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0'

and then the CPU:0 device will appear. We need to have some way of doing these mappings that doesn't require specifying the above at the command line. You run into the same problem if you have multiple CPU or GPU devices and you need to access a non-zero ordinal.
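
(An in-process variant of the same workaround, as a hypothetical sketch: this assumes the variable only needs to be in the environment before the X10 runtime first initializes, which I haven't verified. The mapping string is the same one as in the export line above:)

import Foundation

// Hypothetical: install the XRT device mapping before any tensor work,
// so that CPU:0 resolves even on a GPU-enabled system.
setenv("XRT_DEVICE_MAP",
       "CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0",
       1)  // 1 = overwrite an existing value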

My leaning would be towards your first option, specifying the correct device even if it isn't mapped, rather than using the default device, which could be incorrect. At least the error will let you know you need to have the mappings set up to use the device you desire.

texasmichelle (Member Author):

Ah, interesting - it sounds like a larger device management fix is in order. I think you're right about specifying the device, although I hate to add things that cause stack traces. In combination with the default option, it at least means someone would have to go out of their way to make it fail.

@@ -100,6 +125,7 @@ public let defaultSettings: [BenchmarkSetting] = [
   TimeUnit(.s),
   InverseTimeUnit(.s),
   Backend(.eager),
+  Platform(.cpu),
Contributor:

I wonder if we should have another state, .default, that would use Device.defaultXLA, so that we could set it here if the .cpu and .gpu cases were made more explicit. Otherwise, we'll have to modify the PerfZero Python file to explicitly pass --gpu to the benchmarks.
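
(A sketch of how that .default case might slot into the backend/platform switch; the enum spellings and the TPU discovery via Device.allDevices are assumptions layered on the diffs above, not code from this PR:)

import TensorFlow

enum PlatformKind { case `default`, cpu, gpu, tpu }
enum BackendKind { case eager, x10 }

// Sketch: .default preserves today's Device.defaultXLA behavior, while
// the explicit cases construct (or discover) the requested device and
// fail loudly when it is missing.
func device(backend: BackendKind, platform: PlatformKind) -> Device {
  switch backend {
  case .eager:
    return Device.defaultTFEager
  case .x10:
    switch platform {
    case .default:
      return Device.defaultXLA
    case .cpu:
      return Device(kind: .CPU, ordinal: 0, backend: .XLA)
    case .gpu:
      return Device(kind: .GPU, ordinal: 0, backend: .XLA)
    case .tpu:
      guard let tpu = Device.allDevices.first(where: { $0.kind == .TPU }) else {
        fatalError("No TPU devices found.")
      }
      return tpu
    }
  }
}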

texasmichelle (Member Author):

This is a great suggestion, especially in combination with the changes above.

texasmichelle (Member Author):

swift-apis#1059 created to track the Device fix.

texasmichelle merged commit 3e83532 into tensorflow:master on Aug 11, 2020.
texasmichelle deleted the platform branch on August 11, 2020.