"sliding window" bigtable training mode #713
base: master
Conversation
preprocessing.py
Outdated
@@ -261,6 +261,40 @@ def get_tpu_bt_input_tensors(games, games_nr, batch_size, num_repeats=1,
     return dataset


 def get_many_tpu_bt_input_tensors(games, games_nr, batch_size,
         start_at, num_datasets,
nit: indenting is wrong
Done.
preprocessing.py
Outdated
         # is proportionally along compared to last_game_number? comparing
         # timestamps?)
         ds = games.moves_from_games(start_at + (i * window_increment),
             start_at + (i * window_increment) + window_size,
nit indenting
Done.
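Pieced together from the outdated hunks above, the new function looks roughly like this; the `window_size`/`window_increment` defaults are assumed for illustration, and the `repeat`/`map` placement is what the next thread discusses:

```python
# Sketch assembled from the diff fragments in this thread; the default
# window values are illustrative assumptions, not taken from the PR.
# (bigtable_input is the repo's Bigtable reader module, already
# imported at the top of preprocessing.py.)
def get_many_tpu_bt_input_tensors(games, games_nr, batch_size,
                                  start_at, num_datasets,
                                  window_size=500000,
                                  window_increment=25000):
    dataset = None
    for i in range(num_datasets):
        # Slide the window forward by window_increment games per pass.
        # (Open question from the diff: should the window position be
        # proportional to last_game_number, or chosen by timestamps?)
        lo = start_at + (i * window_increment)
        ds = games.moves_from_games(lo, lo + window_size,
                                    shuffle=True,
                                    column_family=bigtable_input.TFEXAMPLE,
                                    column='example')
        dataset = dataset.concatenate(ds) if dataset else ds
    # repeat/map/batching omitted here -- see the next thread.
    return dataset
```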
preprocessing.py
Outdated
                 shuffle=True,
                 column_family=bigtable_input.TFEXAMPLE,
                 column='example')
         ds = ds.repeat(1)
you can probably move the repeat and map out of this loop
Done.
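The hoist suggested in the thread above, sketched as a continuation of the builder sketch earlier: `repeat` and `map` run once on the combined dataset after the loop, instead of once per window:

```python
        # Inside the loop: only build and chain the per-window datasets.
        dataset = dataset.concatenate(ds) if dataset else ds
    # After the loop: one repeat/map over the whole combined dataset.
    dataset = dataset.repeat(1)
    # moves_from_games yields (row_name, serialized_example) pairs;
    # keep only the serialized example.
    dataset = dataset.map(lambda row_name, s: s)
```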
                 column='example')
         ds = ds.repeat(1)
         ds = ds.map(lambda row_name, s: s)
         dataset = dataset.concatenate(ds) if dataset else ds
Regarding the general approach: if the training loop does multiple scans, I would expect to create a new dataset for each pass, rather than try to create a single enormous dataset, which I imagine would be harder to debug, inspect, etc.
yes, but multiple calls to TPUEstimator.train will create new graphs :( I'm not sure what a good solution for lazy evaluation of these Datasets would be. As it is, it takes a really long time to build the datasets before training even starts -- I suspect the concatenate is doing something bad, as things get slower and slower.
Thanks, PTAL :)
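One possible direction for making this lazy, sketched with `tf.data.Dataset.range(...).flat_map(...)`: each window's sub-dataset gets built on demand inside a single graph, avoiding both repeated `TPUEstimator.train` calls and a long chain of `concatenate`s. The big caveat (an untested assumption) is that `moves_from_games` would receive tensor-valued row bounds rather than Python ints, which the Bigtable reader may not support:

```python
import tensorflow as tf
import bigtable_input  # assumed importable as elsewhere in the repo


def lazy_window_datasets(games, start_at, num_datasets,
                         window_size, window_increment):
    def make_window(i):
        # i arrives as a scalar int64 tensor here; this only works if
        # moves_from_games can build its row range from tensors, which
        # is an untested assumption about the Bigtable reader.
        lo = start_at + i * window_increment
        return games.moves_from_games(lo, lo + window_size,
                                      shuffle=True,
                                      column_family=bigtable_input.TFEXAMPLE,
                                      column='example')

    # flat_map evaluates make_window lazily, one window at a time,
    # instead of eagerly building and concatenating every pipeline.
    return tf.data.Dataset.range(num_datasets).flat_map(make_window)
```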
train.py
Outdated
@@ -132,6 +132,36 @@ def after_run(self, run_context, run_values):
     self.before_weights = None


 def train_many(start_at=1000000, num_datasets=3):
can you expose moves here also.
what do you mean? number of steps?
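For context on the moves-vs-steps question: with a move-level dataset, each training step consumes one batch of move examples, so exposing either quantity determines the other. A toy illustration (all numbers assumed):

```python
batch_size = 2048              # assumed for illustration
moves_per_window = 50_000_000  # assumed: moves visible in one window

# One optimizer step consumes batch_size move examples, so:
steps = moves_per_window // batch_size
print(steps)  # 24414 steps for one pass over the window
```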
Instead of repeatedly calling train, create repeated datasets by incrementing along the dataset window.
@gitosaurus this is kind of a first cut -- it has to do all the full key retrieval and shuffle before it can start, and it'd be great if I could make that lazy somehow.
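To make "incrementing along the dataset window" concrete: with the `train_many` defaults from the diff plus assumed window sizes, the windows slide like this:

```python
start_at = 1000000        # default from the diff
num_datasets = 3          # default from the diff
window_size = 500000      # assumed for illustration
window_increment = 25000  # assumed for illustration

for i in range(num_datasets):
    lo = start_at + i * window_increment
    print(f"dataset {i}: games [{lo}, {lo + window_size})")
# dataset 0: games [1000000, 1500000)
# dataset 1: games [1025000, 1525000)
# dataset 2: games [1050000, 1550000)
```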