
Why 1023 in this code? #123

Closed
ty5491003 opened this issue Sep 24, 2019 · 2 comments

@ty5491003

length=min(length, 1023 - (len(context_tokens) if prefix else 0)),

From this code, we can see that the maximum length of the generated text is 1023 tokens, no matter how we adjust the `--length` parameter when generating. Why is the constant 1023 used here?
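For illustration, here is a minimal sketch of the clamping behavior in the quoted line (the function name and default arguments are hypothetical stand-ins, not the repo's actual signature):

```python
def effective_length(requested_length, context_tokens=None, prefix=None):
    # Mirrors: length = min(length, 1023 - (len(context_tokens) if prefix else 0))
    return min(requested_length, 1023 - (len(context_tokens) if prefix else 0))

# Even a very large --length is capped at 1023 when no prefix is supplied.
print(effective_length(5000))  # 1023

# With a prefix, its token count is subtracted from the 1023-token budget.
print(effective_length(5000, context_tokens=[0] * 100, prefix="Hello"))  # 923
```

So the requested length only takes effect when it already fits inside the budget.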

@minimaxir
Owner

The original GPT-2 model is fixed at a context window of 1024 tokens; if you go over, it'll error.

The fix is a sliding-window approach, which is in #87, but it needs testing.
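To sketch the sliding-window idea (hypothetical helper names; this is not the implementation in the linked PR): once generation reaches the window limit, only the most recent tokens are re-fed as context, so output can grow past 1024 tokens.

```python
def generate_long(model_step, prompt_tokens, total_new_tokens, window=1024):
    """Generate more than `window` tokens with a fixed-context model.

    `model_step` is a hypothetical callable: given a token list of length
    at most `window - 1`, it returns the next token.
    """
    tokens = list(prompt_tokens)
    for _ in range(total_new_tokens):
        # Keep only the most recent tokens, leaving room for the new one,
        # so the model never sees more than its context window allows.
        context = tokens[-(window - 1):]
        tokens.append(model_step(context))
    return tokens
```

The trade-off is that tokens older than the window are forgotten, so long outputs can drift off-topic.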

@ty5491003
Author

Got it, thx.
