
Error while trying to use the model #4

Open
sandeeppilania opened this issue Jan 13, 2020 · 17 comments

@sandeeppilania

sandeeppilania commented Jan 13, 2020

Traceback (most recent call last):
  File "bert.py", line 429, in <module>
    main()
  File "bert.py", line 373, in main
    config, config.task_name, tokenizer, evaluate=False)
  File "bert.py", line 268, in load_and_cache_examples
    examples, label_list, config.max_seq_len, tokenizer, "classification", use_entity_indicator=config.use_entity_indicator)
  File "C:\Users\pilanisp\Desktop\BERT FINAL\BERT IE\bert-relation-classification\utils.py", line 281, in convert_examples_to_features
    e11_p = tokens_a.index("#")+1  # the start position of entity1
ValueError: '#' is not in list

@bilalghanem

I have the same issue!

@nannigath

me too

@bilalghanem

I think the authors were planning to use E11, E21, etc., but then changed the code to use # and $.

What I have done to solve the issue is that when I read the data at the beginning of the code, I convert the special tokens as follows:

E11 & E21 -> #
E21 & E22 -> $

and then everything worked perfectly.

@sandeeppilania
Author

@bilalghanem Can you share an example of how you converted the training examples?
Did you change the entire train.tsv first, or are you changing it as you read through the file in the code?

@bilalghanem

@sandeeppilania I changed it in the code.
Simply, in the function convert_examples_to_features, before the line l = len(tokens_a), use .replace to convert them.

ex.

str.replace('E11', '#')
etc.

@sandeeppilania
Author

@bilalghanem I am asking something silly here, sorry about that,
but on the line tokens_a = tokenizer.tokenize(example.text_a) in convert_examples_to_features
I tried printing out tokens_a and this is what I see:
['the', 'system', 'as', 'described', 'above', 'has', 'its', 'greatest', 'application', 'in', 'an', 'array', '##ed', '[', 'e', '##11', ']', 'configuration', '[', 'e', '##12', ']', 'of', 'antenna', '[', 'e', '##21', ']', 'elements', '[', 'e', '##22', ']']
so I don't see how the replace str.replace('E11', '#') would work here.

@bilalghanem

> @bilalghanem I am asking something silly here, sorry about that,
> but on the line tokens_a = tokenizer.tokenize(example.text_a) in convert_examples_to_features
> I tried printing out tokens_a and this is what I see:
> ['the', 'system', 'as', 'described', 'above', 'has', 'its', 'greatest', 'application', 'in', 'an', 'array', '##ed', '[', 'e', '##11', ']', 'configuration', '[', 'e', '##12', ']', 'of', 'antenna', '[', 'e', '##21', ']', 'elements', '[', 'e', '##22', ']']
> so I don't see how the replace str.replace('E11', '#') would work here.

Sorry, you're right: do it before applying the tokenizer, or even when you start reading the data.

@sandeeppilania
Author

Got it.
So basically,
0 the system as described above has its greatest application in an arrayed [E11] configuration [E12] of antenna [E21] elements [E22] 12 whole component 2
should be converted to
0 the system as described above has its greatest application in an arrayed #configuration# of antenna $elements$ 12 whole component 2
Right?
Because my understanding is that e11_p = tokens_a.index("#")+1 just looks for the position right after the first #.
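
A minimal sketch of that conversion step (the helper name is mine, purely for illustration):

```python
# Sketch: map the bracketed entity markers in the data to the
# delimiters that utils.py searches for, before tokenization.
def convert_markers(text):
    for marker in ("[E11]", "[E12]"):   # entity 1 boundaries -> '#'
        text = text.replace(marker, "#")
    for marker in ("[E21]", "[E22]"):   # entity 2 boundaries -> '$'
        text = text.replace(marker, "$")
    return text

line = ("the system as described above has its greatest application "
        "in an arrayed [E11] configuration [E12] of antenna [E21] elements [E22]")
print(convert_markers(line))
# -> ... in an arrayed # configuration # of antenna $ elements $
```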

@bilalghanem

@sandeeppilania yes, exactly.

And this line finds the end of the entity in case it is longer than a single word:
e12_p = l-tokens_a[::-1].index("#")+1
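
On a toy token list, those two expressions behave like this (any extra offset the repo applies later, e.g. for [CLS], is ignored here):

```python
tokens_a = ["an", "arrayed", "#", "configuration", "#",
            "of", "antenna", "$", "elements", "$"]
l = len(tokens_a)

e11_p = tokens_a.index("#") + 1            # 3: first token after the opening '#'
e12_p = l - tokens_a[::-1].index("#") + 1  # 6: one past the token following the closing '#'
print(e11_p, e12_p)  # 3 6
```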

@ejokhan

ejokhan commented Feb 16, 2020

@sandeeppilania Hi, brother, I have the same issue. Can you share the part of the code where exactly you made changes to solve the problem?

Thanks in advance.

@Valdegg

Valdegg commented Feb 18, 2020

> I think the authors were planning to use E11, E21, etc., but then changed the code to use # and $.
>
> What I have done to solve the issue is that when I read the data at the beginning of the code, I convert the special tokens as follows:
>
> E11 & E21 -> #
> E21 & E22 -> $
>
> and then everything worked perfectly.

You mean E11 & E12?

@Valdegg

Valdegg commented Feb 18, 2020

I wonder why they didn't try running the software before they posted it here (and explicitly said it's "stable", when it doesn't even run)...

@wang-h
Member

wang-h commented Mar 3, 2020

> @sandeeppilania Hi, brother, I have the same issue. Can you share the part of the code where exactly you made changes to solve the problem?
>
> Thanks in advance.

> I think the authors were planning to use E11, E21, etc., but then changed the code to use # and $.
> What I have done to solve the issue is that when I read the data at the beginning of the code, I convert the special tokens as follows:
> E11 & E21 -> #
> E21 & E22 -> $
> and then everything worked perfectly.

> You mean E11 & E12?

Please check the following lines in bert.py and uncomment the line you need:
#additional_special_tokens = ["[E11]", "[E12]", "[E21]", "[E22]"]
additional_special_tokens = []
#additional_special_tokens = ["e11", "e12", "e21", "e22"]

@seesky8848

Hey guys, look here! Modify additional_special_tokens in bert.py so that it corresponds to tokens_a in utils.py, and pay attention to the start and end subscript positions in utils.py; if necessary, modify the code around line 275 of utils.py. After that you can start training. I have tried this method and it works.


@wang-h
Member

wang-h commented Jun 21, 2022

I am sorry, I have no time to correct the code. The error arises when you are using a modern transformers library. After

model = XXX.from_pretrained(args.bert_model, args=args)
tokenizer.add_tokens(additional_special_tokens)

add the following line:

model.resize_token_embeddings(len(tokenizer))
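
Put together, a sketch of that fix (the model class and checkpoint here are illustrative, not the repo's exact code):

```python
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Register the entity markers as new vocabulary entries...
tokenizer.add_tokens(["[E11]", "[E12]", "[E21]", "[E22]"])
# ...then grow the embedding matrix to the new vocabulary size,
# so the new token ids have embeddings to look up.
model.resize_token_embeddings(len(tokenizer))
```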


