The Result on CNN and Daily Mail #37

xieyxclack · 2019-11-30T09:09:26Z

Hello, thanks for providing the Transformer-based s2s models for abstractive text summarization; they have helped me a lot.
I ran the model on the CNN/Daily Mail dataset and obtained the following results:

1 ROUGE-1 Average_R: 0.40213 (95%-conf.int. 0.39962 - 0.40466)
1 ROUGE-1 Average_P: 0.40580 (95%-conf.int. 0.40310 - 0.40855)
1 ROUGE-1 Average_F: 0.39289 (95%-conf.int. 0.39072 - 0.39516)

1 ROUGE-2 Average_R: 0.17639 (95%-conf.int. 0.17417 - 0.17878)
1 ROUGE-2 Average_P: 0.17982 (95%-conf.int. 0.17756 - 0.18227)
1 ROUGE-2 Average_F: 0.17305 (95%-conf.int. 0.17094 - 0.17527)

1 ROUGE-L Average_R: 0.27810 (95%-conf.int. 0.27581 - 0.28035)
1 ROUGE-L Average_P: 0.27940 (95%-conf.int. 0.27701 - 0.28185)
1 ROUGE-L Average_F: 0.27099 (95%-conf.int. 0.26895 - 0.27300)

ROUGE-1/2/L: 39.29/17.30/27.10
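The one-line summary above is just the three Average_F values multiplied by 100. Below is a minimal Python sketch that pulls them out of ROUGE-1.5.5-style output. The regular expression assumes exactly the line format quoted in this post, and `parse_rouge_output` and `summary_line` are hypothetical helper names, not part of this repository.

```python
import re

# Matches lines such as:
# "1 ROUGE-1 Average_F: 0.39289 (95%-conf.int. 0.39072 - 0.39516)"
LINE_RE = re.compile(
    r"^\d+\s+(ROUGE-\S+)\s+Average_([RPF]):\s+([0-9.]+)"
    r"\s+\(95%-conf\.int\.\s+([0-9.]+)\s+-\s+([0-9.]+)\)"
)


def parse_rouge_output(text):
    """Return {metric: {"R"/"P"/"F": (score, ci_low, ci_high)}} from the output text."""
    scores = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            metric, kind, score, low, high = m.groups()
            scores.setdefault(metric, {})[kind] = (float(score), float(low), float(high))
    return scores


def summary_line(scores, metrics=("ROUGE-1", "ROUGE-2", "ROUGE-L")):
    """Format the Average_F scores as percentages joined by '/'."""
    return "/".join(f"{scores[m]['F'][0] * 100:.2f}" for m in metrics)
```

On the output above this gives 39.29 for ROUGE-1 and 27.10 for ROUGE-L. The ROUGE-2 value lies exactly on a rounding boundary (17.305), so it may print as 17.30 or 17.31 depending on the floating-point representation.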

I used the default settings, but my results are noticeably lower than those reported in previous work. For example (ROUGE-1/2/L):
In "Text Summarization with Pretrained Encoders", TransformerABS reports 40.21/17.76/37.09.
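To make the comparison concrete, the per-metric gap is simply the reported score minus the reproduced score. A small Python sketch using only the numbers quoted in this post; the dictionary names and `score_gaps` are illustrative, not from any repository.

```python
# Reproduced F-scores (x100) from the ROUGE output in this post.
reproduced = {"ROUGE-1": 39.29, "ROUGE-2": 17.30, "ROUGE-L": 27.10}
# TransformerABS scores as quoted from "Text Summarization with Pretrained Encoders".
reported = {"ROUGE-1": 40.21, "ROUGE-2": 17.76, "ROUGE-L": 37.09}


def score_gaps(mine, paper):
    """Return paper - mine for each metric, rounded to two decimals."""
    return {m: round(paper[m] - mine[m], 2) for m in paper}


for metric, gap in score_gaps(reproduced, reported).items():
    print(f"{metric}: {gap:.2f} points below the reported score")
```

The gaps come out to roughly 0.92, 0.46 and 9.99 points, so ROUGE-1 and ROUGE-2 are close while ROUGE-L accounts for almost all of the discrepancy.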

In particular, my ROUGE-L score is much worse than the reported one (27.10 vs. 37.09), while ROUGE-1 and ROUGE-2 are within about one point, so I suspect I made a mistake during training. I trained on 1 GPU for 3 days, for a total of 170k steps with a batch size of 32.
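ROUGE-L is based on the longest common subsequence (LCS) between a candidate and a reference summary. The Python sketch below is a simplified, sentence-level version on lowercased whitespace tokens; it is not the official ROUGE-1.5.5 script, which applies its own tokenization, optional stemming, and a summary-level variant (union LCS over sentences) when summaries contain multiple sentences. Because of that last point, scoring the same summaries as a single line versus as sentence-split text may give different ROUGE-L values, which is one thing that may be worth checking when comparing against reported numbers. The function names here are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise take the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L (precision, recall, F) on lowercased whitespace tokens."""
    c = candidate.lower().split()
    r = reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    p = lcs / len(c)
    rec = lcs / len(r)
    # F_lcs = (1 + beta^2) * P * R / (R + beta^2 * P); beta = 1 gives the harmonic mean.
    f = (1 + beta ** 2) * p * rec / (rec + beta ** 2 * p)
    return p, rec, f
```

For example, `rouge_l("the cat sat on the mat", "the cat was on the mat")` finds an LCS of 5 tokens out of 6 on each side, so precision, recall and F are all 5/6.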

Has anyone reproduced the results on the CNN/Daily Mail dataset, or does anyone know what might have gone wrong during training?
Many thanks!

zakhan4 · 2020-07-28T14:18:35Z

Hi,

Just a thought: why are you comparing the results of this model with TransformerABS? As far as I understand, the model in this repo uses BERT as its encoder, while TransformerABS in the paper uses a standard Transformer encoder trained from scratch.
