Commit

Update index.html
zhangtianshu authored Nov 22, 2023
1 parent 6c3985d commit b370aa6
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions index.html
@@ -625,7 +625,7 @@ <h2 class="title is-3">In-domain Evaluation</h2>
</div>
<div>
<br>
-Specifically, we observe the following takeaways:
+Specifically, we observed the following takeaways:
<ol>
<li>By simply fine-tuning a large language model on TableInstruct, TableLlama can achieve comparable or even better performance on almost all the tasks <b>without any table pretraining or special table model architecture design</b>;</li>
<li><b>TableLlama displays advantages in table QA tasks</b>: <b>TableLlama</b> can surpass the SOTA by <b>5.61 points</b> for highlighted cell based table QA task (i.e., FeTaQA) and <b>17.71 points</b> for hierarchical table QA (i.e., HiTab), which is full of numerical reasoning on tables. As LLMs have been shown superior in interacting with humans and answering questions, this indicates that <b>the existing underlying strong language understanding ability of LLMs may be beneficial for such table QA tasks, despite with semi-structured tables</b>;</li>
@@ -718,7 +718,7 @@ <h2 class="title is-3">Out-of-domain Evaluation</h2>
</div>
<div>
<br>
-Specifically, we observed these following takeaways:
+Specifically, we observed the following takeaways:
<ol>
<li><b>By learning from the table-based training tasks, the model has acquired essential underlying table understanding ability, which can be transferred to other table-based tasks/datasets and facilitate their performance;</b></li>
<li>FEVEROUS exhibits the largest gain over the other 5 datasets. This is likely because the fact verification task is an in-domain training task, although the dataset is unseen during training. <b>Compared with cross-task generalization, it may be easier to generalize to different datasets belonging to the same tasks</b>;</li>