240908 TEST complete: Mcp sync-up (main 2f54285) #89
base: v4.1-internal
Co-authored-by: kphilpark <[email protected]>
Co-authored-by: jeongin-yun <[email protected]>
/test
https://github.com/furiosa-ai/inference/actions/runs/10756996183
JSON Content:
bert_compare_result_int8.json:
bert_f1_score_int8.json:
bert_compare_result_fp8.json:
bert_f1_score_fp8.json:
qgpt_j_compare_result_int8.json:
qgpt_j_compare_result_fp8.json:
llama2-70b_compare_result_int8.json:
/test
https://github.com/furiosa-ai/inference/actions/runs/10757015122
JSON Content:
bert_compare_result_int8.json: [
{
"index": 0,
"status": "PASS",
"inp_text": "what tool has measured the amount of dust that travels from the sahara to the amazon? nasa's calipso satellite has measured the amount of dust transported by wind from the sahara to the amazon : an average 182 million tons of dust are windblown out of the sahara each year, at 15 degrees west longitude, across 1, 600 miles ( 2, 600 km ) over the atlantic ocean ( some dust falls into the atlantic ), then at 35 degrees west longitude at the eastern coast of south america, 27. 7 million tons ( 15 % ) of dust fall over the amazon basin, 132 million tons of dust remain in the air, 43 million tons of dust are windblown and falls on the caribbean sea, past 75 degrees west longitude.",
"generated_sentence": "calipso satellite"
}
]
bert_f1_score_int8.json: {
"date": "2024-09-08 04:39:41",
"count": 100,
"f1_score": 94.86666666666667,
"reference_f1_score": 94.86666,
}
bert_compare_result_fp8.json: [
{
"index": 0,
"status": "PASS",
"inp_text": "what tool has measured the amount of dust that travels from the sahara to the amazon? nasa's calipso satellite has measured the amount of dust transported by wind from the sahara to the amazon : an average 182 million tons of dust are windblown out of the sahara each year, at 15 degrees west longitude, across 1, 600 miles ( 2, 600 km ) over the atlantic ocean ( some dust falls into the atlantic ), then at 35 degrees west longitude at the eastern coast of south america, 27. 7 million tons ( 15 % ) of dust fall over the amazon basin, 132 million tons of dust remain in the air, 43 million tons of dust are windblown and falls on the caribbean sea, past 75 degrees west longitude.",
"generated_sentence": "calipso satellite"
}
]
bert_f1_score_fp8.json: {
"date": "2024-09-08 04:42:05",
"count": 100,
"f1_score": 95.46666666666665,
"reference_f1_score": 95.46667,
}
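Each F1 file pairs the measured `f1_score` with a `reference_f1_score` stored to five decimal places, so an exact float equality check would fail for both runs above. One way such a check could be written is a tolerance-based comparison; this is a minimal sketch, assuming an absolute tolerance of `1e-5` (both the tolerance and the helper name `f1_matches_reference` are hypothetical, not taken from the CI code):

```python
import math


def f1_matches_reference(f1_score: float, reference_f1_score: float,
                         abs_tol: float = 1e-5) -> bool:
    """Return True when the measured F1 agrees with the stored reference.

    The reference values are stored with five decimals, so compare with an
    absolute tolerance (assumed value) instead of exact float equality.
    """
    return math.isclose(f1_score, reference_f1_score, rel_tol=0.0, abs_tol=abs_tol)


# Values copied from bert_f1_score_int8.json and bert_f1_score_fp8.json above.
print(f1_matches_reference(94.86666666666667, 94.86666))  # True (int8)
print(f1_matches_reference(95.46666666666665, 95.46667))  # True (fp8)
```

`rel_tol=0.0` keeps the check purely absolute, which matches how the references appear to be rounded or truncated to a fixed number of decimals.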
qgpt_j_compare_result_int8.json: [
{
"index": 0,
"status": "PASS",
"generated_sentence": "Zully Broussard selflessly gives one of her kidneys to a stranger, and it results in six patients receiving transplants.\n\"The ages of the donors and recipients range from 26 to 70,\" California Pacific Medical Center says.",
"reference_sentence": "Zully Broussard selflessly gives one of her kidneys to a stranger, and it results in six patients receiving transplants.\n\"The ages of the donors and recipients range from 26 to 70,\" California Pacific Medical Center says."
},
{
"index": 1,
"status": "DIFFERENT",
"reference_sentence": "The MLS is about to celebrate its 20th season.\nIt has transformed from a fledgling league in 1996 to one of the world's most watched sporting leagues.\nBut the league has struggled to keep talent in the U.S.",
"generated_sentence": "The MLS is about to celebrate its 20th season.\nIt has transformed from a fledgling league into one of the world's most watched sporting leagues.\nBut the league has struggled to keep talent in the U.S.",
"differences": "--- +++ @@ -14,9 +14,7 @@ a fledgling league -in -1996 -to +into one of the"
},
{
"index": 2,
"status": "DIFFERENT",
"reference_sentence": "Bafetimbi Gomis says he is now \"feeling well\" after collapsing during a match.\nGomis fainted during Swansea's 3-2 loss at Tottenham in the Premier League.\nPlay was stopped for about five minutes before resuming.",
"generated_sentence": "Bafetimbi Gomis says he is \"feeling well\" after collapsing during Swansea's match.\nGomis spent the night in hospital as a precaution, the Welsh club said.\nPlay was stopped for five minutes before resuming.",
"differences": "--- +++ @@ -3,31 +3,30 @@ says he is -now \"feeling well\" after collapsing during -a +Swansea's match. Gomis -fainted -during -Swansea's -3-2 -loss -at -Tottenham +spent +the +night in +hospital +as +a +precaution, the -Premier -League. +Welsh +club +said. Play was stopped for -about five minutes before"
},
{
"index": 3,
"status": "PASS",
"generated_sentence": "Rory McIlroy drops his third shot on the eighth hole of the WGC Cadillac Championship.\nThe Northern Irishman hits the 3-iron used to play the offending shot into the water.\nMcIlroy jokes that the club \"must have went a good 60, 70 yards\"",
"reference_sentence": "Rory McIlroy drops his third shot on the eighth hole of the WGC Cadillac Championship.\nThe Northern Irishman hits the 3-iron used to play the offending shot into the water.\nMcIlroy jokes that the club \"must have went a good 60, 70 yards\""
},
{
"index": 4,
"status": "DIFFERENT",
"reference_sentence": "Hundreds of volunteers are searching for Cayman Naib, 13.\nHe was last seen wearing a gray down winter jacket, black ski pants and hiking boots.",
"generated_sentence": "Volunteers pass out fliers, canvass area near Radnor-Wayne, Pennsylvania.\nCayman Naib, 13, was last seen Wednesday.",
"differences": "--- +++ @@ -1,25 +1,16 @@ -Hundreds -of -volunteers -are -searching -for +Volunteers +pass +out +fliers, +canvass +area +near +Radnor-Wayne, +Pennsylvania. Cayman Naib, -13. -He +13, was last seen -wearing -a -gray -down -winter -jacket, -black -ski -pants -and -hiking -boots. +Wednesday."
},
{
"index": 5,
"status": "DIFFERENT",
"reference_sentence": "Curt Schilling recently fired off a series of fastballs at a group of Twitter trolls.\nThey had tweeted vulgar and sexually-explicit comments about his daughter.\nHe tracked them down and posted the images on his blog.",
"generated_sentence": "Curt Schilling recently fired off a series of fastballs at a group of Twitter trolls.\nThey had tweeted vulgar and sexually-explicit comments about his daughter.",
"differences": "--- +++ @@ -23,14 +23,3 @@ about his daughter. -He -tracked -them -down -and -posted -the -images -on -his -blog."
},
{
"index": 6,
"status": "PASS",
"generated_sentence": "Two American women arrested for carving their initials into a wall with a coin inside Rome's Colosseum.\nThe two letters -- J and N -- were about eight inches in length and scratched on a brick wall.",
"reference_sentence": "Two American women arrested for carving their initials into a wall with a coin inside Rome's Colosseum.\nThe two letters -- J and N -- were about eight inches in length and scratched on a brick wall."
},
{
"index": 7,
"status": "DIFFERENT",
"reference_sentence": "Prince and 3rdEyeGirl are bringing the Hit & Run Tour to the U.S. for the first time.\nThe first scheduled show will take place in Louisville, Kentucky.",
"generated_sentence": "Prince and 3rdEyeGirl are bringing the Hit & Run Tour to the U.S. for the first time.\nThe first scheduled show will take place in Louisville, Kentucky.\nPortions of the ticket sales will be donated to various Louisville charities.",
"differences": "--- +++ @@ -25,3 +25,15 @@ in Louisville, Kentucky. +Portions +of +the +ticket +sales +will +be +donated +to +various +Louisville +charities."
},
{
"index": 8,
"status": "DIFFERENT",
"reference_sentence": "One French, one Belgian and three Malians were killed, a hospital official says.\nAn additional eight people were wounded, he says.\nA North African jihadist group claims responsibility for the attack.",
"generated_sentence": "One French, one Belgian and three Malians were killed, a hospital official says.\nA North African jihadist group claims responsibility.\nMali's government calls the shooting a \"criminal and terrorist act\"",
"differences": "--- +++ @@ -11,21 +11,20 @@ hospital official says. -An -additional -eight -people -were -wounded, -he -says. A North African jihadist group claims -responsibility -for +responsibility. +Mali's +government +calls the -attack. +shooting +a +\"criminal +and +terrorist +act\""
},
{
"index": 9,
"status": "PASS",
"generated_sentence": "Manchester United defender Jonny Evans and Newcastle United striker Papiss Cisse charged by FA.\nBoth players accused of spitting during Premier League clash at St James' Park on Wednesday.\nEvans and Cisse have until 6pm GMT on Friday to respond to the charge.",
"reference_sentence": "Manchester United defender Jonny Evans and Newcastle United striker Papiss Cisse charged by FA.\nBoth players accused of spitting during Premier League clash at St James' Park on Wednesday.\nEvans and Cisse have until 6pm GMT on Friday to respond to the charge."
}
]
qgpt_j_compare_result_fp8.json: [
{
"index": 0,
"status": "DIFFERENT",
"reference_sentence": "Zully Broussard selflessly gives one of her kidneys to a stranger, and it results in six patients getting transplants.\nThe chain of surgeries is to be wrapped up Friday.",
"generated_sentence": "Zully Broussard selflessly gives one of her kidneys to a stranger, and it results in six patients getting transplants.\n\"I thought I was going to help this one person who I don't know, but... the fact that so many people can have a life extension\"",
"differences": "--- +++ @@ -17,13 +17,29 @@ patients getting transplants. -The -chain -of -surgeries -is +\"I +thought +I +was +going to -be -wrapped -up -Friday. +help +this +one +person +who +I +don't +know, +but... +the +fact +that +so +many +people +can +have +a +life +extension\""
},
{
"index": 1,
"status": "DIFFERENT",
"reference_sentence": "Major League Soccer prepares to mark the beginning of its 20th season.\nIt has transformed from a fledgling league in 1996 to one of the world's most watched sporting leagues.\nMLS has a salary cap and a number of ways to get around it.",
"generated_sentence": "The MLS is about to enter its 20th season.\nIt has transformed from a fledgling league in 1996 to one of the world's most watched sporting leagues.\nBut the league has struggled to retain talent.",
"differences": "--- +++ @@ -1,12 +1,9 @@ -Major -League -Soccer -prepares +The +MLS +is +about to -mark -the -beginning -of +enter its 20th season. @@ -28,17 +25,11 @@ watched sporting leagues. -MLS +But +the +league has -a -salary -cap -and -a -number -of -ways +struggled to -get -around -it. +retain +talent."
},
{
"index": 2,
"status": "DIFFERENT",
"reference_sentence": "Swansea's Bafetimbi Gomis says he is \"feeling well\" after fainting during a match.\nThe French striker collapsed after about five minutes of treatment.\nGomis spent the night in hospital as a precaution.",
"generated_sentence": "Swansea's Bafetimbi Gomis says he is \"feeling well\" after fainting at Tottenham.\nGomis spent the night in hospital as a precaution, Swansea said.\nPlay was temporarily stopped before resuming.",
"differences": "--- +++ @@ -8,19 +8,8 @@ well\" after fainting -during -a -match. -The -French -striker -collapsed -after -about -five -minutes -of -treatment. +at +Tottenham. Gomis spent the @@ -29,4 +18,12 @@ hospital as a -precaution. +precaution, +Swansea +said. +Play +was +temporarily +stopped +before +resuming."
},
{
"index": 3,
"status": "DIFFERENT",
"reference_sentence": "Rory McIlroy drops his third shot on the eighth hole of the WGC Cadillac Championship.\nThe world's top-ranked golfer launches the 3-iron used for the shot into the water.\nMcIlroy's round of 70 leaves him one-under for the tournament.",
"generated_sentence": "Rory McIlroy drops his third shot on the eighth hole of the WGC Cadillac Championship.\nThe world No.1 launches the 3-iron used to play the offending shot into the water.\nMcIlroy finishes with a second round of 70 to leave him one-under for the tournament.",
"differences": "--- +++ @@ -14,24 +14,30 @@ Cadillac Championship. The -world's -top-ranked -golfer +world +No.1 launches the 3-iron used -for +to +play the +offending shot into the water. -McIlroy's +McIlroy +finishes +with +a +second round of 70 -leaves +to +leave him one-under for"
},
{
"index": 4,
"status": "PASS",
"generated_sentence": "Hundreds of volunteers are searching for an eighth-grader missing since Wednesday.\nCayman Naib, 13, was last seen in Radnor-Wayne, Pennsylvania.",
"reference_sentence": "Hundreds of volunteers are searching for an eighth-grader missing since Wednesday.\nCayman Naib, 13, was last seen in Radnor-Wayne, Pennsylvania."
},
{
"index": 5,
"status": "DIFFERENT",
"reference_sentence": "Curt Schilling fired off a series of fastballs and mowed down a group of Twitter trolls.\nThey made the mistake of tweeting vulgar and sexually-explicit comments about his daughter.",
"generated_sentence": "Curt Schilling recently fired off a series of fastballs at a group of Twitter trolls.\nTheir tweets about his daughter, Gabby, were vulgar and sexually-explicit.",
"differences": "--- +++ @@ -1,29 +1,25 @@ Curt Schilling +recently fired off a series of fastballs -and -mowed -down +at a group of Twitter trolls. -They -made -the -mistake -of -tweeting +Their +tweets +about +his +daughter, +Gabby, +were vulgar and -sexually-explicit -comments -about -his -daughter. +sexually-explicit."
},
{
"index": 6,
"status": "PASS",
"generated_sentence": "Two American women arrested for carving their initials into a wall with a coin inside Rome's Colosseum.\nThe two letters -- J and N -- were about eight inches in length and scratched on a brick wall.",
"reference_sentence": "Two American women arrested for carving their initials into a wall with a coin inside Rome's Colosseum.\nThe two letters -- J and N -- were about eight inches in length and scratched on a brick wall."
},
{
"index": 7,
"status": "PASS",
"generated_sentence": "Prince and 3rdEyeGirl are bringing the Hit & Run Tour to the U.S.\nThe first scheduled show will take place in Louisville, Kentucky.\nPortions of the ticket sales will be donated to various Louisville charities.",
"reference_sentence": "Prince and 3rdEyeGirl are bringing the Hit & Run Tour to the U.S.\nThe first scheduled show will take place in Louisville, Kentucky.\nPortions of the ticket sales will be donated to various Louisville charities."
},
{
"index": 8,
"status": "PASS",
"generated_sentence": "One French, one Belgian and three Malians were killed, a hospital official says.\nAn additional eight people were wounded.\nA North African jihadist group claims responsibility for the attack.",
"reference_sentence": "One French, one Belgian and three Malians were killed, a hospital official says.\nAn additional eight people were wounded.\nA North African jihadist group claims responsibility for the attack."
},
{
"index": 9,
"status": "DIFFERENT",
"reference_sentence": "Manchester United's Jonny Evans and Newcastle's Papiss Cisse charged by FA.\nBoth players accused of spitting during Premier League clash on Wednesday.\nEvans and Cisse have until 6pm GMT on Friday to respond to charge.",
"generated_sentence": "Manchester United defender Jonny Evans and Newcastle United striker Papiss Cisse charged by FA.\nThe pair allegedly spat at each other during Wednesday night's Premier League game.\nBoth players have until 6pm GMT on Friday to respond to the charge.",
"differences": "--- +++ @@ -1,28 +1,32 @@ Manchester -United's +United +defender Jonny Evans and -Newcastle's +Newcastle +United +striker Papiss Cisse charged by FA. +The +pair +allegedly +spat +at +each +other +during +Wednesday +night's +Premier +League +game. Both players -accused -of -spitting -during -Premier -League -clash -on -Wednesday. -Evans -and -Cisse have until 6pm @@ -32,4 +36,5 @@ to respond to +the charge."
}
]
llama2-70b_compare_result_int8.json: [
{
"index": 0,
"status": "DIFFERENT",
"reference_sentence": "To answer this question, we need to analyze the given sentence and determine if it provides enough information to support the conclusion that the woman's purse has red flowers on it.\n\nThe sentence \"A woman with a fairy tattoo on her back is carrying a purse with a red floral print\" contains two parts:\n\n1. The woman has a fairy tattoo on her back.\n2. The woman is carrying a purse with a red floral print.\n\nFrom the first part, we can infer that the woman has a tattoo on her back, but we cannot deduce any information about her purse from this part.\n\nFrom the second part, we know that the woman is carrying a purse with a red floral print. However, this part does not explicitly mention anything about red flowers on the purse.\n\nNow, let's consider the possible interpretations of the phrase \"red floral print\":\n\n1. The print could be a pattern of red flowers on the purse.\n2. The print could be a red background with floral patterns (not necessarily flowers) on the purse.\n3. The print could be a red floral pattern that does not include actual flowers, but rather a design that incorporates red and floral elements.\n\nGiven these possibilities, we cannot conclusively say that the woman's purse has red flowers on it. The sentence does not provide enough information to support this conclusion. Therefore, the answer is:\n\n2. It is not possible to tell.\n\nIn summary, the sentence does not explicitly mention red flowers on the purse, and the phrase \"red floral print\" can be interpreted in various ways that do not necessarily include red flowers. Therefore, we cannot conclude that the woman's purse has red flowers on it.",
"generated_sentence": "To answer this question, we need to analyze the given sentence and determine if it provides enough information to support the conclusion that the woman's purse has red flowers on it.\n\nThe sentence \"A woman with a fairy tattoo on her back is carrying a purse with a red floral print\" contains two parts:\n\n1. The woman has a fairy tattoo on her back.\n2. The woman is carrying a purse with a red floral print.\n\nThe first part of the sentence does not provide any information about the purse or its contents. It only tells us that the woman has a tattoo on her back.\n\nThe second part of the sentence mentions the purse and its red floral print, but it does not explicitly state that the purse has red flowers on it. It only tells us that the purse has a red floral print.\n\nTo conclude that the woman's purse has red flowers on it, we would need additional information that connects the red floral print to the contents of the purse. Without further context or information, we cannot assume that the red floral print is a representation of red flowers inside the purse.\n\n Therefore, the answer is:\n\nIt is not possible to tell.\n\nWe cannot conclude that the woman's purse has red flowers on it based solely on the information provided in the sentence. We need more information to support this conclusion.",
"differences": "--- +++ @@ -74,13 +74,27 @@ red floral print. -From -the +The first -part, -we -can -infer +part +of +the +sentence +does +not +provide +any +information +about +the +purse +or +its +contents. +It +only +tells +us that the woman @@ -89,125 +103,110 @@ tattoo on her -back, +back. +The +second +part +of +the +sentence +mentions +the +purse +and +its +red +floral +print, but -we -cannot -deduce -any -information -about -her -purse -from -this -part. -From -the -second -part, -we -know -that -the -woman -is -carrying -a -purse -with -a -red -floral -print. -However, -this -part +it does not explicitly -mention -anything -about -red -flowers -on +state +that +the +purse +has +red +flowers +on +it. +It +only +tells +us +that +the +purse +has +a +red +floral +print. +To +conclude +that +the +woman's +purse +has +red +flowers +on +it, +we +would +need +additional +information +that +connects +the +red +floral +print +to +the +contents +of the purse. -Now, -let's -consider -the -possible -interpretations -of -the -phrase -\"red -floral -print\": -1. -The -print -could -be -a -pattern -of -red -flowers -on -the -purse. -2. -The -print -could -be -a -red -background -with -floral -patterns -(not -necessarily -flowers) -on -the -purse. -3. -The -print -could -be -a -red -floral -pattern -that -does -not -include -actual -flowers, -but -rather -a -design -that -incorporates -red -and -floral -elements. -Given -these -possibilities, +Without +further +context +or +information, we cannot -conclusively -say +assume +that +the +red +floral +print +is +a +representation +of +red +flowers +inside +the +purse. +Therefore, +the +answer +is: +It +is +not +possible +to +tell. +We +cannot +conclude that the woman's @@ -216,71 +215,21 @@ red flowers on -it. -The -sentence -does -not -provide -enough +it +based +solely +on +the +information +provided +in +the +sentence. +We +need +more information to support this conclusion. -Therefore, -the -answer -is: -2. 
-It -is -not -possible -to -tell. -In -summary, -the -sentence -does -not -explicitly -mention -red -flowers -on -the -purse, -and -the -phrase -\"red -floral -print\" -can -be -interpreted -in -various -ways -that -do -not -necessarily -include -red -flowers. -Therefore, -we -cannot -conclude -that -the -woman's -purse -has -red -flowers -on -it."
}
]
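The `differences` strings appear consistent with a word-level unified diff (`-` words from `reference_sentence`, `+` words from `generated_sentence`, three words of context, hunk headers counted in words) whose lines were collapsed onto a single line. A minimal sketch that reproduces that format with Python's `difflib`, assuming whitespace tokenization and exact string equality for the `PASS`/`DIFFERENT` status; `word_diff` and `compare_record` are hypothetical names, not the CI's actual code:

```python
import difflib


def word_diff(reference: str, generated: str, context: int = 3) -> list[str]:
    """Unified diff over whitespace-separated words.

    '-' lines are words only in the reference, '+' lines are words only in the
    generated text; hunk headers count words, not lines.
    """
    return list(difflib.unified_diff(reference.split(), generated.split(),
                                     n=context, lineterm=""))


def compare_record(index: int, reference: str, generated: str) -> dict:
    """Build one entry shaped like the compare_result_*.json records (assumed logic)."""
    status = "PASS" if generated == reference else "DIFFERENT"
    record = {"index": index, "status": status,
              "reference_sentence": reference, "generated_sentence": generated}
    if status == "DIFFERENT":
        # Join the diff lines, then collapse all whitespace so the result
        # matches the single-line form shown in the report.
        record["differences"] = " ".join(" ".join(word_diff(reference, generated)).split())
    return record


# index 1 of qgpt_j_compare_result_int8.json
REF = ("The MLS is about to celebrate its 20th season.\n"
       "It has transformed from a fledgling league in 1996 to one of the world's most watched sporting leagues.\n"
       "But the league has struggled to keep talent in the U.S.")
GEN = ("The MLS is about to celebrate its 20th season.\n"
       "It has transformed from a fledgling league into one of the world's most watched sporting leagues.\n"
       "But the league has struggled to keep talent in the U.S.")

print(compare_record(1, REF, GEN)["differences"])
# --- +++ @@ -14,9 +14,7 @@ a fledgling league -in -1996 -to +into one of the
```

Under these assumptions the output matches the `differences` field of that record exactly: the hunk starts at word 14 ("a") and covers 9 reference words versus 7 generated words.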
https://github.com/furiosa-ai/inference/actions/runs/10757015122
JSON Content:
llama2-70b_compare_result_fp8.json: [
{
"index": 0,
"status": "DIFFERENT",
"reference_sentence": "The given sentence is: \"A woman with a fairy tattoo on her back is carrying a purse with a red floral print.\"\n\nTo conclude that \"The woman's purse has red flowers on it,\" we need to analyze the given sentence carefully.\n\n1. The sentence mentions a woman with a fairy tattoo on her back. This information does not directly relate to the color or design of her purse.\n2. The sentence mentions that the purse has a red floral print. This implies that the purse has a design or pattern featuring red flowers.\n\nNow, let's consider the options:\n\n1. Yes: This option suggests that we can conclude that the woman's purse has red flowers on it. This is a valid conclusion based on the information provided in the sentence.\n2. It is not possible to tell: This option suggests that we cannot determine whether the woman's purse has red flowers on it or not. However, based on the information provided, we can infer that the purse has a red floral print, which implies that it does have red flowers on it. Therefore, this option is not accurate.\n3. No: This option suggests that the woman's purse does not have red flowers on it. However, this contradicts the information provided in the sentence, which mentions a red floral print on the purse. Therefore, this option is not accurate.\n\nBased on the analysis above, the correct answer is:\n\n1. Yes, we can conclude that the woman's purse has red flowers on it.",
"generated_sentence": "The given sentence is: \"A woman with a fairy tattoo on her back is carrying a purse with a red floral print.\"\n\nTo conclude that \"The woman's purse has red flowers on it,\" we need to analyze the given sentence carefully.\n\n1. The sentence mentions a woman with a fairy tattoo on her back. This information does not directly relate to the color or design of her purse.\n2. The sentence mentions that the purse has a red floral print. This implies that the purse has a design or pattern with red flowers.\n\nNow, let's consider the options:\n\n1. Yes: This option suggests that we can conclude that the woman's purse has red flowers on it. This is a valid conclusion based on the information provided in the sentence.\n2. It is not possible to tell: This option suggests that we cannot determine whether the woman's purse has red flowers on it or not. However, based on the information provided, we can indeed make a conclusion about the purse's design, so this option is incorrect.\n3. No: This option suggests that the woman's purse does not have red flowers on it. However, this contradicts the information provided in the sentence, which states that the purse has a red floral print.\n\nBased on the analysis, the correct answer is:\n\n1. Yes, we can conclude that the woman's purse has red flowers on it.",
"differences": "--- +++ @@ -88,7 +88,7 @@ design or pattern -featuring +with red flowers. Now, @@ -160,31 +160,19 @@ provided, we can -infer -that -the -purse -has -a -red -floral -print, -which -implies -that -it -does -have -red -flowers -on -it. -Therefore, +indeed +make +a +conclusion +about +the +purse's +design, +so this option is -not -accurate. +incorrect. 3. No: This @@ -211,25 +199,19 @@ the sentence, which -mentions +states +that +the +purse +has a red floral -print -on -the -purse. -Therefore, -this -option -is -not -accurate. +print. Based on the -analysis -above, +analysis, the correct answer"
}
]
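To get a quick PASS/DIFFERENT tally per result file, a small helper can count the `status` field of a parsed `compare_result_*.json` array. This is a hypothetical sketch (`summarize` is an illustrative name, not part of the CI); the statuses below are copied in order from `qgpt_j_compare_result_int8.json` above:

```python
from collections import Counter


def summarize(records: list[dict]) -> dict[str, int]:
    """Count how many records carry each status (e.g. PASS, DIFFERENT)."""
    return dict(Counter(record["status"] for record in records))


# Statuses of indices 0-9 in qgpt_j_compare_result_int8.json.
statuses = ["PASS", "DIFFERENT", "DIFFERENT", "PASS", "DIFFERENT",
            "DIFFERENT", "PASS", "DIFFERENT", "DIFFERENT", "PASS"]
records = [{"index": i, "status": s} for i, s in enumerate(statuses)]

print(summarize(records))  # {'PASS': 4, 'DIFFERENT': 6}
```

In practice the list would come from `json.load` on the result file; the file location is not given in the report. The qgpt_j fp8 records above give the same 4/6 split, though the passing indices differ.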
This PR will be updated after the Inference CI is merged.
Updated parts of the code so that it runs against mcp main commit 715e78b.