HW3 env, data and training (eval has problems)
boewoei0123 committed May 17, 2021
1 parent f0439ce commit 38dd821
Showing 20 changed files with 49,311 additions and 16 deletions.
5,494 changes: 5,494 additions & 0 deletions HW3/data/public.jsonl

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions HW3/data/sample_submission.jsonl
@@ -0,0 +1,10 @@
{"title": "Anker新款真無線藍牙耳機Liberty Air 2 Pro 引進台灣市場", "id": "21710"}
{"title": "藍染、客家美食、舊山線自行車 「苗栗一日遊」超人氣美食美景", "id": "21711"}
{"title": "華碩打造對應軍規防護與2 in 1設計的15.6吋Chromebook", "id": "21712"}
{"title": "產業發展變革 台灣的優勢與機會", "id": "21713"}
{"title": "全球Windows 7裝置粗估至少還有1億台以上 市佔率穩穩卡在20%", "id": "21714"}
{"title": "強勢台幣理財攻略", "id": "21715"}
{"title": "「不需治療,只需到台灣!」 美國「哈台馬克杯」賣到缺貨", "id": "21716"}
{"title": "ZenBook Duo 14、ZenBook Pro Duo 15 OLED更新 第二螢幕更方便使用了", "id": "21717"}
{"title": "不出國更要怒吃雞加酒!「周末炸虎俱樂部」限量開賣", "id": "21718"}
{"title": "NBA/紐約記者爆料厄文狀況 「不喜歡奈許、跟KD疏離」", "id": "21719"}
10 changes: 10 additions & 0 deletions HW3/data/sample_test.jsonl

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions HW3/data/split.py
@@ -0,0 +1,14 @@
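import numpy as np

# Fix the seed so the train/validation split is reproducible.
np.random.seed(1114)

with open("train.jsonl", 'r') as f:
    data = f.readlines()

# Shuffle line indices, then hold out the first 20% for validation.
all_indices = np.random.permutation(np.arange(len(data)))
cut = int(len(data) * 0.2)

# The remaining 80% of the lines go to the training split.
with open("train_split.jsonl", 'w') as f:
    for i in all_indices[cut:].tolist():
        f.write(data[i])

# The first `cut` shuffled lines go to the validation split.
with open("valid_split.jsonl", 'w') as f:
    for i in all_indices[:cut].tolist():
        f.write(data[i])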
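
Review note: a quick sanity check on the output of `split.py` could look like the sketch below. It is not part of this commit; the helper name `check_split` and its `valid_fraction` parameter are hypothetical, and it only assumes that the two split files together should contain every line of `train.jsonl` exactly as often as it appears there.

```python
from collections import Counter


def check_split(all_lines, train_lines, valid_lines, valid_fraction=0.2):
    """Return True if train/valid partition all_lines with the expected sizes.

    Hypothetical helper, not part of the commit. It mirrors the
    int(len(data) * 0.2) cut used in split.py.
    """
    expected_valid = int(len(all_lines) * valid_fraction)
    sizes_ok = (
        len(valid_lines) == expected_valid
        and len(train_lines) == len(all_lines) - expected_valid
    )
    # Multiset comparison, so duplicate lines in the source file are tolerated.
    covers_all = Counter(train_lines) + Counter(valid_lines) == Counter(all_lines)
    return sizes_ok and covers_all
```

To use it, read `train.jsonl`, `train_split.jsonl` and `valid_split.jsonl` with `readlines()` and pass the three lists in that order; a `False` result would suggest lost, duplicated, or misplaced lines.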
21,710 changes: 21,710 additions & 0 deletions HW3/data/train.jsonl

Large diffs are not rendered by default.
