
How do I convert BIO-annotated data into the format the model takes as training input? #31

Open
zjdcsu opened this issue May 11, 2024 · 1 comment

Comments


zjdcsu commented May 11, 2024

The common BIO annotation format generally looks like this:
"""
我们 O
看到 O
债券 B-Fin-Concept
市场 I-Fin-Concept
收益率 I-Fin-Concept
相比 O
去年 O
已经 O
有 O
明显 O
的 O
上涨 O
, O
这 O
种 O
情况 O
还 O
会 O
延续 O
。 O
"""
whereas the data format that duie feeds into the model is:
"""
{"id": "AT0001", "text": ["6", "2", "号", "汽", "车", "故", "障", "报", "告", "综", "合", "情", "况", ":", "故", "障", "现", "象", ":", "加", "速", "后", ",", "丢", "开", "油", "门", ",", "发", "动", "机", "熄", "火", "。"], "labels": ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "B-故障设备", "I-故障设备", "I-故障设备", "B-故障原因", "I-故障原因", "O"]}
"""
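In addition, the input format expected by the preprocessing code also differs from the BIO format.
How can I convert BIO-format data into the model's training input format?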
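To make the target format concrete: in each record, `text` is a list of single characters and `labels` is a list of the same length holding one BIO tag per character, so `B-故障设备 I-故障设备 I-故障设备` on 发, 动, 机 marks the span 发动机. The sketch below uses a hypothetical helper, `extract_entities` (not part of this repository), to read such a record back into labelled spans. It assumes strict BIO tagging and is mainly useful for sanity-checking converted data.

```python
import json


def extract_entities(record: dict) -> list[tuple[str, str, int, int]]:
    """Return (entity_text, entity_type, start, end) spans from a char-level BIO record.

    `end` is exclusive. Assumes strict BIO: an entity starts with B-<type>
    and continues with I-<type> of the same type; stray I- tags are ignored.
    """
    chars, labels = record["text"], record["labels"]
    if len(chars) != len(labels):
        raise ValueError(f"{record.get('id')}: {len(chars)} chars but {len(labels)} labels")

    spans = []
    start, etype = None, None
    # A sentinel "O" at the end flushes an entity that runs to the last character.
    for i, tag in enumerate(labels + ["O"]):
        continues = etype is not None and tag == f"I-{etype}"
        if etype is not None and not continues:
            spans.append(("".join(chars[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


if __name__ == "__main__":
    line = ('{"id": "demo", "text": ["发", "动", "机", "熄", "火", "。"], '
            '"labels": ["B-故障设备", "I-故障设备", "I-故障设备", "B-故障原因", "I-故障原因", "O"]}')
    print(extract_entities(json.loads(line)))
    # -> [('发动机', '故障设备', 0, 3), ('熄火', '故障原因', 3, 5)]
```

A length mismatch between `text` and `labels` raises immediately, which is the most common symptom of a conversion that splits tokens and tags differently.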

@taishan1994
Owner

Convert it so that each character maps to exactly one label.
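Following that suggestion, the word-level BIO data in the question can be expanded to one tag per character. For a `B-X` word, the first character keeps `B-X` and the rest get `I-X`. Every character of an `I-X` word gets `I-X`, and `O` words become all `O`. Below is a minimal sketch, not the repository's own preprocessing, under these assumptions: one `token tag` pair per line separated by whitespace, blank lines between sentences, ids generated in the `AT0001` style of the sample, and output as one JSON object per line. All function names are hypothetical. Check the repository's loader to see whether it expects JSON lines or a single JSON array.

```python
import json
from typing import Iterator


def word_tag_to_char_tags(token: str, tag: str) -> list[str]:
    """Expand one word-level BIO tag into one tag per character of the token."""
    if tag == "O":
        return ["O"] * len(token)
    prefix, _, etype = tag.partition("-")  # "B-Fin-Concept" -> ("B", "-", "Fin-Concept")
    if prefix == "B":
        return [tag] + [f"I-{etype}"] * (len(token) - 1)
    if prefix == "I":
        return [tag] * len(token)
    raise ValueError(f"unexpected tag {tag!r}")


def read_bio_sentences(path: str) -> Iterator[list[tuple[str, str]]]:
    """Yield sentences as lists of (token, tag); blank lines separate sentences (assumed)."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if sentence:
                    yield sentence
                    sentence = []
                continue
            token, tag = line.rsplit(maxsplit=1)  # tag is the last whitespace-separated field
            sentence.append((token, tag))
    if sentence:
        yield sentence


def sentence_to_record(sentence: list[tuple[str, str]], idx: int) -> dict:
    """Build one record: a character list plus one label per character."""
    chars, labels = [], []
    for token, tag in sentence:
        chars.extend(token)  # "债券" -> ["债", "券"]; digits like "62" -> ["6", "2"]
        labels.extend(word_tag_to_char_tags(token, tag))
    return {"id": f"AT{idx:04d}", "text": chars, "labels": labels}


def convert(bio_path: str, out_path: str) -> int:
    """Write one JSON record per sentence and return the number of sentences written."""
    n = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for n, sentence in enumerate(read_bio_sentences(bio_path), start=1):
            record = sentence_to_record(sentence, n)
            # ensure_ascii=False keeps Chinese characters readable, as in the sample record
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return n
```

Usage would be something like `convert("train.bio", "train.json")` with your own paths. Applied to the sentence in the question, 债券 becomes `B-Fin-Concept I-Fin-Concept`, and 市场 and 收益率 become all `I-Fin-Concept`, so `text` and `labels` always have the same length.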
