Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialize_row in fm_data_tasks/fm_data_tasks/utils/data_utils.py #9

Open
duongnhatthang opened this issue Apr 12, 2023 · 1 comment

Comments

@duongnhatthang
Copy link

I tried the dry run 10 examples cmd in the readme and got the error:

python3 -m fm_data_tasks.run_inference \
    --dry_run \
    --num_run 10 \
    --k 3 \
    --sample_method random \
    --data_dir data/datasets/entity_matching/structured/Fodors-Zagats
[04/11/23 17:50:21] INFO     2023-04-11 17:50:21,823 [run_inference] [INFO] {                     run_inference.py:145
                                 "data_dir":                                                                          
                             "/Users/thangduong/fm_data_tasks/data/datasets/entity_matching/struc                     
                             tured/Fodors-Zagats",                                                                    
                                 "output_dir": "outputs",                                                             
                                 "cache_name": "sqlite",                                                              
                                 "cache_connection": "fm_data_tasks.sqlite",                                          
                                 "client_name": "openai",                                                             
                                 "client_connection": null,                                                           
                                 "run_tag": "default",                                                                
                                 "overwrite_cache": false,                                                            
                                 "k": 3,                                                                              
                                 "sample_method": "random",                                                           
                                 "seed": 1234,                                                                        
                                 "class_balanced": false,                                                             
                                 "sep_tok": ".",                                                                      
                                 "nan_tok": "nan",                                                                    
                                 "num_run": 10,                                                                       
                                 "num_trials": 1,                                                                     
                                 "num_print": 10,                                                                     
                                 "add_task_instruction": false,                                                       
                                 "task_instruction_idx": 0,                                                           
                                 "do_test": false,                                                                    
                                 "dry_run": true,                                                                     
                                 "stop_token": "\n",                                                                  
                                 "temperature": 0.0,                                                                  
                                 "max_tokens": 3                                                                      
                             }                                                                                        
                    INFO     2023-04-11 17:50:21,827 [data_utils] [INFO] Processing                  data_utils.py:330
                             /Users/thangduong/fm_data_tasks/data/datasets/entity_matching/structure                  
                             d/Fodors-Zagats                                                                          
Traceback (most recent call last):
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/run_inference.py", line 297, in <module>
    main()
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/run_inference.py", line 153, in main
    pd_data_files = data_utils.read_data(
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 463, in read_data
    data_files_sep, label_col = read_raw_data(
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 442, in read_raw_data
    data_files_sep["train"] = read_data_func(train_file)
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 175, in read_blocked_pairs
    merged["text"] = merged.apply(
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/site-packages/pandas/core/frame.py", line 9568, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/site-packages/pandas/core/apply.py", line 764, in apply
    return self.apply_standard()
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/site-packages/pandas/core/apply.py", line 891, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/opt/miniconda3/envs/fm_data_tasks/lib/python3.10/site-packages/pandas/core/apply.py", line 907, in apply_series_generator
    results[i] = self.f(v)
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 176, in <lambda>
    lambda row: serialize_match_pair(
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 56, in serialize_match_pair
    f"{prod_name} A is {serialize_row(row, column_mapA, sep_tok, nan_tok)}."
  File "/Users/thangduong/fm_data_tasks/fm_data_tasks/utils/data_utils.py", line 36, in serialize_row
    row[c_og] = f"{row[c_og].strip()}"
AttributeError: 'int' object has no attribute 'strip'
@nlper-hou
Copy link

这个只需要把代码改一下就行:
原来:row[c_og] = f"{row[c_og].strip()}"
修改后:row[c_og] = f"{str(row[c_og]).strip()}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants