
Hello, a question: when following LoFTR to train on MegaDepth, how should I set the path for the cfg_1513_-1_0.2_0.8_0.15_reduced_v2 file from the train-data folder on the LoFTR cloud drive? #22

Open
chen9run opened this issue Apr 15, 2023 · 15 comments

Comments

@chen9run

Which path should the cfg_1513_-1_0.2_0.8_0.15_reduced_v2 file be placed under? Many thanks!

@Tangshitao
Owner

Tangshitao commented Apr 17, 2023

Can you provide more details of your problems?

@chen9run
Author

chen9run commented Apr 17, 2023

Can you provide more details of your problems?

Many thanks for your reply, Dr. Tang. Below is the error I hit during training. My environment: cuda==10.2, pytorch==1.8.1, pytorch-lightning==1.3.5, python==3.8, torchmetrics==0.6.0, torchvision==0.2.1. Debugging suggests that some images fail to load when the dataset is built; it does not look like a mis-set path to me. Could you help me check whether my dataset is placed correctly or is missing files? Any advice at all would be a great help. Thanks!
Note: the depths under the 0015 and 0022 folders in Undistorted_SfM were copied from the corresponding 0015 and 0022 folders under phoenix/S6/zl548/MegaDepth_v1/.

/home/roane/anaconda3/envs/loftr/bin/python /media/MegaDepth/LoFTR/train.py configs/data/megadepth_trainval_640.py configs/loftr/outdoor/loftr_ds_dense.py --exp_name=outdoor-ds-640-bs=1 --gpus=1 --num_nodes=1 --accelerator=ddp --batch_size=1 --num_workers=16 --pin_memory=true --check_val_every_n_epoch=1 --log_every_n_steps=1 --flush_logs_every_n_steps=1 --limit_val_batches=1. --num_sanity_val_steps=10 --benchmark=True
{'accelerator': 'ddp',
'accumulate_grad_batches': 1,
'amp_backend': 'native',
'amp_level': 'O2',
'auto_lr_find': False,
'auto_scale_batch_size': False,
'auto_select_gpus': False,
'batch_size': 1,
'benchmark': True,
'check_val_every_n_epoch': 1,
'checkpoint_callback': True,
'ckpt_path': None,
'data_cfg_path': 'configs/data/megadepth_trainval_640.py',
'default_root_dir': None,
'deterministic': False,
'disable_ckpt': False,
'distributed_backend': None,
'exp_name': 'outdoor-ds-640-bs=1',
'fast_dev_run': False,
'flush_logs_every_n_steps': 1,
'gpus': 1,
'gradient_clip_algorithm': 'norm',
'gradient_clip_val': 0.0,
'limit_predict_batches': 1.0,
'limit_test_batches': 1.0,
'limit_train_batches': 1.0,
'limit_val_batches': 1.0,
'log_every_n_steps': 1,
'log_gpu_memory': None,
'logger': True,
'main_cfg_path': 'configs/loftr/outdoor/loftr_ds_dense.py',
'max_epochs': None,
'max_steps': None,
'max_time': None,
'min_epochs': None,
'min_steps': None,
'move_metrics_to_cpu': False,
'multiple_trainloader_mode': 'max_size_cycle',
'num_nodes': 1,
'num_processes': 1,
'num_sanity_val_steps': 10,
'num_workers': 1,
'overfit_batches': 0.0,
'parallel_load_data': False,
'pin_memory': True,
'plugins': None,
'precision': 32,
'prepare_data_per_node': True,
'process_position': 0,
'profiler': None,
'profiler_name': None,
'progress_bar_refresh_rate': None,
'reload_dataloaders_every_epoch': False,
'replace_sampler_ddp': True,
'resume_from_checkpoint': None,
'stochastic_weight_avg': False,
'sync_batchnorm': False,
'terminate_on_nan': False,
'tpu_cores': None,
'track_grad_norm': -1,
'truncated_bptt_steps': None,
'val_check_interval': 1.0,
'weights_save_path': None,
'weights_summary': 'top'}
Global seed set to 66
2023-04-17 14:43:41.645 | INFO | main:main:82 - LoFTR LightningModule initialized!
2023-04-17 14:43:41.646 | INFO | main:main:86 - LoFTR DataModule initialized!
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
2023-04-17 14:43:42.517 | INFO | main:main:117 - Trainer initialized!
2023-04-17 14:43:42.517 | INFO | main:main:118 - Start training!
Global seed set to 66
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
2023-04-17 14:43:43.911 | INFO | src.lightning.data:setup:118 - [rank:0] world_size: 1
2023-04-17 14:43:43.912 | INFO | src.lightning.data:_setup_dataset:184 - [rank 0]: 368 scene(s) assigned.
[rank:0] loading train datasets: 100%|██████████| 368/368 [00:18<00:00, 20.15it/s]
2023-04-17 14:44:02.172 | INFO | src.lightning.data:_setup_dataset:184 - [rank 0]: 5 scene(s) assigned.
[rank:0] loading val datasets: 100%|██████████| 5/5 [00:00<00:00, 625.44it/s]
2023-04-17 14:44:02.180 | INFO | src.lightning.data:setup:156 - [rank:0] Train & Val Dataset loaded!
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

| Name                                    | Type                    | Params

0 | matcher | LoFTR | 11.6 M
1 | matcher.backbone | ResNetFPN_8_2 | 5.9 M
2 | matcher.backbone.conv1 | Conv2d | 6.3 K
3 | matcher.backbone.bn1 | SyncBatchNorm | 256
4 | matcher.backbone.relu | ReLU | 0
5 | matcher.backbone.layer1 | Sequential | 590 K
6 | matcher.backbone.layer1.0 | BasicBlock | 295 K
7 | matcher.backbone.layer1.0.conv1 | Conv2d | 147 K
8 | matcher.backbone.layer1.0.conv2 | Conv2d | 147 K
9 | matcher.backbone.layer1.0.bn1 | SyncBatchNorm | 256
10 | matcher.backbone.layer1.0.bn2 | SyncBatchNorm | 256
11 | matcher.backbone.layer1.0.relu | ReLU | 0
12 | matcher.backbone.layer1.1 | BasicBlock | 295 K
13 | matcher.backbone.layer1.1.conv1 | Conv2d | 147 K
14 | matcher.backbone.layer1.1.conv2 | Conv2d | 147 K
15 | matcher.backbone.layer1.1.bn1 | SyncBatchNorm | 256
16 | matcher.backbone.layer1.1.bn2 | SyncBatchNorm | 256
17 | matcher.backbone.layer1.1.relu | ReLU | 0
18 | matcher.backbone.layer2 | Sequential | 1.3 M
19 | matcher.backbone.layer2.0 | BasicBlock | 597 K
20 | matcher.backbone.layer2.0.conv1 | Conv2d | 225 K
21 | matcher.backbone.layer2.0.conv2 | Conv2d | 345 K
22 | matcher.backbone.layer2.0.bn1 | SyncBatchNorm | 392
23 | matcher.backbone.layer2.0.bn2 | SyncBatchNorm | 392
24 | matcher.backbone.layer2.0.relu | ReLU | 0
25 | matcher.backbone.layer2.0.downsample | Sequential | 25.5 K
26 | matcher.backbone.layer2.0.downsample.0 | Conv2d | 25.1 K
27 | matcher.backbone.layer2.0.downsample.1 | SyncBatchNorm | 392
28 | matcher.backbone.layer2.1 | BasicBlock | 692 K
29 | matcher.backbone.layer2.1.conv1 | Conv2d | 345 K
30 | matcher.backbone.layer2.1.conv2 | Conv2d | 345 K
31 | matcher.backbone.layer2.1.bn1 | SyncBatchNorm | 392
32 | matcher.backbone.layer2.1.bn2 | SyncBatchNorm | 392
33 | matcher.backbone.layer2.1.relu | ReLU | 0
34 | matcher.backbone.layer3 | Sequential | 2.3 M
35 | matcher.backbone.layer3.0 | BasicBlock | 1.1 M
36 | matcher.backbone.layer3.0.conv1 | Conv2d | 451 K
37 | matcher.backbone.layer3.0.conv2 | Conv2d | 589 K
38 | matcher.backbone.layer3.0.bn1 | SyncBatchNorm | 512
39 | matcher.backbone.layer3.0.bn2 | SyncBatchNorm | 512
40 | matcher.backbone.layer3.0.relu | ReLU | 0
41 | matcher.backbone.layer3.0.downsample | Sequential | 50.7 K
42 | matcher.backbone.layer3.0.downsample.0 | Conv2d | 50.2 K
43 | matcher.backbone.layer3.0.downsample.1 | SyncBatchNorm | 512
44 | matcher.backbone.layer3.1 | BasicBlock | 1.2 M
45 | matcher.backbone.layer3.1.conv1 | Conv2d | 589 K
46 | matcher.backbone.layer3.1.conv2 | Conv2d | 589 K
47 | matcher.backbone.layer3.1.bn1 | SyncBatchNorm | 512
48 | matcher.backbone.layer3.1.bn2 | SyncBatchNorm | 512
49 | matcher.backbone.layer3.1.relu | ReLU | 0
50 | matcher.backbone.layer3_outconv | Conv2d | 65.5 K
51 | matcher.backbone.layer2_outconv | Conv2d | 50.2 K
52 | matcher.backbone.layer2_outconv2 | Sequential | 1.0 M
53 | matcher.backbone.layer2_outconv2.0 | Conv2d | 589 K
54 | matcher.backbone.layer2_outconv2.1 | SyncBatchNorm | 512
55 | matcher.backbone.layer2_outconv2.2 | LeakyReLU | 0
56 | matcher.backbone.layer2_outconv2.3 | Conv2d | 451 K
57 | matcher.backbone.layer1_outconv | Conv2d | 25.1 K
58 | matcher.backbone.layer1_outconv2 | Sequential | 571 K
59 | matcher.backbone.layer1_outconv2.0 | Conv2d | 345 K
60 | matcher.backbone.layer1_outconv2.1 | SyncBatchNorm | 392
61 | matcher.backbone.layer1_outconv2.2 | LeakyReLU | 0
62 | matcher.backbone.layer1_outconv2.3 | Conv2d | 225 K
63 | matcher.pos_encoding | PositionEncodingSine | 0
64 | matcher.loftr_coarse | LocalFeatureTransformer | 5.3 M
65 | matcher.loftr_coarse.layers | ModuleList | 5.3 M
66 | matcher.loftr_coarse.layers.0 | LoFTREncoderLayer | 656 K
67 | matcher.loftr_coarse.layers.0.q_proj | Linear | 65.5 K
68 | matcher.loftr_coarse.layers.0.k_proj | Linear | 65.5 K
69 | matcher.loftr_coarse.layers.0.v_proj | Linear | 65.5 K
70 | matcher.loftr_coarse.layers.0.attention | LinearAttention | 0
71 | matcher.loftr_coarse.layers.0.merge | Linear | 65.5 K
72 | matcher.loftr_coarse.layers.0.mlp | Sequential | 393 K
73 | matcher.loftr_coarse.layers.0.mlp.0 | Linear | 262 K
74 | matcher.loftr_coarse.layers.0.mlp.1 | ReLU | 0
75 | matcher.loftr_coarse.layers.0.mlp.2 | Linear | 131 K
76 | matcher.loftr_coarse.layers.0.norm1 | LayerNorm | 512
77 | matcher.loftr_coarse.layers.0.norm2 | LayerNorm | 512
78 | matcher.loftr_coarse.layers.1 | LoFTREncoderLayer | 656 K
79 | matcher.loftr_coarse.layers.1.q_proj | Linear | 65.5 K
80 | matcher.loftr_coarse.layers.1.k_proj | Linear | 65.5 K
81 | matcher.loftr_coarse.layers.1.v_proj | Linear | 65.5 K
82 | matcher.loftr_coarse.layers.1.attention | LinearAttention | 0
83 | matcher.loftr_coarse.layers.1.merge | Linear | 65.5 K
84 | matcher.loftr_coarse.layers.1.mlp | Sequential | 393 K
85 | matcher.loftr_coarse.layers.1.mlp.0 | Linear | 262 K
86 | matcher.loftr_coarse.layers.1.mlp.1 | ReLU | 0
87 | matcher.loftr_coarse.layers.1.mlp.2 | Linear | 131 K
88 | matcher.loftr_coarse.layers.1.norm1 | LayerNorm | 512
89 | matcher.loftr_coarse.layers.1.norm2 | LayerNorm | 512
90 | matcher.loftr_coarse.layers.2 | LoFTREncoderLayer | 656 K
91 | matcher.loftr_coarse.layers.2.q_proj | Linear | 65.5 K
92 | matcher.loftr_coarse.layers.2.k_proj | Linear | 65.5 K
93 | matcher.loftr_coarse.layers.2.v_proj | Linear | 65.5 K
94 | matcher.loftr_coarse.layers.2.attention | LinearAttention | 0
95 | matcher.loftr_coarse.layers.2.merge | Linear | 65.5 K
96 | matcher.loftr_coarse.layers.2.mlp | Sequential | 393 K
97 | matcher.loftr_coarse.layers.2.mlp.0 | Linear | 262 K
98 | matcher.loftr_coarse.layers.2.mlp.1 | ReLU | 0
99 | matcher.loftr_coarse.layers.2.mlp.2 | Linear | 131 K
100 | matcher.loftr_coarse.layers.2.norm1 | LayerNorm | 512
101 | matcher.loftr_coarse.layers.2.norm2 | LayerNorm | 512
102 | matcher.loftr_coarse.layers.3 | LoFTREncoderLayer | 656 K
103 | matcher.loftr_coarse.layers.3.q_proj | Linear | 65.5 K
104 | matcher.loftr_coarse.layers.3.k_proj | Linear | 65.5 K
105 | matcher.loftr_coarse.layers.3.v_proj | Linear | 65.5 K
106 | matcher.loftr_coarse.layers.3.attention | LinearAttention | 0
107 | matcher.loftr_coarse.layers.3.merge | Linear | 65.5 K
108 | matcher.loftr_coarse.layers.3.mlp | Sequential | 393 K
109 | matcher.loftr_coarse.layers.3.mlp.0 | Linear | 262 K
110 | matcher.loftr_coarse.layers.3.mlp.1 | ReLU | 0
111 | matcher.loftr_coarse.layers.3.mlp.2 | Linear | 131 K
112 | matcher.loftr_coarse.layers.3.norm1 | LayerNorm | 512
113 | matcher.loftr_coarse.layers.3.norm2 | LayerNorm | 512
114 | matcher.loftr_coarse.layers.4 | LoFTREncoderLayer | 656 K
115 | matcher.loftr_coarse.layers.4.q_proj | Linear | 65.5 K
116 | matcher.loftr_coarse.layers.4.k_proj | Linear | 65.5 K
117 | matcher.loftr_coarse.layers.4.v_proj | Linear | 65.5 K
118 | matcher.loftr_coarse.layers.4.attention | LinearAttention | 0
119 | matcher.loftr_coarse.layers.4.merge | Linear | 65.5 K
120 | matcher.loftr_coarse.layers.4.mlp | Sequential | 393 K
121 | matcher.loftr_coarse.layers.4.mlp.0 | Linear | 262 K
122 | matcher.loftr_coarse.layers.4.mlp.1 | ReLU | 0
123 | matcher.loftr_coarse.layers.4.mlp.2 | Linear | 131 K
124 | matcher.loftr_coarse.layers.4.norm1 | LayerNorm | 512
125 | matcher.loftr_coarse.layers.4.norm2 | LayerNorm | 512
126 | matcher.loftr_coarse.layers.5 | LoFTREncoderLayer | 656 K
127 | matcher.loftr_coarse.layers.5.q_proj | Linear | 65.5 K
128 | matcher.loftr_coarse.layers.5.k_proj | Linear | 65.5 K
129 | matcher.loftr_coarse.layers.5.v_proj | Linear | 65.5 K
130 | matcher.loftr_coarse.layers.5.attention | LinearAttention | 0
131 | matcher.loftr_coarse.layers.5.merge | Linear | 65.5 K
132 | matcher.loftr_coarse.layers.5.mlp | Sequential | 393 K
133 | matcher.loftr_coarse.layers.5.mlp.0 | Linear | 262 K
134 | matcher.loftr_coarse.layers.5.mlp.1 | ReLU | 0
135 | matcher.loftr_coarse.layers.5.mlp.2 | Linear | 131 K
136 | matcher.loftr_coarse.layers.5.norm1 | LayerNorm | 512
137 | matcher.loftr_coarse.layers.5.norm2 | LayerNorm | 512
138 | matcher.loftr_coarse.layers.6 | LoFTREncoderLayer | 656 K
139 | matcher.loftr_coarse.layers.6.q_proj | Linear | 65.5 K
140 | matcher.loftr_coarse.layers.6.k_proj | Linear | 65.5 K
141 | matcher.loftr_coarse.layers.6.v_proj | Linear | 65.5 K
142 | matcher.loftr_coarse.layers.6.attention | LinearAttention | 0
143 | matcher.loftr_coarse.layers.6.merge | Linear | 65.5 K
144 | matcher.loftr_coarse.layers.6.mlp | Sequential | 393 K
145 | matcher.loftr_coarse.layers.6.mlp.0 | Linear | 262 K
146 | matcher.loftr_coarse.layers.6.mlp.1 | ReLU | 0
147 | matcher.loftr_coarse.layers.6.mlp.2 | Linear | 131 K
148 | matcher.loftr_coarse.layers.6.norm1 | LayerNorm | 512
149 | matcher.loftr_coarse.layers.6.norm2 | LayerNorm | 512
150 | matcher.loftr_coarse.layers.7 | LoFTREncoderLayer | 656 K
151 | matcher.loftr_coarse.layers.7.q_proj | Linear | 65.5 K
152 | matcher.loftr_coarse.layers.7.k_proj | Linear | 65.5 K
153 | matcher.loftr_coarse.layers.7.v_proj | Linear | 65.5 K
154 | matcher.loftr_coarse.layers.7.attention | LinearAttention | 0
155 | matcher.loftr_coarse.layers.7.merge | Linear | 65.5 K
156 | matcher.loftr_coarse.layers.7.mlp | Sequential | 393 K
157 | matcher.loftr_coarse.layers.7.mlp.0 | Linear | 262 K
158 | matcher.loftr_coarse.layers.7.mlp.1 | ReLU | 0
159 | matcher.loftr_coarse.layers.7.mlp.2 | Linear | 131 K
160 | matcher.loftr_coarse.layers.7.norm1 | LayerNorm | 512
161 | matcher.loftr_coarse.layers.7.norm2 | LayerNorm | 512
162 | matcher.coarse_matching | CoarseMatching | 0
163 | matcher.fine_preprocess | FinePreprocess | 65.8 K
164 | matcher.fine_preprocess.down_proj | Linear | 32.9 K
165 | matcher.fine_preprocess.merge_feat | Linear | 32.9 K
166 | matcher.loftr_fine | LocalFeatureTransformer | 328 K
167 | matcher.loftr_fine.layers | ModuleList | 328 K
168 | matcher.loftr_fine.layers.0 | LoFTREncoderLayer | 164 K
169 | matcher.loftr_fine.layers.0.q_proj | Linear | 16.4 K
170 | matcher.loftr_fine.layers.0.k_proj | Linear | 16.4 K
171 | matcher.loftr_fine.layers.0.v_proj | Linear | 16.4 K
172 | matcher.loftr_fine.layers.0.attention | LinearAttention | 0
173 | matcher.loftr_fine.layers.0.merge | Linear | 16.4 K
174 | matcher.loftr_fine.layers.0.mlp | Sequential | 98.3 K
175 | matcher.loftr_fine.layers.0.mlp.0 | Linear | 65.5 K
176 | matcher.loftr_fine.layers.0.mlp.1 | ReLU | 0
177 | matcher.loftr_fine.layers.0.mlp.2 | Linear | 32.8 K
178 | matcher.loftr_fine.layers.0.norm1 | LayerNorm | 256
179 | matcher.loftr_fine.layers.0.norm2 | LayerNorm | 256
180 | matcher.loftr_fine.layers.1 | LoFTREncoderLayer | 164 K
181 | matcher.loftr_fine.layers.1.q_proj | Linear | 16.4 K
182 | matcher.loftr_fine.layers.1.k_proj | Linear | 16.4 K
183 | matcher.loftr_fine.layers.1.v_proj | Linear | 16.4 K
184 | matcher.loftr_fine.layers.1.attention | LinearAttention | 0
185 | matcher.loftr_fine.layers.1.merge | Linear | 16.4 K
186 | matcher.loftr_fine.layers.1.mlp | Sequential | 98.3 K
187 | matcher.loftr_fine.layers.1.mlp.0 | Linear | 65.5 K
188 | matcher.loftr_fine.layers.1.mlp.1 | ReLU | 0
189 | matcher.loftr_fine.layers.1.mlp.2 | Linear | 32.8 K
190 | matcher.loftr_fine.layers.1.norm1 | LayerNorm | 256
191 | matcher.loftr_fine.layers.1.norm2 | LayerNorm | 256
192 | matcher.fine_matching | FineMatching | 0
193 | loss | LoFTRLoss | 0

11.6 M Trainable params
0 Non-trainable params
11.6 M Total params
46.246 Total estimated model params size (MB)
2023-04-17 14:44:02.262 | INFO | src.lightning.data:val_dataloader:302 - [rank:0/1]: Val Sampler and DataLoader re-init.
/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 16 which is the number of cpus on this machine) in the DataLoader init to improve performance.
warnings.warn(*args, **kwargs)
Validation sanity check: 100%|██████████| 10/10 [00:02<00:00, 6.09it/s]
2023-04-17 14:44:05.167 | INFO | src.utils.metrics:aggregate_metrics:182 - Aggregating metrics over 10 unique items...
Global seed set to 66
2023-04-17 14:44:05.257 | INFO | src.lightning.data:train_dataloader:289 - [rank:0/1]: Train Sampler and DataLoader re-init (should not re-init between epochs!).
/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 16 which is the number of cpus on this machine) in the DataLoader init to improve performance.
warnings.warn(*args, **kwargs)
Epoch 0: 0%| | 0/38300 [00:00<?, ?it/s] Traceback (most recent call last):
File "/media/MegaDepth/LoFTR/train.py", line 123, in <module>
main()
File "/media/MegaDepth/LoFTR/train.py", line 119, in main
trainer.fit(model, datamodule=data_module)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
return self.run_train()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
self.train_loop.run_training_epoch()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 491, in run_training_epoch
for batch_idx, (batch, is_last_batch) in train_dataloader:
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/profiler/profilers.py", line 112, in profile_iterable
value = next(iterator)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 530, in prefetch_iterator
last = next(it)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 464, in __next__
return self.request_next_batch(self.loader_iters)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 478, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 85, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/roane/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 219, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/media/MegaDepth/LoFTR/src/datasets/megadepth.py", line 75, in __getitem__
image0, mask0, scale0 = read_megadepth_gray(
File "/media/MegaDepth/LoFTR/src/utils/dataset.py", line 109, in read_megadepth_gray
w, h = image.shape[1], image.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'

Epoch 0: 0%| | 0/38300 [00:00<?, ?it/s]

Process finished with exit code 1
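The `'NoneType' object has no attribute 'shape'` error above is what you get when `cv2.imread` returns None, i.e. an image path listed in a scene-info file does not exist on disk. A small up-front check along these lines can list the missing files (`find_missing_images` is a hypothetical helper, not part of LoFTR; it only assumes the `image_paths` key that `megadepth.py` reads from the scene-info `.npz` files):

```python
# Hypothetical sanity check: report image paths referenced by a scene-info
# .npz file that are missing on disk. For such paths cv2.imread returns None,
# which later crashes as "'NoneType' object has no attribute 'shape'".
import os
import numpy as np

def find_missing_images(npz_path, data_root):
    """Return relative paths from `image_paths` that do not exist under data_root."""
    scene_info = np.load(npz_path, allow_pickle=True)
    missing = []
    for rel_path in scene_info["image_paths"]:
        if rel_path is None:  # some entries may be unset
            continue
        if not os.path.isfile(os.path.join(data_root, rel_path)):
            missing.append(rel_path)
    return missing
```

Running this over every `.npz` in the scene-info index against your dataset root should pinpoint whether (and where) the dataset is incomplete before training starts.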
[Screenshots attached: dataset directory structure and error output.]
If possible, Dr. Tang, could you leave a contact method?

@Tangshitao
Owner

Check whether that image actually exists at the path used in the code.

@chen9run
Author

Can you provide more details of your problems?

Check whether that image actually exists at the path used in the code.
The image is not at that path; the Undistorted_SfM/ directory only contains the 0015 and 0022 folders. So I suspect my dataset is incomplete, but the megadepth_test_1500/Undistorted_SfM/ downloaded following the TRAIN readme also only contains the 0015 and 0022 folders.

@Tangshitao
Owner

I haven't used this dataset in a long time; please ask under the LoFTR GitHub repo.

@chen9run
Author

I haven't used this dataset in a long time; please ask under the LoFTR GitHub repo.

OK, thank you.

@ztqyang

ztqyang commented May 20, 2023

Hello, may I ask whether you solved this problem? I ran into the same issue but don't know how to fix it. Many thanks.

@chen9run
Author

chen9run commented Jun 2, 2023

Hello, may I ask whether you solved this problem? I ran into the same issue but don't know how to fix it. Many thanks.

It was a dataset problem; you need to download the dataset yourself from the official MegaDepth website.

@chen9run
Author

chen9run commented Jul 7, 2023

I haven't used this dataset in a long time; please ask under the LoFTR GitHub repo.

Hello, I modified LoFTR's backbone and fine-tuned the subsequent coarse-matching and fine-matching stages.
The validation sanity check (100%|██████████| 10/10) completes, but right after it, inside the for loop at line 943 of trainer.py:
for dataloader_idx, dataloader in enumerate(dataloaders):
at line 953:
# stop short when running on limited batches
if batch_idx >= dl_max_batches:
break
when batch_idx reaches 10 and the loop breaks, this error is raised immediately:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
After checking several times we still cannot locate the exact source of the error (epoch 0 starts right after the break, and we don't know where execution goes after it).

Could you give us your judgment based on your experience?

Below is the full error output:

2023-07-07 22:33:26.279 | INFO | src.lightning.data:val_dataloader:302 - [rank:0/1]: Val Sampler and DataLoader re-init.
Validation sanity check: 100%|██████████| 10/10 [00:04<00:00, 4.14it/s]
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f29b4c23ca0>
Traceback (most recent call last):
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1324, in __del__
self._shutdown_workers()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1297, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
if not wait([self.sentinel], timeout):
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/multiprocessing/connection.py", line 931, in wait
ready = selector.select(timeout)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt:
2023-07-07 22:33:31.018 | INFO | src.utils.metrics:aggregate_metrics:182 - Aggregating metrics over 10 unique items...
Global seed set to 66
Training: 0it [00:00, ?it/s]2023-07-07 22:33:31.313 | INFO | src.lightning.data:train_dataloader:289 - [rank:0/1]: Train Sampler and DataLoader re-init (should not re-init between epochs!).
Epoch 0: 0%| | 1/38300 [00:01<20:11:26, 1.90s/it, loss=4.06, v_num=11]Traceback (most recent call last):
File "/home/shy/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/home/shy/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/shy/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/shy/pycharm-community-2019.1.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/2TB/LoFTR-master/train.py", line 123, in <module>
main()
File "/2TB/LoFTR-master/train.py", line 119, in main
trainer.fit(model, datamodule=data_module)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
return self.run_train()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
self.train_loop.run_training_epoch()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
model_ref.optimizer_step(
File "/2TB/LoFTR-master/src/lightning/lightning_loftr.py", line 83, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
optimizer.step(closure=lambda_closure, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/optim/optimizer.py", line 89, in wrapper
return func(*args, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/optim/adamw.py", line 65, in step
loss = closure()
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 732, in train_step_and_backward_closure
result = self.training_step_and_backward(
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
training_step_output = self.trainer.accelerator.training_step(args)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
return self.training_type_plugin.training_step(*args)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 319, in training_step
return self.model(*args, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shy/anaconda3/envs/loftr1/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 692, in forward
if self.reducer._rebuild_buckets():
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

@Tangshitao
Owner

Set find_unused_parameters=True in the pytorch lightning arguments.
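For pytorch-lightning 1.3.x, one way to apply this is through the DDP plugin when constructing the Trainer (a configuration sketch only; in LoFTR the Trainer is built in train.py, and the other arguments shown here are illustrative):

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin

# find_unused_parameters=True lets DDP skip gradient reduction for parameters
# that did not contribute to the loss (common after modifying the backbone).
trainer = pl.Trainer(
    gpus=1,
    accelerator="ddp",
    plugins=DDPPlugin(find_unused_parameters=True),
)
```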

@chen9run
Author

chen9run commented Jul 7, 2023

Set find_unused_parameters=True in the pytorch lightning arguments.

Thank you, that solved it.

@chen9run
Author

Set find_unused_parameters=True in the pytorch lightning arguments.

Dear Dr. Tang, I'd like to ask about a warning I ran into here:


No groundtruth coarse match found for: [['Undistorted_SfM/0285/images/2398430131_75b193e2b3_o.jpg'], ['Undistorted_SfM/0285/images/3390973770_759850a96e_o.jpg']],

Any insight you have would be a great help. Many thanks.

@Yun-chat

Hello, may I ask whether you solved this problem? I ran into the same issue but don't know how to fix it. Many thanks.

It was a dataset problem; you need to download the dataset yourself from the official MegaDepth website.

Hello, did you manage to solve the dataset problem? I'm still confused about how to download and set up the dataset!

@panjiadipanjiadi

Hello, may I ask whether you solved this problem? I ran into the same issue but don't know how to fix it. Many thanks.

It was a dataset problem; you need to download the dataset yourself from the official MegaDepth website.

Hello, may I ask whether you downloaded the 199 GB MegaDepth v1 dataset?

@LuoXubo

LuoXubo commented Oct 25, 2024

Hello, may I ask whether you solved this problem? I ran into the same issue but don't know how to fix it. Many thanks.

It was a dataset problem; you need to download the dataset yourself from the official MegaDepth website.

Hello, may I ask whether you downloaded the 199 GB MegaDepth v1 dataset?

I downloaded that 199 GB dataset, but its directory structure is /MegaDepth_v1/phoenix/S6/zl548, with 195 scene folders under that directory.
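For reference, the raw downloads are usually linked into the tree that LoFTR's MegaDepth dataset class reads from. A sketch with placeholder source paths (verify the exact layout against LoFTR's docs/TRAINING.md before relying on it):

```python
# Sketch: link the MegaDepth v1 download (which unpacks to
# phoenix/S6/zl548/MegaDepth_v1/<scene>) and the undistorted SfM images
# under data/megadepth/train, since the dataset class joins relative paths
# like "Undistorted_SfM/0015/images/x.jpg" onto that root.
import os

MEGADEPTH_DOWNLOAD = "/path/to/megadepth_v1_download"  # placeholder: contains phoenix/
UNDISTORTED_SFM = "/path/to/Undistorted_SfM"           # placeholder: undistorted images

root = "data/megadepth/train"
os.makedirs(root, exist_ok=True)
os.symlink(os.path.join(MEGADEPTH_DOWNLOAD, "phoenix"), os.path.join(root, "phoenix"))
os.symlink(UNDISTORTED_SFM, os.path.join(root, "Undistorted_SfM"))
```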
