Tips for multi-room scenes #7
Replies: 8 comments
-
The model does not process images in any particular order, so changing the order should not change the result. Let me check whether this is caused by a bug in the demo. Can you share the raw data for your 20 images, for example a link to them?
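One way to test the order claim independently of the demo is a small harness like the one below. It is only a sketch: `check_order_invariance`, `toy_model`, and `order_dependent_model` are hypothetical names, and the two stand-in models return one number per image rather than real point maps. If the real outputs are expressed relative to a reference view, they would need to be aligned before comparing.

```python
import math

def check_order_invariance(model_fn, images, tol=1e-9):
    """Return True if per-image outputs are unchanged when the input order is reversed."""
    baseline = model_fn(images)
    order = list(reversed(range(len(images))))
    permuted_out = model_fn([images[i] for i in order])
    # permuted_out[k] belongs to original image order[k]; map it back.
    restored = [None] * len(images)
    for k, i in enumerate(order):
        restored[i] = permuted_out[k]
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(baseline, restored))

def toy_model(images):
    """Stand-in: each output depends on its image and on the whole set, not on position."""
    avg = sum(images) / len(images)
    return [x - avg for x in images]

def order_dependent_model(images):
    """Stand-in that treats the first input specially, so order matters."""
    return [x - images[0] for x in images]

images = [1.0, 2.0, 4.0, 8.0]  # placeholder "images"
print(check_order_invariance(toy_model, images))
print(check_order_invariance(order_dependent_model, images))
```

Reversal is used instead of a random shuffle so the check is deterministic and always moves the first image.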
-
Hi, this is my image collection.
-
I tried the MVD pretrained weights, and the results look much better. It's weird that MVD is better than MVD++:
-
I'll push the view selection code change today.
-
Thanks for your quick response. I'll try your new strategy. I also noticed that the camera poses do not seem very accurate. I also tried mast3r-sfm, which uses mast3r for feature matching and colmap as the backend to calibrate the images. The point cloud and camera poses are much better and more robust (though it's a lot slower). I think the main problem might be the lack of global bundle adjustment. In complex scenes the error accumulates, so some global optimization or loop closure may be needed to eliminate the accumulated error on custom datasets.
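To illustrate the accumulation argument, here is a toy 2D sketch (translations only, no rotations). It is not bundle adjustment and not what mast3r-sfm or colmap do internally; `chain_poses`, `close_loop`, and the constant per-step bias are assumptions made up for the example. Each relative step carries a small bias, the chained positions drift, and a simple loop-closure correction spreads the end-to-start gap back over the trajectory.

```python
import math

def chain_poses(steps):
    """Accumulate relative 2D translations into absolute positions, starting at the origin."""
    positions = [(0.0, 0.0)]
    for dx, dy in steps:
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions

def close_loop(positions):
    """Toy loop closure: the last pose should coincide with the first, so spread
    the end-to-start gap linearly over the trajectory."""
    n = len(positions) - 1
    gap_x = positions[-1][0] - positions[0][0]
    gap_y = positions[-1][1] - positions[0][1]
    return [(x - gap_x * i / n, y - gap_y * i / n) for i, (x, y) in enumerate(positions)]

def max_error(estimate, reference):
    """Largest Euclidean distance between matching positions."""
    return max(math.dist(a, b) for a, b in zip(estimate, reference))

# A square loop: 4 sides of 5 unit steps; every step carries a small constant bias.
true_steps = [(1, 0)] * 5 + [(0, 1)] * 5 + [(-1, 0)] * 5 + [(0, -1)] * 5
bias = (0.02, 0.01)
noisy_steps = [(dx + bias[0], dy + bias[1]) for dx, dy in true_steps]

truth = chain_poses(true_steps)
drifted = chain_poses(noisy_steps)
corrected = close_loop(drifted)

print("max error without loop closure:", max_error(drifted, truth))
print("max error with loop closure:", max_error(corrected, truth))
```

The bias here is constant, which is why the linear correction removes it almost exactly. With real noise and rotations the residual would not vanish like this, and a proper global optimization would be needed.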
-
Thanks for your suggestion. I believe extra global optimization can improve performance. dust3r and mast3r do better when the local alignment is correct, but their results are sometimes totally ruined by wrong matches, which global optimization cannot fix. Our method can put views in a coarsely correct position, but yeah, the result looks jittery and blurred on edges. If you would like, you can combine both sequentially and get a result better than either alone while not being too slow.
-
Hi, I've tried your mvd plus stage 2 model. My dataset is a scene of two connected rooms. I extracted 20 sequential images from the scene and found the results were not as good as in the paper. Here are the confidence maps, which might be helpful for your debugging:
The first frame:
The last frame:
The confidence maps for the first 10 frames look normal, but those for the last 10 frames do not look good. The reconstructed point cloud further confirms this: the first 10 frames have good shapes, while the last 10 frames are messy.
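To put a number on this rather than judging the maps by eye, a per-frame summary along these lines could be used. This is a minimal sketch: it assumes each confidence map is available as a 2D list of floats, and `frame_confidence`, `flag_low_confidence_frames`, and the `ratio` threshold are hypothetical, not part of the released code.

```python
from statistics import mean, median

def frame_confidence(conf_map):
    """Mean confidence over all pixels of one frame (conf_map is a 2D list of floats)."""
    return mean(v for row in conf_map for v in row)

def flag_low_confidence_frames(conf_maps, ratio=0.5):
    """Return indices of frames whose mean confidence is below ratio * median frame mean."""
    scores = [frame_confidence(m) for m in conf_maps]
    threshold = ratio * median(scores)
    return [i for i, s in enumerate(scores) if s < threshold]

# Toy data: 4 frames of 2x2 "confidence maps"; the last two are much less confident.
maps = [
    [[10.0, 9.0], [11.0, 10.0]],
    [[9.0, 9.0], [10.0, 12.0]],
    [[2.0, 1.0], [3.0, 2.0]],
    [[1.0, 2.0], [2.0, 3.0]],
]
flagged = flag_low_confidence_frames(maps)
print(flagged)  # [2, 3]
```

Listing the flagged indices next to the frame order makes it easy to see whether the low-confidence frames are exactly the ones from the second room.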