
Understanding Eq. 1 and 2 #7

Open
tengyu-liu opened this issue Jun 20, 2023 · 9 comments

Comments

@tengyu-liu

Congratulations on achieving this great work! The demo and results are very impressive, and it has been a big hit! I really like the idea of using a quasi-3D representation and ignoring the ambiguities because they are not important to the problem.

I'm trying to understand Eq. 1 and 2 from the paper and can't understand why we use the same points in the source $x_i^k$ and target frame $x_j^k=\mathcal{T}_j^{-1}\circ\mathcal{T}_i(x_i^k)$ and hope I can get some clarifications.

In my understanding, if the points $x_j^k$ are the same points as $x_i^k$ in the canonical frame, then the occlusion relationship would not change across frames as the camera ray still passes through the same set of points in the same order. Since $\sigma_k$ is stored in $G$ and does not change across frames, I don't understand why OmniMotion can handle occlusions.

So my question is, why are we computing $x_j^k$ as $\mathcal{T}_j^{-1}\circ\mathcal{T}_i(x_i^k)$ instead of sampling from a new ray in $j$-th frame and map that to the same canonical space? Why does the model work so well despite $M_\theta$ cannot change the occlusion relationship?
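For reference, here is how I read Eq. 1 in code form. This is only a toy sketch: plain affine bijections stand in for the paper's invertible network $M_\theta$, and every matrix and sample position is made up.

```python
import numpy as np

# Toy sketch of Eq. 1: affine bijections stand in for the invertible
# network M_theta (with per-frame latents); all numbers are made up.

def make_affine(A, b):
    """Return a bijection x -> x @ A.T + b and its inverse."""
    A_inv = np.linalg.inv(A)
    return (lambda x: x @ A.T + b), (lambda y: (y - b) @ A_inv.T)

rng = np.random.default_rng(0)
T_i, T_i_inv = make_affine(np.eye(3) + 0.1 * rng.standard_normal((3, 3)),
                           rng.standard_normal(3))
T_j, T_j_inv = make_affine(np.eye(3) + 0.1 * rng.standard_normal((3, 3)),
                           rng.standard_normal(3))

# Sample K points near-to-far along the ray at pixel p_i in frame i.
K = 8
origin, direction = np.zeros(3), np.array([0.0, 0.0, 1.0])
x_i = origin + np.linspace(0.5, 4.0, K)[:, None] * direction   # (K, 3)

# Eq. 1: x_j^k = T_j^{-1}(T_i(x_i^k)), i.e. the *same* K points, expressed in frame j.
u = T_i(x_i)        # canonical coordinates, where (sigma, c) are queried
x_j = T_j_inv(u)

# Bijectivity: mapping back recovers the original samples.
assert np.allclose(T_i_inv(T_j(x_j)), x_i)
```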

@qianqianwang68
Owner

Hi Tengyu,

I'm not sure if I fully understand your confusion. I'll try to answer your questions, and let me know if you have further ones.

the occlusion relationship would not change across frames as the camera ray still passes through the same set of points in the same order

The occlusion relationship can change, as two different samples $x_j^m$ and $x_j^n$ may swap order in frame $j$ (i.e., $x_i^m$ is closer than $x_i^n$ in frame $i$, but after deformation $x_j^m$ can become farther away than $x_j^n$ in frame $j$). And the reason we always naturally sample points from near to far at $p_i$ in frame $i$ is that we want to compute the flow at the location of $p_i$.

I can try to give an example to explain why OmniMotion can handle occlusions. Let's say $p_i$ is occluded by some other surface in frame $j$ at $p_j$, which means $p_i$ should go to $p_j$ but is occluded. Let's assume that the corresponding surface for $p_i$ is $x_i^{n}$ ($\sigma_i^{n}$ is 1 and all other $\sigma$ on the ray are zero). And let's assume that the corresponding surface for $p_j$ in frame $j$ is $x_j^{l}$, then what happens in this case is that $x_i^{n}$ is mapped to $x_j^{n}$ (which projects into $p_j$), but it is farther away than $x_j^{l}$, and that's how it gets occluded. So occlusion happens when some other points exist in front of the points you are tracking. And the other points do not need to be among the points you sampled at $p_i$.
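A tiny numeric version of this example, with all pixel coordinates and depths invented for illustration:

```python
# Numeric version of the example above; pixel coordinates and depths invented.
# After mapping through T_j^{-1}, each surface has a pixel and a depth in frame j.
x_j_n = {"pixel": (40, 25), "depth": 3.2}  # surface tracked from p_i
x_j_l = {"pixel": (40, 25), "depth": 1.1}  # a different surface, visible at p_j

# p_i's surface projects exactly to p_j, but another surface sits in front of it.
# That other surface was never among the samples taken at p_i in frame i.
same_pixel = x_j_n["pixel"] == x_j_l["pixel"]
occluded = same_pixel and x_j_l["depth"] < x_j_n["depth"]
print(occluded)  # True
```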

Why not sample from a new ray in the $j$-th frame and map that to the same canonical space?

This can also work, but only if the two points are cycle consistent (co-visible); the loss in Eq. 2, by contrast, can be applied to occluded points as well. We tried the idea of enforcing cycle-consistent points to be mapped to the same canonical location, but it didn't work very well. In fact, what you need is not only to pull matching points closer but also to push non-matching points farther apart, otherwise a trivial solution would be to make the canonical space infinitely small. But we didn't find a version of this loss that worked robustly either.

Best,
Qianqian

@tengyu-liu
Author

Please correct me if my understanding is wrong:

The occlusion relationship can change, as two different samples $x_j^m$ and $x_j^n$ may swap order in frame $j$

If $x_i^m$ is closer than $x_i^n$, that means $m < n$, and $T_m\cdot\alpha_m=1$ and $T_n\cdot\alpha_n=0$ for both frames $i$ and $j$. This will not change the occlusion relationship even if the depth order changes between the two frames. Unless you re-order $x_j$ by depth.
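If it helps, here is the compositing I have in mind, as a sketch (standard NeRF-style weights with unit spacing; the density values are made up). Because $\sigma$ lives in the canonical volume, the weights depend only on the sampling order along the ray in frame $i$:

```python
import numpy as np

# Standard volume-rendering weights with unit spacing; densities invented.
# sigma is read from the canonical volume, so these weights are the same
# no matter what depths the samples end up at in frame j.

def composite_weights(sigma, delta=1.0):
    alpha = 1.0 - np.exp(-sigma * delta)                        # per-sample opacity
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]   # transmittance
    return T * alpha

sigma = np.array([0.0, 0.0, 50.0, 0.0])   # one near-opaque surface at sample k = 2
w = composite_weights(sigma)
print(np.argmax(w))   # 2: that sample dominates regardless of its depth in frame j
```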

it ($x_j^n$) is farther away than $x_j^l$, and that's how it gets occluded

Because both $x_j^l$ and $x_i^l$ map to the same point in the canonical volume, they would get exactly the same color and density, right? Consider a rigid scene where something that was not occluded in frame $i$ ($x_i^n$ is closer than $x_i^l$) becomes occluded in frame $j$ ($x_j^l$ is closer than $x_j^n$) due to camera motion alone. The only way OmniMotion would work is if $x_i^l$ is already on the camera ray, hiding behind $x_i^n$, even though in reality the occluding object should not be on the camera ray in frame $i$. Is my understanding correct? I believe this is why it is called a quasi-3D representation, in the sense that it is geometrically incorrect but well suited to the dense tracking task.

@boxraw-tech

boxraw-tech commented Jun 21, 2023

@tengyu-liu thanks for asking these questions, I'm also trying to get my head around this.

Because both $x_j^l$ and $x_i^l$ map to the same point in the canonical volume, they would get exactly the same colour and density right?

My understanding is they don't necessarily get the same colour and density as $F_\theta$ is parameterised differently for each frame by $\psi_i$.

@tengyu-liu
Author

According to sections 4.1 and 4.2, I believe that $F_\theta$ is independent of the frame. $M_\theta$ is parameterised by the per-frame latent code $\psi_i$, which gives a different $\mathcal{T}_i$ function for each frame. Since both $x_i^l$ and $x_j^l$ map to the same point in the canonical volume, they are guaranteed to get the same color and density.
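To make the point concrete, a minimal sketch (the field below is an invented stand-in for $F_\theta$, not the paper's network): the canonical field takes only canonical coordinates, with no frame index, so it cannot return different values for the same point.

```python
import numpy as np

# Minimal sketch: F_theta (an invented stand-in) depends only on the canonical
# coordinate u, with no frame index, so two local points that map to the same
# u necessarily receive identical (sigma, c).

def F_theta(u):
    sigma = float(np.exp(-np.sum(u ** 2)))   # density
    color = np.tanh(u)                       # color
    return sigma, color

u = np.array([0.3, -0.2, 1.0])   # canonical point reached from both frames
sigma_i, c_i = F_theta(u)        # query via T_i from frame i
sigma_j, c_j = F_theta(u)        # query via T_j from frame j: same u, same answer
assert sigma_i == sigma_j and np.allclose(c_i, c_j)
```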

@boxraw-tech

In section 4.3 it says

density and colour can be written as $(\sigma_k, c_k) = F_\theta (M_\theta(x^k_i ; \psi_i)) $

so it seems perfectly possible to get different colour and density for the same point in different frames.

@qianqianwang68
Owner

Hi Tengyu,

The first part is correct. However, I think there is some misunderstanding here:

The only way OmniMotion would work is that $x_i^l$ is already in the camera ray, hiding behind $x_i^n$ even though in reality the occluding object should not be in the camera ray in frame i.

I don't understand why that's the only way OmniMotion would work. $x_i^n$ and $x_i^l$ do not need to be on the same ray in frame $i$. They can be at different pixel locations and both of them can be visible. Let me give you an example using the online demo:

[image: screenshot from the online demo showing two video frames with a blue point and a red point marked]

In this example, the blue point is occluded by the red point in the second image (let's assume they are at the same pixel location), but their corresponding pixel locations in the first image are different and both of them are visible.

@qianqianwang68
Owner

@boxraw-tech Tengyu is correct, if two local points map to the same point in the canonical volume, then they are guaranteed to get the same color and density.

@unlockpowerofpixels

Hi @qianqianwang68 @tengyu-liu

I am still trying to wrap my head around the discussion. Could you help clarify the following for me, in particular:

If $\mathbf{x}_i^{m}$ is closer than $\mathbf{x}_i^{n}$, that means $m < n$, and $T_m \cdot \alpha_m = 1$ and $T_n \cdot \alpha_n = 0$ for both frames $i$ and $j$. This will not change the occlusion relationship even if the depth order changes between the two frames. Unless you re-order $\mathbf{x}_j$ by depth.

Following your discussion, assuming $m < n$: the corresponding surface for $\mathbf{p}_i$ is $\mathbf{x}_i^n$, so $\sigma_i^n = 1$ and the rest are 0 (in particular $\sigma_i^m = 0$). Isn't $T_m \cdot \alpha_m = 0$, since $\alpha_m = 1 - \exp(0) = 0$? And since $T_k$ takes the product over the preceding $k-1$ points, shouldn't $T_n \cdot \alpha_n$ be 0 as well?
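A quick numeric check of these quantities, assuming the standard volume-rendering definitions $\alpha_k = 1 - \exp(-\sigma_k \delta)$ and $T_k = \prod_{l<k}(1 - \alpha_l)$ with unit spacing $\delta = 1$ (the indices below are just $m = 0$, $n = 1$):

```python
import numpy as np

# Quick numeric check, assuming the standard definitions
# alpha_k = 1 - exp(-sigma_k * delta) and T_k = prod_{l<k} (1 - alpha_l),
# with unit spacing delta = 1. Indices: m = 0, n = 1, so m < n.

sigma = np.array([0.0, 1.0])                 # sigma_m = 0, sigma_n = 1
alpha = 1.0 - np.exp(-sigma)                 # [0, 1 - e^{-1}]
T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]   # [1, 1]
weights = T * alpha                          # T_k * alpha_k

print(weights[0])            # T_m * alpha_m = 0.0
print(round(weights[1], 3))  # T_n * alpha_n = 1 - e^{-1} ≈ 0.632
```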

@serycjon

I think I have finally figured out the occlusions :). For a given point p_i in the first image, you always get the same positions in the canonical volume, and the same colors and densities. Then for the second image you do the "alpha compositing" to get a single point x_j (that is, the 2D point p_j and its "depth"). You don't get the occlusion state yet. They don't mention how to get the occlusion state in the paper, but I think I have found it in the code.
To get the occlusion, they project p_j back into the canonical space (constructing samples, projecting) to get the densities and thus the "depth" (i.e., something like picking the sample on the p_j ray with the biggest density gives you the "depth"). Finally, they compare this "depth" in the second image with the "depth" of the point p_i projected into the second image.
So in the swing example, you first project the blue point on the lady into the second image to get a position plus a "depth" prediction. Then you go backward from that position (into the canonical frame, to get densities and thus "depth") to check whether the predicted "depth" is as expected (in this case it is not). The important thing is that the red point does not matter at all. The occluder may not even be visible in the first image.
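A sketch of that check in code. All function and variable names here are mine, not from the repo, and the numbers are invented:

```python
import numpy as np

# Sketch of the occlusion check described above; names and numbers invented.
# Two "depths" at p_j are compared:
#  1. the composited depth of p_i's samples after mapping them into frame j, and
#  2. the depth obtained by sampling a fresh ray at p_j and compositing.

def composite(values, sigma, delta=1.0):
    """Alpha-composite per-sample values using densities sigma."""
    alpha = 1.0 - np.exp(-sigma * delta)
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]
    return float(np.sum(T * alpha * values))

# Depths of p_i's samples expressed in frame j, with their canonical densities.
depth_from_i = composite(values=np.array([2.0, 3.2, 4.0]),
                         sigma=np.array([0.0, 50.0, 0.0]))   # ~3.2

# Fresh samples on the ray at p_j hit a nearer surface (the occluder).
depth_at_j = composite(values=np.array([1.1, 3.2, 4.0]),
                       sigma=np.array([50.0, 0.0, 0.0]))     # ~1.1

occluded = depth_at_j < depth_from_i - 1e-3  # p_j's visible surface is in front
print(occluded)  # True
```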
