Is there any example for multi-architecture support [JuliaGPU] #835
Comments
Hi, we have not published a public template for GPU-based computation yet. This refers to the design of how factor sampling and residuals are calculated. We worked to avoid restrictions as far as possible regarding where computations are done, so if the sampling process or the residual computation can benefit from the GPU, that is the place to do it first. We previously did GPU work for factors, computing large FFTs on the GPU during factor residual computations. It was clunky before the upgrades; since then we have significantly improved the API, but we have not yet updated the legacy GPU implementation from that project. We are now mostly settled on the residual and sampling API, though, so at least that should be stable for a good while.
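As a pure-Julia illustration of the kind of residual that benefited from GPU FFTs: the sketch below is hypothetical and is not the project's API; the naive DFT merely stands in for the large transform that would run on the GPU (e.g. via CUDA.jl's FFT support) in a real port.

```julia
# Naive DFT in base Julia, standing in for the large FFT that ran on the GPU.
function dft(x::Vector{ComplexF64})
    N = length(x)
    [sum(x[n+1] * cis(-2pi * k * n / N) for n in 0:N-1) for k in 0:N-1]
end

# Hypothetical residual: spectral mismatch between measurement and prediction.
# In a GPU port, only this heavy transform moves to the device; the factor
# graph machinery around it stays unchanged.
function spectral_residual(meas::Vector{ComplexF64}, pred::Vector{ComplexF64})
    sum(abs2, dft(meas) .- dft(pred))
end
```

With identical inputs the residual is zero, e.g. `spectral_residual(x, x) == 0.0`.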
First of all, thanks for your reply.
Yeah, of course -- we try to answer as soon as possible, though we're not always able to :-P
Not quite: when a factor computation is done, many particles are computed for each factor calculation, and each of those particles is updated by optimization routines that call the residual functions many, many times over. Residual functions are called something like 10^5-10^8 times per solve, so in some cases the GPU can help a lot.

Say you have a residual computation that is intensive and takes 100ms on CPU with threading etc. If you were to port it to GPU and get 20ms for the equivalent computation, that's good. Next step: the Caesar.jl packages make extensive use of the residual and sampling functions when solving the factor graph, so even if the GPU implementation's overhead adds 10-20% across all the calculations, you will still be at ~30ms rather than 100ms per residual on CPU, and everything will be a lot faster.

The flip side is when the CPU computation for a residual or sample takes something like 10us but the GPU takes longer due to memory management etc.; then the residual compute will be thrashing and the GPU will not help. The GPU wins when you have dense data that needs to be processed in a data-intensive way. Many of the current factors in the Caesar.jl system compute on the order of 500ns, so there is no point trying to make them work on the GPU. There are cases where the GPU will help with some of the standard factors; we just haven't gotten around to making an example for the community yet. The caveat, as I alluded to above, is when you have large computations within each residual or sampling cycle -- then the GPU is great. This is how we used it in a previous project.

Can you perhaps say more about your application, so we can point you in a good direction? What is the main issue you're trying to get around?
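The break-even reasoning above can be sketched as simple arithmetic: total solve cost is roughly the number of residual calls times per-call kernel time plus per-call dispatch overhead. All numbers below are the illustrative figures from this thread, not measurements.

```julia
# Total cost model: calls × (kernel time + per-call dispatch/transfer overhead).
solve_cost(ncalls, t_kernel, t_overhead) = ncalls * (t_kernel + t_overhead)

ncalls = 10^6                                   # residual calls per solve

# Heavy residual: 100 ms on CPU vs 20 ms on GPU plus ~10% overhead.
heavy_cpu = solve_cost(ncalls, 100e-3, 0.0)
heavy_gpu = solve_cost(ncalls, 20e-3, 2e-3)     # GPU clearly wins

# Cheap residual: ~500 ns on CPU; a ~10 µs GPU launch overhead dominates.
cheap_cpu = solve_cost(ncalls, 500e-9, 0.0)
cheap_gpu = solve_cost(ncalls, 100e-9, 10e-6)   # GPU loses despite faster kernel
```

Here `heavy_gpu < heavy_cpu` while `cheap_gpu > cheap_cpu`, which is why porting only pays off for residuals with substantial per-call work.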
I need to build two applications using graph optimization.
Cool, think I follow.
No, that is not normal. Compiling should really only happen the first time, during JIT warmup or precompilation -- there should be no compiling during tree construction, even if the graph changes. We will have to look at that more closely:
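The expected behavior can be demonstrated in base Julia: the first call to a function pays the JIT compilation cost, while subsequent calls with the same argument types reuse the compiled method (the function below is purely illustrative).

```julia
# A stand-in for any solver-internal function.
f(x) = sum(abs2, x)

x = rand(1000)
t1 = @elapsed f(x)   # first call: includes JIT compilation, typically milliseconds
t2 = @elapsed f(x)   # second call: already compiled, typically microseconds
```

Repeated compilation on every call, as reported above, would show `t2` comparable to `t1` instead of orders of magnitude smaller.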
Got it, so that is something we are interested in -- we have some capabilities there. Let us know how it goes, or if that is something you might want to offload -- likely a longer conversation. Cc'ing @GearsAD .
Probably not in this way directly. Indirectly, however, there are improvements to be had by upgrading some of the backend non-Gaussian computation steps to use the GPU, but that is on the medium-to-long-term roadmap. So there is potential for "GPU makes SLAM solve faster", but likely not in the timeframe of your project at the moment, hence we're not advertising that too loudly at this stage.
Firstly, regarding your first answer: as I am not familiar with Julia, I measured the timing and checked whether compilation happens every time using the following code:
and get the following output:
Compilation is quick except for the first call, so why do subsequent calls still report compilation time? Furthermore, why does it recompile whenever a new factor is added? Its output looks like this:
The factor described before looks like this:
I need more time to familiarize myself with this framework for my second target [2D SLAM with loop closure].
Hi @wystephen, thanks for pointing out the slow compile times when growing the graph. That was just found and fixed here:
As mentioned in the document,
Is there any example of solving a factor graph optimization problem using the GPU?