Remove spectral index and allow input of stokes varying by source, time and channel. #244
Conversation
Previously, spectral index was limited to (freq/ref_freq)**alpha, which is a fairly simplistic model. Remove this and simply allow the user to specify Stokes parameters varying by source, time and channel.
I'm unlikely to merge this into master given
However, given the need for arbitrary spectral index expressions in https://github.com/cyriltasse/DDFacet/pull/452 and #171 I can keep parallel development on this branch going.
montblanc/tests/test_meq_tf.py

    def point_ref_freq(self, context):
        (lp, up) = context.dim_extents('npsrc')
        return pt_ref_freq[lp:up]

    return s*(f/rf)**a
@bennahugo See above for how to input Stokes parameters along with the default spectral index into montblanc.
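As a rough illustration (hypothetical names, not Montblanc's actual API), folding the default power-law spectral index into per-channel Stokes values might look like this NumPy sketch:

```python
import numpy as np

def stokes_spectra(stokes, freqs, ref_freq, alpha):
    """Fold (freq/ref_freq)**alpha into per-channel Stokes values.

    stokes : (nsrc, 4) Stokes parameters at the reference frequency
    freqs  : (nchan,) channel frequencies in Hz
    alpha  : (nsrc,) spectral indices
    Returns an array of shape (nsrc, nchan, 4).
    """
    scale = (freqs[None, :] / ref_freq) ** alpha[:, None]   # (nsrc, nchan)
    return stokes[:, None, :] * scale[:, :, None]

stokes = np.array([[1.0, 0.0, 0.0, 0.0]])   # one 1 Jy unpolarised source
freqs = np.array([1.0e9, 2.0e9])            # two channels
spectra = stokes_spectra(stokes, freqs, 1.0e9, np.array([-0.7]))
print(spectra.shape)                        # (1, 2, 4)
```

The point is only that the spectral model now lives in the data source rather than in the kernel; any per-channel expression can be substituted for the power law.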
@JSKenyon @SaiyanPrince After discussions with @gijzelaerr I think we're going to end up merging this into the master branch. Essentially Stokes parameters now vary by channel too, and one has to manually work the spectral index into the Stokes terms (see test_meq_tf.py). In practice, the *_stokes data sources in your SourceProviders will need to change, as they've changed in this PR. You have 2 hours to lodge objections!
I don't object per se, but this volatile interface is a little bit frustrating. To any CubiCal user reading this, it may take a day or two for this to be fixed on our side.
Maybe make a release of Montblanc before the change, and make the current CubiCal depend on that release for now?
This is on hold for a bit while @gijzelaerr and myself consider packaging implications...
So what montblanc will ingest is an array of dimensions n_pol x n_chan for every component? I guess in the case of interfacing with CubiCal, this array will be calculated from the Tigger models using I,Q,U,V,f0,alpha,[beta,...] and then passed to montblanc? Cheers.
In fact, the array rank is growing fairly large
Correct
Do you then use
Not at present, mostly because GPU broadcasting isn't available yet. There is a broadcasting PR that would make the pipeline more flexible in this regard.
Aren't we then substantially increasing data volumes to cater for what is only a marginal use case?
Indeed, although it "supports the general case" in that the user can work whatever spectral models they want into the stokes parameters. I think this highlights the need for flexibility for inputs and RIME terms (related to #245, #168), balanced against the hand-coded and relatively inflexible nature of CUDA kernels. @JSKenyon and myself started brainstorming ways to support this configurability yesterday. Things like tensorflow/tensorflow#15243 would make this much easier (probably at the expense of extra memory copies).
That I don't question. 99% of your use cases need a spectral axis. However, I wager less than 5% (if that) of the use cases need both spectral and time axes. If this is so, doesn't having an "always on" time axis induce an unnecessary penalty 94% of the time?
Yes, and really the issue here becomes maintaining permutations of the C++/CUDA code (most likely in the multiplication of terms rather than their calculation). Off the top of my head the permutations are device and shape: ["CPU", "GPU"] x [("time,chan"), ("chan"), ("time"), ()] for a total of 8 combinations. The nice thing about NumPy is that it makes this kind of thing trivial, but a couple of lines in NumPy blows up to hand-crafting each case in CUDA. That's where a GPU broadcasting operation would make things simpler. There is the option of something like tf.matmul, but then instead of a single operator doing specialised multiplies of four 2x2 Jones matrices we have four calls to tf.matmul and allocation of space for four result arrays. To me it seems like a snakes-and-ladders type scenario -- memory budgets on a GPU are tighter than on a CPU. My gut is against premature optimisation here -- it's been a while since I've been able to do serious profiling, but things like the beam, gaussian shape parameters and multiplication to produce source coherencies were the big time sinks when I last looked. Additionally, I was only intending to merge this to make #250 simpler, and the cuDNN dependency is proving a hassle.
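For reference, a sketch of why NumPy makes the shape permutations trivial: all four cases broadcast against a (time, chan) visibility array with the same multiply, whereas each would need hand-crafted handling in a CUDA kernel (shapes and values here are illustrative):

```python
import numpy as np

ntime, nchan = 10, 64
vis = np.ones((ntime, nchan))

# The four shape permutations discussed above, expressed as
# arrays that broadcast against (ntime, nchan):
terms = {
    "time,chan": np.full((ntime, nchan), 2.0),
    "chan":      np.full((1, nchan), 2.0),
    "time":      np.full((ntime, 1), 2.0),
    "scalar":    np.float64(2.0),
}

# One line of NumPy handles every case via broadcasting.
for name, term in terms.items():
    assert (vis * term).shape == (ntime, nchan)
```

In CUDA each of these index patterns would be a separate hand-written code path, which is the maintenance burden described above.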
I would reduce the shape permutations to just [("time,chan"), ("chan")]. These are the two realistic use cases.
I'm on record as being militantly against premature optimization. But is this premature, or even optimization? Consider a simple MeerKAT use case. 1000 sources, 4K channels, 4 polarisations, 1000 timeslots. That's 16G entries in your source array -- you can't even get that onto the GPU! Eliminate the time axis, it's down to 16M entries, small change. So having a permanently unrolled time axis effectively makes Montblanc incompatible with MeerKAT, our meat-and-potatoes application. All for the sake of supporting a rather exotic use case (which scintillation and/or transients are) that is only likely to be run in a regime where you have small numbers of sources, channels and antennas.
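A quick back-of-envelope check of the sizing argument above (entry counts only; the byte figures assume 8-byte values):

```python
# MeerKAT-ish problem dimensions from the comment above.
nsrc, ntime, nchan, npol = 1000, 1000, 4096, 4

with_time = nsrc * ntime * nchan * npol   # time axis unrolled
without_time = nsrc * nchan * npol        # time axis eliminated

print(with_time)      # 16384000000 entries, ~131 GB at 8 bytes each
print(without_time)   # 16384000 entries, ~131 MB
```

The time axis multiplies the source array by the number of timeslots, a factor of 1000 here, which is what pushes it past GPU memory.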
True! That is probably too large for the general case, and is a good argument for why this PR probably shouldn't be merged as standard functionality -- I should have pencilled and papered the problem sizes because, as you point out, the size of the input is huge. But also remember that the problem gets subdivided into chunks by time and source (think 100 times and 50 sources) for transfer to the GPU. The dask version will also allow easy chunking by channel. So the problem is less about fitting it onto the GPU, and more about the I/O transfer of parts of the problem to/from the GPU (16GB/s full duplex on PCI-E 3.0). I always think of RIME I/O costs as O(V + S) vs compute costs of O(V x S), where V is the number of visibilities and S the number of sources. I think in this case, due to the large number of sources, the compute will still outstrip the very large I/O transfer. In summary, I agree we shouldn't make this standard. I'll liaise with @gijzelaerr tomorrow about packaging again.
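A rough sketch of the O(V + S) I/O versus O(V x S) compute intuition, using the chunk sizes mentioned above (the baseline count is an illustrative assumption, not from the thread):

```python
# Chunk sizes from the comment above; 2016 baselines assumes 64 antennas.
nsrc, ntime, nchan, npol, nbl = 50, 100, 4096, 4, 2016

vis_entries = ntime * nbl * nchan * npol   # V: visibilities per chunk
src_entries = nsrc * ntime * nchan * npol  # S-side per-source Stokes input

io_cost = vis_entries + src_entries        # each entry crosses PCI-E once
compute_cost = vis_entries * nsrc          # every source touches every vis

print(compute_cost / io_cost)              # compute dominates the transfer
```

Under these assumptions the compute term exceeds the I/O term by well over an order of magnitude, which is the basis for the claim that compute still outstrips the enlarged transfer.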
I think this is also why configurability is going to be a large consideration from my side in the new version.
Can you really see a need for this many sources? Won't most use cases involve a handful of montblanc components and a MODEL_DATA column? Not that I'm arguing against making things leaner. I think the time axis would be a nice thing to keep as an option. I was chatting briefly to @twillis449 and Bruce B. about simulating RFI and this seems like a really nice way to chuck in arbitrary time and frequency behaviour.
Well, we went up to 300 sources for the 3C147 VLA reduction. Arguably, this can be reduced by better use of the MODEL_DATA column. So let's say we need 50, or 100. That's still a big ole hunk of data with a time axis in place -- but quite lightweight without one. I'm not arguing against a time axis -- just against a non-optional one.
tensorflow GPU requires a manual install.
I'm happy that this branch works (and is fully py3 compatible); see my latest test run below. This addresses the py3 changes needed for CubiCal in ratt-ru/CubiCal/pull/270.
@sjperkins can we sit together tomorrow morning and handle the rest of the merge conflicts?
Sure
Alright, my subtraction test passes again as above. I need to install nvidia drivers to run your tests, though, so fingers crossed this doesn't break my system.
This is a Python 3 construct.
Any reason not to merge this still?
I can't remember any particular reason. Maybe we should just merge and deal with consequences?