
Remove spectral index and allow input of stokes varying by source, time and channel. #244

Merged · 46 commits into master from ddfacet · Apr 21, 2020

Conversation

@sjperkins (Member)

Previously, the spectral index was limited to (freq/ref_freq)**alpha, which is a fairly simplistic model. Remove this and simply allow the user to specify Stokes parameters varying by source, time and channel.

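A minimal NumPy sketch of the new input, with the old spectral model folded into the Stokes parameters by hand. All sizes, frequencies and fluxes here are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical problem sizes
n_src, n_time, n_chan = 10, 5, 64

freq = np.linspace(0.856e9, 1.712e9, n_chan)  # channel frequencies [Hz]
ref_freq = 1.4e9                              # reference frequency [Hz]
I = np.ones(n_src)                            # 1 Jy Stokes I per source
alpha = np.full(n_src, -0.7)                  # spectral index per source

# The old (freq/ref_freq)**alpha model, now applied by the user
# rather than inside montblanc
spectrum = (freq[None, :] / ref_freq) ** alpha[:, None]   # (n_src, n_chan)

# New input: Stokes parameters varying by source, time and channel
stokes = np.zeros((n_src, n_time, n_chan, 4))
stokes[..., 0] = (I[:, None] * spectrum)[:, None, :]  # constant in time here
```

Any other spectral (or temporal) behaviour is expressed the same way, by filling the array differently.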
@sjperkins (Member Author)

I'm unlikely to merge this into master given:

  1. development on the dask version
  2. #168 (Support plugging in various implementations for different sections of the RIME pipeline) would be a fair amount of dev on the master branch.

However, given the need for arbitrary spectral index expressions in https://github.com/cyriltasse/DDFacet/pull/452 and #171 I can keep parallel development on this branch going.

@sjperkins (Member Author)

/cc @bennahugo @landmanbester

    def point_ref_freq(self, context):
        (lp, up) = context.dim_extents('npsrc')
        return pt_ref_freq[lp:up]

    return s*(f/rf)**a
@sjperkins (Member Author)

@bennahugo See above for how to input stokes parameters along with the default spectral index into montblanc

@sjperkins (Member Author)

@JSKenyon @SaiyanPrince After discussions with @gijzelaerr I think we're going to end up merging this into the master branch. Essentially, Stokes parameters now vary by channel too, and one has to manually work the spectral index into the Stokes terms (see test_meq_tf.py). In practice, the *_stokes data sources in your SourceProviders will need to change, as they have in test_meq_tf.py on this branch.

You have 2 hours to lodge objections!
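For illustration, a changed *_stokes data source might look something like the sketch below. The class name and the flat 1 Jy spectrum are invented for this example; only the provider-method pattern and the new (npsrc, ntime, nchan, 4) shape come from this PR:

```python
import numpy as np

class ExampleSourceProvider(object):
    """Hypothetical data source under the new interface, where
    point_stokes fills a (npsrc, ntime, nchan, 4) array instead
    of the old (npsrc, ntime, 4)."""

    def point_stokes(self, context):
        # montblanc asks for exactly the shape/dtype it needs
        stokes = np.zeros(context.shape, context.dtype)
        # Any spectral model is worked in manually here; this
        # sketch just uses flat 1 Jy Stokes I sources
        stokes[..., 0] = 1.0
        return stokes

# A stand-in for the context montblanc would pass in
class _FakeContext(object):
    shape = (3, 2, 4, 4)   # (npsrc, ntime, nchan, npol)
    dtype = np.float64

stokes = ExampleSourceProvider().point_stokes(_FakeContext())
```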

@JSKenyon
I don't object per se, but this volatile interface is a little bit frustrating. To any CubiCal user reading this, it may take a day or two for this to be fixed on our side.

@gijzelaerr (Member)

Maybe make a release of Montblanc before the change, and make the current CubiCal depend on that release for now?

@sjperkins (Member Author)

This is on hold for a bit while @gijzelaerr and I consider packaging implications...

@IanHeywood
So what montblanc will ingest is an array of dimensions n_pol x n_chan for every component?

I guess in the case of interfacing with CubiCal this array will be calculated from the Tigger models using I,Q,U,V,f0,alpha,[beta,...] and then passed to montblanc?

Cheers.

@sjperkins (Member Author)

So what montblanc will ingest is an array of dimensions n_pol x n_chan for every component?

In fact, the array rank is growing fairly large (n_src, n_time, n_chan, n_pol) (we have n_time in order to represent scintillation)

I guess in the case of interfacing with CubiCal this array will be calculated from the Tigger models using I,Q,U,V,f0,alpha,[beta,...] and then passed to montblanc?

Correct

@o-smirnov (Contributor)

(we have n_time in order to represent scintillation)

Do you then use n_time=1 if no sources are variable (i.e. is the time axis broadcast, numpy-style?)

@sjperkins (Member Author)

Do you then use n_time=1 if no sources are variable (i.e. is the time axis broadcast, numpy-style?)

Not at present, mostly because GPU broadcasting isn't available yet. There is a broadcasting PR that would make the pipeline more flexible in this regard.
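What NumPy-style broadcasting would buy here can be sketched directly (sizes hypothetical): a singleton time axis lets time-invariant spectra combine with per-time factors without ever storing the full array.

```python
import numpy as np

n_src, n_time, n_chan = 4, 10, 8

# Time-invariant spectra stored once, with a singleton time axis
stokes = np.ones((n_src, 1, n_chan, 4))

# A per-timeslot scale factor (e.g. scintillation) broadcasts against it;
# the full (n_src, n_time, n_chan, 4) array only materialises here
scale = np.linspace(1.0, 2.0, n_time).reshape(1, n_time, 1, 1)
result = stokes * scale

# The stored input is n_time times smaller than the broadcast result
assert stokes.nbytes * n_time == result.nbytes
```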

@o-smirnov (Contributor)

Aren't we then substantially increasing data volumes to cater for what is only a marginal use case?

@sjperkins (Member Author)

Aren't we then substantially increasing data volumes to cater for what is only a marginal use case?

Indeed, although it "supports the general case" in that the user can work whatever spectral models they want into the Stokes parameters.

I think this highlights the need for flexibility in inputs and RIME terms (related to #245, #168), balanced against the hand-coded and relatively inflexible nature of CUDA kernels. @JSKenyon and I started brainstorming ways to support this configurability yesterday.

Things like tensorflow/tensorflow#15243 would make this much easier (probably at the expense of extra memory copies).

@o-smirnov (Contributor)

Indeed, although it "supports the general case" in that the user can work whatever spectral models they want into the Stokes parameters.

That I don't question. 99% of your use cases need a spectral axis.

However, I wager less than 5% (if that) of the use cases need both spectral and time axes. If this is so, doesn't having an "always on" time axis induce an unnecessary penalty 94% of the time?

@sjperkins (Member Author) commented Mar 15, 2018

However, I wager less than 5% (if that) of the use cases need both spectral and time axes. If this is so, doesn't having an "always on" time axis induce an unnecessary penalty 94% of the time?

Yes, and really the issue here becomes maintaining permutations of the C++/CUDA code (most likely in the multiplication of terms rather than their calculation). Off the top of my head the permutations are device ["CPU", "GPU"] by shape [("time", "chan"), ("chan",), ("time",), ()] for a total of eight combinations. The nice thing about NumPy is that it makes this kind of thing trivial, but a couple of lines in NumPy blows up to hand-crafting each case in CUDA. That's where a GPU broadcasting operation would make things simpler.

There is the option of something like tf.matmul but then instead of a single operator doing specialised multiplies of four 2x2 Jones matrices we have four calls to tf.matmul and allocation of space for four result arrays. To me it seems like a snakes and ladders type scenario -- memory budgets on a GPU are tighter than on a CPU.

My gut is against premature optimisation here -- it's been a while since I've been able to do serious profiling, but things like the beam, Gaussian shape parameters and multiplication to produce source coherencies were the big time sinks when I last looked.

Additionally, I was only intending to merge this to make #250 simpler and the cuDNN dependency is proving a hassle.
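The "couple of lines in NumPy" point can be made concrete: a broadcast, batched 2x2 Jones multiply is a single einsum call whatever the leading axes, whereas each present/absent-axis permutation needs its own hand-written CUDA kernel. Shapes here are hypothetical:

```python
import numpy as np

# Hypothetical leading dims (source, time, chan), trailing 2x2 Jones terms
rng = np.random.default_rng(42)
A = rng.standard_normal((3, 5, 7, 2, 2))
B = rng.standard_normal((3, 1, 7, 2, 2))  # singleton time axis broadcasts

# One line covers every shape permutation, including broadcasting
C = np.einsum('...ij,...jk->...ik', A, B)
```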

@o-smirnov (Contributor)

I would reduce the shape permutations to just [("time,chan"), ("chan")]. These are the two realistic use cases.

My gut is against premature optimization here

I'm on record as being militantly against premature optimization. But is this premature, or even optimization?

Consider a simple MeerKAT use case. 1000 sources, 4K channels, 4 polarisations, 1000 timeslots. That's 16G entries in your source array -- you can't even get that onto the GPU!

Eliminate the time axis, it's down to 16M entries, small change.

So having a permanently unrolled time axis effectively makes Montblanc incompatible with MeerKAT, our meat-and-potatoes application. All for the sake of supporting a rather exotic use case (which scintillation and/or transients are) that is only likely to be run in a regime where you have small numbers of sources, channels and antennas.
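The arithmetic behind this sizing is worth spelling out (64-bit entries assumed):

```python
# Back-of-envelope check of the MeerKAT sizing above
n_src, n_chan, n_pol, n_time = 1000, 4096, 4, 1000

with_time = n_src * n_time * n_chan * n_pol   # ~16.4e9 entries
without_time = n_src * n_chan * n_pol         # ~16.4e6 entries

# At 8 bytes per entry, the time axis takes the source array from
# roughly 125 MiB to roughly 122 GiB -- beyond any single GPU's memory
gib = lambda n: n * 8 / 2**30
```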

@sjperkins (Member Author) commented Mar 15, 2018

Consider a simple MeerKAT use case. 1000 sources, 4K channels, 4 polarisations, 1000 timeslots. That's 16G entries in your source array -- you can't even get that onto the GPU!

True! That is probably too large for the general case, and is a good argument for why this PR probably shouldn't be merged as standard functionality -- I should have pencilled and papered the problem sizes because as you point out, the size of the input is huge.

But also remember that the problem also gets subdivided into chunks by time and source (think 100 times and 50 sources) for transfer to the GPU. The dask version will also allow easy chunking by channel. So the problem is less about fitting it onto the GPU than about the I/O transfer of parts of the problem to/from the GPU (16GB/s full duplex on PCI-E 3.0). I always think of RIME I/O costs as O(V + S) vs compute costs of O(V x S), where V is the number of visibilities and S the number of sources. I think in this case, due to the large number of sources, the compute will still outstrip the very large I/O transfer.

In summary, I agree we shouldn't make this standard. I'll liaise with @gijzelaerr tomorrow about packaging again.
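The O(V + S) vs O(V x S) framing can be sketched numerically, with MeerKAT-ish sizes (all values hypothetical):

```python
# I/O moves visibilities and source parameters once: O(V + S).
# Compute touches every (visibility, source) pair: O(V * S).
n_ant, n_time, n_chan, n_src = 64, 1000, 4096, 1000

V = (n_ant * (n_ant - 1) // 2) * n_time * n_chan  # visibilities
S = n_src

io_units = V + S
compute_units = V * S

# With many sources, compute dwarfs the transfer cost, which is
# why chunked transfer to the GPU can still keep it busy
ratio = compute_units / io_units
```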

@sjperkins (Member Author)

I think this is also why configurability is going to be a large consideration from my side in the new version.

@IanHeywood
1000 sources

Can you really see a need for this many sources? Won't most use cases involve a handful of montblanc components and a MODEL_DATA column? Not that I'm arguing against making things leaner.

I think the time axis would be a nice thing to keep as an option. I was chatting briefly to @twillis449 and Bruce B. about simulating RFI and this seems like a really nice way to chuck in arbitrary time and frequency behaviour.

@o-smirnov (Contributor)

Well, we went up to 300 sources for the 3C147 VLA reduction. Arguably, this can be reduced by better use of the MODEL_DATA column. So let's say we need 50, or 100. That's still a big ole hunk of data with a time axis in place -- but quite lightweight without one.

I'm not arguing against a time axis -- just against a non-optional one.

@bennahugo (Contributor)

I'm happy that this branch works (and is fully py3 compatible); see my latest test run below:
[image: deep2]

This addresses the py3 changes needed for CubiCal in ratt-ru/CubiCal/pull/270

@bennahugo (Contributor)

@sjperkins can we sit together tomorrow morning and handle the rest of the merge conflicts?

@sjperkins (Member Author)

Sure

@bennahugo (Contributor)

Alright, my subtraction test passes again, as above. I need to install NVIDIA drivers to run your tests though, so fingers crossed this doesn't break my system.

@bennahugo (Contributor)

This is a Python 3 construct:

  File "/usr/local/lib/python2.7/dist-packages/montblanc/impl/rime/tensorflow/RimeSolver.py", line 1145, in _get_data
    raise ex.with_traceback(sys.exc_info()[2])
AttributeError: 'exceptions.ValueError' object has no attribute 'with_traceback'

@o-smirnov (Contributor)

Any reason not to merge this still?

@sjperkins (Member Author)

Any reason not to merge this still?

I can't remember any particular reason. Maybe we should just merge and deal with consequences?

@sjperkins sjperkins merged commit e8b2e71 into master Apr 21, 2020
@sjperkins sjperkins deleted the ddfacet branch April 21, 2020 07:23