Add support for Python3 #270

gijzelaerr · 2019-05-16T11:35:10Z

I think this is mostly done now.

python2 with casacore 3.0 and python-casacore 3.0 works.

python3 with casacore 3.0 and python-casacore 3.0 gives an error:

  File "/home/gijs/Work/CubiCal/cubical/database/casa_db_adaptor.py", line 60, in init_empty
    t.putcol("TYPE", np.array(db.anttype)[antorder])
  File "/home/gijs/Work/CubiCal/.venv3_no_unicode/lib/python3.6/site-packages/casacore/tables/table.py", line 1157, in putcol
    self._putcol(columnname, startrow, nrow, rowincr, value)
RuntimeError: PycArray: unknown python array data type

Which is reported here:
casacore/python-casacore#138

python2 and python3 with the latest casacore and python-casacore give an error, which is reported here: #269

Both of them are in the same area of code, so I guess they are related.

This has been quite some work (again), so I hope this gets merged soon.

Oleg, I'm not sure what you want with your printing voodoo, but this is the result of the automatic 2to3 conversion.

ratt-priv-ci · 2019-05-16T11:37:04Z

Can one of the admins verify this patch?

bennahugo · 2019-05-16T12:12:47Z

ok to test

bennahugo

Looks mostly good. Main concern is the following issues:

- range vs. xrange: under Python 2.7 range builds a full list, so range calls, especially in nested loops, have a high performance and memory impact.
- iteritems vs. items: same here; please use conditionals and six to pick which one to call depending on the environment.
- pickle is implemented in pure Python on Python 2.7 (cPickle is the C version), and the default protocol levels differ. This has a performance impact when running under Python 2.7. Use a conditional and six when importing.
- builtins doesn't exist in Python 2.7; the test coverage probably excludes these files. As it stands, those files are not backwards compatible with Python 2.7.

cubical/data_handler/MBTiggerSim.py

cubical/data_handler/ms_data_handler.py

cubical/solver.py

cubical/statistics.py

cubical/tools/NpShared.py

cubical/tools/shared_dict.py
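The review points above can be addressed with version-conditional imports. Below is a minimal sketch that uses only the standard library, so it runs with or without six installed. The module-level names (`PY2`, `iteritems`, and the rebinding of `range`) are illustrative assumptions, not CubiCal's actual code. With six, the equivalents are `six.moves.range`, `six.iteritems`, `six.moves.cPickle` and `six.moves.builtins`.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # Python 2: xrange is lazy, whereas range builds a full list.
    range = xrange  # noqa: F821 (name only exists on Python 2)
    # Python 2: cPickle is the C implementation; plain pickle is pure Python.
    import cPickle as pickle
    # Python 2 spells the builtins module __builtin__.
    import __builtin__ as builtins

    def iteritems(d):
        """Lazily iterate over (key, value) pairs."""
        return d.iteritems()
else:
    # Python 3: range is already lazy and pickle uses its C accelerator.
    import pickle
    import builtins

    def iteritems(d):
        """Lazily iterate over (key, value) pairs."""
        return iter(d.items())


# Usage: the same loop runs unchanged on both interpreters.
gains = {"G": 1, "dE": 2}
total = 0
for key, value in iteritems(gains):
    for _ in range(value):  # lazy on both Python 2 and 3
        total += 1

# Pass the protocol explicitly: Python 2 defaults to protocol 0 (ASCII).
blob = pickle.dumps(gains, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
```

Using a single shim module and importing these names everywhere else keeps the version checks in one place instead of scattering them through nested loops.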

bennahugo · 2019-05-16T15:07:25Z

- 12:41:53 - param_db           [io] [0.1/0.5 2.3/2.8 0.5Gb]   loading G:gain.err, shape 1x115x64x28x2x2
 - 12:41:54 - casa_db_adaptor    [io] [0.1/0.5 2.4/2.8 0.5Gb] Exporting to CASA gaintables
 - 12:41:54 - main               [io] [0.2/0.5 2.4/2.8 0.5Gb] I/O handler for load None save -1 failed with exception: Python argument types in
    Table.__init__(table, list, list, int, int, int)
did not match C++ signature:
    __init__(_object*, casa::String, casa::String, casa::String, bool, casa::IPosition, casa::String, casa::String, int, int, casa::Vector<casa::String>, casa::Vector<casa::String>)
    __init__(_object*, casa::String, casa::Record, casa::String, casa::String, int, casa::Record, casa::Record)
    __init__(_object*, std::vector<casa::TableProxy, std::allocator<casa::TableProxy> >, casa::Vector<casa::String>, int, int, int)
    __init__(_object*, casa::Vector<casa::String>, casa::Vector<casa::String>, casa::Record, int)
    __init__(_object*, casa::String, casa::Record, int)
    __init__(_object*, casa::String, std::vector<casa::TableProxy, std::allocator<casa::TableProxy> >)
    __init__(_object*, casa::TableProxy)
    __init__(_object*)
 - 12:41:54 - main               [io] [0.2/0.5 2.4/2.8 0.5Gb] Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cubical/workers.py", line 444, in _io_handler
    solver.gm_factory.close()
  File "/usr/local/lib/python2.7/dist-packages/cubical/machines/abstract_machine.py", line 862, in close
    db.close()
  File "/usr/local/lib/python2.7/dist-packages/cubical/database/casa_db_adaptor.py", line 467, in close
    self.__export()
  File "/usr/local/lib/python2.7/dist-packages/cubical/database/casa_db_adaptor.py", line 449, in __export
    casa_caltable_factory.create_G_table(self, "G:phase")
  File "/usr/local/lib/python2.7/dist-packages/cubical/database/casa_db_adaptor.py", line 170, in create_G_table
    with tbl(db.filename + ".%s.casa" % outname, ack=False, readonly=False) as t:
  File "/usr/local/lib/python2.7/dist-packages/casacore/tables/table.py", line 394, in __init__
    Table.__init__(self, tabname, concatsubtables, 0, 0, 0)
ArgumentError: Python argument types in
    Table.__init__(table, list, list, int, int, int)
did not match C++ signature:
    __init__(_object*, casa::String, casa::String, casa::String, bool, casa::IPosition, casa::String, casa::String, int, int, casa::Vector<casa::String>, casa::Vector<casa::String>)
    __init__(_object*, casa::String, casa::Record, casa::String, casa::String, int, casa::Record, casa::Record)
    __init__(_object*, std::vector<casa::TableProxy, std::allocator<casa::TableProxy> >, casa::Vector<casa::String>, int, int, int)
    __init__(_object*, casa::Vector<casa::String>, casa::Vector<casa::String>, casa::Record, int)
    __init__(_object*, casa::String, casa::Record, int)
    __init__(_object*, casa::String, std::vector<casa::TableProxy, std::allocator<casa::TableProxy> >)
    __init__(_object*, casa::TableProxy)
    __init__(_object*)
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
PicklingError: Can't pickle <class 'Boost.Python.ArgumentError'>: import of module Boost.Python failed

Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information
Build step 'Execute shell' marked build as failure

Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Adding one-line test results to commit status...
Setting status of 747e1c6d1d2e457946fde892852ec4bf55fafe08 to FAILURE with url https://jenkins.meqtrees.net/job/cubical-pr/70/ and message: 'Build finished. No test results found.'

Finished: FAILURE

Smells like a python-casacore bug

gijzelaerr · 2019-05-17T10:46:10Z

Which Python and which python-casacore versions are you using?

ratt-priv-ci · 2019-05-17T10:48:19Z

This is the KERN-3 casacore. Nothing has changed in the installation process: https://github.com/ratt-ru/CubiCal/blob/master/Dockerfile

gijzelaerr · 2019-05-17T11:07:22Z

This is due to unicode strings ending up in the python-casacore calls. Only the current (python-)casacore properly supports unicode strings on Python 2. If you encounter errors like this, the casacore call containing the string needs a str() wrapper, or at some point a newer (python-)casacore needs to be used.
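A minimal sketch of the str() wrapping described above, written so it is safe on both interpreters. The helper name `native_str` is hypothetical, and the assumption, based on this discussion, is that older python-casacore on Python 2 needs a byte `str` rather than `unicode` for table names. The commented call mirrors the failing `tbl(...)` line from the traceback earlier in this thread.

```python
import sys


def native_str(value):
    """Coerce text to the interpreter's native str type (hypothetical helper).

    Python 2: unicode -> UTF-8 encoded byte str.
    Python 3: bytes -> decoded str; anything else goes through str().
    """
    if sys.version_info[0] == 2:
        if isinstance(value, unicode):  # noqa: F821 (Python 2 only)
            return value.encode("utf-8")
        return str(value)
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


# Hypothetical usage, mirroring the failing call in casa_db_adaptor.py:
# with tbl(native_str(db.filename + ".%s.casa" % outname),
#          ack=False, readonly=False) as t:
#     ...
table_name = native_str(u"cubical.parmdb" + u".%s.casa" % u"G:phase")
```

Wrapping at the call site, rather than converting strings globally, keeps the change compatible with both the old and the new python-casacore.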

Unfortunately I'm now out of time to work on this.

bennahugo · 2019-05-17T11:13:06Z

Ok, I will change the build system to check out the latest casacore revision and build from there. Unfortunately this can only happen next week, as I'm busy packaging killMS at the moment.

gijzelaerr · 2019-05-17T11:16:47Z

Ideally it works with both the old and new python-casacore, it is probably an easy fix to insert some str() here and there.

bennahugo · 2019-05-17T11:19:51Z

hmm ok. I will debug this further next week

gijzelaerr · 2019-05-17T13:09:44Z

Where do you run this container, and with what command? I can't replicate your issue.

bennahugo · 2019-05-17T13:19:01Z

On the Jenkins CI with the following command line

WORKSPACE_ROOT="$WORKSPACE/$BUILD_NUMBER"
TEST_OUTPUT_DIR="$WORKSPACE_ROOT/test-output"
TEST_DATA_DIR="$WORKSPACE/../../../test-data"
mkdir -p "$TEST_OUTPUT_DIR"

# build and testrun
docker build -t "cubical:${BUILD_NUMBER}" "${WORKSPACE_ROOT}/projects/Cubical/"
docker run --rm "cubical:${BUILD_NUMBER}"

# run tests
docker run --rm -m 100g --cap-add sys_ptrace \
           --memory-swap=-1 \
           --shm-size=150g \
           --name="cubical${BUILD_NUMBER}" \
           -v "${TEST_OUTPUT_DIR}:/workspace" \
           -v "${TEST_OUTPUT_DIR}:/root/tmp" \
           --entrypoint /bin/bash \
           "cubical:${BUILD_NUMBER}" \
           -c "cd /src/cubical && apt-get install -y git && pip install -r requirements.test.txt && nosetests --with-xunit --xunit-file /workspace/nosetests.xml test"

gijzelaerr · 2019-05-17T13:25:51Z

For me the tests are running. Which specific test fails?

bennahugo · 2019-05-17T13:45:19Z

It's the main acceptance test on 147, see https://jenkins.meqtrees.net/job/cubical-pr/70/console

The latest commits don't build successfully
https://jenkins.meqtrees.net/job/cubical-pr/72/console

gijzelaerr · 2019-05-17T13:50:01Z

Ok, my guess is that this is because you use KERN-3, which has casacore 2.4.1 in it.

bennahugo · 2019-05-20T07:23:18Z

Ok, before this is merged we need to work on a fix for casacore 2. Some of the LOFAR packages do not work with the new casacore, and we need to be able to run DDFacet, killMS and CubiCal on the same installation; otherwise it becomes too messy for users.

bennahugo · 2019-05-20T07:23:47Z

Also we need to keep long term support for Ubuntu 16.04

bennahugo · 2019-05-24T18:31:36Z

See issue: casacore/python-casacore#174

o-smirnov · 2019-05-24T23:11:36Z

> Hmm, not sure I quite agree - without basic running tests in place, how do we know we don't break our hard labour that went into the python 3 mode?

Well we're not breaking it by merging surely. The codebase has been made py3-compatible, we merge it in and carry on the revolution on another branch?

bennahugo · 2019-05-28T17:16:58Z

Ok, this version runs through on 16.04 with py2 and on 18.04 with py2 and py3. However, I had to make changes to the way you compute time indices, since python3 does not accept floating point arrays as index arrays; see the ms_tile provider. I also removed the SIN projection, since the new montblanc does this internally.

Since we made so many changes to montblanc itself, I've tested it on my small DDFacet use case, and it subtracts very well, as you can see on the pull request. So I doubt this is a montblanc py3 porting issue.

The variation between 16.04 and 18.04 is at the 1e-3 level, but both now fail. Whether or not this is a substantial difference I don't know; it needs further investigation. I suggest @o-smirnov and @JSKenyon quote the relative error instead of the absolute difference, so we can better understand the significance of the difference. The 16.04 and 18.04 results:

 - 17:08:59 - main               [0.2/0.2 1.1/1.1 0.6Gb] completed successfully
*** max diff between CORRECTED_DATA and DE_DATA is 0.00152961199638
E
======================================================================

and

*** max diff between CORRECTED_DATA and DE_DATA is 0.002001586603
E
======================================================================

bennahugo · 2019-05-28T18:00:16Z

An even better idea is to write out the model and compute the chi^2 between the model and the de-corrected residuals.

For now I've implemented the mean relative error check in decibels. I've set the threshold to -30 dB on the mean relative error and -25 dB on the 95th percentile. This gives us some leeway in the tests. Currently, on the Ubuntu 18.04 system, this stands at:

 - 19:56:10 - main               [0.1/0.2 1.2/1.3 0.5Gb] completed successfully
*** mean relative diff between CORRECTED_DATA and DE_DATA is -35.98283290863037 dB
*** ninety fifth percentile relative diff between CORRECTED_DATA and DE_DATA is -31.854880104464907 dB
.
----------------------------------------------------------------------
Ran 1 test in 135.944s

OK

So I think we can merge this bastard @o-smirnov
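A sketch of a relative-error check of the kind described above. The exact definition used in the test is not shown in this thread, so this assumes the relative error is |corrected - reference| / |reference| per visibility, converted with 10*log10 (so -30 dB corresponds to a relative error of 1e-3). The function names are hypothetical.

```python
import numpy as np


def relative_error_db(corrected, reference, eps=1e-12):
    """Mean and 95th-percentile relative error in dB (10*log10 assumed)."""
    # Guard against division by (near-)zero reference visibilities.
    rel = np.abs(corrected - reference) / np.maximum(np.abs(reference), eps)
    mean_db = 10.0 * np.log10(np.mean(rel))
    p95_db = 10.0 * np.log10(np.percentile(rel, 95))
    return mean_db, p95_db


def within_tolerance(corrected, reference, mean_max_db=-30.0, p95_max_db=-25.0):
    """Pass only if both statistics fall below the -30 dB / -25 dB thresholds."""
    mean_db, p95_db = relative_error_db(corrected, reference)
    return bool(mean_db < mean_max_db and p95_db < p95_max_db)


# Usage with synthetic visibilities (not real test data).
reference = np.ones(1000, dtype=np.complex128)
close = reference * (1 + 1e-4)  # 1e-4 relative error, about -40 dB
far = reference * (1 + 1e-2)    # 1e-2 relative error, about -20 dB
```

Checking the 95th percentile as well as the mean guards against a handful of badly corrected visibilities hiding behind a good average.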

bennahugo · 2019-05-29T07:50:39Z

@JSKenyon I'll make 2 images to triple check that this didn't break anything, but -30 dB is well within the instantaneous instrumental noise, so I don't think these gain differences are substantial enough to worry about.

JSKenyon · 2019-05-29T08:47:42Z

cubical/data_handler/ms_tile.py

+                    tindx = np.add.accumulate(tindx)
+                    return tindx
+
+                self._row_identifiers = ddid_index * n_bl * ntime + timestep_index(self.times) * n_bl + \


I am not convinced by this - in my opinion we are trying to fix something which should not be broken. We need to find the root cause, otherwise we may have unexpected behaviour elsewhere. Searching for times only yields 25 results, and I cannot see any reason for self.times to become non-integer. For reference, self.time_col contains float values from the TIME column of the MS. self.times has the same shape but instead contains integer indices associating each time with the timeslot/integration to which it belongs. These indices are generated by uniquify, which is defined in __init__.py in the data_handler folder.
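To make the distinction concrete: `timestep_index` below is assembled from the diff under review, and `np.unique(..., return_inverse=True)` is used as a hypothetical stand-in for uniquify (its real implementation may differ). Both map float TIME values to integer timeslot indices, so if self.times already holds such indices there is nothing to fix downstream. Note the assumption that rows are sorted by time; the prefix scan only agrees with np.unique in that case.

```python
import numpy as np


def timestep_index(times, tol=1.0e-9):
    """Prefix-scan a time-sorted TIME array into 0, 1, 2, ... timeslot indices.

    A new index starts wherever two consecutive times differ by more than tol.
    """
    tindx = np.zeros_like(times, dtype=np.int64)
    tindx[1:] = np.abs(times[1:] - times[:-1]) > tol
    return np.add.accumulate(tindx)


# TIME column values in seconds; rows sorted by time, baselines repeat.
time_col = np.array([4.5e9, 4.5e9, 4.5e9 + 8.0, 4.5e9 + 8.0, 4.5e9 + 16.0])

# uniquify-style mapping: np.unique's inverse gives each row its timeslot index.
uniq_times, times = np.unique(time_col, return_inverse=True)
```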

It must be broken upstream of this method somewhere. I can revert the change, but if you run it with py3 it will break.

I think the safest thing is to sprinkle asserts on the data types into the codebase, as they can always be disabled (python -O) if you want full performance.

Yeah there's no way times should be float. I vote to rename it though, as from the variable name it's not obvious at all that this means "timeslot index" (anyone new to the code base will just take it to mean the plural of "time").

Ok, I found the difference: it stemmed from your n_bl calculation, which I had already fixed with // (floor division).
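A minimal reproduction of this failure mode, assuming a baseline count without autocorrelations (the real CubiCal formula for n_bl is not shown here). Under Python 3, `/` is true division, so one float factor turns the whole row-identifier array into float64, and NumPy then refuses it as an index array. A dtype assert of the kind suggested above catches it where it originates.

```python
import numpy as np

n_ant = 4
# Hypothetical baseline count, no autocorrelations.
n_bl_true = n_ant * (n_ant - 1) / 2   # Python 3 true division: 6.0 (float)
n_bl = n_ant * (n_ant - 1) // 2       # floor division: 6 (int) on Python 2 and 3

times = np.array([0, 0, 1, 1, 2], dtype=np.int64)  # integer timeslot indices
bad_rows = times * n_bl_true   # float64: one float factor changes the dtype
good_rows = times * n_bl       # stays int64

# Early dtype check (stripped under python -O) fails at the source,
# not later at the indexing site.
assert np.issubdtype(good_rows.dtype, np.integer), "row identifiers must be integers"

data = np.arange(20)
try:
    data[bad_rows]  # NumPy rejects float arrays as indices
    float_index_rejected = False
except IndexError:
    float_index_rejected = True
```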

The relative difference is still

*** mean relative diff between CORRECTED_DATA and DE_DATA is -35.98283290863037 dB
*** ninety fifth percentile relative diff between CORRECTED_DATA and DE_DATA is -31.854880104464907 dB

which points to the SIN projection you previously employed as the cause.

gijzelaerr · 2019-05-29T10:02:22Z

Or start using type annotations; mypy and PyCharm will then warn you.
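A small sketch of that suggestion, using a hypothetical `baseline_count` helper. Because the codebase still has to run on Python 2.7, it uses mypy's `# type:` comment syntax rather than Python 3 function annotations. If the body used `/` instead of `//`, mypy would report an incompatible return type (float where int is declared), which is exactly the n_bl bug found above.

```python
def baseline_count(n_ant, with_autocorr=False):
    # type: (int, bool) -> int
    """Number of baselines for n_ant antennas (hypothetical helper)."""
    # Floor division keeps the result an int on Python 2 and 3;
    # with "/" mypy would flag the returned float against the declared int.
    if with_autocorr:
        return n_ant * (n_ant + 1) // 2
    return n_ant * (n_ant - 1) // 2
```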

o-smirnov · 2019-05-29T10:03:21Z

> However, I had to make changes to the way you compute time indices, since python3 does not accept floating point arrays as index arrays; see the ms_tile provider.

As discussed above, this looks to be a manifestation of another error.

> I also removed the SIN projection, since the new montblanc does this internally.

Since the reference data was generated with the old code, the DE_DATA test (the only one using Montblanc) can be expected to fail, since the effective model source positions will have shifted slightly. If you're right, the old code uses slightly incorrect positions, and the new code will subtract "better".

The way to verify that the new code is "better" (or at least "as correct") is to run the test with a full MS (in C and D config), and eyeball the residual images.

bennahugo · 2019-05-29T10:36:00Z

@o-smirnov I've already verified that montblanc subtracts correctly at MeerKAT resolution, which is comparable to VLA C.

bennahugo · 2019-05-29T10:39:25Z

See ratt-ru/montblanc#244

bennahugo · 2019-05-29T10:54:18Z

I still maintain that the differences in projection are well within the visibility error bars. However, I'm going to merge montblanc master into the branch and redo the DDF MK deep2 cleaning test, which is at an order of magnitude higher resolution than VLA D configuration.

bennahugo · 2019-05-29T14:38:46Z

@JSKenyon this last commit fixes your Cythonization during wheel building. It is also invoked by a standard python setup.py install, which ensures wheels can be built directly from the raw release code. I prefer this method because it means you can call pip install git+.... or pip install directly from the source on PyPI. It also means that your source distribution on PyPI can have a directly corresponding tarball on GitHub, without needing to include the C/C++ files in one and not the other.

JSKenyon · 2019-05-31T09:28:05Z

Ok, I am sort of convinced that this is in a working state for both 2.7 and 3.6. However, post merge there are several additional fixes which need to be made. I will make issues for them, but for the sake of my memory I will also mention them here. @o-smirnov currently postmortem flagging breaks in flag3_to_col, and we need to verify flagging behaviour, as my current MWE ends up with unflagged bad data.

o-smirnov · 2019-05-31T09:59:58Z

Postmortem has been broken for a long long time. We need a separate branch to fix it. If it's even worth fixing, which I'm not too sure about. What you should try is --g-max-prior-error and --g-max-post-error. These are now off (0) by default. A setting of 0.1 is appropriate for well behaved data with the occasional outlier. Cheers, Oleg

JSKenyon · 2019-05-31T10:18:28Z

@o-smirnov that's just it - I did have those set. Perhaps my parset is outdated and there is some other setting I have missed. I made a local change to the postmortem flagging code to make it run, and that produced sensible results. I am just a little concerned that flags are not propagating as expected. But it is not needed in this PR - we can look at fixing up all the little issues after the merge.

bennahugo · 2019-05-31T12:10:26Z

@JSKenyon can you submit all your changes so I can do one final check before we press the (red) merge button

JSKenyon · 2019-05-31T13:02:42Z

@bennahugo I have already pushed the minor changes I made. See the last two commits. So feel free to go ahead.

gijzelaerr added 7 commits May 15, 2019 11:59

run 2to3

64e4eb5

ignore virtualen vstuff

6b0e1a5

configparser is stricter

4a94c61

fix print sytnax

b377357

porting

9e4ba91

improve compat with older python-casacore

8a421e4

more old python-casascore compat code

747e1c6

gijzelaerr requested review from o-smirnov, JSKenyon and bennahugo May 16, 2019 11:35

gijzelaerr mentioned this pull request May 16, 2019

Make Python3 compatible #245

Closed

finishing touch

84f7710

bennahugo requested changes May 16, 2019

add ben feedback

1a99555

Ensure CC 3.0 is installed

b55a6a4

Py2 and Py3 compatible changes

9afe9b0

bennahugo mentioned this pull request May 28, 2019

Remove spectral index and allow input of stokes varying by source, time and channel. ratt-ru/montblanc#244

Merged

Make test relative

91dc275

JSKenyon reviewed May 29, 2019

Add asserts to verify types in ms_tile

15ccd33

fix pip install from a non-cythonized source directory

2723e27

JSKenyon added 2 commits May 30, 2019 10:02

Fix dtype error on phase centre array.

1ba4fef

Fix for python2 bombing when inseting BITFLAG column.

b213dcb

ratt-priv-ci added 3 commits May 31, 2019 15:47

Depend on tagged release of montblanc

285c9ac

typo

036d385

Done and dusted py3 tested

d0909b9

bennahugo merged commit 716cdb7 into master May 31, 2019

bennahugo deleted the py3_v2 branch May 31, 2019 16:56
