Fitter API and make test cases work #289
Conversation
pahfit/apfitter.py
Can we create a new directory/module `pahfit.fitters` to place the fitters? And each can have a directory in there, to place components and other files they need.
I created `pahfit/fitters` (with an `__init__.py`), and did the following moves:

- `pahfit.fitter` -> `pahfit.fitters.fitter`
- `pahfit.apfitter` -> `pahfit.fitters.ap_fitter`
- `pahfit.component_models` -> `pahfit.fitters.ap_components`
Oops hit review again.
pahfit/fitter.py
```python
    pass

    @abstractmethod
    def evaluate_model(self, xz):
```
Please just `evaluate`. This is going to make it quite hard to do `tabulate`, I guess, since it's just giving "the whole model". Can you pass whatever wavelengths you want? How would that work with a model which contains "duplicate" features for JointFitter types? How does it know which of those features to use? I strongly suspect this means we need to move this kind of logic out into `tabulate`.
Philosophical question: can a `Fitter`, the same `Fitter` you earlier fit data with, be used to `evaluate_model` on a different set of wavelengths? It has all its own internal ideas of FWHM for lines and everything, so in principle it could. But what about multiple segments, with multiple values of FWHM for the same line? Maybe the rule is: you must use the same exact bounds (you can increase the point density if you want), and the same shape, e.g. a list of N segments. Enforced, or convention?
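If it were enforced, the "same bounds, any density" rule could be a small check along these lines. A hypothetical sketch, not PAHFIT API: the function name and the representation of fitted bounds as `(lo, hi)` pairs are assumptions.

```python
import numpy as np


def check_eval_grids(fit_bounds, grids):
    """Hypothetical check: evaluation grids must have the same shape
    (one grid per fitted segment) and the same exact bounds as the
    fitted data; only the point density may differ."""
    if len(grids) != len(fit_bounds):
        raise ValueError("need one wavelength grid per fitted segment")
    for (lo, hi), lam in zip(fit_bounds, grids):
        lam = np.asarray(lam)
        if not (np.isclose(lam[0], lo) and np.isclose(lam[-1], hi)):
            raise ValueError("evaluation grid must span the fitted bounds")


# a denser grid over the same segment bounds is allowed
check_eval_grids([(5.0, 12.0)], [np.linspace(5.0, 12.0, 500)])
```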
I do not have a clear answer for this.

For multiple segments, I speculate that there would be multiple model functions, each one to be compared to a certain segment during the fitting (and whose chi2 values are summed together). In that case, an index could be passed, to indicate which part of the "multi-function" should be evaluated.

I think we should first work out how we can reuse the `Fitter` to tabulate individual or groups of components. Perhaps passing a list of feature names to `evaluate()` would be an option? In that case, the evaluation can be generic (implemented in the ABC `Fitter` using the generic component evaluation functions) or specific (making use of the individual astropy components) as necessary.

If we want to avoid duplicating any logic, having it in `Model.tabulate` somehow, or the `Fitter` ABC, would be best.
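A toy sketch of the feature-name idea, assuming a generic Gaussian component evaluation function and a plain dict of features; none of these names are the actual PAHFIT API.

```python
import numpy as np


def gaussian(lam, amplitude, mean, fwhm):
    # generic component evaluation function (Gaussian in wavelength)
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    return amplitude * np.exp(-0.5 * ((lam - mean) / sigma) ** 2)


def evaluate(features, lam, feature_names=None):
    """Hypothetical generic evaluate(): sum only the requested features.

    `features` maps a feature name to (component function, parameter dict)."""
    names = features.keys() if feature_names is None else feature_names
    total = np.zeros_like(lam, dtype=float)
    for name in names:
        func, params = features[name]
        total += func(lam, **params)
    return total


lam = np.linspace(5.0, 8.0, 100)
features = {
    "PAH_6.2": (gaussian, dict(amplitude=1.0, mean=6.22, fwhm=0.2)),
    "PAH_7.7": (gaussian, dict(amplitude=2.0, mean=7.7, fwhm=0.7)),
}
one = evaluate(features, lam, ["PAH_6.2"])   # individual component
both = evaluate(features, lam)               # full model
```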
This, I think, is an argument in favor of implementing the standard set of `Fitter.gaussian/drude/blackbody` etc. components. There are really only a few of those. Then we could do away with `Fitter.evaluate` altogether. Evaluation is not (really) the `Fitter`'s job. The fitter's job is to do the fit internally, using whatever methods and data structures it wants, and return the results to us.

Beyond that, for `Model`'s convenience, we'd also like each `Fitter` to provide access to the component functions with specified arguments (units, order, etc.) and output units. Maybe for simplicity and convenience we can ask it for a full `Fitter.attenuation_function`, providing to that method `geometry`, `model`, `tau_si`, and details for other attenuation `tau`'s, by which we can multiply our spectra, or components of our spectra (TBD: increase the `tau` FWHMs for LSF: Sara Profile again).

With this in place, `tabulate` is all we need, and it can be generic. It just looks at the (set of) spectra it's being called with, does the same parameter marrying with the instrument pack that it does for `Fitter.fit`, and sums up the various `fitter.drude(x, y, z)`, etc., finally multiplying by the `attenuation_function`.

Yes, this is additional work that is broadly similar to what each `Fitter` is doing internally, but on another level it serves as a good "output comprehension" test. If you use the `Fitter`'s results to tabulate the model and it doesn't actually match the spectrum, you have not understood the output.

We could even add a test demanding that all `Fitter`s, when given some standard input, produce ~the same output for each of the component types.
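Roughly, the shape proposed here could look like the following sketch: only `gaussian` and `drude` are shown, all signatures are assumptions, and the attenuation is a flat placeholder for illustration only.

```python
import numpy as np
from abc import ABC, abstractmethod


class Fitter(ABC):
    """Sketch: each Fitter exposes the standard component functions,
    while tabulation itself lives outside the Fitter."""

    @abstractmethod
    def fit(self, lam, flux, unc):
        """Do the fit internally, with whatever methods the backend wants."""

    def gaussian(self, lam, amplitude, mean, fwhm):
        # generic fallback implementation in the ABC
        sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
        return amplitude * np.exp(-0.5 * ((lam - mean) / sigma) ** 2)

    def drude(self, lam, amplitude, center, fwhm):
        # Drude profile, commonly used for the PAH features
        frac = fwhm / center
        return amplitude * frac**2 / ((lam / center - center / lam) ** 2 + frac**2)

    def attenuation_function(self, lam, tau_si=0.0, geometry="screen"):
        # placeholder screen-geometry attenuation (flat tau; illustration only)
        return np.exp(-tau_si) * np.ones_like(lam)


def tabulate(fitter, lam, components):
    """Generic tabulate: sum the requested component functions and
    multiply by the attenuation. `components` is a list of
    (component kind, parameter dict) pairs."""
    total = sum(getattr(fitter, kind)(lam, **params) for kind, params in components)
    return total * fitter.attenuation_function(lam)
```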
pahfit/model.py
```python
        loop_over_non_fixed("line", "fwhm", line_fwhm_guess, force=True)
        for row_index in np.where(self.features["kind"] == "line")[0]:
            row = self.features[row_index]
            if np.ma.is_masked(row["fwhm"]):
```
Does that work with a structured array?
As far as I know, the element (`features[row_index][col_name]`) can still be masked in its entirety. I.e., it can be equal to `np.ma.masked`, or it can be a structured array element (the one we defined with 'val', 'min', and 'max').
How would a structured array have just a single value, though? I thought our method for "fully masking" a given parameter was to mask the value?
E.g.

```python
In [12]: f.loc['PAH_6.2']['wavelength']
Out[12]: (6.22, nan, nan)

# "starlight" does not have the "wavelength" parameter -> masked
In [9]: f.loc['starlight']['wavelength']
Out[9]: masked

In [14]: np.ma.is_masked(f.loc['starlight']['wavelength'])
Out[14]: True
```
I think that's just for display. Underlying that is still a structured array member. I believe masking any of the structured array members results in this.
I ran into a bug due to this. Apparently, in our current setup, `row['fwhm']` can be either 0-dimensional or 1-dimensional (with `row` a row of the features table).

This is a problem, because if we want to check whether 'fwhm' is masked, then `row['fwhm'][0]` will yield an error if `row['fwhm']` is 0-dimensional. Vice versa, in the 1-dimensional case, we also get an error if we try to use `np.ma.is_masked(row['fwhm'])`.

A workaround would be checking `row['fwhm'].ndim` each time.
To be clear, if the above displays `masked`, then we have the 0-dimensional case. So yes, it really is a single value somehow. Even more striking: if I check the data attribute, I get this:

```python
ipdb> p row['fwhm'].data
array(0.)
```
Yeah, I confirm that masked values in structured arrays show up as `numpy.ma.core.MaskedConstant`. So I guess we should always first check if a value is masked, using a simple:

```python
import numpy as np
from astropy.table import Table

PAR_DTYPE = np.dtype([("val", "f"), ("min", "f"), ("max", "f")])
a = np.ma.array(
    [(np.pi, 2, 4.5), (np.pi / 2, 1, np.nan), (4, np.nan, np.nan)],
    dtype=PAR_DTYPE,
    mask=[1, 0, 0],
)

t = Table()
t["power"] = a
t.pprint()

t[0]["power"] is np.ma.masked  # True
t[1]["power"] is np.ma.masked  # False
```
```python
    flux = flux_obs * (1 + z)  # energy higher
    unc = unc_obs * (1 + z)  # uncertainty scales with flux
    return lam_obs, flux_obs, unc_obs, lam, flux, unc
```
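For context, a self-contained sketch of the conversion this excerpt belongs to, assuming the wavelength counterpart of the flux scaling is division by (1 + z); the function name is hypothetical.

```python
import numpy as np


def unshift(lam_obs, flux_obs, unc_obs, z):
    """Hypothetical sketch of the observed-to-rest-frame conversion:
    wavelengths contract by (1 + z), while the flux density and its
    uncertainty scale up by (1 + z), matching the snippet above."""
    lam = lam_obs / (1 + z)     # assumed counterpart to the flux scaling
    flux = flux_obs * (1 + z)   # energy higher
    unc = unc_obs * (1 + z)     # uncertainty scales with flux
    return lam, flux, unc
```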
See #294.
This has now been fixed (see the latest commits pushed). I have also gone through most of the other comments, so we should be close to merging this, and then starting a new pull request to integrate my implementation of the power features. A few things revealed by the tests:
Thanks for getting back into it, Dries. My server has been cut off at the firewall; working to restore access. I agree about the CI. Do we need to solve #294 now too, given all the redshift discussion?
Update: my server will unfortunately not be allowed back on the external internet (new VPN-only policy); I've uploaded "PAHFIT classic" to https://github.com/PAHFIT/pahfit_classic. Can you update the internal links to point there? Aside from the redshift discussion, which seems to have wound down, what else is needed on this PR?
In addition to your website, there is an additional documentation issue I discovered, so I just pushed two new link fixes. The biggest issue still remaining is cases like this:

```python
for row_index in np.where(self.features["kind"] == "line")[0]:
    row = self.features[row_index]
    if np.ma.is_masked(row["fwhm"]):
```

As in the comments above, `row["fwhm"]` can be a single element or a tuple (so 0D or 1D). I ran into this issue while running one of my demo notebooks. There is no "readable" check that works for both. But I have a hacky workaround, which replaces some of the
Did you see my
Looks promising, based on some tests in the Python shell. I will give it a shot and see if I can get my notebook to run.
Seems to work fine for now. But at some point this logic will have to be revised, when we set up a clear way to allow users to fit the FWHM of specific lines. That will be addressed by #293. Let's leave the redshift issue as is for now, so we can finish the work on the power features too. Once this has been merged, I can rebase my power features branch and spin up the next pull request.
OK, great, shall we go ahead and merge this to
I think this is good to go. I will be actively using the dev branch (with my power features applied), so I will see if anything pops up along the way.
Here is the next step for #280. Aside from the change in the features table introduced by #283, the user-facing `Model` things are basically unchanged (unless users were "digging for the astropy model" to do certain things).

For deep debugging or testing, one can still get direct access to the astropy stuff via `model.fitter.model` (`model` is a `Model`, `model.fitter` is an `APFitter`, and `model.fitter.model` is the underlying astropy model object).

You can look at the implementation details of `APFitter` to get a feel for how it works. But it is more important to take a good look at `Fitter`, and how its functions are used in `Model`.

I also addressed the issues with the test cases (avoiding the use of `.loc`), and they seem to be running smoothly.

Besides the complete removal of `base.py`, a reference to `PAHFITBase` in the documentation was also removed.

After this, only one more big change is needed: `PowerGaussian` and `PowerDrude`, and the change from amplitude to power units for the `Features` table.