
Fitter API and make test cases work #289

Merged: 27 commits merged into dev from dev-fitterapi on Jul 9, 2024

Conversation

@drvdputt (Contributor):

Here is the next step for #280. Aside from the change to the features table introduced by #283, the user-facing Model behavior is basically unchanged (unless users were "digging for the astropy model" to do certain things).

For deep debugging or testing, one can still get direct access to the astropy stuff via model.fitter.model (model is a Model, model.fitter is an APFitter, model.fitter.model is the underlying astropy model object).
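For illustration, a minimal sketch of that access chain (assuming model is a Model instance that has already been fit; attribute names follow the description above):

    # Illustrative only: dig down to the astropy machinery for debugging.
    apfitter = model.fitter             # the APFitter wrapping the astropy fit
    astropy_model = model.fitter.model  # the underlying astropy model object
    print(astropy_model.param_names)    # e.g. inspect the raw parameter names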

You can look at the implementation details of APFitter to get a feel for how it works. But it is more important to take a good look at Fitter, and how its functions are used in Model.

I also addressed the issues with the test cases (avoiding the use of .loc), and they seem to be running smoothly.

Besides the complete removal of base.py, a reference to PAHFITBase in the documentation was also removed.

After this, only one more big change is needed: PowerGaussian and PowerDrude, and the change from amplitude to power units for the Features table.

@jdtsmith (Contributor), May 14, 2024:

Can we create a new directory/module, pahfit.fitters, to place the fitters? Each fitter could then have its own subdirectory there for components and other files it needs.

@drvdputt (Contributor, Author):

I created pahfit/fitters (with an __init__.py) and made the following moves:

pahfit.fitter -> pahfit.fitters.fitter
pahfit.apfitter -> pahfit.fitters.ap_fitter
pahfit.component_models -> pahfit.fitters.ap_components
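For reference, the new import paths would then look something like this (illustrative sketch; only the class names mentioned in this thread are shown):

    # New locations after the restructuring (old paths in the comments):
    from pahfit.fitters.fitter import Fitter       # was pahfit.fitter
    from pahfit.fitters.ap_fitter import APFitter  # was pahfit.apfitter
    from pahfit.fitters import ap_components       # was pahfit.component_models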

Resolved (outdated) review threads on: pahfit/fitter.py (×2), pahfit/apfitter.py
@jdtsmith (Contributor) left a comment:

Oops hit review again.

Resolved (outdated) review threads on: pahfit/apfitter.py, pahfit/fitter.py (×8)

pahfit/fitter.py (outdated excerpt):
pass

@abstractmethod
def evaluate_model(self, xz):
@jdtsmith (Contributor):

Please just evaluate. This is going to make it quite hard to do tabulate I guess, since it's just giving "the whole model". Can you pass whatever wavelengths you want? How would that work with a model which contains "duplicate" features for JointFitter types? How does it know which of those features to use? I strongly suspect this means we need to move this kind of logic out into tabulate.

@jdtsmith (Contributor):

Philosophical question: can a Fitter, the same Fitter you earlier fit data with, be used to evaluate_model on a different set of wavelengths? It has all its own internal ideas of FWHM for lines and everything, so in principle it could. But what about multiple segments, with multiple values of FWHM for the same line? Maybe the rule is: you must use the same exact bounds (you can increase the point density if you want) and the same shape (e.g., a list of N segments). Enforced, or convention?

@drvdputt (Contributor, Author):

I do not have a clear answer for this.

For multiple segments, I speculate that there would be multiple model functions, each one to be compared to a certain segment during the fitting (and whose chi2 values are summed together). In that case, an index could be passed to indicate which part of the "multi-function" should be evaluated.

I think we should first work out how we can reuse the Fitter to tabulate individual components or groups of components. Perhaps passing a list of feature names to evaluate() would be an option? In that case, the evaluation can be generic (implemented in the ABC Fitter using the generic component evaluation functions) or specific (making use of the individual astropy components) as necessary.

If we want to avoid duplicating any logic, having it in Model.tabulate somehow, or the Fitter ABC, would be best.
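As a rough illustration of the "list of feature names" option mentioned above (a hypothetical signature only, not an implemented API):

    # Hypothetical sketch: evaluate could accept an optional list of feature
    # names, so Model.tabulate can ask for individual or grouped components.
    from abc import ABC, abstractmethod

    class Fitter(ABC):
        @abstractmethod
        def evaluate(self, lam, feature_names=None):
            """Evaluate the full model at lam, or only the named features."""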

@jdtsmith (Contributor):

This I think is an argument in favor of implementing the standard set of Fitter.gaussian/drude/blackbody etc. components. There are really only a few of those. Then we could do away with Fitter.evaluate altogether. Evaluation is not (really) the Fitter's job. The fitter's job is to do the fit internally using whatever methods and data structures it wants, and return to us the results.

Beyond that, for Model's convenience, we'd also like each Fitter to provide access to the component functions with specified arguments (units, order, etc.) and output units. Maybe for simplicity and convenience we can ask it for a full Fitter.attenuation_function, providing to that method geometry, model, tau_si, and details for other attenuation tau's, by which we can multiply our spectra, or components of our spectra (TBD: increase the tau FWHMs for LSF: Sara Profile again).

With this in place, tabulate is all we need, and it can be generic. It just looks at the (set of) spectra it's being called with, does the same parameter marrying with the instrument pack that it does for Fitter.fit, and sums up the various fitter.drude(x,y,z), etc., finally multiplying by the attenuation_function.

Yes, this is additional work that is broadly similar to what each Fitter is doing internally, but on another level it serves as a good "output comprehension" test. If you use the Fitter's results to tabulate the model and it doesn't actually match the spectrum, you have not understood the output.

We can even add a test demanding that all Fitters, when given some standard input, produce the ~same output for each of the component types.
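As a very rough sketch of what such a generic tabulate could look like (the method names, arguments, and column access are illustrative assumptions, not the actual interface):

    # Hypothetical sketch only: sum up per-component Fitter calls, then apply
    # the attenuation. Column names and Fitter method signatures are assumed.
    import numpy as np

    def tabulate_sketch(fitter, features, lam):
        total = np.zeros_like(lam, dtype=float)
        for row in features:
            if row["kind"] == "line":
                total += fitter.gaussian(lam, row["power"]["val"],
                                         row["wavelength"]["val"], row["fwhm"]["val"])
            elif row["kind"] == "dust_feature":
                total += fitter.drude(lam, row["power"]["val"],
                                      row["wavelength"]["val"], row["fwhm"]["val"])
            elif row["kind"] == "dust_continuum":
                total += fitter.blackbody(lam, row["temperature"]["val"], row["tau"]["val"])
        return total * fitter.attenuation_function(lam)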

Resolved (outdated) review threads on: pahfit/fitter.py (×4)

pahfit/model.py (outdated excerpt):
loop_over_non_fixed("line", "fwhm", line_fwhm_guess, force=True)
for row_index in np.where(self.features["kind"] == "line")[0]:
    row = self.features[row_index]
    if np.ma.is_masked(row["fwhm"]):
@jdtsmith (Contributor):
Does that work with a structured array?

@drvdputt (Contributor, Author):

As far as I know, the element (features[row_index][col_name]) can still be masked in its entirety. I.e., it can be equal to np.ma.masked, or it can be a structured array element (the one we defined with 'val', 'min', and 'max').

@jdtsmith (Contributor):

How would a structured array have just a single value, though? I thought our method for "fully masking" a given parameter is to mask the value?

@drvdputt (Contributor, Author):

E.g.

In [12]: f.loc['PAH_6.2']['wavelength']
Out[12]: (6.22, nan, nan)

# "starlight" does not have the "wavelength" parameter -> masked
In [9]: f.loc['starlight']['wavelength']
Out[9]: masked

In [14]: np.ma.is_masked(f.loc['starlight']['wavelength'])
Out[14]: True

@jdtsmith (Contributor):

I think that's just for display. Underlying that is still a structured array member. I believe masking any of the structured array members results in this.

@drvdputt (Contributor, Author):

I ran into a bug due to this. Apparently, in our current setup, row['fwhm'] can be either 0-dimensional or 1-dimensional (with row being a row of the features table).

This is a problem: if we want to check whether 'fwhm' is masked, row['fwhm'][0] raises an error when row['fwhm'] is 0-dimensional. Conversely, in the 1-dimensional case, we also get an error if we try to use np.ma.is_masked(row['fwhm']).

A workaround would be checking row['fwhm'].ndim each time.

@drvdputt (Contributor, Author):

To be clear, if the above displays masked, then we have the 0-dimensional case. So yes, it really is a single value somehow. Even more striking: if I check the data attribute, I get this:

ipdb>  p row['fwhm'].data

array(0.)

@jdtsmith (Contributor), Jul 3, 2024:

Yeah, I can confirm that masked values in structured arrays show up as numpy.ma.core.MaskedConstant. So I guess we should always first check whether a value is masked, using a simple check like:

import numpy as np
from astropy.table import Table

# Structured (val, min, max) parameter column; the first row is fully masked.
PAR_DTYPE = np.dtype([("val", "f"), ("min", "f"), ("max", "f")])
a = np.ma.array(
    [(np.pi, 2, 4.5), (np.pi / 2, 1, np.nan), (4, np.nan, np.nan)],
    dtype=PAR_DTYPE, mask=[1, 0, 0],
)
t = Table()
t["power"] = a
t.pprint()

# A fully masked cell comes back as the masked constant, so an identity check works:
t[0]["power"] is np.ma.masked  # True
t[1]["power"] is np.ma.masked  # False

Resolved (outdated) review threads on: pahfit/model.py (×2)

Comment on lines +333 to +335:
flux = flux_obs * (1 + z) # energy higher
unc = unc_obs * (1 + z) # uncertainty scales with flux
return lam_obs, flux_obs, unc_obs, lam, flux, unc
@jdtsmith (Contributor):

See #294.

@drvdputt (Contributor, Author):

Lines 183-186 of model.py:

        inst, z = self._parse_instrument_and_redshift(spec, redshift)
        # save these as part of the model (will be written to disk too)
        self.features.meta["redshift"] = inst
        self.features.meta["instrument"] = z

Is this correct? It seems that these assignments for redshift and instrument are flipped.
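For clarity, the fix is simply to swap the two assignments:

        self.features.meta["instrument"] = inst
        self.features.meta["redshift"] = z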

This has now been fixed (see latest commits pushed).

I have also gone through most of the other comments. So we should be close to merging this, and then starting a new pull request to integrate my implementation of the power-features.

A few things revealed by the tests:

  • @jdtsmith Linkcheck says the link to your website is broken (this one: http://tir.astro.utoledo.edu/jdsmith/research/pahfit.php)
  • Codecov is complaining about needing a token. Not super important right now. I think we should fix all our continuous integration services in a single pull request. I will take another look at the reports and summarize in an issue.
  • I did a second push to fix a double space somewhere.

@jdtsmith (Contributor):

Thanks for getting back into it, Dries. My server has been cut off at the firewall; I'm working to restore access. I agree about the CI. Do we need to solve #294 now too, given all the redshift discussion?

@jdtsmith (Contributor) commented Jul 3, 2024:

Update: my server will unfortunately not be allowed back on the external internet (new VPN-only policy); I've uploaded "PAHFIT classic" to https://github.com/PAHFIT/pahfit_classic. Can you update the internal links to point there?

Aside from the redshift discussion which seems to have wound down, what else is needed on this PR?

@drvdputt (Contributor, Author) commented Jul 3, 2024:

Besides your website, there is one more documentation link issue I discovered. So I just pushed two new link fixes.

The biggest issue still remaining is cases like this:

            for row_index in np.where(self.features["kind"] == "line")[0]:
                row = self.features[row_index]
                if np.ma.is_masked(row["fwhm"]):

As in the comments above, row["fwhm"] can be a single element or a tuple (so 0D or 1D). I ran into this issue while running one of my demo notebooks. There is no "readable" check that works for both. But I have a hacky workaround, which replaces some of the np.ma.is_masked checks with .ndim == 0.
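For concreteness, the workaround amounts to something like this (hypothetical helper name, relying on the observation above that a fully masked cell comes back 0-dimensional):

    # Hypothetical sketch of the .ndim-based workaround described above.
    def fwhm_is_masked(row):
        # A fully masked 'fwhm' cell is 0-dimensional; an unmasked one is the
        # 1-dimensional (val, min, max) structured element.
        return row["fwhm"].ndim == 0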

@jdtsmith (Contributor) commented Jul 9, 2024:

But I have a hacky workaround, which replaces some of the np.ma.is_masked checks with .ndim == 0.

Did you see my row["fwhm"] is ma.masked method above? That seems to be the clearest and works for scalars and 1D things equally.

@drvdputt (Contributor, Author) commented Jul 9, 2024:

Looks promising, based on some tests in the Python shell. I will give it a shot and see if I can get my notebook to run.

@drvdputt (Contributor, Author) commented Jul 9, 2024:

Seems to work fine for now. But at some point, this logic will have to be revised, when we set up a clear way to allow users to fit the FWHM of specific lines. That will be addressed by #293.

Let's leave the redshift issue as is for now, so we can finish the work on the power features too. Once this has been merged, I can rebase my power features branch, and spin up the next pull request.

@jdtsmith (Contributor) commented Jul 9, 2024:

OK, great, shall we go ahead and merge this to dev?

@drvdputt (Contributor, Author) commented Jul 9, 2024:

I think this is good to go. I will be actively using the dev branch (with my power features applied), so I will see if anything pops up along the way.

@jdtsmith merged commit 50c19b7 into PAHFIT:dev on Jul 9, 2024
13 of 15 checks passed
@drvdputt deleted the dev-fitterapi branch on July 9, 2024