
Fitter API and make test cases work #289

Merged: 27 commits merged into dev from dev-fitterapi on Jul 9, 2024

Conversation

@drvdputt (Contributor):

Here is the next step for #280. Aside from the change to the features table introduced by #283, the user-facing Model behavior is basically unchanged (unless users were "digging for the astropy model" to do certain things).

For deep debugging or testing, one can still get direct access to the astropy stuff via model.fitter.model (model is a Model, model.fitter is an APFitter, model.fitter.model is the underlying astropy model object).
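For illustration, a minimal sketch of that access chain (assuming model is a Model instance that has already been fit; attribute names follow the description above):

    # Illustrative only: dig down to the astropy machinery for debugging.
    apfitter = model.fitter             # the APFitter wrapping the astropy fit
    astropy_model = model.fitter.model  # the underlying astropy model object
    print(astropy_model.param_names)    # e.g. inspect the raw parameter names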

You can look at the implementation details of APFitter to get a feel for how it works. But it is more important to take a good look at Fitter, and how its functions are used in Model.

I also addressed the issues with the test cases (avoiding the use of .loc), and they seem to be running smoothly.

Besides the complete removal of base.py, a reference to PAHFITBase in the documentation was also removed.

After this, only one more big change is needed: PowerGaussian and PowerDrude, and the change from amplitude to power units for the Features table.

@jdtsmith (Contributor), May 14, 2024:

Can we create a new directory/module, pahfit.fitters, to place the fitters? Each fitter could then have its own subdirectory there for components and other files it needs.

@drvdputt (Contributor, Author):

I created pahfit/fitters (with an __init__.py) and made the following moves:

pahfit.fitter -> pahfit.fitters.fitter
pahfit.apfitter -> pahfit.fitters.ap_fitter
pahfit.component_models -> pahfit.fitters.ap_components
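For reference, the new import paths would then look something like this (illustrative sketch; only the class names mentioned in this thread are shown):

    # New locations after the restructuring (old paths in the comments):
    from pahfit.fitters.fitter import Fitter       # was pahfit.fitter
    from pahfit.fitters.ap_fitter import APFitter  # was pahfit.apfitter
    from pahfit.fitters import ap_components       # was pahfit.component_models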

Resolved (outdated) review threads on: pahfit/fitter.py (×2), pahfit/apfitter.py
@jdtsmith (Contributor) left a comment:

Oops hit review again.

Resolved (outdated) review threads on: pahfit/apfitter.py, pahfit/fitter.py (×8)

pahfit/fitter.py (outdated excerpt):
pass

@abstractmethod
def evaluate_model(self, xz):
@jdtsmith (Contributor):

Please just evaluate. This is going to make it quite hard to do tabulate I guess, since it's just giving "the whole model". Can you pass whatever wavelengths you want? How would that work with a model which contains "duplicate" features for JointFitter types? How does it know which of those features to use? I strongly suspect this means we need to move this kind of logic out into tabulate.

@jdtsmith (Contributor):

Philosophical question: can a Fitter, the same Fitter you earlier fit data with, be used to evaluate_model on a different set of wavelengths? It has all its own internal ideas of FWHM for lines and everything, so in principle it could. But what about multiple segments, with multiple values of FWHM for the same line? Maybe the rule is: you must use the same exact bounds (you can increase the point density if you want) and the same shape (e.g., a list of N segments). Enforced, or convention?

@drvdputt (Contributor, Author):

I do not have a clear answer for this.

For multiple segments, I speculate that there would be multiple model functions, each one to be compared to a certain segment during the fitting (and whose chi2 values are summed together). In that case, an index could be passed to indicate which part of the "multi-function" should be evaluated.

I think we should first work out how we can reuse the Fitter to tabulate individual components or groups of components. Perhaps passing a list of feature names to evaluate() would be an option? In that case, the evaluation can be generic (implemented in the ABC Fitter using the generic component evaluation functions) or specific (making use of the individual astropy components) as necessary.

If we want to avoid duplicating any logic, having it in Model.tabulate somehow, or the Fitter ABC, would be best.
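As a rough illustration of the "list of feature names" option mentioned above (a hypothetical signature only, not an implemented API):

    # Hypothetical sketch: evaluate could accept an optional list of feature
    # names, so Model.tabulate can ask for individual or grouped components.
    from abc import ABC, abstractmethod

    class Fitter(ABC):
        @abstractmethod
        def evaluate(self, lam, feature_names=None):
            """Evaluate the full model at lam, or only the named features."""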

@jdtsmith (Contributor):

This I think is an argument in favor of implementing the standard set of Fitter.gaussian/drude/blackbody etc. components. There are really only a few of those. Then we could do away with Fitter.evaluate altogether. Evaluation is not (really) the Fitter's job. The fitter's job is to do the fit internally using whatever methods and data structures it wants, and return to us the results.

Beyond that, for Model's convenience, we'd also like each Fitter to provide access to the component functions with specified arguments (units, order, etc.) and output units. Maybe for simplicity and convenience we can ask it for a full Fitter.attenuation_function, providing to that method geometry, model, tau_si, and details for other attenuation tau's, by which we can multiply our spectra, or components of our spectra (TBD: increase the tau FWHMs for LSF: Sara Profile again).

With this in place, tabulate is all we need, and it can be generic. It just looks at the (set of) spectra it's being called with, does the same parameter marrying with the instrument pack that it does for Fitter.fit, and sums up the various fitter.drude(x,y,z), etc., finally multiplying by the attenuation_function.

Yes, this is additional work that is broadly similar to what each Fitter is doing internally, but on another level it serves as a good "output comprehension" test. If you use the Fitter's results to tabulate the model and it doesn't actually match the spectrum, you have not understood the output.

We can even add a test demanding that all Fitters, when given some standard input, produce the ~same output for each of the component types.
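As a very rough sketch of what such a generic tabulate could look like (the method names, arguments, and column access are illustrative assumptions, not the actual interface):

    # Hypothetical sketch only: sum up per-component Fitter calls, then apply
    # the attenuation. Column names and Fitter method signatures are assumed.
    import numpy as np

    def tabulate_sketch(fitter, features, lam):
        total = np.zeros_like(lam, dtype=float)
        for row in features:
            if row["kind"] == "line":
                total += fitter.gaussian(lam, row["power"]["val"],
                                         row["wavelength"]["val"], row["fwhm"]["val"])
            elif row["kind"] == "dust_feature":
                total += fitter.drude(lam, row["power"]["val"],
                                      row["wavelength"]["val"], row["fwhm"]["val"])
            elif row["kind"] == "dust_continuum":
                total += fitter.blackbody(lam, row["temperature"]["val"], row["tau"]["val"])
        return total * fitter.attenuation_function(lam)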

Resolved (outdated) review threads on: pahfit/fitter.py (×4)

pahfit/model.py (outdated excerpt):
loop_over_non_fixed("line", "fwhm", line_fwhm_guess, force=True)
for row_index in np.where(self.features["kind"] == "line")[0]:
    row = self.features[row_index]
    if np.ma.is_masked(row["fwhm"]):
@jdtsmith (Contributor):
Does that work with a structured array?

@drvdputt (Contributor, Author):

As far as I know, the element (features[row_index][col_name]) can still be masked in its entirety. I.e., it can be equal to np.ma.masked, or it can be a structured array element (the one we defined with 'val', 'min', and 'max').

@jdtsmith (Contributor):

How would a structured array have just a single value, though? I thought our method for "fully masking" a given parameter is to mask the value?

@drvdputt (Contributor, Author):

E.g.

In [12]: f.loc['PAH_6.2']['wavelength']
Out[12]: (6.22, nan, nan)

# "starlight" does not have the "wavelength" parameter -> masked
In [9]: f.loc['starlight']['wavelength']
Out[9]: masked

In [14]: np.ma.is_masked(f.loc['starlight']['wavelength'])
Out[14]: True

@jdtsmith (Contributor):

I think that's just for display. Underlying that is still a structured array member. I believe masking any of the structured array members results in this.

@drvdputt (Contributor, Author):

I ran into a bug due to this. Apparently, in our current setup, row['fwhm'] can be either 0-dimensional or 1-dimensional (with row being a row of the features table).

This is a problem: if we want to check whether 'fwhm' is masked, row['fwhm'][0] raises an error when row['fwhm'] is 0-dimensional. Conversely, in the 1-dimensional case, we also get an error if we try to use np.ma.is_masked(row['fwhm']).

A workaround would be checking row['fwhm'].ndim each time.

@drvdputt (Contributor, Author):

To be clear, if the above displays masked, then we have the 0-dimensional case. So yes, it really is a single value somehow. Even more striking: if I check the data attribute, I get this:

ipdb>  p row['fwhm'].data

array(0.)

@jdtsmith (Contributor), Jul 3, 2024:

Yeah, I can confirm that masked values in structured arrays show up as numpy.ma.core.MaskedConstant. So I guess we should always first check whether a value is masked, using a simple check like:

import numpy as np
from astropy.table import Table

# Structured (val, min, max) parameter column; the first row is fully masked.
PAR_DTYPE = np.dtype([("val", "f"), ("min", "f"), ("max", "f")])
a = np.ma.array(
    [(np.pi, 2, 4.5), (np.pi / 2, 1, np.nan), (4, np.nan, np.nan)],
    dtype=PAR_DTYPE, mask=[1, 0, 0],
)
t = Table()
t["power"] = a
t.pprint()

# A fully masked cell comes back as the masked constant, so an identity check works:
t[0]["power"] is np.ma.masked  # True
t[1]["power"] is np.ma.masked  # False

Resolved (outdated) review threads on: pahfit/model.py (×2)

Comment on lines +333 to +335:
flux = flux_obs * (1 + z) # energy higher
unc = unc_obs * (1 + z) # uncertainty scales with flux
return lam_obs, flux_obs, unc_obs, lam, flux, unc
@jdtsmith (Contributor):

See #294.

@drvdputt (Contributor, Author):

Lines 183-186 of model.py:

        inst, z = self._parse_instrument_and_redshift(spec, redshift)
        # save these as part of the model (will be written to disk too)
        self.features.meta["redshift"] = inst
        self.features.meta["instrument"] = z

Is this correct? It seems that these assignments for redshift and instrument are flipped.
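For clarity, the fix is simply to swap the two assignments:

        self.features.meta["instrument"] = inst
        self.features.meta["redshift"] = z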

This has now been fixed (see latest commits pushed).

I have also gone through most of the other comments. So we should be close to merging this, and then starting a new pull request to integrate my implementation of the power-features.

A few things revealed by the tests:

  • @jdtsmith Linkcheck says the link to your website is broken (this one: http://tir.astro.utoledo.edu/jdsmith/research/pahfit.php)
  • Codecov is complaining about needing a token. Not super important right now. I think we should fix all our continuous integration services in a single pull request. I will take another look at the reports and summarize in an issue.
  • I did a second push to fix a double space somewhere.

@jdtsmith (Contributor):

Thanks for getting back into it, Dries. My server has been cut off at the firewall; I'm working to restore access. I agree about the CI. Do we need to solve #294 now too, given all the redshift discussion?

@jdtsmith (Contributor) commented Jul 3, 2024:

Update: my server will unfortunately not be allowed back on the external internet (new VPN-only policy); I've uploaded "PAHFIT classic" to https://github.com/PAHFIT/pahfit_classic. Can you update the internal links to point there?

Aside from the redshift discussion which seems to have wound down, what else is needed on this PR?

@drvdputt (Contributor, Author) commented Jul 3, 2024:

Besides your website, there is one more documentation link issue I discovered. So I just pushed two new link fixes.

The biggest issue still remaining is cases like this:

            for row_index in np.where(self.features["kind"] == "line")[0]:
                row = self.features[row_index]
                if np.ma.is_masked(row["fwhm"]):

As in the comments above, row["fwhm"] can be a single element or a tuple (so 0D or 1D). I ran into this issue while running one of my demo notebooks. There is no "readable" check that works for both. But I have a hacky workaround, which replaces some of the np.ma.is_masked checks with .ndim == 0.
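For concreteness, the workaround amounts to something like this (hypothetical helper name, relying on the observation above that a fully masked cell comes back 0-dimensional):

    # Hypothetical sketch of the .ndim-based workaround described above.
    def fwhm_is_masked(row):
        # A fully masked 'fwhm' cell is 0-dimensional; an unmasked one is the
        # 1-dimensional (val, min, max) structured element.
        return row["fwhm"].ndim == 0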

@jdtsmith (Contributor) commented Jul 9, 2024:

But I have a hacky workaround, which replaces some of the np.ma.is_masked checks with .ndim == 0.

Did you see my row["fwhm"] is ma.masked method above? That seems to be the clearest and works for scalars and 1D things equally.

@drvdputt (Contributor, Author) commented Jul 9, 2024:

Looks promising, based on some tests in the Python shell. I will give it a shot and see if I can get my notebook to run.

@drvdputt (Contributor, Author) commented Jul 9, 2024:

Seems to work fine for now. But at some point, this logic will have to be revised, when we set up a clear way to allow users to fit the FWHM of specific lines. That will be addressed by #293.

Let's leave the redshift issue as is for now, so we can finish the work on the power features too. Once this has been merged, I can rebase my power features branch, and spin up the next pull request.

@jdtsmith (Contributor) commented Jul 9, 2024:

OK, great, shall we go ahead and merge this to dev?

@drvdputt (Contributor, Author) commented Jul 9, 2024:

I think this is good to go. I will be actively using the dev branch (with my power features applied), so I will see if anything pops up along the way.

@jdtsmith merged commit 50c19b7 into PAHFIT:dev on Jul 9, 2024
13 of 15 checks passed
@drvdputt deleted the dev-fitterapi branch on July 9, 2024