CNRDeco Initial Version (#567)
* Initial check-in of revised colored noise regularized deconvolution algorithm

* Fix a hole in the C++ API for all atomic seismic data objects.  There was a setter for the protected t0shift
attribute but no getter.  This adds the getter called get_t0shift.

* add time_axis method to BasicTimeSeries to simplify matplotlib graphics

* Rename the method deconvolve to process to mesh with the RFdeconProcessor.py Python interface

* Fix bug from missing call to kill result with an invalid error

* Checkpoint this version with scaffolding after a bug fix

* Fix bug in failing to handle all zero arrays

* Fix bug in handling duplicates in attributes_to_load list and load_if_defined list

* Bug fixes with debug scaffolding - commit to save state but needs more work

* Save a version before a significant change

* Save this version before an experimental improvement in case I have to back it out

* Fix rare bug in pad mode created by subsample t0 ambiguity

* Fix handling in pad mode of off by one issue from t0 subsample ambiguity - this is closely linked to previous commit of a C++ file

* Add several new functions and fix a number of bugs - working version processes all test data successfully

* Add sort_ensemble function

* Partially tested version of new CNRDecon module - not yet working but preserve this version

* Preserve debug version with print scaffolding

* Version passing initial (incomplete) tests

* Minor bug fix in array method found in initial, incomplete test

* Temporary test file for interactive use and not with pytest

* format with black

* Untested fix for bug in handling undefined loc code in some situations.  Also fixes a few other blemishes

* Patch to address issue #570

* Add new implementation of EstimateBandwidth - note this version has plot scaffolding and should not be used for production

* Change handling of read errors to address github issue #570

* Add a TODO comment that may indicate a future problem

* Update docstrings

* Fix typo before push

* minor changes to save before removing scaffolding

* Convert debug scaffolding graphics to a new function useful for validating broadband_snr_QC behavior

* Format with black

* Break out function for robust filtering with estimated bandwidth

* Implement the new filter function - accidentally overwrote first attempt cached incorrectly as previous commit

* Fix bug failing to set ensemble live before returning

* Fix handling of high f corner to silence filter warnings - prevent high corner exceeding 80 percent Nyquist

* Several bug fixes and add arrival demean function

* Fix error in computation in beam_coherence function

* Fix pytest function for snr to match new algorithm

* Fix errors in new tests for time_axis method

* Update tests for some changes in error handling

* Fix bugs detected in updated pytest run

* Fix tests handling errors that use elog instead of throwing exceptions

* Disable pickle tests for now.  Needs work with pybind11 code to get that to work

* Clean up formatting with black

* Remove elog related to updating "sampling_rate" in resample.py

* 🎨 Format Python code with psf/black

* feat: make seisbench dependency optional  (#578)

* updated config files

* fix github test failure

* Resolve conflicts with master branch

* Fixes to get boost to compile on macos - may be a problem

* Add boost serialization code to all deconvolution operators - needed as step one for parallelization

* Add C++ test program for deconvolution operator serialization

* Force Boost_INCLUDE_DIR - seems necessary for macos but should be redundant for linux

* Force SDK 14.5 for Macos - this may need to be disabled in the future

* Add pybind11 code to allow pickle of decon operators

* Minor fix to run in github environment

* Add start of test script for CNRDeconEngine and enable pickle for RFdecon tests

* Fix typo

* Fix bug in handling power spectrum dt when auto setting fft size

* Add code to normalize FIR impulse response outputs and assorted fixes found in testing

* Add c++ tests for ShapingWavelet

* Add feature to post QCmetrics to RF output

* Near final form of pytest scripts with scaffolding commented out

* Partial implementation of tests for new CNR algorithm

* Add master pf files for decon algorithms

* Revert to older version to keep gtree test script from failing

* Fix path for pf file for this test

* Mostly complete test for CNRRFDecon function

* Change name of overloaded method for 3C data

* Change test pf file to match test name - previous used RFdeconProcessor.pf which was confusing

* Fix error in handling time shifts for actual_output method

* Additional changes for timing problem

* Changes to fix problems found with pytest

* Expand testing of CNRDecon module

* Remove test file unintentionally included earlier

* Update yaml cmake to newer version

* Change to use newer yaml version

* Fix error in tag - download 0.8.0

* Revert to 0.7.0 - 0.8.0 fails with cmake

* Revert to yaml-cpp-0.6.3 - all later releases fail from a cmake configuration error

* Fix path for AntelopePf constructors to use current directory

* Disable decon_serialization test run for now

* Add error handlers in C++ code to make CNRDeconEngine more robust

* Fix CNRDecon problems found with pytest

* enable test_md and test_decon_serialization

* enable test_md and test_decon_serialization with cmake file - should have been in earlier commit

* Regularize the use of data files and add comment to CMake file for maintenance

* Add debug print statements to AntelopePf to sort out pytest failures on github

* Add lines to clear PFPATH env value that caused later tests to mysteriously fail

* Revert to pre-3.9 method to merge two dictionaries - 3.8 build was breaking because of this one line

* Remove print scaffolding used to solve PFPATH issue

* Format with black

* Disable CMAKE_OSX_SYSROOT explicit setting

---------

Co-authored-by: Cloud1e <[email protected]>
Co-authored-by: Cloud1e <[email protected]>
Co-authored-by: Jinxin Ma <[email protected]>
4 people authored Dec 25, 2024
1 parent 3bb9746 commit 5c082c5
Showing 56 changed files with 5,513 additions and 1,027 deletions.
9 changes: 7 additions & 2 deletions cxx/CMakeLists.txt
@@ -10,6 +10,11 @@ if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
endif ()
message (STATUS "CMAKE_BUILD_TYPE = ${CMAKE_BUILD_TYPE}")


# Necessary for MacOS running 14.x OS with Xcode command tools
# Commented out by default
#set(CMAKE_OSX_SYSROOT "/Library/Developer/CommandLineTools/SDKs/MacOSX14.5.sdk")

set (CMAKE_CXX_STANDARD 17)
set (CMAKE_CXX_FLAGS_DEBUG "-Wall -Wno-sign-compare -O0 -g")
set (CMAKE_CXX_FLAGS_RELEASE "-Wall -O3")
@@ -24,8 +29,8 @@ if (CMAKE_BUILD_TYPE STREQUAL "Debug")
set(CMAKE_CXX_OUTPUT_EXTENSION_REPLACE ON)
message (STATUS "Enabled Coverage")
endif ()

-find_package (yaml-cpp 0.5.0)
+#find_package (yaml-cpp 0.7.0)
if (NOT YAML_CPP_INCLUDE_DIR)
message (STATUS "Building yaml-cpp")
include (cmake/yaml-cpp.cmake)
@@ -41,7 +46,7 @@ endif ()
message (STATUS "YAML_CPP_LIBRARIES = ${YAML_CPP_LIBRARIES}")
message (STATUS "YAML_CPP_INCLUDE_DIR = ${YAML_CPP_INCLUDE_DIR}")

-find_package (Boost 1.64.0 COMPONENTS serialization)
+find_package (Boost 1.86.0 COMPONENTS serialization)
if (NOT Boost_FOUND)
message (STATUS "Building Boost")
include (cmake/boost.cmake)
6 changes: 3 additions & 3 deletions cxx/cmake/boost-download.cmake
@@ -8,10 +8,10 @@ ExternalProject_Add(
boost
SOURCE_DIR "@BOOST_DOWNLOAD_ROOT@/boost-src"
URL
-https://boostorg.jfrog.io/artifactory/main/release/1.71.0/source/boost_1_71_0.tar.gz
+https://boostorg.jfrog.io/artifactory/main/release/1.86.0/source/boost_1_86_0.tar.gz
URL_HASH
-SHA256=96b34f7468f26a141f6020efb813f1a2f3dfb9797ecf76a7d7cbd843cc95f5bd
-CONFIGURE_COMMAND ./bootstrap.sh --prefix=${PROJECT_BINARY_DIR}
+SHA256=2575e74ffc3ef1cd0babac2c1ee8bdb5782a0ee672b1912da40e5b4b591ca01f
+CONFIGURE_COMMAND ./bootstrap.sh --prefix=${PROJECT_BINARY_DIR} --with-toolset=gcc
BUILD_COMMAND ./b2 cxxflags=-fPIC --with-serialization -j 8
BUILD_IN_SOURCE 1
INSTALL_COMMAND ./b2 install
3 changes: 2 additions & 1 deletion cxx/cmake/boost.cmake
@@ -23,6 +23,7 @@ macro(fetch_boost _download_module_path _download_root)
set (BOOST_ROOT ${PROJECT_BINARY_DIR}/boost)
set (Boost_NO_BOOST_CMAKE ON)
set (Boost_USE_STATIC_LIBS ON)
set (Boost_INCLUDE_DIR ${BOOST_ROOT}/boost-src)

-find_package (Boost 1.71.0 REQUIRED COMPONENTS serialization)
+find_package (Boost 1.86.0 REQUIRED COMPONENTS serialization)
endmacro()
2 changes: 2 additions & 0 deletions cxx/cmake/yaml-cpp.cmake
@@ -1,5 +1,7 @@
macro(fetch_yaml_cpp _download_module_path _download_root)
set(YAML_CPP_DOWNLOAD_ROOT ${_download_root})
# May need to enable this line for some cases with MacOS
#set(CMAKE_OSX_SYSROOT /Library/Developer/CommandLineTools/SDKs/MacOSX14.5.sdk)
configure_file(
${_download_module_path}/yaml-cpp-download.cmake
${_download_root}/CMakeLists.txt
51 changes: 39 additions & 12 deletions cxx/include/mspass/algorithms/amplitudes.h
@@ -102,29 +102,34 @@ template <typename Tdata> double scale(Tdata& d,const ScalingMethod method,
{
case ScalingMethod::Peak:
amplitude=PeakAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::ClipPerc:
amplitude=PercAmplitude(windowed_data,level);
/* for this scaling we use level as perc and output level is frozen
to be scaled to order unity*/
dscale = 1.0/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::MAD:
amplitude=MADAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::RMS:
default:
amplitude=RMSAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
};
d *= dscale;
d.put(scale_factor_key,newcalib);
/* needed to handle case with a vector of all 0s*/
if(amplitude>0.0)
{
dscale = level/amplitude;
newcalib /= dscale;
d *= dscale;
d.put(scale_factor_key,newcalib);
}
else
{
std::stringstream ss;
ss << "Data array is all 0s and cannot be scaled";
d.elog.log_error("scale",ss.str(),mspass::utility::ErrorSeverity::Complaint);
/* This may not be necessary but it assures this value is always set on
return even if it means nothing*/
d.put(scale_factor_key,newcalib);
}
return amplitude;
}catch(...){throw;};
}
@@ -239,6 +244,12 @@ template <typename Tdata> double scale_ensemble(mspass::seismic::Ensemble<Tdata>

/* restore to a value instead of natural log*/
avgamp=exp(avgamp);
/* Silently do nothing if all the data are zero.  Would be better to
log an error but a "Core" object does not contain an elog attribute*/
if(avgamp<=0.0)
{
return 0.0;
}
double dscale=level/avgamp;
/* Now scale the data and apply calib */
for(dptr=d.member.begin();dptr!=d.member.end();++dptr)
@@ -262,6 +273,22 @@ template <typename Tdata> double scale_ensemble(mspass::seismic::Ensemble<Tdata>
return avgamp;
}catch(...){throw;};
}
/*! Convert an std::vector to a unit vector based on L2 norm.*/
template <class T> std::vector<T> normalize(const std::vector<T>& d)
{
size_t N = d.size();
std::vector<T> result;
result.reserve(N);
double d_nrm(0.0);
for(size_t i=0;i<N;++i)
{
d_nrm += (d[i]*d[i]);
result.push_back(d[i]);
}
d_nrm = sqrt(d_nrm);
/* guard against an all-zero input to avoid division by zero */
if(d_nrm>0.0)
for(size_t i=0;i<N;++i) result[i] /= d_nrm;
return result;
}
/*! \brief Holds parameters defining a passband computed from snr.
This deconvolution operator has the option to determine the optimal
182 changes: 182 additions & 0 deletions cxx/include/mspass/algorithms/deconvolution/CNRDeconEngine.h
@@ -0,0 +1,182 @@
#ifndef __CNR_DECON_ENGINE_H__
#define __CNR_DECON_ENGINE_H__
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/shared_ptr.hpp>

#include "mspass/utility/AntelopePf.h"
#include "mspass/algorithms/deconvolution/ShapingWavelet.h"
#include "mspass/algorithms/deconvolution/MTPowerSpectrumEngine.h"
#include "mspass/algorithms/Taper.h"
#include "mspass/seismic/PowerSpectrum.h"
#include "mspass/seismic/TimeSeries.h"
#include "mspass/seismic/Seismogram.h"
#include "mspass/algorithms/deconvolution/FFTDeconOperator.h"

namespace mspass::algorithms::deconvolution{
/* This enum is file scope to intentionally exclude it from python wrappers.
It is used internally to define the algorithm the processor is to run.
I (glp) chose that approach over the inheritance approach used in the scalar
methods as an artistic choice. It is a matter of opinion which approach
is better. This makes one symbol do multiple things with changes done in
the parameter setup as opposed to having to select the right symbolic name
to construct. Anyway, this enum defines algorithms that can be chosen for
processing.
*/
enum class CNR3C_algorithms{
generalized_water_level,
colored_noise_damping
};
class CNRDeconEngine : public FFTDeconOperator
{
public:
CNRDeconEngine();
/* design note - delete when finished.
The constructor uses the pf to initialize operator properties.
Important in that is the multitaper engine that has a significant
initialization cost. the initialize_inverse_operator is a messy
way to load data and then compute the inverse stored
internally. Doing it that way, however, allows the same code
to be used for both single station and array decon. In single
station use every datum has to call initialize_inverse_operator
while with an array decon the operator is defined once for an
ensemble and applied to all members. THE ASSUMPTION is all the
grungy work to assure all that gets done correctly is handled
in python.
*/
/*! Construct from AntelopePf container.
This is currently the only valid constructor for this object.
The operator requires a fair number of parameters for construction
that include secondary groups of parameters in a hierarchy.
An AntelopePf is one way to handle that with a human readable form.
An alternative is yaml but that is not currently supported as
an external format. The AntelopePf object API is a generic
way to handle parameter data hierarchies that is in MsPASS.
Most users will interact with the object only indirectly
via python wrappers. All a user needs to do is define all the
required key-value pairs and get the obscure pf format correct.
To use this constructor one passes an instance of the object
of type AntelopePf that is normally constructed by reading a
text file with the "pf" format or retrieved from another
AntelopePf with the get_branch method.
*/
CNRDeconEngine(const mspass::utility::AntelopePf& pf);
CNRDeconEngine(const CNRDeconEngine& parent);
void initialize_inverse_operator(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::TimeSeries& noise_data);
void initialize_inverse_operator(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& noise_spectrum);
virtual ~CNRDeconEngine(){};
mspass::seismic::Seismogram process(const mspass::seismic::Seismogram& d,
const mspass::seismic::PowerSpectrum& psnoise,
const double fl, const double fh);
double get_operator_dt() const
{
return this->operator_dt;
}
/*! Compute noise spectrum using internal Multitaper operator.
Normally used for noise spectrum but can be used for signal.
Necessary to assure consistent scaling between signal and noise
spectrum estimators. */
mspass::seismic::PowerSpectrum compute_noise_spectrum(
const mspass::seismic::TimeSeries& d2use);
/*! Compute noise spectrum from three component data.
Overloaded version returns average power spectrum of all three components. Measure
of overall 3c power comparable to scalar signal power. Could do
sum of component spectra but that would cause a scaling problem for
use in regularization. For most data this will tend to be dominated by
the component with the highest noise level*/
mspass::seismic::PowerSpectrum compute_noise_spectrum(
const mspass::seismic::Seismogram& d2use);
mspass::seismic::TimeSeries ideal_output();
mspass::seismic::TimeSeries actual_output(
const mspass::seismic::TimeSeries& wavelet);
mspass::seismic::TimeSeries inverse_wavelet(
const mspass::seismic::TimeSeries& wavelet, const double t0shift);
mspass::utility::Metadata QCMetrics();
CNRDeconEngine& operator=(const CNRDeconEngine& parent);

private:
CNR3C_algorithms algorithm;
/* For the colored noise damping algorithm the damper is frequency dependent.
The same issue in water level that requires a floor on the water level
applies to damping. We use noise_floor to create a lower bound on
damper values. Note the damping constant at each frequency is
damp*noise except where noise is below noise_floor defined relative to
maximum noise value where it is set to n_peak*noise_floor*damp. */
double damp;
double noise_floor;
/* SNR bandwidth estimates count frequencies with snr above this value */
double band_snr_floor;
double operator_dt; // Data must match this sample interval
int shaping_wavelet_number_poles;
mspass::algorithms::deconvolution::ShapingWavelet shapingwavelet;
/* Expected time window size in samples. When signal lengths
match this value the slepian tapers are not recomputed. When there
is a mismatch it will change. That means this can change dynamically
when run on multiple data objects. */
int winlength;
mspass::algorithms::deconvolution::MTPowerSpectrumEngine signal_engine;
mspass::algorithms::deconvolution::MTPowerSpectrumEngine noise_engine;

/* This algorithm uses a mix of damping and water level. Above this floor,
which acts a bit like a water level, no regularization is done. If
snr is less than this value we regularize with damp*noise_amplitude.
Note the noise_floor parameter puts a lower bound on the frequency dependent
regularization. If noise amplitude (not power) is less than noise_floor
the floor is set like a water level as noise_max*noise_level.*/
double snr_regularization_floor;
/* These are QC metrics computed by process method. Saved to allow them
to be use in QCmetrics method. */
double regularization_bandwidth_fraction;
double peak_snr[3];
double signal_bandwidth_fraction[3];
mspass::algorithms::deconvolution::ComplexArray winv;
/* This is the lag from sample 0 for the time defined as 0 for the
wavelet used to compute the inverse. It is needed to resolve time
in the actual_output method.*/
int winv_t0_lag;
/*** Private methods *****/
void update_shaping_wavelet(const double fl, const double fh);
/* These are two algorithms for computing inverse operator in the frequency domain*/
void compute_winv(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
void compute_gwl_inverse(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
void compute_gdamp_inverse(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
friend boost::serialization::access;
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
/* See boost documentation for why these are needed.
tapers are referenced through a polymorphic shared_ptr.
It seems all subclasses to be used need this incantation*/
ar.register_type(static_cast<LinearTaper *>(NULL));
ar.register_type(static_cast<CosineTaper *>(NULL));
ar.register_type(static_cast<VectorTaper *>(NULL));
ar & boost::serialization::base_object<FFTDeconOperator>(*this);
ar & algorithm;
ar & damp;
ar & noise_floor;
ar & band_snr_floor;
ar & operator_dt;
ar & shaping_wavelet_number_poles;
ar & shapingwavelet;
ar & signal_engine;
ar & noise_engine;
ar & snr_regularization_floor;
ar & winv;
ar & winv_t0_lag;
ar & regularization_bandwidth_fraction;
ar & peak_snr;
ar & signal_bandwidth_fraction;

}
};
} // End namespace encapsulation
#endif