CNRDeco Initial Version (#567)
* Initial check-in of revised colored noise regularized deconvolution algorithm

* Fix a hole in the C++ API for all atomic seismic data objects.  There was a setter for the protected t0shift
attribute but no getter.  This adds the getter called get_t0shift.

* add time_axis method to BasicTimeSeries to simplify matplotlib graphics

* Rename the method deconvolve to process to mesh with the RFdeconProcessor.py Python interface

* Fix bug from missing call to kill result with an invalid error

* Checkpoint this version with scaffolding after a bug fix

* Fix bug in failing to handle all zero arrays

* Fix bug in handling duplicates in attributes_to_load list and load_if_defined list

* Bug fixes with debug scaffolding - commit to save state but needs more work

* Save a version before a significant change

* Save this version before an experimental improvement in case I have to back it out

* Fix rare bug in pad mode created by subsample t0 ambiguity

* Fix handling in pad mode of off by one issue from t0 subsample ambiguity - this is closely linked to previous commit of a C++ file

* Add several new functions and fix a number of bugs - working version processes all test data successfully

* Add sort_ensemble function

* Partially tested version of new CNRDecon module - not yet working but preserve this version

* Preserve debug version with print scaffolding

* Version passing initial (incomplete) tests

* Minor bug fix in array method found in initial, incomplete test

* Temporary test file for interactive use and not with pytest

* format with black

* Untested fix for bug in handling undefined loc code in some situations.  Also fixes a few other blemishes

* Patch to address issue #570

* Add new implementation of EstimateBandwidth - note this version has plot scaffolding and should not be used for production

* Change handling of read errors to address github issue #570

* Add a TODO comment that may indicate a future problem

* Update docstrings

* Fix typo before push

* minor changes to save before removing scaffolding

* Convert debug scaffolding graphics to a new function useful for validating broadband_snr_QC behavior

* Format with black

* Break out function for robust filtering with estimated bandwidth

* Implement the new filter function - accidentally overwrote first attempt cached incorrectly as previous commit

* Fix bug failing to set ensemble live before returning

* Fix handling of high f corner to silence filter warnings - prevent high corner exceeding 80 percent Nyquist

* Several bug fixes and add arrival demean function

* Fix error in computation in beam_coherence function

* Fix pytest function for snr to match new algorithm

* Fix errors in new tests for time_axis method

* Update tests for some changes in error handling

* Fix bugs detected in updated pytest run

* Fix tests handling errors that use elog instead of throwing exceptions

* Disable pickle tests for now.  Needs work with pybind11 code to get that to work

* Clean up formatting with black

* Remove elog related to updating "sampling_rate" in resample.py

* 🎨 Format Python code with psf/black

* feat: make seisbench dependency optional  (#578)

* updated config files

* fix github test failure

* Resolve conflicts with master branch

* Fixes to get boost to compile on macos - may be a problem

* Add boost serialization code to all deconvolution operators - needed as step one for parallelization

* Add C++ test program for deconvolution operator serialization

* Force Boost_INCLUDE_DIR - seems necessary for macos but should be redundant for linux

* Force SDK 14.5 for Macos - this may need to be disabled in the future

* Add pybind11 code to allow pickle of decon operators

* Minor fix to run in github environment

* Add start of test script for CNRDeconEngine and enable pickle for RFdecon tests

* Fix typo

* Fix bug in handling power spectrum dt when auto setting fft size

* Add code to normalize FIR impulse response outputs and assorted fixes found in testing

* Add c++ tests for ShapingWavelet

* Add feature to post QCmetrics to RF output

* Near final form of pytest scripts with scaffolding commented out

* Partial implementation of tests for new CNR algorithm

* Add master pf files for decon algorithms

* Revert to older version to keep gtree test script from failing

* Fix path for pf file for this test

* Mostly complete test for CNRRFDecon function

* Change name of overloaded method for 3C data

* Change test pf file to match test name - previous used RFdeconProcessor.pf which was confusing

* Fix error in handling time shifts for actual_output method

* Additional changes for timing problem

* Changes to fix problems found with pytest

* Expand testing of CNRDecon module

* Remove test file unintentionally included earlier

* Update yaml cmake to newer version

* Change to use newer yaml version

* Fix error in tag - download 0.8.0

* Revert to 0.7.0 - 0.8.0 fails with cmake

* Revert to yaml-cpp-0.6.3 - all later releases fail from a cmake configuration error

* Fix path for AntelopePf constructors to use current directory

* Disable decon_serialization test run for now

* Add error handlers in C++ code to make CNRDeconEngine more robust

* Fix CNRDecon problems found with pytest

* enable test_md and test_decon_serialization

* enable test_md and test_decon_serialization with cmake file - should have been in earlier commit

* Regularize the use of data files and add comment to CMake file for maintenance

* Add debug print statements to AntelopePf to sort out pytest failures on github

* Add lines to clear PFPATH env value that caused later tests to mysteriously fail

* Revert to pre-3.9 method to merge two dictionaries - 3.8 build was breaking because of this one line

* Remove print scaffolding used to solve PFPATH issue

* Format with black

* Disable CMAKE_OSX_SYSROOT explicit setting

---------

Co-authored-by: Cloud1e <[email protected]>
Co-authored-by: Cloud1e <[email protected]>
Co-authored-by: Jinxin Ma <[email protected]>
4 people authored Dec 25, 2024
1 parent 3bb9746 commit 5c082c5
Showing 56 changed files with 5,513 additions and 1,027 deletions.
9 changes: 7 additions & 2 deletions cxx/CMakeLists.txt
@@ -10,6 +10,11 @@ if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
endif ()
message (STATUS "CMAKE_BUILD_TYPE = ${CMAKE_BUILD_TYPE}")


# Necessary for MacOS running 14.x OS with Xcode command tools
# Commented out by default
#set(CMAKE_OSX_SYSROOT "/Library/Developer/CommandLineTools/SDKs/MacOSX14.5.sdk")

set (CMAKE_CXX_STANDARD 17)
set (CMAKE_CXX_FLAGS_DEBUG "-Wall -Wno-sign-compare -O0 -g")
set (CMAKE_CXX_FLAGS_RELEASE "-Wall -O3")
@@ -24,8 +29,8 @@ if (CMAKE_BUILD_TYPE STREQUAL "Debug")
set(CMAKE_CXX_OUTPUT_EXTENSION_REPLACE ON)
message (STATUS "Enabled Coverage")
endif ()

-find_package (yaml-cpp 0.5.0)
+#find_package (yaml-cpp 0.7.0)
if (NOT YAML_CPP_INCLUDE_DIR)
message (STATUS "Building yaml-cpp")
include (cmake/yaml-cpp.cmake)
@@ -41,7 +46,7 @@ endif ()
message (STATUS "YAML_CPP_LIBRARIES = ${YAML_CPP_LIBRARIES}")
message (STATUS "YAML_CPP_INCLUDE_DIR = ${YAML_CPP_INCLUDE_DIR}")

-find_package (Boost 1.64.0 COMPONENTS serialization)
+find_package (Boost 1.86.0 COMPONENTS serialization)
if (NOT Boost_FOUND)
message (STATUS "Building Boost")
include (cmake/boost.cmake)
6 changes: 3 additions & 3 deletions cxx/cmake/boost-download.cmake
@@ -8,10 +8,10 @@ ExternalProject_Add(
boost
SOURCE_DIR "@BOOST_DOWNLOAD_ROOT@/boost-src"
URL
-https://boostorg.jfrog.io/artifactory/main/release/1.71.0/source/boost_1_71_0.tar.gz
+https://boostorg.jfrog.io/artifactory/main/release/1.86.0/source/boost_1_86_0.tar.gz
URL_HASH
-SHA256=96b34f7468f26a141f6020efb813f1a2f3dfb9797ecf76a7d7cbd843cc95f5bd
-CONFIGURE_COMMAND ./bootstrap.sh --prefix=${PROJECT_BINARY_DIR}
+SHA256=2575e74ffc3ef1cd0babac2c1ee8bdb5782a0ee672b1912da40e5b4b591ca01f
+CONFIGURE_COMMAND ./bootstrap.sh --prefix=${PROJECT_BINARY_DIR} --with-toolset=gcc
BUILD_COMMAND ./b2 cxxflags=-fPIC --with-serialization -j 8
BUILD_IN_SOURCE 1
INSTALL_COMMAND ./b2 install
3 changes: 2 additions & 1 deletion cxx/cmake/boost.cmake
@@ -23,6 +23,7 @@ macro(fetch_boost _download_module_path _download_root)
set (BOOST_ROOT ${PROJECT_BINARY_DIR}/boost)
set (Boost_NO_BOOST_CMAKE ON)
set (Boost_USE_STATIC_LIBS ON)
set (Boost_INCLUDE_DIR ${BOOST_ROOT}/boost-src)

-find_package (Boost 1.71.0 REQUIRED COMPONENTS serialization)
+find_package (Boost 1.86.0 REQUIRED COMPONENTS serialization)
endmacro()
2 changes: 2 additions & 0 deletions cxx/cmake/yaml-cpp.cmake
@@ -1,5 +1,7 @@
macro(fetch_yaml_cpp _download_module_path _download_root)
set(YAML_CPP_DOWNLOAD_ROOT ${_download_root})
# May need to enable this line for some cases with MacOS
#set(CMAKE_OSX_SYSROOT /Library/Developer/CommandLineTools/SDKs/MacOSX14.5.sdk)
configure_file(
${_download_module_path}/yaml-cpp-download.cmake
${_download_root}/CMakeLists.txt
51 changes: 39 additions & 12 deletions cxx/include/mspass/algorithms/amplitudes.h
@@ -102,29 +102,34 @@ template <typename Tdata> double scale(Tdata& d,const ScalingMethod method,
{
case ScalingMethod::Peak:
amplitude=PeakAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::ClipPerc:
amplitude=PercAmplitude(windowed_data,level);
/* for this scaling we use level as perc and output level is frozen
to be scaled to order unity*/
dscale = 1.0/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::MAD:
amplitude=MADAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
break;
case ScalingMethod::RMS:
default:
amplitude=RMSAmplitude(windowed_data);
dscale = level/amplitude;
newcalib /= dscale;
};
d *= dscale;
d.put(scale_factor_key,newcalib);
/* needed to handle case with a vector of all 0s*/
if(amplitude>0.0)
{
dscale = level/amplitude;
newcalib /= dscale;
d *= dscale;
d.put(scale_factor_key,newcalib);
}
else
{
std::stringstream ss;
ss << "Data array is all 0s and cannot be scaled";
d.elog.log_error("scale",ss.str(),mspass::utility::ErrorSeverity::Complaint);
/* This may not be necessary but it assures this value is always set on
return even if it means nothing*/
d.put(scale_factor_key,newcalib);
}
return amplitude;
}catch(...){throw;};
}
@@ -239,6 +244,12 @@ template <typename Tdata> double scale_ensemble(mspass::seismic::Ensemble<Tdata>

/* restore to a value instead of natural log*/
avgamp=exp(avgamp);
/* Silently do nothing if all the data are zero.  Would be better to
log an error but a "Core" object does not contain an elog attribute*/
if(avgamp<=0.0)
{
return 0.0;
}
double dscale=level/avgamp;
/* Now scale the data and apply calib */
for(dptr=d.member.begin();dptr!=d.member.end();++dptr)
@@ -262,6 +273,22 @@ template <typename Tdata> double scale_ensemble(mspass::seismic::Ensemble<Tdata>
return avgamp;
}catch(...){throw;};
}
/*! Convert an std::vector to a unit vector based on L2 norm.*/
template <class T> std::vector<T> normalize(const std::vector<T>& d)
{
size_t N = d.size();
std::vector<T> result;
result.reserve(N);
double d_nrm(0.0);
for(size_t i=0;i<N;++i)
{
d_nrm += (d[i]*d[i]);
result.push_back(d[i]);
}
d_nrm = sqrt(d_nrm);
/* guard against an all-zero input to avoid division by zero */
if(d_nrm>0.0)
for(size_t i=0;i<N;++i) result[i] /= d_nrm;
return result;
}
/*! \brief Holds parameters defining a passband computed from snr.
This deconvolution operator has the option to determine the optimal
182 changes: 182 additions & 0 deletions cxx/include/mspass/algorithms/deconvolution/CNRDeconEngine.h
@@ -0,0 +1,182 @@
#ifndef __CNR_DECON_ENGINE_H__
#define __CNR_DECON_ENGINE_H__
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/shared_ptr.hpp>

#include "mspass/utility/AntelopePf.h"
#include "mspass/algorithms/deconvolution/ShapingWavelet.h"
#include "mspass/algorithms/deconvolution/MTPowerSpectrumEngine.h"
#include "mspass/algorithms/Taper.h"
#include "mspass/seismic/PowerSpectrum.h"
#include "mspass/seismic/TimeSeries.h"
#include "mspass/seismic/Seismogram.h"
#include "mspass/algorithms/deconvolution/FFTDeconOperator.h"

namespace mspass::algorithms::deconvolution{
/* This enum is file scope to intentionally exclude it from python wrappers.
It is used internally to define the algorithm the processor is to run.
I (glp) chose that approach over the inheritance approach used in the scalar
methods as an artistic choice. It is a matter of opinion which approach
is better. This makes one symbol do multiple things with changes done in
the parameter setup as opposed to having to select the right symbolic name
to construct. Anyway, this enum defines algorithms that can be chosen for
processing.
*/
enum class CNR3C_algorithms{
generalized_water_level,
colored_noise_damping
};
class CNRDeconEngine : public FFTDeconOperator
{
public:
CNRDeconEngine();
/* design note - delete when finished.
The constructor uses the pf to initialize operator properties.
Important in that is the multitaper engine that has a significant
initialization cost. the initialize_inverse_operator is a messy
way to load data and then compute the inverse stored
internally. Doing it that way, however, allows the same code
to be used for both single station and array decon. In single
station use every datum has to call initialize_inverse_operator
while with an array decon the operator is defined once for an
ensemble and applied to all members. THE ASSUMPTION is all the
grungy work to assure all that gets done correctly is handled
in python.
*/
/*! Construct from AntelopePf container.
This is currently the only valid constructor for this object.
The operator requires a fair number of parameters for construction
that include secondary groups of parameters in a hierarchy.
An AntelopePf is one way to handle that with a human readable form.
An alternative is yaml but that is not currently supported as
an external format. The AntelopePf object API is a generic
way to handle parameter data hierarchies that is in MsPASS.
Most users will interact with the object only indirectly
via python wrappers. All a user needs to do is define all the
required key-value pairs and get the obscure pf format correct.
To use this constructor one passes an instance of the object
of type AntelopePf that is normally constructed by reading a
text file with the "pf" format or retrieved from another
AntelopePf with the get_branch method.
*/
CNRDeconEngine(const mspass::utility::AntelopePf& pf);
CNRDeconEngine(const CNRDeconEngine& parent);
void initialize_inverse_operator(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::TimeSeries& noise_data);
void initialize_inverse_operator(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& noise_spectrum);
virtual ~CNRDeconEngine(){};
mspass::seismic::Seismogram process(const mspass::seismic::Seismogram& d,
const mspass::seismic::PowerSpectrum& psnoise,
const double fl, const double fh);
double get_operator_dt() const
{
return this->operator_dt;
}
/*! Compute noise spectrum using internal Multitaper operator.
Normally used for noise spectrum but can be used for signal.
Necessary to assure consistent scaling between signal and noise
spectrum estimators. */
mspass::seismic::PowerSpectrum compute_noise_spectrum(
const mspass::seismic::TimeSeries& d2use);
/*! Compute noise spectrum from three component data.
Overloaded version returns average power spectrum of all three components. Measure
of overall 3c power comparable to scalar signal power. Could do
sum of component spectra but that would cause a scaling problem for
use in regularization. For most data this will tend to be dominated by
the component with the highest noise level*/
mspass::seismic::PowerSpectrum compute_noise_spectrum(
const mspass::seismic::Seismogram& d2use);
mspass::seismic::TimeSeries ideal_output();
mspass::seismic::TimeSeries actual_output(
const mspass::seismic::TimeSeries& wavelet);
mspass::seismic::TimeSeries inverse_wavelet(
const mspass::seismic::TimeSeries& wavelet, const double t0shift);
mspass::utility::Metadata QCMetrics();
CNRDeconEngine& operator=(const CNRDeconEngine& parent);

private:
CNR3C_algorithms algorithm;
/* For the colored noise damping algorithm the damper is frequency dependent.
The same issue in water level that requires a floor on the water level
applies to damping. We use noise_floor to create a lower bound on
damper values. Note the damping constant at each frequency is
damp*noise except where noise is below noise_floor defined relative to
maximum noise value where it is set to n_peak*noise_floor*damp. */
double damp;
double noise_floor;
/* SNR bandwidth estimates count frequencies with snr above this value */
double band_snr_floor;
double operator_dt; // Data must match this sample interval
int shaping_wavelet_number_poles;
mspass::algorithms::deconvolution::ShapingWavelet shapingwavelet;
/* Expected time window size in samples. When signal lengths
match this value the slepian tapers are not recomputed. When there
is a mismatch it will change. That means this can change dynamically
when run on multiple data objects. */
int winlength;
mspass::algorithms::deconvolution::MTPowerSpectrumEngine signal_engine;
mspass::algorithms::deconvolution::MTPowerSpectrumEngine noise_engine;

/* This algorithm uses a mix of damping and water level. Above this floor,
which acts a bit like a water level, no regularization is done. If
snr is less than this value we regularize with damp*noise_amplitude.
Note the noise_floor parameter puts a lower bound on the frequency dependent
regularization. If noise amplitude (not power) is less than noise_floor
the floor is set like a water level as noise_max*noise_level.*/
double snr_regularization_floor;
/* These are QC metrics computed by process method. Saved to allow them
to be use in QCmetrics method. */
double regularization_bandwidth_fraction;
double peak_snr[3];
double signal_bandwidth_fraction[3];
mspass::algorithms::deconvolution::ComplexArray winv;
/* This is the lag from sample 0 for the time defined as 0 for the
wavelet used to compute the inverse. It is needed to resolve time
in the actual_output method.*/
int winv_t0_lag;
/*** Private methods *****/
void update_shaping_wavelet(const double fl, const double fh);
/* These are two algorithms for computing inverse operator in the frequency domain*/
void compute_winv(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
void compute_gwl_inverse(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
void compute_gdamp_inverse(const mspass::seismic::TimeSeries& wavelet,
const mspass::seismic::PowerSpectrum& psnoise);
friend boost::serialization::access;
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
/* See boost documentation for why these are needed.
tapers are referenced through a polymorphic shared_ptr.
It seems all subclasses to be used need this incantation*/
ar.register_type(static_cast<LinearTaper *>(NULL));
ar.register_type(static_cast<CosineTaper *>(NULL));
ar.register_type(static_cast<VectorTaper *>(NULL));
ar & boost::serialization::base_object<FFTDeconOperator>(*this);
ar & algorithm;
ar & damp;
ar & noise_floor;
ar & band_snr_floor;
ar & operator_dt;
ar & shaping_wavelet_number_poles;
ar & shapingwavelet;
ar & signal_engine;
ar & noise_engine;
ar & snr_regularization_floor;
ar & winv;
ar & winv_t0_lag;
ar & regularization_bandwidth_fraction;
ar & peak_snr;
ar & signal_bandwidth_fraction;

}
};
} // End namespace encapsulation
#endif