Merge pull request #46 from numericalEFT/bugfix
kunyuan authored Jul 25, 2023
2 parents 464108e + aeed697 commit 705ffc9
Showing 11 changed files with 231 additions and 119 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -24,7 +24,7 @@
julia> f(x, c) = log(x[1]) / sqrt(x[1]) # Define your integrand
julia> integrate(f, var = Continuous(0, 1), neval=1e5) # Perform the MC integration for 1e5 steps
Integral 1 = -3.99689518016736 ± 0.001364833686666744 (reduced chi2 = 0.695)
```
In this example, we define an integrand function `f(x, c)`, where `x` represents the random variables in the integral and `c` is a [`Configuration`](https://numericaleft.github.io/MCIntegration.jl/dev/lib/montecarlo/) object that can hold extra parameters needed by more complex integrand functions. The variable `x` is determined by the `var` parameter of `integrate()`. See the [documentation](https://numericaleft.github.io/MCIntegration.jl/dev/lib/montecarlo/) for more details.
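As a sketch of how such extra parameters can be passed (the `userdata` keyword and the `userdata` field of `Configuration` are documented further below; the integrand `g` and the exact call are illustrative assumptions):

```julia
using MCIntegration

# Integrand of the integral of a*x over [0, 1], with a = 3.0 supplied
# through the configuration's userdata field (assumed access pattern)
g(x, c) = c.userdata * x[1]

res = integrate(g, var = Continuous(0.0, 1.0), userdata = 3.0, neval = 1e5)
# The estimate res.mean[1] should be close to 3/2
```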

`MCIntegration.jl` also supports Discrete variables. For instance, let's estimate $\pi$ through the Taylor series for $\pi/4 = 1 - 1/3 + 1/5 -1/7 + 1/9 - ...$:
```julia
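# Editor's sketch (the MC example itself is collapsed in this diff view):
# a deterministic partial sum of the alternating series 4*(1 - 1/3 + 1/5 - ...),
# which converges to pi; the Monte Carlo estimate targets the same quantity.
partial = 4 * sum(n -> (-1.0)^(n + 1) / (2n - 1), 1:10^6)  # ≈ 3.14159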
@@ -86,10 +86,12 @@
Here's a brief overview of the three solvers:

Given its robustness and efficiency, the default solver in this package is the `:vegasmc`. To choose a specific solver, use the `solver` parameter in the `integrate` function, like `solver=:vegas`.

Please note that the calling convention of the user-defined integrand for `:mcmc` differs slightly from that of `:vegas` and `:vegasmc`. Please refer to the separate [detailed note](https://numericaleft.github.io/MCIntegration.jl/dev/#Algorithm) on this.

Packed variables can enhance the efficiency of the `:vegasmc` and `:mcmc` solvers by reducing the autocorrelation time of the Markov chain, leading to a more effective sampling process.

Learn more from the documentation of the [Vegas](https://numericaleft.github.io/MCIntegration.jl/dev/lib/vegas/), [VegasMC](https://numericaleft.github.io/MCIntegration.jl/dev/lib/vegasmc/), and [MCMC](https://numericaleft.github.io/MCIntegration.jl/dev/lib/mcmc/) algorithms.
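For instance, a minimal sketch reusing the quick-start integrand with the `solver` keyword described above (treat the exact call as an assumption):

```julia
using MCIntegration

f(x, c) = log(x[1]) / sqrt(x[1])  # same integrand as in the quick-start example

# Select the vanilla Vegas solver instead of the default :vegasmc
res = integrate(f, var = Continuous(0.0, 1.0), neval = 1e5, solver = :vegas)
```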


## Parallelization

1 change: 1 addition & 0 deletions docs/make.jl
@@ -24,6 +24,7 @@ makedocs(;
"lib/vegas.md",
"lib/mcmc.md",
"lib/distribution.md",
"lib/utility.md",
# map(s -> "lib/$(s)", sort(readdir(joinpath(@__DIR__, "src/lib"))))
# "Internals" => map(s -> "lib/$(s)", sort(readdir(joinpath(@__DIR__, "src/lib"))))
]
5 changes: 1 addition & 4 deletions docs/src/index.md
@@ -196,10 +196,7 @@
julia> plot!(plt, grid, res.mean[2], yerror = res.stdev[2], label="sphere")
This package provides three solvers.
- Vegas algorithm (`:vegas`): A Monte Carlo algorithm that uses importance sampling as a variance-reduction technique. Vegas iteratively builds up a piecewise constant weight function, represented on a rectangular grid. Each iteration consists of a sampling step followed by a refinement of the grid. The exact details of the algorithm can be found in **_G.P. Lepage, J. Comp. Phys. 27 (1978) 192, 3_** and **_G.P. Lepage, Report CLNS-80/447, Cornell Univ., Ithaca, N.Y., 1980_**.
- Vegas algorithm based on Markov-chain Monte Carlo (`:vegasmc`): A Markov-chain Monte Carlo algorithm that uses the Vegas variance-reduction technique. It is as accurate as the vanilla Vegas algorithm while tending to be more robust. For complicated high-dimensional integrals, the vanilla Vegas algorithm can fail to learn the piecewise constant weight function. This algorithm uses the Metropolis-Hastings algorithm to sample the integrand and improves the learning of the weight function.
5 changes: 5 additions & 0 deletions docs/src/lib/utility.md
@@ -0,0 +1,5 @@
# Utility

```@autodocs
Modules = [MCIntegration.MCUtility]
```
82 changes: 78 additions & 4 deletions src/configuration.jl
@@ -36,6 +36,43 @@
Their shapes are (number of updates) x (number of integrands) x max(number of integrands, number of variables).
The last index wastes some memory, but the dimension is small anyway.
"""

"""
mutable struct Configuration{NI,V,P,O,T}
Struct that holds all the necessary parameters and variables for Monte Carlo integration.
# Parameters
- `NI` : Number of integrands
- `V` : Type of variables
- `P` : Type of user-defined data
- `O` : Type of observables
- `T` : Type of integrand
# Static parameters
- `seed`: Seed to initialize the random number generator, also serves as the unique process ID of the configuration.
- `rng`: A MersenneTwister random number generator, seeded by `seed`.
- `userdata`: User-defined parameter.
- `var`: Tuple of variables. Each variable should derive from the abstract type `Variable` (see `variable.jl` for details). Using a tuple instead of a vector improves performance.
# Integrands properties
- `neighbor::Vector{Tuple{Int, Int}}` : Vector of tuples defining neighboring integrands. Two neighboring integrands are directly connected in the Markov chain.
The `neighbor` vector defines an undirected graph showing how the integrands are connected. Only highly correlated integrands should be defined as neighbors to reduce autocorrelations.
- `dof::Vector{Vector{Int}}`: Degrees of freedom of each integrand, i.e., the dimensions in which each integrand can vary.
- `observable`: Observables required to calculate the integrands, will be used in the `measure` function call.
- `reweight`: Reweight factors for each integrand. The reweight factor of the normalization integrand (namely, the last element) is assumed to be 1.
- `visited`: The number of times each integrand is visited by the Markov chain.
# Current MC state
- `step`: The number of Monte Carlo updates performed up to now.
- `norm`: The index of the normalization integrand. `norm` is larger than the index of any user-defined integrands.
- `normalization`: The accumulated normalization factor.
- `propose/accept`: Arrays to store the proposed and accepted updates for each integrand and variable.
"""
mutable struct Configuration{NI,V,P,O,T}
########### static parameters ###################
seed::Int # seed to initialize the random number generator; also serves as the unique pid of the configuration
@@ -91,15 +91,52 @@
By default, dof=[ones(length(var)), ], which means that there is only one integrand
It is either an array of any type with the common operations like +-*/^ defined.
By default, it will be set to 0.0 if there is only one integrand (i.e., length(dof)==1); otherwise, it will be set to zeros(length(dof)).
- `para`: user-defined parameter, set to nothing if not needed
- `reweight`: reweight factors for each integrand. If not set, all factors are initialized to one. Internally, a reweight factor of 1 is appended to the end of the reweight vector for the normalization integral.
- `seed`: seed to initialize the random number generator; also serves as the unique `pid` of the configuration. If it is `nothing`, `RandomDevice()` is used to generate a random seed in `[1, 1_000_000]`
- `neighbor::Vector{Tuple{Int, Int}}` : vector of tuples that defines the neighboring integrands. Two neighboring integrands are directly connected in the Markov chain.
e.g., [(1, 2), (2, 3)] means integrands 1 and 2 are neighbors, and integrands 2 and 3 are neighbors.
The neighbor vector defines an undirected graph showing how the integrands are connected. Please make sure all integrands are connected.
By default, we assume the N integrands are connected in ascending order, meaning the neighbor list will be set to [(N+1, 1), (1, 2), (2, 3), ..., (N-1, N)], where the first entry connects the normalization integral to integral 1 and the remaining entries chain the integrals 1, 2, ..., N together. Only the first integral is connected to the normalization integral.
Only highly correlated integrands should be defined as neighbors. Otherwise, most of the updates between the neighboring integrands will be rejected and wasted.
- `userdata`: User data you want to pass to the integrand and the measurement
"""

"""
function Configuration(;
var::Union{Variable,AbstractVector,Tuple}=(Continuous(0.0, 1.0),),
dof::Union{Int,AbstractVector,AbstractMatrix}=[ones(Int, length(var))],
type=Float64, # type of the integrand
obs::AbstractVector=zeros(Float64, length(dof)),
reweight::Vector{Float64}=ones(length(dof) + 1),
seed::Int=rand(Random.RandomDevice(), 1:1000000),
userdata=nothing,
neighbor::Union{Vector{Vector{Int}},Vector{Tuple{Int,Int}},Nothing}=nothing,
kwargs...
)
Create a Configuration struct for MC integration.
# Arguments
- `var`: Either a single `Variable`, a `CompositeVar`, or a tuple consisting of `Variable` and/or `CompositeVar` instances. Tuples are used to improve performance by ensuring type stability. By default, `var` is set to a tuple containing a single continuous variable, `(Continuous(0.0, 1.0),)`.
- `dof`: Degrees of freedom for each integrand, as a vector of integers. For example, `[[0, 1], [2, 3]]` means the first integrand has zero instances of var#1 and one of var#2; while the second integrand has two instances of var#1 and 3 of var#2. Defaults to `[ones(length(var))]`, i.e., one degree of freedom for each variable.
- `type`: Type of the integrand, `Float64` by default.
- `obs`: Vector of observables needed to calculate the integrands, which will be used in the `measure` function call.
- `reweight`: Vector of reweight factors for each integrand. By default, all factors are initialized to one. Internally, a reweight factor of 1 will be appended to the end of the reweight vector, which is for the normalization integral.
- `seed`: Seed for the random number generator. This also serves as the unique identifier of the configuration. If it is `nothing`, then a random seed between 1 and 1,000,000 is generated.
- `userdata`: User data to pass to the integrand and the measurement.
- `neighbor`: Vector of tuples that define neighboring integrands. For example, `[(1, 2), (2, 3)]` means that the first and second integrands, and the second and third integrands, are neighbors. Neighboring integrands are directly connected in the Markov chain. By default, all integrands are connected in ascending order. Note that the normalization integral is automatically appended at the end of the integrand list and is considered a neighbor of the first user-defined integrand.
# Example
```julia
cfg = Configuration(
var = (Continuous(0.0, 1.0), Continuous(-1.0, 1.0)),
dof = [[1, 1], [2, 0]],
obs = [0.0, 0.0],
seed = 1234,
neighbor = [(1, 2)]
)
```
"""
function Configuration(;
var::V=(Continuous(0.0, 1.0),),
dof::Union{Int,AbstractVector,AbstractMatrix}=(is_variable(typeof(var)) ? 1 : [ones(Int, length(var)),]),
6 changes: 3 additions & 3 deletions src/distribution/variable.jl
@@ -162,7 +162,7 @@ end
"""
function Continuous(bounds::AbstractVector{Union{AbstractVector,Tuple}}, size=MaxOrder; offset=0, alpha=2.0, adapt=true, ninc=1000, grid=collect(LinRange(lower, upper, ninc)))
Create a set of continuous variable pools sampling from the set [lower, upper) with a distribution generated by a Vegas map, and pack them into a `CompositeVar`. The distribution is trained after each iteration if `adapt = true`.
# Arguments:
- `bounds` : tuple of (lower, upper) for each continuous variable
@@ -286,7 +286,7 @@ end
"""
function Discrete(lower::Int, upper::Int; distribution=nothing, alpha=2.0, adapt=true)
Create a pool of integer variables sampling from the closed set [lower, lower+1, ..., upper] with the distribution `Discrete.distribution`.
The distribution is trained after each iteration if `adapt = true`.
# Arguments:
@@ -330,7 +330,7 @@ end
"""
function Continuous(bounds::AbstractVector{Union{AbstractVector,Tuple}}, size=MaxOrder; offset=0, alpha=2.0, adapt=true, ninc=1000, grid=collect(LinRange(lower, upper, ninc)))
Create a set of continuous variable pools sampling from the set [lower, upper) with a distribution generated by a Vegas map, and pack them into a `CompositeVar`. The distribution is trained after each iteration if `adapt = true`.
# Arguments:
- `bounds` : tuple of (lower, upper) for each continuous variable
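Pulling the signatures above together, variable pools might be constructed as in the following sketch (keyword names follow the docstrings shown in this diff; the exact defaults and the `Configuration` call are assumptions):

```julia
using MCIntegration

x = Continuous(0.0, 1.0)            # continuous pool sampling the set [0, 1)
n = Discrete(1, 100; adapt = true)  # integer pool sampling {1, 2, ..., 100}

# Pack both pools into a Configuration: the single integrand uses one x and one n
cfg = Configuration(var = (x, n), dof = [[1, 1]])
```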

2 comments on commit 705ffc9

@kunyuan (Member Author)
@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/88298

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

```
git tag -a v0.4.0 -m "<description of version>" 705ffc90c0cbee1bfaaeda969b4384dca6126893
git push origin v0.4.0
```
