diff --git a/LICENSES_THIRD_PARTY b/LICENSES_THIRD_PARTY index c1f0f32..3cb2db8 100644 --- a/LICENSES_THIRD_PARTY +++ b/LICENSES_THIRD_PARTY @@ -1,6 +1,6 @@ -In order to use CEED.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages. +In order to use CEEDesigns.jl, it is necessary to download and install several third-party Julia packages, which may be distributed under various licenses. For the most recent list of these packages, refer to `Project.toml` and consult the license terms of the individual packages. -You must agree to the terms of these licenses, in addition to the CEED.jl source code license, in order to use this software. +You must agree to the terms of these licenses, in addition to the CEEDesigns.jl source code license, in order to use this software. -------------------------------------------------- Third party software listed by License type diff --git a/Project.toml b/Project.toml index 48d4f5b..a05fc1b 100644 --- a/Project.toml +++ b/Project.toml @@ -1,6 +1,6 @@ -name = "CEED" +name = "CEEDesigns" uuid = "e939450b-799e-4198-a5f5-3f2f7fb1c671" -version = "0.3.4" +version = "0.3.5" [deps] Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5" diff --git a/docs/Project.toml b/docs/Project.toml index a3e3774..408fb3d 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,6 +1,6 @@ [deps] BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b" -CEED = "e939450b-799e-4198-a5f5-3f2f7fb1c671" +CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671" CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" D3Trees = "e3df1716-f71e-5df9-9e2d-98e193103c45" DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" diff --git a/docs/make.jl b/docs/make.jl index 0c8e566..7cc3e8b 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -1,5 +1,5 @@ using Documenter, DocumenterMarkdown, Literate -using CEED +using CEEDesigns # Literate for tutorials const literate_dir = joinpath(@__DIR__, "..", "tutorials") @@ -37,7 +37,7 @@ pages = [ ] makedocs(; - sitename = "CEED.jl", + sitename = "CEEDesigns.jl", format = Documenter.HTML(; prettyurls = false, edit_link = "main", @@ -46,4 +46,4 @@ makedocs(; pages, ) -deploydocs(; repo = "github.com/Merck/CEED.jl.git") +deploydocs(; repo = "github.com/Merck/CEEDesigns.jl.git") diff --git a/docs/src/api.md b/docs/src/api.md index dde4f34..069b2b3 100644 --- a/docs/src/api.md +++ b/docs/src/api.md @@ -1,46 +1,46 @@ # API Documentation ```@meta -CurrentModule = CEED +CurrentModule = CEEDesigns ``` ## `StaticDesigns` ```@docs -CEED.StaticDesigns.efficient_designs -CEED.StaticDesigns.evaluate_experiments +CEEDesigns.StaticDesigns.efficient_designs +CEEDesigns.StaticDesigns.evaluate_experiments ``` ## `GenerativeDesigns` ```@docs -CEED.GenerativeDesigns.UncertaintyReductionMDP -CEED.GenerativeDesigns.EfficientValueMDP -CEED.GenerativeDesigns.State -CEED.GenerativeDesigns.Variance -CEED.GenerativeDesigns.Entropy +CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP +CEEDesigns.GenerativeDesigns.EfficientValueMDP +CEEDesigns.GenerativeDesigns.State +CEEDesigns.GenerativeDesigns.Variance +CEEDesigns.GenerativeDesigns.Entropy ``` ```@docs -CEED.GenerativeDesigns.efficient_design -CEED.GenerativeDesigns.efficient_designs -CEED.GenerativeDesigns.efficient_value +CEEDesigns.GenerativeDesigns.efficient_design +CEEDesigns.GenerativeDesigns.efficient_designs +CEEDesigns.GenerativeDesigns.efficient_value 
``` ### Distance-Based Sampling ```@docs -CEED.GenerativeDesigns.DistanceBased -CEED.GenerativeDesigns.QuadraticDistance -CEED.GenerativeDesigns.DiscreteDistance -CEED.GenerativeDesigns.MahalanobisDistance -CEED.GenerativeDesigns.Exponential +CEEDesigns.GenerativeDesigns.DistanceBased +CEEDesigns.GenerativeDesigns.QuadraticDistance +CEEDesigns.GenerativeDesigns.DiscreteDistance +CEEDesigns.GenerativeDesigns.MahalanobisDistance +CEEDesigns.GenerativeDesigns.Exponential ``` ## Plotting ```@docs -CEED.plot_front -CEED.make_labels -CEED.plot_evals +CEEDesigns.plot_front +CEEDesigns.make_labels +CEEDesigns.plot_evals ``` \ No newline at end of file diff --git a/docs/src/index.md b/docs/src/index.md index 44bd73f..43603fb 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -1,4 +1,4 @@ -# CEED.jl: Overview +# CEEDesigns.jl: Overview A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below. @@ -11,7 +11,7 @@ Here we assume that the same experimental design will be used for a population o For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features. -In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized. +In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized. Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs. diff --git a/docs/src/tutorials/GenerativeDesigns.jl b/docs/src/tutorials/GenerativeDesigns.jl index d4df6e4..d74f31f 100644 --- a/docs/src/tutorials/GenerativeDesigns.jl +++ b/docs/src/tutorials/GenerativeDesigns.jl @@ -115,7 +115,7 @@ data = coerce(data, types); # ## Generative Model for Outcomes Sampling -using CEED, CEED.GenerativeDesigns +using CEEDesigns, CEEDesigns.GenerativeDesigns # As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable. 
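For downstream users, the tutorial sources above change only in their `using` line; the `DistanceBased` sampler and its keyword arguments are untouched by the rename. Below is a minimal sketch of the renamed setup, assuming `data`, `categorical_feats`, and `numeric_feats` are prepared as in the tutorial; passing `data` as the first positional argument is an assumption here, since that line falls outside the hunk shown below.

```julia
using CEEDesigns, CEEDesigns.GenerativeDesigns  # formerly: using CEED, CEED.GenerativeDesigns

# Distance-based sampler over historical records. Keyword arguments are taken
# verbatim from the GenerativeDesigns tutorial; `data` as the first positional
# argument is assumed, as it is not visible in the hunk.
DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy,
    similarity = Exponential(; λ = 5),
    distance = merge(
        Dict(c => DiscreteDistance() for c in categorical_feats),
        Dict(c => QuadraticDistance() for c in numeric_feats),
    ),
);
```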
diff --git a/docs/src/tutorials/GenerativeDesigns.md b/docs/src/tutorials/GenerativeDesigns.md index ec0ed2e..e56edc7 100644 --- a/docs/src/tutorials/GenerativeDesigns.md +++ b/docs/src/tutorials/GenerativeDesigns.md @@ -125,7 +125,7 @@ nothing #hide ## Generative Model for Outcomes Sampling ````@example GenerativeDesigns -using CEED, CEED.GenerativeDesigns +using CEEDesigns, CEEDesigns.GenerativeDesigns ```` As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable. @@ -162,7 +162,10 @@ DistanceBased( target = "HeartDisease", uncertainty = Entropy, similarity = Exponential(; λ = 5), - distance = merge(Dict(c => DiscreteDistance() for c in categorical_feats), Dict(c => QuadraticDistance() for c in numeric_feats)) + distance = merge( + Dict(c => DiscreteDistance() for c in categorical_feats), + Dict(c => QuadraticDistance() for c in numeric_feats), + ), ); nothing #hide ```` diff --git a/docs/src/tutorials/StaticDesigns.jl b/docs/src/tutorials/StaticDesigns.jl index effdcf5..2a48a45 100644 --- a/docs/src/tutorials/StaticDesigns.jl +++ b/docs/src/tutorials/StaticDesigns.jl @@ -8,7 +8,7 @@ # For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$. -# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. # To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -44,7 +44,7 @@ data[1:10, :] # ## Predictive Accuracy -# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. # We specify the experiments along with the associated features: @@ -96,9 +96,9 @@ model = classifier(; n_trees = 20, max_depth = 10) # ### Performance Evaluation -# We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). +# We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). 
-using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns # diff --git a/docs/src/tutorials/StaticDesigns.md b/docs/src/tutorials/StaticDesigns.md index 3e209b4..49685eb 100644 --- a/docs/src/tutorials/StaticDesigns.md +++ b/docs/src/tutorials/StaticDesigns.md @@ -12,7 +12,7 @@ Let us consider a set of $n$ experiments $E = \{ e_1, \ldots, e_n\}$. For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$. -In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -50,7 +50,7 @@ data[1:10, :] ## Predictive Accuracy -The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. We specify the experiments along with the associated features: @@ -120,10 +120,10 @@ model = classifier(; n_trees = 20, max_depth = 10) ### Performance Evaluation -We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). +We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). ````@example StaticDesigns -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns ```` ````@example StaticDesigns diff --git a/docs/src/tutorials/StaticDesignsFiltration.jl b/docs/src/tutorials/StaticDesignsFiltration.jl index db42586..258841e 100644 --- a/docs/src/tutorials/StaticDesignsFiltration.jl +++ b/docs/src/tutorials/StaticDesignsFiltration.jl @@ -14,7 +14,7 @@ # We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result. -# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. 
Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. # To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -90,7 +90,7 @@ data_binary[1:10, :] # In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate. -# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. # We specify the experiments along with the associated features: @@ -105,9 +105,9 @@ experiments = Dict( # We may also provide additional zero-cost features, which are always available. zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"] -# For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments. +# For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments. -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns # @@ -175,7 +175,7 @@ scatter!( using MCTS, D3Trees experiments = Set(vcat.(designs[end][2].arrangement...)[1]) -(; planner) = CEED.StaticDesigns.optimal_arrangement( +(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement( costs, perf_eval, experiments; diff --git a/docs/src/tutorials/StaticDesignsFiltration.md b/docs/src/tutorials/StaticDesignsFiltration.md index 159f936..622a139 100644 --- a/docs/src/tutorials/StaticDesignsFiltration.md +++ b/docs/src/tutorials/StaticDesignsFiltration.md @@ -18,7 +18,7 @@ Moreover, it can be assumed that a set of extrinsic decision-making rules is imp We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result. -In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. 
Generally, this cost is specified in terms of monetary cost and execution time of the experiment. To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -100,7 +100,7 @@ data_binary[1:10, :] In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate. -The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. We specify the experiments along with the associated features: @@ -120,10 +120,10 @@ We may also provide additional zero-cost features, which are always available. zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"] ```` -For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments. +For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments. ````@example StaticDesignsFiltration -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns ```` ````@example StaticDesignsFiltration @@ -204,7 +204,7 @@ The following is a visualisation of the DPW search tree that was used to find an using MCTS, D3Trees experiments = Set(vcat.(designs[end][2].arrangement...)[1]) -(; planner) = CEED.StaticDesigns.optimal_arrangement( +(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement( costs, perf_eval, experiments; diff --git a/readme.md b/readme.md index b1cb280..13f8500 100644 --- a/readme.md +++ b/readme.md @@ -1,11 +1,11 @@

- CEED.jl logo
- CEED.jl logo
+ CEEDesigns.jl logo
+ CEEDesigns.jl logo

_______ -[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEED.jl/) +[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://merck.github.io/CEEDesigns.jl/) A decision-making framework for the cost-efficient design of experiments, balancing the value of acquired experimental evidence and incurred costs. We have considered two different experimental setups, which are outlined below. @@ -16,7 +16,7 @@ Here we assume that the same experimental design will be used for a population o For each subset of experiments, we consider an estimate of the value of acquired information. To give an example, if a set of experiments is used to predict the value of a specific target variable, our framework can leverage a built-in integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) to estimate predictive accuracies of machine learning models fitted over subset of experimental features. -In the cost-sensitive setting of CEED, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized. +In the cost-sensitive setting of CEEDesigns, a user provides the monetary cost and execution time of each experiment. Given the constraint on the maximum number of parallel experiments along with a fixed tradeoff between monetary cost and execution time, we devise an arrangement of each subset of experiments such that the expected combined cost is minimized. Assuming the information values and optimized experimental costs for each subset of experiments, we then generate a set of cost-efficient experimental designs. 
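The static-designs workflow sketched in this overview is exercised by the demo scripts renamed later in the diff. The following is a minimal, hedged sketch of that workflow under the new package name: the `costs` values and the toy `perf_eval` dictionary are illustrative stand-ins (the tutorials compute `perf_eval` via `evaluate_experiments`, whose full call is not visible in the hunks), while the `efficient_designs`, `CEEDesigns.plotly`, and `plot_front` calls mirror `demo.jl` and `demo_literate.jl` below.

```julia
using CEEDesigns, CEEDesigns.StaticDesigns  # formerly: using CEED, CEED.StaticDesigns

# (monetary cost, execution time) per experiment; values are illustrative,
# as the tutorials' actual `costs` dictionary is not fully visible in the hunks
costs = Dict("ECG" => (5.0, 30.0), "BloodCholesterol" => (20.0, 120.0))

# toy stand-in for the value-of-information estimates that the tutorials
# obtain from `evaluate_experiments` (lower = better, e.g. logloss)
perf_eval = Dict(
    Set{String}() => 0.70,
    Set(["ECG"]) => 0.45,
    Set(["BloodCholesterol"]) => 0.55,
    Set(["ECG", "BloodCholesterol"]) => 0.35,
)

designs = efficient_designs(costs, perf_eval)  # Pareto-efficient designs

CEEDesigns.plotly()  # renamed from CEED.plotly()
plot_front(designs; labels = make_labels(designs), ylabel = "logloss")
```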
diff --git a/src/CEED.jl b/src/CEEDesigns.jl similarity index 93% rename from src/CEED.jl rename to src/CEEDesigns.jl index 9b19da0..73b1b4c 100644 --- a/src/CEED.jl +++ b/src/CEEDesigns.jl @@ -1,4 +1,4 @@ -module CEED +module CEEDesigns using DataFrames, Plots export front, plot_front diff --git a/src/GenerativeDesigns/GenerativeDesigns.jl b/src/GenerativeDesigns/GenerativeDesigns.jl index a394bbe..9c3b1c1 100644 --- a/src/GenerativeDesigns/GenerativeDesigns.jl +++ b/src/GenerativeDesigns/GenerativeDesigns.jl @@ -11,7 +11,7 @@ using StatsBase: Weights, countmap, entropy, sample using Random: default_rng, AbstractRNG using MCTS -using ..CEED: front +using ..CEEDesigns: front export UncertaintyReductionMDP, DistanceBased export QuadraticDistance, DiscreteDistance, MahalanobisDistance diff --git a/src/StaticDesigns/StaticDesigns.jl b/src/StaticDesigns/StaticDesigns.jl index 459c66b..6b64c06 100644 --- a/src/StaticDesigns/StaticDesigns.jl +++ b/src/StaticDesigns/StaticDesigns.jl @@ -7,7 +7,7 @@ using POMDPs using POMDPTools: Deterministic using MCTS -using ..CEED: front +using ..CEEDesigns: front export evaluate_experiments, efficient_designs diff --git a/src/StaticDesigns/scripts/Project.toml b/src/StaticDesigns/scripts/Project.toml index bf4e83c..134f096 100644 --- a/src/StaticDesigns/scripts/Project.toml +++ b/src/StaticDesigns/scripts/Project.toml @@ -1,6 +1,6 @@ [deps] BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b" -CEED = "e939450b-799e-4198-a5f5-3f2f7fb1c671" +CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671" CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" diff --git a/src/StaticDesigns/scripts/demo.jl b/src/StaticDesigns/scripts/demo.jl index 455ab7d..0289425 100644 --- a/src/StaticDesigns/scripts/demo.jl +++ b/src/StaticDesigns/scripts/demo.jl @@ -1,4 +1,4 @@ -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns using CSV, DataFrames ## heart failure prediction dataset @@ -64,7 +64,7 @@ designs2 = efficient_designs( ) # switch to plotly backend -CEED.plotly() +CEEDesigns.plotly() designs = designs2 plot_front(designs; labels = make_labels(designs), ylabel = "logloss") diff --git a/src/StaticDesigns/scripts/demo_binary.jl b/src/StaticDesigns/scripts/demo_binary.jl index da857ba..0719611 100644 --- a/src/StaticDesigns/scripts/demo_binary.jl +++ b/src/StaticDesigns/scripts/demo_binary.jl @@ -1,5 +1,5 @@ -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns using CSV, DataFrames ## synthetic heart disease dataset with binary labels @@ -23,7 +23,7 @@ perf_eval = evaluate_experiments(experiments, data; zero_cost_features) designs = efficient_designs(experiments, perf_eval) # switch to plotly backend -CEED.plotly() +CEEDesigns.plotly() plot_front(designs; labels = make_labels(designs), ylabel = "discriminative pwr") diff --git a/src/StaticDesigns/scripts/demo_literate.jl b/src/StaticDesigns/scripts/demo_literate.jl index 16febba..817f12d 100644 --- a/src/StaticDesigns/scripts/demo_literate.jl +++ b/src/StaticDesigns/scripts/demo_literate.jl @@ -8,7 +8,7 @@ # For each subset $S \subseteq {x_1, ..., x_n}$ of features, we evaluate the accuracy $a_S$ of a predictive model that predicts the value of $y$ based on readouts over features in $S$. Assuming the patient population follows the same distribution as the historical observations, the predictive accuracy serves as a proxy for the information gained from observing the features in $S$. 
-# In the cost-sensitive setting of CEED, observing the features $S$ incurs a cost $c_S$. Generally, this cost is specified in terms of monetary cost and execution time of an experiment. Considering the constraint of a maximum number of parallel experiments, the algorithm recommends an arrangement of experiments that minimizes the total running time. Eventually, for a fixed tradeoff between monetary cost and execution time, a combined cost $c_S$ is obtained. +# In the cost-sensitive setting of CEEDesigns, observing the features $S$ incurs a cost $c_S$. Generally, this cost is specified in terms of monetary cost and execution time of an experiment. Considering the constraint of a maximum number of parallel experiments, the algorithm recommends an arrangement of experiments that minimizes the total running time. Eventually, for a fixed tradeoff between monetary cost and execution time, a combined cost $c_S$ is obtained. # Assuming we know the accuracies $a_S$ and experimental costs $c_S$ for each subset $S \subseteq {x_1, ..., x_n}$, we can generate a set of Pareto-efficient experimental designs considering both predictive accuracy and cost. @@ -21,7 +21,7 @@ data = CSV.File("data/heart_disease.csv") |> DataFrame # ## Predictive Accuracy -# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. # We specify the experiments along with the associated features: @@ -72,9 +72,9 @@ model = classifier(; n_trees = 20, max_depth = 10) # ### Performance Evaluation -# We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as the measure of accuracy. It is possible to pass additional keyword arguments that will propagate `MLJ.evaluate` (such as `measure`). +# We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as the measure of accuracy. It is possible to pass additional keyword arguments that will propagate `MLJ.evaluate` (such as `measure`). 
-using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns perf_eval = evaluate_experiments( experiments, model, @@ -103,7 +103,7 @@ costs = Dict( designs = efficient_designs(costs, perf_eval) ## Switch to plotly backend for plotting -CEED.plotly() +CEEDesigns.plotly() plot_front(designs; labels = make_labels(designs), ylabel = "logloss") diff --git a/test/GenerativeDesigns/test_distances_sum.jl b/test/GenerativeDesigns/test_distances_sum.jl index 547114a..fc0b1ac 100644 --- a/test/GenerativeDesigns/test_distances_sum.jl +++ b/test/GenerativeDesigns/test_distances_sum.jl @@ -19,7 +19,7 @@ types = Dict( ) data = coerce(data, types); -using CEED, CEED.GenerativeDesigns +using CEEDesigns, CEEDesigns.GenerativeDesigns evidence = Evidence("Age" => 35, "Sex" => "M") diff --git a/test/GenerativeDesigns/test_mahalanobis.jl b/test/GenerativeDesigns/test_mahalanobis.jl index 1a5a205..6222834 100644 --- a/test/GenerativeDesigns/test_mahalanobis.jl +++ b/test/GenerativeDesigns/test_mahalanobis.jl @@ -17,7 +17,7 @@ data = coerce(data, types); continuous_cols = filter(colname -> eltype(data[!, colname]) == Float64, names(data)) data = data[!, continuous_cols∪["HeartDisease"]] -using CEED, CEED.GenerativeDesigns +using CEEDesigns, CEEDesigns.GenerativeDesigns evidence = Evidence() diff --git a/test/StaticDesigns/test.jl b/test/StaticDesigns/test.jl index 328c1f1..b2bad19 100644 --- a/test/StaticDesigns/test.jl +++ b/test/StaticDesigns/test.jl @@ -1,5 +1,5 @@ using Random: seed! -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns using CSV, DataFrames ## predictive model from `MLJ` diff --git a/test/fronts.jl b/test/fronts.jl index 6cb5405..fc34870 100644 --- a/test/fronts.jl +++ b/test/fronts.jl @@ -1,5 +1,5 @@ using Test -using CEED +using CEEDesigns @testset "`front(v)` tests" begin v = [] diff --git a/tutorials/GenerativeDesigns.jl b/tutorials/GenerativeDesigns.jl index d4df6e4..d74f31f 100644 --- a/tutorials/GenerativeDesigns.jl +++ b/tutorials/GenerativeDesigns.jl @@ -115,7 +115,7 @@ data = coerce(data, types); # ## Generative Model for Outcomes Sampling -using CEED, CEED.GenerativeDesigns +using CEEDesigns, CEEDesigns.GenerativeDesigns # As previously discussed, we provide a dataset of historical records, the target variable, along with an information-theoretic measure to quantify the uncertainty about the target variable. diff --git a/tutorials/GliomaGrading/GliomaGrading.jl b/tutorials/GliomaGrading/GliomaGrading.jl index a7f1fd6..41ea0bf 100644 --- a/tutorials/GliomaGrading/GliomaGrading.jl +++ b/tutorials/GliomaGrading/GliomaGrading.jl @@ -52,12 +52,12 @@ end # ╔═╡ 6ebd71e0-b49a-4d12-b441-4805efc69520 begin - using CEED, CEED.StaticDesigns + using CEEDesigns, CEEDesigns.StaticDesigns md""" ### Cost-Efficient Feature Selection - We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). + We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). """ end @@ -73,7 +73,7 @@ Let us consider a set of $n$ experiments $E = \{ e_1, \ldots, e_n\}$. 
For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$. -In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -138,7 +138,7 @@ md"Classification target is just the glioma grade" target = "Grade" # ╔═╡ d74684f6-7fd4-41c8-9917-d99dfc1f5f64 -md"In the cost-sensitive setting of CEED, obtaining additional experimental evidence comes with a cost. We assume that each gene mutation factor is obtained through a separate experiment." +md"In the cost-sensitive setting of CEEDesigns, obtaining additional experimental evidence comes with a cost. We assume that each gene mutation factor is obtained through a separate experiment." # ╔═╡ f176f8cf-8941-4257-ba41-60fff864aa56 # We assume that each feature is measured separately and the measurement incurs a monetary cost. @@ -413,7 +413,7 @@ begin guidefontsize = 8, tickfontsize = 8, ylabel = "accuracy", - c = CEED.colorant"rgb(110,206,178)", + c = CEEDesigns.colorant"rgb(110,206,178)", label = "w/ histology feature", xrotation = 50, ) @@ -428,7 +428,7 @@ begin guidefontsize = 8, tickfontsize = 8, ylabel = "accuracy", - c = CEED.colorant"rgb(104,140,232)", + c = CEEDesigns.colorant"rgb(104,140,232)", label = "w/o histology feature", width = 2, xrotation = 50, @@ -472,10 +472,10 @@ begin xlabel = "combined cost", ylabel = "accuracy", label = "w/ histology feature", - c = CEED.colorant"rgb(110,206,178)", + c = CEEDesigns.colorant"rgb(110,206,178)", mscolor = nothing, fontsize = 16, - #fill = (0, CEED.colorant"rgb(110,206,178)"), + #fill = (0, CEEDesigns.colorant"rgb(110,206,178)"), fillalpha = 0.2, legend = :bottomright, ) @@ -485,10 +485,10 @@ begin map(x -> x[1][1], design_new_feature_no_feature), map(x -> 1 - x[1][2], design_new_feature_no_feature); label = "w/o histology feature", - c = CEED.colorant"rgb(104,140,232)", + c = CEEDesigns.colorant"rgb(104,140,232)", mscolor = nothing, fontsize = 16, - #fill = (0, CEED.colorant"rgb(104,140,232)"), + #fill = (0, CEEDesigns.colorant"rgb(104,140,232)"), fillalpha = 0.15, title = "sensitivity = $sensitivity, specificity = $specificity, cost = $cost", ) diff --git a/tutorials/GliomaGrading/GliomaGrading_anim.jl b/tutorials/GliomaGrading/GliomaGrading_anim.jl index 961ff9c..4884302 100644 --- a/tutorials/GliomaGrading/GliomaGrading_anim.jl +++ b/tutorials/GliomaGrading/GliomaGrading_anim.jl @@ -55,12 +55,12 @@ end # ╔═╡ 6ebd71e0-b49a-4d12-b441-4805efc69520 begin - using CEED, CEED.StaticDesigns + using CEEDesigns, CEEDesigns.StaticDesigns md""" ### Performance Evaluation - We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. 
It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). + We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). """ end @@ -76,7 +76,7 @@ Let us consider a set of $n$ experiments $E = \{ e_1, \ldots, e_n\}$. For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$. -In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -153,7 +153,7 @@ md"Classification target is just the glioma grade" target = "Grade" # ╔═╡ d74684f6-7fd4-41c8-9917-d99dfc1f5f64 -md"In the cost-sensitive setting of CEED, obtaining additional experimental evidence comes with a cost. We assume that each gene mutation is observed individually, while all histological features are gathered in one single experiment." +md"In the cost-sensitive setting of CEEDesigns, obtaining additional experimental evidence comes with a cost. We assume that each gene mutation is observed individually, while all histological features are gathered in one single experiment." # ╔═╡ f176f8cf-8941-4257-ba41-60fff864aa56 # We assume that each feature is measured separately and the measurement incurs a monetary cost. 
@@ -363,10 +363,10 @@ function make_plot(specificity, sensitivity, cost, perf_eval) xlabel = "combined cost", ylabel = "accuracy", label = "w/ histology feature", - c = CEED.colorant"rgb(110,206,178)", + c = CEEDesigns.colorant"rgb(110,206,178)", mscolor = nothing, fontsize = 16, - #fill = (0, CEED.colorant"rgb(110,206,178)"), + #fill = (0, CEEDesigns.colorant"rgb(110,206,178)"), fillalpha = 0.2, legend = :bottomright, ) @@ -380,10 +380,10 @@ function make_plot(specificity, sensitivity, cost, perf_eval) map(x -> x[1][1], design), map(x -> 1 - x[1][2], design); label = "w/o histology feature", - c = CEED.colorant"rgb(104,140,232)", + c = CEEDesigns.colorant"rgb(104,140,232)", mscolor = nothing, fontsize = 16, - #fill = (0, CEED.colorant"rgb(104,140,232)"), + #fill = (0, CEEDesigns.colorant"rgb(104,140,232)"), fillalpha = 0.15, title = "sensitivity = $sensitivity, specificity = $specificity, cost = $cost", ) diff --git a/tutorials/GliomaGrading/GliomaGrading_left_cells.jl b/tutorials/GliomaGrading/GliomaGrading_left_cells.jl index 0e6c942..4fe9616 100644 --- a/tutorials/GliomaGrading/GliomaGrading_left_cells.jl +++ b/tutorials/GliomaGrading/GliomaGrading_left_cells.jl @@ -21,7 +21,7 @@ begin guidefontsize = 8, tickfontsize = 8, ylabel = "accuracy", - c = CEED.colorant"rgb(110,206,178)", + c = CEEDesigns.colorant"rgb(110,206,178)", label = "w/ histology", xrotation = 50, ) @@ -35,7 +35,7 @@ begin guidefontsize = 8, tickfontsize = 8, ylabel = "accuracy", - c = CEED.colorant"rgb(104,140,232)", + c = CEEDesigns.colorant"rgb(104,140,232)", label = "w/o histology", width = 2, xrotation = 50, diff --git a/tutorials/Project.toml b/tutorials/Project.toml index 279798d..d4a0fc7 100644 --- a/tutorials/Project.toml +++ b/tutorials/Project.toml @@ -1,6 +1,6 @@ [deps] BetaML = "024491cd-cc6b-443e-8034-08ea7eb7db2b" -CEED = "e939450b-799e-4198-a5f5-3f2f7fb1c671" +CEEDesigns = "e939450b-799e-4198-a5f5-3f2f7fb1c671" CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" D3Trees = "e3df1716-f71e-5df9-9e2d-98e193103c45" DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" diff --git a/tutorials/StaticDesigns.jl b/tutorials/StaticDesigns.jl index effdcf5..2a48a45 100644 --- a/tutorials/StaticDesigns.jl +++ b/tutorials/StaticDesigns.jl @@ -8,7 +8,7 @@ # For each subset $S \subseteq E$ of experiments, we denote by $v_S$ the value of information acquired from conducting experiments in $S$. -# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. # To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -44,7 +44,7 @@ data[1:10, :] # ## Predictive Accuracy -# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. 
+# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. # We specify the experiments along with the associated features: @@ -96,9 +96,9 @@ model = classifier(; n_trees = 20, max_depth = 10) # ### Performance Evaluation -# We use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). +# We use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the predictive accuracy over subsets of experiments. We use `LogLoss` as a measure of accuracy. It is possible to pass additional keyword arguments, which will be passed to `MLJ.evaluate` (such as `measure`, shown below). -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns # diff --git a/tutorials/StaticDesignsFiltration.jl b/tutorials/StaticDesignsFiltration.jl index db42586..258841e 100644 --- a/tutorials/StaticDesignsFiltration.jl +++ b/tutorials/StaticDesignsFiltration.jl @@ -14,7 +14,7 @@ # We denote the expected fraction of entities that remain in the triage after conducting a set $S$ of experiments as the filtration rate, $f_S$. In the context of disease triage, this can be interpreted as the fraction of patients for whom the experimental evidence does not provide a 'conclusive' result. -# In the cost-sensitive setting of CEED, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. +# In the cost-sensitive setting of CEEDesigns, conducting an experiment $e$ incurs a cost $(m_e, t_e)$. Generally, this cost is specified in terms of monetary cost and execution time of the experiment. # To compute the cost associated with carrying out a set of experiments $S$, we first need to introduce the notion of an arrangement $o$ of the experiments $S$. An arrangement is modeled as a sequence of mutually disjoint subsets of $S$. In other words, $o = (o_1, \ldots, o_l)$ for a given $l\in\mathbb N$, where $\bigcup_{i=1}^l o_i = S$ and $o_i \cap o_j = \emptyset$ for each $1\leq i < j \leq l$. @@ -90,7 +90,7 @@ data_binary[1:10, :] # In this scenario, we model the value of information $v_S$ acquired by conducting a set of experiments as the ratio of patients for whom the results across the experiments in $S$ were 'inconclusive', i.e., $|\cap_{e\in S}\{ \text{patient} : \text{inconclusive in } e \}| / |\text{patients}|$. Essentially, the very same measure is used here to estimate the filtration rate. -# The CEED package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. +# The CEEDesigns package offers an additional flexibility by allowing an experiment to yield readouts over multiple features at the same time. In our scenario, we can consider the features `RestingECG`, `Oldpeak`, `ST_Slope`, and `MaxHR` to be obtained from a single experiment `ECG`. 
# We specify the experiments along with the associated features: @@ -105,9 +105,9 @@ experiments = Dict( # We may also provide additional zero-cost features, which are always available. zero_cost_features = ["Age", "Sex", "ChestPainType", "ExerciseAngina"] -# For binary datasets, we may use `evaluate_experiments` from `CEED.StaticDesigns` to evaluate the discriminative power of subsets of experiments. +# For binary datasets, we may use `evaluate_experiments` from `CEEDesigns.StaticDesigns` to evaluate the discriminative power of subsets of experiments. -using CEED, CEED.StaticDesigns +using CEEDesigns, CEEDesigns.StaticDesigns # @@ -175,7 +175,7 @@ scatter!( using MCTS, D3Trees experiments = Set(vcat.(designs[end][2].arrangement...)[1]) -(; planner) = CEED.StaticDesigns.optimal_arrangement( +(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement( costs, perf_eval, experiments;
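Taken together, the rename is mechanical: the package UUID is unchanged and the version only bumps from 0.3.4 to 0.3.5, so downstream projects need only update the dependency name in `Project.toml` and any module-qualified references. A sketch of the pattern follows (`costs`, `perf_eval`, and `experiments` stand for the tutorial variables above; the trailing keyword arguments passed to `optimal_arrangement` in the hunk are elided here):

```julia
# before this change
using CEED, CEED.StaticDesigns
CEED.plotly()
(; planner) = CEED.StaticDesigns.optimal_arrangement(costs, perf_eval, experiments)

# after this change
using CEEDesigns, CEEDesigns.StaticDesigns
CEEDesigns.plotly()
(; planner) = CEEDesigns.StaticDesigns.optimal_arrangement(costs, perf_eval, experiments)
```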