fleshes out the summary stats calcs and starts evo hx
raywray committed Oct 23, 2024
1 parent 311fd0b commit c66b617
Showing 20 changed files with 69 additions and 4 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -4,3 +4,6 @@
[submodule "evolutionary_history/supercomputer_scripts"]
	path = evolutionary_history/supercomputer_scripts
	url = https://github.com/raywray/supercomputer_scripts.git
[submodule "evolutionary_history/CoalMiner"]
	path = evolutionary_history/CoalMiner
	url = https://github.com/raywray/CoalMiner.git
1 change: 1 addition & 0 deletions evolutionary_history/CoalMiner
Submodule CoalMiner added at 1c6f55
5 changes: 4 additions & 1 deletion evolutionary_history/README.md
@@ -1 +1,4 @@
Next, the pipeline uses the SFS(s) created for fastsimcoal analyses, as well as a user-generated parameter `yaml` file, to feed into a fastsimcoal wrapper (citation here) that generates thousands of random coalescent models. This was run on a cluster. The wrapper identified the best model and parameters, ran a bootstrap analysis, and generated images. The results of the best model are included here in the `results/fastsimcoal` folder.
After the summary statistics were generated, we used the SFS(s) created for fastsimcoal analyses from statMix, as well as a user-generated parameter `yaml` file, to feed into CoalMiner (citation here), a random coalescent topology generator, to create 1000 coalescent models. This was run on a cluster. The wrapper identified the best model and parameters, ran a bootstrap analysis, and generated images. The results of the best model are included here in the `results/fastsimcoal` folder.

First, we ran CoalMiner. The output can be found at `/data/output/evolutionary_history/coalminer_output`

Submodule supercomputer_scripts updated from 000000 to 8eb62c
40 changes: 40 additions & 0 deletions evolutionary_history/generate_evolutionary_history.py
@@ -0,0 +1,40 @@
import os

from utilities import basic_utilities


def generate_models_with_coalminer():
    # define paths
    project_path = "/home/raya/Documents/Projects/hops_pipeline"
    coalminer_path = os.path.join(project_path, "evolutionary_history/CoalMiner")
    coalminer_input_folder_path = os.path.join(project_path, "data/input/evolutionary_history/coal_miner_input_files")
    coalminer_input_yml = "user_input_hops_k4.yml"

    # copy observed SFS and .yml into the CoalMiner project
    copy_sfs_cmd = [
        "cp",
        "-r",
        coalminer_input_folder_path,
        coalminer_path
    ]
    basic_utilities.execute_command(copy_sfs_cmd)

    # run coalminer
    # change into the coalminer dir
    os.chdir(coalminer_path)
    run_coalminer_cmd = [
        "python3",
        "coalminer.py",
        coalminer_input_yml
    ]
    basic_utilities.execute_command(run_coalminer_cmd)


def run_models_on_cluster():
    print("cluster")


def find_best_model():
    print("best")


def run_bootstrap():
    print("boot")
# @ARUN
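
A note on the staging step above: `generate_models_with_coalminer` copies the input folder with `cp -r` and then calls `os.chdir`, which changes the working directory for the rest of the process. The following is only a sketch of one alternative, not the author's method. The function names `stage_inputs` and `run_coalminer`, the use of `sys.executable` in place of `python3`, and passing the paths in as arguments are all assumptions made for illustration.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_inputs(input_dir: Path, coalminer_dir: Path) -> Path:
    """Copy the input folder into the CoalMiner checkout.

    Mirrors `cp -r input_dir coalminer_dir` when coalminer_dir already
    exists: the folder lands at coalminer_dir / input_dir.name.
    """
    target = coalminer_dir / input_dir.name
    # dirs_exist_ok (Python 3.8+) lets the copy be repeated without failing.
    shutil.copytree(input_dir, target, dirs_exist_ok=True)
    return target


def run_coalminer(coalminer_dir: Path, input_yml: str) -> None:
    """Run coalminer.py from inside its own directory.

    cwd= applies only to the child process, so the caller's working
    directory is left untouched (unlike os.chdir).
    """
    subprocess.run(
        [sys.executable, "coalminer.py", input_yml],  # hypothetical interpreter choice
        cwd=coalminer_dir,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

Under these assumptions a caller would build `coalminer_dir` and `input_dir` from a single project root and call the two functions in order. Whether CoalMiner expects the `.yml` file in its top-level directory or inside the copied folder is not stated in the commit, so the `input_yml` argument may need a folder prefix.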
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 14 additions & 0 deletions general_utilities/utilities.py
@@ -0,0 +1,14 @@
import os


def execute_command(command_list):
    command = " ".join(command_list)
    print(command)
    result = os.system(command)
    if result != 0:
        print("Command Failed to Execute")
    else:
        print("Command Successfully Executed")


def create_directory(dir_path):
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
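
One thing to watch in `execute_command`: joining the list with spaces and handing the string to `os.system` sends it through a shell, so an argument that contains a space (for example a path) is split into several arguments. A possible variant, sketched here as an assumption and not as the code in this commit, passes the list directly to `subprocess.run` and returns the exit code so callers can react to failures.

```python
import subprocess


def execute_command(command_list):
    """Run a command given as a list of arguments and return its exit code."""
    # The joined string is printed for logging only; the list itself is
    # passed to subprocess, so arguments with spaces stay intact.
    print(" ".join(command_list))
    completed = subprocess.run(command_list)
    if completed.returncode != 0:
        print("Command Failed to Execute")
    else:
        print("Command Successfully Executed")
    return completed.returncode
```

The trade-off is that shell features such as globs, pipes, and redirection no longer work in this form, because no shell is involved. The commands built in this commit (`cp -r ...` and `python3 coalminer.py ...`) do not rely on them.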
File renamed without changes.
8 changes: 6 additions & 2 deletions summary_statistics/README.md
@@ -1,4 +1,4 @@
First, we created a reusable pipeline that generates several summary statistics (statMix) and fed our *hops* `vcf` data through the pipeline. The summary statistics and analysis results we generated for *hops* are as follows:
We created a reusable pipeline (statMix) that generates several summary statistics and fed our *hops* `vcf` data through the pipeline. The summary statistics and analysis results we generated for *hops* are as follows:
- Hardy Weinberg Equilibrium
- Full population structure analysis using admixture
- SFS based on the population structure results
@@ -12,4 +12,8 @@ First, we created a reusable pipeline that generates several summary statistics
- Fit
- Fis
- allele frequency
- SFS(s) compatible for fastsimcoal analyses
- SFS(s) compatible for fastsimcoal analyses

The commands we used to generate the summary statistics are found in `/summary_statistics/generate_summary_statistics.py`

The results of these analyses are found in `/data/output/summary_statistics/statmix_output`
2 changes: 1 addition & 1 deletion main.py → ...statistics/generate_summary_statistics.py
@@ -9,7 +9,7 @@ def execute_command(command_list):
def get_statmix_stats():
    stats = ["hwe", "pop_structure", "sfs", "generic_stats", "fsc"]
    statmix_path = os.path.join("/home/raya/Documents/Projects/hops_pipeline/statMix", "statmix.py")
    vcf_path = "/home/raya/Documents/Projects/hops_pipeline/input_data/hops.vcf"
    vcf_path = "/home/raya/Documents/Projects/hops_pipeline/data/input/summary_statistics/hops.vcf"
    output_prefix = "hops"

    command = [