Commit

Merge branch 'release/1.5.0'
keiranmraine committed Jun 1, 2016
2 parents 4a5d8de + 69c2497 commit e3b13a7
Showing 21 changed files with 234 additions and 206 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@
/setup.log
/perl/perltidy.LOG
/perl/bin/cpanm
+ /perl/MYMETA.json
+ /perl/MYMETA.yml
71 changes: 36 additions & 35 deletions README.md
@@ -1,46 +1,15 @@
- LICENCE
- =======
- Copyright (c) 2014 Genome Research Ltd.
-
- Author: Cancer Genome Project [email protected]
-
- This file is part of cgpBattenberg.
-
- cgpBattenberg is free software: you can redistribute it and/or modify it under
- the terms of the GNU Affero General Public License as published by the Free
- Software Foundation; either version 3 of the License, or (at your option) any
- later version.
-
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
- FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
- details.
-
- You should have received a copy of the GNU Affero General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>.
-
- 1. The usage of a range of years within a copyright statement contained within
- this distribution should be interpreted as being equivalent to a list of years
- including the first and last year specified and all consecutive years between
- them. For example, a copyright statement that reads 'Copyright (c) 2005, 2007-
- 2009, 2011-2012' should be interpreted as being identical to a statement that
- reads 'Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012' and a copyright
- statement that reads "Copyright (c) 2005-2012' should be interpreted as being
- identical to a statement that reads 'Copyright (c) 2005, 2006, 2007, 2008,
- 2009, 2010, 2011, 2012'."
-
-
cgpBattenberg
=============

An installation helper, perl wrapper and the R program Battenberg which detects subclonality and copy number in matched NGS data.

## Installation

- Please install the following before attempting to run ``setup.sh <install_to_folder>``
+ Please install the following before attempting to run ``setup.sh <install_to_folder> [X/lib/perl:Y/lib/perl]``

- 1. [PCAP-core](https://github.com/ICGC-TCGA-PanCancer/PCAP-core/releases)
- 2. [alleleCount](https://github.com/cancerit/alleleCount/releases)
+ 1. [PCAP-core v2.1.3+](https://github.com/ICGC-TCGA-PanCancer/PCAP-core/releases)
+ 2. [alleleCount v3.0.1+](https://github.com/cancerit/alleleCount/releases)
+ 3. [cgpVcf v2.0.1+](https://github.com/cancerit/cgpVcf/releases)

All of the items listed here use the same install method.

@@ -69,3 +38,35 @@ For the most up to date usage instructions for the wrapper code please see the c



+ ----
+
+ LICENCE
+ =======
+ Copyright (c) 2014-2016 Genome Research Ltd.
+
+ Author: Cancer Genome Project [email protected]
+
+ This file is part of cgpBattenberg.
+
+ cgpBattenberg is free software: you can redistribute it and/or modify it under
+ the terms of the GNU Affero General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your option) any
+ later version.
+
+ This program is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
+ details.
+
+ You should have received a copy of the GNU Affero General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+ 1. The usage of a range of years within a copyright statement contained within
+ this distribution should be interpreted as being equivalent to a list of years
+ including the first and last year specified and all consecutive years between
+ them. For example, a copyright statement that reads 'Copyright (c) 2005, 2007-
+ 2009, 2011-2012' should be interpreted as being identical to a statement that
+ reads 'Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012' and a copyright
+ statement that reads "Copyright (c) 2005-2012' should be interpreted as being
+ identical to a statement that reads 'Copyright (c) 2005, 2006, 2007, 2008,
+ 2009, 2010, 2011, 2012'."
1 change: 1 addition & 0 deletions perl/MANIFEST
@@ -26,5 +26,6 @@ share/battenberg/RunBAFLogR.R
share/battenberg/RunGetHaplotypedBAFs.R
share/battenberg/RunImpute.R
share/battenberg/segmentBAFphased.R
+ share/gender/GRCh37d5_Y.loci
t/1_pm_compile.t
t/2_pl_compile.t
42 changes: 0 additions & 42 deletions perl/MYMETA.json

This file was deleted.

23 changes: 0 additions & 23 deletions perl/MYMETA.yml

This file was deleted.

1 change: 1 addition & 0 deletions perl/Makefile.PL
@@ -37,6 +37,7 @@ WriteMakefile(
PREREQ_PM => {
'Const::Fast' => 0.014,
'Try::Tiny' => 0.22,
+ 'Bio::DB::HTS' => 2.0,
}
);

27 changes: 21 additions & 6 deletions perl/bin/battenberg.pl
@@ -1,7 +1,7 @@
#!/usr/bin/perl

##########LICENCE##########
- # Copyright (c) 2014,2015 Genome Research Ltd.
+ # Copyright (c) 2014-2016 Genome Research Ltd.
#
# Author: Cancer Genome Project [email protected]
#
@@ -45,6 +45,7 @@
segmentphased fitcn subclones finalise );

const my @VALID_PROTOCOLS => qw( WGS WXS RNA );
+ const my @VALID_GENDERS => qw(XX XY L);

const my $DEFAULT_ALLELE_COUNT_MBQ => 20;
const my $DEFAULT_PLATFORM_GAMMA=>1;
@@ -135,7 +136,6 @@ sub setup {
'p|process=s' => \$opts{'process'},
'u|thousand-genomes-loc=s' => \$opts{'1kgenloc'},
'r|reference=s' => \$opts{'reference'},
- 's|is-male' => \$opts{'is_male'},
'e|impute-info=s' => \$opts{'impute_info'},
'c|prob-loci=s' => \$opts{'prob_loci'},
'g|logs=s' => \$opts{'lgs'},
@@ -157,11 +157,13 @@ sub setup {
'pr|protocol=s' => \$opts{'protocol'},
'pl|platform=s' => \$opts{'platform'},
'a|allele-counts=s' => \$opts{'allele-counts'},
+ 'ge|gender=s' => \$opts{'gender'},
+ 'gl|genderloci=s' => \$opts{'genderloci'},
'j|jobs' => \$opts{'jobs'},
) or pod2usage(2);

- pod2usage(-message => Sanger::CGP::Battenberg::license, -verbose => 0) if(defined $opts{'h'});
- pod2usage(-message => Sanger::CGP::Battenberg::license, -verbose => 2) if(defined $opts{'m'});
+ pod2usage(-verbose => 0, -exitval => 0) if(defined $opts{'h'});
+ pod2usage(-verbose => 2, -exitval => 0) if(defined $opts{'m'});

# then check for no args:
my $defined;
@@ -205,6 +207,16 @@ sub setup {
delete $opts{'process'} unless(defined $opts{'process'});
delete $opts{'index'} unless(defined $opts{'index'});

+ if(defined $opts{'gender'}){
+ pod2usage(-message => 'unknown gender value: '.$opts{'gender'}, -verbose => 1) unless(first {$_ eq $opts{'gender'}} @VALID_GENDERS);
+ if($opts{'gender'} eq 'L') {
+ die "ERROR: Gender of XY/XX must be supplied when 'allele-counts' defined\n" if(defined $opts{'allele-counts'});
+ $opts{'gender'} = Sanger::CGP::Battenberg::Implement::determine_gender(\%opts);
+ }
+ } else {
+ pod2usage(-message => 'gender not set', -verbose => 1);
+ }
+
if(exists $opts{'protocol'} && defined $opts{'protocol'}) {
my $bad_prot = 1;
$bad_prot = 0 if(first { $_ eq $opts{'protocol'} } @VALID_PROTOCOLS);
@@ -269,7 +281,7 @@ sub setup {
make_path($logs) unless(-d $logs);
$opts{'logs'} = $logs;

- if(exists($opts{'is_male'}) && defined($opts{'is_male'})){
+ if($opts{'gender'} eq 'XY'){
$opts{'is_male'} = 'TRUE';
}else{
$opts{'is_male'} = 'FALSE';
@@ -291,6 +303,7 @@ sub setup {

$opts{'protocol'} = $DEFAULT_PROTOCOL if(!exists($opts{'protocol'}) || !defined($opts{'protocol'}));
$opts{'platform'} = $DEFAULT_PLATFORM if(!exists($opts{'platform'}) || !defined($opts{'platform'}));

return \%opts;
}

@@ -312,7 +325,7 @@ =head1 SYNOPSIS
- when '-a' defined sample name
-normbam -nb Path to normal bam file
- when '-a' defined sample name
- -is-male -s Flag, if the sample is male
+ -gender -ge Gender, XX, XY or L (see -gl)
-impute-info -e Location of the impute info file
-thousand-genomes-loc -u Location of the directory containing 1k genomes data
-ignore-contigs-file -ig File containing contigs to ignore
@@ -336,6 +349,8 @@ =head1 SYNOPSIS
-assembly -ra Reference assembly []
-protocol -pr Sequencing protocol [WGS]
-platform -pl Sequencing platfrom [ILLUMINA]
+ -genderloci -gl List of gender loci, required when '-ge L' [share/gender/GRCh37d5_Y.loci]
+ - these are loci that will not present at all in a female sample
Optional system related parameters:
-threads -t Number of threads allowed on this machine (default 1)
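
The gender handling introduced in this file can be tried in isolation. What follows is a minimal standalone sketch, not the code from the commit: `gender_to_is_male` is a hypothetical helper name, and it assumes a value of `L` has already been resolved to `XX` or `XY` (the commit delegates that step to `Sanger::CGP::Battenberg::Implement::determine_gender`, whose body is not part of this diff).

```perl
use strict;
use warnings;
use List::Util qw(first);

# Same accepted values as @VALID_GENDERS in the diff.
my @VALID_GENDERS = qw(XX XY L);

# Hypothetical helper (not in the commit): validate a gender value and
# map it to the 'TRUE'/'FALSE' string stored in $opts{'is_male'}.
sub gender_to_is_male {
  my ($gender) = @_;
  die "gender not set\n" unless defined $gender;
  die "unknown gender value: $gender\n"
    unless first { $_ eq $gender } @VALID_GENDERS;
  # Assumption: 'L' must be resolved from the gender loci before this point.
  die "gender 'L' must be resolved to XX or XY first\n" if $gender eq 'L';
  return $gender eq 'XY' ? 'TRUE' : 'FALSE';
}

print gender_to_is_male('XY'), "\n";
print gender_to_is_male('XX'), "\n";
```

Unlike the removed `-is-male` flag, an unset gender is an error here rather than an implicit "female", which mirrors the `gender not set` usage message added in the hunk above.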
21 changes: 14 additions & 7 deletions perl/bin/battenberg_CN_to_VCF.pl
@@ -1,7 +1,7 @@
#!/usr/bin/perl

##########LICENCE##########
- # Copyright (c) 2014 Genome Research Ltd.
+ # Copyright (c) 2014-2016 Genome Research Ltd.
#
# Author: David Jones <[email protected]>
#
@@ -34,7 +34,7 @@ BEGIN
use Getopt::Long;
use Pod::Usage qw(pod2usage);

- use Bio::DB::Sam;
+ use Bio::DB::HTS;
use Try::Tiny;
use PCAP::Cli;

@@ -69,6 +69,7 @@ BEGIN
my $record_converter = new Sanger::CGP::Vcf::VCFCNConverter(
-contigs => [values %$contigs]
);
+ $record_converter->extended_cn(1);

my ($input_loc,$output_loc,$IN_FH,$OUT_FH);
try{
@@ -105,12 +106,19 @@ BEGIN


#Iterate through input and create a record for each.
- my $fai = Bio::DB::Sam::Fai->load($reference);
+ my $fai = Bio::DB::HTS::Fai->load($reference);
while(<$IN_FH>){
my $line = $_;
chomp($line);
next if($line =~ m/^\s+chr/);
- my ($seg_no,$chr,$start,$end,$blank1,$blank2,$blank3,$blank4,$mt_cn_tot,$mt_cn_min,undef) = split('\s+',$line);
+ my ($seg_no, $chr, $start, $end, $mt_cn_tot, $mt_cn_min, $mt_frac, $mt_cn_tot_sec, $mt_cn_min_sec, $mt_frac_sec) = (split('\s+',$line))[0,1,2,3,8,9,10,11,12,13];
+
+ my $extended = { 'mt_fcf' => $mt_frac eq 'NA' ? '.' : $mt_frac,
+ 'mt_tcs' => $mt_cn_tot_sec eq 'NA' ? '.' : $mt_cn_tot_sec,
+ 'mt_mcs' => $mt_cn_min_sec eq 'NA' ? '.' : $mt_cn_min_sec,
+ 'mt_fcs' => $mt_frac_sec eq 'NA' ? '.' : $mt_frac_sec,
+ };
+
my $wt_cn_tot = 2;
my $wt_cn_min = 1;
$start--; # all symbolic ALTs require preceeding base padding
@@ -125,7 +133,7 @@ BEGIN
&& defined($mt_cn_min));

my $start_allele = $fai->fetch("$chr:$start-$start");
- print $OUT_FH $record_converter->generate_record($chr,$start,$end,$start_allele,$wt_cn_tot,$wt_cn_min,$mt_cn_tot,$mt_cn_min);
+ print $OUT_FH $record_converter->generate_record($chr,$start,$end,$start_allele,$wt_cn_tot,$wt_cn_min,$mt_cn_tot,$mt_cn_min, $extended);
}

}catch{
@@ -172,7 +180,7 @@ sub parse_samples {
$param_mod = 'w';
}
if(defined $opts->{'sb'.$param_mod}) {
- $sam = Bio::DB::Sam->new(-bam => $opts->{'sb'.$param_mod}, -fasta => $reference);
+ $sam = Bio::DB::HTS->new(-bam => $opts->{'sb'.$param_mod}, -fasta => $reference);
$samp_ref = Sanger::CGP::Vcf::BamUtil->parse_samples($sam->header->text,
$opts->{$param_mod.'ss'},
$opts->{$param_mod.'sq'},
@@ -249,7 +257,6 @@ sub setup{
PCAP::Cli::file_for_reading('r', $opts{'r'});

# required: direct input
- pod2usage(-message => "\nERROR: rs|reference-species must be defined.\n", -verbose => 1, -output => \*STDERR) unless($opts{'rs'});
pod2usage(-message => "\nERROR: msq|sample-sequencing-protocol-mut must be defined.\n", -verbose => 1, -output => \*STDERR) if(exists $opts{'msq'} && ! defined $opts{'msq'});
pod2usage(-message => "\nERROR: wsq|sample-sequencing-protocol-norm must be defined.\n", -verbose => 1, -output => \*STDERR) if(exists $opts{'wsq'} && ! defined $opts{'wsq'});
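
The column selection and `NA` handling added to the segment-parsing loop of this file can be exercised on their own. This is a minimal sketch, assuming whitespace-separated input with at least 14 columns in the order implied by the variable names in the diff; `parse_segment_line` and `na_to_dot` are hypothetical names, and the sample line is made-up data rather than real Battenberg output.

```perl
use strict;
use warnings;

# Hypothetical helper: VCF uses '.' for a missing value, the input uses 'NA'.
sub na_to_dot {
  my ($value) = @_;
  return (!defined $value || $value eq 'NA') ? '.' : $value;
}

# Hypothetical helper: pick columns 0-3 and 8-13 with a list slice,
# as the new split(...)[...] line in the diff does.
sub parse_segment_line {
  my ($line) = @_;
  chomp $line;
  my ($seg_no, $chr, $start, $end, $cn_tot, $cn_min,
      $frac, $cn_tot_sec, $cn_min_sec, $frac_sec)
    = (split /\s+/, $line)[0, 1, 2, 3, 8, 9, 10, 11, 12, 13];
  return {
    seg_no    => $seg_no,
    chr       => $chr,
    start     => $start,
    end       => $end,
    mt_cn_tot => $cn_tot,
    mt_cn_min => $cn_min,
    mt_fcf    => na_to_dot($frac),        # fraction, first clone
    mt_tcs    => na_to_dot($cn_tot_sec),  # total CN, second clone
    mt_mcs    => na_to_dot($cn_min_sec),  # minor CN, second clone
    mt_fcs    => na_to_dot($frac_sec),    # fraction, second clone
  };
}

# Made-up example line: 14 whitespace-separated columns.
my $rec = parse_segment_line("1 1 1000 2000 0.5 0.01 0.3 0.1 2 1 1 NA NA NA\n");
print join("\t", @{$rec}{qw(chr start end mt_cn_tot mt_cn_min mt_fcf mt_tcs mt_mcs mt_fcs)}), "\n";
```

Slicing by index removes the need for the `$blank1`..`$blank4` placeholders used by the old line, and a short input line yields `undef` for the missing columns, which the sketch also maps to `.`.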
