NewFile: get_bed_file_lengths.sh - downloads bedfile and counts lines; NewFile: getCenterPointFromBed.py - returns chromosome, strandedness and midpoint of bed interval from input bed file;
izaak-coleman committed Nov 19, 2018
1 parent ac9e1ff commit d992f3e
Showing 2 changed files with 32 additions and 0 deletions.
25 changes: 25 additions & 0 deletions getCenterPointFromBed.py
@@ -0,0 +1,25 @@
import sys

def parseBed(fname):
    data = []
    with open(fname) as f:
        data = [l.strip().split() for l in f if l.strip()]
    return data

def extractRelevantFields(bed_list):
    """Extract e[0] (chromosome), the centerpoint of the bed interval
    (e[1], e[2]) and e[5] (strandedness), and return them as a
    list of tuples."""
    return [(e[0], int(e[1]) + (int(e[2]) - int(e[1])) // 2, e[5]) for e in bed_list]

def main():
    if len(sys.argv) != 2:
        print("Usage: <exe> <bed_file.bed>")
        sys.exit(1)

    center_points = extractRelevantFields(parseBed(sys.argv[1]))
    with open(sys.argv[1][:-4]+".bcp", 'w') as f:
        f.write("\n".join([",".join([c, str(cp), s]) for c, cp, s in center_points]))

if __name__ == '__main__':
    main()
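
As a rough illustration of what extractRelevantFields computes, the sketch below applies the same midpoint formula (start plus half the interval length, rounded down) to two made-up BED6 records and joins the result the way the .bcp file is written. The helper name `center_point` and the sample coordinates are assumptions for illustration only; they are not part of this commit.

```python
def center_point(fields):
    """Return (chromosome, midpoint, strand) for one split BED6 record.

    Hypothetical helper: mirrors the per-record logic of
    extractRelevantFields, assuming at least six columns.
    """
    start, end = int(fields[1]), int(fields[2])
    return (fields[0], start + (end - start) // 2, fields[5])

# Made-up BED6 records: chrom, start, end, name, score, strand.
records = [
    "chr1\t100\t200\tpeak1\t0\t+",
    "chr2\t10\t15\tpeak2\t0\t-",
]

rows = [center_point(r.split()) for r in records]

# Same comma-separated layout the script writes to the .bcp file.
bcp_text = "\n".join(",".join([c, str(cp), s]) for c, cp, s in rows)
print(bcp_text)
# chr1,150,+
# chr2,12,-
```

Note that an odd-length interval such as 10..15 rounds down to 12, and that a BED file with fewer than six columns would raise an IndexError on the strand field.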
7 changes: 7 additions & 0 deletions get_bed_file_lengths.sh
@@ -0,0 +1,7 @@
for base in $(cat "$1"); do
  file="$base.bed.gz"
  curl -O -L "https://www.encodeproject.org/files/$base/@@download/$file"
  echo "$file" >> all_bed_lengths.txt
  gunzip -c "$file" | wc -l >> all_bed_lengths.txt
  rm "$file"
done
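
For readers who prefer to do the counting step in Python, the sketch below is one possible equivalent of `gunzip -c $file | wc -l`: it counts newline characters in the decompressed stream using the standard `gzip` module. The function name `count_gzip_lines` and the demo file are assumptions for illustration; the download step is deliberately left out, and this is not how the commit itself does the count.

```python
import gzip
import os
import tempfile

def count_gzip_lines(path):
    """Count newline characters in a gzip file, as `wc -l` would.

    Hypothetical helper; reads in 1 MiB chunks so large files are
    not loaded into memory at once.
    """
    count = 0
    with gzip.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            count += chunk.count(b"\n")
    return count

# Demo on a small made-up file rather than a real ENCODE download.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bed.gz")
    with gzip.open(demo, "wb") as f:
        f.write(b"chr1\t100\t200\nchr2\t10\t15\n")
    demo_count = count_gzip_lines(demo)

print(demo_count)  # 2
```

Like `wc -l`, this counts newline characters, so a final line without a trailing newline is not included in the total.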
