Skip to content

A repository about construction of compressed suffix array (bio-info technology)

Notifications You must be signed in to change notification settings

AtoshDustosh/Construction-of-CSA-Compressed-Suffix-Array

Repository files navigation

Construction-of-CSA-Compressed-Suffix-Array-

A repository about construction of compressed suffix array (bio-info technology)

Usage manual: modify main.c to change input file path, output file path, output file header.

details:

char* FILEPATH = "testdata/testdata_1000.fna";   // file path
int ARRAYLENGTH = 0; // length of T ~ n
int PARTLENGTH = 0; // part length of T ~ l
int PARTNUM = 0; // number of parts ~ ceil(n/l)

char* T = NULL; // DNA sequence of (A,C,G,T) plus a '$'
int* SA = NULL; // SA of T
int* SA_inverse = NULL; // inverse of SA
int* Psi = NULL; // Psi of T - the compressed suffix array

char* BWT = NULL; // BWT of T - Burrows-Wheeler Transform

char* BWTFILEPATH = "outputdata/testdata_1000.bwt";
char* BWTFILEHEADER = ">gi|110640213|ref|NC_008253.1| Escherichia coli 536, bwt array";
int LINELENGTH = 70;

FILEPATH ~ input file path

BWTFILEPATH ~ output file path

BWTFILEHEADER ~ output file header

Afterthoughts

  1. the importance of this algorithm is not about the time complexity. The point is this algorithm reduces the peak memory usage from O(nlog n) to O(n), which will be O(nlog n) if the CSA is built directly.

  2. ...

About

A repository about construction of compressed suffix array (bio-info technology)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages