Generic integration searching #9

Open
NanoporeEnthusiast opened this issue Dec 23, 2024 · 3 comments
Labels
question Further information is requested

Comments

@NanoporeEnthusiast

NanoporeEnthusiast commented Dec 23, 2024

Is your feature related to a problem?

This workflow seems nearly compatible with integration-site searching in general, except that it requires the expected integration sites as input, which are known in CRISPR experiments. Could this, or a similar workflow, be used to search for retroviral integration sites, which may integrate more randomly?

Describe the solution you'd like

Instead of requiring an expected insertion site, could the workflow search for the inserted sequence and output the flanking genomic sites and the proportions of insertions?

Describe alternatives you've considered

Number of insertions and their locations (chromosome, nucleotide position, gene ID if applicable), UMI discrimination functions, whether the insertions fall in coding or non-coding regions, etc.

Additional context

No response

@nrhorner nrhorner added the question Further information is requested label Dec 31, 2024
@nrhorner
Contributor

Thanks for your question @NanoporeEnthusiast.

Currently, this workflow cannot identify random integration sites. As far as I know, this functionality is not available in the other EPI2ME workflows either. It does seem like a good idea for a new workflow, and I'll let you know if we decide to build one.

@NanoporeEnthusiast
Author

Thank you, nrhorner. I hope this is something that can be done in the future. Here is an example of how something similar has been done with R-based tools, from Ajoge et al. (https://doi.org/10.1038/s41467-022-35379-y):

Integration site library and computational analysis
Genomic DNA was processed for integration site analysis and sequenced using the Illumina MiSeq platform [36,50]. Briefly, genomic DNA was restriction enzyme digested using MseI and NarI and the 3’ LTR-host genome junctions were amplified by ligation-mediated PCR. After gel purification of the PCR products, the purified DNA samples were processed using the Nextera XT DNA Sample Preparation kit. A limited-cycle PCR reaction was performed to amplify the insert DNA, which was then sequenced using Illumina MiSeq using 2×150 bp chemistry at the London Regional Genomics Centre (Robarts Research Institute, Western University, Canada). Fastq sequencing reads were quality trimmed and unique integration sites identified using our in-house bioinformatics pipeline [36], which is called the Barr Lab Integration Site Identification Pipeline (BLISIP version 2.9) and includes the following updates: bedtools (v2.25.0), bioawk (awk version 20110810), bowtie2 (version 2.3.4.1), and restrSiteUtils (v1.2.9). HIV-1 3’ LTR-containing fastq sequences were identified and filtered by allowing up to a maximum of five mismatches with the reference NL4-3 3’ LTR sequence and if the 3’ LTR sequence had no match with any region of the human genome (GRCh37/hg19). Integration sites were determined from the sequence junction of the 3’ LTR and human genome sequences. All genomic sites in each dataset that hosted two or more sites (i.e., identical sites) were collapsed into one unique site for our analysis. Sites located in various common genomic features and non-B DNA motifs were quantified and heatmaps were generated using our in-house python program BLISIP Heatmap (BLISIPHA v1.0). Sites that could not be unambiguously mapped to a single region in the genome were excluded from the study. All non-B DNA motifs were defined according to previously established criteria [88].
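
(Not part of the quoted methods.) To make the read filtering and site collapsing described above more concrete, here is a rough Python sketch. It is not BLISIP and not anything from the EPI2ME workflow: the function names, the plain Hamming-distance scan, the `min_flank` cutoff, and the `(chromosome, position, strand)` site tuples are all assumptions made for illustration. Mapping the flanks to the genome (bowtie2 in the paper) is assumed to happen between the two steps and is not shown.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_ltr_end(read, ltr_end, max_mismatches=5):
    """Return the index just past the best LTR-end match in `read`, or None.

    Scans every window of len(ltr_end) and keeps the window with the fewest
    mismatches, provided it has at most `max_mismatches`.
    """
    k = len(ltr_end)
    best = None  # (mismatches, junction_index)
    for i in range(len(read) - k + 1):
        d = hamming(read[i:i + k], ltr_end)
        if d <= max_mismatches and (best is None or d < best[0]):
            best = (d, i + k)
    return None if best is None else best[1]


def flanking_sequences(reads, ltr_end, max_mismatches=5, min_flank=20):
    """Keep LTR-containing reads and return the host sequence after the junction."""
    flanks = []
    for read in reads:
        junction = find_ltr_end(read, ltr_end, max_mismatches)
        if junction is not None and len(read) - junction >= min_flank:
            flanks.append(read[junction:])
    return flanks


def collapse_sites(sites):
    """Collapse identical mapped sites; report read count and proportion per site.

    `sites` is a list of hashable site identifiers, e.g. (chrom, pos, strand)
    tuples taken from an aligner's output (hypothetical format).
    """
    counts = Counter(sites)
    total = sum(counts.values())
    return {site: (n, n / total) for site, n in counts.items()}
```

The per-site proportions returned by `collapse_sites` correspond to the "proportions of insertions" asked about in the original post, while the dictionary keys are the unique sites after collapsing.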
Matched random control integration sites were generated by matching each experimentally determined site with 10 random sites in silico that were constructed to be the same number of bases away from the restriction site as was the experimental site [36]. Unique HIV 3’ LTRs were identified with BLISIP, aligned with MUSCLE (version 10.1.7) [89] and gap-stripped with trimAl (version 1.2) [90]. All columns with gaps in more than 40% of the population were gap-stripped. Unique LTR sequence logos were generated using WebLogo (version 3.6) [52].
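
(Again, not part of the quoted methods.) The matched random control idea could be sketched roughly as below. This is only one reading of the description: it assumes "the restriction site" means the nearest recognition site to the experimental site, uses the MseI recognition sequence TTAA as the default motif, and works on a single in-memory sequence. All function names and parameters are hypothetical.

```python
import random
import re


def restriction_positions(sequence, motif="TTAA"):
    """Return the start index of every (possibly overlapping) motif occurrence."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", sequence)]


def matched_controls(site, cut_positions, seq_len, n=10, rng=None, max_tries=10_000):
    """Draw up to `n` control positions with the same offset from a restriction
    site as the experimental `site` has from its nearest restriction site.
    """
    if not cut_positions:
        raise ValueError("no restriction sites found")
    rng = rng or random.Random()
    # Signed offset of the experimental site from its nearest restriction site.
    nearest = min(cut_positions, key=lambda p: abs(p - site))
    offset = site - nearest
    controls = []
    for _ in range(max_tries):
        if len(controls) == n:
            break
        # Pick a random restriction site and apply the same offset.
        candidate = rng.choice(cut_positions) + offset
        if 0 <= candidate < seq_len:
            controls.append(candidate)
    return controls
```

For example, `matched_controls(60, restriction_positions(seq), len(seq), rng=random.Random(0))` would return ten positions that each sit the same distance from some TTAA site as position 60 does from its nearest one. A real implementation would work per chromosome and would probably need to handle strand and both enzymes.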

@nrhorner
Contributor

nrhorner commented Jan 8, 2025

Thanks again, @NanoporeEnthusiast. I will take a look at this.
