
BPNet reading note 1

  1. Enhancers are non-coding DNA sequences which, when bound by specific proteins, increase the level of gene transcription. Enhancers activate distinct gene expression patterns in cells of different types or under different conditions. They are key contributors to gene regulation, and causative variants that affect quantitative traits in humans and mice have been located in enhancer regions.
  2. cis-regulatory element: A noncoding DNA sequence in or near a gene required for proper spatiotemporal expression of that gene, often containing binding sites for transcription factors. Often used interchangeably with enhancer.
  3. somatic mutation: In multicellular organisms, mutations can be classed as either somatic or germ-line:
  • Somatic mutations – occur in a single body cell and cannot be inherited (only tissues derived from the mutated cell are affected)
  • Germline mutations – occur in gametes and can be passed on to offspring (every cell in the entire organism will be affected)
  • Zhihu: https://www.zhihu.com/question/38765318

(image: somatic mutation)

  1. Cooperative TF binding: TF complex
  • direct binding: TF binds directly to DNA
  • indirect binding: TF binds another TF, which in turn binds DNA
  1. Goal: learn predictive patterns from raw DNA sequences to maximize accuracy across the whole genome
  • Output: Binary, Yes (1) / No (0), is TF bound?
  • Input: a 4×L matrix; each column is the one-hot encoding of one nucleotide.
  • BPNet acts like a motif pattern detector: it scores sequences using the learned weights of its neurons (convolutional filters); see the sketch below.
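To make the input representation concrete, here is a minimal NumPy sketch (not BPNet's actual code) of one-hot encoding a sequence into a 4×L matrix and scanning it with a hand-made PWM-style filter, which is roughly what a learned first-layer convolutional filter does. The `one_hot` helper and the PWM values are illustrative assumptions.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a 4 x L matrix (rows = A, C, G, T)."""
    mat = np.zeros((4, len(seq)), dtype=np.float32)
    for j, base in enumerate(seq.upper()):
        if base in BASES:               # unknown bases (e.g. N) stay all-zero
            mat[BASES.index(base), j] = 1.0
    return mat

# A toy position weight matrix (PWM) acting as a single "motif detector".
# Real BPNet filters are learned; this PWM is made up for illustration.
pwm = np.array([[2.0, -1.0, -1.0, -1.0],   # A
                [-1.0, 2.0, -1.0, -1.0],   # C
                [-1.0, -1.0, 2.0, -1.0],   # G
                [-1.0, -1.0, -1.0, 2.0]])  # T  -> prefers the motif "ACGT"

x = one_hot("TTACGTTT")
L, w = x.shape[1], pwm.shape[1]
# Slide the filter across the sequence: one score per position (valid convolution).
scores = np.array([np.sum(pwm * x[:, i:i + w]) for i in range(L - w + 1)])
print(scores)   # peaks where the subsequence matches the motif
```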
  1. Motivation
  • Cons of traditional statistics-based peak calling methods:
    • difficult to tell whether overlapping peaks are driven by the same or different sequence elements
    • different peak-calling methods often give different answers
  • Instead, the DNN-based approach models the raw read counts directly from sequence, producing base-resolution binding profiles
  1. Training:
  • input: DNA sequence
  • label: ChIP-seq or ChIP-nexus raw count data
  • note: the biggest gains in deep learning come not from architecture engineering but from clever design of a loss function that respects the nature of the noise observed in the data.
  • joint loss function: to model total occupancy (total counts), the best loss is the MSE of the log of the total counts
  • plus a multinomial loss to capture the profile shape (how the reads are probabilistically distributed across positions in the profile)
  • Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model. Such approaches offer advantages like improved data efficiency, reduced overfitting through shared representations, and fast learning by leveraging auxiliary information.
  1. Multinomial distribution: conditioned on the total count, the per-base read counts across a region are modeled as a multinomial over positions (see the sketch below).
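Below is a minimal NumPy sketch of the joint loss idea described above: a multinomial negative log-likelihood for the profile shape plus an MSE on the log of the total counts. The function name, the log1p transform, and the `count_weight` parameter are illustrative assumptions, not BPNet's exact implementation.

```python
import numpy as np

def joint_loss(true_counts, pred_profile_logits, pred_log_total, count_weight=1.0):
    """Toy BPNet-style joint loss (illustrative, not the official code).

    true_counts         : observed read counts per position, shape (L,)
    pred_profile_logits : predicted logits over positions, shape (L,)
    pred_log_total      : predicted log(total counts), scalar
    """
    # Profile-shape term: multinomial negative log-likelihood.
    # Softmax turns the logits into a probability distribution over positions.
    logits = pred_profile_logits - pred_profile_logits.max()
    log_p = logits - np.log(np.exp(logits).sum())
    profile_nll = -np.sum(true_counts * log_p)   # log-factorial terms are constant
                                                 # w.r.t. the model and are dropped

    # Total-count term: MSE of log(1 + total counts).
    total_mse = (np.log1p(true_counts.sum()) - pred_log_total) ** 2

    return profile_nll + count_weight * total_mse

# Tiny example with made-up numbers.
counts = np.array([0., 2., 7., 1., 0.])
logits = np.array([-1.0, 0.5, 2.0, 0.0, -1.0])
print(joint_loss(counts, logits, pred_log_total=np.log1p(10.0)))
```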
  1. Generalization: different chromosomes of the same cell type

TF footprint shape is largely driven by TF-DNA contact with the local sequence. Hence, profile 'shapes' can be predicted extremely accurately from local sequence alone.

  • what are profile shapes? Peak/valley?
  • is TF footprint shape a metric of relative TF binding strength?

Chromatin state + distal interactions contribute to the total strength of measured local ChIP TF occupancy. Hence, total counts can only be predicted reasonably well by local sequence alone.

  • what is chromatin state?
  • what is distal?
  • what interactions do they have?
  • is total strength of occupancy determined by total counts? is it an absolute metric of TF binding?
  1. Step 1: Take any DNA sequence as input and predict the profile
  2. Step 2: take the predicted profile and backpropagate it through the network to get the contribution of each neuron in each layer, all the way back to the input, yielding a contribution score for every nucleotide in the input sequence that tells you how much it contributes to that output (see the sketch after this list).
  3. A profile-wide importance score
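To illustrate the backpropagation-to-input idea in Step 2, here is a small PyTorch sketch that computes gradient-times-input contribution scores for a toy convolutional profile model. The `TinyProfileNet` architecture, the scalar summary (sum of profile logits), and gradient × input itself are illustrative assumptions; the BPNet paper uses DeepLIFT/DeepSHAP-style contribution scores rather than raw gradients.

```python
import torch
import torch.nn as nn

# A tiny stand-in network (2 conv layers), NOT the real BPNet architecture.
class TinyProfileNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(4, 8, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(8, 1, kernel_size=5, padding=2)  # per-base profile logits

    def forward(self, x):               # x: (batch, 4, L) one-hot sequence
        return self.conv2(torch.relu(self.conv1(x))).squeeze(1)  # (batch, L)

torch.manual_seed(0)
model = TinyProfileNet()

# One-hot encoded input sequence of length L = 20 (random here, for illustration).
x = torch.zeros(1, 4, 20)
x[0, torch.randint(0, 4, (20,)), torch.arange(20)] = 1.0
x.requires_grad_(True)

profile = model(x)                      # predicted profile logits, shape (1, 20)
# Summarize the profile into a scalar (here: the sum of logits) and backpropagate it.
profile.sum().backward()

# Gradient-times-input gives a per-nucleotide contribution score; only the
# observed base at each position contributes, since the input is one-hot.
contribution = (x.grad * x).sum(dim=1)  # shape (1, 20): one score per position
print(contribution)
```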

TF cooperativity

TF-cooperativity-1.pdf TF-cooperativity-2.pdf summary.pdf caveat.pdf kipio.pdf