1.4 Normalization Issues

Background

Problems and issues

  • Sparse data and technical noise (e.g. "batch effects") can mask the signal of interest

Causes:

Spike-in for Normalization

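RNA content (total amount and species) varies between samples, so known exogenous RNAs (spike-ins) are added as a reference for normalization.

For mRNAs:

  • 92 ERCC spike-in transcripts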

  • 8 mRNAs

  • whole transcriptome HeLa RNAs

For sRNAs:

  • 52(?) sRNA sequences

Caveat:

Typically, only half of the spike-ins are detected.
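As a rough illustration of how spike-in counts can be turned into per-sample scaling factors, here is a minimal sketch in plain Python. It assumes the same amount of spike-in RNA was added to every sample, so differences in total spike-in reads reflect technical effects only. The function names `spike_in_size_factors` and `apply_size_factors` and the toy counts are hypothetical and do not come from any particular tool.

```python
def spike_in_size_factors(spike_counts):
    """Return one size factor per sample from spike-in read counts.

    spike_counts[s][i] is the read count of spike-in i in sample s.
    Undetected spike-ins (zeros) simply contribute nothing to the total.
    """
    totals = [sum(sample) for sample in spike_counts]
    if any(t == 0 for t in totals):
        raise ValueError("a sample has no spike-in reads and cannot be scaled")
    mean_total = sum(totals) / len(totals)
    # Scale so that the size factors average to 1 across samples.
    return [t / mean_total for t in totals]


def apply_size_factors(gene_counts, size_factors):
    """Divide each sample's endogenous gene counts by its size factor."""
    return [[c / sf for c in sample]
            for sample, sf in zip(gene_counts, size_factors)]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
spikes = [[10, 0, 30], [20, 0, 60]]   # second spike-in undetected in both
genes = [[4, 8], [8, 16]]
factors = spike_in_size_factors(spikes)
normalized = apply_size_factors(genes, factors)
```

Because only part of the spike-ins are usually detected, factors estimated this way rest on few molecules and can be noisy; the sketch does nothing to address that.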

Computational Normalization Tools

For single-cell RNA-seq (and exRNA-seq):

  1. scran:

    1. pools multiple cells (samples) to estimate cell-specific size factors in the presence of zero inflation and unbalanced differential expression of genes across groups of cells;

    2. preclusters the cells (e.g. by rank-based clustering) into smaller, more homogeneous sets, so that pooling is done within each set

  2. SCnorm

  3. Census
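The pooling idea described for scran can be sketched as follows. This is a simplified illustration in plain Python, not scran's actual implementation (scran is an R/Bioconductor package): counts are summed over overlapping pools of cells, each pool is compared with an average pseudo-cell by a median ratio, and the resulting pool-level factors are deconvolved into per-cell factors by least squares. The names `pool_size_factors` and `least_squares`, the ring ordering of cells, and the pool sizes are assumptions made for this example.

```python
from statistics import median


def pool_size_factors(counts, pool_sizes=(2, 3)):
    """counts[c][g] is the count of gene g in cell c; returns one factor per cell."""
    n_cells, n_genes = len(counts), len(counts[0])
    if max(pool_sizes) >= n_cells:
        raise ValueError("pool sizes must be smaller than the number of cells")
    # Reference pseudo-cell: average expression profile across all cells.
    ref = [sum(cell[g] for cell in counts) / n_cells for g in range(n_genes)]

    rows, rhs = [], []
    for k in pool_sizes:
        for start in range(n_cells):
            # Sliding window of k cells, with the cells arranged on a ring.
            members = [(start + i) % n_cells for i in range(k)]
            pooled = [sum(counts[c][g] for c in members) for g in range(n_genes)]
            ratios = [pooled[g] / ref[g] for g in range(n_genes) if ref[g] > 0]
            # The median ratio estimates the SUM of the members' size factors.
            rows.append([1.0 if c in members else 0.0 for c in range(n_cells)])
            rhs.append(median(ratios))
    return least_squares(rows, rhs)


def least_squares(a, b):
    """Minimize ||a x - b|| via the normal equations and Gauss-Jordan elimination."""
    n = len(a[0])
    # Augmented matrix [A^T A | A^T b].
    m = [[sum(r[i] * r[j] for r in a) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(a, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pools do not determine every cell; change pool_sizes")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Toy data: four cells sharing one profile, scaled by 1, 2, 3 and 2.
base = [10, 20, 30, 40]
cells = [[s * x for x in base] for s in (1, 2, 3, 2)]
factors = pool_size_factors(cells)
```

Summing over a pool reduces the number of zeros that enter each ratio, which is the motivation for pooling with sparse data. In practice the preclustering step would be applied first, and a function like this would be run within each cluster.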

If spike-ins are used:

  1. SAMstrt

  2. GRM

References

More Reading & Practice

See more about normalization, imputation and confounders (e.g. batch effects) in

Video

a) Normalization 1

@Youtube

@Bilibili

b) Normalization 2

@Youtube

@Bilibili