Training @ Lu Lab
Lu Lab Docs
  • Home
    • Training @ Lu Lab
  • Drylab Training
    • Genomics
      • RNA Types in Genome
  • Wetlab Training
    • Wetlab Safety Guide
    • Wetlab FAQ
  • Archive
    • Archive 2021
      • cfDNA Methylation
      • Genomic Annotation
    • Archive 2019 - Wetlab Training
      • Class I. Basics
        • 1. Wet Lab Safety
        • 2. Wet Lab Regulation
        • 3. Wet Lab Protocols
        • 4. How to design sample cohort
        • 5. How to collect and manage samples
        • 6. How to purify RNA from blood
        • 7. How to check the quantity and quality of RNA
        • 8. RNA storage
        • 9. How to remove DNA contanimation
        • 10. What is Spike-in
      • Class II. NGS - I
        • 1. How to do RNA-seq
        • 2. How to check the quantity and quality of RNA-seq library
        • 3. What is SMART-seq2 and Multiplex
    • Archive 2019 - Drylab Training
      • Getting Startted
      • Part I. Programming Skills
        • Introduction of PART I
        • 1.Setup
        • 2.Linux
        • 3.Bash and Github
        • 4.R
        • 5.Python
        • 6.Perl
        • Conclusion of PART I
      • Part II. Machine Learning Skills
        • 1.Machine Learning
        • 2.Feature Selection
        • 3.Machine Learning Practice
        • 4.Deep Learning
      • Part III. Case studies
        • Case Study 1. exRNA-seq
          • 1.1 Mapping, Annotation and QC
          • 1.2 Expression Matrix
          • 1.3.Differential Expression
          • 1.4 Normalization Issues
        • Case Study 2. exSEEK
          • 2.1 Plot Utilities
          • 2.2 Matrix Processing
          • 2.3 Feature Selection
        • Case Study 3. DeepSHAPE
          • 3.1 Background
          • 3.2 Resources
          • 3.3 Literature
      • Part IV. Appendix
        • Appendix I. Keep Learning
        • Appendix II. Public Data
        • Appendix III. Mapping Protocol of RNA-seq
        • Appendix IV. Useful tools for bioinformatics
      • Part V. Software
        • I. Docker Manual
        • II. Local Gitbook Builder
        • III. Teaching Materials
  • Archive 2018
Powered by GitBook
On this page
  • Pipeline
  • Data Structure
  • Inputs
  • Outputs
  • Running Scripts
  • Software/Tools
  • Example of single case
  • Tips/Utilities
  • Homework and more
  • Video
  • a) Expression matrix
Edit on GitHub
  1. Archive
  2. Archive 2019 - Drylab Training
  3. Part III. Case studies
  4. Case Study 1. exRNA-seq

1.2 Expression Matrix

Last updated 3 years ago

Pipeline

Data Structure

"genomes/hg38/"         # reference genomes (i.e. genome sequence and annotation)
"shared_scripts/"       # shared scripts by Lu Lab
~/github                # I sync my own scripts to github

~/proj_exRNA/
|-- RNA_index           # rerference transcriptomes (fasta and index) 
|-- sample*_name        # ...
|-- sample2_name        # different samples     
`-- sample1_name        
    |-- fastq           # raw data: fastq files
    |-- fastqc          # QC of fastq
    |-- trim            # trimmed fastq (e.g. 3' adaptor cutted)
    |-- mapped          # mapped data: SAM/BAM files
    `-- exp             # expression of each gene/ncRNA

Inputs

File format

Information contained in file

File description

Notes

bam

alignments

Produced by mapping reads to the transcriptome.

Reads are trimmed using a proprietary version of cutAdapt. We map to transcriptome for a better sensitivity (see details in protocol and example).

Outputs

File format

Information contained in file

File description

Notes

bigWig

signal

Normalized RNA-seq signal

Signals are generated for transcriptome both the plus and minus strands and for unique reads and unique+multimapping reads.

tsv

gene (ncRNA) quantifications

Non-normalized counts.

Running Scripts

Software/Tools

  • RSEM

  • homer

  • HTseq

  • FeatureCounts

Example of single case

# genome seuqneces and annotaions
ln -s /BioII/lulab_b/shared/genomes ~/genomes
export hg38=~/genomes/human_hg38/sequence/GRCh38.p12.genome.fa
export gtf=~/genomes/human_hg38/gtf

# working space
cd ~/proj_exRNA
# first convert relative positions to genomic coordinates
# then convert bam to bigWig, so that we can view it in genome browser

#####################################################################
#0.1 convert the relative positions on RNAs to genomic coordinates 
#####################################################################

rsem-tbam2gbam miRNA.indexDir/ NC_1.miRNA.sam NC_1.miRNA.rsem.bam

#####################################################################
#0.2 two ways to convert bam/sam to bedGraph/bigwig for visulaization 
#####################################################################
#0.2a Using homer
#Make tag directories for each sample (e.g. NC_1)
#works with sam or bam (samtools must be installed for bam) 
makeTagDirectory NC_1.miRNA.tagsDir/ NC_1.miRNA.sorted.bam

## Add "-strand separate" for strand-specific sequencing 
makeUCSCfile NC_1.miRNA.tagsDir/ -fragLength given -o auto
# (repeat for other tag directories)
makeTagDirectory NC_1.miRNA.tagsDir/ NC_1.miRNA.sorted.bam analyzeRepeats.pl analyzeRepeats.pl /BioII/lulab_b/shared/genomes/human_hg38/gtf/miRNA.gtf hg38 -count exons -d NC_1.miRNA.tagsDir/ -gid -noadj > NC_1.miRNA.homer.rpm
htseq-count -f bam -m intersection-strict miRNA_primary_transcript NC_1.miRNA_hg38_mapped.sam /BioII/lulab_b/shared/genomes/human_hg38/gtf/miRNA.gff > NC_1.miRNA.htseq.counts
featureCounts -s 1 -a /BioII/lulab_b/shared/genomes/human_hg38/gtf/miRNA.gff -o NC_1.miRNA.featureCounts.counts NC_1.miRNA.sorted.bam

Tips/Utilities

Merge multiple bams to one

# mutiple bams to one bam file, in Snakemake form

rule mergeBam:
    input:
        bam = "02.mapping/{library}/sequentialMap/{library}.hg38other.bam"
    output:
        bam = "02.mapping/{library}/sequentialMap/{library}.sequentialMap.bam"
    params:
        jobname = "{library}.mergeBam",
        dir = "02.mapping/{library}/sequentialMap/"
    shell:
        """
        samtools merge {output.bam} {input.bam} `find {params.dir} -name "*.rsem.clean.bam"`

Homework and more

  1. Visualize your mapped reads with IGV (locally) and/or UCSC Genome Browser (on line).

  2. Learn how to construct the expression matrix using HTSeq, featureCounts and homer; then compare the difference among these three methods.

More Reading and Practice

  • Bioinformatics Data Skills

    • 11) Working with Alignment Data

Video

a) Expression matrix

: 2. Construction of expression matrix

Example of batch job
@Youtube
@Bilibili
Additional Tutorial