Appendix II. Public Data


Major Consortium Data

1. ENCODE (cell lines)

ENCODE aims to build a comprehensive list of functional elements in the human genome.

Tier 1:

  • GM12878 is a lymphoblastoid cell line produced by EBV transformation from the blood of a female donor with northern and western European ancestry. It was one of the original HapMap cell lines and was selected by the International HapMap Project for deep sequencing on the Solexa/Illumina platform. The cell line has a relatively normal karyotype and grows well, and choosing it offers potential synergy with the International HapMap Project and genetic variation studies. It represents the mesoderm cell lineage. Cells can be obtained from the Coriell Institute for Medical Research (coriell.org, Catalog ID GM12878).

  • K562 is an immortalized cell line produced from a female patient with chronic myelogenous leukemia (CML). It is a widely used model for cell biology, biochemistry, and erythropoiesis. It grows well, is transfectable, and represents the mesoderm lineage. Cells can be obtained from the American Type Culture Collection (atcc.org, ATCC Number CCL-243).

  • H1 human embryonic stem cells can be obtained from Cellular Dynamics International (cellulardynamics.com).

Tier 2:

  • HeLa-S3

  • HepG2

  • HUVEC

Tier 2.5:

  • SK-N-SH

  • IMR90 (ATCC CCL-186)

  • A549 (ATCC CCL-185)

  • MCF7 (ATCC HTB-22)

  • HMEC or LHCM

  • CD14+

  • CD20+

  • Primary heart or liver cells

  • Differentiated H1 cells
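As a rough illustration of how the cell-line tiers above might be used when looking up ENCODE data, the sketch below stores the Tier 1 and Tier 2 names in a dictionary and builds a search URL for one cell line. The helper names (`ENCODE_TIERS`, `tier_of`, `encode_search_url`) are hypothetical, and the query parameters (`type`, `biosample_ontology.term_name`, `format`) are assumptions about the ENCODE portal's search interface that should be checked against its current REST API documentation. The code only builds the URL and does not contact the server.

```python
from typing import Optional
from urllib.parse import urlencode

# Tier 1 and Tier 2 cell lines as listed on this page (Tier 2.5 omitted for brevity).
ENCODE_TIERS = {
    "Tier 1": ["GM12878", "K562", "H1"],
    "Tier 2": ["HeLa-S3", "HepG2", "HUVEC"],
}


def tier_of(cell_line: str) -> Optional[str]:
    """Return the tier name for a cell line, or None if it is not listed."""
    for tier, cell_lines in ENCODE_TIERS.items():
        if cell_line in cell_lines:
            return tier
    return None


def encode_search_url(cell_line: str) -> str:
    """Build a search URL for experiments on one cell line.

    The parameter names are assumptions, not a confirmed API contract.
    """
    params = {
        "type": "Experiment",
        "biosample_ontology.term_name": cell_line,
        "format": "json",
    }
    return "https://www.encodeproject.org/search/?" + urlencode(params)


if __name__ == "__main__":
    print(tier_of("K562"))
    print(encode_search_url("K562"))
```

Keeping URL construction separate from any download step makes it easy to inspect the query in a browser before scripting against it.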

2. CCLE (cancer cell lines)

3. TCGA (tissue)

The Pan-Cancer Atlas

From the analysis of over 11,000 tumors from 33 of the most prevalent forms of cancer, the Pan-Cancer Atlas provides a uniquely comprehensive, in-depth, and interconnected understanding of how, where, and why tumors arise in humans. As a singular and unified point of reference, the Pan-Cancer Atlas is an essential resource for the development of new treatments in the pursuit of precision medicine.

4. 1000 Genomes

The 1000 Genomes Project aims to find most genetic variants with frequencies of at least 1% in the populations studied.
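To make the 1% frequency threshold concrete, here is a minimal sketch that computes the alternate-allele frequency of a variant from diploid genotype calls and keeps variants at or above 1%. The variant IDs and genotypes are invented toy data, the function names are hypothetical, and real analyses would read genotypes from VCF files with a dedicated tool rather than from Python lists.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # diploid call: 0 = reference allele, 1 = alternate allele


def alt_allele_frequency(genotypes: List[Genotype]) -> float:
    """Fraction of alternate alleles among all called alleles."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    if not alleles:
        raise ValueError("no genotypes supplied")
    return sum(alleles) / len(alleles)


def common_variants(calls: Dict[str, List[Genotype]], min_freq: float = 0.01) -> List[str]:
    """Return IDs of variants whose alternate-allele frequency is >= min_freq."""
    return [
        variant_id
        for variant_id, genotypes in calls.items()
        if alt_allele_frequency(genotypes) >= min_freq
    ]


# Toy data (hypothetical): 200 individuals per variant.
toy_calls = {
    # 5 heterozygous carriers: 5 / 400 alleles = 1.25%
    "var_a": [(0, 1)] * 5 + [(0, 0)] * 195,
    # 1 heterozygous carrier: 1 / 400 alleles = 0.25%
    "var_b": [(0, 1)] * 1 + [(0, 0)] * 199,
}

if __name__ == "__main__":
    print(common_variants(toy_calls))
```

Only `var_a` passes the default threshold in this toy set, because 5 of 400 alleles (1.25%) is above 1% while 1 of 400 (0.25%) is not.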

Online Databases

Major Data Central

1. UCSC (https://genome.ucsc.edu/): genome browser for vertebrates.

2. Ensembl (http://www.ensembl.org/index.html): genome annotation.

3. NCBI (https://www.ncbi.nlm.nih.gov/): contributes to the NIH mission of ‘uncovering new knowledge’.

Expression Data

4. GTEx (https://www.gtexportal.org/home/): gene expression in different tissues.

5. Expression Atlas (https://www.ebi.ac.uk/gxa/home): exploring gene expression results across species under different biological conditions.

6. cBioPortal (http://www.cbioportal.org/index.do): visualization, analysis and download of large-scale cancer genomics data sets.

7. TCGA GEPIA (http://gepia.cancer-pku.cn/index.html): gene expression in different TCGA tumor types.

8. TCGA ncRNA (http://ibl.mdanderson.org/tanric/_design/basic/index.html)

Video

a) Imputation and confounders (@Youtube, @Bilibili)

Figure 1. Cellular localization of different types of RNAs.

Links for the consortium data above

  • ENCODE: https://www.encodeproject.org/ (see also its "Cell types of ENCODE" page)

  • ENCODE paper: https://www.nature.com/articles/nature11233

  • Coriell Institute for Medical Research: coriell.org

  • American Type Culture Collection (ATCC): atcc.org

  • Cellular Dynamics International: cellulardynamics.com

  • TCGA (GDC Data Portal): https://portal.gdc.cancer.gov/

  • The Pan-Cancer Atlas: http://www.cell.com/pb-assets/consortium/pancanceratlas/pancani3/index.html

  • 1000 Genomes: http://www.internationalgenome.org/