Appendix II. Public Data
Major Consortium Data
1. ENCODE (cell lines)
ENCODE aims to build a comprehensive list of functional elements in the human genome.
Tier 1:
GM12878 is a lymphoblastoid cell line produced from the blood of a female donor with northern and western European ancestry by EBV transformation. It was one of the original HapMap cell lines and has been selected by the International HapMap Project for deep sequencing using the Solexa/Illumina platform. This cell line has a relatively normal karyotype and grows well. Choice of this cell line offers potential synergy with the International HapMap Project and genetic variation studies. It represents the mesoderm cell lineage. Cells will be obtained from the Coriell Institute for Medical Research [coriell.org] (Catalog ID GM12878).
K562 is an immortalized cell line produced from a female patient with chronic myelogenous leukemia (CML). It is a widely used model for cell biology, biochemistry, and erythropoiesis. It grows well, is transfectable, and represents the mesoderm lineage. Cells will be obtained from the American Type Culture Collection (ATCC) [atcc.org] (ATCC Number CCL-243).
H1 human embryonic stem cells will be obtained from Cellular Dynamics International [cellulardynamics.com].
Tier 2:
HeLa-S3
HepG2
HUVEC
Tier 2.5:
SK-N-SH
IMR90 (ATCC CCL-186)
A549 (ATCC CCL-185)
MCF7 (ATCC HTB-22)
HMEC or LHCM
CD14+
CD20+
Primary heart or liver cells
Differentiated H1 cells
Figure 1. Cellular Localization of different types of RNAs, Ref.: https://www.nature.com/articles/nature11233
2. CCLE (cancer cell lines)
3. TCGA (tissue)
The Pan-Cancer Atlas
From the analysis of over 11,000 tumors from 33 of the most prevalent forms of cancer, the Pan-Cancer Atlas provides a uniquely comprehensive, in-depth, and interconnected understanding of how, where, and why tumors arise in humans. As a singular and unified point of reference, the Pan-Cancer Atlas is an essential resource for the development of new treatments in the pursuit of precision medicine.
https://portal.gdc.cancer.gov/
http://www.cell.com/pb-assets/consortium/pancanceratlas/pancani3/index.html
4. 1000 Genomes
The 1000 Genomes Project aimed to find most genetic variants with frequencies of at least 1% in the populations studied.
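To make the 1% frequency threshold concrete, here is a minimal sketch of how an alternate-allele frequency could be computed from diploid genotype calls and compared against that cutoff. The function names and the toy genotypes are hypothetical and used only for illustration; this is not the project's own pipeline.

```python
def allele_frequency(genotypes):
    """Return the fraction of alternate alleles among all called alleles.

    genotypes: list of (a, b) tuples for diploid samples,
               where 0 = reference allele and 1 = alternate allele.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    return sum(alleles) / len(alleles)


def is_common(genotypes, threshold=0.01):
    """True if the alternate-allele frequency is at least `threshold` (1% by default)."""
    return allele_frequency(genotypes) >= threshold


# Hypothetical toy data: 100 diploid samples = 200 alleles per variant.
rare_variant = [(0, 0)] * 99 + [(0, 1)]                    # 1 alternate allele
common_variant = [(0, 0)] * 95 + [(0, 1)] * 4 + [(1, 1)]   # 6 alternate alleles

print(allele_frequency(rare_variant), is_common(rare_variant))
print(allele_frequency(common_variant), is_common(common_variant))
```

In this toy example the first variant falls below the 1% cutoff and the second is above it.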
Online Databases
Major Data Centers
1. UCSC
genome browser for vertebrates. https://genome.ucsc.edu/
2. Ensembl
genome annotation. http://www.ensembl.org/index.html
3. NCBI
hosts databases such as GenBank and PubMed, contributing to the NIH mission of ‘uncovering new knowledge’. https://www.ncbi.nlm.nih.gov/
Expression Data
4. GTEx
gene expression in different tissues. https://www.gtexportal.org/home/
5. Expression Atlas
exploring gene expression results across species under different biological conditions. https://www.ebi.ac.uk/gxa/home
6. cBioPortal
visualization, analysis and download of large-scale cancer genomics data sets. http://www.cbioportal.org/index.do
7. TCGA GEPIA
gene expression in different TCGA tumor types. http://gepia.cancer-pku.cn/index.html
8. TCGA ncRNA
http://ibl.mdanderson.org/tanric/_design/basic/index.html
Video
a) Imputation and confounders