WGS analysis: Lachesis
This site makes available the results of a phenotype association study using low-pass whole-genome sequencing of 962 individuals from the CHARGE consortium. The data and a user-friendly interface are provided as an R package, which makes use of R's plotting capabilities. A more detailed description of the study design and analyses is available in:
Whole-genome sequence–based analysis of high-density lipoprotein cholesterol.
Alanna C Morrison, Arend Voorman, Andrew D Johnson, Xiaoming Liu, Jin Yu, Alexander Li, Donna Muzny, Fuli Yu, Kenneth Rice, Chengsong Zhu, Joshua Bis, Gerardo Heiss, Christopher J O'Donnell, Bruce M Psaty, L Adrienne Cupples, Richard Gibbs & Eric Boerwinkle for the Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGE) Consortium.
lachesis is an R package which coordinates the display of summary statistics from a suite of genotype-phenotype analyses, which we illustrate with an analysis of HDL cholesterol. The package takes as input either a gene name or a range of positions (in hg19 coordinates) and returns summary statistics for the analyses in that region, which can be plotted or viewed as a table. The package is available here: Lachesis Download page
Description of the R Package:
Documentation is included with the package and can be read from within R. Help pages for the two main functions are opened with the commands:
R> ?lachesis
or
R> ?lachesis.gene
Some illustrative examples of the package's capabilities can be run with the command:
R> example(lachesis)
Installation instructions:
The package can be installed in R on Windows, Mac OS X, or Linux via the command:
R> install.packages("/path_to_directory/lachesis_1.1.tar.gz", type="source")
where "/path_to_directory/" is the directory to which you downloaded the file.
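As a minimal sketch of the same step, the snippet below checks that the downloaded file is present before installing. The variable name pkg_file is ours, and "/path_to_directory" is still a placeholder that you must replace. The repos = NULL argument is an addition here: it tells install.packages() explicitly that the package comes from a local file.

```r
# Placeholder location of the downloaded tarball; replace with your own path.
pkg_file <- file.path("/path_to_directory", "lachesis_1.1.tar.gz")

if (file.exists(pkg_file)) {
  # repos = NULL: install from a local file rather than a repository.
  install.packages(pkg_file, repos = NULL, type = "source")
  library(lachesis)
} else {
  message("Package file not found: ", pkg_file)
}
```

If the file is not found, nothing is installed and a message reports the path that was checked.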
If you do not have R, you can find the latest version and installation instructions here: http://www.r-project.org/
Description of the available data:
For any particular region, the following tables of summary statistics are available, and returned when queried using the R package. Each statistic was adjusted for age, sex, BMI and cohort.
mafs
The position (chr.pos) and frequency (frqs) of each variant in the specified window.
gwas
For each common variant (maf > 1%), the position (pos) and log10 p-value (logp) from a Wald test for additive effects of the variant.
exono
For each gene in the region, the transcript name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), and number of coding variants (Ncoding).
exonp
For each gene in the region, the transcript name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), and numbers of coding and non-coding variants (Ncoding, Nnoncoding). This differs from the exono table in that the tests also include non-coding variants in splice sites, in intron 1, and upstream/downstream of the gene.
slide
For each 4 kb window (with 2 kb overlap), the position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), the number of variants (Nsnp), and the number of rare variants (defined as those with maf < 1%) (Nt1).
oreg
For each region annotated by ORegAnno, the name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), the number of variants (Nsnp), and the number of rare variants (defined as those with maf < 1%) (Nt1).
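To make the window layout of the slide table concrete, the following base-R sketch builds 4 kb windows with 2 kb overlap over a small region and counts all variants (Nsnp) and rare variants with maf < 1% (Nt1) in each. It does not call the lachesis package. The variant positions, frequencies and region boundaries are invented for illustration, and the inclusive window boundaries are an assumption rather than the study's documented convention.

```r
# Window size and step follow the description: 4 kb windows, 2 kb overlap.
window_size <- 4000
step <- 2000

# Made-up variants for illustration only (not study data).
variants <- data.frame(
  pos = c(1000500, 1001200, 1002900, 1003100, 1005800),
  maf = c(0.004, 0.120, 0.008, 0.030, 0.002)
)

# Hypothetical region boundaries.
region_start <- 1000000
region_end <- 1008000

# Window starts advance by `step`, so consecutive windows share 2 kb.
start.pos <- seq(region_start, region_end - window_size, by = step)
end.pos <- start.pos + window_size - 1  # assumed inclusive end

# Logical vector: which variants fall inside the window [s, e].
in_window <- function(s, e) variants$pos >= s & variants$pos <= e

slide_sketch <- data.frame(
  start.pos = start.pos,
  end.pos = end.pos,
  # All variants in the window.
  Nsnp = mapply(function(s, e) sum(in_window(s, e)), start.pos, end.pos),
  # Rare variants only (maf < 1%).
  Nt1 = mapply(function(s, e) sum(in_window(s, e) & variants$maf < 0.01),
               start.pos, end.pos)
)
print(slide_sketch)
```

Because of the overlap, a variant that lies away from the edges of the region is counted in two consecutive windows.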