WGS analysis: Lachesis

This site makes available the results of a phenotype association study using low-pass whole genome sequencing of 962 individuals from the CHARGE consortium. The data and a user-friendly interface are available in the form of an R package, which makes use of R's plotting capabilities. A more detailed description of the study design and analyses is available in:

Whole-genome sequence–based analysis of high-density lipoprotein cholesterol.
Alanna C Morrison, Arend Voorman, Andrew D Johnson, Xiaoming Liu, Jin Yu, Alexander Li, Donna Muzny, Fuli Yu, Kenneth Rice, Chengsong Zhu, Joshua Bis, Gerardo Heiss, Christopher J O'Donnell, Bruce M Psaty, L Adrienne Cupples, Richard Gibbs & Eric Boerwinkle for the Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGE) Consortium.

Nature Genetics (2013) doi:10.1038/ng.2671

lachesis is an R package that coordinates the display of summary statistics from a suite of genotype-phenotype analyses, which we illustrate with an analysis of HDL cholesterol. The package takes as input either a gene name or a range of positions (in hg19 coordinates) and returns summary statistics for the analyses in that region, which can be plotted or viewed as a table. The package is available here: Lachesis Download page

Description of the R Package:

Documentation is included with the package. From within R, the help pages for the two main functions can be opened with the commands:

R> ?lachesis

or

R> ?lachesis.gene

Some illustrative examples of the package's capabilities can be seen with the command:

R> example(lachesis)

Installation instructions:

The package can be installed in R on Windows, Mac OS X, or Linux via the command:

R> install.packages("/path_to_directory/lachesis_1.1.tar.gz", type="source")

where "/path_to_directory/" specifies where you have downloaded the file.

If you do not have R, you can find the latest version and installation instructions here: http://www.r-project.org/

Description of the available data:

For any particular region, the following tables of summary statistics are available and are returned when the region is queried using the R package. Each statistic was adjusted for age, sex, BMI and cohort.

mafs

The position (chr.pos) and frequency (frqs) of each variant in the specified window.
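As a minimal sketch of how the 1% frequency threshold used elsewhere on this page could be applied to a table shaped like mafs: the data frame below is invented for illustration and is not output from the package. It is not stated here whether frqs is always the minor allele frequency, so the sketch assumes it may refer to either allele and folds it with pmin().

```r
# Hypothetical table with the same column names as `mafs`; the values are invented.
mafs <- data.frame(
  chr.pos = c(1001, 1050, 1200, 1340),
  frqs    = c(0.004, 0.25, 0.995, 0.03)
)

# Assumption: frqs may describe either allele, so fold it to a minor allele frequency.
mafs$maf <- pmin(mafs$frqs, 1 - mafs$frqs)

# Label each variant using the 1% threshold.
mafs$class <- ifelse(mafs$maf < 0.01, "rare", "common")

print(mafs)
```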

gwas

For each common variant (maf > 1%), the position (pos) and log10 p-value (logp) from a Wald test for additive effects of the variant.
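The logp column can be converted back to ordinary p-values. The sketch below assumes logp is stored as -log10(p), so that larger values indicate stronger evidence; if the package stores log10(p) directly, the sign in the exponent would need to be flipped. The data frame is invented for illustration.

```r
# Hypothetical table with the same column names as `gwas`; the values are invented.
gwas <- data.frame(
  pos  = c(1050, 1340, 2210),
  logp = c(0.7, 3.2, 8.1)
)

# Assumption: logp = -log10(p), so p = 10^(-logp).
gwas$p <- 10^(-gwas$logp)

# Variant with the strongest association under that assumption.
top <- gwas[which.max(gwas$logp), ]

print(gwas)
print(top)
```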

exono

For each gene in the region, the transcript name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), and number of coding variants (Ncoding).

exonp

For each gene in the region, the transcript name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), and numbers of coding and non-coding variants (Ncoding, Nnoncoding). This differs from the exono table in that the tests also include non-coding variants in splice sites, in intron 1, and upstream or downstream of the gene.

slide

For each 4 kb window (with 2 kb overlap), the position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), the number of variants (Nsnp), and the number of rare variants, defined as those with maf < 1% (Nt1).
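To make the window layout concrete, the sketch below generates 4 kb windows that each overlap the next by 2 kb, which means consecutive windows start 2 kb apart. The function make_windows is hypothetical and not part of the package, and the inclusive coordinate convention is an assumption; the actual window boundaries used in the study may be aligned differently.

```r
# Hypothetical helper (not part of lachesis): tile a region with overlapping windows.
# Assumes inclusive coordinates, so a window starting at s covers s .. s + width - 1.
make_windows <- function(from, to, width = 4000, step = 2000) {
  stopifnot(to - from + 1 >= width)  # region must hold at least one full window
  start.pos <- seq(from, to - width + 1, by = step)
  data.frame(start.pos = start.pos, end.pos = start.pos + width - 1)
}

# Example: a 10 kb region starting at position 1.
windows <- make_windows(1, 10000)
print(windows)
```

With a step of half the window width, every position away from the region's edges falls into exactly two windows.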

oreg

For each region annotated by ORegAnno, the name (name), position range (start.pos, end.pos), log10 p-values from SKAT and T1 tests (logp.skat, logp.t1), the number of variants (Nsnp), and the number of rare variants, defined as those with maf < 1% (Nt1).