Gene-based tests for association are increasingly seen as a useful complement to genome-wide association studies (GWAS) [1]. A gene-based approach considers the association between a trait and all markers (usually SNPs) within a gene, rather than each marker individually. Depending on the underlying genetic architecture, gene-based approaches can be more powerful than traditional single-SNP GWAS. For example, if a gene contains more than one causative variant, several SNPs within that gene may show marginal levels of significance that are often indistinguishable from random noise in the initial GWAS results. By using prior biological information and combining the effects of all SNPs in a gene into a single test statistic and p-value, a gene-based test may be able to detect these effects. Gene-based tests are also ideally suited to network (or pathway) approaches for interpreting GWAS findings [2,3]. These approaches are necessarily gene-centric and require a measure of the relative importance of each gene to the phenotype of interest. Finally, the gene-based approach alleviates the multiple-testing problem faced by GWAS, because it requires only ~20,000 tests (one per gene) rather than tests of more than half a million SNPs in a typical GWAS.
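To make the multiple-testing argument concrete, the short Python sketch below computes Bonferroni-corrected per-test thresholds for the two numbers of tests quoted above. It assumes a family-wise error rate of 0.05 and uses 500,000 as a round stand-in for "more than half a million" SNPs; Bonferroni is used only for illustration and ignores correlation between tests.

```python
# Bonferroni-corrected per-test thresholds for a family-wise error rate of 0.05.
ALPHA = 0.05
N_SNPS = 500_000   # assumed round value for "more than half a million SNPs"
N_GENES = 20_000   # "~20,000 genes per genome"


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that controls the FWER at `alpha`."""
    return alpha / n_tests


snp_threshold = bonferroni_threshold(ALPHA, N_SNPS)
gene_threshold = bonferroni_threshold(ALPHA, N_GENES)

if __name__ == "__main__":
    print(f"SNP-level:  p < {snp_threshold:.2e}")   # SNP-level:  p < 1.00e-07
    print(f"Gene-level: p < {gene_threshold:.2e}")  # Gene-level: p < 2.50e-06
    # The gene-level threshold is 25 times less stringent.
    print(round(gene_threshold / snp_threshold))    # 25
```

Under these assumptions, moving from SNP-level to gene-level testing relaxes the per-test threshold by a factor of 25.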
Computing a gene-based test for basic GWAS designs using permutations is conceptually simple and is currently implemented as the ‘set-based test’ in the PLINK software package [4]; however, heavy computational requirements have prevented this method from being applied on a genome-wide scale. Other gene-based tests, such as those based on genetic distances [5] or entropy [6], are often also restricted to situations where individual genotype information is a...
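A generic phenotype-permutation gene test can be sketched as follows to show why the approach is conceptually simple yet expensive. This is not PLINK's set-based implementation: the gene-level statistic (the sum over SNPs of n·r², where r is the correlation between allele counts and a quantitative trait) and the names `gene_statistic` and `permutation_gene_test` are hypothetical choices for illustration, and every SNP is assumed to be polymorphic. Each permutation recomputes every SNP statistic from individual-level genotypes, so resolving a p-value near 10⁻⁶ needs on the order of 10⁶ passes per gene.

```python
import numpy as np


def gene_statistic(genotypes: np.ndarray, phenotype: np.ndarray) -> float:
    """Hypothetical gene-level statistic: sum over SNPs of n * r^2.

    r is the Pearson correlation between a SNP's allele counts and a
    quantitative trait; n * r^2 is approximately chi-square(1) under the null.
    Assumes every SNP column is polymorphic (non-zero variance).
    """
    n = len(phenotype)
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = (g.T @ y) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return float(n * np.sum(r ** 2))


def permutation_gene_test(genotypes, phenotype, n_perm=1000, seed=0):
    """Empirical p-value from phenotype permutations (add-one estimator)."""
    rng = np.random.default_rng(seed)
    observed = gene_statistic(genotypes, phenotype)
    exceed = 0
    for _ in range(n_perm):
        # Permuting the trait breaks SNP-trait association but leaves the LD
        # between SNPs untouched, so the null distribution accounts for it.
        if gene_statistic(genotypes, rng.permutation(phenotype)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.binomial(2, 0.3, size=(500, 10)).astype(float)  # 500 people, 10 SNPs
    y = 1.0 * G[:, 0] + rng.standard_normal(500)            # SNP 0 is causal
    print(permutation_gene_test(G, y))
```

The add-one estimator keeps the empirical p-value strictly positive; with 1,000 permutations the smallest attainable value is 1/1001.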

...lated using the Cholesky decomposition method implemented in the R mvtnorm package [19]. The number of simulations per gene is determined adaptively. In the first stage, 10³ simulations are performed. If the resulting empirical p-value is less than 0.1, an additional 10⁴ simulations are performed. If the empirical p-value from the 10⁴ simulations is 0, the program performs 10⁶ simulations. For computational reasons, if the empirical p-value is still 0, no further simulations are performed. An empirical p-value of 0 from 10⁶ simulations can be interpreted as p < 10⁻⁶, which is more stringent than a Bonferroni-corrected threshold of p < 2.85×10⁻⁶ (this threshold is likely to be conservative given the overlap between genes). The user may choose to perform the gene-based test on either the full set of SNPs within a gene or a specified percentage of the most significant SNPs.
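The adaptive scheme can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the program's actual code: the gene-level statistic is assumed to be the sum of per-SNP 1-df chi-square values, optionally restricted to the top fraction `top_frac` of SNPs; the null is assumed to be a multivariate normal whose covariance is the SNP LD correlation matrix `ld`, which must be positive definite for the Cholesky factor to exist; and each stage's p-value is computed from that stage's simulations alone, since the text does not say whether earlier draws are pooled. All function and parameter names are hypothetical.

```python
import numpy as np

# Simulation counts for the three adaptive stages described above.
STAGES = (10**3, 10**4, 10**6)


def gene_stat(chisq, top_frac=1.0):
    """Sum of the largest ceil(top_frac * m) chi-square values along the last axis.

    Hypothetical statistic: sum of per-SNP 1-df chi-square values,
    optionally restricted to the top fraction of SNPs.
    """
    m = chisq.shape[-1]
    k = max(1, int(np.ceil(top_frac * m)))
    return np.sort(chisq, axis=-1)[..., m - k:].sum(axis=-1)


def empirical_p(observed, chol, n_sim, rng, top_frac, batch=100_000):
    """Fraction of null simulations whose statistic is >= the observed one."""
    m = chol.shape[0]
    exceed, done = 0, 0
    while done < n_sim:  # simulate in batches to bound memory at 10^6 draws
        b = min(batch, n_sim - done)
        # e ~ N(0, I); rows of e @ L.T are draws from MVN(0, L @ L.T) = MVN(0, ld).
        z = rng.standard_normal((b, m)) @ chol.T
        exceed += int(np.count_nonzero(gene_stat(z ** 2, top_frac) >= observed))
        done += b
    return exceed / n_sim


def adaptive_gene_p(snp_chisq, ld, top_frac=1.0, seed=0):
    """Return (empirical p-value, number of simulations in the final stage)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(ld)  # lower-triangular L with ld = L @ L.T
    observed = gene_stat(np.asarray(snp_chisq, dtype=float), top_frac)
    n = STAGES[0]
    p = empirical_p(observed, chol, n, rng, top_frac)
    if p < 0.1:
        n = STAGES[1]
        p = empirical_p(observed, chol, n, rng, top_frac)
        if p == 0:
            n = STAGES[2]
            p = empirical_p(observed, chol, n, rng, top_frac)
    return p, n


if __name__ == "__main__":
    idx = np.arange(5)
    ld = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # toy positive-definite LD matrix
    print(adaptive_gene_p([40.0] * 5, ld))  # strong signal: p = 0 after 10^6 draws
```

Returning the number of simulations alongside the p-value lets a result of 0 be reported as p < 1/n, for example p < 10⁻⁶ after the final stage.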

