Software by Faculty
CaTS
- Power Calculator for Two Stage Association Studies.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Skol, A.D., Scott, L.J., Abecasis, G.R. and Boehnke, M., 2006. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature genetics, 38(2), p.209.
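CaTS computes power rather than test statistics, but the joint-analysis idea from the reference above can be illustrated with a short sketch. We assume the joint statistic has the form z_joint = sqrt(pi) * z1 + sqrt(1 - pi) * z2, where pi is the fraction of all samples genotyped in stage 1 and z1, z2 are the stage-specific z-statistics. The helper names are hypothetical and this is not CaTS code.

```python
import math
from statistics import NormalDist


def joint_z(z1: float, z2: float, pi_samples: float) -> float:
    """Hypothetical helper: combine stage-1 and stage-2 z-statistics.

    pi_samples is the fraction of all samples genotyped in stage 1.
    Assumes both stages estimate the same effect and that each stage's
    information is proportional to its sample size.
    """
    if not 0.0 < pi_samples < 1.0:
        raise ValueError("pi_samples must lie strictly between 0 and 1")
    return math.sqrt(pi_samples) * z1 + math.sqrt(1.0 - pi_samples) * z2


def two_sided_p(z: float) -> float:
    """Two-sided standard normal p-value for a z-statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Equal-sized stages with z = 3 in each: the joint statistic is 3 * sqrt(2).
z = joint_z(3.0, 3.0, 0.5)
print(round(z, 4), two_sided_p(z))
```

In a two-stage design only markers passing the stage-1 threshold are genotyped in stage 2, so the joint statistic is compared against a genome-wide threshold for those markers only.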
FUGUE
- Construct haplotypes for the chromosome 22 and 19 linkage disequilibrium maps.
- Faculty: Goncalo Abecasis. Download: Website.
GAS
- Genetic Association Study (GAS) Power Calculator interface that can be used to compute statistical power for large one-stage genetic association studies.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Johnson, J.L. and Abecasis, G.R., 2017. GAS Power Calculator: web-based power calculator for genetic association studies. bioRxiv, p.164343.
GOLD
- Graphical Overview of Linkage Disequilibrium.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Abecasis, G.R. and Cookson, W.O.C., 2000. GOLD—graphical overview of linkage disequilibrium. Bioinformatics, 16(2), pp.182-183.
GRR
- GRR is a Windows-based application for detecting pedigree errors by graphically inspecting the distribution of marker allele sharing among pairs of family members or among all pairs of individuals in a study.
- Faculty: Goncalo Abecasis. Download: Website.
LAMP
- LAMP is our software for Linkage and Association Modeling in Pedigrees.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Li, M., Boehnke, M. and Abecasis, G.R., 2005. Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. The American Journal of Human Genetics, 76(6), pp.934-949.
MACH 1.0
- MACH 1.0 is a Markov chain-based haplotyper that can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.
- Faculty: Goncalo Abecasis. Download: Website.
Merlin
- Fast pedigree analyses, including non-parametric linkage, error detection and haplotyping.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Abecasis, G.R., Cherny, S.S., Cookson, W.O. and Cardon, L.R., 2001. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nature genetics, 30(1), p.97.
Metal
- METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Willer, C.J., Li, Y. and Abecasis, G.R., 2010. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics, 26(17), pp.2190-2191.
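As a rough illustration of the kind of combination METAL performs, the sketch below pools per-study effect estimates for one variant with fixed-effects inverse-variance weights. This is a generic calculation, not METAL's code; METAL's documentation also describes a sample-size-based weighting scheme, which is not shown. The function name `inverse_variance_meta` is hypothetical, and the sketch assumes effects are already aligned to the same allele.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(betas, ses):
    """Hypothetical helper: fixed-effects inverse-variance meta-analysis.

    betas: per-study effect estimates for one variant (same allele coding).
    ses:   their standard errors.
    Returns the pooled effect, its standard error, z-statistic and
    two-sided p-value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of effects and SEs")
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                      # SE of the pooled effect
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Two studies: the more precise study (SE 0.1) dominates the pooled estimate.
print(inverse_variance_meta([0.0, 0.3], [0.1, 0.2]))
```

Weighting by 1/SE^2 gives the minimum-variance unbiased combination when all studies estimate the same underlying effect.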
PEDSTATS
- PEDSTATS is a handy tool for quick validation and summary of any pair of pedigree (.ped) and data (.dat) files.
- Faculty: Goncalo Abecasis. Download: Website.
PSEUDO
- Fast evaluation of empirical p-values for linkage scans.
- Faculty: Goncalo Abecasis. Download: Website.
QTDT
- Linkage Disequilibrium Analyses for Quantitative and Discrete Traits.
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Abecasis, G.R., Cardon, L.R. and Cookson, W.O.C., 2000. A general test of association for quantitative traits in nuclear families. The American Journal of Human Genetics, 66(1), pp.279-292.
SNP-HWE
- Fast exact Hardy-Weinberg equilibrium test for SNPs, as described in Wigginton et al. (2005).
- Faculty: Goncalo Abecasis. Download: Website.
- Reference: Wigginton, J.E., Cutler, D.J. and Abecasis, G.R., 2005. A note on exact tests of Hardy-Weinberg equilibrium. The American Journal of Human Genetics, 76(5), pp.887-893.
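The exact test conditions on the number of individuals and the number of copies of the rarer allele, computes the relative probability of every possible heterozygote count using recurrence relations between neighbouring counts, and sums the probabilities of all configurations no more likely than the observed one. The Python sketch below is our own transcription of that idea, not the distributed SNP-HWE code; `snp_hwe_exact` is a hypothetical name.

```python
def snp_hwe_exact(obs_hets: int, obs_hom1: int, obs_hom2: int) -> float:
    """Hypothetical helper: exact two-sided HWE p-value for one biallelic SNP."""
    if min(obs_hets, obs_hom1, obs_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    obs_homr = min(obs_hom1, obs_hom2)        # rarer homozygote count
    obs_homc = max(obs_hom1, obs_hom2)        # more common homozygote count
    rare = 2 * obs_homr + obs_hets            # copies of the rarer allele
    n = obs_hets + obs_homc + obs_homr        # genotyped individuals
    if n == 0:
        raise ValueError("at least one genotype is required")

    # probs[h] is proportional to P(h heterozygotes | n, rare); only h with the
    # same parity as `rare` are possible, the others stay 0.
    probs = [0.0] * (rare + 1)

    # Start near the expected heterozygote count, matching the parity of `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk downward: each step removes two heterozygotes, adds one of each homozygote.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk upward from the same starting point.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    total = sum(probs)
    p_obs = probs[obs_hets]
    # Sum the probabilities of all configurations no more likely than the observed one.
    p = sum(pr for pr in probs if pr <= p_obs) / total
    return min(1.0, p)


# 4 individuals with 4 copies of each allele and no heterozygotes: p = 3/35.
print(snp_hwe_exact(0, 2, 2))
```

Working with unnormalized ratios and normalizing once at the end avoids computing large factorials directly.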
BEHAVIOUR
- R/Shiny web application for kidney renal clear cell carcinoma.
- Faculty: Veera Baladandayuthapani. Download: Website.
GraphR
- GraphR (Graphical Regression) estimates covariate-dependent graphs while incorporating sample heterogeneity.
- Faculty: Veera Baladandayuthapani. Download: Github, Website.
iBAG
- Integrative Bayesian analyses of genomics models.
- Faculty: Veera Baladandayuthapani. Download: Website.
MMCR
- Bayesian graphical regression for multiple myeloma.
- Faculty: Veera Baladandayuthapani. Download: Website.
PRECISE
- Proteomic based integrated subject-specific networks in cancer.
- Faculty: Veera Baladandayuthapani. Download: Github, Website.
- Reference: Ha, M.J., Banerjee, S., Akbani, R., Liang, H., Mills, G.B., Do, K.A. and Baladandayuthapani, V., 2018. Personalized Integrated Network Modeling of the Cancer Proteome Atlas. Scientific reports, 8(1), p.14924. <doi:10.1038/s41598-018-32682-x>.
SpaceX
- Provides shared and cluster-specific gene co-expression networks for spatial transcriptomics data.
- Faculty: Veera Baladandayuthapani. Download: Github, Website.
- Reference: Satwik Acharyya, Xiang Zhou and Veerabhadran Baladandayuthapani (2022). SpaceX: Gene Co-expression Network Estimation for Spatial Transcriptomics. Bioinformatics, 38(22): 5033–5041.
CaTS
- Power Calculator for Two Stage Association Studies.
- Faculty: Michael Boehnke, Goncalo Abecasis. Download: Website.
- Reference: Skol, A.D., Scott, L.J., Abecasis, G.R. and Boehnke, M., 2006. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature genetics, 38(2), p.209.
FTEC
- Coalescent simulation program capable of modeling samples drawn from a population which has undergone faster than exponential growth.
- Faculty: Michael Boehnke. Download: Website.
MultiSKAT
- Kernel regression-based association tests for multiple phenotypes. The functions aggregate variant-phenotype score statistics in a particular region and compute the corresponding p-values efficiently.
- Faculty: Michael Boehnke. Download: Github.
RELPAIR
- RELPAIR 2.0.1 is a FORTRAN 77 program that infers the relationships of pairs of individuals based on genetic marker data, either within families or across an entire sample.
- Faculty: Michael Boehnke. Download: Website.
- Reference: Epstein MP, Duren WL and Boehnke M (2000) Improved inference of relationships for pairs of individuals. American Journal of Human Genetics 67:1219-1231.
RHMAP
- RHMAP 3.0 (updated September 1996) is a statistical package for radiation hybrid mapping.
- Faculty: Michael Boehnke. Download: Website.
- Reference: Boehnke M, Lunetta K, Hauser E, Lange K, Uro J, and VanderStoep J. RHMAP: Statistical Package for Multipoint Radiation Hybrid Mapping, Version 3.0, September 1996.
SIBMED
- SIBMED 1.0 is a FORTRAN 77 program that identifies likely genotyping errors and mutations for a sib pair in the context of multipoint mapping.
- Faculty: Michael Boehnke. Download: Website.
- Reference: Douglas J.A. and Boehnke M. SIBMED: A Program that Identifies Likely Genotyping Errors and Mutations for a Sib Pair in the Context of Multipoint Mapping Version 1.0, April 18, 2000.
SIMLINK
- SIMLINK 4.12 (updated April 1997) is a program for estimating the power of a proposed linkage study by computer simulation.
- Faculty: Michael Boehnke. Download: Website.
verifyBamID
- Verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
- Faculty: Michael Boehnke, Hyun Min Kang. Download: Github, Website.
- Reference: G. Jun, M. Flickinger, K. N. Hetrick, J. M. Romm, K. F. Doheny, G. Abecasis, M. Boehnke, and H. M. Kang, Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data, American Journal of Human Genetics, 91(5), pp.839-848. doi:10.1016/j.ajhg.2012.09.004.
WINNER
- WINNER 1.1 (updated Feb 2009) is a program for correcting the winner's curse effect in genetic association studies.
- Faculty: Michael Boehnke. Download: Website.
- Reference: Rui Xiao and Michael Boehnke 2009. Quantifying and Correcting for the Winner's Curse in Genetic Association Studies. Genetic Epidemiology 33:453-462.
adaptBayes
- This package contains R functions implementing the adaptive priors described in Boonstra and Barbaro (2018).
- Language(s): R
- Faculty: Philip S. Boonstra. Download: Github.
- Reference: Boonstra, Philip S. and Barbaro, Ryan P., "Incorporating Historical Models with Adaptive Bayesian Updates" (2018) Biostatistics https://doi.org/10.1093/biostatistics/kxy053
RankModeling
- Penalized multistage models for ordered data.
- Language(s): R
- Faculty: Philip S. Boonstra. Download: Github.
- Reference: Boonstra, Philip S. and Krauss, John C., "Inferring a consensus problem list using penalized multistage models for ordered data" (October 2019) The University of Michigan Department of Biostatistics Working Paper Series. Working Paper 126.
IVEware
- Imputation of missing values using the Sequential Regression (also known as Chained Equations) method; multiple imputation analyses for both descriptive and model-based analysis; and analyses that account for complex design features such as weighting, clustering and stratification.
- Faculty: Trivellore Raghunathan, Roderick Little, Michael Elliott. Download: Website.
lcra
- A user-friendly interface for doing joint Bayesian latent class and regression analysis with binary and continuous outcomes.
- Language(s): R
- Faculty: Michael Elliott. Download: Github.
- Reference: “Methods to account for uncertainty in latent class assignments when using latent classes as predictors in regression models, with application to acculturation strategy measures” (2020) In press at Epidemiology. doi:10.1097/EDE.0000000000001139
RDSsamplesize
- Provides functionality for carrying out sample size estimation and power calculation in Respondent-Driven Sampling.
- Language(s): R
- Faculty: Michael Elliott. Download: CRAN.
PRSweb
- Interactive PheWAS results from analyses conducted using Michigan Genomics Initiative and UK Biobank data.
- Faculty: Lars Fritsche, Bhramar Mukherjee. Download: Website.
singR
- R package implementing the SING (SImultaneous Non-Gaussian component analysis) method for data integration in neuroimaging.
- Faculty: Irina Gaynanova. Download: Github.
bp
- An R package for blood pressure analysis, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
latentcor
- An R package for estimating latent correlations from mixed data types, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
SLIDE
- An R package for learning partially-shared structures from multi-view data, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
biClassify
- An R package for binary classification using extensions of discriminant analysis, available from CRAN.
- Faculty: Irina Gaynanova. Download: CRAN.
iglu
- An R package for interpreting data from continuous glucose monitors (CGMs), available from Github and CRAN.
- Faculty: Irina Gaynanova. Download: CRAN.
SPRING
- An R package for estimation of sparse microbial association networks using rank-based correlation, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
sparseKOS
- An R package for nonlinear binary classification using sparse kernel optimal scoring, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
JACA
- R package for joint association and classification analysis of multi-view data, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
mixedCCA
- R package for semiparametric sparse canonical correlation analysis for data of mixed types (continuous, binary, zero-inflated), available from Github.
- Faculty: Irina Gaynanova. Download: Github.
DAP
- R package to perform discriminant analysis via projections, available from Github and CRAN.
- Faculty: Irina Gaynanova. Download: CRAN.
TREX
- MATLAB package to perform sparse linear regression using TREX, available from Github.
- Faculty: Irina Gaynanova. Download: Github.
MGSDA
- R package to perform sparse multi-group discriminant analysis, available from CRAN.
- Faculty: Irina Gaynanova. Download: CRAN.
MultiRobust
- Multiply robust estimation for population mean, regression analysis, and quantile regression.
- Faculty: Peisong Han. Download: CRAN.
- Reference: Multiply robust estimation for population mean (Han and Wang 2013) <doi:10.1093/biomet/ass087>, regression analysis (Han 2014) <doi:10.1080/01621459.2014.880058> (Han 2016) <doi:10.1111/sjos.12177> and quantile regression (Han et al. 2019) <doi:10.1111/rssb.12309>.
CoxKL
- A Kullback-Leibler-based Cox model (CoxKL) to integrate internal individual-level time-to-event data with external risk scores derived from published prediction models.
- Faculty: Kevin He. Download: Github.
FEprovideR
- A structured profile likelihood algorithm for the logistic fixed effects model and an approximate expectation maximization (EM) algorithm for the logistic mixed effects model.
- Faculty: Kevin He, Jack D. Kalbfleisch, Yi Li. Download: Github, CRAN.
grplasso
- An efficient algorithm for handling generalized linear and survival models with a high volume of health centers.
- Faculty: Kevin He. Download: Github.
ppfunnel
- ppfunnel creates elegant funnel plots for profiling health care providers.
- Faculty: Kevin He. Download: Github.
surtiver
Surtvep
- R package for fitting Cox non-proportional hazards models with time-varying coefficients.
- Faculty: Kevin He. Download: Github.
SurvBoost
- A new gradient boosting method for high-dimensional variable selection with censored outcomes using the stratified proportional hazards (PH) model.
- Faculty: Kevin He, Yanming Li, Yi Li, Jian Kang. Download: Github, CRAN.
tdrecur
- tdrecur is an R package for fitting recurrent events models with time-dependent covariates.
- Faculty: Kevin He. Download: Github.
AFTrees
- Fits accelerated failure time (AFT) models that assume an additive tree model for the regression function and a Dirichlet process (DP) mixture for the residual distribution.
- Faculty: Nicholas Henderson. Download: Github.
daarem
- Implements the DAAREM method for accelerating the convergence of slow, monotone sequences from smooth, fixed-point iterations such as the EM algorithm.
- Faculty: Nicholas Henderson. Download: Github, CRAN.
rvalues
- A collection of functions for computing "r-values" from various kinds of user input such as MCMC output or a list of effect size estimates and associated standard errors. Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units.
- Faculty: Nicholas Henderson. Download: CRAN.
CisGenome Browser
- A Compact Stand-Alone Genome Browser.
- Faculty: Hui Jiang. Download: Website.
- Reference: Jiang, H., Wang, F., Dyer, N.P., Wong, W.H. (2010) CisGenome Browser: A Flexible Tool For Genomic Data Visualization, Bioinformatics, 26 (14).
CisGenome
- An integrated tool for tiling array, genome and cis-regulatory element analysis, working together with CisGenome Browser.
- Faculty: Hui Jiang. Download: Website.
- Reference: Hongkai Ji, Hui Jiang, Wenxiu Ma, David S. Johnson, Richard M. Myers and Wing H. Wong (2008) An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nature Biotechnology, 26: 1293-1300. doi:10.1038/nbt.1505.
ELMSeq
- A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq data.
- Faculty: Hui Jiang. Download: Website.
fast-opt
- Package for fast computation of the Optional Polya Tree (OPT).
- Faculty: Hui Jiang. Download: Website.
fastPerm
intasymm
- Asymmetric Integration of External Datasets to Small Local Data.
- Faculty: Hui Jiang. Download: Website.
Glmnet for MATLAB
- A MATLAB wrapper for glmnet, a solver for fitting lasso (L1) and elastic-net regularized generalized linear models.
- Faculty: Hui Jiang. Download: Website.
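For the Gaussian (least-squares) family, the elastic-net problem that glmnet solves is commonly written as below, where N is the number of observations, lambda >= 0 controls the overall penalty strength and alpha in [0, 1] mixes the two penalties (alpha = 1 gives the lasso, alpha = 0 gives ridge regression). For other GLM families the squared-error term is replaced by the negative log-likelihood scaled by 1/N. This is background on the underlying solver and does not document the MATLAB wrapper's interface.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
```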
MCMC-CE
- Accurate and Efficient Calculation of Small P-Values with the Cross-Entropy Method.
- Faculty: Hui Jiang. Download: CRAN Archive.
mseq
- An R package for modeling non-uniformity in short-read rates in RNA-Seq data.
- Faculty: Hui Jiang. Download: CRAN Archive.
pslinesl1
- Fitting P-splines with an l1 penalty for repeated measures.
- Faculty: Hui Jiang. Download: CRAN Archive.
rSeqNP
- A non-parametric approach for detecting differential expression and splicing from RNA-Seq data.
- Faculty: Hui Jiang. Download: Website.
- Reference: Shi, Y., Chinnaiyan, A. M., Jiang, H. (2015) rSeqNP: A non-parametric approach for detecting differential expression and splicing from RNA-Seq data. Bioinformatics, in press.
rSeqDiff
- Detecting differential isoform expression from RNA-seq data.
- Faculty: Hui Jiang. Download: Website.
- Reference: Shi, Y., Jiang, H. (2013). rSeqDiff: Detecting differential isoform expression from RNA-Seq data using hierarchical likelihood ratio test, PLoS One, 8 (11): e79448.
rSeq
- rSeq is a set of tools for RNA-Seq data analysis. It consists of programs that deal with many aspects of RNA-Seq data analysis, such as read quality assessment, reference sequence generation, sequence mapping, gene and isoform expressions (RPKMs) estimation, etc.
- Faculty: Hui Jiang. Download: Website.
- References: [1] Jiang, H., Wong, W.H. (2009) Statistical Inferences for Isoform Expression in RNA-Seq, Bioinformatics, 25(8), 1026–1032. [2] Salzman, J., Jiang, H., Wong, W. H. (2011) Statistical Modeling of RNA-Seq Data, Statistical Science, 26 (1): 62-83.
rSeqRobust
SeqAlto
- Fast and accurate read alignment for resequencing.
- Faculty: Hui Jiang. Download: Website.
- Reference: John C. Mu, Hui Jiang, Amirhossein Kiani, Marghoob Mohiyuddin, Narges Bani Asadi and Wing H. Wong, Fast and Accurate Read Alignment for Resequencing, Bioinformatics, 2012.
SeqMap
- A tool for mapping millions of short sequences to the genome.
- Faculty: Hui Jiang. Download: Website.
- Reference: Jiang, H., Wong, W.H. (2008) SeqMap: Mapping Massive Amount of Oligonucleotides to the Genome, Bioinformatics, 24(20).
smooth-opt
SpliceMap
- SpliceMap is a de novo splice junction discovery and alignment tool. It offers high sensitivity and support for arbitrary RNA-seq read lengths.
- Faculty: Hui Jiang. Download: Website.
- Reference: Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research, Advance access published on April 5, 2010.
stcf
- An Algorithm for Minimizing Sum of Truncated Convex Functions.
- Faculty: Hui Jiang. Download: Website.
FEprovideR
- A structured profile likelihood algorithm for the logistic fixed effects model and an approximate expectation maximization (EM) algorithm for the logistic mixed effects model.
- Faculty: Kevin He, Jack D. Kalbfleisch, Yi Li. Download: Github, CRAN.
CoxClusterProcess
GeneNetwork
- Gene sub-network analysis via Bayesian nonparametric methods.
- Faculty: Jian Kang. Download: Website.
GeoCopula
- Unified modeling framework for analysis of spatial-clustered continuous and binary data.
- Faculty: Jian Kang, Peter X.K. Song. Download: Website.
- Reference: Bai, Y., Kang, J., & Song, P.X.K. (2014). Efficient pairwise composite likelihood estimation for spatial‐clustered data. Biometrics, 70(3), 661-670.
Poisson Graphical Model
- Uses the EM algorithm to find the point estimates of the intensity parameters for the Poisson Graphical Model.
- Faculty: Jian Kang. Download: Website.
- Reference: Xue, W., Kang, J., Bowman F.D., Wager, T.D., Guo, J. (2014) Identifying Functional Co-activation Patterns in Neuroimaging Studies via Poisson Graphical Models, Biometrics, in press.
ReverseInference
STGP
- This package focuses on spatial variable selection for scalar-on-image regression. It uses a new class of Bayesian nonparametric models, soft-thresholded Gaussian processes, together with efficient posterior computation algorithms.
- Faculty: Jian Kang. Download: Website.
- Reference: Kang, J., Reich, B.J. and Staicu, A.M., 2018. Scalar-on-image regression via the soft-thresholded Gaussian process. Biometrika, 105(1), pp.165-184.
SurvBoost
- A new gradient boosting method for high-dimensional variable selection with censored outcomes using the stratified proportional hazards (PH) model.
- Faculty: Kevin He, Yanming Li, Yi Li, Jian Kang. Download: Github, CRAN.
TGLG
- This package implements a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework.
- Faculty: Jian Kang. Download: Website.
- Reference: Cai, Q., Kang, J. and Yu, T., 2018. Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior. Bayesian Analysis.
apigenome
- Libraries and command-line utilities for big data genomic analysis.
- Faculty: Hyun Min Kang. Download: Github.
cleancall
- Correction for DNA contamination in genotype calling.
- Faculty: Michael Boehnke, Hyun Min Kang. Download: Github.
cramore
- A collection of C++ tools to manipulate SAM/BAM/CRAM and BCF/VCF files in various contexts of sequence analysis.
- Faculty: Hyun Min Kang. Download: Github.
demuxlet
- Genetic multiplexing of barcoded single cell RNA-seq.
- Faculty: Hyun Min Kang. Download: Github.
- Reference: Kang, H.M., Subramaniam, M., Targ, S., Nguyen, M., Maliskova, L., McCarthy, E., Wan, E., Wong, S., Byrnes, L., Lanata, C.M. and Gate, R.E., 2018. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nature biotechnology, 36(1), p.89.
EPACTS
- Efficient and Parallelizable Association Container Toolbox. Performs various statistical tests for identifying genome-wide associations from sequence data through a user-friendly interface.
- Faculty: Hyun Min Kang. Download: Github.
EMMA
- Statistical test for association mapping in model organisms that corrects for confounding from population structure and genetic relatedness.
- Faculty: Hyun Min Kang. Download: Website.
- Reference: Kang, H.M., Zaitlen, N.A., Wade, C.M., Kirby, A., Heckerman, D., Daly, M.J. and Eskin, E., 2008. Efficient control of population structure in model organism association mapping. Genetics, 178(3), pp.1709-1723.
EMMAX
- Statistical test for large scale human or model organism association mapping accounting for the sample structure.
- Faculty: Hyun Min Kang. Download: Wiki.
- Reference: Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42:348-54.
GotCloud
- Genomes on the Cloud, Mapping & Variant Calling Pipelines.
- Faculty: Hyun Min Kang. Download: Github, Wiki.
- Reference: Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome research (2015): gr-176552.
popscle (freemuxlet)
- A suite of population-scale analysis tools for single-cell genomics data, including implementations of the Demuxlet/Freemuxlet methods and auxiliary tools.
- Faculty: Hyun Min Kang. Download: Github.
RUTH
- Robust Unified Hardy-Weinberg Equilibrium Test.
- Faculty: Hyun Min Kang. Download: Github.
topmed_variant_calling
- A collection of software tools used for producing TOPMed variant calls and genotypes, with comprehensive documentation that allows investigators to understand the methods and reproduce the variant calls from the same set of aligned sequence reads.
- Faculty: Hyun Min Kang. Download: Github.
verifyBamID
- Verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.
- Faculty: Michael Boehnke, Hyun Min Kang. Download: Github, Website.
- Reference: G. Jun, M. Flickinger, K. N. Hetrick, J. M. Romm, K. F. Doheny, G. Abecasis, M. Boehnke, and H. M. Kang, Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data, American Journal of Human Genetics, 91(5), pp.839-848. doi:10.1016/j.ajhg.2012.09.004.
verifyBamID2
- A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.
- Faculty: Hyun Min Kang. Download: Github.
- Reference: Zhang F., Flickinger M., InPSYght Psychiatric Genetics Consortium, Abecasis G., Boehnke M., Kang H.M. (8 November 2018). "Ancestry-agnostic estimation of DNA sample contamination from sequence reads". bioRxiv 466268; doi: https://doi.org/10.1101/466268.
VT
- A tool set for short variant discovery in genetic sequence data.
- Faculty: Hyun Min Kang. Download: Github.
- Reference: Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. Unified Representation of Genetic Variants. Bioinformatics (2015) 31(13): 2202-2204.
SMART Sample Size Calculator
- Sample size calculator applet for SMART studies.
- Faculty: Kelley Kidwell. Download: Shiny Application.
- Reference: Kidwell, K.M., Seewald, N., Tran, B.Q., Kasari, C., Almirall, D. Design and analysis considerations to compare dynamic treatment regimens from sequential multiple assignment randomized trials. (2018). Journal of Applied Statistics. 45(9): 1628-1651.
snSMART with 3 active treatments and a binary outcome
- Small n Sequential, Multiple Assignment, Randomized Trial (snSMART) calculation applet.
- Faculty: Kelley Kidwell. Download: Shiny Application.
- Reference: Wei, B., Braun, T.M., Tamura, R.N. and Kidwell, K.M., 2018. A Bayesian analysis of small n sequential multiple assignment randomized trials (snSMARTs). Statistics in medicine, 37(26), pp.3723-3732.
snSMART with placebo, low, and high dose and a continuous outcome
- Small n Sequential, Multiple Assignment, Randomized Trial (snSMART) calculation applet.
- Faculty: Kelley Kidwell. Download: Shiny Application.
- Reference: Fang F, Tamura RN, Braun TM, Kidwell KM. Comparing dose levels to placebo using a continuous outcome in a small n, sequential, multiple assignment, randomized trial (snSMART). Statistics in Biopharmaceutical Research. 2022: 1-19.
snSMART R Package
- Consolidated data simulation, sample size calculation and analysis functions for several snSMART (small sample sequential, multiple assignment, randomized trial) designs under one library.
- Faculty: Kelley Kidwell. Download: CRAN, Github.
- Reference: Wei, B., Braun, T.M., Tamura, R.N. and Kidwell, K.M., 2018. A Bayesian analysis of small n sequential multiple assignment randomized trials (snSMARTs). Statistics in Medicine, 37(26), pp.3723-3732.
LGEWIS
- Functions for genome-wide association studies (GWAS)/gene-environment-wide interaction studies (GEWIS) with longitudinal outcomes and exposures.
- Faculty: Seunggeun Shawn Lee, Bhramar Mukherjee, Min Zhang. Download: CRAN.
- References: He et al. (2017) "Set-Based Tests for Gene-Environment Interaction in Longitudinal Studies" and He et al. (2017) "Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA)".
Lodi
- Impute observed values below the limit of detection (LOD) via censored likelihood multiple imputation (CLMI) in single-pollutant models, developed by Boss et al (2019) <doi:10.1097/EDE.0000000000001052>.
- Faculty: Seunggeun Shawn Lee, Bhramar Mukherjee, Min Zhang. Download: CRAN.
- References: Boss, J., Mukherjee, B., Ferguson, K.K., Aker, A., Alshawabkeh, A.N., Cordero, J.F., Meeker, J.D. and Kim, S., 2019. Estimating outcome-exposure associations when exposure biomarker detection limits vary across batches. Epidemiology, 30(5), pp.746-755.
SAIGE
- SAIGE is an R-package for testing for associations between genetic variants and binary phenotypes while adjusting for sample relatedness and case-control imbalance.
- Faculty: Seunggeun Shawn Lee. Download: Website, Github.
SKAT
- SKAT is an R-package for rare variant association analysis. It can carry out the burden test, SKAT, SKAT-O, and a combined test of common and rare variants while adjusting for covariates and kinship. For binary traits, it can calculate p-values using resampling and asymptotic-based adjustment methods. It also has functions for sample size and power calculations.
- Faculty: Seunggeun Shawn Lee. Download: Website, Github, CRAN.
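For orientation, the variance-component score statistic at the core of SKAT is usually written as below, where y is the phenotype vector, \hat{\mu} the fitted mean under the null model with covariates only, G the n-by-p genotype matrix for the variants in a region (G_j its j-th column), and W = diag(w_1, ..., w_p) a diagonal matrix of variant weights. Under the null hypothesis Q follows a mixture of chi-square distributions, from which p-values are obtained. This summarizes the statistic only and does not describe the package's function arguments.

```latex
Q = (y - \hat{\mu})^{\top} G W W G^{\top} (y - \hat{\mu})
  = \sum_{j=1}^{p} w_j^{2} \bigl[G_j^{\top}(y - \hat{\mu})\bigr]^{2}
```

The second form shows that Q is a weighted sum of squared single-variant score statistics, which is why it remains powerful when effects within a region point in different directions.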
MetaSKAT
- MetaSKAT is an R package for gene-based meta-analysis across studies. It can carry out a meta-analysis of SKAT, SKAT-O and burden tests with individual-level genotype data or gene-level summary statistics.
- Faculty: Seunggeun Shawn Lee. Download: Website, Github, CRAN.
iECAT
- iECAT is an R-package to test for single variant and gene/region-based associations using external control samples.
- Faculty: Seunggeun Shawn Lee. Download: Website, Github, CRAN.
SPAtest
- SPAtest is an R-package to perform score tests for associations between genetic variants and binary traits using saddlepoint approximation. The method implemented in the package (FastSPA) can accurately calculate p-values even when the case-control ratio is extremely unbalanced.
- Faculty: Seunggeun Shawn Lee. Download: Website, CRAN.
JointScoreTest
- JointScoreTest is an R-package to perform a joint test of fixed and random effects in the generalized linear mixed model framework.
- Faculty: Seunggeun Shawn Lee. Download: Website.
dSVA
- dSVA is an R-package to identify hidden factors in high-dimensional biomedical data.
- Faculty: Seunggeun Shawn Lee. Download: Website, CRAN.
TransMeta & TransMetaRare
- TransMeta is an R-package to compute single-SNP p-values for trans-ethnic meta-analysis using a kernel-based random effects model. This is an early version, and we will keep updating it. We have recently extended it to a gene-based rare-variant test (TransMetaRare). Both packages can be downloaded from Github.
- Faculty: Seunggeun Shawn Lee. Download: Website, TransMetaRare Github.
EigenCorr
- EigenCorr is an R-package to compute p-values of principal components (PCs) based on EigenCorr1, EigenCorr2 and Tracy-Widom methods. You need PCs, outcome phenotypes and all eigenvalues to run EigenCorr.
- Faculty: Seunggeun Shawn Lee. Download: Website.
clikcorr
- A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness.
- Faculty: Yanming Li. Download: CRAN.
MSGLasso
- Fit multivariate response and multiple predictor linear regression with an arbitrary group structure assigned on the regression coefficients matrix, using the multivariate sparse group lasso and the mixed coordinate descent algorithm.
- Faculty: Yanming Li. Download: CRAN.
mLDA
- The mLDA package implements the multi-class linear discriminant analysis method for classification with ultrahigh-dimensional data. The method can select both marginally and jointly informative features for classification.
- Faculty: Yanming Li. Download: Github.
- Reference: Li, Yanming and Hong, Hyokyoung and Li, Yi (2018) Multiclass Linear Discriminant Analysis with Ultrahigh-Dimensional Features. Under revision.
SurvBoost
- A new gradient boosting method for high-dimensional variable selection with censored outcomes using the stratified proportional hazards (PH) model.
- Faculty: Kevin He, Yanming Li, Yi Li, Jian Kang. Download: Github, CRAN.
FEprovideR
- A structured profile likelihood algorithm for the logistic fixed effects model and an approximate expectation maximization (EM) algorithm for the logistic mixed effects model.
- Faculty: Kevin He, Jack D. Kalbfleisch, Yi Li. Download: Github, CRAN.
plac
- A semi-parametric estimation method for the Cox model with left-truncated data using augmented information from the marginal of truncation times.
- Faculty: Yi Li. Download: CRAN.
- Reference: Wu, F., Kim, S., Qin, J., Saran, R. and Li, Y., 2018. A pairwise likelihood augmented Cox estimator for left‐truncated data. Biometrics, 74(1), pp.100-108.
screening
- Covariance-insured screening.
- Faculty: Kevin He, Yanming Li, Yi Li. Download: Website.
- Reference: He, K., Kang, J., Hong, H.G., Zhu, J., Li, Y., Lin, H., Xu, H. and Li, Y., 2018. Covariance-insured screening. arXiv preprint arXiv:1805.06595.
SPARES
- Estimation and inference for high-dimensional linear models.
- Faculty: Yi Li. Download: Github.
- Reference: Fei, Z., Zhu, J., Banerjee, M. and Li, Y., 2018. Drawing inferences for high‐dimensional linear models: A selection‐assisted partial regression and smoothing approach. Biometrics.
IVEware
- Imputations of missing values using the Sequential Regression (also known as Chained Equations) Method. Multiple imputation analyses for both descriptive and model-based analysis. Analysis that accounts for complex design features, weighting, clustering and stratification.
- Faculty: Trivellore Raghunathan, Roderick Little, Michael Elliott. Download: Website.
Bama
- Mediation analysis in the presence of high-dimensional mediators based on the potential outcome framework, using Bayesian Mediation Analysis (BAMA) as developed by Song et al. (2018) <doi:10.1101/467399>.
- Faculty: Bhramar Mukherjee, Min Zhang, Xiang Zhou. Download: CRAN.
- Reference: Song, Y., Zhou, X., Zhang, M., Zhao, W., Liu, Y., Kardia, S., Roux, A.D., Needham, B., Smith, J.A. and Mukherjee, B., 2018. Bayesian Shrinkage Estimation of High Dimensional Causal Mediation Effects in Omics Studies. bioRxiv, p.467399.
gigg
- This package implements a Gibbs sampler corresponding to a Group Inverse-Gamma Gamma (GIGG) regression model with adjustment covariates. Hyperparameters in the GIGG prior specification can either be fixed by the user or can be estimated via Marginal Maximum Likelihood Estimation. <arXiv:2102.10670>.
- Faculty: Bhramar Mukherjee. Download: CRAN.
- Reference: Boss, Jonathan, et al. "Group Inverse-Gamma Gamma Shrinkage for Sparse Regression with Block-Correlated Predictors." arXiv preprint arXiv:2102.10670 (2021).
higlasso
- Hierarchical integrative group least absolute shrinkage and selection operator (HiGLASSO), developed by Boss et al. (2020) <arXiv:2003.12844>, is a general framework to identify noteworthy nonlinear main and interaction effects in the presence of group structures among a set of exposures.
- Faculty: Bhramar Mukherjee. Download: CRAN.
- Reference: Boss, J., Rix, A., Chen, Y.H., Narisetty, N.N., Wu, Z., Ferguson, K.K., McElrath, T.F., Meeker, J.D. and Mukherjee, B., 2020. A hierarchical integrative group lasso (higlasso) framework for analyzing environmental mixtures. arXiv preprint arXiv:2003.12844.
LGEWIS
- Functions for genome-wide association studies (GWAS)/gene-environment-wide interaction studies (GEWIS) with longitudinal outcomes and exposures.
- Faculty: Seunggeun Shawn Lee, Bhramar Mukherjee, Min Zhang. Download: CRAN.
- References: He et al. (2017) "Set-Based Tests for Gene-Environment Interaction in Longitudinal Studies" and He et al. (2017) "Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA)".
Lodi
- Impute observed values below the limit of detection (LOD) via censored likelihood multiple imputation (CLMI) in single-pollutant models, developed by Boss et al (2019) <doi:10.1097/EDE.0000000000001052>.
- Faculty: Seunggeun Shawn Lee, Bhramar Mukherjee, Min Zhang. Download: CRAN.
- References: Boss, J., Mukherjee, B., Ferguson, K.K., Aker, A., Alshawabkeh, A.N., Cordero, J.F., Meeker, J.D. and Kim, S., 2019. Estimating outcome-exposure associations when exposure biomarker detection limits vary across batches. Epidemiology, 30(5), pp.746-755.
medScan
- A collection of methods for large scale single mediator hypothesis testing. The six included methods for testing the mediation effect are Sobel's test, Max P test, joint significance test under the composite null hypothesis, high dimensional mediation testing, divide-aggregate composite null test, and Sobel's test under the composite null hypothesis.
- Faculty: Bhramar Mukherjee, Wei Hao, Xiang Zhou. Download: CRAN.
- Reference: Du, J., Zhou, X., Clark‐Boucher, D., Hao, W., Liu, Y., Smith, J.A. and Mukherjee, B., 2023. Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons. Genetic Epidemiology, 47(2), pp.167-184.
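Two of the listed tests are simple enough to sketch directly. The following is a minimal textbook version of Sobel's test and the MaxP (joint significance) test for a single mediator, written in Python for illustration; it is not medScan's R interface, and the function names are hypothetical. It assumes you already have the exposure-to-mediator coefficient alpha and the exposure-adjusted mediator-to-outcome coefficient beta, each with its standard error.

```python
import math

def sobel_test(alpha, se_alpha, beta, se_beta):
    """Sobel test for the indirect effect alpha * beta of a single mediator.

    alpha: exposure -> mediator coefficient; beta: mediator -> outcome
    coefficient (adjusted for exposure); se_*: their standard errors.
    Returns (z statistic, two-sided normal p-value).
    """
    # First-order delta-method standard error of the product alpha * beta
    se_ab = math.sqrt(beta ** 2 * se_alpha ** 2 + alpha ** 2 * se_beta ** 2)
    z = alpha * beta / se_ab
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

def max_p_test(p_alpha, p_beta):
    """MaxP (joint significance) test: the mediation p-value is the larger
    of the two path p-values, so both paths must be significant."""
    return max(p_alpha, p_beta)

z, p = sobel_test(alpha=0.5, se_alpha=0.1, beta=0.4, se_beta=0.1)
print(f"Sobel z = {z:.3f}, p = {p:.4f}")
```

Sobel's test is known to be conservative when both alpha and beta are near zero, which is part of why composite-null versions such as those listed above exist.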
messi
- This R package fits the hard constraint, soft constraint, and unconstrained models in Boss et al. (2023) for mediation analyses with external summary-level information on the total effect.
- Faculty: Bhramar Mukherjee, Wei Hao, Jian Kang. Download: CRAN.
- Reference: Boss, J., Hao, W., Cathey, A., Welch, B.M., Ferguson, K.K., Meeker, J.D., Kang, J. and Mukherjee, B., 2023. Mediation with External Summary Statistic Information (MESSI). arXiv preprint arXiv:2306.17347.
MetaIntegration
- An ensemble meta-inference framework to integrate multiple regression models into a current study. Gu, T., Taylor, J.M.G. and Mukherjee, B. (2021) <arXiv:2010.09971>.
- Faculty: Bhramar Mukherjee. Download: CRAN.
- References: Gu, T., Taylor, J. M., & Mukherjee, B. (2020). A meta-inference framework to integrate multiple external models into a current study. arXiv preprint arXiv:2010.09971.
miselect
- Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Presents Stacked Adaptive Elastic Net (saenet) and Grouped Adaptive LASSO (galasso) for continuous and binary outcomes.
- Faculty: Bhramar Mukherjee. Download: CRAN.
- References: Du, Jiacong, et al. "Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods." arXiv preprint arXiv:2003.07398 (2020).
PRSweb
- Interactive PheWAS results from analyses conducted using Michigan Genomics Initiative and UK Biobank data.
- Faculty: Lars Fritsche, Bhramar Mukherjee. Download: Website.
SAMBA
- Misclassification of EHR (Electronic Health Record)-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error for association tests. 'SAMBA' implements several methods for obtaining bias-corrected point estimates along with valid standard errors as proposed in Beesley and Mukherjee (2020) <doi:10.1101/2019.12.26.19015859>, currently under review.
- Faculty: Bhramar Mukherjee. Download: CRAN, Github.
- References: Beesley, L.J. and Mukherjee, B., 2019. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. medRxiv.
SEIRfansy
- Extended Susceptible-Exposed-Infected-Removed (SEIR) model for handling a high false negative rate and symptom-based administration of diagnostic tests <doi:10.1101/2020.09.24.20200238>.
- Faculty: Bhramar Mukherjee. Download: CRAN, Github.
- References: Bhaduri, R., Kundu, R., Purkayastha, S., Kleinsasser, M., Beesley, L. J., & Mukherjee, B. (2020). Extending the susceptible-exposed-infected-removed (SEIR) model to handle the high false negative rate and symptom-based administration of Covid-19 diagnostic tests: SEIR-fansy. Medrxiv.
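For orientation, the sketch below integrates the basic SEIR compartmental model that SEIR-fansy extends, using a forward Euler step in Python. It does not include the SEIR-fansy extensions (false negative rate, symptom-based testing, separate reported and unreported compartments), and the function name and parameter values are hypothetical. Here beta is the transmission rate, sigma the rate at which exposed individuals become infectious, and gamma the removal rate.

```python
def seir_euler(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the basic SEIR model.

    dS/dt = -beta*S*I/N, dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I, dR/dt = gamma*I.
    Returns a list of (t, S, E, I, R) tuples.
    """
    n = s0 + e0 + i0 + r0  # closed population: N stays constant
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps = int(round(days / dt))
    trajectory = [(0.0, s, e, i, r)]
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n   # S -> E flow
        new_infectious = sigma * e       # E -> I flow
        new_removed = gamma * i          # I -> R flow
        s -= dt * new_exposed
        e += dt * (new_exposed - new_infectious)
        i += dt * (new_infectious - new_removed)
        r += dt * new_removed
        trajectory.append((k * dt, s, e, i, r))
    return trajectory

traj = seir_euler(beta=0.5, sigma=0.2, gamma=0.1, s0=9990, e0=0, i0=10, r0=0, days=100)
t, s, e, i, r = traj[-1]
print(f"day {t:.0f}: S={s:.0f} E={e:.0f} I={i:.0f} R={r:.0f}")
```

A small step size keeps the Euler scheme stable for these rates; a packaged ODE solver would be the usual choice in practice.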
subgxe
- R package that implements p-value assisted subset testing for association (pASTA), a method developed by Yu et al. (2019) <doi:10.1159/000496867>.
- Faculty: Bhramar Mukherjee, Xiang Zhou, Seunggeun Shawn Lee. Download: CRAN.
- References: Yu, Y., Xia, L., Lee, S., Zhou, X., Stringham, H.M., Boehnke, M. and Mukherjee, B., 2018. Subset-Based Analysis using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Human heredity, 83(6), pp.283-314.
corrsurv
- Collection of two-sample tests for treatment effects with paired censored survival data and recurrent events survival data.
- Faculty: Susan Murray. Download: Github.
- References: 1. Murray, S., 2000. Nonparametric rank-based methods for group sequential monitoring of paired censored survival data. Biometrics, 56, pp.984-990. 2. Tayob, N. and Murray, S., 2014. Nonparametric tests of treatment effect based on combined endpoints for mortality and recurrent events. Biostatistics, 16(1), pp.73-83.
metaboplot
- Shiny interface for exploring metabolite plots based on attributes.
- Faculty: Laura Scott. Download: Github.
accelerometer
- R tool for analysis of accelerometer data.
- Faculty: Peter X.K. Song. Download: Website, Shiny Application.
BivPPL
- Bivariate frailty models for clustered events via penalized partial likelihood methods.
- Language(s): R
- Faculty: Peter X.K. Song. Author: Lili Wang.
- Download: Github.
coxphGPLE
- Fits a Cox model with multiple functional covariate-environment interactions, where covariate effects can be modified nonlinearly by mixtures of toxicant exposures.
- Faculty: Peter X.K. Song. Download: Website.
eSIR
- Extended state-space SIR epidemiological models.
- Language(s): R
- Faculty: Peter X.K. Song. Author: Lili Wang.
- Download: Github.
FLAPO
- Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measures.
- Faculty: Peter X.K. Song. Download: Website.
- Reference: Wang, F., Wang, L., & Song, P.X.K. (2016). Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI: 10.1111/biom.12496.
GDEP
- Gene network construction based on time course microarray data.
- Faculty: Peter X.K. Song. Download: Website.
- Reference: Gao, X., Pu, DQ., & Song, P.X.K. (2009). Transition dependency: a gene-gene interaction measure for time series microarray data. EURASIP Journal on Bioinformatics and Systems Biology, 2009, 2.
GeoCopula
- Unified modeling framework for analysis of spatial-clustered continuous and binary data.
- Faculty: Jian Kang, Peter X.K. Song. Download: Website.
- Reference: Bai, Y., Kang, J., & Song, P.X.K. (2014). Efficient pairwise composite likelihood estimation for spatial‐clustered data. Biometrics, 70(3), 661-670.
GSMC
- A simulation-free group sequential design with max-combo tests in the presence of non-proportional hazards.
- Language(s): R
- Faculty: Peter X.K. Song. Author: Lili Wang.
- Download: Github.
IAfrac
- Calculate sample sizes and information fractions (IF) for Fleming-Harrington class weighted log-rank tests (FH-WLRT) in interim analysis (IA).
- Language(s): R
- Faculty: Peter X.K. Song. Author: Lili Wang.
- Download: Github.
metaFuse
- Fused lasso approach in regression coefficient clustering.
- Faculty: Peter X.K. Song. Download: Website.
- Reference: Tang, L., & Song, P.X.K. (2016). Fused Lasso Approach in Regression Coefficients Clustering -- Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1−23.
MODAC
- Method of divide-and-combine in regularized generalized linear models for big data.
- Faculty: Peter X.K. Song. Download: Website.
- Reference: Tang, L., Zhou, L., and Song, P.X.K. (2016). Method of Divide-and-Combine in Regularised Generalised Linear Models for Big Data. arXiv preprint arXiv:1611.06208.
NGM
- Bayesian semi-parametric stochastic velocity model with Ornstein-Uhlenbeck process.
- Faculty: Peter X.K. Song. Download: Website.
HDDesign
- Determine the sample size for high dimensional classification studies.
- Faculty: Peter X.K. Song. Download: Website.
- Reference: Sanchez, B.N., Wu, M., Song, P.X.K., and Wang W. (2016). Study design in high-dimensional classification analysis. Biostatistics, doi: 10.1093/biostatistics/kxw018.
qif
- Estimation of regression coefficients in longitudinal marginal models using quadratic inference functions.
- Faculty: Peter X.K. Song. Download: Github, Website, CRAN.
- Reference: Qu, A., Lindsay, B.G., & Li, B. (2000). Improving generalized estimating equations using quadratic inference functions. Biometrika, 87, 823-836.
RCD
- Scalable and efficient statistical inference with estimating functions in the MapReduce paradigm for big data.
- Faculty: Peter X.K. Song. Download: Website, Shiny Application.
Tensor
- CP and non-negative tensor decomposition/factorizations. Also, a Shiny application to study convergence and clustering properties of decomposition methods.
- Faculty: Peter X.K. Song. Download: Github, Website.
nltm
- Non-linear transformation models (nltm) for analyzing survival data.
- Faculty: Alexander Tsodikov. Download: Github.
- Reference: Tsodikov, A., 2003. Semiparametric models: a generalized self‐consistency approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(3), pp.759-774.
DAP
- Integrative genetic association analysis using deterministic approximation of posteriors.
- Faculty: Xiaoquan William Wen. Download: Github.
- Reference: Wen, X., Lee, Y., Luca, F., Pique-Regi, R. Efficient Integrative Multi-SNP Association Analysis using Deterministic Approximation of Posteriors. The American Journal of Human Genetics, 98(6), 1114-1129.
fmeqtl
- Software pipeline for fine mapping QTLs in multiple subgroups.
- Faculty: Xiaoquan William Wen. Download: Github.
integrative
- Enrichment estimation aided colocalization analysis.
- Faculty: Xiaoquan William Wen. Download: Github.
- Reference: Wen, X., Pique-Regi, R., Luca, F. Integrating Molecular QTL Data into Genome-wide Genetic Association Analysis: Probabilistic Assessment of Enrichment and Colocalization. PLOS Genetics. 2017 Mar 13(3): e1006646.
IRLS
- Implementation of the iteratively reweighted least squares (IRLS) algorithm for generalized linear models in C++.
- Faculty: Xiaoquan William Wen. Download: Github.
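As a rough illustration of what IRLS does, the sketch below fits a logistic regression (binomial GLM with logit link) in Python with NumPy. It is not the C++ implementation in the repository; the function name, convergence rule and data are assumptions. Each iteration forms the working response z = eta + (y - mu)/w with weights w = mu(1 - mu) and solves the weighted least squares problem, which for the canonical logit link is exactly a Newton-Raphson step.

```python
import numpy as np

def irls_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by iteratively reweighted least squares.

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) array of 0/1 responses.
    Returns the coefficient vector beta.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit
        w = mu * (1.0 - mu)               # IRLS weights for the logit link
        z = eta + (y - mu) / w            # working response
        XtW = X.T * w                     # scales column i of X.T by w[i]
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # (X'WX) b = X'Wz
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([0, 0, 1, 0, 1, 1], dtype=float)
print(irls_logistic(X, y))
```

The sketch assumes the data are not perfectly separable; with separation the maximum likelihood estimate does not exist and the weights collapse toward zero.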
TORUS
- QTL discovery utilizing genomic annotations: a computational procedure for discovering molecular QTLs that incorporates genomic annotations.
- Faculty: Xiaoquan William Wen. Download: Github.
- Reference: Wen, X. Effective QTL Discovery Incorporating Genomic Annotations. bioRxiv doi:10.1101/032003.
sbams
- Bayesian model selection in complex linear systems.
- Faculty: Xiaoquan William Wen. Download: Github.
- Reference: Wen, X. "Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies", submitted to Biometrics.
baker
- Bayesian Analytic Kit for Etiology Research.
- Faculty: Zhenke Wu. Download: Github, Website.
- Reference: Wu, Z., Deloria-Knoll, M. and Zeger, S.L., 2016. Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics, 18(2), pp.200-213. <doi:10.1093/biostatistics/kxw037>.
ddtlcm
- Implements a Bayesian algorithm to fit latent class models, particularly useful for weakly separated latent classes. Reference: Li et al. (2023).
- Faculty: Zhenke Wu. Download: Github, CRAN.
- Reference: Li, M., Stephenson, B. and Wu, Z., 2023. Tree-Regularized Bayesian Latent Class Analysis for Improving Weakly Separated Dietary Pattern Subtyping in Small-Sized Subpopulations. arXiv preprint arXiv:2306.04700.
doubletree
- An R package for empowering domain-adaptive probabilistic cause-of-death assignment using verbal autopsy via double-tree shrinkage.
- Faculty: Zhenke Wu. Download: Github.
- Reference: Wu Z, Li RZ, Chen I, Li M (2023+). Tree-informed Bayesian multi-source domain adaptation: cross-population probabilistic cause-of-death assignment using verbal autopsy.
mpcr
- Package for estimating treatment effects in matched-pair cluster randomized trials (MPCR) using covariate calibration.
- Faculty: Zhenke Wu. Download: Github, Website.
- Reference: Wu, Z., Frangakis, C.E., Louis, T.A. and Scharfstein, D.O., 2014. Estimation of treatment effects in matched‐pair cluster randomized trials by calibrating covariate imbalance between clusters. Biometrics, 70(4), pp.1014-1022. <doi:10.1111/biom.12214>.
lotR
- Latent class analysis of observations organized by tree in R.
- Faculty: Zhenke Wu. Download: Github.
- Reference: Li M, Park D, Aziz M, Liu CM, Price L, Wu Z (2021). Integrating Sample Similarity Information into Latent Class Analysis: A Tree-Structured Shrinkage Approach. Biometrics.
rewind
- Package for fitting Bayesian restricted latent class models.
- Faculty: Zhenke Wu. Download: Github.
- Reference: Wu, Z., Casciola-Rosen, L., Rosen, A. and Zeger, S.L., 2018. A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. arXiv preprint arXiv:1808.08326. <doi:10.1101/400192>.
spotgear
- Package for fitting Bayesian two-dimensional image dewarping models and estimating disease subsets and signatures.
- Faculty: Zhenke Wu. Download: Github, Website.
- Reference: Wu, Z., Casciola-Rosen, L., Shah, A.A., Rosen, A. and Zeger, S.L., 2017. Estimating autoantibody signatures to detect autoimmune disease patient subsets. Biostatistics, 20(1), pp.30-47. <doi:10.1093/biostatistics/kxx061>.
ADMMnet
- Fits linear and Cox models regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty, and their adaptive forms, such as adaptive lasso and net adjusting for signs of linked coefficients. In addition, it treats the number of non-zero coefficients as another tuning parameter and selects it simultaneously with the regularization parameter. The package uses a one-step coordinate descent algorithm and runs fast by exploiting the sparsity structure of the coefficients.
- Faculty: Donglin Zeng. Download: Github.
APML0
- Fits linear, logistic and Cox models regularized with L0, lasso (L1), elastic-net (L1 and L2), or net (L1 and Laplacian) penalty, and their adaptive forms, such as adaptive lasso / elastic-net and net adjusting for signs of linked coefficients. It solves the L0 penalty problem by simultaneously selecting regularization parameters and performing hard-thresholding or selecting the number of non-zeros. This augmented and penalized minimization method provides an approximate solution to the L0 penalty problem but runs as fast as an L1-regularized fit. The package uses a one-step coordinate descent algorithm and runs fast by exploiting the sparsity structure of the coefficients. It can handle very high-dimensional data and has superior selection performance.
- Faculty: Donglin Zeng. Download: CRAN.
- Reference: Li, X., Xie, S., Zeng, D. and Wang, Y., 2018. Efficient ℓ0-norm feature selection based on augmented and penalized minimization. Statistics in medicine, 37(3), pp.473-486.
DTRlearn
- Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each stage by time-varying subject-specific features and intermediate outcomes observed in previous stages. This package implements three methods to estimate the optimal DTRs: O-learning (Zhao et al. 2012, 2014), Q-learning (Murphy et al. 2007; Zhao et al. 2009) and P-learning (Liu et al. 2014, 2015).
- Faculty: Donglin Zeng. Download: CRAN.
- Reference: Liu, Y., Wang, Y., Kosorok, M.R., Zhao, Y. and Zeng, D., 2018. Augmented outcome‐weighted learning for estimating optimal dynamic treatment regimens. Statistics in medicine, 37(26), pp.3776-3788.
MultiMlearn
- Estimates individualized treatment rules for multicategory treatments using a matched learning (M-learning) method combined with a one-versus-one approach.
- Faculty: Donglin Zeng. Download: Github.
SurvLong
- Provides kernel weighting methods for estimation of proportional hazards models with intermittently observed longitudinal covariates. Cao H., Churpek M. M., Zeng D., and Fine J. P. (2015) <doi:10.1080/01621459.2014.957289>.
- Faculty: Donglin Zeng. Download: CRAN.
- Reference: Cao, H., Churpek, M.M., Zeng, D. and Fine, J.P., 2015. Analysis of the proportional hazards model with sparse longitudinal covariates. Journal of the American Statistical Association, 110(511), pp.1187-1196.
CNVEM
- CNVEM is a Bayesian Expectation-Maximization algorithm that infers carrier status of CNVs in large samples from SNP genotyping data, such as are available in genome-wide association studies. Using Bayesian computations, the program calculates the posterior probability for carrier status of a known CNV in each individual of a sample by jointly analyzing genotype information and hybridization intensity. Signal intensity is modeled as a mixture of normal distributions, allowing for locus-specific and allele-specific distributions. Using an expectation-maximization algorithm, these distributions are estimated and then used to infer the carrier status of each individual within the boundaries of the CNV.
- Faculty: Sebastian Zöllner. Download: Website.
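The core EM idea described above can be illustrated with a much simpler model: a two-component normal mixture for a single intensity measurement per individual, where component 1 stands for carriers. The Python sketch below is only that simplification; CNVEM additionally uses genotype information and locus- and allele-specific distributions, and the function names, starting values and SD floor are assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def em_two_normal(x, pi=0.5, mu=(0.0, 1.0), sd=(1.0, 1.0), n_iter=200, sd_floor=1e-3):
    """EM for a two-component normal mixture of signal intensities.

    Component 0 = non-carrier, component 1 = carrier; pi = carrier frequency.
    Returns (pi, (mu0, mu1), (sd0, sd1), posterior carrier probabilities).
    Assumes both components keep some posterior mass during fitting.
    """
    mu0, mu1 = mu
    sd0, sd1 = sd
    n = len(x)
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each individual is a carrier
        post = []
        for xi in x:
            f0 = (1.0 - pi) * normal_pdf(xi, mu0, sd0)
            f1 = pi * normal_pdf(xi, mu1, sd1)
            post.append(f1 / (f0 + f1))
        # M-step: posterior-weighted updates of proportion, means and SDs
        n1 = sum(post)
        n0 = n - n1
        pi = n1 / n
        mu1 = sum(p * xi for p, xi in zip(post, x)) / n1
        mu0 = sum((1.0 - p) * xi for p, xi in zip(post, x)) / n0
        sd1 = max(math.sqrt(sum(p * (xi - mu1) ** 2 for p, xi in zip(post, x)) / n1), sd_floor)
        sd0 = max(math.sqrt(sum((1.0 - p) * (xi - mu0) ** 2 for p, xi in zip(post, x)) / n0), sd_floor)
    return pi, (mu0, mu1), (sd0, sd1), post

intensity = [-0.1, 0.0, 0.05, -0.05, 0.1, 0.02, -0.03, 0.95, 1.0, 1.05, 1.1, 0.9]
pi, means, sds, post = em_two_normal(intensity, mu=(0.0, 1.0), sd=(0.5, 0.5))
print(round(pi, 3), [round(p, 3) for p in post])
```

The SD floor guards against a component collapsing onto a single point, a standard failure mode of unconstrained normal-mixture EM.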
CoaCC
- Simulates a case-control study using a coalescent framework. It assumes a haploid sample of cases and a second haploid sample of controls. The genealogy of these two samples is generated according to the user-specified population history. From this genealogy, a distribution of marker haplotypes is generated by allowing for marker mutations and for recombination between marker and gene as well as between markers.
- Faculty: Sebastian Zöllner. Download: Website.
CopyMap
- CopyMap uses a hidden Markov model (HMM) to predict the location of CNVs and their allele frequencies using data from a set of CGH experiments.
- Faculty: Sebastian Zöllner. Download: Website.
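CopyMap's exact model is not described here. As a generic illustration of how an HMM can locate CNVs, the Python sketch below runs Viterbi decoding on a series of hypothetical CGH log-ratios, assuming Gaussian emissions with a shared SD, hypothetical state means (0.0 for normal copy number, 0.5 for a gain) and symmetric transitions. It only segments a single profile and does not estimate allele frequencies across experiments.

```python
import math

def viterbi_gaussian(obs, means, sd, p_stay, init=None):
    """Most likely hidden state path for an HMM with Gaussian emissions.

    obs: sequence of measurements (e.g. log-ratios along a chromosome).
    means: emission mean for each hidden state; sd: shared emission SD.
    p_stay: probability of staying in the same state between probes;
    the remaining mass is split evenly over the other states.
    """
    k = len(means)
    if init is None:
        init = [1.0 / k] * k
    p_move = (1.0 - p_stay) / (k - 1)

    def log_emit(x, s):
        return -0.5 * ((x - means[s]) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

    def log_trans(a, b):
        return math.log(p_stay if a == b else p_move)

    # score[s] = best log-probability of any path ending in state s
    score = [math.log(init[s]) + log_emit(obs[0], s) for s in range(k)]
    back = []  # back[t][s] = best previous state for state s at probe t+1
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(k):
            best = max(range(k), key=lambda r: score[r] + log_trans(r, s))
            ptr.append(best)
            new.append(score[best] + log_trans(best, s) + log_emit(x, s))
        back.append(ptr)
        score = new
    # Trace back from the best final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

log_ratios = [0.0, 0.05, -0.02, 0.5, 0.55, 0.48, 0.0, -0.03]
print(viterbi_gaussian(log_ratios, means=[0.0, 0.5], sd=0.1, p_stay=0.9))
```

Working in log space avoids numerical underflow on long probe series; a run of consecutive state-1 probes marks a candidate CNV segment.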
FTEC
- A coalescent simulator capable of modeling faster than exponential population growth.
- Faculty: Sebastian Zöllner. Download: Github.
- References: Reppell, M., Boehnke, M. and Zöllner, S., 2012. FTEC: a coalescent simulator for modeling faster than exponential growth. Bioinformatics, 28(9), pp.1282-1283.
TRAFIC
- TRAFIC (Test for Rare-variant Association using Family-based Internal Controls) tests for rare variant associations in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent (IBD) to the allele count of rare variants on non-shared chromosome regions.
- Faculty: Sebastian Zöllner. Download: Github.
- References: Lin, K.H. and Zöllner, S., 2015. Robust and powerful affected sibpair test for rare variant association. Genetic epidemiology, 39(5), pp.325-333.
TreeLD
- The package TreeLD is a free software tool for mapping complex trait loci. TreeLD performs a multipoint LD-analysis by inferring the ancestry of a genomic region and analyzing this ancestry for signals of disease mutations. The generated likelihoods can be used to test for the presence of a disease locus and to fine-map its location, providing a point estimate and a credible region. Furthermore, the package provides a novel way of visualizing the association signal in a sample. TreeLD is designed for high-density SNP haplotypes and can be applied to case-control data, TDT trio data and quantitative trait data.
- Faculty: Sebastian Zöllner. Download: Github.
- References: Zöllner, S. and Pritchard, J.K., 2005. Coalescent-based association mapping and fine mapping of complex trait loci. Genetics, 169(2), pp.1071-1092.
AEenrich
- We extend existing gene enrichment tests to perform adverse event enrichment analysis. Unlike continuous gene expression data, adverse event data are counts and therefore contain many zeros and ties. We propose two enrichment tests: a modified Fisher's exact test based on pre-selected significant adverse events, and a test based on a modified Kolmogorov-Smirnov statistic. We add covariate adjustment to improve the analysis. "Adverse event enrichment tests using VAERS", Shuoran Li, Lili Zhao (2020).
- Faculty: Lili Zhao. Download: CRAN.
- Reference: Li, S. and Zhao, L., 2021. Vaccine adverse event enrichment tests. Statistics in medicine, 40(19), pp.4269-4278.
Tnseq
- Identification of conditionally essential genes using high-throughput sequencing data from transposon mutant libraries.
- Faculty: Lili Zhao. Download: CRAN.
- Reference: Zhao, L., Anderson, M.T., Wu, W., Mobley, H.L. and Bachman, M.A., 2017. TnseqDiff: identification of conditionally essential genes in transposon sequencing studies. BMC bioinformatics, 18(1), p.326.
BASS
- BASS is a method for multi-scale and multi-sample analysis in spatial transcriptomics. BASS performs multi-scale transcriptomic analyses in the form of joint cell type clustering and spatial domain detection, with the two analytic tasks carried out simultaneously within a Bayesian hierarchical modeling framework. For both analyses, BASS properly accounts for the spatial correlation structure and seamlessly integrates gene expression information with spatial localization information to improve their performance. In addition, BASS is capable of multi-sample analysis that jointly models multiple tissue sections/samples, facilitating the integration of spatial transcriptomic data across tissue samples.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Li, Z., Zhou, X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol 23, 168 (2022).
CARD
- CARD is a reference-based deconvolution method that estimates cell type composition in spatial transcriptomics based on cell type specific expression information obtained from reference scRNA-seq data. A key feature of CARD is its ability to accommodate spatial correlation in the cell type composition across tissue locations, enabling accurate and spatially informed cell type deconvolution as well as refined spatial map construction. CARD relies on an efficient optimization algorithm for constrained maximum likelihood estimation and is scalable to spatial transcriptomics with tens of thousands of spatial locations and tens of thousands of genes. CARD is implemented as an open-source R package.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Ying Ma, Xiang Zhou. Spatially Informed Cell Type Deconvolution for Spatial Transcriptomics, 2021.
CoCoNet
- CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing studies into GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect sizes for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Shang, L., Smith, J.A. and Zhou, X., 2020. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS genetics, 16(4), p.e1008734.
DBSLMM
- DBSLMM is the software implementing the Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM), which can be used to construct polygenic scores (PGS). It fits a linear mixed model using summary statistics, an LD matrix and LD block information. It is computationally efficient and accurate for biobank-scale GWAS data and uses freely available open-source numerical libraries.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Sheng Yang, Xiang Zhou (2019). Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. bioRxiv.
DPR
- DPR is a software package implementing the latent Dirichlet process regression method for genetic prediction of complex traits.
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Ping Zeng and Xiang Zhou (2017). Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nature Communications. 8: 456.
GECKO
- Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Gao, B., Yang, C., Liu, J. and Zhou, X., 2021. Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies. PLoS genetics, 17(1), p.e1009293.
GEMMA
- GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS).
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Xiang Zhou and Matthew Stephens (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics. 44: 821–824.
iDEA
- Integrative Differential expression and gene set Enrichment Analysis using summary statistics for single cell RNAseq studies.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Ying Ma, Shiquan Sun, Evan T. Keller, Mengjie Chen and Xiang Zhou. Integrative differential expression and gene set enrichment analysis using summary statistics for single cell RNAseq studies, Nature Communications 2020.
iMAP
- iMAP is a method which performs integrative mapping of pleiotropic association and functional annotations using penalized Gaussian mixture models.
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Ping Zeng, Xingjie Hao and Xiang Zhou. Pleiotropic Mapping and Annotation Selection in Genome-wide Association Studies with Penalized Gaussian Mixture Models. bioRxiv 2018. Doi: 10.1101/256461.
IMAGE
- IMAGE is a method that performs methylation quantitative trait locus (mQTL) mapping in bisulfite sequencing studies.
- Faculty: Xiang Zhou. Download: CRAN, Github, Website.
- Reference: Yue Fan, Tauras P. Vilgalys, Shiquan Sun, Qinke Peng, Jenny Tung and Xiang Zhou (2019). High-powered detection of genetic effects on DNA methylation using integrated methylation QTL mapping and allele-specific analysis. bioRxiv.
IRIS
- IRIS is a reference-informed integrative method for detecting spatial domains on multiple tissue slices from spatial transcriptomics with spot-level, single-cell-level, or subcellular-level resolution.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Ying Ma, Xiang Zhou. Integrative and Reference-Informed Spatial Domain Detection for Spatial Transcriptomics, 2023.
MACAU
- MACAU is the software implementing the Mixed model Association for Count data via data AUgmentation algorithm.
- Faculty: Xiang Zhou. Download: Website.
- Reference: Amanda J. Lea, Jenny Tung and Xiang Zhou (2015). A flexible, efficient binomial mixed model for identifying differential DNA methylation in bisulfite sequencing data. PLoS Genetics. 11: e1005650.
METRO
- METRO is a computational method that leverages expression data collected from multiple genetic ancestries to perform transcriptome-wide association analysis.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Zheng Li, Wei Zhao, Lulu Shang, Thomas H. Mosley, Sharon L.R. Kardia, Jennifer A. Smith and Xiang Zhou (2022). METRO: Multi-ancestry transcriptome-wide association studies for powerful gene-trait association detection. American Journal of Human Genetics.
MRAID
- MRAID is an R package for efficient statistical inference in two-sample Mendelian randomization analysis.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Yuan Z, Liu L, Guo P, Yan R, Xue F, Zhou X. Likelihood-based Mendelian randomization analysis with automated instrument selection and horizontal pleiotropic modeling. Science Advances 8, eabl5744.
mtPGS
- mtPGS is a statistical method that leverages multiple traits to construct accurate polygenic scores (PGS) for a target trait of interest.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Chang Xu, Santhi K. Ganesh, and Xiang Zhou (2023). mtPGS: Leverage multiple correlated traits for accurate polygenic score construction.
mvMAPIT
- mvMAPIT is an R package that generalizes the MAPIT implementation of Crawford et al. (2017) to any number of traits, as described by Stamp et al. (2023). The univariate MAPIT test for marginal epistasis is implemented as the special case of multivariate MAPIT with a single trait.
- Faculty: Xiang Zhou. Download: Github.
- Reference: L. Crawford, P. Zeng, S. Mukherjee, X. Zhou (2017). Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet. 13(7): e1006869.
OMR
- OMR is an open-source R package for two-sample Mendelian randomization analysis under an omnigenic genetic architecture.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Wang, L., Gao, B., Fan, Y., Xue, F. and Zhou, X., 2021. Mendelian randomization under the omnigenic architecture. Briefings in Bioinformatics, 22(6), p.bbab322.
PMR-Egger
- PMR-Egger is a method that fits probabilistic Mendelian randomization with an Egger regression assumption on horizontal pleiotropy for transcriptome-wide association studies (TWASs).
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Zhongshang Yuan, Huanhuan Zhu, Ping Zeng, Sheng Yang, Shiquan Sun, Can Yang, Jin Liu and Xiang Zhou (2019). Testing and controlling for horizontal pleiotropy with the probabilistic Mendelian randomization in transcriptome-wide association studies.
PQLseq
- PQLseq is a method that fits generalized linear mixed models for analyzing RNA sequencing and bisulfite sequencing data.
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Shiquan Sun*, Jiaqiang Zhu*, Sahar Mozaffari, Carole Ober, Mengjie Chen and Xiang Zhou (2018). Heritability estimation and differential analysis with generalized linear mixed models in genomic sequencing studies. Bioinformatics. in press.
SMART
- SMART is software implementing the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage method.
- Faculty: Xiang Zhou. Download: Website.
- Reference: Xingjie Hao, Ping Zeng, Shujun Zhang and Xiang Zhou (2018). Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies. PLoS Genetics. e1007186.
SPARK
- SPARK is a method for detecting genes with spatial expression patterns in spatially resolved transcriptomic studies.
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Shiquan Sun*, Jiaqiang Zhu* and Xiang Zhou (2019). Statistical analysis of spatial expression pattern for spatially resolved transcriptomic studies.
SpatialPCA
- SpatialPCA is a spatially aware dimension reduction method that explicitly accounts for the spatial correlation across tissue locations.
- Faculty: Xiang Zhou. Download: Github.
- Reference: Lulu Shang and Xiang Zhou (2022). Spatially aware dimension reduction for spatial transcriptomics. Nature Communications.
subgxe
- subgxe is an R package that implements p-value-assisted subset testing for association (pASTA), a method developed by Yu et al. (2019), doi:10.1159/000496867.
- Faculty: Bhramar Mukherjee, Xiang Zhou, Seunggeun Shawn Lee. Download: CRAN.
- Reference: Yu, Y., Xia, L., Lee, S., Zhou, X., Stringham, H.M., Boehnke, M. and Mukherjee, B., 2018. Subset-Based Analysis using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Human Heredity, 83(6), pp.283-314.
VIPER
- VIPER is a method that performs Variability Preserving ImPutation for Expression Recovery in single cell RNA sequencing studies.
- Faculty: Xiang Zhou. Download: Github, Website.
- Reference: Mengjie Chen and Xiang Zhou (2018). VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies. Genome Biology. 19:196.
WHODAD
- WHODAD is a software package implementing the WHODAD method for paternity inference from low-coverage sequencing data.
- Faculty: Xiang Zhou. Download: Website.
- Reference: Noah Snyder-Mackler, William H Majoros, Michael L Yuan, Amanda O Shaver, Jacob B Gordon, Gisela H Kopp, Stephen A Schlebusch, Jeffrey D Wall, Susan C Alberts, Sayan Mukherjee, Xiang Zhou and Jenny Tung (2016). Efficient genome-wide sequencing and low-coverage pedigree analysis from non-invasively collected samples. Genetics. 203: 699-714.