
Introduction

In many applications, we can tell that a signal of interest exists but cannot perfectly “localize” it. For example, when regressing an outcome Y on highly correlated covariates (X1, X2), the data may suggest that at least one of (X1, X2) influences Y, but it may be challenging to tell which of (X1, X2) is important. Likewise, in genetic fine-mapping, biologists may have high confidence that a gene influences a disease without knowing precisely which genetic variants cause the disease. Similar problems arise in many settings with spatial or temporal structure, including change-point detection and astronomical point-source detection.

Bayesian Linear Programming (BLiP) is a method which jointly detects as many signals as possible while localizing them as precisely as possible. BLiP can wrap on top of nearly any Bayesian model or algorithm, and it will return a set of regions which each contain at least one signal with high confidence. For example, in regression problems, BLiP might return the region (X1, X2), which suggests that at least one of (X1, X2) is an important variable. BLiP controls the false discovery rate while also making these regions as narrow as possible, meaning that (roughly speaking) it will perfectly localize signals whenever this is possible!
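To illustrate why localization can be hard, here is a minimal sketch of the correlated-covariates setting above. The variable names and simulation parameters are purely illustrative and are not part of blipr:

# Two highly correlated covariates; only X2 truly influences Y.
set.seed(1)
n <- 100
X1 <- rnorm(n)
X2 <- X1 + 0.05 * rnorm(n)   # X1 and X2 are nearly collinear
Y  <- X2 + rnorm(n)
# Least squares cannot confidently attribute the signal to X1 or X2 alone;
# a method like BLiP can instead report the region {X1, X2}.
summary(lm(Y ~ X1 + X2))$coefficients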

blipr is an efficient R implementation of BLiP, designed so that BLiP can wrap on top of nearly any Bayesian model in only one or two lines of code.

Installation

You can install blipr from GitHub:

# install.packages("remotes")
remotes::install_github("amspector100/blipr")

You may also need to install rcbc from GitHub:

remotes::install_github("dirkschumacher/rcbc")

If rcbc installation fails, please see the rcbc homepage.

Minimal example

Here, we apply BLiP to perform variable selection in a sparse linear regression. The first step is to generate synthetic data and fit a Bayesian model using, e.g., the ScaleSpikeSlab or NPrior packages.

# Generate sparse linear regression data
set.seed(123); n <- 100; p <- 200
data <- blipr::generate_regression_data(n=n, p=p)

# Fit a Bayesian spike-and-slab model using (e.g.) ScaleSpikeSlab
s3lm <- ScaleSpikeSlab::spike_slab_linear(
  chain_length=5000, X=data$X, y=data$y, tau0=0.01, tau1=1, q=10/p
)
post_samples <- s3lm$z

# Run BLiP on the posterior samples
detections <- blipr::BLiP(
  samples=post_samples, q=0.1, error='fdr'
)
for (j in seq_along(detections)) {
  group <- paste(detections[[j]]$group, collapse=', ')
  cat("BLiP has detected a signal in {", group, "}.\n", sep='')
}
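The detections can also be summarized numerically. A short sketch, assuming only the list structure used in the loop above (each detection carrying a group vector); no other fields of the output are assumed:

# Number of detected regions and how precisely they localize signals
cat("Number of detections:", length(detections), "\n")
group_sizes <- sapply(detections, function(d) length(d$group))
cat("Average group size (smaller = sharper localization):",
    mean(group_sizes), "\n")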

Documentation

Documentation and tutorials are available at amspector100.github.io/blipr.

Development notes

Currently, two utility functions (lattice_peps and hierarchical_groups) are slower than their counterparts in the Python package, pyblip. These are just helper functions, and blipr's core functionality is quite fast; nonetheless, we hope to substantially speed up these functions in the near future, which will in turn speed up the BLiP_cts wrapper.

Reference

If you use blipr or BLiP in an academic publication, please consider citing Spector and Janson (2022). The BibTeX entry is below:

@article{AS-LJ:2022,
  title={Controlled Discovery and Localization of Signals via Bayesian Linear Programming},
  author={Spector, Asher and Janson, Lucas},
  journal={arXiv preprint arXiv:2203.17208},
  url={https://arxiv.org/abs/2203.17208},
  year={2022}
}