API Reference

BLiP

pyblip.blip.BLiP(samples=None, cand_groups=None, weight_fn='inverse_size', error='fdr', q=0.1, max_pep=0.25, deterministic=True, verbose=True, solver_verbose=False, perturb=True, max_iters=100, search_method='binary', return_problem_status=False, **kwargs)[source]

Given samples from a posterior or a list of CandidateGroup objects, performs resolution-adaptive signal detection to maximize power while controlling (e.g.) the FDR.

Note: when working with image data or a continuous set of locations, consider using BLiP_cts.

Parameters
samplesnp.array

An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal. Defaults to None.

cand_groupslist

A list of CandidateGroup objects. Defaults to None. Note either cand_groups or samples must be provided.

weight_fnstring or function

A function mapping CandidateGroup objects to nonnegative weights. This defines the power function. Also can be ‘inverse_size’, ‘log_inverse_size’, or ‘prespecified’. Defaults to ‘inverse_size’.

errorstring

The type of error rate to control. Must be one of ‘fdr’, ‘local_fdr’, ‘fwer’, ‘pfer’.

qfloat

The level at which to control the error rate.

max_pepfloat

BLiP automatically filters out candidate groups with a posterior error probability (PEP) above max_pep. Default: 0.5.

deterministicbool

If True, guarantees a deterministic solution. Defaults to True.

verbosebool

If True, gives occasional progress reports.

solver_verbosebool

If True, the underlying LP solver is verbose.

search_methodstring

For FWER control, how to find the optimal parameter for the LP. Either “none” or “binary.” Defaults to “binary”.

max_itersint

Maximum number of binary-search iterations for FWER problem. Defaults to 100.

perturbbool

If True, will perturb the weight function. Defaults to True.

return_problem_statusFalse

If True, will return extra information about the problem.

Returns
detectionslist

A list of CandidateGroup objects, which asserts that each group contains at least one signal.

problem_statusdict

A dict containing information about the BLiP optimization problem. Only returned if return_problem_status=True.

Examples

Here we fit BLiP to perform resolution-adaptive variable selection for a Gaussian linear model:

import numpy as np
import scipy.linalg
## Synthetic data generating process with AR1 design matrix
n, p, nsignals, rho = 100, 500, 20, 0.95
c = np.cumsum(np.zeros(p) + np.log(rho)) - np.log(rho)
cov = scipy.linalg.toeplitz(np.exp(c))
X = np.dot(np.random.randn(n, p), np.linalg.cholesky(cov).T)
# Sparse coefficients for linear model
beta = np.zeros(p)
signal_locations = np.random.choice(np.arange(p), nsignals)
beta[signal_locations] = np.random.randn(nsignals)
y = np.dot(X, beta) + np.random.randn(n)

## Fit linear spike and slab model
import pyblip
lm = pyblip.linear.LinearSpikeSlab(X=X, y=y)
lm.sample(N=1000, chains=10, bsize=2)

## Run BLiP
detections = pyblip.blip.BLiP(
        samples=lm.betas,
        q=0.1,
        error='fdr'
)
pyblip.blip.BLiP_cts(locs, grid_sizes=array([4.000e+00, 6.000e+00, 8.000e+00, 1.100e+01, 1.500e+01, 2.000e+01, 2.800e+01, 3.900e+01, 5.400e+01, 7.500e+01, 1.040e+02, 1.440e+02, 2.000e+02, 2.770e+02, 3.840e+02, 5.320e+02, 7.370e+02, 1.021e+03, 1.414e+03, 1.959e+03, 2.714e+03, 3.761e+03, 5.210e+03, 7.218e+03, 1.000e+04]), weight_fn=<function inverse_radius_weight>, extra_centers=None, max_pep=0.25, shape='circle', min_blip_size=1500, verbose=True, **kwargs)[source]

BLiP when the set of locations is continuous, e.g., when working with image data.

Parameters
locsnp.ndarray

A (N, num_disc, d)-dimensional array, where N is the number of posterior samples, d is the dimension. Each point represents to a signal in a posterior sample. NANs are ignored (useful when the number of signals changes between posterior iterations.)

grid_sizeslist or np.ndarray

List of grid sizes to split up the locations.

weight_fnstring or function

A function mapping CandidateGroup objects to nonnegative weights. This defines the power function. Defaults to inverse_radius.

extra_centersnp.ndarray

A (ncenters, d)-dimensional array. At each resolution, candidate groups will be computed with centers at these locations.

max_pepfloat

The maximum allowable PEP for output candidate groups. Defaults to 0.25.

shapestring

One of square or circle

min_blip_sizeint

Combines connected components so all subproblems are of at least this size.

verbosebool

If True (default), gives occasional status updates.

kwargsdict

Additional arguments to pass to the underlying BLiP call, such as the error rate or nominal level.

Returns
detectionslist

A list of CandidateGroup objects, which asserts that each group contains at least one signal.

problem_statusdict

A dict containing information about the BLiP optimization problem. Only returned if return_problem_status=True.

pyblip.blip.binarize_selections(cand_groups, q, error, deterministic, problem_status=None, return_problem_status=False, tol=0.001, nsample=10, verbose=False)[source]
Parameters
cand_groupslist

list of candidate groups.

qfloat

Level at which to control the error rate.

errorstring

The error to control: one of ‘fdr’, ‘fwer’, ‘pfer’, ‘local_fdr’

deterministicbool

If True, will not use a randomized solution.

nsampleint

Number of samples for randomized version.

Returns
detectionslist

List of candidate groups which have been detected.

Creating Candidate Groups

Variable selection / discrete locations

Functions for creating candidate groups from posterior samples

class pyblip.create_groups.CandidateGroup(group, pep, data=None)[source]

A single candidate group as an input to BLiP. Discovering a group asserts there is at least one signal in the group.

Parameters
grouplist or set

Set of locations in the group.

pepfloat

Posterior error probability (1 - PIP) for the group.

datadict

Miscallaneous other attributes to associate with the group.

Methods

to_dict()

Converts CandidateGroup into a dictionary.

to_dict()[source]

Converts CandidateGroup into a dictionary.

pyblip.create_groups.all_cand_groups(samples, X=None, q=0, max_pep=0.25, max_size=25, prenarrow=True, prefilter_thresholds=[0, 0.01, 0.02, 0.03, 0.05, 0.1, 0.2], min_purity=0.0, corr_matrix=None)[source]

Creates many candidate groups by prefiltering locations at various thresholds and then creating sequential and hierarchical candidate groups.

Parameters
samplesnp.ndarray

An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.

Xnp.array

The n x p design matrix. Defaults to None. If provided, adds hierarchical groups based on a correlation cluster of X.

qfloat

The nominal level at which to control the error rate.

max_pepfloat

The maximum posterior error probability allowed in a candidate group. Default is 0.25.

max_sizefloat

Maximum size of a group. Default is 25.

prenarrowbool

If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.

prefilter_thresholdslist

List of thresholds at which to prefilter the locations.

min_purityfloat

Minimum acceptable absolute correlation within a candidate group. Defaults to zero.

corr_matrixnp.array

Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns
cand_groupslist

A list of CandidateGroup objects.

pyblip.create_groups.filter_by_purity(cand_groups, min_purity=0, corr_matrix=None, X=None)[source]

Filters candidate groups to only include groups of features whose minimum absolute correlation is larger than min_purity.

Parameters
cand_groupslist

A list of CandidateGroup objects.

min_purityfloat

Minimum acceptable purity.

corr_matrixnp.array

p x p correlation matrix

Xnp.array

n x p design matrix

pyblip.create_groups.hierarchical_groups(samples, dist_matrix=None, max_pep=0.25, max_size=25, filter_sequential=False, min_purity=0, X=None, corr_matrix=None, **kwargs)[source]

Creates candidate groups by hierarchical clustering.

Parameters
samplesnp.ndarray

An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.

dist_matrixnp.ndarray or list

A square numpy array corresponding to distances between locations. Can also be a list of different distance matrices. The default is to use a correlation matrix based on samples.

max_pepfloat

The maximum posterior error probability allowed in a candidate group. Default is 1.

max_sizefloat

Maximum size of a group. Default is 25.

filter_sequentialbool

If True, does not calculate PEPs for sequential (contiguous) groups of variables to avoid duplicates.

min_purityfloat

Minimum acceptable absolute correlation within a candidate group. Defaults to zero.

Xnp.array

Optional input; n x p design matrix. Only used if corr_matrix=None and min_purity > 0.

corr_matrixnp.array

Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns
cand_groupslist

A list of CandidateGroup objects.

pyblip.create_groups.sequential_groups(samples=None, susie_alphas=None, q=0, max_pep=0.25, max_size=25, prenarrow=False, min_purity=0.0, X=None, corr_matrix=None)[source]

Calculates all sequential candidate groups below max_size.

Parameters
samplesnp.ndarray

An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.

susie_alphasnp.ndarray

As an alternative to posterior samples, users may specify an L x p matrix of alphas from a SuSiE object. However, calling susie_groups is recommended instead.

qfloat

The nominal level at which to control the error rate.

max_pepfloat

The maximum posterior error probability allowed in a candidate group. Default is 1.

max_sizefloat

Maximum size of a group. Default is 25.

prenarrowbool

If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.

min_purityfloat

Minimum acceptable absolute correlation within a candidate group. Defaults to zero.

Xnp.array

Optional input; n x p design matrix. Only used if corr_matrix=None and min_purity > 0.

corr_matrixnp.array

Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns
cand_groupslist

A list of CandidateGroup objects.

pyblip.create_groups.susie_groups(alphas, q, X=None, max_pep=0.25, max_size=25, prenarrow=False, purity_threshold=0.0, k_threshold=None, min_purity=0.0, corr_matrix=None)[source]

Creates candidate groups based on a SuSiE fit.

Parameters
alphasnp.array

An (L, p)-shaped matrix of alphas from a SuSiE object, L is the number of SuSiE iterations and p is the number of covariates.

qfloat

The level at which to control the error rate

Xnp.array

The n x p design matrix. Defaults to None. If provided, adds hierarchical groups based on a correlation cluster of X.

max_pepfloat

The maximum posterior error probability allowed in a candidate group. Default is 1.

max_sizefloat

Maximum size of a group. Default is 25.

prenarrowbool

If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.

purity_thresholdfloat

When computing PIPs, ignores SuSiE iterations which do not pass this purity threshold, as in the original SusiE paper.

k_thresholdfloat

When computing PIPs, ignores SuSiE iterations for which the vanilla SuSiE CS exceeds this size.

min_purityfloat

Minimum acceptable absolute correlation within a candidate group. Defaults to zero.

corr_matrixnp.array

Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns
cand_groupslist

A list of CandidateGroup objects.

Continuous sets of locations

Creates candidate groups when signals could appear anywhere in a continuous d-dimensional space.

pyblip.create_groups_cts.grid_peps(locs, grid_sizes, count_signals=False, extra_centers=None, max_pep=0.25, log_interval=None, time0=None, shape='square')[source]
Parameters
locsnp.array

A (N, num_disc, d)-dimensional array. Here, N is the number of samples from the posterior, d is the number of dimensions of the space, and each point corresponds to a signal in a particular posterior sample.

grid_sizeslist or np.ndarray

List of grid-sizes to split up the locations.

count_signalsbool

If True, the number of signals in each group is counted.

extra_centersnp.ndarray

A (ncenters, d)-dimensional array. At each resolution, candidate groups will be computed with centers at these locations.

shapestring

One of square or circle

Returns
cand_groupslist

A list of CandidateGroup objects.

pyblip.create_groups_cts.grid_peps_to_cand_groups(filtered_peps, time0=None, min_blip_size=1000, verbose=False, shape='square', max_pep=0.25, min_pep=0.001)[source]

Turns the output of the grid_peps function into a list of list of CandidateGroups. Each sub-list corresponds to a list of completely disconnected CandidateGroups which can be fed to BLiP separately (this saves computation).

filtered_pepsdict

An output of the grid_peps function.

time0float

The initial time the analysis started, useful for logging.

min_blip_sizeint

Combines connected components so all subproblems are at least this size.

verbose: bool

If True, will report progress over time. Default: False.

shapestring

One of square or circle

max_pepfloat

The maximum pep for candidate groups. Default: 1.

min_pepfloat

Once we achieve a pep of this level for candidate groups, we do not search to find peps any lower. Default: 0.001.

pyblip.create_groups_cts.normalize_locs(locs)[source]
Returns
locsnp.array

locs, but normalized such that all values lie in [0,1]. NAs are converted to -1.

shiftsnp.array

d-dimensional array corresponding to the shifts applied in the normalizatin.

scalesnp.array

d-dimension array corresponding to the scales in the normalization.

Edge clique cover

Polynomial-time function to find a (heuristically) minimal edge clique cover

pyblip.ecc.edge_clique_cover(G)[source]
Parameters
Gnetworkx Graph

Undirected graph.

Returns
cliqueslist

List of cliques. Each clique is a list of ints where the integers correspond to the nodes of G.

Weight functions

A few default choices of weight function to define power.

pyblip.weight_fns.inverse_area_weight(cand_group)[source]

Weights candidate groups by the inverse of their area.

Parameters
cand_groupCandidateGroup

The candidate group

Returns
weightfloat

The associated weight when calculating power.

pyblip.weight_fns.inverse_radius_weight(cand_group)[source]

Weights candidate groups by the inverse of the radius.

Parameters
cand_groupCandidateGroup

The candidate group

Returns
weightfloat

The associated weight when calculating power.

pyblip.weight_fns.inverse_size(cand_group)[source]

Weights candidate groups by 1 / len(cand_group.group)

Parameters
cand_groupCandidateGroup

The candidate group

Returns
weightfloat

The associated weight when calculating power.

pyblip.weight_fns.log_inverse_size(cand_group)[source]

Weights candidate groups by 1 / (1 + log2(len(cand_group.group)))

Parameters
cand_groupCandidateGroup

The candidate group

Returns
weightfloat

The associated weight when calculating power.

Bayesian Samplers

BLiP can wrap on top of nearly any Bayesian model or algorithm. However, for convenience, pyblip includes a few custom MCMC samplers.

Linear Spike and Slab

class pyblip.linear.linear.LinearSpikeSlab(X, y, p0=0.9, p0_a0=1, p0_b0=1, update_p0=True, min_p0=0, sigma2=1, update_sigma2=True, sigma2_a0=2, sigma2_b0=1, tau2=1, tau2_a0=2, tau2_b0=1, update_tau2=True)[source]

Spike-and-slab model for linear regression.

Parameters
Xnp.array

(n,p)-shaped design matrix.

ynp.array

n-length array of responses.

p0float

Prior probability that any coefficient equals zero.

update_p0bool

If True, updates p0 using a Beta hyperprior on p0. Else, the value of p0 is fixed.

p0_a0float

If update_p0 is True, p0 has a Beta(p0_a0, p0_b0, min_p0) hyperprior.

p0_b0float

If update_p0 is True, p0 has a TruncBeta(p0_a0, p0_b0, min_p0) hyperprior.

min_p0float

Minimum value for p0 as specified by the prior.

sigma2float

Variance of y given X.

update_sigma2bool

If True, imposes an InverseGamma hyperprior on sigma2. Else, the value of sigma2 is fixed.

sigma2_a0float

If update_sigma2 is True, sigma2 has an InvGamma(sigma2_a0, sigma2_b0) hyperprior.

sigma2_b0float

If update_sigma2 is True, sigma2 has an InvGamma(sigma2_a0, sigma2_b0) hyperprior.

tau2float

Prior variance on nonzero coefficients.

update_tau2bool

If True, imposes an InverseGamma hyperprior on tau2. Else, the value of tau2 is fixed.

tau2_a0float

If update_tau2 is True, tau2 has an InvGamma(tau2_a0, tau2_b0) hyperprior.

tau2_b0float

If update_tau2 is True, tau2 has an InvGamma(tau2_a0, tau2_b0) hyperprior.

Methods

sample:

Samples from the posterior using Gibbs sampling.

sample(N, burn=100, chains=1, num_processes=1, bsize=1, max_signals_per_block=None)[source]
Nint

Number of samples per chain

burnint

Number of samples to burn per chain

chainsint

Number of chains to run

num_processesint

How many processes to use

bsizeint

Maximum block size within gibbs sampling. Default: 1.

max_signals_per_blockint

Maximum number of signals allowed per block. Default: None (this places no restrictions on the number of signals per block). The default is highly recommended.

Probit Spike and Slab

class pyblip.probit.probit.ProbitSpikeSlab(*args, **kwargs)[source]

Spike-and-slab model for probit regression. The arguments are identical to those for LinearSpikeSlab, except y should be binary.

Methods

sample:

Samples from the posterior using Gibbs sampling.

sample(N, burn=100, chains=1, num_processes=1, bsize=1, max_signals_per_block=None)[source]
Parameters
Nint

Number of samples per chain

burnint

Number of samples to burn per chain

chainsint

Number of chains to run

num_processesint

How many processes to use

bsizeint

Maximum block size within gibbs sampling. Default: 1.

max_signals_per_blockint

Maximum number of signals allowed per block. Default: None (this places no restrictions on the number of signals per block). The default is highly recommended.

Neuronized Priors

class pyblip.nprior.nprior.NPrior(X, y, tauw2, p0=None, update_p0=True, min_p0=1e-10, sigma_prior_type=0, sigma_a0=5, sigma_b0=1, alpha0_a0=1, alpha0_b0=1)[source]

Implements Neuronized Prior sampler for spike-and-slab regression.

Parameters
Xnp.array

(n,p)-shaped design matrix

ynp.array

(n,)-shaped vector of response data

p0float

The initial parameter for the proportion of nulls. Defaults to 1 - min(0.01, 1/p).

update_p0bool

If true, will update p0 throughout MCMC sampling using a uniform hyperprior.

min_p0float

If updating p0 throughout uniform sampling, will force p0 to be above min_p0. This can dramatically speed up computation in very high-dimensional settings.

sigma_prior_typeinteger

If 0, assumes sigma2 is conditionally independent of the coefficients given the residuals.

tauw2float

prior variance of the weight parameter

a0float

sigma2 has an inverse-gamma prior with parameters a0, b0

b0float

sigma2 has an inverse-gamma prior with parameters a0, b0

Notes

See https://arxiv.org/pdf/1810.00141.pdf.

Methods

sample([N, burn, chains, num_processes, ...])

Parameters

sample(N=100, burn=10, chains=1, num_processes=1, joint_sample_W=True, group_alpha_update=True, log_interval=None)[source]
Parameters
Nint

The number of samples to draw from the chain

burnint

The burn-in period for each chain.

chainsint

The number of independent MCMC chains to run.

num_processesint

The number of processes to run the chains.

joint_sample_Wbool

If true, will jointly sample the “W” variables at each iteration before individually resampling alpha and W. This can improve sample efficiency but is a computational bottleneck in high dimensions.

group_alpha_updatebool

If true, does a joint group-move update to estimate the sparsity. Else, uses the standard conjugacy rules for a Uniform prior on the sparsity.

log_intervalint

Will log progress after log_interval iterations. Defaults to None (no logging).

Changepoint Detection

pyblip.changepoint.detect_changepoints(Y, q=0.1, lm_kwargs={}, sample_kwargs={}, blip_kwargs={})[source]

Changepoint detection with BLiP using the LinearSpikeSlab sampler.

Parameters
Ynp.array

Array of observations in order they were observed.

qfloat

Level at which to control the FDR.

**kwargsdict

Optional inputs to linear spike slab model.

**sample_kwargsdict

Optional inputs to sample method of linear spike slab model.

**blip_kwargsdict

Optional inputs to BLiP.