API Reference

BLiP

pyblip.blip.BLiP(samples=None, cand_groups=None, weight_fn='inverse_size', error='fdr', q=0.1, max_pep=0.25, deterministic=True, verbose=True, solver_verbose=False, perturb=True, max_iters=100, search_method='binary', return_problem_status=False, **kwargs)[source]

Given samples from a posterior or a list of CandidateGroup objects, performs resolution-adaptive signal detection to maximize power while controlling (e.g.) the FDR.

Note: when working with image data or a continuous set of locations, consider using BLiP_cts.

Parameters

samplesnp.array: An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal. Defaults to None.
cand_groupslist: A list of CandidateGroup objects. Defaults to None. Note either cand_groups or samples must be provided.
weight_fnstring or function: A function mapping CandidateGroup objects to nonnegative weights. This defines the power function. Also can be ‘inverse_size’, ‘log_inverse_size’, or ‘prespecified’. Defaults to ‘inverse_size’.
errorstring: The type of error rate to control. Must be one of ‘fdr’, ‘local_fdr’, ‘fwer’, ‘pfer’.
qfloat: The level at which to control the error rate.
max_pepfloat: BLiP automatically filters out candidate groups with a posterior error probability (PEP) above max_pep. Default: 0.5.
deterministicbool: If True, guarantees a deterministic solution. Defaults to True.
verbosebool: If True, gives occasional progress reports.
solver_verbosebool: If True, the underlying LP solver is verbose.
search_methodstring: For FWER control, how to find the optimal parameter for the LP. Either “none” or “binary.” Defaults to “binary”.
max_itersint: Maximum number of binary-search iterations for FWER problem. Defaults to 100.
perturbbool: If True, will perturb the weight function. Defaults to True.
return_problem_statusFalse: If True, will return extra information about the problem.

Returns

detectionslist: A list of CandidateGroup objects, which asserts that each group contains at least one signal.
problem_statusdict: A dict containing information about the BLiP optimization problem. Only returned if return_problem_status=True.

Examples

Here we fit BLiP to perform resolution-adaptive variable selection for a Gaussian linear model:

import numpy as np
import scipy.linalg
## Synthetic data generating process with AR1 design matrix
n, p, nsignals, rho = 100, 500, 20, 0.95
c = np.cumsum(np.zeros(p) + np.log(rho)) - np.log(rho)
cov = scipy.linalg.toeplitz(np.exp(c))
X = np.dot(np.random.randn(n, p), np.linalg.cholesky(cov).T)
# Sparse coefficients for linear model
beta = np.zeros(p)
signal_locations = np.random.choice(np.arange(p), nsignals)
beta[signal_locations] = np.random.randn(nsignals)
y = np.dot(X, beta) + np.random.randn(n)

## Fit linear spike and slab model
import pyblip
lm = pyblip.linear.LinearSpikeSlab(X=X, y=y)
lm.sample(N=1000, chains=10, bsize=2)

## Run BLiP
detections = pyblip.blip.BLiP(
        samples=lm.betas,
        q=0.1,
        error='fdr'
)

pyblip.blip.BLiP_cts(locs, grid_sizes=array([4.000e+00, 6.000e+00, 8.000e+00, 1.100e+01, 1.500e+01, 2.000e+01, 2.800e+01, 3.900e+01, 5.400e+01, 7.500e+01, 1.040e+02, 1.440e+02, 2.000e+02, 2.770e+02, 3.840e+02, 5.320e+02, 7.370e+02, 1.021e+03, 1.414e+03, 1.959e+03, 2.714e+03, 3.761e+03, 5.210e+03, 7.218e+03, 1.000e+04]), weight_fn=<function inverse_radius_weight>, extra_centers=None, max_pep=0.25, shape='circle', min_blip_size=1500, verbose=True, **kwargs)[source]

BLiP when the set of locations is continuous, e.g., when working with image data.

Parameters

locsnp.ndarray: A (N, num_disc, d)-dimensional array, where N is the number of posterior samples, d is the dimension. Each point represents to a signal in a posterior sample. NANs are ignored (useful when the number of signals changes between posterior iterations.)
grid_sizeslist or np.ndarray: List of grid sizes to split up the locations.
weight_fnstring or function: A function mapping CandidateGroup objects to nonnegative weights. This defines the power function. Defaults to inverse_radius.
extra_centersnp.ndarray: A (ncenters, d)-dimensional array. At each resolution, candidate groups will be computed with centers at these locations.
max_pepfloat: The maximum allowable PEP for output candidate groups. Defaults to 0.25.
shapestring: One of square or circle
min_blip_sizeint: Combines connected components so all subproblems are of at least this size.
verbosebool: If True (default), gives occasional status updates.
kwargsdict: Additional arguments to pass to the underlying BLiP call, such as the error rate or nominal level.

Returns

detectionslist: A list of CandidateGroup objects, which asserts that each group contains at least one signal.
problem_statusdict: A dict containing information about the BLiP optimization problem. Only returned if return_problem_status=True.

pyblip.blip.binarize_selections(cand_groups, q, error, deterministic, problem_status=None, return_problem_status=False, tol=0.001, nsample=10, verbose=False)[source]

Parameters

cand_groupslist: list of candidate groups.
qfloat: Level at which to control the error rate.
errorstring: The error to control: one of ‘fdr’, ‘fwer’, ‘pfer’, ‘local_fdr’
deterministicbool: If True, will not use a randomized solution.
nsampleint: Number of samples for randomized version.

Returns

detectionslist: List of candidate groups which have been detected.

Creating Candidate Groups

Variable selection / discrete locations

Functions for creating candidate groups from posterior samples

class pyblip.create_groups.CandidateGroup(group, pep, data=None)[source]

A single candidate group as an input to BLiP. Discovering a group asserts there is at least one signal in the group.

Parameters

grouplist or set: Set of locations in the group.
pepfloat: Posterior error probability (1 - PIP) for the group.
datadict: Miscallaneous other attributes to associate with the group.

Methods

to_dict()

Converts CandidateGroup into a dictionary.

to_dict()[source]: Converts CandidateGroup into a dictionary.

pyblip.create_groups.all_cand_groups(samples, X=None, q=0, max_pep=0.25, max_size=25, prenarrow=True, prefilter_thresholds=[0, 0.01, 0.02, 0.03, 0.05, 0.1, 0.2], min_purity=0.0, corr_matrix=None)[source]

Creates many candidate groups by prefiltering locations at various thresholds and then creating sequential and hierarchical candidate groups.

Parameters

samplesnp.ndarray: An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.
Xnp.array: The n x p design matrix. Defaults to None. If provided, adds hierarchical groups based on a correlation cluster of X.
qfloat: The nominal level at which to control the error rate.
max_pepfloat: The maximum posterior error probability allowed in a candidate group. Default is 0.25.
max_sizefloat: Maximum size of a group. Default is 25.
prenarrowbool: If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.
prefilter_thresholdslist: List of thresholds at which to prefilter the locations.
min_purityfloat: Minimum acceptable absolute correlation within a candidate group. Defaults to zero.
corr_matrixnp.array: Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns

cand_groupslist: A list of CandidateGroup objects.

pyblip.create_groups.filter_by_purity(cand_groups, min_purity=0, corr_matrix=None, X=None)[source]

Filters candidate groups to only include groups of features whose minimum absolute correlation is larger than min_purity.

Parameters

cand_groupslist: A list of CandidateGroup objects.
min_purityfloat: Minimum acceptable purity.
corr_matrixnp.array: p x p correlation matrix
Xnp.array: n x p design matrix

pyblip.create_groups.hierarchical_groups(samples, dist_matrix=None, max_pep=0.25, max_size=25, filter_sequential=False, min_purity=0, X=None, corr_matrix=None, **kwargs)[source]

Creates candidate groups by hierarchical clustering.

Parameters

samplesnp.ndarray: An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.
dist_matrixnp.ndarray or list: A square numpy array corresponding to distances between locations. Can also be a list of different distance matrices. The default is to use a correlation matrix based on samples.
max_pepfloat: The maximum posterior error probability allowed in a candidate group. Default is 1.
max_sizefloat: Maximum size of a group. Default is 25.
filter_sequentialbool: If True, does not calculate PEPs for sequential (contiguous) groups of variables to avoid duplicates.
min_purityfloat: Minimum acceptable absolute correlation within a candidate group. Defaults to zero.
Xnp.array: Optional input; n x p design matrix. Only used if corr_matrix=None and min_purity > 0.
corr_matrixnp.array: Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns

cand_groupslist: A list of CandidateGroup objects.

pyblip.create_groups.sequential_groups(samples=None, susie_alphas=None, q=0, max_pep=0.25, max_size=25, prenarrow=False, min_purity=0.0, X=None, corr_matrix=None)[source]

Calculates all sequential candidate groups below max_size.

Parameters

samplesnp.ndarray: An (N, p)-shaped array of posterior samples, where a nonzero value indicates the presence of a signal.
susie_alphasnp.ndarray: As an alternative to posterior samples, users may specify an L x p matrix of alphas from a SuSiE object. However, calling susie_groups is recommended instead.
qfloat: The nominal level at which to control the error rate.
max_pepfloat: The maximum posterior error probability allowed in a candidate group. Default is 1.
max_sizefloat: Maximum size of a group. Default is 25.
prenarrowbool: If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.
min_purityfloat: Minimum acceptable absolute correlation within a candidate group. Defaults to zero.
Xnp.array: Optional input; n x p design matrix. Only used if corr_matrix=None and min_purity > 0.
corr_matrixnp.array: Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns

cand_groupslist: A list of CandidateGroup objects.

pyblip.create_groups.susie_groups(alphas, q, X=None, max_pep=0.25, max_size=25, prenarrow=False, purity_threshold=0.0, k_threshold=None, min_purity=0.0, corr_matrix=None)[source]

Creates candidate groups based on a SuSiE fit.

Parameters

alphasnp.array: An (L, p)-shaped matrix of alphas from a SuSiE object, L is the number of SuSiE iterations and p is the number of covariates.
qfloat: The level at which to control the error rate
Xnp.array: The n x p design matrix. Defaults to None. If provided, adds hierarchical groups based on a correlation cluster of X.
max_pepfloat: The maximum posterior error probability allowed in a candidate group. Default is 1.
max_sizefloat: Maximum size of a group. Default is 25.
prenarrowbool: If true, “prenarrows” the candidate groups as described in the paper. Defaults to False.
purity_thresholdfloat: When computing PIPs, ignores SuSiE iterations which do not pass this purity threshold, as in the original SusiE paper.
k_thresholdfloat: When computing PIPs, ignores SuSiE iterations for which the vanilla SuSiE CS exceeds this size.
min_purityfloat: Minimum acceptable absolute correlation within a candidate group. Defaults to zero.
corr_matrixnp.array: Optional input; p x p correlation matrix. Only used if X=None and min_purity > 0.

Returns

cand_groupslist: A list of CandidateGroup objects.

Continuous sets of locations

Creates candidate groups when signals could appear anywhere in a continuous d-dimensional space.

pyblip.create_groups_cts.grid_peps(locs, grid_sizes, count_signals=False, extra_centers=None, max_pep=0.25, log_interval=None, time0=None, shape='square')[source]

Parameters

locsnp.array: A (N, num_disc, d)-dimensional array. Here, N is the number of samples from the posterior, d is the number of dimensions of the space, and each point corresponds to a signal in a particular posterior sample.
grid_sizeslist or np.ndarray: List of grid-sizes to split up the locations.
count_signalsbool: If True, the number of signals in each group is counted.
extra_centersnp.ndarray: A (ncenters, d)-dimensional array. At each resolution, candidate groups will be computed with centers at these locations.
shapestring: One of square or circle

Returns

cand_groupslist: A list of CandidateGroup objects.

pyblip.create_groups_cts.grid_peps_to_cand_groups(filtered_peps, time0=None, min_blip_size=1000, verbose=False, shape='square', max_pep=0.25, min_pep=0.001)[source]

Turns the output of the grid_peps function into a list of list of CandidateGroups. Each sub-list corresponds to a list of completely disconnected CandidateGroups which can be fed to BLiP separately (this saves computation).

filtered_pepsdict: An output of the grid_peps function.
time0float: The initial time the analysis started, useful for logging.
min_blip_sizeint: Combines connected components so all subproblems are at least this size.
verbose: bool: If True, will report progress over time. Default: False.
shapestring: One of square or circle
max_pepfloat: The maximum pep for candidate groups. Default: 1.
min_pepfloat: Once we achieve a pep of this level for candidate groups, we do not search to find peps any lower. Default: 0.001.

pyblip.create_groups_cts.normalize_locs(locs)[source]

Returns

locsnp.array: locs, but normalized such that all values lie in [0,1]. NAs are converted to -1.
shiftsnp.array: d-dimensional array corresponding to the shifts applied in the normalizatin.
scalesnp.array: d-dimension array corresponding to the scales in the normalization.

Edge clique cover

Polynomial-time function to find a (heuristically) minimal edge clique cover

pyblip.ecc.edge_clique_cover(G)[source]

Parameters

Gnetworkx Graph: Undirected graph.

Returns

cliqueslist: List of cliques. Each clique is a list of ints where the integers correspond to the nodes of G.

Weight functions

A few default choices of weight function to define power.

pyblip.weight_fns.inverse_area_weight(cand_group)[source]

Weights candidate groups by the inverse of their area.

Parameters

cand_groupCandidateGroup: The candidate group

Returns

weightfloat: The associated weight when calculating power.

pyblip.weight_fns.inverse_radius_weight(cand_group)[source]

Weights candidate groups by the inverse of the radius.

Parameters

cand_groupCandidateGroup: The candidate group

Returns

weightfloat: The associated weight when calculating power.

pyblip.weight_fns.inverse_size(cand_group)[source]

Weights candidate groups by 1 / len(cand_group.group)

Parameters

cand_groupCandidateGroup: The candidate group

Returns

weightfloat: The associated weight when calculating power.

pyblip.weight_fns.log_inverse_size(cand_group)[source]

Weights candidate groups by 1 / (1 + log2(len(cand_group.group)))

Parameters

cand_groupCandidateGroup: The candidate group

Returns

weightfloat: The associated weight when calculating power.

Bayesian Samplers

BLiP can wrap on top of nearly any Bayesian model or algorithm. However, for convenience, pyblip includes a few custom MCMC samplers.

Linear Spike and Slab

class pyblip.linear.linear.LinearSpikeSlab(X, y, p0=0.9, p0_a0=1, p0_b0=1, update_p0=True, min_p0=0, sigma2=1, update_sigma2=True, sigma2_a0=2, sigma2_b0=1, tau2=1, tau2_a0=2, tau2_b0=1, update_tau2=True)[source]

Spike-and-slab model for linear regression.

Parameters

Xnp.array: (n,p)-shaped design matrix.
ynp.array: n-length array of responses.
p0float: Prior probability that any coefficient equals zero.
update_p0bool: If True, updates p0 using a Beta hyperprior on p0. Else, the value of p0 is fixed.
p0_a0float: If update_p0 is True, p0 has a Beta(p0_a0, p0_b0, min_p0) hyperprior.
p0_b0float: If update_p0 is True, p0 has a TruncBeta(p0_a0, p0_b0, min_p0) hyperprior.
min_p0float: Minimum value for p0 as specified by the prior.
sigma2float: Variance of y given X.
update_sigma2bool: If True, imposes an InverseGamma hyperprior on sigma2. Else, the value of sigma2 is fixed.
sigma2_a0float: If update_sigma2 is True, sigma2 has an InvGamma(sigma2_a0, sigma2_b0) hyperprior.
sigma2_b0float: If update_sigma2 is True, sigma2 has an InvGamma(sigma2_a0, sigma2_b0) hyperprior.
tau2float: Prior variance on nonzero coefficients.
update_tau2bool: If True, imposes an InverseGamma hyperprior on tau2. Else, the value of tau2 is fixed.
tau2_a0float: If update_tau2 is True, tau2 has an InvGamma(tau2_a0, tau2_b0) hyperprior.
tau2_b0float: If update_tau2 is True, tau2 has an InvGamma(tau2_a0, tau2_b0) hyperprior.

Methods

sample:

Samples from the posterior using Gibbs sampling.

sample(N, burn=100, chains=1, num_processes=1, bsize=1, max_signals_per_block=None)[source]

Nint: Number of samples per chain
burnint: Number of samples to burn per chain
chainsint: Number of chains to run
num_processesint: How many processes to use
bsizeint: Maximum block size within gibbs sampling. Default: 1.
max_signals_per_blockint: Maximum number of signals allowed per block. Default: None (this places no restrictions on the number of signals per block). The default is highly recommended.

Probit Spike and Slab

class pyblip.probit.probit.ProbitSpikeSlab(*args, **kwargs)[source]

Spike-and-slab model for probit regression. The arguments are identical to those for LinearSpikeSlab, except y should be binary.

Methods

sample:

Samples from the posterior using Gibbs sampling.

sample(N, burn=100, chains=1, num_processes=1, bsize=1, max_signals_per_block=None)[source]

Parameters

Nint: Number of samples per chain
burnint: Number of samples to burn per chain
chainsint: Number of chains to run
num_processesint: How many processes to use
bsizeint: Maximum block size within gibbs sampling. Default: 1.
max_signals_per_blockint: Maximum number of signals allowed per block. Default: None (this places no restrictions on the number of signals per block). The default is highly recommended.

Neuronized Priors

class pyblip.nprior.nprior.NPrior(X, y, tauw2, p0=None, update_p0=True, min_p0=1e-10, sigma_prior_type=0, sigma_a0=5, sigma_b0=1, alpha0_a0=1, alpha0_b0=1)[source]

Implements Neuronized Prior sampler for spike-and-slab regression.

Parameters

Xnp.array: (n,p)-shaped design matrix
ynp.array: (n,)-shaped vector of response data
p0float: The initial parameter for the proportion of nulls. Defaults to 1 - min(0.01, 1/p).
update_p0bool: If true, will update p0 throughout MCMC sampling using a uniform hyperprior.
min_p0float: If updating p0 throughout uniform sampling, will force p0 to be above min_p0. This can dramatically speed up computation in very high-dimensional settings.
sigma_prior_typeinteger: If 0, assumes sigma2 is conditionally independent of the coefficients given the residuals.
tauw2float: prior variance of the weight parameter
a0float: sigma2 has an inverse-gamma prior with parameters a0, b0
b0float: sigma2 has an inverse-gamma prior with parameters a0, b0

Notes

See https://arxiv.org/pdf/1810.00141.pdf.

Methods

sample([N, burn, chains, num_processes, ...])

Parameters

sample(N=100, burn=10, chains=1, num_processes=1, joint_sample_W=True, group_alpha_update=True, log_interval=None)[source]

Parameters

Nint: The number of samples to draw from the chain
burnint: The burn-in period for each chain.
chainsint: The number of independent MCMC chains to run.
num_processesint: The number of processes to run the chains.
joint_sample_Wbool: If true, will jointly sample the “W” variables at each iteration before individually resampling alpha and W. This can improve sample efficiency but is a computational bottleneck in high dimensions.
group_alpha_updatebool: If true, does a joint group-move update to estimate the sparsity. Else, uses the standard conjugacy rules for a Uniform prior on the sparsity.
log_intervalint: Will log progress after log_interval iterations. Defaults to None (no logging).

Changepoint Detection

pyblip.changepoint.detect_changepoints(Y, q=0.1, lm_kwargs={}, sample_kwargs={}, blip_kwargs={})[source]

Changepoint detection with BLiP using the LinearSpikeSlab sampler.

Parameters

Ynp.array: Array of observations in order they were observed.
qfloat: Level at which to control the FDR.
**kwargsdict: Optional inputs to linear spike slab model.
**sample_kwargsdict: Optional inputs to sample method of linear spike slab model.
**blip_kwargsdict: Optional inputs to BLiP.