GaussianSampler

class knockpy.knockoffs.GaussianSampler(X, mu=None, Sigma=None, invSigma=None, groups=None, sample_tol=1e-05, S=None, method=None, verbose=False, **kwargs)

Bases: knockpy.knockoffs.KnockoffSampler

Samples MX Gaussian (group) knockoffs.

Parameters
X : np.ndarray

The (n, p)-shaped design matrix.

mu : np.ndarray

(p, )-shaped mean of the features. If None, this defaults to the empirical mean of the features.

Sigma : np.ndarray

(p, p)-shaped covariance matrix of the features. If None, this is estimated using the utilities.estimate_covariance function.

groups : np.ndarray

For group knockoffs, a p-length array of integers from 1 to num_groups such that groups[j] == i indicates that variable j is a member of group i. Defaults to None (regular knockoffs).
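To illustrate the groups encoding described above, the following minimal sketch builds a 1-indexed group vector for five features and derives a boolean mask of feature pairs that share a group. The variable names are hypothetical, and the mask only illustrates the block structure that a group S-matrix would typically follow; it is not knockpy code.

```python
import numpy as np

# Hypothetical example: 5 features in 3 groups, labelled 1..num_groups
groups = np.array([1, 1, 2, 3, 3])
num_groups = int(groups.max())

# groups[j] == i means feature j belongs to group i
members_of_group_1 = np.flatnonzero(groups == 1)

# Boolean (p, p) mask: True where features j and k share a group
same_group = groups[:, None] == groups[None, :]
```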

S : np.ndarray

The (p, p)-shaped knockoff S-matrix used to generate knockoffs, defined such that Cov(X, tilde(X)) = Sigma - S. When None, it is constructed by the knockoff generator. Defaults to None.
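The role of the S-matrix can be sketched with plain NumPy. The example below draws knockoffs from the standard Gaussian conditional law (mean X - (X - mu) Sigma^{-1} S, covariance 2S - S Sigma^{-1} S) and checks empirically that Cov(X, tilde(X)) is close to Sigma - S. The function name, the toy Sigma, and the choice S = 0.7 I are assumptions made for illustration; this is a sketch of the underlying math, not knockpy's implementation.

```python
import numpy as np

def sample_gaussian_knockoffs(X, mu, Sigma, S, rng):
    """Draw knockoffs from the Gaussian conditional law of Xk given X."""
    n, p = X.shape
    # Sigma^{-1} S via a linear solve rather than an explicit inverse
    invSigma_S = np.linalg.solve(Sigma, S)
    # Conditional mean: X - (X - mu) Sigma^{-1} S
    cond_mean = X - (X - mu) @ invSigma_S
    # Conditional covariance: 2S - S Sigma^{-1} S (symmetrized for stability)
    V = 2 * S - S @ invSigma_S
    V = (V + V.T) / 2
    # Factor V = L L^T by eigendecomposition, clipping tiny negative eigenvalues
    evals, evecs = np.linalg.eigh(V)
    L = evecs * np.sqrt(np.clip(evals, 0, None))
    return cond_mean + rng.standard_normal((n, p)) @ L.T

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
mu = np.zeros(3)
S = 0.7 * np.eye(3)  # illustrative choice; requires 2 * Sigma - S to be PSD
assert np.linalg.eigvalsh(2 * Sigma - S).min() > 0

X = rng.multivariate_normal(mu, Sigma, size=20000)
Xk = sample_gaussian_knockoffs(X, mu, Sigma, S, rng)

# Empirical cross-covariance between features and knockoffs
p = Sigma.shape[0]
cross_cov = np.cov(X, Xk, rowvar=False)[:p, p:]
```

With a large sample, cross_cov should approximate Sigma - S and the covariance of Xk should approximate Sigma.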

method : str

Specifies how to construct S matrix. This will be ignored if S is not None. There are several options:

  • ‘mvr’: Minimum Variance-Based Reconstructability knockoffs.

  • ‘mmi’: Minimizes the mutual information between X and the knockoffs.

  • ‘ci’: Conditional independence knockoffs.

  • ‘sdp’: Minimizes the mean absolute covariance (MAC) between the features and the knockoffs.

  • ‘equicorrelated’: Minimizes the MAC under the constraint that the correlation between each feature and its knockoff is the same.

The default is to use ‘mvr’ for non-group knockoffs and the group SDP for grouped knockoffs (the implementation for group MVR knockoffs is currently fairly slow). In both cases, a block-diagonal approximation is used if the number of features is greater than 1000.
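To make the ‘equicorrelated’ option and the MAC objective concrete, here is a minimal NumPy sketch. It assumes Sigma is a correlation matrix (unit diagonal) and uses the classical closed form S = min(1, 2 * lambda_min(Sigma)) * I. Both function names are hypothetical, and knockpy's own solvers may differ in numerical details.

```python
import numpy as np

def equicorrelated_S(Sigma):
    """Closed-form equicorrelated S-matrix for a correlation matrix Sigma."""
    lam_min = np.linalg.eigvalsh(Sigma)[0]  # eigvalsh returns ascending eigenvalues
    s = min(1.0, 2.0 * lam_min)
    return s * np.eye(Sigma.shape[0])

def mean_abs_covariance(Sigma, S):
    """MAC: average |Cov(X_j, Xk_j)|, i.e. the mean absolute diagonal of Sigma - S."""
    return np.mean(np.abs(np.diag(Sigma - S)))

# Toy correlation matrix with constant off-diagonal correlation rho
p, rho = 4, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

S = equicorrelated_S(Sigma)
mac = mean_abs_covariance(Sigma, S)
```

Here the smallest eigenvalue of Sigma is 1 - rho = 0.4, so every diagonal entry of S is 0.8 and each feature has covariance 0.2 with its knockoff.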

objective : str

How to optimize the S matrix if using the SDP for group knockoffs. There are several options:

  • ‘abs’: minimize sum(abs(Sigma - S)) between groups and the group knockoffs.

  • ‘pnorm’: minimize the Lp matrix norm. Equivalent to ‘abs’ when p = 1.

  • ‘norm’: minimize a different type of matrix norm (see norm_type below).

sample_tol : float

Minimum eigenvalue allowed for feature-knockoff covariance matrix. Keep this small but nonzero (1e-5) to prevent numerical errors.
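The eigenvalue condition can be sketched as follows, assuming the feature-knockoff covariance matrix is the (2p, 2p) block matrix [[Sigma, Sigma - S], [Sigma - S, Sigma]]. The helper name and the toy inputs are hypothetical, and this is not a reproduction of check_PSD_condition.

```python
import numpy as np

def joint_cov_min_eig(Sigma, S):
    """Smallest eigenvalue of the joint covariance of (X, Xk)."""
    G = np.block([[Sigma, Sigma - S],
                  [Sigma - S, Sigma]])
    return np.linalg.eigvalsh(G).min()

sample_tol = 1e-5
p, rho = 4, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# A small S keeps the joint covariance comfortably positive definite
ok_small = joint_cov_min_eig(Sigma, 0.5 * np.eye(p)) >= sample_tol
# An S that is too large makes the joint covariance indefinite
ok_large = joint_cov_min_eig(Sigma, 1.0 * np.eye(p)) >= sample_tol
```

The eigenvalues of this block matrix are those of 2 * Sigma - S together with those of S, which is why an oversized S fails the check.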

verbose : bool

If True, prints progress over time.

rec_prop : float

The proportion of knockoffs to recycle (see Barber and Candes 2018, https://arxiv.org/abs/1602.03574). If method = ‘mvr’, then S generation takes this into account, which should increase the power of recycled knockoffs in sparsely-correlated, high-dimensional settings.
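Recycling, in the sense of the cited paper, reuses the original features in place of knockoffs for part of the sample. A minimal sketch, assuming the first rec_prop fraction of rows is recycled (the helper name and the choice of rows are assumptions, and how knockpy handles this internally may differ):

```python
import numpy as np

def recycle_knockoffs(X, Xk, rec_prop):
    """Return a copy of Xk whose first rec_prop fraction of rows equals X."""
    n = X.shape[0]
    n_rec = int(rec_prop * n)
    Xk_rec = Xk.copy()
    Xk_rec[:n_rec] = X[:n_rec]  # recycled rows are exact copies of the features
    return Xk_rec

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
Xk = rng.standard_normal((10, 3))  # stand-in for sampled knockoffs
Xk_rec = recycle_knockoffs(X, Xk, rec_prop=0.5)
```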

kwargs : dict

Other kwargs for S-matrix solvers.

Methods

check_PSD_condition(Sigma, S)

Checks that the feature-knockoff cov matrix is PSD.

check_xk_validity(X, Xk[, testname, alpha])

Runs a variety of KS tests on X and Xk to (informally) check that Xk are valid knockoffs for X.

fetch_S()

Fetches knockoff S-matrix.

many_ks_tests(sample1s, sample2s)

Takes sample1s and sample2s, two lists of arrays. Computes p-values by running KS tests and then applies a multiple testing correction.

sample_knockoffs([check_psd])

Samples knockoffs.

Methods Documentation

fetch_S()

Fetches knockoff S-matrix.

sample_knockoffs(check_psd=False)

Samples knockoffs. Returns the (n, p)-shaped knockoff matrix.

Parameters
check_psd : bool

If True, will check and enforce that S is a valid S-matrix. Defaults to False.