Neural Networks
The module pyro.nn provides implementations of neural network modules that are useful in the context of deep probabilistic programming.
Pyro Modules
Pyro includes an experimental class PyroModule, a subclass of torch.nn.Module, whose attributes can be modified by Pyro effects. To create a poutine-aware attribute, use either the PyroParam struct or the PyroSample struct:

    my_module = PyroModule()
    my_module.x = PyroParam(torch.tensor(1.), constraint=constraints.positive)
    my_module.y = PyroSample(dist.Normal(0, 1))
class PyroParam
    Bases: tuple

    Structure to declare a Pyro-managed learnable parameter of a PyroModule.

    init_value
        Alias for field number 0
    constraint
        Alias for field number 1
    event_dim
        Alias for field number 2
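Since PyroParam is documented as a tuple subclass with named fields, its struct-like behavior can be sketched with a plain namedtuple. This is a hypothetical re-creation for illustration only (PyroParamSketch is not Pyro's actual definition); the field order follows the "Alias for field number" entries above.

```python
from collections import namedtuple

# Hypothetical sketch of PyroParam's tuple structure (not Pyro's code):
# field 0 = init_value, field 1 = constraint, field 2 = event_dim.
PyroParamSketch = namedtuple(
    "PyroParamSketch", ["init_value", "constraint", "event_dim"])

p = PyroParamSketch(init_value=1.0, constraint="positive", event_dim=0)
assert p[0] == p.init_value   # alias for field number 0
assert p[1] == p.constraint   # alias for field number 1
assert p[2] == p.event_dim    # alias for field number 2
assert isinstance(p, tuple)   # matches the documented base class
```

Because it is a tuple, a PyroParam value can be unpacked positionally or accessed by name, which is how PyroModule reads its parts on setattr.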
class
PyroSample
¶ Bases:
tuple
Structure to declare a Pyro-managed random parameter of a
PyroModule
.-
prior
¶ Alias for field number 0
-
-
class PyroModule(name='')[source]
    Bases: torch.nn.modules.module.Module

EXPERIMENTAL Subclass of torch.nn.Module whose attributes can be modified by Pyro effects. Attributes can be set using the helpers PyroParam and PyroSample, and methods can be decorated with pyro_method().

Parameters

To create a Pyro-managed parameter attribute, set that attribute using either torch.nn.Parameter (for unconstrained parameters) or PyroParam (for constrained parameters). Reading that attribute will then trigger a pyro.param() statement. For example:

    # Create Pyro-managed parameter attributes.
    my_module = PyroModule()
    my_module.loc = nn.Parameter(torch.tensor(0.))
    my_module.scale = PyroParam(torch.tensor(1.),
                                constraint=constraints.positive)

    # Read the attributes.
    loc = my_module.loc      # Triggers a pyro.param statement.
    scale = my_module.scale  # Triggers another pyro.param statement.
Note that, unlike normal torch.nn.Modules, PyroModules should not be registered with pyro.module() statements. PyroModules can contain other PyroModules and normal torch.nn.Modules. Accessing a normal torch.nn.Module attribute of a PyroModule triggers a pyro.module() statement. If multiple PyroModules appear in a single Pyro model or guide, they should be included in a single root PyroModule for that model.

PyroModules synchronize data with the param store at each setattr, getattr, and delattr event, based on the nested name of an attribute:

- Setting mod.x = x_init tries to read x from the param store. If a value is found in the param store, that value is copied into mod and x_init is ignored; otherwise x_init is copied into both mod and the param store.
- Reading mod.x tries to read x from the param store. If a value is found in the param store, that value is copied into mod; otherwise mod's value is copied into the param store. Finally mod and the param store agree on a single value to return.
- Deleting del mod.x removes a value from both mod and the param store.
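The three synchronization rules above can be sketched with a plain dict standing in for the global param store. This is a simplified toy (SyncedModule is hypothetical, not Pyro's implementation): the real PyroModule also handles constraints, nested names, and tensor copying.

```python
# Toy sketch of PyroModule's param-store synchronization (not Pyro's code).
param_store = {}  # stands in for Pyro's global param store

class SyncedModule:
    """Toy module mirroring the three setattr/getattr/delattr rules."""
    def __init__(self):
        self._data = {}  # stands in for the module's own attributes

    def set(self, name, value):
        # Setting mod.x = x_init: a value already in the store wins.
        self._data[name] = param_store.setdefault(name, value)

    def get(self, name):
        # Reading mod.x: the store's value wins; otherwise the module's
        # value is copied into the store.
        if name in param_store:
            self._data[name] = param_store[name]
        else:
            param_store[name] = self._data[name]
        return self._data[name]

    def delete(self, name):
        # Deleting del mod.x removes the value from both places.
        del self._data[name]
        param_store.pop(name, None)

# Two modules using the same attribute name share data via the store:
a, b = SyncedModule(), SyncedModule()
a.set("x", 1.0)
b.set("x", 2.0)              # store already holds 1.0, so 2.0 is ignored
assert b.get("x") == 1.0     # both modules see the same data
a.delete("x")                # removed from a and from the store
b.set("x", 2.0)              # store is empty again, so 2.0 sticks
assert b.get("x") == 2.0
```

The example also previews the persistence caveat described next: a second module with the same name silently inherits the first one's data unless the store is cleared.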
Note that two PyroModules with the same name will both synchronize with the global param store and thus contain the same data. When creating a PyroModule, then deleting it, then creating another with the same name, the latter will be populated with the former's data from the param store. To avoid this persistence, either call pyro.clear_param_store() or call clear() before deleting a PyroModule.

PyroModules can be saved and loaded either directly using torch.save() / torch.load() or indirectly using the param store's save() / load(). Note that values loaded via torch.load() will be overridden by any values already in the param store, so it is safest to call pyro.clear_param_store() before loading.

Samples
To create a Pyro-managed random attribute, set that attribute using the PyroSample helper, specifying a prior distribution. Reading that attribute will then trigger a pyro.sample() statement. For example:

    # Create Pyro-managed random attributes.
    my_module.x = PyroSample(dist.Normal(0, 1))
    my_module.y = PyroSample(lambda self: dist.Normal(self.loc, self.scale))

    # Sample the attributes.
    x = my_module.x  # Triggers a pyro.sample statement.
    y = my_module.y  # Triggers one pyro.sample + two pyro.param statements.
Sampling is cached within each invocation of .__call__() or of a method decorated by pyro_method(). Because sample statements can appear only once in a Pyro trace, you should ensure that traced access to sample attributes is wrapped in a single invocation of .__call__() or of a method decorated by pyro_method().

To make an existing module probabilistic, you can create a subclass and overwrite some parameters with PyroSamples:

    class RandomLinear(nn.Linear, PyroModule):  # used as a mixin
        def __init__(self, in_features, out_features):
            super().__init__(in_features, out_features)
            self.weight = PyroSample(
                lambda self: dist.Normal(0, 1)
                                 .expand([self.out_features,
                                          self.in_features])
                                 .to_event(2))
Mixin classes

PyroModule can be used as a mixin class, and supports simple syntax for dynamically creating mixins. For example, the following are equivalent:

    # Version 1. create a named mixin class
    class PyroLinear(nn.Linear, PyroModule):
        pass

    m.linear = PyroLinear(m, n)

    # Version 2. create a dynamic mixin class
    m.linear = PyroModule[nn.Linear](m, n)
This notation can be used recursively to create Bayesian modules, e.g.:

    model = PyroModule[nn.Sequential](
        PyroModule[nn.Linear](28 * 28, 100),
        PyroModule[nn.Sigmoid](),
        PyroModule[nn.Linear](100, 100),
        PyroModule[nn.Sigmoid](),
        PyroModule[nn.Linear](100, 10),
    )
    assert isinstance(model, nn.Sequential)
    assert isinstance(model, PyroModule)

    # Now we can be Bayesian about weights in the first layer.
    # Note nn.Linear stores weight with shape (out_features, in_features),
    # so the prior's event shape must be (100, 28 * 28).
    model[0].weight = PyroSample(
        prior=dist.Normal(0, 1).expand([100, 28 * 28]).to_event(2))
    guide = AutoDiagonalNormal(model)
Note that PyroModule[...] does not recursively mix in PyroModule to submodules of the input Module; hence we needed to wrap each submodule of the nn.Sequential above.

Parameters: name (str) – Optional name for a root PyroModule. This is ignored in sub-PyroModules of another PyroModule.
pyro_method(fn)[source]
    Decorator for top-level methods of a PyroModule to enable Pyro effects and cache pyro.sample statements.

    This should be applied to all public methods that read Pyro-managed attributes, but is not needed for .forward().
clear(mod)[source]
    Removes data from both a PyroModule and the param store.

    Parameters: mod (PyroModule) – A module to clear.
AutoRegressiveNN
class AutoRegressiveNN(input_dim, hidden_dims, param_dims=[1, 1], permutation=None, skip_connections=False, nonlinearity=ReLU())[source]
    Bases: pyro.nn.auto_reg_nn.ConditionalAutoRegressiveNN

An implementation of a MADE-like auto-regressive neural network.

Example usage:

    >>> x = torch.randn(100, 10)
    >>> arn = AutoRegressiveNN(10, [50], param_dims=[1])
    >>> p = arn(x)  # 1 parameter of size (100, 10)
    >>> arn = AutoRegressiveNN(10, [50], param_dims=[1, 1])
    >>> m, s = arn(x)  # 2 parameters of size (100, 10)
    >>> arn = AutoRegressiveNN(10, [50], param_dims=[1, 5, 3])
    >>> a, b, c = arn(x)  # 3 parameters of sizes (100, 1, 10), (100, 5, 10), (100, 3, 10)

Parameters:
- input_dim (int) – the dimensionality of the input variable
- hidden_dims (list[int]) – the dimensionality of the hidden units per layer
- param_dims (list[int]) – shape the output into parameters of dimension (p_n, input_dim) for each p_n in param_dims when p_n > 1, and of dimension (input_dim) when p_n == 1. The default is [1, 1], i.e. output two parameters of dimension (input_dim), which is useful for inverse autoregressive flow.
- permutation (torch.LongTensor) – an optional permutation that is applied to the inputs and controls the order of the autoregressive factorization. In particular, for the identity permutation the autoregressive structure is such that the Jacobian is upper triangular. By default this is chosen at random.
- skip_connections (bool) – whether to add skip connections from the input to the output.
- nonlinearity (torch.nn.Module) – the nonlinearity to use in the feedforward network, such as torch.nn.ReLU(). Note that no nonlinearity is applied to the final network output, so the output is an unbounded real number.
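The param_dims shaping rule can be sketched in plain Python, shapes only, with no network. output_shapes is a hypothetical helper written here for illustration, not part of Pyro's API.

```python
def output_shapes(batch, input_dim, param_dims):
    """Shapes produced for each p_n, per the param_dims rule:
    (batch, input_dim) when p_n == 1, else (batch, p_n, input_dim)."""
    return [
        (batch, input_dim) if p_n == 1 else (batch, p_n, input_dim)
        for p_n in param_dims
    ]

# Matches the example usage: AutoRegressiveNN(10, [50], param_dims=[1, 5, 3])
assert output_shapes(100, 10, [1, 5, 3]) == [
    (100, 10), (100, 5, 10), (100, 3, 10)]

# The default [1, 1] gives two (batch, input_dim) parameters,
# e.g. a location and a scale for inverse autoregressive flow.
assert output_shapes(100, 10, [1, 1]) == [(100, 10), (100, 10)]
```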
Reference:
MADE: Masked Autoencoder for Distribution Estimation [arXiv:1502.03509] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle
ConditionalAutoRegressiveNN
class ConditionalAutoRegressiveNN(input_dim, context_dim, hidden_dims, param_dims=[1, 1], permutation=None, skip_connections=False, nonlinearity=ReLU())[source]
    Bases: torch.nn.modules.module.Module

An implementation of a MADE-like auto-regressive neural network that can input an additional context variable. (See Reference [2] Section 3.3 for an explanation of how the conditional MADE architecture works.)

Example usage:

    >>> x = torch.randn(100, 10)
    >>> y = torch.randn(100, 5)
    >>> arn = ConditionalAutoRegressiveNN(10, 5, [50], param_dims=[1])
    >>> p = arn(x, context=y)  # 1 parameter of size (100, 10)
    >>> arn = ConditionalAutoRegressiveNN(10, 5, [50], param_dims=[1, 1])
    >>> m, s = arn(x, context=y)  # 2 parameters of size (100, 10)
    >>> arn = ConditionalAutoRegressiveNN(10, 5, [50], param_dims=[1, 5, 3])
    >>> a, b, c = arn(x, context=y)  # 3 parameters of sizes (100, 1, 10), (100, 5, 10), (100, 3, 10)

Parameters:
- input_dim (int) – the dimensionality of the input variable
- context_dim (int) – the dimensionality of the context variable
- hidden_dims (list[int]) – the dimensionality of the hidden units per layer
- param_dims (list[int]) – shape the output into parameters of dimension (p_n, input_dim) for each p_n in param_dims when p_n > 1, and of dimension (input_dim) when p_n == 1. The default is [1, 1], i.e. output two parameters of dimension (input_dim), which is useful for inverse autoregressive flow.
- permutation (torch.LongTensor) – an optional permutation that is applied to the inputs and controls the order of the autoregressive factorization. In particular, for the identity permutation the autoregressive structure is such that the Jacobian is upper triangular. By default this is chosen at random.
- skip_connections (bool) – whether to add skip connections from the input to the output.
- nonlinearity (torch.nn.Module) – the nonlinearity to use in the feedforward network, such as torch.nn.ReLU(). Note that no nonlinearity is applied to the final network output, so the output is an unbounded real number.
Reference:
1. MADE: Masked Autoencoder for Distribution Estimation [arXiv:1502.03509] Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle
2. Inference Networks for Sequential Monte Carlo in Graphical Models [arXiv:1602.06701] Brooks Paige, Frank Wood
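The masking idea behind the (conditional) MADE architecture can be sketched without torch: assign each unit a degree, allow only connections that respect the autoregressive ordering, and connect the context to every hidden unit with no mask. This is a simplified one-hidden-layer illustration under the identity permutation; made_masks and reachable are hypothetical helpers written for this sketch, not the library's implementation.

```python
def made_masks(input_dim, hidden_degrees):
    """Build 0/1 masks for one hidden layer under the identity permutation.

    Input i gets degree i + 1; a hidden unit of degree d may see inputs
    with degree <= d, and output o may see hidden units of degree < o + 1,
    so each output depends only on strictly earlier inputs (a triangular
    Jacobian).
    """
    in_deg = [i + 1 for i in range(input_dim)]
    out_deg = in_deg
    mask_in = [[1 if in_deg[i] <= hidden_degrees[h] else 0
                for i in range(input_dim)]
               for h in range(len(hidden_degrees))]
    mask_out = [[1 if hidden_degrees[h] < out_deg[o] else 0
                 for h in range(len(hidden_degrees))]
                for o in range(input_dim)]
    return mask_in, mask_out

def reachable(mask_in, mask_out):
    """Which inputs each output can depend on: boolean product of masks."""
    n_hidden, n_in = len(mask_in), len(mask_in[0])
    return [[any(mask_out[o][h] and mask_in[h][i] for h in range(n_hidden))
             for i in range(n_in)]
            for o in range(len(mask_out))]

deps = reachable(*made_masks(4, hidden_degrees=[1, 2, 3, 1, 2]))
# Output o depends only on inputs with strictly smaller index:
for o in range(4):
    for i in range(4):
        assert not (deps[o][i] and i >= o)
# In the conditional variant, the context vector is additionally fed in
# unmasked, so every output may depend on all of the context.
```

A non-identity permutation simply relabels the degrees before building the masks, which changes the order of the autoregressive factorization without changing the construction.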