Autoencoder Models
Vanilla Autoencoder
- class pyautoencoder.vanilla.AE(encoder: Module, decoder: Module)
Bases: BaseAutoencoder
Vanilla Autoencoder composed of a user-defined encoder and decoder.
The model follows the BaseAutoencoder interface and implements:
- _encode(x) – maps inputs x to latent codes z.
- _decode(z) – maps latent codes z to reconstructions x_hat.
- forward(x) – full training forward pass returning both z and x_hat.
The encoder and decoder are arbitrary torch.nn.Module instances that define the mapping between data space and latent space.
- __init__(encoder: Module, decoder: Module)
Construct an Autoencoder from an encoder and decoder module.
- Parameters:
encoder (nn.Module) – Module implementing the mapping x → z.
decoder (nn.Module) – Module implementing the mapping z → x_hat.
- compute_loss(x: Tensor, ae_output: AEOutput, likelihood: str | LikelihoodType = LikelihoodType.GAUSSIAN) → LossResult
Compute Autoencoder reconstruction loss.
The scalar loss is the batch-mean reconstruction negative log-likelihood (NLL). The method also computes diagnostics to monitor model behavior.
- Parameters:
x (torch.Tensor) – Ground-truth inputs of shape [B, ...].
ae_output (AEOutput) – Output from the AE forward pass. Expected fields include:
- x_hat (torch.Tensor): Reconstructions, shape [B, ...].
- z (torch.Tensor): Latent representation (unused by this method).
likelihood (str | LikelihoodType, optional) – Likelihood model for computing the reconstruction term ('gaussian' or 'bernoulli'). Defaults to Gaussian.
- Returns:
Result containing:
objective – Scalar batch-mean reconstruction NLL (in nats).
diagnostics – Dictionary with:
"log_likelihood": Negative of the objective (batch-mean log-likelihood).
- Return type:
LossResult
Notes
Reductions are applied in this order:
1. Elementwise log-likelihood.
2. Sum over feature dimensions.
3. Mean over the batch.
Ensure that inputs match the chosen likelihood:
- Gaussian: continuous data (typically standardized).
- Bernoulli: targets in \([0, 1]\), predictions given as logits.
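The reduction order described in these notes can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper names (gaussian_log_likelihood, bernoulli_log_likelihood, batch_mean_nll) are hypothetical, the Gaussian case assumes unit variance, and nested lists stand in for tensors of shape [B, D].

```python
import math


def gaussian_log_likelihood(x, x_hat):
    """Elementwise log N(x | x_hat, 1). Unit variance is an assumption."""
    return -0.5 * ((x - x_hat) ** 2 + math.log(2.0 * math.pi))


def bernoulli_log_likelihood(x, logit):
    """Elementwise Bernoulli log-likelihood for a target in [0, 1] and a logit."""
    # x * logit - softplus(logit), with a numerically stable softplus.
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return x * logit - softplus


def batch_mean_nll(batch_x, batch_x_hat, elementwise):
    """Return (objective, log_likelihood) for inputs shaped [B, D]."""
    per_sample = []
    for x_row, x_hat_row in zip(batch_x, batch_x_hat):
        # Elementwise log-likelihood, then sum over the feature dimension.
        per_sample.append(sum(elementwise(a, b) for a, b in zip(x_row, x_hat_row)))
    # Mean over the batch.
    log_likelihood = sum(per_sample) / len(per_sample)
    return -log_likelihood, log_likelihood


x = [[0.0, 1.0], [2.0, 3.0]]  # B = 2, D = 2
objective, log_likelihood = batch_mean_nll(x, x, gaussian_log_likelihood)
print(objective, log_likelihood)
```

Under the unit-variance assumption, even a perfect reconstruction leaves a Gaussian objective of (D / 2) log(2π) nats, so a nonzero floor in the reported loss is expected rather than a sign of a bug.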
- forward(x: Tensor) → AEOutput
Full training forward pass with gradients.
- Parameters:
x (torch.Tensor) – Input batch of shape [B, ...].
- Returns:
Dataclass containing both the reconstruction x_hat and the latent code z.
- Return type:
AEOutput
Data Structures
- class pyautoencoder.vanilla.AEEncodeOutput(z: Tensor)
Output of the Autoencoder encoder stage.
- z
Latent code of shape [B, ...] produced by AE._encode() or AE.encode().
- Type:
torch.Tensor
- class pyautoencoder.vanilla.AEDecodeOutput(x_hat: Tensor)
Output of the Autoencoder decoder stage.
- x_hat
Reconstruction (or logits) of shape [B, ...] produced by AE._decode() or AE.decode().
- Type:
torch.Tensor
- class pyautoencoder.vanilla.AEOutput(x_hat: Tensor, z: Tensor)
Output of the full Autoencoder forward pass.
- x_hat
Reconstruction (or logits) of shape [B, ...] produced by AE.forward().
- Type:
torch.Tensor
- z
Latent code of shape [B, ...] produced by AE.forward().
- Type:
torch.Tensor
Variational Autoencoder
- class pyautoencoder.variational.VAE(encoder: Module, decoder: Module, latent_dim: int)
Bases: BaseAutoencoder
Variational Autoencoder following Kingma & Welling (2013).
The model consists of:
- an encoder mapping x → f(x) (feature representation),
- a fully factorized Gaussian head producing (z, mu, log_var),
- a decoder mapping latent samples z → x_hat.
In training mode, z is drawn by Monte Carlo sampling via the reparameterization trick; in evaluation mode, the posterior mean is returned deterministically, repeated for each requested sample.
- __init__(encoder: Module, decoder: Module, latent_dim: int)
Construct a Variational Autoencoder from an encoder, decoder, and latent size.
- Parameters:
encoder (nn.Module) – Maps input x to a feature vector f(x) with shape [B, F].
decoder (nn.Module) – Maps latent samples z to reconstructions x_hat.
latent_dim (int) – Dimensionality D_z of the latent space.
Notes
A FullyFactorizedGaussian sampling layer is created internally and not exposed as a constructor parameter.
- compute_loss(x: Tensor, vae_output: VAEOutput, beta: float = 1, likelihood: str | LikelihoodType = LikelihoodType.GAUSSIAN) → LossResult
Compute the Evidence Lower Bound (ELBO) for a (beta-)Variational Autoencoder.
This method implements the beta-VAE objective:
\[\mathcal{L}(x; \beta) = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] \;-\; \beta \, \mathrm{KL}(q(z \mid x) \,\|\, p(z)).\]
The reconstruction term \(\log p(x \mid z)\) is computed using loss.base.log_likelihood(), which supports both Gaussian and Bernoulli likelihoods.
Monte Carlo estimation
If x_hat in vae_output contains S Monte Carlo samples, the expectation \(\mathbb{E}_{q(z \mid x)}\) is approximated by:
\[\mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] \approx \frac{1}{S} \sum_{s=1}^{S} \log p(x \mid z^{(s)}).\]
Broadcasting
If x_hat has shape [B, ...], it is expanded to [B, 1, ...]. x is broadcast to match the sample dimension of x_hat.
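The Monte Carlo average and the broadcasting rule can be illustrated with a small plain-Python sketch. The function names (log_p, add_sample_dim, mc_reconstruction_term) are hypothetical, the likelihood is assumed to be a unit-variance Gaussian, and nested lists stand in for tensors; the library itself may organize this differently.

```python
import math


def log_p(x_row, x_hat_row):
    """log p(x | z) summed over features, assuming a unit-variance Gaussian."""
    return sum(
        -0.5 * ((a - b) ** 2 + math.log(2.0 * math.pi))
        for a, b in zip(x_row, x_hat_row)
    )


def add_sample_dim(batch_x_hat):
    """Expand [B, D] to [B, 1, D], mirroring the broadcasting rule."""
    return [[row] for row in batch_x_hat]


def mc_reconstruction_term(batch_x, batch_x_hat):
    """batch_x: [B, D], batch_x_hat: [B, S, D]. Returns the batch-mean estimate."""
    per_item = []
    for x_row, samples in zip(batch_x, batch_x_hat):
        # The same x_row is reused for every sample, so x is never copied S times.
        per_item.append(sum(log_p(x_row, s) for s in samples) / len(samples))
    return sum(per_item) / len(per_item)


x = [[0.0]]                 # B = 1, D = 1
x_hat = [[[1.0], [-1.0]]]   # B = 1, S = 2, D = 1
estimate = mc_reconstruction_term(x, x_hat)
single = mc_reconstruction_term(x, add_sample_dim([[1.0]]))
print(estimate, single)
```

Both reconstructions here lie at the same distance from x, so the two-sample average equals the single-sample value.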
- Parameters:
x (torch.Tensor) – Ground-truth inputs of shape [B, ...].
vae_output (VAEOutput) – Output from the VAE forward pass. Expected fields include:
- x_hat (torch.Tensor): Reconstructed samples, shape [B, ...] or [B, S, ...].
- mu (torch.Tensor): Mean of \(q(z \mid x)\), shape [B, D_z].
- log_var (torch.Tensor): Log-variance of \(q(z \mid x)\), shape [B, D_z].
beta (float, optional) – Weighting factor for the KL term (beta-VAE). beta = 1 yields the standard VAE. Defaults to 1.
likelihood (str | LikelihoodType, optional) – Likelihood model for the reconstruction term ('gaussian' or 'bernoulli'). Defaults to Gaussian.
- Returns:
Result containing:
objective – Negative mean ELBO (scalar).
diagnostics – Dictionary with:
- "elbo": Mean ELBO over the batch.
- "log_likelihood": Mean reconstruction term \(\mathbb{E}_{q}[\log p(x \mid z)]\).
- "kl_divergence": Mean \(\mathrm{KL}(q \,\|\, p)\) over the batch.
- Return type:
LossResult
Notes
All returned diagnostics are batch means.
Gradients flow through the decoder; neither input is detached.
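As a rough sketch of how the objective and diagnostics relate, the snippet below assembles a negative beta-ELBO from per-item reconstruction terms and posterior parameters. It assumes a standard normal prior p(z) = N(0, I), for which the KL term of a fully factorized Gaussian has a closed form; that prior and the helper names (kl_to_standard_normal, negative_beta_elbo) are assumptions made for illustration, not confirmed details of the library.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_beta_elbo(log_likelihoods, mus, log_vars, beta=1.0):
    """log_likelihoods: [B] reconstruction terms; mus, log_vars: [B, D_z]."""
    batch_size = len(log_likelihoods)
    mean_ll = sum(log_likelihoods) / batch_size
    mean_kl = sum(
        kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars)
    ) / batch_size
    elbo = mean_ll - beta * mean_kl
    diagnostics = {
        "elbo": elbo,
        "log_likelihood": mean_ll,
        "kl_divergence": mean_kl,
    }
    # The objective to minimize is the negative mean ELBO.
    return -elbo, diagnostics


objective, diagnostics = negative_beta_elbo(
    log_likelihoods=[-2.0, -4.0],
    mus=[[1.0], [1.0]],
    log_vars=[[0.0], [0.0]],
    beta=2.0,
)
print(objective, diagnostics)
```

Note that beta scales only the KL term inside the objective; the "kl_divergence" diagnostic in this sketch is reported unweighted.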
- forward(x: Tensor, S: int = 1) → VAEOutput
Full VAE pass: encode, sample S times, decode.
- Parameters:
x (torch.Tensor) – Input batch of shape [B, ...].
S (int, optional) – Number of latent samples for Monte Carlo estimates. Defaults to 1.
- Returns:
Contains reconstructions x_hat, latent samples z, and the posterior parameters mu and log_var.
- Return type:
VAEOutput
Notes
If S > 1, loss computation can broadcast x to shape [B, S, ...] without materializing copies. For Bernoulli likelihoods, the decoder must output logits.
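The difference between training-mode sampling and evaluation-mode repeated means, as described for the VAE class, can be sketched for a single input as follows. The function sample_latents is hypothetical and uses plain lists of shape [S, D_z] instead of tensors; it is meant to show the reparameterization idea, not the behavior of the internal sampling layer in detail.

```python
import math
import random


def sample_latents(mu, log_var, S=1, training=True, rng=None):
    """Return S latent vectors (shape [S, D_z]) for a single input."""
    rng = rng or random.Random()
    std = [math.exp(0.5 * lv) for lv in log_var]
    samples = []
    for _ in range(S):
        if training:
            # Reparameterization: z = mu + std * eps with eps ~ N(0, 1).
            eps = [rng.gauss(0.0, 1.0) for _ in mu]
            samples.append([m + s * e for m, s, e in zip(mu, std, eps)])
        else:
            # Evaluation: repeat the posterior mean instead of sampling.
            samples.append(list(mu))
    return samples


mu, log_var = [0.5, -1.0], [0.0, 0.0]
train_z = sample_latents(mu, log_var, S=3, training=True, rng=random.Random(0))
eval_z = sample_latents(mu, log_var, S=3, training=False)
print(len(train_z), eval_z)
```

Because the noise eps is drawn independently of mu and log_var, the sample is a differentiable function of the posterior parameters, which is what allows gradients to reach the encoder during training.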
Data Structures
- class pyautoencoder.variational.VAEEncodeOutput(z: Tensor, mu: Tensor, log_var: Tensor)
Output of the VAE encoder stage.
- z
Latent samples of shape [B, S, D_z], produced by VAE._encode() or VAE.encode().
- Type:
torch.Tensor
- mu
Mean of the approximate posterior \(q(z \mid x)\), shape [B, D_z].
- Type:
torch.Tensor
- log_var
Log-variance of \(q(z \mid x)\), shape [B, D_z].
- Type:
torch.Tensor
- class pyautoencoder.variational.VAEDecodeOutput(x_hat: Tensor)
Output of the VAE decoder stage.
- x_hat
Reconstructions or logits of shape [B, S, ...], produced by VAE._decode() or VAE.decode().
- Type:
torch.Tensor
- class pyautoencoder.variational.VAEOutput(x_hat: Tensor, z: Tensor, mu: Tensor, log_var: Tensor)
Output of a full VAE forward pass.
- x_hat
Reconstructions or logits of shape [B, S, ...], produced by VAE.forward().
- Type:
torch.Tensor
- z
Latent samples of shape [B, S, D_z], produced by VAE.forward().
- Type:
torch.Tensor
- mu
Mean of \(q(z \mid x)\), shape [B, D_z].
- Type:
torch.Tensor
- log_var
Log-variance of \(q(z \mid x)\), shape [B, D_z].
- Type:
torch.Tensor
Adaptive Group Variational Autoencoder
- class pyautoencoder.variational.AdaGVAE(vae: VAE)
Bases: Module
Adaptive Group Variational Autoencoder (Ada-GVAE), from Locatello et al. (2020).
Wraps a VAE and adds adaptive posterior grouping for feature disentanglement. All VAE parameters are tracked through this wrapper.
forward() expects a pair of inputs (x1, x2) and returns an AdaGVAEOutput with adapted latent representations for both. For single-image inference after training, use model.vae.encode and model.vae.decode directly.
- compute_loss(x: tuple[Tensor, Tensor], vae_output: AdaGVAEOutput, beta: float = 1, likelihood: str | LikelihoodType = LikelihoodType.GAUSSIAN) → LossResult
Compute the combined ELBO for a pair of inputs with adapted posteriors.
\[\mathcal{L}(x_1, x_2; \beta) = \left[ \mathbb{E}_{q(\hat{z} \mid x_1)}[\log p(x_1 \mid \hat{z})] \;-\; \beta \, \mathrm{KL}(q(\hat{z} \mid x_1) \,\|\, p(\hat{z})) \right] + \left[ \mathbb{E}_{q(\hat{z} \mid x_2)}[\log p(x_2 \mid \hat{z})] \;-\; \beta \, \mathrm{KL}(q(\hat{z} \mid x_2) \,\|\, p(\hat{z})) \right].\]
- Parameters:
x (tuple[torch.Tensor, torch.Tensor]) – The (x1, x2) pair of ground-truth inputs, each of shape [B, ...].
vae_output (AdaGVAEOutput) – Output from forward() called in training mode.
beta (float, optional) – KL weighting factor. beta = 1 yields the standard objective. Defaults to 1.
likelihood (str | LikelihoodType, optional) – Likelihood model for the reconstruction term ('gaussian' or 'bernoulli'). Defaults to Gaussian.
- Returns:
Result containing:
objective – Sum of negative ELBOs for both inputs (scalar).
diagnostics – Dictionary with:
- "elbo": Sum of mean ELBOs for both inputs.
- "log_likelihood_x1": Mean reconstruction term for x1.
- "log_likelihood_x2": Mean reconstruction term for x2.
- "kl_divergence_x1": Mean KL divergence for x1.
- "kl_divergence_x2": Mean KL divergence for x2.
- Return type:
LossResult
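To make the relationship between the pair objective and its diagnostics concrete, the sketch below combines batch-mean reconstruction and KL terms for x1 and x2 into the summed ELBO. The function pair_objective is hypothetical and takes the four batch means as plain floats; computing those terms from adapted posteriors is left to the library.

```python
def pair_objective(ll_x1, kl_x1, ll_x2, kl_x2, beta=1.0):
    """Combine batch-mean terms for x1 and x2 into the pair objective."""
    elbo_x1 = ll_x1 - beta * kl_x1
    elbo_x2 = ll_x2 - beta * kl_x2
    elbo = elbo_x1 + elbo_x2
    diagnostics = {
        "elbo": elbo,
        "log_likelihood_x1": ll_x1,
        "log_likelihood_x2": ll_x2,
        "kl_divergence_x1": kl_x1,
        "kl_divergence_x2": kl_x2,
    }
    # The objective to minimize is the sum of the two negative ELBOs.
    return -elbo, diagnostics


objective, diagnostics = pair_objective(-3.0, 0.5, -5.0, 1.5, beta=2.0)
print(objective, diagnostics["elbo"])
```

Since the two ELBOs are summed rather than averaged, the objective for a pair is on roughly twice the scale of a single-input VAE loss with the same data and beta.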
- forward(x: tuple[Tensor, Tensor], S: int = 1) → AdaGVAEOutput
AdaGVAE training pass on a pair of images.
For single-image inference after training, use model.vae.encode and model.vae.decode.
- Parameters:
x (tuple[torch.Tensor, torch.Tensor]) – A (x1, x2) pair, each of shape [B, ...].
S (int, optional) – Number of latent samples for Monte Carlo estimates. Defaults to 1.
- Returns:
Adapted pair outputs containing reconstructions and posterior parameters for both inputs.
- Return type:
AdaGVAEOutput