# When generating samples using a variational autoencoder, we decode samples from $N(0,1)$ instead of $\mu + \sigma N(0,1)$

Context: I’m trying to understand the use of variational autoencoders as generators. My understanding:

• During training, for an input point $x_i$ we want to learn latents $\mu_i$ and $\sigma_i$, sample $z_i \sim N(\mu_i, \sigma_i)$, and feed it to the decoder to get a reconstruction $\hat{x}_i = \text{decode}(z_i)$.
• But we can’t backpropagate through the sampling operator, so instead we reparametrize and use $z_i = \mu_i + \sigma_i \epsilon$ where $\epsilon \sim N(0, 1)$. Our reconstruction becomes $\hat{x}_i = \text{decode}(\mu_i + \sigma_i \epsilon)$.
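The reparameterization step above can be sketched in NumPy. The encoder outputs `mu_i` and `sigma_i` below are hypothetical placeholders, not values from any real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness is isolated in eps, so gradients can flow
    through mu and sigma during backpropagation."""
    eps = rng.standard_normal(size=mu.shape)
    return mu + sigma * eps

# Hypothetical encoder outputs for one training example x_i.
mu_i = np.array([0.5, -1.0])
sigma_i = np.array([0.1, 0.2])

# z_i is distributed as N(mu_i, sigma_i) per dimension.
z_i = reparameterize(mu_i, sigma_i, rng)
```

Averaging many such samples recovers `mu_i`, and their spread recovers `sigma_i`, confirming that the deterministic-plus-noise form really does sample from $N(\mu_i, \sigma_i)$.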

However, when we’re done with training and ready to use the model as a generator, we sample $z \sim N(0, 1)$ and feed it to the decoder: $x_{\text{sample}} = \text{decode}(z)$.

The part that confuses me is that during training, the decode operation used $\mu_i + \sigma_i \epsilon$, which to my understanding means sampling from $N(\mu_i, \sigma_i)$ with a different $\mu_i$ and $\sigma_i$ for each training example. At generation time, however, the decode operation is (effectively) applied to $\epsilon$ alone, drawn from $N(0, 1)$. Why are we setting $\mu = 0$ and $\sigma = 1$ during generation (i.e. using $z = 0 + 1 \cdot \epsilon$)?

During training, we are drawing $z \sim P(z|x)$, and then decoding with $\hat x = g(z)$.
During generation, we are drawing $z \sim P(z)$, and then decoding $x = g(z)$.
This works because the VAE training objective includes a KL-divergence term that pushes each approximate posterior $N(\mu_i, \sigma_i)$ toward the prior $P(z) = N(0, 1)$. After training, the latent codes are therefore distributed approximately like the prior, so decoding samples from $N(0, 1)$ lands the decoder in the same region of latent space it saw during training.
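The two sampling regimes can be contrasted in a short sketch. The decoder here is a hypothetical fixed linear map standing in for a trained network, and the $\mu_i$, $\sigma_i$ values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim = 2

def decode(z):
    # Hypothetical "trained" decoder: any deterministic map from
    # latent space to data space illustrates where z comes from.
    W = np.array([[1.0, 0.5],
                  [0.0, 2.0],
                  [1.0, 1.0]])
    return W @ z

# Training time: z ~ P(z|x_i) = N(mu_i, sigma_i), via reparameterization.
mu_i = np.array([0.5, -1.0])
sigma_i = np.array([0.1, 0.2])
z_train = mu_i + sigma_i * rng.standard_normal(latent_dim)
x_recon = decode(z_train)

# Generation time: z ~ P(z) = N(0, 1) -- no encoder, no mu or sigma.
z_gen = rng.standard_normal(latent_dim)
x_sample = decode(z_gen)
```

The only difference between the two calls to `decode` is which distribution $z$ was drawn from; the decoder itself is identical in both regimes.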