# What does it mean to take the expectation with respect to a probability distribution?

I see this expectation in a lot of machine learning literature:

$$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})] = \int p(\mathbf{x};\mathbf{\theta}) f(\mathbf{x};\mathbf{\phi}) d\mathbf{x}$$

For example, in the context of neural networks, a slightly different version of this expectation is used as a cost function that is computed using Monte Carlo integration.
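To make the Monte Carlo part concrete, here is a minimal sketch of how I understand such an estimate: draw samples from $$p(\mathbf{x};\mathbf{\theta})$$ and average $$f$$ over them. The Gaussian choice for $$p$$, the squared-distance choice for $$f$$, and the helper name `monte_carlo_expectation` are purely illustrative assumptions of mine, not taken from any particular paper.

```python
import random
from statistics import fmean


def monte_carlo_expectation(sample, f, n_samples=100_000, seed=0):
    """Estimate E_{x ~ p}[f(x)] as the sample mean of f over draws from p.

    `sample(rng)` must return one draw from p; `f` is the function being averaged.
    """
    rng = random.Random(seed)
    return fmean(f(sample(rng)) for _ in range(n_samples))


# Illustrative assumption: p(x; theta) is Normal(mu, sigma^2) with theta = (mu, sigma),
# and f(x; phi) = (x - phi)^2.
mu, sigma = 1.0, 2.0
phi = 0.5

estimate = monte_carlo_expectation(
    sample=lambda rng: rng.gauss(mu, sigma),
    f=lambda x: (x - phi) ** 2,
)

# For this particular choice the integral has a closed form:
# E[(x - phi)^2] = Var[x] + (E[x] - phi)^2 = sigma^2 + (mu - phi)^2
exact = sigma**2 + (mu - phi) ** 2

print(f"Monte Carlo estimate: {estimate:.3f}")
print(f"Closed-form value:    {exact:.3f}")
```

In this toy case the sample average should land close to the closed-form value, and the gap should shrink as `n_samples` grows.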

However, I am a bit confused by the notation and would appreciate some clarification. In classical probability theory, the expectation:

$$\mathbb{E}[X] = \int_x x \cdot p(x) \ dx$$

indicates the “average” value of the random variable $$X$$. Taking it a step further, the expectation:

$$\mathbb{E}[g(X)]=\int_x g(x) \cdot p(x) \ dx$$

indicates the “average” value of the random variable $$Y=g(X)$$. From this, it seems that the expectation:

$$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})]$$

is shorthand for, and identical to:

$$\mathbb{E}_{\mathbf{x}}[f(\mathbf{x};\mathbf{\phi})]$$

where:

$$\mathbf{x} \sim p(\mathbf{x};\mathbf{\theta})$$

and that it indicates the average value of the random vector $$\mathbf{y} = f(\mathbf{x};\mathbf{\phi})$$. Is this correct?

By this logic, would the following statement also be correct?

$$\mathbb{E}[X] = \mathbb{E}_{p(X)}[X]$$