I understand how convolution works but I don’t get how 1D convolutions are applied to 2D data.
In this example you can see a 2D convolution applied to 2D data.
But what would it look like with a 1D convolution?
Is it just a 1D kernel sliding in the same way? And what if the stride were 2?
Let $x_1, \dots, x_n$ be a sequence of vectors (e.g., word vectors). Applying a convolutional layer is equivalent to applying the same weight matrices to every window of $k$ consecutive vectors (every $k$-gram), where $k$ is the height of your filter. E.g., if $k=3$, you can visualize it as follows:
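To make the sliding concrete, here is a minimal NumPy sketch. It is only an illustration: the function name `conv1d_over_sequence`, the toy input, and the all-ones filter are my own assumptions, and it uses a single filter without padding or a nonlinearity. The filter has height `k` (the window size) and spans the full vector dimension, so it can only slide along the sequence axis. The stride just controls how far the window jumps between positions.

```python
import numpy as np

def conv1d_over_sequence(x, w, b=0.0, stride=1):
    """Slide one filter over a sequence of vectors.

    x: array of shape (seq_len, dim), one row per vector (e.g. word vector)
    w: array of shape (k, dim), a filter of height k covering the full dim
    Returns one scalar per window position.
    """
    seq_len, _ = x.shape
    k = w.shape[0]
    out = []
    for start in range(0, seq_len - k + 1, stride):
        window = x[start:start + k]          # k consecutive vectors, shape (k, dim)
        out.append(np.sum(window * w) + b)   # the same weights are reused for every window
    return np.array(out)

# Toy data (hypothetical): 6 "word vectors" of dimension 2.
x = np.arange(12, dtype=float).reshape(6, 2)
w = np.ones((3, 2))  # filter height k = 3

y1 = conv1d_over_sequence(x, w, stride=1)  # windows start at positions 0, 1, 2, 3
y2 = conv1d_over_sequence(x, w, stride=2)  # windows start at positions 0, 2
print(y1)
print(y2)
```

With stride 2 the output is simply every second value of the stride-1 output, since the filter skips every other window.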
For a slightly more mathematical explanation, you can check out
Ji Young Lee, Franck Dernoncourt. “Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks”. NAACL 2016. Section 2.1.2: