I’m building a sentence classifier with a Convolutional Neural Network (CNN) architecture. I would like to do the word embedding outside of my CNN using a pre-trained model such as the GoogleNews vectors (trained with word2vec). I’m wondering whether it is worthwhile to add part-of-speech information to this model and, if so, how?
I see the following options:
1. Use just word2vec to embed words into 300-feature vectors
2. Use two channels in my CNN, one for word2vec and one for the part-of-speech tag. Do I then have to embed the part-of-speech tag into 300 features too?
3. Embed the part-of-speech tag into some other number of features (say 20) and concatenate this 20-feature vector to the word2vec vector (resulting in 320-feature vectors)
If #2 or #3 is preferable, which methods are available to embed a POS tag into a vector representation?
1. Concatenating word2vec and POS features
Adding POS information to your classifier is reasonable. You will of course want a train/dev/test split, or e.g. 5-fold cross-validation, to measure how much this information actually improves your results. The effect is data-dependent, so only experiments on your own data can tell you.
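A minimal sketch of that comparison, assuming NumPy and a hypothetical `train_and_score(X_train, y_train, X_test, y_test)` function that you would supply yourself (it trains your CNN on one split and returns a held-out metric such as accuracy). The fold construction is plain k-fold cross-validation.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def cross_validate(X, y, train_and_score, k=5, seed=0):
    """Mean held-out score; train_and_score is a hypothetical user-supplied callable."""
    scores = [train_and_score(X[tr], y[tr], X[te], y[te])
              for tr, te in k_fold_indices(len(y), k, seed)]
    return float(np.mean(scores))
```

Calling `cross_validate` once with the 300-feature word2vec inputs and once with the POS-augmented inputs, using the same `seed`, evaluates both variants on identical folds, so the difference in mean score reflects the POS features rather than a different data split.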
To combine the POS and word2vec features, you can simply concatenate them. I assume that by ‘CNN’ you mean a 1-dimensional CNN, is that right? If you were using only word2vec features, your input data would then be something like:
[batch size][sequence length][word2vec dimensions (i.e. 300)]
i.e. a batch_size * sequence_length * word2vec_dim tensor. After concatenating the POS features, your input tensor becomes:
[batch size][sequence length][word2vec dimensions (i.e. 300) + POS dimensions (e.g. 20)]
i.e. a batch_size * sequence_length * 320 tensor.
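On the question of how to embed the POS tag: two common choices are a fixed one-hot encoding (one dimension per tag) and a dense lookup table, which in a real model would be a trainable embedding layer learned together with the CNN. The NumPy sketch below shows both and then the concatenation described above. The tag set (the 17 Universal Dependencies UPOS tags), the 20-dimensional size, the example tags, and the zero-filled stand-in for word2vec vectors are all illustrative assumptions.

```python
import numpy as np

# Assumed tag set: Universal Dependencies UPOS tags; use whatever your tagger outputs.
POS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
            "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]
TAG_TO_ID = {tag: i for i, tag in enumerate(POS_TAGS)}

def pos_one_hot(tags):
    """Option A: fixed one-hot vectors, one dimension per tag (17 here)."""
    out = np.zeros((len(tags), len(POS_TAGS)), dtype=np.float32)
    out[np.arange(len(tags)), [TAG_TO_ID[t] for t in tags]] = 1.0
    return out

# Option B: dense lookup table with pos_dim features per tag. Randomly
# initialized here; in a real model this table would be trained.
pos_dim = 20
rng = np.random.default_rng(0)
pos_table = rng.normal(scale=0.1, size=(len(POS_TAGS), pos_dim)).astype(np.float32)

def pos_dense(tags):
    """Look up one pos_dim-dimensional row per tag."""
    return pos_table[[TAG_TO_ID[t] for t in tags]]

def combine(w2v, pos):
    """Concatenate on the feature axis: (..., seq, 300) + (..., seq, d) -> (..., seq, 300 + d)."""
    return np.concatenate([w2v, pos], axis=-1)

# Hypothetical batch: 2 sentences of 4 tokens; zeros stand in for real word2vec vectors.
batch_tags = [["DET", "NOUN", "VERB", "ADV"], ["PRON", "VERB", "DET", "NOUN"]]
w2v_batch = np.zeros((2, 4, 300), dtype=np.float32)
pos_batch = np.stack([pos_dense(t) for t in batch_tags])  # shape (2, 4, 20)
x = combine(w2v_batch, pos_batch)
print(x.shape)  # (2, 4, 320)
```

With the one-hot option you would get 300 + 17 = 317 features instead of 320. If you feed the result into PyTorch's `nn.Conv1d`, note that it expects channels-first input `(batch, channels, length)`, so you would transpose to `(batch, 320, seq)` and set `in_channels=320`; Keras's `Conv1D` defaults to channels-last `(batch, steps, channels)` and takes the tensor as it is.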
You might also want to check out sense2vec (Trask et al., 2016, https://arxiv.org/pdf/1511.06388.pdf), which uses POS information to disambiguate word2vec embeddings:
“This paper presents a novel approach which addresses these concerns by modeling multiple embeddings for each word based on supervised disambiguation, which provides a fast and accurate way for a consuming NLP model to select a sense-disambiguated embedding. We demonstrate that these embeddings can disambiguate both contrastive senses such as nominal and verbal senses as well as nuanced senses such as sarcasm. We further evaluate Part-of-Speech disambiguated embeddings on neural dependency parsing, yielding a greater than 8% average error reduction in unlabeled attachment scores across 6 languages.”
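The core idea of POS-disambiguated embeddings can be sketched as a preprocessing step: merge each token with its POS tag into a single vocabulary key, so that `duck|NOUN` and `duck|VERB` get separate vectors. This is a simplified illustration, not the paper's full method. The `word|TAG` key format is an assumption (it matches the style of the open-source sense2vec library from Explosion), and the toy vocabulary and random vectors are placeholders for vectors you would obtain by training word2vec on a corpus rewritten into such keys.

```python
import numpy as np

def sense_keys(tokens, tags):
    """Merge each lowercased token with its POS tag into one key, e.g. 'duck|NOUN'."""
    return [f"{tok.lower()}|{tag}" for tok, tag in zip(tokens, tags)]

# Hypothetical toy vocabulary with random 300-d vectors (placeholders only).
rng = np.random.default_rng(0)
vocab = ["i|PRON", "saw|VERB", "a|DET", "duck|NOUN", "duck|VERB", "they|PRON"]
vectors = {key: rng.normal(size=300).astype(np.float32) for key in vocab}

def embed(tokens, tags, dim=300):
    """One vector per (word, POS) key; unknown keys map to a zero vector."""
    return np.stack([vectors.get(k, np.zeros(dim, dtype=np.float32))
                     for k in sense_keys(tokens, tags)])

noun = embed(["duck"], ["NOUN"])[0]
verb = embed(["duck"], ["VERB"])[0]
print(np.allclose(noun, verb))  # False: the two senses have separate vectors
```

The resulting `(seq, 300)` matrix can replace the plain word2vec input to the CNN, which encodes the POS information in the embedding itself rather than as extra feature dimensions.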