Is a Bayesian Classifier a good approach for text with numerical meta-data?

I’m trying to come up with an approach for detecting scam adverts on my website. I think the problem has a lot in common with detecting spam email (for which a naive Bayesian classifier is a common solution) since many of the signals that indicate a scam will be found within the text of the advert.

However, there are certain other pieces of information which can be good scam indicators, but I’m not sure if/how a Bayes classifier could use them, because they involve numeric values (with values at the extremes of the range being suspicious) rather than simple binary values corresponding to the presence or absence of a word in the text.

For example, many scam adverts have the price of the item set very low (to attract lots of views), so I would like a lower than normal price to be a strong indicator that the advert may be a scam.

Is Bayes still a good fit for my requirement? If not, could you recommend a different approach?

Answer

Sure, you can use Naive Bayes. You just have to specify what form the class-conditional distribution of each numeric feature will have.

I can think of a few options:

  1. Binary distribution: Binarize the numeric feature using a threshold (e.g. "price below some cutoff"), and you revert to the binary-feature problem that you were already solving.
  2. Parametric distribution: If some parametric distribution fits the feature reasonably well, e.g. a Gaussian, estimate its parameters for each class and use that.
  3. Non-parametric distribution: Decide on bins for the numerical data and use the per-class bin frequencies to construct an empirical non-parametric distribution.
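
To make the three options concrete, here is a minimal sketch in plain Python of how each one could turn a price into a per-class log-likelihood. Everything in it is an assumption for illustration: the toy prices, the threshold of 20, the bin edges, the smoothing constant `alpha`, and the function names are all hypothetical and not taken from the question.

```python
import math
from bisect import bisect_right

# Hypothetical toy training data: prices of adverts already labelled by hand.
scam_prices = [5.0, 8.0, 10.0, 12.0, 150.0]
legit_prices = [90.0, 110.0, 120.0, 140.0, 160.0]


def bernoulli_loglik(price, train, threshold=20.0, alpha=1.0):
    """Option 1: binarize as 'price < threshold', then use a smoothed Bernoulli."""
    low = sum(p < threshold for p in train)
    p_low = (low + alpha) / (len(train) + 2 * alpha)
    return math.log(p_low if price < threshold else 1.0 - p_low)


def gaussian_loglik(price, train):
    """Option 2: fit a Gaussian to this class's prices and evaluate its log-density."""
    mean = sum(train) / len(train)
    var = sum((p - mean) ** 2 for p in train) / len(train)
    return -0.5 * math.log(2 * math.pi * var) - (price - mean) ** 2 / (2 * var)


def binned_loglik(price, train, edges=(20.0, 100.0), alpha=1.0):
    """Option 3: empirical histogram over fixed bins, with additive smoothing."""
    n_bins = len(edges) + 1
    counts = [0] * n_bins
    for p in train:
        counts[bisect_right(edges, p)] += 1
    idx = bisect_right(edges, price)
    return math.log((counts[idx] + alpha) / (len(train) + alpha * n_bins))


def scam_log_odds(price, loglik, prior_scam=0.5):
    """Log posterior odds of scam vs. legitimate, using the price feature alone."""
    return (math.log(prior_scam) + loglik(price, scam_prices)
            - math.log(1.0 - prior_scam) - loglik(price, legit_prices))


for name, fn in [("binary", bernoulli_loglik),
                 ("gaussian", gaussian_loglik),
                 ("binned", binned_loglik)]:
    print(name, scam_log_odds(9.0, fn), scam_log_odds(120.0, fn))
```

In a full classifier, the price term would simply be added to the per-word log-likelihoods for each class, since Naive Bayes treats features as conditionally independent given the class. With real data, a Gaussian may fit prices poorly (they are often skewed), so modelling the log of the price or using bins may be worth trying.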

Attribution
Source: Link, Question Author: codebox, Answer Author: Bitwise
