Question

I am running the naive Bayes classifier algorithm through Apache Mahout. We have the option to set the n-gram size when training and running an instance of the algorithm.

Changing my n-gram size from 1 to 2 changes the resulting classification drastically. Why does this happen? How does the n-gram size make such a drastic change in the result?


Solution

1-grams (unigrams) are single words; 2-grams (bigrams) are pairs of adjacent words. It's the difference between classifying documents based on the separate occurrences of "United" and "States" versus the single feature "United States". Bigrams enlarge the feature space, which has some space and performance costs, but because they capture word order they will often give better results than unigrams alone.
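To illustrate why the feature sets differ so much, here is a minimal sketch (plain Python, independent of Mahout's internals) that extracts unigrams and bigrams from the same text:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the United States of America".split()

unigrams = ngrams(tokens, 1)  # each word as its own feature
bigrams = ngrams(tokens, 2)   # each pair of adjacent words as one feature

print(unigrams)
# [('the',), ('United',), ('States',), ('of',), ('America',)]
print(bigrams)
# [('the', 'United'), ('United', 'States'), ('States', 'of'), ('of', 'America')]
```

With unigrams, "United" and "States" are scored as independent evidence; with bigrams, "United States" is a single feature with its own probability, so the classifier's per-class likelihoods (and hence its decisions) can shift substantially.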

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow