Combine multiple classifiers to build a multi-modal classifier
16-10-2019
Problem
Suppose I am interested in classifying a set of instances composed of different content types, e.g.:
- a piece of text
- an image
as relevant or non-relevant for a specific class C.
In my classification process I perform the following steps:
- Given a sample, I subdivide it into text and image
- A first SVM binary classifier (SVM-text), trained only on text, classifies the text as relevant/non-relevant for the class C
- A second SVM binary classifier (SVM-image), trained only on images, classifies the image as relevant/non-relevant for the class C
Both SVM-text and SVM-image produce an estimate of the probability that the analyzed content (text or image) is relevant for the class C. Given this, I am able to state whether the text is relevant for C and whether the image is relevant for C.
However, these estimates are valid only for segments of the original sample (either the text or the image), and it is not clear how to obtain an overall verdict on the whole original sample (text+image). How can I conveniently combine the opinions of the two classifiers so as to obtain a classification for the whole original sample?
Solution
Basically, you can do one of two things:
- Combine features from both classifiers. I.e., instead of SVM-text and SVM-image you may train a single SVM that uses both textual and visual features.
- Use ensemble learning. If you already have probabilities from separate classifiers, you can simply use them as weights and compute a weighted average. For more sophisticated cases there are Bayesian combiners (each classifier has its own prior), boosting algorithms (e.g. see AdaBoost) and others.
Note that ensembles were initially created for combining different learners, not different sets of features. In this latter case, ensembles have an advantage mostly when the different kinds of features simply can't be combined into a single vector efficiently. But in general, combining features is simpler and more straightforward.
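The weighted-average ensemble option can be sketched as follows. The weights and the example probability values are illustrative assumptions (in practice you might derive the weights from each classifier's validation accuracy):

```python
def combine_probabilities(p_text, p_image, w_text=0.6, w_image=0.4):
    """Weighted average of the per-modality probabilities that a sample
    is relevant for class C. Weights are assumed, e.g. proportional to
    each classifier's validation performance."""
    return (w_text * p_text + w_image * p_image) / (w_text + w_image)

# Example: SVM-text is fairly sure the text is relevant, SVM-image less so.
p = combine_probabilities(0.9, 0.4)
label = "relevant" if p >= 0.5 else "non-relevant"
```

A fixed threshold of 0.5 is the simplest decision rule; the threshold itself can also be tuned on held-out data if false positives and false negatives have different costs.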