You could throw in some elements of theory. For example:
- the Naive Bayes classifier assumes that all input variables are independent given the class. That is often not true in practice, but the classifier is fast and simple, so it remains a good choice for many problems even when the independence assumption is violated.
- linear regression, when used for classification, gives too much weight to samples that are far away from the classification boundary. That's usually a bad idea.
- logistic regression is an attempt to fix this problem, but it still produces a linear decision boundary in the input variables. In other words, the boundary between the classes is a hyperplane in the input variable space.
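To make the first point concrete, here is a small sketch (my own illustrative example, not from your post) that compares Naive Bayes and logistic regression on a synthetic dataset where some features are deliberately correlated, so the Naive Bayes independence assumption is violated:

```python
# Illustrative sketch: Naive Bayes vs. logistic regression on data
# with correlated (redundant) features. The dataset and parameter
# choices are assumptions made for the example.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic binary problem: 4 informative features plus 4 redundant
# (linearly dependent) ones, violating the independence assumption.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, n_redundant=4,
                           random_state=0)

for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(clf.__class__.__name__, round(scores.mean(), 3))
```

Naive Bayes typically still gives a usable score here despite the violated assumption, which is why it remains a reasonable baseline.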
When I study a dataset, I typically start by drawing the distribution of each variable for each class of samples to find the most discriminating variables.
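A minimal sketch of this first step, using the iris dataset as a stand-in for your own data (the dataset and figure layout are illustrative assumptions):

```python
# Illustrative sketch: per-class distribution of each variable,
# to spot the most discriminating ones. Iris is a placeholder dataset.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# One histogram panel per input variable, one colour per class.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for i, ax in enumerate(axes.ravel()):
    for cls in range(len(data.target_names)):
        ax.hist(X[y == cls, i], bins=15, alpha=0.5,
                label=data.target_names[cls])
    ax.set_xlabel(data.feature_names[i])
axes[0, 0].legend()
fig.tight_layout()
fig.savefig("distributions.png")
```

Variables whose per-class histograms barely overlap are the most discriminating ones.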
Then, for each class of samples, I usually plot one input variable versus another to study the correlations between the variables. Are there non-linear correlations? If yes, I might choose classifiers that can handle such correlations. Are there strong correlations between two input variables? If yes, one of the variables could be dropped to reduce the dimensionality of the problem.
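The second step can be sketched as follows, again with iris as a placeholder and with an arbitrary choice of which two variables to plot against each other:

```python
# Illustrative sketch: scatter plot of one variable vs. another,
# coloured by class, plus a linear correlation matrix.
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# Variable 0 vs. variable 2: reveals (possibly non-linear)
# correlations and how well the classes separate.
fig, ax = plt.subplots()
for cls in range(len(data.target_names)):
    ax.scatter(X[y == cls, 0], X[y == cls, 2], alpha=0.6,
               label=data.target_names[cls])
ax.set_xlabel(data.feature_names[0])
ax.set_ylabel(data.feature_names[2])
ax.legend()
fig.savefig("scatter.png")

# Linear correlation matrix: a pair with |corr| close to 1 is a
# candidate for dropping one of the two variables.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

Note that the correlation matrix only catches linear relationships; the scatter plots are what reveal non-linear ones.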
These plots will also allow you to spot problems in your dataset.
That said, trying many classifiers and optimizing their parameters for the best cross-validation score, as you have done, is a pragmatic and valid approach, and it has to be done at some point anyway.
I understand from the tags in this post that you have used the classifiers of scikit-learn. In case you have not noticed yet, this package also provides powerful tools for cross-validation: http://scikit-learn.org/stable/modules/cross_validation.html
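For example, `cross_val_score` evaluates a classifier with k-fold cross-validation in one call, and `GridSearchCV` wraps the parameter optimization you describe (the classifier and parameter grid below are illustrative choices, not taken from your post):

```python
# Sketch of scikit-learn's cross-validation tools; SVC and its
# parameter grid are assumed here purely for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Plain 5-fold cross-validation of a single classifier.
scores = cross_val_score(SVC(), X, y, cv=5)
print("mean accuracy:", scores.mean())

# Grid search: tunes hyper-parameters with cross-validation built in.
grid = GridSearchCV(SVC(),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

This way the parameter optimization and the cross-validation are done together, which avoids accidentally tuning on the test folds.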