I have spoken before about the Kaggle ecosystem and the Digit recognition challenge, and I have also shown how to improve the original version of the code. However, no quality improvement over the initial solution was attempted. This blogpost focuses exactly on that: What can we do to improve the quality of our results?
In the last post, I focused on a basic piece of functionality that provided a solution for one of the Kaggle challenges using Python. This post shows some improvements to the code itself, as well as to the classification process:
- Removing some of the functionality that was available in public repositories (pandas)
- Adding logging capabilities
- Including quality evaluation and cross-validation
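To give a rough idea of what the last two points can look like together, here is a minimal sketch of k-fold cross-validation with logging, using only the standard library. It is not the code from this project: the function names (`k_fold_indices`, `cross_validate`) and the majority-class classifier are hypothetical stand-ins chosen for illustration, and any real classifier could be plugged in through `train_fn` and `predict_fn`.

```python
import logging
import random

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("digits")  # hypothetical logger name


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=5, seed=0):
    """Train on k-1 folds, test on the remaining one, and return each accuracy.

    Assumes k is not larger than the number of samples.
    """
    folds = k_fold_indices(len(features), k, seed)
    scores = []
    for fold_number, test_idx in enumerate(folds, start=1):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in test_idx)
        accuracy = correct / len(test_idx)
        logger.info("fold %d/%d: accuracy=%.3f", fold_number, k, accuracy)
        scores.append(accuracy)
    return scores


# Hypothetical stand-in classifier: always predicts the most common label.
def train_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, sample):
    return model
```

Calling `cross_validate(features, labels, train_majority, predict_majority, k=5)` returns one accuracy per fold and logs each of them as it goes, so the mean of the returned list gives a single quality estimate for the classifier.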
I think that every developer should periodically use more than one programming language and more than one programming paradigm, both to stay knowledgeable and to avoid developing a “tunnel vision” that makes us believe some solutions are impossible just because our current paradigm does not support them.
For the last year or so, I have been using mainly one programming language (Clojure). Do not get me wrong: I believe Clojure is the future and I love it as a language, but I do think that being a polyglot developer is something we should all aspire to. Therefore, I have decided to brush up on my Python skills by going back to the basics and using it to solve one Kaggle competition.