I have recently come back from my first ever Python Conference (PyCon), and in fact, my first ever general development conference. This was quite a new experience, as I am used to either academic (e.g., ECIR) or data-centric (e.g., Strata) conferences. PyConUK was quite different in many ways from the events I am used to, and I could not be happier that I attended. The main reason is that Marco Bonzanini and I ran a workshop on Natural Language Processing in Python during the conference, but I also saw this as a great opportunity to get involved in a community I had never been close to, despite having coded in Python (intermittently) for several years.
Search is omnipresent these days, from the moment we type a set of keywords into our favourite search engine to find a webpage we are looking for, to the moment we type a name and expect our email client to find all the emails sent by that person. Both of these processes rest on years of research and experimentation in the field of Information Retrieval, aimed at finding the most relevant documents efficiently.
This blogpost will show how to set up Elasticsearch, one of the best and most popular search engines (with Solr being the other main alternative). Its main strength is that it offers remarkable scalability and advanced querying and indexing capabilities with minimal engineering effort. In addition, I will also show how to perform some basic operations using elastisch, a fantastic client library for Elasticsearch written in Clojure.
I have been preparing a couple of talks that I have to give in the next couple of weeks, and I needed some pictures of the people working at Signal to have some nice images of the team and the company in general. Although we have some of them stored online, I realised that our Twitter account had some of the best pictures, especially from the early days of the company. Almost at the same time, I was reading a blogpost about mining Twitter data with Python, written by my good friend and ex-colleague (at Queen Mary), Dr. Marco Bonzanini. These two events together seemed like a good excuse to build a little tool in Python to download the pictures that a Twitter account has published, and that is the main focus of this post. I hope you find it useful; I definitely have…
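The core of such a tool is picking the photo URLs out of the tweet JSON. As a rough sketch (not the exact code from my tool), here is how the extraction step might look, assuming tweets in the shape returned by Twitter's v1.1 REST API, where photos appear under the `extended_entities` (or `entities`) `media` list:

```python
def extract_media_urls(tweets):
    """Collect photo URLs from the media entities of a list of tweet dicts.

    Twitter's v1.1 API lists attached photos under 'extended_entities'
    (falling back to 'entities'), each with a 'media_url_https' field.
    """
    urls = []
    for tweet in tweets:
        entities = tweet.get("extended_entities") or tweet.get("entities", {})
        for item in entities.get("media", []):
            if item.get("type") == "photo":
                urls.append(item["media_url_https"])
    return urls
```

From there, downloading each URL to disk is a straightforward loop with `urllib` or `requests`.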
In the last blogpost, I focused on a basic piece of functionality that provided a solution for one of the Kaggle challenges using Python. This blogpost shows some improvements to the code itself, as well as to the classification process:
- Removing hand-rolled functionality that was already available in public libraries (pandas)
- Adding logging capabilities
- Including quality evaluation and cross-validation
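To give a flavour of the last two points, here is a minimal sketch of what logging plus a hand-rolled k-fold cross-validation split could look like. The function names (`k_fold_indices`, `cross_validate`) are mine for illustration, not from the original code, and a real project would likely lean on scikit-learn's `KFold` instead:

```python
import logging
import random

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger(__name__)


def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        log.info("fold %d: %d train / %d test samples", i, len(train), len(test))
        yield train, test
```

Each `(train, test)` pair can then be fed to the classifier, and the per-fold scores averaged to get a more reliable quality estimate than a single train/test split.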
I think that every developer should periodically use more than one programming language and more than one programming paradigm, both to stay knowledgeable and to avoid developing a “tunnel vision” that makes us believe some solutions are not possible just because our current paradigm does not support them.
For the last year or so, I have been using mainly one development language (Clojure). Do not get me wrong: I believe Clojure is the future and I love it as a language, but I do think that being a polyglot developer is something we should all aspire to. Therefore, I have decided to freshen up my Python skills by going back to basics and using it to solve one Kaggle competition.