CLEF RepLab Overview

I was invited by my friend Dr. Julio Gonzalo to be one of the keynote speakers at a workshop on Reputation Management that took place at CLEF 2014 in Sheffield last week, and this post contains my thoughts about the event.

First of all, I regret attending the conference for only two days and having to miss PAN (on plagiarism detection and author identification and profiling) and some of the keynotes, especially the one given by Professor Fabio Ciravegna.

The main topic of RepLab was Reputation Management. This area focuses on changes in how users perceive different entities, usually companies or brands. Such changes might be caused by global events, marketing campaigns or other unforeseen situations. From a research point of view, the traditional role of reputation management is to define, monitor and evaluate these changes, and to provide some kind of analysis or report to human analysts, who then act according to the situation, mainly by mitigating potential threats or by understanding and maximising potential rewards.

One factor that has changed the landscape of the field is the omnipresence of social media, with Twitter as the biggest challenge for online reputation management. Why Twitter? Not only is its sheer volume (500 million tweets a day) a challenge, but the small amount of text in each tweet also makes any text analysis much more complex.

The workshop proposed several challenges to be addressed by the research community. One of them was particularly interesting to me: given a specific tweet, participants had to predict the “type” of its author (journalist, professional, authority, activist, investor, company, or celebrity). This task proved to be very difficult (at least in the setting defined by the organisers), and none of the teams achieved a quality high enough for the solution to be applied in a commercial product.

The workshop had several interesting papers, and I would like to say a bit more about one of them, which (in one of its parts) extended the content and knowledge contained in tweets by using external sources in a simple yet elegant way. Graham McDonald and the team at Glasgow proposed using the content of a tweet as a query against a completely different collection (ClueWeb) in order to retrieve documents similar to the original tweet. They then analysed the top retrieved documents and selected the most “representative” terms from this subset to extend the representation of the tweet. This seems like a very clever use of document similarity to do “query expansion” with tweets, and I would even say it could be used in other scenarios, such as text classification, especially with noisy or short documents, to obtain better document representations. After the talk, several people also proposed new approaches to the authors. I would like to hear more about this approach and to experiment with it in the classification domain (the paper can be found here).
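The general idea (using an external collection for pseudo-relevance feedback on a tweet) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Glasgow team's implementation: the tiny hypothetical `EXTERNAL_DOCS` list stands in for ClueWeb, retrieval is a simple IDF-weighted term overlap rather than a real search engine, and the “representative” terms are simply the most frequent non-query terms in the top `k` retrieved documents. The names `expand_tweet`, `retrieve` and `EXTERNAL_DOCS` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a large external collection such as ClueWeb.
EXTERNAL_DOCS = [
    "phone battery overheating after the latest update",
    "owners report battery overheating and poor battery life on the phone",
    "bank announces quarterly results and a dividend increase",
    "football club signs a striker before the transfer deadline",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "a", "on", "my", "is", "after", "before"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def idf_table(docs):
    """Inverse document frequency for every term in the collection."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def retrieve(query_terms, docs, idf, k):
    """Rank documents by the summed IDF of query terms they contain; return the top k."""
    scored = []
    for i, doc in enumerate(docs):
        doc_terms = set(tokenize(doc))
        score = sum(idf[t] for t in set(query_terms) if t in doc_terms)
        if score > 0:
            scored.append((-score, i))  # negate so that sort() puts the best first
    scored.sort()
    return [docs[i] for _, i in scored[:k]]


def expand_tweet(tweet, docs, k=2, n_terms=2):
    """Append the n_terms most frequent non-query terms from the top-k retrieved docs."""
    idf = idf_table(docs)
    query = tokenize(tweet)
    feedback_docs = retrieve(query, docs, idf, k)
    counts = Counter()
    for doc in feedback_docs:
        counts.update(t for t in tokenize(doc) if t not in query)
    # Highest frequency first; ties broken alphabetically for deterministic output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query + [term for term, _ in ranked[:n_terms]]


print(expand_tweet("My battery is dead again", EXTERNAL_DOCS))
# ['battery', 'dead', 'again', 'overheating', 'phone']
```

With these toy documents, the tweet is extended with `overheating` and `phone`, terms it never mentions but which co-occur with `battery` in the retrieved documents. A real system would probably use a proper retrieval model and a term-selection score that also penalises terms that are common across the whole collection.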

In addition to the other talks, we had an invited speaker from Llorente & Cuenca, one of the biggest PR firms in Spain, Portugal and Central and South America, who gave a picture much closer to the user and explained the context of the research field as a whole. Llorente & Cuenca have been involved with the workshop for the last three years, and during this time they have provided direct support to the organisers, as well as annotated data for training and evaluation.

The second keynote speaker was Jussi Karlgren, who gave an inspiring presentation about how we try to solve research problems by focusing on the wrong part of the problem, mainly by (hyper-)tuning specific parameters and throwing all the data we have into an SVM classifier… His point, which I support to a large extent, is that we should focus much more on the users, the features we analyse and the evaluation of the process as a whole. He also criticised the use of F1 as an evaluation metric in papers unless it is accompanied by other metrics such as Precision and Recall. The main reason for this criticism is that F1 hides some characteristics of the models by collapsing them into a single number, and he claimed we do this so that we can rank and compare models easily.
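Karlgren's point about F1 is easy to see with a small worked example. The confusion counts below are made up for illustration (they are not results from any RepLab system): two hypothetical classifiers get exactly the same F1, computed here as 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall, yet they behave very differently.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for the positive class (hypothetical numbers)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Counts) -> float:
    # Equivalent to 2PR / (P + R), written in terms of counts.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Half of its positive predictions are wrong, and it misses half of the positives.
model_a = Counts(tp=50, fp=50, fn=50)
# Never wrong when it predicts positive, but misses two thirds of the positives.
model_b = Counts(tp=25, fp=0, fn=50)

for name, counts in [("A", model_a), ("B", model_b)]:
    print(f"{name}: P={precision(counts):.2f} R={recall(counts):.2f} F1={f1(counts):.2f}")
# A: P=0.50 R=0.50 F1=0.50
# B: P=1.00 R=0.33 F1=0.50
```

Reporting only F1 = 0.50 would hide that model B is far more conservative, which may matter a lot when false alarms and missed cases have different costs.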

The last invited talk was mine, in which I discussed the gap between the research community and the reality of commercial applications, especially from the reputation management and brand monitoring point of view. After a fairly general background about myself and the field, I focused on how we should concentrate much more on the user, from both the development and the research point of view. This involves thinking about how users would use the system, as well as including them in the evaluation process at a deeper level. I am happy to say that the IR community is focusing more and more on user- and task-based evaluation, but more work has to be done. The other main part of my talk was about how to merge research and reality by increasing the collaboration between the development and research groups within a company, as well as accepting that researchers are, in the great majority of cases, not good developers. One clear example is that most of the tools and frameworks coming from the academic community would be considered substandard by professional software developers. This is, in my honest opinion, one of the main obstacles to integrating research into real-life products. However, I am not a pessimist: I am sure we can (and must) improve this, and the solution might be to ask the developer community for help. Surprisingly enough, several people at the workshop seemed to agree with my point of view, and no-one threw anything at me…

Another idea that I wanted to put across was that more research should be done on how to involve users in supporting semi-automatic systems, and that “creating more training data” is a real option in practice, in addition to changing the algorithm used to solve a given problem. I would like to see more research on how to pick the best data (e.g., active learning and incremental learning) and on how to integrate human knowledge into the process.
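As one concrete example of picking the best data, a common active learning strategy is uncertainty sampling: ask a human to label the items the current model is least sure about. The sketch below assumes a binary classifier that outputs a positive-class probability for each unlabelled tweet; the probabilities and the `pick_for_annotation` name are hypothetical, and this is only one of many possible selection strategies.

```python
def pick_for_annotation(probabilities, budget):
    """Return indices of the `budget` items whose positive-class probability is
    closest to 0.5, i.e. where a binary classifier is least certain."""
    by_uncertainty = sorted(
        range(len(probabilities)),
        key=lambda i: (abs(probabilities[i] - 0.5), i),  # ties broken by index
    )
    return by_uncertainty[:budget]


# Hypothetical model outputs for five unlabelled tweets.
scores = [0.95, 0.52, 0.10, 0.45, 0.70]
print(pick_for_annotation(scores, budget=2))  # [1, 3]
```

In an incremental setup, the newly labelled tweets would be added to the training set, the model retrained, and the selection repeated until the annotation budget runs out.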

The last part of the talk introduced the philosophy we follow at Signal, where the company is founded on three main components: Research, Technology, and Product.

I do think that the community is improving its relationship with the practitioners who apply research concepts, but more work has to be done to include users in the process, and better tooling should be produced.

The Practical Academic

Merging the best research in Text Analytics with practical and commercial perspectives