After a long organisation process (explained in my last blogpost), the workshop on Recent Trends in News Information Retrieval (NewsIR) finally took place during the European Conference on Information Retrieval (ECIR) a couple of weeks ago in Italy.
The event started with a keynote from Jochen Leidner, Head of Research at Thomson Reuters UK. During his presentation, he covered some of the early innovations at Reuters. For instance, he explained how the founder of Reuters used pigeons to “be quicker than the news” that was being distributed by ship: by intercepting the news in Ireland, he could use the pigeons to beat the ships travelling to London. I always find this idea fascinating, not least because it casts pigeons as cutting-edge technology. Several decades on, Thomson Reuters is doing a lot of R&D, and among the most interesting comments were the fact that they rely heavily on Spark and that they use text generation for news much more commonly than I thought. Apparently, a large percentage of the (Reuters) news we consume is at least semi-automatically generated.
After this, Jochen went on to explain several products that Reuters is currently working on. The most interesting one for me was a project trying to detect side effects of “popular” medicines by exploiting and analysing social media (i.e., Twitter). The initial step is to listen to all of the 500M tweets generated in a given day and reduce them to those that mention some of the drugs of interest. In his case, Jochen focused on the 2,200 most commonly bought drugs, and his pipeline found 721 tweets/day referring to them. Given this data, it is possible to find correlations and uncover patterns of side effects that have not yet been documented. In some cases, such side effects could even potentially be used to treat or address other illnesses. Other cool projects mentioned focused on company risk mining, where a combination of heuristics, machine learning and human intervention is used to understand situations that could potentially jeopardise a company. Finally, he also explained some of the novel research they are doing on rumour detection.
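The first stage of a pipeline like this, reducing a huge tweet stream to the few mentions of drugs on a watch list, can be sketched very simply. This is a toy illustration of the filtering idea only, not the Reuters system; the drug names and tweets are hypothetical.

```python
# Minimal sketch: filter a tweet stream down to tweets mentioning any
# drug from a watch list. Illustrative only; not the actual Reuters pipeline.
import re

def build_drug_filter(drug_names):
    """Compile one regex that matches any drug name as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(d) for d in drug_names) + r")\b",
        re.IGNORECASE,
    )
    return lambda tweet_text: pattern.search(tweet_text) is not None

drugs = ["ibuprofen", "paracetamol", "aspirin"]  # hypothetical watch list
is_relevant = build_drug_filter(drugs)

stream = [
    "Took some Ibuprofen and now I feel dizzy",
    "What a lovely day in the park",
    "aspirin + coffee = my morning routine",
]
relevant = [t for t in stream if is_relevant(t)]
# keeps the first and third tweets
```

In practice the matching would need to handle misspellings and brand names, but the principle is the same: cheap filtering first, expensive analysis on the small remainder.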
One of the aspects that Jochen was very clear on is that the combination of humans and computer systems is critical for large-scale text processing pipelines. I share the same view. However, how and where to use this human intervention is a much more difficult question. The other message that resonated with me is the idea that Thomson Reuters is transitioning from “news into actionable intelligence”. With the obvious differences, this is something that we are also aiming to achieve at Signal.
After the keynote, we had four papers, each presented in only 10 minutes. The reason for this is that we also held a poster session where each presenter had to defend their work. This, we believed, was the best way to encourage interaction, and based on the feedback we received after the workshop, it was probably one of our best organisational decisions.
Suzan Verberne was our first presenter, and she reminded us that some tasks are recall oriented, where missing even a single relevant document is not acceptable. One traditional way of addressing this problem is to use very long and complex boolean keyword queries. Her work proposed a method to suggest candidate query terms from retrieved documents so that users can improve the quality of their searches. She also mentioned that although low precision might not be a problem for these users, it is a problem if we want to draw insights from the data.
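The general idea of suggesting query terms from retrieved documents can be sketched as below. To be clear, this is not Suzan Verberne's actual method, just the simplest possible version of the concept: rank terms from the retrieved set by frequency and offer the top ones, excluding the original query terms, as candidate expansions.

```python
# Toy query term suggestion: frequent terms in the retrieved documents,
# minus the query terms and a few stopwords, become candidate expansions.
# Illustrative sketch only, not the method from the paper.
from collections import Counter

def suggest_terms(query_terms, retrieved_docs, k=3):
    stop = set(query_terms) | {"the", "a", "of", "and", "in", "for", "is"}
    counts = Counter(
        word
        for doc in retrieved_docs
        for word in doc.lower().split()
        if word not in stop
    )
    return [term for term, _ in counts.most_common(k)]

docs = [
    "patent dispute over smartphone design",
    "smartphone patent lawsuit settled",
    "design patent filed by smartphone maker",
]
print(suggest_terms(["patent"], docs))  # 'smartphone' and 'design' rank highest
```

A real system would weight terms by something like TF-IDF against a background corpus rather than raw frequency, but the interaction is the same: the user inspects the suggestions and adds the useful ones to the boolean query.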
Next in the session was Igor Brigadir. His research focuses on event detection based on the diversity of social media content at specific points in time. The main idea is that, given a specific query, if there is an event going on, a lot of people will be talking about it, and the diversity of tweets within the feed should therefore decrease. The research should easily apply to news data as well, and one of the ideas Igor and I discussed is the possibility of adding a collection of tweet identifiers to enhance the Signal Media 1 Million Dataset. Once this is done, the collection will represent news, blogs and tweets for the same range of time. This could simplify, or even open up, several areas of research such as event detection, news bias or influencer detection, to name a few.
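The intuition that diversity drops during an event can be made concrete with a crude sketch: measure how dissimilar the tweets in a time window are to each other, and flag windows where that diversity falls. The similarity measure and the example tweets below are my own illustrative choices, not Igor's actual method.

```python
# Rough sketch of diversity-based event detection: during an event,
# tweets in a window become more similar, so diversity drops.
# Jaccard similarity over word sets is an illustrative stand-in.
from itertools import combinations

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def window_diversity(tweets):
    """1 minus the mean pairwise similarity; lower means less diverse."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 1.0
    return 1.0 - sum(jaccard(x, y) for x, y in pairs) / len(pairs)

quiet = ["lunch was great", "stuck in traffic again", "new phone who dis"]
event = [
    "earthquake hits the city",
    "huge earthquake in the city",
    "city earthquake right now",
]
assert window_diversity(event) < window_diversity(quiet)  # diversity drops
```

Flagging an event then amounts to comparing a window's diversity against the typical diversity for that query's feed.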
The last paper of this session won the best paper award of the workshop. Stefano Mizzaro explained how we can exploit news articles to enhance tweet categorisation, enriching the collection with new sets of words extracted from news webpages of the same temporal context. They tested three different features of news, namely volume, variety and freshness, and the experiments confirmed the importance of all three.
At this point, we had the first coffee break and could enjoy the amazing Italian coffee and sweets. The food was absolutely fantastic. Back at the workshop, all the posters were already set up, people had started to have interesting discussions during the break, and we were very happy already.
The second session of the morning started with Michael Bender from Thomson Reuters. Michael explained how they use document extraction and a two-step clustering method, where local clustering is applied first and the resulting clusters are then clustered again. Their main goal is to obtain and represent only events that matter. The paper also showed how they used human assessments on a 5-point scale, where the average quality achieved was around 80%.
The next paper, presented by Andrey Kutuzov and Elizaveta Kuzmenko, illustrated how to represent specific entities using word embeddings and how a temporal shift in those representations could potentially predict specific events. In addition, they showed how differing semantic representations across sources in different languages can be seen as a measure of bias.
Gregor Leban continued the session with a paper on data visualisation focused on events rather than news. According to him, traditional news aggregation causes duplicates to be shown to the user and does not allow for long-tail requests such as “Provide news about Machine Learning or Artificial Intelligence”. Their approach is based on per-document semantic annotation, event clustering and main event facts (driven by information extraction). Using these components, they are able to show events rather than articles, providing a more general understanding.
The last paper of the day illustrated how to explore a large news collection (the Signal Media Dataset) using data visualisation. Sérgio Nunes uncovered a lot of interesting information about the dataset using multiple visualisations, as part of the MediaViz Project.
At this point, the poster session officially started and I was very pleased to see people discussing their research in front of their posters. The interest was so high that people were still debating even after food was served downstairs…
The format of the workshop was quite different in the afternoon, where we wanted to discuss the tasks we should be focusing on and the collections we have at our disposal. In a similar fashion to the morning session, we started with a keynote: Julio Gonzalo presented some of his research on Online Reputation Management. In particular, Julio explained how they start their analysis on Twitter (considered the “central system for PR companies”) by analysing which topics have been mentioned. Once this step is done, a filter is applied, keeping only the topics that are relevant for the users. PR and communication experts focus on the long tail, which makes this field extremely complicated. For instance, they might represent medium-sized enterprises that are mentioned enough in the media to be relevant, but not enough for traditional analysis to be applied. Another factor to consider is that the problem should be addressed from a recall point of view, where every article counts.
The presentation mentioned RepLab, for which they have compiled 208K tweet URLs with manual annotations. Julio also shared that 32% of those contained opinions, while 57% contained “polar facts”, where there is a polarity even though there is no subjectivity. For instance, “Tesco is firing 15,000 people because their shares are dropping” would be a very negative story about Tesco, but an objective one nevertheless (assuming it is a true story). These types of complexity make the problem very difficult to address and require a change of view in the evaluation process, where we should focus on user-based evaluation. Finally, Julio, like many other people at the conference, advocated for systems that can easily be corrected via Adaptive Learning (or similar technologies). He went as far as to say that it is not critical if your system fails once, as long as we can incorporate this new knowledge so that the error will not happen again.
The workshop then focused on the Signal Media 1 Million Documents collection, with a talk from my colleague Dyaa Albakour. After explaining the rationale behind publishing the collection and presenting some of its features, we split the audience of the workshop into three groups to discuss the challenges we face, the data we would like to have, and the tasks we think we should focus on. At the end of this session, a representative from each group presented their outcomes. This is a summary of the main points that were raised:
Signal Media Dataset: People were very supportive and thankful for the new collection, but they were also critical of some of its limitations. Firstly, the range chosen for it (September 2015) makes it impossible to use for tasks (e.g., some temporal analyses) that require longer time spans. Other suggestions included integration with other information sources, such as multimedia content from articles or more multilingual documents. The other very common request, and an enhancement we are already looking into with Igor Brigadir, is to incorporate a Twitter dataset over the same period, yielding a unified collection of news, blogs and Twitter data. This would be very valuable for a multitude of different tasks, for instance reputation management.
Tasks: Some of the most commonly discussed tasks were summarisation (single-document, multi-document or temporal), entity linking and disambiguation, diversity and sentiment analysis. Another field that was mentioned several times is the evolution and verification of news. I would also include news bias, fact checking and controversy detection in this area.
One of the challenges we face for a potential NewsIR’17 is to decide if we want to focus on one or a few of these challenges. One of the main ideas that the audience came up with was to have a workshop with three types of tasks:
- Simple tasks using the collection. These could involve labelled data or not, but should be relatively straightforward to experiment with and evaluate. One example could be to generate summaries for a subset of the articles from the collection in order to build a new single-document text summarisation collection.
- Complex and novel tasks that require further research. My preference would be to focus these tasks on something related to reputation management, especially if we manage to combine our collection with social media.
- Any other tasks using a news-related collection. Even if the workshop becomes more focused in the next edition, we would like to encourage people to continue any line of research they want, applied to the Signal Media Dataset in particular and any news-related collection in general.
For the panel, we invited our two keynote speakers, Jochen and Julio, to be joined by Gabriella Kazai and Stefano Mizzaro. We thought that such a panel represents both the industry and academic worlds, with expertise in a multitude of IR and NLP fields represented by the experience and knowledge of the panelists. The original line of questioning for the panel was the relationship and differences between academics and practitioners. However, based on the feedback during the workshop, I decided to change it and focused instead on the following three topics:
1) Evaluation. Are we measuring the wrong thing?
Evaluation is a common topic in IR, and we are known for using the wrong metric and therefore solving the wrong problem. The best example I can find is that, before diversity was considered, if we retrieved a relevant document and 9 duplicates of it, our precision would be 100%. Nonetheless, we can all agree that, in the majority of cases, despite all the documents being relevant, only one of them is useful to the user. Moving towards more complex tasks (e.g., reputation management) implies that we should focus on user satisfaction when designing the evaluation process. Stefano and Julio also mentioned that IR already has a multitude of evaluation metrics. We probably do not need new ones, but rather to use several of them together to provide a better explanation. A very good suggestion from Jochen was to involve and invite journalists to the next NewsIR event. I think it is a fantastic idea and I will do my best to arrange it.
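The duplicates pitfall is easy to demonstrate numerically. Below is a worked instance, with made-up document identifiers: plain precision scores the duplicate-filled ranking perfectly, while a crude dedup-aware variant (a stand-in for proper diversity-aware metrics) does not.

```python
# Worked example of the evaluation pitfall: ten retrieved documents,
# all "relevant", but nine are duplicates of the first.
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)

def distinct_precision(retrieved, relevant, duplicates_of):
    """Credit only the first retrieved copy of each underlying document."""
    seen, hits = set(), 0
    for d in retrieved:
        canonical = duplicates_of.get(d, d)
        if canonical in relevant and canonical not in seen:
            seen.add(canonical)
            hits += 1
    return hits / len(retrieved)

retrieved = ["doc1"] + [f"dup{i}" for i in range(9)]   # doc1 plus 9 duplicates
duplicates_of = {f"dup{i}": "doc1" for i in range(9)}  # dupX is a copy of doc1
relevant = {"doc1"}

print(precision(retrieved, relevant | set(duplicates_of)))      # 1.0
print(distinct_precision(retrieved, relevant, duplicates_of))   # 0.1
```

The gap between 1.0 and 0.1 is exactly the difference between "all documents are relevant" and "only one of them is useful".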
2) Social vs News vs Blogs. Are they different at all?
The media interaction landscape has changed dramatically in the last decade or so. Traditional news outlets, blogs, social media and personal journalism are now intrinsically linked, and the boundaries between them become thinner as time passes. I am not suggesting that traditional news is no longer one of the most important and influential sources of information. However, some bloggers now have as much reputation (and potentially influence) as some of the major newspapers. In addition, we might struggle to classify some internet sources as news or blogs… The panelists agreed that the space is getting more complex, with clear dependencies between the different mediums. For example, it is not uncommon for a tweet to cause someone to write a short blogpost, which in turn causes someone to verify a story and write a short piece in a local newspaper that is eventually picked up by a major worldwide publication. This symbiosis should be acknowledged and addressed by the community. On the other hand, while some blogs could be seen as an authority on a topic, a large portion of the social media space could be considered of low journalistic quality, with little (or no) verification of news and a high likelihood of spreading rumours. If we want to analyse all the sources in a unified way, we need to create mechanisms to distinguish pieces of real news (from whatever source) from pseudo-news, rumours and clickbait.
3) How important is Trust and how can we help measure it?
Trust is a critical factor in news and has been the foundation of real journalism since its inception. However, the explosion of new sources of information has made it harder to know which sources are legitimate or trustworthy without detailed research. A very important question, then, is what we mean by trustworthy. We live in a world where, for any world event, we can find several points of view, some of which might be contradictory. Which source do we then label as “trustworthy”? For instance, we can compute who the main influencers in a given space are, but we face a more difficult challenge when we try to measure whether they are also experts in the field. Alternatively, some of us believe that the first step is to show the differences in opinion, educating users and allowing them to see how the sources differ from each other.
We touched on so many interesting topics during the workshop that the full day felt short of time. Also, everyone involved in the event was exceptionally helpful. I could not be happier with how it went…
We will go back to the drawing board and start thinking about the next event, as always with the support and collaboration of the community. If you want to know more, please subscribe to our Google discussion forum.
Once again, thank you everyone for a fantastic NewsIR’16!