WeVerify Annotation team @EUvsVirus Hackathon

Published May 3rd, 2020 in Events, News

Europe has joined forces to develop innovative solutions to coronavirus-related challenges through EUvsVirus, the official EU Commission hackathon to fight COVID-19.

Among the many areas participants were invited to address, our team focused on Social & Political Cohesion (Mitigating fake news spreading), where our experience and knowledge of disinformation helped us understand the problem and propose solutions.

A team was formed, coordinated by the University of Sheffield, with the participation of ATC and the MeVer team of MKLab CERTH. A brainstorming session among the team members settled on the problem we would solve:

The COVID-19 pandemic has given rise to an online “disinfodemic”, with dangerous real-life consequences (burnt 5G masts, deaths from unproven and dangerous “cures”, disregard for healthcare advice).

The International Fact-Checking Network, in turn, has over 100 fact-checkers working daily in over 70 countries, and they have so far debunked over 3,500 false stories. Compared to the huge volume of posts about COVID-19 shared daily through social media and other platforms, however, this is just a drop in the ocean (see picture below).

Fewer than 200 debunked stories a day against some 50 million daily COVID-19 tweets and other social media posts, many of which contain misinformation

Existing automatic tools for fact-checking and disinformation analysis have been optimised for accuracy on political disinformation, so they are significantly less accurate when applied to COVID-19 disinformation. Moreover, COVID-19 disinformation falls into new, distinct categories (e.g. virus origin, social distancing, government lockdown policies), and no existing tool can take a social media post and classify it according to such COVID-19-specific categories. The ability to do so automatically and reliably is paramount: fact-checkers could then navigate only the prior debunks related to the relevant topic (e.g. virus origin), speeding up their work.

As the ‘WeVerify Annotation team’, we proposed open, scalable and cost-effective solutions that increase the efficiency of processing and classifying information around COVID-19. Solutions like these belong in the hands of journalists, researchers, and everyone else working with high volumes of COVID-19 (dis)information.

Our goal was:

  • to provide journalists, media, fact-checkers and other professionals with an AI-based COVID-19-tailored solution to help automate part of their workflow and minimise repetitive tasks
  • to provide these professionals with real-time analytics and insights into COVID-19 information like trending topics, categories with the highest volume of disinformation, etc.

To achieve our goal, we coordinated the collection of large amounts of high-quality human-annotated data and developed open-source deep learning techniques to automatically identify and cluster COVID-19 (dis)information into 10 categories, derived from an extensive analysis conducted by the Reuters Institute for the Study of Journalism:

  • Public authority actions, policy, and communications
  • Community spread and impact
  • Medical advice and self-treatments
  • Claims about prominent actors
  • Conspiracy theories
  • Virus transmission
  • Virus origin and properties
  • Public preparedness
  • Vaccines, medical treatments, and tests
  • Protests and civil disobedience
  • Other
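To illustrate the categorisation task itself (not the project's actual system, which uses deep learning models trained on the human-annotated data), here is a minimal keyword-matching sketch in Python that routes a post to one of the categories above. The keyword lists and the `classify` function are illustrative assumptions, not part of WeVerify.

```python
# Toy baseline for routing COVID-19 posts into the categories listed above.
# The real WeVerify classifier is a deep learning model; this keyword matcher
# only sketches the input/output shape of the task. Keyword lists are
# illustrative assumptions.

CATEGORY_KEYWORDS = {
    "Public authority actions, policy, and communications": ["government", "lockdown", "policy"],
    "Medical advice and self-treatments": ["cure", "remedy", "treatment"],
    "Conspiracy theories": ["5g", "hoax", "plandemic"],
    "Virus transmission": ["spread", "airborne", "contagious"],
    "Vaccines, medical treatments, and tests": ["vaccine", "test", "trial"],
}

def classify(post: str) -> str:
    """Return the category whose keywords overlap the post most; 'Other' if none match."""
    tokens = set(post.lower().split())
    best, best_hits = "Other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for kw in keywords if kw in tokens)
        if hits > best_hits:
            best, best_hits = category, hits
    return best

print(classify("drinking this remedy is a miracle cure"))
# → Medical advice and self-treatments
```

A trained model would replace the keyword lookup with learned representations, but the interface (post in, category label out) is the same one the fact-checking workflow consumes.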

We also developed an Elasticsearch/Kibana proof-of-concept interface that demonstrates the added benefits of automatic categorisation through easy-to-use visualisations. The interface is password-protected; please contact Kalina Bontcheva (k.bontcheva@sheffield.ac.uk) for access details.
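For a dashboard like this, each classified post can be stored as a small JSON document that Kibana then aggregates by category and over time. The field names and helper below are illustrative assumptions about such an index, not the project's actual schema; a minimal sketch in Python:

```python
import json
from datetime import datetime, timezone

def build_es_document(post_text, category, confidence):
    """Assemble one classified post as a dict ready to send to Elasticsearch's
    index API. Field names are assumptions for illustration only."""
    return {
        "text": post_text,
        "category": category,               # one of the 10 COVID-19 categories
        "confidence": round(confidence, 3), # classifier score for this label
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }

doc = build_es_document("Burnt 5G masts linked to the virus",
                        "Conspiracy theories", 0.9124)
print(json.dumps(doc, indent=2))
```

With documents in this shape, a Kibana visualisation over the `category` and `indexed_at` fields yields exactly the kind of analytics mentioned above: trending topics and the categories carrying the highest volume of disinformation.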

Automatic Disinformation Classification — Demo

Post-hackathon, we envisage deployment as two main solutions: one operating as a standalone service for individual users (fact-checkers, researchers, etc.) who wish to classify and analyse a corpus of claims, and the other interfacing with Content Management Systems, aimed at organisations that wish to integrate automatic COVID-19 information classification into their media workflows.

Our next steps will be to adapt the COVID-19 information categorisation to languages other than English (FR, DE, ES, etc.), deploy it as Software-as-a-Service, implement a full disinformation categorisation and exploration user interface, and adapt to other medical disinformation areas, e.g. anti-vaccination.

Video Pitch: Automatic Disinformation Classification based on Deep Learning

Special thanks

In addition to the team members, thanks are due to:

Johann Petrak, who hacked together the web UI for annotating COVID-19 disinformation and took care of many methodological aspects;

Mia Polovina, Andreas Grivas, Steven Zimmerman, Zlatina Marinova, Nikos Sarris, James Wood, Diana Maynard, Julia Ive, Lamiece Hassan, Tosin Dairo, Jon Chamberlain, Francesco Lomonaco, Tasos Papastylianou, James Allen-Robertson, Themis Makedas, and Symeon Papadopoulos, who all volunteered time to manually annotate disinformation examples so that we could train the machine learning models.

Authors: Olga Papadopoulous (CERTH-ITI) and Kalina Bontcheva (University of Sheffield)
Editor: Jochen Spangenberg (Deutsche Welle)