Among the many areas that participants were invited to work on, our team focused on “Social & Political Cohesion – Mitigating fake news spreading”, where our experience and knowledge of disinformation helped us understand the problem and suggest solutions.
A team was formed, coordinated by the University of Sheffield, with the participation of ATC and the MeVer team of CERTH. A brainstorming session among the team members defined the problem that we would solve:
The COVID-19 pandemic has given rise to an online “disinfodemic”, with dangerous real-life consequences (burnt 5G masts, deaths from unproven and dangerous “cures”, disregard for healthcare advice).
The International Fact-Checking Network has over 100 fact-checkers working daily in over 70 countries, who have so far debunked over 3500 false stories. Compared with the huge number of posts about COVID-19 shared daily through social media and other platforms, however, this is just a drop in the ocean (see picture below).
Existing automatic tools for fact-checking and disinformation analysis have been optimised for accuracy on political disinformation, so they are significantly less accurate when applied to COVID-19 disinformation. Moreover, COVID-19 disinformation falls into new, distinct categories (e.g. origin, social distancing, government lockdown policies), and there are no tools which, given a social media post, will classify it according to such COVID-19-specific categories. The ability to do so automatically and reliably is paramount, as fact-checkers can then easily navigate prior debunks related only to the relevant topic (e.g. origin) in order to speed up their work.
As the ‘WeVerify Annotation team’, we proposed open, scalable and cost-effective solutions that increase the efficiency of processing and classifying information around COVID-19. Solutions like these should already be in the hands of professionals like journalists, researchers, and everyone working on high volumes of COVID-19 (dis)information.
Our goal was twofold:
- to provide journalists, media, fact-checkers and other professionals with an AI-based COVID-19-tailored solution to help automate part of their workflow and minimise repetitive tasks
- to provide these professionals with real-time analytics and insights into COVID-19 information like trending topics, categories with the highest volume of disinformation, etc.
To achieve our goal, we coordinated the collection of large amounts of high-quality human-annotated data and developed open-source deep learning AI techniques to automatically identify and cluster COVID-19 (dis)information into 10 categories, which were derived from an extensive analysis conducted by the Reuters Institute for the Study of Journalism:
- Public authority actions, policy, and communications
- Community spread and impact
- Medical advice and self-treatments
- Claims about prominent actors
- Conspiracy theories
- Virus transmission
- Virus origin and properties
- Public preparedness
- Vaccines, medical treatments, and tests
- Protests and civil disobedience
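As a minimal sketch of how this category scheme could feed the analytics mentioned earlier (for example, finding the categories with the highest volume), the Python below tallies classifier output per category. The `CATEGORIES` list is taken from the text above; the `category_volumes` helper and the sample labels are hypothetical and merely stand in for the output of the actual model, which is not shown here.

```python
from collections import Counter

# The ten COVID-19 categories listed above.
CATEGORIES = [
    "Public authority actions, policy, and communications",
    "Community spread and impact",
    "Medical advice and self-treatments",
    "Claims about prominent actors",
    "Conspiracy theories",
    "Virus transmission",
    "Virus origin and properties",
    "Public preparedness",
    "Vaccines, medical treatments, and tests",
    "Protests and civil disobedience",
]


def category_volumes(predicted_labels):
    """Count posts per category, highest volume first (hypothetical helper)."""
    unknown = set(predicted_labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    counts = Counter(predicted_labels)
    # Keep categories with zero posts so that all ten always appear.
    volumes = [(category, counts.get(category, 0)) for category in CATEGORIES]
    # sorted() is stable, so ties keep the order of the list above.
    return sorted(volumes, key=lambda pair: -pair[1])


# Illustrative labels only, not real data.
sample = ["Conspiracy theories", "Virus transmission", "Conspiracy theories"]
top_category, top_count = category_volumes(sample)[0]
```

With these made-up labels, `top_category` is "Conspiracy theories"; in practice the labels would come from the trained classifier.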
We also developed an Elasticsearch/Kibana proof-of-concept interface that demonstrates the added benefits of the automatic categorisation through easy-to-use visualisations. The interface is password protected; please contact Kalina Bontcheva (firstname.lastname@example.org) if you are interested in obtaining access.
Post-hackathon, we envisage deployment in two main forms: one operating as a standalone service for individual users (fact-checkers, researchers, etc.) who wish to classify and analyse a corpus of claims, and the other interfacing with Content Management Systems, aimed at organisations that wish to integrate automatic COVID-19 information classification features into their media workflows.
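A per-category visualisation of this kind typically rests on an aggregation over the predicted category of each post. The sketch below builds such a request body using Elasticsearch's standard `terms` aggregation. The field name `category`, the aggregation name `posts_per_category` and the helper function are assumptions made for illustration, not the schema of the actual proof of concept.

```python
import json


def category_aggregation_body(field="category", max_buckets=10):
    """Build a search body that counts posts per category (hypothetical helper).

    Assumes `field` is indexed as a keyword, which terms aggregations require.
    """
    return {
        "size": 0,  # return only aggregation buckets, no individual documents
        "aggs": {
            "posts_per_category": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


body = category_aggregation_body()
# JSON form, as it would be sent to an index's _search endpoint.
request_json = json.dumps(body, indent=2)
```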
Our next steps will be to adapt the COVID-19 information categorisation to languages other than English (FR, DE, ES, etc.), deploy it as Software-as-a-Service, implement a full disinformation categorisation and exploration user interface, and adapt to other medical disinformation areas, e.g. anti-vaccination.
In addition to the team members, thanks are due to:
Johann Petrak, who hacked the web UI for annotating COVID-19 disinformation classification and took care of many methodological aspects;
Mia Polovina, Andreas Grivas, Steven Zimmerman, Zlatina Marinova, Nikos Sarris, James Wood, Diana Maynard, Julia Ive, Lamiece Hassan, Tosin Dairo, Jon Chamberlain, Francesco Lomonaco, Tasos Papastylianou, James Allen-Robertson, Themis Makedas and Symeon Papadopoulos, who all volunteered time to manually annotate disinformation examples so that we could train the machine learning models.
Authors: Olga Papadopoulou (CERTH-ITI) and Kalina Bontcheva (University of Sheffield)
Editor: Jochen Spangenberg (Deutsche Welle)