The tangible objective of this micro-project was to develop two datasets of European news labelled with political leaning. These were needed for the next step of the project: building a bias-minimizing recommender system for European news.
The first dataset comprises millions of European news articles, enriched with metadata from Eurotopics.net. Each entry contains the main text, title, publication date, language, and news source, together with metadata about that source: its political leaning and its country.
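To make the shape of an entry concrete, here is a minimal sketch of how one record could be represented. The class names, field names, and example values are illustrative assumptions; the actual schema and the set of leaning labels used by Eurotopics.net are not specified here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceMetadata:
    """Metadata attached to a news source (hypothetical field names)."""
    name: str
    country: str
    political_leaning: str


@dataclass(frozen=True)
class Article:
    """One dataset entry: article fields plus the metadata of its source."""
    title: str
    maintext: str
    publication_date: date
    language: str
    source: SourceMetadata


# Placeholder values only; they do not come from the real dataset.
example = Article(
    title="Example headline",
    maintext="Example body text.",
    publication_date=date(2020, 1, 1),
    language="en",
    source=SourceMetadata(
        name="Example Daily",
        country="Exampleland",
        political_leaning="centre",
    ),
)
```

Nesting the source metadata inside the article keeps the source-level attributes (leaning, country) visibly separate from the article-level ones.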
We then built an article-level bias classifier, attempting to predict the political label of individual articles using the source-level labels obtained through distant supervision. We then applied explainable AI techniques to the classifier and concluded that it is effectively predicting the news source rather than the political leaning.
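The distant-supervision step can be sketched as follows, assuming each article simply inherits the leaning of the outlet that published it. The function name, dictionary keys, and source names are hypothetical, and this is not necessarily how the project's pipeline is implemented.

```python
def label_articles(articles, source_leaning):
    """Attach to each article the leaning of its source.

    Articles whose source has no known leaning are skipped
    (an assumption made for this sketch).
    """
    labelled = []
    for article in articles:
        leaning = source_leaning.get(article["source"])
        if leaning is not None:
            labelled.append({**article, "label": leaning})
    return labelled


# Placeholder sources and labels, for illustration only.
source_leaning = {"Example Daily": "left", "Sample Post": "right"}
articles = [
    {"title": "A", "source": "Example Daily"},
    {"title": "B", "source": "Sample Post"},
    {"title": "C", "source": "Unknown Gazette"},
]
labelled = label_articles(articles, source_leaning)
```

Under this scheme every article from a given source carries the same label, so the label is fully determined by the source. That is consistent with the observation that a classifier trained on such labels can end up recognising the source instead of the leaning.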
To try to overcome this issue, we built a second dataset with the same features as the first one, plus a topic for each article, chosen from 7 macro-topics.
The immediate plan is to perform political-bias classification on the new dataset after filtering out all articles that do not carry political bias, such as those dealing with sports or gossip.
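A minimal sketch of this filtering step is shown below. It assumes each article carries a single `topic` field; the topic strings are placeholders, since the 7 macro-topics are not listed here.

```python
# Hypothetical topic names; the real macro-topic labels may differ.
NON_POLITICAL_TOPICS = frozenset({"sports", "gossip"})


def keep_political(articles, excluded_topics=NON_POLITICAL_TOPICS):
    """Return only the articles whose topic is not in the excluded set."""
    return [a for a in articles if a["topic"] not in excluded_topics]


# Placeholder articles, for illustration only.
articles = [
    {"title": "Budget vote", "topic": "politics"},
    {"title": "Cup final", "topic": "sports"},
    {"title": "Celebrity wedding", "topic": "gossip"},
    {"title": "Trade talks", "topic": "economy"},
]
political = keep_political(articles)
```

Expressing the rule as an exclusion set means articles with any other topic are kept by default; an inclusion list would be the stricter alternative.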