Five new scientific papers
High academic production from Norway’s largest AI community
Accepted! Five new scientific papers to be presented this summer
Five papers by NorwAI and associated researchers from the Department of Computer Science at NTNU have been accepted at international conferences taking place in Europe and the USA in the coming months. The papers apply machine learning methods to very different domains: fake news, Scandinavian languages, political science, multi-domain corpora, and dynamic checklists for working environments.
Here are the details:
Fake News Detection by Weakly Supervised Learning Based on Content Features
Authors: Özlem Özgöbek, Benjamin Kille, Anja Rosvold From, Ingvild Unander Netland
Abstract: Fake news, defined as the publication of false information, either unintentional or with the intent to deceive or harm, is one of the important issues that affects today’s digital society significantly. All around the world, journalists and fact-checking organizations are trying to fight this problem manually. However, fighting fake news is a time-sensitive task. Once leaked, fake news spreads fast and increasingly impacts society. Because of the complex and dynamic nature of news, applying artificial intelligence methods to address the automatic detection of fake news is a challenging task. This work explores the use of weakly supervised learning for fake news detection by using only the content of news articles. To our knowledge, this is the first work which uses a content-based approach in weakly supervised learning without the use of any contextual information for fake news detection. We propose an architecture for generating weak labels using weakly supervised learning, where these weak labels are then used to train and compare five different machine learning models. We demonstrate that weakly supervised learning is an effective approach to the automated detection of fake news in the absence of high-quality labels.
Published at the Symposium of the Norwegian AI Society (NAIS), Oslo, Norway, 31 May - 1 June, 2022
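To give a feel for what generating weak labels from content alone can look like, here is a minimal sketch in Python. It is not the architecture from the paper: the three heuristics (`lf_many_exclamations`, `lf_all_caps_words`, `lf_cites_source`) and the majority-vote rule are hypothetical stand-ins chosen only to illustrate the general idea of combining noisy, content-based signals into a label.

```python
from collections import Counter

FAKE, REAL, ABSTAIN = 1, 0, -1

# Hypothetical content-based heuristics ("labeling functions").
# Each one looks only at the article text and may abstain.
def lf_many_exclamations(text):
    return FAKE if text.count("!") >= 3 else ABSTAIN

def lf_all_caps_words(text):
    caps = [w for w in text.split() if w.isalpha() and w.isupper() and len(w) > 3]
    return FAKE if len(caps) >= 2 else ABSTAIN

def lf_cites_source(text):
    return REAL if "according to" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_many_exclamations, lf_all_caps_words, lf_cites_source]

def weak_label(text):
    """Combine the heuristics by majority vote; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN
    return top
```

Articles that receive a weak label could then serve as training data for an ordinary classifier, while articles on which every heuristic abstains are simply left out.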
Building Sentiment Lexicons for Mainland Scandinavian Languages Using Machine Translation and Sentence Embeddings
Authors: Peng Liu, Cristina Marco and Jon Atle Gulla
This paper presents a simple but effective method to build sentiment lexicons for the three Mainland Scandinavian languages: Danish, Norwegian, and Swedish. This method benefits from the English SentiWordNet and a thesaurus in one of the target languages. Sentiment information from the English resource is mapped to the target languages by using machine translation and similarity measures based on sentence embeddings. A number of experiments with Scandinavian languages are performed in order to determine the best-performing sentence embedding algorithm for this task. A careful extrinsic evaluation on several datasets yields state-of-the-art results using a simple rule-based sentiment analysis algorithm. The resources are made freely available under an MIT License.
Published at the Language Resources and Evaluation Conference (LREC), Marseille, France, on 20-25 June 2022, with support from NorwAI.
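The idea of carrying sentiment scores over to new words through embedding similarity can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the two-dimensional vectors are toy values standing in for real sentence embeddings, the seed words stand in for machine-translated SentiWordNet entries, and the function names and the 0.8 threshold are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def transfer_polarity(seed_scores, seed_vecs, candidate_vecs, threshold=0.8):
    """Give each candidate word the score of its most similar seed word,
    provided the similarity reaches the (assumed) threshold."""
    lexicon = {}
    for word, vec in candidate_vecs.items():
        best_seed, best_sim = None, threshold
        for seed, seed_vec in seed_vecs.items():
            sim = cosine(vec, seed_vec)
            if sim >= best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None:
            lexicon[word] = seed_scores[best_seed]
    return lexicon

def score_text(text, lexicon):
    """Very simple rule-based scorer: sum the scores of known words."""
    return sum(lexicon.get(token, 0.0) for token in text.lower().split())

# Toy data: seeds play the role of translated English entries,
# candidates the role of thesaurus words in the target language.
seed_scores = {"god": 1.0, "dårlig": -1.0}
seed_vecs = {"god": [1.0, 0.0], "dårlig": [0.0, 1.0]}
candidate_vecs = {"bra": [0.9, 0.1], "elendig": [0.1, 0.9], "bord": [0.7, 0.7]}

lexicon = transfer_polarity(seed_scores, seed_vecs, candidate_vecs)
```

In this toy setup "bra" inherits the positive score and "elendig" the negative one, while "bord" is not close enough to either seed and stays out of the lexicon.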
Using Language Models for Classifying the Party Affiliation of Political Texts
Authors: Tu My Doan, Benjamin Kille, and Jon Atle Gulla
Political viewpoints identification (PVI) is a task in Natural Language Processing that takes political texts and recognizes the writer’s opinions toward a political matter. PVI reduces the ambiguity in texts by identifying the underlying meaning and clarifying the bias margin along the political spectrum (bias leaning). Thus, even non-experts can better understand political texts. For instance, they can identify misinformation, bias, and hidden political agendas. In this paper, we formally define the concept of political viewpoints identification, explain its importance, and discuss to what extent current techniques can be used for extracting political views from text. Existing techniques address the problem of PVI inadequately. We outline their deficiencies and present a research agenda to advance PVI.
Published at the 27th International Conference on Natural Language & Information Systems (NLDB), Valencia, Spain, on 15-17 June 2022, as part of the Trondheim Analytica project.
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation
Authors: Yujie Xing, Jinglun Cai, Nils Barlaug, Peng Liu, and Jon Atle Gulla
Open-domain conversational systems are assumed to generate equally good responses on multiple domains. Previous work achieved good performance on a single corpus, but combining multiple corpora from different domains is less studied. In this paper, we explore methods that deal with multi-domain corpora. We first investigate interleaved learning, which intermingles multiple corpora, as the baseline. We then investigate two multi-domain learning methods, labeled learning and multi-task labeled learning, which encode each corpus through a unique embedding. Furthermore, we propose Domain-specific Frequency (DF), a novel word-level importance weight that measures the relative importance of a word in a specific corpus compared to multiple corpora. Based on DF, we propose weighted learning, a method that integrates DF into the loss function, and we also adopt DF as a new evaluation metric. Extensive experiments show that our methods gain significant improvements on both automatic and human evaluation. We share our code and data for reproducibility.
Published at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), held in Seattle, Washington, USA, on 10-15 July 2022, as part of NTNU and DNB’s AI & Big Data collaboration project.
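The abstract does not spell out how DF is computed, so the following Python sketch rests on an assumption: a word's weight in one corpus is taken to be its relative frequency there divided by the sum of its relative frequencies across all corpora. The function names and the toy corpora are hypothetical, and the paper's actual formula may well differ. The sketch only illustrates how a word-level importance weight can be folded into a negative log-likelihood loss.

```python
import math
from collections import Counter

def relative_freqs(tokens):
    """Relative frequency of each word within one corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def domain_weights(corpora):
    """Assumed DF-style weight: a word's relative frequency in one corpus
    divided by the sum of its relative frequencies over all corpora."""
    freqs = {name: relative_freqs(tokens) for name, tokens in corpora.items()}
    weights = {}
    for name, freq in freqs.items():
        weights[name] = {
            word: p / sum(other.get(word, 0.0) for other in freqs.values())
            for word, p in freq.items()
        }
    return weights

def weighted_nll(target_tokens, token_probs, weights, default=1.0):
    """Negative log-likelihood in which each target token is scaled by its weight.
    token_probs holds the model's probability for each target token."""
    return sum(
        -weights.get(token, default) * math.log(prob)
        for token, prob in zip(target_tokens, token_probs)
    )

# Toy corpora from two hypothetical domains.
corpora = {
    "tech": ["reboot", "the", "router", "the"],
    "chat": ["the", "weather", "is", "nice"],
}
weights = domain_weights(corpora)
```

Under this assumed definition, a word that occurs only in the "tech" corpus, such as "reboot", gets weight 1.0 there, while a word shared by both corpora, such as "the", is down-weighted, so domain-specific words contribute more to the loss.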
Creating Dynamic Checklists via Bayesian Case-Based Reasoning: Towards Decent Working Conditions for All
Authors: Eirik Lund Flogard, Ole Jakob Mengshoel, and Kerstin Bach
Every year there are 1.9 million deaths worldwide attributed to occupational health and safety risk factors. To address poor working conditions and fulfill the UN’s SDG 8, “protect labour rights and promote safe working environments for all workers”, governmental agencies conduct labour inspections, using checklists to survey individual organisations for working environment violations. Recent research highlights the benefits of using machine learning for creating checklists. However, the current methods only create static checklists and do not adapt them to new information that surfaces during use. In contrast, we propose a new method called Context-aware Bayesian Case-Based Reasoning (CBCBR) that creates dynamic checklists. These checklists are continuously adapted as the inspections progress, based on how they are answered. Our evaluations show that CBCBR’s dynamic checklists outperform static checklists created via the current state-of-the-art methods, increasing the expected number of working environment violations found in the labour inspections.
Published at the Special Track on AI for Good Papers at the 31st International Joint Conference on Artificial Intelligence (IJCAI), Vienna, Austria, on 23-29 July 2022.