Categorizing Disagreements in the SNLI Corpus: an exploratory study

Author: Carla Rijlaarsdam
Source: LingUU Journal, Vol. 6, Iss. 1, pp. 18-28
Year: 2022
Published by: LingUU Journal

Abstract
This paper explores annotator disagreements in the Stanford Natural Language Inference (SNLI) Corpus (Bowman et al., 2015a) to gain more insight into how humans make language inferences. The aim is to find patterns in these disagreements and to categorize them. Such a categorization helps explain why annotators disagree, and this information can in turn be used to train Natural Language Processing (NLP) models of any kind. The annotation of the disagreements proceeded in three phases: an initial subset used to create the first categories, an additional set used to sharpen the categories and draft the first guidelines, and a final set judged by other annotators, which led to the second version of the guidelines. The latest version of the guidelines can serve as a starting point for further research into human inferences. In total, this exploratory study investigated 75 disagreement pairs.

Keywords
computational linguistics; language inferences; human inferences; meaning; annotators

Appendix
Read the appendix here.
