Annotate

Annotation is the practice of adding interpretative linguistic information, also known as tags or labels, to words or sets of words in a text or a corpus. Annotation can be applied both to raw data and to data that have already been processed.

There are several types of annotation, corresponding to different levels of linguistic analysis of a text or a corpus. For example, tags or labels added to a word or a set of words can provide information about the word class to which each word belongs (Part Of Speech Tagging), the lemma of a word (Lemmatization), its morpho-syntactic features (MorphoSyntactic Tagging), the syntactic structure in which it occurs (Syntactic Parsing), or the semantic features or semantic fields of the words in a text (Semantic Annotation). Other types of annotation add information about anaphoric links in a text (Discourse Annotation), pragmatic information such as speech acts (Pragmatic Annotation), or stylistic features such as speech and thought presentation (Stylistic Annotation).
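As a rough illustration of how several annotation layers can sit on the same words, the sketch below attaches a part-of-speech tag, a lemma, and a few morpho-syntactic features to each token of one sentence. The `Token` class, the helper functions, and the hand-assigned labels (loosely modelled on Universal Dependencies-style tags) are assumptions made for this example; they are not the output or the data format of any CLARIN:EL tool.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word of a text together with its annotation layers (hypothetical schema)."""
    form: str                                   # the word as it appears in the text
    pos: str                                    # Part Of Speech tag
    lemma: str                                  # dictionary form of the word
    feats: dict = field(default_factory=dict)   # morpho-syntactic features


# Hand-annotated example sentence; the labels are illustrative, not tool output.
sentence = [
    Token("The", "DET", "the", {"Definite": "Def"}),
    Token("cats", "NOUN", "cat", {"Number": "Plur"}),
    Token("were", "AUX", "be", {"Tense": "Past", "Number": "Plur"}),
    Token("sleeping", "VERB", "sleep", {"VerbForm": "Part", "Tense": "Pres"}),
]


def render(tokens):
    """Show the text in the common word/TAG inline notation."""
    return " ".join(f"{t.form}/{t.pos}" for t in tokens)


def lemmas_with_pos(tokens, pos):
    """Return the lemmas of all tokens carrying the given part-of-speech tag."""
    return [t.lemma for t in tokens if t.pos == pos]


print(render(sentence))                   # The/DET cats/NOUN were/AUX sleeping/VERB
print(lemmas_with_pos(sentence, "NOUN"))  # ['cat']
```

Keeping each layer in its own field means that a further layer, such as a semantic field label, could be added without changing the ones already present.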

Corpus annotation is useful for recalling and retrieving information from large volumes of data, for training new language processing tools, and for adapting tools that have already been developed to new subject fields and ad hoc research questions.

CLARIN:EL annotation tools: