Frequently Asked Questions

This section provides definitions of key concepts related to the CLARIN:EL Research Infrastructure, as well as answers to frequently asked questions about participation in the CLARIN:EL network, use of the Infrastructure, and technical and legal issues.

What is CLARIN and what is CLARIN:EL?

The CLARIN project is a European initiative to collect, organise and, finally, distribute to the research community language resources in all European languages, through a research infrastructure which will also offer language tools.

This initiative mainly targets the research community, but it is also addressed to the general public. Organisationally, it takes the form of a pan-European network of research centers and offers:

access to resources, services and language processing tools
language resources documentation through metadata
coordination of the creation, storage, management and access to the language resources
training and dissemination on the use of language technology.

CLARIN:EL is the Greek part of the CLARIN infrastructure. You can find additional information here.

What are Language Resources and Language Technologies?

By the terms Language Resources (LRs) and Language Technologies (LTs) we refer to language data (written or spoken) and to the tools used for their processing. We distinguish the following categories:

primary data
- digital/digitised resources, such as written texts (e.g. digitised books, web texts, newspapers, corpora etc.), recordings of spoken language (e.g. interviews, radio broadcasts etc.)
- video recordings (e.g. TV shows, facial expressions collections, gestures etc.)
- images (e.g. digital/digitised photographs with their captions etc.)
processed data
- various types of annotations of texts, sound and multimedia data, automatically or manually created (e.g. morphosyntactically annotated texts, transcriptions of spoken data, video annotations etc.)
reference resources
- various types of structured language data (e.g. word lists, dictionaries, thesauri etc.) which can be used for improved organisation, processing and study of primary data
language technology tools/applications
- tools and integrated applications that perform various types of language processing (e.g. multilingual text alignment, morphological annotation, lemmatisation, parsing, knowledge extraction etc.)
- visualisation tools (e.g. integrated environments for the presentation of texts, multimedia collections, processing results etc.)

What is the CLARIN:EL Central Aggregator?

The CLARIN:EL Central Aggregator (or Central Catalogue) is the central repository of the CLARIN:EL Infrastructure, which is responsible for

the harvesting of metadata from the local repositories,
the organisation and the presentation of the metadata descriptions in a uniform catalogue and
the provision of access to the language resources to the network members and to the public.
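
The harvesting step described above can be illustrated with a minimal Python sketch. It is not the actual CLARIN:EL implementation: the record fields (`identifier`, `title`), the repository names and the `harvest` function are all hypothetical, chosen only to show how metadata from several local repositories might be merged into one uniform catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str  # hypothetical unique ID of the resource
    title: str       # hypothetical descriptive field
    repository: str  # local repository the record was harvested from


def harvest(local_repositories):
    """Merge the metadata records of all local repositories into one catalogue."""
    catalogue = {}
    for repo_name, records in local_repositories.items():
        for record in records:
            catalogue[record["identifier"]] = MetadataRecord(
                identifier=record["identifier"],
                title=record["title"],
                repository=repo_name,
            )
    return catalogue


# Hypothetical local repositories, each exposing a list of metadata records.
local_repositories = {
    "repo-a": [{"identifier": "a-1", "title": "Example corpus"}],
    "repo-b": [{"identifier": "b-1", "title": "Example lexicon"}],
}

catalogue = harvest(local_repositories)
for record in sorted(catalogue.values(), key=lambda r: r.identifier):
    print(record.identifier, record.title, record.repository)
```

In this sketch only the metadata is copied to the catalogue; each record keeps the name of the local repository where the resource itself is stored.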

What are Language Technologies?

Language Technologies (LTs) are workflows of tools for multilevel analysis, processing, annotation, enrichment and transformation of language data.

What are Language Processing Services?

Language Processing Services are services allowing the use of Language Resources and Technologies as well as their applications over the web.

What is a Resource Provider?

A Resource Provider is any organisation or individual that makes Language Resources, Technologies and/or Language Processing Services available to the CLARIN:EL Infrastructure.

What are Language Data?

With the term Language Data we refer to digital language content of any form and medium, structured or unstructured.

What are Language Processing Tools?

Language Processing Tools are computational tools in the form of software, aiming at the

analysis,
processing, and
annotation of language data.

How can I join the network?

Organisations wishing to share digital content, language resources and/or language processing tools through the clarin:el infrastructure have to fill in the Membership Application, scan it and send the scanned copy (as a PDF file) to the clarin:el Network Coordinator and the Deputy Coordinator.

Individuals wishing to share digital content, language resources and/or language processing tools they have developed through the clarin:el infrastructure can express their interest by filling in the relevant application and sending it to the Network Coordinator and the Deputy Coordinator.

How can I set up my repository?

Organisations that are members of the network can set up their own repositories, if they so wish. There they will be able to store, document, manage and curate their resources.

In this case, you need to contact the Network Coordinator and the Technical Administrator of clarin:el; you also need to appoint a technical administrator of your repository (Repository Technical Manager), who will be responsible for setting up the repository and for user management. Detailed information can be found in the Repository Manager's Manual.

How can I deposit my resources to the infrastructure?

In order to deposit their resources, users have to be registered members of one of the repositories of the infrastructure with editor rights (assigned to them by the repository manager).

The procedure for describing and uploading a language resource has the following steps (detailed guidelines, in Greek, can be found here):

Initially, the user-editor needs to describe the resource, that is, to add its metadata. This can be done in two ways:
- using the dedicated tool (metadata editor) provided by the infrastructure. The tool guides the user-editor through the description of the resource, indicating the obligatory fields to be completed. Once all the obligatory fields are filled in, the description is stored.
- by uploading existing descriptions. This is the case of descriptions created independently of the tool; these have to be in the form of xml files. The infrastructure checks the compatibility of the descriptions with the metadata schema used by the infrastructure and, if they are compatible, they are stored.
Following the storage of the metadata, the next stage is the storage of the data itself. At this stage, the user-editor uploads the data to a given endpoint. This endpoint functions as the local point where all users of the repository put their data files and from which these are harvested by the infrastructure. The data have to be compressed (files 'zip', 'tar.gz', 'gz', 'tgz', 'tar', 'bzip2', or 'bz2').
At the last stage, the user-editor is informed by email that the process of data uploading has been completed.
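
Before depositing, it can be useful to run a couple of local checks. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the file extensions are derived from the list of accepted formats above, and the XML check only tests well-formedness. It does not validate a description against the metadata schema of the infrastructure, which is checked by the infrastructure itself on upload.

```python
import xml.etree.ElementTree as ET

# File extensions derived from the accepted compressed formats listed above.
ACCEPTED_SUFFIXES = (".zip", ".tar.gz", ".gz", ".tgz", ".tar", ".bzip2", ".bz2")


def has_accepted_suffix(filename: str) -> bool:
    """Return True if the file name ends with one of the accepted extensions."""
    return filename.lower().endswith(ACCEPTED_SUFFIXES)


def is_well_formed_xml(text: str) -> bool:
    """Return True if the text parses as XML (no schema validation is done)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(has_accepted_suffix("my_corpus.tar.gz"))
print(is_well_formed_xml("<resource><title>My corpus</title></resource>"))
```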

How can I upload my resource and process it with one of the offered language services?

In order to process one of your own resources (that is, a resource that is not stored in the infrastructure) with one of the web services of the infrastructure (that is, a language processing tool used over the internet), you first have to select the web service you want and then upload your resource for processing, following these steps:

from the central inventory of the infrastructure select browse
filter the catalog of available resources to locate the web service you want
- first by restricting the catalog to only tools/services by selecting Filter by > Resource Type = Tool Service
- and then restrict the results to only web services by selecting Filter by > Processing service = yes
select the web service you want
in the new page that opens, you can see the description of the service. Select upload and process to start uploading your resource
in the new page fill in the description of your resource to be uploaded
- NOTE: the resource must be in a zip file not exceeding 2 MB!
Select save and start processing
- NOTE: if you are a simple user, the resource you upload as well as the processing outcome will be internally stored for two (2) days in your repository (Institutional or Hosted Resources Repository). After this period, the resources will be deleted. However, you can store them permanently and share them with other users by applying to be an editor within these two days.
The infrastructure informs you about the progress of the processing, and, when it is completed, you receive an email with a link to the processed resource.
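
Since the resource must be uploaded as a zip file not exceeding 2 MB, you may want to package it and check its size locally first. The following Python sketch is only an illustration: the function name and the file names are hypothetical, and the limit is interpreted conservatively as 2,000,000 bytes, which is an assumption (the infrastructure may count 2 MB differently).

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: "2 MB" is read conservatively as 2,000,000 bytes.
MAX_UPLOAD_BYTES = 2 * 1000 * 1000


def zip_for_processing(source: Path, archive: Path) -> bool:
    """Zip a single file and report whether the archive is within the limit."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)
    return archive.stat().st_size <= MAX_UPLOAD_BYTES


# Usage with a small, hypothetical sample file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("A short sample text.\n", encoding="utf-8")
    fits = zip_for_processing(sample, Path(tmp) / "sample.zip")
    print("Archive is within the size limit:", fits)
```

If the function returns False, the resource would need to be split or reduced before uploading.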

Is it necessary to provide the actual data or is a metadata description enough?

You can upload only metadata descriptions to clarin:el, while the resource itself might be available from elsewhere (another URL, through the owner etc.). However, resources that are not in clarin:el cannot be processed with the available clarin:el language processing web services. So, if you would like to use these services on your resources, it is best to upload the data as well as the metadata to the clarin:el infrastructure.

What are Computational Support Services?

Computational Support Services for the clarin:el infrastructure are

assignment of Persistent Identifiers (PIDs),
user authentication and authorization services (Authentication and Authorization Infrastructure, AAI),
storage services, and
provision of computational power for the execution of web services.

What does the Hosted Resources Repository offer?

The Hosted Resources Repository (HRR) hosts resources offered by providers that do not maintain a repository and makes them available to the users of the infrastructure. It also hosts the metadata descriptions of these resources and makes them available for harvesting by the central aggregator. Finally, it offers user management services to its users.

What services does an Institutional Repository offer?

Each Institutional Repository hosts the resources of the relevant institution and makes them available to the users of the infrastructure according to the appropriate licenses. It also hosts the metadata descriptions for these resources and makes them available for harvesting by the central aggregator. Finally, it offers user management services to its users.

What services does the Central Aggregator offer?

The Central Aggregator hosts the central resource catalogue of the infrastructure, which contains information for all the resources of the infrastructure. The aggregator gathers this information by harvesting the metadata from the local repositories. Finally, together with the clarin:el portal, it hosts the technical and legal support services.

How do the Licensor, the Distribution Rights Holder and the IPR Holder differ?

Licensor

The person who is legally eligible to license and actually licenses the resource. The licensor could be different from the creator, the distributor or the Intellectual Property Rights (IPR) holder. The licensor has the necessary rights or licences to license the work and is the party that actually licenses the resource that enters the clarin:el network. The licensor will have obtained the necessary rights or licences from the IPR holder and may have a distribution agreement with a distributor that disseminates the work under a set of conditions defined in the specific licence and collects revenue on the licensor's behalf. The attribution of the creator, separately from the attribution of the licensor, may be part of the licence under which the resource is distributed (as is the case, for example, with Creative Commons licences).

Distribution Rights Holder

The person or organization that holds the distribution rights. The range and scope of distribution rights is defined in the distribution agreement. The distributor in most cases only has a limited licence to distribute the work and collect royalties on behalf of the licensor or the IPR holder and cannot give to any recipient of the work permissions that exceed the scope of the distribution agreement (e.g. to allow uses of the work that are not defined in the distribution agreement).

IPR Holder

The person or organization that holds the full Intellectual Property Rights (copyright, trademark etc.) that subsist in the resource. The IPR holder could be different from the creator, who may have assigned the rights to the IPR holder (e.g. an author, as a creator, assigns her rights to the publisher, who is the IPR holder), and from the distributor, who holds a specific licence (i.e. a permission) to distribute the work within the clarin:el network.

What are open licenses?

Open licenses are standard, irrevocable public licenses that allow rights holders to share their work. Open licenses cover all types of work, such as content, data and software.

Every license allows users to use the licensed work subject to the terms that apply. Rights holders may license the work subject to attribution of the source, non-commercial use, no derivative works, or share alike, which means that the work or any derivatives thereof must be redistributed under identical terms.

The most common open licenses are

For content
- CC0
- CC-BY
- CC-BY-SA
- CC-BY-NC
- CC-BY-NC-SA
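
The identifiers of the Creative Commons licenses above encode the conditions that apply (BY for attribution, NC for non-commercial use, ND for no derivative works, SA for share alike). As a small illustration, the following Python sketch reads these conditions off an identifier; the function name and the wording of the condition labels are hypothetical, and the sketch is no substitute for reading the license text itself.

```python
# Standard Creative Commons license elements and an informal label for each.
CONDITIONS = {
    "BY": "attribution",
    "NC": "non-commercial use only",
    "ND": "no derivative works",
    "SA": "share alike",
}


def conditions_of(license_id: str) -> list:
    """List the conditions encoded in an identifier such as 'CC-BY-NC-SA'."""
    parts = license_id.upper().split("-")
    # Skip the leading 'CC'/'CC0' part; ignore anything that is not a known element.
    return [CONDITIONS[part] for part in parts[1:] if part in CONDITIONS]


for license_id in ["CC0", "CC-BY", "CC-BY-SA", "CC-BY-NC", "CC-BY-NC-SA"]:
    print(license_id, conditions_of(license_id))
```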

For software

What are Public Domain Data? Are Public Domain Data subject to copyright/licence restrictions?

Useful information on the different types of Public Domain Data (for example Public Sector Information, PSI) as well as their restrictions of use can be found here.

What are the types of licences under which Public Domain Data are available?

PSI licences constitute a key instrument for the release of huge amounts of information by Public Sector Bodies (PSBs). Useful information on key types of licensing, making reference to PSI licences as well, can be found here.

What happens if my dataset contains personal and/or sensitive data?

Useful information can be found here.

Where can I find the EU guidelines/directions concerning Public Domain Data?

Commission encourages re-use of public sector data http://europa.eu/rapid/press-release_IP-14-840_en.htm
Open data Portals https://ec.europa.eu/digital-agenda/en/open-data-portals
Copyright Law, 2011/833/EU http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32011D0833