Update datamodel

Over the last few weeks we have made some changes to the way we represent the European plenary debates in RDF. The figure at the bottom of this post displays the updated model. It combines information about the schema – properties and classes – with example instances that display their identifiers. The remainder of this blogpost talks you through the model.

The model conceptually splits up into two parts: the organizational information of the plenary activities and information about the speakers. With respect to the former, the model captures the structure of the plenary activities in terms of (monthly) sessions, session days, agenda items and speeches. These units are related in a one-to-many fashion: a session comprises multiple days, a day in parliament has multiple agenda items, and each agenda item generally consists of a large number of speeches. The organizational units are annotated with a date and/or number for ordering and filtering purposes, and with a containment relationship for searching within a unit. Furthermore, a "subsequent" relation links each unit to the next unit of the same type, so that a follow-up item can be requested quickly.
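To make the containment and "subsequent" relations concrete, here is a minimal sketch in plain Python that stores a handful of triples and walks them. The identifiers and the property names `hasPart` and `hasSubsequent` are assumptions made up for this illustration; the actual vocabulary is the one shown in the figure.

```python
# Hypothetical identifiers and property names, for illustration only.
TRIPLES = [
    ("session/1", "hasPart", "day/1"),
    ("day/1", "hasPart", "item/1"),
    ("day/1", "hasPart", "item/2"),
    ("item/1", "hasPart", "speech/1"),
    ("item/1", "hasPart", "speech/2"),
    ("item/1", "hasSubsequent", "item/2"),
    ("speech/1", "hasSubsequent", "speech/2"),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def contained_units(unit):
    """Return every unit nested inside `unit`, following containment transitively."""
    found = []
    stack = [unit]
    while stack:
        current = stack.pop()
        for part in objects(current, "hasPart"):
            found.append(part)
            stack.append(part)
    return found


def next_unit(unit):
    """Return the follow-up unit of the same type, or None if there is none."""
    following = objects(unit, "hasSubsequent")
    return following[0] if following else None
```

Searching within a session then amounts to collecting `contained_units("session/1")` and filtering the speeches among them, while `next_unit("item/1")` steps to the following agenda item without touching the hierarchy.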

Whereas sessions and session days are defined temporally, agenda items, such as statements and debates, are motivated by their content. At the moment, the only information available on this level is the title. Frequently, however, the title contains a reference to a report, which contains valuable contextual information such as the committee that drafted the amendments to be discussed. It would be interesting at a later stage of the project to try to extract and add this information.

The parliamentary speeches, which are the cornerstone of the model, are conceptualized as uninterrupted sequences of text spoken by a single person or as a formal action of one person. For sociolinguistic purposes, we distinguish between the original text, spoken in an EU language of the speaker’s choice, and its translations. However, because we defined a superordinate property ‘text’, all transcripts can still be searched at the same time. Note that in practice translations are frequently missing for several languages. There are also speeches without text, which are presumed to be actions. Within a speech, the EU makes note of interruptions and applause, as well as role statements (e.g., “on behalf of PPE”). Even though they take the form of unstructured text, we include these unclassified metadata, as they might be valuable for interpretation and explanation.
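The effect of the superordinate ‘text’ property can be sketched as follows. This is a plain-Python illustration, not the actual query mechanism: the property names `spokenText` and `translatedText`, the speech identifiers and the sample sentences are all assumptions invented for the example.

```python
# Hypothetical sub-properties of the superordinate property 'text'.
SUBPROPERTY_OF = {"spokenText": "text", "translatedText": "text"}

# (speech, property, language, content); all values are made up.
SPEECH_TEXTS = [
    ("speech/1", "spokenText", "nl", "Dank u, mevrouw de Voorzitter."),
    ("speech/1", "translatedText", "en", "Thank you, Madam President."),
    ("speech/2", "spokenText", "en", "I thank the rapporteur."),
]


def matches(prop, wanted):
    """True if `prop` is `wanted` itself or a direct sub-property of it."""
    return prop == wanted or SUBPROPERTY_OF.get(prop) == wanted


def search(keyword, prop="text"):
    """Find speeches whose text under `prop` contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [
        (speech, p, lang)
        for speech, p, lang, content in SPEECH_TEXTS
        if matches(p, prop) and keyword in content.lower()
    ]
```

Calling `search("thank")` looks through originals and translations alike, whereas `search("thank", "spokenText")` restricts the search to what was actually said on the floor.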

A speaker in parliament need not be a member of parliament (MEP). When the EU website lists the identifier of a speaker, we classify him or her not just as a speaker but as a member of parliament as well. In that case, we use the identifier to link several types of background information to this person: his or her family name, country of representation, date of birth, and the political functions he or she has fulfilled (since 1979) in EU committees, EU parties and country-level parties. To make it easier to relate spoken content to political interests, a speech instance is directly connected to the representations of all political bodies its speaker belonged to at that moment.
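One way to picture the direct connection between a speech and its speaker's affiliations is to select the political functions that were active on the day of the speech. The sketch below assumes that each function carries a start date and an optional end date; that representation, the identifiers and the dates are hypothetical and only serve the illustration.

```python
from datetime import date

# (mep, institution, start, end); end=None means the function is ongoing.
# All identifiers and dates are invented for this example.
FUNCTIONS = [
    ("mep/1", "party/A", date(2004, 7, 20), date(2009, 7, 13)),
    ("mep/1", "committee/X", date(2007, 1, 15), date(2009, 7, 13)),
    ("mep/1", "party/B", date(2009, 7, 14), None),
]


def affiliations_at(mep, day):
    """Return the institutions in which `mep` held a function on `day`."""
    return [
        institution
        for person, institution, start, end in FUNCTIONS
        if person == mep and start <= day and (end is None or day <= end)
    ]
```

Attaching the result of `affiliations_at(speaker, speech_date)` to each speech up front means a later query for, say, all speeches by members of a given committee does not have to compare dates itself.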

We hope that the current model is intuitive and suits users’ search needs. Shortly we will fill the database and make a query endpoint available. If you have any questions or remarks, do comment on this post!

The current version of the datamodel for Talk of Europe