Exploring Parliament Datasets on Cultural Heritage

Beatriz Barros, Antonio Gallardo, and Pedro Luengo

The SiSOB team participated in the Talk of Europe Creative Camp #2 with a project entitled “Exploring Parliament Datasets on Cultural Heritage” (abstract: http://www.talkofeurope.eu/creativecamp2/abstracts/#Exploring). This event was organized by the Talk of Europe – Travelling CLARIN Campus project. Attendees were selected through a public call. The SiSOB team was represented by Beatriz Barros (University of Malaga and CEICE) and Antonio Gallardo (University of Seville and CEICE), both from Computer Science, and by Pedro Luengo (University of Seville), an expert on Cultural Heritage.

Our project was based on the SiSOB extractor (http://sistractor.iaia.lcc.uma.es/extractor/), which had been adapted to obtain data from parliamentary proceedings using information retrieval and natural language processing techniques (Figure 1). We also carried out some programming tasks to configure access to the dataset, annotate it, and define an interface to show the results. The result is the sistractor-TOE tool.


Figure 1. Inputs and output of the SiSOB data extractor

The parliament dataset is processed and stored as RDF. We used SPARQL to extract and analyse the dataset, specifically the part pertaining to speeches: id, date, country of representation, and text of the debate (in the original language, in English, and in the other languages). We annotated the debates using this dataset, the taxonomy of concepts, and the SiSOB extractor.
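As a rough illustration of the extraction step, the sketch below builds a SPARQL query for the speech fields listed above and flattens a result document in the standard SPARQL 1.1 JSON results format. The `ex:` namespace and the property names (`ex:date`, `ex:country`, `ex:text`) are placeholders invented for this example; the vocabulary actually used by the Talk of Europe dataset may differ and would need to be substituted.

```python
import json
from typing import Dict, List

# Hypothetical namespace and property names: the real dataset vocabulary
# may differ, so adjust these before querying a live endpoint.
PREFIX = "PREFIX ex: <http://example.org/toe#>"


def build_speech_query(year: int, lang: str = "en", limit: int = 100) -> str:
    """Return a SPARQL SELECT for speech id, date, country and text."""
    return f"""{PREFIX}
SELECT ?speech ?date ?country ?text WHERE {{
  ?speech ex:date ?date ;
          ex:country ?country ;
          ex:text ?text .
  FILTER (YEAR(?date) = {int(year)} && LANG(?text) = "{lang}")
}}
LIMIT {int(limit)}"""


def parse_bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON result document into plain dictionaries."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]
```

The query string would be sent to whatever SPARQL endpoint hosts the data, and `parse_bindings` turns the response into one dictionary per speech, ready for annotation.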

We prepared a case study on Cultural Heritage Policy and extracted the information from the debates using SiSOB tools and a rich set of contextual information (such as a taxonomy of concepts and the set of dictionaries needed to work with the tool) (figure 2).


Figure 2. Cultural Heritage Example
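To give a feel for how a taxonomy of concepts and its dictionaries can drive annotation, here is a minimal sketch of dictionary-based matching. It is a simplified stand-in rather than the SiSOB extractor's actual method; the taxonomy entries and the `annotate` function are made up for this example.

```python
import re
from typing import Dict, List

# Hypothetical taxonomy: concept -> surface terms (its "dictionary").
TAXONOMY: Dict[str, List[str]] = {
    "monuments": ["monument", "historic building", "cathedral"],
    "museums": ["museum", "gallery"],
    "intangible heritage": ["tradition", "folklore"],
}


def annotate(text: str, taxonomy: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many times each concept's terms occur in the text."""
    counts: Dict[str, int] = {}
    for concept, terms in taxonomy.items():
        # Whole-word match, optional plural "s", case-insensitive.
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")s?\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[concept] = hits
    return counts
```

A real pipeline would need lemmatisation and multilingual dictionaries; the naive plural rule here only handles terms that take a trailing "s".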

For each intervention, we annotated its topic/keywords and some features/indicators. To date, we have considered single indicators: the number of times a topic appears in a debate, the country, the gender of the presenter, and the political party. In the future, more complex inference and statistical methods will be considered. The general approach of this case study is shown in Figure 3.
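The single indicators described above amount to simple tallies. The following sketch shows one way to compute them, assuming each annotated intervention is a dictionary with `topic_counts`, `country`, `gender`, and `party` keys; these field names are illustrative and not the tool's actual schema.

```python
from collections import Counter
from typing import Dict, Iterable


def indicator_totals(interventions: Iterable[dict], topic: str) -> Dict[str, Counter]:
    """Sum mentions of `topic`, broken down by each single indicator."""
    totals = {"country": Counter(), "gender": Counter(), "party": Counter()}
    for item in interventions:
        n = item["topic_counts"].get(topic, 0)
        if n == 0:
            continue  # this intervention never mentions the topic
        totals["country"][item["country"]] += n
        totals["gender"][item["gender"]] += n
        totals["party"][item["party"]] += n
    return totals
```

Each indicator is counted independently, which matches the "single indicator" stage; cross-tabulations (for example, country by party) would belong to the more complex methods left for future work.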

This is a first step in a research project in which we compare “What parliament says about a topic” with “How citizens perceive and understand the same topic”. In the Talk of Europe Creative Camp #2, we presented the first part (STEP 1), which addressed the parliament dataset (Figure 4). The work on social media will be conducted after the Creative Camp #2 meeting, and the results will be posted on the same webpage in a few months.


Figure 4. Schema of the subproject presented in the Talk of Europe Creative Camp #2

In the Talk of Europe Creative Camp #2, we presented the initial results of a demo running on a virtual machine at the University of Malaga. Figure 5 shows some debates related to Cultural Heritage that took place in 2009; they can be accessed via a word cloud built from the number of debates on each topic during that year. The interface includes a right-hand panel that shows the text of the discourse or the original debate. We used GATE (https://gate.ac.uk/) to process the text of the debates. The interface also shows a time-line representing the number of times that a list of topics related to Cultural Heritage appeared in the debates during 2009.

The results are shown in a window with a graphical interface. The user can zoom in to explore a particular date, and can explore the database by selecting or unselecting topics in the upper box. There are two time-lines: a general time-line for a year (upper area) and a zoomed view of a specific date (lower area) (Figure 5).


Figure 5. Time-line of the topics of the parliamentary debates by keywords
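The data behind the two time-lines can be sketched as two aggregations over (date, topic) pairs: one per year for the selected topics, and one for a single date. The monthly granularity of the yearly view and the function names are assumptions made for this example, not a description of the demo's implementation.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Set, Tuple

# (debate date, topic) pairs, e.g. produced by annotating the debates.
Mention = Tuple[date, str]


def yearly_timeline(mentions: List[Mention], year: int, selected: Set[str]) -> Dict[int, int]:
    """Upper view: mentions of the selected topics per month of `year`."""
    per_month = Counter(
        d.month for d, topic in mentions if d.year == year and topic in selected
    )
    # Include empty months so the time-line has no gaps.
    return {month: per_month.get(month, 0) for month in range(1, 13)}


def zoom(mentions: List[Mention], day: date, selected: Set[str]) -> Counter:
    """Lower view: per-topic counts for one specific date."""
    return Counter(topic for d, topic in mentions if d == day and topic in selected)
```

Selecting or unselecting a topic in the interface corresponds to adding it to or removing it from `selected` and recomputing both views.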

A cloud of words is also provided (Figure 6). The user can select a keyword and access the parliamentary debates organized by date (right panel).


Figure 6. Cloud of words on Cultural Heritage in the parliamentary debates
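A word cloud of this kind needs a rule that turns the number of debates per topic into a display size. The sketch below uses plain linear scaling between a minimum and a maximum font size; the scaling rule, the point sizes, and the function name are assumptions, since the demo's actual weighting is not described here.

```python
from typing import Dict


def cloud_sizes(debates_per_topic: Dict[str, int], min_pt: int = 10, max_pt: int = 40) -> Dict[str, int]:
    """Map each topic's debate count to a font size by linear scaling."""
    if not debates_per_topic:
        return {}
    lo = min(debates_per_topic.values())
    hi = max(debates_per_topic.values())
    span = hi - lo
    sizes = {}
    for topic, n in debates_per_topic.items():
        # If all topics are equally frequent, give every word the middle size.
        frac = 0.5 if span == 0 else (n - lo) / span
        sizes[topic] = round(min_pt + frac * (max_pt - min_pt))
    return sizes
```

Logarithmic scaling is a common alternative when a few topics dominate the counts, because it keeps rare topics legible.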

One of the challenges within this research project is to find efficient algorithms and methods with which to conduct this comparative study of society and parliament. We have noted that the period examined for society has to be longer than the period examined for parliament. For the societal aspect, searches of the WWW will cover a period that begins on the date the parliamentary agenda is published and ends some weeks or months after the debate sessions have concluded.
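The search period for the societal side can be expressed as a small date calculation. In this sketch the default of eight weeks after the last session is an arbitrary placeholder, since the text only says "some weeks or months"; the function name is likewise invented for illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def society_window(agenda_published: date, last_session: date, weeks_after: int = 8) -> Tuple[date, date]:
    """Return (start, end) of the period to search for citizens' reactions."""
    if last_session < agenda_published:
        raise ValueError("last_session must not precede agenda_published")
    # Start when the agenda becomes public; extend past the final debate.
    return agenda_published, last_session + timedelta(weeks=weeks_after)
```

By construction the returned window is at least as long as the parliamentary period itself, and strictly longer whenever `weeks_after` is positive.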

This demo is a preliminary version of our project. Our aim is to include the societal aspect and provide intelligent tools that will, firstly, show any user the view (opinion, position) of parliament and, secondly, show the opinions of the citizens on this topic and how they perceive the effect on their lives of the work done in parliament. With this aim, we will explore social networks, newspapers, specialized blogs, and so on.

We invite you to visit the webpage of the demo and to send us your comments, ideas, proposals, previous work, new approaches, and so on. All contributions are welcome. We can also upload a set of keywords, or a taxonomy of concepts, for you, to help you explore how a theme is being discussed in the European Parliament. We will do our best to reply to all contributions.


We provide this access address to show any user how it works. Visit us and if you want an account to upload your set of data, please send us an email.
Authors: Beatriz Barros (bbarros –at- lcc.uma.es), Antonio Gallardo & Pedro Luengo, March 2015.

More about this project:

SiSOB project:  http://sisob.lcc.uma.es

SiSOB blog:  https://sisobproject.wordpress.com/

SiSOBserver:  http://sisobserver.lcc.uma.es/

Aldo Geuna, Rodrigo Kataishi, Manuel Toselli, Eduardo Guzmán, Cornelia Lawson, Ana Fernandez-Zubieta, Beatriz Barros, SiSOB data extraction and codification: A tool to analyze scientific careers, Research Policy, Available online 25 February 2015, ISSN 0048-7333,