Citizen Scientists wanted for a project on hydrological data in Italy: SIREN (Saving Italian hydRological mEasuremeNts)

The SIREN (Saving Italian hydRological mEasuremeNts) project aims to digitize the historical series of daily flows by crowd-sourcing the recovery of hydrological measurements from historical Hydrological Yearbooks and to produce a consistent dataset. With your help, we can save Italy’s hydrological measurements and keep up the data flow!

In Italy, hydro-meteorological data collection was managed at the national level by the National Hydrological and Mareographic Service (Servizio Idrografico e Mareografico Nazionale, SIMN) from the early 1900s. The dismantling of the SIMN about 30 years ago transferred data collection to the regional level, that is, to 19 Regions and 2 Autonomous Provinces. This shift has made it difficult to obtain complete and homogeneous records for the whole country.

Data acquired in the most recent years is typically available in digital format. In contrast, historical measurements are often only available in the printed version of the Hydrological Yearbooks published by the National Hydrological and Mareographic Service (Figure 1). In the past, a few initiatives attempted to partially recover this information, but they covered only a limited number of years, only some regions, or both.

Fig. 1: A page from one of the Hydrological Yearbooks, containing daily discharge data.

Is this lack of data in a digital format a problem?

Yes, definitely! One of the major problems that both hydrologists and climatologists face is the limited amount of historical data that can be used to test new methodologies or train models. This lack of data is even more critical in a country like Italy, with a complex morphology and climate that varies substantially across the territory.

The recovery of this considerable amount of data would not only allow a better understanding of the climate of the last century, but would also serve to estimate how the climate and the hydrological cycle could change in the future.

Why is it important to digitize historical time series?

Let’s take Piedmont (NW Italy) as a case study. Figure 2 shows an estimate of the number of historical series of daily average flows available in each year in Piedmont. Only the most recent observations, from 1995 to the present (in gray), are available in digital format through the Arpa Piemonte web portal.

Fig. 2: Number of time series of daily average flows available in Piedmont in each year. The historical series available only in the volumes of the Hydrological Yearbooks are shown in blue.

The older series (before 1995, in blue) are reported only in the Hydrological Yearbooks. In this case, they make up the majority of the daily average flow observations available in Piedmont. A considerable digitization effort is needed to make these historical series easily accessible.

Figure 3 shows an example of a time series of daily flows for the Tanaro River at Montecastello. The observations from 1942 to 1985 are available only in the Hydrological Yearbooks, while the observations from 1996 to 2010 are available in digital format. Figure 3 also shows the series of average flows in spring and autumn (below), with the corresponding long-term average for each of the two periods. The difference between the long-term averages of the two periods (1942 to 1985 and 1996 to 2010) is evident and suggests the presence of trends in the hydrological regime.

This example highlights the importance of digitizing and reconstructing all the available time series, especially when analyzing trends and changes in hydrological regimes.

Fig. 3: Historical series of daily average flows of the Tanaro River at Montecastello and average flows in spring (March, April and May) and autumn (September, October and November). The horizontal lines indicate the long-term average over each of the two periods.
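For readers who would like to reproduce this kind of comparison once a series has been digitized, the seasonal averages described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used by the project team: the function names (`seasonal_means`, `period_mean`, `constant_year`) are hypothetical, the flow values are made up, and the units are assumed to be m³/s. Only the season definitions and the two periods (1942 to 1985 and 1996 to 2010) come from the example above.

```python
from datetime import date, timedelta
from statistics import mean

# Season definitions taken from the caption of Figure 3.
SPRING = {3, 4, 5}     # March, April, May
AUTUMN = {9, 10, 11}   # September, October, November


def seasonal_means(daily_flows, months):
    """Return {year: mean flow over the given months} from a {date: flow} mapping.

    Days with a missing value (None) are skipped.
    """
    by_year = {}
    for day, flow in daily_flows.items():
        if day.month in months and flow is not None:
            by_year.setdefault(day.year, []).append(flow)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def period_mean(yearly_means, first_year, last_year):
    """Long-term average of the yearly seasonal means between two years (inclusive)."""
    selected = [v for y, v in yearly_means.items() if first_year <= y <= last_year]
    return mean(selected) if selected else None


def constant_year(year, flow):
    """Hypothetical helper: build one year of daily data with a constant flow."""
    series = {}
    day = date(year, 1, 1)
    while day.year == year:
        series[day] = flow
        day += timedelta(days=1)
    return series


# Made-up toy data (assumed m3/s), NOT real Tanaro observations.
toy = {}
toy.update(constant_year(1950, 100.0))
toy.update(constant_year(1951, 120.0))
toy.update(constant_year(2000, 80.0))

spring = seasonal_means(toy, SPRING)
print(period_mean(spring, 1942, 1985))  # 110.0 (Yearbook period)
print(period_mean(spring, 1996, 2010))  # 80.0 (digital period)
```

The same two calls with `AUTUMN` instead of `SPRING` give the autumn averages. With real data, the first period can only be computed after the Yearbook pages have been transcribed, which is exactly the gap the project aims to close.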

Why not use optical character recognition software?

Despite the remarkable improvements achieved in recent years by Optical Character Recognition (OCR) software and by machine learning and artificial intelligence techniques, the most accurate digitization approach is still manual transcription.

Most of these records are printed in old documents, and the ink may be partially damaged. In these conditions, an “8” can easily be misread as a “3”, for example.

Moreover, these tables contain several handwritten corrections made by different people, and therefore in different handwriting. All these peculiarities limit the applicability of standardized automatic approaches.

In other words… we need your help!

If you are interested in contributing to the digitization of this data, visit https://www.zooniverse.org/projects/siren-project/siren-project, where you will find information about the project and a digitization tool! Even just 10 minutes of your time will be precious for the project!

The research group is made up of Paola Mazzoglio, Luca Lombardo, Alberto Viglione, Francesco Laio and Pierluigi Claps of the Politecnico di Torino, and Miriam Bertola of the Vienna University of Technology.

Project released by the Politecnico di Torino – Department of Environment, Land and Infrastructure Engineering on World Water Day, 22 March 2023.

Edited by C. Orieschnig