## Abstract

Karst systems play an important part in providing clean drinking water for many countries worldwide. Their sustainable management often requires the application of karst hydrological models to quantify the water volumes available and to assess their sensitivity to changes in climate or land use. The available literature provides a broad overview of different modelling approaches to simulate karst hydrology, but guidance for the application of these approaches is scarce. This paper provides personal insights into the application, calibration and evaluation of lumped karst hydrological models. An introduction to model calibration and sensitivity analysis is given, with links to further reading and ready-to-use toolboxes. It is followed by three case studies of karst model applications at three different scales (the plot scale, aquifer scale and continental scale), which apply model calibration and sensitivity analysis to obtain realistic simulations. These three case studies elaborate how the necessary trade-off between the available data and process representation in the model structures was achieved. General recommendations and a workflow for future karst model applications are given.

Karst systems develop through the weathering of water-soluble rocks, a process referred to as karstification (Kiraly 2003). The consequent feedbacks between the atmosphere, biosphere, hydrosphere and geosphere (Goldscheider 2012) result in distinct surface and subsurface landform features, such as karren fields, dolines, sinking streams, cave systems and large karst springs (Ford & Williams 2007). Although beautiful, these features pose a challenge for hydrological and hydrogeological characterization (Goldscheider & Drew 2007). Karst systems are known for their pronounced heterogeneity and anisotropy of hydraulic properties (Bakalowicz 2005; Worthington *et al.* 2016), but also for their high groundwater storage capacities, which provide favourable conditions for groundwater extraction and water supply (Andreo *et al.* 2006). Some European countries – for instance, Austria and Slovenia – obtain around half of their drinking water from karst aquifers (European Commission 1995). Karst hydrological models are commonly applied to sustainably manage karst water resources and to understand their hydrological behaviour (Hartmann *et al.* 2014*a*).

The currently available karst hydrological models are adapted to the particular processes that are typically found in karst systems (White 2003). Some model structures consider the flow dynamics of the epikarst (Tritz *et al.* 2011; Hartmann *et al.* 2012), whereas others consider the transfer of groundwater between the cave, conduit or fracture systems and the fissured carbonate matrix (Liedl *et al.* 2003; Butscher & Huggenberger 2008; Reimann *et al.* 2011). Many of the reviews summarizing the range of different karst modelling techniques (Sauter *et al.* 2006; Ford & Williams 2007; Kovacs & Sauter 2007; White 2007; Ghasemizadeh *et al.* 2012; Hartmann *et al.* 2014*a*; Hartmann & Baker 2017) distinguish two main groups of karst modelling approaches by the manner in which the real karst system is considered within the model. Spatially distributed models represent the karst system as spatially discretized and apply the model equations at each point, whereas lumped approaches neglect the spatial distribution of the system parameters (e.g. the hydraulic conductivities or porosities) and focus on the representation of the dominant processes. Some models defined as lumped models sometimes include spatial information – for instance, subsections of the aquifer or spatial units of the karst system (Rimmer & Salingar 2006; Ladouche *et al.* 2014). Some recent approaches can be classified as semi-distributed because they represent spatial variability by distribution functions (Hartmann *et al.* 2014*b*, 2016). The selection of the final approach usually depends on the purpose, requirements, data availability and cost of the modelling project.

However, another factor that influences the reliability of the model simulation, with either a distributed or a lumped karst modelling approach, is the availability of observations from which to choose the most appropriate set of model parameters, e.g. the soil thickness and storage capacity or the hydraulic conductivity of the matrix or conduits. As a result of the strong heterogeneity of karst systems, such information is difficult to obtain and is seldom available at the scale of the model application (Geyer *et al.* 2013). Although some of the necessary information can be obtained by experimental methods (e.g. soil cores or pumping tests), incommensurability prohibits the direct application of these measurements within the models (Beven 2006). As a consequence, many karst modellers use an inverse modelling approach to assess the most representative values of their model parameters by modifying them systematically until an acceptable fit between the observed and simulated model output – for instance, groundwater levels or karst spring discharge – is obtained. If the available observations contain enough information, such an inverse modelling approach allows an estimation of the most realistic set of effective model parameters and the model can be used for prediction. However, in many cases the available information is not sufficient to find a unique parameter set and many different parameter combinations result in similar model outputs (Perrin *et al.* 2003). In this case, the model predictions would have high uncertainties and the model's application for prediction would be strongly limited.

This paper provides some practical insights into my personal experience of identifying the most appropriate model structures and model parameters. It begins with an introduction into karst model calibration and the methods used to assess the information content of the available observations. This is followed by a set of three case studies performed at three scales (the plot, aquifer and continental scale) with strongly variable availability of data. An elaboration of the decisions and parameter estimation tools that lead to the final model structure and a discussion of the reliability of the simulations is presented. Although this paper represents an incomplete and personal view of karst model development and evaluation, it gives general recommendations for present and future karst modelling studies.

## Calibration and parameter identifiability in karst hydrological models

Model calibration can be explained as a process of adjusting the model parameters to obtain an acceptable representation of the hydrological processes of interest that satisfies pre-defined criteria (Fig. 1). It is necessary because models are always simplifications of a real system. Naturally variable properties, such as hydraulic conductivities, have to be expressed by average or representative values within the chosen spatial model discretization (distributed or lumped). As pre-defined criteria, modellers commonly use error functions that express the deviation of simulations and observations by a single number. The coefficient of determination (*r*^{2}), the root-mean-square error, the Nash–Sutcliffe efficiency (Nash & Sutcliffe 1970) and the Kling–Gupta efficiency (Gupta *et al.* 2009) are popular examples. Prior knowledge about the real values of model parameters is included by choosing parameter ranges that envelope their observed values – for instance, a field capacity of 200–300 mm if previous field studies found values of *c.* 250 mm.

In early hydrological models, model calibration was carried out manually and required expert knowledge about the hydrological system and the chosen model (Boyle *et al.* 2000; Anderson 2002). As computational capacities increased, automatic calibration themes evolved that systematically explored the entire space of the model parameters within a pre-defined range to minimize the selected error function (Beven & Binley 1992). One of the most popular of these is the shuffled complex evolution algorithm (Duan *et al.* 1994), but many more have been developed in the last three decades (Gupta *et al.* 1999; Vrugt *et al.* 2009).

When the model structures became increasingly complex to incorporate more hydrological processes, it was found that an acceptable error function was often associated with a large number of different parameter combinations. This problem, referred to as equifinality (Beven 1989, 2006) arose because of uncertainties in the model input (e.g. precipitation), in the model structure (due to simplifications or the incorrect representation of processes) and in the observations (e.g. the uncertainties in the rating curve used to translate water level measurements in streams into discharge measurements) allowed a variety of acceptable parameter combinations. In addition, more complex model structures allowed the model internal processes to compensate for each other. For instance, simulated soil percolation rates that were too rapid could be compensated by simulated groundwater dynamics that were less dynamic than in the real system.

Because of these newly recognized problems, methods were developed that allowed an assessment of the sensitivity of model simulations to changes in the model parameters. Sensitivity analysis methods systematically vary the model parameters, most preferably within the same pre-defined ranges as the calibration, and analyse the variations in the model output (the error function or other metrics derived from the output, e.g. the mean discharge) according to those changes. Beginning with one-at-a-time sensitivity analysis, which varies only one model parameter while the remaining parameters are fixed, a wide range of approaches based on Monte Carlo sampling are classified as regional (Hornberger & Spear 1981; Young 1983) or global sensitivity analysis (Sarrazin *et al.* 2016; Chen *et al.* 2017). Apart from providing tables that list the percentage change in output according to a change in an individual parameter or the sensitivity indices, cumulative distribution functions of the normalized error functions of a large number of randomly sampled parameter sets (Monte Carlo sampling or similar) are a popular way to visualize the sensitivity of model parameters (Fig. 2). A parameter distribution differing from a uniform distribution is often regarded as an indicator of a sensitive parameter (Wagener *et al.* 2004). Similar to the automatic calibration schemes, guides and readily programmed toolboxes are available that provide a range of different procedures for sensitivity analysis (Pianosi *et al.* 2015, 2016).

If sensitivity analysis shows that many model parameters are insensitive, the information to identify all the model parameters by model calibration is probably insufficient, i.e. there is a strong risk of over-parametrization and equifinality (Fig. 3). Consequently, the model structure has to be simplified, insensitive parameters have to be excluded from the parameter estimation, or the information content of the available observations has to be increased. This can be achieved by considering separate time periods of observations instead of the entire time series (Gupta *et al.* 1998; Dunne 1999; Madsen *et al.* 2002; Wagener *et al.* 2003) or by the definition of multiple signatures that can be derived from the observed data (Wagener *et al.* 2007; Gupta *et al.* 2008; Yilmaz *et al.* 2008). Often, this increase in parameter identifiability results in small reductions in the overall model performance (Fig. 3; Kuczera & Mroczkowski 1998; Seibert & McDonnell 2002; Son & Sivapalan 2007). In particular, the inclusion of hydrochemical information in the calibration and sensitivity analysis provides a promising direction in karst systems where water quality data are often used to characterize the flow and storage behaviour (Birk *et al.* 2005; Hartmann *et al.* 2013*a*; Hartmann 2016).

Combined with model calibration, sensitivity analysis provides a powerful tool to assess the applicability of a chosen hydrological model for prediction and water resources management. It has to be acknowledged, however, that traditional model evaluation tools, such as spatial or temporal split-sample tests with independent observations (Klemeš 1986) and, if possible, the evaluation of the realism of model internal processes and parameters in discussion with field experts and their conceptual understanding of the systems, should not be omitted in the final model application.

## Case studies

The following case studies, with very different spatial scales and different data availability, elaborate the application of model calibration, sensitivity analysis and model evaluation procedures to obtain the most reliable simulation (Table 1).

Case study 1 addresses the simulation of drip rates at a karstic cave in northern Israel with just few square metres of areal extent (Hartmann *et al.* 2012). This study aimed to simulate the spatial and temporal variability of drip rates within a karstic cave in a semi-arid environment. The input and output of the system were measured at a six-hour resolution because of the strong dynamics of the semi-arid climate and the rapid flow dynamics of the karst system. In addition to climate data and the drip rates at several stalagmites, the input and output of an artificial tracer experiment were monitored. As there was no spatially distributed information on the structural properties of the karst system above the cave, a semi-distributed model, which considers the spatial distribution of karst properties by distribution functions, was chosen and set up to simulate water and tracer fluxes at the six-hour resolution of the available data. The structure of the model was based on the general conceptual model of the epikarst (Williams 2008). A multivariate calibration (mean drip rate, mean tracer concentration and coefficient of variation of drip rates and tracer concentration among all stalagmites) using the shuffled complex evolution metropolis algorithm (Vrugt *et al.* 2003) was performed with the Nash–Sutcliffe efficiency as the error function. Regional sensitivity analysis (similar to the example in Fig. 2) showed that the 13 model parameters were only identifiable by using all the available information. Using only the average drip rates provided a superior model performance, but, because of the reduced information, many model parameters remained unidentifiable. Hence the incorporation of tracer observations and information about the spatial variability of drip rates and tracer concentrations reduced the prediction uncertainty, which was also supported by the findings of a split-sample test (Klemeš 1986) and a discussion of the realism of the calibrated model parameters.

Case study 2 dealt with the identification and application of a karst model to a larger karst system in the Middle East (800 km^{2}, Hartmann *et al.* 2013*b*). Interpolated climate data at a daily time step was derived from various stations across the study area. Observations of daily discharge and weekly to bi-weekly observations of water quality parameters were available at the two major springs draining the system. Compared with case 1, the larger extent of this system aggravated the identification of a unique conceptual model and hence the identification of a unique and adequate model structure. For that reason, four different spatially lumped karst model structures were prepared to run on a daily time step. Using the shuffled complex evolution metropolis algorithm, the models were calibrated individually on eight hydrodynamic and hydrochemical signatures. Sobol's sensitivity analysis (Saltelli *et al.* 2008; Sarrazin *et al.* 2016) was used to assess the information brought by each signature. The combination of the calibration and sensitivity analysis using the eight signatures allowed three of the four model structures to be iteratively discarded, hereby reducing the structural uncertainty of the model. The remaining model structure showed an acceptable performance for all eight signatures simultaneously and the sensitivity of all the model parameters was regarded as realistic in terms of the simulated fluxes and parameter values.

Case study 3 (Hartmann *et al.* 2015) was the first application of a karst recharge model on a continental scale (all carbonate rock regions in Europe, the Middle East and northern Africa, 7 × 10^{6} km^{2}). Other than the previous case studies, which used data from rain gauges, the input of the model consisted of a gridded climate data set at a 0.25° × 0.25° resolution (Rodell *et al.* 2004; Rui & Beaudoing 2013). Observations for the calibration consisted of evaporative flux (Baldocchi *et al.* 2001) and soil moisture (Dorigo *et al.* 2011) observations at a point scale. The large scale of the application and the spatial variability of the landscape and climate characteristics required the application of a spatially distributed karst recharge model with a relatively simple structure (four model parameters) accounting for the limited availability of calibration data. For calibration, the entire model domain was split into four regions using cluster analysis and climatic and topographic descriptors according the concept of hydrological landscapes (Winter 2001). At each of these regions, the four model parameters were estimated using a newly developed parameter estimation approach accounting for uncertainties due to the differences in observation and simulation scales, data limitations and model simplifications (Hartmann *et al.* 2015). Using such an approach, groundwater recharge over the entire continental modelling domain could be estimated and the remaining uncertainty of the simulations was assessed explicitly. A comparison with independent observations of karstic groundwater recharge derived from the literature indicated that the model provided realistic recharge values and spatial patterns of groundwater recharge across the entire model domain.

To avoid the problem of equifinality, the complexity of the model structures had to be adapted to these data limitations because they were lumped structures, the parameters of which had to be found by some sort of calibration (Fig. 3). Although continuous input data was relatively easy to obtain, the availability of observations varied strongly for the three case studies, resulting in model structures that included varying degrees of complexity. This was mostly due to the scale of their application: at the plot scale (case study 1), dense measurement networks were capable of capturing the spatial and temporal variability of water and solute fluxes, allowing the application of a complex, process-based model and identification of its model parameters.

At the scale of an entire aquifer (case study 2), measurements were mostly available at the karst spring, providing integrated information on the flow, solute transport and storage behaviour of the system. For that reason, model structures with a lower number of model parameters provided acceptable process representation without too many insensitive parameters. In addition, in case studies 1 and 2, auxiliary information (artificial tracer experiment and water quality observations) was available to increase the available information (Fig. 3).

At the scale of an entire continent (case study 3), observations were only available at locations that were considered representative for a much larger area. Hence a rather simple model structure (four model parameters) was chosen, which only considered the most relevant processes (actual evapotranspiration, diffuse and concentrated recharge generation), relying on a rather general understanding of karst recharge processes. Sensitivity analysis revealed that the parameters of this simple model structure were identifiable at those regions where enough observations of evaporation and soil moisture were available. The estimates of karstic groundwater recharge at other regions with no observation of evaporation and soil moisture were less precise, as revealed by the uncertainty analysis.

## Synthesis

The increase in computational capacities experienced over the last three decades has allowed the development of increasingly complex and process-based hydrological model structures. Likewise, improved tools for automatic model calibration and model evaluation have been developed showing that model parameters are often not uniquely identifiable, therefore limiting the prediction performance of hydrological simulation models. This is due to limited data availability, uncertainties in model input data and observation data, as well as model structure uncertainties. However, these model evaluation tools also allow us to adapt the complexity of hydrological models to the available data and to estimate the value of additional information to increase the process representation within these models and to obtain more robust and reliable predictions.

This paper has provided an overview of the methods available for automatic model calibration and model evaluation for karst system modelling, as well as some experience in applying these methods to three case studies at variable scales and with varying data availability. Even though this review is based on subjective and personal experience and is far from complete, some general recommendations and a workflow for future karst model applications are provided (similar to Beven 2001).

Review all prior knowledge about the system to be modelled and collect all available data. For successful calibration, the selected model structure should be based on a realistic conceptual model of the system and should include all the relevant processes that previous research has found to be important. The input data (e.g. precipitation and potential evaporation) has to be complete and continuous for the entire period of observation. Observations for parameter estimations do not require continuous records, but they should be dense enough to capture the hydrological dynamics of the processes to be modelled. In addition, some of the input data should be reserved for model warm-up and some of the observations should be reserved for model evaluation.

Choose the calibration procedure. Some possible algorithms for automatic calibration have been mentioned here, but a much wider range of methods is now available. All will require the definition of model parameter ranges that should reflect prior information about the system. Alternatively, parameter ranges can be derived from previous applications of the chosen model. In some cases, it may be possible to confine the model parameters by pre-analysis, e.g. recession analysis.

Choose the error function. The calibration procedure will require an error function to be minimized. Some examples are provided in this paper, but any number that expresses the fit between the observations and simulations is possible. If multiple types of data are used for calibration (e.g. discharge and water quality data), chose a procedure to stepwise or simultaneously include the different types of observations in the calibration (e.g. a weighting scheme or a multi-criteria calibration approach).

Test the performance of the model. First, it is necessary to test whether the model is generally able to obtain an acceptable fit of the observations and simulations. If the model fails, the selection of the chosen model structure has to be revised (back to step 1).

Choose an approach to sensitivity analysis. As some of the methods are computationally demanding, the selected sensitivity analysis approach should depend on the calculation times of the model. Some review papers that provide recommendations for choosing the most appropriate approach are given in this paper and in the sensitivity analysis literature.

Apply a sensitivity analysis. For consistency, choose the same parameter ranges as for the calibration. Evaluate whether the available observations contain enough information to identify all the model parameters. If a large number of insensitive parameters indicate that the information content of the observations is insufficient, then reduce the complexity of the model by structural simplifications or by excluding insensitive parameters from the parameter estimation and repeat the calibration (back to step 4) or include more observations – for instance, water quality data (back to step 3).

Evaluate the plausibility and prediction performance of the model. Use previous studies and discussions with field experts to evaluate whether the calibrated and sensitive parameters reflect the conceptual model and prior understanding of the system. Use observations that were excluded from the calibration (see step 2) to test the prediction performance of the model (e.g. by a split-sample test). If the model shows a poor prediction performance or its parameters are inconsistent with the prior understanding of the system, go back to step 1.

Apply the model for prediction. If the model passed all the previous steps, then there is a strong indication that it provides realistic and robust simulation, making it applicable for prediction.

It is necessary to acknowledge that this workflow is limited to the application of models that require calibration, i.e. it is meant to provide guidance for an inverse modelling problem. In some cases, the purpose of the model application may already define the model structure. For instance, if the spatial impact of pumping on the distribution of groundwater heads has to be assessed, then a distributed model has to be applied no matter how much information is available for parameter estimation. In such cases, the problem changes from an inverse problem to a forward modelling problem, for which model parameters are derived solely from prior information and expert knowledge. In such cases, computational limitations often limit the application of detailed uncertainty assessment schemes. However, a low number of manual variations of the model parameters that reflect the uncertainty of the prior information of the system (as, for instance, in Gunkel *et al.* 2015) may allow some preliminary estimate of the uncertainty of the simulation.

## Acknowledgements

Many thanks to Jürgen Strub of the Chair of Hydrology, University of Freiburg, for designing Figures 1 and 3. Also thanks to Javier Martín Arias, Centre of Hydrogeology of the University of Málaga (ES), for his valuable recommendations to improve this paper.

- © 2018 The Author(s). Published by The Geological Society of London. All rights reserved