The lognormal is the most common distribution chosen to describe the uncertainty in ecoinvent. It is defined only for positive values, so a Monte Carlo simulation cannot accidentally sample a negative amount, which would act as a credit.
The lognormal is not as intuitive as the normal distribution and is often confusing to new users. As a primer, we recommend “Log-normal Distributions across the Sciences: Keys and Clues” by Eckhard Limpert et al., in BioScience, May 2001, Vol. 51, No. 5.
Definition and Basic Properties of the Lognormal Distribution
A variable is lognormally distributed when its logarithm is normally distributed. The probability density function (PDF) of the lognormal is:
f(x) = 1 / (x sigma sqrt(2 pi)) * exp(-(ln(x) - mu)^2 / (2 sigma^2)), for x > 0
where x is the random variable, and mu and sigma are the mean (which equals the median) and the standard deviation of the distribution of ln(x) (sometimes called "the underlying normal distribution"). The median and geometric standard deviation of x, noted mu* and sigma*, can be obtained through the following equations:
mu* = exp(mu)
sigma* = exp(sigma)
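As an illustration of the definitions above, here is a minimal Python sketch of the PDF and of the conversion from (mu, sigma) to (mu*, sigma*). The function names and the sample values of mu and sigma are assumptions made for this example; they do not come from ecoinvent or its tools.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal variable at x; mu and sigma describe ln(x)."""
    if x <= 0:
        return 0.0  # the lognormal has no density at or below zero
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def to_multiplicative(mu, sigma):
    """Return (mu*, sigma*) = (exp(mu), exp(sigma))."""
    return math.exp(mu), math.exp(sigma)


# Illustrative values only: a median of 2 and sigma = 0.3 on the log scale.
mu, sigma = math.log(2.0), 0.3
mu_star, sigma_star = to_multiplicative(mu, sigma)
print(mu_star, sigma_star, lognormal_pdf(mu_star, mu, sigma))
```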
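The quantity sigma* is useful to calculate confidence intervals: about 68% of the values lie between mu*/sigma* and mu* × sigma*, and about 95% lie between mu*/sigma*^2 and mu* × sigma*^2.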
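The role of sigma* in building intervals can be checked with a small Monte Carlo experiment. The sketch below assumes the usual multiplicative intervals, from mu*/sigma*^k to mu* × sigma*^k, and uses made-up values for mu* and sigma*; the helper name is hypothetical.

```python
import math
import random


def multiplicative_interval(mu_star, sigma_star, k=1):
    """Interval [mu* / sigma*^k, mu* x sigma*^k] around the median."""
    return mu_star / sigma_star ** k, mu_star * sigma_star ** k


# Illustrative values only.
mu_star, sigma_star = 2.0, 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [
    rng.lognormvariate(math.log(mu_star), math.log(sigma_star))
    for _ in range(100_000)
]

low, high = multiplicative_interval(mu_star, sigma_star, k=2)
coverage = sum(low <= s <= high for s in samples) / len(samples)

print("interval:", low, high)
print("share of samples inside:", coverage)   # expected to be close to 0.95
print("smallest sample:", min(samples))       # always positive
```

With k=1 the same helper gives the narrower interval that holds roughly two thirds of the samples.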
In the lognormal distribution, the median corresponds to the geometric mean, and is found at exp(mu). The arithmetic mean is higher than the geometric mean, at exp(mu + sigma^2/2). The mode (the most likely value) is found at a lower value, exp(mu - sigma^2). The larger the standard deviation, the larger the skewness and the further apart those three quantities will be.
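To see how the mode, median and arithmetic mean drift apart as sigma grows, the three formulas can be evaluated directly. This is only a sketch; the function name and the chosen sigma values are illustrative.

```python
import math


def lognormal_landmarks(mu, sigma):
    """Return (mode, median, arithmetic mean) of a lognormal variable."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)  # also the geometric mean
    mean = math.exp(mu + sigma ** 2 / 2)
    return mode, median, mean


# mu = 0 puts the median at 1, which makes the spread easy to read.
for sigma in (0.1, 0.5, 1.0):
    mode, median, mean = lognormal_landmarks(0.0, sigma)
    print(f"sigma={sigma}: mode={mode:.3f} median={median:.3f} mean={mean:.3f}")
```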
From ecoinvent to the Lognormal PDF
Three inputs are necessary from the data provider to determine the parameters of the lognormal distribution: the deterministic value, the basic uncertainty and the pedigree matrix.
Going from the deterministic value to mu is straightforward: the deterministic value is taken as equal to mu*, so mu = ln(deterministic value). In ecoEditor and ecoQuery, mu is called “Arithmetic mean of log-transformed data” and the deterministic value is called “Geometric mean”.
Then, the basic uncertainty is chosen. This value reflects the fact that even “perfect” data is uncertain: there are fluctuations over time, errors in measurements, etc. Table 10.3 of the Data Quality Guidelines provides values, depending on the type of exchange and process modeled. In ecoEditor and ecoQuery, this value is called “Variance of log-transformed data”. The field “Standard deviation (SD95)” is equal to exp((Variance of log-transformed data)^0.5)^2, a value that is not used anywhere in the rest of the calculation.
Then, a score from 1 to 5 is selected for each of five indicators: reliability, completeness, temporal correlation, geographical correlation, and further technological correlation. These scores are transformed into additional uncertainty in order to reflect that the amount of an exchange might come from sources that are not as reliable as primary data collection. The values can be older, from a different technology, from another part of the world, or based on estimates rather than calculation or measurement. Table 10.5 of the Data Quality Guidelines shows the relationship between the pedigree scores and the additional uncertainty.
The basic uncertainty is added to the five additional contributions to the uncertainty. This sum is called “Variance of data with pedigree”. Finally, the “CI/2wP, half range of confidence interval” is calculated as
exp((Variance of data with pedigree)^0.5)^2, corresponding to the square of sigma*.
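The whole chain, from the three inputs to the lognormal parameters, can be sketched in a few lines of Python. The function name, the dictionary keys and all numeric inputs below are assumptions made for illustration; in particular, the variances are placeholders and are not taken from Tables 10.3 or 10.5 of the Data Quality Guidelines. The sketch also assumes that CI/2wP is computed from the variance with pedigree, and it only handles positive deterministic values.

```python
import math


def lognormal_from_inputs(deterministic_value, basic_variance, pedigree_variances):
    """Derive lognormal parameters from the three inputs described above.

    basic_variance: variance of log-transformed data (basic uncertainty).
    pedigree_variances: the five additional variances, one per indicator.
    """
    if deterministic_value <= 0:
        raise ValueError("this sketch only handles positive deterministic values")
    if len(pedigree_variances) != 5:
        raise ValueError("expected one additional variance per indicator (5)")

    mu = math.log(deterministic_value)  # deterministic value is taken as mu*
    variance_with_pedigree = basic_variance + sum(pedigree_variances)
    sigma = math.sqrt(variance_with_pedigree)

    return {
        "mu": mu,
        "sd95": math.exp(math.sqrt(basic_variance)) ** 2,  # display field only
        "variance_with_pedigree": variance_with_pedigree,
        "sigma": sigma,
        "ci_2wp": math.exp(sigma) ** 2,  # square of sigma*
    }


# Placeholder inputs, chosen only so the arithmetic is easy to follow.
params = lognormal_from_inputs(
    deterministic_value=10.0,
    basic_variance=0.01,
    pedigree_variances=[0.01, 0.0, 0.02, 0.0, 0.05],
)
for name, value in params.items():
    print(name, value)
```

Dividing and multiplying the deterministic value by `ci_2wp` then gives the bounds of the interval that holds about 95% of the sampled values.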
A Numeric Example
Corresponding ecoEditor Uncertainty Window
Consult the Excel file for a detailed example.