Feb 1st, 2024

### Is a Test Version of the Database Available?

We do not offer a test version of the database. However, you can create a free guest user account.

As an unlicensed user, you can freely access documentation for all datasets in the ecoinvent database. We provide two example datasets on our website, which will give you an impression of what the linked datasets contain.

Check out our support guide to discover which datasets are available in all the system models.

### Which Datasets Are Available in the ecoinvent Database?

For a comprehensive overview of all datasets contained in the ecoinvent database, we provide a database overview file.

**How can I Access the Database?**

The ecoinvent database can be accessed online on the ecoinvent website as well as through leading software tools.

**Where Does the Data in the ecoinvent Database Come From?**

There are multiple paths through which data can find its way into the ecoinvent database; in all cases, ecoinvent obtains the right to publish the data non-exclusively, but the data creator remains the data owner.

Most of our inventory datasets are generated in data collection projects dedicated to specific economic sectors and countries or regions. To realize these projects, we commonly collaborate with external global partners with local, sectorial, and methodological expertise, such as research institutes, academia, consultancies, or industry associations. Our team also supports individual researchers or companies directly approaching us with data they want to publish in ecoinvent. Finally, some projects are carried out internally by our small team of LCA experts.

Accordingly, the sources our data relies on are also diverse. Where possible, our external partners provide primary data, for example, from field visits or interviews. Secondary data are preferably sourced from publicly available statistics, peer-reviewed scientific literature, and databases, complemented with other public (industry) data such as company reports and emission registers.

Regardless of the type of data collection, all datasets undergo a thorough review by internal and independent external experts to ensure they fulfill ecoinvent quality and transparency requirements before they are published in the database.

**How to Interpret the Uncertainty Fields in ecoinvent?**

The lognormal is the most common distribution chosen to describe uncertainty in ecoinvent. It has the advantage of not being defined in the negative domain, so negative amounts (credits) cannot arise accidentally during a Monte Carlo simulation.
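The point about the negative domain can be illustrated with a minimal sketch using Python's standard `random` module. The parameter values, sample size, and function names below are illustrative assumptions, not ecoinvent data or tooling; the normal distribution is included only as a contrast.

```python
import random


def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n lognormal samples; mu and sigma describe ln(x)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def sample_normal(mean, sd, n, seed=0):
    """Draw n normal samples for comparison."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical parameters chosen only to make the contrast visible.
lognormal_draws = sample_lognormal(mu=0.0, sigma=1.0, n=10_000)
normal_draws = sample_normal(mean=1.0, sd=1.0, n=10_000)

# Count draws that would flip the sign of an exchange amount.
negative_lognormal = sum(1 for x in lognormal_draws if x < 0)
negative_normal = sum(1 for x in normal_draws if x < 0)

print("negative lognormal draws:", negative_lognormal)
print("negative normal draws:", negative_normal)
```

With these settings the lognormal never produces a negative draw, whereas a normal distribution with a comparable spread does so regularly.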

The lognormal is not as intuitive as the normal distribution and is often confusing to new users. As a primer, we recommend “Log-normal Distributions across the Sciences: Keys and Clues” by Eckhard Limpert et al., in BioScience, May 2001, Vol. 51, No. 5.

#### Definition and Basic Properties of the Lognormal Distribution

A variable is lognormally distributed when its logarithm is normally distributed. The probability density function (PDF) of the lognormal is:

$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$

where x is the random variable, and mu and sigma are the mean and standard deviation of the distribution of ln(x) (sometimes called “the underlying normal distribution”). The median and multiplicative standard deviation of x, noted mu* and sigma*, can be obtained through the following equations:

mu* = exp(mu)

sigma* = exp(sigma)

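The quantity sigma* is useful for calculating confidence intervals: about 68% of the values lie between mu*/sigma* and mu* × sigma*, and about 95% lie between mu*/(sigma*)^2 and mu* × (sigma*)^2.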

In the lognormal distribution, the median corresponds to the geometric mean and is found at exp(mu). The arithmetic mean is higher than the geometric mean, at exp(mu + sigma^2/2). The mode (the most likely value) is lower, at exp(mu - sigma^2). The larger the standard deviation, the greater the skewness and the further apart these three quantities are.
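The relationships in this section can be collected in a short sketch. The function name and the example parameters are assumptions made for illustration. The interval bounds use the standard property of the lognormal that roughly 68% of values fall within mu* divided or multiplied by sigma*, and roughly 95% within mu* divided or multiplied by sigma* squared.

```python
import math


def lognormal_summary(mu, sigma):
    """Summarize a lognormal whose ln(x) is Normal(mu, sigma)."""
    mu_star = math.exp(mu)        # median, equal to the geometric mean
    sigma_star = math.exp(sigma)  # multiplicative standard deviation
    return {
        "median": mu_star,
        "mean": math.exp(mu + sigma ** 2 / 2),  # arithmetic mean
        "mode": math.exp(mu - sigma ** 2),      # most likely value
        "sigma_star": sigma_star,
        # Approximate 68% and 95% intervals around the median.
        "interval_68": (mu_star / sigma_star, mu_star * sigma_star),
        "interval_95": (mu_star / sigma_star ** 2, mu_star * sigma_star ** 2),
    }


# Hypothetical parameters, chosen only for illustration.
summary = lognormal_summary(mu=0.0, sigma=0.5)
for name, value in summary.items():
    print(name, value)
```

For any positive sigma the output shows the ordering mode < median < mean, and the gap between the three widens as sigma grows.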

#### From ecoinvent to the Lognormal PDF

Three inputs are necessary from the data provider to determine the parameters of the lognormal distribution: the **deterministic value**, the **basic uncertainty** and the **pedigree matrix**.

Going from the **deterministic value** to mu is straightforward: this value is taken as equal to mu*, so mu = ln(deterministic value). In ecoEditor and ecoQuery, mu is called “Arithmetic mean of log-transformed data”, and the deterministic value is also called “Geometric mean”.

Then, the **basic uncertainty** is chosen. This value reflects the fact that even “perfect” data is uncertain: there are fluctuations over time, errors in measurements, etc. Table 10.3 of the data quality guidelines provides values, depending on the type of exchange and process modeled. In ecoEditor and ecoQuery, this value is called “Variance of log-transformed data”. The field “Standard deviation (SD95)” is equal to (exp((Variance of log-transformed data)^0.5))^2, a value that is not used anywhere in the rest of the calculation.
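As a minimal sketch of the SD95 formula, the variance is first converted to a standard deviation, exponentiated, and then squared. The function name and the input value are hypothetical; the value is not taken from Table 10.3.

```python
import math


def sd95_from_variance(variance):
    """Return (exp(sqrt(variance))) ** 2 for a log-transformed variance."""
    sigma = math.sqrt(variance)  # standard deviation of ln(x)
    return math.exp(sigma) ** 2  # square of the multiplicative SD


# Hypothetical basic uncertainty, for illustration only.
example_variance = 0.04
sd95 = sd95_from_variance(example_variance)
print(sd95)
```

Because squaring the exponential doubles the exponent, the result is the same as exp(2 × sigma).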

Then, a score from 1 to 5 is selected for each of five indicators: reliability, completeness, temporal correlation, geographical correlation, and further technological correlation. These scores are transformed into additional uncertainty to reflect that the amount of an exchange might come from sources that are not as reliable as primary data collection. The values can be older, from a different technology, from another part of the world, or based on estimates rather than calculation or measurement. Table 10.5 of the data quality guidelines shows the relationship between the pedigree scores and the additional uncertainty.

The basic uncertainty is added to the five additional contributions to the uncertainty. This sum is called “Variance of data with pedigree”. Finally, the “CI/2wP, half range of confidence interval” is calculated as

(exp((Variance of data with pedigree)^0.5))^2, corresponding to the square of sigma*.
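The steps from the three inputs to the lognormal parameters can be sketched as follows. This is an illustration under stated assumptions, not ecoinvent's implementation: the function name is hypothetical, the numbers are round placeholders rather than values from Tables 10.3 or 10.5, and the final step is assumed to use the variance with pedigree.

```python
import math


def lognormal_parameters(deterministic_value, basic_variance, pedigree_variances):
    """Derive lognormal parameters from the three inputs described above."""
    if deterministic_value <= 0:
        raise ValueError("the lognormal requires a positive deterministic value")
    mu = math.log(deterministic_value)  # deterministic value is taken as mu*
    # Basic uncertainty plus the five additional pedigree contributions.
    variance_with_pedigree = basic_variance + sum(pedigree_variances)
    sigma = math.sqrt(variance_with_pedigree)
    sigma_star_squared = math.exp(sigma) ** 2  # assumed to match "CI/2wP"
    return {
        "mu": mu,
        "variance_with_pedigree": variance_with_pedigree,
        "sigma": sigma,
        "sigma_star_squared": sigma_star_squared,
        # Approximate 95% interval around the deterministic value.
        "interval_95": (
            deterministic_value / sigma_star_squared,
            deterministic_value * sigma_star_squared,
        ),
    }


# Placeholder inputs, NOT values from the data quality guidelines.
params = lognormal_parameters(
    deterministic_value=2.5,
    basic_variance=0.01,
    pedigree_variances=[0.0, 0.005, 0.01, 0.0, 0.015],
)
for name, value in params.items():
    print(name, value)
```

Summing variances rather than standard deviations is what the text describes; the square root is taken only once, on the total.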

#### A Numeric Example

#### Corresponding ecoEditor uncertainty window

Consult the Excel file for a detailed example.