+5 votes
4.7k views

I would like to know how openLCA builds the inventory for a process, so that I can get the same (or close) numbers by querying the underlying database. There is something I don't understand in openLCA's algorithms. I tried to look through the source code of "olca-modules", but it is a bit hard to follow. Maybe if I finally succeed in starting a dev build of the software, I will get more information by attaching a step-by-step debugger. However, there is a possibility that I am missing something really simple, so I will try to explain my approach and hope you will correct it / add the missing part.

As far as I understand it, the most important relations are:

PROCESSES

FLOWS

EXCHANGES

IMPACT_METHODS

IMPACT_CATEGORIES

IMPACT_FACTORS

A process has multiple EXCHANGES, each representing an INPUT or OUTPUT, depending on the value of the IS_INPUT field. Every EXCHANGE is connected to a FLOW, which can be of one of three types:

PRODUCT_FLOW

WASTE_FLOW

ELEMENTARY_FLOW

I think of the ELEMENTARY_FLOWs connected directly to a process as the "direct impacts" of that process; these can be seen in the openLCA software when you open a process and go to the "Impact analysis" tab.

These "direct impacts" can be calculated with this SQL (SQL#1):


SELECT im.NAME AS method,
       ic.NAME AS ic_name,
       ic.REFERENCE_UNIT,
       u.NAME AS unit,
       SUM(fact.VALUE * e.RESULTING_AMOUNT_VALUE) AS impact
FROM TBL_EXCHANGES AS e
JOIN TBL_FLOWS AS f ON (f.ID = e.F_FLOW)
JOIN TBL_IMPACT_FACTORS AS fact ON (fact.F_FLOW = f.ID)
JOIN TBL_UNITS AS u ON (u.ID = fact.F_UNIT)
JOIN TBL_IMPACT_CATEGORIES AS ic ON (ic.ID = fact.F_IMPACT_CATEGORY)
JOIN TBL_IMPACT_METHODS AS im ON (im.ID = ic.F_IMPACT_METHOD)
WHERE e.F_OWNER = <PROCESS ID>
GROUP BY im.NAME, ic.NAME, ic.REFERENCE_UNIT, u.NAME;


The result is a full list of impacts across all impact methods and categories for a given process ID. These numbers seem to match precisely what I see in the "Impact analysis" tab for a process.

What I do next is try to get the impacts for a product system built from a process. In openLCA, I create the product system as follows:

auto-link processes

only link default providers

system process

The background DB that I use (ecoinvent 3.5) does not have any flows with type=WASTE_FLOW, so I'm not currently going to implement waste flow calculations. My algorithm does the following for a given process ID:

1. Find all the EXCHANGES with IS_INPUT=1 that are linked to FLOWS with TYPE=PRODUCT_FLOW.

2. For every such exchange, look at its F_DEFAULT_PROVIDER field (the ID of another process, the one that produces the FLOW from step 1).

3. Write the F_DEFAULT_PROVIDER and the RESULTING_AMOUNT_VALUE of the EXCHANGE to the "inventory" table. The "amount" I call a "cumulative factor"; for the word "cumulative", see below.

4. Repeat steps 1-3 for every default provider, except that the amount (EXCHANGES.RESULTING_AMOUNT_VALUE) is now multiplied by the amount from the previous step; that is where "cumulative" comes from.

5. Recurse further down. Because the ecoinvent dataset is extremely interconnected, it always contains circular dependencies, and even without those the nesting depth easily reaches 1000+, making a full traversal essentially impossible to calculate. To handle that, I set a limit on the cumulative factor, say 0.00001, and consider values below this threshold negligible.

In the end, I sum up all the "cumulative factors" by process ID and get the final inventory table. Combining that with the "direct impacts" for every process ID, which I can easily calculate, I can compute the impacts in any category contributed by any linked process (or the total amount of those impacts). The problem is that inventories calculated with this method do not match those calculated by openLCA. In general I get somewhat close numbers, and if I set the precision threshold high enough the numbers match better, although that takes a significant amount of time. The results differ from process to process: most of the time my numbers are 1-3% lower, but sometimes they are 10-15% lower. Such a big difference cannot be explained by the precision threshold alone, which makes me think I am missing something significant that openLCA does and my algorithm does not.
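
In Go-like terms, the traversal amounts to roughly the following (a minimal sketch with hypothetical types; units, allocation and waste flows are ignored):

package main

import "fmt"

// link is a hypothetical, simplified view of one product input of a process:
// the default provider and the amount required per unit of the consumer.
type link struct {
	provider int64
	amount   float64
}

// inputs maps a process ID to its product inputs; in the real database this
// is what is read from TBL_EXCHANGES (IS_INPUT = 1, F_DEFAULT_PROVIDER > 0).
var inputs = map[int64][]link{}

const cutoff = 0.00001 // cumulative factors below this are considered negligible

// expand walks the supply chain recursively and sums cumulative factors
// per provider process; the cutoff also breaks the circular dependencies.
func expand(process int64, factor float64, inventory map[int64]float64) {
	for _, l := range inputs[process] {
		f := factor * l.amount
		if f > -cutoff && f < cutoff {
			continue
		}
		inventory[l.provider] += f
		expand(l.provider, f, inventory)
	}
}

func main() {
	inputs[1] = []link{{provider: 2, amount: 3}}
	inventory := map[int64]float64{}
	expand(1, 1.0, inventory)
	fmt.Println(inventory) // map[2:3]
}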

In order to be able to use advanced SQL features like recursive queries, I have imported the ecoinvent 3.5 data into MySQL 8.0, using the Apache Derby DB from GreenDelta as the source. This is my recursive query that calculates the "inventory" of a process (SQL#2):

WITH RECURSIVE cte (LEVEL, PARENT_ID, CHILD_ID, F) AS (
    SELECT 0 AS LEVEL,
           xp.F_OWNER AS PARENT_ID,
           xp.F_DEFAULT_PROVIDER AS CHILD_ID,
           xp.RESULTING_AMOUNT_VALUE AS F
    FROM TBL_EXCHANGES AS xp
    WHERE F_OWNER = <TARGET PROCESS ID>
      AND xp.F_DEFAULT_PROVIDER > 0
    UNION ALL
    SELECT cte.LEVEL + 1 AS LEVEL,
           xc.F_OWNER AS PARENT_ID,
           xc.F_DEFAULT_PROVIDER AS CHILD_ID,
           xc.RESULTING_AMOUNT_VALUE
             * COALESCE(PARAMETER1_VALUE, 1)
             * COALESCE(PARAMETER2_VALUE, 1)
             * cte.F AS F
    FROM TBL_EXCHANGES AS xc
    INNER JOIN cte ON xc.F_OWNER = cte.CHILD_ID
                  AND ABS(F) > <PRECISION LIMIT, e.g. 0.00001>
    WHERE xc.F_DEFAULT_PROVIDER > 0
)
SELECT CHILD_ID, SUM(F) FROM cte GROUP BY CHILD_ID;

This query results in a map of process IDs to cumulative factors. To calculate all the impacts that a certain process X contributes to the TARGET PROCESS, I take X's cumulative factor (XF), calculate X's impacts with SQL#1, and multiply each impact by XF. Then I compare the numbers to those calculated by openLCA (product system / calculate / quick results / impact analysis).

I find the required impact method and category, and the process I am interested in, and compare my numbers to openLCA's. My numbers are always a bit lower, and I think I am missing something here.
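
Per impact category, the number I compare is essentially the sum of cumulative factor times direct impact over all processes; as a sketch (made-up IDs and values):

package main

import "fmt"

func main() {
	// cumulative factors per process ID, as produced by SQL#2
	factors := map[int64]float64{101: 2.0, 102: 6.0}
	// direct impacts per process ID in one category, as produced by SQL#1
	direct := map[int64]float64{101: 0.5, 102: 0.1}

	total := 0.0
	for id, xf := range factors {
		total += xf * direct[id] // each process contributes XF * direct impact
	}
	fmt.Println(total) // 1.6 for these made-up numbers
}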

in openLCA by (190 points)
edited by

2 Answers

+3 votes
by (8.9k points)
edited by

You are using a sequential approach to calculate the scaling factors for each process within the product system. As you correctly assessed, the ecoinvent system contains loops that can't be resolved sequentially without using a limit for following the loops.

openLCA uses a different calculation method that is mathematically precise in solving these loops: it uses a matrix approach to calculate the scaling factors.

What you know, once you have the process inventories, is the so-called technology matrix (A), which defines the product connections in the product system, and the final demand (f), i.e. the target amount. Unknown are the scaling factors (s). Mathematically this can be described as A*s = f.

The equation for calculating the scaling vector is then s = A^(-1)*f, with A^(-1) being the inverse of A.

Here is a quick example (for simplicity all amounts are given in kg, so units are ignored for now): Process A (P1) produces the product Pr1 with an amount of 1 kg. P1 uses 3 kg of a product (Pr2), which is produced in an amount of 1 kg by Process B (P2). The product system defines Pr1 as the target product with a target amount of 2 kg. You fill the matrix by entering positive amounts for outputs and negative amounts for inputs, in this case:

           P1   P2
Pr1  |      1    0 |            | 2 |
Pr2  |     -3    1 |     f  =   | 0 |

s = A^(-1) * f = (2, 6)

So you end up with scaling factors of 2 for P1 and 6 for P2.
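
As a minimal sketch (plain Go, using the explicit 2x2 inverse; a real implementation would use a linear solver), this example works out as:

package main

import "fmt"

func main() {
	// Technology matrix A from the example above: rows are products
	// (Pr1, Pr2), columns are processes (P1, P2); outputs are positive,
	// inputs negative.
	a := [2][2]float64{
		{1, 0},
		{-3, 1},
	}
	// Final demand f: 2 kg of Pr1.
	f := [2]float64{2, 0}

	// For a 2x2 matrix the inverse is explicit, so A*s = f can be
	// solved directly.
	det := a[0][0]*a[1][1] - a[0][1]*a[1][0]
	s := [2]float64{
		(a[1][1]*f[0] - a[0][1]*f[1]) / det,
		(a[0][0]*f[1] - a[1][0]*f[0]) / det,
	}
	fmt.Println(s) // [2 6], the scaling factors for P1 and P2
}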

The book "The Computational Structure of Life Cycle Assessment" by Reinout Heijungs and Sangwon Suh describes this in detail.

by (190 points)
In my calculation server, written in Go, I build the full technology matrix for all the 16000 processes in the ecoinvent database, invert it and keep it at my disposal in memory (this takes a few minutes, with all cores busy and a fair amount of RAM used). Then a simple multiplication of this matrix by a demand vector yields the scaling factors. It is done in a matter of 0.01-0.02 seconds for any given process, which is drastically faster than openLCA. But yes, you guys are dealing with a DB that is subject to constant change by whoever is working with it, whereas I need to perform calculations on reference data that never changes in a given DB (ecoinvent in this case). Many thanks for helping me out here!
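
The core of it looks roughly like this (a simplified sketch using gonum, not my actual code; the real matrix is about 16000 x 16000):

package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	// Tiny stand-in for the full technology matrix built from TBL_EXCHANGES.
	a := mat.NewDense(2, 2, []float64{
		1, 0,
		-3, 1,
	})

	// Invert once up front; this is the expensive step.
	var inv mat.Dense
	if err := inv.Inverse(a); err != nil {
		panic(err) // the technology matrix must be non-singular
	}

	// Afterwards every product system is just s = A^(-1) * f.
	f := mat.NewVecDense(2, []float64{2, 0})
	var s mat.VecDense
	s.MulVec(&inv, f)
	fmt.Println(mat.Formatted(&s)) // scaling factors 2 and 6
}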
+2 votes
by (14.0k points)

Just some points in addition to the answer from Sebastian.

In general, using such a sequential approach is fine, but with huge databases like ecoinvent such differences compared to a direct matrix-based approach are quite normal, due to the cut-off settings for loop handling but also due to floating-point accuracy, as you will often handle a lot of very small numbers. (I did some tests using a sequential solver in openLCA for ecoinvent some time ago and saw similar differences, depending on the flow.)

As Sebastian wrote, openLCA uses matrix-based methods (though matrix inversion is only used for a full analysis). We use high-performance math libraries for solving these equations; such libraries often try to minimize floating-point issues with clever tricks. But the cut-off in the sequential method plays a role too, of course.

Additionally, I think there is an issue with this part of your script:

xc.RESULTING_AMOUNT_VALUE
  * COALESCE(PARAMETER1_VALUE, 1)
  * COALESCE(PARAMETER2_VALUE, 1)
  * cte.F AS F

`PARAMETER1_VALUE` and `PARAMETER2_VALUE` are the parameters of a possible uncertainty distribution, so they should not be multiplied into the resulting amount.

Also, I think your script assumes that the output amount of a linked product is always 1.0 and that everything is in the same unit (not sure if this is always the case in ecoinvent).
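
If that assumption does not hold, each link factor would need a normalization along these lines (hypothetical names, just a sketch):

package main

import "fmt"

// linkFactor sketches the normalization: the cumulative factor of a link is
// the input amount divided by the amount the provider produces per run,
// which the recursive query implicitly assumes to be 1.0.
func linkFactor(parentFactor, inputAmount, providerOutputAmount float64) float64 {
	return parentFactor * inputAmount / providerOutputAmount
}

func main() {
	// e.g. the parent needs 3 kg of a product the provider makes 0.5 kg of
	fmt.Println(linkFactor(1.0, 3.0, 0.5)) // 6: the provider must run 6 times
}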

by (190 points)
Thanks for the clarification on PARAMETER*_VALUE. I was thinking something like that but was not sure. I think I also tried exactly what you proposed, but then the results got really far off from openLCA's. Maybe I made some stupid mistake.

I have also checked beforehand that all the output amounts are equal to 1 in ecoinvent. And I have started to get results that are super close to openLCA's using the matrix calculations.
...