I would like to know how OpenLCA builds the inventory for a process, to be able to get the same (or close) numbers by quering the underlying database. There is something I don't understand in OpenLCA algorithms. I tried to look through source code of "olca-modules" but it's a bit hard to follow. Maybe if I finally succeed in starting a dev build of the software, I'll get more information by attaching a step-by-step debugger. However, there is a possibility that I am missing something really simple, so I will try to explain my approach, and hope you will correct / add a missing part to it.
As far as I understand it, the most important relations are:
PROCESSES
FLOWS
EXCHANGES
IMPACT_METHODS
IMPACT_CATEGORIES
IMPACT_FACTORS
A process has multiple EXCHANGES, each representing an INPUT or OUTPUT, depending on the IS_INPUT field value. Every EXCHANGE is connected to a FLOW, which can be of one of the three types:
PRODUCT_FLOW
WASTE_FLOW
ELEMENTARY_FLOW
For myself, I see the ELEMENTARY_FLOWs connected to a process directly as "direct impacts" of that process, those can be seen in the OpenLCA software when you open a process and go to the "Impact analysis" tab.
These "direct impacts" can be calculated with this SQL (SQL#1):
select im.NAME AS method, ic.NAME AS ic_name, ic.REFERENCE_UNIT, u.NAME AS unit, SUM(fact.VALUE * e.RESULTING_AMOUNT_VALUE) AS impact from TBL_EXCHANGES AS e JOIN TBL_FLOWS AS f ON (f.ID = e.F_FLOW) JOIN TBL_IMPACT_FACTORS AS fact ON (fact.F_FLOW = f.ID) JOIN TBL_UNITS AS u ON (u.ID = fact.F_UNIT) JOIN TBL_IMPACT_CATEGORIES AS ic ON (ic.ID = fact.F_IMPACT_CATEGORY) JOIN TBL_IMPACT_METHODS AS im ON (im.ID = ic.F_IMPACT_METHOD) WHERE e.F_OWNER=<PROCESS ID> GROUP BY im.NAME, ic.NAME, ic.REFERENCE_UNIT, u.NAME;
The result is a full list of impacts for all impact methods and categories for a given process ID. These numbers seem to precisely match to what I see in the "Impact analysis" tab for a process.
What I do next is I try to get the impacts for a product system built from a process. In OpenLCA, I create product system as folows:
auto-link processes
only link default providers
system process
The background DB that I use (ecoinvent 3.5) does not have any flows with type=WASTE_FLOW, so I'm not currently going to implement waste flow calculations. My algorithm does the following for a given process ID:
Finds all the EXCHANGES with IS_INPUT=1 that are linked to FLOWS with TYPE=PRODUCT_FLOW
For every such exchange, look at its F_DEFAULT_PROVIDER field (this is an ID of another process, that results in the FLOW from the point 1).
Write F_DEFAULT_PROVIDER and the RESULTING_AMOUNT_VALUE from the EXCHANGE, to the "inventory" table. The "amount" I call "cumulative factor" - for the word "cumulative" see below.
Repeat pp. 1-3 for every default provider, except that the amount (EXCHANGES.RESULTING_AMOUNT_VALUE) is now multiplied by the amount from previous step - that's where "cumulative" comes from.
Repeat further on, recursively. Because Ecoinvent dataset is extremely sophisticated, it always incurs circular dependencies, and even without those, nesting depth easily reaches 1000+ making it essentially impossible to calculate. To handle that, I set a limit on cumulative factor, say, 0.00001, considering values below this threshold, negligible.
In the end, I sum up all the "cumulative factors" by process ID, and I get the final inventory table. Combining that with "direct impacts" for every process ID, that I can easily calculate, I can calculate impacts in any category that result from any linked process (or the total amount of those impacts). Problem is that inventories calculated with this method do not match to those calculated by OpenLCA. In general, I get somewhat close numbers, and if I set the precision threshold high enough, numbers match better, although it takes a significant amount of time. The results differ from process to process, most of the time my results are 1-3% lower, but sometimes they can be 10-15% lower. Such big difference can not be accounted for by just not setting precision threshold high enough. And it makes me think I am missing something significant that OpenLCA does and my algorithm does not.
In order to be able to use some advanced SQL features like recursive queries, I have imported ecoinvent-3.5 data into MySQL 8.0, using the Apache Derby DB from GreenDelta as the source. This is my recursive query that calculates an "inventory" of a process (SQL#2):
WITH RECURSIVE cte ( LEVEL, PARENT_ID, CHILD_ID, F ) AS ( SELECT 0 AS LEVEL, xp.F_OWNER AS PARENT_ID, xp.F_DEFAULT_PROVIDER AS CHILD_ID, xp.RESULTING_AMOUNT_VALUE AS F FROM TBL_EXCHANGES AS xp WHERE F_OWNER = <TARGET PROCESS ID> AND xp.F_DEFAULT_PROVIDER > 0 UNION ALL SELECT cte.LEVEL + 1 AS LEVEL, xc.F_OWNER AS PARENT_ID, xc.F_DEFAULT_PROVIDER AS CHILD_ID, xc.RESULTING_AMOUNT_VALUE * COALESCE(PARAMETER1_VALUE, 1) * COALESCE(PARAMETER2_VALUE, 1) * cte.F AS F FROM TBL_EXCHANGES AS xc INNER JOIN cte ON xc.F_OWNER = cte.CHILD_ID AND ABS(F) > <PRECISION LIMIT, i.e. 0.00001> WHERE xc.F_DEFAULT_PROVIDER > 0 ) SELECT CHILD_ID, SUM(F) FROM cte GROUP BY CHILD_ID
This query results in a map of PROCESS IDs to cumulative factor. In order to calculate all the impacts due to a certain process ID X, to the TARGET PROCESS ID, I take the X's cumulative factor (XF), calculate X's impacts with SQL#1, then multiply each impact factor by XF. Then I compare the number to those calculated by OpenLCA - product system / calculate / quick results / impact analysis.
I find the required impact method and category, and the process I'm interested in, and compare my numbers to OpenLCA. My numbers are always a bit lower, and I think I missing something here.