Module tax_deficit_simulator.calculator

This module is dedicated to simulations based on macroeconomic data, namely the anonymized and aggregated country-by- country data published by the OECD for the year 2016 and the data compiled by Tørløv, Wier and Zucman (2020).

Defining the TaxDeficitCalculator class, which encapsulates all computations for the multilaral, imperfect coordination and unilateral scenarios presented in the report, this module pursues two main goals:

  • providing the computational logic for simulations run on the tax deficit online simulator;

  • allowing any Python user to reproduce the results presented in the report and to better understand the assumptions that lie behind our estimates.

All explanations regarding the estimation methodology can be found in the body of the report or in its appendices. Com- plementary information about how computations are run in Python can be found in the following docstrings and comments.

So far, the code presented here has not yet been optimized for performance. Feedback on how to improve computation ti- mes, the readability of the code or anything else are very much welcome!

Expand source code
"""
This module is dedicated to simulations based on macroeconomic data, namely the anonymized and aggregated country-by-
country data published by the OECD for the year 2016 and the data compiled by Tørløv, Wier and Zucman (2020).

Defining the TaxDeficitCalculator class, which encapsulates all computations for the multilaral, imperfect coordination
and unilateral scenarios presented in the report, this module pursues two main goals:

- providing the computational logic for simulations run on the tax deficit online simulator;

- allowing any Python user to reproduce the results presented in the report and to better understand the assumptions
that lie behind our estimates.

All explanations regarding the estimation methodology can be found in the body of the report or in its appendices. Com-
plementary information about how computations are run in Python can be found in the following docstrings and comments.

So far, the code presented here has not yet been optimized for performance. Feedback on how to improve computation ti-
mes, the readability of the code or anything else are very much welcome!
"""

# ----------------------------------------------------------------------------------------------------------------------
# --- Imports

import numpy as np
import pandas as pd

import os

from utils import rename_partner_jurisdictions, manage_overlap_with_domestic, COUNTRIES_WITH_MINIMUM_REPORTING, \
    COUNTRIES_WITH_CONTINENTAL_REPORTING, impute_missing_carve_out_values


# ----------------------------------------------------------------------------------------------------------------------
# --- Defining paths to data files and other utils

path_to_dir = os.path.dirname(os.path.abspath(__file__))

# We fetch the list of EU-28 and EU-27 country codes from a .csv file in the data folder
path_to_eu_countries = os.path.join(path_to_dir, 'data', 'listofeucountries_csv.csv')
eu_country_codes = list(pd.read_csv(path_to_eu_countries, delimiter=';')['Alpha-3 code'])

eu_27_country_codes = eu_country_codes.copy()
eu_27_country_codes.remove('GBR')

# We fetch the list of tax havens' alpha-3 country codes from a .csv file in the data folder
path_to_tax_haven_list = os.path.join(path_to_dir, 'data', 'tax_haven_list.csv')
tax_haven_country_codes = list(pd.read_csv(path_to_tax_haven_list, delimiter=';')['Alpha-3 code'])

# Absolute paths to data files, especially useful to run the app.py file
path_to_oecd = os.path.join(path_to_dir, 'data', 'oecd.csv')
path_to_twz = os.path.join(path_to_dir, 'data', 'twz.csv')
path_to_twz_domestic = os.path.join(path_to_dir, 'data', 'twz_domestic.csv')
path_to_twz_CIT = os.path.join(path_to_dir, 'data', 'twz_CIT.csv')
path_to_preprocessed_mean_wages = os.path.join(path_to_dir, 'data', 'preprocessed_mean_wages.csv')
path_to_statutory_rates = os.path.join(path_to_dir, 'data', 'statutory_rates.csv')


# ----------------------------------------------------------------------------------------------------------------------
# --- Defining the TaxDeficitCalculator class

class TaxDeficitCalculator:

    def __init__(
        self,
        alternative_imputation=True,
        carve_outs=False,
        carve_out_rate=None,
        depreciation_only=None, exclude_inventories=None, payroll_premium=20
    ):
        """
        This is the instantiation method for the TaxDeficitCalculator class.

        All its arguments have a default value. The two main ones are as follows:

        - the boolean "alternative_imputation" (set to True by default), determines whether the imputation of the non-
        haven tax deficit of non-OECD reporting countries at minimum rates of 20% or below is operated. For more details
        on this methodological choice, one can refer to Appendix A of the report;

        - the boolean "carve_outs" (False by default) indicates whether to simulate substance-based carve-outs.

        If the latter argument is set to True, additional arguments are required:

        - the "carve_out_rate" (float between 0 and 1) determines what share of tangible assets and payroll should be
        deduced from the pre-tax profits of multinationals;

        - the boolean "depreciation_only" indicates whether to only account for depreciation expenses (instead of the
        full value of tangible assets) in the tangible assets component of the carve-outs. Following the methodology of
        the OECD Secretariat in its Economic Impact Assessment of Oct. 2020, is this argument is set to True, we appro-
        ximate depreciation expenses as 10% of the book value of tangible assets;

        - the boolean "exclude_inventories" indicates whether to downgrade the tangible assets values provided by the
        OECD's aggregated and anonymized country-by-country data. As a simplification of the OECD's methodology (Oct.
        2020), if the argument is set to True, we reduce all tangible assets by 24%;

        - "payroll_premium" (float between 0 and 100 (considered as a %)) determines what upgrade to apply to the pay-
        roll proxy. Indeed, the latter is based on ILO's data about per-country mean annual earnings. Considering that
        the employees of large multinationals generally earn above-average wages, we propose to apply a premium to our
        payroll proxy.

        The instantiation function is mainly used to define several object attributes that generally correspond to as-
        sumptions taken in the report.
        """

        # These attributes will store the data loaded with the "load_clean_data" method
        self.oecd = None
        self.twz = None
        self.twz_domestic = None
        self.twz_CIT = None
        self.mean_wages = None

        # For non-OECD reporting countries, data are taken from TWZ 2019 appendix tables
        # An effective tax rate of 20% is assumed to be applied on profits registered in non-havens
        self.assumed_non_haven_ETR_TWZ = 0.2

        # An effective tax rate of 10% is assumed to be applied on profits registered in tax havens
        self.assumed_haven_ETR_TWZ = 0.1

        # Average exchange rate over the year 2016, extracted from benchmark computations run on Stata
        # Source: European Central Bank
        self.USD_to_EUR_2016 = 1 / 1.1069031

        # self.multiplier_EU = 1.13381004333496
        # self.multiplier_world = 1.1330304145813

        # Gross growth rate of worldwide GDP in current EUR between 2016 and 2021
        # Extracted from benchmark computations run on Stata
        self.multiplier_2021 = 1.1330304145813

        # For rates of 0.2 or lower an alternative imputation is used to estimate the non-haven tax deficit of non-OECD
        # reporting countries; this argument allows to enable or disable this imputation
        self.alternative_imputation = alternative_imputation
        self.reference_rate_for_alternative_imputation = 0.25

        # The list of countries whose tax deficit is partly collected by EU countries in the intermediary scenario
        self.country_list_intermediary_scenario = [
            'USA',
            'AUS',
            'CAN',
            'CHL',
            'MEX',
            'NOR',
            'BMU',
            'BRA',
            'CHN',
            'IND',
            'SGP',
            'ZAF',
            'IDN',
            'JPN'
        ]

        # This boolean indicates whether or not to apply substance-based carve-outs
        self.carve_outs = carve_outs

        # In case we want to simulate substance-based carve-outs, a few additional steps are required
        if carve_outs:

            # We first check whether all the required parameters were provided
            if carve_out_rate is None or depreciation_only is None or exclude_inventories is None:

                raise Exception(
                    'If you want to simulate substance-based carve-outs, you need to indicate all the parameters.'
                )

            self.carve_out_rate = carve_out_rate
            self.depreciation_only = depreciation_only
            self.exclude_inventories = exclude_inventories
            self.payroll_premium = payroll_premium

            # This corresponds to the OECD Secretariat's simulations in its Economic Impact Assessment (Oct. 2020):
            # inventories are excluded from tangible assets and only depreciation expenses can be partly deducted
            if depreciation_only and exclude_inventories:
                self.assets_multiplier = 0.1 * (1 - 0.24)

            # Here, we only account for depreciation expenses but do not exclude inventories
            elif depreciation_only and not exclude_inventories:
                self.assets_multiplier = 0.1

            # In this case, we take the full value of tangible assets to form the tangible assets component of substan-
            # ce-based carve-outs, while excluding inventories
            elif not depreciation_only and exclude_inventories:
                self.assets_multiplier = (1 - 0.24)

            # Benchmark case, where we take the full value of tangible assets without adjusting for inventories
            else:
                self.assets_multiplier = 1

        else:
            self.carve_out_rate = None
            self.depreciation_only = None
            self.exclude_inventories = None

    def load_clean_data(
        self,
        path_to_oecd=path_to_oecd,
        path_to_twz=path_to_twz,
        path_to_twz_domestic=path_to_twz_domestic,
        path_to_twz_CIT=path_to_twz_CIT,
        path_to_preprocessed_mean_wages=path_to_preprocessed_mean_wages,
        path_to_statutory_rates=path_to_statutory_rates,
        inplace=True
    ):
        """
        This method allows to load and clean data from 6 different sources:

        - the "oecd.csv" file which was extracted from the OECD's aggregated and anonymized country-by-country repor-
        ting, considering only the positive profit sample. Figures are in 2016 USD;

        - the "twz.csv" file which was extracted from the Table C4 of the TWZ 2019 online appendix. It presents, for
        a number of countries, the amounts of profits shifted to tax havens that are re-allocated to them on an ultima-
        te ownership basis. Figures are in 2016 USD million;

        - the "twz_domestic.csv" file, taken from the outputs of benchmark computations run on Stata. It presents for
        each country the amount of corporate profits registered locally by domestic MNEs and the effective tax rate to
        which they are subject. Figures are in 2016 USD billion;

        - the "twz_CIT.csv" file, extracted from Table U1 of the TWZ 2019 online appendix. It presents the corporate in-
        come tax revenue of each country in 2016 USD billion;

        - the "preprocessed_mean_wages.csv" file, taken from the outputs of substance-based carve-outs run on Stata. For
        each partner jurisdiction in the OECD's country-by-country data, it provides either a measure or an approxima-
        tion of the local mean annual earnings in 2016 in current USD. It is built upon ILO data, more details being
        provided in the methodological section of the Note n°1 of the Observatory on substance-based carve-outs;

        - the "statutory_rates.csv" file that provides, for a number of partner jurisdictions, their 2016 statutory cor-
        porate income tax rates.

        Default paths are used to let the simulator run via the app.py file. If you wish to use the tax_deficit_calcula-
        tor package in another context, you can save the data locally and give the method paths to the data files. The
        possibility to load the files from an online host instead will soon be implemented.
        """
        try:

            # We try to read the files from the provided paths
            oecd = pd.read_csv(path_to_oecd)
            twz = pd.read_csv(path_to_twz, delimiter=';')
            twz_domestic = pd.read_csv(path_to_twz_domestic, delimiter=';')
            twz_CIT = pd.read_csv(path_to_twz_CIT, delimiter=';')
            preprocessed_mean_wages = pd.read_csv(path_to_preprocessed_mean_wages, delimiter=';')
            statutory_rates = pd.read_csv(path_to_statutory_rates, delimiter=';')

        except FileNotFoundError:

            # If at least one of the files is not found
            raise Exception('Are you sure these are the right paths for the source files?')

        # --- Cleaning the OECD data

        # We drop a few irrelevant columns from country-by-country data
        oecd.drop(
            columns=['PAN', 'Grouping', 'Flag Codes', 'Flags', 'YEA', 'Year'],
            inplace=True
        )

        # We reshape the DataFrame from a long to a wide dataset
        oecd = oecd.pivot(
            index=['COU', 'Ultimate Parent Jurisdiction', 'JUR', 'Partner Jurisdiction'],
            columns='Variable',
            values='Value'
        ).reset_index()

        # We rename some columns to match the code that has been written before modifying how OECD data are loaded
        oecd.rename(
            columns={
                'COU': 'Parent jurisdiction (alpha-3 code)',
                'Ultimate Parent Jurisdiction': 'Parent jurisdiction (whitespaces cleaned)',
                'JUR': 'Partner jurisdiction (alpha-3 code)',
                'Partner Jurisdiction': 'Partner jurisdiction (whitespaces cleaned)'
            },
            inplace=True
        )

        # Thanks to a function defined in utils.py, we rename the "Foreign Jurisdictions Total" field for all countries
        # that only report a domestic / foreign breakdown in their CbCR
        oecd['Partner jurisdiction (whitespaces cleaned)'] = oecd.apply(rename_partner_jurisdictions, axis=1)

        # We eliminate stateless entities and the "Foreign Jurisdictions Total" filds
        oecd = oecd[
            ~oecd['Partner jurisdiction (whitespaces cleaned)'].isin(['Foreign Jurisdictions Total', 'Stateless'])
        ].copy()

        # We replace missing "Income Tax Paid" values by the corresponding "Income Tax Accrued" values
        # (Some missing values remain even after this edit)
        oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
            (
                lambda row: row['Income Tax Paid (on Cash Basis)']
                if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
                else row['Income Tax Accrued - Current Year']
            ),
            axis=1
        )

        # We clean the statutory corporate income tax rate dataset
        statutory_rates['statrate'] = statutory_rates['statrate'].map(
            lambda x: x.replace(',', '.') if isinstance(x, str) else x
        ).astype(float)

        # And we merge it with country-by-country data, on partner jurisdiction alpha-3 codes
        oecd = oecd.merge(
            statutory_rates,
            how='left',
            left_on='Partner jurisdiction (alpha-3 code)', right_on='partner'
        )

        oecd.drop(columns=['partner'], inplace=True)

        # We impute missing "Income Tax Paid" values assuming that pre-tax profits are taxed at the local statutory rate
        oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
            (
                lambda row: row['Income Tax Paid (on Cash Basis)']
                if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
                else row['Profit (Loss) before Income Tax'] * row['statrate']
            ),
            axis=1
        )

        oecd.drop(columns=['statrate'], inplace=True)

        # ETR computation (using tax paid as the numerator)
        oecd['ETR'] = oecd['Income Tax Paid (on Cash Basis)'] / oecd['Profit (Loss) before Income Tax']
        oecd['ETR'] = oecd['ETR'].map(lambda x: 0 if x < 0 else x)

        # Adding an indicator variable for domestic profits (rows with the same parent and partner jurisdiction)
        oecd['Is domestic?'] = oecd.apply(
            lambda row: row['Parent jurisdiction (alpha-3 code)'] == row['Partner jurisdiction (alpha-3 code)'],
            axis=1
        ) * 1

        # We add an indicator variable that takes value 1 if and only if the partner is a tax haven
        oecd['Is partner jurisdiction a tax haven?'] = oecd['Partner jurisdiction (alpha-3 code)'].isin(
            tax_haven_country_codes
        ) * 1

        # Adding another indicator variable that takes value 1 if and only if the partner is not a tax haven
        oecd['Is partner jurisdiction a non-haven?'] = 1 - oecd['Is partner jurisdiction a tax haven?']

        # This indicator variable is used specifically for the simulation of carve-outs; it takes value 1 if and only if
        # the partner jurisdiction is not the parent jurisdiction, not a tax haven and not a regional aggregate
        oecd['Is partner jurisdiction a non-haven? - CO'] = oecd.apply(
            (
                lambda row: 0
                if (
                    row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_MINIMUM_REPORTING
                    and row['Partner jurisdiction (alpha-3 code)'] == 'FJT'
                ) or (
                    row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_CONTINENTAL_REPORTING
                    and row['Partner jurisdiction (alpha-3 code)'] in ['GRPS', 'AFRIC', 'AMER', 'ASIAT', 'EUROP']
                ) or (
                    row['Is domestic?'] == 1
                )
                else row['Is partner jurisdiction a non-haven?']
            ),
            axis=1
        )

        # This indicator variable, used specifically for the simulation of carve-outs, takes value 1 if and only if the
        # partner is a regional aggregate
        oecd['Is partner jurisdiction an aggregate partner? - CO'] = np.logical_and(
            oecd['Is domestic?'] == 0,
            np.logical_and(
                oecd['Is partner jurisdiction a non-haven? - CO'] == 0,
                oecd['Is partner jurisdiction a tax haven?'] == 0
            )
        ) * 1

        # Thanks to a small function imported from utils.py, we manage the slightly problematic overlap between the
        # various indicator variables ("Is domestic?" sort of gets the priority over the others)
        oecd['Is partner jurisdiction a tax haven?'] = oecd.apply(
            lambda row: manage_overlap_with_domestic(row, 'haven'),
            axis=1
        )

        oecd['Is partner jurisdiction a non-haven?'] = oecd.apply(
            lambda row: manage_overlap_with_domestic(row, 'non-haven'),
            axis=1
        )

        # We need some more work on the data if we want to simulate substance-based carve-outs
        if self.carve_outs:

            # We merge earnings data with country-by-country data on partner jurisdiction codes
            oecd = oecd.merge(
                preprocessed_mean_wages[['partner2', 'earn']],
                how='left',
                left_on='Partner jurisdiction (alpha-3 code)', right_on='partner2'
            )

            oecd.drop(columns=['partner2'], inplace=True)

            oecd.rename(
                columns={
                    'earn': 'ANNUAL_VALUE'
                },
                inplace=True
            )

            # We clean the mean annual earnings column
            oecd['ANNUAL_VALUE'] = oecd['ANNUAL_VALUE'].map(
                lambda x: x.replace(',', '.') if isinstance(x, str) else x
            ).astype(float)

            # We deduce the payroll proxy from the number of employees and from mean annual earnings
            oecd['PAYROLL'] = oecd['Number of Employees'] * oecd['ANNUAL_VALUE'] * (1 + self.payroll_premium / 100)

            # We compute substance-based carve-outs from both payroll and tangible assets
            oecd['CARVE_OUT'] = self.carve_out_rate * (
                oecd['PAYROLL'] + oecd['Tangible Assets other than Cash and Cash Equivalents'] * self.assets_multiplier
            )

            # This column will contain slightly modified carve-outs, carve-outs being replaced by pre-tax profits
            # wherever the former exceeds the latter
            oecd['CARVE_OUT_TEMP'] = oecd.apply(
                (
                    lambda row: row['CARVE_OUT'] if row['Profit (Loss) before Income Tax'] > row['CARVE_OUT']
                    or np.isnan(row['CARVE_OUT'])
                    else row['Profit (Loss) before Income Tax']
                ),
                axis=1
            )

            # We exclude rows with missing carve-out values in a temporary DataFrame
            oecd_temp = oecd[
                ~np.logical_or(
                    oecd['PAYROLL'].isnull(),
                    oecd['Tangible Assets other than Cash and Cash Equivalents'].isnull()
                )
            ].copy()

            # We compute the average reduction in non-haven pre-tax profits due to carve-outs
            self.avg_carve_out_impact_non_haven = (
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
                ]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )

            # We do the same for pre-tax profits booked in tax havens, domestically and in aggregate partners
            self.avg_carve_out_impact_tax_haven = (
                oecd_temp[oecd_temp['Is partner jurisdiction a tax haven?'] == 1]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a tax haven?'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )
            self.avg_carve_out_impact_domestic = (
                oecd_temp[oecd_temp['Is domestic?'] == 1]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[oecd_temp['Is domestic?'] == 1]['Profit (Loss) before Income Tax'].sum()
            )
            self.avg_carve_out_impact_aggregate = (
                oecd_temp[
                    oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
                ]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )

            # We impute missing carve-out values based on these average reductions in pre-tax profits
            oecd['CARVE_OUT'] = oecd.apply(
                lambda row: impute_missing_carve_out_values(
                    row,
                    avg_carve_out_impact_domestic=self.avg_carve_out_impact_domestic,
                    avg_carve_out_impact_tax_haven=self.avg_carve_out_impact_tax_haven,
                    avg_carve_out_impact_non_haven=self.avg_carve_out_impact_non_haven,
                    avg_carve_out_impact_aggregate=self.avg_carve_out_impact_aggregate
                ),
                axis=1
            )

            # Some missing values remain whenever profits before tax are missing
            oecd = oecd[~oecd['CARVE_OUT'].isnull()].copy()

            # We remove substance-based carve-outs from pre-tax profits
            oecd['Profit (Loss) before Income Tax'] = oecd.apply(
                (
                    lambda row: row['Profit (Loss) before Income Tax'] - row['CARVE_OUT']
                    if row['Profit (Loss) before Income Tax'] - row['CARVE_OUT'] >= 0
                    else 0
                ),
                axis=1
            )

        # --- Cleaning the TWZ tax haven profits data

        # Adding an indicator variable for OECD reporting - We do not consider the Swedish CbCR
        twz['Is parent in OECD data?'] = twz['Alpha-3 country code'].map(
            lambda x: x in oecd['Parent jurisdiction (alpha-3 code)'].unique() if x != 'SWE' else False
        ) * 1

        # We reformat numeric columns - Resulting figures are expressed in 2016 USD
        for column_name in ['Profits in all tax havens', 'Profits in all tax havens (positive only)']:
            twz[column_name] = twz[column_name].map(lambda x: x.replace(',', '.'))
            twz[column_name] = twz[column_name].astype(float) * 1000000

            if self.carve_outs:
                # If we want to simulate carve-outs, we need to downgrade TWZ tax haven profits by the average reduction
                # due to carve-outs that is observed for tax haven profits in the OECD data
                twz[column_name] *= (1 - self.avg_carve_out_impact_tax_haven)
            else:
                continue

        # We filter out countries with 0 profits in tax havens
        twz = twz[twz['Profits in all tax havens (positive only)'] > 0].copy()

        # --- Cleaning the TWZ domestic profits data

        # Reformatting the profits column - Resulting figures are expressed in 2016 USD
        twz_domestic['Domestic profits'] = twz_domestic['Domestic profits']\
            .map(lambda x: x.replace(',', '.'))\
            .astype(float) * 1000000000

        # Reformatting the ETR column
        twz_domestic['Domestic ETR'] = twz_domestic['Domestic ETR'].map(lambda x: x.replace(',', '.')).astype(float)

        if self.carve_outs:
            # If we want to simulate carve-outs, we need to downgrade TWZ domestic profits by the average reduction due
            # to carve-outs that is observed for domestic profits in the OECD data
            twz_domestic['Domestic profits'] *= (1 - self.avg_carve_out_impact_domestic)

        # --- Cleaning the TWZ CIT revenue data

        # Reformatting the CIT revenue column - Resulting figures are expressed in 2016 USD
        twz_CIT['CIT revenue'] = twz_CIT['CIT revenue']\
            .map(lambda x: x.replace(',', '.'))\
            .astype(float) * 1000000000

        if inplace:
            self.oecd = oecd.copy()
            self.twz = twz.copy()
            self.twz_domestic = twz_domestic.copy()
            self.twz_CIT = twz_CIT.copy()
            self.mean_wages = preprocessed_mean_wages.copy()

        else:

            if self.carve_outs:
                return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy(), preprocessed_mean_wages.copy()

            else:
                return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy()

    def get_non_haven_imputation_ratio(self, minimum_ETR):
        """
        For non-OECD reporting countries, we base our estimates on data compiled by Tørsløv, Wier and Zucman (2019).
        These allow to compute domestic and tax-haven-based tax deficit of these countries. We extrapolate the non-haven
        tax deficit of these countries from the tax-haven one.

        We impute the tax deficit in non-haven jurisdictions by estimating the ratio of tax deficits in non-tax havens
        to tax-havens for the EU non-tax haven parent countries in the CbCR data. We assume a 20% ETR in non-tax havens
        and a 10% ETR in tax havens (these rates are defined in two dedicated attributes in the instantiation function).

        This function allows to compute this ratio following the (A2) formula of Appendix A.

        The methodology is described in more details in the Appendix A of the report.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # With a minimum ETR of 10%, the formula cannot be applied (division by 0), hence this case disjunction
        if minimum_ETR > 0.1:
            oecd = self.oecd.copy()

            # In the computation of the imputation ratio, we only focus on:
            # - EU-27 parent countries
            mask_eu = oecd['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
            # - That are not tax havens
            mask_non_haven = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(tax_haven_country_codes)
            # - And report a detailed country by country breakdown in their CbCR
            mask_minimum_reporting_countries = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
            )

            # We combine the boolean indexing masks
            mask = np.logical_and(mask_eu, mask_non_haven)
            mask = np.logical_and(mask, mask_minimum_reporting_countries)

            # And convert booleans into 0 / 1 integers
            mask = mask * 1

            # We compute the profits registered by retained countries in non-haven countries
            # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
            foreign_non_haven_profits = (
                (
                    mask * oecd['Is partner jurisdiction a non-haven?']
                ) * oecd['Profit (Loss) before Income Tax']
            ).sum()

            # We compute the profits registered by retained countries in tax havens
            # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
            foreign_haven_profits = (
                (
                    mask * oecd['Is partner jurisdiction a tax haven?']
                ) * oecd['Profit (Loss) before Income Tax']
            ).sum()

            # We apply the formula and compute the imputation ratio
            imputation_ratio_non_haven = (
                (
                    # If the minimum ETR is below the rate assumed to be applied on non-haven profits, there is no tax
                    # deficit to collect from these profits, which is why we have this max(..., 0)
                    max(minimum_ETR - self.assumed_non_haven_ETR_TWZ, 0) * foreign_non_haven_profits
                ) /
                ((minimum_ETR - self.assumed_haven_ETR_TWZ) * foreign_haven_profits)
            )

        # We manage the case where the minimum ETR is of 10% and the formula cannot be applied
        elif minimum_ETR == 0.1:

            # As long as tax haven profits are assumed to be taxed at a rate of 10%, the value that we set here has no
            # effect (it will be multiplied to 0 tax-haven-based tax deficits) but to remain consistent with higher
            # values of the minimum ETR, we impute 0

            imputation_ratio_non_haven = 0

        else:
            # We do not yet manage effective tax rates below 10%
            raise Exception('Unexpected minimum ETR entered (strictly below 0.1).')

        return imputation_ratio_non_haven

    def get_alternative_non_haven_factor(self, minimum_ETR):
        """
        Looking at the formula (A2) of Appendix A and at the previous method, we see that for a 15% tax rate, this impu-
        tation would result in no tax deficit to be collected from non-tax haven jurisdictions. Thus, we correct for
        this underestimation by computing the ratio of the tax deficit that can be collected in non-tax havens at a 15%
        and a 25% rate for OECD-reporting countries.

        This class method allows to compute this alternative imputation ratio.

        The methodology is described in more details in the Appendix A of the report.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # This method is only useful if the previous one yields a ratio of 0, i.e. if the minimum ETR is of 20% or less
        if minimum_ETR > 0.2:
            raise Exception('These computations are only used when the minimum ETR considered is 0.2 or less.')

        # We use the get_stratified_oecd_data to compute the non-haven tax deficit of OECD-reporting countries
        oecd_stratified = self.get_stratified_oecd_data(
            minimum_ETR=self.reference_rate_for_alternative_imputation
        )

        # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
        df_restricted = oecd_stratified[
            ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
            )
        ].copy()

        # The denominator is the total non-haven tax deficit of relevant countries at the reference minimum ETR
        denominator = df_restricted['tax_deficit_x_non_haven'].sum()

        # We follow the same process, running computations at the minimum ETR this time
        oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

        # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
        df_restricted = oecd_stratified[
            ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
            )
        ].copy()

        # The numerator is the total non-haven tax deficit of relevant countries at the selected minimum ETR
        numerator = df_restricted['tax_deficit_x_non_haven'].sum()

        return numerator / denominator

    def get_stratified_oecd_data(self, minimum_ETR=0.25):
        """
        This method constitutes a first step in the computation of each country's collectible tax deficit in the multi-
        lateral agreement scenario.

        Taking the minimum effective tax rate as input and based on OECD data, this function outputs a DataFrame that
        displays, for each OECD-reporting parent country, the tax deficit that could be collected from the domestic,
        tax haven and non-haven profits of multinationals headquartered in this country.

        The output is in 2016 USD, like the raw OECD data.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        oecd = self.oecd.copy()

        # We only profits taxed at an effective tax rate above the minimum ETR
        oecd = oecd[oecd['ETR'] < minimum_ETR].copy()

        # We compute the ETR differential for all low-taxed profits
        oecd['ETR_differential'] = oecd['ETR'].map(lambda x: minimum_ETR - x)

        # And deduce the tax deficit generated by each Parent / Partner jurisidiction pair
        oecd['tax_deficit'] = oecd['ETR_differential'] * oecd['Profit (Loss) before Income Tax']

        # Using the aforementioned indicator variables allows to breakdown this tax deficit
        oecd['tax_deficit_x_domestic'] = oecd['tax_deficit'] * oecd['Is domestic?']
        oecd['tax_deficit_x_tax_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a tax haven?']
        oecd['tax_deficit_x_non_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a non-haven?']

        # We group the table by Parent jurisdiction such that for, say, France, the table displays the total domestic,
        # tax-haven and non-haven tax deficit generated by French multinationals
        oecd_stratified = oecd[
            [
                'Parent jurisdiction (whitespaces cleaned)',
                'Parent jurisdiction (alpha-3 code)',
                'tax_deficit',
                'tax_deficit_x_domestic',
                'tax_deficit_x_tax_haven',
                'tax_deficit_x_non_haven'
            ]
        ].groupby(
            'Parent jurisdiction (whitespaces cleaned)'
        ).agg(
            {
                'Parent jurisdiction (alpha-3 code)': 'first',
                'tax_deficit': 'sum',
                'tax_deficit_x_domestic': 'sum',
                'tax_deficit_x_tax_haven': 'sum',
                'tax_deficit_x_non_haven': 'sum'
            }
        ).copy()

        oecd_stratified.reset_index(inplace=True)

        return oecd_stratified.copy()

    def compute_all_tax_deficits(self, minimum_ETR=0.25, CbCR_reporting_countries_only=False):
        """
        This method encapsulates most of the computations for the multilateral agreement scenario.

        Taking as input the minimum effective tax rate to apply and based on OECD and TWZ data, it outputs a DataFrame
        which presents, for each country in our sample (countries in OECD and/or TWZ data) the total tax deficit, as
        well as its breakdown into domestic, tax-haven and non-haven tax deficits.

        The output is in 2021 EUR after a currency conversion and the extrapolation from 2016 to 2021 figures.
        """
        # We need to have previously loaded and cleaned the OECD and TWZ data
        if self.oecd is None or self.twz is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # We use the method defined above and will use its output as a base for the following computations
        oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

        twz = self.twz.copy()

        # From TWZ data on profits registered in tax havens and assuming that these are taxed at a given minimum ETR
        # (10% in the report, see the instantiation function for the definition of this attribute), we deduce the tax-
        # haven-based tax deficit of TWZ countries
        twz['tax_deficit_x_tax_haven_TWZ'] = \
            twz['Profits in all tax havens (positive only)'] * (minimum_ETR - self.assumed_haven_ETR_TWZ)

        # --- Managing countries in both OECD and TWZ data

        # We focus on parent countries which are in both the OECD and TWZ data
        # NB: recall that we do not consider the Swedish CbCR
        twz_in_oecd = twz[twz['Is parent in OECD data?'].astype(bool)].copy()

        # We merge the two DataFrames on country codes
        merged_df = oecd_stratified.merge(
            twz_in_oecd[['Country', 'Alpha-3 country code', 'tax_deficit_x_tax_haven_TWZ']],
            how='left',
            left_on='Parent jurisdiction (alpha-3 code)',
            right_on='Alpha-3 country code'
        ).drop(columns=['Country', 'Alpha-3 country code'])

        # For countries that are in the OECD data but not in TWZ, we impute a tax-haven-based tax deficit from TWZ of 0
        merged_df['tax_deficit_x_tax_haven_TWZ'] = merged_df['tax_deficit_x_tax_haven_TWZ'].fillna(0)

        self.countries_replaced = []

        if self.carve_outs:

            calculator = TaxDeficitCalculator()
            calculator.load_clean_data()
            _ = calculator.compute_all_tax_deficits()

            countries_replaced = calculator.countries_replaced.copy()

            merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
                lambda row: self.combine_haven_tax_deficits(
                    row,
                    carve_outs=self.carve_outs,
                    countries_replaced=countries_replaced),
                axis=1
            )

        else:
            merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
                lambda row: self.combine_haven_tax_deficits(
                    row,
                    carve_outs=self.carve_outs
                ),
                axis=1
            )

        self.countries_replaced = merged_df[
            merged_df['tax_deficit_x_tax_haven_merged'] == merged_df['tax_deficit_x_tax_haven_TWZ']
        ]['Parent jurisdiction (alpha-3 code)'].unique()

        merged_df.drop(columns=['tax_deficit_x_tax_haven', 'tax_deficit_x_tax_haven_TWZ'], inplace=True)

        merged_df.rename(
            columns={
                'tax_deficit_x_tax_haven_merged': 'tax_deficit_x_tax_haven'
            },
            inplace=True
        )

        # Summing the tax-haven-based, non-haven and domestic tax deficits yields the total tax deficit of each country
        merged_df['tax_deficit'] = merged_df['tax_deficit_x_tax_haven'] \
            + merged_df['tax_deficit_x_domestic'] \
            + merged_df['tax_deficit_x_non_haven']

        # --- Countries only in the TWZ data

        # We now focus on countries that are absent from the OECD data
        # NB: recall that we do not consider the Swedish CbCR
        twz_not_in_oecd = twz[~twz['Is parent in OECD data?'].astype(bool)].copy()

        twz_not_in_oecd.drop(
            columns=['Profits in all tax havens', 'Profits in all tax havens (positive only)'],
            inplace=True
        )

        # - Extrapolating the foreign non-haven tax deficit

        # We compute the imputation ratio with the method defined above
        imputation_ratio_non_haven = self.get_non_haven_imputation_ratio(minimum_ETR=minimum_ETR)

        # And we deduce the non-haven tax deficit of countries that are only found in TWZ data
        twz_not_in_oecd['tax_deficit_x_non_haven'] = \
            twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] * imputation_ratio_non_haven

        # - Computing the domestic tax deficit

        # For countries that are only in TWZ data, we still need to compute their domestic tax deficit
        twz_domestic = self.twz_domestic.copy()

        # We only consider countries whose domestic ETR is stricly below the minimum ETR
        # (otherwise, there is no tax deficit to collect from domestic profits)
        twz_domestic = twz_domestic[twz_domestic['Domestic ETR'] < minimum_ETR].copy()

        # We compute the ETR differential
        twz_domestic['ETR_differential'] = twz_domestic['Domestic ETR'].map(lambda x: minimum_ETR - x)

        # And deduce the domestic tax deficit of each country
        twz_domestic['tax_deficit_x_domestic'] = twz_domestic['ETR_differential'] * twz_domestic['Domestic profits']

        # - Combining the different forms of tax deficit

        # We merge the two DataFrames to complement twz_not_in_oecd with domestic tax deficit results
        twz_not_in_oecd = twz_not_in_oecd.merge(
            twz_domestic[['Alpha-3 country code', 'tax_deficit_x_domestic']],
            how='left',
            on='Alpha-3 country code'
        )

        # As we filtered out countries whose domestic ETR is stricly below the minimum ETR, some missing values
        # appear during the merge; we impute 0 for these as they do not have any domestic tax deficit to collect
        twz_not_in_oecd['tax_deficit_x_domestic'] = twz_not_in_oecd['tax_deficit_x_domestic'].fillna(0)

        # We deduce the total tax deficit for each country
        twz_not_in_oecd['tax_deficit'] = twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] \
            + twz_not_in_oecd['tax_deficit_x_domestic'] \
            + twz_not_in_oecd['tax_deficit_x_non_haven']

        # --- Merging the results of the two data sources

        # We need columns to match for the concatenation to operate smoothly
        twz_not_in_oecd.rename(
            columns={
                'Country': 'Parent jurisdiction (whitespaces cleaned)',
                'Alpha-3 country code': 'Parent jurisdiction (alpha-3 code)',
                'tax_deficit_x_tax_haven_TWZ': 'tax_deficit_x_tax_haven'
            },
            inplace=True
        )

        twz_not_in_oecd.drop(columns=['Is parent in OECD data?'], inplace=True)

        # We exclude Sweden from the OECD-drawn results, as we do not consider its CbCR
        merged_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

        # We eventually concatenate the two DataFrames
        merged_df = pd.concat(
            [merged_df, twz_not_in_oecd],
            axis=0
        )

        # --- Extrapolations to 2021 EUR

        # We convert 2016 USD results in 2016 EUR and extraprolate them to 2021 EUR
        for column_name in merged_df.columns[2:]:
            merged_df[column_name] = merged_df[column_name] * self.USD_to_EUR_2016 * self.multiplier_2021

        # --- Managing the case where the minimum ETR is 20% or below for TWZ countries

        # As mentioned above and detailed in Appendix A, the imputation of the non-haven tax deficit of TWZ countries
        # follows a specific process whenever the chosen minimum ETR is of or below 20%
        if minimum_ETR <= 0.2 and self.alternative_imputation:
            # We get the new multiplying factor from the method defined above
            multiplying_factor = self.get_alternative_non_haven_factor(minimum_ETR=minimum_ETR)

            # We compute all tax deficits at the reference rate (25% in the report)
            df = self.compute_all_tax_deficits(
                minimum_ETR=self.reference_rate_for_alternative_imputation
            )

            # We only consider countries that are absent from the OECD data, except Sweden as usual
            oecd_reporting_countries_but_SWE = self.oecd[
                self.oecd['Parent jurisdiction (alpha-3 code)'] != 'SWE'
            ]['Parent jurisdiction (alpha-3 code)'].unique()

            df = df[
                ~df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries_but_SWE)
            ].copy()

            # For these countries, we multiply the non-haven tax deficit at the reference rate by the multiplying factor
            df['tax_deficit_x_non_haven_imputation'] = df['tax_deficit_x_non_haven'] * multiplying_factor

            # We save the results in a dictionary that will allow to map the DataFrame that we want to output in the end
            mapping = {}

            for _, row in df.iterrows():
                mapping[row['Parent jurisdiction (alpha-3 code)']] = row['tax_deficit_x_non_haven_imputation']

            # We create a new column in the to-be-output DataFrame which takes as value:
            # - the non-haven tax deficit estimated just above for TWZ countries
            # - 0 for OECD-reporting countries, which do not require this imputation
            merged_df['tax_deficit_x_non_haven_imputation'] = merged_df['Parent jurisdiction (alpha-3 code)'].map(
                lambda country_code: mapping.get(country_code, 0)
            )

            # We deduce the non-haven tax deficit of all countries
            merged_df['tax_deficit_x_non_haven'] += merged_df['tax_deficit_x_non_haven_imputation']

            # And add this imputation also to the column that presents the total tax deficit of each country
            merged_df['tax_deficit'] += merged_df['tax_deficit_x_non_haven_imputation']

            merged_df.drop(
                columns=['tax_deficit_x_non_haven_imputation'],
                inplace=True
            )

        if CbCR_reporting_countries_only:
            merged_df = merged_df[
                merged_df['Parent jurisdiction (whitespaces cleaned)'].isin(
                    self.oecd['Parent jurisdiction (whitespaces cleaned)'].unique()
                )
            ].copy()

        return merged_df.reset_index(drop=True).copy()

    def combine_haven_tax_deficits(
        self,
        row,
        carve_outs=False,
        countries_replaced=None
    ):
        """
        This function is used to compute the tax deficit of all in-sample headquarter countries in the multilateral im-
        plementation scenario.

        For parent countries that are in both the OECD and TWZ data, we have two different sources to compute their tax-
        haven-based tax deficit and we retain the highest of these two amounts.

        Besides, for parent countries in the OECD data that do not report a fully detailed country-by-country breakdown
        of the activity of their multinationals, we cannot distinguish their tax-haven and non-haven tax deficits. Quite
        arbitrarily in the Python code, we attribute everything to the non-haven tax deficit. In the Table A1 of the re-
        port, these specific cases are described with the "Only foreign aggregate data" column.
        """
        if carve_outs and countries_replaced is None:
            raise Exception(
                'Using this function under carve-outs requires to indicate a list of countries to replace.'
            )

        if row['Parent jurisdiction (alpha-3 code)'] not in (
            COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
        ):
            if countries_replaced is None:

                if row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']:
                    self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                    return row['tax_deficit_x_tax_haven_TWZ']

                else:
                    return row['tax_deficit_x_tax_haven']

            else:
                if (
                    row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']
                    and row['Parent jurisdiction (alpha-3 code)'] in countries_replaced
                ):
                    self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                    return row['tax_deficit_x_tax_haven_TWZ']

                else:
                    return row['tax_deficit_x_tax_haven']

        else:
            return 0

    def check_tax_deficit_computations(self, minimum_ETR=0.25):
        """
        Taking the selected minimum ETR as input and relying on the compute_all_tax_deficits method defined above, this
        method outputs a DataFrame that can be compared with Table A1 of the report. For each country in OECD and/or TWZ
        data, it displays its total tax deficit and a breakdown into domestic, tax-haven-based and non-haven tax defi-
        cits. Figures are display in 2021 billion EUR.
        """

        # We start from the output of the previously defined method
        df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

        # And convert numeric columns from 2021 EUR to 2021 billion EUR
        for column_name in df.columns[2:]:
            df[column_name] = df[column_name] / 10**9

        return df.copy()

    def get_total_tax_deficits(self, minimum_ETR=0.25):
        """
        This method takes the selected minimum ETR as input and relies on the compute_all_tax_deficits, to output a Da-
        taFrame with (i) the total tax defict of each in-sample country in 2021 EUR and (ii) the sum of these tax defi-
        cits at the EU-27 and at the whole sample level. It can be considered as an intermediary step towards the fully
        formatted table displayed on the online simulator (section "Multilateral implementation scenario").
        """

        df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

        df = df[
            ['Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)', 'tax_deficit']
        ]

        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        # We compute the sum of total tax deficits at the EU-27 level and for the whole sample
        total_eu = (df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes) * 1 * df['tax_deficit']).sum()
        total_whole_sample = df['tax_deficit'].sum()

        # Possibly suboptimal process to add "Total" lines at the end of the DataFrame
        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df)] = 'Total - EU27'
        dict_df[df.columns[1]][len(df)] = '..'
        dict_df[df.columns[2]][len(df)] = total_eu

        dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 1] = '..'
        dict_df[df.columns[2]][len(df) + 1] = total_whole_sample

        df = pd.DataFrame.from_dict(dict_df)

        return df.reset_index(drop=True)

    def check_appendix_A2(self):
        """
        Relying on the get_total_tax_deficits method and on TWZ data on corporate income tax revenues, this method out-
        puts a DataFrame that can be compared with the first 4 columns of Table A2 in the report. For each in-sample
        country and at four different minimum ETRs (15%, 21%, 25% and 30% which are the four main cases considered in
        the report), the table presents estimated revenue gains as a percentage of currently corporate income taxes.
        """

        # We need to have previously loaded and cleaned the TWZ data on corporate income tax revenues
        # (figures in the pre-loaded DataFrame are provided in 2016 USD)
        if self.twz_CIT is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # We compute total tax deficits, first at a 15% minimum ETR and in 2021 EUR
        df = self.get_total_tax_deficits(minimum_ETR=0.15)

        df.rename(columns={'tax_deficit': 'tax_deficit_15'}, inplace=True)

        # We merge the two DataFrames to combine information on collectible tax deficits and current CIT revenues
        merged_df = df.merge(
            self.twz_CIT,
            how='left',
            left_on='Parent jurisdiction (alpha-3 code)',
            right_on='Country (alpha-3 code)'
        ).drop(columns=['Country', 'Country (alpha-3 code)'])

        # We bring back the tax deficit estimated to 2016 USD (from 2021 EUR)
        merged_df['tax_deficit_15'] /= (merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100)

        # For the 3 other rates considered in the output table
        for rate in [0.21, 0.25, 0.3]:
            # We compute total tax deficits at this rate
            df = self.get_total_tax_deficits(minimum_ETR=rate)

            # We add these results to the central DataFrame thanks to a merge operation
            merged_df = merged_df.merge(
                df,
                how='left',
                on='Parent jurisdiction (alpha-3 code)'
            )

            # We impute the missing values produced by the merge
            merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

            # We rename the newly-added tax deficit column
            merged_df.rename(
                columns={'tax_deficit': f'tax_deficit_{int(rate * 100)}'},
                inplace=True
            )

            # And we bring it back to 2016 USD
            merged_df[f'tax_deficit_{int(rate * 100)}'] /= (
                merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100
            )

        # We want to also verify the EU-27 average and restrict the DataFrame to these countries
        eu_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)].copy()

        # This attribute stores the average EU-27 revenue gain estimate in % of current CIT revenues for each of the 4
        # minimum ETRs of interest (respectively 15.1%, 30.5%, 52.3% and 84.1% in the report)
        self.check = [
            (
                eu_df[f'tax_deficit_{rate}'] * eu_df['CIT revenue'] / 100
            ).sum() / eu_df['CIT revenue'].sum() for rate in [15, 21, 25, 30]
        ]

        # Coming back to the DataFrame with all in-sample countries, we only keep the relevant columns and output it
        merged_df = merged_df[
            [
                'Parent jurisdiction (whitespaces cleaned)_x',
                'tax_deficit_15', 'tax_deficit_21', 'tax_deficit_25', 'tax_deficit_30'
            ]
        ].copy()

        # NB: in the current version of this method, the successive merges have a poor effect on the "Total" rows that
        # are included in the output of the get_total_tax_deficits method; this could easily be improved

        return merged_df.copy()

    def output_tax_deficits_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
        table on the "Multilateral implementation scenario" page. It takes as input the selected minimum ETR and widely
        relies on the get_total_tax_deficits method defined above. It mostly consists in a series of formatting steps.
        """

        # We build the unformatted results table thanks to the get_total_tax_deficits method
        df = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        # We only want to include certain countries in the output table:
        # - all the EU-27 countries that are included in our sample (4 unfortunately missing for now)
        # - most of the OECD-reporting countries, excluding only Singapore and Bermuda

        # We first build the list of OECD-reporting countries, excluding Singapore and Bermuda
        oecd_reporting_countries = self.oecd['Parent jurisdiction (alpha-3 code)'].unique()
        oecd_reporting_countries = [
            country_code for country_code in oecd_reporting_countries if country_code not in ['SGP', 'BMU']
        ]

        # From this list, we build the relevant boolean indexing mask that corresponds to our filtering choice
        mask = np.logical_or(
            df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes),
            df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries)
        )

        df = df[mask].copy()

        # We sort values by the name of the parent jurisdiction, in the alphabetical order
        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        df.reset_index(drop=True, inplace=True)

        # We convert 2021 EUR figures into 2021 million EUR ones
        df['tax_deficit'] = df['tax_deficit'] / 10**6

        # Again, the same possibly sub-optimal process to add the "Total" lines
        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df)] = 'Total - EU27'
        dict_df[df.columns[1]][len(df)] = '..'
        dict_df[df.columns[2]][len(df)] = df[
            df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        ]['tax_deficit'].sum()

        dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 1] = '..'
        dict_df[df.columns[2]][len(df) + 1] = df['tax_deficit'].sum()

        df = pd.DataFrame.from_dict(dict_df)

        # We drop country codes
        df.drop(columns=['Parent jurisdiction (alpha-3 code)'], inplace=True)

        # And we eventually reformat figures with a thousand separator and a 0-decimal rounding
        df['tax_deficit'] = df['tax_deficit'].map('{:,.0f}'.format)

        # We rename columns
        df.rename(
            columns={
                'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country',
                'tax_deficit': 'Collectible tax deficit (€m)'
            },
            inplace=True
        )

        return df.copy()

    def compute_unilateral_scenario_gain(self, country, minimum_ETR=0.25):
        """
        This method encapsulates most of the computations for the unilateral implementation scenario.

        It takes as input:

        - the name of the country assumed to unilaterally implement the tax deficit collection;

        - the minimum effective tax rate that it applies when collecting the full tax deficit of its multinationals and
        a part of the tax deficit of foreign multinationals, based on the location of their sales.

        The output of this method is a DataFrame organized as follows:

        - each row is a headquarter country whose tax deficit would be collected partly or entirely by the taxing coun-
        try (including the taxing country which collects 100% of the tax deficit of its multinationals);

        - there are two columns, with the name of the headquarter country considered and the tax deficit amount that
        could be collected from its multinationals by the taxing country.

        Figures are presented in 2021 EUR.

        Important disclaimer: for now, this method is not robust to variations in the country name, i.e. only country
        names as presented in the OECD CbCR data will generate a result. These are the country names that are proposed
        in the selectbox on the online simulator.

        The methogology behind these computations is described in much more details in Appendix B of the report.
        """

        # We start from the total tax deficits of all countries which can be partly re-allocated to the taxing country
        tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        # The OECD data provides the information of extra-group sales, needed to allocate foreign tax deficits
        oecd = self.oecd.copy()

        # We simply convert the name of the taxing country to the corresponding alpha-3 code
        taxing_country = country
        try:
            taxing_country_code = self.oecd[
                self.oecd['Parent jurisdiction (whitespaces cleaned)'] == taxing_country
            ]['Parent jurisdiction (alpha-3 code)'].iloc[0]
        except:
            taxing_country_code = self.twz[
                self.twz['Country'] == taxing_country
            ]['Alpha-3 country code'].iloc[0]

        # This list will store the allocation ratios (for each headquarter country, the share of its tax deficit that
        # can be collected by the taxing country) computed based on the location of extra-group sales
        attribution_ratios = []

        # We iterate over parent countries in the OECD data
        for country_code in tax_deficits['Parent jurisdiction (alpha-3 code)'].values:

            # The taxing country collects 100% of the tax deficit of its own multinationals
            if country_code == taxing_country_code:
                attribution_ratios.append(1)

            # If the parent country is not the taxing country
            else:
                # We restrict the DataFrame to the CbCR of the considered parent country
                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country_code].copy()

                # If the taxing country is not part of its partner jurisdictions, the attribution ratio is of 0
                if taxing_country_code not in oecd_restricted['Partner jurisdiction (alpha-3 code)'].values:
                    attribution_ratios.append(0)

                else:
                    # We fetch extra-group sales registered in the taxing country
                    mask = (oecd_restricted['Partner jurisdiction (alpha-3 code)'] == taxing_country_code)
                    sales_in_taxing_country = oecd_restricted[mask]['Unrelated Party Revenues'].iloc[0]

                    # We compute total extra-group sales
                    total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                    # We append the resulting ratio to the list of attribution ratios
                    attribution_ratios.append(sales_in_taxing_country / total_sales)

        # We add this list to the DataFrame as a new column
        tax_deficits['Attribution ratios'] = attribution_ratios

        # We deduce, for each headquarter country, the tax deficit that could be collected by the taxing country
        tax_deficits[f'Collectible tax deficit for {taxing_country}'] = \
            tax_deficits['tax_deficit'] * tax_deficits['Attribution ratios']

        # We eliminate irrelevant columns
        tax_deficits.drop(
            columns=[
                'Attribution ratios',
                'tax_deficit',
                'Parent jurisdiction (alpha-3 code)'
            ],
            inplace=True
        )

        # We filter out rows for which the collectible tax deficit is 0
        tax_deficits = tax_deficits[tax_deficits[f'Collectible tax deficit for {taxing_country}'] > 0].copy()

        # We sort values based on the resulting tax deficit, in descending order
        tax_deficits.sort_values(
            by=f'Collectible tax deficit for {taxing_country}',
            ascending=False,
            inplace=True
        )

        # Because the OECD data only gather 26 headquarter countries, we need to make an assumption on the tax deficit
        # that could be collected from other parent countries, excluded from the 2016 version of the data

        # We therefore double the tax deficit collected from non-US foreign countries
        imputation = tax_deficits[
            ~tax_deficits['Parent jurisdiction (whitespaces cleaned)'].isin([taxing_country, 'United States'])
        ][f'Collectible tax deficit for {taxing_country}'].sum()

        # Except for Germany, for which we add back only half of the tax deficit collected from non-US foreign countries
        if taxing_country_code == 'DEU':
            imputation /= 2

        tax_deficits.reset_index(drop=True, inplace=True)

        # Again the same inelegant way of adding "Total" fields at the end of the DataFrame
        dict_df = tax_deficits.to_dict()

        dict_df[tax_deficits.columns[0]][len(tax_deficits)] = 'Others (imputation)'
        dict_df[tax_deficits.columns[1]][len(tax_deficits)] = imputation

        dict_df[tax_deficits.columns[0]][len(tax_deficits) + 1] = 'Total'
        dict_df[tax_deficits.columns[1]][len(tax_deficits) + 1] = (
            tax_deficits[tax_deficits.columns[1]].sum() + imputation
        )

        df = pd.DataFrame.from_dict(dict_df)

        return df.copy()

    def check_unilateral_scenario_gain_computations(self, minimum_ETR=0.25):
        """
        Taking as input the selected minimum effective tax rate and relying on the compute_unilateral_scenario_gain,
        this method outputs a DataFrame that can be compared with the Table 3 of the report. For each country that is
        part of the EU-27 and/or included in the 2016 aggregated and anonymized CbCR data of the OECD, it shows the to-
        tal corporate tax revenue gain that could be drawn from the unilateral implementation of the tax deficit col-
        lection. It also provides a breakdown of this total between the tax deficit of the country's own multinationals,
        the amount that could be collected from US multinationals and revenues that could be collected from non-US ones.
        """

        # We build the list of countries that we want to include in the output table
        country_list = self.get_total_tax_deficits()

        country_list = country_list[
            ~country_list['Parent jurisdiction (whitespaces cleaned)'].isin(['Total - EU27', 'Total - Whole sample'])
        ].copy()

        country_list = list(country_list['Parent jurisdiction (whitespaces cleaned)'].values)

        # We prepare the structure of the output first as a dictionary
        output = {
            'Country': country_list,
            'Own tax deficit': [],
            'Collection of US tax deficit': [],
            'Collection of non-US tax deficit': [],
            'Imputation': [],
            'Total': []
        }

        # We iterate over the list of relevant countries
        for country in country_list:

            # Using the method defined above, we output the table presenting the tax deficit that could be collected
            # from a unilateral implementation of the tax deficit collection by the considered country and its origin
            df = self.compute_unilateral_scenario_gain(
                country=country,
                minimum_ETR=minimum_ETR
            )

            column_name = f'Collectible tax deficit for {country}'

            if country in df['Parent jurisdiction (whitespaces cleaned)'].unique():
                # We fetch the tax deficit that could be collected from the country's own multinationals
                output['Own tax deficit'].append(
                    df[df['Parent jurisdiction (whitespaces cleaned)'] == country][column_name].iloc[0]
                )

            else:
                output['Own tax deficit'].append(0)

            # We fetch the tax deficit that could be collected from US multinationals
            if 'United States' in df['Parent jurisdiction (whitespaces cleaned)'].values:
                output['Collection of US tax deficit'].append(
                    df[df['Parent jurisdiction (whitespaces cleaned)'] == 'United States'][column_name].iloc[0]
                )
            else:
                output['Collection of US tax deficit'].append(0)

            # We fetch the tax deficit that was imputed following our methodology
            output['Imputation'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Others (imputation)'][column_name].iloc[0]
            )

            # We fetch the total tax deficit
            output['Total'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Total'][column_name].iloc[0]
            )

            # And finally, we sum the tax deficits collected from foreign non-US multinationals
            output['Collection of non-US tax deficit'].append(
                df[
                    ~df['Parent jurisdiction (whitespaces cleaned)'].isin(
                        [
                            country, 'United States', 'Total', 'Others (imputation)'
                        ]
                    )
                ][column_name].sum()
            )

        # We convert the dictionary into a DataFrame
        df = pd.DataFrame.from_dict(output)

        # We sum the imputation and the tax deficit collected from foreign, non-US multinationals to obtain the uprated
        # figures that correspond to the "Other foreign firms" column of Table 3 in the report
        df['Collection of non-US tax deficit (uprated with imputation)'] = \
            df['Imputation'] + df['Collection of non-US tax deficit']

        # We convert the results from 2021 EUR into 2021 billion EUR
        for column_name in df.columns[1:]:
            df[column_name] /= 10**9

        return df.copy()

    def output_unilateral_scenario_gain_formatted(self, country, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
        table presented on the "Unilateral implementation scenario" page. It takes as input the selected minimum ETR and
        the name of the country assumed to unilaterally implement the tax deficit collection. Then, it widely relies on
        the compute_unilateral_scenario_gain method defined above and mostly consists in a series of formatting steps to
        make the table more readable and understandable.
        """

        # We compute the gains from the unilateral implementation of the tax deficit collection for the taxing country
        df = self.compute_unilateral_scenario_gain(
            country=country,
            minimum_ETR=minimum_ETR
        )

        # We convert the numeric outputs into 2021 million EUR
        df[f'Collectible tax deficit for {country}'] = df[f'Collectible tax deficit for {country}'] / 10**6

        # We reformat figures with two decimals and a thousand separator
        df[f'Collectible tax deficit for {country}'] = \
            df[f'Collectible tax deficit for {country}'].map('{:,.2f}'.format)

        # We rename columns in accordance
        df.rename(
            columns={
                f'Collectible tax deficit for {country}': f'Collectible tax deficit for {country} (€m)',
                'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country'
            },
            inplace=True
        )

        return df.copy()

    def compute_intermediary_scenario_gain(self, minimum_ETR=0.25):
        """
        This method encapsulates the computations used to estimate the corporate tax revenue gains of EU countries,
        should the European Union implement the tax deficit collection as a block. This corresponds therefore to the
        partial cooperation scenario described in the report.

        Taking as input the selected minimum effective tax rate, this method outputs a DataFrame that presents for each
        in-sample EU-27 country:

        - the corporate tax revenue gains that could be collected from its own multinationals ("tax_deficit" column);
        - the tax deficit that could be collected from foreign, non-EU multinationals ("From foreign MNEs" column);
        - and the resulting total corporate tax revenue gain.

        All figures are output in 2021 million EUR.

        The three lines at the end of the DataFrame are a bit specific. Some OECD-reporting contries do not provide a
        perfectly detailed country-by-country report and for these, the "Other Europe" and "Europe" fields are assumed
        to be related to EU countries and are included in the total collectible tax deficit. The final line presents
        this total.

        The methogology behind these computations is described in much more details in Appendix C of the report.
        """

        # We start by computing the total tax deficits of all in-sample countries (those of the multilateral scenario)
        tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        oecd = self.oecd.copy()

        # We extract the total tax deficit for the EU-27
        eu_27_tax_deficit = tax_deficits[
            tax_deficits['Parent jurisdiction (whitespaces cleaned)'] == 'Total - EU27'
        ]['tax_deficit'].iloc[0]

        # And we store in a separate DataFrame the tax deficits of EU-27 countries
        eu_27_tax_deficits = tax_deficits[
            tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
                eu_27_country_codes
            )
        ].copy()

        # We focus only on a few non-EU countries, defined when the TaxDeficitCalculator object is instantiated
        tax_deficits = tax_deficits[
            tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
                self.country_list_intermediary_scenario
            )
        ].copy()

        # We store the results in a dictionary, which we will map upon the eu_27_tax_deficits DataFrame
        additional_revenue_gains = {}

        # We iterate over EU-27 countries and compute for eacht he tax deficit collected from non-EU multinationals
        for eu_country in eu_27_country_codes:

            td_df = tax_deficits.copy()

            # This dictionary will store the attribution ratios based on extra-group sales to be mapped upon td_df
            attribution_ratios = {}

            # We iterate over non-EU countries in our list
            for country in self.country_list_intermediary_scenario:

                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

                # We fetch the extra-group sales registered by the non-EU country's multinationals in the EU-27 country
                # (defaults to 0 if the EU-27 country is not among the partners of the non-EU country)
                sales_in_eu_country = oecd_restricted[
                    oecd_restricted['Partner jurisdiction (alpha-3 code)'] == eu_country
                ]['Unrelated Party Revenues'].sum()

                # We compute the total extra-group sales registered by the non-EU country's multinationals worldwide
                total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                # We deduce the share of the non-EU country's tax deficit attributable to the EU-27 country
                attribution_ratios[country] = sales_in_eu_country / total_sales

            # We map the attribution_ratios dictionary upon the td_df DataFrame
            td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

            # We deduce, for each non-EU country, the amount of its tax deficit that is collected by the EU-27 country
            td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

            # We sum all these and multiply the total by 2 to estimate the total tax deficit that the EU-27 country
            # could collect from non-EU multinationals
            additional_revenue_gains[eu_country] = td_df['Collectible tax deficit'].sum() * 2

            # NB: the multiplication by 2 corresponds to the imputation strategy defined in Appendix C of the report

        # We map the resulting dictionary upon the eu_27_tax_deficits DataFrame
        eu_27_tax_deficits['From foreign MNEs'] = eu_27_tax_deficits['Parent jurisdiction (alpha-3 code)'].map(
            additional_revenue_gains
        )

        # And deduce total corporate tax revenue gains from such a scenario for all EU-27 countries
        eu_27_tax_deficits['Total'] = (
            eu_27_tax_deficits['tax_deficit'] + eu_27_tax_deficits['From foreign MNEs']
        )

        # We operate a similar process for "Europe" and "Other Europe" field
        additional_revenue_gains = {}

        for aggregate in ['Europe', 'Other Europe']:

            td_df = tax_deficits.copy()

            attribution_ratios = {}

            for country in self.country_list_intermediary_scenario:

                # We do not consider the "Other Europe" field in the US CbCR as it probably does not correspond to
                # activities operated in EU-27 countries (sufficient country-by-country breakdown to exclude this)
                if country == 'USA':
                    attribution_ratios[country] = 0

                    continue

                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

                sales_in_europe_or_other_europe = oecd_restricted[
                    oecd_restricted['Partner jurisdiction (whitespaces cleaned)'] == aggregate
                ]['Unrelated Party Revenues'].sum()

                total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                attribution_ratios[country] = sales_in_europe_or_other_europe / total_sales

            td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

            td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

            additional_revenue_gains[aggregate] = td_df['Collectible tax deficit'].sum()

        # We drop unnecessary columns
        eu_27_tax_deficits.drop(
            columns=['Parent jurisdiction (alpha-3 code)'],
            inplace=True
        )

        # And we operate very inelegant transformations of the DataFrame to add the "Other Europe", "Europe" and "Total"
        # fields at the bottom of the DataFrame
        eu_27_tax_deficits.reset_index(drop=True, inplace=True)

        dict_df = eu_27_tax_deficits.to_dict()

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits)] = 'Other Europe'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits)] = 0
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 1] = 'Europe'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 1] = 0
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']

        # Here we compute total corporate tax revenue gains for EU-27 countries
        # NB: We have not multiplied the "Other Europe" and "Europe" fields by 2 (no imputation for these)
        total_additional_revenue_gain = eu_27_tax_deficits['From foreign MNEs'].sum() \
            + additional_revenue_gains['Europe'] \
            + additional_revenue_gains['Other Europe']

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 2] = 'Total'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 2] = eu_27_tax_deficit
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 2] = total_additional_revenue_gain
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 2] = \
            eu_27_tax_deficit + total_additional_revenue_gain

        eu_27_tax_deficits = pd.DataFrame.from_dict(dict_df)

        # We convert 2021 EUR figures into 2021 billion EUR
        for column_name in eu_27_tax_deficits.columns[1:]:
            eu_27_tax_deficits[column_name] /= 10**6

        return eu_27_tax_deficits.copy()

    def output_intermediary_scenario_gain_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
        table presented on the "Partial cooperation scenario" page. It takes as input the selected minimum ETR and then,
        widely relies on the compute_intermediary_scenario_gain method defined above. It mostly consists in a series of
        formatting steps to make the table more readable and understandable.
        """

        # We compute corporate tax revenue gains from the partial cooperation scenario
        df = self.compute_intermediary_scenario_gain(minimum_ETR=minimum_ETR)

        # We eliminate irrelevant columns
        df.drop(columns=['tax_deficit', 'From foreign MNEs'], inplace=True)

        # We reformat figures with a thousand separator and a 0-decimal rounding
        df['Total'] = df['Total'].map('{:,.0f}'.format)

        # We rename columns to make them more explicit
        df.rename(
            columns={
                'Parent jurisdiction (whitespaces cleaned)': 'Taxing country',
                'Total': 'Collectible tax deficit (€m)'
            },
            inplace=True
        )

        # We add quotation marks to the "Europe" and "Other Europe" fields
        df['Taxing country'] = df['Taxing country'].map(
            lambda x: x if x not in ['Europe', 'Other Europe'] else f'"{x}"'
        )

        return df.copy()

    def assess_carve_out_impact(self, minimum_ETR=0.25):
        """
        This function takes as input a minimum effective tax rate (which defaults to 25%) and outputs a DataFrame
        showing, for each in-sample country (EU and/or CbCR-reporting countries):

        - the tax deficit that it could collect by imposing this minimum ETR on the profits of its multinationals;
        - the split between domestic, tax haven and non-haven tax deficits;
        - and the same amounts with carve-outs being applied.

        Carve-outs are applied with the parameters (carve-out rate, use of the full value of tangible assets or of de-
        preciation expenses only and exclusion of inventories or not) that are defined when instantiating the TaxDefi-
        citCalculator object.
        """

        # If carve-out parameters have not been indicated, we cannot run the computations
        if self.carve_out_rate is None or self.depreciation_only is None or self.exclude_inventories is None:
            raise Exception(
                'If you want to simulate substance-based carve-outs, you need to indicate all the parameters.'
            )

        # We instantiate a TaxDeficitCalculator object with carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True,
            carve_out_rate=self.carve_out_rate,
            depreciation_only=self.depreciation_only,
            exclude_inventories=self.exclude_inventories
        )

        # We load the data
        calculator.load_clean_data()

        # And deduce total tax deficits and their split, with carve-outs being applied
        carve_outs = calculator.compute_all_tax_deficits(
            CbCR_reporting_countries_only=False,
            minimum_ETR=minimum_ETR
        )

        # We instantiate a TaxDeficitCalculator object without carve-outs
        calculator_no_carve_out = TaxDeficitCalculator()

        # We load the data
        calculator_no_carve_out.load_clean_data()

        # And deduce total tax deficits and their split, without any carve-out being applied
        no_carve_outs = calculator_no_carve_out.compute_all_tax_deficits(
            CbCR_reporting_countries_only=False,
            minimum_ETR=minimum_ETR
        )

        # We merge the two DataFrames
        carve_outs_impact = carve_outs.merge(
            no_carve_outs,
            how='inner',
            on=[
                'Parent jurisdiction (whitespaces cleaned)',
                'Parent jurisdiction (alpha-3 code)'
            ]
        ).rename(
            columns={
                'tax_deficit_x': 'TD_with_carve_outs',
                'tax_deficit_y': 'TD_no_carve_outs',
                'tax_deficit_x_domestic_x': 'domestic_TD_with_carve_outs',
                'tax_deficit_x_domestic_y': 'domestic_TD_no_carve_outs',
                'tax_deficit_x_non_haven_x': 'non_haven_TD_with_carve_outs',
                'tax_deficit_x_non_haven_y': 'non_haven_TD_no_carve_outs',
                'tax_deficit_x_tax_haven_x': 'tax_haven_TD_with_carve_outs',
                'tax_deficit_x_tax_haven_y': 'tax_haven_TD_no_carve_outs'
            }
        )

        # We only show EU and/or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in this boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        carve_outs_impact['IS_EU'] = mask_eu * 1
        carve_outs_impact['REPORTS_CbCR'] = mask_cbcr * 1

        # And restrict the DataFrame to relevant countries
        restricted_df = carve_outs_impact[mask].copy()

        # We finalise the formatting of the table
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        columns = [
            'Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)',
            'TD_with_carve_outs', 'TD_no_carve_outs', 'domestic_TD_with_carve_outs', 'domestic_TD_no_carve_outs',
            'non_haven_TD_with_carve_outs', 'non_haven_TD_no_carve_outs', 'tax_haven_TD_with_carve_outs',
            'tax_haven_TD_no_carve_outs', 'IS_EU', 'REPORTS_CbCR'
        ]

        return restricted_df[columns].copy()

    def assess_carve_out_impact_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
        table on the "Substance-based carve-outs" page. It takes as input the selected minimum ETR and widely relies on
        the assess_carve_out_impact method defined above. It mostly consists in a series of formatting steps.
        """
        df = self.assess_carve_out_impact(minimum_ETR=minimum_ETR)

        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        mask_eu = df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)

        df = df[['Parent jurisdiction (whitespaces cleaned)', 'TD_no_carve_outs', 'TD_with_carve_outs']].copy()

        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df) + 1] = 'Total - EU27'
        dict_df[df.columns[1]][len(df) + 1] = df[mask_eu]['TD_no_carve_outs'].sum()
        dict_df[df.columns[2]][len(df) + 1] = df[mask_eu]['TD_with_carve_outs'].sum()

        dict_df[df.columns[0]][len(df) + 2] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 2] = df['TD_no_carve_outs'].sum()
        dict_df[df.columns[2]][len(df) + 2] = df['TD_with_carve_outs'].sum()

        df = pd.DataFrame.from_dict(dict_df)

        df['Change in % of revenue gains without carve-outs'] = (
            (df['TD_with_carve_outs'] - df['TD_no_carve_outs']) / df['TD_no_carve_outs']
        ) * 100

        df.rename(
            columns={
                'TD_no_carve_outs': 'Collectible tax deficit without carve-outs (€m)',
                'TD_with_carve_outs': 'Collectible tax deficit with carve-outs (€m)'
            },
            inplace=True
        )

        for column_name in df.columns[1:-1]:
            df[column_name] /= 10**6
            df[column_name] = df[column_name].map('{:,.0f}'.format)

        df[df.columns[-1]] = df[df.columns[-1]].map('{:.1f}'.format)

        return df.copy()

    def get_carve_outs_table(
        self,
        TWZ_countries_methodology,
        depreciation_only, exclude_inventories,
        carve_out_rate=0.05
    ):
        """
        This function takes as input:

        - the methodology to use to estimate the post-carve-out revenue gains of TWZ countries;

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

        - the carve-out rate to use (which defaults to 5%).

        It returns a DataFrame that shows, for the 15% and 25% minimum rates and for each in-sample country, the estima-
        ted revenue gains from a global minimum tax without and with carve-outs being applied.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # The "TWZ_countries_methodology" argument can only take a few string values
        if TWZ_countries_methodology not in ['initial', 'new']:
            raise Exception('The "TWZ_countries_methodology" argument only accepts two values: "initial" or "new".')

        # Computing tax deficits without substance-based carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2, :]
        td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2, :]

        # We merge the resulting DataFrames for the 15% and 25% minimum rates
        merged_df = td_25.merge(
            td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit_x': 'tax_deficit_25_no_carve_out',
                'tax_deficit_y': 'tax_deficit_15_no_carve_out'
            },
            inplace=True
        )

        # Computing corresponding tax deficits with substance-based carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True, carve_out_rate=carve_out_rate,
            depreciation_only=depreciation_only, exclude_inventories=exclude_inventories
        )

        calculator.load_clean_data()

        td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2]
        td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2]

        # We merge the DataFrame obtained for the 25% minimum rate
        merged_df = merged_df.merge(
            td_25[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_25_with_carve_out'
            },
            inplace=True
        )

        # We merge the DataFrame obtained for the 15% minimum rate
        merged_df = merged_df.merge(
            td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_15_with_carve_out'
            },
            inplace=True
        )

        # We only show EU and/or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in this boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        merged_df['IS_EU'] = mask_eu * 1
        merged_df['REPORTS_CbCR'] = mask_cbcr * 1

        # And we restrict the DataFrame to relevant countries
        restricted_df = merged_df[mask].copy()

        # We finalise the reformatting of the DataFrame
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        if TWZ_countries_methodology == 'initial':
            # If we have opted for the "initial" methodology for TWZ countries, we can simply return the DataFrame as is
            return restricted_df.copy()

        else:
            # If we have chosen the "new" methodology, we have a bit more work!

            # We create a temporary copy of the DataFrame, restricted to CbCR-reporting countries (excluding Sweden)
            temp = restricted_df[restricted_df['REPORTS_CbCR'] == 1].copy()
            temp = temp[temp['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

            # We deduce the average reduction factors to apply to the collectible tax deficits of TWZ countries
            self.imputation_15 = temp['tax_deficit_15_with_carve_out'].sum() / temp['tax_deficit_15_no_carve_out'].sum()
            self.imputation_25 = temp['tax_deficit_25_with_carve_out'].sum() / temp['tax_deficit_25_no_carve_out'].sum()

            # We apply the two downgrade factors to tax deficits without carve-outs
            restricted_df['tax_deficit_15_with_carve_out'] = restricted_df.apply(
                (
                    lambda row: row['tax_deficit_15_no_carve_out'] * self.imputation_15 if row['REPORTS_CbCR'] == 0 or
                    row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_15_with_carve_out']
                ),
                axis=1
            )

            restricted_df['tax_deficit_25_with_carve_out'] = restricted_df.apply(
                (
                    lambda row: row['tax_deficit_25_no_carve_out'] * self.imputation_25 if row['REPORTS_CbCR'] == 0 or
                    row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_25_with_carve_out']
                ),
                axis=1
            )

            # And we return the adjusted DataFrame
            return restricted_df.copy()

    def get_carve_outs_table_2(
        self,
        exclude_inventories, depreciation_only,
        carve_out_rate=0.05,
        output_Excel=False
    ):
        """
        This function takes as input:

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

        - the carve-out rate to use (which defaults to 5%).

        It returns a DataFrame that shows, for the different minimum effective tax rates and for each in-sample country,
        the estimated impact of substance-based carve-outs. The change is expressed as a percentage of revenue gain es-
        timates without substance-based carve-outs.
        """

        # The "get_carve_outs_table" method provides the required information for two minimum ETRs, 15% and 25%
        # This will serve as a central DataFrame to which we will add the 21% and 30% columns
        df = self.get_carve_outs_table(
            TWZ_countries_methodology='initial',
            exclude_inventories=exclude_inventories, depreciation_only=depreciation_only,
            carve_out_rate=carve_out_rate
        )

        # Computing tax deficits without substance-based carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2, :]
        td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2, :]

        # We add the 21% tax deficit to the central DataFrame
        merged_df = df.merge(
            td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        # We add the 30% tax deficit to the central DataFrame
        merged_df = merged_df.merge(
            td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit_x': 'tax_deficit_21_no_carve_out',
                'tax_deficit_y': 'tax_deficit_30_no_carve_out'
            },
            inplace=True
        )

        # Computing corresponding tax deficits with substance-based carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True,
            carve_out_rate=carve_out_rate,
            depreciation_only=depreciation_only,
            exclude_inventories=exclude_inventories
        )

        calculator.load_clean_data()

        td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2]
        td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2]

        # We add the 21% tax deficit with carve-outs to the central DataFrame
        merged_df = merged_df.merge(
            td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_21_with_carve_out'
            },
            inplace=True
        )

        # We add the 30% tax deficit with carve-outs to the central DataFrame
        merged_df = merged_df.merge(
            td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_30_with_carve_out'
            },
            inplace=True
        )

        # We have the tax deficit absolute amounts with and without carve-outs at 15%, 21%, 25% and 30% minimum rates
        # But we want to display the changes due to carve-outs, as a % of the no-carve-out tax deficit

        # We store the names of the 4 columns that we are going to add to the central DataFrame
        new_columns = []

        # We iterate over the 4 minimum rates
        for minimum_rate in [15, 21, 25, 30]:
            column_name_no_carve_out = f'tax_deficit_{minimum_rate}_no_carve_out'
            column_name_with_carve_out = f'tax_deficit_{minimum_rate}_with_carve_out'

            # We are going to add a new column that provides the % reduction due to carve-outs at the rate considered
            new_column_name = f'reduction_at_{minimum_rate}_minimum_rate'

            # We make the corresponding computation
            merged_df[new_column_name] = (
                (merged_df[column_name_with_carve_out] - merged_df[column_name_no_carve_out]) /
                merged_df[column_name_no_carve_out]
            ) * 100

            new_columns.append(new_column_name)

        if output_Excel:
            with pd.ExcelWriter('/Users/Paul-Emmanuel/Desktop/carve_outs_table_2.xlsx', engine='xlsxwriter') as writer:
                merged_df.to_excel(writer, sheet_name='table_2', index=False)

        # We output the resulting DataFrame with country codes and names, as well as the 4 columns of interest
        merged_df = merged_df[
            ['Parent jurisdiction (alpha-3 code)', 'Parent jurisdiction (whitespaces cleaned)'] + new_columns
        ].copy()

        return merged_df.copy()

    def get_carve_outs_rate_table(
        self,
        minimum_ETR,
        depreciation_only, exclude_inventories,
    ):
        """
        This function takes as inputs:

        - the minimum effective tax rate to apply to multinationals' profits;

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not.

        It returns a DataFrame that shows, for each in-sample country, the estimated revenues that could be collected
        from a global minimum tax without any carve-outs and with carve-outs of 5%, 7.5% and 10% of tangible assets and
        payroll combined.
        """

        # We instantiate a TaxDeficitCalculator object without carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        # We use it to compute revenue gains without any carve-out
        td_no_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

        td_no_carve_out.rename(
            columns={
                'tax_deficit': 'tax_deficit_no_carve_out'
            },
            inplace=True
        )

        # A copy of the resulting DataFrame will be used as a central table to which we add the relevant columns
        merged_df = td_no_carve_out.copy()

        # We iterate over carve-out rates
        for carve_out_rate in [5, 7.5, 10]:
            actual_rate = carve_out_rate / 100

            # We instantiate a TaxDeficitCalculator object with carve-outs at the rate considered
            calculator = TaxDeficitCalculator(
                carve_outs=True, carve_out_rate=actual_rate,
                depreciation_only=False, exclude_inventories=exclude_inventories
            )
            calculator.load_clean_data()

            # We use it to compute revenue gains with substance-based carve-outs being applied
            td_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

            # We add the tax deficits thereby computed to the central table
            merged_df = merged_df.merge(
                td_carve_out[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
                how='left',
                on='Parent jurisdiction (alpha-3 code)'
            )

            merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

            merged_df.rename(
                columns={
                    'tax_deficit': f'tax_deficit_{carve_out_rate}_carve_out'
                },
                inplace=True
            )

        # We only display EU or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in the following boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        merged_df['IS_EU'] = mask_eu * 1
        merged_df['REPORTS_CbCR'] = mask_cbcr * 1

        # And we restrict the DataFrame to relevant countries
        restricted_df = merged_df[mask].copy()

        # We finalise the formatting of the table
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        # And eventually return the DataFrame
        return restricted_df.copy()

Classes

class TaxDeficitCalculator (alternative_imputation=True, carve_outs=False, carve_out_rate=None, depreciation_only=None, exclude_inventories=None, payroll_premium=20)

This is the instantiation method for the TaxDeficitCalculator class.

All its arguments have a default value. The two main ones are as follows:

  • the boolean "alternative_imputation" (set to True by default), determines whether the imputation of the non- haven tax deficit of non-OECD reporting countries at minimum rates of 20% or below is operated. For more details on this methodological choice, one can refer to Appendix A of the report;

  • the boolean "carve_outs" (False by default) indicates whether to simulate substance-based carve-outs.

If the latter argument is set to True, additional arguments are required:

  • the "carve_out_rate" (float between 0 and 1) determines what share of tangible assets and payroll should be deduced from the pre-tax profits of multinationals;

  • the boolean "depreciation_only" indicates whether to only account for depreciation expenses (instead of the full value of tangible assets) in the tangible assets component of the carve-outs. Following the methodology of the OECD Secretariat in its Economic Impact Assessment of Oct. 2020, is this argument is set to True, we appro- ximate depreciation expenses as 10% of the book value of tangible assets;

  • the boolean "exclude_inventories" indicates whether to downgrade the tangible assets values provided by the OECD's aggregated and anonymized country-by-country data. As a simplification of the OECD's methodology (Oct. 2020), if the argument is set to True, we reduce all tangible assets by 24%;

  • "payroll_premium" (float between 0 and 100 (considered as a %)) determines what upgrade to apply to the pay- roll proxy. Indeed, the latter is based on ILO's data about per-country mean annual earnings. Considering that the employees of large multinationals generally earn above-average wages, we propose to apply a premium to our payroll proxy.

The instantiation function is mainly used to define several object attributes that generally correspond to as- sumptions taken in the report.

Expand source code
class TaxDeficitCalculator:

    def __init__(
        self,
        alternative_imputation=True,
        carve_outs=False,
        carve_out_rate=None,
        depreciation_only=None, exclude_inventories=None, payroll_premium=20
    ):
        """
        This is the instantiation method for the TaxDeficitCalculator class.

        All its arguments have a default value. The two main ones are as follows:

        - the boolean "alternative_imputation" (set to True by default), determines whether the imputation of the non-
        haven tax deficit of non-OECD reporting countries at minimum rates of 20% or below is operated. For more details
        on this methodological choice, one can refer to Appendix A of the report;

        - the boolean "carve_outs" (False by default) indicates whether to simulate substance-based carve-outs.

        If the latter argument is set to True, additional arguments are required:

        - the "carve_out_rate" (float between 0 and 1) determines what share of tangible assets and payroll should be
        deduced from the pre-tax profits of multinationals;

        - the boolean "depreciation_only" indicates whether to only account for depreciation expenses (instead of the
        full value of tangible assets) in the tangible assets component of the carve-outs. Following the methodology of
        the OECD Secretariat in its Economic Impact Assessment of Oct. 2020, is this argument is set to True, we appro-
        ximate depreciation expenses as 10% of the book value of tangible assets;

        - the boolean "exclude_inventories" indicates whether to downgrade the tangible assets values provided by the
        OECD's aggregated and anonymized country-by-country data. As a simplification of the OECD's methodology (Oct.
        2020), if the argument is set to True, we reduce all tangible assets by 24%;

        - "payroll_premium" (float between 0 and 100 (considered as a %)) determines what upgrade to apply to the pay-
        roll proxy. Indeed, the latter is based on ILO's data about per-country mean annual earnings. Considering that
        the employees of large multinationals generally earn above-average wages, we propose to apply a premium to our
        payroll proxy.

        The instantiation function is mainly used to define several object attributes that generally correspond to as-
        sumptions taken in the report.
        """

        # These attributes will store the data loaded with the "load_clean_data" method
        self.oecd = None
        self.twz = None
        self.twz_domestic = None
        self.twz_CIT = None
        self.mean_wages = None

        # For non-OECD reporting countries, data are taken from TWZ 2019 appendix tables
        # An effective tax rate of 20% is assumed to be applied on profits registered in non-havens
        self.assumed_non_haven_ETR_TWZ = 0.2

        # An effective tax rate of 10% is assumed to be applied on profits registered in tax havens
        self.assumed_haven_ETR_TWZ = 0.1

        # Average exchange rate over the year 2016, extracted from benchmark computations run on Stata
        # Source: European Central Bank
        self.USD_to_EUR_2016 = 1 / 1.1069031

        # self.multiplier_EU = 1.13381004333496
        # self.multiplier_world = 1.1330304145813

        # Gross growth rate of worldwide GDP in current EUR between 2016 and 2021
        # Extracted from benchmark computations run on Stata
        self.multiplier_2021 = 1.1330304145813

        # For rates of 0.2 or lower an alternative imputation is used to estimate the non-haven tax deficit of non-OECD
        # reporting countries; this argument allows to enable or disable this imputation
        self.alternative_imputation = alternative_imputation
        self.reference_rate_for_alternative_imputation = 0.25

        # The list of countries whose tax deficit is partly collected by EU countries in the intermediary scenario
        self.country_list_intermediary_scenario = [
            'USA',
            'AUS',
            'CAN',
            'CHL',
            'MEX',
            'NOR',
            'BMU',
            'BRA',
            'CHN',
            'IND',
            'SGP',
            'ZAF',
            'IDN',
            'JPN'
        ]

        # This boolean indicates whether or not to apply substance-based carve-outs
        self.carve_outs = carve_outs

        # In case we want to simulate substance-based carve-outs, a few additional steps are required
        if carve_outs:

            # We first check whether all the required parameters were provided
            if carve_out_rate is None or depreciation_only is None or exclude_inventories is None:

                raise Exception(
                    'If you want to simulate substance-based carve-outs, you need to indicate all the parameters.'
                )

            self.carve_out_rate = carve_out_rate
            self.depreciation_only = depreciation_only
            self.exclude_inventories = exclude_inventories
            self.payroll_premium = payroll_premium

            # This corresponds to the OECD Secretariat's simulations in its Economic Impact Assessment (Oct. 2020):
            # inventories are excluded from tangible assets and only depreciation expenses can be partly deducted
            if depreciation_only and exclude_inventories:
                self.assets_multiplier = 0.1 * (1 - 0.24)

            # Here, we only account for depreciation expenses but do not exclude inventories
            elif depreciation_only and not exclude_inventories:
                self.assets_multiplier = 0.1

            # In this case, we take the full value of tangible assets to form the tangible assets component of substan-
            # ce-based carve-outs, while excluding inventories
            elif not depreciation_only and exclude_inventories:
                self.assets_multiplier = (1 - 0.24)

            # Benchmark case, where we take the full value of tangible assets without adjusting for inventories
            else:
                self.assets_multiplier = 1

        else:
            self.carve_out_rate = None
            self.depreciation_only = None
            self.exclude_inventories = None

    def load_clean_data(
        self,
        path_to_oecd=path_to_oecd,
        path_to_twz=path_to_twz,
        path_to_twz_domestic=path_to_twz_domestic,
        path_to_twz_CIT=path_to_twz_CIT,
        path_to_preprocessed_mean_wages=path_to_preprocessed_mean_wages,
        path_to_statutory_rates=path_to_statutory_rates,
        inplace=True
    ):
        """
        This method allows to load and clean data from 6 different sources:

        - the "oecd.csv" file which was extracted from the OECD's aggregated and anonymized country-by-country repor-
        ting, considering only the positive profit sample. Figures are in 2016 USD;

        - the "twz.csv" file which was extracted from the Table C4 of the TWZ 2019 online appendix. It presents, for
        a number of countries, the amounts of profits shifted to tax havens that are re-allocated to them on an ultima-
        te ownership basis. Figures are in 2016 USD million;

        - the "twz_domestic.csv" file, taken from the outputs of benchmark computations run on Stata. It presents for
        each country the amount of corporate profits registered locally by domestic MNEs and the effective tax rate to
        which they are subject. Figures are in 2016 USD billion;

        - the "twz_CIT.csv" file, extracted from Table U1 of the TWZ 2019 online appendix. It presents the corporate in-
        come tax revenue of each country in 2016 USD billion;

        - the "preprocessed_mean_wages.csv" file, taken from the outputs of substance-based carve-outs run on Stata. For
        each partner jurisdiction in the OECD's country-by-country data, it provides either a measure or an approxima-
        tion of the local mean annual earnings in 2016 in current USD. It is built upon ILO data, more details being
        provided in the methodological section of the Note n°1 of the Observatory on substance-based carve-outs;

        - the "statutory_rates.csv" file that provides, for a number of partner jurisdictions, their 2016 statutory cor-
        porate income tax rates.

        Default paths are used to let the simulator run via the app.py file. If you wish to use the tax_deficit_calcula-
        tor package in another context, you can save the data locally and give the method paths to the data files. The
        possibility to load the files from an online host instead will soon be implemented.
        """
        try:

            # We try to read the files from the provided paths
            oecd = pd.read_csv(path_to_oecd)
            twz = pd.read_csv(path_to_twz, delimiter=';')
            twz_domestic = pd.read_csv(path_to_twz_domestic, delimiter=';')
            twz_CIT = pd.read_csv(path_to_twz_CIT, delimiter=';')
            preprocessed_mean_wages = pd.read_csv(path_to_preprocessed_mean_wages, delimiter=';')
            statutory_rates = pd.read_csv(path_to_statutory_rates, delimiter=';')

        except FileNotFoundError:

            # If at least one of the files is not found
            raise Exception('Are you sure these are the right paths for the source files?')

        # --- Cleaning the OECD data

        # We drop a few irrelevant columns from country-by-country data
        oecd.drop(
            columns=['PAN', 'Grouping', 'Flag Codes', 'Flags', 'YEA', 'Year'],
            inplace=True
        )

        # We reshape the DataFrame from a long to a wide dataset
        oecd = oecd.pivot(
            index=['COU', 'Ultimate Parent Jurisdiction', 'JUR', 'Partner Jurisdiction'],
            columns='Variable',
            values='Value'
        ).reset_index()

        # We rename some columns to match the code that has been written before modifying how OECD data are loaded
        oecd.rename(
            columns={
                'COU': 'Parent jurisdiction (alpha-3 code)',
                'Ultimate Parent Jurisdiction': 'Parent jurisdiction (whitespaces cleaned)',
                'JUR': 'Partner jurisdiction (alpha-3 code)',
                'Partner Jurisdiction': 'Partner jurisdiction (whitespaces cleaned)'
            },
            inplace=True
        )

        # Thanks to a function defined in utils.py, we rename the "Foreign Jurisdictions Total" field for all countries
        # that only report a domestic / foreign breakdown in their CbCR
        oecd['Partner jurisdiction (whitespaces cleaned)'] = oecd.apply(rename_partner_jurisdictions, axis=1)

        # We eliminate stateless entities and the "Foreign Jurisdictions Total" filds
        oecd = oecd[
            ~oecd['Partner jurisdiction (whitespaces cleaned)'].isin(['Foreign Jurisdictions Total', 'Stateless'])
        ].copy()

        # We replace missing "Income Tax Paid" values by the corresponding "Income Tax Accrued" values
        # (Some missing values remain even after this edit)
        oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
            (
                lambda row: row['Income Tax Paid (on Cash Basis)']
                if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
                else row['Income Tax Accrued - Current Year']
            ),
            axis=1
        )

        # We clean the statutory corporate income tax rate dataset
        statutory_rates['statrate'] = statutory_rates['statrate'].map(
            lambda x: x.replace(',', '.') if isinstance(x, str) else x
        ).astype(float)

        # And we merge it with country-by-country data, on partner jurisdiction alpha-3 codes
        oecd = oecd.merge(
            statutory_rates,
            how='left',
            left_on='Partner jurisdiction (alpha-3 code)', right_on='partner'
        )

        oecd.drop(columns=['partner'], inplace=True)

        # We impute missing "Income Tax Paid" values assuming that pre-tax profits are taxed at the local statutory rate
        oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
            (
                lambda row: row['Income Tax Paid (on Cash Basis)']
                if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
                else row['Profit (Loss) before Income Tax'] * row['statrate']
            ),
            axis=1
        )

        oecd.drop(columns=['statrate'], inplace=True)

        # ETR computation (using tax paid as the numerator)
        oecd['ETR'] = oecd['Income Tax Paid (on Cash Basis)'] / oecd['Profit (Loss) before Income Tax']
        oecd['ETR'] = oecd['ETR'].map(lambda x: 0 if x < 0 else x)

        # Adding an indicator variable for domestic profits (rows with the same parent and partner jurisdiction)
        oecd['Is domestic?'] = oecd.apply(
            lambda row: row['Parent jurisdiction (alpha-3 code)'] == row['Partner jurisdiction (alpha-3 code)'],
            axis=1
        ) * 1

        # We add an indicator variable that takes value 1 if and only if the partner is a tax haven
        oecd['Is partner jurisdiction a tax haven?'] = oecd['Partner jurisdiction (alpha-3 code)'].isin(
            tax_haven_country_codes
        ) * 1

        # Adding another indicator variable that takes value 1 if and only if the partner is not a tax haven
        oecd['Is partner jurisdiction a non-haven?'] = 1 - oecd['Is partner jurisdiction a tax haven?']

        # This indicator variable is used specifically for the simulation of carve-outs; it takes value 1 if and only if
        # the partner jurisdiction is not the parent jurisdiction, not a tax haven and not a regional aggregate
        oecd['Is partner jurisdiction a non-haven? - CO'] = oecd.apply(
            (
                lambda row: 0
                if (
                    row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_MINIMUM_REPORTING
                    and row['Partner jurisdiction (alpha-3 code)'] == 'FJT'
                ) or (
                    row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_CONTINENTAL_REPORTING
                    and row['Partner jurisdiction (alpha-3 code)'] in ['GRPS', 'AFRIC', 'AMER', 'ASIAT', 'EUROP']
                ) or (
                    row['Is domestic?'] == 1
                )
                else row['Is partner jurisdiction a non-haven?']
            ),
            axis=1
        )

        # This indicator variable, used specifically for the simulation of carve-outs, takes value 1 if and only if the
        # partner is a regional aggregate
        oecd['Is partner jurisdiction an aggregate partner? - CO'] = np.logical_and(
            oecd['Is domestic?'] == 0,
            np.logical_and(
                oecd['Is partner jurisdiction a non-haven? - CO'] == 0,
                oecd['Is partner jurisdiction a tax haven?'] == 0
            )
        ) * 1

        # Thanks to a small function imported from utils.py, we manage the slightly problematic overlap between the
        # various indicator variables ("Is domestic?" sort of gets the priority over the others)
        oecd['Is partner jurisdiction a tax haven?'] = oecd.apply(
            lambda row: manage_overlap_with_domestic(row, 'haven'),
            axis=1
        )

        oecd['Is partner jurisdiction a non-haven?'] = oecd.apply(
            lambda row: manage_overlap_with_domestic(row, 'non-haven'),
            axis=1
        )

        # We need some more work on the data if we want to simulate substance-based carve-outs
        if self.carve_outs:

            # We merge earnings data with country-by-country data on partner jurisdiction codes
            oecd = oecd.merge(
                preprocessed_mean_wages[['partner2', 'earn']],
                how='left',
                left_on='Partner jurisdiction (alpha-3 code)', right_on='partner2'
            )

            oecd.drop(columns=['partner2'], inplace=True)

            oecd.rename(
                columns={
                    'earn': 'ANNUAL_VALUE'
                },
                inplace=True
            )

            # We clean the mean annual earnings column
            oecd['ANNUAL_VALUE'] = oecd['ANNUAL_VALUE'].map(
                lambda x: x.replace(',', '.') if isinstance(x, str) else x
            ).astype(float)

            # We deduce the payroll proxy from the number of employees and from mean annual earnings
            oecd['PAYROLL'] = oecd['Number of Employees'] * oecd['ANNUAL_VALUE'] * (1 + self.payroll_premium / 100)

            # We compute substance-based carve-outs from both payroll and tangible assets
            oecd['CARVE_OUT'] = self.carve_out_rate * (
                oecd['PAYROLL'] + oecd['Tangible Assets other than Cash and Cash Equivalents'] * self.assets_multiplier
            )

            # This column will contain slightly modified carve-outs, carve-outs being replaced by pre-tax profits
            # wherever the former exceeds the latter
            oecd['CARVE_OUT_TEMP'] = oecd.apply(
                (
                    lambda row: row['CARVE_OUT'] if row['Profit (Loss) before Income Tax'] > row['CARVE_OUT']
                    or np.isnan(row['CARVE_OUT'])
                    else row['Profit (Loss) before Income Tax']
                ),
                axis=1
            )

            # We exclude rows with missing carve-out values in a temporary DataFrame
            oecd_temp = oecd[
                ~np.logical_or(
                    oecd['PAYROLL'].isnull(),
                    oecd['Tangible Assets other than Cash and Cash Equivalents'].isnull()
                )
            ].copy()

            # We compute the average reduction in non-haven pre-tax profits due to carve-outs
            self.avg_carve_out_impact_non_haven = (
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
                ]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )

            # We do the same for pre-tax profits booked in tax havens, domestically and in aggregate partners
            self.avg_carve_out_impact_tax_haven = (
                oecd_temp[oecd_temp['Is partner jurisdiction a tax haven?'] == 1]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction a tax haven?'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )
            self.avg_carve_out_impact_domestic = (
                oecd_temp[oecd_temp['Is domestic?'] == 1]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[oecd_temp['Is domestic?'] == 1]['Profit (Loss) before Income Tax'].sum()
            )
            self.avg_carve_out_impact_aggregate = (
                oecd_temp[
                    oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
                ]['CARVE_OUT_TEMP'].sum() /
                oecd_temp[
                    oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
                ]['Profit (Loss) before Income Tax'].sum()
            )

            # We impute missing carve-out values based on these average reductions in pre-tax profits
            oecd['CARVE_OUT'] = oecd.apply(
                lambda row: impute_missing_carve_out_values(
                    row,
                    avg_carve_out_impact_domestic=self.avg_carve_out_impact_domestic,
                    avg_carve_out_impact_tax_haven=self.avg_carve_out_impact_tax_haven,
                    avg_carve_out_impact_non_haven=self.avg_carve_out_impact_non_haven,
                    avg_carve_out_impact_aggregate=self.avg_carve_out_impact_aggregate
                ),
                axis=1
            )

            # Some missing values remain whenever profits before tax are missing
            oecd = oecd[~oecd['CARVE_OUT'].isnull()].copy()

            # We remove substance-based carve-outs from pre-tax profits
            oecd['Profit (Loss) before Income Tax'] = oecd.apply(
                (
                    lambda row: row['Profit (Loss) before Income Tax'] - row['CARVE_OUT']
                    if row['Profit (Loss) before Income Tax'] - row['CARVE_OUT'] >= 0
                    else 0
                ),
                axis=1
            )

        # --- Cleaning the TWZ tax haven profits data

        # Adding an indicator variable for OECD reporting - We do not consider the Swedish CbCR
        twz['Is parent in OECD data?'] = twz['Alpha-3 country code'].map(
            lambda x: x in oecd['Parent jurisdiction (alpha-3 code)'].unique() if x != 'SWE' else False
        ) * 1

        # We reformat numeric columns - Resulting figures are expressed in 2016 USD
        for column_name in ['Profits in all tax havens', 'Profits in all tax havens (positive only)']:
            twz[column_name] = twz[column_name].map(lambda x: x.replace(',', '.'))
            twz[column_name] = twz[column_name].astype(float) * 1000000

            if self.carve_outs:
                # If we want to simulate carve-outs, we need to downgrade TWZ tax haven profits by the average reduction
                # due to carve-outs that is observed for tax haven profits in the OECD data
                twz[column_name] *= (1 - self.avg_carve_out_impact_tax_haven)
            else:
                continue

        # We filter out countries with 0 profits in tax havens
        twz = twz[twz['Profits in all tax havens (positive only)'] > 0].copy()

        # --- Cleaning the TWZ domestic profits data

        # Reformatting the profits column - Resulting figures are expressed in 2016 USD
        twz_domestic['Domestic profits'] = twz_domestic['Domestic profits']\
            .map(lambda x: x.replace(',', '.'))\
            .astype(float) * 1000000000

        # Reformatting the ETR column
        twz_domestic['Domestic ETR'] = twz_domestic['Domestic ETR'].map(lambda x: x.replace(',', '.')).astype(float)

        if self.carve_outs:
            # If we want to simulate carve-outs, we need to downgrade TWZ domestic profits by the average reduction due
            # to carve-outs that is observed for domestic profits in the OECD data
            twz_domestic['Domestic profits'] *= (1 - self.avg_carve_out_impact_domestic)

        # --- Cleaning the TWZ CIT revenue data

        # Reformatting the CIT revenue column - Resulting figures are expressed in 2016 USD
        twz_CIT['CIT revenue'] = twz_CIT['CIT revenue']\
            .map(lambda x: x.replace(',', '.'))\
            .astype(float) * 1000000000

        if inplace:
            self.oecd = oecd.copy()
            self.twz = twz.copy()
            self.twz_domestic = twz_domestic.copy()
            self.twz_CIT = twz_CIT.copy()
            self.mean_wages = preprocessed_mean_wages.copy()

        else:

            if self.carve_outs:
                return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy(), preprocessed_mean_wages.copy()

            else:
                return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy()

    def get_non_haven_imputation_ratio(self, minimum_ETR):
        """
        For non-OECD reporting countries, we base our estimates on data compiled by Tørsløv, Wier and Zucman (2019).
        These allow to compute domestic and tax-haven-based tax deficit of these countries. We extrapolate the non-haven
        tax deficit of these countries from the tax-haven one.

        We impute the tax deficit in non-haven jurisdictions by estimating the ratio of tax deficits in non-tax havens
        to tax-havens for the EU non-tax haven parent countries in the CbCR data. We assume a 20% ETR in non-tax havens
        and a 10% ETR in tax havens (these rates are defined in two dedicated attributes in the instantiation function).

        This function allows to compute this ratio following the (A2) formula of Appendix A.

        The methodology is described in more details in the Appendix A of the report.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # With a minimum ETR of 10%, the formula cannot be applied (division by 0), hence this case disjunction
        if minimum_ETR > 0.1:
            oecd = self.oecd.copy()

            # In the computation of the imputation ratio, we only focus on:
            # - EU-27 parent countries
            mask_eu = oecd['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
            # - That are not tax havens
            mask_non_haven = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(tax_haven_country_codes)
            # - And report a detailed country by country breakdown in their CbCR
            mask_minimum_reporting_countries = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
            )

            # We combine the boolean indexing masks
            mask = np.logical_and(mask_eu, mask_non_haven)
            mask = np.logical_and(mask, mask_minimum_reporting_countries)

            # And convert booleans into 0 / 1 integers
            mask = mask * 1

            # We compute the profits registered by retained countries in non-haven countries
            # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
            foreign_non_haven_profits = (
                (
                    mask * oecd['Is partner jurisdiction a non-haven?']
                ) * oecd['Profit (Loss) before Income Tax']
            ).sum()

            # We compute the profits registered by retained countries in tax havens
            # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
            foreign_haven_profits = (
                (
                    mask * oecd['Is partner jurisdiction a tax haven?']
                ) * oecd['Profit (Loss) before Income Tax']
            ).sum()

            # We apply the formula and compute the imputation ratio
            imputation_ratio_non_haven = (
                (
                    # If the minimum ETR is below the rate assumed to be applied on non-haven profits, there is no tax
                    # deficit to collect from these profits, which is why we have this max(..., 0)
                    max(minimum_ETR - self.assumed_non_haven_ETR_TWZ, 0) * foreign_non_haven_profits
                ) /
                ((minimum_ETR - self.assumed_haven_ETR_TWZ) * foreign_haven_profits)
            )

        # We manage the case where the minimum ETR is of 10% and the formula cannot be applied
        elif minimum_ETR == 0.1:

            # As long as tax haven profits are assumed to be taxed at a rate of 10%, the value that we set here has no
            # effect (it will be multiplied to 0 tax-haven-based tax deficits) but to remain consistent with higher
            # values of the minimum ETR, we impute 0

            imputation_ratio_non_haven = 0

        else:
            # We do not yet manage effective tax rates below 10%
            raise Exception('Unexpected minimum ETR entered (strictly below 0.1).')

        return imputation_ratio_non_haven

    def get_alternative_non_haven_factor(self, minimum_ETR):
        """
        Looking at the formula (A2) of Appendix A and at the previous method, we see that for a 15% tax rate, this impu-
        tation would result in no tax deficit to be collected from non-tax haven jurisdictions. Thus, we correct for
        this underestimation by computing the ratio of the tax deficit that can be collected in non-tax havens at a 15%
        and a 25% rate for OECD-reporting countries.

        This class method allows to compute this alternative imputation ratio.

        The methodology is described in more details in the Appendix A of the report.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # This method is only useful if the previous one yields a ratio of 0, i.e. if the minimum ETR is of 20% or less
        if minimum_ETR > 0.2:
            raise Exception('These computations are only used when the minimum ETR considered is 0.2 or less.')

        # We use the get_stratified_oecd_data to compute the non-haven tax deficit of OECD-reporting countries
        oecd_stratified = self.get_stratified_oecd_data(
            minimum_ETR=self.reference_rate_for_alternative_imputation
        )

        # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
        df_restricted = oecd_stratified[
            ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
            )
        ].copy()

        # The denominator is the total non-haven tax deficit of relevant countries at the reference minimum ETR
        denominator = df_restricted['tax_deficit_x_non_haven'].sum()

        # We follow the same process, running computations at the minimum ETR this time
        oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

        # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
        df_restricted = oecd_stratified[
            ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
                COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
            )
        ].copy()

        # The numerator is the total non-haven tax deficit of relevant countries at the selected minimum ETR
        numerator = df_restricted['tax_deficit_x_non_haven'].sum()

        return numerator / denominator

    def get_stratified_oecd_data(self, minimum_ETR=0.25):
        """
        This method constitutes a first step in the computation of each country's collectible tax deficit in the multi-
        lateral agreement scenario.

        Taking the minimum effective tax rate as input and based on OECD data, this function outputs a DataFrame that
        displays, for each OECD-reporting parent country, the tax deficit that could be collected from the domestic,
        tax haven and non-haven profits of multinationals headquartered in this country.

        The output is in 2016 USD, like the raw OECD data.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        oecd = self.oecd.copy()

        # We only profits taxed at an effective tax rate above the minimum ETR
        oecd = oecd[oecd['ETR'] < minimum_ETR].copy()

        # We compute the ETR differential for all low-taxed profits
        oecd['ETR_differential'] = oecd['ETR'].map(lambda x: minimum_ETR - x)

        # And deduce the tax deficit generated by each Parent / Partner jurisidiction pair
        oecd['tax_deficit'] = oecd['ETR_differential'] * oecd['Profit (Loss) before Income Tax']

        # Using the aforementioned indicator variables allows to breakdown this tax deficit
        oecd['tax_deficit_x_domestic'] = oecd['tax_deficit'] * oecd['Is domestic?']
        oecd['tax_deficit_x_tax_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a tax haven?']
        oecd['tax_deficit_x_non_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a non-haven?']

        # We group the table by Parent jurisdiction such that for, say, France, the table displays the total domestic,
        # tax-haven and non-haven tax deficit generated by French multinationals
        oecd_stratified = oecd[
            [
                'Parent jurisdiction (whitespaces cleaned)',
                'Parent jurisdiction (alpha-3 code)',
                'tax_deficit',
                'tax_deficit_x_domestic',
                'tax_deficit_x_tax_haven',
                'tax_deficit_x_non_haven'
            ]
        ].groupby(
            'Parent jurisdiction (whitespaces cleaned)'
        ).agg(
            {
                'Parent jurisdiction (alpha-3 code)': 'first',
                'tax_deficit': 'sum',
                'tax_deficit_x_domestic': 'sum',
                'tax_deficit_x_tax_haven': 'sum',
                'tax_deficit_x_non_haven': 'sum'
            }
        ).copy()

        oecd_stratified.reset_index(inplace=True)

        return oecd_stratified.copy()

    def compute_all_tax_deficits(self, minimum_ETR=0.25, CbCR_reporting_countries_only=False):
        """
        This method encapsulates most of the computations for the multilateral agreement scenario.

        Taking as input the minimum effective tax rate to apply and based on OECD and TWZ data, it outputs a DataFrame
        which presents, for each country in our sample (countries in OECD and/or TWZ data) the total tax deficit, as
        well as its breakdown into domestic, tax-haven and non-haven tax deficits.

        The output is in 2021 EUR after a currency conversion and the extrapolation from 2016 to 2021 figures.
        """
        # We need to have previously loaded and cleaned the OECD and TWZ data
        if self.oecd is None or self.twz is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # We use the method defined above and will use its output as a base for the following computations
        oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

        twz = self.twz.copy()

        # From TWZ data on profits registered in tax havens and assuming that these are taxed at a given minimum ETR
        # (10% in the report, see the instantiation function for the definition of this attribute), we deduce the tax-
        # haven-based tax deficit of TWZ countries
        twz['tax_deficit_x_tax_haven_TWZ'] = \
            twz['Profits in all tax havens (positive only)'] * (minimum_ETR - self.assumed_haven_ETR_TWZ)

        # --- Managing countries in both OECD and TWZ data

        # We focus on parent countries which are in both the OECD and TWZ data
        # NB: recall that we do not consider the Swedish CbCR
        twz_in_oecd = twz[twz['Is parent in OECD data?'].astype(bool)].copy()

        # We merge the two DataFrames on country codes
        merged_df = oecd_stratified.merge(
            twz_in_oecd[['Country', 'Alpha-3 country code', 'tax_deficit_x_tax_haven_TWZ']],
            how='left',
            left_on='Parent jurisdiction (alpha-3 code)',
            right_on='Alpha-3 country code'
        ).drop(columns=['Country', 'Alpha-3 country code'])

        # For countries that are in the OECD data but not in TWZ, we impute a tax-haven-based tax deficit from TWZ of 0
        merged_df['tax_deficit_x_tax_haven_TWZ'] = merged_df['tax_deficit_x_tax_haven_TWZ'].fillna(0)

        self.countries_replaced = []

        if self.carve_outs:

            calculator = TaxDeficitCalculator()
            calculator.load_clean_data()
            _ = calculator.compute_all_tax_deficits()

            countries_replaced = calculator.countries_replaced.copy()

            merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
                lambda row: self.combine_haven_tax_deficits(
                    row,
                    carve_outs=self.carve_outs,
                    countries_replaced=countries_replaced),
                axis=1
            )

        else:
            merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
                lambda row: self.combine_haven_tax_deficits(
                    row,
                    carve_outs=self.carve_outs
                ),
                axis=1
            )

        self.countries_replaced = merged_df[
            merged_df['tax_deficit_x_tax_haven_merged'] == merged_df['tax_deficit_x_tax_haven_TWZ']
        ]['Parent jurisdiction (alpha-3 code)'].unique()

        merged_df.drop(columns=['tax_deficit_x_tax_haven', 'tax_deficit_x_tax_haven_TWZ'], inplace=True)

        merged_df.rename(
            columns={
                'tax_deficit_x_tax_haven_merged': 'tax_deficit_x_tax_haven'
            },
            inplace=True
        )

        # Summing the tax-haven-based, non-haven and domestic tax deficits yields the total tax deficit of each country
        merged_df['tax_deficit'] = merged_df['tax_deficit_x_tax_haven'] \
            + merged_df['tax_deficit_x_domestic'] \
            + merged_df['tax_deficit_x_non_haven']

        # --- Countries only in the TWZ data

        # We now focus on countries that are absent from the OECD data
        # NB: recall that we do not consider the Swedish CbCR
        twz_not_in_oecd = twz[~twz['Is parent in OECD data?'].astype(bool)].copy()

        twz_not_in_oecd.drop(
            columns=['Profits in all tax havens', 'Profits in all tax havens (positive only)'],
            inplace=True
        )

        # - Extrapolating the foreign non-haven tax deficit

        # We compute the imputation ratio with the method defined above
        imputation_ratio_non_haven = self.get_non_haven_imputation_ratio(minimum_ETR=minimum_ETR)

        # And we deduce the non-haven tax deficit of countries that are only found in TWZ data
        twz_not_in_oecd['tax_deficit_x_non_haven'] = \
            twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] * imputation_ratio_non_haven

        # - Computing the domestic tax deficit

        # For countries that are only in TWZ data, we still need to compute their domestic tax deficit
        twz_domestic = self.twz_domestic.copy()

        # We only consider countries whose domestic ETR is stricly below the minimum ETR
        # (otherwise, there is no tax deficit to collect from domestic profits)
        twz_domestic = twz_domestic[twz_domestic['Domestic ETR'] < minimum_ETR].copy()

        # We compute the ETR differential
        twz_domestic['ETR_differential'] = twz_domestic['Domestic ETR'].map(lambda x: minimum_ETR - x)

        # And deduce the domestic tax deficit of each country
        twz_domestic['tax_deficit_x_domestic'] = twz_domestic['ETR_differential'] * twz_domestic['Domestic profits']

        # - Combining the different forms of tax deficit

        # We merge the two DataFrames to complement twz_not_in_oecd with domestic tax deficit results
        twz_not_in_oecd = twz_not_in_oecd.merge(
            twz_domestic[['Alpha-3 country code', 'tax_deficit_x_domestic']],
            how='left',
            on='Alpha-3 country code'
        )

        # As we filtered out countries whose domestic ETR is stricly below the minimum ETR, some missing values
        # appear during the merge; we impute 0 for these as they do not have any domestic tax deficit to collect
        twz_not_in_oecd['tax_deficit_x_domestic'] = twz_not_in_oecd['tax_deficit_x_domestic'].fillna(0)

        # We deduce the total tax deficit for each country
        twz_not_in_oecd['tax_deficit'] = twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] \
            + twz_not_in_oecd['tax_deficit_x_domestic'] \
            + twz_not_in_oecd['tax_deficit_x_non_haven']

        # --- Merging the results of the two data sources

        # We need columns to match for the concatenation to operate smoothly
        twz_not_in_oecd.rename(
            columns={
                'Country': 'Parent jurisdiction (whitespaces cleaned)',
                'Alpha-3 country code': 'Parent jurisdiction (alpha-3 code)',
                'tax_deficit_x_tax_haven_TWZ': 'tax_deficit_x_tax_haven'
            },
            inplace=True
        )

        twz_not_in_oecd.drop(columns=['Is parent in OECD data?'], inplace=True)

        # We exclude Sweden from the OECD-drawn results, as we do not consider its CbCR
        merged_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

        # We eventually concatenate the two DataFrames
        merged_df = pd.concat(
            [merged_df, twz_not_in_oecd],
            axis=0
        )

        # --- Extrapolations to 2021 EUR

        # We convert 2016 USD results in 2016 EUR and extraprolate them to 2021 EUR
        for column_name in merged_df.columns[2:]:
            merged_df[column_name] = merged_df[column_name] * self.USD_to_EUR_2016 * self.multiplier_2021

        # --- Managing the case where the minimum ETR is 20% or below for TWZ countries

        # As mentioned above and detailed in Appendix A, the imputation of the non-haven tax deficit of TWZ countries
        # follows a specific process whenever the chosen minimum ETR is of or below 20%
        if minimum_ETR <= 0.2 and self.alternative_imputation:
            # We get the new multiplying factor from the method defined above
            multiplying_factor = self.get_alternative_non_haven_factor(minimum_ETR=minimum_ETR)

            # We compute all tax deficits at the reference rate (25% in the report)
            df = self.compute_all_tax_deficits(
                minimum_ETR=self.reference_rate_for_alternative_imputation
            )

            # We only consider countries that are absent from the OECD data, except Sweden as usual
            oecd_reporting_countries_but_SWE = self.oecd[
                self.oecd['Parent jurisdiction (alpha-3 code)'] != 'SWE'
            ]['Parent jurisdiction (alpha-3 code)'].unique()

            df = df[
                ~df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries_but_SWE)
            ].copy()

            # For these countries, we multiply the non-haven tax deficit at the reference rate by the multiplying factor
            df['tax_deficit_x_non_haven_imputation'] = df['tax_deficit_x_non_haven'] * multiplying_factor

            # We save the results in a dictionary that will allow to map the DataFrame that we want to output in the end
            mapping = {}

            for _, row in df.iterrows():
                mapping[row['Parent jurisdiction (alpha-3 code)']] = row['tax_deficit_x_non_haven_imputation']

            # We create a new column in the to-be-output DataFrame which takes as value:
            # - the non-haven tax deficit estimated just above for TWZ countries
            # - 0 for OECD-reporting countries, which do not require this imputation
            merged_df['tax_deficit_x_non_haven_imputation'] = merged_df['Parent jurisdiction (alpha-3 code)'].map(
                lambda country_code: mapping.get(country_code, 0)
            )

            # We deduce the non-haven tax deficit of all countries
            merged_df['tax_deficit_x_non_haven'] += merged_df['tax_deficit_x_non_haven_imputation']

            # And add this imputation also to the column that presents the total tax deficit of each country
            merged_df['tax_deficit'] += merged_df['tax_deficit_x_non_haven_imputation']

            merged_df.drop(
                columns=['tax_deficit_x_non_haven_imputation'],
                inplace=True
            )

        if CbCR_reporting_countries_only:
            merged_df = merged_df[
                merged_df['Parent jurisdiction (whitespaces cleaned)'].isin(
                    self.oecd['Parent jurisdiction (whitespaces cleaned)'].unique()
                )
            ].copy()

        return merged_df.reset_index(drop=True).copy()

    def combine_haven_tax_deficits(
        self,
        row,
        carve_outs=False,
        countries_replaced=None
    ):
        """
        This function is used to compute the tax deficit of all in-sample headquarter countries in the multilateral im-
        plementation scenario.

        For parent countries that are in both the OECD and TWZ data, we have two different sources to compute their tax-
        haven-based tax deficit and we retain the highest of these two amounts.

        Besides, for parent countries in the OECD data that do not report a fully detailed country-by-country breakdown
        of the activity of their multinationals, we cannot distinguish their tax-haven and non-haven tax deficits. Quite
        arbitrarily in the Python code, we attribute everything to the non-haven tax deficit. In the Table A1 of the re-
        port, these specific cases are described with the "Only foreign aggregate data" column.
        """
        if carve_outs and countries_replaced is None:
            raise Exception(
                'Using this function under carve-outs requires to indicate a list of countries to replace.'
            )

        if row['Parent jurisdiction (alpha-3 code)'] not in (
            COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
        ):
            if countries_replaced is None:

                if row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']:
                    self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                    return row['tax_deficit_x_tax_haven_TWZ']

                else:
                    return row['tax_deficit_x_tax_haven']

            else:
                if (
                    row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']
                    and row['Parent jurisdiction (alpha-3 code)'] in countries_replaced
                ):
                    self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                    return row['tax_deficit_x_tax_haven_TWZ']

                else:
                    return row['tax_deficit_x_tax_haven']

        else:
            return 0

    def check_tax_deficit_computations(self, minimum_ETR=0.25):
        """
        Taking the selected minimum ETR as input and relying on the compute_all_tax_deficits method defined above, this
        method outputs a DataFrame that can be compared with Table A1 of the report. For each country in OECD and/or TWZ
        data, it displays its total tax deficit and a breakdown into domestic, tax-haven-based and non-haven tax defi-
        cits. Figures are display in 2021 billion EUR.
        """

        # We start from the output of the previously defined method
        df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

        # And convert numeric columns from 2021 EUR to 2021 billion EUR
        for column_name in df.columns[2:]:
            df[column_name] = df[column_name] / 10**9

        return df.copy()

    def get_total_tax_deficits(self, minimum_ETR=0.25):
        """
        This method takes the selected minimum ETR as input and relies on the compute_all_tax_deficits, to output a Da-
        taFrame with (i) the total tax defict of each in-sample country in 2021 EUR and (ii) the sum of these tax defi-
        cits at the EU-27 and at the whole sample level. It can be considered as an intermediary step towards the fully
        formatted table displayed on the online simulator (section "Multilateral implementation scenario").
        """

        df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

        df = df[
            ['Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)', 'tax_deficit']
        ]

        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        # We compute the sum of total tax deficits at the EU-27 level and for the whole sample
        total_eu = (df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes) * 1 * df['tax_deficit']).sum()
        total_whole_sample = df['tax_deficit'].sum()

        # Possibly suboptimal process to add "Total" lines at the end of the DataFrame
        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df)] = 'Total - EU27'
        dict_df[df.columns[1]][len(df)] = '..'
        dict_df[df.columns[2]][len(df)] = total_eu

        dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 1] = '..'
        dict_df[df.columns[2]][len(df) + 1] = total_whole_sample

        df = pd.DataFrame.from_dict(dict_df)

        return df.reset_index(drop=True)

    def check_appendix_A2(self):
        """
        Relying on the get_total_tax_deficits method and on TWZ data on corporate income tax revenues, this method out-
        puts a DataFrame that can be compared with the first 4 columns of Table A2 in the report. For each in-sample
        country and at four different minimum ETRs (15%, 21%, 25% and 30% which are the four main cases considered in
        the report), the table presents estimated revenue gains as a percentage of currently corporate income taxes.
        """

        # We need to have previously loaded and cleaned the TWZ data on corporate income tax revenues
        # (figures in the pre-loaded DataFrame are provided in 2016 USD)
        if self.twz_CIT is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # We compute total tax deficits, first at a 15% minimum ETR and in 2021 EUR
        df = self.get_total_tax_deficits(minimum_ETR=0.15)

        df.rename(columns={'tax_deficit': 'tax_deficit_15'}, inplace=True)

        # We merge the two DataFrames to combine information on collectible tax deficits and current CIT revenues
        merged_df = df.merge(
            self.twz_CIT,
            how='left',
            left_on='Parent jurisdiction (alpha-3 code)',
            right_on='Country (alpha-3 code)'
        ).drop(columns=['Country', 'Country (alpha-3 code)'])

        # We bring back the tax deficit estimated to 2016 USD (from 2021 EUR)
        merged_df['tax_deficit_15'] /= (merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100)

        # For the 3 other rates considered in the output table
        for rate in [0.21, 0.25, 0.3]:
            # We compute total tax deficits at this rate
            df = self.get_total_tax_deficits(minimum_ETR=rate)

            # We add these results to the central DataFrame thanks to a merge operation
            merged_df = merged_df.merge(
                df,
                how='left',
                on='Parent jurisdiction (alpha-3 code)'
            )

            # We impute the missing values produced by the merge
            merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

            # We rename the newly-added tax deficit column
            merged_df.rename(
                columns={'tax_deficit': f'tax_deficit_{int(rate * 100)}'},
                inplace=True
            )

            # And we bring it back to 2016 USD
            merged_df[f'tax_deficit_{int(rate * 100)}'] /= (
                merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100
            )

        # We want to also verify the EU-27 average and restrict the DataFrame to these countries
        eu_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)].copy()

        # This attribute stores the average EU-27 revenue gain estimate in % of current CIT revenues for each of the 4
        # minimum ETRs of interest (respectively 15.1%, 30.5%, 52.3% and 84.1% in the report)
        self.check = [
            (
                eu_df[f'tax_deficit_{rate}'] * eu_df['CIT revenue'] / 100
            ).sum() / eu_df['CIT revenue'].sum() for rate in [15, 21, 25, 30]
        ]

        # Coming back to the DataFrame with all in-sample countries, we only keep the relevant columns and output it
        merged_df = merged_df[
            [
                'Parent jurisdiction (whitespaces cleaned)_x',
                'tax_deficit_15', 'tax_deficit_21', 'tax_deficit_25', 'tax_deficit_30'
            ]
        ].copy()

        # NB: in the current version of this method, the successive merges have a poor effect on the "Total" rows that
        # are included in the output of the get_total_tax_deficits method; this could easily be improved

        return merged_df.copy()

    def output_tax_deficits_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
        table on the "Multilateral implementation scenario" page. It takes as input the selected minimum ETR and widely
        relies on the get_total_tax_deficits method defined above. It mostly consists in a series of formatting steps.
        """

        # We build the unformatted results table thanks to the get_total_tax_deficits method
        df = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        # We only want to include certain countries in the output table:
        # - all the EU-27 countries that are included in our sample (4 unfortunately missing for now)
        # - most of the OECD-reporting countries, excluding only Singapore and Bermuda

        # We first build the list of OECD-reporting countries, excluding Singapore and Bermuda
        oecd_reporting_countries = self.oecd['Parent jurisdiction (alpha-3 code)'].unique()
        oecd_reporting_countries = [
            country_code for country_code in oecd_reporting_countries if country_code not in ['SGP', 'BMU']
        ]

        # From this list, we build the relevant boolean indexing mask that corresponds to our filtering choice
        mask = np.logical_or(
            df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes),
            df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries)
        )

        df = df[mask].copy()

        # We sort values by the name of the parent jurisdiction, in the alphabetical order
        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        df.reset_index(drop=True, inplace=True)

        # We convert 2021 EUR figures into 2021 million EUR ones
        df['tax_deficit'] = df['tax_deficit'] / 10**6

        # Again, the same possibly sub-optimal process to add the "Total" lines
        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df)] = 'Total - EU27'
        dict_df[df.columns[1]][len(df)] = '..'
        dict_df[df.columns[2]][len(df)] = df[
            df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        ]['tax_deficit'].sum()

        dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 1] = '..'
        dict_df[df.columns[2]][len(df) + 1] = df['tax_deficit'].sum()

        df = pd.DataFrame.from_dict(dict_df)

        # We drop country codes
        df.drop(columns=['Parent jurisdiction (alpha-3 code)'], inplace=True)

        # And we eventually reformat figures with a thousand separator and a 0-decimal rounding
        df['tax_deficit'] = df['tax_deficit'].map('{:,.0f}'.format)

        # We rename columns
        df.rename(
            columns={
                'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country',
                'tax_deficit': 'Collectible tax deficit (€m)'
            },
            inplace=True
        )

        return df.copy()

    def compute_unilateral_scenario_gain(self, country, minimum_ETR=0.25):
        """
        This method encapsulates most of the computations for the unilateral implementation scenario.

        It takes as input:

        - the name of the country assumed to unilaterally implement the tax deficit collection;

        - the minimum effective tax rate that it applies when collecting the full tax deficit of its multinationals and
        a part of the tax deficit of foreign multinationals, based on the location of their sales.

        The output of this method is a DataFrame organized as follows:

        - each row is a headquarter country whose tax deficit would be collected partly or entirely by the taxing coun-
        try (including the taxing country which collects 100% of the tax deficit of its multinationals);

        - there are two columns, with the name of the headquarter country considered and the tax deficit amount that
        could be collected from its multinationals by the taxing country.

        Figures are presented in 2021 EUR.

        Important disclaimer: for now, this method is not robust to variations in the country name, i.e. only country
        names as presented in the OECD CbCR data will generate a result. These are the country names that are proposed
        in the selectbox on the online simulator.

        The methogology behind these computations is described in much more details in Appendix B of the report.
        """

        # We start from the total tax deficits of all countries which can be partly re-allocated to the taxing country
        tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        # The OECD data provides the information of extra-group sales, needed to allocate foreign tax deficits
        oecd = self.oecd.copy()

        # We simply convert the name of the taxing country to the corresponding alpha-3 code
        taxing_country = country
        try:
            taxing_country_code = self.oecd[
                self.oecd['Parent jurisdiction (whitespaces cleaned)'] == taxing_country
            ]['Parent jurisdiction (alpha-3 code)'].iloc[0]
        except:
            taxing_country_code = self.twz[
                self.twz['Country'] == taxing_country
            ]['Alpha-3 country code'].iloc[0]

        # This list will store the allocation ratios (for each headquarter country, the share of its tax deficit that
        # can be collected by the taxing country) computed based on the location of extra-group sales
        attribution_ratios = []

        # We iterate over parent countries in the OECD data
        for country_code in tax_deficits['Parent jurisdiction (alpha-3 code)'].values:

            # The taxing country collects 100% of the tax deficit of its own multinationals
            if country_code == taxing_country_code:
                attribution_ratios.append(1)

            # If the parent country is not the taxing country
            else:
                # We restrict the DataFrame to the CbCR of the considered parent country
                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country_code].copy()

                # If the taxing country is not part of its partner jurisdictions, the attribution ratio is of 0
                if taxing_country_code not in oecd_restricted['Partner jurisdiction (alpha-3 code)'].values:
                    attribution_ratios.append(0)

                else:
                    # We fetch extra-group sales registered in the taxing country
                    mask = (oecd_restricted['Partner jurisdiction (alpha-3 code)'] == taxing_country_code)
                    sales_in_taxing_country = oecd_restricted[mask]['Unrelated Party Revenues'].iloc[0]

                    # We compute total extra-group sales
                    total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                    # We append the resulting ratio to the list of attribution ratios
                    attribution_ratios.append(sales_in_taxing_country / total_sales)

        # We add this list to the DataFrame as a new column
        tax_deficits['Attribution ratios'] = attribution_ratios

        # We deduce, for each headquarter country, the tax deficit that could be collected by the taxing country
        tax_deficits[f'Collectible tax deficit for {taxing_country}'] = \
            tax_deficits['tax_deficit'] * tax_deficits['Attribution ratios']

        # We eliminate irrelevant columns
        tax_deficits.drop(
            columns=[
                'Attribution ratios',
                'tax_deficit',
                'Parent jurisdiction (alpha-3 code)'
            ],
            inplace=True
        )

        # We filter out rows for which the collectible tax deficit is 0
        tax_deficits = tax_deficits[tax_deficits[f'Collectible tax deficit for {taxing_country}'] > 0].copy()

        # We sort values based on the resulting tax deficit, in descending order
        tax_deficits.sort_values(
            by=f'Collectible tax deficit for {taxing_country}',
            ascending=False,
            inplace=True
        )

        # Because the OECD data only gather 26 headquarter countries, we need to make an assumption on the tax deficit
        # that could be collected from other parent countries, excluded from the 2016 version of the data

        # We therefore double the tax deficit collected from non-US foreign countries
        imputation = tax_deficits[
            ~tax_deficits['Parent jurisdiction (whitespaces cleaned)'].isin([taxing_country, 'United States'])
        ][f'Collectible tax deficit for {taxing_country}'].sum()

        # Except for Germany, for which we add back only half of the tax deficit collected from non-US foreign countries
        if taxing_country_code == 'DEU':
            imputation /= 2

        tax_deficits.reset_index(drop=True, inplace=True)

        # Again the same inelegant way of adding "Total" fields at the end of the DataFrame
        dict_df = tax_deficits.to_dict()

        dict_df[tax_deficits.columns[0]][len(tax_deficits)] = 'Others (imputation)'
        dict_df[tax_deficits.columns[1]][len(tax_deficits)] = imputation

        dict_df[tax_deficits.columns[0]][len(tax_deficits) + 1] = 'Total'
        dict_df[tax_deficits.columns[1]][len(tax_deficits) + 1] = (
            tax_deficits[tax_deficits.columns[1]].sum() + imputation
        )

        df = pd.DataFrame.from_dict(dict_df)

        return df.copy()

    def check_unilateral_scenario_gain_computations(self, minimum_ETR=0.25):
        """
        Taking as input the selected minimum effective tax rate and relying on the compute_unilateral_scenario_gain,
        this method outputs a DataFrame that can be compared with the Table 3 of the report. For each country that is
        part of the EU-27 and/or included in the 2016 aggregated and anonymized CbCR data of the OECD, it shows the to-
        tal corporate tax revenue gain that could be drawn from the unilateral implementation of the tax deficit col-
        lection. It also provides a breakdown of this total between the tax deficit of the country's own multinationals,
        the amount that could be collected from US multinationals and revenues that could be collected from non-US ones.
        """

        # We build the list of countries that we want to include in the output table
        country_list = self.get_total_tax_deficits()

        country_list = country_list[
            ~country_list['Parent jurisdiction (whitespaces cleaned)'].isin(['Total - EU27', 'Total - Whole sample'])
        ].copy()

        country_list = list(country_list['Parent jurisdiction (whitespaces cleaned)'].values)

        # We prepare the structure of the output first as a dictionary
        output = {
            'Country': country_list,
            'Own tax deficit': [],
            'Collection of US tax deficit': [],
            'Collection of non-US tax deficit': [],
            'Imputation': [],
            'Total': []
        }

        # We iterate over the list of relevant countries
        for country in country_list:

            # Using the method defined above, we output the table presenting the tax deficit that could be collected
            # from a unilateral implementation of the tax deficit collection by the considered country and its origin
            df = self.compute_unilateral_scenario_gain(
                country=country,
                minimum_ETR=minimum_ETR
            )

            column_name = f'Collectible tax deficit for {country}'

            if country in df['Parent jurisdiction (whitespaces cleaned)'].unique():
                # We fetch the tax deficit that could be collected from the country's own multinationals
                output['Own tax deficit'].append(
                    df[df['Parent jurisdiction (whitespaces cleaned)'] == country][column_name].iloc[0]
                )

            else:
                output['Own tax deficit'].append(0)

            # We fetch the tax deficit that could be collected from US multinationals
            if 'United States' in df['Parent jurisdiction (whitespaces cleaned)'].values:
                output['Collection of US tax deficit'].append(
                    df[df['Parent jurisdiction (whitespaces cleaned)'] == 'United States'][column_name].iloc[0]
                )
            else:
                output['Collection of US tax deficit'].append(0)

            # We fetch the tax deficit that was imputed following our methodology
            output['Imputation'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Others (imputation)'][column_name].iloc[0]
            )

            # We fetch the total tax deficit
            output['Total'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Total'][column_name].iloc[0]
            )

            # And finally, we sum the tax deficits collected from foreign non-US multinationals
            output['Collection of non-US tax deficit'].append(
                df[
                    ~df['Parent jurisdiction (whitespaces cleaned)'].isin(
                        [
                            country, 'United States', 'Total', 'Others (imputation)'
                        ]
                    )
                ][column_name].sum()
            )

        # We convert the dictionary into a DataFrame
        df = pd.DataFrame.from_dict(output)

        # We sum the imputation and the tax deficit collected from foreign, non-US multinationals to obtain the uprated
        # figures that correspond to the "Other foreign firms" column of Table 3 in the report
        df['Collection of non-US tax deficit (uprated with imputation)'] = \
            df['Imputation'] + df['Collection of non-US tax deficit']

        # We convert the results from 2021 EUR into 2021 billion EUR
        for column_name in df.columns[1:]:
            df[column_name] /= 10**9

        return df.copy()

    def output_unilateral_scenario_gain_formatted(self, country, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
        table presented on the "Unilateral implementation scenario" page. It takes as input the selected minimum ETR and
        the name of the country assumed to unilaterally implement the tax deficit collection. Then, it widely relies on
        the compute_unilateral_scenario_gain method defined above and mostly consists in a series of formatting steps to
        make the table more readable and understandable.
        """

        # We compute the gains from the unilateral implementation of the tax deficit collection for the taxing country
        df = self.compute_unilateral_scenario_gain(
            country=country,
            minimum_ETR=minimum_ETR
        )

        # We convert the numeric outputs into 2021 million EUR
        df[f'Collectible tax deficit for {country}'] = df[f'Collectible tax deficit for {country}'] / 10**6

        # We reformat figures with two decimals and a thousand separator
        df[f'Collectible tax deficit for {country}'] = \
            df[f'Collectible tax deficit for {country}'].map('{:,.2f}'.format)

        # We rename columns in accordance
        df.rename(
            columns={
                f'Collectible tax deficit for {country}': f'Collectible tax deficit for {country} (€m)',
                'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country'
            },
            inplace=True
        )

        return df.copy()

    def compute_intermediary_scenario_gain(self, minimum_ETR=0.25):
        """
        This method encapsulates the computations used to estimate the corporate tax revenue gains of EU countries,
        should the European Union implement the tax deficit collection as a block. This corresponds therefore to the
        partial cooperation scenario described in the report.

        Taking as input the selected minimum effective tax rate, this method outputs a DataFrame that presents for each
        in-sample EU-27 country:

        - the corporate tax revenue gains that could be collected from its own multinationals ("tax_deficit" column);
        - the tax deficit that could be collected from foreign, non-EU multinationals ("From foreign MNEs" column);
        - and the resulting total corporate tax revenue gain.

        All figures are output in 2021 million EUR.

        The three lines at the end of the DataFrame are a bit specific. Some OECD-reporting contries do not provide a
        perfectly detailed country-by-country report and for these, the "Other Europe" and "Europe" fields are assumed
        to be related to EU countries and are included in the total collectible tax deficit. The final line presents
        this total.

        The methogology behind these computations is described in much more details in Appendix C of the report.
        """

        # We start by computing the total tax deficits of all in-sample countries (those of the multilateral scenario)
        tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

        oecd = self.oecd.copy()

        # We extract the total tax deficit for the EU-27
        eu_27_tax_deficit = tax_deficits[
            tax_deficits['Parent jurisdiction (whitespaces cleaned)'] == 'Total - EU27'
        ]['tax_deficit'].iloc[0]

        # And we store in a separate DataFrame the tax deficits of EU-27 countries
        eu_27_tax_deficits = tax_deficits[
            tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
                eu_27_country_codes
            )
        ].copy()

        # We focus only on a few non-EU countries, defined when the TaxDeficitCalculator object is instantiated
        tax_deficits = tax_deficits[
            tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
                self.country_list_intermediary_scenario
            )
        ].copy()

        # We store the results in a dictionary, which we will map upon the eu_27_tax_deficits DataFrame
        additional_revenue_gains = {}

        # We iterate over EU-27 countries and compute for eacht he tax deficit collected from non-EU multinationals
        for eu_country in eu_27_country_codes:

            td_df = tax_deficits.copy()

            # This dictionary will store the attribution ratios based on extra-group sales to be mapped upon td_df
            attribution_ratios = {}

            # We iterate over non-EU countries in our list
            for country in self.country_list_intermediary_scenario:

                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

                # We fetch the extra-group sales registered by the non-EU country's multinationals in the EU-27 country
                # (defaults to 0 if the EU-27 country is not among the partners of the non-EU country)
                sales_in_eu_country = oecd_restricted[
                    oecd_restricted['Partner jurisdiction (alpha-3 code)'] == eu_country
                ]['Unrelated Party Revenues'].sum()

                # We compute the total extra-group sales registered by the non-EU country's multinationals worldwide
                total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                # We deduce the share of the non-EU country's tax deficit attributable to the EU-27 country
                attribution_ratios[country] = sales_in_eu_country / total_sales

            # We map the attribution_ratios dictionary upon the td_df DataFrame
            td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

            # We deduce, for each non-EU country, the amount of its tax deficit that is collected by the EU-27 country
            td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

            # We sum all these and multiply the total by 2 to estimate the total tax deficit that the EU-27 country
            # could collect from non-EU multinationals
            additional_revenue_gains[eu_country] = td_df['Collectible tax deficit'].sum() * 2

            # NB: the multiplication by 2 corresponds to the imputation strategy defined in Appendix C of the report

        # We map the resulting dictionary upon the eu_27_tax_deficits DataFrame
        eu_27_tax_deficits['From foreign MNEs'] = eu_27_tax_deficits['Parent jurisdiction (alpha-3 code)'].map(
            additional_revenue_gains
        )

        # And deduce total corporate tax revenue gains from such a scenario for all EU-27 countries
        eu_27_tax_deficits['Total'] = (
            eu_27_tax_deficits['tax_deficit'] + eu_27_tax_deficits['From foreign MNEs']
        )

        # We operate a similar process for "Europe" and "Other Europe" field
        additional_revenue_gains = {}

        for aggregate in ['Europe', 'Other Europe']:

            td_df = tax_deficits.copy()

            attribution_ratios = {}

            for country in self.country_list_intermediary_scenario:

                # We do not consider the "Other Europe" field in the US CbCR as it probably does not correspond to
                # activities operated in EU-27 countries (sufficient country-by-country breakdown to exclude this)
                if country == 'USA':
                    attribution_ratios[country] = 0

                    continue

                oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

                sales_in_europe_or_other_europe = oecd_restricted[
                    oecd_restricted['Partner jurisdiction (whitespaces cleaned)'] == aggregate
                ]['Unrelated Party Revenues'].sum()

                total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                attribution_ratios[country] = sales_in_europe_or_other_europe / total_sales

            td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

            td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

            additional_revenue_gains[aggregate] = td_df['Collectible tax deficit'].sum()

        # We drop unnecessary columns
        eu_27_tax_deficits.drop(
            columns=['Parent jurisdiction (alpha-3 code)'],
            inplace=True
        )

        # And we operate very inelegant transformations of the DataFrame to add the "Other Europe", "Europe" and "Total"
        # fields at the bottom of the DataFrame
        eu_27_tax_deficits.reset_index(drop=True, inplace=True)

        dict_df = eu_27_tax_deficits.to_dict()

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits)] = 'Other Europe'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits)] = 0
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 1] = 'Europe'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 1] = 0
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']

        # Here we compute total corporate tax revenue gains for EU-27 countries
        # NB: We have not multiplied the "Other Europe" and "Europe" fields by 2 (no imputation for these)
        total_additional_revenue_gain = eu_27_tax_deficits['From foreign MNEs'].sum() \
            + additional_revenue_gains['Europe'] \
            + additional_revenue_gains['Other Europe']

        dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 2] = 'Total'
        dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 2] = eu_27_tax_deficit
        dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 2] = total_additional_revenue_gain
        dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 2] = \
            eu_27_tax_deficit + total_additional_revenue_gain

        eu_27_tax_deficits = pd.DataFrame.from_dict(dict_df)

        # We convert 2021 EUR figures into 2021 billion EUR
        for column_name in eu_27_tax_deficits.columns[1:]:
            eu_27_tax_deficits[column_name] /= 10**6

        return eu_27_tax_deficits.copy()

    def output_intermediary_scenario_gain_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
        table presented on the "Partial cooperation scenario" page. It takes as input the selected minimum ETR and then,
        widely relies on the compute_intermediary_scenario_gain method defined above. It mostly consists in a series of
        formatting steps to make the table more readable and understandable.
        """

        # We compute corporate tax revenue gains from the partial cooperation scenario
        df = self.compute_intermediary_scenario_gain(minimum_ETR=minimum_ETR)

        # We eliminate irrelevant columns
        df.drop(columns=['tax_deficit', 'From foreign MNEs'], inplace=True)

        # We reformat figures with a thousand separator and a 0-decimal rounding
        df['Total'] = df['Total'].map('{:,.0f}'.format)

        # We rename columns to make them more explicit
        df.rename(
            columns={
                'Parent jurisdiction (whitespaces cleaned)': 'Taxing country',
                'Total': 'Collectible tax deficit (€m)'
            },
            inplace=True
        )

        # We add quotation marks to the "Europe" and "Other Europe" fields
        df['Taxing country'] = df['Taxing country'].map(
            lambda x: x if x not in ['Europe', 'Other Europe'] else f'"{x}"'
        )

        return df.copy()

    def assess_carve_out_impact(self, minimum_ETR=0.25):
        """
        This function takes as input a minimum effective tax rate (which defaults to 25%) and outputs a DataFrame
        showing, for each in-sample country (EU and/or CbCR-reporting countries):

        - the tax deficit that it could collect by imposing this minimum ETR on the profits of its multinationals;
        - the split between domestic, tax haven and non-haven tax deficits;
        - and the same amounts with carve-outs being applied.

        Carve-outs are applied with the parameters (carve-out rate, use of the full value of tangible assets or of de-
        preciation expenses only and exclusion of inventories or not) that are defined when instantiating the TaxDefi-
        citCalculator object.
        """

        # If carve-out parameters have not been indicated, we cannot run the computations
        if self.carve_out_rate is None or self.depreciation_only is None or self.exclude_inventories is None:
            raise Exception(
                'If you want to simulate substance-based carve-outs, you need to indicate all the parameters.'
            )

        # We instantiate a TaxDeficitCalculator object with carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True,
            carve_out_rate=self.carve_out_rate,
            depreciation_only=self.depreciation_only,
            exclude_inventories=self.exclude_inventories
        )

        # We load the data
        calculator.load_clean_data()

        # And deduce total tax deficits and their split, with carve-outs being applied
        carve_outs = calculator.compute_all_tax_deficits(
            CbCR_reporting_countries_only=False,
            minimum_ETR=minimum_ETR
        )

        # We instantiate a TaxDeficitCalculator object without carve-outs
        calculator_no_carve_out = TaxDeficitCalculator()

        # We load the data
        calculator_no_carve_out.load_clean_data()

        # And deduce total tax deficits and their split, without any carve-out being applied
        no_carve_outs = calculator_no_carve_out.compute_all_tax_deficits(
            CbCR_reporting_countries_only=False,
            minimum_ETR=minimum_ETR
        )

        # We merge the two DataFrames
        carve_outs_impact = carve_outs.merge(
            no_carve_outs,
            how='inner',
            on=[
                'Parent jurisdiction (whitespaces cleaned)',
                'Parent jurisdiction (alpha-3 code)'
            ]
        ).rename(
            columns={
                'tax_deficit_x': 'TD_with_carve_outs',
                'tax_deficit_y': 'TD_no_carve_outs',
                'tax_deficit_x_domestic_x': 'domestic_TD_with_carve_outs',
                'tax_deficit_x_domestic_y': 'domestic_TD_no_carve_outs',
                'tax_deficit_x_non_haven_x': 'non_haven_TD_with_carve_outs',
                'tax_deficit_x_non_haven_y': 'non_haven_TD_no_carve_outs',
                'tax_deficit_x_tax_haven_x': 'tax_haven_TD_with_carve_outs',
                'tax_deficit_x_tax_haven_y': 'tax_haven_TD_no_carve_outs'
            }
        )

        # We only show EU and/or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in this boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        carve_outs_impact['IS_EU'] = mask_eu * 1
        carve_outs_impact['REPORTS_CbCR'] = mask_cbcr * 1

        # And restrict the DataFrame to relevant countries
        restricted_df = carve_outs_impact[mask].copy()

        # We finalise the formatting of the table
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        columns = [
            'Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)',
            'TD_with_carve_outs', 'TD_no_carve_outs', 'domestic_TD_with_carve_outs', 'domestic_TD_no_carve_outs',
            'non_haven_TD_with_carve_outs', 'non_haven_TD_no_carve_outs', 'tax_haven_TD_with_carve_outs',
            'tax_haven_TD_no_carve_outs', 'IS_EU', 'REPORTS_CbCR'
        ]

        return restricted_df[columns].copy()

    def assess_carve_out_impact_formatted(self, minimum_ETR=0.25):
        """
        This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
        table on the "Substance-based carve-outs" page. It takes as input the selected minimum ETR and widely relies on
        the assess_carve_out_impact method defined above. It mostly consists in a series of formatting steps.
        """
        df = self.assess_carve_out_impact(minimum_ETR=minimum_ETR)

        df.sort_values(
            by='Parent jurisdiction (whitespaces cleaned)',
            inplace=True
        )

        mask_eu = df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)

        df = df[['Parent jurisdiction (whitespaces cleaned)', 'TD_no_carve_outs', 'TD_with_carve_outs']].copy()

        dict_df = df.to_dict()

        dict_df[df.columns[0]][len(df) + 1] = 'Total - EU27'
        dict_df[df.columns[1]][len(df) + 1] = df[mask_eu]['TD_no_carve_outs'].sum()
        dict_df[df.columns[2]][len(df) + 1] = df[mask_eu]['TD_with_carve_outs'].sum()

        dict_df[df.columns[0]][len(df) + 2] = 'Total - Whole sample'
        dict_df[df.columns[1]][len(df) + 2] = df['TD_no_carve_outs'].sum()
        dict_df[df.columns[2]][len(df) + 2] = df['TD_with_carve_outs'].sum()

        df = pd.DataFrame.from_dict(dict_df)

        df['Change in % of revenue gains without carve-outs'] = (
            (df['TD_with_carve_outs'] - df['TD_no_carve_outs']) / df['TD_no_carve_outs']
        ) * 100

        df.rename(
            columns={
                'TD_no_carve_outs': 'Collectible tax deficit without carve-outs (€m)',
                'TD_with_carve_outs': 'Collectible tax deficit with carve-outs (€m)'
            },
            inplace=True
        )

        for column_name in df.columns[1:-1]:
            df[column_name] /= 10**6
            df[column_name] = df[column_name].map('{:,.0f}'.format)

        df[df.columns[-1]] = df[df.columns[-1]].map('{:.1f}'.format)

        return df.copy()

    def get_carve_outs_table(
        self,
        TWZ_countries_methodology,
        depreciation_only, exclude_inventories,
        carve_out_rate=0.05
    ):
        """
        This function takes as input:

        - the methodology to use to estimate the post-carve-out revenue gains of TWZ countries;

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

        - the carve-out rate to use (which defaults to 5%).

        It returns a DataFrame that shows, for the 15% and 25% minimum rates and for each in-sample country, the estima-
        ted revenue gains from a global minimum tax without and with carve-outs being applied.
        """

        # We need to have previously loaded and cleaned the OECD data
        if self.oecd is None:
            raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

        # The "TWZ_countries_methodology" argument can only take a few string values
        if TWZ_countries_methodology not in ['initial', 'new']:
            raise Exception('The "TWZ_countries_methodology" argument only accepts two values: "initial" or "new".')

        # Computing tax deficits without substance-based carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2, :]
        td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2, :]

        # We merge the resulting DataFrames for the 15% and 25% minimum rates
        merged_df = td_25.merge(
            td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit_x': 'tax_deficit_25_no_carve_out',
                'tax_deficit_y': 'tax_deficit_15_no_carve_out'
            },
            inplace=True
        )

        # Computing corresponding tax deficits with substance-based carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True, carve_out_rate=carve_out_rate,
            depreciation_only=depreciation_only, exclude_inventories=exclude_inventories
        )

        calculator.load_clean_data()

        td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2]
        td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2]

        # We merge the DataFrame obtained for the 25% minimum rate
        merged_df = merged_df.merge(
            td_25[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_25_with_carve_out'
            },
            inplace=True
        )

        # We merge the DataFrame obtained for the 15% minimum rate
        merged_df = merged_df.merge(
            td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_15_with_carve_out'
            },
            inplace=True
        )

        # We only show EU and/or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in this boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        merged_df['IS_EU'] = mask_eu * 1
        merged_df['REPORTS_CbCR'] = mask_cbcr * 1

        # And we restrict the DataFrame to relevant countries
        restricted_df = merged_df[mask].copy()

        # We finalise the reformatting of the DataFrame
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        if TWZ_countries_methodology == 'initial':
            # If we have opted for the "initial" methodology for TWZ countries, we can simply return the DataFrame as is
            return restricted_df.copy()

        else:
            # If we have chosen the "new" methodology, we have a bit more work!

            # We create a temporary copy of the DataFrame, restricted to CbCR-reporting countries (excluding Sweden)
            temp = restricted_df[restricted_df['REPORTS_CbCR'] == 1].copy()
            temp = temp[temp['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

            # We deduce the average reduction factors to apply to the collectible tax deficits of TWZ countries
            self.imputation_15 = temp['tax_deficit_15_with_carve_out'].sum() / temp['tax_deficit_15_no_carve_out'].sum()
            self.imputation_25 = temp['tax_deficit_25_with_carve_out'].sum() / temp['tax_deficit_25_no_carve_out'].sum()

            # We apply the two downgrade factors to tax deficits without carve-outs
            restricted_df['tax_deficit_15_with_carve_out'] = restricted_df.apply(
                (
                    lambda row: row['tax_deficit_15_no_carve_out'] * self.imputation_15 if row['REPORTS_CbCR'] == 0 or
                    row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_15_with_carve_out']
                ),
                axis=1
            )

            restricted_df['tax_deficit_25_with_carve_out'] = restricted_df.apply(
                (
                    lambda row: row['tax_deficit_25_no_carve_out'] * self.imputation_25 if row['REPORTS_CbCR'] == 0 or
                    row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_25_with_carve_out']
                ),
                axis=1
            )

            # And we return the adjusted DataFrame
            return restricted_df.copy()

    def get_carve_outs_table_2(
        self,
        exclude_inventories, depreciation_only,
        carve_out_rate=0.05,
        output_Excel=False
    ):
        """
        This function takes as input:

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

        - the carve-out rate to use (which defaults to 5%).

        It returns a DataFrame that shows, for the different minimum effective tax rates and for each in-sample country,
        the estimated impact of substance-based carve-outs. The change is expressed as a percentage of revenue gain es-
        timates without substance-based carve-outs.
        """

        # The "get_carve_outs_table" method provides the required information for two minimum ETRs, 15% and 25%
        # This will serve as a central DataFrame to which we will add the 21% and 30% columns
        df = self.get_carve_outs_table(
            TWZ_countries_methodology='initial',
            exclude_inventories=exclude_inventories, depreciation_only=depreciation_only,
            carve_out_rate=carve_out_rate
        )

        # Computing tax deficits without substance-based carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2, :]
        td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2, :]

        # We add the 21% tax deficit to the central DataFrame
        merged_df = df.merge(
            td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        # We add the 30% tax deficit to the central DataFrame
        merged_df = merged_df.merge(
            td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit_x': 'tax_deficit_21_no_carve_out',
                'tax_deficit_y': 'tax_deficit_30_no_carve_out'
            },
            inplace=True
        )

        # Computing corresponding tax deficits with substance-based carve-outs
        calculator = TaxDeficitCalculator(
            carve_outs=True,
            carve_out_rate=carve_out_rate,
            depreciation_only=depreciation_only,
            exclude_inventories=exclude_inventories
        )

        calculator.load_clean_data()

        td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2]
        td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2]

        # We add the 21% tax deficit with carve-outs to the central DataFrame
        merged_df = merged_df.merge(
            td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_21_with_carve_out'
            },
            inplace=True
        )

        # We add the 30% tax deficit with carve-outs to the central DataFrame
        merged_df = merged_df.merge(
            td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': 'tax_deficit_30_with_carve_out'
            },
            inplace=True
        )

        # We have the tax deficit absolute amounts with and without carve-outs at 15%, 21%, 25% and 30% minimum rates
        # But we want to display the changes due to carve-outs, as a % of the no-carve-out tax deficit

        # We store the names of the 4 columns that we are going to add to the central DataFrame
        new_columns = []

        # We iterate over the 4 minimum rates
        for minimum_rate in [15, 21, 25, 30]:
            column_name_no_carve_out = f'tax_deficit_{minimum_rate}_no_carve_out'
            column_name_with_carve_out = f'tax_deficit_{minimum_rate}_with_carve_out'

            # We are going to add a new column that provides the % reduction due to carve-outs at the rate considered
            new_column_name = f'reduction_at_{minimum_rate}_minimum_rate'

            # We make the corresponding computation
            merged_df[new_column_name] = (
                (merged_df[column_name_with_carve_out] - merged_df[column_name_no_carve_out]) /
                merged_df[column_name_no_carve_out]
            ) * 100

            new_columns.append(new_column_name)

        if output_Excel:
            with pd.ExcelWriter('/Users/Paul-Emmanuel/Desktop/carve_outs_table_2.xlsx', engine='xlsxwriter') as writer:
                merged_df.to_excel(writer, sheet_name='table_2', index=False)

        # We output the resulting DataFrame with country codes and names, as well as the 4 columns of interest
        merged_df = merged_df[
            ['Parent jurisdiction (alpha-3 code)', 'Parent jurisdiction (whitespaces cleaned)'] + new_columns
        ].copy()

        return merged_df.copy()

    def get_carve_outs_rate_table(
        self,
        minimum_ETR,
        depreciation_only, exclude_inventories,
    ):
        """
        This function takes as inputs:

        - the minimum effective tax rate to apply to multinationals' profits;

        - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
        based carve-outs to a share of depreciation expenses;

        - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not.

        It returns a DataFrame that shows, for each in-sample country, the estimated revenues that could be collected
        from a global minimum tax without any carve-outs and with carve-outs of 5%, 7.5% and 10% of tangible assets and
        payroll combined.
        """

        # We instantiate a TaxDeficitCalculator object without carve-outs
        calculator = TaxDeficitCalculator()

        calculator.load_clean_data()

        # We use it to compute revenue gains without any carve-out
        td_no_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

        td_no_carve_out.rename(
            columns={
                'tax_deficit': 'tax_deficit_no_carve_out'
            },
            inplace=True
        )

        # A copy of the resulting DataFrame will be used as a central table to which we add the relevant columns
        merged_df = td_no_carve_out.copy()

        # We iterate over carve-out rates
        for carve_out_rate in [5, 7.5, 10]:
            actual_rate = carve_out_rate / 100

            # We instantiate a TaxDeficitCalculator object with carve-outs at the rate considered
            calculator = TaxDeficitCalculator(
                carve_outs=True, carve_out_rate=actual_rate,
                depreciation_only=False, exclude_inventories=exclude_inventories
            )
            calculator.load_clean_data()

            # We use it to compute revenue gains with substance-based carve-outs being applied
            td_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

            # We add the tax deficits thereby computed to the central table
            merged_df = merged_df.merge(
                td_carve_out[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
                how='left',
                on='Parent jurisdiction (alpha-3 code)'
            )

            merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

            merged_df.rename(
                columns={
                    'tax_deficit': f'tax_deficit_{carve_out_rate}_carve_out'
                },
                inplace=True
            )

        # We only display EU or CbCR-reporting countries
        cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

        mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

        # This condition is encapsulated in the following boolean indexing mask
        mask = np.logical_or(mask_eu, mask_cbcr)

        # We add two useful indicator variables
        merged_df['IS_EU'] = mask_eu * 1
        merged_df['REPORTS_CbCR'] = mask_cbcr * 1

        # And we restrict the DataFrame to relevant countries
        restricted_df = merged_df[mask].copy()

        # We finalise the formatting of the table
        restricted_df.sort_values(
            by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
            ascending=[False, True],
            inplace=True
        )

        # And eventually return the DataFrame
        return restricted_df.copy()

Methods

def assess_carve_out_impact(self, minimum_ETR=0.25)

This function takes as input a minimum effective tax rate (which defaults to 25%) and outputs a DataFrame showing, for each in-sample country (EU and/or CbCR-reporting countries):

  • the tax deficit that it could collect by imposing this minimum ETR on the profits of its multinationals;
  • the split between domestic, tax haven and non-haven tax deficits;
  • and the same amounts with carve-outs being applied.

Carve-outs are applied with the parameters (carve-out rate, use of the full value of tangible assets or of de- preciation expenses only and exclusion of inventories or not) that are defined when instantiating the TaxDefi- citCalculator object.

Expand source code
def assess_carve_out_impact(self, minimum_ETR=0.25):
    """
    This function takes as input a minimum effective tax rate (which defaults to 25%) and outputs a DataFrame
    showing, for each in-sample country (EU and/or CbCR-reporting countries):

    - the tax deficit that it could collect by imposing this minimum ETR on the profits of its multinationals;
    - the split between domestic, tax haven and non-haven tax deficits;
    - and the same amounts with carve-outs being applied.

    Carve-outs are applied with the parameters (carve-out rate, use of the full value of tangible assets or of de-
    preciation expenses only and exclusion of inventories or not) that are defined when instantiating the TaxDefi-
    citCalculator object.
    """

    # If carve-out parameters have not been indicated, we cannot run the computations
    if self.carve_out_rate is None or self.depreciation_only is None or self.exclude_inventories is None:
        raise Exception(
            'If you want to simulate substance-based carve-outs, you need to indicate all the parameters.'
        )

    # We instantiate a TaxDeficitCalculator object with carve-outs
    calculator = TaxDeficitCalculator(
        carve_outs=True,
        carve_out_rate=self.carve_out_rate,
        depreciation_only=self.depreciation_only,
        exclude_inventories=self.exclude_inventories
    )

    # We load the data
    calculator.load_clean_data()

    # And deduce total tax deficits and their split, with carve-outs being applied
    carve_outs = calculator.compute_all_tax_deficits(
        CbCR_reporting_countries_only=False,
        minimum_ETR=minimum_ETR
    )

    # We instantiate a TaxDeficitCalculator object without carve-outs
    calculator_no_carve_out = TaxDeficitCalculator()

    # We load the data
    calculator_no_carve_out.load_clean_data()

    # And deduce total tax deficits and their split, without any carve-out being applied
    no_carve_outs = calculator_no_carve_out.compute_all_tax_deficits(
        CbCR_reporting_countries_only=False,
        minimum_ETR=minimum_ETR
    )

    # We merge the two DataFrames
    carve_outs_impact = carve_outs.merge(
        no_carve_outs,
        how='inner',
        on=[
            'Parent jurisdiction (whitespaces cleaned)',
            'Parent jurisdiction (alpha-3 code)'
        ]
    ).rename(
        columns={
            'tax_deficit_x': 'TD_with_carve_outs',
            'tax_deficit_y': 'TD_no_carve_outs',
            'tax_deficit_x_domestic_x': 'domestic_TD_with_carve_outs',
            'tax_deficit_x_domestic_y': 'domestic_TD_no_carve_outs',
            'tax_deficit_x_non_haven_x': 'non_haven_TD_with_carve_outs',
            'tax_deficit_x_non_haven_y': 'non_haven_TD_no_carve_outs',
            'tax_deficit_x_tax_haven_x': 'tax_haven_TD_with_carve_outs',
            'tax_deficit_x_tax_haven_y': 'tax_haven_TD_no_carve_outs'
        }
    )

    # We only show EU and/or CbCR-reporting countries
    cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

    mask_eu = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
    mask_cbcr = carve_outs_impact['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

    # This condition is encapsulated in this boolean indexing mask
    mask = np.logical_or(mask_eu, mask_cbcr)

    # We add two useful indicator variables
    carve_outs_impact['IS_EU'] = mask_eu * 1
    carve_outs_impact['REPORTS_CbCR'] = mask_cbcr * 1

    # And restrict the DataFrame to relevant countries
    restricted_df = carve_outs_impact[mask].copy()

    # We finalise the formatting of the table
    restricted_df.sort_values(
        by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
        ascending=[False, True],
        inplace=True
    )

    columns = [
        'Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)',
        'TD_with_carve_outs', 'TD_no_carve_outs', 'domestic_TD_with_carve_outs', 'domestic_TD_no_carve_outs',
        'non_haven_TD_with_carve_outs', 'non_haven_TD_no_carve_outs', 'tax_haven_TD_with_carve_outs',
        'tax_haven_TD_no_carve_outs', 'IS_EU', 'REPORTS_CbCR'
    ]

    return restricted_df[columns].copy()
def assess_carve_out_impact_formatted(self, minimum_ETR=0.25)

This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the table on the "Substance-based carve-outs" page. It takes as input the selected minimum ETR and widely relies on the assess_carve_out_impact method defined above. It mostly consists in a series of formatting steps.

Expand source code
def assess_carve_out_impact_formatted(self, minimum_ETR=0.25):
    """
    This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
    table on the "Substance-based carve-outs" page. It takes as input the selected minimum ETR and widely relies on
    the assess_carve_out_impact method defined above. It mostly consists in a series of formatting steps.
    """
    df = self.assess_carve_out_impact(minimum_ETR=minimum_ETR)

    df.sort_values(
        by='Parent jurisdiction (whitespaces cleaned)',
        inplace=True
    )

    mask_eu = df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)

    df = df[['Parent jurisdiction (whitespaces cleaned)', 'TD_no_carve_outs', 'TD_with_carve_outs']].copy()

    dict_df = df.to_dict()

    dict_df[df.columns[0]][len(df) + 1] = 'Total - EU27'
    dict_df[df.columns[1]][len(df) + 1] = df[mask_eu]['TD_no_carve_outs'].sum()
    dict_df[df.columns[2]][len(df) + 1] = df[mask_eu]['TD_with_carve_outs'].sum()

    dict_df[df.columns[0]][len(df) + 2] = 'Total - Whole sample'
    dict_df[df.columns[1]][len(df) + 2] = df['TD_no_carve_outs'].sum()
    dict_df[df.columns[2]][len(df) + 2] = df['TD_with_carve_outs'].sum()

    df = pd.DataFrame.from_dict(dict_df)

    df['Change in % of revenue gains without carve-outs'] = (
        (df['TD_with_carve_outs'] - df['TD_no_carve_outs']) / df['TD_no_carve_outs']
    ) * 100

    df.rename(
        columns={
            'TD_no_carve_outs': 'Collectible tax deficit without carve-outs (€m)',
            'TD_with_carve_outs': 'Collectible tax deficit with carve-outs (€m)'
        },
        inplace=True
    )

    for column_name in df.columns[1:-1]:
        df[column_name] /= 10**6
        df[column_name] = df[column_name].map('{:,.0f}'.format)

    df[df.columns[-1]] = df[df.columns[-1]].map('{:.1f}'.format)

    return df.copy()
def check_appendix_A2(self)

Relying on the get_total_tax_deficits method and on TWZ data on corporate income tax revenues, this method out- puts a DataFrame that can be compared with the first 4 columns of Table A2 in the report. For each in-sample country and at four different minimum ETRs (15%, 21%, 25% and 30% which are the four main cases considered in the report), the table presents estimated revenue gains as a percentage of currently corporate income taxes.

Expand source code
def check_appendix_A2(self):
    """
    Relying on the get_total_tax_deficits method and on TWZ data on corporate income tax revenues, this method out-
    puts a DataFrame that can be compared with the first 4 columns of Table A2 in the report. For each in-sample
    country and at four different minimum ETRs (15%, 21%, 25% and 30% which are the four main cases considered in
    the report), the table presents estimated revenue gains as a percentage of currently corporate income taxes.
    """

    # We need to have previously loaded and cleaned the TWZ data on corporate income tax revenues
    # (figures in the pre-loaded DataFrame are provided in 2016 USD)
    if self.twz_CIT is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    # We compute total tax deficits, first at a 15% minimum ETR and in 2021 EUR
    df = self.get_total_tax_deficits(minimum_ETR=0.15)

    df.rename(columns={'tax_deficit': 'tax_deficit_15'}, inplace=True)

    # We merge the two DataFrames to combine information on collectible tax deficits and current CIT revenues
    merged_df = df.merge(
        self.twz_CIT,
        how='left',
        left_on='Parent jurisdiction (alpha-3 code)',
        right_on='Country (alpha-3 code)'
    ).drop(columns=['Country', 'Country (alpha-3 code)'])

    # We bring back the tax deficit estimated to 2016 USD (from 2021 EUR)
    merged_df['tax_deficit_15'] /= (merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100)

    # For the 3 other rates considered in the output table
    for rate in [0.21, 0.25, 0.3]:
        # We compute total tax deficits at this rate
        df = self.get_total_tax_deficits(minimum_ETR=rate)

        # We add these results to the central DataFrame thanks to a merge operation
        merged_df = merged_df.merge(
            df,
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        # We impute the missing values produced by the merge
        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        # We rename the newly-added tax deficit column
        merged_df.rename(
            columns={'tax_deficit': f'tax_deficit_{int(rate * 100)}'},
            inplace=True
        )

        # And we bring it back to 2016 USD
        merged_df[f'tax_deficit_{int(rate * 100)}'] /= (
            merged_df['CIT revenue'] * self.multiplier_2021 * self.USD_to_EUR_2016 / 100
        )

    # We want to also verify the EU-27 average and restrict the DataFrame to these countries
    eu_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)].copy()

    # This attribute stores the average EU-27 revenue gain estimate in % of current CIT revenues for each of the 4
    # minimum ETRs of interest (respectively 15.1%, 30.5%, 52.3% and 84.1% in the report)
    self.check = [
        (
            eu_df[f'tax_deficit_{rate}'] * eu_df['CIT revenue'] / 100
        ).sum() / eu_df['CIT revenue'].sum() for rate in [15, 21, 25, 30]
    ]

    # Coming back to the DataFrame with all in-sample countries, we only keep the relevant columns and output it
    merged_df = merged_df[
        [
            'Parent jurisdiction (whitespaces cleaned)_x',
            'tax_deficit_15', 'tax_deficit_21', 'tax_deficit_25', 'tax_deficit_30'
        ]
    ].copy()

    # NB: in the current version of this method, the successive merges have a poor effect on the "Total" rows that
    # are included in the output of the get_total_tax_deficits method; this could easily be improved

    return merged_df.copy()
def check_tax_deficit_computations(self, minimum_ETR=0.25)

Taking the selected minimum ETR as input and relying on the compute_all_tax_deficits method defined above, this method outputs a DataFrame that can be compared with Table A1 of the report. For each country in OECD and/or TWZ data, it displays its total tax deficit and a breakdown into domestic, tax-haven-based and non-haven tax defi- cits. Figures are display in 2021 billion EUR.

Expand source code
def check_tax_deficit_computations(self, minimum_ETR=0.25):
    """
    Taking the selected minimum ETR as input and relying on the compute_all_tax_deficits method defined above, this
    method outputs a DataFrame that can be compared with Table A1 of the report. For each country in OECD and/or TWZ
    data, it displays its total tax deficit and a breakdown into domestic, tax-haven-based and non-haven tax defi-
    cits. Figures are display in 2021 billion EUR.
    """

    # We start from the output of the previously defined method
    df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

    # And convert numeric columns from 2021 EUR to 2021 billion EUR
    for column_name in df.columns[2:]:
        df[column_name] = df[column_name] / 10**9

    return df.copy()
def check_unilateral_scenario_gain_computations(self, minimum_ETR=0.25)

Taking as input the selected minimum effective tax rate and relying on the compute_unilateral_scenario_gain, this method outputs a DataFrame that can be compared with the Table 3 of the report. For each country that is part of the EU-27 and/or included in the 2016 aggregated and anonymized CbCR data of the OECD, it shows the to- tal corporate tax revenue gain that could be drawn from the unilateral implementation of the tax deficit col- lection. It also provides a breakdown of this total between the tax deficit of the country's own multinationals, the amount that could be collected from US multinationals and revenues that could be collected from non-US ones.

Expand source code
def check_unilateral_scenario_gain_computations(self, minimum_ETR=0.25):
    """
    Taking as input the selected minimum effective tax rate and relying on the compute_unilateral_scenario_gain,
    this method outputs a DataFrame that can be compared with the Table 3 of the report. For each country that is
    part of the EU-27 and/or included in the 2016 aggregated and anonymized CbCR data of the OECD, it shows the to-
    tal corporate tax revenue gain that could be drawn from the unilateral implementation of the tax deficit col-
    lection. It also provides a breakdown of this total between the tax deficit of the country's own multinationals,
    the amount that could be collected from US multinationals and revenues that could be collected from non-US ones.
    """

    # We build the list of countries that we want to include in the output table
    country_list = self.get_total_tax_deficits()

    country_list = country_list[
        ~country_list['Parent jurisdiction (whitespaces cleaned)'].isin(['Total - EU27', 'Total - Whole sample'])
    ].copy()

    country_list = list(country_list['Parent jurisdiction (whitespaces cleaned)'].values)

    # We prepare the structure of the output first as a dictionary
    output = {
        'Country': country_list,
        'Own tax deficit': [],
        'Collection of US tax deficit': [],
        'Collection of non-US tax deficit': [],
        'Imputation': [],
        'Total': []
    }

    # We iterate over the list of relevant countries
    for country in country_list:

        # Using the method defined above, we output the table presenting the tax deficit that could be collected
        # from a unilateral implementation of the tax deficit collection by the considered country and its origin
        df = self.compute_unilateral_scenario_gain(
            country=country,
            minimum_ETR=minimum_ETR
        )

        column_name = f'Collectible tax deficit for {country}'

        if country in df['Parent jurisdiction (whitespaces cleaned)'].unique():
            # We fetch the tax deficit that could be collected from the country's own multinationals
            output['Own tax deficit'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == country][column_name].iloc[0]
            )

        else:
            output['Own tax deficit'].append(0)

        # We fetch the tax deficit that could be collected from US multinationals
        if 'United States' in df['Parent jurisdiction (whitespaces cleaned)'].values:
            output['Collection of US tax deficit'].append(
                df[df['Parent jurisdiction (whitespaces cleaned)'] == 'United States'][column_name].iloc[0]
            )
        else:
            output['Collection of US tax deficit'].append(0)

        # We fetch the tax deficit that was imputed following our methodology
        output['Imputation'].append(
            df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Others (imputation)'][column_name].iloc[0]
        )

        # We fetch the total tax deficit
        output['Total'].append(
            df[df['Parent jurisdiction (whitespaces cleaned)'] == 'Total'][column_name].iloc[0]
        )

        # And finally, we sum the tax deficits collected from foreign non-US multinationals
        output['Collection of non-US tax deficit'].append(
            df[
                ~df['Parent jurisdiction (whitespaces cleaned)'].isin(
                    [
                        country, 'United States', 'Total', 'Others (imputation)'
                    ]
                )
            ][column_name].sum()
        )

    # We convert the dictionary into a DataFrame
    df = pd.DataFrame.from_dict(output)

    # We sum the imputation and the tax deficit collected from foreign, non-US multinationals to obtain the uprated
    # figures that correspond to the "Other foreign firms" column of Table 3 in the report
    df['Collection of non-US tax deficit (uprated with imputation)'] = \
        df['Imputation'] + df['Collection of non-US tax deficit']

    # We convert the results from 2021 EUR into 2021 billion EUR
    for column_name in df.columns[1:]:
        df[column_name] /= 10**9

    return df.copy()
def combine_haven_tax_deficits(self, row, carve_outs=False, countries_replaced=None)

This function is used to compute the tax deficit of all in-sample headquarter countries in the multilateral im- plementation scenario.

For parent countries that are in both the OECD and TWZ data, we have two different sources to compute their tax- haven-based tax deficit and we retain the highest of these two amounts.

Besides, for parent countries in the OECD data that do not report a fully detailed country-by-country breakdown of the activity of their multinationals, we cannot distinguish their tax-haven and non-haven tax deficits. Quite arbitrarily in the Python code, we attribute everything to the non-haven tax deficit. In the Table A1 of the re- port, these specific cases are described with the "Only foreign aggregate data" column.

Expand source code
def combine_haven_tax_deficits(
    self,
    row,
    carve_outs=False,
    countries_replaced=None
):
    """
    This function is used to compute the tax deficit of all in-sample headquarter countries in the multilateral im-
    plementation scenario.

    For parent countries that are in both the OECD and TWZ data, we have two different sources to compute their tax-
    haven-based tax deficit and we retain the highest of these two amounts.

    Besides, for parent countries in the OECD data that do not report a fully detailed country-by-country breakdown
    of the activity of their multinationals, we cannot distinguish their tax-haven and non-haven tax deficits. Quite
    arbitrarily in the Python code, we attribute everything to the non-haven tax deficit. In the Table A1 of the re-
    port, these specific cases are described with the "Only foreign aggregate data" column.
    """
    if carve_outs and countries_replaced is None:
        raise Exception(
            'Using this function under carve-outs requires to indicate a list of countries to replace.'
        )

    if row['Parent jurisdiction (alpha-3 code)'] not in (
        COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
    ):
        if countries_replaced is None:

            if row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']:
                self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                return row['tax_deficit_x_tax_haven_TWZ']

            else:
                return row['tax_deficit_x_tax_haven']

        else:
            if (
                row['tax_deficit_x_tax_haven_TWZ'] > row['tax_deficit_x_tax_haven']
                and row['Parent jurisdiction (alpha-3 code)'] in countries_replaced
            ):
                self.countries_replaced.append(row['Parent jurisdiction (alpha-3 code)'])
                return row['tax_deficit_x_tax_haven_TWZ']

            else:
                return row['tax_deficit_x_tax_haven']

    else:
        return 0
def compute_all_tax_deficits(self, minimum_ETR=0.25, CbCR_reporting_countries_only=False)

This method encapsulates most of the computations for the multilateral agreement scenario.

Taking as input the minimum effective tax rate to apply and based on OECD and TWZ data, it outputs a DataFrame which presents, for each country in our sample (countries in OECD and/or TWZ data) the total tax deficit, as well as its breakdown into domestic, tax-haven and non-haven tax deficits.

The output is in 2021 EUR after a currency conversion and the extrapolation from 2016 to 2021 figures.

Expand source code
def compute_all_tax_deficits(self, minimum_ETR=0.25, CbCR_reporting_countries_only=False):
    """
    This method encapsulates most of the computations for the multilateral agreement scenario.

    Taking as input the minimum effective tax rate to apply and based on OECD and TWZ data, it outputs a DataFrame
    which presents, for each country in our sample (countries in OECD and/or TWZ data) the total tax deficit, as
    well as its breakdown into domestic, tax-haven and non-haven tax deficits.

    The output is in 2021 EUR after a currency conversion and the extrapolation from 2016 to 2021 figures.
    """
    # We need to have previously loaded and cleaned the OECD and TWZ data
    if self.oecd is None or self.twz is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    # We use the method defined above and will use its output as a base for the following computations
    oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

    twz = self.twz.copy()

    # From TWZ data on profits registered in tax havens and assuming that these are taxed at a given minimum ETR
    # (10% in the report, see the instantiation function for the definition of this attribute), we deduce the tax-
    # haven-based tax deficit of TWZ countries
    twz['tax_deficit_x_tax_haven_TWZ'] = \
        twz['Profits in all tax havens (positive only)'] * (minimum_ETR - self.assumed_haven_ETR_TWZ)

    # --- Managing countries in both OECD and TWZ data

    # We focus on parent countries which are in both the OECD and TWZ data
    # NB: recall that we do not consider the Swedish CbCR
    twz_in_oecd = twz[twz['Is parent in OECD data?'].astype(bool)].copy()

    # We merge the two DataFrames on country codes
    merged_df = oecd_stratified.merge(
        twz_in_oecd[['Country', 'Alpha-3 country code', 'tax_deficit_x_tax_haven_TWZ']],
        how='left',
        left_on='Parent jurisdiction (alpha-3 code)',
        right_on='Alpha-3 country code'
    ).drop(columns=['Country', 'Alpha-3 country code'])

    # For countries that are in the OECD data but not in TWZ, we impute a tax-haven-based tax deficit from TWZ of 0
    merged_df['tax_deficit_x_tax_haven_TWZ'] = merged_df['tax_deficit_x_tax_haven_TWZ'].fillna(0)

    self.countries_replaced = []

    if self.carve_outs:

        calculator = TaxDeficitCalculator()
        calculator.load_clean_data()
        _ = calculator.compute_all_tax_deficits()

        countries_replaced = calculator.countries_replaced.copy()

        merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
            lambda row: self.combine_haven_tax_deficits(
                row,
                carve_outs=self.carve_outs,
                countries_replaced=countries_replaced),
            axis=1
        )

    else:
        merged_df['tax_deficit_x_tax_haven_merged'] = merged_df.apply(
            lambda row: self.combine_haven_tax_deficits(
                row,
                carve_outs=self.carve_outs
            ),
            axis=1
        )

    self.countries_replaced = merged_df[
        merged_df['tax_deficit_x_tax_haven_merged'] == merged_df['tax_deficit_x_tax_haven_TWZ']
    ]['Parent jurisdiction (alpha-3 code)'].unique()

    merged_df.drop(columns=['tax_deficit_x_tax_haven', 'tax_deficit_x_tax_haven_TWZ'], inplace=True)

    merged_df.rename(
        columns={
            'tax_deficit_x_tax_haven_merged': 'tax_deficit_x_tax_haven'
        },
        inplace=True
    )

    # Summing the tax-haven-based, non-haven and domestic tax deficits yields the total tax deficit of each country
    merged_df['tax_deficit'] = merged_df['tax_deficit_x_tax_haven'] \
        + merged_df['tax_deficit_x_domestic'] \
        + merged_df['tax_deficit_x_non_haven']

    # --- Countries only in the TWZ data

    # We now focus on countries that are absent from the OECD data
    # NB: recall that we do not consider the Swedish CbCR
    twz_not_in_oecd = twz[~twz['Is parent in OECD data?'].astype(bool)].copy()

    twz_not_in_oecd.drop(
        columns=['Profits in all tax havens', 'Profits in all tax havens (positive only)'],
        inplace=True
    )

    # - Extrapolating the foreign non-haven tax deficit

    # We compute the imputation ratio with the method defined above
    imputation_ratio_non_haven = self.get_non_haven_imputation_ratio(minimum_ETR=minimum_ETR)

    # And we deduce the non-haven tax deficit of countries that are only found in TWZ data
    twz_not_in_oecd['tax_deficit_x_non_haven'] = \
        twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] * imputation_ratio_non_haven

    # - Computing the domestic tax deficit

    # For countries that are only in TWZ data, we still need to compute their domestic tax deficit
    twz_domestic = self.twz_domestic.copy()

    # We only consider countries whose domestic ETR is stricly below the minimum ETR
    # (otherwise, there is no tax deficit to collect from domestic profits)
    twz_domestic = twz_domestic[twz_domestic['Domestic ETR'] < minimum_ETR].copy()

    # We compute the ETR differential
    twz_domestic['ETR_differential'] = twz_domestic['Domestic ETR'].map(lambda x: minimum_ETR - x)

    # And deduce the domestic tax deficit of each country
    twz_domestic['tax_deficit_x_domestic'] = twz_domestic['ETR_differential'] * twz_domestic['Domestic profits']

    # - Combining the different forms of tax deficit

    # We merge the two DataFrames to complement twz_not_in_oecd with domestic tax deficit results
    twz_not_in_oecd = twz_not_in_oecd.merge(
        twz_domestic[['Alpha-3 country code', 'tax_deficit_x_domestic']],
        how='left',
        on='Alpha-3 country code'
    )

    # As we filtered out countries whose domestic ETR is stricly below the minimum ETR, some missing values
    # appear during the merge; we impute 0 for these as they do not have any domestic tax deficit to collect
    twz_not_in_oecd['tax_deficit_x_domestic'] = twz_not_in_oecd['tax_deficit_x_domestic'].fillna(0)

    # We deduce the total tax deficit for each country
    twz_not_in_oecd['tax_deficit'] = twz_not_in_oecd['tax_deficit_x_tax_haven_TWZ'] \
        + twz_not_in_oecd['tax_deficit_x_domestic'] \
        + twz_not_in_oecd['tax_deficit_x_non_haven']

    # --- Merging the results of the two data sources

    # We need columns to match for the concatenation to operate smoothly
    twz_not_in_oecd.rename(
        columns={
            'Country': 'Parent jurisdiction (whitespaces cleaned)',
            'Alpha-3 country code': 'Parent jurisdiction (alpha-3 code)',
            'tax_deficit_x_tax_haven_TWZ': 'tax_deficit_x_tax_haven'
        },
        inplace=True
    )

    twz_not_in_oecd.drop(columns=['Is parent in OECD data?'], inplace=True)

    # We exclude Sweden from the OECD-drawn results, as we do not consider its CbCR
    merged_df = merged_df[merged_df['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

    # We eventually concatenate the two DataFrames
    merged_df = pd.concat(
        [merged_df, twz_not_in_oecd],
        axis=0
    )

    # --- Extrapolations to 2021 EUR

    # We convert 2016 USD results in 2016 EUR and extraprolate them to 2021 EUR
    for column_name in merged_df.columns[2:]:
        merged_df[column_name] = merged_df[column_name] * self.USD_to_EUR_2016 * self.multiplier_2021

    # --- Managing the case where the minimum ETR is 20% or below for TWZ countries

    # As mentioned above and detailed in Appendix A, the imputation of the non-haven tax deficit of TWZ countries
    # follows a specific process whenever the chosen minimum ETR is of or below 20%
    if minimum_ETR <= 0.2 and self.alternative_imputation:
        # We get the new multiplying factor from the method defined above
        multiplying_factor = self.get_alternative_non_haven_factor(minimum_ETR=minimum_ETR)

        # We compute all tax deficits at the reference rate (25% in the report)
        df = self.compute_all_tax_deficits(
            minimum_ETR=self.reference_rate_for_alternative_imputation
        )

        # We only consider countries that are absent from the OECD data, except Sweden as usual
        oecd_reporting_countries_but_SWE = self.oecd[
            self.oecd['Parent jurisdiction (alpha-3 code)'] != 'SWE'
        ]['Parent jurisdiction (alpha-3 code)'].unique()

        df = df[
            ~df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries_but_SWE)
        ].copy()

        # For these countries, we multiply the non-haven tax deficit at the reference rate by the multiplying factor
        df['tax_deficit_x_non_haven_imputation'] = df['tax_deficit_x_non_haven'] * multiplying_factor

        # We save the results in a dictionary that will allow to map the DataFrame that we want to output in the end
        mapping = {}

        for _, row in df.iterrows():
            mapping[row['Parent jurisdiction (alpha-3 code)']] = row['tax_deficit_x_non_haven_imputation']

        # We create a new column in the to-be-output DataFrame which takes as value:
        # - the non-haven tax deficit estimated just above for TWZ countries
        # - 0 for OECD-reporting countries, which do not require this imputation
        merged_df['tax_deficit_x_non_haven_imputation'] = merged_df['Parent jurisdiction (alpha-3 code)'].map(
            lambda country_code: mapping.get(country_code, 0)
        )

        # We deduce the non-haven tax deficit of all countries
        merged_df['tax_deficit_x_non_haven'] += merged_df['tax_deficit_x_non_haven_imputation']

        # And add this imputation also to the column that presents the total tax deficit of each country
        merged_df['tax_deficit'] += merged_df['tax_deficit_x_non_haven_imputation']

        merged_df.drop(
            columns=['tax_deficit_x_non_haven_imputation'],
            inplace=True
        )

    if CbCR_reporting_countries_only:
        merged_df = merged_df[
            merged_df['Parent jurisdiction (whitespaces cleaned)'].isin(
                self.oecd['Parent jurisdiction (whitespaces cleaned)'].unique()
            )
        ].copy()

    return merged_df.reset_index(drop=True).copy()
def compute_intermediary_scenario_gain(self, minimum_ETR=0.25)

This method encapsulates the computations used to estimate the corporate tax revenue gains of EU countries, should the European Union implement the tax deficit collection as a block. This corresponds therefore to the partial cooperation scenario described in the report.

Taking as input the selected minimum effective tax rate, this method outputs a DataFrame that presents for each in-sample EU-27 country:

  • the corporate tax revenue gains that could be collected from its own multinationals ("tax_deficit" column);
  • the tax deficit that could be collected from foreign, non-EU multinationals ("From foreign MNEs" column);
  • and the resulting total corporate tax revenue gain.

All figures are output in 2021 million EUR.

The three lines at the end of the DataFrame are a bit specific. Some OECD-reporting contries do not provide a perfectly detailed country-by-country report and for these, the "Other Europe" and "Europe" fields are assumed to be related to EU countries and are included in the total collectible tax deficit. The final line presents this total.

The methogology behind these computations is described in much more details in Appendix C of the report.

Expand source code
def compute_intermediary_scenario_gain(self, minimum_ETR=0.25):
    """
    This method encapsulates the computations used to estimate the corporate tax revenue gains of EU countries,
    should the European Union implement the tax deficit collection as a block. This corresponds therefore to the
    partial cooperation scenario described in the report.

    Taking as input the selected minimum effective tax rate, this method outputs a DataFrame that presents for each
    in-sample EU-27 country:

    - the corporate tax revenue gains that could be collected from its own multinationals ("tax_deficit" column);
    - the tax deficit that could be collected from foreign, non-EU multinationals ("From foreign MNEs" column);
    - and the resulting total corporate tax revenue gain.

    All figures are output in 2021 million EUR.

    The three lines at the end of the DataFrame are a bit specific. Some OECD-reporting contries do not provide a
    perfectly detailed country-by-country report and for these, the "Other Europe" and "Europe" fields are assumed
    to be related to EU countries and are included in the total collectible tax deficit. The final line presents
    this total.

    The methogology behind these computations is described in much more details in Appendix C of the report.
    """

    # We start by computing the total tax deficits of all in-sample countries (those of the multilateral scenario)
    tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

    oecd = self.oecd.copy()

    # We extract the total tax deficit for the EU-27
    eu_27_tax_deficit = tax_deficits[
        tax_deficits['Parent jurisdiction (whitespaces cleaned)'] == 'Total - EU27'
    ]['tax_deficit'].iloc[0]

    # And we store in a separate DataFrame the tax deficits of EU-27 countries
    eu_27_tax_deficits = tax_deficits[
        tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
            eu_27_country_codes
        )
    ].copy()

    # We focus only on a few non-EU countries, defined when the TaxDeficitCalculator object is instantiated
    tax_deficits = tax_deficits[
        tax_deficits['Parent jurisdiction (alpha-3 code)'].isin(
            self.country_list_intermediary_scenario
        )
    ].copy()

    # We store the results in a dictionary, which we will map upon the eu_27_tax_deficits DataFrame
    additional_revenue_gains = {}

    # We iterate over EU-27 countries and compute for eacht he tax deficit collected from non-EU multinationals
    for eu_country in eu_27_country_codes:

        td_df = tax_deficits.copy()

        # This dictionary will store the attribution ratios based on extra-group sales to be mapped upon td_df
        attribution_ratios = {}

        # We iterate over non-EU countries in our list
        for country in self.country_list_intermediary_scenario:

            oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

            # We fetch the extra-group sales registered by the non-EU country's multinationals in the EU-27 country
            # (defaults to 0 if the EU-27 country is not among the partners of the non-EU country)
            sales_in_eu_country = oecd_restricted[
                oecd_restricted['Partner jurisdiction (alpha-3 code)'] == eu_country
            ]['Unrelated Party Revenues'].sum()

            # We compute the total extra-group sales registered by the non-EU country's multinationals worldwide
            total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

            # We deduce the share of the non-EU country's tax deficit attributable to the EU-27 country
            attribution_ratios[country] = sales_in_eu_country / total_sales

        # We map the attribution_ratios dictionary upon the td_df DataFrame
        td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

        # We deduce, for each non-EU country, the amount of its tax deficit that is collected by the EU-27 country
        td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

        # We sum all these and multiply the total by 2 to estimate the total tax deficit that the EU-27 country
        # could collect from non-EU multinationals
        additional_revenue_gains[eu_country] = td_df['Collectible tax deficit'].sum() * 2

        # NB: the multiplication by 2 corresponds to the imputation strategy defined in Appendix C of the report

    # We map the resulting dictionary upon the eu_27_tax_deficits DataFrame
    eu_27_tax_deficits['From foreign MNEs'] = eu_27_tax_deficits['Parent jurisdiction (alpha-3 code)'].map(
        additional_revenue_gains
    )

    # And deduce total corporate tax revenue gains from such a scenario for all EU-27 countries
    eu_27_tax_deficits['Total'] = (
        eu_27_tax_deficits['tax_deficit'] + eu_27_tax_deficits['From foreign MNEs']
    )

    # We operate a similar process for "Europe" and "Other Europe" field
    additional_revenue_gains = {}

    for aggregate in ['Europe', 'Other Europe']:

        td_df = tax_deficits.copy()

        attribution_ratios = {}

        for country in self.country_list_intermediary_scenario:

            # We do not consider the "Other Europe" field in the US CbCR as it probably does not correspond to
            # activities operated in EU-27 countries (sufficient country-by-country breakdown to exclude this)
            if country == 'USA':
                attribution_ratios[country] = 0

                continue

            oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country].copy()

            sales_in_europe_or_other_europe = oecd_restricted[
                oecd_restricted['Partner jurisdiction (whitespaces cleaned)'] == aggregate
            ]['Unrelated Party Revenues'].sum()

            total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

            attribution_ratios[country] = sales_in_europe_or_other_europe / total_sales

        td_df['Attribution ratios'] = td_df['Parent jurisdiction (alpha-3 code)'].map(attribution_ratios)

        td_df['Collectible tax deficit'] = td_df['Attribution ratios'] * td_df['tax_deficit']

        additional_revenue_gains[aggregate] = td_df['Collectible tax deficit'].sum()

    # We drop unnecessary columns
    eu_27_tax_deficits.drop(
        columns=['Parent jurisdiction (alpha-3 code)'],
        inplace=True
    )

    # And we operate very inelegant transformations of the DataFrame to add the "Other Europe", "Europe" and "Total"
    # fields at the bottom of the DataFrame
    eu_27_tax_deficits.reset_index(drop=True, inplace=True)

    dict_df = eu_27_tax_deficits.to_dict()

    dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits)] = 'Other Europe'
    dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits)] = 0
    dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']
    dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits)] = additional_revenue_gains['Other Europe']

    dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 1] = 'Europe'
    dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 1] = 0
    dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']
    dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 1] = additional_revenue_gains['Europe']

    # Here we compute total corporate tax revenue gains for EU-27 countries
    # NB: We have not multiplied the "Other Europe" and "Europe" fields by 2 (no imputation for these)
    total_additional_revenue_gain = eu_27_tax_deficits['From foreign MNEs'].sum() \
        + additional_revenue_gains['Europe'] \
        + additional_revenue_gains['Other Europe']

    dict_df[eu_27_tax_deficits.columns[0]][len(eu_27_tax_deficits) + 2] = 'Total'
    dict_df[eu_27_tax_deficits.columns[1]][len(eu_27_tax_deficits) + 2] = eu_27_tax_deficit
    dict_df[eu_27_tax_deficits.columns[2]][len(eu_27_tax_deficits) + 2] = total_additional_revenue_gain
    dict_df[eu_27_tax_deficits.columns[3]][len(eu_27_tax_deficits) + 2] = \
        eu_27_tax_deficit + total_additional_revenue_gain

    eu_27_tax_deficits = pd.DataFrame.from_dict(dict_df)

    # We convert 2021 EUR figures into 2021 billion EUR
    for column_name in eu_27_tax_deficits.columns[1:]:
        eu_27_tax_deficits[column_name] /= 10**6

    return eu_27_tax_deficits.copy()
def compute_unilateral_scenario_gain(self, country, minimum_ETR=0.25)

This method encapsulates most of the computations for the unilateral implementation scenario.

It takes as input:

  • the name of the country assumed to unilaterally implement the tax deficit collection;

  • the minimum effective tax rate that it applies when collecting the full tax deficit of its multinationals and a part of the tax deficit of foreign multinationals, based on the location of their sales.

The output of this method is a DataFrame organized as follows:

  • each row is a headquarter country whose tax deficit would be collected partly or entirely by the taxing coun- try (including the taxing country which collects 100% of the tax deficit of its multinationals);

  • there are two columns, with the name of the headquarter country considered and the tax deficit amount that could be collected from its multinationals by the taxing country.

Figures are presented in 2021 EUR.

Important disclaimer: for now, this method is not robust to variations in the country name, i.e. only country names as presented in the OECD CbCR data will generate a result. These are the country names that are proposed in the selectbox on the online simulator.

The methogology behind these computations is described in much more details in Appendix B of the report.

Expand source code
def compute_unilateral_scenario_gain(self, country, minimum_ETR=0.25):
    """
    This method encapsulates most of the computations for the unilateral implementation scenario.

    It takes as input:

    - the name of the country assumed to unilaterally implement the tax deficit collection;

    - the minimum effective tax rate that it applies when collecting the full tax deficit of its multinationals and
    a part of the tax deficit of foreign multinationals, based on the location of their sales.

    The output of this method is a DataFrame organized as follows:

    - each row is a headquarter country whose tax deficit would be collected partly or entirely by the taxing coun-
    try (including the taxing country which collects 100% of the tax deficit of its multinationals);

    - there are two columns, with the name of the headquarter country considered and the tax deficit amount that
    could be collected from its multinationals by the taxing country.

    Figures are presented in 2021 EUR.

    Important disclaimer: for now, this method is not robust to variations in the country name, i.e. only country
    names as presented in the OECD CbCR data will generate a result. These are the country names that are proposed
    in the selectbox on the online simulator.

    The methogology behind these computations is described in much more details in Appendix B of the report.
    """

    # We start from the total tax deficits of all countries which can be partly re-allocated to the taxing country
    tax_deficits = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

    # The OECD data provides the information of extra-group sales, needed to allocate foreign tax deficits
    oecd = self.oecd.copy()

    # We simply convert the name of the taxing country to the corresponding alpha-3 code
    taxing_country = country
    try:
        taxing_country_code = self.oecd[
            self.oecd['Parent jurisdiction (whitespaces cleaned)'] == taxing_country
        ]['Parent jurisdiction (alpha-3 code)'].iloc[0]
    except:
        taxing_country_code = self.twz[
            self.twz['Country'] == taxing_country
        ]['Alpha-3 country code'].iloc[0]

    # This list will store the allocation ratios (for each headquarter country, the share of its tax deficit that
    # can be collected by the taxing country) computed based on the location of extra-group sales
    attribution_ratios = []

    # We iterate over parent countries in the OECD data
    for country_code in tax_deficits['Parent jurisdiction (alpha-3 code)'].values:

        # The taxing country collects 100% of the tax deficit of its own multinationals
        if country_code == taxing_country_code:
            attribution_ratios.append(1)

        # If the parent country is not the taxing country
        else:
            # We restrict the DataFrame to the CbCR of the considered parent country
            oecd_restricted = oecd[oecd['Parent jurisdiction (alpha-3 code)'] == country_code].copy()

            # If the taxing country is not part of its partner jurisdictions, the attribution ratio is of 0
            if taxing_country_code not in oecd_restricted['Partner jurisdiction (alpha-3 code)'].values:
                attribution_ratios.append(0)

            else:
                # We fetch extra-group sales registered in the taxing country
                mask = (oecd_restricted['Partner jurisdiction (alpha-3 code)'] == taxing_country_code)
                sales_in_taxing_country = oecd_restricted[mask]['Unrelated Party Revenues'].iloc[0]

                # We compute total extra-group sales
                total_sales = oecd_restricted['Unrelated Party Revenues'].sum()

                # We append the resulting ratio to the list of attribution ratios
                attribution_ratios.append(sales_in_taxing_country / total_sales)

    # We add this list to the DataFrame as a new column
    tax_deficits['Attribution ratios'] = attribution_ratios

    # We deduce, for each headquarter country, the tax deficit that could be collected by the taxing country
    tax_deficits[f'Collectible tax deficit for {taxing_country}'] = \
        tax_deficits['tax_deficit'] * tax_deficits['Attribution ratios']

    # We eliminate irrelevant columns
    tax_deficits.drop(
        columns=[
            'Attribution ratios',
            'tax_deficit',
            'Parent jurisdiction (alpha-3 code)'
        ],
        inplace=True
    )

    # We filter out rows for which the collectible tax deficit is 0
    tax_deficits = tax_deficits[tax_deficits[f'Collectible tax deficit for {taxing_country}'] > 0].copy()

    # We sort values based on the resulting tax deficit, in descending order
    tax_deficits.sort_values(
        by=f'Collectible tax deficit for {taxing_country}',
        ascending=False,
        inplace=True
    )

    # Because the OECD data only gather 26 headquarter countries, we need to make an assumption on the tax deficit
    # that could be collected from other parent countries, excluded from the 2016 version of the data

    # We therefore double the tax deficit collected from non-US foreign countries
    imputation = tax_deficits[
        ~tax_deficits['Parent jurisdiction (whitespaces cleaned)'].isin([taxing_country, 'United States'])
    ][f'Collectible tax deficit for {taxing_country}'].sum()

    # Except for Germany, for which we add back only half of the tax deficit collected from non-US foreign countries
    if taxing_country_code == 'DEU':
        imputation /= 2

    tax_deficits.reset_index(drop=True, inplace=True)

    # Again the same inelegant way of adding "Total" fields at the end of the DataFrame
    dict_df = tax_deficits.to_dict()

    dict_df[tax_deficits.columns[0]][len(tax_deficits)] = 'Others (imputation)'
    dict_df[tax_deficits.columns[1]][len(tax_deficits)] = imputation

    dict_df[tax_deficits.columns[0]][len(tax_deficits) + 1] = 'Total'
    dict_df[tax_deficits.columns[1]][len(tax_deficits) + 1] = (
        tax_deficits[tax_deficits.columns[1]].sum() + imputation
    )

    df = pd.DataFrame.from_dict(dict_df)

    return df.copy()
def get_alternative_non_haven_factor(self, minimum_ETR)

Looking at the formula (A2) of Appendix A and at the previous method, we see that for a 15% tax rate, this impu- tation would result in no tax deficit to be collected from non-tax haven jurisdictions. Thus, we correct for this underestimation by computing the ratio of the tax deficit that can be collected in non-tax havens at a 15% and a 25% rate for OECD-reporting countries.

This class method allows to compute this alternative imputation ratio.

The methodology is described in more details in the Appendix A of the report.

Expand source code
def get_alternative_non_haven_factor(self, minimum_ETR):
    """
    Looking at the formula (A2) of Appendix A and at the previous method, we see that for a 15% tax rate, this impu-
    tation would result in no tax deficit to be collected from non-tax haven jurisdictions. Thus, we correct for
    this underestimation by computing the ratio of the tax deficit that can be collected in non-tax havens at a 15%
    and a 25% rate for OECD-reporting countries.

    This class method allows to compute this alternative imputation ratio.

    The methodology is described in more details in the Appendix A of the report.
    """

    # We need to have previously loaded and cleaned the OECD data
    if self.oecd is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    # This method is only useful if the previous one yields a ratio of 0, i.e. if the minimum ETR is of 20% or less
    if minimum_ETR > 0.2:
        raise Exception('These computations are only used when the minimum ETR considered is 0.2 or less.')

    # We use the get_stratified_oecd_data to compute the non-haven tax deficit of OECD-reporting countries
    oecd_stratified = self.get_stratified_oecd_data(
        minimum_ETR=self.reference_rate_for_alternative_imputation
    )

    # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
    df_restricted = oecd_stratified[
        ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
            COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
        )
    ].copy()

    # The denominator is the total non-haven tax deficit of relevant countries at the reference minimum ETR
    denominator = df_restricted['tax_deficit_x_non_haven'].sum()

    # We follow the same process, running computations at the minimum ETR this time
    oecd_stratified = self.get_stratified_oecd_data(minimum_ETR=minimum_ETR)

    # We exclude countries whose CbCR breakdown does not allow to distinguish tax-haven and non-haven profits
    df_restricted = oecd_stratified[
        ~oecd_stratified['Parent jurisdiction (alpha-3 code)'].isin(
            COUNTRIES_WITH_CONTINENTAL_REPORTING + COUNTRIES_WITH_MINIMUM_REPORTING
        )
    ].copy()

    # The numerator is the total non-haven tax deficit of relevant countries at the selected minimum ETR
    numerator = df_restricted['tax_deficit_x_non_haven'].sum()

    return numerator / denominator
def get_carve_outs_rate_table(self, minimum_ETR, depreciation_only, exclude_inventories)

This function takes as inputs:

  • the minimum effective tax rate to apply to multinationals' profits;

  • a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance- based carve-outs to a share of depreciation expenses;

  • a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not.

It returns a DataFrame that shows, for each in-sample country, the estimated revenues that could be collected from a global minimum tax without any carve-outs and with carve-outs of 5%, 7.5% and 10% of tangible assets and payroll combined.

Expand source code
def get_carve_outs_rate_table(
    self,
    minimum_ETR,
    depreciation_only, exclude_inventories,
):
    """
    This function takes as inputs:

    - the minimum effective tax rate to apply to multinationals' profits;

    - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
    based carve-outs to a share of depreciation expenses;

    - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not.

    It returns a DataFrame that shows, for each in-sample country, the estimated revenues that could be collected
    from a global minimum tax without any carve-outs and with carve-outs of 5%, 7.5% and 10% of tangible assets and
    payroll combined.
    """

    # We instantiate a TaxDeficitCalculator object without carve-outs
    calculator = TaxDeficitCalculator()

    calculator.load_clean_data()

    # We use it to compute revenue gains without any carve-out
    td_no_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

    td_no_carve_out.rename(
        columns={
            'tax_deficit': 'tax_deficit_no_carve_out'
        },
        inplace=True
    )

    # A copy of the resulting DataFrame will be used as a central table to which we add the relevant columns
    merged_df = td_no_carve_out.copy()

    # We iterate over carve-out rates
    for carve_out_rate in [5, 7.5, 10]:
        actual_rate = carve_out_rate / 100

        # We instantiate a TaxDeficitCalculator object with carve-outs at the rate considered
        calculator = TaxDeficitCalculator(
            carve_outs=True, carve_out_rate=actual_rate,
            depreciation_only=False, exclude_inventories=exclude_inventories
        )
        calculator.load_clean_data()

        # We use it to compute revenue gains with substance-based carve-outs being applied
        td_carve_out = calculator.get_total_tax_deficits(minimum_ETR=minimum_ETR).iloc[:-2]

        # We add the tax deficits thereby computed to the central table
        merged_df = merged_df.merge(
            td_carve_out[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
            how='left',
            on='Parent jurisdiction (alpha-3 code)'
        )

        merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

        merged_df.rename(
            columns={
                'tax_deficit': f'tax_deficit_{carve_out_rate}_carve_out'
            },
            inplace=True
        )

    # We only display EU or CbCR-reporting countries
    cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

    mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
    mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

    # This condition is encapsulated in the following boolean indexing mask
    mask = np.logical_or(mask_eu, mask_cbcr)

    # We add two useful indicator variables
    merged_df['IS_EU'] = mask_eu * 1
    merged_df['REPORTS_CbCR'] = mask_cbcr * 1

    # And we restrict the DataFrame to relevant countries
    restricted_df = merged_df[mask].copy()

    # We finalise the formatting of the table
    restricted_df.sort_values(
        by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
        ascending=[False, True],
        inplace=True
    )

    # And eventually return the DataFrame
    return restricted_df.copy()
def get_carve_outs_table(self, TWZ_countries_methodology, depreciation_only, exclude_inventories, carve_out_rate=0.05)

This function takes as input:

  • the methodology to use to estimate the post-carve-out revenue gains of TWZ countries;

  • a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance- based carve-outs to a share of depreciation expenses;

  • a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

  • the carve-out rate to use (which defaults to 5%).

It returns a DataFrame that shows, for the 15% and 25% minimum rates and for each in-sample country, the estima- ted revenue gains from a global minimum tax without and with carve-outs being applied.

Expand source code
def get_carve_outs_table(
    self,
    TWZ_countries_methodology,
    depreciation_only, exclude_inventories,
    carve_out_rate=0.05
):
    """
    This function takes as input:

    - the methodology to use to estimate the post-carve-out revenue gains of TWZ countries;

    - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
    based carve-outs to a share of depreciation expenses;

    - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

    - the carve-out rate to use (which defaults to 5%).

    It returns a DataFrame that shows, for the 15% and 25% minimum rates and for each in-sample country, the estima-
    ted revenue gains from a global minimum tax without and with carve-outs being applied.
    """

    # We need to have previously loaded and cleaned the OECD data
    if self.oecd is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    # The "TWZ_countries_methodology" argument can only take a few string values
    if TWZ_countries_methodology not in ['initial', 'new']:
        raise Exception('The "TWZ_countries_methodology" argument only accepts two values: "initial" or "new".')

    # Computing tax deficits without substance-based carve-outs
    calculator = TaxDeficitCalculator()

    calculator.load_clean_data()

    td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2, :]
    td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2, :]

    # We merge the resulting DataFrames for the 15% and 25% minimum rates
    merged_df = td_25.merge(
        td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

    merged_df.rename(
        columns={
            'tax_deficit_x': 'tax_deficit_25_no_carve_out',
            'tax_deficit_y': 'tax_deficit_15_no_carve_out'
        },
        inplace=True
    )

    # Computing corresponding tax deficits with substance-based carve-outs
    calculator = TaxDeficitCalculator(
        carve_outs=True, carve_out_rate=carve_out_rate,
        depreciation_only=depreciation_only, exclude_inventories=exclude_inventories
    )

    calculator.load_clean_data()

    td_25 = calculator.get_total_tax_deficits(minimum_ETR=0.25).iloc[:-2]
    td_15 = calculator.get_total_tax_deficits(minimum_ETR=0.15).iloc[:-2]

    # We merge the DataFrame obtained for the 25% minimum rate
    merged_df = merged_df.merge(
        td_25[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df.rename(
        columns={
            'tax_deficit': 'tax_deficit_25_with_carve_out'
        },
        inplace=True
    )

    # We merge the DataFrame obtained for the 15% minimum rate
    merged_df = merged_df.merge(
        td_15[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

    merged_df.rename(
        columns={
            'tax_deficit': 'tax_deficit_15_with_carve_out'
        },
        inplace=True
    )

    # We only show EU and/or CbCR-reporting countries
    cbcr_reporting_countries = list(self.oecd['Parent jurisdiction (alpha-3 code)'].unique())

    mask_eu = merged_df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
    mask_cbcr = merged_df['Parent jurisdiction (alpha-3 code)'].isin(cbcr_reporting_countries)

    # This condition is encapsulated in this boolean indexing mask
    mask = np.logical_or(mask_eu, mask_cbcr)

    # We add two useful indicator variables
    merged_df['IS_EU'] = mask_eu * 1
    merged_df['REPORTS_CbCR'] = mask_cbcr * 1

    # And we restrict the DataFrame to relevant countries
    restricted_df = merged_df[mask].copy()

    # We finalise the reformatting of the DataFrame
    restricted_df.sort_values(
        by=['IS_EU', 'Parent jurisdiction (alpha-3 code)'],
        ascending=[False, True],
        inplace=True
    )

    if TWZ_countries_methodology == 'initial':
        # If we have opted for the "initial" methodology for TWZ countries, we can simply return the DataFrame as is
        return restricted_df.copy()

    else:
        # If we have chosen the "new" methodology, we have a bit more work!

        # We create a temporary copy of the DataFrame, restricted to CbCR-reporting countries (excluding Sweden)
        temp = restricted_df[restricted_df['REPORTS_CbCR'] == 1].copy()
        temp = temp[temp['Parent jurisdiction (alpha-3 code)'] != 'SWE'].copy()

        # We deduce the average reduction factors to apply to the collectible tax deficits of TWZ countries
        self.imputation_15 = temp['tax_deficit_15_with_carve_out'].sum() / temp['tax_deficit_15_no_carve_out'].sum()
        self.imputation_25 = temp['tax_deficit_25_with_carve_out'].sum() / temp['tax_deficit_25_no_carve_out'].sum()

        # We apply the two downgrade factors to tax deficits without carve-outs
        restricted_df['tax_deficit_15_with_carve_out'] = restricted_df.apply(
            (
                lambda row: row['tax_deficit_15_no_carve_out'] * self.imputation_15 if row['REPORTS_CbCR'] == 0 or
                row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_15_with_carve_out']
            ),
            axis=1
        )

        restricted_df['tax_deficit_25_with_carve_out'] = restricted_df.apply(
            (
                lambda row: row['tax_deficit_25_no_carve_out'] * self.imputation_25 if row['REPORTS_CbCR'] == 0 or
                row['Parent jurisdiction (alpha-3 code)'] == 'SWE' else row['tax_deficit_25_with_carve_out']
            ),
            axis=1
        )

        # And we return the adjusted DataFrame
        return restricted_df.copy()
def get_carve_outs_table_2(self, exclude_inventories, depreciation_only, carve_out_rate=0.05, output_Excel=False)

This function takes as input:

  • a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance- based carve-outs to a share of depreciation expenses;

  • a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

  • the carve-out rate to use (which defaults to 5%).

It returns a DataFrame that shows, for the different minimum effective tax rates and for each in-sample country, the estimated impact of substance-based carve-outs. The change is expressed as a percentage of revenue gain es- timates without substance-based carve-outs.

Expand source code
def get_carve_outs_table_2(
    self,
    exclude_inventories, depreciation_only,
    carve_out_rate=0.05,
    output_Excel=False
):
    """
    This function takes as input:

    - a boolean, "depreciation_only", indicating whether to restrict the tangible assets component of substance-
    based carve-outs to a share of depreciation expenses;

    - a boolean, "exlude_inventories", indicating whether to exlude inventories from tangible assets or not;

    - the carve-out rate to use (which defaults to 5%).

    It returns a DataFrame that shows, for the different minimum effective tax rates and for each in-sample country,
    the estimated impact of substance-based carve-outs. The change is expressed as a percentage of revenue gain es-
    timates without substance-based carve-outs.
    """

    # The "get_carve_outs_table" method provides the required information for two minimum ETRs, 15% and 25%
    # This will serve as a central DataFrame to which we will add the 21% and 30% columns
    df = self.get_carve_outs_table(
        TWZ_countries_methodology='initial',
        exclude_inventories=exclude_inventories, depreciation_only=depreciation_only,
        carve_out_rate=carve_out_rate
    )

    # Computing tax deficits without substance-based carve-outs
    calculator = TaxDeficitCalculator()

    calculator.load_clean_data()

    td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2, :]
    td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2, :]

    # We add the 21% tax deficit to the central DataFrame
    merged_df = df.merge(
        td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

    # We add the 30% tax deficit to the central DataFrame
    merged_df = merged_df.merge(
        td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit_y'] = merged_df['tax_deficit_y'].fillna(0)

    merged_df.rename(
        columns={
            'tax_deficit_x': 'tax_deficit_21_no_carve_out',
            'tax_deficit_y': 'tax_deficit_30_no_carve_out'
        },
        inplace=True
    )

    # Computing corresponding tax deficits with substance-based carve-outs
    calculator = TaxDeficitCalculator(
        carve_outs=True,
        carve_out_rate=carve_out_rate,
        depreciation_only=depreciation_only,
        exclude_inventories=exclude_inventories
    )

    calculator.load_clean_data()

    td_21 = calculator.get_total_tax_deficits(minimum_ETR=0.21).iloc[:-2]
    td_30 = calculator.get_total_tax_deficits(minimum_ETR=0.3).iloc[:-2]

    # We add the 21% tax deficit with carve-outs to the central DataFrame
    merged_df = merged_df.merge(
        td_21[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

    merged_df.rename(
        columns={
            'tax_deficit': 'tax_deficit_21_with_carve_out'
        },
        inplace=True
    )

    # We add the 30% tax deficit with carve-outs to the central DataFrame
    merged_df = merged_df.merge(
        td_30[['Parent jurisdiction (alpha-3 code)', 'tax_deficit']],
        how='left',
        on='Parent jurisdiction (alpha-3 code)'
    )

    merged_df['tax_deficit'] = merged_df['tax_deficit'].fillna(0)

    merged_df.rename(
        columns={
            'tax_deficit': 'tax_deficit_30_with_carve_out'
        },
        inplace=True
    )

    # We have the tax deficit absolute amounts with and without carve-outs at 15%, 21%, 25% and 30% minimum rates
    # But we want to display the changes due to carve-outs, as a % of the no-carve-out tax deficit

    # We store the names of the 4 columns that we are going to add to the central DataFrame
    new_columns = []

    # We iterate over the 4 minimum rates
    for minimum_rate in [15, 21, 25, 30]:
        column_name_no_carve_out = f'tax_deficit_{minimum_rate}_no_carve_out'
        column_name_with_carve_out = f'tax_deficit_{minimum_rate}_with_carve_out'

        # We are going to add a new column that provides the % reduction due to carve-outs at the rate considered
        new_column_name = f'reduction_at_{minimum_rate}_minimum_rate'

        # We make the corresponding computation
        merged_df[new_column_name] = (
            (merged_df[column_name_with_carve_out] - merged_df[column_name_no_carve_out]) /
            merged_df[column_name_no_carve_out]
        ) * 100

        new_columns.append(new_column_name)

    if output_Excel:
        with pd.ExcelWriter('/Users/Paul-Emmanuel/Desktop/carve_outs_table_2.xlsx', engine='xlsxwriter') as writer:
            merged_df.to_excel(writer, sheet_name='table_2', index=False)

    # We output the resulting DataFrame with country codes and names, as well as the 4 columns of interest
    merged_df = merged_df[
        ['Parent jurisdiction (alpha-3 code)', 'Parent jurisdiction (whitespaces cleaned)'] + new_columns
    ].copy()

    return merged_df.copy()
def get_non_haven_imputation_ratio(self, minimum_ETR)

For non-OECD reporting countries, we base our estimates on data compiled by Tørsløv, Wier and Zucman (2019). These allow to compute domestic and tax-haven-based tax deficit of these countries. We extrapolate the non-haven tax deficit of these countries from the tax-haven one.

We impute the tax deficit in non-haven jurisdictions by estimating the ratio of tax deficits in non-tax havens to tax-havens for the EU non-tax haven parent countries in the CbCR data. We assume a 20% ETR in non-tax havens and a 10% ETR in tax havens (these rates are defined in two dedicated attributes in the instantiation function).

This function allows to compute this ratio following the (A2) formula of Appendix A.

The methodology is described in more details in the Appendix A of the report.

Expand source code
def get_non_haven_imputation_ratio(self, minimum_ETR):
    """
    For non-OECD reporting countries, we base our estimates on data compiled by Tørsløv, Wier and Zucman (2019).
    These allow to compute domestic and tax-haven-based tax deficit of these countries. We extrapolate the non-haven
    tax deficit of these countries from the tax-haven one.

    We impute the tax deficit in non-haven jurisdictions by estimating the ratio of tax deficits in non-tax havens
    to tax-havens for the EU non-tax haven parent countries in the CbCR data. We assume a 20% ETR in non-tax havens
    and a 10% ETR in tax havens (these rates are defined in two dedicated attributes in the instantiation function).

    This function allows to compute this ratio following the (A2) formula of Appendix A.

    The methodology is described in more details in the Appendix A of the report.
    """

    # We need to have previously loaded and cleaned the OECD data
    if self.oecd is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    # With a minimum ETR of 10%, the formula cannot be applied (division by 0), hence this case disjunction
    if minimum_ETR > 0.1:
        oecd = self.oecd.copy()

        # In the computation of the imputation ratio, we only focus on:
        # - EU-27 parent countries
        mask_eu = oecd['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
        # - That are not tax havens
        mask_non_haven = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(tax_haven_country_codes)
        # - And report a detailed country by country breakdown in their CbCR
        mask_minimum_reporting_countries = ~oecd['Parent jurisdiction (alpha-3 code)'].isin(
            COUNTRIES_WITH_MINIMUM_REPORTING + COUNTRIES_WITH_CONTINENTAL_REPORTING
        )

        # We combine the boolean indexing masks
        mask = np.logical_and(mask_eu, mask_non_haven)
        mask = np.logical_and(mask, mask_minimum_reporting_countries)

        # And convert booleans into 0 / 1 integers
        mask = mask * 1

        # We compute the profits registered by retained countries in non-haven countries
        # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
        foreign_non_haven_profits = (
            (
                mask * oecd['Is partner jurisdiction a non-haven?']
            ) * oecd['Profit (Loss) before Income Tax']
        ).sum()

        # We compute the profits registered by retained countries in tax havens
        # (excluding domestic profits, cf. the earlier use of the manage_overlap_with_domestic function)
        foreign_haven_profits = (
            (
                mask * oecd['Is partner jurisdiction a tax haven?']
            ) * oecd['Profit (Loss) before Income Tax']
        ).sum()

        # We apply the formula and compute the imputation ratio
        imputation_ratio_non_haven = (
            (
                # If the minimum ETR is below the rate assumed to be applied on non-haven profits, there is no tax
                # deficit to collect from these profits, which is why we have this max(..., 0)
                max(minimum_ETR - self.assumed_non_haven_ETR_TWZ, 0) * foreign_non_haven_profits
            ) /
            ((minimum_ETR - self.assumed_haven_ETR_TWZ) * foreign_haven_profits)
        )

    # We manage the case where the minimum ETR is of 10% and the formula cannot be applied
    elif minimum_ETR == 0.1:

        # As long as tax haven profits are assumed to be taxed at a rate of 10%, the value that we set here has no
        # effect (it will be multiplied to 0 tax-haven-based tax deficits) but to remain consistent with higher
        # values of the minimum ETR, we impute 0

        imputation_ratio_non_haven = 0

    else:
        # We do not yet manage effective tax rates below 10%
        raise Exception('Unexpected minimum ETR entered (strictly below 0.1).')

    return imputation_ratio_non_haven
def get_stratified_oecd_data(self, minimum_ETR=0.25)

This method constitutes a first step in the computation of each country's collectible tax deficit in the multi- lateral agreement scenario.

Taking the minimum effective tax rate as input and based on OECD data, this function outputs a DataFrame that displays, for each OECD-reporting parent country, the tax deficit that could be collected from the domestic, tax haven and non-haven profits of multinationals headquartered in this country.

The output is in 2016 USD, like the raw OECD data.

Expand source code
def get_stratified_oecd_data(self, minimum_ETR=0.25):
    """
    This method constitutes a first step in the computation of each country's collectible tax deficit in the multi-
    lateral agreement scenario.

    Taking the minimum effective tax rate as input and based on OECD data, this function outputs a DataFrame that
    displays, for each OECD-reporting parent country, the tax deficit that could be collected from the domestic,
    tax haven and non-haven profits of multinationals headquartered in this country.

    The output is in 2016 USD, like the raw OECD data.
    """

    # We need to have previously loaded and cleaned the OECD data
    if self.oecd is None:
        raise Exception('You first need to load clean data with the dedicated method and inplace=True.')

    oecd = self.oecd.copy()

    # We only profits taxed at an effective tax rate above the minimum ETR
    oecd = oecd[oecd['ETR'] < minimum_ETR].copy()

    # We compute the ETR differential for all low-taxed profits
    oecd['ETR_differential'] = oecd['ETR'].map(lambda x: minimum_ETR - x)

    # And deduce the tax deficit generated by each Parent / Partner jurisidiction pair
    oecd['tax_deficit'] = oecd['ETR_differential'] * oecd['Profit (Loss) before Income Tax']

    # Using the aforementioned indicator variables allows to breakdown this tax deficit
    oecd['tax_deficit_x_domestic'] = oecd['tax_deficit'] * oecd['Is domestic?']
    oecd['tax_deficit_x_tax_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a tax haven?']
    oecd['tax_deficit_x_non_haven'] = oecd['tax_deficit'] * oecd['Is partner jurisdiction a non-haven?']

    # We group the table by Parent jurisdiction such that for, say, France, the table displays the total domestic,
    # tax-haven and non-haven tax deficit generated by French multinationals
    oecd_stratified = oecd[
        [
            'Parent jurisdiction (whitespaces cleaned)',
            'Parent jurisdiction (alpha-3 code)',
            'tax_deficit',
            'tax_deficit_x_domestic',
            'tax_deficit_x_tax_haven',
            'tax_deficit_x_non_haven'
        ]
    ].groupby(
        'Parent jurisdiction (whitespaces cleaned)'
    ).agg(
        {
            'Parent jurisdiction (alpha-3 code)': 'first',
            'tax_deficit': 'sum',
            'tax_deficit_x_domestic': 'sum',
            'tax_deficit_x_tax_haven': 'sum',
            'tax_deficit_x_non_haven': 'sum'
        }
    ).copy()

    oecd_stratified.reset_index(inplace=True)

    return oecd_stratified.copy()
def get_total_tax_deficits(self, minimum_ETR=0.25)

This method takes the selected minimum ETR as input and relies on the compute_all_tax_deficits, to output a Da- taFrame with (i) the total tax defict of each in-sample country in 2021 EUR and (ii) the sum of these tax defi- cits at the EU-27 and at the whole sample level. It can be considered as an intermediary step towards the fully formatted table displayed on the online simulator (section "Multilateral implementation scenario").

Expand source code
def get_total_tax_deficits(self, minimum_ETR=0.25):
    """
    This method takes the selected minimum ETR as input and relies on the compute_all_tax_deficits, to output a Da-
    taFrame with (i) the total tax defict of each in-sample country in 2021 EUR and (ii) the sum of these tax defi-
    cits at the EU-27 and at the whole sample level. It can be considered as an intermediary step towards the fully
    formatted table displayed on the online simulator (section "Multilateral implementation scenario").
    """

    df = self.compute_all_tax_deficits(minimum_ETR=minimum_ETR)

    df = df[
        ['Parent jurisdiction (whitespaces cleaned)', 'Parent jurisdiction (alpha-3 code)', 'tax_deficit']
    ]

    df.sort_values(
        by='Parent jurisdiction (whitespaces cleaned)',
        inplace=True
    )

    # We compute the sum of total tax deficits at the EU-27 level and for the whole sample
    total_eu = (df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes) * 1 * df['tax_deficit']).sum()
    total_whole_sample = df['tax_deficit'].sum()

    # Possibly suboptimal process to add "Total" lines at the end of the DataFrame
    dict_df = df.to_dict()

    dict_df[df.columns[0]][len(df)] = 'Total - EU27'
    dict_df[df.columns[1]][len(df)] = '..'
    dict_df[df.columns[2]][len(df)] = total_eu

    dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
    dict_df[df.columns[1]][len(df) + 1] = '..'
    dict_df[df.columns[2]][len(df) + 1] = total_whole_sample

    df = pd.DataFrame.from_dict(dict_df)

    return df.reset_index(drop=True)
def load_clean_data(self, path_to_oecd='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/oecd.csv', path_to_twz='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/twz.csv', path_to_twz_domestic='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/twz_domestic.csv', path_to_twz_CIT='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/twz_CIT.csv', path_to_preprocessed_mean_wages='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/preprocessed_mean_wages.csv', path_to_statutory_rates='/Users/Paul-Emmanuel/Desktop/EU Tax Observatory/4. Own Work/0. Tax Deficit/Platform/tax_deficit_simulator/tax_deficit_simulator/data/statutory_rates.csv', inplace=True)

This method allows to load and clean data from 6 different sources:

  • the "oecd.csv" file which was extracted from the OECD's aggregated and anonymized country-by-country repor- ting, considering only the positive profit sample. Figures are in 2016 USD;

  • the "twz.csv" file which was extracted from the Table C4 of the TWZ 2019 online appendix. It presents, for a number of countries, the amounts of profits shifted to tax havens that are re-allocated to them on an ultima- te ownership basis. Figures are in 2016 USD million;

  • the "twz_domestic.csv" file, taken from the outputs of benchmark computations run on Stata. It presents for each country the amount of corporate profits registered locally by domestic MNEs and the effective tax rate to which they are subject. Figures are in 2016 USD billion;

  • the "twz_CIT.csv" file, extracted from Table U1 of the TWZ 2019 online appendix. It presents the corporate in- come tax revenue of each country in 2016 USD billion;

  • the "preprocessed_mean_wages.csv" file, taken from the outputs of substance-based carve-outs run on Stata. For each partner jurisdiction in the OECD's country-by-country data, it provides either a measure or an approxima- tion of the local mean annual earnings in 2016 in current USD. It is built upon ILO data, more details being provided in the methodological section of the Note n°1 of the Observatory on substance-based carve-outs;

  • the "statutory_rates.csv" file that provides, for a number of partner jurisdictions, their 2016 statutory cor- porate income tax rates.

Default paths are used to let the simulator run via the app.py file. If you wish to use the tax_deficit_calcula- tor package in another context, you can save the data locally and give the method paths to the data files. The possibility to load the files from an online host instead will soon be implemented.

Expand source code
def load_clean_data(
    self,
    path_to_oecd=path_to_oecd,
    path_to_twz=path_to_twz,
    path_to_twz_domestic=path_to_twz_domestic,
    path_to_twz_CIT=path_to_twz_CIT,
    path_to_preprocessed_mean_wages=path_to_preprocessed_mean_wages,
    path_to_statutory_rates=path_to_statutory_rates,
    inplace=True
):
    """
    This method allows to load and clean data from 6 different sources:

    - the "oecd.csv" file which was extracted from the OECD's aggregated and anonymized country-by-country repor-
    ting, considering only the positive profit sample. Figures are in 2016 USD;

    - the "twz.csv" file which was extracted from the Table C4 of the TWZ 2019 online appendix. It presents, for
    a number of countries, the amounts of profits shifted to tax havens that are re-allocated to them on an ultima-
    te ownership basis. Figures are in 2016 USD million;

    - the "twz_domestic.csv" file, taken from the outputs of benchmark computations run on Stata. It presents for
    each country the amount of corporate profits registered locally by domestic MNEs and the effective tax rate to
    which they are subject. Figures are in 2016 USD billion;

    - the "twz_CIT.csv" file, extracted from Table U1 of the TWZ 2019 online appendix. It presents the corporate in-
    come tax revenue of each country in 2016 USD billion;

    - the "preprocessed_mean_wages.csv" file, taken from the outputs of substance-based carve-outs run on Stata. For
    each partner jurisdiction in the OECD's country-by-country data, it provides either a measure or an approxima-
    tion of the local mean annual earnings in 2016 in current USD. It is built upon ILO data, more details being
    provided in the methodological section of the Note n°1 of the Observatory on substance-based carve-outs;

    - the "statutory_rates.csv" file that provides, for a number of partner jurisdictions, their 2016 statutory cor-
    porate income tax rates.

    Default paths are used to let the simulator run via the app.py file. If you wish to use the tax_deficit_calcula-
    tor package in another context, you can save the data locally and give the method paths to the data files. The
    possibility to load the files from an online host instead will soon be implemented.
    """
    try:

        # We try to read the files from the provided paths
        oecd = pd.read_csv(path_to_oecd)
        twz = pd.read_csv(path_to_twz, delimiter=';')
        twz_domestic = pd.read_csv(path_to_twz_domestic, delimiter=';')
        twz_CIT = pd.read_csv(path_to_twz_CIT, delimiter=';')
        preprocessed_mean_wages = pd.read_csv(path_to_preprocessed_mean_wages, delimiter=';')
        statutory_rates = pd.read_csv(path_to_statutory_rates, delimiter=';')

    except FileNotFoundError:

        # If at least one of the files is not found
        raise Exception('Are you sure these are the right paths for the source files?')

    # --- Cleaning the OECD data

    # We drop a few irrelevant columns from country-by-country data
    oecd.drop(
        columns=['PAN', 'Grouping', 'Flag Codes', 'Flags', 'YEA', 'Year'],
        inplace=True
    )

    # We reshape the DataFrame from a long to a wide dataset
    oecd = oecd.pivot(
        index=['COU', 'Ultimate Parent Jurisdiction', 'JUR', 'Partner Jurisdiction'],
        columns='Variable',
        values='Value'
    ).reset_index()

    # We rename some columns to match the code that has been written before modifying how OECD data are loaded
    oecd.rename(
        columns={
            'COU': 'Parent jurisdiction (alpha-3 code)',
            'Ultimate Parent Jurisdiction': 'Parent jurisdiction (whitespaces cleaned)',
            'JUR': 'Partner jurisdiction (alpha-3 code)',
            'Partner Jurisdiction': 'Partner jurisdiction (whitespaces cleaned)'
        },
        inplace=True
    )

    # Thanks to a function defined in utils.py, we rename the "Foreign Jurisdictions Total" field for all countries
    # that only report a domestic / foreign breakdown in their CbCR
    oecd['Partner jurisdiction (whitespaces cleaned)'] = oecd.apply(rename_partner_jurisdictions, axis=1)

    # We eliminate stateless entities and the "Foreign Jurisdictions Total" filds
    oecd = oecd[
        ~oecd['Partner jurisdiction (whitespaces cleaned)'].isin(['Foreign Jurisdictions Total', 'Stateless'])
    ].copy()

    # We replace missing "Income Tax Paid" values by the corresponding "Income Tax Accrued" values
    # (Some missing values remain even after this edit)
    oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
        (
            lambda row: row['Income Tax Paid (on Cash Basis)']
            if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
            else row['Income Tax Accrued - Current Year']
        ),
        axis=1
    )

    # We clean the statutory corporate income tax rate dataset
    statutory_rates['statrate'] = statutory_rates['statrate'].map(
        lambda x: x.replace(',', '.') if isinstance(x, str) else x
    ).astype(float)

    # And we merge it with country-by-country data, on partner jurisdiction alpha-3 codes
    oecd = oecd.merge(
        statutory_rates,
        how='left',
        left_on='Partner jurisdiction (alpha-3 code)', right_on='partner'
    )

    oecd.drop(columns=['partner'], inplace=True)

    # We impute missing "Income Tax Paid" values assuming that pre-tax profits are taxed at the local statutory rate
    oecd['Income Tax Paid (on Cash Basis)'] = oecd.apply(
        (
            lambda row: row['Income Tax Paid (on Cash Basis)']
            if not np.isnan(row['Income Tax Paid (on Cash Basis)'])
            else row['Profit (Loss) before Income Tax'] * row['statrate']
        ),
        axis=1
    )

    oecd.drop(columns=['statrate'], inplace=True)

    # ETR computation (using tax paid as the numerator)
    oecd['ETR'] = oecd['Income Tax Paid (on Cash Basis)'] / oecd['Profit (Loss) before Income Tax']
    oecd['ETR'] = oecd['ETR'].map(lambda x: 0 if x < 0 else x)

    # Adding an indicator variable for domestic profits (rows with the same parent and partner jurisdiction)
    oecd['Is domestic?'] = oecd.apply(
        lambda row: row['Parent jurisdiction (alpha-3 code)'] == row['Partner jurisdiction (alpha-3 code)'],
        axis=1
    ) * 1

    # We add an indicator variable that takes value 1 if and only if the partner is a tax haven
    oecd['Is partner jurisdiction a tax haven?'] = oecd['Partner jurisdiction (alpha-3 code)'].isin(
        tax_haven_country_codes
    ) * 1

    # Adding another indicator variable that takes value 1 if and only if the partner is not a tax haven
    oecd['Is partner jurisdiction a non-haven?'] = 1 - oecd['Is partner jurisdiction a tax haven?']

    # This indicator variable is used specifically for the simulation of carve-outs; it takes value 1 if and only if
    # the partner jurisdiction is not the parent jurisdiction, not a tax haven and not a regional aggregate
    oecd['Is partner jurisdiction a non-haven? - CO'] = oecd.apply(
        (
            lambda row: 0
            if (
                row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_MINIMUM_REPORTING
                and row['Partner jurisdiction (alpha-3 code)'] == 'FJT'
            ) or (
                row['Parent jurisdiction (alpha-3 code)'] in COUNTRIES_WITH_CONTINENTAL_REPORTING
                and row['Partner jurisdiction (alpha-3 code)'] in ['GRPS', 'AFRIC', 'AMER', 'ASIAT', 'EUROP']
            ) or (
                row['Is domestic?'] == 1
            )
            else row['Is partner jurisdiction a non-haven?']
        ),
        axis=1
    )

    # This indicator variable, used specifically for the simulation of carve-outs, takes value 1 if and only if the
    # partner is a regional aggregate
    oecd['Is partner jurisdiction an aggregate partner? - CO'] = np.logical_and(
        oecd['Is domestic?'] == 0,
        np.logical_and(
            oecd['Is partner jurisdiction a non-haven? - CO'] == 0,
            oecd['Is partner jurisdiction a tax haven?'] == 0
        )
    ) * 1

    # Thanks to a small function imported from utils.py, we manage the slightly problematic overlap between the
    # various indicator variables ("Is domestic?" sort of gets the priority over the others)
    oecd['Is partner jurisdiction a tax haven?'] = oecd.apply(
        lambda row: manage_overlap_with_domestic(row, 'haven'),
        axis=1
    )

    oecd['Is partner jurisdiction a non-haven?'] = oecd.apply(
        lambda row: manage_overlap_with_domestic(row, 'non-haven'),
        axis=1
    )

    # We need some more work on the data if we want to simulate substance-based carve-outs
    if self.carve_outs:

        # We merge earnings data with country-by-country data on partner jurisdiction codes
        oecd = oecd.merge(
            preprocessed_mean_wages[['partner2', 'earn']],
            how='left',
            left_on='Partner jurisdiction (alpha-3 code)', right_on='partner2'
        )

        oecd.drop(columns=['partner2'], inplace=True)

        oecd.rename(
            columns={
                'earn': 'ANNUAL_VALUE'
            },
            inplace=True
        )

        # We clean the mean annual earnings column
        oecd['ANNUAL_VALUE'] = oecd['ANNUAL_VALUE'].map(
            lambda x: x.replace(',', '.') if isinstance(x, str) else x
        ).astype(float)

        # We deduce the payroll proxy from the number of employees and from mean annual earnings
        oecd['PAYROLL'] = oecd['Number of Employees'] * oecd['ANNUAL_VALUE'] * (1 + self.payroll_premium / 100)

        # We compute substance-based carve-outs from both payroll and tangible assets
        oecd['CARVE_OUT'] = self.carve_out_rate * (
            oecd['PAYROLL'] + oecd['Tangible Assets other than Cash and Cash Equivalents'] * self.assets_multiplier
        )

        # This column will contain slightly modified carve-outs, carve-outs being replaced by pre-tax profits
        # wherever the former exceeds the latter
        oecd['CARVE_OUT_TEMP'] = oecd.apply(
            (
                lambda row: row['CARVE_OUT'] if row['Profit (Loss) before Income Tax'] > row['CARVE_OUT']
                or np.isnan(row['CARVE_OUT'])
                else row['Profit (Loss) before Income Tax']
            ),
            axis=1
        )

        # We exclude rows with missing carve-out values in a temporary DataFrame
        oecd_temp = oecd[
            ~np.logical_or(
                oecd['PAYROLL'].isnull(),
                oecd['Tangible Assets other than Cash and Cash Equivalents'].isnull()
            )
        ].copy()

        # We compute the average reduction in non-haven pre-tax profits due to carve-outs
        self.avg_carve_out_impact_non_haven = (
            oecd_temp[
                oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
            ]['CARVE_OUT_TEMP'].sum() /
            oecd_temp[
                oecd_temp['Is partner jurisdiction a non-haven? - CO'] == 1
            ]['Profit (Loss) before Income Tax'].sum()
        )

        # We do the same for pre-tax profits booked in tax havens, domestically and in aggregate partners
        self.avg_carve_out_impact_tax_haven = (
            oecd_temp[oecd_temp['Is partner jurisdiction a tax haven?'] == 1]['CARVE_OUT_TEMP'].sum() /
            oecd_temp[
                oecd_temp['Is partner jurisdiction a tax haven?'] == 1
            ]['Profit (Loss) before Income Tax'].sum()
        )
        self.avg_carve_out_impact_domestic = (
            oecd_temp[oecd_temp['Is domestic?'] == 1]['CARVE_OUT_TEMP'].sum() /
            oecd_temp[oecd_temp['Is domestic?'] == 1]['Profit (Loss) before Income Tax'].sum()
        )
        self.avg_carve_out_impact_aggregate = (
            oecd_temp[
                oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
            ]['CARVE_OUT_TEMP'].sum() /
            oecd_temp[
                oecd_temp['Is partner jurisdiction an aggregate partner? - CO'] == 1
            ]['Profit (Loss) before Income Tax'].sum()
        )

        # We impute missing carve-out values based on these average reductions in pre-tax profits
        oecd['CARVE_OUT'] = oecd.apply(
            lambda row: impute_missing_carve_out_values(
                row,
                avg_carve_out_impact_domestic=self.avg_carve_out_impact_domestic,
                avg_carve_out_impact_tax_haven=self.avg_carve_out_impact_tax_haven,
                avg_carve_out_impact_non_haven=self.avg_carve_out_impact_non_haven,
                avg_carve_out_impact_aggregate=self.avg_carve_out_impact_aggregate
            ),
            axis=1
        )

        # Some missing values remain whenever profits before tax are missing
        oecd = oecd[~oecd['CARVE_OUT'].isnull()].copy()

        # We remove substance-based carve-outs from pre-tax profits
        oecd['Profit (Loss) before Income Tax'] = oecd.apply(
            (
                lambda row: row['Profit (Loss) before Income Tax'] - row['CARVE_OUT']
                if row['Profit (Loss) before Income Tax'] - row['CARVE_OUT'] >= 0
                else 0
            ),
            axis=1
        )

    # --- Cleaning the TWZ tax haven profits data

    # Adding an indicator variable for OECD reporting - We do not consider the Swedish CbCR
    twz['Is parent in OECD data?'] = twz['Alpha-3 country code'].map(
        lambda x: x in oecd['Parent jurisdiction (alpha-3 code)'].unique() if x != 'SWE' else False
    ) * 1

    # We reformat numeric columns - Resulting figures are expressed in 2016 USD
    for column_name in ['Profits in all tax havens', 'Profits in all tax havens (positive only)']:
        twz[column_name] = twz[column_name].map(lambda x: x.replace(',', '.'))
        twz[column_name] = twz[column_name].astype(float) * 1000000

        if self.carve_outs:
            # If we want to simulate carve-outs, we need to downgrade TWZ tax haven profits by the average reduction
            # due to carve-outs that is observed for tax haven profits in the OECD data
            twz[column_name] *= (1 - self.avg_carve_out_impact_tax_haven)
        else:
            continue

    # We filter out countries with 0 profits in tax havens
    twz = twz[twz['Profits in all tax havens (positive only)'] > 0].copy()

    # --- Cleaning the TWZ domestic profits data

    # Reformatting the profits column - Resulting figures are expressed in 2016 USD
    twz_domestic['Domestic profits'] = twz_domestic['Domestic profits']\
        .map(lambda x: x.replace(',', '.'))\
        .astype(float) * 1000000000

    # Reformatting the ETR column
    twz_domestic['Domestic ETR'] = twz_domestic['Domestic ETR'].map(lambda x: x.replace(',', '.')).astype(float)

    if self.carve_outs:
        # If we want to simulate carve-outs, we need to downgrade TWZ domestic profits by the average reduction due
        # to carve-outs that is observed for domestic profits in the OECD data
        twz_domestic['Domestic profits'] *= (1 - self.avg_carve_out_impact_domestic)

    # --- Cleaning the TWZ CIT revenue data

    # Reformatting the CIT revenue column - Resulting figures are expressed in 2016 USD
    twz_CIT['CIT revenue'] = twz_CIT['CIT revenue']\
        .map(lambda x: x.replace(',', '.'))\
        .astype(float) * 1000000000

    if inplace:
        self.oecd = oecd.copy()
        self.twz = twz.copy()
        self.twz_domestic = twz_domestic.copy()
        self.twz_CIT = twz_CIT.copy()
        self.mean_wages = preprocessed_mean_wages.copy()

    else:

        if self.carve_outs:
            return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy(), preprocessed_mean_wages.copy()

        else:
            return oecd.copy(), twz.copy(), twz_domestic.copy(), twz_CIT.copy()
def output_intermediary_scenario_gain_formatted(self, minimum_ETR=0.25)

This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the table presented on the "Partial cooperation scenario" page. It takes as input the selected minimum ETR and then, widely relies on the compute_intermediary_scenario_gain method defined above. It mostly consists in a series of formatting steps to make the table more readable and understandable.

Expand source code
def output_intermediary_scenario_gain_formatted(self, minimum_ETR=0.25):
    """
    This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
    table presented on the "Partial cooperation scenario" page. It takes as input the selected minimum ETR and then,
    widely relies on the compute_intermediary_scenario_gain method defined above. It mostly consists in a series of
    formatting steps to make the table more readable and understandable.
    """

    # We compute corporate tax revenue gains from the partial cooperation scenario
    df = self.compute_intermediary_scenario_gain(minimum_ETR=minimum_ETR)

    # We eliminate irrelevant columns
    df.drop(columns=['tax_deficit', 'From foreign MNEs'], inplace=True)

    # We reformat figures with a thousand separator and a 0-decimal rounding
    df['Total'] = df['Total'].map('{:,.0f}'.format)

    # We rename columns to make them more explicit
    df.rename(
        columns={
            'Parent jurisdiction (whitespaces cleaned)': 'Taxing country',
            'Total': 'Collectible tax deficit (€m)'
        },
        inplace=True
    )

    # We add quotation marks to the "Europe" and "Other Europe" fields
    df['Taxing country'] = df['Taxing country'].map(
        lambda x: x if x not in ['Europe', 'Other Europe'] else f'"{x}"'
    )

    return df.copy()
def output_tax_deficits_formatted(self, minimum_ETR=0.25)

This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the table on the "Multilateral implementation scenario" page. It takes as input the selected minimum ETR and widely relies on the get_total_tax_deficits method defined above. It mostly consists in a series of formatting steps.

Expand source code
def output_tax_deficits_formatted(self, minimum_ETR=0.25):
    """
    This method is used in the "app.py" file, which underlies the Streamlit simulator. It is used to produce the
    table on the "Multilateral implementation scenario" page. It takes as input the selected minimum ETR and widely
    relies on the get_total_tax_deficits method defined above. It mostly consists in a series of formatting steps.
    """

    # We build the unformatted results table thanks to the get_total_tax_deficits method
    df = self.get_total_tax_deficits(minimum_ETR=minimum_ETR)

    # We only want to include certain countries in the output table:
    # - all the EU-27 countries that are included in our sample (4 unfortunately missing for now)
    # - most of the OECD-reporting countries, excluding only Singapore and Bermuda

    # We first build the list of OECD-reporting countries, excluding Singapore and Bermuda
    oecd_reporting_countries = self.oecd['Parent jurisdiction (alpha-3 code)'].unique()
    oecd_reporting_countries = [
        country_code for country_code in oecd_reporting_countries if country_code not in ['SGP', 'BMU']
    ]

    # From this list, we build the relevant boolean indexing mask that corresponds to our filtering choice
    mask = np.logical_or(
        df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes),
        df['Parent jurisdiction (alpha-3 code)'].isin(oecd_reporting_countries)
    )

    df = df[mask].copy()

    # We sort values by the name of the parent jurisdiction, in the alphabetical order
    df.sort_values(
        by='Parent jurisdiction (whitespaces cleaned)',
        inplace=True
    )

    df.reset_index(drop=True, inplace=True)

    # We convert 2021 EUR figures into 2021 million EUR ones
    df['tax_deficit'] = df['tax_deficit'] / 10**6

    # Again, the same possibly sub-optimal process to add the "Total" lines
    dict_df = df.to_dict()

    dict_df[df.columns[0]][len(df)] = 'Total - EU27'
    dict_df[df.columns[1]][len(df)] = '..'
    dict_df[df.columns[2]][len(df)] = df[
        df['Parent jurisdiction (alpha-3 code)'].isin(eu_27_country_codes)
    ]['tax_deficit'].sum()

    dict_df[df.columns[0]][len(df) + 1] = 'Total - Whole sample'
    dict_df[df.columns[1]][len(df) + 1] = '..'
    dict_df[df.columns[2]][len(df) + 1] = df['tax_deficit'].sum()

    df = pd.DataFrame.from_dict(dict_df)

    # We drop country codes
    df.drop(columns=['Parent jurisdiction (alpha-3 code)'], inplace=True)

    # And we eventually reformat figures with a thousand separator and a 0-decimal rounding
    df['tax_deficit'] = df['tax_deficit'].map('{:,.0f}'.format)

    # We rename columns
    df.rename(
        columns={
            'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country',
            'tax_deficit': 'Collectible tax deficit (€m)'
        },
        inplace=True
    )

    return df.copy()
def output_unilateral_scenario_gain_formatted(self, country, minimum_ETR=0.25)

This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the table presented on the "Unilateral implementation scenario" page. It takes as input the selected minimum ETR and the name of the country assumed to unilaterally implement the tax deficit collection. Then, it widely relies on the compute_unilateral_scenario_gain method defined above and mostly consists in a series of formatting steps to make the table more readable and understandable.

Expand source code
def output_unilateral_scenario_gain_formatted(self, country, minimum_ETR=0.25):
    """
    This method is used in the "app.py" file, which lies behind the Streamlit simulator. It allows to produce the
    table presented on the "Unilateral implementation scenario" page. It takes as input the selected minimum ETR and
    the name of the country assumed to unilaterally implement the tax deficit collection. Then, it widely relies on
    the compute_unilateral_scenario_gain method defined above and mostly consists in a series of formatting steps to
    make the table more readable and understandable.
    """

    # We compute the gains from the unilateral implementation of the tax deficit collection for the taxing country
    df = self.compute_unilateral_scenario_gain(
        country=country,
        minimum_ETR=minimum_ETR
    )

    # We convert the numeric outputs into 2021 million EUR
    df[f'Collectible tax deficit for {country}'] = df[f'Collectible tax deficit for {country}'] / 10**6

    # We reformat figures with two decimals and a thousand separator
    df[f'Collectible tax deficit for {country}'] = \
        df[f'Collectible tax deficit for {country}'].map('{:,.2f}'.format)

    # We rename columns in accordance
    df.rename(
        columns={
            f'Collectible tax deficit for {country}': f'Collectible tax deficit for {country} (€m)',
            'Parent jurisdiction (whitespaces cleaned)': 'Headquarter country'
        },
        inplace=True
    )

    return df.copy()