---
title: "Revealed comparative advantage and network centrality"
author: "Sergej Kaiser"
date: "20/01/2017"
header-includes:
  - \setlength{\parindent}{4em}
  - \setlength{\parskip}{0em}
site: bookdown::bookdown_site
output:
  bookdown::gitbook:
    split_by: chapter
  # html_document:
  #   toc: true
  #   number_sections: true
  #   fig_caption: true
  #   self_contained: yes
    

colorlinks: yes

fontsize: 12pt
bibliography: thesis.bib
css: style.css
csl: chicago-author-date.csl
link-citations: yes
---
<img src="A4-Cover_FEB_e_metSleutel_def_2.pdf" alt="Cover"  width="4200" height="4200">
<div style="line-height: 1.5em;">
```{r setup1, include=F,message=F,warning=F,echo=F}
library(tidyverse)
library(forcats)
library(haven)
library(ggrepel)
library(purrr)
library(htmlTable)
library(grid)
library(gridExtra)
library(knitr)
library(lazyeval)
library(boot)

# Global chunk defaults: hide code and messages, render figures as SVG
knitr::opts_chunk$set(
	error = FALSE,
	echo = FALSE,
	fig.height = 6,
	fig.path = "figure/",
	fig.width = 10,
	message = FALSE,
	warning = FALSE,
	cache.path = "cache/",
	dev = "svg"
)
```

<!--chapter:end:index.rmd-->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>

# Abbreviations {-}

| Abbrev.| Description                                             | 
|--------|---------------------------------------------------------|
| ANBERD | Analytical Business Enterprise Research and Development |
| BVAT   | Backward value-added trade                              |
| F.O.B. | Free on Board                                           |
| FVAT   | Forward value-added trade                               |
| I.I.D. | Independent and Identically Distributed                 |
| ISIC   | International Standard Industrial Classification        |
| IV     | Instrumental Variable                                   |
| OECD   | Organisation for Economic Co-operation and Development  |
| OLS    | Ordinary Least Squares                                  |
| MI     | Multiple Imputation                                     |
| PMM    | Predictive Mean Matching                                |
| RCA    | Revealed Comparative Advantage                          |
| STAN   | Structural Analysis Database                            |
| TiVA   | Trade in Value-Added                                    |
| WiOD   | World Input-Output Database                             |
| WTO    | World Trade Organization                                |
| VAT    | Value-Added Trade                                       |

# Introduction

In the last four decades, international trade has become increasingly characterized by international production fragmentation (IPF). 
International production fragmentation means that the production of goods is split across several countries. 
A large body of literature has documented its extent.
^[Prominent papers in this literature are @Johnson2012, @baldwin2014, and @Timmer2015. 
@Johnson2012 measure the extent of international production fragmentation between 1970 and 2010 with the ratio of value-added trade to gross exports. 
Specifically, they show that international production fragmentation grew in the 1970s, was unchanged over the 1980s, and has grown at a faster pace since the 1990s.]
In particular, @Timmer2015 documented that the foreign content incorporated in goods increased by 85 percent between 1995 and 2005.^[
@Timmer2015 analyzed 560 final manufacturing products in 40 countries, of which 27 were European countries.] <p>
Before international production fragmentation, goods were mainly produced within one country, and therefore the value of bilateral trade flows (gross exports) was a good measure of the contributions of domestic inputs and of domestic production factors. 
However, with international production fragmentation, goods are produced in several stages in multiple countries. 
Consequently, gross exports also include foreign content and may not correctly capture the contribution of domestic production factors to final output. 
Indeed, a large literature has documented that the distribution of who produces for whom in the world economy differs strongly when we look at value-added trade instead of gross exports.^[Some references in this literature are @daudin2011, @Koopman, @wang2013, and @baldwin2014.] 
To give an example, @Koopman show that the trade surplus of China with the USA is reduced by 41 percent based on value-added trade compared to gross exports. <p>
The international production fragmentation literature documented that the volume of trade flows changes significantly when we focus on the inferred domestic content instead of the observed gross exports. 
Consequently, the pattern of specialization of countries may also be sensitive to looking at value-added trade instead of gross exports. 
The seminal work of Ricardo showed that a country has a comparative advantage in producing and exporting those goods for which it has a relatively lower production cost compared to its trade partners.
^[Ricardo's model studied the case of two countries and two goods. His results were extended to the case of a continuum of goods by @dornbusch and to any number of countries by @eaton.]
Gross exports include foreign contributions to the relative production costs.
The pattern of revealed comparative advantage on the basis of gross exports may differ from the pattern of revealed comparative advantage based on the domestic content of trade. 
The first objective of this thesis is to evaluate whether the pattern of revealed comparative advantage changes significantly when it is computed on the basis of domestic factors only.<p>
A theoretically sound approach is necessary to analyze the pattern of revealed comparative advantage. 
The seminal contribution of @costinot showed that the ranking of relative exports maps to the underlying ranking of relative productivity in a setup with multiple countries and multiple industries. 
Further, the authors showed that the pattern of revealed comparative advantage could be retrieved from the bilateral trade matrix.
Specifically, they showed that a two-step estimation procedure allows obtaining the pattern of revealed comparative advantage.
<p> 
To compute the pattern of revealed comparative advantage, I use two sources of value-added trade and gross exports data, namely the WiOD and the TiVA.
^[Details about the WiOD are provided in @Timmer2015. 
The TiVA database is a joint initiative of the WTO and the OECD. Details are provided in @tiva2.]
The TiVA data includes 61 countries of the following groups: the OECD, the EU 28, the G20 and most of East and South-East Asian countries and several South American countries. 
The WiOD includes the countries of the EU 27 and 14 other major economies.
Both datasets provide information on the primary, manufacturing and service industries according to the ISIC Rev. 3.1 industry classification.
I assess the degree of divergence of revealed comparative advantage rankings obtained on the basis of gross exports and of value-added trade with two measures of association: Spearman's $\rho$ and Kendall's $\tau$. <p>
My main results are as follows. 
Regarding the stability of the ranking of revealed comparative advantage, I find that it depends on how value-added trade is defined. 
There are two main definitions of value-added trade. 
Backward value-added trade extracts the contribution of the domestic supply chain to gross exports.
Forward value-added trade extracts the domestic factor content of trade. 
I find that the pattern of specialization is substantially unchanged when I compare backward value-added trade to gross exports.
But there are divergences when I compare gross exports to forward value-added trade.<p>
I interpret my results as follows. The similarity of the rankings obtained for gross exports and backward value-added trade indicates that the foreign content does not substantially change the pattern of relative production costs. 
The contribution of the domestic supply chain is sufficient to predict the ranking of comparative advantage.
The dissimilarity of the rankings for gross exports and forward value-added trade indicates that comparative advantage according to the factor content of trade differs from comparative advantage according to the domestic supply chain. <p>
Overall, I conclude that the pattern of comparative advantage based on gross exports captures the domestic content of trade. 
However, it does not capture very well the pattern of comparative advantage associated with the factor content of trade.<p> 
The second objective of this thesis is to assess the contribution of different countries to the propagation of shocks in the international trade network. 
A large literature has addressed this question on the basis of gross trade flows, but given the extent of international production fragmentation, it is important to construct this analysis on value-added trade rather than gross exports. 
The shift of focus is necessary because I want to correctly capture how shocks to domestic production factors diffuse through the international trade network. <p>
A sound approach is necessary to assess the centrality of countries in the propagation of shocks through the network. 
Different metrics have been proposed to assess network centrality.^[Some references in this literature are @Katz1953, @bonacich1972, @freeman1978 and @jackson96.] 
I follow the approach of @acemoglu2012, who have shown that eigenvector centrality is a theoretically sound measure.
^[@acemoglu2012 use the notion of influence vector constructed on the basis of the alpha-centrality (see [@bonacich2001evcent]). Alpha-centrality is very closely linked to eigenvector-centrality.]
@acemoglu2012 analyze the centrality of different industries in the US economy.
The unit eigenvector ranks every node according to its centrality in the national network. 
Elements of the eigenvector capture the contribution of each node when an extra dollar is added to the network [@spizzirri2011]. <p>
I adapt this approach to my dataset that contains many countries and many industries. 
First, to illustrate the concepts of the international trade network and eigenvector centrality, I compute the unit eigenvector for total value-added, where each country is a node. 
Second, to analyse the variability of the ranking of countries according to their eigenvector centrality, I compute the unit eigenvector separately for each sector, where each country is a node and I obtain as many eigenvectors as there are sectors.
<p> 
My results are as follows. 
First, I find that the ranking of countries according to eigenvector centrality highlights a core-periphery structure. 
The core consists of seven countries, of which four are European economies: France, Germany, Great Britain, and Italy.
The remaining countries of the core group are China, Japan, and the USA. 
Second, I find that the rank positions of the countries in the core vary noticeably across sectors.
Third, I find that for any pair of countries, the ranking of relative eigenvector centrality maps into their pattern of revealed comparative advantage.
This result is new in this literature.
 <p> 
I interpret the findings as follows.
First, the variation of the rank positions of the most central countries at the country-industry level may indicate that countries' sectoral ability determines their network centrality.  
Second, the mapping of relative eigenvector centrality to the pattern of comparative advantage indicates that relative centrality may pick up relative sectoral ability.
According to the theory of comparative advantage, an industry which is relatively more productive will contribute relatively more to world production.  
According to network theory, an industry with relatively higher network centrality is relatively more important in terms of how many dollars it contributes to the total value of the network.
Consequently, it may be expected that both measures capture the ranking of relative sectoral ability of any given pair of countries.<p>
This thesis contributes to the literature that studies the pattern of revealed comparative advantage. 
In this literature, @costinot derived a methodology to obtain the pattern of revealed comparative advantage and computed it on a dataset with 13 manufacturing industries in 21 countries for 1997. 
The characterization of structural revealed comparative advantage was extended to a larger dataset with 1018 products, 20 exporting countries and 70 importing countries in the period from 1995 to 2010 by @leromain2014. 
Both papers computed the revealed comparative advantage indicator on the basis of gross exports. 
I contribute to this literature by quantifying the degree of divergence between the pattern of revealed comparative advantage based on value-added trade and on gross exports for fifty-six countries in 2005. 
I further contribute by showing that the pattern of revealed comparative advantage obtained by implementing the methodology of @costinot can alternatively be obtained by constructing the ranking of relative sectoral centrality for any pair of countries. 
The latter approach is simpler because the ranking is directly obtained from value-added trade and does not rely on estimation.  <p>
This thesis is not the first to investigate the impact of international product fragmentation on countries' specialization in world trade. 
Previous studies used the ad-hoc Balassa measure of revealed comparative advantage. 
In particular, two recent contributions compared the pattern of revealed comparative advantage based on value-added trade in line with the factor content of trade (forward value-added) and gross exports.  
@wang2013 examined the time profile of the pattern of revealed comparative advantage of the USA and China in the period from 1995 to 2011 for one manufacturing industry and one service industry. 
@Koopman compared the pattern of specialization for one manufacturing industry and one service industry in 2004.
Both studies found that the pattern of specialization was significantly altered by switching to value-added trade.  
On the other hand, @baldwin2014 found that the pattern of specialization for eight manufacturing industries in thirty-eight countries was not altered significantly by changing from gross exports to backward value-added trade. 
I contribute to this literature by switching from the ad-hoc measure of revealed comparative advantage to the structural measure derived by @costinot. <p>
The thesis is organised as follows. 
In chapter two I present the methodology to retrieve the revealed comparative advantage from trade flows. 
Moreover, I discuss the results of analyzing the degree of divergence of revealed comparative advantage based on value-added trade and gross exports. 
In chapter three I outline the network theory of trade and the concept of eigenvector centrality.
Further, I present the results on the contribution of different countries to shock diffusion on the basis of value-added trade. 
I also compare the ranking of relative network centrality and the ranking of revealed comparative advantage. 
In chapter four I conclude. 

<!--chapter:end:01_thesis_ch1.rmd-->

# Structural revealed comparative advantage for value-added trade

In this chapter I summarize the methodology of @costinot (hereafter: CDK) to compare the ranking of revealed comparative advantage (RCA) on the basis of backward value-added trade (BVAT), forward value-added trade (FVAT) and gross exports (EXGR). 
Further, I discuss my results regarding the degree of divergence of the RCA rankings across the three indicators.

## The pattern of specialization

The theoretical framework of CDK is set up as follows. 
The world economy consists of $i = 1, \dots, n$ countries and $k = 1, \dots , K$ sectors or goods.
Labor is the only factor of production, and it is perfectly mobile across sectors and immobile between countries. 
$L_i$ denotes the number of workers in country $i$ and $w_i$ denotes their wage. <p>
Each good is produced with a constant returns to scale technology. 
Further, for each good, there are infinitely many varieties $\omega \in \Omega$. 
The productive efficiency $z^k_i(\omega)$ describes how many units of the variety $\omega$ of good $k$ can be produced with one unit of labor in country $i$. 
It is assumed to be a random draw from a Fréchet distribution.^[CDK showed in the working paper version that the close link between trade flows and productivity differences also holds under weaker assumptions, as long as the technology differences across countries are small.] 
This assumption follows the seminal article of @eaton.
^[@eaton2 outline a microfoundation for the choice of the Fréchet distribution. 
The authors show that, if the available production technology is the result of a series of successful improvements drawn from the Pareto distribution and only the best technology is effectively used in production, the distribution of these "best" draws is Fréchet.]
 <p>
The production technology differences across countries and sectors are completely described by the two parameters $z^k_{i}$ and $\theta$.
The first parameter $z^k_i$ captures the stock of technology in sector $k$ and  country $i$. 
It corresponds to the expected productivity in the country and sector.
Variation in $z^k_i$ determines the cross-country differences in relative labor productivity.
 The second parameter $\theta$ measures the inverse of within-sector heterogeneity. 
 It reflects the dispersion in production know-how across varieties. 
The dispersion parameter is assumed to be the same across sectors and countries. <p> 
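
The role of the two parameters can be illustrated with simulated draws. The sketch below assumes the Fréchet form $F(z) = \exp[-(z/z_i^k)^{-\theta}]$ and samples from it by inverting the distribution function. The helper `rfrechet_sketch` and all parameter values are illustrative assumptions and play no role in the estimation.

```r
# Illustrative sketch: draw variety-level productivities from a Frechet
# distribution with scale z_ik and shape theta via inverse-CDF sampling.
# Assumed CDF: F(z) = exp(-(z / z_ik)^(-theta)).
rfrechet_sketch <- function(n_draws, z_ik, theta) {
  u <- runif(n_draws)              # uniform draws on (0, 1)
  z_ik * (-log(u))^(-1 / theta)    # solves F(z) = u for z
}

set.seed(42)
draws_low_theta  <- rfrechet_sketch(10000, z_ik = 1, theta = 2)  # hypothetical values
draws_high_theta <- rfrechet_sketch(10000, z_ik = 1, theta = 8)

# Dispersion of log productivity across varieties for each theta
sd_log_low  <- sd(log(draws_low_theta))
sd_log_high <- sd(log(draws_high_theta))
```

Comparing `sd_log_low` with `sd_log_high` shows that a larger $\theta$ goes together with less dispersion across varieties, which is the sense in which $\theta$ measures the inverse of within-sector heterogeneity.
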
Trade costs take the iceberg form: for every unit of a good shipped from sector $k$ in country $i$ to country $j$, only $1/d^k_{i,j} \leq 1$ units arrive, where $d^k_{i,j}$ denotes the trade cost of sector $k$ in country $i$ exporting to country $j$.  <p>
An auxiliary assumption about the trade cost is that the triangle inequality holds. 
This means that, for any destination country $j$, importing a good from country $i$ indirectly through a third country $i'$ is at least as costly as importing it directly, i.e. $d^k_{i,j} \leq d^k_{i,i'} \, d^k_{i',j}$.  <p>
The market structure is perfect competition: all countries can produce all varieties at different cost. 
Consumers seek the lowest price of each variety of a good around the world.
The price for each variety of a good is equal to the cost of production ${w_i}/{z_i^k}$  multiplied by the transport cost $d^k_{i,j}$.
Thus, the unit cost is $c^k_{i,j}=\frac{d^k_{i,j} w_i}{z_i^k}$, which is assumed to be greater than zero in the following.
   <p> 
The structure of the consumer preferences is as follows.  
The upper-tier utility function is a Cobb-Douglas function.
The lower-tier utility function is a constant elasticity of substitution utility function. Expenditure of country $j$ on each variety $\omega$ of good $k$ is:
\begin{align*}
x^k_{j}(\omega) &= \left[\frac{p^k_{j}(\omega) } {p^k_{j} } \right]^{1-\sigma_{j}^k} \alpha^k_j w_j L_j \\ 
\text{where}\quad 0 & \leq \alpha_j^k \leq 1,\, \sigma_{j}^k < 1+\theta, \quad \text{and} \quad 
p^k_{j}=\left[ \sum_{\omega'  \in \Omega} p_j^k (\omega')^{1-\sigma_j^k} \right]^{ 1 / ( {1-\sigma_j^k} )}  
\end{align*} 
The parameter $\alpha^k_j$ denotes the expenditure share of country $j$ on varieties of sector $k$ and $\sigma^k_j$ denotes the elasticity of substitution between the varieties. 
The restriction on $\sigma^k_j$ is a technical assumption. It guarantees the existence of a well-defined CES price index $p_j^k$.  <p>
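
As a numerical check on this demand system, the sketch below evaluates the CES price index and the implied expenditure on each variety for a handful of made-up prices. All values are illustrative assumptions. The point is only that expenditures across varieties add up to the sectoral budget $\alpha^k_j w_j L_j$.

```r
# Illustrative sketch of the lower-tier CES demand system (all numbers are made up).
p_var <- c(1.0, 1.2, 0.8, 1.5)   # prices of four varieties of good k in country j
sigma <- 3                        # elasticity of substitution between varieties
alpha <- 0.25                     # Cobb-Douglas expenditure share of sector k
w_j   <- 10                       # wage in country j
L_j   <- 100                      # number of workers in country j

# CES price index: p = [sum over varieties of p(w)^(1 - sigma)]^(1 / (1 - sigma))
p_index <- sum(p_var^(1 - sigma))^(1 / (1 - sigma))

# Expenditure on each variety: x(w) = [p(w) / p]^(1 - sigma) * alpha * w_j * L_j
x_var <- (p_var / p_index)^(1 - sigma) * alpha * w_j * L_j

# Sectoral budget implied by the Cobb-Douglas upper tier
budget <- alpha * w_j * L_j
```

With $\sigma > 1$, as assumed here, the cheapest variety attracts the largest expenditure.
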
The previous assumptions imply that, in relative terms, the exports of countries $i$ and $i'$ to country $j$ in sectors $k$ and $k'$ are determined by the ratio of expected sectoral productivities and the ratio of sectoral trade costs:
\begin{equation} \label{eq:1}
\ln \left( \frac{x_{i,j}^k \, x^{k'}_{i',j}}{x_{i,j}^{k'} \, x^{k}_{i',j}} \right) = \theta \ln \left( \frac{z_{i}^k \, z^{k'}_{i'}}{z_{i}^{k'} \, z^{k}_{i'}} \right) - \theta \ln \left( \frac{ d_{i,j}^k \, d^{k'}_{i',j}}{d_{i,j}^{k'} \, d^{k}_{i',j}} \right)
\end{equation}
where $x_{i,j}^k$ denotes bilateral exports and $z_{i}^k$ denotes productivity.

The additional assumption that trade costs can be modeled as the product of a bilateral trade cost and an importer-sector trade cost delivers the prediction that the pattern of trade is determined by country-sector differences in productive efficiency. 
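To see why, write the assumption as $d^k_{i,j} = d_{i,j} \, d^k_j$, where $d_{i,j}$ and $d^k_j$ are notation introduced here for the bilateral and the importer-sector component. The trade-cost term of the double ratio then cancels:
\begin{align*}
\frac{d_{i,j}^k \, d^{k'}_{i',j}}{d_{i,j}^{k'} \, d^{k}_{i',j}}
= \frac{\left(d_{i,j} \, d^{k}_{j}\right) \left(d_{i',j} \, d^{k'}_{j}\right)}{\left(d_{i,j} \, d^{k'}_{j}\right) \left(d_{i',j} \, d^{k}_{j}\right)}
= 1
\end{align*}
Its logarithm is zero, so the double ratio of exports depends only on the double ratio of productivities.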
The ranking of relative sectoral exports is directly determined by the ranking of relative sectoral productivity. 
\begin{equation}
\frac{z_{i}^1}{{z}^{1}_{i'}} \leq \dots \leq \frac{z_{i}^K}{{z}^{K}_{i'}} \Leftrightarrow \frac{x_{i,j}^1}{x^{1}_{i',j}}\leq \dots \leq\frac{x_{i,j}^K}{x^{K}_{i',j}}
\end{equation}
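
The equivalence can be checked numerically. The sketch below simulates noise-free bilateral flows in which trade costs are the product of a bilateral and an importer-sector component, and compares the ordering of sectors by relative productivity with their ordering by relative exports for one destination market. The numbers of countries and sectors, the value of $\theta$ and all parameter draws are arbitrary assumptions for illustration.

```r
# Illustrative sketch: ranking of relative exports vs. relative productivity.
set.seed(7)
n_c <- 3; n_k <- 5; theta <- 5   # assumed dimensions and dispersion parameter
z    <- matrix(exp(rnorm(n_c * n_k, sd = 0.3)), n_c, n_k)  # z[i, k]: productivity
d_ij <- matrix(exp(runif(n_c * n_c)), n_c, n_c)             # bilateral trade cost
d_jk <- matrix(exp(runif(n_c * n_k)), n_c, n_k)             # importer-sector trade cost

# Exports of i to j in sector k, up to terms that cancel in ratios across
# exporters (wages are normalized to one, which does not affect the ranking):
# x[i, j, k] is proportional to (d_ij * d_jk / z_ik)^(-theta)
x <- array(NA_real_, dim = c(n_c, n_c, n_k))
for (i in 1:n_c) for (j in 1:n_c) for (k in 1:n_k) {
  x[i, j, k] <- (d_ij[i, j] * d_jk[j, k] / z[i, k])^(-theta)
}

# Compare exporters 1 and 2 in destination market 3, sector by sector
rel_productivity <- z[1, ] / z[2, ]
rel_exports      <- x[1, 3, ] / x[2, 3, ]

# TRUE if both ratios order the sectors identically
same_ranking <- identical(order(rel_productivity), order(rel_exports))
```

In this setup the importer-sector cost drops out of the ratio of exports, and the bilateral cost shifts all sectors by the same factor, so `same_ranking` holds for any pair of exporters and any destination.
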

## The pattern of RCA for gross exports and for value-added trade

In this section, I provide the intuition for why the ranking of sectoral exports for any pair of exporters may differ between EXGR and value-added trade (VAT).
I also explain how VAT is constructed on the basis of trade matrices and input-output tables.

### Why the pattern of RCA on the basis of value-added trade may differ from gross exports

In the simple production structure of CDK all goods use the same bundle of primary production factors.
Thus, the contributions of inputs cancel out in relative terms and only the contribution of domestic productivity is left to determine the ranking of sectoral exports. <p>
However, a large literature since Leontief has shown that the production process and factor-input combinations are sector specific. 
Recent contributions to this literature are @Levchenko2016 and @Koopman. 
These two features may drive a wedge between the relative sector-level production costs based on domestic and foreign contributions and the relative production costs based on domestic contributions only. 
Hence, the pattern of RCA on the basis of domestic production factors and inputs may be different from the pattern of RCA on the basis of domestic and foreign factors. 

### Value-added trade methodology

Ideally, VAT would be directly measured; however, this would require global transaction data  [@baldwin2014].
Instead, VAT is constructed on the basis of global input-output tables [@wang2013].   
Researchers construct global input-output tables by linking national input-output tables, which are compiled by national statistical agencies [@timmer_gvc].   <p> 
The methodology to decompose EXGR into VAT is based on Leontief's input-output models [@wang2013]. 
Leontief showed that one can estimate the type and the quantity of necessary intermediate inputs to produce one unit of output on the basis of the input-output structure of the economy. 
@wang2013 refine the input-output approach to map the observed gross output flowing to any sector and country into the underlying value-added content contributed by production factors in any partner country and sector.
VAT is then obtained by multiplying the flows of gross output by the ratio of value-added to gross output at the bilateral sectoral level. <p>
The intuition behind the decomposition of EXGR into VAT is described by @wang2013. 
The production of one dollar of exports occurs in several stages. 
In the final round of production, a sector produces the exported final good by creating value-added and by making use of intermediate inputs. 
The intermediate inputs are themselves produced by creating value-added and by making use of intermediate inputs. 
One can account for the total domestic value-added used in the production of one dollar of exports by tracing the amount of value-added occurring at several stages of production. 
The total domestic value-added is the sum of all value-added created, directly and indirectly, by the one dollar of exports.
<p>
At a sectoral level of aggregation, EXGR can be decomposed into VAT in two different ways [@wang2013].
The decomposition of EXGR into VAT yields a matrix of value-added.
The matrix records the origins of value-added from a country-sector and the destination of value-added to a country-sector.
BVAT is the sum across the columns of the value-added matrix.
The BVAT of a sector includes the direct value-added produced in this sector and all the indirect value-added from further domestic upstream sectors.
This concept measures the contribution of the domestic supply-chain in EXGR.
<p>
FVAT is the sum across the rows of the value-added matrix.
The FVAT of a sector includes the direct value-added of the sector and all the value-added of this sector included in the exports of domestic downstream sectors.
FVAT describes the factor content of trade. 
It measures the direct and indirect contributions of a sector's capital and labor to the domestic value-added included in the EXGR of the sector and other domestic sectors.
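
A minimal sketch of this decomposition for a single country with two sectors follows. The input coefficients, value-added shares and export values are made-up numbers. The orientation of the value-added matrix (rows index the sector where value-added originates, columns index the exporting sector) is an assumption of the example, not a statement about how the TiVA or WiOD data are organised.

```r
# Illustrative sketch: Leontief decomposition of gross exports into value-added.
# All numbers are made up. Assumed orientation: rows = sector where value-added
# originates, columns = sector whose gross exports embody it.
A <- matrix(c(0.2, 0.1, 0.3, 0.1), nrow = 2)  # domestic input coefficients, filled column-wise
v <- c(0.5, 0.4)    # value-added per unit of gross output (the rest is imported inputs)
e <- c(100, 50)     # gross exports by sector

B  <- solve(diag(2) - A)           # Leontief inverse (I - A)^(-1)
VA <- diag(v) %*% B %*% diag(e)    # domestic value-added embodied in exports

bvat <- colSums(VA)  # backward: value-added of all domestic sectors in a sector's exports
fvat <- rowSums(VA)  # forward: a sector's value-added in the exports of all domestic sectors
```

Both aggregates add up to the same total domestic value-added in exports. They differ only in how that total is attributed to sectors.
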

## Retrieving the pattern of RCA

The CDK approach can be applied directly to retrieve the ranking of RCA for EXGR, BVAT and FVAT. 
However, I need to interpret the trade flows $x_{i,j}^k$ accordingly and redefine $1/z_i^k$ as capturing the corresponding production cost of country $i$ and sector $k$. 

Under the additional assumption that the trade cost in all three cases can be modeled as the product of a bilateral trade cost and an importer-sector trade cost, the empirical specification of eq. 1 is as follows.
\begin{align}
\ln \left( \frac{x_{i,j}^k \, x^{k'}_{i',j}}{x_{i,j}^{k'} \, x^{k}_{i',j}} \right) = \theta \ln \left( \frac{z_{i}^k \, z^{k'}_{i'}}{z_{i}^{k'} \, z^{k}_{i'}} \right) + \ln \left( \frac{ \epsilon_{i,j}^k \, \epsilon^{k'}_{i',j}}{\epsilon_{i,j}^{k'} \, \epsilon^{k}_{i',j}} \right)
\end{align}
The econometric error term $\epsilon_{i,j}^k$ includes variable trade cost and other time varying unobserved components. 
Here $x_{i,j}^k$ denotes EXGR, BVAT or FVAT, depending on the indicator.
For EXGR, $1/z_i^k$ denotes total sectoral production cost. 
For BVAT, $1/z_i^k$ denotes total domestic sectoral production cost.
For FVAT, $1/z_i^k$ denotes total domestic factor cost in the sector.
<p> 
CDK state that the following simpler equation can be estimated instead of the previous one.
  \begin{align} 
  \label{eq:2} 
  \ln x_{i,j}^k=\delta_{i,j}+\delta_j^k + \theta \ln z_i^k+\epsilon^k_{i,j}
 \end{align} 
Eq. 4 states that each of the three trade indicators $x_{i,j}^k$ (EXGR, FVAT and BVAT) from country $i$ in sector $k$ to market $j$ is determined by the log of the inverse production cost $\ln z_i^k$, exporter-importer fixed effects $\delta_{i,j}$ and importer-sector fixed effects $\delta_j^k$. 
This approach is akin to a 'difference-in-differences' estimation. <p>
The structural revealed comparative advantage measure is based on the estimated exporter-sector fixed effect.
The fixed effect $\delta_i^k$ is the empirical equivalent of the $\theta \ln z_i^k$ term in eq. 4.   
   \begin{align} \label{eq:3}
\ln {x}_{i,j}^k=\delta_{i,j}+\delta_j^k + \delta_i^k + \epsilon^k_{i,j}
 \end{align}
The revealed comparative advantage can be retrieved based on the estimate of $\theta$ and the exporter-sector fixed effect $\delta_i^k$.
^[This approach of retrieving the pattern of RCA differs from the Balassa approach. 
In the Balassa approach RCA is defined as: $$ \left(x^k_{i,World} \, / \sum_{k'=1}^K x_{i,World}^{k'} \right)  \, /  \left(\sum_{i'=1}^I x^k_{i',World} \, / \sum_{i'=1}^I \sum_{k'=1}^K x_{i',World}^{k'} \right) $$
@leromain2014 show that the structural RCA is better suited to analyse the pattern of RCA across countries and sectors than the Balassa-based RCA.]
 \begin{align*} 
 \label{eq:4}
  z_i^k=e^{{\delta_i^ k}/{\theta}} 
  \end{align*}
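
For a balanced panel without missing flows, the exporter-sector effects can be recovered, up to a normalization, without fitting the full set of dummies: averaging $\ln x_{i,j}^k$ over importers and double-demeaning across exporters and sectors removes the other two sets of fixed effects. The sketch below applies this to simulated, noise-free data. The dimensions, the value of $\theta$ and the helper `double_demean` are assumptions for illustration; this is a sketch, not the estimation routine behind the reported results.

```r
# Illustrative sketch: recover exporter-sector effects from simulated log flows.
set.seed(3)
n_c <- 4; n_k <- 3; theta <- 6   # assumed dimensions and dispersion parameter
ln_z     <- matrix(rnorm(n_c * n_k, sd = 0.2), n_c, n_k)  # true ln z[i, k]
delta_ij <- matrix(rnorm(n_c * n_c), n_c, n_c)            # exporter-importer effects
delta_jk <- matrix(rnorm(n_c * n_k), n_c, n_k)            # importer-sector effects

# ln x[i, j, k] = delta_ij + delta_jk + theta * ln z_ik (no error term)
ln_x <- array(NA_real_, dim = c(n_c, n_c, n_k))
for (i in 1:n_c) for (j in 1:n_c) for (k in 1:n_k) {
  ln_x[i, j, k] <- delta_ij[i, j] + delta_jk[j, k] + theta * ln_z[i, k]
}

# Subtract row and column means and add back the grand mean
double_demean <- function(m) {
  sweep(sweep(m, 1, rowMeans(m)), 2, colMeans(m)) + mean(m)
}

# Average over importers j, then remove exporter-only and sector-only terms
delta_ik_hat <- double_demean(apply(ln_x, c(1, 3), mean))

# Normalized productivity measure implied by the fixed effects
z_hat <- exp(delta_ik_hat / theta)
```

The resulting `z_hat` is identified only up to the normalization implied by the demeaning, which is sufficient for ranking sectors within any pair of countries.
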
I need an estimate of the dispersion parameter $\theta$ to construct the rankings of comparative advantage. 
Eq. 4 implies that $\theta$ can be estimated directly using value-added trade or EXGR.
However, as my analysis of the degree of divergence between the RCA rankings is robust to the estimate of $\theta$, I defer the methodology, additional data sources and further data manipulations needed to estimate $\theta$ to the appendix.
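
One reason why rank-based comparisons would not depend on $\theta$: for any $\theta > 0$, $z_i^k = e^{\delta_i^k/\theta}$ is a strictly increasing transformation of the fixed effect, so rankings, and hence Spearman's $\rho$ and Kendall's $\tau$, are unchanged. The sketch below illustrates this with simulated fixed effects for two indicators. The data and the two values of $\theta$ are arbitrary assumptions.

```r
# Illustrative sketch: rank correlations of RCA do not depend on theta.
set.seed(11)
delta_exgr <- rnorm(30)                         # simulated exporter-sector effects, gross exports
delta_fvat <- delta_exgr + rnorm(30, sd = 0.5)  # simulated effects, forward value-added trade

# Spearman's rho and Kendall's tau between the two RCA measures for a given theta
rank_assoc <- function(theta) {
  z_exgr <- exp(delta_exgr / theta)
  z_fvat <- exp(delta_fvat / theta)
  c(spearman = cor(z_exgr, z_fvat, method = "spearman"),
    kendall  = cor(z_exgr, z_fvat, method = "kendall"))
}

assoc_theta_2 <- rank_assoc(2)   # hypothetical low value of theta
assoc_theta_8 <- rank_assoc(8)   # hypothetical high value of theta
```

Both calls return the same pair of coefficients, because only the ranks of the fixed effects enter the two measures.
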

## Data

I use two data sources to obtain the ranking of RCA on the basis of BVAT, FVAT and EXGR in the year 2005. 
The first data source is the TiVA database [@OECDSTAN]. 
It includes 61 countries and 18 sectors. 
It covers the following country blocks: the OECD, the EU28, the G20, as well as most of the East and South-East Asian economies and a subset of South American economies.
Of the 18 sectors, 6 are service sectors, 10 are manufacturing sectors, and 2 are primary sectors.
^[A sector denotes one or more chapters of the ISIC Rev. 3.1 classification.] <p>
In this thesis, I use a subset of the TiVA data both regarding sectors and countries covered. 
First, I excluded several countries which had no information on forward value-added trade in 2005.
^[The following countries are dropped: Lithuania, Latvia, Malaysia, Philippines, Romania, Rest of the World, Russia, Singapore, Thailand, Tunisia, Taiwan, Vietnam, South Africa.]
Second, I excluded countries that have no positive trade flows in a complete row in the trade flow matrix specific to each sector.
^[Therefore, the following countries are dropped: Malta, Iceland, Costa Rica, Brunei Darussalam, Cambodia.]
Moreover, I excluded the sectors 'electricity, water and gas supply' (40-41) and 'public sector services' (75-95), as many countries recorded zero exports in those sectors.
Third, I excluded Saudi Arabia from the estimations because it exports mainly oil.^[The share of petroleum exports in total f.o.b. exports in 2005 was 90 percent [@opec].]
<p>
The second data source is the WiOD [@Timmer2015].^[The decomposition of EXGR to VAT on the basis of the WiOD was conducted with the R packages "wiod" and "decompr" created by @wiod_R.] 
The WiOD has a larger focus on European countries.
It includes 27 European countries and 15 other economies. 
It covers 19 sectors, of which 6 are service sectors, 11 are manufacturing sectors and 2 are primary sectors.^[A sector denotes one or more chapters of the ISIC Rev. 3.1 classification.] <p>
The TiVA data includes a geographically more diverse sample of countries relative to the WiOD. 
It includes observations from the following countries, which are not present in the WiOD: Argentina, Chile, Switzerland, Colombia, Hong Kong, Israel, Norway and New Zealand.
On the other hand, the WiOD data includes five European countries which are dropped in my subset of the TiVA data: Lithuania, Latvia, Malta, Romania and Russia. 
Further, the WiOD data includes Taiwan and a construct for the Rest of the World. 
The two databases overlap for thirty-four countries, of which 25 are European and 9 are non-European.

## Results: Comparing RCA for gross exports and for value-added trade

```{r fig data prep, cache=T}
#norm_RCA includes WIOD regression results
load("norm_RCA.Rdata")
set.seed(153)

dva<-results.norm2$DVA.BI
dvix<-results.norm2$DViX_Fsr.BI
exgr<-results.norm2$EXGR.BI

fvax.exgr.wiod<-dplyr::inner_join(dvix,exgr, by=c("Country","Industry"))

dva.fvax.exgr.wiod<-dplyr::inner_join(dva,fvax.exgr.wiod, by=c("Country","Industry"))

# Drop leftover dplyr grouping attributes so the objects behave as plain data frames
strip_attrs <- function(df, attrs) {
  for (a in attrs) attr(df, a) <- NULL
  df
}
group_attrs <- c("vars", "labels", "Indices", "indices", "group_sizes", "biggest_group_size")

fvax.exgr.wiod     <- strip_attrs(fvax.exgr.wiod, "vars")
dva.fvax.exgr.wiod <- strip_attrs(dva.fvax.exgr.wiod, "vars")
exgr <- strip_attrs(exgr, group_attrs)
dvix <- strip_attrs(dvix, group_attrs)
dva  <- strip_attrs(dva, group_attrs)

# Line-and-point plot of RCA by sector, one series per level of `Variable`
my.ggplot2<-function(df,title,shape.val,linetype.vals,color){ 
  plot(ggplot(df, aes(x=IND, y=VAX)) +
          # shape is not a line aesthetic, so it is mapped in geom_point() only
          geom_line(aes(colour=Variable, group=Variable, linetype=Variable)) +
          geom_point(aes(colour=Variable, shape=Variable))  +
          coord_cartesian(ylim = c(0.8, 1.2))+
          geom_hline(yintercept=1)+   # reference line at RCA = 1
          labs(x="Sector",y="RCA",title=title)+
          theme_bw() +   
          scale_shape_manual(name = "", values=shape.val)+
          scale_linetype_manual("", values=linetype.vals)+
        # scale_color_brewer(name="",palette = "Dark2") +
         scale_color_manual(name = "",values=color)+
          scale_y_continuous(breaks=seq(from=0.8,to=1.2,by=0.05))+
          theme(plot.title = element_text(hjust=0.5,vjust=0.5),
            legend.title=element_blank(), legend.position = "bottom", # legend location in graph
                panel.grid.minor = element_blank(),
                axis.title=element_text(size=10),
                axis.text.x=element_text(angle = 90, size=10)
          ))
}
```

```{r data prep, cache=T}
#data prep steps make DF long again to plot in ggplot2 
keep<-c("15t16","17t18", "19",    "20",    "21t22", "23",    "24" ,   "25",    "26"   ,"27t28", "29",    "30t33", "34t35", "36t37")
dva.fvax.exgr.wiod<-dva.fvax.exgr.wiod %>% filter(.,Country %in% c("DEU","BEL")) %>% filter(.,Industry %in% keep)


vars=c("z_DVA_norm","z_EXGR_norm","z_DViX_Fsr_norm")
#### create a list in which each element is a separate data frame for one variable
data.prep.steps<-function(variable,df,id_col1,id_col2){
  select_variables=c(id_col1,id_col2, variable)
  z.wide.df.wiod<-df %>% dplyr::select_(., .dots=select_variables ) %>%spread_(.,key_col=id_col1, value_col=variable)
  name.var<-gsub("z_","",variable)
  name.var<-gsub("_norm","",name.var)
  colnames(z.wide.df.wiod) <-c("IND",paste(name.var, colnames(z.wide.df.wiod)[2:3], sep = "_"))
  z.wide.df.wiod}


dva.fvax.exgr.wiod.list<-lapply(vars , FUN= data.prep.steps,df=dva.fvax.exgr.wiod,id_col1="Country" ,id_col2="Industry")
#here change t to - in the string of IND
dva.fvax.exgr.wiod.list<-map(dva.fvax.exgr.wiod.list, .f=function(x,col) {
    x[[col]]<-gsub("t","-",x[[col]])
    x} , col="IND")
z.wide.exgr.fddva.dva.wiod<-dplyr::inner_join(dva.fvax.exgr.wiod.list[[1]],dva.fvax.exgr.wiod.list[[2]],by="IND")
z.wide.exgr.fddva.dva.wiod<-dplyr::inner_join(z.wide.exgr.fddva.dva.wiod,dva.fvax.exgr.wiod.list[[3]],by="IND")

#certain colors
tableau.color<-c("#1F77B4", "#2CA02C",  "#D62728", "#FF7F0E")
brewer<-c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#ffff33')
brewer.1<-c('#377eb8','#4daf4a','#e41a1c','#ff7f00' )


colors.vis<-c("#50AC69","#B02985","#78A619","#C50138","#1C7FA7","#BD922F")

#dat.plot$shapes_values<-revalue(dat.plot$variable, c("Forw. VAX Belgium"="FVAXBEL","Forw. VAX Germany"="FVAXGER", "Back. VAX Belgium"="BVAXBEL","Back. VAX Germany"="BVAXGER") )
#dat.plot$variable<- revalue(dat.plot$variable, c("Forw. VAX Belgium"="FVAXBEL","Forw. VAX Germany"="FVAXGER", "Back. VAX Belgium"="BVAXBEL","Back. VAX Germany"="BVAXGER"))
#set specific shape values for different indicators
shapes_values<-c(1,2,1,2)
dat.m.wiod<- z.wide.exgr.fddva.dva.wiod%>% dplyr::select(.,IND, DVA_BEL,DVA_DEU, DViX_Fsr_BEL, DViX_Fsr_DEU) %>% gather(.,variable,VAX,-IND)
p.1.wiod <- ggplot(dat.m.wiod, aes(x=IND, y=VAX, group=variable,shape = variable,color=variable)) +
  geom_point( size = 2)  +
  geom_line()+
  coord_cartesian(ylim = c(0.8, 1.2))+
  scale_shape_manual(name = "",labels =c("Forw. VAX Belgium","Backw. VAX Belgium","Forw. VAX Germany","Backw. VAX Germany"),  values=shapes_values) +
  scale_color_manual(name = "", values=c(brewer.1))+ 
  ggtitle(expression(atop("Structural RCA Belgium and Germany")))+
  labs(x="Industry") +
   scale_y_continuous("RCA",limits=c(0.8,1.2))+
  coord_cartesian( )+
  theme_bw() + 
  theme( legend.title=element_blank(),legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 90, size=10))

brewer<-c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#ffff33')


colors2<-list(c("#e41a1c","#377eb8"),c("#e41a1c","#377eb8"),
              c("#4daf4a","#e41a1c"),c("#4daf4a","#e41a1c"))

# dat.wiod.bel.deu <- z.wide.exgr.fddva.dva.wiod%>%dplyr::select(.,IND,DViX_Fsr_BEL,EXGR_BEL,DViX_Fsr_DEU,EXGR_DEU)%>%mutate(.,ratio_exgr_bel_deu=EXGR_BEL/EXGR_DEU, ratio_fddva_bel_deu=DViX_Fsr_BEL/DViX_Fsr_DEU) %>%dplyr::select(.,IND,ratio_exgr_bel_deu,ratio_fddva_bel_deu) %>%gather( ., Variable,VAX,-IND)
# dat.wiod.bel.deu$IND<-gsub("t","-",dat.wiod.bel.deu$IND)
z.long.exgr.fddva.dva.wiod<-z.wide.exgr.fddva.dva.wiod %>%gather( ., Variable,VAX,-IND)
z.long.exgr.fddva.dva.wiod$COU<-stringr::str_sub(z.long.exgr.fddva.dva.wiod$Variable,start=-3)
z.long.exgr.fddva.dva.wiod$Variable<-stringr::str_sub(z.long.exgr.fddva.dva.wiod$Variable,start=1,end=4)
z.long.exgr.fddva.dva.wiod$Variable<-gsub("_","",z.long.exgr.fddva.dva.wiod$Variable)

z.l.exgr.fvax.bvax.de<-z.long.exgr.fddva.dva.wiod%>%filter(COU%in%"DEU")
z.l.exgr.fvax.bvax.be<-z.long.exgr.fddva.dva.wiod%>%filter(COU%in%"BEL")
```

```{r prep plots, message=FALSE, warning=FALSE, cache=T}
# BEL.GER<-z.long.exgr.fddva.dva.wiod%>% spread(data=., key=Variable, value=VAX) %>% mutate(diff.EXGR.DViX=EXGR-DViX)
# filter_list_variable<-list(c(paste(c("DViX_Fsr","DVA"),("BEL"),sep="_"),paste(c("DViX_Fsr","DVA"),("DEU"),sep="_")),paste(c("DViX_Fsr","EXGR"),("BEL"),sep="_"),paste(c("DViX_Fsr","EXGR"),("DEU"),sep="_"),paste(c("DVA","EXGR"),("BEL"),sep="_"),paste(c("DVA","EXGR"),("DEU"),sep="_"))
# #create seperate data frames for each plot, first FVAX BVAX BEL & DEU than FVAX EXGR for DEU, BEL, than EXGR BVAX
# list.of.df<-lapply(filter_list_variable, FUN=function(df, filter){ 
#   df<- df  %>% filter_(., .dots=interp(~  Variable %in% filter))} ,df=z.long.exgr.fddva.dva.wiod)
# #gsub industry
# list.of.df<-map(list.of.df ,.f=function(x,col) {
#   x[[col]]<-gsub("_"," ",x[[col]])
#   x} , col="Variable")

#list.of.df<-list(dat.wiod.bel.deu,dat.m.wiod.2.bel,dat.m.wiod.2.deu,dat.m.wiod.1.bel,dat.m.wiod.2.deu)
list.of.titles<-list("RCA across industries of Belgium relative to Germany WiOD","Structural RCA Belgium WiOD","Structural RCA Germany WiOD","Structural RCA Belgium WiOD","Structural RCA Germany WiOD")
list.of.shape.vals<-list(c(1,2,1,2),c(1,4),c(1,4), c(1,3),c(1,3))

list.of.linetype.vals<-list(c(4,1,4,1),c(1,2),c(1,2),c(4,1),c(4,1))
z.long.exgr.fddva.dva.wiod<-z.long.exgr.fddva.dva.wiod %>% mutate(.,Variable=fct_recode(Variable,BVAX="DVA",FVAX="DViX"))
# Variable was recoded to BVAX/FVAX on the previous line, so filter on the new level names
dat.m.wiod.bel.deu<-filter(z.long.exgr.fddva.dva.wiod,Variable %in% c("BVAX","FVAX"))
dat.m.wiod.2.bel<-filter(z.long.exgr.fddva.dva.wiod,COU %in% "BEL") %>%filter(.,Variable %in% c("EXGR","FVAX"))
dat.m.wiod.2.deu<-filter(z.long.exgr.fddva.dva.wiod,COU %in% "DEU")%>%filter(.,Variable %in% c("EXGR","FVAX"))
dat.m.wiod.1.bel<-filter(z.long.exgr.fddva.dva.wiod,COU %in% "BEL")%>%filter(.,Variable %in% c("EXGR","BVAX"))
dat.m.wiod.1.deu<-filter(z.long.exgr.fddva.dva.wiod,COU %in% "DEU")%>%filter(.,Variable %in% c("EXGR","BVAX"))
```

```{r prep plots2, message=FALSE, warning=FALSE, cache=T,include=F}
list.of.df<-list(dat.m.wiod.2.bel,dat.m.wiod.2.deu,dat.m.wiod.1.bel,dat.m.wiod.1.deu)

list.of.titles<-list("Structural RCA Belgium WiOD","Structural RCA Germany WiOD","Structural RCA Belgium WiOD","Structural RCA Germany WiOD")

#list.of.titles<-c(rep("",4))

list.of.shape.vals<-list(c(4,1),c(4,1), c(3,1),c(3,1))

list.of.linetype.vals<-list(c(2,1),c(2,1),c(4,1),c(4,1))


colors2<-list(c("#e41a1c","#377eb8"),c("#e41a1c","#377eb8"),
              c("#4daf4a","#e41a1c"),c("#4daf4a","#e41a1c"))
list.of.plots<-pmap(list(df=list.of.df,title=list.of.titles,shape.val=list.of.shape.vals,linetype.vals=list.of.linetype.vals,color=colors2),.f=my.ggplot2)

#Country pair BVAX EXGR
p1.2.1<-(list.of.plots[[4]])
p1.2.2<-(list.of.plots[[3]])
#Country pair FVAX EXGR
p.3.2.1<-list.of.plots[[1]]
p.3.2.2<-list.of.plots[[2]]
```

```{r cor rca wiod, message=FALSE, warning=FALSE, cache=T}
####################################################################COR RCA WIOD ####################################################################
small.res<-list()
results.norm<-list(exgr,dvix,dva)
exgr.dvix<-dplyr::inner_join(exgr,dvix,by=c("Country","Industry"))
exgr.dva<-dplyr::inner_join(exgr,dva,by=c("Country","Industry"))
dvix.dva<-dplyr::inner_join(dvix,dva,by=c("Country","Industry"))
cor.dfs.RCA<-list(exgr.dvix,exgr.dva,dvix.dva)
names(cor.dfs.RCA)<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")
names.cors<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")
names(results.norm)<-c("EXGR","DViX","DVA")
vec<-names(results.norm)
vals.res<-map(results.norm,.f=function(x){ names(x)[3] 
})
keys.res<-(map(results.norm,.f=function(x){ names(x)[2]}))
results.list.matrix<-list()
#try.func<-function(data,key_col,value_col){data %>% spread_(.,key_col,value_col)} %>%dplyr::select_(., .dots=list(quote(-AtB), quote(-C),quote(-L),quote(-M), quote(-N),quote(-O),quote(-P)  ))}

results.norm<-pmap(list(data=results.norm, key_col=keys.res, value_col=vals.res),.f=function(data,key_col,value_col){data %>% spread_(.,key_col,value_col) %>%dplyr::select_(., .dots=list(quote(-AtB), quote(-C),quote(-L),quote(-M), quote(-N),quote(-O),quote(-P)  ))})
```

In this section, I assess how far the structural RCA ranking obtained on the basis of EXGR diverges from the rankings obtained on the basis of FVAT and BVAT.
To compare the RCA rankings, I measure the strength of association between the ranking obtained on the basis of EXGR and the ranking obtained on the basis of VAT.
Specifically, I rank the sectors within each country by their RCA and compute Spearman's $\rho$ and Kendall's $\tau$ between the EXGR ranking and the VAT ranking.
As stated earlier, the results are robust to the specific value of $\theta$.  <p>
The structure is as follows.
In the first subsection, I compare the pattern of RCA on the basis of BVAT and on the basis of EXGR.
In the second subsection, I compare the pattern of RCA on the basis of FVAT and EXGR, and I analyse whether the degree of divergence between the RCA ranking on the basis of FVAT and on the basis of EXGR relates to a country's level of development. 
In the third subsection, I check the robustness of the results by comparing the strength of association of the rankings on the basis of the WiOD data to the strength of association of the rankings on the basis of the TiVA data.
Further, I analyse whether the differences between the degree of divergence of the RCA rankings between the two data sources are linked to a country's level of development.
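
The within-country comparison described above can be illustrated with a minimal sketch. The data frame `rca` and its values are hypothetical and serve only to show how the two rank correlation coefficients are obtained for a single country; they are not taken from the WiOD or TiVA results.

```r
# Hypothetical RCA values for one country across six sectors (illustrative only)
rca <- data.frame(
  Industry = c("15t16", "17t18", "19", "20", "21t22", "23"),
  z_EXGR   = c(1.10, 0.95, 0.90, 0.98, 1.02, 1.15),  # RCA based on gross exports
  z_VAT    = c(1.04, 0.99, 0.97, 0.96, 1.01, 1.08)   # RCA based on value-added trade
)

# Both coefficients depend only on the ranking of sectors, not on the RCA levels
rho <- cor(rca$z_EXGR, rca$z_VAT, method = "spearman")
tau <- cor(rca$z_EXGR, rca$z_VAT, method = "kendall")

round(c(spearman = rho, kendall = tau), 3)
```

In this toy example two of the fifteen sector pairs are ordered differently by the two measures, so Kendall's $\tau$ is lower than Spearman's $\rho$, which is the usual relation between the two coefficients for rankings that agree closely.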

### Strength of association between EXGR and BVAT

I start by illustrating the strength of association between RCA constructed on the basis of BVAT and on the basis of EXGR, using Germany and Belgium as an example.^[Following the approach of @leromain2014, I normalize the RCA as follows: $RCA_{i}^{k}=\frac{ z^k_i \cdot \bar{z} }{\bar{z}_i \cdot \bar{z}^k}$, where $\bar{z}$ denotes the grand mean, $\bar{z}^k$ denotes the sector-specific mean and $\bar{z}_i$ denotes the country-specific mean.]
In the process, I characterize the pattern of comparative advantage for these two countries.
The measure of RCA has the following interpretation: 
a value above (below) one indicates that a country has a comparative (dis)advantage in a sector compared to the other countries. 
Specifically, an RCA larger than one indicates that the country-sector productivity scaled by the sample mean is larger than the expected value for the country-sector, which is the product of the country mean and the sector mean [@leromain2014]. <p>
The first insight from figure 2.1 is that the pattern of RCA on the basis of BVAT closely traces the pattern of RCA on the basis of EXGR for both countries. 
The pattern of comparative advantage is as follows: 
Germany has a comparative advantage  in all manufacturing sectors except for 'fuel products' (23).
Belgium has a comparative advantage in the following nine manufacturing sectors: 'food products' (15-16), 'paper and publishing products' (21-22), 'fuel products' (23), 'chemical products' (24), 'rubber and plastics products' (25), 'other non-metallic mineral products' (26), 'basic metals and metal products' (27-28), 'machinery and equipment' (29), and 'transport equipment' (34-35).^[The numbers in the brackets refer to the corresponding chapters of the sectors according to the ISIC Rev. 3.1 classification.]
Comparing the pattern of RCA of Belgium relative to Germany across sectors, the figure shows that Belgium has a comparative advantage relative to Germany in the following three sectors: 'food products' (15-16), 'fuel products' (23) and 'chemical products' (24). 
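
The normalization in the footnote above can be reproduced in a few lines. The sketch below assumes a hypothetical 2 x 2 matrix `z` of country-sector productivity measures (rows are countries, columns are sectors); the numbers are made up for illustration and do not come from the estimation results.

```r
# Hypothetical productivity measures: rows = countries (i), columns = sectors (k)
z <- matrix(c(2, 4,
              1, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("s1", "s2")))

z_bar   <- mean(z)      # grand mean
z_bar_i <- rowMeans(z)  # country-specific means
z_bar_k <- colMeans(z)  # sector-specific means

# RCA_i^k = (z_i^k * z_bar) / (z_bar_i * z_bar^k)
rca <- z * z_bar / outer(z_bar_i, z_bar_k)
round(rca, 3)
```

Values above one mark the country-sector pairs in which productivity exceeds what the country mean and the sector mean alone would predict. In this example country A has a comparative advantage in sector s1 and country B in sector s2.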

```{r fig-7 Country pair RCA: BVAT and EXGR, echo=F, fig.align='center', fig.cap="Figure 2.1: Country pair RCA: BVAT and EXGR", message=FALSE, warning=FALSE}
grid.arrange(p1.2.1$plot,p1.2.2$plot,ncol=2)
```

Next, in figure 2.2 I summarize the degree of divergence between the pattern of RCA on the basis of BVAT and on the basis of EXGR for the full set of countries. 
The two rank correlation measures give similar results for the complete set of countries.
Hence, in figure 2.2 I present only the results for Kendall's $\tau$.
The strength of the association between the EXGR and the BVAT rankings is high. 
Even the countries with the lowest strength of association, e.g. Hungary (0.85) and the Slovak Republic (0.89), show high coefficients. 
Further, the countries with the highest strength of association, e.g. Germany (0.98), Finland (0.97) and China (0.97), show coefficients close to 1.

```{r figure 2.2 Strength of association: BVAT EXGR RCA, cache=T,include=F,echo=F}
##function to calculate corr coefficients between two data frames in one list
results.norm<-list(exgr,dvix,dva)
exgr.dvix<-dplyr::inner_join(exgr,dvix,by=c("Country","Industry"))
exgr.dva<-dplyr::inner_join(exgr,dva,by=c("Country","Industry"))
dvix.dva<-dplyr::inner_join(dvix,dva,by=c("Country","Industry"))
cor.dfs.RCA<-list(exgr.dvix,exgr.dva,dvix.dva)
names(cor.dfs.RCA)<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")
names.cors<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")
names(results.norm)<-c("EXGR","DViX","DVA")
vec<-names(results.norm)
vals.res<-map(results.norm,.f=function(x){ names(x)[3] 
})
keys.res<-(map(results.norm,.f=function(x){ names(x)[2]}))

results.norm<-pmap(list(data=results.norm, key_col=keys.res, value_col=vals.res),.f=function(data,key_col,value_col){data %>% spread_(.,key_col,value_col) %>%dplyr::select_(., .dots=list(quote(-AtB), quote(-C),quote(-L),quote(-M), quote(-N),quote(-O),quote(-P)  ))})
##try different approach to compute RCA cor
cor.dfs.RCA<-map(cor.dfs.RCA,.f=function(x){
  map(unique(x$Country),function(filter_country){
  condition <- lazyeval::interp(~ y == a, y=as.name("Country"), a=filter_country)  
  c<-filter_(x, condition) 
  })})
names(cor.dfs.RCA)<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")
cor.dfs.RCA.safe<-cor.dfs.RCA
cor.dfs.RCA<-map(cor.dfs.RCA.safe,.f=function(x){ 
   names.country<-map_chr(x, .f=function(df){ 
     unique(df$Country)
     })
   names(x)<-names.country
   x
})


bootstr.ci<-function(dataframe,method.name,nmr=10000,colname1="z_EXGR_norm",colname2="z_DViX_Fsr_norm") { 
  
  f <- function(data, indices){
    d <- data[indices,] # allows boot to dplyr::select sample 
    cor(d[[colname1]], d[[colname2]],method=method.name)
  }
  x<-as.data.frame(dataframe)
  bootcorr<- boot::boot(x, f, R=nmr)
  boot::boot.ci(bootcorr, type =c("bca","perc"))
}

#kendall ci adapted from package NSM3 from the book Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
kendall.ci<-function (x = NULL, y = NULL, alpha = 0.05, type = "t") {
  continue <- T
  if (is.null(x) | is.null(y)) {
    cat("\n")
    cat("You must supply an x sample and a y sample!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) != length(y))) {
    cat("\n")
    cat("Samples must be of the same length!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) <= 1)) {
    cat("\n")
    cat("Sample size n must be at least two!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (type != "t" & type != "l" & type != "u")) {
    cat("\n")
    cat("Argument \"type\" must be one of \"t\" (two-sided), \"l\" (lower) or \"u\" (upper)!", 
        "\n")
    cat("\n")
    continue <- F
  }
  Q <- function(i, j) {
    Q.ij <- 0
    ij <- (j[2] - i[2]) * (j[1] - i[1])
    if (ij > 0) 
      Q.ij <- 1
    if (ij < 0) 
      Q.ij <- -1
    Q.ij
  }
  C.i <- function(x, y, i) {
    C.i <- 0
    for (k in 1:length(x)) if (k != i) 
      C.i <- C.i + Q(c(x[i], y[i]), c(x[k], y[k]))
    C.i
  }
  if (continue) {
    c.i <- numeric(0)
    n <- length(x)
    for (i in 1:n) c.i <- c(c.i, C.i(x, y, i))
    options(warn = -1)
    tau.hat <- cor.test(x, y, method = "k")$estimate
    options(warn = 0)
    sigma.hat.2 <- 2 * (n - 2) * var(c.i)/n/(n - 1)
    sigma.hat.2 <- sigma.hat.2 + 1 - (tau.hat)^2
    sigma.hat.2 <- sigma.hat.2 * 2/n/(n - 1)
    if (type == "t") 
      z <- qnorm(alpha/2, lower.tail = F)
    if (type != "t") 
      z <- qnorm(alpha, lower.tail = F)
    tau.L <- tau.hat - z * sqrt(sigma.hat.2)
    tau.U <- tau.hat + z * sqrt(sigma.hat.2)
    if (type == "l") 
      tau.U <- 1
    if (type == "u") 
      tau.L <- -1
  }
  tau<- c(tau.L,tau.U)
  #Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
  names(tau)<-c("ci.min.kend.hwc","ci.max.kend.hwc")
  tau
}
cor.r.res.RCA.EXGR.FVAX<-map(cor.dfs.RCA$EXGR.FVAX,.f=function(df,nmr){
    a<-df$z_EXGR_norm
    b<-df$z_DViX_Fsr_norm
    res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
    names(res)<-"spearman"
    res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
    names(res.2)<-"kendall"
    res.spear.kend<-c(res,res.2)
    
    # #Fieller (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
    # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = 10000)
    # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
    # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
    # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
    # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
    # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
    # 
    # ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = 10000)
    # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
    # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
    # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
    # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
    # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
    # 
    # ci.bs<-c(ci.spear,ci.kend)
    z.spear <- psych::fisherz(res)
    z.kend <- psych::fisherz(res.2)
    n<-length(df$Industry)
    
    zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
    #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
    
    zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
    
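    # Fieller et al. (1957): Var(z) is 0.437/(n - 4) for Kendall's tau; 1.06/(n - 3) applies to Spearman's rho
    zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(0.437/(n - 4)) 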
    
    rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
    rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
    
    names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
    names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
    rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
    
    rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
    rbwbounds.kend <- kendall.ci(a,b)
    #Abdi 
    #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
    
    names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
    #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
    
    rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
    #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
    #names(ci.kend)<-
    res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
    res.f$COU<-rownames(res.spear.kend)
   # colnames(res.f)<-z
    res.f
  },nmr=10^4)
cor.r.res.RCA.EXGR.FVAX<-map(cor.r.res.RCA.EXGR.FVAX,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.r.res.RCA.EXGR.FVAX.df<-data.table::rbindlist(cor.r.res.RCA.EXGR.FVAX, use.names=TRUE, fill=TRUE,idcol = "COU")

cor.r.res.RCA.EXGR.BVAX<-map(cor.dfs.RCA$EXGR.BVAX,.f=function(df,nmr,colname1="z_EXGR_norm",colname2="z_DVA_norm"){
  a<-df$z_EXGR_norm
  b<-df$z_DVA_norm
  res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # #Fieller (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
  # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = nmr,colname1 =colname1 ,colname2 = colname2)
  # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  # 
  # ci.kend.bs<-bootstr.ci(df, method.name = "kendall", nmr = nmr,colname1 =colname1 ,colname2 = colname2)
  # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  # 
  # ci.bs<-c(ci.spear,ci.kend)
  z.spear <- psych::fisherz(res)
  z.kend <- psych::fisherz(res.2)
  n<-length(df$Industry)
  
  zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
  #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
  
  zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  
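  # Fieller et al. (1957): Var(z) is 0.437/(n - 4) for Kendall's tau; 1.06/(n - 3) applies to Spearman's rho
  zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(0.437/(n - 4)) 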
  
  rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
  rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
  
  names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
  names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
  rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
  
  rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
  rbwbounds.kend <- kendall.ci(a,b)
  #Abdi 
  #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
  
  names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
  #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
  
  rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
  #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
  #names(ci.kend)<-
  res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
  res.f$COU<-rownames(res.spear.kend)
  # colnames(res.f)<-z
  res.f
},nmr=10^3)
cor.r.res.RCA.EXGR.BVAX<-map(cor.r.res.RCA.EXGR.BVAX,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.r.res.RCA.EXGR.BVAX.df<-data.table::rbindlist(cor.r.res.RCA.EXGR.BVAX, use.names=TRUE, fill=TRUE,idcol = "COU")
#names(small.res)<-c("cor.EXGR.FVAX","cor.EXGR.BVAX","cor.DVA.FVAX")
kendall_exgr_bvax_wiod<-spread(cor.r.res.RCA.EXGR.BVAX.df,VAR,VALUE)
kendall_exgr_bvax_wiod$above.avg<-ifelse(kendall_exgr_bvax_wiod$kendall>= mean(kendall_exgr_bvax_wiod$kendall),1,0)
kendall_exgr_bvax_wiod$above.avg<-as.factor(kendall_exgr_bvax_wiod$above.avg)
```

```{r plot figure 2.2,fig.align='center',fig.cap='Figure 2.2: Strength of association: BVAT EXGR RCA'}
ggplot(kendall_exgr_bvax_wiod, aes(x=reorder(COU,-kendall), y=kendall,fill=above.avg)) +
#geom_point()+
    geom_bar(stat="identity", position="dodge", colour="black") +
  scale_fill_grey(start=0.4,end=0.8)+
  labs(x="Country",y=expression(paste("Kendall's ",tau)))+
 # scale_y_continuous(breaks=c(seq(0, 1,0.1)),minor_breaks=c(seq(0.05, 1,0.1))) +
  coord_cartesian(ylim = c(0.5, 1.0))+
  ggtitle(expression(atop("Association of structural RCA EXGR and BVAX")))+
  theme_bw()+
  theme( plot.title = element_text(hjust=0.5,vjust=0.5),
         legend.position = "none", # legend location in graph
         #panel.grid.minor = element_blank(),
         axis.title=element_text( size="11"),
         axis.text.x=element_text(angle = 90, size=10))+
  geom_hline(yintercept=0.94)+
annotate("text",x=39,y=0.92,label="avg.")
```

### RCA on the basis of FVAT and EXGR

Next, I assess the strength of association between the ranking of RCA on the basis of FVAT and the ranking of RCA on the basis of EXGR. 
As in the previous subsection, I first illustrate the main result using Belgium and Germany. 
I then discuss the results for the full set of countries. <p>
For all sectors in Germany except 'fuel products' (23), deviations of the RCA from the mean are more pronounced on the basis of EXGR than on the basis of FVAT. 
Similarly, for most sectors in Belgium deviations of the RCA from the mean are more pronounced on the basis of EXGR than on the basis of FVAT.
However, for Belgium I observe the opposite pattern in the following sectors: 'wood products' (20), 'paper and publishing products' (21-22) and 'machinery and equipment' (29). <p>
For Germany, constructing the RCA on the basis of FVAT instead of EXGR changes the status of four sectors from a comparative advantage (CA) to a comparative disadvantage: 'food products' (15-16), 'textiles and textile products' (17-18), 'leather products' (19) and 'wood products' (20).
For Belgium, the pattern of RCA is the same for both measures.
I interpret the results as follows. 
The strength of the German supply chain plays a role in determining the pattern of CA, while for Belgium it is the relative efficiency of domestic production factors that determines the pattern of CA.
<p>
Looking at the pattern of CA of Belgium in terms of forward value-added trade relative to Germany across sectors, I find the following:
Belgium has a higher comparative advantage in the following sectors: 'food products' (15-16), 'textiles and textile products' (17-18), 'fuel products' (23), 'chemical products' (24) and 'other non-metallic mineral products' (26). 
In terms of EXGR, Belgium has a CA relative to Germany in the same sectors except for 'textiles and textile products' (17-18).
```{r Country pair RCA: EXGR and FVAT RCA, echo=FALSE, fig.align='center', fig.cap='Figure 2.3: Country pair RCA: EXGR and FVAT RCA'}
grid.arrange(p.3.2.1$plot,p.3.2.2$plot,name="Country pair RCA: EXGR and FVAT RCA",ncol=2)
```

Figure 2.4 shows the strength of association between the RCA rankings constructed on the basis of FVAT and on the basis of EXGR.
In the left panel of figure 2.4 the strength of the association between the rankings is measured with Kendall's $\tau$  and in the right panel with Spearman's $\rho$. 
The strength of association is highlighted in the graph by a gray color coding. 
Specifically, I use light gray for a strength of association above average, medium gray for a strength of association within the 95 percent asymptotic confidence interval of the mean, and dark gray for a strength of association significantly below average. <p>
Overall, both panels highlight two important differences compared to figure 2.2. 
First, the average strength of association is lower.
Specifically, the average strength of association is 0.80 on the basis of Spearman's $\rho$ and 0.62 on the basis of Kendall's $\tau$. 
Second, the strength of association shows a larger range of values. 
Specifically, the strength of association on the basis of Spearman's $\rho$ (Kendall's $\tau$) ranges from 0.51 (0.40) for Germany to 0.96 (0.86) for Ireland. <p>
I interpret the results as follows. For countries that show a low strength of association, the pattern of CA obtained on the basis of EXGR is strongly determined by the domestic supply chain. For countries that show a high strength of association, the pattern of trade is strongly determined by the efficiency of sector-specific domestic production factors.
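
The color coding around the confidence interval of the mean can be sketched as follows. The vector `tau` holds hypothetical coefficients for eight countries, and base R `cut()` is used here as a simple stand-in for the binning step; it is an illustration under these assumptions, not the code that produces figure 2.4.

```r
# Hypothetical Kendall's tau values for eight countries (illustrative only)
tau <- c(0.40, 0.48, 0.55, 0.60, 0.64, 0.70, 0.78, 0.86)

# 95 percent confidence interval of the cross-country mean,
# obtained from an intercept-only regression
ci <- confint(lm(tau ~ 1))  # 1 x 2 matrix: lower and upper bound

# Three categories: below the interval, inside it, above it
category <- cut(tau,
                breaks = c(-Inf, ci[1], ci[2], Inf),
                labels = c("below average", "within CI of mean", "above average"),
                right  = FALSE)

table(category)
```

Countries in the middle category cannot be distinguished from the average at the 95 percent level, which is why only the two outer categories are read as a markedly low or high strength of association.
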
```{r fig 2.4 Strength of association: EXGR and FVAT - RCA, cache=T, message=FALSE, warning=FALSE, include=FALSE}
plot.wiod.RCA.EXGR.FVAX.df<-spread(cor.r.res.RCA.EXGR.FVAX.df,VAR,VALUE)
kend.cats.low.up<-confint(lm(plot.wiod.RCA.EXGR.FVAX.df$kendall~1))
#categorize kendall middle interval ci around mean
plot.wiod.RCA.EXGR.FVAX.df$kend.cat3<-Hmisc::cut2(plot.wiod.RCA.EXGR.FVAX.df$kendall,cuts=c(round(kend.cats.low.up[1],2),round(kend.cats.low.up[2],2),0.81),digits = 2,g=4)  
#reverse factor
plot.wiod.RCA.EXGR.FVAX.df$kend.cat3<-forcats::fct_rev(plot.wiod.RCA.EXGR.FVAX.df$kend.cat3)
# #reverse ordering within fct labels
# levels(plot.wiod.RCA.EXGR.FVAX.df$kend.cat3)<-map_chr(seq(1,3),.f=function(n){paste(paste0("[",stringi::stri_sub(str=levels(plot.wiod.RCA.EXGR.FVAX.df$kend.cat3)[n],from=8,to=12)),paste0(stringi::stri_sub(str=levels(plot.wiod.RCA.EXGR.FVAX.df$kend.cat3)[n],from=2,to=6),")"),sep=",")})
#pdf(file = "kendall_exgr_fvax_wiod.pdf")

#tikz(file = "kendall_exgr_fvax_wiod.tex",width=7,height=7)
p.kendall.exgr.fvax<-ggplot(plot.wiod.RCA.EXGR.FVAX.df, aes(x=reorder(COU,-kendall), y=kendall,fill = kend.cat3)) + 
  geom_bar(stat="identity", position="dodge", color="black") +
  scale_fill_manual(name=expression(tau),values=c(  "#f0f0f0","#bdbdbd","#636363"))+
  coord_cartesian(ylim=c(0,1.01))+
  scale_y_continuous(breaks=c(seq(0,1,0.1)))+
  labs(x="",y=expression(paste("Kendall's ", tau))) +
  #ggtitle(expression(atop("Association of structural RCA EXGR and FVAT")))+
  theme_bw()+
  theme(legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 90, size=9))+
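  geom_hline(yintercept=mean(plot.wiod.RCA.EXGR.FVAX.df$kendall,na.rm=TRUE)) + # cross-country mean, computed instead of hard-coded
annotate("text",x=39,y=mean(plot.wiod.RCA.EXGR.FVAX.df$kendall,na.rm=TRUE)-0.02, label ="avg.")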
# make a categorical var from orginal data

spear.cats.low.up<-confint(lm(plot.wiod.RCA.EXGR.FVAX.df$spearman~1))
#categorize spearman var, 3 levels, 2.interval ci around mean
plot.wiod.RCA.EXGR.FVAX.df$spear.cat3<-Hmisc::cut2(plot.wiod.RCA.EXGR.FVAX.df$spearman,cuts=c(round(spear.cats.low.up[1],2),round(spear.cats.low.up[2],2)),0.94,digits=2)  
#reverse factor
plot.wiod.RCA.EXGR.FVAX.df$spear.cat3<-forcats::fct_rev(plot.wiod.RCA.EXGR.FVAX.df$spear.cat3)
# #reverse fct labels
# levels(plot.wiod.RCA.EXGR.FVAX.df$spear.cat3)<-map_chr(seq(1,3),.f=function(n){paste(paste0("[",stringi::stri_sub(str=levels(plot.wiod.RCA.EXGR.FVAX.df$spear.cat3)[n],from=8,to=12)),paste0(stringi::stri_sub(str=levels(plot.wiod.RCA.EXGR.FVAX.df$spear.cat3)[n],from=2,to=6),")"),sep=",")})

p.spear.exgr.fvax<-ggplot(plot.wiod.RCA.EXGR.FVAX.df, aes(x=reorder(COU,-spearman), y=spearman,fill=spear.cat3)) + 
  geom_bar(stat="identity", position="dodge", color="black") +
  theme_bw()+
  scale_fill_manual(name=expression(rho),values=c(  "#f0f0f0","#bdbdbd","#636363"))+
  coord_cartesian(ylim=c(0,1.01))+
  scale_y_continuous(breaks=c(seq(0,1,0.1)))+
  labs(x="", y=expression(paste("Spearman's ", rho))) +
  theme( plot.title = element_text(hjust=0.5,vjust=0.5),
         legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 90, size=9))+
  geom_hline(yintercept=0.78) + # mean
  annotate("text",x=39,y=0.76, label ="avg.")
```
```{r fig Strength of association: EXGR FVAT RCA,fig.align='center',fig.cap='Figure 2.4: Strength of association: EXGR & FVAT RCA'}
#Strength of association: EXGR FVAT RCA
grid.arrange(p.kendall.exgr.fvax,p.spear.exgr.fvax,ncol=2)
```
I conclude that the association between the RCA rankings based on FVAT and on EXGR is substantially weaker than the association between the RCA rankings based on BVAT and on EXGR. 
The RCA ranking obtained on the basis of the factor content of trade is substantially different from the ranking obtained on the basis of EXGR.<p>
I check whether the variation across countries in the strength of association between the RCA rankings obtained on the basis of FVAT and of EXGR can be attributed to differences in GDP per capita, measured in constant 2005 US dollars. 
That is, I test the hypothesis that differences in the strength of association are connected to a country's level of development.<p>
```{r help functions, cache=T, message=FALSE, warning=FALSE, include=FALSE}
bootstr.ci<-function(dataframe,method.name,nmr=10000,colname1="z_EXGR_norm",colname2="z_DViX_Fsr_norm") { 
  
  f <- function(data, indices){
    d <- data[indices,] # rows of the bootstrap resample selected by boot
    cor(d[[colname1]], d[[colname2]],method=method.name)
  }
  x<-as.data.frame(dataframe)
  bootcorr<- boot::boot(x, f, R=nmr)
  boot::boot.ci(bootcorr, type =c("bca","perc"))
}

#kendall ci adapted from package NSM3 from the book Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
kendall.ci<-function (x = NULL, y = NULL, alpha = 0.05, type = "t") {
  continue <- T
  if (is.null(x) | is.null(y)) {
    cat("\n")
    cat("You must supply an x sample and a y sample!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) != length(y))) {
    cat("\n")
    cat("Samples must be of the same length!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) <= 1)) {
    cat("\n")
    cat("Sample size n must be at least two!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (type != "t" & type != "l" & type != "u")) {
    cat("\n")
    cat("Argument \"type\" must be one of \"t\" (two-sided), \"l\" (lower) or \"u\" (upper)!", 
        "\n")
    cat("\n")
    continue <- F
  }
  Q <- function(i, j) {
    Q.ij <- 0
    ij <- (j[2] - i[2]) * (j[1] - i[1])
    if (ij > 0) 
      Q.ij <- 1
    if (ij < 0) 
      Q.ij <- -1
    Q.ij
  }
  C.i <- function(x, y, i) {
    C.i <- 0
    for (k in 1:length(x)) if (k != i) 
      C.i <- C.i + Q(c(x[i], y[i]), c(x[k], y[k]))
    C.i
  }
  if (continue) {
    c.i <- numeric(0)
    n <- length(x)
    for (i in 1:n) c.i <- c(c.i, C.i(x, y, i))
    # suppress the tie warning locally instead of resetting the global warn option
    tau.hat <- suppressWarnings(cor.test(x, y, method = "kendall")$estimate)
    sigma.hat.2 <- 2 * (n - 2) * var(c.i)/n/(n - 1)
    sigma.hat.2 <- sigma.hat.2 + 1 - (tau.hat)^2
    sigma.hat.2 <- sigma.hat.2 * 2/n/(n - 1)
    if (type == "t") 
      z <- qnorm(alpha/2, lower.tail = F)
    if (type != "t") 
      z <- qnorm(alpha, lower.tail = F)
    tau.L <- tau.hat - z * sqrt(sigma.hat.2)
    tau.U <- tau.hat + z * sqrt(sigma.hat.2)
    if (type == "l") 
      tau.U <- 1
    if (type == "u") 
      tau.L <- -1
  }
  tau<- c(tau.L,tau.U)
  #Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
  names(tau)<-c("ci.min.kend.hwc","ci.max.kend.hwc")
  tau
}
```
```{r Strength of association and GDP per capita, cache=T, message=FALSE, warning=FALSE,echo=F}
######load tiva STATA results and create correlation
library(haven)
library(boot)
results.raw<-read_dta('results_end_march_centr_ricardo.dta', encoding = "UTF-8")
results.raw$IND<-gsub("T","-", results.raw$IND)
setwd("/Users/sergej/Google Drive/Send_liza/")
load("z_ik_tiva.Rdata")

variables<-c("EXGR","EXGR_DVA","FFD_DVA")
#function to normalize z.ik to specific country and industry

#second normalization - normalize z.ik w.r.t. the grand average and the industry and country averages,
#close to the Balassa index - interpretation: norm z.ik > 1 means the country has a comparative advantage (CA) in the industry,
# below 1 a comparative disadvantage
z.ik.tiva.norm<-lapply( variables, function(data,var,col1,col2){
  varname <- paste("l", var , sep="_")
  select_variables<-c(col1,col2,varname)

  var2<-paste("z",var, sep="_")
  var.i<-paste(var2,col1, sep="_")
  var.k<-paste(var2,col2, sep="_")
  var.m<-paste(var2,"m", sep="_")


    #reduce to necessary columns
  d2<-select_(data, varname, col1, col2)
  mutate_call = lazyeval::interp(~ mean(a), a = as.name(varname))
results_m<- data  %>% select_( .dots=select_variables )  %>% transmute_(.dots=setNames(list( mutate_call),var.m))
results_m<-cbind(  results_m ,d2)

results_std_i<- data %>% select_( .dots=select_variables ) %>% group_by_(.dots=col1)   %>% mutate_(.dots=setNames(list( mutate_call),var.i) ) %>% select_( .dots=c(col2,var.i))

results_std_tr  <-dplyr::inner_join( results_std_i,results_m ,by=c(col1,col2))

results_std_k2<- data %>% select_( .dots=select_variables) %>%  group_by_(.dots=col2)  %>% mutate_(.dots=setNames( list(mutate_call),var.k) ) %>% select_( .dots=c(col1,var.k)) %>% dplyr::inner_join( results_std_tr, by=c(col1,col2))

varval2 <- lazyeval::interp(~ var1* var2 / (var3 * var4),.values= list( var1=as.name(varname) , var2=as.name(var.m), var3=as.name(var.i), var4=as.name(var.k)))

mutate_fn <- function(d_in,  varval,  varname_norm){
  d_out = d_in %>%
    mutate_(.dots = setNames(  varval,  varname_norm))
}

d_in=results_std_k2
varname2_norm <- paste(var2,"norm", sep="_")
data2.norm = mutate_fn(d_in,varval2,varname2_norm )
n.2<-length(data2.norm)
names(data2.norm)[n.2]<-varname2_norm
data2.norm <-select_(  data2.norm,.dots=c(col1,col2,varname2_norm))
},data=res.z.exgr.dva.fvax.tiva,  col1="Country",col2="Industry")
z.ik.tiva.norm<-map(z.ik.tiva.norm,.f=function(x){ colnames(x)[1:2]<-c("COU","IND") 
x})
#subset and create df for each measure

tiva.exgr.bvax.zik.R<-dplyr::inner_join(z.ik.tiva.norm[[1]],z.ik.tiva.norm[[2]],by=c("COU","IND"))
tiva.exgr.fvax.zik.R<-dplyr::inner_join(z.ik.tiva.norm[[1]],z.ik.tiva.norm[[3]],by=c("COU","IND"))
tiva.bvax.fvax.zik.R<-dplyr::inner_join(z.ik.tiva.norm[[2]],z.ik.tiva.norm[[3]],by=c("COU","IND"))
#######R results############
cor.dfs.RCA.tiva.R<-list(tiva.exgr.fvax.zik.R,tiva.exgr.bvax.zik.R,tiva.bvax.fvax.zik.R)
#for every df in list subset a df with only one country
cor.dfs.RCA.tiva.R<-map(cor.dfs.RCA.tiva.R,.f=function(x){
  map(unique(x$COU),function(filter_country,df){
    df[df$COU%in%filter_country,]  
  },df=x)})
names(cor.dfs.RCA.tiva.R)<-c("EXGR.FVAX.tiva.R","EXGR.BVAX.tiva.R","FVAX.BVAX.tiva.R")

cor.dfs.RCA.tiva.R<-map(cor.dfs.RCA.tiva.R,.f=function(x){ 
  names.country<-map_chr(x, .f=function(df){ 
    unique(df$COU)
  })
  names(x)<-names.country
  x
})

##construct corr results for tiva with CI
cor.res.RCA.EXGR.FVAX.tiva.R<-map(cor.dfs.RCA.tiva.R$EXGR.FVAX.tiva.R,.f=function(df,nmr,colname1="z_EXGR_norm",colname2="z_FFD_DVA_norm"){
  a<-df[[colname1]]
  b<-df[[colname2]]
  res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # #Fieller et al. (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
  # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = nmr,colname1=colname1,colname2=colname2)
  # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  # 
  # ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = nmr,colname1=colname1,colname2=colname2)
  # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  # 
  # ci.bs<-c(ci.spear,ci.kend)
  z.spear <- psych::fisherz(res)
  z.kend <- psych::fisherz(res.2)
  n<-length(df$IND)
  
  zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
  #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
  
  zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  
  zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(0.437/(n - 4)) # Fieller et al. (1957) variance for Kendall's tau
  
  rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
  rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
  
  names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
  names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
  rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
  
  rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
  rbwbounds.kend <- kendall.ci(a,b)
  #Abdi 
  #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
  
  names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
  #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
  
  rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
  #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
  #names(ci.kend)<-
  res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
  res.f$COU<-rownames(res.spear.kend)
  # colnames(res.f)<-z
  res.f
},nmr=10^3)
cor.res.RCA.EXGR.FVAX.tiva.R<-map(cor.res.RCA.EXGR.FVAX.tiva.R,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.res.RCA.EXGR.FVAX.tiva.R.df<-data.table::rbindlist(cor.res.RCA.EXGR.FVAX.tiva.R, use.names=TRUE, fill=TRUE,idcol = "COU")

cor.res.RCA.EXGR.BVAX.tiva.R<-map(cor.dfs.RCA.tiva.R$EXGR.BVAX.tiva.R,.f=function(df,nmr,col1="z_EXGR_norm",col2="z_EXGR_DVA_norm"){
  a<-df[[col1]]
  b<-df[[col2]]
  res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # #Fieller et al. (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
  # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = nmr,colname1=col1,colname2=col2)
  # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  # 
  # ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = nmr,colname1=col1,colname2=col2)
  # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  # 
  # ci.bs<-c(ci.spear,ci.kend)
  # z.spear <- psych::fisherz(res)
  # z.kend <- psych::fisherz(res.2)
  # n<-length(df$IND)
  # 
  # zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
  # #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
  # 
  # zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  # 
  # zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  # 
  # rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
  # rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
  # 
  # names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
  # names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
  # rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
  # 
  # rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
  # rbwbounds.kend <- kendall.ci(a,b)
  # #Abdi 
  # #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
  # 
  # names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
  # #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
  # 
  # rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
  # #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
  # #names(ci.kend)<-
  res.f<-as.data.frame(res.spear.kend)
  res.f$COU<-rownames(res.spear.kend)
  # colnames(res.f)<-z
  res.f
},nmr=10^3)
cor.res.RCA.EXGR.BVAX.tiva.R<-map(cor.res.RCA.EXGR.BVAX.tiva.R,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.res.RCA.EXGR.BVAX.tiva.R.df<-data.table::rbindlist(cor.res.RCA.EXGR.BVAX.tiva.R, use.names=TRUE, fill=TRUE,idcol = "COU")
cor.res.RCA.EXGR.BVAX.tiva.R.wide<-cor.res.RCA.EXGR.BVAX.tiva.R.df%>% spread(VAR,VALUE)

######STATA Results#############
results.raw<-read_dta("results_end_march_centr_ricardo.dta", encoding = "UTF-8")
results.raw$IND<-gsub("T","-", results.raw$IND)

#subset and create df for each measure
norm.prod.va.diff.2005<-dplyr::select(results.raw, z_dva_std,COU,IND)
norm.prod.fd.va.diff.2005<-dplyr::select(results.raw, z_fddva_std,COU,IND)
norm.prod.diff.exgr.2005<-dplyr::select(results.raw, z_exgr_std,COU,IND)
tiva.exgr.bvax.zik<-dplyr::inner_join(norm.prod.diff.exgr.2005,norm.prod.va.diff.2005,by=c("COU","IND"))
tiva.exgr.fvax.zik<-dplyr::inner_join(norm.prod.diff.exgr.2005,norm.prod.fd.va.diff.2005,by=c("COU","IND"))
tiva.bvax.fvax.zik<-dplyr::inner_join(norm.prod.va.diff.2005,norm.prod.fd.va.diff.2005,by=c("COU","IND"))

cor.dfs.RCA.tiva<-list(tiva.exgr.fvax.zik,tiva.exgr.bvax.zik,tiva.bvax.fvax.zik)
cor.dfs.RCA.tiva<-map(cor.dfs.RCA.tiva,.f=function(x){
  map(unique(x$COU),function(filter_country){
    condition <- lazyeval::interp(~ y == a, y=as.name("COU"), a=filter_country)  
    c<-filter_(x, condition) 
  })})
names(cor.dfs.RCA.tiva)<-c("EXGR.FVAX","EXGR.BVAX","FVAX.BVAX")

cor.dfs.RCA.tiva<-map(cor.dfs.RCA.tiva,.f=function(x){ 
  names.country<-map_chr(x, .f=function(df){ 
    unique(df$COU)
  })
  names(x)<-names.country
  x
})

##construct corr results for tiva with CI
cor.res.RCA.EXGR.FVAX.tiva<-map(cor.dfs.RCA.tiva$EXGR.FVAX,.f=function(df,nmr,colname1="z_exgr_std",colname2="z_fddva_std"){
  a<-df[[colname1]]
  b<-df[[colname2]]
  res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # #Fieller et al. (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
  # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = nmr,colname1=colname1,colname2=colname2)
  # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  # 
  # ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = nmr,colname1=colname1,colname2=colname2)
  # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  # 
  # ci.bs<-c(ci.spear,ci.kend)
  z.spear <- psych::fisherz(res)
  z.kend <- psych::fisherz(res.2)
  n<-length(df$IND)
  
  zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
  #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
  
  zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  
  zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(0.437/(n - 4)) # Fieller et al. (1957) variance for Kendall's tau
  
  rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
  rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
  
  names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
  names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
  rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
  
  rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
  rbwbounds.kend <- kendall.ci(a,b)
  #Abdi 
  #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
  
  names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
  #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
  
  rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
  #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
  #names(ci.kend)<-
  res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
  res.f$COU<-rownames(res.spear.kend)
  # colnames(res.f)<-z
  res.f
},nmr=10^3)
cor.res.RCA.EXGR.FVAX.tiva<-map(cor.res.RCA.EXGR.FVAX.tiva,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.res.RCA.EXGR.FVAX.tiva.df<-data.table::rbindlist(cor.res.RCA.EXGR.FVAX.tiva, use.names=TRUE, fill=TRUE,idcol = "COU")

cor.res.RCA.EXGR.BVAX.tiva<-map(cor.dfs.RCA.tiva$EXGR.BVAX,.f=function(df,nmr){
  a<-df$z_exgr_std
  b<-df$z_dva_std
  res<-cor(a,b,method="spearman",use="pairwise.complete.obs")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall",use="pairwise.complete.obs")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # #Fieller et al. (1957) Spearman rank-order SE, Bonett and Wright’s (2000) SE and bootstrap ci 95% pc bca
  # ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = nmr,colname1="z_exgr_std",colname2="z_dva_std")
  # ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  # names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  # ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  # names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  # ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  # 
  # ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = nmr,colname1="z_exgr_std",colname2="z_dva_std")
  # ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  # names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  # ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  # names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  # ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  
  # ci.bs<-c(ci.spear,ci.kend)
  z.spear <- psych::fisherz(res)
  z.kend <- psych::fisherz(res.2)
  n<-length(df$IND)
  
  zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
  #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
  
  zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
  
  zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(0.437/(n - 4)) # Fieller et al. (1957) variance for Kendall's tau
  
  rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
  rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
  
  names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
  names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
  rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
  
  rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
  rbwbounds.kend <- kendall.ci(a,b)
  #Abdi 
  #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
  
  names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
  #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
  
  rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
  #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
  #names(ci.kend)<-
  res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
  res.f$COU<-rownames(res.spear.kend)
  # colnames(res.f)<-z
  res.f
},nmr=10^3)
cor.res.RCA.EXGR.BVAX.tiva<-map(cor.res.RCA.EXGR.BVAX.tiva,.f=function(x){
  x<-rownames_to_column(x,"VAR")
  colnames(x)<-c("VAR","VALUE")
  x})

cor.res.RCA.EXGR.BVAX.tiva.df<-data.table::rbindlist(cor.res.RCA.EXGR.BVAX.tiva, use.names=TRUE, fill=TRUE,idcol = "COU")
cor.res.RCA.EXGR.BVAX.tiva.wide<-cor.res.RCA.EXGR.BVAX.tiva.df%>% spread(VAR,VALUE)
```
```{r cor RCA on the basis of R regression, cache=T, include=FALSE, message=FALSE, warning=FALSE}
GDP.cntry<-readr::read_csv(file="/Users/Sergej/Google Drive/Send_liza/Original_Data/GDP_per_capita_World_Development_Indicators_Data.csv",na = "..")
names(GDP.cntry)<-gsub(" ", ".",names(GDP.cntry))
names(GDP.cntry)[5]<-"YR.2005"
GDP.cntry<-dplyr::select(GDP.cntry,Country.Code,YR.2005)
GDP.cntry$Country.Code<-as.character(GDP.cntry$Country.Code)

GDP.cntry <-dplyr::rename(GDP.cntry, GDP.per.capita.2005 = YR.2005, COU = Country.Code )
cor.res.RCA.EXGR.FVAX.tiva.df.sub<-filter(cor.res.RCA.EXGR.FVAX.tiva.df,VAR %in% c("spearman","kendall"))
cor.res.RCA.EXGR.FVAX.tiva.df.sub<-cor.res.RCA.EXGR.FVAX.tiva.df.sub%>%spread(.,VAR,VALUE)
cor.res.RCA.EXGR.FVAX.tiva.df.sub<-dplyr::rename(cor.res.RCA.EXGR.FVAX.tiva.df.sub,kendall.fvax.tiva=kendall,spearman.fvax.tiva=spearman)

cor.res.RCA.EXGR.FVAX.tiva.df.sub.R<-filter(cor.res.RCA.EXGR.FVAX.tiva.R.df,VAR %in% c("spearman","kendall"))
cor.res.RCA.EXGR.FVAX.tiva.df.sub.R<-cor.res.RCA.EXGR.FVAX.tiva.df.sub.R%>%spread(.,VAR,VALUE)
cor.res.RCA.EXGR.FVAX.tiva.df.sub.R<-dplyr::rename(cor.res.RCA.EXGR.FVAX.tiva.df.sub.R,kendall.fvax.tiva=kendall,spearman.fvax.tiva=spearman)
```
```{r data prep fig 2-6 2-7, message=F,warning=F,cache=T,echo=F, include=FALSE}
stata.r.fvax.gdp.fvax.RCA<-map(list(cor.res.RCA.EXGR.FVAX.tiva.df.sub,cor.res.RCA.EXGR.FVAX.tiva.df.sub.R),.f=function(x){
  y<-dplyr::select_(plot.wiod.RCA.EXGR.FVAX.df,"COU","kendall","spearman")
wiod.tiva.fvax.exgr.gdp<-dplyr::inner_join(x,y, by="COU")
wiod.tiva.fvax.exgr.gdp<-left_join(wiod.tiva.fvax.exgr.gdp,GDP.cntry, by="COU") 

wiod.fvax.gdp<-left_join(plot.wiod.RCA.EXGR.FVAX.df,GDP.cntry, by="COU") 

wiod.tiva.fvax.exgr.gdp<-dplyr::rename(wiod.tiva.fvax.exgr.gdp,"kendall.fvax.wiod"=kendall,"spearman.fvax.wiod"=spearman,"GDP.pc"=GDP.per.capita.2005)

wiod.tiva.fvax.exgr.gdp.kend<-dplyr::select_(wiod.tiva.fvax.exgr.gdp,"kendall.fvax.wiod","kendall.fvax.tiva","COU","GDP.pc")

wiod.tiva.fvax.exgr.gdp.kend<-gather_(wiod.tiva.fvax.exgr.gdp.kend,key_col="variable",value_col="kendall",gather_cols=c("kendall.fvax.wiod","kendall.fvax.tiva"))

wiod.tiva.fvax.exgr.gdp.spear<-dplyr::select_(wiod.tiva.fvax.exgr.gdp,"spearman.fvax.wiod","spearman.fvax.tiva","COU","GDP.pc")
wiod.tiva.fvax.exgr.gdp.spear<-gather_(wiod.tiva.fvax.exgr.gdp.spear,key_col="variable",value_col="spearman",gather_cols=c("spearman.fvax.wiod","spearman.fvax.tiva") )

wiod.tiva.fvax.exgr.gdp.spear.wide<- spread_(wiod.tiva.fvax.exgr.gdp.spear,key_col="variable", value_col="spearman")
wiod.tiva.fvax.exgr.gdp.spear.wide$dist.sq<-(wiod.tiva.fvax.exgr.gdp.spear.wide$spearman.fvax.wiod-wiod.tiva.fvax.exgr.gdp.spear.wide$spearman.fvax.tiva)^2
wiod.tiva.fvax.exgr.gdp.spear.wide$diff.wiod.tiva<-wiod.tiva.fvax.exgr.gdp.spear.wide$spearman.fvax.wiod-wiod.tiva.fvax.exgr.gdp.spear.wide$spearman.fvax.tiva

wiod.tiva.fvax.exgr.gdp.kend.wide<-spread_(wiod.tiva.fvax.exgr.gdp.kend,key_col="variable", value_col="kendall")
wiod.tiva.fvax.exgr.gdp.kend.wide$dist.sq<-(wiod.tiva.fvax.exgr.gdp.kend.wide$kendall.fvax.wiod-wiod.tiva.fvax.exgr.gdp.kend.wide$kendall.fvax.tiva)^2
wiod.tiva.fvax.exgr.gdp.kend.wide$diff.wiod.tiva<-(wiod.tiva.fvax.exgr.gdp.kend.wide$kendall.fvax.wiod-wiod.tiva.fvax.exgr.gdp.kend.wide$kendall.fvax.tiva)
res<-list(wiod.fvax.gdp,wiod.tiva.fvax.exgr.gdp.spear.wide,wiod.tiva.fvax.exgr.gdp.kend.wide)
names(res)<-c("fvax.gdp","RCA.FVAX.spear","RCA.FVAX.kend")
res
})
names(stata.r.fvax.gdp.fvax.RCA)<-c("Stata.RCA","R.RCA")
```

```{r WIOD STRENGTH OF ASSOCIATION GDP PER CAPITA, cache=T, message=FALSE, warning=FALSE, include=FALSE, echo=F}
####WIOD STRENGTH OF ASSOCIATION GDP PER CAPITA####
library(ggrepel)
p.wiod.1<-filter(stata.r.fvax.gdp.fvax.RCA$R.RCA$fvax.gdp,!is.na(GDP.per.capita.2005))
#tikz("wiod_spear_RCA_gdp.tex")
p.wiod.tiva.fvax.exgr.gdp.sp<-ggplot(p.wiod.1, aes(x=GDP.per.capita.2005/1000, y=spearman)) +
  coord_cartesian(ylim=c(0.35,1.0))+
  geom_point(size=0.8) +
  geom_text_repel(aes(label=COU))+
  geom_smooth(method=lm,   # Add linear regression line
              se=T,color="black", size=1.0)+
  scale_x_continuous(breaks=c(seq(0,80,10)))+
  scale_y_continuous( breaks=c(seq(0.3,1,0.1) )) +
  labs(x="GDP per capita (constant thousand 2005 US)",y=expression(paste("Spearman's ", rho)),title="")+
  theme_bw()+
  #  annotate("text", x = 55, y = 0.88, label = lm_eqn(lm(spearman.va.exgr ~ GDP.per.capita.2005, plot)),colour="black", size = 4, parse=TRUE)+
  theme( legend.position = "none", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="11"),
         axis.text.x=element_text(angle = 0, size=11))
#dev.off()
#tikz("wiod_kend_RCA_gdp.tex")
p.wiod.tiva.fvax.exgr.gdp.kend<-ggplot(p.wiod.1, aes(x=GDP.per.capita.2005/1000, y=kendall)) +
  coord_cartesian(ylim=c(0.35,1.0))+
  geom_point(size=0.8) +
  geom_text_repel(aes(label=COU))+
  geom_smooth(method=lm,   # Add linear regression line
              se=T,color="black", size=1.0)+
  scale_x_continuous(breaks=c(seq(0,80,10)))+
  scale_y_continuous( breaks=c(seq(0.3,1,0.1) )) +
  labs(x="GDP per capita (constant thousand 2005 US)",y=expression(paste("Kendall's ",tau)),title="")+
  theme_bw()+
  theme( legend.position = "none", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="11"),
         axis.text.x=element_text(angle = 0, size=11))
```

```{r plot Strength of association and GDP per capita, fig.align='center', fig.cap='Figure 2.5: Strength of association and GDP per capita', echo=FALSE}
grid.arrange(p.wiod.tiva.fvax.exgr.gdp.sp,p.wiod.tiva.fvax.exgr.gdp.kend,ncol=2)
```

The figure shows a weak positive relationship between the strength of association and GDP per capita.
However, this relationship is not statistically significant.^[I used a non-parametric bootstrap with 1000 replications to obtain the standard errors. 
I then computed the z-test statistic under the null hypothesis that the true value of the coefficient is zero. 
The test statistic for the two-sided test was 1.3, which is below the critical value of 1.96 at the 5% level. Therefore, I cannot reject the null hypothesis.]
I conclude that the degree of divergence between the pattern of RCA on the basis of FVAT and EXGR is not simply explained by a country's level of development. 
Rather, it is likely explained by the contribution of the supply chain to determining the pattern of comparative advantage.
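
The test described in the footnote can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the thesis results: the data are simulated, and the names `dat`, `gdp_pc`, `rho` and `slope` are hypothetical. The sketch resamples countries with replacement, re-estimates the regression slope on each resample, takes the standard deviation of the resampled slopes as the bootstrap standard error, and divides the point estimate by it to obtain the z statistic.

```r
# Hypothetical data: one row per country (simulated values, not the thesis data).
set.seed(1)
n   <- 40
dat <- data.frame(gdp_pc = runif(n, 1, 60))
dat$rho <- 0.75 + 0.001 * dat$gdp_pc + rnorm(n, sd = 0.1)

# Slope of the linear regression of rho on GDP per capita for the rows in idx.
slope <- function(d, idx) coef(lm(rho ~ gdp_pc, data = d[idx, ]))[["gdp_pc"]]

# Non-parametric (pairs) bootstrap: resample countries with replacement.
R <- 1000
boot_slopes <- replicate(R, slope(dat, sample(n, replace = TRUE)))

est  <- slope(dat, seq_len(n))  # point estimate on the full sample
se   <- sd(boot_slopes)         # bootstrap standard error
z    <- est / se                # z statistic under H0: slope = 0
pval <- 2 * pnorm(-abs(z))      # two-sided p-value

c(estimate = est, se = se, z = z, p.value = pval)
```

The null hypothesis is rejected at the 5% level only if the absolute value of `z` exceeds 1.96.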

### Robustness of the results

I assess the robustness of the results on the strength of association between RCA rankings for FVAT and EXGR by comparing the statistics obtained from the WiOD with those obtained from the TiVA data.
I focus on the set of 34 countries present in both databases. <p>
Figure 2.6 and figure 2.7 summarize the differences between the strength of association that I obtain in each dataset.
The figures highlight that there is a substantial degree of divergence between the RCA rankings obtained in each dataset.<p>
Figure 2.6 consists of two panels. The left panel shows the strength of association between RCA rankings on the basis of Kendall's $\tau$ and the right panel on the basis of Spearman's $\rho$. 
Each panel is divided into three groups.
Each group corresponds to one category of the difference between the strength of association in the TiVA and in the WiOD. 
Within each group, the countries are sorted according to the absolute difference in the strength of association. <p>
For Spearman's $\rho$, I find the following differences between the strength of association of the RCA rankings obtained in the TiVA and the strength of association obtained in the WiOD.
The largest increases occur for China (+0.39), Italy (+0.24) and Turkey (+0.24).
The largest decreases occur for Indonesia (-0.40), India (-0.23), and Spain (-0.17). 
For Kendall's $\tau$, I find the following differences between the strength of association of the RCA rankings obtained in the TiVA and the strength of association obtained in the WiOD.
The largest increases occur for China (+0.41), Turkey (+0.24) and Italy (+0.24). 
The largest decreases occur for Indonesia (-0.40), Spain (-0.17) and Greece (-0.17). 
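
The grouping and sorting described above can be illustrated with a short sketch. It rests on assumptions: the country codes and correlation values are invented, the names `d`, `rho_wiod`, `rho_tiva`, `diff` and `group` are hypothetical, and the labels "Unchanged" and "Increased" are chosen for this example only. The difference is taken as TiVA minus WiOD, a band of 0.05 around zero separates the three groups, and countries are ordered by absolute difference within each group.

```r
# Hypothetical rank correlations for five countries (illustrative values only).
d <- data.frame(
  COU      = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  rho_wiod = c(0.80, 0.60, 0.75, 0.90, 0.70),
  rho_tiva = c(0.60, 0.85, 0.77, 0.60, 0.78),
  stringsAsFactors = FALSE
)

# Difference TiVA minus WiOD: positive values mean a stronger association in the TiVA.
d$diff <- d$rho_tiva - d$rho_wiod

# Three groups; the thresholds of -0.05 and 0.05 are an assumption of this sketch.
d$group <- cut(d$diff,
               breaks = c(-Inf, -0.05, 0.05, Inf),
               labels = c("Decreased", "Unchanged", "Increased"),
               right  = FALSE)

# Sort by group, then by absolute difference within each group.
d <- d[order(d$group, abs(d$diff)), ]
d
```

Sorting on the absolute difference within each group places the countries with the largest divergence between the two databases at the end of their group.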
```{r FVAT EXGR RCA - WiOD and TiVA, message=FALSE, warning=FALSE, cache=T, include=FALSE,echo=F}
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$eucl.dist<-sqrt(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$dist.sq)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$eucl.dist<-sqrt(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$dist.sq)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$GDP.std<-(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$GDP.pc-mean(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$GDP.pc))/sd(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$GDP.pc)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$GDP.std<-(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$GDP.pc-mean(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$GDP.pc))/sd(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$GDP.pc)


euc.gdp.kend.lin.reg<-lm(eucl.dist~GDP.pc,data=stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend)
#euc.gdp.kend.lin.reg<-lm(sqrt(dist.sq)~GDP.pc,data=wiod.tiva.fvax.exgr.gdp.kend.wide)


# Bootstrap 95% CI for regression coefficients 

# function to obtain regression weights 
bs <- function(formula, data, indices) {
  d <- data[indices,] # rows of the bootstrap resample selected by boot
  fit <- lm(formula, data=d)
  return(coef(fit)) 
} 
### Bootstrap with 1000 replications euclid dist ~GDP p.c.####

results.lm.ass.gdp <- boot(data=stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend, statistic=bs, 
                R=1000, formula=eucl.dist~I(GDP.pc/1000))


#top 4 bottom 4
top4.bottom4.kend<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend%>% filter( 0.12<diff.wiod.tiva| diff.wiod.tiva< -0.18 )%>%  arrange(-diff.wiod.tiva)
#top 4 bottom 4
top4.bottom4.spear<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear%>% filter(0.139<diff.wiod.tiva| diff.wiod.tiva< -0.19) %>%arrange(-diff.wiod.tiva)

#categorize

stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$diff.categories<-Hmisc::cut2(-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$diff.wiod.tiva,cuts=c(-0.33,-0.05,0.05,0.38),digits = 2)
nameorde.kend<-unique(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$COU[order(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$diff.categories,abs(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$diff.wiod.tiva))])
nameorde.kend<-factor(nameorde.kend)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$COU<-factor(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend$COU,levels=nameorde.kend)

#categorize spearman
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$diff.categories<-Hmisc::cut2(-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$diff.wiod.tiva,cuts=c(-0.43,-0.05,0.05,0.39,digits=2))
nameorde.spear<-unique(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$COU[order(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$diff.categories,abs(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$diff.wiod.tiva))])
nameorde.spear<-factor(nameorde.spear)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$COU<-factor(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear$COU,levels=nameorde.spear)
```
```{r FVAT EXGR RCA - WiOD and TiVA plots, cache=T,echo=F,message=F,warning=F}


library(forcats)
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend<-gather_(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend,"variable","kendall",gather_cols=c("kendall.fvax.tiva","kendall.fvax.wiod"))
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend%>%mutate(.,variable=fct_recode(variable,TiVA="kendall.fvax.tiva",WiOD="kendall.fvax.wiod"))
# p.wi.ti.fvax.exgr.kend<-ggplot(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend, aes(x=kendall, y=COU, shape=variable)) + 
#   geom_segment(aes(yend=COU),xend=-0.05, colour="grey50")+
#   #geom_point(size=2.5)+
#   geom_point(size=2.5, aes(colour=diff.categories))+
#   scale_color_brewer(palette="Set1",labels=c("[-0.40,-0.05)","[-0.05, 0.05)","[ 0.05, 0.40]"))+
#   scale_shape_manual(values=c(16,17))+
#   #scale_color_manual(name="",values = c("#edf8b1","#7fcdbb","#2c7fb8"),labels=c("[-0.40,-0.05)","[-0.05, 0.05)","[ 0.05, 0.30]"))+
#   coord_cartesian(xlim=c(0.00,1))+
#   scale_x_continuous(breaks=c(seq(0,1,0.1)),minor_breaks=c(seq(0.05,1,.1))) +
#   labs(y="",x="Kendall's tau")+
#   guides(shape=guide_legend(title="Source",nrow=1))+#,colour=guide_legend(title="Difference"))+
#   theme_bw()+
#   theme( legend.position = "bottom", # legend location in graph
#         # panel.grid.minor = element_blank(),
#         panel.grid.major = element_line(size = 0.7),
#         panel.grid.minor = element_line(size = 0.5),
#          axis.title=element_text( size="11"),
#          axis.text.x=element_text(angle = 0, size=11))+
#   facet_grid(diff.categories~., labeller = as_labeller(facet_names),scales ="free_y",space="free_y")

stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear%>%mutate(facet_names.spear =fct_recode( diff.categories,
"Decreased"="[-0.43,-0.05)",
"Unchanged" =    "[-0.05, 0.05)" ,
 "Increased" ="[ 0.05, 0.39)"
))
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear<-gather_(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear,"variable","spearman",gather_cols=c("spearman.fvax.tiva","spearman.fvax.wiod"))
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear%>%mutate(.,variable=fct_recode(variable,TiVA="spearman.fvax.tiva",WiOD="spearman.fvax.wiod"))
stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend<-stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend%>%mutate(facet_names =fct_recode( diff.categories,
"Decreased"=levels( diff.categories)[1],
"Unchanged" =levels( diff.categories)[2] ,
 "Increased" =levels( diff.categories)[3]
))
clev.small.multiple<-ggplot(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend, aes(x=kendall, y=COU, shape=variable)) + 
  geom_segment(aes(yend=COU),xend=-0.05, colour="grey50")+
  geom_point(aes(shape = variable),color = 'black',size=3)+
  geom_point(size=2.5, aes(colour=diff.categories))+
  #scale_color_brewer(palette="Set1",labels=c("[-0.40,-0.05)","[-0.05, 0.05)","[ 0.05, 0.30]"))+
  scale_shape_manual(values=c(16,17))+
  scale_color_manual(name="",values = c("#edf8b1","#7fcdbb","#2c7fb8"),labels=c( "[-0.33,-0.05)", "[-0.05, 0.05)" ,"[ 0.05, 0.38]"))+
  coord_cartesian(xlim=c(0.00,1))+
  scale_x_continuous(breaks=c(seq(0,1,0.1)),minor_breaks=c(seq(0.05,1,.1))) +
  labs(y="",x=expression(paste("Kendall's ",tau)))+
  guides(colour=guide_legend(title=expression(paste(tau["TiVA"]," - ", tau["WiOD"])),order = 1,override.aes = list(shape = 15)),shape=guide_legend(title="Source :",order = 2))+
  theme_bw()+
  theme( legend.position = "bottom", # legend location in graph
         # panel.grid.minor = element_blank(),
         panel.grid.major = element_line(size = 0.7),
         panel.grid.minor = element_line(size = 0.5),
         axis.title=element_text( size="11"),
         axis.text.x=element_text(angle = 0, size=11),
         axis.title.x = element_text(margin =margin(t=12)))+
      facet_grid(facet_names~.,scales ="free_y",space="free_y")
  

clev.spear<- ggplot(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear, aes(x=spearman, y=COU, shape=variable)) + 
    geom_segment(aes(yend=COU),xend=-0.05, colour="grey50")+
    geom_point(aes(shape = variable),color = 'black',size=3)+
    geom_point(size=2.5, aes(colour=diff.categories,shape = variable))+
    #scale_color_brewer(palette="Set1",labels=c("[-0.40,-0.05)","[-0.05, 0.05)","[ 0.05, 0.30]"))+
    scale_shape_manual(values=c(16,17))+
   scale_color_manual(name="",values = c("#edf8b1","#7fcdbb","#2c7fb8"),labels=c("[-0.39,-0.05)","[-0.05, 0.05)","[ 0.05, 0.42]"))+
   # scale_fill_manual(name="",values = c("#edf8b1","#7fcdbb","#2c7fb8"),labels=c("[-0.41,-0.05)","[-0.05, 0.05)","[ 0.05, 0.40]"))+
    coord_cartesian(xlim=c(0.00,1))+
    scale_x_continuous(breaks=c(seq(0,1,0.1)),minor_breaks=c(seq(0.05,1,.1))) +
  labs(y="",x=expression(paste("Spearman's ",rho)))+
  guides(colour=guide_legend(title=expression(paste(rho["TiVA"]," - ", rho["WiOD"])),order = 1,override.aes = list(shape = 15)),shape=guide_legend(title="Source :",order = 2))+
    theme_bw()+
    theme( legend.position = "bottom", # legend location in graph
           # panel.grid.minor = element_blank(),
           panel.grid.major = element_line(size = 0.7),
           panel.grid.minor = element_line(size = 0.5),
           axis.title=element_text( size="11"),
           axis.text.x=element_text(angle = 0, size=11),
           axis.title.x = element_text(margin =margin(t=12)))+
    facet_grid(facet_names.spear~.,scales ="free_y",space="free_y")
# The labels of each group refer to a categorization of the difference between the strength of association in the TiVA compared to the WiOD. I refer to an increase of $> 0.05$ as "Increased" and a decrease of $< -0.05$ as "Decreased". I refer to a small increase of $<0.05$ or a small decrease of $>-0.05$ as "Unchanged"
```
```{r fig 2-6 FVAT EXGR RCA WiOD and TiVA,  fig.align='center', fig.cap='Figure 2.6: Strength of association: RCA - FVAT and EXGR - WiOD and TiVA', echo=FALSE}
grid.arrange(clev.small.multiple ,clev.spear,ncol=2 )
```


I investigate whether the discrepancies between the two data sets in the strength of association of the two rankings are related to a country's level of development, as measured by its GDP per capita. <p> 
Figure 2.7 plots the Euclidean distance between the strength of association of the RCA rankings based on FVAT and on EXGR in the two data sets (y-axis) against GDP per capita (x-axis).
The relationship between the Euclidean distance and GDP per capita is negative and statistically significant.
^[Formally, I first estimated a linear regression with the Euclidean distance as the dependent variable and GDP per capita as the independent variable. 
Second, I computed the t-statistic under the null hypothesis that the slope coefficient is equal to zero. 
The t-statistic based on the nonparametric bootstrap standard error is -2.64. 
The corresponding p-value from a t-distribution with 32 degrees of freedom is lower than 0.01, and hence I reject the null hypothesis.]
This result suggests that differences between the data sets in the strength of association of the rankings are more pronounced for less developed countries, and hence the RCA rankings computed on the basis of value-added trade data for less developed countries should be regarded with caution.
```{r fig2-7 kend spear gdp, echo=FALSE, message=FALSE, warning=FALSE, cache=T}
plot.data.2.7<-filter(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.kend,!is.na(GDP.pc))
plot.data.2.7<-filter(plot.data.2.7,variable%in% "TiVA")
plot.wiod.kend.gdp.eu<-ggplot(plot.data.2.7,aes( y=eucl.dist,x=GDP.pc/1000))+
 # xlab("Country")+
  #ylab(expression(paste("Kendall's ", tau))) +
#ggplot(wiod.tiva.fvax.exgr.gdp.kend.wide, aes(x=COU,y=diff.wiod.tiva))+
  geom_point(size=0.8)+
  geom_text_repel(aes(label=COU))+
  geom_smooth(method="lm",se=T)+
labs(y=expression(paste("Euclidean dist. Kendall's ",tau," -- WiOD and TiVA")), x="GDP per capita (thousand 2005 US dollars)")+
    theme_bw()+
  theme( legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 0, size=10))
#tikz("euclid_dist_spear_tiva_wiod_fvax_exgr_gdp.tex")
plot.data.2.7.2<-filter(stata.r.fvax.gdp.fvax.RCA$R.RCA$RCA.FVAX.spear,!is.na(GDP.pc))
plot.data.2.7.2<-filter(plot.data.2.7.2,variable%in% "TiVA")
plot.wiod.t.gdp.euc<-ggplot(plot.data.2.7.2,aes(x=GDP.pc/1000, y=eucl.dist))+
  geom_point(size=0.8)+
  geom_text_repel(aes(label=COU))+
  geom_smooth(method="lm",se=T)+
  ylab(expression(paste("Euclidean dist. Spearman's ", rho, " -- WiOD and TiVA" )))+
  xlab("GDP per capita (thousand 2005 US dollars)")+
  theme_bw()+
  theme( legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(size=10))
#geom_smooth(method="lm", se=T)
#dev.off
```
```{r fig 2-7 plot distnce strength of associations of RCA rankings, echo=FALSE, fig.align='center', fig.cap='Figure 2.7: Euclidean distance: Strength of associations of RCA rankings'}
grid.arrange(plot.wiod.kend.gdp.eu,plot.wiod.t.gdp.euc, name="Distance strength of associations of RCA rankings -- WiOD and TiVA",ncol=2)
```

```{r bootstrap eucdist gdp lin R, message=FALSE, warning=FALSE, cache=T, include=FALSE}
euc.gdp.kend.lin.reg<-lm(eucl.dist~GDP.pc,data=plot.data.2.7)
#euc.gdp.kend.lin.reg<-lm(sqrt(dist.sq)~GDP.pc,data=wiod.tiva.fvax.exgr.gdp.kend.wide)


# Bootstrap 95% CI for regression coefficients 

# function to obtain regression weights 
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select the resampled rows
  fit <- lm(formula, data=d)
  return(coef(fit)) 
} 
### Bootstrap with 1000 replications euclid dist ~GDP p.c.####

results.lm.ass.gdp <- boot(data=plot.data.2.7, statistic=bs, 
                R=1000, formula=eucl.dist~I(GDP.pc/1000))
```
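
The t-statistic reported in the footnote to the regression of the Euclidean distance on GDP per capita can be obtained from a `boot` object along the following lines. This is a minimal sketch on simulated data, not the thesis data: the data frame `toy`, its 34 rows (chosen only so that the t-distribution has 32 degrees of freedom) and the simulated coefficients are assumptions made for illustration.

```r
library(boot)

set.seed(1)
# Hypothetical toy data standing in for the country-level data frame
toy <- data.frame(GDP.pc = runif(34, 1000, 50000))
toy$eucl.dist <- 0.3 - 0.004 * (toy$GDP.pc / 1000) + rnorm(34, sd = 0.05)

# Statistic passed to boot(): OLS coefficients on the resampled rows
bs <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}

res <- boot(data = toy, statistic = bs, R = 1000,
            formula = eucl.dist ~ I(GDP.pc / 1000))

slope   <- unname(res$t0[2])    # slope estimated on the original sample
se.boot <- sd(res$t[, 2])       # bootstrap standard error of the slope
t.stat  <- slope / se.boot      # t-statistic under H0: slope = 0
p.value <- 2 * pt(-abs(t.stat), df = nrow(toy) - 2)  # two-sided p-value
```

Dividing the original-sample slope by the standard deviation of its bootstrap replicates is only one way to base inference on the bootstrap; percentile or BCa intervals from `boot::boot.ci` are an alternative.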

## Summary: Results RCA

For most countries, foreign value-added appears to contribute little to determining the pattern of comparative advantage.
The few exceptions are smaller countries that appear deeply integrated in the regional value chain.
<p> 
The strength of association of RCA rankings obtained on the basis of factor trade and EXGR is lower than for BVAT. 
This result indicates that the domestic supply chain contributes to determining the pattern of comparative advantage.
Here, I pin down which countries rely relatively more on their domestic supply chain, such as Germany and China, and which countries rely relatively
more on the intrinsic efficiency of the domestic production factors, such as the USA and Ireland. <p>
Further, I document that the strength of association between EXGR and FVAT rankings is robust to the choice of data source for developed economies, while it is relatively sensitive to the data source for developing countries.
I conclude that the value-added trade data for developing countries need to be regarded with caution.

<!--chapter:end:02_thesis_ch2.rmd-->

# Network Centrality and Revealed Comparative Advantage

```{r prep eigenvec, message=FALSE, warning=FALSE, cache=T, include=FALSE, echo=F}
EU.wiod.tiva<-c("AUS","AUT","BEL","BGR","CAN","CYP","CZE","DEU","DNK","ESP","EST","FIN","FRA","GBR","GRC", "HUN","IRL", "ITA", "LUX","NLD", "POL" ,"PRT","SVK", "SVN", "SWE")
Non.EU.wiod.tiva<-c("BRA","CHN","IDN","IND","JPN","KOR", "MEX", "TUR" ,"USA")
#here identify the most important countries in terms of the impact of industry-specific output shocks on global production output
#most important industries in terms of eigenvector centrality
#setwd("/Users/sergej/Google Drive/Send_liza/Results")
set.seed(153)
#display color scales
pal<-function(col, border = "light gray", ...){
  n <- length(col)
  plot(0, 0, type="n", xlim = c(0, 1), ylim = c(0, 1),
       axes = FALSE, xlab = "", ylab = "", ...)
  rect(0:(n-1)/n, 0, 1:n/n, 1, col = col, border = border)
}
#bootstrap confidence intervals
bootstr.ci<-function(dataframe,method.name,nmr=10000) { 
  
  f <- function(data, indices, z="out.eigen.cent"){
    d <- data[indices,] # allows boot to select sample 
    cor(d[["z_fddva_std"]], d[[z]],method=method.name)
  }
  x<-as.data.frame(dataframe)
  bootcorr<- boot::boot(x, f, R=nmr)
  boot::boot.ci(bootcorr, type =c("bca","perc"))
}
#kendall ci adapted from package NSM3 from the book Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
kendall.ci<-function (x = NULL, y = NULL, alpha = 0.05, type = "t") {
  continue <- T
  if (is.null(x) | is.null(y)) {
    cat("\n")
    cat("You must supply an x sample and a y sample!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) != length(y))) {
    cat("\n")
    cat("Samples must be of the same length!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (length(x) <= 1)) {
    cat("\n")
    cat("Sample size n must be at least two!", "\n")
    cat("\n")
    continue <- F
  }
  if (continue & (type != "t" & type != "l" & type != "u")) {
    cat("\n")
    cat("Argument \"type\" must be one of \"t\" (two-sided), \"l\" (lower) or \"u\" (upper)!", 
        "\n")
    cat("\n")
    continue <- F
  }
  Q <- function(i, j) {
    Q.ij <- 0
    ij <- (j[2] - i[2]) * (j[1] - i[1])
    if (ij > 0) 
      Q.ij <- 1
    if (ij < 0) 
      Q.ij <- -1
    Q.ij
  }
  C.i <- function(x, y, i) {
    C.i <- 0
    for (k in 1:length(x)) if (k != i) 
      C.i <- C.i + Q(c(x[i], y[i]), c(x[k], y[k]))
    C.i
  }
  if (continue) {
    c.i <- numeric(0)
    n <- length(x)
    for (i in 1:n) c.i <- c(c.i, C.i(x, y, i))
    options(warn = -1)
    tau.hat <- cor.test(x, y, method = "k")$estimate
    options(warn = 0)
    sigma.hat.2 <- 2 * (n - 2) * var(c.i)/n/(n - 1)
    sigma.hat.2 <- sigma.hat.2 + 1 - (tau.hat)^2
    sigma.hat.2 <- sigma.hat.2 * 2/n/(n - 1)
    if (type == "t") 
      z <- qnorm(alpha/2, lower.tail = F)
    if (type != "t") 
      z <- qnorm(alpha, lower.tail = F)
    tau.L <- tau.hat - z * sqrt(sigma.hat.2)
    tau.U <- tau.hat + z * sqrt(sigma.hat.2)
    if (type == "l") 
      tau.U <- 1
    if (type == "u") 
      tau.L <- -1
  }
  tau<- c(tau.L,tau.U)
  #Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods 3rd edition.
  names(tau)<-c("ci.min.kend.hwc","ci.max.kend.hwc")
  tau
}
```
```{r eigenvec calc, message=FALSE, warning=FALSE, cache=T, include=FALSE}
setwd("/Users/sergej/Google Drive/Send_liza/Ch1_ch2_ch3_28_12")
load("tiva2005.Rdata")
names(tiva.orig)[1]<-"VAR"
tiva.2005.sub<-tiva.orig %>%dplyr::rename(.,ind=IND)  %>%  filter(.,Time %in% c(2005)) %>% dplyr::select(.,VAR, COU,PAR,Country,Partner,Value,Industry,ind)
tiva.2005.sub$ind<-gsub ("C","",tiva.2005.sub$ind)
tiva.2005.sub$ind<-gsub( "T","-",tiva.2005.sub$ind)
tiva.2005.sub<-as_tibble(tiva.2005.sub)

exclude.country<-c("Cambodia" ,"Saudi Arabia" ,"Brunei Darussalam")
exclude.cou<-c("LTU", "LVA", "MLT", "MYS", "PHL", "ROU", "ROW" ,"RUS" ,"SAU" ,"SGP" ,"THA","TUN", "TWN" ,"VNM" ,"ZAF" ,"CRI", "BRN", "KHM", "ISL" )
exclude.ind<-c("01-05","10-14","40-41")
tiva.2005.sub.final<-as_tibble(tiva.2005.sub) %>% filter(., ! Country %in% exclude.country) %>% filter(., ! Partner %in% exclude.country) %>% filter(.,! COU %in% exclude.cou) %>% filter( .,! PAR %in% exclude.cou) %>% filter(!ind %in% exclude.ind)
tiva.2005.sub.final$Value<-tiva.2005.sub.final$Value*10^6
for (i in c("Country", "Partner", "COU", "PAR","Industry","ind")){ 
  tiva.2005.sub.final[[i]]<-as.factor(tiva.2005.sub.final[[i]])
}
final.vars<-levels(tiva.2005.sub.final$VAR)[c(1,2,4)]
tiva.sub.final.split<-map(final.vars,.f=function(x){
  y<-dplyr::filter(tiva.2005.sub.final,VAR %in% x) 
  y$VAR<-droplevels(y$VAR)
  y
})
names(tiva.sub.final.split)<-map_chr(tiva.sub.final.split,.f=function(x){
  levels(x$VAR)
})
res<-list()
res<-map(tiva.sub.final.split,.f=function(x){ 
  result.1<-map(seq(1,20,1),.f=function(i){
  y<-levels(x$ind)[i]
  result<-dplyr::filter(x,ind %in% y)   
  result$ind<-droplevels(result$ind)
  result})
    })
res<-map(res,.f=function(x){ 
 names(x) <- map(x, .f=function(z){
    a<-levels(z$VAR)
    b<-levels(z$ind)
    b<-gsub("-","_",b)
    c<-paste(a,b,sep=".")
    c<-paste0("I.",c)
    c })
x
   })
eigen.res<-map(res$FFD_DVA,.f=function(x){ 
   x<-select_(x,"COU","PAR","Value") 
   x<-spread_(x,"COU","Value")
   rownames(x)<-NULL
   x<-column_to_rownames(x,"PAR")
   rows<-rowSums(x)
    cols<-colSums(x, na.rm=T)
    #y column normalized
    y<-sweep(x,MARGIN=2,STATS=cols, FUN="/")
    #x.new row normalized
    x.new<-sweep(x,MARGIN=1,STATS=rows, FUN="/")
    col.sms=colSums(y)
    row.sms<-rowSums(x.new)
   # #check if normalization correct
    assertthat::assert_that(all(row.sms>=0.999999))
   assertthat::assert_that(all(row.sms<=1.000001))
   assertthat::assert_that(all(col.sms>=0.999999))
   assertthat::assert_that(all(col.sms<=1.000001))
   #right eigenvector corresponds to in-degree hence in-eigenvec centrality
   y.eigen.2<-abs(as.numeric(eigen(y)$vector[,1]))
   names(y.eigen.2)<-rownames(x)
   #check unit euclidian norm
   assertthat::assert_that(all(round(sum(y.eigen.2^2),4)==1))

    x.eigen.2<-abs(as.numeric(eigen(t(x.new))$vector[,1]) )

   names(x.eigen.2)<-rownames(x)
   #check unit euclidian norm
   assertthat::assert_that(all(round(sum(x.eigen.2^2),4)==1))
   #left  multiply row-normalized matrix with left eigenvector: should give back same eigenvector
   eigen.vec2<- x.eigen.2%*% as.matrix(x.new)
   
   #right multiply col-normalized matrix with right eigenvector: should give back same eigenvector
   eigen.vec3<- as.matrix(y) %*%y.eigen.2
   
  assertthat::assert_that(all(round(x.eigen.2,4)==round(eigen.vec2,4)))
  assertthat::assert_that(all(round(y.eigen.2,4)==round(eigen.vec3,4)))
                  eigen.res<-list(x.eigen.2,y.eigen.2)         
                  names(eigen.res)<-c("out.eigen.cent","in.eigen.cent")
                  eigen.res})
  
   #eigen.vec
names.list.2<-c("out.eigen.cent","in.eigen.cent")


eigen.res.final<-map(eigen.res,.f=function(x){
  map2(x,names.list.2,.f=function(x,y){
  df<-as.data.frame(x)
  df<-df %>% rownames_to_column(.,var="COU")
                names(df)[2]<-y   
                df})})

  #build data frame from list
  eigen.res.final.df<-map(eigen.res.final,.f=function(x){
  bind_cols(x)})
  eigen.res.final.df<-map(eigen.res.final.df,.f=function(x){
    x<-x[,c(1,2,4)]
  })
  eigen.res.final.df<-data.table::rbindlist(  eigen.res.final.df, use.names=TRUE, fill=TRUE,idcol = "ind")
eigen.res.final.df$ind<-gsub("I.FFD_DVA.","",eigen.res.final.df$ind)

################################################################Normalise Eigenvector result################################################################
eigen.res.ind<-eigen.res.final.df%>%group_by(.,ind) %>% summarise(.,z.k.out=mean(out.eigen.cent),z.k.in=mean(in.eigen.cent))
eigen.res.cou<-eigen.res.final.df%>%group_by(.,COU) %>% summarise(.,z.i.out=mean(out.eigen.cent),z.i.in=mean(in.eigen.cent))
eigen.res.final.df<-inner_join(eigen.res.cou,eigen.res.final.df,by="COU")
eigen.res.final.df<-inner_join(eigen.res.ind,eigen.res.final.df,by="ind")
z..out<-mean(eigen.res.final.df$out.eigen.cent)
z..in<-mean(eigen.res.final.df$in.eigen.cent)
eigen.res.final.df<-eigen.res.final.df%>% mutate(.,out.eigen.cent.std=(out.eigen.cent*z..out)/(z.k.out*z.i.out),in.eigen.cent.std=(in.eigen.cent*z..in)/(z.k.in*z.i.in)) %>%dplyr::select(COU,ind,out.eigen.cent,out.eigen.cent.std,in.eigen.cent,in.eigen.cent.std)

#################################### Load Ricardo CA ####################################
setwd("/Users/sergej/Google Drive/Send_liza/Results")
results<-read_dta("results_end_march_centr_ricardo.dta")
nams<-names(results)
nams<-nams[-c(2,3 )]
nams<-nams[!grepl("exgr_2005", nams)]
nams<-nams[!grepl("dva_2005", nams)]
nams<-nams[!grepl("DVA_2005", nams)]
names<-c(nams,"sector_dva")

d<-gather(results, variable,value,-COU,-IND) %>%filter(.,variable%in%names)
#split data frame and apply cast, to obtain wide df
final<-lapply(split(d,d$variable),function(x) { 
  v<-spread(x,variable,value)
  v})

#for a single variable at a time
data.prep<-function(x){
  var<-names(x)[3]
  df.prep.wide<-spread_(x, "IND", var)
  df.prep.wide
}
final.wide<-map(.x=final,.f=data.prep)
varlist<- c("z_dva_std", "in_centr_dva_std", "z_exgr_std", "in_centr_exgr_std" ,"z_fddva_std","in_centr_fddva_std",
            "sector_dva", "in_centr_dva_US_Food", "sector_exgr", "in_centr_exgr_US_Food", "sector_fd_va", "in_centr_fddva_US_Food",
            "z_dva_std", "out_centr_dva_std","z_exgr_std", "out_centr_exgr_std","z_fddva_std", "out_centr_fddva_std",
            "sector_dva", "out_centr_dva_US_Food","sector_exgr", "out_centr_exgr_US_Food","sector_fd_va", "out_centr_fddva_US_Food")
#collect the corr results in a list
names.vec<-as.vector(names(final.wide)[7:18])
final.names<-c(names.vec[1],names.vec[10],names.vec[3],names.vec[11],names.vec[5],names.vec[12])

final.list<-lapply(final.names, function(x) final.wide[[x]])
names(final.list)<-final.names

ricardo.fddva<-gather(final.list$z_fddva_std,key="IND",val="z_fddva_std",-COU)
ricardo.fddva$IND<-gsub("T","_",ricardo.fddva$IND)
####################################compute correlation Ricardo RCA and Eigenvec centrality################################################################
#pass argument country name to filter_, which filters COU in RCA Df and in Eigen DF to specific country
#compute corr between filtered df's - across industries
eigen.ricardo<-inner_join(ricardo.fddva, eigen.res.final.df,by=c("COU","IND"="ind"))
cor.dfs<-map(unique(eigen.ricardo$COU),function(filter_country){
  condition <- lazyeval::interp(~ y == a, y=as.name("COU"), a=filter_country)  
c<-filter_(eigen.ricardo, condition) 
})
names(cor.dfs)<-map_chr(cor.dfs,.f=function(x){ 
  unique(x$COU)
})
eigen.names<-c("out.eigen.cent.std","in.eigen.cent.std")
cor.method<-c("spearman","kendall")
cor.r.res<-map(cor.dfs,.f=function(dfs,nmr){ 
  map(eigen.names, .f=function(z,df,nmr){
    a<-df$z_fddva_std
    b<-df[[z]]
  res<-cor(a,b,method="spearman")
  names(res)<-"spearman"
  res.2<-cor(a,b,method="kendall")
  names(res.2)<-"kendall"
  res.spear.kend<-c(res,res.2)
  
  # Fieller et al. (1957) Spearman rank-order SE, Bonett and Wright's (2000) SE and bootstrap ci 95% pc bca
  #ci.spear.bs<-bootstr.ci(df, method.name = "spearman", nmr = 10000)
  #ci.spear.bs.pc<-c(ci.spear.bs$percent[,4],ci.spear.bs$percent[,5])
  #names(ci.spear.bs.pc)<-c("ci.min.spear.bs.perc","ci.max.spear.bs.perc")
  #ci.spear.bs.bca<-c(ci.spear.bs$bca[,4],ci.spear.bs$bca[,5])
  #names(ci.spear.bs.bca)<-c("ci.min.spear.bs.bca","ci.max.spear.bs.bca")
  #ci.spear<-c(ci.spear.bs.pc,ci.spear.bs.bca)
  
  #ci.kend.bs<-bootstr.ci(df, method.name = "kendall",  nmr = 10000)
  #ci.kend.bs.pc<-c(ci.kend.bs$percent[,4],ci.kend.bs$percent[,5])
  #names(ci.kend.bs.pc)<-c("ci.min.kend.bs.perc","ci.max.kend.bs.perc")
  #ci.kend.bs.bca<-c(ci.kend.bs$bca[,4],ci.kend.bs$bca[,5])
  #names(ci.kend.bs.bca)<-c("ci.min.kend.bs.bca","ci.max.kend.bs.bca")
  #ci.kend<-c(ci.kend.bs.pc,ci.kend.bs.bca)
  
   #ci.bs<-c(ci.spear,ci.kend)
    z.spear <- psych::fisherz(res)
    z.kend <- psych::fisherz(res.2)
    n<-length(df$IND)
    
    zbwbounds.spear <- z.spear + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res^2)/2)/(n - 3))
    #zbwbounds.kend <-   #z.kend + c(qnorm(0.025), qnorm(0.975)) * sqrt((1 + (res.2^2)/2)/(n - 3))
    
    zfielbounds.spear <- z.spear+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 

    zfielbounds.kend <-   z.kend+ c(qnorm(0.025), qnorm(0.975)) * sqrt(1.06/(n - 3)) 
    
    rsfielbounds.spear <- psych::fisherz2r(zfielbounds.spear)
    rsfielbounds.kend <-psych::fisherz2r(zfielbounds.kend)
    
    names(rsfielbounds.spear)<-c("ci.min.spear.fiel","ci.max.spear.fiel")
    names(rsfielbounds.kend)<-c("ci.min.kend.fiel","ci.max.kend.fiel")
    rsfielbounds<-c(rsfielbounds.spear,rsfielbounds.kend)
    
    rbwbounds.spear <- psych::fisherz2r(zbwbounds.spear)
    rbwbounds.kend <- kendall.ci(a,b)
    #Abdi 
    #Abdi: res.2 +c(qnorm(0.025), qnorm(0.975)) * (2* ( 2*n + 5 ) ) / (9*n*(n -1))
    
    names(rbwbounds.spear)<-c("ci.min.spear.bw","ci.max.spear.bw")
    #names(rbwbounds.kend)<-c("ci.min.kend.bw","ci.max.kend.bw")
    
    rbwbounds<-c(rbwbounds.spear,rbwbounds.kend)
    #names(rsfielbounds)<-c(paste("ci.min_",names(res.3)[1]),paste("ci.max_",names(res.3)[1]),paste("ci.min_",names(res.3)[2]),paste("ci.max_",names(res.3)[2]))
   #names(ci.kend)<-
  res.f<-as.data.frame(c(res.spear.kend, rsfielbounds,rbwbounds))#,ci.bs))
  res.f$COU<-rownames(res.spear.kend)
#  colnames(res.f)<-z
  res.f
},df=dfs,nmr=10^3)
  })
names.list.3<-c("out.eigen.RCA","in.eigen.RCA")


####bind out and in eigenvec results together and name columns accordingly
cor.res.wiod<-map(cor.r.res,.f=function(x){
  y<-cbind(x[[1]],x[[2]])
  y<-as.data.frame(y)
  y$var<-rownames(y)
  names(y)<-c(names.list.3,"VAR")
  y
})
###create df by binding lists together and create id column
cor.res.wiod.df<-data.table::rbindlist(cor.res.wiod, use.names=TRUE, fill=TRUE,idcol = "COU")
#cor standardized eigen and standardized RCA
results.spearman<-cor.res.wiod.df%>%filter(.,VAR %in% grep("spear",unique(cor.res.wiod.df$VAR),value=T))%>% dplyr::select(COU, out.eigen.RCA,VAR) %>% spread(VAR,out.eigen.RCA)
results.kendall<-cor.res.wiod.df%>%filter(.,VAR %in% grep("kend",unique(cor.res.wiod.df$VAR),value=T))%>% dplyr::select(COU, out.eigen.RCA,VAR) %>% spread(VAR,out.eigen.RCA)
```
```{r plot cor RCA Eigen,cache=T}
################################################################PLOT RESULTS OF COR EIGEN RCA ################################################################
results.spearman$cat.4<-Hmisc::cut2(results.spearman$spearman,c(-0.14,0,0.7,0.89,.99), digits=2,g=4)
#tikz( "spear_RCA_out_cent.tex")
p.eigen.RCA.spear<-ggplot(results.spearman, aes(reorder(COU,-spearman),spearman,fill=cat.4))+ 
 # geom_errorbar(aes(ymin=ci.min.spear.fiel, ymax=ci.max.spear.fiel), colour="black", width=.1) +
  #geom_point(size=1.5)+
  geom_bar(stat="identity", position="dodge",colour="black")+
  scale_y_continuous(breaks=c(round(seq(-0.2,1,0.1),1))) +
  coord_cartesian(ylim=c(-0.2,1))+
  scale_fill_grey(start = 0.3, end = .8,breaks=c(levels(results.spearman$cat.4)[4],levels(results.spearman$cat.4)[3],levels(results.spearman$cat.4)[2],levels(results.spearman$cat.4)[1]))+
  labs(y=expression(paste("Strength of Association - Spearman's ",rho)),x="")+
  geom_hline(aes(yintercept=median(results.spearman$spearman)))+
  guides(fill=guide_legend(title=expression(rho)))+
  #geom_hline(aes(yintercept= 0.71))+
  theme_bw()+
  theme(plot.title = element_text(hjust=0.5,vjust=0.5),
        legend.position = "bottom", # legend location in graph
        panel.grid.minor = element_blank(),
     axis.title=element_text( size="10"),
        axis.text.x=element_text(angle = 90, size=9))+
  #label for the median line
  annotate("text", 41, 0.86, label ="med." , angle = 0)
 #annotate("text", 38, 0.68, label ="1st Qu.- 1.5*IQR" , angle = 270)+
#dev.off()
####################################################################Outlier Analysis ####################################################################
outliner.spear<-c(summary(results.spearman$spearman)[2]-1.5*stats::IQR(results.spearman$spearman),summary(results.spearman$spearman)[5]+1.5*stats::IQR(results.spearman$spearman))
outliner.kend<-c(summary(results.kendall$kendall)[2]-1.5*stats::IQR(results.kendall$kendall),summary(results.kendall$kendall)[5]+1.5*stats::IQR(results.kendall$kendall))
robust.kendall.outlier<-robustbase::adjboxStats (results.kendall$kendall)$fence
# filter(results.kendall,kendall<robust.kendall.outlier[1])
# filter(results.kendall,kendall>robust.kendall.outlier[2])
robust.spearman.outlier<-robustbase::adjboxStats (results.spearman$spearman)$fence
# filter(results.spearman,spearman<robust.spearman.outlier[1])
# filter(results.spearman,spearman>robust.spearman.outlier[2])

results.kendall$cat.4<-Hmisc::cut2(results.kendall$kendall,c(-0.13,0,round(outliner.kend[1],3),round(median(results.kendall$kendall),3),round(max(results.kendall$kendall),3)),g=4,digits = 2)

p.kend.RCA.eigen<-ggplot(results.kendall, aes(reorder(COU,-kendall),kendall, fill=cat.4))+ 
  #geom_errorbar(aes(ymin=ci.min.kend.hwc, ymax=ci.max.kend.hwc), colour="black", width=.1) +
  #geom_point(size=1.5)+
  coord_cartesian(ylim=c(-0.15,1))+
  geom_bar(stat="identity", position="dodge", color="black")+
  scale_y_continuous(breaks=c(round(seq(-0.6,1,0.1),1))) +
  scale_fill_grey(start = 0.3, end = .8,breaks=c(levels(results.kendall$cat.4)[4],levels(results.kendall$cat.4)[3],levels(results.kendall$cat.4)[2],levels(results.kendall$cat.4)[1]))+
  labs(y=expression(paste("Strength of Association - Kendall's ", tau)),x="")+
  
  geom_hline(aes(yintercept=median(results.kendall$kendall)))+
  guides(fill=guide_legend(title=expression(tau)))+
 # geom_hline(aes(yintercept=outliner.kend[1]))+
  theme_bw()+
  theme(plot.title = element_text(hjust=0.5,vjust=0.5),
        legend.position = "bottom", # legend location in graph
        panel.grid.minor = element_blank(),
        axis.title=element_text( size="10"),
        axis.text.x=element_text(angle = 90, size=9))+
  #label for the median line
  annotate("text", 41, 0.72, label ="med." , angle = 0)
```

In this chapter I go a step further and use the value-added trade data to investigate the importance of each country in the diffusion of shocks. 
To assess this importance, I use the notion of eigenvector centrality. 
This chapter is structured as follows. 
First, I define the concepts of the international trade network and eigenvector centrality. 
Second, I illustrate both concepts on the basis of value-added trade for the full set of countries. 
Third, I compute eigenvector centrality at the sectoral level for a subset of ten manufacturing sectors and two service sectors, and I analyze how the importance of the seven most central countries in the diffusion of shocks varies across these 12 sectoral networks. 
Fourth, I show that relative eigenvector centrality for any pair of countries pins down the pattern of RCA.

## The international trade network and the concept of network centrality

I start by providing the definitions of the binary international trade network and the weighted international trade network.
On the basis of these definitions, I define the concept of network centrality. 
I close this section with an illustration of the international trade network and network centrality for total value-added.
<p>
The binary international trade network consists of a set of nodes $N = \{1, \dots , n\}$, where 
each node represents a country.
Each edge $g_{i,j}$ represents a trade relationship between an origin country $j$ and a destination country $i$.
Specifically, the presence of a trade relationship means that there are positive exports from country $j$ to country $i$ in 2005. 
Trade relationships need not be symmetric and hence the international trade network is directed.
Formally, let $x_{i,j}$ denote the value of exports from country $j$ to country $i$; I define the variable $g_{i,j}$ as follows.
\[g_{i,j} = \begin{cases}
1 \quad \text{if} \quad x_{i,j} > 0 \\
0 \quad \text{if} \quad x_{i,j} = 0 \end{cases} \]
Each edge $g_{i,j}$ is recorded in the adjacency matrix $\boldsymbol{G}$ of dimension $n \times n$.<p>
The binary trade network is the tuple of nodes and edges $(N, \boldsymbol{G})$. <p>
The weighted international trade network is defined as follows. 
The weight of each edge $w_{i,j}$ is the dollar value of exports from country $j$ to country $i$.
<p> All weights are recorded in the weight matrix $\boldsymbol{W}$ of dimension $n \times n$. 
The weighted international trade network is the triplet of the set of countries, the matrix of trade relationships and the weight matrix, thus $ITN = ((N, \boldsymbol{G}), \boldsymbol{W})$. <p>
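
The two definitions can be made concrete with a small example. The sketch below is illustrative only: the three countries `A`, `B`, `C` and the export values in `X` are hypothetical, with `X[i, j]` holding exports from origin `j` to destination `i` as in the definitions above.

```r
# Hypothetical bilateral exports: rows are destinations i, columns are origins j
countries <- c("A", "B", "C")
X <- matrix(c(0, 5, 0,
              2, 0, 0,
              7, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(countries, countries))

G <- (X > 0) * 1   # binary adjacency matrix: 1 if exports are positive, 0 otherwise
W <- X             # weight matrix: dollar value of exports on each edge

# The network is directed whenever G differs from its transpose
directed <- any(G != t(G))
```

Here `A` exports to `C` while `C` does not export to `A`, so the adjacency matrix is not symmetric and the network is directed.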
I use network theory for the following reason. 
My focus is to study the importance of countries in the propagation of shocks. 
The propagation of a shock occurs through economic interactions of countries or industries influencing each other.
As @acemoglu2012 note, shocks may propagate through the full set of countries or industries as a result of interactions among economic actors.
Network theory focuses on the relationships among nodes, that is, on their interactions, while taking into account the complete set of effects of other nodes on these interactions [@de2014network].
Thus, the framework of network theory allows us to study the propagation of shocks among countries while taking into account the complete set of relations among them. I conduct this study separately for total value-added in the world trade network as well as for total sectoral value-added. 
<p>
Next, I define eigenvector centrality for the binary trade network. 
Then I extend the definition of eigenvector centrality to the weighted international trade network.^[I follow the outline of @bonacich2001evcent, who attribute the original exposition to @bonacich1972.]
 <p>
The centrality $c^e_{j'}$ of a node $j'$ is proportional to the weighted sum of the centrality of the nodes it is connected to. \begin{align} \lambda c^e_{j'} &= g_{1,j'} c^e_{1}+g_{2,j'} c^e_{2}+ \dots + g_{n,j'} c^e_{n} \end{align} where $\lambda$ denotes a proportionality constant. 
Equation 6 may be restated in matrix notation as below. It is the general eigenvector equation and it has $n$ solutions for $n$ values of $\lambda$. 
\begin{align}
\label{eq:6}
\boldsymbol{G}^T \boldsymbol{C}^e = \boldsymbol{C}^e  \boldsymbol{\lambda}
\end{align}
$\boldsymbol{G}^T$ denotes the transpose of the trade relationship matrix with dimensions $n \times n$. The trade relationship matrix is row-normalized so that each row adds up to 1. $\boldsymbol{\lambda}$ denotes the diagonal matrix of eigenvalues and $\boldsymbol{C}^e$ denotes an $n \times n$ matrix, where each column is an eigenvector of the transposed trade relationship matrix $\boldsymbol{G}^T$.
<p>
Two important properties of eigenvector centrality follow from the Perron-Frobenius theorem.
 First, the theorem states that the largest eigenvalue is equal to one for a row stochastic matrix and all other eigenvalues are smaller if the matrix $\boldsymbol{G}$  has only positive elements.^[
A row stochastic matrix denotes a matrix where the sum of each row is equal to one.]
Second, it implies that the left-hand eigenvector of the largest eigenvalue is positive for a non-negative row stochastic matrix.<p>
I focus on the principal eigenvector corresponding to the largest eigenvalue as the measure of network centrality. It is this eigenvector to which the rows of the trade relationship matrix converge when the matrix is raised to successively higher powers. <p> 
Accordingly, equation 7 simplifies to the following equation, where $\boldsymbol{c}^e$ denotes the principal eigenvector.
\begin{align}
\label{eq:17}
 \boldsymbol{G}^T \boldsymbol{c}^e&=	 \boldsymbol{c}^e \end{align}
 <p>
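
As an illustration of the last equation, the following sketch computes the principal eigenvector for a hypothetical row-normalized $3 \times 3$ matrix with strictly positive entries. The numbers are invented. The sketch checks that the largest eigenvalue is one and that repeated multiplication by the transposed matrix leads to the same vector. Scaling the eigenvector so that its entries sum to one is a normalization chosen for this illustration only.

```r
# Hypothetical row-stochastic matrix with strictly positive entries.
G <- matrix(c(0.2, 0.5, 0.3,
              0.6, 0.1, 0.3,
              0.4, 0.4, 0.2),
            nrow = 3, byrow = TRUE)
stopifnot(all(abs(rowSums(G) - 1) < 1e-12))

# Eigen-decomposition of the transpose: solves t(G) %*% c = lambda * c.
# eigen() orders the eigenvalues by decreasing modulus.
dec <- eigen(t(G))
lambda_max <- Re(dec$values[1])

# The sign of an eigenvector is arbitrary, so take absolute values
# and rescale so that the entries sum to one.
c_e <- abs(Re(dec$vectors[, 1]))
c_e <- c_e / sum(c_e)

# Power iteration: start from a uniform vector and apply t(G) repeatedly.
v <- rep(1 / 3, 3)
for (k in 1:200) v <- as.numeric(t(G) %*% v)

lambda_max          # approximately 1
max(abs(v - c_e))   # approximately 0
```
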
I focus on eigenvector centrality as the theoretically sound choice to analyze the centrality of nodes in the propagation of shocks through the international trade network. I motivate the choice as follows. 
First, @acemoglu2012 showed that the unit eigenvector is a first-order characteristic of the importance of sectors in the propagation of shocks in a country's production network. 
In particular, they showed that a shock to a sector with a higher eigenvector centrality has larger effects on total value-added.
 Second, the elements of the unit eigenvector measure how much a node contributes to the value of the matrix when an extra unit of value is generated [@spizzirri2011]. 
 <p>
The weighted version of eigenvector centrality is based on the row-normalized weight matrix $\boldsymbol{A}$, where the rows $i=1,\dots,n$ are the importing countries and the columns $j=1,\dots,n$ are the exporting countries. Each element $a_{i,j}$ is strictly positive if country $j$ exports to country $i$. 
Further, $a_{i,j}$ is the share of total expenditures of country $i$ attributed to country $j$. 
The weighted out-eigenvector centrality is then the right-hand eigenvector of the transposed matrix $\boldsymbol{A}^T$. <p>
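
The construction of the weighted measure can be sketched in a few lines. The example assumes a hypothetical $3 \times 3$ matrix of value-added flows `W` with importers in rows and exporters in columns. The country codes and values are invented. The steps are row normalization followed by an eigen-decomposition of the transposed matrix.

```r
# Hypothetical value-added flows: rows are importing countries i,
# columns are exporting countries j (illustrative numbers only).
countries <- c("AAA", "BBB", "CCC")
W <- matrix(c(80, 15,  5,
              30, 60, 10,
              40, 20, 40),
            nrow = 3, byrow = TRUE,
            dimnames = list(countries, countries))

# Row-normalize: a_ij is the share of country i's expenditure spent on j.
A <- sweep(W, MARGIN = 1, STATS = rowSums(W), FUN = "/")

# Right-hand eigenvector of t(A) for the largest eigenvalue.
# eigen() returns vectors with unit Euclidean norm; abs() fixes the sign.
out_cent <- abs(Re(eigen(t(A))$vectors[, 1]))
names(out_cent) <- countries

sort(out_cent, decreasing = TRUE)
```

In this toy example country AAA receives the largest expenditure shares from the other two countries and is therefore the most central exporter.
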
In the following I use the weighted out-eigenvector centrality.
The choice is motivated by the connection of this concept to the theory of comparative advantage.
According to network theory the weighted out-eigenvector implies that a country is exporting relatively more to countries with a high share of exports.
According to the theory of comparative advantage a country is expected to produce relatively more and contribute more to the world production when it is more productive. 
Hence both concepts may capture the ability of a country while taking into account all others' ability.


### Centrality of countries in the propagation of shocks in the international trade network

In this subsection I discuss the importance of countries in terms of their contribution to the diffusion of shocks in the international trade network. 
In particular, the importance of countries is measured according to eigenvector centrality. 
<p>
I use the weighted out-eigenvector centrality as the measure of importance of a country in the shock propagation in the international trade network. 
A country with a higher centrality is expected to contribute relatively more with its exports to the international trade network.
Therefore, a shock to a country with a higher centrality has larger effects on the international trade network. <p>
I construct the international trade network on the basis of forward value-added trade for the following reasons.
First, a large literature has addressed the importance of countries in the international shock propagation on the basis of gross trade flows.
However, given the importance of international production fragmentation, this analysis should be based on value-added trade instead of gross exports.
Second, defining the international trade network on the basis of forward value-added trade makes it possible to correctly identify how shocks to a country's domestic production factors propagate through the international trade network. <p>
Figure 3.1 shows the eigenvector centrality of the full set of countries in the international trade network for total value-added trade. 
The eigenvector centrality scores indicate a core-periphery structure.^[I define the frontier of the core group according to the criterion that the ratio of two adjacent countries' eigenvector centralities is less than 1.7.]
The core group with the highest eigenvector centralities consists of the following countries: USA, Germany, Japan, France, Great Britain, China and Italy. 
Notably, China is the only country in this group which is not a high-income country.
The importance of the USA as the most central economy is highlighted by the result that its eigenvector centrality is 1.3 times that of the second most central economy, Germany. 
The periphery group consists of the remaining countries. 
This group includes nearly three quarters of the countries in the sample. <p>
The result that the eigenvector-centrality on the basis of the international trade network with FVAT indicates a core-periphery structure extends a similar finding by @de2014network for the international trade network on the basis of gross exports in 2007. 

```{r shock propagation countries, message=FALSE, warning=FALSE, cache=T, include=FALSE,echo=F}
exclude<-c("15T16","20","21T22","36T37","60T64","45","50T52","55","75T95")
exclude<-gsub("T","_",exclude)
out.eigen.plot.2005<-eigen.res.final.df %>% filter(.,! ind %in% exclude) %>% select(out.eigen.cent,ind,COU)
out.eigen.plot.2005<-rename(out.eigen.plot.2005,IND=ind)
#create std max eigenvector within each industry
out.eigen.plot.2005<- out.eigen.plot.2005%>% group_by(IND) %>% mutate(out.std=out.eigen.cent/max(out.eigen.cent))
exclude<-c("15T16","20", "36T37","60T64","45","50T52","55","75T95")
exclude<-gsub("T","_",exclude)
out.eigen.plot.fvax.2005<-eigen.res.final.df %>% dplyr::rename(IND=ind) %>% dplyr::filter(.,! IND %in% exclude)
#create std max eigenvector within each industry
out.eigen.plot.fvax.2005<- out.eigen.plot.fvax.2005%>% group_by(IND) %>% mutate(out.std=out.eigen.cent/max(out.eigen.cent))

out.wide.fvax.20<-eigen.res.final.df %>% dplyr::select(COU,ind,out.eigen.cent)  %>%spread(.,ind,out.eigen.cent)
seq_numbrs=seq(2,length(out.wide.fvax.20),1)

#create lists of top 5 countries for each industry,  
top_5<-map(seq_numbrs,.f  =function(n){
  out.wide.fvax.20[,c(1,n)]%>%top_n(.,5)
})
  
#first create a list of the country names across industries, then merge the results into a single vector that includes each country at most once, so that the result is the set of top-5 countries across industries
top_5.names<-map(top_5,.f  =function(df){
  select_(df, .dots=list(quote(COU)))
})
#merge the results
numbers<-seq(1,11,1)
top.5.all<-  unique(map_df( numbers,.f=function(n){ 
 res<-cbind( top_5.names[[n]], top_5.names[[n+1]] ) } ))
top.5.all<-rbind(top.5.all,"ITA")
titles<-map(top_5,.f=function(df){names(df)[2]})


#myColors <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00',"#cab2d6") 
myColors<-RColorBrewer::brewer.pal(12,"Paired")

names(myColors) <-top.5.all$COU
  

countrys<-c("IDN","IND")
vars.sub<-c("spearman","kendall")
#################################################################ANALYSE DF W KEND AND SPEAR RESULTS################################################################
#build a df with cou spearman method_ci ci_min ci_max 
analyse.spear<-as_tibble(dplyr::select(results.spearman,COU,spearman,ci.min.spear.bw,ci.max.spear.bw,ci.min.spear.fiel,ci.max.spear.fiel))
analyse.spear<-analyse.spear%>% gather(key="method",value="ci",-COU,-spearman)
analyse.spear$method<-gsub("ci.min.spear.","ci.min.spear:",analyse.spear$method)
analyse.spear$method<-gsub("ci.max.spear.","ci.max.spear:",analyse.spear$method)
analyse.spear%>% gather(key="methods",value=ci,-COU,-spearman)
analyse.spear<-analyse.spear%>%separate(.,method,c("min.max","methods"),sep=":") %>% mutate(.,min.max=gsub("ci.max.spear","ci.max",.$min.max))%>% mutate(.,min.max=gsub("ci.min.spear","ci.min",.$min.max)) 
analyse.spear$min.max<-as.factor(analyse.spear$min.max)
analyse.spear$COU<-as.factor(analyse.spear$COU)  
analyse.spear$methods<-as.factor(analyse.spear$methods)
analyse.spear<-spread(analyse.spear,min.max,ci)

analyse.kend<-as_tibble(dplyr::select(results.kendall,COU,kendall,ci.min.kend.hwc,ci.max.kend.hwc))

#how many sd are India and Indonesia away from the mean strength of association (kendall/spearman)
analyse.kend$COU<-as.factor(analyse.kend$COU)
df.final<-inner_join(analyse.kend,filter(analyse.spear,methods=="fiel"), by="COU")
df.final.sub<-filter(df.final,COU  %in% c("IND","IDN"))
df.final.sub<-dplyr::select(df.final.sub,spearman,kendall)


DoubleMAD <- function(x, zero.mad.action="warn"){
  # The zero.mad.action determines the action in the event of an MAD of zero.
  # Possible values: "stop", "warn", "na" and "warn and na".
  x         <- x[!is.na(x)]
  m         <- median(x)
  abs.dev   <- abs(x - m)
  left.mad  <- median(abs.dev[x<=m])
  right.mad <- median(abs.dev[x>=m])
  if (left.mad == 0 || right.mad == 0){
    if (zero.mad.action == "stop") stop("MAD is 0")
    if (zero.mad.action %in% c("warn", "warn and na")) warning("MAD is 0")
    if (zero.mad.action %in% c(  "na", "warn and na")){
      if (left.mad  == 0) left.mad  <- NA
      if (right.mad == 0) right.mad <- NA
    }
  }
  return(c(left.mad, right.mad))
}

DoubleMADsFromMedian <- function(x, zero.mad.action="warn"){
  # The zero.mad.action determines the action in the event of an MAD of zero.
  # Possible values: "stop", "warn", "na" and "warn and na".
  two.sided.mad <- DoubleMAD(x, zero.mad.action)
  m <- median(x, na.rm=TRUE)
  x.mad <- rep(two.sided.mad[1], length(x))
  x.mad[x > m] <- two.sided.mad[2]
  mad.distance <- abs(x - m) / x.mad
  mad.distance[x==m] <- 0
  return(mad.distance)
}
df.final$mad.from.median<-abs(df.final$kendall-median(df.final$kendall))/mad(df.final$kendall)
df.final$mad.from.median.spear<-abs(df.final$spearman-median(df.final$spearman))/mad(df.final$spearman)
df.final$double.mad.from.median<-DoubleMADsFromMedian(df.final$kendall)
out.eigen.plot.fvax.2005<-eigen.res.final.df %>% dplyr::rename(IND=ind) %>% filter(.,! IND %in% exclude)
#create std max eigenvector within each industry
out.eigen.plot.fvax.2005<- out.eigen.plot.fvax.2005%>% group_by(IND) %>% mutate(out.std=out.eigen.cent/max(out.eigen.cent))

out.eigen.plot.fvax.2005$IND<-gsub("_","-",out.eigen.plot.fvax.2005$IND)
top.5.all.ranks.long<-out.eigen.plot.fvax.2005%>% dplyr::select(COU,IND,out.eigen.cent) %>% group_by(.,IND)%>% mutate(rank_ind=rank(-out.eigen.cent)) %>% dplyr::filter(COU %in% top.5.all$COU)
top.5.all.ranks.wide<-dplyr::select(top.5.all.ranks.long,COU,IND,rank_ind) %>% spread(IND,rank_ind) %>% mutate(row_sms=rowSums(.[,-1]))%>% arrange(row_sms) 
top.5.all.ranks.wide.table<-top.5.all.ranks.wide%>%dplyr::select(-row_sms) 

#table of all top 12 countries in the selected 12 industries
#busy table difficult
top.5.all.ranks.wide.table$COU<-as.character(top.5.all.ranks.wide.table$COU)
#print(xtable::xtable(top.5.all.ranks.wide.table,digits=0),booktabs=T,include.rownames=F)
#other id parallel coordinates plot 
#however more than 5 different lines overwhelming, so select top

upper7<-top.5.all.ranks.wide%>% top_n(.,7,-row_sms) %>% dplyr::select(-row_sms) %>% gather(.,IND,rank,-COU)
upper7$IND<-gsub("_","-",upper7$IND)
#upper5$IND<-gsub("_","-",upper5$IND)
# upper5<-upper5%>%spread(IND,rank)

upper.4<-c("USA", "DEU", "JPN", "FRA")
upper7_4<-filter(upper7,COU %in% upper.4)
upper7_3<-filter(upper7,!COU %in% upper.4)
col.up.7<-RColorBrewer::brewer.pal(7,"Set1")
#tikz("parallel_coord_four.tex")
p.7.4<-ggplot(upper7_4,aes(x=IND,y=rank, group=COU,color=COU,shape=COU))+
  geom_line()+
  geom_point(aes(shape=COU))+
  coord_cartesian(ylim=c(1,13))+
  scale_y_continuous(breaks=seq(0,14,1))+
  scale_shape_manual(values=c(seq(0,4,1)))+
  scale_color_manual(values=col.up.7[1:5])+
  theme_bw()+
  labs(y="Rank",x="ISIC Rev.3 Code", title="")+
  theme( plot.title = element_text(hjust=0.5,vjust=0.5),
         legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 90, size=10))

col.up.7.2<-col.up.7
#replace sixed color with the same yellow color with less lightness
col.up.7.2[6]<-"#cccc00"
#tikz("parallel_coord_three.tex")
p.7.3<-ggplot(upper7_3,aes(x=IND,y=rank, group=COU,color=COU,shape=COU))+
  geom_line()+
  geom_point(aes(shape=COU))+
  scale_y_continuous(breaks=seq(0,14,1))+
  scale_shape_manual(values=c(seq(4,7,1)))+
  scale_color_manual(values=col.up.7.2[4:7])+
  theme_bw()+
  labs(y="Rank",x="ISIC Rev.3 Code", title="")+
  theme( plot.title = element_text(hjust=0.5,vjust=0.5),
         legend.position = "bottom", # legend location in graph
         panel.grid.minor = element_blank(),
         axis.title=element_text( size="10"),
         axis.text.x=element_text(angle = 90, size=10))
#dev.off()
```
```{r shock importance of countries, message=FALSE, warning=FALSE, cache=T, include=F,echo=F}
tiva.2005.agg.cou.par<-tiva.2005.sub.final %>% filter(.,VAR%in%c("FFD_DVA")) %>% group_by(COU,PAR) %>% summarise(sum.fvat.ind=sum(Value))
 tiva.2005.agg.cou.par<-spread_(tiva.2005.agg.cou.par,"COU","sum.fvat.ind")
  rownames(tiva.2005.agg.cou.par)<-NULL
  tiva.2005.agg.cou.par<-column_to_rownames(tiva.2005.agg.cou.par,"PAR")
  rows<-rowSums(tiva.2005.agg.cou.par)
  cols<-colSums(tiva.2005.agg.cou.par, na.rm=T)
  y<-sweep(tiva.2005.agg.cou.par,MARGIN=2,STATS=cols, FUN="/")
  tiva.2005.agg.cou.par.new<-sweep(tiva.2005.agg.cou.par,MARGIN=1,STATS=rows, FUN="/")
  col.sms=colSums(y)
  row.sms<-rowSums(tiva.2005.agg.cou.par.new)
  # #check if normalization correct
  assertthat::assert_that(all(row.sms>=0.999999))
  assertthat::assert_that(all(row.sms<=1.000001))
  assertthat::assert_that(all(col.sms>=0.999999))
  assertthat::assert_that(all(col.sms<=1.000001))
  #right eigenvector corresponds to out-degree hence out-eigenvec centrality
  y.eigen.2<-abs(as.numeric(eigen(y)$vector[,1]))
  names(y.eigen.2)<-rownames(tiva.2005.agg.cou.par)
  #check unit euclidian norm
  assertthat::assert_that(all(round(sum(y.eigen.2^2),4)==1))
  x.eigen.2<-abs(as.numeric(eigen(t(tiva.2005.agg.cou.par.new))$vector[,1]) )
  #the sign of the eigenvector is arbitrary, hence absolut value does not alter the eigenvector
  names(x.eigen.2)<-rownames(tiva.2005.agg.cou.par)
  #check unit euclidian norm
  assertthat::assert_that(all(round(sum(x.eigen.2^2),4)==1))
  #left multiply row-normalized matrix with this eigenvector: should give back same eigenvector
  eigen.vec2<- x.eigen.2%*% as.matrix(tiva.2005.agg.cou.par.new)
  
  #right multiply col-normalized matrix with this eigenvector: should give back same eigenvector
  eigen.vec3<- as.matrix(y) %*%y.eigen.2
  
  assertthat::assert_that(all(round(x.eigen.2,4)==round(eigen.vec2,4)))
  assertthat::assert_that(all(round(y.eigen.2,4)==round(eigen.vec3,4)))
  eigen.res.cou.par<-cbind(x.eigen.2,y.eigen.2)         
  eigen.res.cou.par.tb<-as_tibble(eigen.res.cou.par)
  eigen.res.cou.par.tb$COU<-rownames(eigen.res.cou.par)
  
  
  eigen.res.cou.par.tb$cor.per<-ifelse(eigen.res.cou.par.tb$COU%in%c("USA","DEU","JPN","FRA","GBR","CHN","ITA"),"core","periphery")
  colnames(eigen.res.cou.par.tb)<-c("out.eigen.cent","in.eigen.cent","COU","cor.pres")
  
 p.1.out.eigen.cou.par<- ggplot(eigen.res.cou.par.tb, aes(reorder(COU, -out.eigen.cent),out.eigen.cent ,fill=cor.pres)) +
    geom_bar(stat="identity", position="dodge") +# fill="#D3D3D3"
   labs(x="",y="Out-eigen",title="Out eigenvector centrality")+
   scale_fill_grey(start = 0.8, end = .3)+
    #scale_fill_brewer(type="seq",palette =6)+
   # scale_color_brewer()+
    scale_y_continuous(breaks=c(seq(0,1,.1)),minor_breaks=c(seq(0.05,1.0,.1)))+
    theme_bw()+
    theme(  plot.title = element_text(hjust=0.5,vjust=0.5),legend.position = "bottom", # legend location in graph
           #panel.grid.minor = element_blank(),
           axis.title=element_text( size="10"),
           axis.text.x=element_text(angle = 90, size=10))+guides(fill=guide_legend(title=""))
```
```{r plot shock propagation countries, echo=FALSE, fig.align='center', fig.cap='Figure 3.1: Shock propagation: Importance of countries', message=FALSE, warning=FALSE}
p.1.out.eigen.cou.par
 # \begin{figure}[H]
# \centering
# \caption{Shock propagation: Importance of countries}
# \includegraphics[width=.55\linewidth]{./fig/out_eigen_cent_COU_PAR.tex}
# \end{figure}
```

## The international sectoral trade network and relative eigenvector centrality

In this section I work with the concepts of the international trade network and eigenvector centrality defined at the sectoral level. 
I compute the principal eigenvector for the set of countries separately in each sector and investigate the variability in centrality in this set of countries. 
These results motivate the analysis of the strength of association between relative network centrality and RCA.
  <p>
Conceptually, extending the definitions of weighted out-eigenvector centrality and the international trade network to the country-sector level is straightforward.
It requires adding a superscript $k$ to the matrix of trade relationships and to the weight matrix. 
I focus on a subset of ten manufacturing and two service industries.
I compute the eigenvectors for the full set of countries. 
For readability, I present the results for the set of countries which I identified as the core group.
 These countries are the most central in the international trade network and hence the most relevant for the propagation of shocks in the network.
<p>
Figure 3.2 displays the importance of countries in the core group according to weighted out-eigenvector centrality separately for each sector.
The left panel shows the four countries of the core group with the best overall ranking across sectors, that is, the lowest rank sum.
The right panel shows the remaining three countries of the core group. <p>
Overall, I find that the rank positions of the countries vary noticeably.
An exception is the USA, which is ranked first or second in ten of the twelve industries. 
By contrast, the countries in the right panel show more variation in their rank positions.
For example, China is ranked first in the 'textiles and leather' industry (17-19) and is among the ten most important countries across the manufacturing industries. Yet, its rank position is only thirteenth in the 'financial services and intermediation' industry (65-67) and eleventh in the 'real estate and business services' industry (70-74). <p>
In sum, Figure 3.2 highlights that the rank positions of the core countries show noticeable variation in the centrality rankings at the sectoral level.
The variability of the ranking of the countries at the sectoral level indicates that the centrality of countries may be linked to their ability to contribute to the international trade network specific to each sector.
To test this hypothesis further, I compute the strength of association between the ranking of relative centrality and the ranking of RCA in chapter 2.
```{r shock propagation, fig.align='center', fig.cap='Figure 3.2: Shock propagation: Seven highest ranked countries across industries',echo=F}
grid.arrange(p.7.4,p.7.3,nrow=1,ncol=2)
```
I analyze the association between the ranking of relative network centrality and the ranking of structural RCA constructed on the basis of forward value-added trade.
Specifically, I normalize weighted out-eigenvector centrality using the same approach as in the normalization of RCA rankings.^[
For each concept the value specific to exporting country $j$ and industry $k$ is re-scaled by the sample mean and then divided by the product of the exporting country mean and the industry mean. The latter product may be interpreted as the expected value for the particular exporting country $j$ in industry $k$.]
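
The normalization described in the footnote can be sketched as follows. The sketch reads the description as multiplying each value by the sample mean and dividing by the product of the country mean and the industry mean. This reading, the matrix `v` and all numbers are assumptions made for illustration.

```r
# Hypothetical centrality scores: rows are exporting countries j,
# columns are industries k (illustrative numbers only).
v <- matrix(c(0.60, 0.30, 0.45,
              0.20, 0.40, 0.15,
              0.10, 0.05, 0.25),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("AAA", "BBB", "CCC"), c("k1", "k2", "k3")))

country_mean  <- rowMeans(v)   # mean over industries for each country
industry_mean <- colMeans(v)   # mean over countries for each industry
sample_mean   <- mean(v)       # mean over all country-industry pairs

# Benchmark for country j in industry k under this reading.
expected <- outer(country_mean, industry_mean) / sample_mean

# Relative measure: values above one indicate a score above the benchmark.
relative <- v / expected

# Ranking of industries within each country (1 = highest relative score).
t(apply(relative, 1, function(r) rank(-r)))
```
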
<p>
I motivate the empirical analysis as follows. According to the theory of comparative advantage a country with a higher productivity will contribute relatively more to the world production. According to network theory a country is expected to be more important in terms of how many dollars it contributes to the total value of the network. 
It follows that in relative terms both measures are expected to capture the ranking of relative sectoral ability for any pair of countries.
<p>
If both measures indeed capture the ranking of relative sectoral ability, eigenvector centrality may be preferred as a simpler measure. 
Specifically, the structural RCA measure is constructed using a two-step estimation procedure. 
By contrast, the eigenvector approach is not based on estimation but on matrix diagonalization.
<p>
Figure 3.3 shows the strength of association between the ranking of industries within each country according to RCA and the ranking of industries within each country according to relative network centrality. 
Overall, I find that most countries exhibit a high strength of association between the rankings. 
This is highlighted by the relatively high median of the strength of association, which is 0.88 for Spearman's $\rho$ and 0.74 for Kendall's $\tau$.
For most countries the strength of association lies in a narrow range. The interquartile range is $0.09$ on the basis of Spearman's $\rho$ and $0.12$ on the basis of Kendall's $\tau$. The value of the first quartile is 0.83 on the basis of Spearman's $\rho$ and 0.67 on the basis of Kendall's $\tau$. Further, the value of the third quartile is 0.92 on the basis of Spearman's $\rho$ and 0.79 on the basis of Kendall's $\tau$. <p>
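
The two association measures and the summary statistics reported above can be computed with base R. The following sketch uses invented rankings of six industries for a single country and an invented vector of coefficients across countries. None of the numbers come from the thesis data.

```r
# Hypothetical rankings of six industries within one country.
rank_rca        <- c(1, 2, 3, 4, 5, 6)
rank_centrality <- c(2, 1, 3, 4, 6, 5)

# Strength of association between the two rankings.
rho <- cor(rank_rca, rank_centrality, method = "spearman")
tau <- cor(rank_rca, rank_centrality, method = "kendall")
c(spearman = rho, kendall = tau)

# Summary across countries, given a hypothetical vector of coefficients.
rho_by_country <- c(0.95, 0.88, 0.83, 0.92, 0.61, 0.90)
quantile(rho_by_country, probs = c(0.25, 0.5, 0.75))
IQR(rho_by_country)
```

With two swapped pairs of adjacent ranks, Kendall's $\tau$ is smaller than Spearman's $\rho$, which is the typical ordering of the two coefficients for the same pair of rankings.
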
For a number of countries the strength of association between the rankings is rather low. 
In particular, the following countries are lower outliers: Belgium, Spain, Mexico, France, India and Indonesia. These countries are classified as outliers because their absolute deviation from the median exceeds three times the median absolute deviation.^[This outlier definition is based on @Hampel.] 
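
The outlier rule can be sketched as follows, using an invented vector of Kendall coefficients for eight hypothetical countries. The sketch relies on `stats::mad()`, which by default multiplies the raw median absolute deviation by the constant 1.4826. Whether to apply this scaling is a choice that should be stated together with the threshold.

```r
# Hypothetical Kendall coefficients for eight countries.
tau <- c(AAA = 0.74, BBB = 0.79, CCC = 0.67, DDD = 0.71,
         EEE = 0.76, FFF = 0.81, GGG = 0.70, HHH = 0.20)

# Distance from the median in units of the median absolute deviation.
# Note: stats::mad() scales the raw MAD by 1.4826 by default.
mad_distance <- abs(tau - median(tau)) / mad(tau)

# Flag lower outliers: more than 3 MADs away from and below the median.
lower_outliers <- names(tau)[mad_distance > 3 & tau < median(tau)]
lower_outliers
```
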
```{r plot eigen RCA, echo=FALSE, fig.align='center', fig.cap='Figure 3.3: Strength of association between RCA and eigenvector centrality', message=FALSE, warning=FALSE}
grid.arrange(p.eigen.RCA.spear,p.kend.RCA.eigen,ncol=2)
```
In conclusion, I find that for most countries the ranking of relative network centrality maps into their pattern of RCA. 
However, for a group of six countries I find that the pattern of relative network centrality does not map into their pattern of RCA. 
A theoretical explanation for this result is a question for future research. 


<!--chapter:end:03_thesis_ch3.rmd-->

# Conclusion

In this thesis I evaluated two research questions. 
First, I analyzed whether the pattern of revealed comparative advantage changes significantly, when it is computed on the basis of domestic factors only. 
Specifically, I compared the RCA ranking obtained on the basis of gross exports to the RCA rankings on the basis of backward value-added trade and of forward value-added trade. <p>
 Second, I assessed the importance of countries to the propagation of shocks in the international trade network and compared this ranking to the ranking obtained on the basis of RCA.  
Specifically, I measured the importance of countries for the international trade network for total value-added and sectoral total value-added with the notion of eigenvector centrality.
<p>
Regarding the first objective, I find that the stability of the ranking of revealed comparative advantage depends on the definition of value-added trade. 
The ranking of RCA based on value-added trade in line with the contribution of the domestic supply-chain in gross exports mirrors closely the RCA ranking on the basis of gross exports.
However, there are noticeable differences between the ranking of RCA on the basis of the factor content of trade and the ranking on the basis of gross exports. <p>
I interpret my results as follows. 
The similarity of the rankings obtained for gross exports and backward value-added trade indicates that the foreign content does not substantially change the pattern of relative production costs. 
The contribution of the domestic supply chain is sufficient to predict the ranking of comparative advantage.
The dissimilarity of the rankings for gross exports and forward value-added trade indicates that comparative advantage according to the factor content of trade differs from comparative advantage according to the domestic supply chain. 
Overall, I conclude that the pattern of comparative advantage based on gross exports captures the domestic content of trade. 
However, it does not capture very well the pattern of comparative advantage associated with the factor content of trade.<p>

Regarding the second objective, I find that the importance of countries  for the propagation of shocks on the basis of total value-added indicates a core-periphery structure.
The core group consists of four European countries, the USA, Japan and China. 
Further, I find that the rank positions of the core countries show noticeable variations for the international trade network on the basis of sectoral value-added. 
Finally, I find that for most countries the ranking of relative network centrality maps into their pattern of RCA. 
However, for six countries I find that the ranking of relative network centrality does not map into their pattern of RCA. <p>
The research of this thesis may open an avenue for further theoretical and empirical economic research.
First, future research may provide a theoretical underpinning for the mapping of the ranking of relative network centrality into the pattern of RCA. 
Second, future empirical research may analyze whether the ranking of network centrality maps into the ranking of comparative advantage according to Heckscher-Ohlin. 

<!--chapter:end:04_thesis_ch4.Rmd-->


# Appendix {-}
In the appendix I first describe the methods and the data used to obtain the parameter $\theta$. 
Second, I include a data appendix with relevant summary statistics.

## Estimation

In the following section I outline the methods, the additional data sources and the data manipulations I used to estimate  $\theta$. In particular, I motivate the IV estimation and describe the data imputation. 
Finally, I discuss the results. <p>
I follow the approach of CDK to obtain an estimate of $\theta$. Thus, I estimate the following equation.
\[ \ln x_{i,j}^k=\delta_{i,j}+\delta_{j}^k + \theta \ln z_{i,j}^k+\epsilon_{i,j}^k\]
In the equation $x_{i,j}^k$ denotes the trade flow from exporting country $i$ to importing country $j$ in industry $k$, $\delta_{i,j}$ denotes an importer-exporter fixed effect, $\delta_{j}^k$ denotes an importer-sector fixed effect, $z_{i,j}^k$ denotes the productive efficiency and $\epsilon_{i,j}^k$ denotes the error term. As in CDK I specify $z_{i,j}^k$ as the inverse of producer prices. <p>
The structural parameter $\theta$ may be estimated with OLS under the condition that the econometric error term is exogenous. 
In the model the error term may be interpreted as a variable trade cost. 
Thus, the exogeneity condition requires for the three specifications that the inverses of total production cost, total domestic cost and total domestic factor cost are not correlated with variable trade cost. <p>
@costinot state two reasons why the condition may be violated.
First, the condition may be violated because of a simultaneity bias. 
An example of a source of simultaneity bias is agglomeration effects.^[A relevant agglomeration effect in our context would be a positive spillover from the decision of one firm to export to a certain market to the decision of a second firm to export to this destination [@bernard2004].]
The sign of the simultaneity bias is a priori ambiguous [@costinot]. <p>
Second, the exogeneity condition may be violated due to a measurement error.
A measurement error in the international price data would bias the estimate of $\theta$ downward under the condition that the measurement error is correlated with the true underlying variable [@greene, p. 85].<p>
I use an IV estimation strategy to address the two outlined problems. 
By instrumenting the inverse producer price with the instrument R&amp;D expenditures, I attempt to isolate the variation of the regressor, which is exogenous to the econometric error term. 
Moreover, if the variation of the producer prices explained by R&amp;D mainly affects our independent variable through productive efficiency, then the IV estimation identifies the effect of Ricardian sources of comparative advantage.
<p>
 I motivate the choice of R&amp;D expenditures as instrumental variable as follows. 
 First, modelling productive efficiency as a process of R&amp;D is in line with the approach of  @costinot and @eaton.
A possible mechanism for R&amp;D expenditures to affect the inverse of producer prices is that an increase in R&amp;D expenditures may lead to innovations, which lead to more cost efficient production technologies. 
In the model a decrease in the cost of producing a good is directly passed through to the producer price, due to  the assumption of perfect competition. <p>
I can test the outlined mechanism on the basis of the first stage regression of the inverse of producer prices on R&amp;D expenditures.
Under the outlined mechanism, I expect that the coefficient is positive and statistically significant.
An empirical test of the exclusion assumption, which is that the instrument is exogenous to the econometric error term, is however not possible [@cameron2009, p. 109].
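
The logic of the IV strategy can be illustrated with simulated data. The following sketch is not the estimation used in this appendix: it omits the fixed effects, all variable names and parameter values are hypothetical, and the two stages are run manually with `lm()`. It shows that OLS is biased when the regressor is correlated with the error term, while instrumenting with a variable that affects the outcome only through the regressor recovers the true coefficient.

```r
set.seed(1)
n <- 5000
theta_true <- 6                  # hypothetical value, for illustration only

rd   <- rnorm(n)                 # instrument (stands in for log R&D expenditures)
u    <- rnorm(n)                 # unobserved component of variable trade cost
ln_z <- 0.8 * rd + 0.5 * u + rnorm(n)       # endogenous regressor
ln_x <- theta_true * ln_z - u + rnorm(n)    # outcome (log trade flow)

# OLS is biased because ln_z is correlated with the error term (-u + noise).
theta_ols <- unname(coef(lm(ln_x ~ ln_z))["ln_z"])

# First stage: regress the endogenous regressor on the instrument.
first_stage <- lm(ln_z ~ rd)

# Second stage: replace the regressor with its first-stage fitted values.
ln_z_hat <- fitted(first_stage)
theta_iv <- unname(coef(lm(ln_x ~ ln_z_hat))["ln_z_hat"])

coef(summary(first_stage))["rd", ]   # first-stage coefficient and t-statistic
c(ols = theta_ols, iv = theta_iv)
```

The point estimate from the manual second stage is the two-stage least squares estimate, but the standard errors reported by the second `lm()` call are not valid because they ignore the first-stage estimation.
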

### Data

I use the following additional data sources to estimate $\theta$. 
I use international producer price data for the year 2005 from the GGDC [@Inklaar2012], R&amp;D expenditures for the year 2005 from the ANBERD database [@stan2] and value-added trade and gross export data from the TiVA database. Further, to harmonize the level of aggregation of the international price data with the other data sources, I used value-added output data from the OECD STAN database [@stan2].
 <p>
I combined the additional data sources with the value-added and gross export data from the TiVA database using the two-digit ISIC Rev. 3.1 classification.
In the cases in which the international price data are more disaggregated, I used a weighted average. 
Specifically, to merge two sectors I assigned to each sector a weight equal to the share of the sector's value-added output in the sum of the value-added output of the two sectors. <!-- Hence, I aggregated several prices from the service sectors using a weighted average. -->
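
As a minimal sketch of this aggregation step, the following R snippet merges two hypothetical sub-sectors. The prices and value-added figures are invented for illustration and do not come from the GGDC or STAN data.

```r
# Hypothetical prices and value-added output of two sub-sectors to be merged
price <- c(sector_a = 1.20, sector_b = 0.90)
value_added <- c(sector_a = 300, sector_b = 100)

# Each weight is the sector's share in the combined value-added output
weights <- value_added / sum(value_added)

# Weighted average price of the merged sector
merged_price <- sum(weights * price)

weights
merged_price
```

The same result is obtained with `weighted.mean(price, value_added)`, which normalizes the weights internally.
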

### Missing data imputation

@schafer1998multiple note that the following three concerns arise due to missing data: (1) efficiency losses, (2) complications in data handling and data analysis, (3) bias due to differences between the observed and unobserved data.
For the estimation of $\theta$, potential problems may arise because data on R&amp;D expenditures are not available for some industries.
In particular, the missing data may cause a loss of efficiency in the first stage of the IV estimation, which would reduce the strength of the first stage association between R&amp;D and the inverse of producer prices and hence upward bias the estimates of $\theta$. <p>

Multiple imputation is a Bayesian technique to impute missing data by simulated draws from the posterior predictive distribution. ^[It was initially proposed in @rubin1978 for non-response in surveys, and its statistical properties were developed in  @Rubin1987.]<p>
 In the following, I outline multiple imputation based on @Little:2002a [pp. 209-211].<p>
Missing data techniques assume that the missing observations of a variable are random variables with a statistical distribution.
Multiple imputation, like other missing data imputation techniques, assumes that the probability of a missing observation depends only on the observed data and not on the missing data. <p>
The idea of multiple imputation is to relate the observed posterior distribution to the complete-data posterior distribution, which would be observed in the absence of missing data.
The main result of @Rubin1987 is that the posterior distribution of a statistical quantity may be simulated by first imputing the missing observations with repeated draws from the posterior predictive distribution of the missing data given the observed data and then drawing the statistical quantity from its complete-data posterior distribution. ^[In Bayesian inference, the posterior distribution is obtained by dividing the product of the assumed prior distribution and the likelihood by a normalizing constant. The posterior predictive distribution describes the predicted value averaged over the posterior distribution.]
<p>
Multiple imputation produces valid inference from a frequentist perspective [@Little:2002a,p.90]. <p>
The choice of multiple imputation is based on the following reasons.
First, techniques that ignore the missing observations, such as complete case analysis or case-wise deletion, require a stronger assumption about the missing data.  Specifically, they require that the missing data are a random subset of all observations [@bhaskaran].
Second, multiple imputation offers a simple and general approach and correctly accounts for the uncertainty induced by missing observations [@schafer1998multiple].
 <p>
After the imputation, complete-data methods can be used on the imputed data sets and the results may be combined using Rubin's rules [@Rubin1987].
Specifically, I obtain the pooled estimate $\bar{Q}_m$ of a statistical quantity by taking the mean of the estimates obtained within each imputed data set, $\bar{Q}_m=\frac{1}{m} \sum_{l=1}^m Q_{l}$, where $Q_1, \dots, Q_m$ denote the estimates obtained within each of the $m$ data sets.
The associated variance combines the variance estimates within each imputed data set with the variance between the imputations. Formally, 
$$T_m=\bar{V}_m+ \frac{m+1}{m} B_m, \quad \text{where } B_m=\frac{1}{m-1}\sum_{l=1}^m (Q_l-\bar{Q}_m)^2 \text{ and } \bar{V}_m=\frac{1}{m} \sum_{l=1}^m V_l.$$
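
The pooling step can be sketched in a few lines of R. The function name `pool_rubin` and the five estimates and variances below are hypothetical and serve only to illustrate the formulas; they are not results from the thesis.

```r
# Pool m point estimates q and their variances v using Rubin's rules
pool_rubin <- function(q, v) {
  m <- length(q)
  q_bar <- mean(q)                   # pooled point estimate
  v_bar <- mean(v)                   # average within-imputation variance
  b <- var(q)                        # between-imputation variance (divides by m - 1)
  t_var <- v_bar + (m + 1) / m * b   # total variance
  list(estimate = q_bar, within = v_bar, between = b,
       total = t_var, se = sqrt(t_var))
}

# Hypothetical estimates and squared standard errors from m = 5 imputed data sets
q <- c(12.1, 12.9, 12.4, 13.0, 12.6)
v <- c(1.70, 1.80, 1.60, 1.90, 1.75)
pooled <- pool_rubin(q, v)
pooled
```

With these inputs the pooled estimate is 12.6 and the total variance is 1.912, of which 1.75 is the within-imputation component. The between-imputation term is what a single imputation would leave out of the reported uncertainty.
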
 <p>
To overcome possible shortcomings of multiple imputation, I combine it with predictive mean matching (PMM).
PMM is a nearest neighbour matching technique. Its use in the context of multiple imputation is attributed to @Rubin_matching and @little86. 
Unlike multiple imputation based on a normality assumption, PMM imputes missing data with random draws from the closest observations in the observed data. 
As a consequence, it is well suited to imputing skewed variables [@White_MI_chained]. This is relevant to our imputation, as R&amp;D expenditure is highly skewed.

<p> 
Under PMM, the missing observation $y_i$ of unit $i$ is imputed using a random draw from the observations $y_j$ of those units $j$ whose predicted values $\hat{y}_j$ are closest to the predicted value $\hat{y}_i$ of unit $i$, where the predictions are based on a regression of $\boldsymbol{Y}$ on some covariates $\boldsymbol{X}$. In particular, I impute each missing value with a random draw from the ten closest observations.
This choice rests on the recommendations of the simulation study by @Morris2014.<p> 
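
The matching step can be sketched as follows. This is a simplified illustration under stated assumptions, not the imputation code used for the estimates: the function `pmm_impute` is hypothetical, it uses a single continuous covariate instead of country and sector fixed effects, and it produces one imputed data set without redrawing the regression coefficients, which a full multiple-imputation procedure would do for each imputation.

```r
# Simplified predictive mean matching with k donors (single imputation)
pmm_impute <- function(y, x, k = 10) {
  dat <- data.frame(y = y, x = x)
  obs <- !is.na(dat$y)
  fit <- lm(y ~ x, data = dat[obs, ])       # regression on the observed cases
  y_hat <- predict(fit, newdata = dat)      # predicted values for all units
  donors_y <- dat$y[obs]
  donors_hat <- y_hat[obs]
  n_donors <- min(k, length(donors_y))
  for (i in which(!obs)) {
    # the k observed units whose predicted value is closest to that of unit i
    nearest <- order(abs(donors_hat - y_hat[i]))[seq_len(n_donors)]
    # draw one donor at random and copy its observed value
    pick <- nearest[sample.int(n_donors, 1)]
    dat$y[i] <- donors_y[pick]
  }
  dat$y
}

# Simulated, right-skewed and strictly positive variable with 40 missing values
set.seed(42)
x <- rnorm(200)
y <- exp(1 + 0.8 * x + rnorm(200, sd = 0.3))
y_missing <- y
y_missing[sample.int(200, 40)] <- NA
y_imputed <- pmm_impute(y_missing, x)
summary(y_imputed)
```

Because every imputed value is copied from an observed unit, the imputations stay within the range of the observed data and inherit its skewness, which is the property that motivates PMM for a variable such as R&amp;D expenditure.
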
The imputation is conducted as follows. 
First, I impute the outcome variable of the first stage regression, the log of the inverse of the international producer prices, using country and sector fixed effects. 
Second, I impute the log of R&amp;D expenditure using country and sector fixed effects and the log of the inverse of the international producer prices. 
I impute both variables using the country and sector fixed effects to account for time-invariant determinants of both at the country and sector level.
Similarly, @costinot imputed the log of R&amp;D expenditures with the predicted values from a regression on country and sector fixed effects. 
<p> 
I impute the outcome variable based on the following arguments. First, @little1992regression argued that if both the regressors and the outcome variable have missing values, the latter may provide additional information to impute the regressors. 
Second, the simulation study of @Moons:2006a found that the results of multiple imputation of covariates with missing observations were biased if the outcome variable was not used in the imputation.

### Results: Second stage

Table 1 presents the cross-sectional estimates of $\theta$ for the year 2005. Table 1 is divided into three subtables, one for each of the three dependent variables: gross exports, backward value-added trade and forward value-added trade. 
Across tables, I present the OLS estimates in column 1. 
I present the IV estimates with the instrument "R&amp;D expenditure" in columns 2-4.
In columns 2-3, I present the IV estimates using the complete sector coverage and a restricted sample without primary sectors.
In column 4, I present the IV estimates for the sample without primary sectors that includes only high-income countries based on the World Bank classification for 2005.
<p>
The OLS estimates for gross exports and backward value-added trade show a statistically significant positive coefficient.
The IV estimates in columns 2-4 are significantly larger than the OLS estimates in column 1. The estimated $\theta$ in columns 2-4 lies between 11.42 and 15.08.
The significant increase of the IV estimate compared to the OLS estimate indicates that the regressor is endogenous [@hausman1978].<p>
The IV estimates of $\theta$ across the samples for the dependent variables "gross exports" and "backward value-added trade" show the following results.
First, the estimates of $\theta$ for both dependent variables are very close and the difference is not statistically significant.
Second, the sample including only high-income countries and excluding the primary sectors shows a statistically significant increase of $\theta$. ^[I tested significance using a t-test. The test statistic follows a t-distribution with $v$ degrees of freedom, where $v=(m-1)\left(1+ \frac{\bar{V}}{(1+m^{-1})B}\right)^{2}$, $\bar{V}$ denotes the average within-imputation variance and $B$ denotes the between-imputation variance of the estimated parameter [@Rubin1987,p.77]].
A higher estimate of $\theta$ implies a decreased dispersion of production costs within sectors. The result is in line with our expectations, since the sample includes only high-income countries.<p>
The OLS estimates of $\theta$ on the basis of FVAT are not significant. 
As for the other two dependent variables, the IV estimates of $\theta$ are significantly larger than the OLS estimates. 
However, in contrast to the results for EXGR and BVAT, the estimate of $\theta$ for the sample including only high-income countries shows no significant difference compared to the other samples. 
 <p>
Directly comparing the estimate of $\theta$ on the basis of gross exports to the result of @costinot, I find that the IV results in columns 2-3 are not statistically different from CDK's result ($\theta_{CDK} = 11.1$, SE 0.981).
However, the authors' preferred estimate of $\theta$ is $6.58$ on the basis of openness-corrected exports. 
The authors use openness-corrected gross exports to account for trade selection, which biases the observed productivity differences downward. ^[Trade selection denotes that a country does not produce certain goods for which it receives a low productivity draw and instead imports them [@costinot].] <p>
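
The comparison with the CDK estimate can be reproduced approximately from the reported point estimates and standard errors. The sketch below is a back-of-the-envelope check under two assumptions that are mine and not stated in the text: the two estimates are treated as independent, and a normal approximation is used instead of the t-distribution with imputation-adjusted degrees of freedom.

```r
# IV estimates for gross exports (columns 2 and 3) and their standard errors
theta_iv <- c(full_sample = 12.653, without_primary = 11.424)
se_iv <- c(full_sample = 1.331, without_primary = 1.422)

# CDK estimate and standard error as reported in the text
theta_cdk <- 11.1
se_cdk <- 0.981

# Test statistic for the difference of two independent estimates
z_stat <- (theta_iv - theta_cdk) / sqrt(se_iv^2 + se_cdk^2)
p_value <- 2 * pnorm(-abs(z_stat))   # two-sided, normal approximation

round(rbind(z_stat, p_value), 3)
```

Both test statistics are below one in absolute value, far from the conventional critical value of 1.96, so under these assumptions neither estimate differs significantly from 11.1.
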
For two reasons I decided to use gross exports and value added trade without correcting for openness.
First, data on the import penetration ratio is only available for the manufacturing sectors, which would reduce the sample size considerably.
Second, I was unable to obtain a similar correction for VAT. 
```{r theta, message=FALSE, warning=FALSE, cache=T, include=FALSE,echo=F}
ov.caption="Estimation of theta"
cgroup.1<-c("Dep. var. log EXGR 2005")
cgroup.2<-c("Dep. var. log BVAX 2005")
cgroup.3<-c("Dep. var. log FVAX 2005")
cp.tb1<-c("Cross-section results I")
cp.tb2<-"Cross section results II"
cp.tb3<-"Cross section results III"
col.names.3<-c("OLS","Full Sample","Without primary sectors","Without primary sectors high")
estimate.exgr<-c(0.434,     12.653 , 11.424 , 14.689)
exge.se<-c("(0.067)","(1.331)","(1.422)","(2.130)") 
Exp.Imp.FE<-rep("Yes",4)
Imp.Sec.FE<-rep("Yes",4)
r.sq.exgr<-c( 0.771, 0.197 , 0.321, 0.141)
rownames.3<-c("","Log inv. prod. prices","Exporter-Importer Fixed Effect","Exporter-Sector Fixed Effect", "Observations","First-Stage F-Statistic","log inv. prod. prices")
exgr.obs<-c(18143,18143,16582,14449)
f.stat<-c( "",151.41,     125.60      ,    85.24 )
table.exgr<-data.frame(rbind(rbind(rep("",4)),estimate.exgr,exge.se,Exp.Imp.FE,Imp.Sec.FE,exgr.obs,r.sq.exgr,f.stat))
text.group1<-c("is instrumented in", "columns 2-5 with","log of R&amp;D", "expenditures")
text.group.1<-c("HC robust" ,"standard errors", "in parentheses" ,"")
bl.tb.cgroup.3<-c("Without primary sectors excludes the sectors mining and agriculture")
bl.tb.cgroup.4<-c("R^2 pooled using Fisher's z transformation")
bl.tb.cgroup.5<-c("high denotes highly developed countries, in the sample this includes the following countries: AUS, AUT  BEL, CAN, CZE, DEU, ESP, EST, FIN, FRA, GBR, GRC, HUN, IRL, ITA, JPN, KOR, LUX, NLD, POL, PRT, RUS, SVK, SVN, USA")

estimates.dva<-c( 0.476 , 12.911 , 11.762 , 15.080)
se.dva <- c( 0.066, 1.340,1.447, 2.180)
obs.dva<-c(  18143 ,18143  ,16582 ,14449)
rsq.dva<-c(0.775 , 0.180 , 0.304 , 0.128)
table.dva<-data.frame(rbind(rbind(rep("",4)),estimates.dva,se.dva,Exp.Imp.FE,Imp.Sec.FE,obs.dva,rsq.dva,f.stat))

estimates.fddva<-c( -0.019,  9.286 ,10.325 ,10.218  )
se.fddva <-c(0.045,0.868,1.291,1.199)
rsq.fddva<-c( 0.882 , 0.475 ,0.431 ,0.488  )
obs.fddva<-c(  18143 ,18143  ,16582 ,14449)
table.fddva<-data.frame(rbind(estimates.fddva,se.fddva,Exp.Imp.FE,Imp.Sec.FE,obs.fddva,rsq.fddva,f.stat))
```

|                      | OLS           | Full Sample | Without primary sectors** | Without primary sectors high*** |
|----------------------|---------------|-------------|---------------------------|---------------------------------|
| Dep. var.            | log EXGR 2005 |             |                           |                                 |
| &theta;              | 0.434         | 12.653      | 11.424                    | 14.689                          |
| SE*                  | (0.067)       | (1.331)     | (1.422)                   | (2.130)                         |
| Exporter Importer FE | Yes           | Yes         | Yes                       | Yes                             |
| Importer Sector FE   | Yes           | Yes         | Yes                       | Yes                             |
| Observations         | 18143         | 18143       | 16582                     | 14449                           |
| R-squared            | 0.771         | 0.197       | 0.321                     | 0.141                           |
| F-stat 1st stage     |               | 151.41      | 125.60                    | 85.24                           |

Table: Cross-section results I

\* HC-robust standard errors in parentheses. \*\* "Without primary sectors" excludes the sectors mining and agriculture. \*\*\* "high" denotes highly developed countries; in the sample this includes the following countries: AUS, AUT, BEL, CAN, CZE, DEU, ESP, EST, FIN, FRA, GBR, GRC, HUN, IRL, ITA, JPN, KOR, LUX, NLD, POL, PRT, RUS, SVK, SVN, USA.

|                      | OLS           | Full Sample | Without primary sectors** | Without primary sectors high*** |
|----------------------|---------------|-------------|---------------------------|---------------------------------|
| Dep. var.            | log BVAX 2005 |             |                           |                                 |
| &theta;              | 0.476         | 12.911      | 11.762                    | 15.080                          |
| SE*                  | (0.066)       | (1.340)     | (1.447)                   | (2.180)                         |
| Exporter Importer FE | Yes           | Yes         | Yes                       | Yes                             |
| Importer Sector FE   | Yes           | Yes         | Yes                       | Yes                             |
| Observations         | 18143         | 18143       | 16582                     | 14449                           |
| R-squared            | 0.775         | 0.180       | 0.304                     | 0.128                           |
| F-stat 1st stage     |               | 151.41      | 125.60                    | 85.24                           |

Table: Cross-section results II

\* HC-robust standard errors in parentheses. \*\* "Without primary sectors" excludes the sectors mining and agriculture. \*\*\* "high" denotes highly developed countries.

|                      | OLS           | Full Sample | Without primary sectors** | Without primary sectors high*** |
|----------------------|---------------|-------------|---------------------------|---------------------------------|
| Dep. var.            | log FVAX 2005 |             |                           |                                 |
| &theta;              | 0.019         | 9.286       | 10.325                    | 10.218                          |
| SE*                  | (0.045)       | (0.868)     | (1.291)                   | (1.199)                         |
| Exporter Importer FE | Yes           | Yes         | Yes                       | Yes                             |
| Importer Sector FE   | Yes           | Yes         | Yes                       | Yes                             |
| Observations         | 18143         | 18143       | 16582                     | 14449                           |
| R-squared            | 0.882         | 0.475       | 0.431                     | 0.488                           |
| F-stat 1st stage     |               | 151.41      | 125.60                    | 85.24                           |

Table: Cross-section results III

\* HC-robust standard errors in parentheses. \*\* "Without primary sectors" excludes the sectors mining and agriculture. \*\*\* "high" denotes highly developed countries.

### Results: First stage

The results of the first stage regression address two concerns about the validity of the IV regression: the relevance of the instrument and whether the instrument affects the endogenous regressor in the hypothesized way. <p>
The table below shows that the F-statistic of the excluded instrument in the first stage is very high.
This implies that the instrument is highly relevant.
Further, the first stage shows a statistically significant positive effect of R&amp;D on the inverse of producer prices, which confirms the expected positive effect of R&amp;D.
```{r first stage, message=FALSE, warning=FALSE, cache=T, include=FALSE}
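# Hard-coded first-stage results, one entry per sample (column)
estim.first <- c(0.022, 0.023, 0.020)      # coefficient on log R&D
se.first <- c(0.002, 0.002, 0.002)         # standard errors
FE.1 <- rep("Yes", 3)                      # fixed-effects indicator row
F.stat.first <- c(125.60, 88.17, 85.24)    # F-statistic excluding dummies
obs.first <- c(19343, 17661, 15283)        # number of observations
imp <- c(29, 29, 29)                       # number of imputations
# Row labels follow the commented LaTeX table below
row.fs <- c("Log of R&D", "SE", "Exporter Importer FE", "Importer Sector FE", "Observations", "F (excl. dummies)", "Imputations")
table.4 <- data.frame(rbind(estim.first, se.first, FE.1, FE.1, obs.first, F.stat.first, imp))
rownames(table.4) <- row.fs
colnames(table.4) <- c("Full Sample", "Without primary sectors", "Without primary sectors high")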
# \begin{table}[H]
# \footnotesize
# \def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
# \begin{tabular}{l*{3}{c}}
# \toprule
#             &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}\\
#             &\multicolumn{1}{c}{Full Sample}&\multicolumn{1}{c}{Without primary sectors}&\multicolumn{1}{c}{Without primary sectors $\text{high}^6$ } \\
# \midrule
# Log of R&amp;D  &      0.022 \sym{***}&       0.023   \sym{***}&       0.020   \sym{***}\\
#             &        (0.002)        &     (0.002)         &     (0.002)       \\
# \midrule
# Exporter Importer FE & \multicolumn{1}{c}{Yes} & \multicolumn{1}{c}{Yes}& \multicolumn{1}{c}{Yes}\\
# Importer sector FE & \multicolumn{1}{c}{Yes}& \multicolumn{1}{c}{Yes}& \multicolumn{1}{c}{Yes} \\
# \(N\)       &     19343    &     17661    &         15283         \\
# %\(R^{2}\)   &          0.614         &       0.645         &       0.693          \\
# F (excluding dummies)    &     125.60         &      88.17         &    85.24         \\
# Imputations & 29 & 29 & 29 \\
# \bottomrule
# \multicolumn{4}{l}{\footnotesize Standard errors in parentheses}\\
# \multicolumn{4}{l}{\footnotesize \sym{*} \(p<0.05\), \sym{**} \(p<0.01\), \sym{***} \(p<0.001\)}\\
# \end{tabular}
# 
# \end{table}
```
```{r fsreg, echo=FALSE, message=FALSE, warning=FALSE, results="asis"}
kable(table.4)
```

## ISIC and ISO 3 Alpha Classification


|ISIC Code | Short                            | Description                                          | 
|:---------|:---------------------------------|:-----------------------------------------------------|
| 01-05    | Agriculture products             | Agriculture, hunting, forestry and fishing           | 
| 10-14    | Mining products                  | Mining and quarrying                                 | 
| 15-16    | Food sector                      | Food products, beverages and tobacco                 | 
| 17-18    | Textile products                 |Textile and textile products                          |
|19        | Leather products                 | Leather and footwear                                 |
|17-19     | Textiles &amp; Leather products  | Textiles, textile products, leather and footwear     | 
|20        | Wood products                    | Wood and products of wood and cork                   | 
|  21-22   | Paper products                   | Pulp, paper, paper products, printing and publishing | 
|  23      | Fuel products                    | Coke, refined petroleum products and nuclear fuel    | 
|  24      | Chemical products                | Chemicals and chemical products                      | 
|  25      | Plastic products                 | Rubber and plastics products                         | 
|  26      | Mineral products                 | Other non-metallic mineral products                  | 
|  27-28   | Metals                           | Basic metals and fabricated metal products           | 
|  29      | Machinery                        | Machinery and equipment, nec                         | 
|  30-33   | Electrical                       | Electrical and optical equipment                     | 
|  34-35   | Transport                        | Transport equipment                                  | 
|  36-37   | Misc. Manufacturing              | Manufacturing nec; recycling                         | 
|  40-41   | Electricity                      | Electricity, gas and water supply                    | 
|  45      | Construction                     | Construction                                         | 
|  50-52   | Trade                            | Wholesale and retail trade; repairs                  | 
| 55       | Gastronomy                       | Hotels and restaurants                               | 
|  60-64   | Communication                    | Transport and storage, post and telecommunication    | 
|  65-67   | Finance                          | Financial intermediation                             |  
|  70-74   | Real estate                      | Real estate, renting and business activities         | 
|  75-95   | Social                           | Community, social and personal services              | 


|ISO 3 |    Country   | COU |     Country              |
|:---|:-------------|:----|:-------------------------|
|ARG |Argentina     | ITA | Italy                    |
|AUS |Australia     | JPN | Japan                    |
|AUT |Austria       | KOR | Korea                    |
|BEL |Belgium       | LTU | Lithuania                |
|BGR |Bulgaria      | LUX | Luxembourg               |
|BRA |Brazil        | LVA | Latvia                   |
|CAN |Canada        | MEX | Mexico                   |
|CHE |Switzerland   | MYS | Malaysia                 |
|CHL |Chile         | NLD | Netherlands              |
|CHN |    China     | NOR | Norway                   |
|COL |Colombia      | NZL | New Zealand              |
|CYP |Cyprus        | PHL | Philippines              |
|CZE |Czech Republic| POL | Poland                   |
|DEU |Germany       | PRT | Portugal                 |
|DNK |Denmark       | ROU | Romania                  |
|ESP |Spain         | ROW | Rest of the World        |
|EST |Estonia       | RUS | Russian Federation       |
|FIN |Finland       | SGP | Singapore                |
|FRA |France        | SVK | Slovakia                 |
|GBR |United Kingdom| SVN |  Slovenia                |
|GRC |Greece        | SWE | Sweden                   |
|HKG |Hong Kong     | THA | Thailand                 |   
|HRV |Croatia       | TUN | Tunisia                  |
|HUN |Hungary       | TUR | Turkey                   |
|IDN |Indonesia     | TWN | Taiwan                   |
|IND |India         | USA | United States of America |
|IRL |Ireland       | VNM | Vietnam                  |
|ISR |Israel        | ZAF | South Africa             | 


## Data Appendix

```{r summary stats, message=FALSE, warning=FALSE, include=FALSE}
col.names<-c("Mean","Std. Dev", "Min", "Max", "N")
row.names<-c("Log BVAT","Log EXGR","Log FVAT","Log inv. prod. prices","Log R\\&D")
vec.1<-c(  2.443 , 2.867 ,-4.605, 10.754, 17453)
vec.2<-c(2.742 , 2.871  , - 4.605  , 11.108, 17505)
vec.3<- c(3 , 2.351 , - 4.605, 10.739 ,15999)
vec.4<-c( 0.267,   0.274,  -0.672  ,1.167 ,18444)
vec.5<-c(17.801 ,   2.441 ,  10.745, 24.759 , 17313)
table.1<-data.frame(rbind(vec.1,vec.2,vec.3,vec.4,vec.5))
colnames(table.1)<-col.names
rownames(table.1)<-row.names
```


|                       | Mean   | Std. Dev. | Min    | Max    | N     |
|-----------------------|:------:|:---------:|:------:|:------:|:-----:|
| Log BVAT              | 2.443  | 2.867     | -4.605 | 10.754 | 17453 |
| Log EXGR              | 2.742  | 2.871     | -4.605 | 11.108 | 17505 |
| Log FVAT              | 3.000  | 2.351     | -4.605 | 10.739 | 15999 |
| Log inv. prod. prices | 0.267  | 0.274     | -0.672 | 1.167  | 18444 |
| Log R&D               | 17.801 | 2.441     | 10.745 | 24.759 | 17313 |

Table: Summary statistics in $\theta$ sample

```{r table.2 corr, message=FALSE, warning=FALSE, cache=T, include=FALSE}
#"Variables",
colnames.2 <- c("Log EXGR", "Log BVAT", "Log FVAT", "Log inv. prod. prices", "Log R&D")
# Lower triangle of the pairwise correlation matrix; "-" marks the upper triangle.
# Values are stored as strings so that three decimals are kept in the output.
vec.1.2 <- c("1.000", "-", "-", "-", "-")
vec.2.2 <- c("0.996", "1.000", "-", "-", "-")
vec.3.2 <- c("0.872", "0.890", "1.000", "-", "-")
vec.4.2 <- c("-0.092", "-0.100", "-0.211", "1.000", "-")
vec.5.2 <- c("0.434", "0.446", "0.488", "-0.200", "1.000")
table.2 <- data.frame(rbind(vec.1.2, vec.2.2, vec.3.2, vec.4.2, vec.5.2))
colnames(table.2) <- colnames.2
rownames(table.2) <- c("Log gross exports", "Log backward value-added trade", "Log forward value-added trade", "Log inv. prod. prices", "Log R&D")
# \begin{table}[H]
# \centering\caption{Pairwise correlation in $\theta$ sample}
# \footnotesize
# \label{tab:pwcorr}
# \scalebox{0.8}{
# \begin{tabular}{l*{5}{c}}\toprule
# \multicolumn{1}{c}{Variables} &\multicolumn{1}{c}{Log EXGR}&\multicolumn{1}{c}{Log BVAT}&\multicolumn{1}{c}{ Log FVAT}&\multicolumn{1}{c}{Log inv. prod. prices}&\multicolumn{1}{c}{Log R&amp; D}\\ \midrule
# Log gross exports								 &1.000 &   		& 		  &				&   \\
# Log backward value-added trade &0.996&1.000 & 		  &				&    \\
# Log forward value-added trade   &0.872&0.890&1.000 & 				&     \\
# Log inv. prod. prices								&-0.092&-0.100&-0.211&1.000& 		\\
# log R&amp;D 												&0.434&0.446&0.488&-0.200&1.000\\
# \bottomrule
# \end{tabular}
# }
# \end{table}
```
```{r final-table, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
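# Caption taken from the commented LaTeX table in the previous chunk
kable(table.2, caption = "Pairwise correlation in $\\theta$ sample")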
```

</div>
<!--chapter:end:06_thesis_appendix.Rmd-->


`r if (knitr:::is_html_output()) '# References {-}'`

<!--chapter:end:05_thesis_references.rmd-->

<!--chapter:end:Thesis.Rmd-->