## David Aldous’ review of the Black Swan

The phrase “Black Swan” (arising earlier in the different context of Popperian falsification) is here defined as an event characterized [p. xviii] by rarity, extreme impact, and retrospective (though not prospective) predictability, and Taleb’s thesis is that such events have much greater effect, in financial markets and the broader world of human affairs, than we usually suppose. The book is challenging to review because it requires considerable effort to separate the content from the style. The style is rambling and pugnacious—well described by one reviewer as “with few exceptions, the writers and professionals Taleb describes are knaves or fools, mostly fools. His writing is full of irrelevances, asides and colloquialisms, reading like the conversation of a raconteur rather than a tightly argued thesis”. And clearly this is perfectly deliberate. Such a book invites a review that reflects the reviewer’s opinions more than is customary in the Notices. My own overall reaction is that Taleb is sensible (going on prescient) in his discussion of financial markets and in some of his general philosophical thought but tends toward irrelevance or ridiculous exaggeration otherwise.

## Rio’s Inequality

Let ${X}$ and ${Y}$ be two integrable real-valued random variables and let ${ Q_X(u) = \inf\{t: P(|X|>t) \leq u \}}$ be the quantile function of ${|X|}$. Then if ${Q_X Q_Y}$ is integrable over ${ (0,1)}$ we have

$\displaystyle \boxed{|Cov(X,Y)| \leq 2 \int\limits_{0}^{2a} Q_X(u) Q_Y(u) du}$

where ${ a= a(\sigma(X), \sigma(Y)) = \sup\limits_{\substack{B \in \sigma(X) \\ C \in \sigma(Y)}} |Cov(\mathbb{I}_{B},\mathbb{I}_{C})|}$ is the ${\alpha}$-mixing (strong mixing) coefficient.
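As a quick sanity check of the bound, here is a minimal sketch for one hand-picked case that is not taken from the text: ${Y = X}$ with ${X}$ uniform on ${(-1,1)}$. In that case ${|X|}$ is uniform on ${(0,1)}$, so the quantile function is ${1-u}$, and the mixing coefficient is ${1/4}$ (attained at ${B = C = \{X > 0\}}$). The helper names `rio_bound` and `q_uniform` are hypothetical.

```python
def rio_bound(q_x, q_y, a, steps=100_000):
    """Midpoint-rule approximation of 2 * integral_0^{2a} q_x(u) * q_y(u) du."""
    h = 2.0 * a / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint of the i-th subinterval of (0, 2a)
        total += q_x(u) * q_y(u)
    return 2.0 * total * h


def q_uniform(u):
    """Quantile function of |X| for X uniform on (-1, 1): P(|X| > t) = 1 - t."""
    return 1.0 - u


a = 0.25              # assumed mixing coefficient for the case Y = X
variance = 1.0 / 3.0  # Cov(X, X) = Var(X) for X uniform on (-1, 1)
bound = rio_bound(q_uniform, q_uniform, a)
print(variance, bound)
```

Here the integral can also be done by hand: ${2\int_0^{1/2}(1-u)^2 du = 7/12}$, which indeed dominates ${Var(X) = 1/3}$.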

Proof: Set ${X^{+} = \sup(0,X)}$ and ${X^{-} = \sup(0,-X)}$. Then

$\displaystyle Cov(X,Y) = Cov(X^{+},Y^{+}) + Cov(X^{-},Y^{-}) - Cov(X^{+},Y^{-}) - Cov(X^{-},Y^{+})$

since ${ X = (X^{+} - X^{-})}$ and ${ Y = (Y^{+} - Y^{-})}$.

Note also that

$\displaystyle Cov(X^+,Y^+) = \int \int_{\mathbb{R}^{2}_{+}} [P(X>u, Y> \upsilon) - P(X>u)P(Y> \upsilon)]du d\upsilon$

which implies that

$\displaystyle |Cov(X^+,Y^+)| \leq \int \int_{\mathbb{R}^{2}_{+}} \inf (a, P(X>u),P(Y> \upsilon))du d\upsilon$

## Kolmogorov’s Maximal Inequality

Let ${X_1, X_2,...,X_n}$ be independent random variables with ${ \mathbb{E}[X_i]=0, \mathbb{E}[X_i^2]< \infty }$. Set ${S_k = \sum_{i=1}^{k} X_i}$ for ${1 \leq k \leq n}$. Then ${\forall \varepsilon > 0}$

$\displaystyle \boxed{ P \left( \max_{1 \leq k \leq n} |S_k| \geq \varepsilon \right) \leq \frac{\mathbb{E}[S_n^2]}{\varepsilon^2} }$
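The inequality is easy to probe by simulation. The following is a minimal Monte Carlo sketch, assuming steps that are uniform on ${(-1,1)}$ (so each has mean zero and variance ${1/3}$); the distribution, the sample sizes and the function name `max_abs_partial_sum` are illustrative choices, not part of the statement.

```python
import random


def max_abs_partial_sum(n, rng):
    """Return max_{1<=k<=n} |S_k| for one path of n uniform(-1, 1) steps."""
    s, running_max = 0.0, 0.0
    for _ in range(n):
        s += rng.uniform(-1.0, 1.0)
        running_max = max(running_max, abs(s))
    return running_max


rng = random.Random(0)  # fixed seed so the run is reproducible
n, eps, reps = 50, 8.0, 5000

# Empirical frequency of the event {max_k |S_k| >= eps}.
hits = sum(max_abs_partial_sum(n, rng) >= eps for _ in range(reps))
empirical = hits / reps

# Right-hand side: E[S_n^2] / eps^2, with E[S_n^2] = n * Var(X_1) = n / 3.
bound = (n / 3.0) / eps ** 2
print(empirical, bound)
```

The empirical frequency should sit well below the bound, which reflects that the inequality is a Chebyshev-type estimate rather than a sharp one.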

Proof: Let

$\displaystyle \begin{array}{rcl} A &\equiv& \{ \max_{1 \leq k \leq n} |S_k| \geq \varepsilon \} ,\\ A_k &\equiv& \{ |S_i| < \varepsilon, i=1,...,k-1,|S_k| \geq \varepsilon \}, \qquad 1 \leq k \leq n \end{array}$

Notice that ${\cup_{k=1}^{n}A_k = A}$ and

## An inequality of the expectation

Let ${X_1,X_2,...}$ be i.i.d. r.vs with ${\mathbb{E}[|X_i|] < \infty}$ and ${Y_k = X_k \mathbb{I}_{(|X_k| \leq k)}}$. Then

$\displaystyle \boxed{ \mathbb{E}[|X_1|] \geq \sum_{k=1}^{\infty} \frac{var(Y_k)}{4 k^2 } }$
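A small numerical illustration, assuming ${X_1}$ is exponential with rate 1 (a choice made here only because the truncated moments have closed forms): the left-hand side equals 1, while ${\mathbb{E}[Y_k] = 1-(k+1)e^{-k}}$ and ${\mathbb{E}[Y_k^2] = 2-(k^2+2k+2)e^{-k}}$. The function name `truncated_variance` is hypothetical.

```python
import math


def truncated_variance(k):
    """var(Y_k) for Y_k = X * 1{X <= k}, with X ~ Exponential(1)."""
    e = math.exp(-k)
    first_moment = 1.0 - (k + 1.0) * e
    second_moment = 2.0 - (k * k + 2.0 * k + 2.0) * e
    return second_moment - first_moment ** 2


K = 10_000  # number of terms kept in the series
partial = sum(truncated_variance(k) / (4.0 * k * k) for k in range(1, K + 1))

# Since var(Y_k) <= E[X^2] = 2, the neglected tail is at most 1 / (2K).
tail_bound = 1.0 / (2.0 * K)
left_hand_side = 1.0  # the mean of an Exponential(1) variable
print(partial, partial + tail_bound, left_hand_side)
```

The series converges to a value noticeably smaller than 1, so in this example the inequality holds with room to spare.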

Proof:

$\displaystyle \begin{array}{rcl} var(Y_k) \leq \mathbb{E}[Y_k^2] & =& \int_{0}^{\infty}2y P(|Y_k|>y)dy \\ &\leq& \int_{0}^{k}2 y P(|X_1|>y)dy \end{array}$

So

$\displaystyle \begin{array}{rcl} \sum_{k=1}^{\infty} \mathbb{E}[Y_k^2]/k^2 &\leq& \sum_{k=1}^{\infty} k^{-2} \int_{0}^{\infty} \mathbb{I}_{(y<k)} 2y P(|X_1|>y ) dy \\ &=& \int_{0}^{\infty} \left\lbrace \sum_{k=1}^{\infty} k^{-2} \mathbb{I}_{(y<k)} \right\rbrace 2y P(|X_1|>y ) dy \end{array}$

## LAN for Linear Processes

Consider an m-vector linear process

$\displaystyle \mathbf{X}(t) = \sum\limits_{j=0}^{\infty} A_{\theta}(j)\mathbf{U}(t-j), \qquad t \in \mathbb{Z}$

where ${\mathbf{U}(t)}$ are i.i.d. m-vector random variables with p.d.f. ${p(\mathbf{u})>0}$ on ${\mathbf{R}^m}$, ${A_{\theta} (j)}$ are ${m \times m}$ matrices depending on a parameter vector ${ \mathbf{\theta} = (\theta_1,...,\theta_q) \in \Theta \subset \mathbf{R}^q}$.

Set

$\displaystyle A_{\theta}(z) = \sum\limits_{j=0}^{\infty} A_{\theta}(j)z^j, \qquad |z| \leq 1.$

Assume the following conditions are satisfied

A1 i) For some ${D}$ ${(0 < D < 1/2)}$, the coefficient matrices ${A_{\theta}(j)}$ satisfy

$\displaystyle \pmb{|} A_{\theta}(j) \pmb{|} = O(j^{-1+D}), \qquad j \in \mathbb{N},$

where ${ \pmb{|} A_{\theta}(j) \pmb{|}}$ denotes the sum of the absolute values of the entries of ${ A_{\theta}(j)}$.

ii) Every ${ A_{\theta}(j)}$ is twice continuously differentiable with respect to ${\theta}$, and the derivatives satisfy

$\displaystyle |\partial_{i_1} \partial_{i_2}... \partial_{i_k} A_{\theta, ab}(j)| = O \{j^{-1+D}(\log j)^k\}, \qquad k=0,1,2$

for ${a,b=1,...,m,}$ where ${\partial_i = \partial/ \partial\theta_i}$.

iii) ${\det A_{\theta}(z) \neq 0}$ for ${|z| \leq 1}$ and ${A_{\theta}(z)^{-1}}$ can be expanded as follows:

$\displaystyle A_{\theta}(z)^{-1} = I_m + B_{\theta}(1)z + B_{\theta}(2)z^2 + ...,$

where ${ B_{\theta}(j)}$, ${j=1,2,...,}$ satisfy

$\displaystyle \pmb{|} B_{\theta}(j) \pmb{|} = O(j^{-1-D}).$

iv) Every ${ B_{\theta}(j)}$ is twice continuously differentiable with respect to ${\theta}$, and the derivatives satisfy

$\displaystyle |\partial_{i_1} \partial_{i_2}... \partial_{i_k} B_{\theta, ab}(j)| = O \{j^{-1-D}(\log j)^k\}, \qquad k=0,1,2$

for ${a,b=1,...,m.}$

A2 ${p(.)}$ satisfies

$\displaystyle \lim\limits_{\| \mathbf{u} \| \rightarrow \infty} p(\mathbf{u})=0, \qquad \int \mathbf{u} p(\mathbf{u}) d \mathbf{u} =0, \qquad \text{and} \qquad \int \mathbf{uu'}p(\mathbf{u}) d \mathbf{u}=I_m$

A3 The continuous derivative ${Dp}$ of ${p(.)}$ exists on ${\mathbf{R}^m}$.

A4

$\displaystyle \int \pmb{|} \phi(\mathbf{u}) \pmb{|}^4 p (\mathbf{u}) d \mathbf{u} < \infty,$
where ${\phi(\mathbf{u}) = p^{-1}Dp}$.

From A1 the linear process can be expressed as

$\displaystyle \sum\limits_{j=0}^{\infty} B_{\theta}(j) \mathbf{X}(t-j) = \mathbf{U}(t), \qquad B_{\theta} (0) = I_m$
and hence

$\displaystyle \mathbf{U}(t) = \sum\limits_{j=0}^{t-1}B_{\theta}(j)\mathbf{X}(t-j)+\sum\limits_{r=0}^{\infty}C_{\theta}(r,t)\mathbf{U}(-r),$

where

$\displaystyle C_{\theta}(r,t)= \sum\limits_{r'=0}^{r}B_{\theta}(r'+t)A_{\theta}(r-r').$
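To make the decay rates and the inversion concrete, here is a minimal sketch for a scalar case (${m=1}$) chosen purely for illustration: the fractional filter ${A(z) = (1-z)^{-D}}$ with inverse ${B(z) = (1-z)^{D}}$. This example is an assumption, not taken from the conditions above, and since the filter is singular at ${z=1}$ it is used only to illustrate the rates ${j^{-1+D}}$ and ${j^{-1-D}}$ and the identity ${B(z)A(z) = 1}$ as a formal power series. The function name `fractional_coefficients` is hypothetical.

```python
import math


def fractional_coefficients(d, n):
    """First n power-series coefficients of (1-z)^(-d) and of (1-z)^d."""
    a, b = [1.0], [1.0]
    for j in range(1, n):
        a.append(a[-1] * (j - 1 + d) / j)  # coefficients of (1-z)^(-d)
        b.append(b[-1] * (j - 1 - d) / j)  # coefficients of (1-z)^(+d)
    return a, b


D, N = 0.3, 400
A, B = fractional_coefficients(D, N)

# Coefficients of the product B(z) * A(z); they should equal 1, 0, 0, ...
conv = [sum(B[i] * A[n - i] for i in range(n + 1)) for n in range(N)]

# Rescaled coefficients approach 1/Gamma(D) and 1/Gamma(-D) respectively.
j = N - 1
scaled_a = A[j] * j ** (1.0 - D)
scaled_b = B[j] * j ** (1.0 + D)
print(conv[0], max(abs(c) for c in conv[1:]))
print(scaled_a, 1.0 / math.gamma(D), scaled_b, 1.0 / math.gamma(-D))
```

The vanishing of the higher-order product coefficients is the same cancellation that produces the finite-past representation of ${\mathbf{U}(t)}$ above, with the remainder collected in ${C_{\theta}(r,t)}$.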

## Local Asymptotic Normality

The concept of Local Asymptotic Normality (LAN), introduced by Lucien Le Cam, is one of the most important and fundamental ideas in general asymptotic statistical theory. The LAN property is of particular importance in the asymptotic theory of testing, estimation and discriminant analysis. Many statistical models have likelihood ratios that are locally asymptotically normal, that is, the likelihood ratio processes of those models are asymptotically similar to those of the normal location model.

Let ${P_{0,n}}$ and ${P_{1,n}}$ be two sequences of probability measures on ${( \Omega_n, \mathcal{F}_n )}$. Suppose there is a sequence ${\mathcal{F}_{n,k}}$, ${k=1,...,k_n,}$ of sub ${\sigma}$-algebras of ${\mathcal{F}_n}$ such that ${\mathcal{F}_{n,k} \subset \mathcal{F}_{n,k+1}}$ and ${\mathcal{F}_{n,k_n} = \mathcal{F}_n}$. Let ${P_{i,n,k}}$ be the restriction of ${P_{i,n}}$ to ${\mathcal{F}_{n,k}}$ and let ${\gamma_{n,k}}$ be the Radon-Nikodym density taken on ${\mathcal{F}_{n,k}}$ of the part of ${P_{1,n,k}}$ that is dominated by ${P_{0,n,k}}$. Put

$\displaystyle Y_{n,k} = (\gamma_{n,k}/\gamma_{n,k-1})^{1/2} -1$

where ${\gamma_{n,0}=1}$ and ${n=1,2,...}$.

The logarithm of the likelihood ratio

$\displaystyle \Lambda_n = \log \frac{dP_{1,n}}{dP_{0,n}}$

taken on ${\mathcal{F}_n}$ is then

$\displaystyle \Lambda_n = 2 \sum_k \log (Y_{n,k}+1)$

since ${ \log (\gamma_{n,k}/\gamma_{n,k-1}) = 2 \log (Y_{n,k}+1) }$.

Theorem (Le Cam 1986). Suppose that under ${P_{0,n}}$ the following conditions are satisfied

• L1: ${\max_k |Y_{n,k}| \xrightarrow{p} 0}$
• L2: ${\sum_{k}Y^2_{n,k} \xrightarrow{p} \tau^2/4 }$,
• L3: ${\sum_{k}E(Y^2_{n,k}+2Y_{n,k}| \mathcal{F}_{n,k-1}) \xrightarrow{p} 0}$, and
• L4: ${\sum_k E\{ Y^2_{n,k} \mathbb{I}(|Y_{n,k}|> \delta)| \mathcal{F}_{n,k-1} \} \xrightarrow{p} 0}$ for some ${\delta > 0}$.

Then

$\displaystyle \boxed{ \Lambda_n \xrightarrow{d} N(-\tau^2/2,\tau^2)}.$
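The limit can be illustrated in the simplest setting, the Gaussian location model, which is an example assumed here rather than taken from the text: under ${P_{0,n}}$ the observations are i.i.d. ${N(0,1)}$, under ${P_{1,n}}$ they are i.i.d. ${N(h/\sqrt{n},1)}$, and ${\mathcal{F}_{n,k}}$ is generated by the first ${k}$ observations, so that ${\tau^2 = h^2}$. The sketch below builds ${\Lambda_n}$ from the increments ${Y_{n,k}}$ and compares its simulated mean and variance with ${-\tau^2/2}$ and ${\tau^2}$. The function name `log_likelihood_ratio` is hypothetical.

```python
import math
import random


def log_likelihood_ratio(xs, h):
    """Lambda_n = 2 * sum_k log(1 + Y_{n,k}) in the Gaussian location model."""
    n = len(xs)
    total = 0.0
    for x in xs:
        # gamma_{n,k} / gamma_{n,k-1}: density ratio of the k-th observation.
        ratio = math.exp(h * x / math.sqrt(n) - h * h / (2.0 * n))
        y = math.sqrt(ratio) - 1.0
        total += 2.0 * math.log(1.0 + y)
    return total


rng = random.Random(1)  # fixed seed for reproducibility
h, n, reps = 1.0, 40, 5000

samples = []
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # data drawn under P_{0,n}
    samples.append(log_likelihood_ratio(xs, h))

mean = sum(samples) / reps
var = sum((s - mean) ** 2 for s in samples) / (reps - 1)

# Cross-check against the closed form h / sqrt(n) * sum(x) - h^2 / 2.
check_xs = [0.5, -1.0, 2.0, 0.25]
direct = h * sum(check_xs) / math.sqrt(len(check_xs)) - h * h / 2.0
via_increments = log_likelihood_ratio(check_xs, h)
print(mean, var, direct, via_increments)
```

In this model ${\Lambda_n}$ is exactly ${N(-h^2/2, h^2)}$ for every ${n}$, so the simulation only confirms the bookkeeping of the decomposition into the ${Y_{n,k}}$.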


## Whittle’s Approximate Likelihood

The Whittle likelihood is a frequency-domain approximation, up to a constant, to the Gaussian likelihood. The resulting Whittle estimate is asymptotically efficient and can be interpreted as a minimum distance estimate, where the distance is measured between the parametric spectral density and the (nonparametric) periodogram. It also minimises the asymptotic Kullback-Leibler divergence and, for autoregressive processes, is identical to the Yule-Walker estimate. The Whittle likelihood can be evaluated very quickly by computing the periodogram via the FFT in only ${O(N \log N)}$ operations.
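The FFT-based evaluation can be sketched as follows. This is a minimal illustration under several assumptions made here: the model is a Gaussian AR(1) with known unit innovation variance, the objective is the commonly used discretized form (the average of ${\log f_{\theta}(\lambda_j) + I(\lambda_j)/f_{\theta}(\lambda_j)}$ over the nonzero Fourier frequencies), and the minimisation is a plain grid search. The function names are hypothetical.

```python
import numpy as np


def periodogram(x):
    """Periodogram I(lambda_j) at the nonzero Fourier frequencies, via the FFT."""
    T = len(x)
    dft = np.fft.fft(x)
    freqs = 2.0 * np.pi * np.arange(1, T) / T
    return freqs, np.abs(dft[1:]) ** 2 / (2.0 * np.pi * T)


def ar1_spectral_density(lam, phi, sigma2=1.0):
    """Spectral density of X_t = phi * X_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 / (2.0 * np.pi * np.abs(1.0 - phi * np.exp(-1j * lam)) ** 2)


def whittle_objective(phi, freqs, pgram):
    """Discretized Whittle objective (assumed form): mean of log f + I / f."""
    f = ar1_spectral_density(freqs, phi)
    return np.mean(np.log(f) + pgram / f)


# Simulate a stationary AR(1) path with a fixed seed.
rng = np.random.default_rng(0)
T, phi_true = 2048, 0.6
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1.0 - phi_true ** 2)  # draw from the stationary law
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + eps[t]

freqs, pgram = periodogram(x)
grid = np.linspace(-0.95, 0.95, 381)
phi_hat = grid[np.argmin([whittle_objective(p, freqs, pgram) for p in grid])]
print(phi_hat)
```

The periodogram is computed once, and each evaluation of the objective is then a single pass over the frequencies; in practice the grid search would be replaced by a numerical optimiser.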

Suppose that a stationary, zero-mean, Gaussian process ${\{X_t \}}$ is observed at times ${t=1,2,...,T}$. Assume ${\{X_t \}}$ has spectral density ${f_{\theta}(\lambda)}$, ${\lambda \in \Pi := (-\pi, \pi]}$, depending on a vector of unknown parameters ${\theta \in \Theta \subset \mathbb{R}^p}$. A natural approach to estimating the parameter ${\theta}$ from the sample ${\mathbf{X}_T}$ is to maximize the likelihood function or, alternatively, to minimise ${-1/T}$ times the log-likelihood. The latter takes the form