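Consider a distribution $q = (q_1, \dots, q_k)$ with $q_i > 0$ and $\sum_{i=1}^{k} q_i = 1$. Suppose $N$ independent drawings are made from the distribution and the resulting frequencies are given by $(N_1, \dots, N_k)$, where $\sum_{i=1}^{k} N_i = N$. Then the probability of getting the same frequencies by sampling from $q$ is given by

$$W = \frac{N!}{N_1! \cdots N_k!}\, q_1^{N_1} \cdots q_k^{N_k}$$

and thus

$$\ln W \approx -N \sum_{i=1}^{k} \frac{N_i}{N} \ln\left(\frac{N_i}{N q_i}\right)$$

since $\ln N! \approx N \ln N - N$. Set $p_i = N_i / N$. Then

$$\ln W \approx N B(p; q), \qquad B(p; q) = -\sum_{i=1}^{k} p_i \ln\left(\frac{p_i}{q_i}\right),$$

where $B(p; q)$ is the entropy of the distribution $p$ w.r.t. the distribution $q$. The entropy here can be interpreted as the logarithm of the probability of getting the distribution $p$ (which could asymptotically be the true distribution) by sampling from a hypothetical distribution $q$.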
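As a rough numerical check of the approximation $\ln W \approx N B(p; q)$, the sketch below compares the exact multinomial log-probability with $N B(p; q)$ for growing $N$. The distributions `q` and `p` are made-up illustrative values, and the function names are hypothetical, not taken from any of the cited sources.

```python
import math

def log_multinomial_prob(counts, q):
    """Exact ln W: log-probability of observing `counts` when drawing from q."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(qi) for c, qi in zip(counts, q) if c > 0)

def entropy_B(p, q):
    """B(p; q) = -sum_i p_i ln(p_i / q_i); always <= 0."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [0.5, 0.3, 0.2]  # hypothetical sampling distribution (illustrative)
p = [0.4, 0.4, 0.2]  # observed relative frequencies (illustrative)

for n in (100, 10_000, 1_000_000):
    counts = [round(pi * n) for pi in p]
    ratio = log_multinomial_prob(counts, q) / (n * entropy_B(p, q))
    print(f"N = {n:>9}: ln W / (N B) = {ratio:.4f}")
```

The ratio should drift towards 1 as $N$ grows, because the terms dropped by the Stirling approximation grow only like $\ln N$.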

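Based on Sanov’s result (1961), the above discussion may be extended to more general distributions. Let $f$ and $g$ be the pdfs of the true and hypothetical distributions respectively, and $f_N$ the pdf estimate based on the random sampling of $N$ observations from $g$. Then

$$\frac{1}{N} \ln P\left(\sup_x |f_N(x) - f(x)| < \epsilon\right) \to B(f; g) = -\int f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx$$

as $N \to \infty$ and then $\epsilon \to 0$. Note that $-B(f; g)$ equals $\int f(x) \ln\left(f(x)/g(x)\right) dx$, which is the Kullback-Leibler divergence between $f$ and $g$. Note also that $B(f; g) \le 0$. That is because, by Jensen’s inequality,

$$B(f; g) = \int f(x) \ln\left(\frac{g(x)}{f(x)}\right) dx \le \ln \int f(x)\, \frac{g(x)}{f(x)}\, dx = \ln 1 = 0.$$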
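To make the sign of the entropy concrete, here is a minimal sketch that approximates $B(f; g) = -\int f \ln(f/g)\,dx$ by the trapezoid rule for two Gaussian densities and compares it with the known closed-form Kullback-Leibler divergence between Gaussians. The particular means and standard deviations, the integration range, and the function names are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def entropy_B_continuous(f, g, lo=-20.0, hi=20.0, steps=40_000):
    """Trapezoid-rule approximation of B(f; g) = -integral of f ln(f/g)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        fx, gx = f(x), g(x)
        if fx > 0.0 and gx > 0.0:  # skip points where a density underflows
            weight = 0.5 if i in (0, steps) else 1.0
            total += weight * fx * math.log(fx / gx)
    return -total * h

def f(x):
    return normal_pdf(x, 0.0, 1.0)  # "true" density (illustrative)

def g(x):
    return normal_pdf(x, 1.0, 2.0)  # "hypothetical" density (illustrative)

# Closed-form KL divergence between N(0, 1) and N(1, 2^2)
kl_closed_form = math.log(2.0 / 1.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5

print("numerical B(f; g):", entropy_B_continuous(f, g))
print("closed-form -KL  :", -kl_closed_form)
print("B(f; f)          :", entropy_B_continuous(f, f))
```

The numerical value should be negative and agree with the closed form, while $B(f; f)$ should be zero, matching the equality case of the inequality.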

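Suppose that we observe a data set $x$ of $N$ elements. We could predict the future observations $y$, whose distribution is identical to that of $x$, by specifying a predictive distribution $g(y \mid x)$ which is a function of the given data set $x$. The “closeness” of $g(y \mid x)$ to the true distribution $f(y)$ of the future observations is measured by the entropy

$$B(f(\cdot); g(\cdot \mid x)) = -\int f(y) \ln\left(\frac{f(y)}{g(y \mid x)}\right) dy = E_y \ln g(y \mid x) - E_y \ln f(y),$$

where $E_y$ denotes expectation over $y$ drawn from $f$ and the second term does not depend on $g$. Hence the entropy is equivalent to the expected log-likelihood with respect to a future observation, apart from a constant. The goodness of the estimation procedure specified by $g(y \mid x)$ is measured by $E_x E_y \ln g(y \mid x)$, which is the average over the observed data of the expected log-likelihood of the model $g(y \mid x)$ w.r.t. a future observation.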

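Suppose $x$ and $y$ are independent and that the distribution $g(\cdot \mid x)$ is specified by a fixed parameter vector $\theta$ (i.e. $g(\cdot \mid x) = g(\cdot \mid \theta)$). Then $E_x \ln g(x \mid \theta) = E_y \ln g(y \mid \theta)$, and hence the conventional ML estimation procedure is justified, as the observed log-likelihood is an unbiased estimate of the goodness criterion:

$$E_x \ln g(x \mid \theta) = E_x E_y \ln g(y \mid \theta).$$

However, generally

$$E_x \ln g(x \mid x) \neq E_x E_y \ln g(y \mid x).$$

Akaike proposes that the log-likelihood of the data-dependent model $g(\cdot \mid x)$, as distinct from the log-likelihood of the parameter $\theta$, be defined by

$$l(g(\cdot \mid x)) = \ln g(x \mid x) + C,$$

where $C$ is a constant s.th.

$$E_x \{\ln g(x \mid x) + C\} = E_x E_y \ln g(y \mid x).$$

For the above definition to be operational, $C$ must be a constant for the members of a family of possible models. That could be done by restricting $g(y \mid x)$ to be of the form $g(y \mid \theta(x))$.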
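The gap between the in-sample log-likelihood of a fitted model and its expected log-likelihood on future data can be illustrated by simulation. The sketch below is a toy setup, not taken from the cited sources: the true distribution is assumed to be standard normal, the model is a Gaussian with mean and variance both fitted by ML (two parameters), and the future sample has the same size as the observed one. For this model the expected future log-likelihood has a closed form, so only the observed data need to be simulated.

```python
import math
import random

def gaussian_mle(xs):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def in_sample_loglik(xs):
    """ln g(x | theta_hat(x)) for the Gaussian model fitted to xs."""
    n = len(xs)
    _, var = gaussian_mle(xs)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def expected_future_loglik(xs, mu0=0.0, var0=1.0):
    """E_y ln g(y | theta_hat(x)) for a future sample y of the same size,
    drawn from the assumed true N(mu0, var0)."""
    n = len(xs)
    mu, var = gaussian_mle(xs)
    return -0.5 * n * (math.log(2.0 * math.pi * var)
                       + (var0 + (mu - mu0) ** 2) / var)

def mean_gap(n=50, reps=4000, seed=0):
    """Monte Carlo average of in-sample minus expected future log-likelihood."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += in_sample_loglik(xs) - expected_future_loglik(xs)
    return total / reps

gap = mean_gap()
print(f"average optimism of the in-sample log-likelihood: {gap:.2f}")
```

In this Gaussian setting the expected gap works out to $2N/(N-3)$, so the simulated average should be positive and close to 2, the number of fitted parameters. The in-sample log-likelihood is therefore too optimistic by a roughly constant amount, which is what the constant $C$ is meant to correct.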

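Let $g(\cdot \mid \theta_k)$, $k = 1, \dots, L$, denote competing models. Assume that the true distribution $g(\cdot \mid \theta_0)$ belongs to each of these models. Use the notation $\theta$ for $\theta_k$ and assume that the usual regularity conditions hold. Let $\hat\theta$ denote the ML estimate of $\theta$.

1. As $N \to \infty$, the LR statistic

   $$2\{\ln g(x \mid \hat\theta) - \ln g(x \mid \theta_0)\} \sim \chi^2_k$$

   asymptotically, where $k = \dim \theta$.

2. Expanding the following expression in the neighbourhood of $\theta_0$ we get

   $$E_y \ln g(y \mid \hat\theta) = E_y \ln g(y \mid \theta_0) + (\hat\theta - \theta_0)^T E_y\left[\frac{\partial \ln g(y \mid \theta_0)}{\partial \theta}\right] + \frac{1}{2} (\hat\theta - \theta_0)^T E_y\left[\frac{\partial^2 \ln g(y \mid \theta_0)}{\partial \theta\, \partial \theta^T}\right] (\hat\theta - \theta_0) + \dots$$

   The expected score vanishes at $\theta_0$. Ignoring the higher order terms we have

   $$2 E_y \ln g(y \mid \theta_0) - 2 E_y \ln g(y \mid \hat\theta) \approx N (\hat\theta - \theta_0)^T I\, (\hat\theta - \theta_0).$$

   Note that the property of the best asymptotic normality of $\hat\theta$ implies that

   $$\sqrt{N}\, (\hat\theta - \theta_0) \to \mathcal{N}(0, I^{-1})$$

   in distribution, where $I$ is the Fisher information matrix per observation. Hence

   $$2 E_y \ln g(y \mid \theta_0) - 2 E_y \ln g(y \mid \hat\theta) \sim \chi^2_k$$

   as $N \to \infty$.

Combining 1 and 2, and recalling that a $\chi^2_k$ variable has mean $k$, we thus get asymptotically

$$2 E_x \ln g(x \mid \hat\theta) - 2 E_x \ln g(x \mid \theta_0) = k$$

and

$$2 E_x E_y \ln g(y \mid \theta_0) - 2 E_x E_y \ln g(y \mid \hat\theta) = k$$

and thus, since $E_x \ln g(x \mid \theta_0) = E_x E_y \ln g(y \mid \theta_0)$, it follows that

$$E_x \{\ln g(x \mid \hat\theta) - k\} = E_x E_y \ln g(y \mid \hat\theta).$$

Hence a model which could be adopted is the one which maximizes $\ln g(x \mid \hat\theta) - k$ over $k$. The basic principle underlying this procedure is *entropy maximization*, as the maximization of $E_x E_y \ln g(y \mid \hat\theta)$ is equivalent to the maximization of $E_x B(f; g(\cdot \mid \hat\theta))$. This maximization problem is more commonly replaced by the equivalent problem of minimization of $-2 \ln g(x \mid \hat\theta) + 2k$, which can be generally expressed as

$$\mathrm{AIC} = -2 \ln(\text{maximized likelihood}) + 2\,(\text{number of independently adjusted parameters}).$$

On the RHS, the first term reflects the lack of fit while the second penalizes the model complexity. The optimum model, which minimises the AIC, reflects the trade-off between the two terms.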
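A minimal sketch of how the criterion might be applied, assuming two hypothetical candidate models for i.i.d. data: a Gaussian with mean fixed at zero (one free parameter, the variance) and a Gaussian with free mean and variance (two free parameters). The data values and model names are made up for illustration.

```python
import math

def gaussian_max_loglik(xs, mu):
    """Maximized Gaussian log-likelihood for a given mean, with the
    variance set to its ML estimate (mean squared deviation from mu)."""
    n = len(xs)
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def aic(max_loglik, n_params):
    """AIC = -2 ln(maximized likelihood) + 2 (number of free parameters)."""
    return -2.0 * max_loglik + 2.0 * n_params

data = [1.2, 0.8, 1.9, 1.1, 0.4, 1.6, 0.9, 1.3]  # made-up observations
sample_mean = sum(data) / len(data)

candidates = {
    "zero mean (k=1)": aic(gaussian_max_loglik(data, 0.0), 1),
    "free mean (k=2)": aic(gaussian_max_loglik(data, sample_mean), 2),
}

best = min(candidates, key=candidates.get)
for name, value in candidates.items():
    print(f"{name}: AIC = {value:.2f}")
print("selected:", best)
```

Because these data sit well away from zero, the improvement in fit from freeing the mean outweighs the extra penalty of 2, so the two-parameter model attains the smaller AIC.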

——

**References**

Akaike, H. (1978a). On the Likelihood of a Time Series Model. Journal of the Royal Statistical Society. Series D (The Statistician). Vol. 27, No. 3/4, Partial Proceedings of the 1978 I.O.S. Annual Conference on Time Series Analysis (and Forecasting) (Sep. – Dec., 1978), pp. 217-235.

Akaike, H. (1985). Prediction and entropy. Pages 1-24 in A. C. Atkinson, and S. E. Fienberg (Eds.) A celebration of statistics. Springer, New York, NY.

Sanov, I. (1961). On the probability of large deviations of random variables. IMS and ASM Selected Translations in Mathematical Statistics and Probability. 1, 213-44.

Sakamoto, Y., Ishiguro, M., and Kitagawa, G. (1986). Akaike information criterion statistics. KTK Scientific Publishers, Tokyo.

Tong, H. (1990). *Non-linear Time Series: A Dynamical System Approach*. Oxford University Press.

Tong, H. (1994). Akaike’s approach can yield consistent order determination. Pages 93-103 in H. Bozdogan (Ed.) Engineering and Scientific Applications. Vol. 1, Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Kluwer Academic Publishers, Dordrecht, Netherlands.

**See also:**

Parzen, E. Hirotugu Akaike, Statistical Scientist.