Abstracts
Summary
In recent years, much effort has been devoted to developing unbiased plotting position formulae (PPFs). By their very definition, unbiased PPFs depend on the parent distribution of the samples considered, so a separate formula must be established for each distribution. This study reviews the different approaches to developing PPFs and shows that the PPF based on the median of the order statistics can be an acceptable compromise between unbiased PPFs (i.e. those corresponding to the mean of the order statistics) and PPFs based on the mode of the order statistics.
Unlike the latter, the PPF based on the median of the order statistics is independent of the parent distribution of the samples and can therefore be used as a standard formula. Moreover, it is less biased than the Weibull PPF, which is likewise independent of the parent distribution. Although no exact analytical expression exists for the PPF based on the median of the order statistics, BENARD and BOS-LEVENBACH proposed a very good approximation, p_m = (m - 0.3)/(n + 0.4). This formula is also known as the Chegodayev formula.
Keywords:
- Plotting position formulae,
- distribution of sample frequencies,
- median of order statistics,
- Benard and Bos-Levenbach formula
Abstract
Plotting position formulae (PPFs) retain an important role in engineering practice. In flood frequency analysis they are used to assign exceedance probabilities to observed floods. In the present study we review various principles for the choice of PPFs. These can be divided into three main categories: (1) formulae based on the observed sample frequencies, (2) formulae based on the distribution of sample frequencies, and (3) formulae based on the distribution of order statistics. PPFs in the first two categories are distribution-free, meaning that no assumption needs to be made regarding the form of the parent distribution of events. The Hazen PPF is an example of a formula belonging to the first category.
The Weibull PPF, which is probably the most widely used formula in practice, belongs to the second category. It can be shown that the frequency corresponding to a particular order statistic is beta distributed regardless of the form of the parent distribution. Being equal to the expected value of this beta distribution, the Weibull plotting position, p_m = m/(n + 1), therefore corresponds to the mean value of the sample frequencies. In his book on statistical extremes, GUMBEL (1958) recommended the Weibull formula because it fulfils a set of criteria which he found important. Some of these criteria have later been questioned, for instance by CUNNANE (1978). Various studies have demonstrated that the Weibull PPF is significantly biased in the event domain for most common distributions (an exception is the uniform distribution, where the Weibull PPF is the exact unbiased plotting position).
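The distribution-free property described above can be checked with a small simulation. The sketch below is illustrative only: it assumes that the rank m counts from the smallest observation, so that the frequency of the m-th order statistic is F(x_(m)), and the helper names, the two parent distributions, the seed, and the number of trials are arbitrary choices made here, not taken from the study.

```python
import math
import random


def weibull_pp(m, n):
    """Weibull plotting position p_m = m / (n + 1)."""
    return m / (n + 1)


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exponential_cdf(x):
    """CDF of the unit exponential distribution."""
    return 1.0 - math.exp(-x)


def mean_frequency(draw, cdf, m, n, trials=20000, seed=1):
    """Monte Carlo estimate of E[F(X_(m))] for samples of size n.

    draw(rng) returns one variate from the parent distribution and
    cdf is the CDF of that same parent (both are assumptions of this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(draw(rng) for _ in range(n))
        total += cdf(sample[m - 1])  # frequency of the m-th smallest value
    return total / trials


n, m = 10, 10  # largest observation in a sample of ten
mean_normal = mean_frequency(lambda rng: rng.gauss(0.0, 1.0), normal_cdf, m, n)
mean_expon = mean_frequency(lambda rng: rng.expovariate(1.0), exponential_cdf, m, n)

print("Weibull p_m        :", round(weibull_pp(m, n), 4))
print("normal parent      :", round(mean_normal, 4))
print("exponential parent :", round(mean_expon, 4))
```

Both simulated means should agree with m/(n + 1) up to sampling noise, whichever parent is used, because F(X_(m)) follows a Beta(m, n - m + 1) distribution in either case.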
In recent years most attention has been paid to PPFs of the third category. CUNNANE (1978) strongly recommended the use of unbiased PPFs, i.e. formulae for the exceedance probabilities of the expected values of order statistics. In the last decade much effort has been devoted to the development of unbiased plotting positions (PPs). As unbiased PPs are, by virtue of their definition, related to the parent distribution, they differ for each individual distribution. In general, it is not possible to derive exact unbiased PPFs, but good approximations have been developed for most common distributions such as the normal, the Gumbel, the generalized extreme value, and the Pearson type III distributions (see table 1). If a distribution contains a shape parameter, this must be reflected in its unbiased PPF. Approximate formulae for such distributions are therefore in general of a more complex form. This fact, along with the need for distribution-dependent formulae, is probably the reason why the Weibull formula is still the most widely used PPF in practice.
In the present study we emphasize the practical convenience of having a distribution-free PPF which at the same time has a statistical interpretation and is related to the distribution of order statistics. The median PP fulfils these requirements. It is easily seen to be distribution-free by observing that the median of the distribution of order statistics corresponds to the median of the distribution of frequencies (beta distribution). In general, one of two different principles is commonly adopted when developing estimators: either the choice of the modal value (maximum likelihood principle) or the choice of the mean value (principle of unbiasedness). Unfortunately, these distinct principles usually lead to different estimators. For continuous distributions used in flood frequency analysis, the median of the order statistics lies between the modal value and the mean value. In general, the median PP is much less biased than the Weibull PPF, which for the EV1 distribution is a reasonable approximation to the modal PP. Although no analytical expression exists for the median plotting position, an approximation has been deduced by BENARD and BOS-LEVENBACH (1953), namely p_m = (m - 0.3)/(n + 0.4). This formula is also known as the Chegodayev PPF.
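The quality of the Benard and Bos-Levenbach approximation can be examined numerically. The following sketch, offered as an illustration rather than as the method used in the study, computes the median of the Beta(m, n - m + 1) frequency distribution by bisection and compares it with (m - 0.3)/(n + 0.4). It assumes ranks counted from the smallest observation and uses the identity that the CDF of the m-th order statistic of n uniform variates equals a binomial tail sum; the function names are hypothetical.

```python
import math


def frequency_cdf(x, m, n):
    """CDF at x of Beta(m, n - m + 1): P(at least m of n uniforms <= x)."""
    return sum(
        math.comb(n, k) * x**k * (1.0 - x) ** (n - k) for k in range(m, n + 1)
    )


def median_pp(m, n, tol=1e-12):
    """Median plotting position, found by bisection on the frequency CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frequency_cdf(mid, m, n) < 0.5:
            lo = mid  # median lies above mid
        else:
            hi = mid  # median lies at or below mid
    return 0.5 * (lo + hi)


def benard_pp(m, n):
    """Benard and Bos-Levenbach approximation p_m = (m - 0.3)/(n + 0.4)."""
    return (m - 0.3) / (n + 0.4)


n = 10
for m in range(1, n + 1):
    exact = median_pp(m, n)
    approx = benard_pp(m, n)
    print(f"m={m:2d}  median={exact:.4f}  Benard={approx:.4f}  diff={approx - exact:+.4f}")
```

For the largest observation (m = n) the median has the closed form 0.5^(1/n), which gives a convenient check on the bisection.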
Various PPFs are illustrated for three different parent distributions, namely the normal, the log-normal, and the Gumbel distributions. A sample size of n = 10 is considered. The results of applying the different PPFs are presented in table 2 (for the largest order statistic in the sample). Several observations can be made: 1) the Benard and Bos-Levenbach formula is a good approximation to the median PP; 2) the Weibull formula is close to the modal value PP when the parent is EV1, but differs significantly for other parents; 3) the unbiased PPs depend strongly on the underlying distribution, and an unbiased PPF suitable for all distributions therefore cannot be found; 4) the median PP is a fair compromise between the mean and the modal values of the order statistics in all three cases, and it is therefore recommended as a good choice of standard PPF.
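For the Gumbel (EV1) parent, the three plotting positions of the largest order statistic can be written in closed form, which makes the comparison above easy to reproduce in outline. The sketch below is an illustration under stated assumptions and does not reproduce table 2: it uses a standard Gumbel parent with CDF exp(-exp(-x)), for which the maximum of n values is again Gumbel with location ln n, and it reports frequencies F(x), that is, non-exceedance probabilities. The variable names are chosen here for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_cdf(x):
    """CDF of the standard Gumbel (EV1) distribution."""
    return math.exp(-math.exp(-x))


n = 10
loc = math.log(n)  # location of the distribution of the sample maximum

# Mode, median and mean of the largest order statistic, mapped to frequencies.
pp_mode = gumbel_cdf(loc)                               # equals exp(-1/n)
pp_median = gumbel_cdf(loc - math.log(math.log(2.0)))   # equals 0.5**(1/n)
pp_mean = gumbel_cdf(loc + EULER_GAMMA)                 # unbiased PP

weibull = n / (n + 1)
benard = (n - 0.3) / (n + 0.4)

print(f"modal PP   {pp_mode:.4f}   Weibull formula {weibull:.4f}")
print(f"median PP  {pp_median:.4f}   Benard formula  {benard:.4f}")
print(f"mean PP    {pp_mean:.4f}")
```

Under these assumptions the median value falls between the modal and mean values, the Weibull formula lies close to the modal value, and the Benard and Bos-Levenbach formula lies close to the median value, in line with observations 1, 2 and 4.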
Keywords:
- Plotting position formulae,
- unbiasedness,
- median,
- Beta distribution,
- Benard and Bos-Levenbach formula