Can someone tell me whether the Anderson-Darling test is an optimal test and whether it is valid for checking normality?

NOTE. Besides using the AD test, I'm also checking the associated QQ plots to make sure my data is not normal.

- Thread starter David E.S.

Of course, if you have 5000 cases, normality almost certainly does not matter anyhow. Why are you testing for normality with so many cases? If you are running a regression model, ANOVA, etc., normality will have no impact with that many cases.

I tested the normality of my data because I wanted to know whether to calculate the mean and standard deviation or the median, Q25 and Q75. Since my data is not normal, I used the second option (median and quantiles), not only for creating a table but also for representing my data graphically (using the median instead of the mean).
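For anyone following along, here is a minimal sketch of the two summaries being compared, using only Python's standard library. The `summarize` helper and the sample values are made up for illustration, and the "inclusive" quantile method is just one of several common conventions, so other software may give slightly different quartiles.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Return both candidate summaries: mean/SD and median/quartiles."""
    # quantiles(..., n=4) returns the three cut points Q25, Q50, Q75.
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "q25": q25,
        "q75": q75,
    }


# Made-up, right-skewed sample: one large value pulls the mean upward.
sample = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With skewed data like this, the mean sits well above the median, which is the usual argument for reporting the median and quartiles instead.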

In the statistics course at my university, we used the Shapiro test to assess the normality of datasets, although QQ plots were used to check the normality assumption for the residuals of a model... Why are these tests not recommended?

Thanks!!

There are several reasons they are not recommended, and I have not read the material in a while (once I read the critique, I quit using them).

All those tests have normality as the null hypothesis. But when you have a lot of cases, and therefore a lot of statistical power, you are likely to reject the null regardless. More importantly, what does rejecting the null really mean? Say it means your data is non-normal. But how non-normal, and in what form? I would guess all real-world data is somewhat non-normal (there is a joke that goes: if the data is normal, someone made it up). QQ plots address both problems to some extent. They show you how, and in what form, the data are non-normal.
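The power point can be illustrated with a rough sketch. The code below hand-computes the Anderson-Darling statistic for normality (mean and SD estimated from the data) with the standard library, then applies it to simulated, moderately skewed data at two sample sizes. Everything here is an assumption for illustration: the gamma-distributed data, the sample sizes, the function name, and the 5% critical value of 0.752, which is the commonly tabulated figure for the small-sample-adjusted statistic.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Commonly tabulated 5% critical value for the adjusted statistic when the
# mean and SD are estimated from the data (an assumption, not from the thread).
CRITICAL_5_PERCENT = 0.752


def anderson_darling_normal(data):
    """Adjusted Anderson-Darling A^2 against a normal fitted to the data."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment for the estimated-parameters case.
    return a2 * (1 + 0.75 / n + 2.25 / n**2)


rng = random.Random(1)  # fixed seed only so the sketch is repeatable
stats_by_n = {}
for n in (50, 5000):
    # Gamma(shape=10) data: unimodal and moderately right-skewed.
    sample = [rng.gammavariate(10, 1) for _ in range(n)]
    stats_by_n[n] = anderson_darling_normal(sample)
    verdict = "reject" if stats_by_n[n] > CRITICAL_5_PERCENT else "do not reject"
    print(f"n={n}: A2*={stats_by_n[n]:.3f} -> {verdict} normality at 5%")
```

The shape of the data is the same in both runs; only the sample size changes. With thousands of cases the statistic grows far past the critical value, which says the test detected a departure, not how large or how important that departure is.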

Most people, although this is not true in your case, are concerned about normality because it is an assumption of regression and ANOVA. But it turns out that with a lot of cases it rarely matters for these methods if the data is non-normal. They still teach it in classes, but most people who run models ignore it as long as they have a hundred or so cases.

If you have large data sets the tests can be overly sensitive, so you are better off using the fat pencil test on a QQ plot.
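In case it helps, here is a small sketch of how the points of a normal QQ plot can be computed with the standard library, so they can be drawn with whatever plotting tool you prefer. The `qq_points` name, the sample values, and the plotting positions (i - 0.5)/n are illustrative choices; statistical packages use slightly different positions.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with its fitted-normal quantile."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    # Plotting positions (i - 0.5) / n for i = 1..n stay strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, x))


# Made-up sample with one large value in the right tail.
points = qq_points([2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 9.7])
for expected, observed in points:
    print(f"expected {expected:6.2f}   observed {observed:6.2f}")
```

If the data were close to normal, the observed values would track the expected ones along the line y = x; systematic bending at the ends points to skew or heavy tails.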

What the heck is the fat pencil test, miner?