PSYC 6802 - Introduction to Psychology Statistics
The epitools package (Aragon et al., 2020) includes many functions that help with calculating odds ratios and risk ratios. These are common statistics that are used to summarize the information in contingency tables.
Today we’ll look at some flight delays:
'data.frame': 539383 obs. of 9 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ Airline : chr "CO" "US" "AA" "AA" ...
$ Flight : int 269 1558 2400 2466 108 1094 1768 2722 2606 2538 ...
$ AirportFrom: chr "SFO" "PHX" "LAX" "SFO" ...
$ AirportTo : chr "IAH" "CLT" "DFW" "DFW" ...
$ DayOfWeek : int 3 3 3 3 3 3 3 3 3 3 ...
$ Time : int 15 15 20 20 30 30 30 30 35 40 ...
$ Length : int 205 222 165 195 202 181 220 228 216 200 ...
$ Delay : int 1 1 1 1 0 1 0 0 1 1 ...

We will mostly be looking at categorical variables today, so it’s a good point to introduce factor variables. Notice the Delay variable on the previous slide, which is an integer vector of 0s and 1s:
factor() function:
So, we define the values as 0 = On Time and 1 = Delay:
A factor variable is more or less the same as assigning value labels in SPSS.
levels =: here you define the levels of your factor (the name of each group).
labels =: here you can label the corresponding groups. Now 0 and 1 will show up as On Time and Delay.
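Putting the two arguments together, a minimal sketch (the data frame name `delays` is an assumption; substitute whatever you called your object):

```r
# Converting a 0/1 integer vector into a labeled factor.
# `delay_raw` stands in for something like delays$Delay.
delay_raw <- c(1, 1, 0, 1, 0)

delay_fct <- factor(delay_raw,
                    levels = c(0, 1),               # the values present in the data
                    labels = c("On Time", "Delay")) # how each value should display

table(delay_fct)  # counts now show up under "On Time" and "Delay"
```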
The χ2 (χ reads “kai”) test of goodness of fit refers to a test used to check whether some values occur at different rates than what you would expect. The simplest possible example is checking if a coin is fair (i.e., heads and tails each come up 50% of the time):
I flipped my coin 200 times and I got 110 heads and 90 tails:
The null and alternative hypotheses are:
The chisq.test() function takes in either a vector or table of counts and runs a χ2 test.
Thus, at α=.05, there wasn’t enough evidence to reject the H0, χ2(1)=2, p=.16.
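The result above can be reproduced by passing the vector of counts directly to chisq.test(), which assumes equal expected probabilities by default (50/50 here):

```r
# Coin example: 110 heads and 90 tails out of 200 flips.
flips <- c(heads = 110, tails = 90)
res <- chisq.test(flips)

res$statistic  # X-squared = 2
res$p.value    # ~0.157
```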
Sometimes I like to run these analyses “by hand” to check that I am doing the right thing. In the lecture you have seen that the formula to calculate the χ2 statistic is given by:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
Where $O_i$ is the observed frequency of a category and $E_i$ is the expected frequency of that category. Since we have 200 flips, if the coin is fair, we expect both tails and heads to happen 100 times each ($E_i = 100$ in both cases), so:
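The by-hand version, with one $(O_i - E_i)^2 / E_i$ component per category:

```r
# Recomputing the chi-square statistic "by hand" for the coin example.
observed <- c(110, 90)
expected <- c(100, 100)  # fair coin: 200 flips split 50/50

components <- (observed - expected)^2 / expected  # one term per category
chi2 <- sum(components)                           # 1 + 1 = 2
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

chi2     # 2, same as chisq.test()
p_value  # ~0.157
```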
Of course you would always use the chisq.test() function in practice, but I want to briefly touch on the intuition behind the χ2 formula.
Unfortunately many statistics formulas seem to come out of nowhere 🤷 $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$ is no exception. However, you can often gain a good sense of what the formula is doing by breaking down its components:
So, the χ2 formula summarizes how different your observed data is compared to what you would expect to see. The more unexpected your observed frequencies, the larger the χ2 statistic will be.
We reject H0, χ2(1)=410.66, p<.001, meaning that there is some association between airline and delay.
Even in the case of a table, the formula to calculate the χ2 statistic is still $\sum_i \frac{(O_i - E_i)^2}{E_i}$. Although it does not change the overall logic, the expected cell frequencies, $E_i$, are calculated in a slightly different way: $E_i = \frac{R_{tot} \times C_{tot}}{Total}$, where
This works because the chisq.test() function actually saves a lot of information when you save it as an object.
The same is true for many of the functions that we will see later in the course.
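As a sketch of what gets stored, here is the AA/DL table from later in the lab passed to chisq.test(), with the expected frequencies pulled out of the saved object and checked against the row-total × column-total formula:

```r
# chisq.test() stores its intermediate results in the returned object.
tab <- matrix(c(27920, 17736,
                33488, 27452),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("AA", "DL"), c("On Time", "Delay")))

res <- chisq.test(tab)

names(res)    # includes statistic, p.value, observed, expected, residuals, ...
res$expected  # the E_i values

# The same expected frequencies by hand: (row total x column total) / total.
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
all.equal(unname(res$expected), unname(E))  # TRUE
```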
You may find the some association part from two slides ago a bit vague…and you’d be right! All the χ2 test we just ran tells us is that there is some association somewhere in our table. In our case it is easy to figure out which airline has a higher proportion of delays, but it becomes much harder when the number of rows and columns increases 🧐
Once again, the χ2 formula summarizes how different the $O_i$s are from the $E_i$s, i.e., how different the data is from what we would expect if H0 were true. The idea is that, based on probability theory:
the only way that we observe something that is on average far from our expected frequencies is if there is some association between the rows and the columns
Thus, if H0 were true and there were no association, it would be really unlikely to see a table of observed frequencies that is very different from the expected one. Notice that this is H1 from two slides ago and is the only thing that you are testing when running a χ2 test 🤔
Here are two popular effect sizes for χ2 tests
We got a W=.06. How big (or small) is that effect size? The correct answer is that it depends. The determination usually involves some cost-benefit trade off which depends on the field that you are in.
Everyone will reference effect sizes “guidelines” from Cohen (1988), who went through many popular effect sizes metrics and suggested some thresholds for what is a small, medium, and large effect size.
According to Cohen’s guidelines for W :
So, our effect size of W=.06 is small, meaning that the degree of association between airlines and delay is small (i.e., although one airline has more delays, it’s not by that much).
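A quick check of the effect size computation, assuming Cohen’s $W = \sqrt{\chi^2 / N}$ with the statistic and sample size from the test above:

```r
# Cohen's W from the chi-square statistic: W = sqrt(chi2 / N).
chi2 <- 410.66   # chi-square statistic from the airline-by-delay test
N    <- 106596   # total number of flights in the table

W <- sqrt(chi2 / N)
round(W, 2)  # 0.06
```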
“Guidelines”? The Unfortunate Truth
The main argument for providing guidelines is that it helps applied researchers to make decisions. The idea is reasonable, but in my experience this sometimes turns into “I don’t really understand it, so tell me what to do based on these numbers in my output”. The most obvious case is p<.05, which is probably the main cause of the replication crisis. Unfortunately, we teach students to follow these “guidelines”, and this course will be no different (I am not happy about it 😔). When doing statistics in the real world, always stop and ask yourself why you are doing something or making a certain decision. If your answer is “because someone else said so”, then you still do not know the why, which is the most important thing.
You may come across the likelihood ratio test, which in the case of a contingency table ends up being quite similar to the χ2 test:
$$\chi^2 = 2\sum_i \left[ O_i \times \ln\left(\frac{O_i}{E_i}\right) \right]$$
Where ln() is the natural logarithm. In R this scary equation is quite simple to compute:
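A sketch of the computation, using the AA/DL counts from later in the lab (for a large table like this, the result lands very close to the Pearson χ2 statistic):

```r
# Likelihood ratio (G-squared) statistic for the AA/DL table, by hand.
tab <- matrix(c(27920, 17736,
                33488, 27452),
              nrow = 2, byrow = TRUE)

O <- tab
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected frequencies

G2 <- 2 * sum(O * log(O / E))
p  <- pchisq(G2, df = 1, lower.tail = FALSE)       # df = 1 for a 2x2 table

G2  # close to the Pearson chi-square statistic for this table
```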
The classical χ2 test generally works best for contingency tables. However, the likelihood ratio test is more general and will come up again because it can be used to compare different statistical models.
Some smart math people have worked out that if any of the expected frequencies are too small ($E_i \leq 5$, which is not the case in our example), then the type I error rate (more on this in Lab 5) of the χ2 test is not accurate. In this case you should apply an adjustment to your χ2 statistic. Both of these are fine:
The chisq.test() function applies this correction by default. You can see it in the output:
This is good because there is no “disadvantage” to applying this correction, so it may as well be the default.
$$\chi^2_{adj} = \chi^2 \times \frac{N}{N-1},$$ where N is the sample size.
X-squared
410.9175
The adjustment makes almost no difference because of the really large N.
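Applying the $N/(N-1)$ adjustment by hand, with the χ2 value and N taken from our example, confirms just how small the change is:

```r
# The N/(N-1) adjustment applied by hand.
chi2 <- 410.66   # unadjusted chi-square statistic
N    <- 106596   # total sample size

chi2_adj <- chi2 * N / (N - 1)
chi2_adj - chi2  # tiny change: N/(N-1) is essentially 1 for large N
```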
When you have small expected frequencies, Fisher’s exact test is probably a better choice than a χ2 test. Why it is a better choice is not straightforward to explain (but let me know if you are curious).
Fisher’s exact test works by checking whether the odds ratio (see next slides) of the table is significantly different from 1.
The null and alternative hypotheses are:
The odds ratio is significantly different from 1, odds = 1.29, p<.001, 95% CI[1.26, 1.32]
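Base R’s fisher.test() runs this test; a sketch using the AA/DL counts from the contingency table in this lab:

```r
# Fisher's exact test on the AA/DL delay table.
tab <- matrix(c(27920, 17736,
                33488, 27452),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("AA", "DL"), c("On Time", "Delay")))

res <- fisher.test(tab)

res$estimate  # conditional MLE odds ratio, ~1.29
res$conf.int  # 95% CI
res$p.value   # far below .001
```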
Fun Fact
Fisher’s exact test is based on the hypergeometric distribution. The fun fact is that the hypergeometric distribution is the probability distribution that is used to determine the probability of drawing certain hands in card games. Incidentally, the hypergeometric distribution is used a lot in Texas Hold’em Poker, and any other competitive card game really.
Odds and probabilities are two different ways of describing the likelihood of an event occurring. Let’s say that team A and team B are playing a basketball game and we know that team A wins 80% of the time. Let’s define $\pi = .8$ as the probability of team A winning. Then, the odds of team A winning are:
$$odds = \frac{\pi}{1-\pi} = \frac{.8}{1-.8} = 4$$
This is what you would normally hear as “team A has 4 to 1 odds of beating team B”. It is important to note that given any odds, we can recover the probability π:
$$\pi = \frac{odds}{odds+1} = \frac{4}{4+1} = .8$$
Odds: Number of successes compared to number of failures.
Probability: Number of successes compared to total number of trials.
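The two conversions above can be wrapped in a pair of tiny helper functions (the names are just for illustration):

```r
# Converting between probability and odds, as in the basketball example.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (odds + 1)

prob_to_odds(0.8)  # 4: "team A has 4 to 1 odds"
odds_to_prob(4)    # 0.8: back to the original probability
```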
In the case of our airplane example, we can look at the odds of both Delta and AA having a delayed flight:
On Time Delay
AA 27920 17736
DL 33488 27452
We see that Delta is more likely to have delays, as its odds are higher than AA’s. But by how much? We can calculate an odds ratio:
$$odds\ ratio = \frac{odds_{DL}}{odds_{AA}} = \frac{.82}{.63} = 1.3$$
Interpretation: The odds of Delta flights being delayed are 1.3 times larger than the odds of AA flights being delayed.
So, an odds ratio is quite literally the ratio of two odds 😀
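Computing it by hand from the cell counts above:

```r
# Odds ratio by hand from the AA/DL counts.
odds_DL <- 27452 / 33488  # delayed / on-time for Delta, ~0.82
odds_AA <- 17736 / 27920  # delayed / on-time for American, ~0.64

OR <- odds_DL / odds_AA
round(OR, 2)  # 1.29
```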
In epidemiology language, risk is exactly the same as probability. We can get the risk (probability) of either company’s flights being delayed:
On Time Delay
AA 27920 17736
DL 33488 27452
How much more likely are Delta flights to be delayed? The relative risk of a Delta flight being delayed compared to an AA flight answers this question:
$$risk_{rel} = \frac{risk_{DL}}{risk_{AA}} = \frac{.45}{.39} = 1.15$$
Interpretation: Delta flights are 1.15 times more likely to be delayed compared to AA flights.
So odds and risk (probability) have different interpretations. Be careful when you use them or read about them.
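The same computation by hand (the 1.15 above comes from rounding each risk to two decimals first; keeping full precision gives about 1.16, which matches the epitools output later):

```r
# Relative risk (risk ratio) by hand, using the row totals.
risk_DL <- 27452 / (33488 + 27452)  # ~0.45, P(delay) for Delta
risk_AA <- 17736 / (27920 + 17736)  # ~0.39, P(delay) for American

RR <- risk_DL / risk_AA
round(RR, 2)  # 1.16
```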
epitools
In practice, you probably want to use the epitools package to calculate odds ratios:
$data
On Time Delay Total
AA 27920 17736 45656
DL 33488 27452 60940
Total 61408 45188 106596
$measure
odds ratio with 95% C.I.
estimate lower upper
AA 1.000000 NA NA
DL 1.290446 1.258956 1.322845
$p.value
two-sided
midp.exact fisher.exact chi.square
AA NA NA NA
DL 0 1.502078e-91 2.314218e-91
$correction
[1] FALSE
attr(,"method")
[1] "Conditional MLE & exact CI from 'fisher.test'"
The function prints a bunch of stuff. The first thing to note is that the odds ratio is under $measure in the second row, DL 1.290446, which is the same thing we got by hand. The first row is 1 because the function divides the odds of every row by the odds of the first row, AA ($\frac{odds_{AA}}{odds_{AA}} = 1$).
More importantly, under $measure you can also find the 95% CI for each of the odds ratios.
Finally, under $p.value you can find significance tests based on three different tests criteria (which will almost always agree in practice). In all cases, p<.001.
epitools
The epitools package also has functions that calculate relative risk (also called the risk ratio):
$data
On Time Delay Total
AA 27920 17736 45656
DL 33488 27452 60940
Total 61408 45188 106596
$measure
risk ratio with 95% C.I.
estimate lower upper
AA 1.000000 NA NA
DL 1.159615 1.142957 1.176515
$p.value
two-sided
midp.exact fisher.exact chi.square
AA NA NA NA
DL 0 1.502078e-91 2.314218e-91
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
The output has the same structure, so we find the risk ratio and its confidence interval under $measure. Once again, we get the same risk ratio we computed by hand (up to some rounding error).
The epitools package has other functions to calculate odds ratios and relative risk, which only differ a bit in how the 95% CI is computed (very similar in practice).
e-N?
In R output you will often see numbers written as e-N. On the left we see e-91 in the p-values. This is scientific notation for representing very small numbers. So, 1.134e-3 means 0.001134: move the decimal point 3 places to the left. Then, 1.502e-91 means a decimal point followed by 90 zeros before the first significant digit (a pretty small number). This is good because you do not want to see 90 zeros in your output.
By default, the epitools functions take the odds or risk of the first row, and treat the second column as the success. Then, they divide the odds/risk of every other row by the odds/risk of the first row. (You can have more than 2 rows, but you always need exactly two columns.)
$data
On Time Delay Total
DL 33488 27452 60940
AA 27920 17736 45656
Total 61408 45188 106596
$measure
odds ratio with 95% C.I.
estimate lower upper
DL 1.0000000 NA NA
AA 0.7749256 0.7559463 0.7943092
$p.value
two-sided
midp.exact fisher.exact chi.square
DL NA NA NA
AA 0 1.502078e-91 2.314218e-91
$correction
[1] FALSE
attr(,"method")
[1] "Conditional MLE & exact CI from 'fisher.test'"
You can change the order of the table columns and rows by using the rev = argument. Here I invert the rows. You can confirm that by comparing the table under $data on this slide to the one on the previous slide.
The idea is that, because you know how the functions calculate odds/risk ratios, you give it a table such that it calculates what you want. Here $.77$ is $\frac{odds_{AA}}{odds_{DL}}$.
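You can verify the reversed ratio by hand, since it is just the reciprocal of the odds ratio from earlier:

```r
# Reversed odds ratio by hand: odds_AA / odds_DL.
odds_DL <- 27452 / 33488
odds_AA <- 17736 / 27920

round(odds_AA / odds_DL, 2)  # 0.77, the reciprocal of 1.29
```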
How do I know all of this? The function help menu: ?oddsratio.fisher! (The risk versions of these functions work exactly the same.)
Categorical data comes up a lot in applied research, and we definitely do not talk about categorical data much in your average intro to statistics course (or in regression).
If you are dealing with lots of categorical data, I highly recommend you try to get a hold of Categorical Data Analysis by Agresti (2018). This book is quite comprehensive, includes R code examples, and explains many important concepts very clearly!
NOTE: You may also find the same book called An Introduction to Categorical Data Analysis, but they are the same book. Not sure what’s up with that 🤷
PSYC 6802 - Lab 3: Chi-Square