Simple Interactive Statistical Analysis
T-test
Input.
Compare two independent samples
Select options and hit the calculate button.
Compare a single sample with the population
Select options and hit the population button.
Explanation.
Risk Ratio | Odds Ratio | PARF | Rate Ratio | Number Needed to Treat (NNT) | Fisher Exact Analysis | Other Exact Analysis | Chi-squares
Equal Variance (Welch/Student) | Confidence Intervals | Degrees of Freedom | Population Analysis
For an explanation of the Pairwise t-test or the t-test for two correlated samples, consult the pairwise help page.
The t-test covers a number of procedures for comparing two averages. It can be used to compare the difference in weight between two groups on different diets, the proportion of patients suffering from complications after two different types of operation, or the number of traffic accidents at two busy junctions. You can compare 'continuous' averages, which can be above or below one; examples are the difference in mean length or weight between two groups of people. The certainty with which these averages are measured is expressed in the standard deviation. You can also compare 'proportion' averages, which are basically a number divided by a larger number. Examples are the proportion of people suffering from complications after two different types of operation (number of complications divided by the number of operations), or the proportion of a manufactured product damaged under two different methods of production (number damaged divided by the number manufactured). The certainty of these averages is directly related to the number of cases observed. Some more discussion of proportion averages can be found on the Binomial help page. Lastly, discussion of counted averages can be found on the Poisson help page.
The t-test gives the probability of finding a difference between the two means at least as large as the one observed if chance alone were at work. It is customary to say that if this probability is less than 0.05 the difference is 'significant', meaning that it is unlikely to be caused by chance alone.
The t-test is basically not valid for testing the difference between two proportions. However, the t-test in proportions has been extensively studied, has been found to be robust, and is widely and successfully used in proportional data. With one exception: if one of the proportions is very close to zero, one or minus one, you will do better with Fisher's exact test.
Both one and double sided probabilities are given. In one-sided tests it is assumed that before doing the test you had a hypothesis that one of the two means (or proportions) was bigger than the other. If you did not have such a prior hypothesis, and you only aim to test for a possible difference between the means, you need to do a double-sided test; in this case you would mostly multiply the p-value by two.
Learn more about the t-test from Statistics at Square One.
The program provides you with a number of additional statistics:
1) The Odds-ratio. The odds ratio takes values between zero ('0') and infinity. One ('1') is the neutral value and means that there is no difference between the groups compared; close to zero or infinity means a large difference. An odds ratio larger than one means that group one has a larger proportion than group two; if the opposite is true the odds ratio will be smaller than one. If you swap the two proportions, the odds ratio will take on its inverse (1/OR).
The odds ratio gives the ratio of the odds of suffering some fate. The odds themselves are also a ratio. To explain this we will take the example of traditional versus alternative surgery. If 10% of operations result in complications, then the odds of having complications if traditional surgery is used equal 0.11 (0.1/0.9; the chance of getting complications is 0.11 times the chance of not getting complications). 12.5% of the operations using the alternative method result in complications, giving odds of 0.143 (0.125/0.875). The odds ratio equals 0.778 (0.11/0.143). The odds of getting complications, as against not getting complications, are 0.778 times as high in traditional as compared with alternative surgery. The inverse of the odds ratio equals 1.286. The odds of getting complications are 1.286 times as high in alternative as compared with traditional surgery. This takes some getting used to, we admit, but it has its advantages.
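The arithmetic of the surgery example can be retraced with a few lines of Python. This is only an illustrative sketch; the function names `odds` and `odds_ratio` are our own and are not part of SISA.

```python
def odds(p):
    """Odds that belong to a proportion p (0 <= p < 1)."""
    return p / (1.0 - p)


def odds_ratio(p1, p2):
    """Odds of group one divided by the odds of group two."""
    return odds(p1) / odds(p2)


traditional = 0.10    # proportion with complications, traditional surgery
alternative = 0.125   # proportion with complications, alternative surgery

print(round(odds(traditional), 3))                     # 0.111
print(round(odds(alternative), 3))                     # 0.143
print(round(odds_ratio(traditional, alternative), 3))  # 0.778
print(round(odds_ratio(alternative, traditional), 3))  # 1.286
```

Swapping the two arguments gives the inverse (1/OR), as described under point 1.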
2) The Risk-ratio. The risk ratio takes on values between zero ('0') and infinity. One ('1') is the neutral value and means that there is no difference between the groups compared; close to zero or infinity means a large difference between the two groups on the variable concerned. A risk ratio larger than one means that group one has a larger proportion than group two; if the opposite is true the risk ratio will be smaller than one. If you swap the two proportions, the risk ratio will take on its inverse (1/RR).
The risk ratio gives you the percentage difference in classification between group one and group two. For example, the proportion of people suffering from complications after traditional surgery equals 0.10 (10%), while the proportion suffering from complications after alternative surgery equals 0.125 (12.5%). The risk ratio equals 0.8 (0.1/0.125); 20% ((1-0.8)*100) fewer patients treated by the traditional method suffer from complications. Another example: 8% of freezers produced without quality control have paint scratches. This percentage is reduced to 5% if quality control is introduced. The risk ratio equals 1.6 (8/5); 60% more freezers are damaged if there is no quality control.
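Both risk ratio examples can be checked with the short sketch below. The helper names `risk_ratio` and `percent_difference` are hypothetical and chosen for this illustration only.

```python
def risk_ratio(p1, p2):
    """Proportion in group one divided by the proportion in group two."""
    return p1 / p2


def percent_difference(rr):
    """Percentage by which group one is higher (+) or lower (-) than group two."""
    return (rr - 1.0) * 100.0


# Surgery: 10% complications (traditional) against 12.5% (alternative).
rr_surgery = risk_ratio(0.10, 0.125)
print(round(rr_surgery, 3), round(percent_difference(rr_surgery), 1))    # 0.8 -20.0

# Freezers: 8% scratched without quality control against 5% with it.
rr_freezers = risk_ratio(0.08, 0.05)
print(round(rr_freezers, 3), round(percent_difference(rr_freezers), 1))  # 1.6 60.0
```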
3) The Rate-ratio. The interpretation of the rate ratio is similar to the interpretation of the risk ratio, discussed above. The risk ratio is related to proportional data, the rate ratio to counted, Poisson type, data. The rate ratio is discussed in more depth on the SMR-exact help page.
4) The population attributable risk fraction (PARF). The population attributable risk fraction is the fraction of those found to be “diseased” in a population which can be attributed to a risk factor. For example, to determine the effect of smoking on cancer mortality, fill in the proportion of cancer deaths in the non-smokers in the top Mean 1 box and the proportion of cancer deaths in the smokers in the second Mean 2 box. The PARF gives you the proportion of cancer deaths which is caused by smoking. Note that this is valid only for the population studied and that this only works if the full input is representative of this population. This means that not only do Mean 1 and Mean 2 need to be unbiased estimators, but the odds N1/N2 must also be equal to the odds non-smokers/smokers in the population. The second confidence interval (RRCI), if additional confidence intervals are requested, is obtained by substituting the confidence interval for the Risk Ratio in the formulae for the PARF. The way the procedure is implemented assumes that the exposed group is in the Mean 2 box, and that the proportion in this box is higher than the proportion in the Mean 1 box.
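This page does not spell out the formula behind the PARF. A common definition, which we assume here, is Levin's: the overall proportion diseased minus the proportion diseased in the unexposed, divided by the overall proportion. The sketch below follows the input convention described above (unexposed group in box 1, exposed group in box 2); the function name `parf` and the numbers are made up for illustration and are not SISA output.

```python
def parf(p1, n1, p2, n2):
    """Population attributable risk fraction (Levin's formula, assumed here).

    p1, n1: proportion diseased and number of cases in the unexposed group.
    p2, n2: proportion diseased and number of cases in the exposed group.
    N1/N2 must reflect the unexposed/exposed odds in the population.
    """
    p_total = (n1 * p1 + n2 * p2) / (n1 + n2)  # overall proportion diseased
    return (p_total - p1) / p_total


# Made-up example: 800 non-smokers with 1% deaths, 200 smokers with 5% deaths.
print(round(parf(0.01, 800, 0.05, 200), 3))  # 0.444

# The same result from the risk ratio and the exposure prevalence:
prevalence = 200 / (800 + 200)
rr = 0.05 / 0.01
print(round(prevalence * (rr - 1) / (1 + prevalence * (rr - 1)), 3))  # 0.444
```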
5) Number Needed to Treat (NNT) is a measure which is becoming increasingly popular in the medical field. The NNT is the reciprocal of the absolute risk-difference (ard=|proportion1-proportion2|) and expresses the number of persons to be treated to 'cure' one person.
Number Needed to Treat has some very appealing properties in interpretation, particularly in combination with cost calculation. An example of the use of NNT: if no treatment is given 20% die, with treatment 15% die. NNT=20 (1/|0.2-0.15|). We need to treat 20 people to save one life. But now we develop a preventive program in a completely different area of health care and succeed in bringing the mortality down from 45% to 44.5%. NNT=200 (1/|0.45-0.445|). We need to apply our preventive program to at least 200 people to save one life. This does not seem very effective compared with treatment.
However, the cost of treatment is $200 per person, prevention costs $10 per person. The cost per life saved equals $4000 (20*200) for treatment against $2000 (200*10) for prevention. Prevention is highly cost effective and given a limited budget it should get precedence over treatment.
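The NNT and cost figures in this example can be reproduced as follows. This is a minimal sketch; the function names are ours.

```python
def nnt(p1, p2):
    """Number needed to treat: reciprocal of the absolute risk difference."""
    return 1.0 / abs(p1 - p2)


def cost_per_life_saved(p1, p2, cost_per_person):
    """Cost of treating NNT persons at a fixed cost per person."""
    return nnt(p1, p2) * cost_per_person


print(round(nnt(0.20, 0.15)))                        # 20   (treatment)
print(round(nnt(0.45, 0.445)))                       # 200  (prevention)
print(round(cost_per_life_saved(0.20, 0.15, 200)))   # 4000 (treatment)
print(round(cost_per_life_saved(0.45, 0.445, 10)))   # 2000 (prevention)
```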
This way one can do quite a number of nice comparisons using the NNT. A paper by Schulzer and Mancini gives some examples.
The default Confidence Interval for the Number Needed to Treat is calculated according to a method first suggested by Cook and Sackett. This method is based on inverting the confidence interval of the difference between two means. As the Confidence Interval for a difference between two means can be calculated by way of different methods, the same applies for the confidence interval for the NNT. Check the C.I. Option to get some different confidence intervals for the NNT and read below about these different confidence intervals. The basic default method suggested by Cook and Sackett is most often used in practice. If you check the CI option and the NNT option you also get the Confidence Interval for the NNT suggested by Schulzer and Mancini. This one is rather interesting theoretically and is based on the Geometric Distribution, which is related to the NNT standing for the notion that a doctor has to "wait" NNT number of patients before seeing one "cured" patient. Please note that whatever method is used, confidence intervals for the NNT are nonsensical if the difference between the two means is not statistically significant, i.e., if the probability of the t-value is more than 0.025 (in the case of a 95% confidence interval). The confidence interval for the NNT should NEVER be used for hypothesis testing. It is there for your information only. Use the t-test for hypothesis testing. Read more about the problems with the confidence interval for the NNT here.
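The idea of inverting the confidence interval of the difference can be sketched as below. We assume a plain Wald interval for the difference between two proportions; the exact formula SISA uses may differ, and the sample sizes in the example are made up. The function returns nothing useful (None) when the interval for the difference includes zero, in line with the warning above.

```python
from math import sqrt
from statistics import NormalDist


def nnt_confidence_interval(p1, n1, p2, n2, level=0.95):
    """Invert a Wald interval for |p1 - p2| into an interval for the NNT.

    Returns (lower, upper), or None if the interval for the difference
    includes zero, in which case an NNT interval makes no sense.
    """
    diff = abs(p1 - p2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower_diff, upper_diff = diff - z * se, diff + z * se
    if lower_diff <= 0:
        return None
    # A large difference means a small NNT, so the limits swap places.
    return 1.0 / upper_diff, 1.0 / lower_diff


# 20% against 15% mortality, with an assumed 1000 cases in each group.
print(nnt_confidence_interval(0.20, 1000, 0.15, 1000))
# With only 50 cases per group the difference is not significant.
print(nnt_confidence_interval(0.20, 50, 0.15, 50))  # None
```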
Converting the odds-ratio into the NNT is done on the Logit page.
6) Fisher Exact Test. For use in the analysis of the difference between two proportions. The Fisher's Exact test procedure calculates an exact probability value for the relationship between two dichotomous variables, as found in a two by two crosstable. Basically, the data structure at the basis of a t-test for a difference between two proportions is a two by two crosstable. The Fisher procedure calculates the probability of the difference between the data observed and the data expected, considering the given marginals and the assumptions of the model of independence. The single-sided p-value is the summed probability of all more extreme or similar tables compared with the given table (notation p(Observed>=Expected)). There is also a p-value for the relationship going in the other direction. Do not take too much notice of this p-value; it is not so important. There are a number of theories about how to present double-sided p-values (Agresti, 1992). Data on the basis of two of these theories are presented. The first is the sum of small p-values. For the sum of small p-values all tables are generated which are possible given the margins. All p-values of the same size or smaller than the point probability of the observed table are added up to form the cumulative p-value. The result is relevant to the notation p(O>=E|O<=E). Statisticians usually recommend this method. Another method of estimating the double-sided p-value is to take twice the single-sided probability. For a further discussion of the Fisher exact test please consult the Fisher help page.
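The two ways of forming a double-sided p-value can be illustrated with a small pure-Python sketch. It enumerates every table that is possible given the margins and uses the hypergeometric point probability of each. The function name, the dictionary keys and the example table are our own choices; this is not SISA's code.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the two by two table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    total = comb(n, c1)
    # Point probability of every table with the same margins,
    # indexed by the value x of the top-left cell.
    probs = {
        x: comb(r1, x) * comb(r2, c1 - x) / total
        for x in range(max(0, c1 - r2), min(r1, c1) + 1)
    }
    p_obs = probs[a]
    p_ge = sum(p for x, p in probs.items() if x >= a)
    p_le = sum(p for x, p in probs.items() if x <= a)
    one_sided = min(p_ge, p_le)
    # Sum of small p's: all tables no more probable than the observed one
    # (a tiny tolerance guards against floating point ties).
    small_p = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return {
        "p_ge": p_ge,
        "p_le": p_le,
        "two_sided_small_p": small_p,
        "two_sided_doubled": min(1.0, 2 * one_sided),
    }


result = fisher_exact_2x2(3, 1, 1, 3)
print(round(result["p_ge"], 4))               # 0.2429
print(round(result["two_sided_small_p"], 4))  # 0.4857
```

In this symmetric example both double-sided methods give the same answer; with uneven margins they generally differ.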
7) Other Exact Tests. Besides the Fisher test, discussed above, the Binomial or the Poisson test are presented in the case of comparing a sample with a population parameter. The tests are discussed on the Binomial and the Poisson help page respectively. Interpret the double and the single sided exact tests in the summary as follows. The double sided significance test according to the method of small p's and the notation >= gives the exact probability of the difference between the expected and the observed value or any larger difference, considering the location of the expected and the observed value. The notation > relates to the probability of getting a larger difference than the observed difference between the observed and the expected value. The single sided test with notation >= gives the exact probability of getting the value observed or any larger value, considering the expected value. Similarly, the notation <= gives the probability of getting the observed or any smaller value; the notation > gives the probability of getting larger values than the observed value; the notation < gives the probability of getting smaller observed values than the one expected.
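The notations >=, >, <= and < can be made concrete for the Binomial case with the sketch below. It is an illustration only, with hypothetical names and a made-up example (8 events observed in 10 cases where a proportion of 0.5 is expected); the Poisson case works the same way with Poisson point probabilities.

```python
from math import comb


def exact_binomial_summary(k, n, p):
    """Exact probabilities for observing k events in n cases, expected proportion p."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    point = pmf[k]
    return {
        ">=": sum(pmf[k:]),       # the observed value or any larger value
        ">": sum(pmf[k + 1:]),    # strictly larger values
        "<=": sum(pmf[:k + 1]),   # the observed value or any smaller value
        "<": sum(pmf[:k]),        # strictly smaller values
        # Double sided, method of small p's: every outcome that is
        # no more probable than the observed one.
        "two_sided_small_p": sum(q for q in pmf if q <= point * (1 + 1e-9)),
    }


summary = exact_binomial_summary(8, 10, 0.5)
print(round(summary[">="], 4))                 # 0.0547
print(round(summary["two_sided_small_p"], 4))  # 0.1094
```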
8) Chi-square tests. For use in the analysis of the difference between two proportions. The Chi-square tests calculate a double sided probability value for the relationship between two dichotomous variables, as found in a two by two crosstable. Basically, the data structure at the basis of a t-test for a difference between two proportions is a two by two crosstable. The Chi-square procedures calculate the probability of the difference between the data observed and the data expected, considering the given marginals and the assumptions of the model of independence. The Chi-square tests give only an estimate of the true Chi-square and associated probability value, an estimate which might not be very good if the margins are very uneven or if one of the cells has a small value (roughly less than five). In that case the Fisher Exact is preferred. Four Chi-square tests are presented. Pearson's Chi-square is mathematically related to the classical Pearson's Correlation co-efficient and to Analysis of Variance. Pearson's Goodness of Fit Chi-square (GFX) is most often used in research. Likelihood Ratio Chi-square (LRX) was developed more recently than the Pearson's chi-square and is the second most frequently used Chi-square. It is directly related to log-linear analysis and logistic regression. Yates' Chi-square is equivalent to Pearson's Chi-square with Continuity Correction. Mantel-Haenszel Chi-square is thought to be closer to the 'true' Chi-square if small numbers of cases are involved. It is not often used. For a further discussion of the Chi-square tests please consult the Two by Two help page.
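Three of these Chi-squares can be sketched with the textbook formulas for a two by two table, which we assume correspond to what is described here. The function names are ours, and the example table (10 complications in 100 traditional operations against 25 in 200 alternative operations) uses made-up sample sizes. For one degree of freedom the double sided p-value follows from the complementary error function.

```python
from math import erfc, log, sqrt


def chi_squares_2x2(a, b, c, d):
    """Pearson, Yates and Likelihood Ratio Chi-square for [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under the model of independence.
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    pairs = list(zip(observed, expected))
    pearson = sum((o - e) ** 2 / e for o, e in pairs)
    yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in pairs)
    likelihood_ratio = 2 * sum(o * log(o / e) for o, e in pairs if o > 0)
    return {"pearson": pearson, "yates": yates, "likelihood_ratio": likelihood_ratio}


def p_value_1df(chi2):
    """Upper tail probability of a Chi-square with one degree of freedom."""
    return erfc(sqrt(chi2 / 2))


for name, value in chi_squares_2x2(10, 90, 25, 175).items():
    print(name, round(value, 3), round(p_value_1df(value), 3))
```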
Equal or unequal variance.
By default SISA assumes that the variances are unequal and calculates Welch’s t-test. This method generally produces a slightly smaller t-value than the traditional student's t-test. Degrees of freedom for the Welch's t-test are calculated using a complicated formula. The number of degrees of freedom will be smaller than in the student’s t-test. If you check the “Equal Var” box SISA will calculate the traditional student’s t-test with n1+n2-2 degrees of freedom. The student’s t-test is more powerful than Welch’s t-test and should be used if the variances are equal. There is an F-test to estimate the probability of the variances being equal.
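The difference between the two tests can be seen in the sketch below. We assume that the "complicated formula" is the usual Welch-Satterthwaite approximation; the function names and the example numbers are hypothetical.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-value and degrees of freedom (unequal variances)."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the means
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def student_t(m1, s1, n1, m2, s2, n2):
    """Student's t-value and degrees of freedom (pooled, equal variances)."""
    pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up input: mean 75 (sd 10, n 20) against mean 70 (sd 5, n 40).
print(welch_t(75, 10, 20, 70, 5, 40))    # t about 2.11, df about 23.9
print(student_t(75, 10, 20, 70, 5, 40))  # t about 2.59, df 58
```

In this example the smaller group has the larger variance, and Welch's test gives both a smaller t-value and fewer degrees of freedom.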
Confidence Intervals.
Checking this option gives you additional Confidence Intervals for the difference between two proportions, the difference between two means, the NNT and the odds-ratio. Note that the default confidence intervals, the ones you get when you do not ask for additional confidence intervals, would be the preferred choice for many researchers. The CC-Wald confidence interval is the Continuity Corrected version of the usual Wald confidence interval. It gives a slightly wider confidence interval and can be used when the number of cases is small. The Newcombe-Wilson (NW) hybrid score confidence interval was proposed by Newcombe (1998). It is based on different assumptions regarding the relationship between the sample and the population variance, the score test approach, and gives a slightly narrower confidence interval. Although the method has some superior properties it is rarely used. The Normal Approximation confidence interval for the Odds-Ratio is discussed in many older statistics books, such as Reynolds (1977). It is based on the standard error and has the disadvantage of being a linear beast in a non-linear world. It has been almost completely superseded by the Wald's C.I. The chi-square based confidence interval was suggested by Miettinen (1976). The Pearson Chi-square is used in the calculation. It is rarely used.
Degrees of Freedom.
There are various ways in which the number of degrees of freedom for the two-sample t-test can be calculated. In the case of unequal variances Welch’s t-test is mostly used and the number of degrees of freedom for this method is calculated with a complicated formula. The number of degrees of freedom for the student’s t-test equals n1+n2-2. Under the equal variance assumption this number of degrees of freedom is correct for the student’s t-test. However, if the variance of mean1 is different from the variance of mean2, this number of degrees of freedom is too large for the student’s t-test. Using n1+n2-2 degrees of freedom then leads to a difference being declared statistically significant too easily, and so to a higher chance of a Type I error. Wonnacott and Wonnacott, among others, suggest using n1-1 or n2-1, whichever is smaller. Unfortunately, using this formula makes it too difficult to declare a difference statistically significant, which increases the chance of a Type II error. Should you wish to use this method anyhow (not suggested!) you can do your calculations and use the Significance procedure on the SISA website to calculate the p-value.
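The three candidates for the number of degrees of freedom can be put side by side as follows. The Welch value again assumes the Welch-Satterthwaite approximation; the function name and the input numbers are made up for illustration.

```python
def degrees_of_freedom(s1, n1, s2, n2):
    """Three ways to get the degrees of freedom for a two-sample t-test."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the means
    return {
        "student": n1 + n2 - 2,
        "welch": (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)),
        "smaller_group": min(n1 - 1, n2 - 1),
    }


# Standard deviations 10 and 5 with 20 and 40 cases respectively.
print(degrees_of_freedom(10, 20, 5, 40))
```

The Welch value always lies between the conservative "smaller group" value and n1+n2-2.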
Population analysis.
There are two cases in which population analysis can be done: a) the 'historical' situation where some sort of an arrived opinion on the numerical value of a phenomenon exists; b) the case in which the numerical value is a population value, for example, the number of deaths in a community can be exactly known. In the case of 'a', exactly seems a relative concept and Bayesian methods might be preferred. In the case of 'b' the methods proposed here are valid. Fill in the population proportion or average in the top box and fill in the sample proportion or average in the second box. Input is much the same as above. 'Click' the population button.
There are some subtle differences which we will discuss now.
In the case of proportions it should be considered that the underlying nature of the data is quite different from data used to test for a difference between two estimated proportions. In the case of two estimated proportions the data consists of a two by two table and all methods for table analysis apply. In the case of comparing a population value with a sample estimate it concerns data which compares an expected with an observed distribution in a one dimensional array with two categories. In this case not the Fisher but the Binomial is the exact alternative and the most appropriate test. Use it in SISA online, use SISA's MS-DOS version if you have a large sample size, or use a Poisson approximation if you have a very large sample. The method used in this population procedure is the normal approximation of the Binomial.
SISA believes this is a non-preferred way of doing things which was relevant before computer crunching. Consult Wonnacott and Wonnacott or Blalock for a discussion of the normal approximation.
Give the population proportion in the top box and the sample proportion in the second box. Only one number of cases has to be given (for the sample size), and you do not have to give standard deviations. If you want descriptive statistics click the 'calculate' button. Odds ratios, risk ratios and the NNT are valid. However, standard errors, confidence intervals, and significance tests for these ratios and otherwise are not valid. In the procedure as implemented here continuity correction is not applied.
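A normal approximation of the Binomial without continuity correction can be sketched as below. We assume the textbook form in which the standard error is based on the population proportion; whether SISA does exactly this is not stated here. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_vs_population(p_population, p_sample, n):
    """z-value and double sided p-value, no continuity correction."""
    se = sqrt(p_population * (1 - p_population) / n)
    z = (p_sample - p_population) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Population proportion 10%, sample proportion 15% in 200 cases.
z, p = proportion_vs_population(0.10, 0.15, 200)
print(round(z, 2), round(p, 3))
```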
For averages the analysis suggested here is valid and correct for its purpose. Give the population average in the top box and the sample average in the second box. Only one number of cases has to be given (for the sample size). If you give a standard deviation in the standard deviation 1 box, this standard deviation is considered to be the population standard deviation and the normal distribution is applied. If you give a standard deviation in the standard deviation 2 box, which is the bottom-most box, this standard deviation is considered to be the sample standard deviation and the t-distribution is used. If you give no standard deviation at all, the averages are considered to be Poisson distributed rates and the standard deviation is considered to be the square root of the average. No continuity correction is applied.
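The three input situations can be summarised in a small sketch. Two points are our own assumptions and are not stated on this page: when both standard deviations are given the population one takes precedence, and in the Poisson case the square root of the population average is used. The function name is hypothetical, and only the test statistic is computed, not the p-value.

```python
from math import sqrt


def mean_vs_population(pop_mean, sample_mean, n, sd_population=None, sd_sample=None):
    """Return (statistic, reference distribution, degrees of freedom or None)."""
    if sd_population is not None:
        sd, distribution, df = sd_population, "normal", None
    elif sd_sample is not None:
        sd, distribution, df = sd_sample, "t", n - 1
    else:
        # No standard deviation given: treat the averages as Poisson rates.
        # Assumption: the square root of the population average is used.
        sd, distribution, df = sqrt(pop_mean), "normal (Poisson rate)", None
    statistic = (sample_mean - pop_mean) / (sd / sqrt(n))
    return statistic, distribution, df


print(mean_vs_population(100, 104, 25, sd_population=10))  # (2.0, 'normal', None)
print(mean_vs_population(100, 104, 25, sd_sample=10))      # (2.0, 't', 24)
print(mean_vs_population(4, 5, 16))                        # (2.0, 'normal (Poisson rate)', None)
```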