Type I and type II errors, level of significance, power of the test, p-value. Concept of standard error and confidence interval.
A hypothesis is a conjectural statement of the relation between two or more variables. For example, a study designed to look at the relationship between anxiety and test performance might have a hypothesis that states, "This study is designed to assess the hypothesis that anxious people will perform worse on a test than individuals who are not anxious."
Hypotheses are always in declarative sentence form, and they relate, either generally or specifically, variables to variables. There are two criteria for "good" hypotheses and hypothesis statements. First, hypotheses are statements about the relations between variables. For example: over-learning leads to performance decrement. Second, hypotheses carry clear implications for testing the stated relations. For example: groups A and B will differ on some characteristic. So a hypothesis can be tested and shown to be probably true or probably false.
A hypothesis has a further virtue: it directs investigation. There are important differences between a problem and a hypothesis. The problem is a question and is not directly testable, but a hypothesis is testable.
Sources of hypotheses
Hypotheses can be deduced from theory and from other hypotheses.
Hypothesis testing
Step 1: State the hypotheses and select a criterion for the decision. The standard logic that underlies hypothesis testing is that there are always (at least) two hypotheses: the null hypothesis and the alternative hypothesis.
The null hypothesis (H0) predicts that the independent variable (treatment) has no effect on the dependent variable for the population.
The alternative hypothesis (H1) predicts that the independent variable will have an effect on the dependent variable for the population. We'll talk more below about how specific this hypothesis may be.
The logic of hypothesis testing assumes that we are trying to reject the null hypothesis, not that we are trying to prove the alternative hypothesis.
Why? Generally, it is easier to show that something isn't true than to prove that it is. This is especially true when we are dealing with samples. Remember that we aren't testing every individual in the population, only a subset.
Example :
Hypothesis: All dogs have 4 legs.
To reject it: we need a sample that includes one or more dogs with more or fewer than 4 legs.
To accept it: we would need to examine every dog in the population and count its legs.
So part of the first step is to set up your null hypothesis and your alternative hypothesis. The other part of this step is to decide what criterion you are going to use to either reject or fail to reject (not accept) the null hypothesis.
Consider the problem that we have. We have a sample, and its descriptive statistics are different from the population's parameters (which may be based on the control group's sample statistics). How do we decide whether the difference that we see reflects a "real" difference (a difference between two populations) or is due to sampling error? To deal with this problem, the researcher must set a criterion in advance.
For example, think of the kinds of questions we were asking in the previous chapter. Given a population X with μ = 65 and σ = 10, what is the probability that our sample (of size n) will have a mean of 80? We're going to ask the same questions here, but take them a step further and say things like, "Gee, the probability that my sample has a mean of 80 is 0.0002. That's pretty small. I'll bet that my sample isn't really from this population, but is instead from another population."
Setting a criterion in advance is concerned with that part about saying "that's pretty small." When we set the criterion in advance, we are essentially asking: how small a chance is small enough to reject the null hypothesis? Or, in other words, how big a difference do I need to reject the null hypothesis? That's the big picture of setting the criterion; now let's look at the details:
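A quick way to put a number on "that's pretty small" is to compute the probability directly. Below is a minimal Python sketch using the μ = 65, σ = 10 example from above; the sample size n = 25 is an assumption for illustration (the passage leaves n unspecified, so the exact probability will differ).

```python
from math import sqrt
from statistics import NormalDist

# Population parameters from the example (mu = 65, sigma = 10);
# the sample size n = 25 is an illustrative assumption.
mu, sigma, n = 65, 10, 25
standard_error = sigma / sqrt(n)  # sigma_M = 10 / 5 = 2

# Sampling distribution of the mean, assuming H0 is true
sampling_dist = NormalDist(mu, standard_error)

# Probability of drawing a sample mean of 80 or higher if H0 is true
p = 1 - sampling_dist.cdf(80)
print(p)  # a vanishingly small probability with these assumed numbers
```

With a probability this small, we would bet the sample is not really from this population.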
What are the possible real-world situations?
- H0 is correct
- H0 is wrong
What are the possible conclusions?
- H0 is correct
- H0 is wrong
So this sets up four possibilities (2 * 2):
- 2 ways of making mistakes
- 2 chances to be correct
Experimenter's conclusion | H0 is actually correct | H0 is actually wrong
Reject H0 | Type I error (α) | Correct decision
Fail to reject H0 | Correct decision | Type II error (β)
- Type I error (α, alpha) - the H0 is actually correct, but the experimenter rejected it.
- - e.g., there really is only one population; even though the probability of getting such a sample was really small, you just got one of those rare samples.
- Type II error (β, beta) - the H0 is really wrong, but the experimenter didn't feel able to reject it.
- - e.g., your sample really does come from another population, but your sample mean is so close to the original population mean that you can't rule out the possibility that there is only one population.
The same logic applies to a jury's verdict:
Jury's verdict | Defendant is actually innocent | Defendant is actually guilty
Guilty | Type I error | Correct verdict
Not guilty | Correct verdict | Type II error
- Type I error - sending an innocent person to jail.
- Type II error - letting a guilty person go free.
- In scientific research, we typically take a conservative approach and set our criteria such that we try to minimize the chance of making a Type I error (concluding that there is an effect of something when there really isn't). In other words, scientists focus on setting an acceptable alpha level (α), or level of significance.
- The alpha level (α), or level of significance, is a probability value that defines the very unlikely sample outcomes when the null hypothesis is true. Whenever an experiment produces very unlikely data (as defined by alpha), we will reject the null hypothesis. Thus, the alpha level also defines the probability of a Type I error - that is, the probability of rejecting H0 when it is actually true.
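The alpha level can also be read as a long-run error rate: when H0 is true and we test at α = 0.05, we will reject H0 (a Type I error) about 5% of the time. A small simulation sketch, reusing μ = 65 and σ = 10 from the running example; the sample size and trial count are illustrative assumptions:

```python
import random
from math import sqrt
from statistics import NormalDist

# When H0 is true, a one-tailed test at alpha = 0.05 should reject
# about 5% of the time. The numbers below are illustrative.
random.seed(0)
mu, sigma, n, alpha = 65, 10, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical z, about 1.645
se = sigma / sqrt(n)

rejections = 0
trials = 20_000
for _ in range(trials):
    # Draw a sample from the ORIGINAL population, i.e., H0 really is true
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    z = (m - mu) / se
    if z > z_crit:              # sample mean falls in the critical region
        rejections += 1

print(rejections / trials)      # close to alpha: every rejection here is a Type I error
```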
Let's look at this with pictures of distributions to try and connect this with what we've been talking about so far.
α = probability of making a Type I error.
With a general (nondirectional) alternative hypothesis, we use a two-tailed test: at α = 0.05, there is 0.025 in each tail (0.025 + 0.025 = 0.05).
With a specific (directional) alternative hypothesis, e.g., H1: there is a difference and the new group should have a higher mean, we use a one-tailed test: at α = 0.05, all 0.05 is in one tail.
- If our sample mean falls into the shaded areas, then we reject the H0. On the other hand, if our sample mean falls outside of the shaded areas, then we fail to reject the H0. These shaded regions are called the critical regions.
- The critical region is composed of extreme sample values that are very unlikely to be obtained if the null hypothesis is true. The size of the critical region is determined by the alpha level. Sample data that fall in the critical region will warrant the rejection of the null hypothesis.
Population distribution: the population has μ = 65 and σ = 10. Suppose that you take a sample of n = 25 (as in the next example) and give it the treatment.
Which distribution should you look at: the population, or the sample means?
Distribution of sample means: look at the distribution of sample means. Find your sample mean in that distribution and look up the probability of getting that mean or higher (see the last chapter). Let's assume α = 0.05. Let's also assume that our alternative hypothesis is that the treatment should improve performance (make the mean higher).
Now we need to find our standard error: σ_M = σ/√n = 10/√25 = 2.
What is our critical region? This is a one-tailed test, so look at the unit normal table and find the z that cuts off an area of α = 0.05: z = 1.65 (conservative; really 1.645).
Translate this into a critical sample mean: M = μ + z·σ_M = 65 + (1.65)(2) = 68.3. So if M = 69, then we reject the H0.
- Since we know that the z-score corresponding to the critical region is 1.65, we can instead compute the z-score corresponding to the sample mean and see whether it is higher or lower than this critical z-score:
- z = (M - μ)/σ_M = (69 - 65)/2 = 2.0, which exceeds 1.65, so we reject the H0.
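The one-tailed decision rule can be sketched in a few lines of Python. M = 69 and the critical z of 1.645 come from the example; the sample size n = 25 is an assumption (matching the next example):

```python
from math import sqrt

# One-tailed z test for the example: mu = 65, sigma = 10,
# n = 25 (assumed), sample mean M = 69, alpha = 0.05.
mu, sigma, n = 65, 10, 25
M, z_crit = 69, 1.645

se = sigma / sqrt(n)            # standard error = 10 / 5 = 2.0
z = (M - mu) / se               # observed z = (69 - 65) / 2 = 2.0

# Reject H0 if the observed z exceeds the critical z
reject_h0 = z > z_crit
print(z, reject_h0)             # 2.0 True
```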
Population distribution: again the population has μ = 65 and σ = 10. Suppose that you take a sample of n = 25, give them the treatment, and get a sample mean M.
Distribution of sample means: look at the distribution of sample means. Find your sample mean in that distribution and look up the probability of getting a mean that extreme (see the last chapter). Let's assume α = 0.05. Let's also assume that our alternative hypothesis is that the treatment should change performance, so we have a two-tailed test.
Now we need to find our standard error: σ_M = 10/√25 = 2. Then look at the unit normal table and find the z that leaves α/2 = 0.025 in each tail: z = 1.96.
Translate this into critical sample means: M = 65 ± (1.96)(2), i.e., 61.08 and 68.92. So if M > 68.92 or M < 61.08, we reject the H0.
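The two-tailed decision rule can be sketched the same way. μ = 65, σ = 10, n = 25, α = 0.05, and z = 1.96 follow the example; the sample means tested at the end are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

# Two-tailed z test: mu = 65, sigma = 10, n = 25, alpha = 0.05.
mu, sigma, n, alpha = 65, 10, 25, 0.05
se = sigma / sqrt(n)                              # standard error = 2.0

z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
lower = mu - z_crit * se                          # lower critical mean
upper = mu + z_crit * se                          # upper critical mean

def decide(sample_mean):
    """Reject H0 if the sample mean falls in either tail's critical region."""
    return sample_mean > upper or sample_mean < lower

print(round(lower, 2), round(upper, 2))  # 61.08 68.92
print(decide(69), decide(64))            # True False
```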
This hypothesis test rests on four assumptions:
1) Random sample - the samples must be representative of the populations. Random sampling helps to ensure representativeness.
2) Independent observations -also related to the representativeness issue, each observation should be independent of all of the other observations. That is, the probability of a particular observation happening should remain constant.
3) σ is known and is constant - the standard deviation of the original population must stay constant. Why? More generally, the treatment is assumed to be adding (or subtracting) a constant to every individual in the population. So the mean of that population may change as a result of the treatment; however, recall that adding (or subtracting) a constant to every individual does not change the standard deviation.
4) the sampling distribution is relatively normal - either because the distribution of the raw observations is relatively normal, or because of the Central Limit Theorem (or both).
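Assumption (4) can be checked empirically: sample means drawn from a clearly non-normal population still pile up in a roughly normal shape around the population mean, with spread σ/√n. A small sketch; the uniform population and the sizes here are illustrative assumptions:

```python
import random
from statistics import mean, stdev

# Central Limit Theorem sketch: draw many sample means from a
# NON-normal (uniform) population and check their distribution.
# The population on [0, 100] and the sizes are illustrative.
random.seed(1)
n, num_samples = 25, 5_000

sample_means = [
    mean(random.uniform(0, 100) for _ in range(n))
    for _ in range(num_samples)
]

# The mean of the sample means sits near the population mean (50),
# and their spread is near sigma / sqrt(n) = (100/sqrt(12)) / 5, about 5.77.
print(round(mean(sample_means), 1))
print(round(stdev(sample_means), 1))
```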
Almost done, but we need to talk a bit about the other kind of error that we might make.
Recall the table of the experimenter's conclusions against the actual situation: when H0 is really wrong but we fail to reject it, we make a Type II error (β).
The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. So power = 1 - β.
- So, the more "powerful" the test, the more readily it will detect a treatment effect.
- Power is the probability of obtaining sample data in the critical region when the null hypothesis is false.
So when there are two populations, the power will be related to how big a difference there is between the two.
When there is a big difference between the two populations, the shaded region (the part of the treated distribution that falls in the critical region) is large, so the chance of correctly rejecting the null hypothesis is good. When there is a smaller difference between the two populations, the shaded region is smaller, and the chance of correctly rejecting the null hypothesis is not nearly as good.
Several things influence power:
1) Increasing α increases power.
2) A one-tailed test has more power than a two-tailed test. With a one-tailed test at α = 0.05, all of the critical region (α) is on one side of the distribution. With a two-tailed test at α = 0.05, because a specific direction is not predicted, the critical region is spread out equally on both sides of the distribution; as a result, the power is smaller.
3) Increasing the sample size increases power. With a small n (at α = 0.05), the standard error is relatively large; with a larger n, the standard error is smaller, and as a result the power is greater.
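These effects can be computed directly: power is the probability that the sample mean lands in the critical region when the treated population really does have a different mean. A sketch, assuming a one-tailed test with μ = 65 and σ = 10 from the earlier example; the true treated mean of 70 and the sample sizes are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

# Power sketch: mu0 = 65 and sigma = 10 come from the earlier example;
# the true treated mean (70) and the sample sizes are illustrative.
mu0, mu_true, sigma, alpha = 65, 70, 10, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)          # one-tailed, about 1.645

def power(n):
    se = sigma / sqrt(n)
    m_crit = mu0 + z_crit * se                    # critical sample mean under H0
    # Power = P(M > m_crit) computed under the TRUE (treated) distribution
    return 1 - NormalDist(mu_true, se).cdf(m_crit)

for n in (4, 16, 25, 100):
    print(n, round(power(n), 3))                  # power grows with n
```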