How can probability be applied in public health

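To find the probability of an event, count the outcomes in which the event occurs and divide this by the total number of possible outcomes. A coin toss has two equally likely outcomes, so the probability of heads is 50 per cent. Yet you could toss a coin 10 times and get seven heads and three tails, which is 70 per cent heads and 30 per cent tails. However, if you toss that coin 1,000 times or more, which a few people have done, you will eventually begin to see that 50/50 breakdown.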

This illustrates another important point about probability. It depends on the outcome or event happening over a large number of repetitions, or with a large number of people.
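The effect of repetition can be sketched with a small simulation. This is only an illustration: the function name `heads_fraction` and the fixed seed are assumptions made here, not part of the original discussion.

```python
import random


def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The fraction of heads can be far from 0.5 for a few tosses,
# and settles near 0.5 as the number of tosses grows.
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} tosses: {heads_fraction(n):.3f} heads")
```

With only 10 tosses the fraction of heads can easily be 0.7 or 0.3; with many thousands of tosses it stays close to 0.5.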

There are many examples of how probability is used throughout society. One common measure is the probability of developing cancer. According to the Canadian Cancer Society, 40 per cent of Canadian women and 45 per cent of men will be diagnosed with cancer during their lifetimes. These probabilities are based on calculations from cancer statistics across the country. While this broad information can be useful for those who plan, deliver or research health-care services, more detailed information is even more helpful.

The probability that no more than 1 of 5 (or equivalently, that at most 1 of 5) die from the attack is P(0 deaths) + P(1 death). What is the probability that 2 or more of 5 die from the attack?

Here we want to compute P(2 or more successes). The possible outcomes are 0, 1, 2, 3, 4 or 5, and the sum of the probabilities of each of these outcomes is 1, i.e., P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 1. It follows that P(2 or more) = 1 - P(0) - P(1).
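The complement calculation above can be sketched with the binomial probability formula. The per-patient probability of death is not given in the text, so `p_death = 0.1` below is a purely hypothetical value, and the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """P(k_min or more successes), computed as 1 minus the lower outcomes."""
    return 1.0 - sum(binomial_pmf(k, n, p) for k in range(k_min))


p_death = 0.1  # hypothetical value, NOT taken from the text
print("P(2 or more of 5):", round(prob_at_least(2, 5, p_death), 4))

# The same function covers a fair coin flipped 10 times.
print("P(exactly 4 heads in 10 flips):", round(binomial_pmf(4, 10, 0.5), 4))
```

Summing `binomial_pmf` over all outcomes from 0 to n gives 1, which is what makes the complement shortcut valid.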

Suppose you flipped a coin 10 times (i.e., 10 trials). What would be the probability of getting exactly 4 heads?

Suppose we were interested in characterizing the variability in body weights among adults in a population. A graph of the distribution of weights has several noteworthy characteristics.

It is bell-shaped with a single peak in the center, and it is symmetrical. If the distribution is perfectly symmetrical with a single peak in the center, then the mean value, the mode, and the median will all be the same.

Many variables have similar characteristics, which are typical of so-called normal or Gaussian distributions. Note that the horizontal or X-axis displays the scale of the characteristic being analyzed (in this case, weight), while the height of the curve reflects the probability of observing each value. The fact that the curve is highest in the middle suggests that the middle values have higher probability or are more likely to occur, and the curve tails off above and below the middle, suggesting that values at either extreme are much less likely to occur.

There are different probability models for continuous outcomes, and the appropriate model depends on the distribution of the outcome of interest. The normal probability model applies when the distribution of the continuous outcome conforms reasonably well to a normal or Gaussian distribution, which resembles a bell-shaped curve. Note that the normal probability model can be used even if the distribution of the continuous outcome is not perfectly symmetrical; it just has to be reasonably close to a normal or Gaussian distribution.

However, other distributions do not follow the symmetrical patterns shown above. For example, if we were to study hospital admissions and the number of days that admitted patients spend in the hospital, we would find that the distribution was not symmetrical, but skewed.

Note that a skewed distribution like this is not symmetrical, and the mean value is not the same as the mode or the median.

Normal probabilities can be calculated using calculus or from an Excel spreadsheet. There are also very useful tables that list the probabilities.

Consider BMI in men aged 60, which is approximately normally distributed with a mean of 29 and a standard deviation of 6. The standard deviation gives us a measure of how spread out the observations are. It is possible to have BMI values below 11 or above 47 (more than three standard deviations from the mean), but extreme values occur very infrequently.

To compute probabilities from normal distributions, we will compute areas under the curve. For any probability distribution, the total area under the curve is 1. For the normal distribution, the mean equals the median, so half of the area lies below the mean. Consequently, if we select a man at random from this population and ask what is the probability that his BMI is less than 29, the answer is 0.5.

What is the probability that a 60 year old male has BMI less than 35? The probability is represented by the area under the curve to the left of the value 35, which is one standard deviation above the mean; this area is approximately 0.84.

What is the probability that a 60 year old male has BMI less than 41? Because 41 is two standard deviations above the mean, the area to its left is approximately 0.977.

It is easy to figure out the probabilities for values that are increments of the standard deviation above or below the mean, but what if the value isn't an exact multiple of the standard deviation? For example, suppose we want to compute the probability that a randomly selected male has a BMI less than 30, which is the threshold for classifying someone as obese. Because 30 is neither the mean nor a multiple of standard deviations above or below the mean, we cannot simply use the probabilities known to be associated with 1, 2, or 3 standard deviations from the mean.
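As a rough check on the areas at whole multiples of the standard deviation, the sketch below uses Python's standard-library `statistics.NormalDist`, assuming the mean of 29 and standard deviation of 6 used for BMI in this population. The variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: BMI in men aged 60 is normal with mean 29 and SD 6.
bmi = NormalDist(mu=29, sigma=6)

# Area under the curve to the left of the mean, and of 1, 2 and 3 SDs above it.
for k in (0, 1, 2, 3):
    x = 29 + 6 * k
    print(f"P(BMI < {x}) = {bmi.cdf(x):.4f}")
```

The same `cdf` call works for values that are not multiples of the standard deviation, such as 30.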

In a sense, we need to know how far a given value is from the mean and the probability of having values less than this. And, of course, we would want to have a way of figuring this out not only for BMI values in a population of males with a mean of 29 and a standard deviation of 6, but for any normally distributed variable.

So, what we need is a standardized way of evaluating any normally distributed data so that we can compute the probability of observing the results obtained from samples that we take.

We can do all of this fairly easily by using a "standard normal distribution." What is the probability that a randomly selected male from this population would have a BMI less than 30? The answer depends on how many standard deviations the value 30 lies above the mean of 29, a quantity called the Z-score. This provides us with a way of standardizing how far a given observation is from the mean for any normal distribution, regardless of its mean or standard deviation.

Now what we need is a way of finding the probabilities associated with various Z-scores. This can be done by using the standard normal distribution. The standard normal distribution is a normal distribution with a mean of zero and standard deviation of 1. It is centered at zero, and the degree to which a given measurement deviates from the mean is expressed in units of the standard deviation.

To this point, we have been using "X" to denote the variable of interest (e.g., X = BMI). However, when using a standard normal distribution, we will use "Z" to refer to the standardized variable. For any given Z-score we can compute the area under the curve to the left of that Z-score.

A standard normal table lists the probabilities for the standard normal distribution. In such a table, a "Z" score of 0.0 lists a probability of 0.50, or 50%. The table is organized to provide the area under the curve to the left of (that is, less than) a specified value or "Z value".

In this case, because the mean is zero and the standard deviation is 1, the Z value is the number of standard deviation units away from the mean, and the area is the probability of observing a value less than that particular Z value. Note also that the table shows probabilities to two decimal places of Z.

The units place and the first decimal place are shown in the left hand column, and the second decimal place is displayed across the top row.

But let's get back to the question about the probability that the BMI is less than 30, i.e., P(X < 30). We can answer this question using the standard normal distribution.
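The table layout described above (rows for the units and first decimal place of Z, columns for the second decimal place) can be reproduced in a few lines. This is a sketch using the standard-library `statistics.NormalDist`; the helper names `table_row` and `table_lookup` are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def table_row(z_first_decimal):
    """Areas to the left of z for z_first_decimal + 0.00, 0.01, ..., 0.09."""
    return [round(STD_NORMAL.cdf(z_first_decimal + h / 100), 4) for h in range(10)]


def table_lookup(z):
    """Mimic a table lookup: round z to two decimals, return the left-tail area."""
    return round(STD_NORMAL.cdf(round(z, 2)), 4)


print("Row for Z = 0.1:", table_row(0.1))
print("P(Z < 0.00) =", table_lookup(0.0))
print("P(Z < 1.00) =", table_lookup(1.0))
```

Each row holds ten entries, one per second decimal place, matching the way a printed table is read.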

The figures below show the distributions of BMI for men aged 60 and the standard normal distribution side-by-side. The area under each curve is one but the scaling of the X axis is different.

Note, however, that the areas to the left of the dashed line are the same. The BMI distribution ranges from 11 to 47, while the standardized normal distribution, Z, ranges from -3 to 3.

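The following formula converts an X value into a Z score, also called a standardized score:

Z = (X - μ) / σ

where μ is the mean and σ is the standard deviation of the variable X. For a BMI of 30, Z = (30 - 29) / 6 = 0.17, and the table gives P(Z < 0.17) = 0.5675. Thus, the probability that a male aged 60 has BMI less than 30 is about 57%.

For other values, again we standardize. Note, however, that the table always gives the probability that Z is less than the specified value, i.e., P(Z < z); the probability of exceeding a value is 1 minus the table entry.

As an alternative to looking up normal probabilities in the table or using Excel, we can use R to compute probabilities.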

What is the probability that a 60 year old man in the population above has a BMI less than 29 (the mean)? What is the probability that a 60 year old man will have a BMI less than 30? The Z-score was about 0.17.

What is the probability that a 60 year old man will have a BMI greater than 35? What is the probability that a male aged 60 has BMI between 30 and 35?
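One way to work through these four questions is sketched below in Python, assuming the mean of 29 and standard deviation of 6 given for men aged 60. The helper names `z_score` and `prob_below` are illustrative; the calculation standardizes X and then takes the left-tail area of the standard normal distribution.

```python
from statistics import NormalDist

MEAN_BMI, SD_BMI = 29, 6  # men aged 60, as stated in the text


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


def prob_below(x, mean=MEAN_BMI, sd=SD_BMI):
    """P(X < x): standardize, then take the area to the left of Z."""
    return NormalDist().cdf(z_score(x, mean, sd))


p_lt_29 = prob_below(29)
p_lt_30 = prob_below(30)
p_gt_35 = 1 - prob_below(35)                  # right tail = 1 - left tail
p_30_to_35 = prob_below(35) - prob_below(30)  # difference of two left tails

print(f"P(BMI < 29)      = {p_lt_29:.4f}")
print(f"P(BMI < 30)      = {p_lt_30:.4f}")
print(f"P(BMI > 35)      = {p_gt_35:.4f}")
print(f"P(30 < BMI < 35) = {p_30_to_35:.4f}")
```

Because the code does not round Z to two decimals, its answer for BMI below 30 differs slightly from a table lookup at Z = 0.17.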

Note that asking for the probability that BMI lies between 30 and 35 is the same as asking what proportion of men aged 60 have BMI between 30 and 35. Try to formulate an answer on your own.

Now consider BMI in women. What is the probability that a female aged 60 has BMI less than 30? We use the same approach, but for women aged 60 the mean is 28 and the standard deviation is 7. What is the probability that a female aged 60 has BMI exceeding 40?

The standard normal distribution can also be useful for computing percentiles.

For example, the median is the 50th percentile, the first quartile is the 25th percentile, and the third quartile is the 75th percentile. In some instances it may be of interest to compute other percentiles, for example the 5th or 95th. The following formula is used to compute percentiles of a normal distribution:

X = μ + Zσ

where μ is the mean, σ is the standard deviation, and Z is the value from the standard normal distribution for the desired percentile.

Previously we started with a particular "X" and used the table to find the probability. For the 90th percentile we work in the other direction. So we begin by going into the interior of the standard normal distribution table to find the area under the curve closest to 0.90.

When we go to the table, we find that the value 0.90 falls closest to the entry for Z = 1.28, which gives X = 29 + 1.28(6), or about 36.7. Interpretation: Ninety percent of the BMIs in men aged 60 are below about 36.7. Ten percent of the BMIs in men aged 60 are above about 36.7.

What is the 90th percentile of BMI among women aged 60? Recall that the mean BMI for women aged 60 is 28 with a standard deviation of 7.

Percentiles of height and weight are used by pediatricians in order to evaluate development relative to children of the same sex and age.

For example, if a child's weight for age is extremely low it might be an indication of malnutrition. For infant girls, the mean body length at 10 months is 72 centimeters with a standard deviation of 3 centimeters. Suppose a girl of 10 months has a measured length of 67 centimeters. How does her length compare to other girls of 10 months?

A complete blood count (CBC) is a commonly performed test. One component of the CBC is the white blood cell (WBC) count, which may be indicative of infection if the count is high.

WBC counts are approximately normally distributed in healthy people. Given the mean and standard deviation of WBC per mm³, we can ask what proportion of subjects have WBC counts exceeding a given value, or what proportion of patients have WBC counts between two given values.
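The percentile questions posed earlier (the 90th percentile of BMI, and the infant whose length is 67 centimeters) can be sketched as follows. The code relies on the standard-library `statistics.NormalDist`; the function name `percentile_value` is an assumption made for illustration, and the means and standard deviations are the ones quoted in the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def percentile_value(p, mean, sd):
    """Value X below which a proportion p of a normal population falls."""
    z = STD_NORMAL.inv_cdf(p)  # Z for the desired percentile
    return mean + z * sd       # X = mean + Z * SD


men_p90 = percentile_value(0.90, mean=29, sd=6)
women_p90 = percentile_value(0.90, mean=28, sd=7)
print(f"90th percentile of BMI, men aged 60:   {men_p90:.2f}")
print(f"90th percentile of BMI, women aged 60: {women_p90:.2f}")

# Infant girl: length 67 cm, population mean 72 cm, SD 3 cm.
length_z = (67 - 72) / 3
length_share_below = STD_NORMAL.cdf(length_z)
print(f"Z = {length_z:.2f}; share of girls shorter: {length_share_below:.3f}")
```

The same pattern would apply to the WBC questions once a mean, a standard deviation, and cut-off values are supplied.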

The mean of a representative sample provides an estimate of the unknown population mean, but intuitively we know that if we took multiple samples from the same population, the estimates would vary from one another. We could, in fact, sample over and over from the same population and compute a mean for each of the samples. In essence, all these sample means constitute yet another "population," and we could graphically display the frequency distribution of the sample means.

This is referred to as the sampling distribution of the sample means. As an illustration, consider a small population whose values are ordered from smallest to largest, from which every possible sample of 4 observations is drawn. Each sample has a sample mean based on the 4 observations contained in that sample.

The collection of all possible sample means (in this example there are 15 distinct samples that are produced by sampling 4 individuals at random without replacement) is called the sampling distribution of the sample means, and we can consider it a population, because it includes all possible values produced by this sampling scheme. Notice also that the variability in the sample means is much smaller than the variability in the population, and the distribution of the sample means is more symmetric and has a much more restricted range than the distribution of the population data.
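The enumeration described above can be sketched directly. The original data are not reproduced here, so the six population values below are hypothetical; a population of six is assumed only because choosing 4 of 6 without replacement yields the 15 samples mentioned.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of six values (NOT the original data).
population = [25, 50, 80, 85, 90, 100]

# Every distinct sample of 4 drawn without replacement, with its mean.
samples = list(combinations(population, 4))
sample_means = [mean(s) for s in samples]

print("Number of samples:       ", len(samples))
print("Population mean:         ", mean(population))
print("Mean of the sample means:", mean(sample_means))
print("Population SD:           ", round(pstdev(population), 2))
print("SD of the sample means:  ", round(pstdev(sample_means), 2))
```

In this sketch the mean of the sample means matches the population mean, while the spread and the range of the sample means are both narrower than those of the population.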

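The Central Limit Theorem states that, for sufficiently large samples, the distribution of the sample means is approximately normal regardless of the shape of the population distribution. If the population is normal, then the theorem holds true even for small samples. This means that we can use the normal probability model to quantify uncertainty when making inferences about a population mean based on the sample mean.

There are two exceptions to the large-sample requirement. One is that if the population is normal, then the result holds for samples of any size (i.e., the sampling distribution of the sample means will be approximately normal even for small samples).

Consider a normally distributed characteristic, X, in a population in which the population mean is 75 with a standard deviation of 8.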

The distribution of the sample means has a narrower range than the population distribution. The mean of the sample means is 75, the same as the population mean, and the standard deviation of the sample means is σ/√n = 8/√n for samples of size n, which is smaller than the population standard deviation of 8.

Now suppose we measure a characteristic, X, in a population and that this characteristic is dichotomous (e.g., success or failure). The Central Limit Theorem applies even to binomial populations like this, provided that the minimum of np and n(1-p) is at least 5, where "n" refers to the sample size, and "p" is the probability of "success" on any given trial.
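The criterion above is simple to check in code. The sketch below also returns the binomial mean np and standard deviation √(np(1-p)); the function names and the example values of n and p are hypothetical.

```python
from math import sqrt


def normal_approx_ok(n, p, threshold=5):
    """True when min(np, n(1-p)) is at least the threshold (5 in the text)."""
    return min(n * p, n * (1 - p)) >= threshold


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1-p)) of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))


# Hypothetical examples: same p, two different sample sizes.
for n in (10, 20):
    p = 0.3
    mu, sd = binomial_mean_sd(n, p)
    print(f"n={n}, p={p}: criterion met? {normal_approx_ok(n, p)}; "
          f"mean={mu:.1f}, SD={sd:.2f}")
```

With p = 0.3, a sample of 10 fails the check because np is only 3, while a sample of 20 passes.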

When this criterion is met, the normal probability model can be applied. We saw previously that the population mean and standard deviation for a binomial distribution are:

Mean: μ = np

Standard deviation: σ = √(np(1-p))

Note that when the minimum of np and n(1-p) is less than 5, we do not meet the sample size requirement for the Central Limit Theorem.

The sample size must be larger in order for the distribution to approach normality.

The Poisson distribution is another probability model that is useful for modeling discrete variables such as the number of events occurring during a given time interval.

For example, suppose you typically receive about 4 spam emails per day, but the number varies from day to day. Today you happened to receive 5 spam emails. What is the probability of that happening, given that the typical rate is 4 per day?
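This question can be answered with the Poisson probability mass function. The sketch below assumes the typical rate of 4 emails per day is the Poisson mean; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k events when the mean count per interval is rate."""
    return exp(-rate) * rate**k / factorial(k)


p_five_emails = poisson_pmf(5, rate=4)
print(f"P(exactly 5 spam emails | mean 4 per day) = {p_five_emails:.4f}")

# Probability of 5 or more in a day, via the complement of 0..4.
p_five_or_more = 1 - sum(poisson_pmf(k, 4) for k in range(5))
print(f"P(5 or more spam emails)                  = {p_five_or_more:.4f}")
```

Receiving exactly 5 emails on a day is therefore not unusual when the average is 4.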

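The Poisson probability is:

P(x; μ) = (e^(-μ) μ^x) / x!

where μ is the mean number of events per interval and x is the observed number of events. So, in the example above, P(5; 4) = (e^(-4) 4^5) / 5!, which is approximately 0.156.

We will begin looking at several different kinds of graphs and tables and seeing how we can take data from one interpretation and put it into another. Then we will connect this to how health related subjects can be summarized in a table or graph, and how that points to health disparities.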

Lesson 3 Resources packet.

This lesson presents students with another way of illustrating data in the form of a two-way frequency table. They will learn how to create one, how they are useful, and how they can be used to make predictions.

Lesson 4 Resources packet.

This lesson provides students with a clear outline of summative assessment for the unit. We will discuss how this unit ties into other classes that are focusing on similar issues regarding health concerns within their communities. Then we will discuss how the students will use the skills that they acquire throughout the unit to address a research question they generate.

Lesson 5 Resources packet.


