The report, by Jensen, seemed on balance to favor purposive selection. However, purposive selection was abandoned relatively soon as a method of sampling for obtaining national estimates in surveys in which many items were measured. It lacked the flexibility that later developments of probability sampling produced, it was unable to predict from the sample the accuracy to be expected in the estimates, and it used sampling units that were too large.
Gini and Galvani concluded that the probability method called stratified random sampling (Chapter 5), with the commune as a sampling unit, would have given better results than their method. In the notation of section 1. As mentioned in section 1. Furthermore, with probability sampling, we have formulas that give the mean and variance of the estimates.
How good is the estimate? The practical verification that approximately the stated proportion of statements is correct does much to educate and reassure administrators about the nature of sampling. Actually, the estimated standard deviation, like the estimate itself, is subject to a sampling error. With certain types of stratified sampling and with the method of replicated sampling section In some of the most common problems, particularly in the estimation of ratios, estimators that are otherwise convenient and suitable are found to be biased.
Even with estimators that are unbiased in probability sampling, errors of measurement and nonresponse may produce biases in the numbers that we are able to compute from the data. This happens, for instance, if the persons who refuse to be interviewed are almost all opposed to some expenditure of public funds, whereas those who are interviewed are split evenly for and against.
Suppose that we do not know that any bias is present. We will consider how the presence of bias distorts this probability. To do this, we calculate the true probability that the estimate is in error by more than 1. The two tails of the distribution must be examined separately. The results are shown in Table 1. TABLE 1. At this point the total probability is 0.
As the bias increases further, the disturbance becomes more serious. The two tails are affected differently. With a positive bias, as in this example, the probability of an underestimate by more than 1. The probability of the corresponding overestimate mounts steadily. In most applications the total error is the primary interest, but occasionally we are particularly interested in errors in one direction. As a working rule, the effect of bias on the accuracy of an estimate is negligible if the bias is less than one tenth of the standard deviation of the estimate.
In using these results, a distinction must be made between the two sources of bias mentioned at the beginning of this section. This troublesome problem is discussed in Chapter Use of the MSE as a criterion of the accuracy of an estimator amounts to regarding two estimates that have the same MSE as equivalent. Table 1. Because of the difficulty of ensuring that no unsuspected bias enters into estimates, we will usually speak of the precision of an estimate instead of its accuracy.
Each name is to have an equal chance of being drawn in the sample. What problems arise in the following common situations? All cards with the same name bear consecutive numbers and therefore appear together in the file. (c) Some names appear on more than one card, but cards bearing the same name may be scattered anywhere about the file. What kinds of frames might be tried for the following surveys?
Have the frames any serious weaknesses? For a current interview survey of the people in the city, what are the deficiencies of this frame? Can they be remedied by the interviewers during the course of the field work? In using the directory, would you draw a list of addresses (dwelling places) or a list of persons? For the total sample, the ratio of actual to book value was 1.
A proprietor of a parking lot finds that business is poor on Sunday mornings. What assumption must be made in order to answer this question? Do your results agree with the directions of the changes noted in Table 1. The estimate that gives the smaller expected loss is preferred, other things being equal. In practice a simple random sample is drawn unit by unit. The units in the population are numbered from 1 to N. A series of random numbers between 1 and N is then drawn, either by means of a table of random numbers or by means of a computer program that produces such a table.
At any draw the process used must give an equal chance of selection to any number in the population not already drawn. The units that bear these n numbers constitute the sample. It is easily verified that all N C n distinct samples have an equal chance of being selected by this method. Consider one distinct sample, that is, one set of n specified units. Random sampling with replacement is entirely feasible: at any draw, all N members of the population are given an equal chance of being drawn, no matter how often they have already been drawn.
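The unit-by-unit procedure can be sketched in a few lines. This is a minimal illustration, assuming a pseudorandom generator stands in for the table of random numbers; the function names are hypothetical. Numbers that have already been drawn are ignored and the draw is repeated, which gives every number not yet drawn an equal chance at each step.

```python
import random


def draw_srs(N, n, rng):
    """Draw n distinct unit numbers from 1..N, one unit at a time.

    A number that has already been drawn is rejected and another
    random number is taken, so sampling is without replacement.
    """
    if not 0 <= n <= N:
        raise ValueError("n must lie between 0 and N")
    chosen = []
    seen = set()
    while len(chosen) < n:
        k = rng.randint(1, N)  # uniform on 1..N, both ends included
        if k in seen:
            continue  # already in the sample: ignore and draw again
        seen.add(k)
        chosen.append(k)
    return chosen


def draw_with_replacement(N, n, rng):
    """Draw n unit numbers from 1..N; a unit may appear more than once."""
    return [rng.randint(1, N) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the draw can be repeated
    print(draw_srs(N=528, n=10, rng=rng))
    print(draw_with_replacement(N=528, n=10, rng=rng))
```

The values N = 528 and n = 10 are arbitrary. In practice `random.sample(range(1, N + 1), n)` gives the same kind of sample in one call; the explicit loop is shown only because it mirrors the draw-by-draw description.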
The formulas for the variances and estimated variances of estimates made from the sample are often simpler when sampling is with replacement than when it is without replacement. For this reason sampling with replacement is sometimes used in the more complex sampling plans, although at first sight there seems little point in having the same unit two or more times in the sample.
Among the larger tables are those published by the Rand Corporation — 1 million digits — and by Kendall and Smith — , digits. Numerous tables are available, many in standard statistical texts. Table 2. TABLE 2. If the first digit of N is a number between 5 and 9, the following method of selection is adequate. Select three columns from Table 2. Go down the three columns, selecting the first 10 distinct numbers between and These are 36, , , , , , , , , and For the last two numbers we jumped to columns 30 to In repeated selections it is advisable to vary the starting point in the table.
When the first digit of N is less than 5, some may still prefer this method if n is small and a large table of random digits is available. In a series of three-digit numbers, subtract from all numbers between and , from all numbers between and , from all numbers between and , from all numbers between and and, of course, from all numbers between and Using columns 05 to 07 in Table 2. In using this method with a number N like , note that one subtracts from a number between and , but automatically rejects all numbers greater than Subtraction of from numbers between and would give a higher probability of acceptance to remainders between and than to remainders between and Other methods of sampling are often preferable to simple random sampling on the grounds of convenience or of increased precision.
Simple random sampling serves best to introduce sampling theory. These properties of the units are referred to as characteristics or, more simply, as items. The values obtained for any specific item in the N units that comprise the population are denoted by $y_1, y_2, \ldots, y_N$. Note that the sample will not consist of the first n units in the population, except in the instance, usually rare, in which these units happen to be drawn. If this point is kept in mind, my experience has been that no confusion need result.
Capital letters refer to characteristics of the population and lowercase letters to those of the sample. For totals and means we have the following definitions. Proportion of units that fall into some defined class e. Estimation of the first three quantities is discussed in this chapter. The symbol ^ (a hat placed over a quantity) denotes an estimate of a population characteristic made from a sample. In this chapter only the simplest estimators are considered.
This has been done, we hope, only in instances in which it is clear from the context what the missing factor is. When studying any formula that is presented, the reader should make sure that he or she knows the specific method of sampling and method of estimation for which the formula has been established.
For simple random sampling it is obvious that $\bar{y}$ and $N\bar{y}$ are consistent estimates of the population mean and total, respectively. Consistency is a desirable property of estimators. On the other hand, an inconsistent estimator is not necessarily useless, since it may give satisfactory precision when n is small compared to N. Its utility is likely to be confined to this situation.
An estimator is consistent if the probability that it is in error by more than any given amount tends to zero as the sample becomes large. Exact statement of this definition requires care with complex survey plans. As we have seen, a method of estimation is unbiased if the average value of the estimate, taken over all possible samples of given size n , is exactly equal to the true population value.
If the method is to be unbiased without qualification, this result must hold for any population of finite values $y_i$ and for any n. To investigate whether $\bar{y}$ is unbiased with simple random sampling, we calculate the value of $\bar{y}$ for all $\binom{N}{n}$ samples and find the average of the estimates. The symbol E denotes this average over all possible samples. Theorem 2. The sample mean $\bar{y}$ is an unbiased estimate of $\bar{Y}$. To evaluate this sum, we find out in how many samples any specific value $y_i$ appears. A less cumbersome proof of theorem 2. This leads to the result. Its advantage is that most results take a slightly simpler form.
Provided that the same notation is maintained consistently, all results are equivalent in either notation. We now consider the variance of $\bar{y}$. Using 2. Corollary 1. They are given with a divisor N - 1 in place of N by writers who present results in terms of or. For instance, if S is the same in the two populations, a sample of from a population of , gives almost as precise an estimate of the population mean as a sample of from a population of 10, Persons unfamiliar with sampling often find this result difficult to believe and, indeed, it is remarkable.
To them it seems intuitively obvious that if information has been obtained about only a very small fraction of the population, the sample mean cannot be accurate. It is instructive for the reader to consider why this point of view is erroneous. The effect of ignoring the correction is to overestimate the standard error of the estimate $\bar{y}$.
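Both the unbiasedness of the sample mean and the effect of the finite population correction can be checked by brute force on a small population. The sketch below enumerates every possible sample of size n, averages the sample means, and compares their variance with the standard formula $V(\bar{y}) = \frac{S^2}{n}\cdot\frac{N-n}{N}$, where $S^2$ uses the divisor N - 1. The population values and the function name are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def check_srs_mean(y, n):
    """Enumerate all samples of size n drawn without replacement from y.

    Returns (E_ybar, Ybar, V_ybar, V_formula):
      E_ybar    average of the sample means over all samples
      Ybar      population mean
      V_ybar    variance of the sample means over all samples
      V_formula (S^2 / n) * (N - n) / N, with S^2 using divisor N - 1
    """
    N = len(y)
    Ybar = mean(y)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    sample_means = [mean(s) for s in combinations(y, n)]
    E_ybar = mean(sample_means)
    V_ybar = mean([(m - E_ybar) ** 2 for m in sample_means])
    V_formula = (S2 / n) * (N - n) / N
    return E_ybar, Ybar, V_ybar, V_formula


if __name__ == "__main__":
    population = [1.0, 3.0, 5.0, 8.0, 13.0, 20.0]  # hypothetical values
    E_ybar, Ybar, V_ybar, V_formula = check_srs_mean(population, n=3)
    print("E(ybar) =", E_ybar, " population mean =", Ybar)
    print("V(ybar) =", V_ybar, " formula =", V_formula)
    # Ignoring the correction (N - n) / N gives S^2 / n, which is larger.
```

Because the factor (N - n)/N is always below 1, dropping it can only inflate the variance, which is the overestimation of the standard error mentioned above.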
The following theorem, which is an extension of theorem 2. Apply theorem 2. By theorem 2. Hence these two terms cancel on the left and right sides of 2. The result of the theorem equation 2. The formulas involve $S^2$, the population variance. In practice this will not be known, but it can be estimated from the sample data. The relevant result is stated in theorem 2. Proof. By the argument of symmetry used in theorem 2. Furthermore, by theorem 2.
The reader should note the symbols employed for true and estimated variances of the estimates. The reasons for this assumption and its limitations are considered in section 2. The t distribution holds exactly only if the observations $y_i$ are themselves normally distributed and N is infinite. Moderate departures from normality do not affect it greatly. For small samples with very skew distributions, special methods are needed. Signatures to a petition were collected on 676 sheets. Each sheet had enough space for 42 signatures, but on many sheets a smaller number of signatures had been collected.
Since about half the sheets had the maximum number of signatures, 42, the data are presented as a frequency distribution. Note that the original distribution appears to be far from normal, the greatest frequency being at the upper end. Nevertheless, there is reason to believe from experience that the means of samples of 50 are approximately normally distributed. A complete count showed 21, signatures. In this expression the $a_i$ are random variables and the $y_i$ are a set of fixed numbers. Clearly Pr a,. It may be used to find higher moments of the distribution of $\bar{y}$, although for this purpose a method given by Tukey, with further development by Wishart, is more powerful.
In this event the ith unit may appear 0, 1, 2, ..., n times in the sample. Let $t_i$ be the number of times that the ith unit appears in the sample. In some applications the cost of measuring the distinct units in the sample may be predominating, so that the cost of the sample is proportional to the number of distinct units.
In this situation, Seth and J. Rao showed that for given average cost, $V(\bar{y})$ in sampling without replacement is less than $V(\bar{y}_d)$ in sampling with replacement. In a household survey examples are the average number of suits of clothes per adult male, the average expenditure on cosmetics per adult female, and the average number of hours per week spent watching television per child aged 10 to Ratios also appear in many other applications, for example, the ratio of loans for building purposes to total loans in a bank or the ratio of acres of wheat to total acres on a farm.
The sampling distribution of $\hat{R}$ is more complicated than that of $\bar{y}$ because both the numerator $\bar{y}$ and the denominator $\bar{x}$ vary from sample to sample. In small samples the distribution of $\hat{R}$ is skew and $\hat{R}$ is usually a slightly biased estimate of R. In large samples the distribution of $\hat{R}$ tends to normality and the bias becomes negligible.
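The small-sample bias of the ratio estimate can be seen directly by enumerating every sample from a small population. The sketch below is illustrative only: the (y, x) pairs and the function name are hypothetical, and the estimator is taken to be the ratio of the sample totals, which equals the ratio of the sample means.

```python
from itertools import combinations


def ratio_estimator_expectation(pairs, n):
    """Average the ratio estimate over all samples of size n.

    pairs is a list of (y_i, x_i) values, one per population unit.
    Returns (R, E_R_hat, number_of_samples), where R = sum(y) / sum(x)
    is the population ratio and E_R_hat is the mean of
    sum(y in sample) / sum(x in sample) over all possible samples.
    """
    R = sum(y for y, _ in pairs) / sum(x for _, x in pairs)
    estimates = [
        sum(y for y, _ in s) / sum(x for _, x in s)
        for s in combinations(pairs, n)
    ]
    return R, sum(estimates) / len(estimates), len(estimates)


if __name__ == "__main__":
    # Hypothetical population: (weekly food cost, family size).
    population = [(14.3, 2), (20.8, 3), (22.7, 3), (30.5, 5),
                  (41.2, 4), (28.2, 7), (24.2, 2), (30.0, 4)]
    for n in (2, 4, 6, 8):
        R, E_R_hat, count = ratio_estimator_expectation(population, n)
        print(f"n={n}  samples={count}  R={R:.4f}  E(R_hat)={E_R_hat:.4f}  "
              f"bias={E_R_hat - R:+.5f}")
```

When n equals N there is only one possible sample and the estimate coincides with R, so any bias the enumeration shows at smaller n is a property of the ratio form itself rather than of the data.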
The following approximate result will serve for most purposes: the distribution of $\hat{R}$ is studied in more detail in Chapter 6. In order to avoid having to work out the distribution of the ratio of two random variables $\bar{y} - R\bar{x}$ and $\bar{x}$, we replace $\bar{x}$ by $\bar{X}$ in the denominator of 2. This gives $\hat{R} - R \approx (\bar{y} - R\bar{x})/\bar{X}$.
This shows that to the order of approximation used here $\hat{R}$ is an unbiased estimate of R. From 2. Hence we can find $V(\hat{R})$ by applying theorem 2. The way in which theorem 2. It was shown that the formula in theorem 2. The same result, or its natural extension, holds also in more complex sampling situations and is used frequently later in this book.
One way to compute s R is to express it as ft I y. Since the sample is small, the data are intended only to illustrate the calculations. Estimate from the sample a the mean weekly expenditure on food per family, b the mean weekly expenditure on food per person, and c the percentage of the income that is spent on food. Compute the standard errors of these estimates. Weekly Expenditure on Food per Family. Weekly Expenditure on Food per Person.
Hence, from 2. This again is a ratio of two variables i? In a household survey separate estimates might be wanted for families with 0, 1, 2, ... The term domains of study has been given to these subpopulations by the U.N. Subcommission on Sampling. In the simplest situation each unit in the population falls into one of the domains. Although n is fixed, $n_j$ will vary from one sample of size n to another. The complication of a ratio estimate can be avoided by considering the distribution of $\bar{y}_j$ over samples in which both n and $n_j$ are fixed.
It follows that theorems 2. From theorem 2. If $N_j$ (the number of unpaid bills in the population) is known, there is no problem. The sample estimate is $N_j\bar{y}_j$ and its conditional standard error is $N_j$ times expression 2. This is multiplied by the known total amount receivable in the list. The device of keeping $n_j$ fixed as well as n does not help in this problem. Some students seem to have a psychological objection to doing this, but the method is sound. The methods of this and the preceding section also apply to surveys in which the frame used contains units that do not belong to the population as it has been defined.
An example illustrates this application. From a list of minor household expenditures a simple random sample of Certain types of expenditure on clothing and car upkeep were not considered relevant. Of the sample items, were relevant. The sum and uncorrected sum of squares of the relevant amounts in dollars were as follows.
In this example expenditures on car upkeep and clothing were excluded as not relevant and therefore were scored as zeros in the sample. For instance, in a survey of stores to estimate total sales of luggage, some stores do not handle luggage; certain area sampling units for farm studies contain no farms. Sometimes it is possible, by expenditure of effort, to identify and count the units that contribute nothing, so that in our notation $N - N_j$, and hence $N_j$, is known.
Consequently it is worth examining by how much $V(\hat{Y}_j)$ is reduced when $N_j$ is known. If $N_j$ is not known, 2. As might be expected, the reduction in variance due to a knowledge of $N_j$ is greater when the proportion of zero units is large and when the y values vary relatively little among the nonzero units. For further study of this problem, see Jessen and Houseman. One point should be noted. Instead, we test the null hypothesis that the two domains were drawn from infinite populations having the same mean.
Under this null hypothesis it may be proved (see exercise 2. In the theory of probability much study has been made of the distribution of means of random samples. It has been proved that for any population that has a finite standard deviation the distribution of the sample mean tends to normality as n increases see, e. This work relates to infinite populations. For sampling without replacement from finite populations, Hájek has given necessary and sufficient conditions under which the distribution of the sample mean tends to normality, following work by Erdős and Rényi and by Madow. Hájek assumes a sequence of values $n_\nu$, $N_\nu$ tending to infinity in such a way that $N_\nu - n_\nu$ also tends to infinity.
This imposing body of knowledge leaves something to be desired. The distributions of many types of economic enterprise (stores, chicken farms, towns) exhibit a marked positive skewness, with a few large units and many small units. The same kind of skewness is displayed by some biological populations e.
[Figure: frequency distribution of sizes of United States cities; horizontal axis: city size (thousands).]
Their inclusion would extend the horizontal scale to more than five times the length shown and would, of course, greatly accentuate the skewness. Figure 2. The distribution of the sample totals, and likewise of the means, is much more similar to a normal curve but still displays some positive skewness. From statistical theory and from the results of sampling experiments on skewed populations, some statements can be made about what usually happens to confidence probabilities when we sample from positively skew populations, as follows: 1.
The frequency with which the assertion y - 1. As an illustration, consider a variate y that is essentially binomially distributed, so that the exact distribution of y can be read from the binomial tables.
A simple random sample of size n shows a units that have the value ft and n - a units that have the value 0. There is no safe general rule as to how large n must be for use of the normal approximation in computing confidence limits. The rule attempts to control only the total frequency of wrong statements, ignoring the direction of the error of estimate. By calculating $G_1$, or an estimate, for a specific population, we can obtain a rough idea of the sample size needed for application of the normal approximation to compute confidence limits.
The result should be checked by sampling experiments whenever possible. The data in Table 2. The computation of $G_1$ is shown under the table. The computations are made on a coded scale, and, since $G_1$ is a pure number, there is no need to return to the original scale. Note that the first class-interval was slightly different from the others. Good sampling practice tends to make the normal approximation more valid.
Failure of the normal approximation occurs mostly when the population contains some extreme individuals who dominate the sample average when they are present. However, these extremes also have a much more serious effect of increasing the variance of the sample and decreasing the precision. Consequently, it is wise to segregate them and make separate plans for coping with them, perhaps by taking a complete enumeration of them if they are not numerous.
This removal of the extremes from the main body of the population reduces the skewness and improves the normal approximation. This technique is an example of stratified sampling, which is discussed in Chapter 5. This question has naturally attracted a good deal of work. These numbers that identify the units are often called the labels attached to the units. Early results proved by Horvitz and Thompson for linear estimators in simple random sampling are as follows. Godambe showed that in this class no unbiased estimate of Y exists with minimum variance for all populations.
Further properties of $\bar{y}$ have been developed by Hartley and J. Rao, Royall, and C. Rao. Hartley and Rao show that $\bar{y}$ has minimum variance among unbiased estimators of $\bar{Y}$ that are functions only of the $n_t$ and $y_t$. For random sampling with replacement, they show that the mean of the distinct values in the sample is the maximum likelihood estimator of $\bar{Y}$, although it does not have minimum variance in all populations.
Royall b has given a more general result. The influence of this work on sampling practice has been limited thus far, but should steadily increase. Some reference to it will be made from time to time. For reviews, see J. Calculate the sample mean $\bar{y}$ for all possible simple random samples of size 2. Verify that $\bar{y}$ is an unbiased estimate of $\bar{Y}$ and that its variance is as given in theorem 2. The numbers of persons per household in the sample were as follows.
Rao unless otherwise noted. The values to the nearest dollar are as follows. An advisor suggests that a simple random sample of 12 shelves will meet the requirements. Do you agree? Use this information to make an improved estimate of the total number of signatures and find the standard error of your estimate. The sample contained 54 public and 46 private colleges. In each case compute the standard error of your estimate. Calculate the standard error of the estimated total number of inhabitants in all cities for the following methods of sampling: (a) a simple random sample of size 50, (b) a sample that includes the five largest cities and is a simple random sample of size 45 from the remaining cities, (c) a sample that includes the nine largest cities and is a simple random sample of size 41 from the remaining cities.
For one item the variance is thought to be about 15 for both owners and renters. The standard error of the difference between the two domain means is not to exceed 1. How large a sample is needed (a) if owners and renters can be identified in advance of drawing the sample, (b) if not? An approximate answer will do in (b); an exact discussion requires binomial tables. A selects a simple random sample of 20 children and counts the number of decayed teeth for each child, with the following results. B, using the same dental techniques, examines all children, recording merely those who have no decayed teeth.
He finds 60 children with no decayed teeth. The company can either (a) compile the list and interview a simple random sample drawn from the eligible employees or (b) draw a simple random sample of all employees, interviewing only those eligible. The cost of rejecting those not eligible in the sample is assumed negligible. Ignore the fpc. Repeated sampling implies repetition of the drawing of both the sample and the subsample.
There has been some interest in finding smaller sets of samples of size n that have the same properties as the set of simple random samples. One set is that of balanced incomplete block (bib) designs. These are samples of n distinct units out of N such that (i) every unit appears in the same number r of samples, (ii) every pair of units appears together in $\lambda$ samples.
There is no general method for finding the smallest r for which a bib can be constructed. Avadhani and Sukhatme have shown how bib designs may be used in attempting to reduce travel costs between sampling units. The illustration is taken from an earlier example by Roy and Chakravarti. After the decision to take a simple random sample had been made, it was realized that $y_1$ would be unusually low and $y_N$ would be unusually high. For this situation, Särndal examined the following unbiased estimator of $\bar{Y}$.
Given the information in exercise 2. This estimator is an example of stratified sampling (Chapter 5) with three strata: $y_1$; $y_2$, ... Many of the results regularly published from censuses or surveys are of this form, for example, numbers of unemployed persons, the percentage of the population that is native-born. We suppose that every unit in the population falls into one of the two classes C and C'.
In statistical work the binomial distribution is often applied to estimates like a and p. As will be seen, the correct distribution for finite populations is the hypergeometric, although the binomial is usually a satisfactory approximation. For any unit in the sample or population, define $y_i$ as 1 if the unit is in C and as 0 if it is in C'.
In order to use the theorems in Chapter 2, we first express $S^2$ and $s^2$ in terms of P and p. Note that E y? Theorem 3. The variance of p is using 3. If p and P are the sample and population percentages, respectively, falling into class C. In the corollary of theorem 2. From a list of names and addresses, a simple random sample of names showed on investigation 38 wrong addresses. Estimate the total number of addresses needing correction in the list and find the standard error of this estimate.
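A calculation of this kind can be sketched as follows. The sketch assumes the usual unbiased variance estimate for a proportion under simple random sampling, $v(p) = \frac{N-n}{(n-1)N}\,pq$, with the estimated total taken as $Np$. The count of 38 wrong addresses comes from the example; the list size N = 3000 and sample size n = 200 are hypothetical values used only so that the code runs, and the function name is illustrative.

```python
import math


def estimate_class_total(N, n, a):
    """Estimate the number of population units in class C.

    N: population size, n: sample size, a: sample units found in C.
    Returns (estimated_total, standard_error_of_total), using
    v(p) = (N - n) / ((n - 1) * N) * p * q and total = N * p.
    """
    p = a / n
    q = 1.0 - p
    var_p = (N - n) / ((n - 1) * N) * p * q
    return N * p, N * math.sqrt(var_p)


if __name__ == "__main__":
    # a = 38 is from the example; N and n are assumed values.
    total, se = estimate_class_total(N=3000, n=200, a=38)
    print(f"estimated total = {total:.0f}, standard error = {se:.1f}")
```

Replacing the factor (N - n) by N in the variance gives the form that ignores the finite population correction, which is slightly larger.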
To remove it, replace the term N-n by N. The preceding formulas for the variance and the estimated variance of p hold only if the units are classified into C or C' so that p is the ratio of the number of units in C in the sample to the total number of units in the sample. In many surveys each unit is composed of a group of elements, and it is the elements that are classified.
Appropriate methods are given in section 3. If the fpc is ignored, we have n The function PQ and its square root are shown in Table 3. These functions may be regarded as the variance and standard deviation, respectively, for a sample of size 1. The functions have their greatest values when the population is equally divided between the two classes, and are symmetrical about this point. This approach is not appropriate when interest lies in the total number of units in the population that are in class C.
Thus we tend to think of the standard error expressed as a fraction or percentage of the true value, NP. TABLE 3. Very large samples are needed for precise estimates of the total number possessing any attribute that is rare in the population. This gives a sample size of Simple random sampling, or any method of sampling that is adapted for general purposes, is an expensive method of estimating the total number of units of a scarce type.
In sampling without replacement, the proportion keeps changing in this way throughout the draw. In the present section these variations are ignored, that is, P is assumed constant. This amounts to assuming that A and N-A are both large relative to the sample size n , or that sampling is with replacement. With this assumption, the process of drawing the sample consists of a series of n trials, in each of which the probability that the unit drawn is in C is P.
This situation gives rise to the familiar binomial frequency distribution for the number of units in C in the sample. There are three comprehensive sets of tables. All give P by intervals of 0. The ranges for n are as follows. The numbers of units in the two classes C and C' in the population are A and A', respectively. To find the probability wanted, we count how many of these samples contain exactly a units from C and a' from C'.
Each selection of the first type can be combined with any one of the second to give a different sample of the required type. The distribution is called the hypergeometric distribution. For computing purposes the hypergeometric probability 3. A family of eight contains three males and five females.
Find the frequency distribution of the number of males in a simple random sample of size 4. In the sample, a out of n fall in class C. Suppose that inferences are to be made about the number A in the population that fall in class C.
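The family-of-eight example can be worked by direct counting. The sketch below computes the hypergeometric probabilities as the number of ways of choosing a units from C times the number of ways of choosing the remaining n - a from C', divided by the number of samples of size n; exact fractions are used to avoid rounding. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(A, A_prime, n):
    """Distribution of the number of sample units that fall in class C.

    A units of the population are in C and A_prime are in C'.
    Returns a dict mapping a (the sample count in C) to its probability
    for a simple random sample of size n drawn without replacement.
    """
    N = A + A_prime
    total_samples = comb(N, n)
    a_min = max(0, n - A_prime)  # cannot take more than A_prime from C'
    a_max = min(A, n)            # cannot take more than A from C
    return {
        a: Fraction(comb(A, a) * comb(A_prime, n - a), total_samples)
        for a in range(a_min, a_max + 1)
    }


if __name__ == "__main__":
    # Three males (class C) and five females (class C'), sample of size 4.
    for males, prob in hypergeometric_pmf(A=3, A_prime=5, n=4).items():
        print(males, prob)
```

For this example there are 70 possible samples, and the probabilities of 0, 1, 2, and 3 males are 5/70, 30/70, 30/70, and 5/70, that is, 1/14, 3/7, 3/7, and 1/14.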
For an upper confidence limit to A, we compute a value $A_U$ such that for this value the probability of getting a or less falling in C in the sample is some small quantity $\alpha_U$, for example, 0. When $\alpha_U$ is chosen in advance, 3. Numerous methods are available for computing confidence limits. Values for intermediate population sizes are obtainable by interpolation. Lieberman and Owen give tables of individual and cumulative terms of the hypergeometric distribution, but N extends only to
The Normal Approximation
From 3.
The last term on the right is a correction for continuity. This produces only a slight improvement in the approximation. However, without the correction, the normal approximation usually gives too narrow a confidence interval.
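The role of the continuity correction is easy to see in code. The sketch below assumes the usual normal-approximation limits for a proportion under simple random sampling, $p \pm \left[ z\sqrt{1-f}\,\sqrt{pq/(n-1)} + \frac{1}{2n} \right]$ with $f = n/N$. The function name, the `continuity` flag, and the numerical inputs are illustrative assumptions.

```python
import math


def normal_limits_for_proportion(N, n, a, z=1.96, continuity=True):
    """Approximate confidence limits for a population proportion P.

    N: population size, n: sample size, a: sample units in class C.
    Uses the estimated standard error with the finite population
    correction and, optionally, the continuity correction 1 / (2n).
    Limits are clipped to the interval [0, 1].
    """
    p = a / n
    q = 1.0 - p
    f = n / N
    half_width = z * math.sqrt(1.0 - f) * math.sqrt(p * q / (n - 1))
    if continuity:
        half_width += 1.0 / (2 * n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration.
    N, n, a = 500, 100, 37
    lower, upper = normal_limits_for_proportion(N, n, a)
    print(f"limits for P: {lower:.3f} to {upper:.3f}")
    print(f"limits for the total: {N * lower:.0f} to {N * upper:.0f}")
    narrow = normal_limits_for_proportion(N, n, a, continuity=False)
    print(f"without the correction: {narrow[0]:.3f} to {narrow[1]:.3f}")
```

Switching the correction off narrows the interval by 1/(2n) on each side, which is small for large n but matters when the sample is modest.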
The error in the normal approximation depends on all the quantities n, p, N, $\alpha_U$, and $\alpha_L$. The quantity to which the error is most sensitive is np or more specifically the number observed in the smaller class.
Table 3. The rules in Table 3. Furthermore, the probability that the upper limit is below P is between 2. Example 1. In a simple random sample of size 100, from a population of size , there are 37 units in class C. To find limits for the total number in class C in the population, we multiply by N, obtaining and , respectively.
Binomial Approximations
When the normal approximation does not apply, limits for P may be found from the binomial tables section 3.
Example 2 shows how the binomial approximation is computed. The Fisher-Yates tables give 0. Burstein has produced a variant of this calculation that is slightly more accurate. Suppose that a units out of n are in class C (in this example, a = 9, n = ...). Example 3. In auditing records in which a very low error rate is demanded, the upper confidence limit for A is primarily of interest.
Suppose that of records are verified and that the batch of is accepted if no errors are found. Special tables have been constructed to give the upper confidence limit for the number of errors in the batch. A good approximation results from the following relation. The probability that no errors are found in n when A errors are present in N is, from the hypergeometric distribution, (N-A)(N-A-1)...(N-A-n+1) / [N(N-1)...(N-n+1)]. Thus a sample from a human population may be arranged in 15 five-year age groups. This is the appropriate extension of the binomial distribution and is a good approximation when the sampling fraction is small.
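The auditing example can be sketched numerically. The code below evaluates the hypergeometric probability of finding no errors in a sample of n when the batch of N contains A errors, and searches for an upper limit on A. The function names are hypothetical, and the convention used for the limit (the smallest A whose zero-error probability is at or below α) is one assumed reading of the discrete case, not the tables mentioned in the text.

```python
from math import comb

def prob_no_errors(N, n, A):
    """P(no errors in a simple random sample of n | A errors among N records).

    Equals (N-A)(N-A-1)...(N-A-n+1) / [N(N-1)...(N-n+1)],
    written here as a ratio of binomial coefficients.
    """
    return comb(N - A, n) / comb(N, n)

def upper_limit_errors(N, n, alpha=0.05):
    """Smallest A for which a clean sample has probability <= alpha (assumed convention)."""
    for A in range(0, N - n + 2):
        if prob_no_errors(N, n, A) <= alpha:
            return A
    return N

# Hypothetical batch: 10 records, 5 verified, none found in error.
print(prob_no_errors(10, 5, 1))   # 0.5
print(upper_limit_errors(10, 5))  # 4
```

With larger batches the same functions apply unchanged, since Python integers do not overflow in the binomial coefficients.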
The numerator is the number of distinct samples of size n that can be formed with a_1 units in class 1, a_2 in class 2, and a_3 in class 3. Case 1. The theory already presented applies to this case. Confidence limits are calculated as described in section 3. Case 2. Sometimes certain classes are omitted, p being computed from a breakdown of the remaining classes into two parts. Ratios that are structurally of this type are often of interest in sample surveys. The denominator of such a ratio is not n but some smaller number n'.
Although n' varies from sample to sample, previous results can still be used by considering the conditional distribution of p in samples in which both n and n' are fixed. This device was already employed in section 2. Hence, from 3. Dividing this numerator by 3. There are 10 possible samples of size 3, all with equal initial probabilities.
These are grouped according to the value of n'. In a conditional approach the variance changes with the configuration of the sample that was drawn. The sample data may be presented as follows. Domain 1 Domain The frequency distribution and confidence limits for p_1 were discussed under Case 2 in sections 3.
For estimating the total number A_1 of units in class C in domain 1, there are two possibilities.

             Domain 1   Domain 2
  C          a_1        a_2
  C'         a_1'       a_2'
  Total      n_1        n_2

The ordinary χ^2 test (Fisher) or the normal approximation to the distribution of p_1 - p_2 is appropriate. Similarly, comparisons among proportions for more than two domains are made by the methods for a 2 × k contingency table. A group of 61 leprosy patients were treated with a drug for 48 weeks.
To measure the effect of the drug on the leprosy bacilli, the presence of bacilli at six sites on the body of each patient was tested bacteriologically. Among the sites, , or What is the standard error of this percentage? This example comes from a controlled experiment rather than a survey, but it illustrates how erroneous the binomial formula may be. To find the standard error by the correct formula, we need the frequency distribution of the 61 values of p_i. It is more convenient to tabulate the distribution of y_i, the number of negative sites per patient. From the distribution in Table 3.
The binomial formula requires the assumption that results at different sites on the same patient are independent, although actually they have a strong positive correlation. The last line of Table 3. It is slightly biased, although the bias is seldom likely to be of practical importance. If we put a_i for y_i and m_i for x_i in 2. An alternative expression is 3. Example 2. A simple random sample of 30 households was drawn from a census taken in in wards 6 and 7 of the Eastern Health District of Baltimore. The population contains about 15, households. In Table 3. Our purpose is to contrast the ratio formula with the inappropriate binomial formula.
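The contrast between the two formulas can be sketched on made-up data. The code below assumes the standard ratio-estimate form v(p) = (1 - f) Σ(a_i - p·m_i)^2 / [n·m̄^2·(n - 1)], which is not legible in this extract, and compares it with a binomial variance pq/Σm_i that ignores the fpc. The household figures are hypothetical and are not the Baltimore data.

```python
def ratio_variance(a, m, f=0.0):
    """Variance estimate for p = sum(a)/sum(m), with clusters as sampling units.

    a[i]: number of elements in class C in cluster i
    m[i]: number of elements in cluster i
    f:    sampling fraction of clusters (0 ignores the fpc)
    Assumed form: (1 - f) * sum((a_i - p*m_i)^2) / (n * m_bar^2 * (n - 1)).
    """
    n = len(m)
    p = sum(a) / sum(m)
    m_bar = sum(m) / n
    ss = sum((ai - p * mi) ** 2 for ai, mi in zip(a, m))
    return (1.0 - f) * ss / (n * m_bar ** 2 * (n - 1))

def binomial_variance(a, m):
    """Binomial variance p*q / (total elements), treating elements as independent."""
    total = sum(m)
    p = sum(a) / total
    return p * (1.0 - p) / total

# Hypothetical households in which members act alike (all or none in class C).
sizes = [2, 3, 4, 3]
alike = [2, 0, 4, 0]
print(ratio_variance(alike, sizes), binomial_variance(alike, sizes))

# Hypothetical households that each split exactly half and half.
sizes2 = [2, 4, 2, 4]
split = [1, 2, 1, 2]
print(ratio_variance(split, sizes2), binomial_variance(split, sizes2))
```

In the first data set the ratio variance is several times the binomial figure; in the second it falls to zero, below the binomial figure. These are the two directions of error that the doctor-visit and sex-ratio examples describe.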
Consider first the proportion of people who had consulted a doctor. For various reasons, families differ in the frequency with which their members consult a doctor. For the sample as a whole, the proportion who consult a doctor is only a little more than one in four, but there are several families in which every member has seen a doctor.
Similar results would be obtained for any characteristic in which the members of the same family tend to act in the same way. In estimating the proportion of males in the population, the results are different. The reason is interesting. Most households are set up as a result of a marriage, hence contain at least one male and one female.
Consequently the proportion of males per family varies less from one half than would be expected from the binomial formula. None of the 30 families, except one with only one member, is composed entirely of males, or entirely of females. If the binomial distribution were applicable, with a true P of approximately one half, households with all members of the same sex would constitute one quarter of the households of size 3 and one eighth of the households of size 4.
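The one-quarter and one-eighth figures follow directly from the binomial distribution with P = 1/2: a household of size m is all male with probability P^m and all female with probability (1 - P)^m. A minimal check, with a hypothetical function name:

```python
from fractions import Fraction

def prob_single_sex(m, P=Fraction(1, 2)):
    """Binomial probability that all m household members are of the same sex."""
    return P ** m + (1 - P) ** m

print(prob_single_sex(3))  # 1/4
print(prob_single_sex(4))  # 1/8
```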
This property of the sex ratio has been discussed by Hansen and Hurwitz. Other illustrations of the error committed by improper use of the binomial formula in sociological investigations have been given by Kish. Verify that N-n is an unbiased estimate of the variance of p. Work out the conditional distributions of this proportion, p, and verify the formula for its conditional variance. Find the average variance of p in exercise 3.
This is 0. Each family was asked whether it owned or rented the house and also whether it had the exclusive use of an indoor toilet. Results were as follows. Owned Rented Total Exclusive use of toilet Yes No Yes No 6 34 (a) For families who rent, estimate the percentage in the area with exclusive use of an indoor toilet and give the standard error of your estimate; (b) estimate the total number of renting families in the area who do not have exclusive indoor toilet facilities and give the standard error of this estimate.
State the conditions under which knowledge of N r produces large reductions in variance. Try also the method on p. Some students made no visits. A simple random sample of n eligible students is taken. In it n_1 students out of the n made at least one visit and their total number of visits was y. Their total number of visits was and the estimated variance s^2 was 1. As the intracluster correlation varies, what are the highest and lowest possible values of the true variance of p (the sample estimate of P), and how do they compare with the binomial variance?
Estimate the variance of the proportion of persons who saw a dentist, and compare this with the binomial estimate of the variance. For further discussion, see Finney, , and Sandelius, , who considers a plan in which sampling continues until either m have been found or the total sample size has reached a preassigned limit n_0.
See also section 4. The decision is important. Too large a sample implies a waste of resources, and too small a sample diminishes the utility of the results. The decision cannot always be made satisfactorily; often we do not possess enough information to be sure that our choice of sample size is the best one. Sampling theory provides a framework within which to think intelligently about the problem. A hypothetical example brings out the steps involved in reaching a solution.
An anthropologist is preparing to study the inhabitants of some island.
Among other things, he wishes to estimate the percentage of inhabitants belonging to blood group O. Cooperation has been secured so that it is feasible to take a simple random sample. How large should the sample be? This question cannot be discussed without first receiving an answer to another question. How accurately does the anthropologist wish to know the percentage of people with blood group O? The anthropologist replies coldly that he is aware of this, that he is willing to take a 1 in 20 chance of getting an unlucky sample, and that all he asks for is the value of n instead of a lecture on statistics.
We are now in a position to make a rough estimate of n. To simplify matters, the fpc is ignored, and the sample percentage p is assumed to be normally distributed. Whether these assumptions are reasonable can be verified when the initial n is known. A formula for n has been obtained, but n depends on some property of the population that is to be sampled. In this instance the property is the quantity P that we would like to measure. We therefore ask the anthropologist if he can give us some idea of the likely value of P. This information is sufficient to provide a usable answer. The corresponding n lies between and To be on the safe side, is taken as the initial estimate of n.
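The rough calculation can be sketched as follows, under the same simplifications (fpc ignored, p taken as normally distributed). The formula n0 = t^2·P·Q/d^2, the rounded multiplier t = 2 for a 1 in 20 risk, and all numerical inputs are assumptions for illustration; the figures in the original example are not legible in this extract. The optional readjustment n = n0 / (1 + (n0 - 1)/N) is the usual form for proportions and is likewise assumed.

```python
import math

def initial_sample_size(P_pct, d_pct, t=2.0):
    """First approximation n0 = t^2 * P * Q / d^2, with P, Q, d in percent.

    P_pct: advance guess at the population percentage
    d_pct: desired limit of error, in percentage points
    t:     normal deviate for the accepted risk (2 is about a 1 in 20 chance)
    """
    Q_pct = 100.0 - P_pct
    return t ** 2 * P_pct * Q_pct / d_pct ** 2

def adjust_for_fpc(n0, N):
    """Readjust n0 when the population size N is not large relative to n0."""
    return n0 / (1.0 + (n0 - 1.0) / N)

# Hypothetical inputs: P guessed to lie between 30% and 50%, limit of error 5 points.
for guess in (30.0, 50.0):
    print(guess, math.ceil(initial_sample_size(guess, 5.0)))

# Hypothetical island population of 3200 people.
print(math.ceil(adjust_for_fpc(400.0, 3200)))
```

Because PQ is largest at P = 50%, taking the guess nearest 50% gives the safe-side value of n, which matches the reasoning in the text.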
The assumptions made in this analysis can now be reexamined. Whether the fpc is required depends on the number of people on the island. The method of applying the readjustment, if it is needed, is discussed in section 4. There must be some statement concerning what is expected of the sample. This statement may be in terms of desired limits of error, as in the previous example, or in terms of some decision that is to be made or action that is to be taken when the sample results are known.
The responsibility for framing the statement rests primarily with the persons who wish to use the results of the survey, although they frequently need guidance in putting their wishes into numerical terms. Some equation that connects n with the desired precision of the sample must be found.
The equation will vary with the content of the statement of precision and with the kind of sampling that is contemplated. One of the advantages of probability sampling is that it enables this equation to be constructed. This equation will contain, as parameters, certain unknown properties of the population. These must be estimated in order to give specific results. It often happens that data are to be published for certain major subdivisions of the population and that desired limits of error are set up for each subdivision.
A separate calculation is made for the n in each subdivision, and the total n is found by addition. More than one item or characteristic is usually measured in a sample survey: sometimes the number of items is large. If a desired degree of precision is prescribed for each item, the calculations lead to a series of conflicting values of n, one for each item.
Some method must be found for reconciling these values. Finally, the chosen value of n must be appraised to see whether it is consistent with the resources available to take the sample. This demands an estimation of the cost, labor, time, and materials required to obtain the proposed size of sample.
It sometimes becomes apparent that n will have to be drastically reduced. A hard decision must then be faced — whether to proceed with a much smaller sample size, thus reducing precision, or to abandon efforts until more resources can be found. In succeeding sections some of these questions are examined in more detail. This amount is determined, as best we can, in the light of the uses to which the sample results are to be put. Sometimes it is difficult to decide how much error should be tolerated, particularly when the results have several different uses.
He might reply that the blood group data are to be used primarily for racial classification. In this respect the example is typical of the way in which a limit of error is often decided on. In fact, the anthropologist was more certain of what he wanted than many other scientists and administrators will be found to be. When the question of desired degree of precision is first raised, such persons may confess that they have never thought about it and have no idea of the answer. My experience has been, however, that after discussion they can frequently indicate at least roughly the size of a limit of error that appears reasonable to them.
Further than this we may not be able to go in many practical situations. Even when these consequences are known, however, the results of many important surveys are used by different people for different purposes, and some of the purposes are not foreseen at the time when the survey is planned. Therefore, an element of guesswork is likely to be prominent in the specification of precision for some time to come.
If the sample is taken for a very specific purpose, e. A general approach to problems of this type is given in section 4. From theorem 3. If not, it is apparent on comparison of 4. From 4. But with a rare item e. The method is usually called inverse sampling. For N. Thus, by fixing m in advance, we can control the value of cv(p) without advance knowledge of P. The value of n with this method is a random variable, but will be large if P is small. We assume that ȳ is normally distributed: from theorem 2. This is often more stable and easier to guess in advance than S itself.
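The behaviour of n under inverse sampling can be illustrated by simulation. The sketch below draws units one at a time until m units with the rare attribute have been found, treating the population as effectively infinite so that each draw is an independent trial with probability P. That independence is a simplifying assumption, and the function name and the values P = 0.05, m = 10 are hypothetical.

```python
import random

def inverse_sample_size(P, m, rng):
    """Number of draws needed to find m units with the rare attribute.

    Each draw is treated as an independent trial with success probability P
    (an infinite-population approximation to sampling without replacement).
    """
    n = 0
    found = 0
    while found < m:
        n += 1
        if rng.random() < P:
            found += 1
    return n

rng = random.Random(12345)
sizes = [inverse_sample_size(0.05, 10, rng) for _ in range(2000)]
mean_n = sum(sizes) / len(sizes)
print(min(sizes), round(mean_n, 1), max(sizes))
```

Under this approximation the expected n is m/P, so the average here should be near 200, and individual runs vary widely around it. This illustrates why n is a random variable that becomes large when P is small.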
As a first approximation we take 4. The quantity C is the desired (cv)^2 of the sample estimate. In nurseries that produce young trees for sale it is advisable to estimate, in late winter or early spring, how many healthy young trees are likely to be on hand, since this determines policy toward the solicitation and acceptance of orders. A study of sampling methods for the estimation of the total numbers of seedlings was undertaken by Johnson. The data that follow were obtained from a bed of silver maple seedlings 1 ft wide and ft long.
The appropriate formulas for other methods of sampling and estimation are presented with the discussion of these techniques. In practice, there are four ways of estimating population variances for sample size determinations: (1) by taking the sample in two steps, the first being a simple random sample of size n_1 from which estimates s_1^2 or p_1 of S^2 or P and the required n will be obtained; (2) by the results of a pilot survey; (3) by previous sampling of the same or a similar population; and (4) by guesswork about the structure of the population, assisted by some mathematical results.
Method 1 gives the most reliable estimates of S^2 or P, but it is not often used, since it slows up the completion of the survey. A few results are quoted. When this is not so, see Cox. The combined size of the first two samples should be

  n = p_1 q_1 / V + (3 - 8 p_1 q_1) / (p_1 q_1) + (1 - 3 p_1 q_1) / (V n_1)

With this method, the ordinary binomial estimate p made from the complete sample of size n is slightly biased. In all results given above the fpc is ignored.
A sampler wishes to estimate P with a coefficient of variation of 0. Equation 4. The second method, a small pilot survey, serves many purposes, especially if the feasibility of the main survey is in doubt. If the pilot survey is itself a simple random sample, the preceding methods apply.
But often the pilot work is restricted to a part of the population that is convenient to handle or that will reveal the magnitude of certain problems. Allowance must be made for the selective nature of the pilot when using its results to estimate S 2 or P. For instance, a common practice is to confine the pilot work to a few clusters of units. Thus the computed s 2 measures mostly the variation within a cluster and may be an underestimate of the relevant S 2.
The relation between intra- and intercluster variation is discussed in Chapter 9. Cornfield 1 gives a good illustration of the estimation of sample size in cluster sampling for proportions. Method 3, the use of results from previous surveys, points to the value of making available, or at least keeping accessible, any data on standard deviations obtained in previous surveys. Unfortunately, the cost of computing standard deviations in complex surveys is high, even with electronic machines, and frequently only those s.
If suitable past data are found, the value of S 2 may require adjustment for time changes.