INTRAVAL. Bureau for social-scientific research and advice
 
Between the Lines
Chapter 6    Spread, dispersion and extent (part 2)
6.2    Estimation of extent
The number of cocaine users in Rotterdam is estimated with the aid of two estimators (13). Both estimators use data obtained from a snowball sample of the population and are based on a random 'one-wave' sample. The randomly selected respondents form the initial sample; their number is denoted by n. These respondents then nominate other members of the population who meet the inclusion criteria. The collection of newly nominated users forms the first 'snowball wave'. In the Rotterdam survey, the respondents who were originally approached form the initial sample. The other cocaine users whom they nominate must meet the following inclusion criterion: they must have used cocaine at least 25 times and/or at least five times in the last six months. Furthermore, the nominees must live in Rotterdam.
The first estimator v1 is defined as:
estimator v1
 
in which T01 is the number of nominations of new users by the initial respondents (a user who is named more than once is counted more than once), and
T00 is the number of times that respondents nominate other respondents, the so-called respondent-referrals.
 
The second estimator v2 is defined as:
estimator v2
 
in which M is the number of new users nominated by the initial respondents (a user who is named more than once is counted only once), and
M00 is the number of respondent-referrals in which a respondent who is nominated more than once is counted only once.
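The four counts defined above can be illustrated with a short sketch. It assumes that nominations are recorded as (nominator, nominee) pairs and that every person has a unique identifier; the function name, the data layout and the toy data are hypothetical and do not come from the Rotterdam survey.

```python
def referral_counts(nominations, initial_sample):
    """Compute T01, T00, M and M00 from (nominator, nominee) pairs.

    nominations    -- list of (nominator, nominee) tuples (assumed layout)
    initial_sample -- identifiers of the initial respondents
    """
    initial_sample = set(initial_sample)
    # Only nominations made by initial respondents are considered.
    made = [(a, b) for a, b in nominations if a in initial_sample]
    inside = [b for _, b in made if b in initial_sample]       # respondent-referrals
    outside = [b for _, b in made if b not in initial_sample]  # new users
    return {
        "T01": len(outside),       # new users, counted once per nomination
        "T00": len(inside),        # respondent-referrals, counted once per nomination
        "M": len(set(outside)),    # distinct new users
        "M00": len(set(inside)),   # distinct nominated respondents
    }


# Hypothetical toy data: r1, r2, r3 are initial respondents, u1 and u2 are new users.
toy_nominations = [("r1", "r2"), ("r3", "r2"), ("r1", "u1"), ("r2", "u1"), ("r3", "u2")]
counts = referral_counts(toy_nominations, {"r1", "r2", "r3"})
print(counts)
```

In the toy data, user u1 and respondent r2 are each named twice, so T01 and T00 exceed M and M00 by one.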
 
Both estimators can be understood in terms of the capture-recapture idea, in which capture is interpreted as being drawn in the initial sample and recapture as being nominated by users from the initial sample, in other words the respondent-referrals (14). As stated above, both estimators assume a random sample. This important assumption cannot be fulfilled in the Rotterdam study: cocaine users form a hidden population and, moreover, no sampling frame is available. We have therefore tried, by means of various targets, to find initial respondents who are as far removed from each other as possible. The degree to which this variation, intended in principle as an approximation of a random sample, has succeeded depends largely on how independently the field workers searched for the various respondents. If, for example, two respondents are found in the same pub on the same evening, or if this pub turns out to be the regular pub of both, there is a good chance that they will nominate each other. This chance is smaller when two respondents are found in different pubs that are geographically far apart.
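The capture-recapture reading can be made concrete with a small sketch. The formulas for v1 and v2 are not reproduced in this text, so the form used below is an assumption and not necessarily the estimator of Frank and Snijders (note 13). It reasons that, if initial respondents are nominated at the same rate as everyone else, the share of all nominations that fall inside the initial sample approximates the share of the population that was sampled. The function name and the numbers are hypothetical.

```python
def capture_recapture_style_estimate(n_initial, inside, outside):
    """Illustrative population estimate (assumed form, not taken from the source).

    n_initial -- number of initial respondents
    inside    -- nominations that fall inside the initial sample (respondent-referrals)
    outside   -- nominations of users outside the initial sample
    """
    if inside <= 0:
        raise ValueError("at least one respondent-referral is needed")
    # inside / (inside + outside) is taken as an estimate of n_initial / population size.
    return n_initial * (inside + outside) / inside


# Hypothetical numbers, not the Rotterdam data.
example = capture_recapture_style_estimate(n_initial=50, inside=10, outside=190)
print(example)
```

Under this assumed form, more respondent-referrals for the same number of outside nominations give a lower estimate.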
In Rotterdam, a total of 110 respondents were interviewed. Of these, 84 form the initial sample (the other 26 are extensions). Nine of the 84 initial respondents did not cooperate in nominating other users. Three of these nine were nominated by other initial respondents and are therefore treated as nominees. The same applies to five other respondents who did cooperate in nominating other users, but were found in a non-random way (e.g. through acquaintances of the field workers). We are therefore left with n = 70 respondents and M = 824 nominees for the first estimates.
estimator    estimate    standard error
v1           2,278       (505)
v2           2,777       (629)
 
These estimates are low compared with the estimates from other sources (Intraval 1989, Toet and Geurs 1992). Closer analysis shows that this is due to the high number of respondent-referrals among the respondents, which indicates that, despite the precautions taken, the initial respondents were not all found independently of each other. Diagram 6.3 shows the initial sample after the respondent-referrals have been analyzed in more detail.
Diagram 6.3    Subdivision of respondent-referrals over the entries in the sample
 
   
There appear to be two distinct entries. Respondents found in the entertainment circuit name other respondents who were also found in the entertainment circuit. Respondents found through the assistance agencies or prison, together with respondents found through advertisements or field work, form a single entry, because two respondents from the advertisement/field work entry nominated two other respondents from the assistance/prison entry.
In order to arrive at reliable estimates, it is necessary to check which respondents in the initial sample were not found independently of each other. If, for example, a respondent has told a field worker 'you should take a look there' and a new respondent is indeed found, then this new respondent cannot be considered to have been found independently, but is a selective extension. Respondents are also not found independently when two of them come to an interview together or are found together in the field. In total, 24 respondents appear not to have been found independently of each other. In order to meet the assumption of a random sample to a reasonable degree, one of the 'simultaneously' found respondents has to be eliminated by drawing lots. The estimators can then be calculated for the reduced dataset. This was done ten times, giving ten estimation populations from which respondents who were not independent of each other had been excluded. If two respondents A and B were found in a non-random manner, then respondent A may, for example, be excluded in the first estimation population, respondent B in the second, respondent B again in the third, and so on. The same is done for all groups of respondents who were not found independently of each other.
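The procedure of drawing lots and averaging over ten reduced datasets can be sketched as follows. This is a minimal illustration under stated assumptions: the groups of respondents found together are assumed to be disjoint, one member per group is dropped in each draw, and `estimator` is a hypothetical callable that turns a list of retained respondents into an estimate. None of the names come from the study.

```python
import random
from statistics import mean


def draw_independent_subsample(respondents, dependent_groups, rng):
    """Drop one randomly chosen member from each group of respondents found together."""
    excluded = {rng.choice(sorted(group)) for group in dependent_groups}
    return [r for r in respondents if r not in excluded]


def averaged_estimate(respondents, dependent_groups, estimator, draws=10, seed=0):
    """Average the estimator over several reduced datasets obtained by drawing lots."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [
        estimator(draw_independent_subsample(respondents, dependent_groups, rng))
        for _ in range(draws)
    ]
    return mean(estimates)


# Hypothetical toy data: A and B were found together, as were C and D.
toy_respondents = ["A", "B", "C", "D", "E", "F"]
toy_groups = [{"A", "B"}, {"C", "D"}]
# len is only a placeholder for a real population estimator.
print(averaged_estimate(toy_respondents, toy_groups, estimator=len))
```

In a real analysis the placeholder `len` would be replaced by a function that recomputes the referral counts and the estimate for the retained respondents.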
The averages for the estimators are:
estimator    estimate    standard error
v1'          8,675       (4,754)
v2'          8,545       (4,620)
 
Another way to estimate the number of users is to make an estimate per entry. Diagram 6.3 shows that there are two separate entries. Of the 23 respondents from the entertainment circuit, ten were not found randomly and independently of each other. Ten estimates were made for this entry, in which the respondents who were not found independently of each other were excluded in turn. The same was done for the respondents found through the other entry (assistance agencies and prison, together with advertisements and field work). Of the 47 respondents found in this way, 14 had not been found independently of each other. When the estimates for the two entries are added together, the averages for the estimators are:
estimator    estimate    standard error
v1''         8,833       (3,596)
v2''         8,701       (3,320)
 
Both approaches, i.e. the one for all randomly selected initial respondents together and the one for the two separate entries, arrive at estimates of around 8,700 users. The high standard errors indicate how imprecise the estimates are. We have several reasons to believe that these estimates are in fact underestimates. The first reason is that too many of the initial respondents were found in the centre of the network. Respondents in the centre of a network have a higher chance of being found, since they know more people and are known to more people. This would result in a large number of respondent-referrals. The second reason is that the inclusion criterion contains, among other things, 'have used cocaine 25 times or more'. This means that users who meet the criterion but took cocaine in the distant past and stopped some time ago will probably not be found and will not be nominated.
In addition, there is a rule of thumb in statistics that the true population size normally lies within two standard errors of the estimate. Because we have reasons to believe that the above estimates are underestimates, we may suppose that there are at least 5,000 cocaine users in Rotterdam who meet the inclusion criteria. According to the rule, the maximum number of users would fall within two standard errors of the estimates. But since we do not know the degree of underestimation, it is possible that the maximum number of users is 20,000. If we accept that the random sample is of high quality, with sufficient attention having been paid to the spread of the initial respondents, and that the ex-users from the distant past form only a small group, we may conclude that the best possible estimate of the number of cocaine users who meet the inclusion criterion is around 12,000 (15). This is, incidentally, 2% of the overall population of Rotterdam.
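The two-standard-error rule can be applied to the averaged estimates reported above as a piece of arithmetic. The sketch reads the reported figures as whole numbers of users (for example 8,675 with a standard error of 4,754); the function name is hypothetical. The limits of 5,000 and 20,000 quoted in the text also rest on a judgment about the degree of underestimation, so they need not coincide with the ranges computed here.

```python
def two_standard_error_range(estimate, standard_error):
    """Return (lower, upper) as estimate minus and plus two standard errors."""
    return estimate - 2 * standard_error, estimate + 2 * standard_error


# Averaged estimates and standard errors as reported in the text.
reported = {
    "v1'": (8675, 4754),
    "v2'": (8545, 4620),
    "v1''": (8833, 3596),
    "v2''": (8701, 3320),
}

for name, (estimate, standard_error) in reported.items():
    lower, upper = two_standard_error_range(estimate, standard_error)
    # A population size cannot be negative, so the lower limit is cut off at zero.
    print(f"{name}: {max(lower, 0):,} to {upper:,}")
```

For v1' and v2' the lower limit falls below zero, which shows how little the lower end is constrained by the statistics alone.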
Notes
1. This approach has been worked out in greater detail in an article which is being prepared for publication: 'Analyzing the role of cocaine in relationships: two approaches of personal network analysis' by T.A.B. Snijders, M. Spreen and R. Zwaagstra.
2. The circuit to which a respondent is assigned is the circuit in which he mainly takes cocaine. This is not the same criterion as the one the respondent uses to classify the users he nominates: the respondent classifies the other users he knows according to the circuit within which he knows the nominee.
3. Only four respondents in the work circuit and one in the hobby/sports circuit stated these circuits as their main cocaine use circuit. These have therefore not been included in the table. This does not mean, however, that there are no users in these circuits. As stated earlier, 209 people in the work circuit and 79 in the hobby/sports circuit have been classified by 95 respondents.
4. The numbers of users in the five circuits nominated by respondents from the entertainment circuit and by respondents from the home circuit have been tested against each other simultaneously. The five averages for respondents from the entertainment circuit did not differ significantly from the five averages for respondents from the home circuit (p > 0.05).
5. This was tested with a t-test for averages (p < 0.05).
6. The numbers of users in the five circuits nominated by respondents from the home circuit and the entertainment circuit on the one hand, and by respondents from the hard drug scene on the other, have been tested against each other simultaneously. The five averages for respondents from the entertainment circuit and home circuit (combined) differ significantly from the five averages for respondents from the hard drug scene (p < 0.01).
7. The relationship patterns of the personal networks have been analyzed with a principal component analysis. Because there are too few respondents for a reliable analysis, the results are used only to give an impression of which circuits are uncorrelated with each other and which circuits are highly correlated with each other.
The first principal component for the relationship pattern of respondents from the entertainment + home circuit (73 respondents) is:
entertainment circuit     .75
workplace                 .49
home circuit             -.15
hobby/sports circuit      .41
hard drug scene           .72
The first principal component for the relationship pattern of respondents from the hard drug scene (22 respondents) is:
entertainment circuit     .79
workplace                 .77
home circuit              .25
hobby/sports circuit      .73
hard drug scene           .66
8. There are, for example, respondents who nominate users only in the home circuit, respondents who nominate few (many) users in the home circuit and few (many) users in the other four circuits, and respondents who nominate few (many) users in the home circuit and many (few) users in the other four circuits.
9. The following combinations have also been assigned the value zero:
- always in a context of cocaine and the nominee does not know the respondent is also taking cocaine (1 relationship);
- usually in a context of cocaine and the nominee does not know the respondent is also taking cocaine (2 relationships);
- sometimes in a context of cocaine and the nominee does not know the respondent is also taking cocaine (7 relationships).
In these combinations there is no evidence of a relationship so there is also no role played by cocaine in these 'relationships'.
10. The scale is constructed on the basis of the following questions:
a) Does the contact with the nominee occur in the context of cocaine?
1. Always; 2. Usually; 3. Occasionally; 4. Never.
b) Does the nominee know that you take cocaine?
1. Yes; 2. No; 3. Not known.
c) Have you and the nominee been linked in any way in obtaining cocaine in the last 6 months?
1. Yes, I obtain/sell from nominee; 2. Yes, nominee obtains/sells from me; 3. Yes, we obtain and sell in turn; 4. Yes, we obtain/sell from a third party; 5. No, there is no connection between us, but we have the same dealer; 6. No, there is no connection of any kind between us.
11. The 95 respondents nominate a total of 439 contacts. At the moment of the interview, 32 respondents were not taking cocaine. These respondents were not included in the analyses since there is no evidence of a cocaine relationship. The other 63 respondents nominated a total of 309 contacts. In 14 of these relationships it appeared that the nominee does not know that the respondent is taking cocaine. These relationships, too, were excluded from the analyses. The relationships which were investigated are all those in which the nominee is aware that the respondent is taking cocaine. They are, thus, mutual relationships as far as the knowledge of each other's cocaine use is concerned. A total of 259 relationships of 63 respondents remain.
12. The different relationships of the same respondent are not independent of each other. It is therefore not possible to carry out an ordinary regression analysis, since such an analysis assumes that the relationships of a respondent are independent. The relationship level is nested in the respondent level. For an introduction to multi-level analysis see, for example: Goldstein, H. (1987). Multilevel Models in Educational and Social Research. Oxford University Press, New York.
13. For a more detailed description of these two estimators, see the article to be published: 'Estimation of hidden populations by using snowball sampling', by O. Frank (University of Stockholm) and T.A.B. Snijders (University of Groningen). A number of estimators are discussed in this article.
14. The first and second estimators are closely linked. When the average number of times that a person in the initial random sample is nominated by others in the initial random sample is equal to the average number of times that an individual outside the initial random sample is nominated by people in the initial sample, then T00/T01 is equal to M00/M and both estimators give the same result. Respondents have difficulty naming other users for a number of reasons. In the survey we therefore requested the first two letters of the first name and of the surname, or the nickname, together with the age category and occupation. The people in the first wave were classified on the basis of the combination of these characteristics. This is no easy task, as can be seen from the percentages of missing data among the 1,041 new names nominated by the 85 initial respondents: letters of first name 2%; letters of surname 32%; gender 0.4%; age 0.4%; occupation 9%. In nearly one third of the cases the first two letters of the surname were missing, and in nearly one tenth the occupation. These two, together with the first letters of the first name, are the best distinguishing characteristics. For this reason v1 is more reliable than v2, since for v1 only the respondents need to be identified, and more data is available on the respondents.
15. It must be noted that the limits (5,000 to 20,000) indicate what can be concluded about the number of cocaine users on the basis of the statistical analysis of this snowball sample alone. Any other reliable evidence that becomes available can be used to adjust this estimate and make it more accurate.
© INTRAVAL, Groningen-Rotterdam.