`beth2.Rd`

Compute multivariate optimal allocation for different domains corrected considering stratified two stages design

```
beat.2st(stratif, errors, des_file, psu_file, rho, deft_start = NULL,
effst = NULL, epsilon1 = 5, mmdiff_deft = 1,maxi = 20,
epsilon = 10^(-11), minPSUstrat = 2, minnumstrat = 2, maxiter = 200, maxiter1 = 25)
```

- stratif
Data frame of survey strata, for more details see, e.g.,strata.

- errors
Data frame of coefficients of variation for each domain, for more details see, e.g.,errors.

- des_file
Data frame containing information on sampling design variables, for more details see, e.g.,design.

- psu_file
Data frame containing information on primary stage units stratification, for more details see, e.g.,PSU_strat.

- rho
Data frame of survey strata, for more details see, e.g.,rho.

- deft_start
Data frame of survey strata, for taking into account the initial design effect on each variable, for more details see, e.g.,deft_start.

- effst
Data frame of survey strata, for taking into account the estimator effect on each variable, for more details see, e.g.,effst.

- epsilon1
First stop condition: sample sizes differences beetween two iterations; iteration continues until the maximum of sample sizes differences is greater than the default value. The default is 5.

- mmdiff_deft
Second stop condition: defts differences beetween two iterations; iteration continues until the maximum of defts largest differences is greater than the default value. The default is 0.06.

- maxi
Third stop condition: maximum number of allowed iterations. The default is 20.

- epsilon
The same as in function beat.1st.

- minPSUstrat
Minimum number of non-self-represenative PSUs to be selected in each stratum.

- minnumstrat
The same as in function beat.1st.

- maxiter
The same as in function beat.1st.

- maxiter1
The same as in function beat.1st.

The methodology is a generalization of Bethel multivariate allocation (1989) that extended the Neyman (1959) - Tchuprov (1923) allocation for multi-purpose and multi-domains surveys. The generalized Bethel’s algorithm allows to determine the optimal sample size for each stratum in a stratified sample design. The overall sample size and the allocation among the different strata is determined starting from the accuracy constraints imposed in the survey on interest estimates. The optimal allocation is obtained throught a procedure that converge in few iteractions:

The first iteration is a computation of an initial allocation with the multivariate optimal allocation for different domains in one stages statified sample design (the methodology is a generalization for multidomains and multistages designs of Bethel multivariate allocation, 1989).

The correction of the initial allocation is based on an iterative method calculating new allocations and is based on an inflaction of strata variances using the design effect (Ganninger, 2010).

Object of class `list`

. The list contains 8 objects:

- iteractions
Data frame that for each iteraction provides a summary with the number of Primary Stage Units (

`PSU_Total`

) distinguish between Self-Representative (`PSU_SR`

) from Non-Self-Representative (`PSU_NSR`

) and the number of Secondary Stage Units (`SSU`

). This output is also printed to the screen.- file_strata
Input data frame in

`stratif`

with the design effect for each variables in each stratum (`DEFT1 - DEFTn`

) and the optimal sample size columns.- alloc
Data frame with optimal (

`ALLOC`

), proportional (`PROP`

), equal (`EQUAL`

) sample size allocation.- planned
Data frame with a summary of expected coefficients of variation for each variable in each domain.

- expected
Data frame with a summary of realized coefficients of variation with the given optimal allocation for each variable in each domain.

- sensitivity
Data frame with a summary of the sensitivity at 10% for each domain and each variable. Sensitivity can be a useful tool to help in finding the best allocation, because it provides a hint of the expected sample size variation for a 10% change in planned CVs.

- deft_c
Data frame with the design effect for each variable in each domain in each iteraction. Note that

`DEFT1_0 - DEFTn_0`

is always equal to 1 if`deft_start`

is`NULL`

. Instead is equal to`deft_start`

. While`DEFT1 - DEFTn`

are the final design effect related to the given allocation.- param_alloc
A vector with a resume of all the parameter given for the allocation.

Cochran, W. (1977)
*Sampling Techniques.*
John Wiley & Sons, Inc., New York

Ganninger, M. (2010).
*Design effects: model-based versus design-based approach.*
Vol. 3, p. 174. DEU.

Neyman, J. (1934).
On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection.
*Journal of the Royal Statistical Society*, 97(4), 558-625.

Tschuprow, A. A. (1923). On the mathematical expectation of the moments of frequency distributions in the case of correlated observation.
(Chapters 4-6). *Metron*, 1923, 2: 646-683.

```
if (FALSE) {
# Load example data
data(beat.example)
## Example 1
# Allocate the sample
allocation2st_1 <- beat.2st(stratif=strata, errors=errors,
des_file=design, psu_file=PSU_strat,rho=rho)
# The total ammount of sample size is 191 PSU (36 SR + 155 NSR) and 15147 SSU.
## Example 2
# Assume 13000 SSUs is the maximum sample size to stick to our budget.
# Look at the sensitivity is in DOM1 for REG1 and REG2 due to V1.
allocation2st_1$sensitivity
# We can relax the constraints increasing the expected coefficients of variation for X1 by 10
errors1 <- errors
errors1[1,2] <- errors[1,2]+errors[1,2]*0.1
# Try the new allocation
allocation2st_2 <- beat.2st(stratif=strata, errors=errors1,
des_file=design, psu_file=PSU_strat,rho=rho)
## Example 3
# On the contrary, if we tighten the constraints decreasing the expected coefficients of variation
# for X1 by 10
errors2 <- errors
errors2[1,2] <- errors[1,2]-errors[1,2]*0.1
# The new allocation leads to a larger sample than the first example (around 18000)
allocation2st_3 <- beat.2st(stratif=strata, errors=errors2,
des_file=design, psu_file=PSU_strat,rho=rho)
## Example 4
# Sometimes some budget constraints concern the number of PSU involved in the survey.
# Tuning the PSUs number is possible modyfing the MINIMUM in des_file.
# Assume to increase the MINIMUM from 48 to 60
design1 <- design
design1[,4] <- 60
allocation2st_4 <- beat.2st(stratif=strata, errors=errors2,
des_file=design1, psu_file=PSU_strat, rho=rho)
# The PSUs number is decreased, while the SSUs number increased
# due to cluster intra-correlation effect.
# Under the same expected errors, to offset a slight reduction of PSUs (from 221 to 207)
# an increase of SSUs involved is observed.
allocation2st_3$expected
allocation2st_4$expected
## Example 5
# On the contrary, assume to decrease the MINIMUM from 48 to 24.
# The SSUs number strongly decrease in the face of an increase of PSUs,
# always under the same expected errors.
design2 <- design
design2[,4] <- 24
allocation2st_5 <- beat.2st(stratif=strata, errors=errors2,
des_file=design2, psu_file=PSU_strat, rho=rho)
allocation2st_4$expected
allocation2st_5$expected
## Example 6
# Assume that the SSUs are in turn clusters, for instance households composed by individuals.
# In the previous examples we always derived optimal allocations
# for sample of SSUs (i.e. households, because
# DELTA = 1).
design
design1
design2
# For obtaining a sample in terms of the elements composing SSUs
# (i.e., individuals) is just sufficient to
# modify the DELTA in des_file.
design3 <- design
design3$DELTA <- 2.31
# DELTA_IND=2.31, the average size of household in Italy.
allocation2st_6 <- beat.2st(stratif=strata, errors=errors,
des_file=design3, psu_file=PSU_strat, rho=rho)
## Example 7
# Complete workflow
library(readr)
pop <- read_rds("https://github.com/barcaroli/R2BEAT_workflows/blob/master/pop.RDS?raw=true")
library(R2BEAT)
cv <- as.data.frame(list(DOM=c("DOM1","DOM2"),
CV1=c(0.02,0.03),
CV2=c(0.03,0.06),
CV3=c(0.03,0.06),
CV4=c(0.05,0.08)))
cv
samp_frame <- pop
samp_frame$one <- 1
id_PSU <- "municipality"
id_SSU <- "id_ind"
strata_var <- "stratum"
target_vars <- c("income_hh","active","inactive","unemployed")
deff_var <- "stratum"
domain_var <- "region"
delta = 1 # households = survey units
minimum <- 50 # minimum number of SSUs to be interviewed in each selected PSU
deff_sugg <- 1.5 # suggestion for the deff value
inp <- prepareInputToAllocation1(samp_frame,
id_PSU,
id_SSU,
strata_var,
target_vars,
deff_var,
domain_var,
minimum,
delta,
deff_sugg)
inp$desfile$MINIMUM <- 50
alloc <- beat.2st(stratif = inp$strata,
errors = cv,
des_file = inp$des_file,
psu_file = inp$psu_file,
rho = inp$rho,
deft_start = NULL,
effst = inp$effst,
minPSUstrat = 2,
minnumstrat = 50
)
}
```