`NEWS.md`

A new function *selectSampleSpatial* has been added: if geographical coordinates are available in the frame, this function uses the lpm2_kdtree function from the SamplingBigData package (Tillé-Grafstrom) to select spatially distributed points.

A new function *bethelProc* has been added: it runs a complete procedure, from the Bethel optimal allocation to the selection of a sample, without optimizing the strata, which are assumed to be given and fixed.

A new *optimStrata* function is available: it is a wrapper that runs one of the three optimization functions: (i) optimizeStrata (method = *atomic*); (ii) optimizeStrata2 (method = *continuous*); (iii) optimizeStrataSpatial (method = *spatial*).

A new *optimizeStrataSpatial* function is available: it optimizes the frame stratification while also taking into account the spatial correlation of frame units in a territorial context. Like *optimizeStrata2*, this function can be used only with continuous stratification variables.

A new *kMeansSolutionSpatial* function has been added: it works like *KmeansSolution*, but applies to optimization with only continuous stratification variables and must be used only in conjunction with *optimizeStrataSpatial*.

A new *KmeansSolution2* function has been added: it works like *KmeansSolution*, but applies to optimization with only continuous stratification variables and must be used only in conjunction with *optimizeStrata2*.

A new *prepareSuggestion* function has been added: it operates on the result of *KmeansSolution2* or *kMeansSolutionSpatial* in order to prepare the suggestions for the optimization with only continuous stratification variables.

A new *computeGamma* function has been added: it calculates a heteroscedasticity index, to be passed to the optimization step as a *model* parameter so that the variance in the strata is evaluated correctly.

A new *summaryStrata* function has been added; it returns structured information about the strata produced by the *optimizeStrata2* function (which operates only on continuous stratification variables).

A new *assignStrataLabel* function has been added; it assigns the optimized strata labels to new units to be added to the sampling frame.

Fixed a bug in the optimization step for continuous variables.

A new function *optimizeStrata2* is available. It performs the same task as *optimizeStrata*, but with a different genetic algorithm that operates on a real-valued genome instead of an integer one. This permits operating directly on the boundaries of the strata, instead of aggregating the initial atomic strata. In some situations (limited size of the sampling frame) this new function is much more efficient. The limitation lies in the nature of the stratification variables, which must all be continuous (though ordinal categorical variables could be handled).

A new function *expected_CV* has been added to calculate the CVs on target variables in the different domains that can be expected from a given solution, i.e. the output of an *optimizeStrata* execution.

Fixed a bug in the optimization step when take-all strata are also considered.
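To make the idea of a real-valued genome concrete, here is a minimal Python sketch of how a vector of stratum boundaries on one continuous variable could be turned into stratum labels. It is an illustration only, not the R code of *optimizeStrata2*; the function name `assign_strata` and the convention that a value equal to a boundary falls in the lower stratum are assumptions.

```python
from bisect import bisect_left


def assign_strata(values, cuts):
    """Label each value 1..len(cuts)+1 according to the cut points.

    Hypothetical helper: `cuts` plays the role of a real-valued genome,
    i.e. the boundaries between strata on one continuous variable.
    A value equal to a cut point goes to the lower stratum (assumption).
    """
    ordered = sorted(cuts)
    # bisect_left counts the cut points strictly below the value
    return [bisect_left(ordered, v) + 1 for v in values]


x = [3.0, 12.5, 7.1, 25.0, 18.2]
genome = [10.0, 20.0]  # two boundaries define three strata
labels = assign_strata(x, genome)
print(labels)
```

Changing a boundary in the genome moves units between adjacent strata directly, with no need to enumerate and merge atomic strata first.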

The optimization of the population frame is run in parallel if different domains are considered. To this end, the parameter *parallel* can be set to TRUE in the *optimizeStrata* function. If the number of cores is not specified, n-1 of the n available cores are used or, if the number of domains is smaller than n-1, as many cores as there are domains.

A new function *KmeansSolution* produces an initial solution using the k-means algorithm, clustering atomic strata on the means of the target variables in them. If the parameter *nstrata* is not given, the optimal number of clusters is determined inside each domain, and the overall solution is obtained by concatenating the optimal clusters obtained in the domains. Passing this solution as a suggestion to the optimization step may greatly speed up convergence to the optimal solution.

A new function *selectSampleSystematic* has been added. It selects a stratified sample with the systematic method: the first unit is selected at a randomly chosen starting point, and the other units are selected by repeatedly adding an interval equal to the inverse of the sampling rate in the stratum. This selection method can be useful when associated with a particular ordering of the selection frame, where the ordering variable(s) can be considered as additional stratification variable(s).

It is now possible to handle *anticipated variance* by introducing a model that links a proxy variable, whose values are available for all units in the sampling frame, with the target variable, whose values are not available. In this implementation only a linear model can be chosen. When calling the *buildStrataDF* function, a dataframe is given containing three parameters (beta, sigma2 and gamma) for each target/proxy pair. On the basis of these parameters, means and standard deviations in the sampling strata are calculated according to the given formulas.
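The systematic selection described for *selectSampleSystematic* can be sketched for a single stratum as follows. This is a Python illustration of the general technique (random start, then fixed steps of 1/rate), not the package's R implementation; the function name and its arguments are hypothetical.

```python
import random


def systematic_sample(units, rate, rng=random):
    """Systematic selection from an ordered list of units in one stratum.

    Hypothetical helper: `rate` is the sampling rate in the stratum
    (0 < rate <= 1) and the selection interval is its inverse.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    step = 1.0 / rate
    # random starting point in [0, step)
    position = rng.random() * step
    selected = []
    while position < len(units):
        selected.append(units[int(position)])
        position += step
    return selected


frame = list(range(20))  # units already sorted by the ordering variable
sample = systematic_sample(frame, rate=0.25, rng=random.Random(1))
print(sample)
```

Because the units are visited at regular intervals, sorting the frame by an auxiliary variable before selection spreads the sample across its range, which is why the ordering acts like an additional stratification.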

The crossover function in the genetic algorithm has been modified by adopting the *grouping* version of this algorithm: instead of mixing chromosomes indiscriminately, groups of them in one parent (representing already aggregated strata) are passed to the other parent when generating a child, preserving their composition. Moreover, parents are selected with a probability proportional to their fitness (in the previous version selection was completely random). In many cases this can greatly speed up convergence to an optimal solution.

A new function *adjustSize* has been added. It adjusts the sample size and the related allocation in the strata on the basis of an externally indicated overall sample size. The adjustment is performed by increasing or decreasing the sample size proportionally in each optimized stratum.

A new function *buildFrameDF* has been added. It creates a *sampling frame* dataframe from the dataset that contains the information on all the units, given the identifier, the X variables, the Y variables and the variable that indicates the domains of interest.

In function *optimizeStrata* the *initialStrata* parameter is now a vector, whose length is equal to the number of strata in the different domains.

All outputs are written to an .subdirectory.
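The proportional adjustment performed by *adjustSize* can be illustrated with a short Python sketch. It only shows the arithmetic of rescaling an allocation to a new overall size; the function name, the dictionary layout and the rounding rule are assumptions, and the package may handle rounding differently.

```python
def adjust_size(allocation, target_total):
    """Rescale a per-stratum allocation to a new overall sample size.

    Hypothetical helper: `allocation` maps stratum labels to sample sizes.
    Each stratum is multiplied by the same factor and rounded, so the
    resulting total can differ slightly from `target_total`.
    """
    current_total = sum(allocation.values())
    factor = target_total / current_total
    return {stratum: round(n * factor) for stratum, n in allocation.items()}


optimized = {"s1": 50, "s2": 30, "s3": 20}  # overall size 100
print(adjust_size(optimized, 150))
print(adjust_size(optimized, 50))
```

Since every stratum is scaled by the same factor, the relative allocation found by the optimization is preserved up to rounding.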

Function *memoise*, from the package of the same name (now required), is applied before each evaluation in order to save processing time. This may greatly increase the efficiency of the algorithm.

A *recode* function is applied to every generated solution in order to recode a genotype of n genes with k<=n distinct alleles 1, 2, …, k in such a way that the distinct alleles of the recoded genotype appear in the natural order 1, 2, …, k. This avoids treating as distinct two solutions that are equivalent but use a different coding.
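The recoding step can be sketched in a few lines of Python: alleles are renumbered in order of first appearance, so two genotypes that describe the same grouping end up identical. This is an illustration of the idea, not the package's R code, and the function name is simply borrowed from the entry above.

```python
def recode(genotype):
    """Renumber alleles so that they first appear in the order 1, 2, ..., k."""
    mapping = {}
    recoded = []
    for allele in genotype:
        if allele not in mapping:
            # the next unseen allele gets the next natural number
            mapping[allele] = len(mapping) + 1
        recoded.append(mapping[allele])
    return recoded


a = [3, 3, 1, 2, 1]
b = [2, 2, 3, 1, 3]  # same grouping of genes, different labels
print(recode(a), recode(b))
```

Both genotypes place genes 1 and 2 together, genes 3 and 5 together, and gene 4 alone, so they recode to the same vector and can share a single cached evaluation.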

Modified the output to the console or to the file of results: of all the solutions, only the optimal value for each generation is visualised.

The visualisation of the trend in the optimal and mean values is now optional: the plot can be suppressed by setting showPlot = FALSE when calling optimizeStrata. This can be advisable when the number of iterations is very high.

The object returned by function *optimizeStrata* is no longer a dataframe but a list: (i) the first element of the list is the solution vector (solution$indices); (ii) the second element of the list is the dataframe containing aggregated strata (solution$aggr_strata).

In all the functions that previously produced .csv files and .pdf plots in the working directory, this is no longer the default behaviour. To write these files, it is now necessary to set the *writeFiles* flag to TRUE.

- Two new functions: (i) *evalSolution*, to evaluate the found solution in terms of the expected precision and bias of the target variable estimates obtainable from samples drawn from the optimized frame; (ii) *tuneParameters*, to determine the best combination of values to assign to the parameters required by the genetic algorithm used for the optimization of the frame stratification.