--- title: "Demonstration for Creating Continuous Norms with cNORM" author: "Wolfgang Lenhard, Alexandra Lenhard" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Demonstration for Creating Continuous Norms with cNORM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(cNORM) library(ggplot2) ``` cNORM (A. Lenhard, Lenhard & Gary, 2018) is an R package that generates continuous test norms for psychometric and biometric data and analyzes the associated model fit. Originally, cNorm exclusively used an approach that makes no assumptions about the specific distribution of the raw data (A. Lenhard, Lenhard, Suggate & Segerer, 2016). Since version 3.2 (2024), however, the package also offers the option of parametric modeling using the beta-binomial distribution. cNORM was developed specifically for achievement tests (e.g. vocabulary development: A. Lenhard, Lenhard, Segerer & Suggate, 2015; written language acquisition: W. Lenhard, Lenhard & Schneider, 2017). However, the package can be used wherever mental (e.g. reaction time), physical (e.g. body weight) or other test scores depend on continuous (e.g. age, duration of schooling) or discrete explanatory variables (e.g. sex, test form). In addition, the package can also be used for "conventional" norming based on individual groups, i.e. without including explanatory variables. The package estimates percentiles as a function of the explanatory variable. This is done either parametrically on the basis of the beta-binomial distribution or distribution-free using Taylor polynomials. Mathematical modeling of the data using continuous variables such as age has the following advantages: * By optimizing the model based on the overall sample, small deviations from representativeness in individual subsamples caused by random sampling are minimized (A. Lenhard, Lenhard & Gary, 2019). * Gaps between different discrete levels of the explanatory variable are closed. In intelligence tests, for example, norms can be created not only for relatively wide age intervals (e.g., 6 months), but for any exact age. In achievement tests, the norms are no longer limited to the specific time of the data collection (e.g., middle or end of the school year), but can be extended to any point in time within a school year under certain conditions. * Norm tables are always determined on the basis of the entire sample instead of a single cohort or grade level. For this reason, only around a quarter of the sample size is required to achieve the same norm score precision with continuous norming compared to conventional norming per subgroub. In general, continuous norming only requires a sample size of around 100 per cohort to achieve sufficient norming quality (W. Lenhard & Lenhard, 2020). Especially in those performance areas that deviate relatively strongly from the population mean, continuous norming usually leads to more precise results than conventional norming. This is specifically important, because it is precisely these areas that are most relevant in diagnostic practice. * The limits of the model fit can be evaluated graphically and analytically. For example, it is possible to determine at which points the model deviates strongly from the manifest data or where strong floor or ceiling effects occur. This feature allows to specify in which age and performance ranges the test no longer differentiates sufficiently. 
* The beta-binomial distribution models the raw data very precisely in many application scenarios, especially for tests that are based on 1-PL IRT models. Fitting with Taylor polynomials, on the other hand, requires very few assumptions about the raw score distributions. This means that a wider range of scales can be standardized, for example, continuous measures such as reaction times, scales that contain negative values, or speeded test scales.
* In case of violations of representativeness, Iterative Proportional Fitting (Raking) can be used to calculate weights and apply them to mitigate biases.

In this vignette, we demonstrate the necessary steps for the application of the R package with real human performance data, namely, with the normative sample of the sentence comprehension subtest of ELFE 1-6, a reading comprehension test in German (W. Lenhard & Schneider, 2006), and with the German adaptation of the Peabody Picture Vocabulary Test 4 (A. Lenhard et al., 2015).

## Mathematical Background

The rationale of the approach is to rank the results in the different age cohorts (= age, $a$) or continuously with a sliding window and thus to determine the observed norm scores (= location, $l$). Afterwards, powers of the age-specific location and of the age are computed, as well as all linear interactions. Thus, we model the raw score $r$ as a function of the powers of location $l$ and age $a$ and their interactions by a Taylor polynomial:

$$f(r) = \sum_{k=0}^K \sum_{t=0}^T \beta_{k,t} \cdot l^k \cdot a^t$$

Where:

- $f(r)$ is the function of the raw score $r$
- $\beta_{k,t}$ is the coefficient for the k-th power of location and t-th power of age
- $K$ is the maximum power for location
- $T$ is the maximum power for age

Finally, the data is fitted by a hyperplane via multiple regression and the most relevant terms are identified.

## In a nutshell: Establishing continuous norming models

The 'cnorm' method combines most of the steps in one go. The example in a nutshell already suffices for establishing norm scores. It conducts the ranking, the computation of powers and the modeling. A detailed explanation of the distinct steps follows afterwards.

```
## Basic example code for modeling the sample dataset
library(cNORM)

# Start the graphical user interface (needs shiny installed)
# The GUI includes the most important functions. For specific cases,
# please use cNORM on the console.
cNORM.GUI()

# Using the syntax on the console: The function 'cnorm' performs
# all steps automatically. Please specify the raw score and the
# grouping variable. The resulting object contains the ranked data
# via object$data and the model via object$model.
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# Plot different indicators of model fit depending on the number of
# predictors
plot(cnorm.elfe, "subset", type = 0) # plot R2
plot(cnorm.elfe, "subset", type = 3) # plot MSE

# NOTE! At this point, you usually select a good fitting model and rerun
# the process with a fixed number of terms, e.g., 4. Avoid models
# with a high number of terms:
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, terms = 4)

# Powers of age can be specified via the parameter 't'.
# Cubic modeling is usually sufficient, i.e., t = 3.
# In contrast, 'k' specifies the power of the person location.
# This parameter should be somewhat higher, e.g., k = 5.
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, k = 5, t = 3)

# Visual inspection of the percentile curves of the fitted model
plot(cnorm.elfe, "percentiles")

# Visual inspection of the observed and fitted raw and norm scores
plot(cnorm.elfe, "norm")
plot(cnorm.elfe, "raw")

# In order to compare different models, generate a series of percentile
# plots with an ascending number of predictors, in this example between
# 5 and 14 predictors.
plot(cnorm.elfe, "series", start = 5, end = 14)

# Cross validation in order to choose the appropriate number of terms,
# with 80% of the data for training and 20% for validation. Due to
# the time consumption, the maximum number of terms is limited to 10
# in this example, with 3 repetitions of the cross validation.
cnorm.cv(cnorm.elfe$data, max = 10, repetitions = 3)

# Cross validation with prespecified terms of an already
# existing model
cnorm.cv(cnorm.elfe, repetitions = 3)

# Print norm table (in this case: 0, 3 or 6 months at grade level 3)
# (Note: The data is coded such that 3.0 represents the beginning and
# 3.5 the middle of the third school year)
normTable(c(3, 3.25, 3.5), cnorm.elfe)
```

Conventional norming per age group:

```
library(cNORM)

# Application of cNORM for the generation of conventional norms
# for a specific age group (in this case age group 3):
data <- elfe[elfe$group == 3, ]
cnorm(raw = data$raw)
```

In the following, the individual steps are described in detail.

## 1. Data Preparation and Modeling

### Representativeness of the data

The starting point for standardization should always be a representative sample. Establishing representativeness is one of the most difficult tasks of test construction and must therefore be carried out with appropriate care. First of all, it is important to identify those variables that systematically covary with the variable to be measured. In the case of school achievement and intelligence tests, these are, for example, the educational background of the parents, the federal state, the socio-economic background, etc. Caution: Increasing the sample size is only beneficial for the quality of the standardization if the covariates are not systematically distorted. For example, it would be useless or even counterproductive to increase the size of the sample if the sample was only collected from a single type of school or only in a single region.

One advantage of continuous norming is the generally low sample size required. One way of achieving representativeness is therefore to delete as many randomly selected cases from overrepresented strata as necessary, until the individual strata are represented with the required percentage in the overall sample. However, this means that laboriously collected data is lost again.

### Weighting

If representativeness cannot be achieved by removing cases, a second option is to weight the data using Iterative Proportional Fitting (Raking). In simulation studies (Gary et al., 2023), we were able to show that weighting usually leads to more precise norm scores. However, we have so far only conducted these simulation studies using the distribution-free continuous norming method implemented in cNORM. Problems with weighting only arose when the variance in the standardization sample differed greatly from the actual variance in the reference population. Therefore, when applying weighting, make sure that no excessive deviations from representativeness need to be compensated for and that subgroups whose average test scores deviate relatively strongly from the population mean are already sufficiently taken into account during data collection. For conducting weighting, please consult the vignette on 'WeightedRegression'.
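To illustrate the basic workflow, the following sketch assumes that per-case weights have already been derived via Iterative Proportional Fitting, for instance with cNORM's 'computeWeights()' function (the required format of the population margins is described in the 'WeightedRegression' vignette), and that they are then passed to the modeling function. Object and variable names are purely illustrative.

```
# Illustrative sketch of weighted modeling; see the 'WeightedRegression'
# vignette for the exact input format of the population margins.

# Step 1 (assumption): compute raking weights from the norm sample ('norm.data')
# and a data frame describing the population proportions ('margins')
# weights <- computeWeights(data = norm.data, population.margins = margins)

# Step 2: pass the per-case weights to the modeling function
model.weighted <- cnorm(raw = norm.data$raw,
                        group = norm.data$group,
                        weights = weights)
```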
### Determining the norm sample size

The appropriate sample size cannot be quantified in a definitive way, but depends on how well the test (or scale) must differentiate in the extreme sections of the norm scale. In many countries, for example, it is common (although not always reasonable) to differentiate between IQ < 70 and IQ > 70 to diagnose developmental disabilities and to choose the appropriate school type or educational track. An IQ test used for school placement must therefore be able to identify a deviation of 2 SDs or more from the population mean as reliably as possible. If, on the other hand, the diagnosis of a reading/spelling disorder is required, a deviation of 1.5 SD from the population mean is generally sufficient for the diagnosis according to DSM-5.

As a rule of thumb for determining the ideal sample size, it can be stated that the measurement error caused by the norming procedure is particularly high in those performance areas that are only represented with low probability in the norming sample. (This does not only apply to continuous norming, but to all norming methods.) For example, in a representative random sample of N = 100, the probability that there is not a single child with an IQ below 70 is about 10%. For a sample size of N = 200, this probability decreases to 1%. Doubling the sample size thus notably improves the reliability of the norm scores in ranges markedly deviating from the scale mean.

Since continuous norming models are always based on the entire sample, the statistical power of the norming procedure increases for each individual age. As a result, the required size of the norm sample can be substantially reduced. With a sample size of n = 100 per cohort or grade level, the norms already achieve a goodness of fit that is only reached in conventional norming with sample sizes of n = 400 or more (W. Lenhard & Lenhard, 2021). Thus, not only do the norm scores become more precise, but the standardization projects also become more cost-effective overall.

### Data preparation

Once a representative sample of sufficient size has been created, the data must be loaded into the R workspace. cNORM excludes cases with missing values in the relevant variables. For continuous norming, in addition to the variable with the raw scores, an explanatory variable (e.g., age or duration of schooling) is required, which can be represented as a discrete grouping variable or as a continuous variable. Please ensure that the discrete grouping variable is a numerical variable, with the group mean of the corresponding continuous variable being used as the variable's value, e.g., 10.5 for all children aged between 10 and 11.

If only a continuous variable is initially available when applying the distribution-free method (i.e., modeling with Taylor polynomials), then this variable must be recoded into a discrete grouping variable. However, the method is relatively robust to changes in the granularity of the group subdivision. For example, the modeling result barely depends on whether the sample is divided into age brackets of 6 months or 12 months (see A. Lenhard, Lenhard, Suggate & Segerer, 2016).
The more the course of the raw scores across the explanatory variable deviates from a linear development, the finer the groups should be formed. In parametric modeling with the beta-binomial distribution, an additional grouping variable is generally unnecessary.

For recoding a continuous explanatory variable into a grouping variable, the following function can be used:

```
# Creates a grouping variable for the variable 'age'
# of the ppvt data set. In this example, 12 equidistant
# subgroups are generated.
group <- getGroups(ppvt$age, 12)
```

## 2. Calculation of Manifest Norms and Modeling

In the cNORM package, distribution-free modeling of norm scores is based on estimating raw scores as a power function of the person location $l$ and the explanatory variable $a$. This approach requires several steps, which can internally be executed by a single function, 'cnorm()'.

First, the 'cnorm()' function estimates preliminary values for the person locations. For this purpose, the raw scores within each group are ranked. Alternatively, a sliding window can be used in conjunction with the continuous explanatory variable. In this case, the width of the sliding window (function parameter 'width') must be specified. Subsequently, the ranks are converted to norm scores using inverse normal transformation. The resulting norm scores serve as estimators for the person locations. To compensate for violations of representativeness, weights can be included in this process (see Weighting).

The second internal step is the calculation of powers of $l$ and $a$. Powers are calculated up to a certain exponent 'k'. In order to use a different exponent for $a$ than for $l$, you can also specify the parameter 't' (see the mathematical background), which is beneficial in most cases. If neither k nor t is specified, the 'cnorm()' function uses the values k = 5 and t = 3, which have proven effective in practice. All powers are also multiplied crosswise with each other to capture the interactions of $l$ and $a$ in the subsequent regression. The object finally returned by the 'cnorm()' function contains the preprocessed data, including the manifest norm scores and all powers and interactions of $l$ and $a$, in 'model$data'.

In the third internal step, 'cnorm()' determines a regression function. Following the principle of parsimony, models that achieve the highest $R_{adjusted}^{2}$ with as few predictors as possible should be selected. The 'cnorm()' function uses the 'regsubsets()' function from the 'leaps' package for the regression. Two different model selection strategies are possible: You can either specify the minimum value required for $R_{adjusted}^{2}$; then the regression function that meets this requirement with the fewest predictors is selected. Or you can specify a fixed number of predictors; then the model that achieves the highest $R_{adjusted}^{2}$ with this number of predictors is selected. Unfortunately, it is usually not known in advance how many predictors are needed to optimally fit the data. How to find the best possible model is explained in the section Model Selection below.
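For users who prefer to carry out these internal steps individually, the following sketch outlines the corresponding manual workflow with the functions exported by cNORM; please check the exact arguments against the current package documentation.

```
# Sketch of the manual workflow corresponding to the three internal steps
# (argument names may differ slightly between package versions)
data.elfe <- rankByGroup(elfe, group = "group", raw = "raw") # step 1: manifest norm scores
data.elfe <- computePowers(data.elfe, k = 5, t = 3)          # step 2: powers and interactions
model.elfe <- bestModel(data.elfe)                           # step 3: regression and model selection
```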
To begin with, however, we would first like to explain the basic functionality of cNORM using a simple modeling example. To do this, we will use the 'elfe' dataset provided with the package and start with the default setting. Since cNORM version 3.3.0, this setting provides the model with the highest $R_{adjusted}^{2}$ while avoiding inconsistencies. (What we mean by 'inconsistency' is explained in more detail in the section Model Selection.)

```{r fig0, fig.height = 4, fig.width = 7}
library(cNORM)
# Models the 'raw' variable as a function of the discrete 'group' variable
model <- cnorm(raw = elfe$raw, group = elfe$group)
```

The model explains more than 99.2% of the data variance but requires a relatively high number of 10 predictors (plus intercept) to do so. The third line ('Final regression model') reports which powers and interactions were included in the regression. For example, L2 represents the second power of $l$, A3 the third power of $a$, and so on. By default, the location $l$ is returned in T-scores (M = 50, SD = 10). However, IQ scores, z-scores, percentiles, or any vector containing mean and standard deviation (e.g., c(10, 3) for Wechsler scaled scores) can be selected instead by specifying the 'scale' parameter of the 'cnorm()' function. Subsequently, the complete regression formula including coefficients is returned.

The returned 'model' object contains both the data (model\$data) and the regression model (model\$model). All information about the model selection can be accessed under 'model\$model\$subsets'. The variable selection process is listed step by step in 'outmat' and 'which'. There you can find $R^{2}$, $R_{adjusted}^{2}$, Mallows' Cp, and BIC. The regression coefficients for the selected model ('model\$model\$coefficients') are available, as are the fitted values ('model\$model\$fitted.values') and all other information. A table with the corresponding information can be printed using the following code:

```{r}
print(model)
```

Mathematically, the regression function represents a surface in three-dimensional space. When $R^2$ is sufficiently high (e.g., $R^2 > .99$), this surface typically models the manifest data very well over wide ranges of the normative sample. However, a Taylor polynomial, as used here, usually has a finite radius of convergence. In practice, this means that in some age or performance ranges the regression function might no longer provide plausible values. The model might, for example, unexpectedly deviate strongly from the manifest data. Such areas are usually best recognized by graphically comparing manifest and modeled data. For this purpose, cNORM provides, among other things, percentile plots. In the percentile plot, the manifest data are represented as dots, while the modeled percentiles are shown as lines. The raw score range is automatically determined based on the values from the original dataset. However, it can also be explicitly specified using the parameters 'minRaw' and 'maxRaw'.

As can be seen from the percentile plot above, the percentiles of the norming model run relatively smoothly across all levels of the explanatory variable and align well with the manifest data. Small fluctuations between individual groups are eliminated. The uppermost percentile line (PR = 97.5) runs horizontally from the fourth grade onward. However, this does not represent a limitation of the model, but rather a ceiling effect of the test, as the maximum raw score of 28 is reached at this point. Nevertheless, implausible values or model inconsistencies often appear at those points where the test has floor or ceiling effects or where the normative sample is too sparse. That is, they usually occur at the boundaries of the age or performance range of the normative sample, or even beyond. Therefore, when checking the percentile lines, particular attention should be paid to the model's behavior at these boundaries.
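As a brief aside before turning to model selection: the parametric alternative mentioned in the introduction, modeling via the beta-binomial distribution, is available through a separate function since version 3.2. The following sketch assumes that 'cnorm.betabinomial()' takes the continuous explanatory variable and the raw scores as its main arguments; please consult the package documentation for the exact signature and the available methods. The remainder of this vignette continues with the distribution-free model fitted above.

```
# Sketch: parametric modeling with the beta-binomial distribution
# (argument names are an assumption; see ?cnorm.betabinomial)
model.bb <- cnorm.betabinomial(age = ppvt$age, score = ppvt$raw)
```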
### Model Selection

Although the percentile plot suggests a good model fit, the number of predictors (i.e., 10) is relatively high. Such a high number carries the risk of overfitting the data. The most obvious sign of overfitting is usually wavy (and therefore counterintuitive) percentile lines. Since they are not wavy for the calculated model, there is little indication of relevant overfitting here. Moreover, if too much emphasis is placed on parsimony, insufficient fit in extreme performance ranges can occur. Therefore, the proposed model with 10 predictors seems to be an adequate option. However, the cNORM package provides methods to find even more parsimonious models, which we will demonstrate in the following.

First, we recommend performing a visual inspection using percentile plots. To this end, cNORM offers a function that generates a series of percentile plots with an increasing number of predictors.

```
# Generates a series of percentile plots with an increasing number of predictors
plotPercentileSeries(model, start = 1, end = 15)
```

In this case, a series of percentile plots with 1 to 15 predictors is generated. The percentile lines begin to intersect from 12 predictors onward in the higher grade levels. This means that a single raw score is mapped onto two different norm scores, i.e., the mapping of latent person variables to raw scores is not bijective in these models. Consequently, at least for some raw scores, it would be impossible to determine a definitive norm score when using these models. There can be various reasons for intersecting percentile lines:

* The test shows floor or ceiling effects. When the minimum or maximum raw score is reached for the first time, cNORM normally cuts off the norm scores automatically. Unfortunately, in unfavorable cases, intersecting percentile lines can still occur.
* Too many predictors were used. Please try to obtain a consistent model with fewer predictors.
* The power parameter 't' for the explanatory variable $a$ was chosen too high. Usually, t = 3 will be sufficient to model the raw scores across age. If only three age groups exist, t should be set to a maximum of 2, and for only two age groups, to 1. Higher values are not reasonable in these cases.
* The regression model has been extended to age or performance ranges that were rare or non-existent in the normative sample. While it is mathematically possible to cover age and performance ranges that are not present in the normative sample, such extrapolation without empirical basis should be done with caution.

If the power parameter 't' for the explanatory variable $a$ is chosen too high, very wavy percentile lines can occur in addition to crossing ones. For comparison, you will find below the series of percentile plots that is obtained with k = 5 and t = 4.
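The comparison series can be reproduced with a call along the following lines, reusing the functions demonstrated above:

```
# Model with a higher power of age (t = 4) for comparison
model.t4 <- cnorm(raw = elfe$raw, group = elfe$group, k = 5, t = 4)
plotPercentileSeries(model.t4, start = 1, end = 15)
```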
With 14 predictors, wavy percentile lines can be seen at PR = 2.5, but even with 7 or more predictors, the percentile lines no longer increase as monotonically as desired. Let's return to our model series with k = 5 and t = 3, which evidently produces better results than k = 5 and t = 4. Our goal was to find models that provide sufficiently good modeling results with fewer than 10 predictors. To this end, you can also examine how the addition of predictors affects $R_{adjusted}^{2}$. For this purpose, use the following command:

```{r fig1, fig.height = 4, fig.width = 7}
plot(model, "subset", type = 0)
```

If 'type = 1' is set in the 'plot()' function instead of 'type = 0', Mallows' Cp is displayed in logarithmic form. With 'type = 2', the BIC (Bayesian Information Criterion) is plotted against $R_{adjusted}^{2}$.

Fortunately, the scale can be modeled very well, which is evident from the fact that it is possible to find consistent models that explain more than 99% of the data variance. This value is even achieved here with just three predictors ($R_{adjusted}^{2}$ = .991). With 8 predictors, 99.3% of the data variance is explained by the corresponding model, only 0.2% more than with three predictors. But even such small increases can sometimes lead to relevant improvements for extreme person locations. Beyond 8 predictors, a further addition of predictors does not even lead to changes in the per mille range. Thus, all models with 3 to 8 predictors fit well but are more parsimonious than the model with 10 predictors, and can therefore also be considered for selection. Which of these models should ultimately be selected can be determined, for example, via cross-validation (see the 'cnorm.cv' function in the overview above).

## 3. Generating Norm Tables

In addition to the pure modeling functions, cNORM also contains functions for generating norm tables or retrieving the norm score for a specific raw score and vice versa.

### predictNorm

The 'predictNorm' function returns the norm score for a specific raw score (e.g., raw = 15) and a specific age (e.g., a = 4.7). The norm scores have to be limited to a minimum and maximum value in order to take into account the limits of model validity.

```{r}
predictNorm(15, 4.7, model, minNorm = 25, maxNorm = 75)
```

### predictRaw

The 'predictRaw' function returns the predicted raw score for a specific norm score (e.g., T = 55) and a specific age (e.g., a = 4.5).

```{r}
predictRaw(55, 4.5, model, minRaw = 0, maxRaw = 28)

# ... or for several norm scores and age levels ...
predictRaw(c(45, 50, 55), c(2.5, 3, 3.5), model)
```

### normTable

The 'normTable' function returns the corresponding raw scores for a specific age (e.g., a = 3) and a pre-specified series of norm scores. The parameter 'step' specifies the distance between two norm scores.

```{r}
normTable(3, model, minRaw = 0, maxRaw = 28, minNorm = 30.5, maxNorm = 69.5, step = 1)
```

This function is particularly useful when scales have a large range of raw scores and, consequently, multiple raw scores correspond to a single (rounded) norm score. For example, for a tabulated norm score of T = 40, all integer raw scores that are assigned to norm scores between 39.5 and 40.5 would need to be listed. In the present case, the raw score 8.47 corresponds to T = 39.5, and the raw score 8.99 corresponds to T = 40.5. As a result, no single (integer) raw score would be assigned to a norm score of 40 in the test manual's table. Furthermore, the 'normTable()' function is also useful when norm scores for various subtests need to be tabulated in a single table.

### rawTable

The function 'rawTable' is similar to 'normTable', but reverses the assignment: The norm scores are assigned to a pre-specified series of raw scores at a certain age. This requires an inversion of the regression function, which is determined numerically.
Specify the reliability and the confidence coefficient to automatically calculate confidence intervals:

```{r}
rawTable(3.5, model, minRaw = 0, maxRaw = 28, minNorm = 25, maxNorm = 75, step = 1, CI = .95, reliability = .89)

# generate several raw tables
table <- rawTable(c(2.5, 3.5, 4.5), model, minRaw = 0, maxRaw = 28)
```

You need this kind of table if you want to determine the exact percentile or the exact norm score for all occurring raw scores.

## 4. Useful Tools

### Plot of raw scores

In the following figures, the manifest and projected raw scores are compared separately for each (age) group. If the 'group' parameter is set to 'FALSE', the values are plotted over the entire range of the explanatory variable (i.e., without group differentiation).

```{r fig2, fig.height = 7, fig.width = 7}
plot(model, "raw", group = TRUE)
```

The fit is particularly good if all dots are as close as possible to the diagonal (identity line). However, it must be noted that deviations in the extreme upper, but especially in the extreme lower range of the raw scores often occur because the manifest data in these ranges are associated with large measurement error.

### Plot of norm scores

This plot corresponds to the 'plot(model, "raw")' call, except that in this case the manifest and projected norm scores are plotted against each other. Please specify the minimum and maximum norm score. In the specific example, T-scores from 25 to 75 are used, covering the range from -2.5 to +2.5 standard deviations around the average score of the reference population.

```{r fig3, fig.height = 7, fig.width = 7}
plot(model, "norm", group = TRUE, minNorm = 25, maxNorm = 75)
```

### Probability Density Function

The 'plot(model, "density")' call plots the probability density function of the raw scores. This method can be used to visualize the deviation of the test results from the normal distribution.

```{r fig4, fig.height = 4, fig.width = 7}
plot(model, "density", group = c(2, 3, 4))
```

### Curve analysis of the regression function

Finally, we would like to remind mathematically experienced users that it is also possible to perform conventional curve sketching of the regression function. Since the regression equation is a polynomial of the nth degree, the required calculations are not very complicated. You can, for example, determine local extrema, inflection points, etc. In this context, the 'plot(model, "derivative")' call provides a visual illustration of the first partial derivative of the regression function with respect to the person location. The illustration helps to determine the points at which the derivative crosses zero. At these points, the mapping is no longer bijective. The points therefore indicate the boundaries of the model's validity.

```{r fig5, fig.height = 4, fig.width = 7}
plot(model, "derivative")
```

## References

* CDC (2012). National Health and Nutrition Examination Survey: Questionnaires, datasets and related documentation. Available: https://wwwn.cdc.gov/nchs/nhanes/. Date of retrieval: 25/08/2018.
* Lenhard, A., Lenhard, W., Segerer, R. & Suggate, S. (2015). Peabody Picture Vocabulary Test - Revision IV (Deutsche Adaption). Frankfurt a. M.: Pearson Assessment.
* Lenhard, A., Lenhard, W., Suggate, S. & Segerer, R. (2016). A continuous solution to the norming problem. Assessment, Online first, 1-14. https://doi.org/10.1177/1073191116656437
* Lenhard, A., Lenhard, W. & Gary, S. (2019). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches. PLoS ONE, 14(9), e0222279. https://doi.org/10.1371/journal.pone.0222279
* Lenhard, W., Lenhard, A. & Schneider, W. (2017). ELFE II - Ein Leseverständnistest für Erst- bis Siebtklässler. Göttingen: Hogrefe.
* Lenhard, W. & Schneider, W. (2006). ELFE 1-6 - Ein Leseverständnistest für Erst- bis Sechstklässler. Göttingen: Hogrefe.
* Lenhard, W. & Lenhard, A. (2021). Improvement of Norm Score Quality via Regression-Based Continuous Norming. Educational and Psychological Measurement, 81(2), 229-261. https://doi.org/10.1177/0013164420928457
* Gary, S., Lenhard, W. & Lenhard, A. (2021). Modelling Norm Scores with the cNORM Package in R. Psych, 3(3), 501-521. https://doi.org/10.3390/psych3030033