Nuffield Early Language Intervention

A school-based program designed to improve children's vocabulary, narrative skills, active listening, and confidence in independent speaking.

Program Outcomes

  • Preschool Communication/ Language Development
  • School Readiness

Program Type

  • Academic Services
  • School - Individual Strategies
  • Skills Training

Program Setting

  • School

Continuum of Intervention

  • Selective Prevention

Age

  • Early Childhood (3-4) - Preschool
  • Late Childhood (5-11) - K/Elementary

Gender

  • Both

Race/Ethnicity

  • All

Endorsements

Blueprints: Promising

Program Information Contact

Denise Cripps
Executive Officer to the President
St. John's College, Oxford
OX1 3JP
denise.cripps@sjc.ox.ac.uk
Tel: 01 865 277456
Website:

Program Developer/Owner

Maggie Snowling
Professor
Department of Experimental Psychology and President
St. John's College
University of Oxford
United Kingdom


Brief Description of the Program

Nuffield Early Language Intervention is a 30-week language intervention program delivered in the final term in Nursery school (ages 3-4) and the first two terms in Reception class (age 5). The program comprises activities targeting spoken language skills for the first 20 weeks, supplemented for the final 10 weeks with training in two critical components of the alphabetic principle, letter-sound knowledge and phoneme awareness. A second 20-week version begins upon entry into primary school, rather than beginning in preschool.

Outcomes

Fricke et al. (2013)

The program significantly improved scores on:

  • Composite language (vocabulary, grammar and listening comprehension)
  • Narrative awareness
  • Phonological awareness
  • Reading comprehension

Sibieta et al. (2016)

At posttest:

  • Composite language score (vocabulary, grammar, and listening comprehension) significantly favored the intervention group for the 30-week intervention
  • Composite language score for the 20-week intervention was marginally significant

Fricke et al. (2017)

At posttest and delayed 6-month follow-up, as compared to children in control, children in treatment (both in the 30-week and 20-week intervention groups) showed greater improvements in:

  • A composite measure of language skills (vocabulary, grammar, and listening comprehension)

Dimova et al. (2020)

At posttest, children in the intervention condition, compared to the control condition, demonstrated significantly better:

  • Language skills
  • Reading ability

Brief Evaluation Methodology

A randomized controlled trial was conducted with 180 children from 15 nursery schools in Yorkshire (Fricke et al., 2013). From each school, 12 children with the lowest mean verbal composite scores were selected as participants in the trial. The waitlist control group received no additional service during the duration of the study. Children were assessed at baseline, posttest, and 6 months after the intervention.

A second evaluation was conducted with 394 children from 34 nursery schools in Yorkshire (Sibieta et al., 2016). From each school, approximately 12 children with the lowest mean verbal composite scores were selected as participants in the trial. Children were assigned to a 30-week treatment group, 20-week treatment group, or waitlisted control group. The study conducted assessments at baseline, posttest, and 6 months after completion of the intervention.

For a third evaluation, Fricke et al. (2017) conducted a replication and extension of the study by Fricke et al. (2013) that involved a randomized controlled trial with 394 children from 34 nursery schools in England (17 in Greater London and 17 in Yorkshire/Nottinghamshire). Within each school, children were assigned to the 30-week intervention (n=132), the 20-week intervention (n=133), or the waitlist control group (n=129). Outcomes were measured at baseline, posttest, and 6 months after completion of the intervention.

Dimova et al. (2020) further examined the program by randomly assigning schools across the United Kingdom to either receive the 20-week intervention program (n = 97) or continue instruction as usual (n = 96). Language and reading skills were assessed at baseline and posttest.

Protective Factors

School: Instructional Practice


* Risk/Protective Factor was significantly impacted by the program

Race/Ethnicity/Gender Details
No tests for program differences by race, ethnicity, or gender.

The training for teachers/teaching assistants to deliver the program is a two-day training course and covers the following:

  • Background to the Nuffield Early Language Intervention program
  • Oral language development (including listening and comprehension)
  • Oral language development (vocabulary, grammar and narrative)
  • Overview of the program
  • Discovering the Manual (including record-keeping and progress assessment tool)
  • Individual sessions
  • Letter sounds and phonological awareness
  • Record-keeping and assessment tools

The Reception program costs £400 and this includes the training.

Source: Washington State Institute for Public Policy
All benefit-cost ratios are the most recent estimates published by The Washington State Institute for Public Policy for Blueprint programs implemented in Washington State. These ratios are based on a) meta-analysis estimates of effect size and b) monetized benefits and calculated costs for programs as delivered in the State of Washington. Caution is recommended in applying these estimates of the benefit-cost ratio to any other state or local area. They are provided as an illustration of the benefit-cost ratio found in one specific state. When feasible, local costs and monetized benefits should be used to calculate expected local benefit-cost ratios. The formula for this calculation can be found on the WSIPP website.


Program Developer/Owner

Maggie Snowling
Professor
Department of Experimental Psychology and President
St. John's College
University of Oxford
United Kingdom

Program Outcomes

  • Preschool Communication/ Language Development
  • School Readiness

Program Specifics

Program Type

  • Academic Services
  • School - Individual Strategies
  • Skills Training

Program Setting

  • School

Continuum of Intervention

  • Selective Prevention

Program Goals

A school-based program designed to improve children's vocabulary, narrative skills, active listening, and confidence in independent speaking.

Population Demographics

Preschool children with poor language and literacy skills.

Target Population

Age

  • Early Childhood (3-4) - Preschool
  • Late Childhood (5-11) - K/Elementary

Gender

  • Both

Race/Ethnicity

  • All

Race/Ethnicity/Gender Details

No tests for program differences by race, ethnicity, or gender.

Other Risk and Protective Factors

Poor language and literacy skills

Risk/Protective Factor Domain

  • Individual

Risk/Protective Factors

Risk Factors

Protective Factors

School: Instructional Practice


*Risk/Protective Factor was significantly impacted by the program

Description of the Program

Nuffield Early Language Intervention is a 30-week language intervention program delivered in the final term in Nursery school (ages 3-4) and the first two terms in Reception class (age 5). The first 10 weeks involves three 15-min group sessions (2-4 children per group) per week delivered in preschool. This increases to three 30-min sessions plus two 15-min individual sessions in Reception class. A separate 20-week version skips the initial 10-week preschool portion and begins with the 30-minute sessions in primary school.

Children are taught using multi-sensory techniques within a standard framework. The oral language program aims to improve children's vocabulary, develop narrative skills, encourage active listening and build confidence in independent speaking. New vocabulary is selected with reference to themes common in Early Years' settings and includes nouns, verbs, adjectives, prepositions, pronouns and question words. Narrative work encourages expressive language and grammatical competence. Activities revolve around creating and acting out stories, sequencing and story elements. Listening skills are specifically targeted in the first 20 weeks during the Sound/Listening Game incorporating ideas from Letters and Sounds: Phase 1 (DfES, 2007). This section is extended in the last 10 weeks by activities to promote phoneme awareness (blending and segmenting) and letter-sound knowledge.

Theoretical Rationale

The program is based on research that learning to read builds on oral language skills, and that children must learn to decode print fluently and develop skills to understand what they read to become literate. In addition to phoneme awareness and letter knowledge, reading comprehension requires broader language skills.

Theoretical Orientation

  • Skill Oriented

Outcomes (Brief, over all studies)

Fricke et al. (2013)

There was a significant impact at posttest and 6 months after the intervention on a composite score assessing language, and on narrative and phonological awareness, and there was a significant impact on reading comprehension at 6 months. There was no impact on literacy (early word reading and spelling) at either assessment.

Sibieta et al. (2016)

At posttest for the composite language score, the study found a significant treatment effect for the 30-week intervention, driven mainly by the grammar measure and the expressive vocabulary measure. It found a marginal effect for the 20-week intervention.

Fricke et al. (2017)

At posttest and the delayed 6-month follow-up, results showed that children in treatment (both in the 30-week and 20-week interventions) demonstrated greater improvements in a composite measure of language skills (vocabulary, grammar, and listening comprehension) as compared to children in control. There was no impact on early literacy and reading comprehension.

Dimova et al. (2020)

At posttest, children in the treatment condition displayed significantly better language and reading skills compared to children in the control condition.

Mediating Effects

Language scores immediately after the intervention fully mediated reading 6 months after the intervention (Fricke et al., 2013).

Effect Size

Cohen's d ranged from .30 to .83 in Fricke et al. (2013), indicating small to large effects, and from .16 to .27 in Sibieta et al. (2016). Fricke et al. (2017) reported small effect sizes (d = .21 to .30). Dimova et al. (2020) reported small to medium effect sizes (.15 to .36).
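For readers unfamiliar with the statistic, the following is a minimal sketch of how a Cohen's d is commonly computed from two group means and a pooled standard deviation. The function name and the input values are hypothetical and are not taken from any of the studies; the studies themselves may have derived their effect sizes differently (for example, from latent variables in structural equation models).

```python
from math import sqrt

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)

# Hypothetical posttest scores for a treatment and a control group
d = cohens_d(mean_t=10.5, mean_c=10.0, sd_t=1.0, sd_c=1.0, n_t=90, n_c=90)
print(round(d, 2))  # 0.5
```

A value of 0.5 here means the treatment group scored half a pooled standard deviation higher than the control group.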

Generalizability

The studies reported little demographic information, limiting the ability to generalize their results. However, at least one study (Dimova et al., 2020) collected data from schools across the United Kingdom.

Potential Limitations

Fricke et al. (2013)

  • Lack of detail on baseline equivalence
  • No tests for differential attrition, though attrition was only about 8% and imputation was used

Sibieta et al. (2016)

  • No information on the reliability and validity of outcome measures, though they appear to be frequently used

Fricke et al. (2017)

  • Baseline equivalence not tested
  • Incomplete tests for differential attrition

Dimova et al. (2020)

  • Baseline equivalence not tested
  • No tests for differential attrition, though attrition rates were relatively low

References

Study 1

Certified: Fricke, S., Bowyer-Crane, C., Haley, A. J., Hulme, C., & Snowling, M. J. (2013). Efficacy of language intervention in the early years. Journal of Child Psychology and Psychiatry, 54(3), 280-290.

Study 2

Certified: Sibieta, L., Kotecha, M., & Skipp, A. (2016). Nuffield early language intervention: Evaluation report and executive summary. Education Endowment Foundation.

Study 3

Fricke, S., Burgoyne, K., Bowyer-Crane, C., Kyriacou, M., Zosimidou, A., Maxwell, L., ... & Hulme, C. (2017). The efficacy of early language intervention in mainstream school settings: A randomized controlled trial. Journal of Child Psychology and Psychiatry. Advance online publication. doi:10.1111/jcpp.12737

Study 4

Dimova, S., Ilie, S., Brown, E. R., Broeks, M., Culora, A., & Sutherland, A. (2020). The Nuffield Early Language Intervention: Evaluation Report. United Kingdom: Education Endowment Foundation.

Study 1

Evaluation Methodology

Design:

Recruitment: Nineteen nursery schools in Yorkshire (England) were involved at the outset of the study. In these nursery schools, all children who were due to enter school (Reception) in the following academic year were screened. Following screening, one school withdrew and three schools were deemed unsuitable. In each of the remaining 15 nursery schools, the 12 children with the lowest mean verbal composite scores were selected as participants in the trial.

Assignment: The 180 children from the 15 nursery schools were randomly allocated within each school to receive the 30-week language intervention (n = 90) or to a waiting control group (n = 90).

In addition, six children in each school, matched on gender and date of birth to a random sample of three children from the intervention and the waiting control groups, acted as a representative peer comparison group against which to benchmark the children's progress (n = 82).

Assessments and Attrition: Children were assessed before the intervention, immediately following the intervention, and 6 months after the intervention. A total of 7 children from the intervention group (7.8%) and 8 children from the control group (8.9%) moved schools and were lost to follow-up.
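The attrition percentages above follow directly from the reported counts and the 90 children assigned to each group. A short arithmetic check, with a hypothetical helper name:

```python
def attrition_rate(lost, assigned):
    """Percentage of assigned participants lost to follow-up."""
    return 100 * lost / assigned

intervention = attrition_rate(7, 90)   # intervention group
control = attrition_rate(8, 90)        # waiting control group
overall = attrition_rate(7 + 8, 180)   # both groups combined

print(f"{intervention:.1f}% {control:.1f}% {overall:.1f}%")  # 7.8% 8.9% 8.3%
```

The combined figure of roughly 8% matches the attrition level cited under Differential Attrition for this study.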

Sample Characteristics:
The mean age of children at baseline was 4 years. No other demographic information was provided.

Measures:
All measures had high reliability (0.75 to 0.99). The structural equation models used multiple indicators for each of the following four outcomes.

Language skills were measured with grammar and vocabulary information from the Renfrew Action Picture Test, vocabulary knowledge from the CELF Preschool II UK Expressive Vocabulary test, and listening comprehension from answers to questions about two short stories read to the child.

Narrative skills were measured using a story retelling task.

Phonological awareness was measured by indicators of alliteration matching and sound isolation.

Literacy skills were measured by an early word reading scale and spelling responses.

Additional measures focused on taught vocabulary using Expressive Picture Naming and Receptive Picture Selection and the Picture Naming and Definitions task. Reading comprehension, available only at the 6-month follow-up, used the YARC beginner passage.

Analysis: The authors used hierarchical linear models or structural equation models, with Maximum Likelihood Missing Value estimators to allow for missing data and robust standard errors to allow for the clustering of children within schools. The structural equation models included baseline outcomes as predictors.

Intent-to-Treat: The study did not follow children who moved schools, but structural equation models with missing values estimators used all 180 subjects.

Outcomes

Implementation Fidelity: The study reported no fidelity results, though the article states that fidelity was monitored: "teaching assistants attended regular tutorials and the research team observed each teaching assistant delivering intervention and provided feedback on five occasions. In addition, teaching assistants completed records of session plans, children's progress and attendance for each group and individual session."

Baseline Equivalence: The intervention and control groups were said to be approximately equated on all measures, but the study presented no d values or significance tests.

Differential Attrition: No tests were performed, perhaps because attrition was only about 8% and missing data were imputed.

Posttest: The intervention had significant and beneficial effects on language skills (d = .80), narrative skills (d = .39), and phonological awareness (d = .49). The effects on literacy (early word reading and spelling) were not significant.

6-month Follow-up: The above posttest effects were maintained at follow-up: language skills (d = .83), narrative skills (d = .30), and phonological awareness (d = .49).

Also, at 6 months, there was a significant effect on reading comprehension (marginal mean group difference = 0.91, 95% CI 0.42-1.41, p < .001). This was found to be fully mediated by language comprehension abilities at posttest.
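As a rough consistency check on the reported interval, the standard error and p-value can be approximately recovered from the 95% confidence interval. This sketch assumes a symmetric normal-approximation interval, which is an assumption here; the study's model-based estimates may differ slightly.

```python
from math import erfc, sqrt

# Reported marginal mean group difference and 95% CI for reading comprehension
diff, lower, upper = 0.91, 0.42, 1.41

# Under a normal approximation the CI spans about 2 * 1.96 standard errors
se = (upper - lower) / (2 * 1.96)
z = diff / se
p = erfc(abs(z) / sqrt(2))  # two-sided p-value for a standard normal z

print(f"SE = {se:.3f}, z = {z:.2f}, p < .001: {p < 0.001}")
```

Under these assumptions the recovered p-value falls below .001, in line with the reported result.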

Additional tests on the vocabulary taught by the program showed higher scores for the intervention group in reception classes but not in nursery classes.

Long-term effects: Not evaluated.

Study 2

Evaluation Methodology

Design:

Recruitment: The study recruited primary schools with attached nurseries that were located in disadvantaged areas of Yorkshire, England. A total of 34 schools were included in the study out of 302 approached. Within selected schools, children were screened using a composite measure of language skills including vocabulary and sentence structure. The 12 children with the lowest scores were invited to participate, subject to parent consent.

Assignment: The randomization process allocated pupils within each nursery to two treatments and one control group and minimized differences across groups in terms of age, gender, and pretest scores using an iterative optimization process. Assignment was conducted at the individual level. Of the 394 assigned students, 132 were in the 30-week treatment, 133 in the 20-week treatment, and 129 in the control group. Participating schools were offered alternate early language development programs after completion of the intervention for waitlisted control participants.

Assessments and Attrition: Assessments were conducted at pretest, posttest, and 6-month follow-up. Of 394 students enrolled, 350 remained at the 6-month follow-up. Of the 34 schools enrolled, 3 left the program before completion. Reasons for attrition included changing schools (n = 34) and not completing one of the assessments (n = 10). In addition, some of the moderator variables gathered from a national database were available for a sample of only 239.

Sample Characteristics: The sample was approximately half female (49%), with an average age of 46.1 months. Given the targeting of disadvantaged schools, about 29% of the students qualified for free school meals and 16% were learning English as an additional language.

Measures: The study distinguished primary and secondary measures. The measures were gathered by research assistants blind to condition. Although the study reported no information on validity or reliability, it appears that the measures are well standardized and commonly used.

For the primary outcome, the study used a composite language measure that consisted of four components: information scores and grammar scores from the Renfrew Action Picture Test (APT), which asks students to describe a set of pictures; expressive vocabulary from the CELF-Preschool 2 UK test; and a listening comprehension test using short stories.

For the secondary outcome, the study used a word-level literacy composite measure consisting of three components: letter-sound knowledge, early word reading, and spelling.

Analysis: The study used fully-interacted linear matching, which linearly interacts the treatment effect with all pre-treatment characteristics and outcomes. The models controlled for gender, age, English as a second language, known speech or language difficulties, and pre-treatment scores for the language composite assessment. They also adjusted for school-level clustering in the estimation of standard errors (and the results were checked with several other estimation techniques in Appendix C).

Intent-to-Treat: The analysis included all cases with complete data. Three schools dropped out of the intervention condition, but their students were followed and included in the analysis. Only students who left their schools or did not complete an assessment were excluded.

Outcomes

Implementation Fidelity: Three schools dropped the program and 5 of the other 31 schools showed significant deviation related to both the structure and delivery of the program. However, the study stated that overall the delivery of the program structure and session components were generally in line with the prescribed model. On average, students attended 80% of classroom sessions and 56% of individual sessions.

Baseline Equivalence: Tests for baseline differences across conditions (Table 9) used the analysis sample of 350 rather than the randomized sample of 394. There were no significant differences between conditions for demographic and outcome measures. Measures based on the subsample of 239 used in the moderation tests did show some differences, however.

Differential Attrition: Tests for baseline equivalence of the analysis sample, which excluded dropouts, indicated that attrition did not compromise the balance between conditions.

Posttest: At posttest, the 30-week intervention group showed significantly more improvement in the primary measure of the composite language score as compared to the control group. This improvement was driven by significant improvement in grammar scores and expressive vocabulary. The 20-week intervention group showed marginal improvement in the language composite score.

For the secondary measure of composite word-level literacy, neither the 30-week nor the 20-week program had significant effects.

The study conducted a 6-month follow-up, finding that the composite language score but not the composite literacy score was significantly improved among both the 30-week and 20-week intervention groups. However, in the intervening 6 months, some schools implemented other reading programs at varying times, which complicates results.

Finally, tests for moderation suggest that the program was most effective for students without known speech and language difficulties or students learning English as an additional language.

Long-Term: The study did not conduct long-term follow-up.

Study 3

Evaluation Methodology

Design:

Recruitment: A total of 302 primary schools with attached nurseries in generally disadvantaged areas were approached, and 34 schools agreed to participate. All children in these nurseries who were expected to enter school (Reception in England) the following academic year were screened. Children in need of special education and children learning English as an additional language were not included in the screening. Up to 12 children were selected within each school using the following criteria: (a) having the lowest mean verbal composite scores in their school, and (b) entering the same primary school they attended for nursery. A total of 394 students met these criteria.

Assignment: Within each school, children were assigned to either the 30-week intervention (n=132), 20-week intervention (n=133), or the waitlist control group (n=129). While the control group received business-as-usual, after the posttest they were given permission to deliver additional language and literacy support provided by the research team (which was different than the Nuffield Early Language Intervention). By the 6-month follow-up, 8 control schools had implemented this training (although the specific nature, quality and intensity varied widely) while the other 19 schools continued with business-as-usual.

Attrition: Overall attrition rates were 7% at the posttest and 16% at the 6-month follow-up.

Sample: Information about sample characteristics was not provided.

Measures:

The measures were gathered by research assistants blind to condition. The primary outcome, language skills, was assessed using a latent measure that consisted of the following six assessments:

  • Expressive vocabulary knowledge was measured using the CELF Expressive Vocabulary subtest (alpha = .82) and the Information Score from the Renfrew Action Picture Test (APT; interrater reliability = .83).
  • Receptive vocabulary skills were assessed using the BPVS (alpha = .91).
  • Grammatical skills were measured using the CELF Sentence Structure subtest (alpha = .78) and the APT Grammar Score (interrater reliability = .89).
  • Listening comprehension skills were tested by asking children to listen to two short stories adapted from the York Assessment of Reading for Comprehension (YARC) and answer questions about them (interrater reliability = .99).

Secondary outcomes were early literacy skills and reading comprehension. Early literacy skills were assessed using a latent measure that consisted of two assessments: the YARC letter-sound knowledge subtest (alpha = .95) and the YARC early word reading subtest (alpha = .98). Reading comprehension was measured using the two beginner passages from the YARC passage reading test (alpha = .77).

Analysis: Structural equation models (SEM) were constructed using Mplus 7.4 with Full Information Maximum Likelihood estimators to allow for missing data and robust (Huber-White) standard errors to allow for the clustering of children within schools. Although baseline outcomes were adjusted in these models, it appears that the models did not control for demographic variables. In this model, the unstandardized regression weights from the language pretest to the two language post-test factors were fixed to be equal.

Intent-to-Treat: The authors argued that all analyses were performed on an intention-to-treat basis.

Outcomes

Implementation Fidelity: Teaching assistants delivered on average 28/30 group sessions in nursery and 49/57 group sessions in reception for the 30-week intervention group. For the 20-week intervention group, teaching assistants delivered on average 49/57 group sessions in reception. The number of sessions each child attended varied considerably (30-week group: nursery group sessions M = 24.69, reception group sessions M = 38.51, individual sessions M = 21.91; 20-week group: reception group sessions M = 41.11, individual sessions M = 23.01). Some teaching sessions were also observed to assess treatment fidelity. The quality of teaching of different session components was graded on a 5-point scale with the manual instructions as a reference point (1 = several aspects missing/not satisfactory, 2 = some aspects missing/not satisfactory, 3 = according to manual, 4 = according to manual with good use of resources/questions/techniques to support language, 5 = according to manual with very good use of resources/questions/techniques). On average, teaching assistants achieved a mean quality rating of 2.83 for group session observations in nursery, 2.95 in the first 10 weeks in reception, and 3.20 in the second 10 weeks in reception. Fidelity and quality ratings for individual sessions tended to be lower than for group sessions (first 10 weeks in reception: M = 2.74, second 10 weeks: M = 2.83).
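To put the attendance means in context, they can be expressed as a share of the group sessions planned. The sketch below assumes the denominators in "28/30" and "49/57" are the planned numbers of group sessions; the planned number of individual sessions is not stated, so those are left out. The variable and function names are illustrative.

```python
# (mean group sessions attended per child, group sessions planned)
group_sessions = {
    "30-week, nursery": (24.69, 30),
    "30-week, reception": (38.51, 57),
    "20-week, reception": (41.11, 57),
}

def attendance_share(attended, planned):
    """Mean share of planned sessions a child attended, in percent."""
    return 100 * attended / planned

shares = {
    name: round(attendance_share(attended, planned), 1)
    for name, (attended, planned) in group_sessions.items()
}
for name, share in shares.items():
    print(f"{name}: {share}%")
```

On these assumptions, children attended on average roughly 82% of planned nursery group sessions and between about 68% and 72% of planned reception group sessions.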

Baseline Equivalence: Baseline equivalence was tested only on outcome measures without significance tests.

Differential Attrition: Not conducted.

Posttest: At the posttest and the 6-month follow-up, Fricke et al. (2017) reported that children in treatment (both in 20-week and 30-week intervention groups) showed greater improvements in language skills than children in control. Effect sizes were slightly larger for the 30-week intervention group. There were no significant differences between the two treatment groups. Also, there were no significant differences between treatment and control in secondary outcome measures assessing early literacy and reading comprehension.

The interaction between condition and pretest scores was not significant, which suggests that children with the most severe language difficulties at pretest responded to the intervention to a similar degree as children with less severe difficulties.

Long-Term: Not conducted.

Study 4

Evaluation Methodology

Design:

Recruitment: In the summer of 2018, 1,100 schools were approached to participate in the study, and 207 schools expressed interest in taking part in the trial. A total of 193 schools that contained 240 reception classrooms agreed to participate in the study, which was conducted during the 2018-2019 academic year. The schools were recruited from 13 geographic regions across the UK with an emphasis on recruiting a balance of rural and urban schools. To be eligible, schools needed to have not implemented the program before, have above average free school meal eligibility, and agree to implement their assigned condition. From each classroom, the five children with the lowest scores on a composite language skills measure (LanguageScreen) were recruited for the study (total children N = 1,156).

Assignment: Using a stratified cluster randomized controlled trial design, schools were randomly assigned to either the treatment condition (n = 97 schools, n = 585 children) or business as usual control condition (n = 96 schools, n = 571 children). Schools had a 50:50 chance of being assigned to each condition within each geographic cluster. Stratification variables were geographic area and number of classes within a school. The treatment condition received the program. The control condition proceeded with business as usual instruction. However, as an incentive, control schools received £1,000 for their participation.
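The assignment procedure described above can be illustrated with a minimal sketch. It is not the trial's actual randomization code, which the report summary does not reproduce. It assumes each stratum is a (region, number of classes) pair and that an odd-sized stratum is split by a coin flip; the function name `assign_schools`, the school identifiers, and the example data are all hypothetical.

```python
import random
from collections import defaultdict


def assign_schools(schools, seed=0):
    """Assign whole schools (clusters) to conditions within each stratum.

    `schools` is a list of (school_id, region, n_classes) tuples.
    The stratum is assumed to be the (region, n_classes) pair.
    """
    rng = random.Random(seed)

    # Group schools by stratum.
    strata = defaultdict(list)
    for school_id, region, n_classes in schools:
        strata[(region, n_classes)].append(school_id)

    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        half = len(ids) // 2
        # Assumption: in an odd-sized stratum, a coin flip decides
        # which arm receives the extra school.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            half += 1
        for i, school_id in enumerate(ids):
            assignment[school_id] = "treatment" if i < half else "control"
    return assignment


# Hypothetical example: 8 schools, 2 regions, 1 or 2 reception classes each.
example = [(f"S{i}", "North" if i < 4 else "South", 1 + i % 2) for i in range(8)]
result = assign_schools(example, seed=42)
for school_id in sorted(result):
    print(school_id, result[school_id])
```

Because the school is the unit of assignment, every sampled child in a school shares that school's condition, which is why the analysis later has to account for clustering.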

Assessments/Attrition: Children were assessed at pretest (prior to randomization) and at posttest (immediately after the 20-week program was complete). Overall, 0.5% of schools (n = 1) and 7% of students (n = 85) did not complete the posttest.

Sample: The only individual-level demographic variables described in the report were gender, with both conditions having slightly more boys than girls (57.4% and 53.2%), and age, with both conditions having average ages around 52 months. At the school level, the intervention and control groups had similar percentages of students eligible for free school meals (34.05% and 33.81%). Both groups had similar percentages of schools rated as "inadequate" by the Office for Standards in Education (OFSTED; 4.76% in the intervention group, 4% in the control group). The control group had more schools rated as "outstanding" (18.67% vs. 13.1%) or "requires improvement" (13.33% vs. 4.76%) than the intervention group, while the intervention group had more schools rated as "good" (77.38% vs. 64%).

Measures:

The measures were gathered by research assistants blind to condition. The primary outcome, language skills, was assessed using a latent measure that consisted of the following four language tests:

• the Clinical Evaluation of Language Fundamentals (CELF) recalling sentences subtest (Wiig et al., 2006), a test where children are asked to repeat a sentence back to the person carrying out the assessment;

• the CELF expressive vocabulary subtest, where children are asked to name objects or actions depicted in a set of images;

• the Renfrew Action Picture Test (RAPT) information subtest, where children are asked to describe the information shown in a set of pictures; and

• the RAPT grammar subtest (Renfrew, 2016), which checks the grammar used by children, such as the use of verb tenses, while they describe the information shown in a set of pictures (as part of the RAPT information subtest above).

The York Assessment of Reading for Comprehension (YARC) early word reading test (Snowling et al., 2009), which assesses pupils' single-word reading ability, was also administered. Participants completed the LanguageScreen measure at posttest as well; it captured expressive vocabulary, receptive vocabulary, sentence repetition, and listening comprehension and was scored as a composite variable. Unlike the primary outcome measure, neither of these secondary outcome measures was constructed as a latent variable.

Analysis: First, using structural equation modelling (SEM), factor scores were created for the primary latent variable language skills outcome at pretest and posttest. Intervention effects on the primary and secondary outcomes were examined using multilevel models, which accounted for clustering of students nested within schools. Models controlled for school-level stratification variables (geographic location and number of classes per school), as well as student baseline scores.
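As an illustration of the kind of model this describes, a two-level random-intercept specification could be written as follows. This is a sketch under assumptions, not the equation from the report: the summary does not state the exact specification, and all symbols here are introduced for illustration.

```latex
Y_{ij} = \beta_0 + \beta_1 T_j + \beta_2 X_{ij}
       + \boldsymbol{\gamma}^{\top} \mathbf{S}_j + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here $Y_{ij}$ is the posttest score of child $i$ in school $j$, $T_j$ indicates whether school $j$ was assigned to the intervention, $X_{ij}$ is the child's baseline score, $\mathbf{S}_j$ holds the school-level stratification variables (geographic location and number of classes), and $u_j$ is a school-level random intercept that captures the clustering of children within schools. Under this form, $\beta_1$ is the adjusted intervention effect.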

Intent-to-Treat: Outcomes were analyzed using an intent-to-treat approach. The authors stated that no missing data imputation was used given complete data on the primary outcome measure (p. 25).

Outcomes

Implementation Fidelity: The researchers calculated a compliance measure based on the share of eligible school staff attending NELI training, the proportion of group NELI sessions delivered, and the number of individual NELI sessions delivered. Based on these measures, only 11 schools were rated as "high compliers," indicating differences in the quality of program implementation. Data on teacher attendance and an externally recorded count of completed program sessions were not available.

Baseline Equivalence: The researchers did not perform formal tests to assess baseline equivalence, stating that they followed CONSORT guidelines and provided baseline descriptive characteristics in Table 12. Other than school-level "OFSTED" ratings, intervention and control condition baseline characteristics appeared similar for both school- and individual child-level variables. Following WWC guidelines, they did provide effect sizes for the baseline differences for outcome measures and all differences were small (g < .128).

Differential Attrition: Although overall attrition was relatively low, there were no formal tests for differential attrition. At the student level, there was 5.3% (n = 30) attrition in the control condition and 9.4% (n = 55) attrition in the intervention condition.
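The reported attrition percentages can be reproduced from the counts given in this write-up (585 intervention and 571 control children randomized; 55 and 30 lost at posttest). The short sketch below is descriptive arithmetic only. It is not a formal test of differential attrition, which the study did not report; the variable and function names are illustrative.

```python
# Counts as reported in the Assignment and Differential Attrition paragraphs.
randomized = {"intervention": 585, "control": 571}
lost = {"intervention": 55, "control": 30}


def attrition_rate(n_lost, n_randomized):
    """Return attrition as a percentage of those randomized."""
    return 100.0 * n_lost / n_randomized


rates = {arm: attrition_rate(lost[arm], randomized[arm]) for arm in randomized}
overall = attrition_rate(sum(lost.values()), sum(randomized.values()))
differential = rates["intervention"] - rates["control"]

for arm, rate in rates.items():
    print(f"{arm}: {rate:.1f}%")            # intervention: 9.4%, control: 5.3%
print(f"overall: {overall:.1f}%")            # overall: 7.4%
print(f"differential: {differential:.1f} percentage points")  # 4.1
```

The overall figure of 7.4% matches the roughly 7% reported under Assessments/Attrition, and the gap between arms is about 4 percentage points.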

Posttest: At posttest, primary outcome analyses indicated that intervention school students demonstrated significantly better language skills (as assessed by the latent language skills variable) than control school students (g = .26). For secondary outcomes, intervention school students also demonstrated significantly better single-word reading ability (as assessed by the YARC early word reading test, g = .15) and significantly better language skills (as measured by the LanguageScreen, g = .36).

Long-Term: Not conducted.

Contact

Blueprints for Healthy Youth Development
University of Colorado Boulder
Institute of Behavioral Science
UCB 483, Boulder, CO 80309

Email: blueprints@colorado.edu

Blueprints for Healthy Youth Development is currently funded by Arnold Ventures (formerly the Laura and John Arnold Foundation) and historically has received funding from the Annie E. Casey Foundation and the Office of Juvenile Justice and Delinquency Prevention.