from ERS SPECTRUM, Winter 2005, Educational Research Service, Arlington, VA

Can Professional Development Programs Help Close the Achievement Gap?

by C. Jayne Brahler, William L. Bainbridge, and Margaret Stevens

This paper explores the question of whether it is possible to design professional development programs for teachers that can significantly improve student test results and reduce the achievement gap for students.

The Dayton (Ohio) Foundation and the Montgomery County (Ohio) Educational Service Center, sponsors of The Miami Valley Teacher/Leadership Academy, answer this question with a resounding yes. Recent results indicate that the academy’s two-year program, designed to provide professional development to help improve student test scores, not only succeeded in significantly improving the student proficiency mean score, but also considerably reduced the achievement gap for participating students.

The paper includes two parts. First, the authors discuss how to measure the achievement gap and assess its changes in individual classrooms. Second, they report the findings of an original research project relative to affecting the achievement gap.


The Miami Valley Teacher/Leadership Academy began to implement its program in September 2001 via a contract with the Center for Performance Assessment and 12 participating school systems. The program was designed by Dr. Douglas Reeves, a consultant and innovator in the field of educational assessment and accountability systems. His book Making Standards Work documents the MSW program, which has been used throughout the country as a guide for standards and performance assessment implementation. The MSW program meets the federal guidelines of high-quality professional development. It is scientifically research-based and focused on student achievement, and it follows a continuum from knowledge and skills to supported job-embedded practice and reflection.

The academy incorporated a research component into its overall design to assess whether student achievement would improve as a result of the professional development. A sample of 12 teachers serving 291 students in grades 3, 4, and 5 was studied. The research revealed two major findings: first, the gain in overall mean test score from pre- to post-test was significantly higher for students of teachers who attended all eight sessions than for students of teachers who attended two to four sessions; second, the achievement gap closed for the experimental group.

While the increase in proficiency test scores was expected, the closing of the achievement gap was a surprise and thus merits closer scrutiny. How can individual teachers determine whether they are making an impact on the achievement gap? Is it easily measured within the individual classroom? Is there a formula for closing the achievement gap? What types of activities are associated with closing the achievement gap?

The Achievement Gap

For the individual classroom teacher, the achievement gap resembles a huge, gaping crevasse that separates the mean proficiency test scores of lower- and higher-achieving students. The goal is to narrow this gap at the national, state, and district levels. Every shred of progress toward narrowing the achievement gap must begin in the individual classroom.

The preceding paragraph rests on a common misconception about the achievement gap. Reread this sentence: "For the individual classroom teacher, the achievement gap resembles a huge, gaping crevasse that separates the mean proficiency test scores of lower- and higher-achieving students." But the phrase "achievement gap" does not refer only to mean proficiency test scores. A mean is a measure of central tendency; it says nothing about how far individual student performances deviate from that mean. Measures of central tendency therefore do not provide information about a disparity in student achievement, yet the term "achievement gap" speaks directly to remedying such a disparity.

An effective way to understand this concept is to consider a classroom of students for whom the mean test score is 75 percent. That may seem acceptable, until further investigation reveals that few students are performing near 75 percent. Instead, most students fit into one of two categories: falling below 30 percent or performing above 90 percent. This represents a considerable achievement gap and also means that student scores are greatly varied. Narrowing this variance equates to narrowing the achievement gap. Everyone would like to raise mean test scores, but it is equally important to decrease the disparity between student performances.
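To make this concrete, the following minimal Python sketch compares two invented classrooms that share a mean of 75 percent. The score lists and variable names are illustrative assumptions, not data from the study; the point is only that identical means can hide very different spreads.

```python
from statistics import mean, pstdev

# Hypothetical scores (percent correct). Both classrooms average exactly 75.
clustered = [70, 72, 74, 75, 75, 76, 78, 80]   # most students near the mean
polarized = [25, 27, 91, 91, 91, 91, 92, 92]   # students below 30 or above 90

for name, scores in (("clustered", clustered), ("polarized", polarized)):
    # pstdev treats the classroom as the whole population of interest.
    print(f"{name}: mean = {mean(scores)}, standard deviation = {pstdev(scores):.1f}")
```

The mean alone cannot tell these two classrooms apart. The standard deviation can: it is many times larger for the polarized classroom.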

That said, how many teachers deal regularly with "variances" and are comfortable with assessing changes in variance? Most people are more familiar with the standard deviation, a statistic that measures the dispersion or variation in a distribution. It is the square root of the variance, which in turn is the average squared deviation from the arithmetic mean. The standard deviation can therefore be used as the measure of the spread in student scores within a classroom. For the purpose of assessing changes in the achievement gap within classrooms, the standard deviation at the beginning of the year can be compared with the standard deviation at the end of the year. When the changes in the standard deviation are collected across several classrooms, one can gain information about the combined effect of those classrooms on the achievement gap (the disparity in student performances).
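The beginning-of-year versus end-of-year comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the classroom names, the scores, and the helper `sd_change` are hypothetical, and the population standard deviation is used because each classroom is treated as the full group of interest.

```python
from statistics import mean, pstdev

def sd_change(pre_scores, post_scores):
    """End-of-year standard deviation minus beginning-of-year standard deviation."""
    return pstdev(post_scores) - pstdev(pre_scores)

# Hypothetical classrooms: (beginning-of-year scores, end-of-year scores).
classrooms = {
    "Room A": ([40, 55, 70, 85, 95], [60, 68, 75, 86, 96]),
    "Room B": ([50, 60, 70, 80, 90], [45, 60, 72, 85, 95]),
}

# A negative change means scores moved closer together (the gap narrowed).
changes = {name: sd_change(pre, post) for name, (pre, post) in classrooms.items()}
average_change = mean(list(changes.values()))

for name, change in changes.items():
    print(f"{name}: change in standard deviation = {change:+.2f}")
print(f"Average change across classrooms = {average_change:+.2f}")
```

In this invented data the spread shrinks in Room A and grows in Room B. Averaging the changes gives one rough indicator of the combined effect across classrooms.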

The distribution pattern of a set of scores can be represented graphically and provides information about two parameters: the mean and the standard deviation. There are five basic distribution patterns: symmetric, bimodal, skewed, flat, and outliers. Normal distributions are part of this family of distributions. They are symmetrical, with scores concentrated more in the middle than in the tails. Normal distributions also are known as "bell curves." They suggest that the majority of student scores should cluster around a middle score, with equal numbers of scores falling above and below that midpoint. The bell curve has been criticized because it suggests that a "mid-level" performance should predominate in any group of students. In fact, for many years teachers were encouraged to adjust student scores to fit the bell curve so that the majority of scores were in the mid-level range.

Standards-based assessment, in contrast to the bell curve, sets a high-level criterion performance as the goal for all students. The concept of wanting all students to achieve high-level performance is both similar to and different from the normal distribution, or bell curve. The majority of student scores still are clustered tightly together around a mean score, which equates to closing the achievement gap. The mean student score, however, no longer must be at a mid-point performance. Instead, it ideally will be at a high level, which equates to most students meeting that standard.

Let’s look at these concepts graphically. We will focus on symmetrical, bimodal and skewed distribution patterns because they graphically represent: 1) no achievement gap, with mid-level mean performance; 2) significant achievement gap, with bimodal distribution; and 3) no achievement gap, with high-level mean performance.





Figure 1: This frequency plot depicts a symmetric (bell curve) pattern. Student scores are clustered evenly about a mid-level mean, and the standard deviation is relatively small.


Figure 2: This frequency plot depicts a bimodal distribution of scores. Student scores are not centered on the mean, and a large disparity exists between the scores for low-achieving children and those for high-achieving children. Such a data set has a large standard deviation and indicates a large achievement gap.

Figure 3: This figure depicts a distribution of student scores with a high-level mean score, with student scores clustered tightly about that mean. This type of distribution is called a "skewed" distribution. Specifically, the scores are concentrated toward the right, at the higher end of the test score scale. (Statisticians name skew by the direction of the longer tail, so a distribution whose tail extends toward lower scores is formally negatively skewed, or skewed to the left.) The figure represents an ideal standards-based distribution of student scores because of the high-level mean performance with no gap in student achievement. The standard deviation is smaller because there is less variance between student scores.

Anything that moves a test score distribution away from a widely spread or bimodal pattern and toward a single tight cluster will decrease the standard deviation, and therefore the variance, of student scores. Reducing the variance between student scores is a tangible example of closing an achievement gap.


Closing the Gap

In this section we provide an original example of how a professional development activity, sponsored by the Miami Valley Teacher/Leadership Academy, affected the achievement gap for children in grades 3, 4, 5, and 6 at participating schools.

The evaluation project covered several components of the academy but focused primarily on MSW. The study was funded by Dayton community businesses. Miami University’s Applied Research Department collected data from July 2003 through July 2004, and faculty and staff from Miami University and The University of Dayton completed all survey and data analyses and interpretations.

The study’s main finding was that the MSW professional development activity was successful in closing the achievement gap in all subject areas of the Ohio Proficiency Test (citizenship, math, reading, science, and writing). The study examined the achievement gaps of students taught by teachers who had participated in the MSW sessions compared with those of students taught by non-participating teachers. Two performance measures were selected to indicate the effect the program had on student performance: the gain in proficiency scores from pre- to post-test and the change in test score distribution from pre- to post-test.

Twelve teachers who had no prior exposure to the MSW curricula were selected to participate in the study. All 12 teachers volunteered for the study and were provided the opportunity to participate in a total of eight MSW sessions over the course of 12 months. The sessions covered: Making Standards Work; Power Standards; Designing Performance Assessments; and Unwrapping Standards. Teachers were surveyed at the end of the study to determine the number of sessions they had attended and were divided into groups accordingly. The "control" group attended four or fewer sessions, while the "experimental" group attended all eight sessions. All sessions were considered to be of equal importance.

The 12 teachers instructed a total of 291 students. The Ohio Proficiency Test score data (grades 4 and 6) and the off-year proficiency test score data (grades 3 and 5) were the performance measures used to assess the effectiveness of the MSW sessions. Student test scores were converted to three levels of performance, namely, Below Proficient (0), Proficient (1), and Advanced Proficient (2). Student gain scores were calculated by subtracting pre-test scores from post-test scores. Results are depicted in figure 4 below.


Figure 4: Test score gain is significantly different based on total number of MSW sessions completed.

The chart reveals the gain in student achievement from pre- to post-test for students of teachers who attended eight MSW sessions compared with that of students of teachers who attended two to four sessions. Students of teachers who attended eight sessions showed a mean gain of .71 points compared with a mean gain of .31 for students of teachers who attended two to four sessions. In other words, on average, students of teachers who attended eight MSW sessions advanced nearly three-quarters of a performance level, the equivalent of roughly seven in ten students moving up from Below Proficient to Proficient or from Proficient to Advanced Proficient.
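The gain-score calculation can be sketched as follows. This is an illustrative Python example only: the student records are invented, the helper names `gain_score` and `mean_gain` are hypothetical, and the step that converts raw test scores into the three levels is omitted because the article does not give the cut scores.

```python
from statistics import mean

# Level coding as described in the article.
LEVELS = {"Below Proficient": 0, "Proficient": 1, "Advanced Proficient": 2}

def gain_score(pre_level, post_level):
    """Post-test level minus pre-test level; ranges from -2 to +2."""
    return LEVELS[post_level] - LEVELS[pre_level]

# Hypothetical records: (sessions attended by the teacher, pre-test level, post-test level).
records = [
    (8, "Below Proficient", "Proficient"),
    (8, "Proficient", "Advanced Proficient"),
    (8, "Proficient", "Proficient"),
    (4, "Below Proficient", "Below Proficient"),
    (4, "Proficient", "Advanced Proficient"),
    (2, "Advanced Proficient", "Proficient"),
]

def mean_gain(records, session_counts):
    """Mean gain for students whose teachers attended one of the given session counts."""
    gains = [gain_score(pre, post)
             for sessions, pre, post in records
             if sessions in session_counts]
    return mean(gains)

experimental_gain = mean_gain(records, {8})      # teachers who attended all eight sessions
comparison_gain = mean_gain(records, {2, 4})     # teachers who attended two to four sessions
print(f"Eight sessions: {experimental_gain:.2f}; two to four sessions: {comparison_gain:.2f}")
```

Because each level is one point, a mean gain can be read directly as the average number of performance levels gained per student.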


Summary and Discussion

Figure 5 is a vertical box plot of the gain in student scores for students whose teachers attended zero, two, four, or eight MSW sessions. On the y-axis, each increment of one represents a shift up or down between the three categories of "below-proficient," "proficient," and "advanced proficient." The x-axis groups the data according to the number of sessions attended. Each of the horizontal lines in the box plot represents an important number related to the data set. The top and bottom lines are drawn at the highest and lowest data values, respectively, representing the range. The three lines that form the box are drawn 25 percent, 50 percent, and 75 percent of the way through the data. These five numbers (the least, 25 percent, 50 percent, 75 percent, and the greatest) form the five-number summary and provide valuable information about the range and distribution of student scores.
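The five-number summary behind a box plot can be computed with Python's standard library. The sketch below is illustrative only: the gain scores are invented, the function name `five_number_summary` is hypothetical, and the "inclusive" quartile method is one of several common conventions (the article does not say which one its software used).

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (least, 25 percent, 50 percent, 75 percent, greatest)."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical gain scores for one group of students (each unit is one performance level).
gains = [-1, 0, 0, 0, 1, 1, 1, 2]
summary = five_number_summary(gains)
print(summary)
```

For this invented group, the middle line of the box (the 50 percent point) falls halfway between a gain of zero and a gain of one level.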

Figure 5


This box plot represents the data collected in the current study. It shows that for the teachers who attended eight sessions, the median test score gain (the 50 percent line) for their students was 0.5, or halfway from their baseline category to the next higher category, compared with a median gain of zero for the students whose teachers were in the other groups.

Many factors, such as the number of years of teaching experience or the individual content area, can influence student test score performance. In this study, analysis of covariance (ANCOVA) tests were used to determine whether there was a significant difference in student gain depending on teaching experience or subject area (citizenship, math, reading, science, and writing). In other words, in addition to checking for significant differences in performance for students whose teachers attended two to four versus eight MSW sessions, we further checked to make certain that the difference in student performance was not actually due to teaching experience or subject area. The ANCOVA tests confirmed that there was not a significant difference in gain in student achievement based on either of these two potentially confounding variables and that there was a significant difference based on teacher participation in the MSW sessions.

The finding that full participation in all eight MSW sessions was needed to produce these gains is significant and validates the return on the academy’s investment. Additionally, per our previous discussion, this finding is especially significant when one considers the pattern of variance demonstrated by the test scores of the control and experimental students. The average test score standard deviation increased for the control group (+.13) but decreased for the experimental group (-.06). In other words, the disparity across student performances was lessened (the achievement gap was closed) for the students whose teachers had attended all eight MSW sessions. At the same time, the overall mean test scores increased, more so for students of teachers who attended eight sessions than for students of teachers who attended two or four sessions.



When the data from figures 4, 5, and 6 are combined, we have a graphical depiction of a distribution of students’ scores with a mean score that rose following the teachers’ participation in eight MSW sessions. Individual student scores are clustered tightly about that mean. This type of distribution resembles the skewed distribution pattern discussed previously (figure 3), which represents higher test score performances and a narrowed achievement gap. Recall that this distribution pattern depicts an ideal standards-based distribution of student scores because of the high-level mean performance with no gap in student achievement. The standard deviation is smaller because there is less variance between student scores.

One purpose of this article was to share the method for assessing the achievement gap using the distribution of student test scores. The study utilized this method, and its main objective was to determine whether a specific professional development activity (in this case, MSW) was successful in closing the achievement gap for students in classrooms of teachers who participated significantly, compared with students of teachers who did not. Both performance measures, the gain in proficiency test scores from pre- to post-test (figures 4 and 5) and the change in test score distribution from pre- to post-test (figure 6), indicate the program was successful in closing the achievement gap, but significant participation was required to have the greatest effect on test score improvement.

Explaining why the disparity in student test score performance decreased for students whose teachers attended eight MSW sessions, but not for those whose teachers attended two to four, would require further study. It is reasonable to assume, however, that teachers who completed eight sessions benefited from high-quality, job-embedded professional development with content focused on student achievement and incorporating scientifically based research. The study shows that as teachers moved forward on the professional development continuum, student achievement improved and the disparity in student performance was significantly reduced.


While limited to a single study, these research findings indicate that high-quality professional development can help to close the achievement gap. A critical amount of professional development may be required, however, before effectiveness reaches a measurable level in terms of student achievement.

This study also shows that principals and teachers can and should monitor student progress not only by looking at mean scores, but also by studying variance to assess its effect on the achievement gap.


C. Jayne Brahler, PhD, is a faculty member at the University of Dayton. William L. Bainbridge is a distinguished research professor at the University of Dayton and president and CEO of SchoolMatch, a Columbus, Ohio-based educational consulting, data, and research firm. Margaret Stevens is assistant superintendent of the Montgomery County Educational Service Center in Dayton, Ohio.




1. Data collection completed by Applied Research Center of Miami University.

2. Data analysis and interpretation completed by C. Jayne Brahler, University of Dayton.