This piece is a response to a recent report titled “Incentives, Selection, and Teacher Performance: Evidence from IMPACT” by Thomas Dee and James Wyckoff (2013). This response attempts to offer a ground-level perspective from a teacher working within a high poverty neighborhood public school; it’s not trying to undermine the report’s findings. In fact, I believe we need more research on the effects that the IMPACT teacher evaluation system has on the DCPS teacher workforce. Below, you will find selected passages from the report, along with my responses to the claims and assertions made by its authors. Again, this isn’t an attempt to undermine or contradict the report’s findings; rather, it’s an attempt to shed light on the effects of IMPACT from a teacher’s point of view.
#1: Regarding the uniqueness of IMPACT, Dee and Wyckoff (2013) claim:
A second unique feature of IMPACT is that its incentives are linked to a multi-dimensional measure of teacher performance (e.g., multiple classroom observations as well as test scores) that is likely to have more validity than test scores alone (e.g., MET 2013). This targeted performance measure may also enhance the efficacy of IMPACT’s incentives because it places some weight on actions teachers control more clearly and directly (e.g., how their classroom practice relates to defined standards of effective instruction). (p.2)
Personally speaking, I believe multiple classroom observations are a great assessment tool if, and when, the feedback is purposeful and practical. Meaningful feedback must strike a balance between assessing effective instructional practices and understanding the context of a classroom’s composition. When employing a standardized teacher evaluation system, one significant problem that emerges is the propensity for an evaluator to favor, whether directly or indirectly, certain classrooms over others. For example, a colleague once mentioned how a Master Educator, who is supposed to be an objective observer, once said, “I’m evaluating you the same way I will evaluate a teacher from Deal or Hardy.” Neither Deal Middle School nor Hardy Middle School is a high poverty neighborhood public school.
Now, I am not suggesting that effective best practices won’t work in high poverty neighborhood public schools. To the contrary, best practices can, and should, be applied across all schools, regardless of zip code. However, if student “redirection” is seen as a sign of poor classroom management, then behaviors associated with chronic poverty will undoubtedly punish teachers working within high poverty neighborhood public schools. Therefore, in my professional opinion, the DCPS IMPACT Teaching and Learning Framework (TLF) scoring rubric needs differentiation. Contrary to popular opinion, all neighborhood public schools are not created equal.
#2: Regarding the uniqueness of IMPACT, Dee and Wyckoff (2013) assert:
In the current context, there are several substantive reasons that IMPACT offers a unique opportunity to examine the effects of a robust package of performance-based teacher incentives. First, as we describe below, IMPACT introduced exceptionally high-powered incentives (i.e., the threat of dismissal for low-performing teachers as well as substantially larger financial incentives for high-performing teachers). Second, these incentives were linked to a multi-faceted measure of teacher performance consistent with emerging best practices (e.g., clearly articulated standards, the use of several data sources including several structured classroom observations) rather than simply to test scores alone. (p. 8)
I certainly appreciate the multi-faceted measurements within the IMPACT teacher evaluation system. That being said, Group 1 teachers – ELA and math – are held accountable for student performance on high-stakes, standardized tests. It is well documented that a student’s performance on standardized tests correlates strongly with his or her family’s income. Therefore, Group 1 teachers, specifically those within high poverty neighborhood public schools, have a unique set of obstacles to overcome. At first glance, the opportunity for Group 1 teachers, particularly those working within high poverty schools, to earn a higher bonus seems extremely fair. However, given the sheer stress and unique set of challenges, education policymakers must revisit the use of back-end financial incentives.
In my humble opinion, a front-end bonus, or pay increase, for Group 1 teachers working within one of the 40 lowest-performing schools could serve as a powerful recruitment tool. Their work is often the most challenging in the entire district. If we want our best and brightest teachers working in the most challenging environments, then we need to reward teachers who not only excel in such environments, but also take on the challenge in the first place. A back-end bonus falls short of showing appreciation for the challenging day-to-day work.
#3: Regarding the use of IVA as part of the rubric for Group 1 teachers, Dee and Wyckoff (2013) state:
A second component of a teacher’s overall score is based exclusively or in part on the test performance of their students. More specifically, for “Group 1” teachers, these scores include their calculated “Individual Value Added” (IVA): a teacher’s estimated contribution to the achievement growth of their students as measured on the DC Comprehensive Assessment System (CAS) tests and conditional on student and peer traits.4 The “Group 1” teachers for whom IVA is calculated are only those for whom the available CAS data allow for the estimation of value added (i.e., only reading and math teachers in grades 4 through 8). The IVA measure is not defined for the majority of DCPS teachers (i.e., about 83 percent of the general-education teachers in DCPS). In lieu of an IVA score, these teachers instead receive a Teacher-Assessed Student-Achievement (TAS) score. At the beginning of each academic year, teachers choose (and administrators approve) learning goals based on non-CAS assessments. At the end of the year, administrators rate the teacher’s success in meeting these goals using a rubric that emphasizes student learning or content mastery. (p. 9)
Yes, the IVA applies to Group 1 teachers only. Yes, there are four components to the IMPACT rubric. However, the indisputable fact that chronic poverty affects student performance is simply not addressed or accounted for equitably. There are vast differences between teaching a low-income student within a low poverty neighborhood public school and teaching a low-income student within a high poverty neighborhood public school. High poverty neighborhood public schools aren’t necessarily provided with additional resources, and they often suffer from ineffective student-to-teacher ratios. Simply put, the educational playing field is not level. For example, the chart below illustrates school composition, with respect to student performance, across DCPS neighborhood public schools, per ward. A Group 1 teacher working in Ward 3 doesn’t face the same percentage of students performing at the basic or below basic level as a Group 1 teacher in Ward 8. According to the data (2012), Ward 3 neighborhood public schools are largely composed of students on, or above, grade level. In contrast, Ward 8 neighborhood public schools overwhelmingly consist of students at the basic, or below basic, performance levels.
Sources: Please refer to the end of this blog entry.
Furthermore, there are popular misconceptions about the use of an IVA component based on Value-Added Models (VAMs). Politicians and education policy experts often promote the idea that VAMs capture student academic growth. However, what they fail to understand is that VAMs don’t account for “multiple guessing.” If a student enters the school year reading multiple years behind, he or she will struggle to “access” the text of the standardized test (i.e., its Lexile level). What we – teachers – see within the classroom during high-stakes testing is a struggling reader’s propensity to “Christmas tree” the exam. Since the test format is predominantly multiple-choice, every student has at least a 25% chance of guessing the correct answer on any given question. So, a low test score may reflect a student’s inability to “access” the text or passage itself, and not an individual teacher’s “performance.”
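The arithmetic behind the guessing floor is easy to verify. Below is a minimal simulation (the 50-question test length and four answer choices are illustrative assumptions on my part, not DC CAS specifics) of students who “Christmas tree” every item; their average score hovers around 25% regardless of what any teacher did in the classroom.

```python
import random

random.seed(42)

NUM_QUESTIONS = 50   # hypothetical test length
NUM_CHOICES = 4      # typical multiple-choice format: one correct option in four
TRIALS = 10_000      # simulated students guessing at random

# Each simulated student guesses randomly on every question;
# a guess is "correct" with probability 1/NUM_CHOICES.
scores = []
for _ in range(TRIALS):
    correct = sum(
        1 for _ in range(NUM_QUESTIONS)
        if random.randrange(NUM_CHOICES) == 0
    )
    scores.append(correct / NUM_QUESTIONS)

average_score = sum(scores) / TRIALS
print(f"Average score from pure guessing: {average_score:.1%}")
# The expected value is 1/4, so the average lands near 25% --
# a score floor produced by the test format, not by instruction.
```

In other words, a struggling reader’s score near 25% is statistically indistinguishable from random guessing, which is precisely why it says little about the teacher.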
In fact, VAMs benchmarked to proficiency rates don’t capture Group 1 teachers’ efforts in raising student growth as measured by diagnostic data. If the IVA reflected a diagnostic-based student growth model, then, in my opinion, it would be a more accurate assessment of any given teacher’s performance. But if it’s benchmarked to proficiency rates, it will undoubtedly favor schools consisting of students who are on, or above, grade level. This isn’t difficult to understand or earth-shattering to consider. For example, the chart above illustrates the distribution of “highly effective” teachers per DC ward. As per the data (2012), Ward 3 had the most “highly effective” teachers, the highest percentage of students on, or above, grade level, and the lowest percentage of children living under the poverty line. In contrast, Ward 8 had the lowest percentage of “highly effective” teachers, the lowest percentage of students on, or above, grade level, and the highest percentage of children living under the poverty line. Consequently, Ward 8 neighborhood public schools also had the highest percentage of students performing at the basic, or below basic, levels.
#4: Regarding the threat of dismissal on minimally effective teachers, Dee and Wyckoff (2013) stress:
In addition to these mechanical dismissals, IMPACT may encourage some low-performing teachers who otherwise would have remained to voluntarily exit DCPS. Thirty percent of first-time Minimally Effective teachers voluntarily exit DCPS while only 13 percent of teachers who are Effective or Highly Effective do so. As might be expected, Minimally Effective teachers closest to the Effective threshold are more likely to remain in DCPS than those furthest from it. Only 28 percent of first-time Minimally Effective teachers whose IMPACT scores are within 25 points of the Effective threshold (IMPACT scores of 225-249) voluntary exit DCPS, while 39 percent of those within 25 points of the Ineffective threshold (IMPACT scores of 175-199) voluntarily exit. These descriptive outcomes are consistent with a restructuring of the teaching workforce that is implied by the incentives embedded in IMPACT. Less effective teachers under a threat of dismissal are more likely to voluntarily leave than teachers not subject to this threat, and those furthest from the threshold even more likely. (p. 17)
Personally speaking, part of the reason “minimally effective” teachers within high poverty neighborhood public schools voluntarily leave is the “demoralizing effect” of IMPACT ratings. For example, although there are four components to IMPACT, a Group 1 teacher may score “effective” or “highly effective” in three of the four components, yet ultimately receive a final IMPACT rating of “minimally effective.” In addition, I believe there’s a significant difference between Group 1 teachers and Group 2 teachers with respect to earning a “minimally effective” rating. Whereas a “minimally effective” Group 2 teacher may feel motivated to improve his or her instructional practices, commitment to the school community, and teacher-created assessments, the direct correlation between student performance and chronic high poverty may cause Group 1 teachers, particularly those working within high poverty neighborhood public schools, to feel demoralized and unmotivated.
#5: Regarding policy considerations for teacher-evaluation systems, Dee and Wyckoff (2013) acknowledge:
Overall, the evidence presented in this study indicates high-powered incentives linked to multiple indicators of teacher performance can substantially improve the measured performance of the teaching workforce. Nonetheless, implementing such high-stakes teacher-evaluation systems will continue to be fraught with controversy because of the difficult trade-offs they necessarily imply. Any teacher-evaluation system will make some number of objectionable errors in how teachers are rated and in the corresponding consequences they face. (pp. 28-29)
In my honest opinion, we need to isolate the effects of high-stakes teacher evaluation systems on Group 1 teachers, specifically those within high poverty neighborhood public schools. In doing so, we will be able to identify gaps, such as resource or professional development gaps. Once identified, education policymakers can provide targeted solutions and resources to Group 1 teachers within high poverty neighborhood public schools. Identifying which teachers are “less effective” is not the aim of any credible teacher evaluation system. On the contrary, a credible system should identify the least effective classroom environments.
If we can identify which environments are least conducive to highly effective instructional practices, then we can begin problem-solving effectively. Conversely, if we choose to ignore the correlation between student performance and chronic high poverty, then we are avoiding an important issue that is plaguing our high poverty neighborhood public schools. Thus, I urge education policymakers to seek input not only from Group 1 teachers within low poverty neighborhood public schools, but also from Group 1 teachers within high poverty neighborhood public schools.