Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time

I don’t like grades.

As a student, I oscillated between taking anything but superlative grades as a sign of my own failure and being utterly indifferent to grades as a secondary consideration to learning the material. Either way, grades were an imperfect motivator.

As a teacher, I am even more ambivalent about grades, which I see as something I am required to do in order to rank my students. I am always prouder of a student who struggles and reaches a breakthrough than the genius who coasts through the course, even though the latter receives the higher grade. My own experience as a student informs how I structure my courses, leading to policies that encourage regular engagement, choice in how to complete assignments, emphasis on the process over product, and often opportunities for revision. Each of these course policies marked an improvement, but they all retained the thing that I was in many ways least satisfied with: grades.

A few weeks ago a faculty development seminar introduced me to the broad strokes of Specifications Grading and since it seemed like the direction I have been moving my courses, I spent nearly an hour after the event jotting down preliminary notes for what that might look like in my course. At the end of that day I was intrigued, but needed more information. Over my spring break, therefore, I read Linda Nilson’s Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time (Stylus 2014).

Broadly speaking, Specifications (Specs) Grading is a variation on pass-fail grading, contract grading, and competency-based outcomes that ties course assignments to specific course objectives. This model, Nilson argues, has three major benefits. First, setting a high bar for “acceptable” work but giving opportunities for revision imposes rigor without making the professor into a jerk. Second, demystifying the grading process and offering flexibility reduces stress on the students. Third, eliminating partial credit saves time. Some model systems presented a fourth potential benefit of allowing teachers to give more of their limited attention to those students aiming for the higher grades.

In addition to an argument for its benefits, Specifications Grading serves as a guide to adapt traditional grading models to a specs system across two broad categories: outcomes and assignments/rubrics.

If you’re anything like me, your course outcomes won’t work for specs grading. Nobody ever really taught me how to write objectives, so what I have in my syllabuses focuses on what the students will receive. The conceit of an objective might be well-intentioned, but if the students can’t demonstrate what they are learning through the assessments, then it won’t work. Often this just means a subtle, but significant shift:

  • Students will gain a broad understanding of US history since 1877.
  • Students will be able to identify the major events of American history since 1877.

Each of these objectives would then be demonstrated specifically by one or more course assessments. In Nilson’s model, some of these course objectives would correspond to basic, minimal standards like the one listed above. Students who achieve proficiency at those lower-level objectives would be able to pass the course with a C, while students aiming for a B or A would also have to demonstrate proficiency at objectives that involve more complex skills.

The second step involves developing detailed one-level rubrics that explain everything that the assignment must have to be accounted “proficient.” Now there will be some variability in what that standard should be, but Nilson recommends building the rubric from everything you would expect to see in a roughly B+ assignment. When it comes time to grade the assignments, then, the assessment becomes a binary yes/no, along with some comments that might be used if, as Nilson recommends, the students get the chance for revision.

I have traditionally had an antagonistic relationship with most rubrics because most of the ones I have been required to use were a particularly poor match for how I wanted to grade: someone who received 9/12 on the rubric was solidly in the B+ range according to how I grade. However, I found myself coming around to this model of rubric because it removes the hair-splitting and partial credit in favor of showing either that the students achieved proficiency or that they did not. The grade translation, in turn, does not come from an individual rubric but from the number of assignments in which the student achieved proficiency.

I have been jotting down notes on how I can transform my existing courses with minimal disruption to anything but how I grade.

For my general education classes the assignments might look like (based on a syllabus for this semester):

To receive a “C” in this course (linked to the lowest tier of objectives)

  • Participation [in various forms] of 75%
  • Objective quiz score of 75% [I allow retakes and drop a quiz score, so I have exactly 2 students who are not clearing this bar right now]
  • Journals 10/15
  • Papers 5/5 completed, but not to “proficiency” with historical essay writing

To receive an “A”:

  • Participation of 95%
  • Objective quiz score of 90%
  • Journals 13/15
  • Papers 5/5 to proficiency
  • Completing a final project

The “B” range would obviously fall somewhere in between these two levels, with a “D” a little below “C.” The numbers might be off a little bit, but I would calibrate them based on what my final grade sheet looks like.
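
As a rough illustration, the two tiers sketched above could be written down as explicit checks. This is a minimal Python sketch, not a finished gradebook: the `Record` fields and the function names are hypothetical, percentages are stored as fractions, and the “B” and “D” tiers are left out because their cutoffs are not pinned down here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One student's semester totals (field names are illustrative)."""
    participation: float    # fraction of participation credit, 0.0-1.0
    quiz_score: float       # objective quiz score after retakes and the dropped quiz
    journals: int           # journals completed, out of 15
    papers_completed: int   # papers turned in, out of 5
    papers_proficient: int  # papers judged proficient, out of 5
    final_project: bool     # whether a final project was completed


def meets_c(r: Record) -> bool:
    # "C" tier: every bar must be cleared; papers only need to be completed.
    return (r.participation >= 0.75 and r.quiz_score >= 0.75
            and r.journals >= 10 and r.papers_completed >= 5)


def meets_a(r: Record) -> bool:
    # "A" tier: higher bars, all five papers proficient, plus the final project.
    return (r.participation >= 0.95 and r.quiz_score >= 0.90
            and r.journals >= 13 and r.papers_proficient >= 5
            and r.final_project)


def highest_tier(r: Record) -> Optional[str]:
    """Return "A", "C", or None; the B and D cutoffs are not modeled."""
    if meets_a(r):
        return "A"
    if meets_c(r):
        return "C"
    return None
```

A student who clears every “A” bar except the final project comes back as “C” in this sketch, which is exactly the gap a real “B” tier would have to fill.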

For my upper-level classes that are writing intensive and where the students complete three longer essays, a “C” may require revising one of the three essays to proficiency, “B” requires two, and “A” all three. For all of these classes, I am also toying with the idea of creating a list of “recommended” books for the course and allowing any student the opportunity to choose and review one of these books in place of one “proficient” paper—with guidelines for what constitutes an acceptable review, of course.
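
The upper-level version is simple enough to fit in a lookup table. The sketch below assumes the tentative mapping above (one, two, or three essays revised to proficiency) and treats an acceptable book review as standing in for one proficient paper; the names `ESSAY_TIERS` and `essay_grade` are hypothetical, and what happens with zero proficient essays is left undefined because the scheme above does not say.

```python
from typing import Optional

# Proficient essays needed for each letter, per the tentative scheme above.
ESSAY_TIERS = {1: "C", 2: "B", 3: "A"}


def essay_grade(proficient_essays: int,
                acceptable_review: bool = False) -> Optional[str]:
    """Map proficient essays (0-3) to a letter, or None if no tier is met."""
    if not 0 <= proficient_essays <= 3:
        raise ValueError("proficient_essays must be between 0 and 3")
    # A single acceptable book review counts in place of one proficient paper,
    # but the total can never exceed the three essay slots.
    credited = min(3, proficient_essays + (1 if acceptable_review else 0))
    return ESSAY_TIERS.get(credited)
```

Under these assumptions, a student with two proficient essays and an acceptable review would land at an “A,” the same as a student who revised all three essays.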

Specifications Grading also introduced me to a different paradigm for the student-teacher relationship. Students are not customers, Nilson argues, but clients. Specifications grading takes into account that different clients are going to aim at different outcomes. It makes the expectations clear for each tier and lets the client choose which package to pursue. In Nilson’s telling, this allows the teacher to dedicate the most energy to the students most invested in the course by dint of aiming at the top tiers.

This model is tempting given how frustrating it can be to expend disproportionate amounts of energy on reticent students, but it was also the point that left me most uncomfortable with specs grading. One common proposal in the sample syllabuses Nilson provides is setting not only different levels of proficiency, but also different assignments for the different tiers. I incorporated that into one of my sketches above for the final projects, but even there I have been wondering whether the non-project option ought to require an objective test passed at a certain proficiency. That is something I’m not wild about given that 1) I am skeptical about the value of such objective tests, period; 2) writing such a test would hand back some of the savings in time; 3) keeping track of who is doing what sounds like a lot of bookkeeping.

However, my discomfort with the different assignments for different levels is also philosophical. That is, it feels to me like saving time and becoming a better teacher for the invested students involves allowing students aiming at a “C” to fall behind. The counter, I think, is that this is in fact the point. The way I imagine this grading scheme working in my classes, those students would still be expected to attend and complete assignments for the whole semester, and it gives anyone who wants it the opportunity to achieve every objective. But if students are not interested, then it empowers them to put their energies elsewhere (courses, hobbies, work, whatever). In other words, the client model simply acknowledges the reality that teachers cannot force people to learn anything they don’t want to learn, particularly at the busiest time of the semester.

I have been thinking about the process as setting two different benchmarks: the “C” level for minimum objectives and the level of proficiency for complex objectives where “A” reaches it in every category and “B” reaches it in some. Specs grading dispenses with the murky ambiguity of partial credit where the “C” student allegedly achieved 75% of a given course objective. Thus, it isn’t the “C” student doing less work so much as they hit one set of objectives, while I am vouching that the “A” student has completed more and more complex work that allows me to certify that they have reached proficiency in the others—I can hope the “C” student developed in these other categories, but the grade makes no claim that they did so.

At this point I am ready to dive into specs grading head first, but I’m also sure that whatever system I come up with in the abstract will require adjustment once I get into a semester. So here’s the question for those of you who have used specs grading: what should I be on the lookout for? Is there anything I’m missing?

ΔΔΔ

I keep a list of pedagogy resources along with links to write-ups I have done on this blog.

Hawaii 5-O and “grading shows”

The anatomy of a grading show (defined as a show to have on in the background while grading) is a funny thing. For me they fall into two broad categories. The first are old and familiar shows. The writing, the stories, and the rhythms are familiar. They take no brainpower to watch while marking bluebook exams or multiple choice tests. The second also require minimal brainpower, but because they are sitcoms or procedurals for which the rhythms are familiar, even if the specifics are not. If the show proves to be too captivating then its purpose falters because grading slows down. Usually, this means that the show has to be something I want to see, but far, far on the crummy end of the spectrum.

This current batch of grading has been me watching the reboot of Hawaii 5-O. I’m most of the way through the first season and have a few thoughts on this curious show.

  1. Hawaii 5-O is a show about a special law enforcement task force in Hawaii, led by a former Navy SEAL and consisting of outsiders and outcasts. Among other things, their leader, Steve, has returned home to help uncover the cause of his father’s death.
  2. The writing on this show is pretty bad. It is aiming for fast-paced, cryptic, and yet direct. The result is that everyone seems to have inexplicable skills and knowledge, not to mention an extreme unevenness to the plot. Rob Morrow, one of the stars of Numbers, gave an interview a few months ago where he talked about the tendency of that show to be overwritten. It was insufficient for details or information to be conveyed by visual imagery or physical acting alone; they had to be said three times. Morrow mentioned frustration with this and how he used to try to create a script that was more spare and efficient and therefore elegant. Hawaii 5-O has this same problem in spades, with most of the excess dialogue also being bad dialogue.
  3. The superficial premise of the show is pretty people in paradise meets law enforcement, not unlike, for instance, Burn Notice. However, for this core concept, there is a lot of paradise and, aside from the stars, very little in the way of pretty people. The show is far more interested in shoot-outs and set-piece action scenes than in scenery.
  4. Throughout this episode, I’m trying to figure out what the core of the show is. Burn Notice has the tension between the protagonist’s patriotism and his being blacklisted (with a dash of family dysfunction). Numbers has the good-hearted rebuilding of sibling relationships and the bringing of family together. NCIS has its goofy office hijinks. This show has aspects of all of these tensions and is desperately trying to recreate these formulas that worked (at least to some degree) in past shows, without actually pulling it off because it does some of all of those. There is a family vendetta, a blacklisted cop, another who is having a custody spat with his ex-wife with whom he would like to get back together. Then there is the extra seasoning of everyone being trigger-happy, which seems to be trying to cover for the failures elsewhere.

    This violence also manifests itself in that the main characters are all too willing to blatantly disregard most laws, including by torturing suspects. The characters sometimes allude to this in the sense that the leader of the team is not himself a cop. There is too much else going on, including that these law enforcement officers are always in a rush to get to their next act of sanctioned vigilantism, but they seem to want the core of the show to be the tension of having a SEAL in a cop’s job. Of course, asking those questions requires better writing and a larger cast, so Hawaii 5-O is happy to use everything as a throwaway, moving along quickly enough that maybe nobody will notice.

  5. What follows from the last point is that there is a visual representation of a militarized law enforcement that takes the stance that almost everyone else is a victim waiting to happen. Frequently, this results in stern talking-tos. At most there are token references to people outside of the main cast of the show, passing mentions that they should not be discharging weapons in public spaces, and remorse when the “good guys” cannot save someone.

An end of semester thought

Another semester come and gone, or almost. I have a student primed to come in and collect his final exam tomorrow and I am expecting a grade complaint to ensue, but the other context of this post is that I had a student email me last night or early this morning thanking me for being a “stricter TA than the others” because it helped her mold her study habits, her reading, and her writing. The student who sent me that message was a delight to have in class (I actually enjoyed that entire section quite a bit, even if the classroom itself made me sometimes feel like Yuri Petrova while I taught), and I did appreciate the way that she phrased her statement that I was a hard-ass, suggesting that I had expectations about what the students should have prepared before class and what we needed to talk about in class rather than that I was a malicious grader.

In a sense this is another “grade inflation” piece following after “confessions” of grade inflators, a piece about “grade compression” instead of inflation, this response to the Slate Confessions piece, and this from the Harvard Crimson, dated June 5, 1997, that cites a controversy from four years earlier when a professor at Harvard said in the Harvard Magazine that the causes of grade inflation stem from affirmative action in 1969. The way this latest bout of frustration has swirled across social media† seems to have struck a nerve with academics. People have stumped for their cause of choice, whether that they are not paid well enough to “waste” time arguing grades, standardized tests (and the ensuing results-based education), the customer model of higher education, the desperate need for good teaching evaluations to keep a tenuous employment,‡ etc. Each also has his or her own response…and no one has a feasible solution. What I have been thinking about, rather, is the aura of mystery that surrounds grades.⚔

I can only echo the frustration expressed elsewhere about the student demand for making the grades and what exactly the grades mean. I really don’t care about grades, even though I dutifully assign them throughout the semester; for me, as for most teachers, grading is a dreaded activity. But I am musing about the perception of grades versus reality. In most of my sections my average test score ranges from about a 77 to an 82 simply based on the class makeup and parameters of the exam. Caveats about small sample sizes apply: outlier sections will sometimes skew a little bit lower or a little bit higher than that general range, and 81 or 82 is probably the most common average I have seen. Mind you that I am talking just about the tests, and usually between 10 and 40 percent of the grade relies on written responses, attendance, etc., for which a student gets full credit simply for completing the task.± The result of these extra points is that students who follow through with the course work have a final grade somewhat higher than their test grades. Even when the students have read the syllabus, many assume that their grade is exactly as it reads on the tests (an observation, nothing more).
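
To put rough numbers on that gap, here is a minimal Python sketch. It assumes the simplest possible setup: the final grade is a weighted blend of the test average and completion-credit work, with tests making up whatever weight is left over. The function name `final_grade` and its parameters are hypothetical, and the sketch ignores any prompt-based papers.

```python
def final_grade(test_avg: float,
                completion_weight: float,
                completion_rate: float = 1.0) -> float:
    """Blend a test average (0-100) with completion-credit work.

    completion_weight: share of the grade earned just by doing the work (0-1).
    completion_rate: fraction of that work the student actually completed (0-1).
    """
    if not 0.0 <= completion_weight <= 1.0:
        raise ValueError("completion_weight must be between 0 and 1")
    if not 0.0 <= completion_rate <= 1.0:
        raise ValueError("completion_rate must be between 0 and 1")
    test_share = (1.0 - completion_weight) * test_avg
    completion_share = completion_weight * 100.0 * completion_rate
    return test_share + completion_share


# A student averaging 80 on tests who completes every response and shows up.
for weight in (0.10, 0.25, 0.40):
    print(f"{weight:.0%} completion credit: {final_grade(80, weight):.1f}")
```

Under these assumptions, an 80 on the tests turns into roughly an 82, 85, or 88 for the course, depending on how much of the grade is completion credit, while a student who skips all of that work at the 25 percent weighting would fall to about a 60.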

I also don’t particularly like to talk about overall course averages because there is a non-negligible chunk of the students who don’t come to class, miss tests, miss in-class quizzes, and don’t complete response papers…these are most of the students who fail the class. With those students in the equation, the course average may dip below that of the exams, but it often ends up about even with them. Students who do the work are rewarded for it; those who don’t can sometimes float by on exams alone, but if their exams are borderline, they slip into the failing range.

I TA for an intro American history class and have been an adjunct,¥ and rarely have full authority over my own course design and final grades, but my students usually walk away from my classes believing that I am a hard grader, and this is something I worry about. I am fine being known as a somewhat demanding instructor so long as it is coupled with the knowledge that I will reciprocate whatever effort the students put in and work with them to master the material. I would also like to be known as a fair grader, though I know that it is impossible to please everyone all the time. My fear boils down not to fairness, though, nor that I am some kind of boogieman set on the earth to terrorize students, but that my expectations are punishing my students. I do not believe this to be the case, but the recent talk of how other professors and other TAs grade makes me wonder–and in a system that prioritizes results over process, is it simply a cop-out to hide behind the syllabus outlining student responsibilities when they cry foul at the end of the semester because missing work has harmed their grade?

I tell myself that I am about average in terms of actual difficulty; I try to challenge my students every week knowing, but often not revealing until the very end of the course, that the students are doing “fine”§ in my class–hey, the grading parameters are in the syllabus. My students may believe me to be some sort of Devourer-of-GPAs, but the final calculation doesn’t bear that out, even if I made them work to receive the desired grade.

Of course I could be the one bearing the brunt of the punishment from this perception since if I make it seem that I am not handing out top grades across the board–whether or not any possible “deficiency” (that which I call grading) is buttressed elsewhere in the grade–then the perception is that I am punishing students, keeping them from the sterling GPA that they want. Here perception, not reality, is what matters and a perceived lack of inflation/ease/compression/whatever is a sign of curmudgeonly vindictiveness and a signal that that instructor is the GPA-Devourer at fault for whatever bureaucratic issues the student faces. More directly, unless the student has been engaged with me throughout the semester they probably don’t know that they are doing better than the tests might indicate before they fill out their course evaluations.


† I love most things about Twitter, but its ability to enable internet pitch-fork mobs, ardent Jacobins, and devout Crusaders in defense of their perceived (and sometimes correct) injustices is terrifying.

‡ Of course, those evaluations come in before the final grade, so perception is everything. More below.

⚔ Many students say that they prefer multiple-choice, but the grades are actually lower on them, from which many levels of interpretation may be read.

± There may also be prompt-based papers the students have to complete, but they typically are in the same range as the exams and don’t change the calculation about amount of attendance/response/etc points.

¥ Not every student attended every class, but everyone did all the assignments, so I didn’t quite have this problem in that class.

§ Fine can mean that I don’t care about the grade, but in this context it really means that the student is doing much better than they think they are in terms of the overall grade.

Assorted Links

  1. When Philosophers Join the Kill Chain-An op-ed by Mark Levine in Al-Jazeera about Bradley Strawser, the philosopher who has been defending the moral imperative of drone strikes. Levine is highly critical of Strawser, particularly in his attempts to defend the use of drones through the concepts of just war without considering the implications for actual people. Another academic is less than thrilled at Levine’s blunt use of philosophers, but agrees with his overall point.
  2. Remembering Gore Vidal: A Dying Breed– A blog post on the Economist that points out that Gore Vidal was a breed of public intellectual that is not commonly seen anymore.
  3. Court Rejects Assertion that ‘Tenure’ Means Continuous Employment-A law professor in Michigan was fired after she refused to teach the assigned courses, an act that has now been upheld through a court case and an appeal. I am not entirely clear on what the details of the case were, but it seems that she tried to make the claim that tenure entails continuous lifetime employment, something that the court explicitly did not uphold. It seems that this will just help define the parameters of behavior that warrants termination, but it is a definition that bears watching.
  4. Survival Strategy for Humanists: Engage, Engage– A piece in the Chronicle of Higher Education about how humanities can survive in the future. Not much new here, but it is nice that this sort of argument seems to be slowly picking up steam. The idea is that communication, writing, teaching skills need to be taught and then we should stop writing books that are utterly incomprehensible.
  5. Writers and readers on Twitter and Tumblr-An article on Slate that implies that “coddling” (my words) has a negative impact on art and artistry, so the feel good back-patting that takes place between authors and readers online only serves as a cheap form of therapy, but does not improve literature. I think that the author is not totally wrong.
  6. The “Immeasurable”– An enlightening graph about grading.
  7. As always, comments encouraged. What else is out there?