One Year of Specs Grading: the Good, the Bad, and the Ugly

Spring semester 2023 is in the books. It actually has been in the books for a few days, though I have spent the time since working on wrapping up its ragged ends.

In truth, this was a second consecutive brutally difficult semester, making the 2022/23 school year one of the most difficult of my career. In addition to a series of crises external to what happened in the classroom, I was also teaching three new classes: upper level surveys on Ancient Rome and Persia, and a first year seminar. The demographics of that first year seminar were particularly challenging such that I had to functionally re-write the syllabus midway through the semester and I had one student so difficult that I came to dread walking into that classroom.

I wrestled the semester into submission eventually. My Persia class might be my favorite class I have ever taught, in large part because of the mix of students, and my revised first year seminar syllabus along with a slightly different approach to discussion allowed my students to pick up on the themes and skills that are most important for the course. In each of these three classes I was also able to build enough trust with the students that we were able to largely weather the techpocalypse ransomware attack that took down the network two weeks before the end of the semester. The outpouring of comments from students in the last few weeks was enormously moving, but I also want to recognize how hard I had to work to get there.

However, by way of semester retrospective, I want to focus on one academic year using Specifications Grading. I adopted this system because it promised to make my life easier, and my spring changes like an UnGrading system to assess participation and taking attendance every day worked, but, one year in, I am left wondering whether a specs model is the right fit for most of my classes.

The Good

My favorite part of specs grading is not assigning grades to assignments. The obsession with grades is deeply rooted in students, but grades themselves are often a poor match for learning. Specifications grading, by contrast, clearly establishes my expectations and, at least in theory, gives the students guidance on how they can earn credit for an assignment. This is still a form of grading, but the expectations provide a framework within which the students can learn and my feedback can focus on whether the student has met the expectation for that assignment. Moreover, the grades are earned across categories, meaning that the students have to engage with each part of the course. The clear expectations for each grade tier also allow students to prioritize their efforts if, for instance, they have met the requirements for their target grade in my class and need to focus instead on passing a different one.

Moreover, by modifying the expectations up or down for either the overall grades or for individual assignments I can adjust what my expectations are for the students. Thus, when our tech issues struck, I could easily fulfill every learning objective and still lower the expectations for several graded categories in my classes, much to the relief of my students.

I particularly found specifications grading effective for relatively small, repeated assignments like journals where partial credit is particularly arbitrary and missing the rubric on one or two assignments both teaches an important lesson about following the assignment guide and has a relatively minimal overall effect on the final grade. Whether or not I continue with Specifications Grading as an overarching grading scheme, I will definitely carry these aspects forward into what comes next.

The Bad

More of my students this semester than in the fall term seemed to embrace the spirit of specs grading and understood how the grade tiers worked, but this still left me with some students who struggled to see the connection between the work that they were completing and the grade tiers in the syllabus. A couple of these were unique cases with a confluence of circumstances, but others were more persistent and connected to another issue that frustrated me last semester.

One of the keys to Specifications Grading is transparency. Every assignment guide came with a detailed rubric that spelled out exactly how to earn credit for that assignment. These rubrics were prescriptive in that they articulated the formal characteristics that I was grading on, but they were deliberately open-ended so that the students could work within the guardrails to express themselves. For instance, the journal assignment specified a length, a mandate to include a date, title, and word count, and a set of prompts like “what was the most interesting thing you learned from class this week” or “how would something you learned this week change a paper you wrote earlier in the semester.” For responses to a class movie, the rubric might be that you need to answer each question with at least 2 complete sentences appropriate for the movie.

However, I often got the sense that the students weren’t checking their work against the rubric before submitting it. In the small repeated assignments one or two times being told that an assignment wasn’t accepted put the students back on the right track, but then in some of these cases the students would trip up in exactly the same way on the next assignment.

Even more worrying was that this also happened on bigger assignments like papers where students turned in sometimes two or more drafts that seemed to rely on little more than hope that it fulfilled the rubric, even after having the students use this exact rubric for the purposes of peer review. I allow students to revise their papers both as a matter of praxis for teaching writing and because not doing so would be too draconian a policy for a specs system (see below), but nevertheless getting rounds of papers that simply ignored the guidelines, and, in at least one case, introduced new ways that the paper missed the rubric on revision, made me ask in frustration why I provide the rubrics in the first place.

But for all of these frustrations, these are not the reasons I’m considering whether to keep a specifications model or adopt some sort of hybrid system.

The Ugly

Two semesters into using Specifications Grading, my biggest question is whether it is a good match for writing-enhanced classes.

I really like the rubric I designed for grading essays in this system. Unlike most specs rubrics that use a proficient/not-proficient binary, my rubric has two “pass” tiers, one for basic proficiency and another for advanced. The advanced tier I calibrated at roughly a low-A. Earning a C in this course required revising one of three papers to the advanced tier and just the first tier for the other two, a B required revising two, and the A required all three.
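The paper rule described above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the rule as stated, not the gradebook I actually use: the names `Tier` and `paper_grade` are hypothetical, and the treatment of a paper that never reaches the basic tier (no passing grade from this category) is an assumption, since the rule above does not spell that case out. It also covers the paper category alone, while the course grade depends on the other categories too.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Tier(IntEnum):
    """Rubric outcome for one paper (names are illustrative)."""
    NOT_YET = 0   # does not yet meet the basic rubric
    BASIC = 1     # first pass tier: basic proficiency
    ADVANCED = 2  # second pass tier: calibrated at roughly a low A


def paper_grade(tiers: Sequence[Tier]) -> Optional[str]:
    """Return the highest letter the paper category supports, or None."""
    if len(tiers) != 3:
        raise ValueError("this sketch assumes exactly three papers")
    # Assumption: every paper has to reach at least the basic tier.
    if any(t < Tier.BASIC for t in tiers):
        return None
    # One advanced paper supports a C, two a B, three an A.
    advanced = sum(1 for t in tiers if t == Tier.ADVANCED)
    return {0: None, 1: "C", 2: "B", 3: "A"}[advanced]


if __name__ == "__main__":
    print(paper_grade([Tier.ADVANCED, Tier.BASIC, Tier.ADVANCED]))
```

Writing it out this way makes the stakes of the advanced tier visible: a student with three basic-tier papers and no revisions to the advanced tier does not reach even the C requirement for this category.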

Despite the promises of specs grading, I have not found that this system saves me any time at all, especially when grading papers on the learning management system, which I do as a matter of equity (e.g. costs of printing), scheduling (e.g. not having things due at class time), and convenience (e.g. I can toggle between versions). Simply put, I found that a lot of students would not be able to write well enough to fulfill the advanced tier of the rubric on one paper, let alone three. Even when they looked at the scored rubric, which was not always the case, I felt like I had to give lots of direct and actionable feedback in the paper itself, in the rubric comments, and in the summary comments on the paper. Otherwise, I feared, the students might not be able to make the connections between whatever they wrote and the rubric scores.

Let me be clear here: the system works. As I told my students, my goal at this point in their college career is to help build good writing skills and habits so that every student knows that they can revise a (relatively short) paper to a high quality before they get to the two research-centric classes that they take in their junior and senior year. I am also comfortable with the rubric calibration because each semester I had a few students who fulfilled the rubric with no or minimal revisions to their paper, and nearly every student improved dramatically from the start of the semester to the end.

But there were also some days when I felt like I was dragging two classes’ worth of students (46, at final count) toward writing proficiency, on top of being responsible for the course content, two tag-along non-WE sections of these courses (6 students), and the first year seminar. It was a lot. Having two sections of this process of course magnified all of the issues, but it also left me wondering whether continuing down this path toward completely spec-ified writing-enhanced courses is sustainable. I don’t relish the prospect of going back to traditional points-based grading either, which makes me wonder if I can imagine some sort of hybrid grading scheme that does what I want it to do.

Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time

I don’t like grades.

As a student, I oscillated between taking anything but superlative grades as a sign of my own failure and being utterly indifferent to grades as a secondary consideration to learning the material. Either way, grades were an imperfect motivator.

As a teacher, I am even more ambivalent about grades, which I see as something I am required to do in order to rank my students. I am always prouder of a student who struggles and reaches a breakthrough than the genius who coasts through the course, even though the latter receives the higher grade. My own experience as a student informs how I structure my courses, leading to policies that encourage regular engagement, choice in how to complete assignments, emphasis on the process over product, and often opportunities for revision. Each of these course policies marked an improvement, but they all retained the thing that I was in many ways least satisfied with: grades.

A few weeks ago a faculty development seminar introduced me to the broad strokes of Specifications Grading and since it seemed like the direction I have been moving my courses, I spent nearly an hour after the event jotting down preliminary notes for what that might look like in my course. At the end of that day I was intrigued, but needed more information. Over my spring break, therefore, I read Linda Nilson’s Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time (Stylus 2014).

Broadly speaking, Specifications (Specs) Grading is a variation on pass-fail, contract grading, and competency-based outcomes that ties course assignments to specific course objectives. This model, Nilson argues, has three major benefits. First, setting a high bar for “acceptable” work but giving opportunities for revision imposes rigor without making the professor into a jerk. Second, demystifying the grading process and offering flexibility reduces stress on the students. Third, eliminating partial credit saves time. Some model systems presented a fourth potential benefit of allowing teachers to give more of their limited attention to those students aiming for the higher grades.

In addition to an argument for its benefits, Specifications Grading serves as a guide to adapt traditional grading models to a specs system across two broad categories: outcomes and assignments/rubrics.

If you’re anything like me, your course outcomes won’t work for specs grading. Nobody ever really taught me how to write objectives, so what I have in my syllabuses focuses on what the students will receive. The conceit of an objective might be well-intentioned, but if the students can’t demonstrate what they are learning through the assessments, then it won’t work. Often this just means a subtle, but significant shift:

  • Students will gain a broad understanding of US history since 1877.
  • Students will be able to identify the major events of American history since 1877

Each of these objectives would then be demonstrated specifically by one or more course assessments. In Nilson’s model, some of these course objectives would correspond to basic, minimal standards like the one listed above. Students who achieve proficiency at those lower-level objectives would be able to pass the course with a C, while students aiming for a B or A would have to also demonstrate proficiency at objectives that involve more complex skills.

The second step involves developing detailed one-level rubrics that explain everything that the assignment must have to be accounted “proficient.” Now there will be some variability in what that standard should be, but Nilson recommends building the rubric from everything you would expect to see in a roughly B+ assignment. When it comes time to grade the assignments, then, the assessment becomes a binary yes/no, along with some comments that might be used if, as Nilson recommends, the students get the chance for revision.

I have traditionally had an antagonistic relationship with most rubrics because most of the rubrics I have been required to use were a particularly poor match for how I wanted to grade, such that someone who received 9/12 on the rubric was solidly in the B+ range according to how I grade. However, I found myself coming around to this model of rubric because it removes the hair-splitting and partial credit in favor of showing either that the students achieved proficiency or that they did not. The grade translation, in turn, does not come from an individual rubric but from the number of assignments on which the student achieved proficiency.

I have been jotting down notes on how I can transform my existing courses with minimal disruption to anything but how I grade.

For my general education classes the assignments might look like (based on a syllabus for this semester):

To receive a “C” in this course (linked to the lowest tier of objectives)

  • Participation [in various forms] of 75%
  • Objective quiz score of 75% [I allow retakes and drop a quiz score, so I have exactly 2 students who are not clearing this bar right now]
  • Journals 10/15
  • Papers 5/5 completed, but not to “proficiency” with historical essay writing

To receive an “A”:

  • Participation of 95%
  • Objective quiz score of 90%
  • Journals 13/15
  • Papers 5/5 to proficiency
  • Completing a final project

The “B” range would obviously fall somewhere in between these two levels, with a “D” a little below “C.” The numbers might be off a little bit, but I would calibrate them based on what my final grade sheet looks like.
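The tier bundles above amount to a checklist in which every line has to be satisfied, so a rough Python sketch may help show how the grade is read off. Everything here is illustrative: `Record`, `meets_c`, `meets_a`, and `tier_reached` are hypothetical names, participation and quiz scores are stored as fractions, and the sketch encodes only the “C” and “A” bundles because the “B” and “D” thresholds are not pinned down above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One student's end-of-semester tallies (hypothetical structure)."""
    participation: float     # fraction of participation credit, 0.0 to 1.0
    quiz_score: float        # objective quiz score as a fraction, 0.0 to 1.0
    journals_accepted: int   # out of 15
    papers_completed: int    # out of 5
    papers_proficient: int   # out of 5
    final_project: bool


def meets_c(r: Record) -> bool:
    """Every requirement in the 'C' bundle must be met."""
    return (r.participation >= 0.75
            and r.quiz_score >= 0.75
            and r.journals_accepted >= 10
            and r.papers_completed >= 5)


def meets_a(r: Record) -> bool:
    """The 'A' bundle, which also presupposes the 'C' bundle."""
    return (meets_c(r)
            and r.participation >= 0.95
            and r.quiz_score >= 0.90
            and r.journals_accepted >= 13
            and r.papers_proficient >= 5
            and r.final_project)


def tier_reached(r: Record) -> str:
    """Report the highest encoded tier; 'B' is deliberately not modeled."""
    if meets_a(r):
        return "A"
    if meets_c(r):
        return "C"
    return "below C"


if __name__ == "__main__":
    student = Record(0.96, 0.92, 14, 5, 5, final_project=False)
    print(tier_reached(student))
```

The example student clears every numeric bar for an “A” but skips the final project, so the sketch reports “C” (in the real scheme that student would land in the unspecified “B” range). Unlike a points total, strength in one category cannot buy back a missing one.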

For my upper-level classes that are writing intensive and where the students complete three longer essays, a “C” may require revising one of the three essays to proficiency, “B” requires two, and “A” all three. For all of these classes, I am also toying with the idea of creating a list of “recommended” books for the course and allowing any student the opportunity to choose and review one of these books in place of one “proficient” paper—with guidelines for what constitutes an acceptable review, of course.

Specifications Grading also introduced me to a different paradigm to the student-teacher relationship. Students are not customers, Nilson argues, but clients. Specifications grading takes into account that different clients are going to aim at different outcomes. It makes the expectations clear for each tier and lets the client choose which package to pursue. In Nilson’s telling, this allows the teacher to dedicate the most energy to the students most invested in the course by dint of aiming at the top tiers.

This model is tempting given how frustrating it can be to expend disproportionate amounts of energy on reticent students, but it was also the point that left me most uncomfortable with specs grading. One common proposal in the sample syllabuses Nilson provides is setting not only different levels of proficiency, but also different assignments for the different tiers. I incorporated that into one of my sketches above for the final projects, but even there I have been wondering whether the non-project option ought to require an objective test passed at a certain proficiency under specs grading, something I’m not wild about given that 1) I am skeptical about the value of such objective tests, period; 2) writing such a test would hand back some of the savings in time; 3) keeping track of who is doing what sounds like a lot of bookkeeping.

However, my discomfort with the different assignments for different levels is also philosophical. That is, it feels to me like saving time and becoming a better teacher for the invested students involves allowing students aiming at a “C” to fall behind. The counter, I think, is that this is in fact the point. The way I imagine this grading scheme working in my classes, those students would still be expected to attend and complete assignments for the whole semester, and it gives anyone who wants it the opportunity to achieve every objective. But if students are not interested, then it empowers them to put their energies elsewhere (courses, hobbies, work, whatever). In other words, the client model simply acknowledges the reality that teachers cannot force people to learn anything they don’t want to learn, particularly at the busiest time of the semester.

I have been thinking about the process as setting two different benchmarks: the “C” level for minimum objectives and the level of proficiency for complex objectives where “A” reaches it in every category and “B” reaches it in some. Specs grading dispenses with the murky ambiguity of partial credit where the “C” student allegedly achieved 75% of a given course objective. Thus, it isn’t the “C” student doing less work so much as they hit one set of objectives, while I am vouching that the “A” student has completed more and more complex work that allows me to certify that they have reached proficiency in the others—I can hope the “C” student developed in these other categories, but the grade makes no claim that they did so.

At this point I am ready to dive into specs grading head first, but I’m also sure that whatever system I come up with in the abstract will require adjustment once I get into a semester. So here’s the question for those of you who have used specs grading: what should I be on the lookout for? Is there anything I’m missing?

ΔΔΔ

I keep a list of pedagogy resources along with links to write-ups I have done on this blog.

Hawaii 5-O and “grading shows”

The anatomy of a grading show (defined as a show to have on in the background while grading) is a funny thing. For me they fall into two broad categories. The first are old and familiar shows. The writing, the stories, and the rhythms are familiar. They take no brainpower to watch while marking bluebook exams or multiple choice tests. The second also require minimal brainpower, but because they are sitcoms or procedurals for which the rhythms are familiar, even if the specifics are not. If the show proves to be too captivating then its purpose falters because grading slows down. Usually, this means that the show has to be something I want to see, but far, far on the crummy end of the spectrum.

This current batch of grading has been me watching the reboot of Hawaii 5-O. I’m most of the way through the first season and have a few thoughts on this curious show.

  1. Hawaii 5-O is a show about a special law enforcement task force in Hawaii, led by a former Navy Seal and consisting of outsiders and outcasts. Among other things, their leader, Steve, has returned home to help uncover the cause of his father’s death.
  2. The writing on this show is pretty bad. It is aiming for fast-paced, cryptic, and yet direct. The result is that everyone seems to have inexplicable skills and knowledge, not to mention an extreme unevenness to the plot. Rob Morrow, one of the stars of Numbers, gave an interview a few months ago where he talked about the tendency of that show to be overwritten. It was insufficient for details or information to be conveyed by visual imagery or physical acting alone, but had to be said three times. Morrow mentioned frustration with this and how he used to try to create a script that was more spare and efficient and therefore elegant. Hawaii 5-O has this same problem in spades, with most of the excess dialogue also being bad dialogue.
  3. The superficial premise of the show is pretty people in paradise meets law enforcement, not unlike, for instance, Burn Notice. However, for this core concept, there is a lot of paradise and, aside from the stars, very little in the way of pretty people. The show is far more interested in shoot-outs and set-piece action scenes than in scenery.
  4. Throughout this episode, I’m trying to figure out what the core of the show is. Burn Notice has the tension between patriotism and his being blacklisted (with a dash of family dysfunction). Numbers has the good-hearted rebuilding of sibling relationships and the bringing of family together. NCIS has its goofy office hijinks. This show has aspects of all of these tensions and is desperately trying to recreate these formulas that worked (at least to some degree) in past shows, without actually pulling it off because it does some of all of those. There is a family vendetta, a blacklisted cop, another who is having a custody spat with his ex-wife with whom he would like to get back together. Then there is the extra seasoning of everyone being trigger-happy, which seems to be trying to cover for the failures elsewhere.

    This violence also manifests itself in that the main characters are all-too willing to blatantly disregard most laws, including to torture suspects. The characters sometimes allude to this in the sense that the leader of the team is not himself a cop. There is too much else going on, including that these law enforcement officers are always in a rush to get to their next act of sanctioned vigilantism, but they seem to want the core of the show to be tension of having a Seal in a cop’s job. Of course, asking those questions requires better writing and a larger cast, so Hawaii 5-O is happy to use everything as a throwaway, moving along quickly enough that maybe nobody will notice.

  5. What follows from the last point is that there is a visual representation of a militarized law enforcement that takes the stance that almost everyone else is a victim waiting to happen. Frequently, this results in stern talking-tos. At most there are token references to people outside of the main cast of the show, passing mentions that they should not be discharging weapons in public spaces, and remorse when the “good guys” cannot save someone.

An end of semester thought

Another semester come and gone, or almost. I have a student primed to come in and collect his final exam tomorrow and I am expecting a grade complaint to ensue, but the other context of this post is that I had a student email me last night or early this morning thanking me for being a “stricter TA than the others” because it helped her mold her study habits, her reading, and her writing. The student who sent me that message was a delight to have in class (I actually enjoyed that entire section quite a bit, even if the classroom itself made me sometimes feel like Yuri Petrova while I taught), and I did appreciate the way that she phrased her statement that I was a hard-ass, suggesting that I had expectations about what the students should have prepared before class and what we needed to talk about in class rather than that I was a malicious grader.

In a sense this is another “grade inflation” piece following after “confessions” of grade inflators, a piece about “grade compression” instead of inflation, this response to the Slate Confessions piece, and this from the Harvard Crimson, dated June 5, 1997, that cites a controversy from four years earlier when a professor at Harvard said in the Harvard Magazine that the causes of grade inflation stem from affirmative action in 1969. The way this latest bout of frustration has swirled across social media† has seemed to strike a nerve with academics. People have stumped for their cause of choice, whether that they are not paid well enough to “waste” time arguing grades, standardized tests (and the ensuing results-based education), the customer model of higher education, the desperate need for good teaching evaluations to keep a tenuous employment,‡ etc. Each also has his or her own response…and no one has a feasible solution. What I have been thinking about, rather, is the aura of mystery that surrounds grades.⚔

I can only echo the frustration expressed elsewhere about the student demand for making the grades and what exactly the grades mean. I really don’t care about grades, even though I dutifully assign them throughout the semester, and, as for most teachers, grading is a dreaded activity. But I am musing about the perception of grades versus reality. In most of my sections my average test score ranges from about a 77 to an 82 simply based on the class makeup and parameters of the exam. Caveats about small sample sizes apply: outlier sections will sometimes skew a little bit lower or a little bit higher than that general range, and 81 or 82 is probably the most common average I have seen. Mind you that I am talking just about the tests, and there is usually between 10 and 40 percent of the grade that relies on written responses, attendance, etc., for which a student gets full credit simply for completing the task.± The result of these extra points is that students who follow through with the course work have a final grade somewhat higher than their test grades. Even when the students have read the syllabus, many assume that their grade is exactly as it reads on the tests (an observation, nothing more).
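To put rough numbers on that gap, here is a minimal Python sketch of the weighted average. The function name `final_grade` is hypothetical, the test average of 80 is simply an illustrative value inside the range quoted above, and the sketch assumes a student who completes every full-credit task, so the completion portion counts as 100.

```python
def final_grade(test_average: float,
                completion_weight: float,
                completion_score: float = 100.0) -> float:
    """Blend a test average with full-credit completion work.

    completion_weight is the share of the course grade (0.0 to 1.0)
    that comes from responses, attendance, and similar tasks.
    """
    if not 0.0 <= completion_weight <= 1.0:
        raise ValueError("completion_weight must be between 0 and 1")
    test_weight = 1.0 - completion_weight
    return test_weight * test_average + completion_weight * completion_score


if __name__ == "__main__":
    # The two ends of the 10 to 40 percent range mentioned above.
    for weight in (0.10, 0.40):
        print(weight, round(final_grade(80.0, weight), 1))
```

With an 80 on the tests, the course grade works out to about 82 when completion work is 10 percent of the grade and about 88 when it is 40 percent, which is the difference students tend not to see if they read their standing off the exams alone.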

I also don’t particularly like to talk about overall course averages because there is a non-negligible chunk of the students who don’t come to class, miss tests, miss in-class quizzes, and don’t complete response papers…these are most of the students who fail the class. With those students in the equation, the course average may dip below that of the exams, but often pulls it back to even with them. Students who do the work are rewarded for it, those who don’t can sometimes float by on exams alone, but if their exams are borderline, slip below into failing range.

I TA for an intro American history class and have been an adjunct,¥ and rarely have full authority over my own course design and final grades, but my students usually walk away from my classes believing that I am a hard grader, and this is something I worry about. I am fine being known as a somewhat demanding instructor so long as it is coupled with the knowledge that I will reciprocate whatever effort the students put in and work with them to master the material. I would also like to be known as a fair grader, though I know that it is impossible to please everyone all the time. My fear boils down not to fairness, though, nor that I am some kind of boogieman set on the earth to terrorize students, but that my expectations are punishing my students. I do not believe this to be the case, but the recent talk of how other professors and other TAs grade makes me wonder–and in a system that prioritizes results over process, is it simply a cop-out to hide behind the syllabus outlining student responsibilities when they cry foul at the end of the semester because missing work has harmed their grade?

I tell myself that I am about average in terms of actual difficulty; I try to challenge my students every week knowing, but often not revealing until the very end of the course, that the students are doing “fine”§ in my class–hey, the grading parameters are in the syllabus. My students may believe me to be some sort of Devourer-of-GPAs, but the final calculation doesn’t bear that out, even if I made them work to receive the desired grade.

Of course I could be the one bearing the brunt of the punishment from this perception since if I make it seem that I am not handing out top grades across the board–whether or not any possible “deficiency” (that which I call grading) is buttressed elsewhere in the grade–then the perception is that I am punishing students, keeping them from the sterling GPA that they want. Here perception, not reality, is what matters and a perceived lack of inflation/ease/compression/whatever is a sign of curmudgeonly vindictiveness and a signal that that instructor is the GPA-Devourer at fault for whatever bureaucratic issues the student faces. More directly, unless the student has been engaged with me throughout the semester they probably don’t know that they are doing better than the tests might indicate before they fill out their course evaluations.


† I love most things about Twitter, but its ability to enable internet pitch-fork mobs, ardent Jacobins, and devout Crusaders in defense of their perceived (and sometimes correct) injustices is terrifying.

‡ Of course, those evaluations come in before the final grade, so perception is everything. More below.

⚔ Many students say that they prefer multiple-choice, but the grades are actually lower on them, from which many levels of interpretation may be read.

± There may also be prompt-based papers the students have to complete, but they typically are in the same range as the exams and don’t change the calculation about amount of attendance/response/etc points.

¥ Not every student attended every class, but everyone did all the assignments, so I didn’t quite have this problem in that class.

§ Fine can mean that I don’t care about the grade, but in this context it really means that the student is doing much better than they think they are in terms of the overall grade.

Assorted Links

  1. When Philosophers Join the Kill Chain-An op-ed by Mark Levine in Al-Jazeera about Bradley Strawser, the philosopher who has been defending the moral imperative of drone strikes. Levine is highly critical of Strawser, particularly in his attempts to defend the use of drones through the concepts of just war without considering the implications for actual people. Another academic is less than thrilled at Levine’s blunt use of philosophers, but agrees with his overall point.
  2. Remembering Gore Vidal: A Dying Breed– A blog post on the Economist that points out that Gore Vidal was a breed of public intellectual that is not commonly seen anymore.
  3. Court Rejects Assertion that ‘Tenure’ Means Continuous Employment-A law professor in Michigan was fired after she refused to teach the assigned courses, an act that has now been upheld through a court case and an appeal. I am not entirely clear on what the details of the case were, but it seems that she tried to make the claim that tenure entails continuous lifetime employment, something that the court explicitly did not uphold. It seems that this will just help define the parameters of behavior that warrants termination, but it is a definition that bears watching.
  4. Survival Strategy for Humanists: Engage, Engage– A piece in the Chronicle of Higher Education about how humanities can survive in the future. Not much new here, but it is nice that this sort of argument seems to be slowly picking up steam. The idea is that communication, writing, teaching skills need to be taught and then we should stop writing books that are utterly incomprehensible.
  5. Writers and readers on Twitter and Tumblr-An article on Slate that implies that “coddling” (my words) has a negative impact on art and artistry, so the feel good back-patting that takes place between authors and readers online only serves as a cheap form of therapy, but does not improve literature. I think that the author is not totally wrong.
  6. The “Immeasurable”– An enlightening graph about grading.
  7. As always, comments encouraged. What else is out there?