Specsitol: a semester reflection

I submitted grades a little over a week ago and promptly withdrew, exhausted, into a little fort with curtain walls made of novels. At least that’s what it felt like. The specific details might be exaggerated.

Several times I tried to break through the fog that had settled over my mind, but succeeded only in producing a silly post about pizza TV shows and the weekly varia post that I start compiling as soon as the previous week’s goes up. I could barely think about the semester that had just ended, let alone put those thoughts into any sort of coherent discussion.

Simply put, I had an exceptionally difficult semester, and one that rates among the very toughest I have ever experienced. Some issues stemmed from causes external to my classes (e.g. not getting some much-needed rest this summer and early semester indexing and proofing a book manuscript that put me perpetually behind), while others stemmed from things that happened in the classes, most of which I don’t want to talk about in this space because I don’t like talking about specific student activity in a public forum even when identifying details have been redacted, especially when there is nothing of universal value that can be gleaned by doing so.

Not every problem stemmed from these issues, of course.

Dissatisfied with traditional forms of grading, I dove headlong into the world of Specifications Grading for most of my courses this semester. To stick with the metaphor, I liked these waters but they also sent me crashing into the rocks.

The formula varied a little bit, class by class, but, in general, I came up with a system where the students earned credit across four or five categories of assignments (e.g. journal entries, small assignments/participation, papers). Every assignment was graded using a bespoke rubric and it either met the standard and thus earned credit, or it did not. More work and higher-quality essays (the essay rubric had two tiers, one for basic competency and another for advanced) earned higher grades. To help students meet these higher standards, I allotted virtual tokens that the students could use either to revise their papers or turn work in late, pegging the number of tokens to the number of papers.

I entered the semester thinking that I had worked out a reasonably simple system that would give students the agency to decide what grade they were aiming for, make my expectations for each grade level clear, and provide in-semester flexibility that would allow students to do their best work. However, I had not anticipated that putting these assignments and expectations up front in the course would lead to cognitive overload for a significant number of students. In fact, I had a conversation in the final week of class with a student who said that this semester was much harder than the course they had taken the semester before even though the workload in the two courses was identical except that I had swapped one short weekly assignment for another. While there are other explanations why this student might have struggled with my course, I’m inclined to take the sentiment at face value because I saw evidence of the same struggle from other students who were struggling to interface with the information that I had provided in a way that made it harder to complete the work itself.

The core of this problem, I think, is that many students were used to traditional grading schemes that allow students to muddle through to a passing grade without too much effort. By contrast, the system I devised required students to complete assignments in each category to a specified level in order to earn the grade. Passing my general education courses last semester did not require too much work, unless you simply neglected a graded category.

I am treating this as a messaging problem for now. Traditional grading schemes remain stupid and I’m not ready to abandon my attempt to find something better just yet.

However, the issue of students neglecting grade categories dovetailed with the tokens and flexible deadlines to create absolute chaos on my end. Here there were several intertwined issues.

Several semesters ago I developed a system for deadlines where students could receive an automatic extension by filling out a Google form before the due date. This policy has proven incredibly popular with my students. However, while I intend to keep it intact in some form, I am starting to question whether the system is having the intended effect. Rather than providing students the space to do their best work, I am finding that whatever grace I provide is filled by other classes with stricter deadlines such that my students wind up writing their papers at the last minute anyway, just several days later, and I had so many students taking the extension that it became a challenge to return papers in a timely fashion.

However, it was the tokens that turned this semester into a logistical nightmare. I set up the tokens anticipating that most would be used for revisions, knowing full well that revisions coming in at any point would cause some chaos. What I did not anticipate is that some portion of students would use most or all of their tokens to turn work in late. This meant that I had not only revisions, but also new work being turned in on no particular schedule throughout the semester, and I had difficulty keeping tabs on students who hadn’t turned in assignments, some of whom I knew were working on things and some of whom I did not.

Compounding these issues was what I take to be a consequence of having a significant number of first-year students. Anecdotally, from talking with friends who teach in high school, some students have been conditioned to think that flexible deadlines and the like mean that an assignment is optional. Or that whatever make-up assignment gets offered will be easier than the original assignment. As one explained:

“I’ll allow X to be redone/revised/resubmitted” is increasingly being taken as “I don’t need to do X, I’ll do the makeup Y later which will be easier anyway.”

This was obviously not what had been intended, but this collision of expectations and conditioning meant that I spent a significant amount of time amid the chaos of trying to grade everything just trying to track down missing work so that the students wouldn’t fail on those grounds. Oh, and I had 50% more students than I had in either semester last year.

Then there was the grading itself. I adopted a specifications system because it promised to offload some portion of the grading onto explicit rubrics where I could check the appropriate box. I loved not assigning grades to papers, but I quickly discovered several things that meant the system created just as much work as the mystery black box of traditional grading, if not more. The issues started because, I discovered, many students simply did not complete the assignments with the rubrics in mind and did not use the rubrics to check the work before submission. This meant that I often received work that did not fulfill the simplest rubrics.

These problems were particularly acute on the written assignments, with their long, detailed rubric that should have provided guidance for the papers. I quickly realized that many of my students did not have the writing background to achieve the higher proficiencies, so simply checking the rubric box was not going to provide adequate guidance or encouragement. At the same time, while some students were not going to be aspiring to those grade tiers, I also couldn’t in good conscience provide detailed feedback for some students and not for others until the very end of the semester when the possibilities of revision had passed. By the last two weeks of the term it was clear that I would not be able to get caught up, so I offered that any student who wanted to revise their work could come to office hours and have their paper(s) marked in person so that they could receive feedback on how to meet the next tier. These meetings meant that (I think) any student aiming for higher grade tiers reached them, but they also meant that those weeks were a whirlwind of paper conferences.

Finally, my small assignments policy put a cherry on top of this disaster sundae.

The policy was simple. There were some number of small papers, in-class activities, exit-tickets, one-minute essays, and other activities that took place in class. If you weren’t there, you couldn’t make up the work. Unless you were an athlete at a competition. Or you got sick. Or had other “excused” absences. Right from the start, I found myself litigating what counts as a legitimate absence, which is one of my least favorite parts about taking attendance. Then, like with non-completion of work, I found myself around the middle of the semester worried about the number of students who seemed liable to fail (or otherwise drop grade tiers) because they had failed to adequately participate in the class. Since the opportunities for these points often did not come at regular intervals, I found myself inventing “optional extra” opportunities that would allow the students to bring their grade in that category up, which, in turn, created confusion about what assignments students actually needed to complete. Often, the students who completed the optional assignments were not the ones I had in mind when I created them. And, of course, adding all of these small assignments created a flurry of paperwork that I had to manage.

Chaos.

I should point out that for a non-negligible percentage of my students this system worked exactly as I envisioned, giving them agency to achieve grades based on their goals for the semester. Had I not felt compelled to give the students aiming for the “C” the same level of feedback I gave to those aiming for an “A,” my grading might have even been manageable—but, of course, almost everyone said that they were aiming for an “A” back in August.

I am not ready to abandon this grading mode, just yet, but it needs to be modified in critical ways for it to become sustainable and productive. The changes I have in mind to this point are:

  • Streamline my messaging and expectations. This means not only being clear about my expectations in terms of earning credit across multiple categories, but also clarifying that this is a labor-based grading scheme. It is designed to be transparent and achievable, but not necessarily easy. At the same time…
  • I want to submerge the mechanics of the participation grade. Some of the chaos this semester was created by the various points that students earned for doing in-class activities, which meant that this was something I had to track. I am not planning to change the activities that I do for small assignments, but my current thought for this category is to take a page out of the “ungrading” playbook. Instead of me assigning grades, the students will complete three reflections, one at the start, one at the middle, and one at the end of the term. The first one will set expectations and think about where they are at the start of the course. The latter two reflections will both have the students assign themselves a percentile grade for their own engagement with the course material. I will then plug the final percentile grade into a formula that adds or subtracts points based on attendance and maybe what percentage of small assignments they complete, where perfect participation and attendance adds to the score, a middle range results in no change, and excessive missed classes and activities result in lost points. I see a number of ways that this could go horribly wrong and I’m still working out the kinks, but it would also relieve the demand for me to track so many different assignments or create “optional” work.
  • I am going to rewrite the longer rubrics both to make them easier to follow and so that the students can explicitly use them as checklists. Similarly, I am going to print these rubrics and distribute them directly to my students.
  • Ditto for handouts on things like writing. I provide a lot of resources for the skills that I ask the students to master in these classes, but I find that even when directing students to them via presentation in front of the class, they are not being used because most students forget that they are there. I remember sticking handouts into my backpack never to be seen again, but at least having been handed a physical copy of something might help jog memories.
  • I am changing the token system. Tokens will only be used for turning in assignments late and will probably be limited to just two, with a reward to the participation grade for every token left unused. Revision will be limited to the papers, but allowed for every paper, albeit probably with firmer deadlines for when a first round of revisions needs to be complete.
  • Since none of this addresses how much time I spent responding to individual papers this semester, I am also likely going to lean more heavily on the language in the rubric and invite students looking to revise their papers to higher levels of achievement to come for conferences earlier in the semester.
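
For illustration, the participation adjustment described in the list above could be sketched in Python roughly as follows. The bonus, penalty, attendance cutoff, and token reward are all hypothetical placeholders, as are the function and parameter names, since none of those numbers has been settled.

```python
def participation_grade(self_assessed, attendance_rate, unused_tokens=0,
                        bonus=5.0, penalty=10.0, neutral_floor=0.80,
                        token_reward=1.0):
    """Adjust a student's self-assessed participation grade.

    self_assessed   -- the student's own final percentile grade (0-100)
    attendance_rate -- share of classes/activities attended (0.0-1.0)
    unused_tokens   -- late tokens the student never spent

    All default values are made-up placeholders, not settled policy.
    """
    if not 0 <= self_assessed <= 100:
        raise ValueError("self_assessed must be between 0 and 100")
    if not 0.0 <= attendance_rate <= 1.0:
        raise ValueError("attendance_rate must be between 0.0 and 1.0")

    if attendance_rate == 1.0:
        adjustment = bonus        # perfect attendance adds points
    elif attendance_rate >= neutral_floor:
        adjustment = 0.0          # middle range: no change
    else:
        adjustment = -penalty     # excessive absences lose points

    score = self_assessed + adjustment + unused_tokens * token_reward
    # Keep the result on the same 0-100 scale as the self-assessment.
    return max(0.0, min(100.0, score))
```

Writing it out this way makes one of the kinks visible: the result depends heavily on where the cutoff sits, so a student just under the neutral range loses the full penalty at once. A sliding adjustment would be gentler, but harder to explain in a syllabus.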

Looking over these changes, there are still parts of this system I am concerned about. The ungrading formula, for instance, is an awkward beast to explain in the syllabus and it could lead to uncertainty about how the various non-paper assignments contribute to their grade. But I also think that there is a real possibility that these changes might be able to preserve what I liked about last semester while also steering into the sorts of written and metacognitive exercises that I find particularly valuable for students in a way that will make it a more sustainable and productive learning environment for everyone involved.

Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time

I don’t like grades.

As a student, I oscillated between taking anything but superlative grades as a sign of my own failure and being utterly indifferent to grades as a secondary consideration to learning the material. Either way, grades were an imperfect motivator.

As a teacher, I am even more ambivalent about grades, which I see as something I am required to do in order to rank my students. I am always prouder of a student who struggles and reaches a breakthrough than the genius who coasts through the course, even though the latter receives the higher grade. My own experience as a student informs how I structure my courses, leading to policies that encourage regular engagement, choice in how to complete assignments, emphasis on the process over product, and often opportunities for revision. Each of these course policies marked an improvement, but they all retained the thing that I was in many ways least satisfied with: grades.

A few weeks ago a faculty development seminar introduced me to the broad strokes of Specifications Grading and since it seemed like the direction I have been moving my courses, I spent nearly an hour after the event jotting down preliminary notes for what that might look like in my course. At the end of that day I was intrigued, but needed more information. Over my spring break, therefore, I read Linda Nilson’s Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time (Stylus 2014).

Broadly speaking, Specifications (Specs) Grading is a variation on pass-fail, contract grading, and competency-based outcomes that ties course assignments to specific course objectives. This model, Nilson argues, has three major benefits. First, setting a high bar for “acceptable” work but giving opportunities for revision imposes rigor without making the professor into a jerk. Second, demystifying the grading process and offering flexibility reduces stress on the students. Third, eliminating partial credit saves time. Some model systems presented a fourth potential benefit of allowing teachers to give more of their limited attention to those students aiming for the higher grades.

In addition to an argument for its benefits, Specifications Grading serves as a guide to adapt traditional grading models to a specs system across two broad categories: outcomes and assignments/rubrics.

If you’re anything like me, your course outcomes won’t work for specs grading. Nobody ever really taught me how to write objectives, so what I have in my syllabuses focuses on what the students will receive. The conceit of an objective might be well-intentioned, but if the students can’t demonstrate what they are learning through the assessments, then it won’t work. Often this just means a subtle, but significant shift:

  • Students will gain a broad understanding of US history since 1877.
  • Students will be able to identify the major events of American history since 1877.

Each of these objectives would then be demonstrated specifically by one or more course assessments. In Nilson’s model, some of these course objectives would correspond to basic, minimal standards like the one listed above. Students who achieve proficiency at those lower-level objectives would be able to pass the course with a C, while students aiming for a B or A would also have to demonstrate proficiency at objectives that involve more complex skills.

The second step involves developing detailed one-level rubrics that explain everything that the assignment must have to be accounted “proficient.” Now there will be some variability in what that standard should be, but Nilson recommends building the rubric from everything you would expect to see in a roughly B+ assignment. When it comes time to grade the assignments, then, the assessment becomes a binary yes/no, along with some comments that might be used if, as Nilson recommends, the students get the chance for revision.

I have traditionally had an antagonistic relationship with most rubrics because most of the rubrics I have been required to use were a particularly poor match for how I wanted to grade, such that someone who received 9/12 on the rubric was solidly in the B+ range according to how I grade. However, I found myself coming around to this model of rubric because it removes the hair-splitting and partial credit in favor of showing either that the students achieved proficiency or that they did not. The grade translation, in turn, does not come from an individual rubric but from the number of assignments in which the student achieved proficiency.

I have been jotting down notes on how I can transform my existing courses with minimal disruption to anything but how I grade.

For my general education classes the assignments might look like (based on a syllabus for this semester):

To receive a “C” in this course (linked to the lowest tier of objectives)

  • Participation [in various forms] of 75%
  • Objective quiz score of 75% [I allow retakes and drop a quiz score, so I have exactly 2 students who are not clearing this bar right now]
  • Journals 10/15
  • Papers 5/5 completed, but not to “proficiency” with historical essay writing

To receive an “A”:

  • Participation of 95%
  • Objective quiz score of 90%
  • Journals 13/15
  • Papers 5/5 to proficiency
  • Completing a final project

The “B” range would obviously fall somewhere in between these two levels, with a “D” a little below “C.” The numbers might be off a little bit, but I would calibrate them based on what my final grade sheet looks like.
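
As a rough sketch, the “C” and “A” bundles above could be checked in Python along these lines. The class and function names are hypothetical, and because the “B” and “D” thresholds are not pinned down above, the sketch only reports whether a student clears the “A” bundle, clears at least the “C” bundle, or falls below it.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    participation: float    # percent, 0-100
    quiz_score: float       # percent, 0-100
    journals: int           # journals earning credit, out of 15
    papers_completed: int   # papers turned in, out of 5
    papers_proficient: int  # papers meeting "proficiency", out of 5
    final_project: bool     # completed the final project


def meets_c(r: StudentRecord) -> bool:
    # "C" bundle: every category must clear its bar, not an average.
    return (r.participation >= 75 and r.quiz_score >= 75
            and r.journals >= 10 and r.papers_completed >= 5)


def meets_a(r: StudentRecord) -> bool:
    # "A" bundle: higher bars, proficient papers, plus the final project.
    return (r.participation >= 95 and r.quiz_score >= 90
            and r.journals >= 13 and r.papers_proficient >= 5
            and r.final_project)


def tier(r: StudentRecord) -> str:
    if meets_a(r):
        return "A"
    if meets_c(r):
        return "at least C"  # the "B" thresholds are not specified
    return "below C"
```

The point of the `and` chains is that a strong showing in one category cannot compensate for neglecting another, which is the main difference from a weighted average.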

For my upper-level classes that are writing intensive and where the students complete three longer essays, a “C” may require revising one of the three essays to proficiency, “B” requires two, and “A” all three. For all of these classes, I am also toying with the idea of creating a list of “recommended” books for the course and allowing any student the opportunity to choose and review one of these books in place of one “proficient” paper—with guidelines for what constitutes an acceptable review, of course.

Specifications Grading also introduced me to a different paradigm to the student-teacher relationship. Students are not customers, Nilson argues, but clients. Specifications grading takes into account that different clients are going to aim at different outcomes. It makes the expectations clear for each tier and lets the client choose which package to pursue. In Nilson’s telling, this allows the teacher to dedicate the most energy to the students most invested in the course by dint of aiming at the top tiers.

This model is tempting given how frustrating it can be to expend disproportionate amounts of energy on reticent students, but it was also the point that left me most uncomfortable with specs grading. One common proposal in the sample syllabuses Nilson provides is setting not only different levels of proficiency, but also different assignments for the different tiers. I incorporated that into one of my sketches above for the final projects, but even there I have been wondering whether the non-project option ought to require an objective test passed at a certain proficiency under specs grading, something I’m not wild about given that 1) I am skeptical about the value of such objective tests, period; 2) writing such a test would hand back some of the savings in time; and 3) keeping track of who is doing what sounds like a lot of bookkeeping.

However, my discomfort with the different assignments for different levels is also philosophical. That is, it feels to me like saving time and becoming a better teacher for the invested students involves allowing students aiming at a “C” to fall behind. The counter, I think, is that this is in fact the point. The way I imagine this grading scheme working in my classes, those students would still be expected to attend and complete assignments for the whole semester, and the system gives anyone who wants it the opportunity to achieve every objective. But if students are not interested, then it empowers them to put their energies elsewhere (courses, hobbies, work, whatever). In other words, the client model simply acknowledges the reality that teachers cannot force people to learn anything they don’t want to learn, particularly at the busiest time of the semester.

I have been thinking about the process as setting two different benchmarks: the “C” level for minimum objectives and the level of proficiency for complex objectives where “A” reaches it in every category and “B” reaches it in some. Specs grading dispenses with the murky ambiguity of partial credit where the “C” student allegedly achieved 75% of a given course objective. Thus, it isn’t the “C” student doing less work so much as they hit one set of objectives, while I am vouching that the “A” student has completed more and more complex work that allows me to certify that they have reached proficiency in the others—I can hope the “C” student developed in these other categories, but the grade makes no claim that they did so.

At this point I am ready to dive into specs grading head first, but I’m also sure that whatever system I come up with in the abstract will require adjustment once I get into a semester. So here’s the question for those of you who have used specs grading: what should I be on the lookout for? Is there anything I’m missing?

ΔΔΔ

I keep a list of pedagogy resources along with links to write-ups I have done on this blog.