In 2012, I graded hundreds of papers as a teaching assistant. This was a time-consuming and subjective process, and I often spent days grading just to meet deadlines.

Large language models (LLMs) like GPT-5 are transforming education, including the grading of written assignments. LLMs provide an efficient and scalable way to assess students' writing while also generating personalized feedback. In my experience, however, LLMs have been unreliable graders for the following reasons:

  1. LLMs can produce very different grades for the same work, undermining fairness and reliability in the assessment process.
  2. LLMs generate outputs based on probabilistic patterns rather than genuine reasoning or qualitative judgment.
  3. As black-box systems, LLMs make it difficult to explain why points were deducted.

Because of these challenges, I decided not to rely solely on LLMs for grading. Still, I have found them extremely useful for drafting grading comments (feedback). This post demonstrates how to use the gpt-5-mini model via the OpenAI API to generate feedback in batch.

🤖 The Role of LLMs in Providing Feedback

LLMs can analyze text against predefined grading rubrics, evaluating assignments for factors such as coherence, argumentation, grammar, and style. Using an LLM only to draft grading comments lets me...

  • Provide detailed feedback: LLMs can highlight areas for improvement in structure, logic, and language use.
  • Ensure consistency: Unlike me and other human graders, an LLM does not suffer from fatigue, so its comments stay consistent from the first submission to the last.
  • Save time: Instructors and graders can focus more on instruction and student engagement rather than spending excessive hours on grading.
  • Provide individualized feedback: AI can tailor comments to a student's specific weaknesses and strengths.

📜 Code

This notebook iterates through a set of .pdf and .docx files in specified directories, extracts the text from each file, and then uses the OpenAI API to generate feedback based on a rubric. The feedback is saved to a .txt file with the same name as the original submission file.

Import Packages

In [1]:
import glob
import os
from pathlib import Path
from openai import OpenAI
from docx import Document
import PyPDF2
import markdown
from bs4 import BeautifulSoup
import uuid
import re
  • glob: Used to find files matching a specified pattern.
  • os: Provides a way of using operating system dependent functionality.
  • pathlib.Path: Offers a way to handle file paths.
  • openai.OpenAI: OpenAI's Python library to interact with the OpenAI API.
  • docx.Document: Used for reading .docx files.
  • PyPDF2: Used for reading .pdf files.
  • markdown: Used to convert Markdown text to HTML.
  • bs4.BeautifulSoup: Used to parse HTML and extract text. BeautifulSoup combined with the markdown package is used to strip Markdown formatting characters from the returned text.
  • uuid: Used to generate unique user IDs to avoid cached responses.
  • re: Used for regular expressions.

Initialize an OpenAI Client

Read the OpenAI API key from a file named openai-api-key.txt.

In [2]:
OPENAI_API_KEY = None

with open('openai-api-key.txt', 'r') as f:
    OPENAI_API_KEY = f.read().strip()  # strip the trailing newline, if any
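If you'd prefer not to keep the key in a plain-text file, one common alternative (an option, not part of the original notebook) is to read it from an environment variable:

```python
import os

# Read the key from the OPENAI_API_KEY environment variable instead of a file;
# returns None if the variable is not set
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
```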
In [3]:
client = OpenAI(api_key=OPENAI_API_KEY)

Select Student Submission Files

Uses glob to find all .pdf files in the target directory.

In [4]:
# find .pdf files
submission_files_path = "submissions/"
supported_file_extensions = [".pdf"]

files_to_grade = []

for ext in supported_file_extensions:
    glob_path = os.path.join(submission_files_path, f'*{ext}')
    files_to_grade.extend(glob.glob(glob_path))

files_to_grade
Out[4]:
['submissions/submission-01.pdf']

Define Helper Functions

Create a function to strip Markdown formatting literals.

In [5]:
def strip_markdown(md_text):
    # Convert Markdown to HTML
    html = markdown.markdown(md_text)
    # Parse the HTML and extract text
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text()

Test the strip_markdown() function.

In [6]:
print(strip_markdown('**bold**\n_italic_\n### Heading 3'))
bold
italic
Heading 3

Create a function to replace three or more consecutive line breaks with just two line breaks.

In [7]:
def replace_extra_linebreaks(text):
    return re.sub(r'\n{3,}', '\n\n', text)
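replace_extra_linebreaks() deserves the same quick sanity check as the other helpers (the function is repeated here so the cell runs standalone; the sample string is made up):

```python
import re

def replace_extra_linebreaks(text):
    # Collapse three or more consecutive line breaks into exactly two
    return re.sub(r'\n{3,}', '\n\n', text)

print(replace_extra_linebreaks('Paragraph one.\n\n\n\nParagraph two.'))
```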

Create a function to replace curly quotes with their straight ASCII equivalents.

In [8]:
def standardize_quotes(text):
    standardized_text = re.sub(r'[“”]', '"', text)
    standardized_text = re.sub(r'[‘’]', "'", standardized_text)

    return standardized_text

sample_text = "“It’s a ‘test’,” she said."
print(f"Before: {sample_text}")
print(f"After: {standardize_quotes(sample_text)}")
Before: “It’s a ‘test’,” she said.
After: "It's a 'test'," she said.

Create a Rubric

Define the grading rubric as a multi-line string. This rubric is used to instruct the AI on how to provide feedback.

In [9]:
rubric = """
1. Content
Does the student evidence a reasonable understanding of the issues in the case?

- Clearly answer all of the questions asked in the required/questions section. Each sentence in the required section may have multiple questions—answer each one. Clearly label each answer; please do not make me search for them.
- Include very little or no background summary of the case itself. Assume I have a good understanding of the background.

2. Judgment and Originality
Is the student able to go beyond the facts of the case, evidencing the ability to state and support independent professional judgment?

- What I look for in an exceptional paper is the ability to go beyond the basic facts in the case. Put original thought, analysis, and examples into your responses. Do not simply restate the case or the case’s arguments. Come up with original ideas or original reasoning and then defend that reasoning.
- Often, there is no one correct answer in these cases. If your answers are clearly stated and adequately supported, they will be “right”.

3. Writing/Grammar

- Proofread your paper. When feasible, have someone else read your paper as well.
- While you should always spell check, this is often not enough. Correct English usage entails more than simply correct spelling.

4. Professional Impression

Is the paper well-organized and easy to follow?
Would it make a positive impression on a business professional?

- Business readers are constrained by time so business writing should be brief. Your reader should understand your ideas as quickly and painlessly as possible.
- Business writing is concise and to the point. Rambling or stream-of-consciousness type prose is not appropriate. Use headings to make your writing concise and readable. Use bullet-point lists where appropriate.
- You usually see answers/recommendations in business writing at the beginning of the paper. In business, we are not telling a murder mystery where the answers come at the very end.
  - Each paragraph should start with the main point of the paragraph. The rest of the paragraph should be your proof or evidence for that main point. Focus your writing on the defense of your clearly stated answers.
  - For example, with very few exceptions, I should be able to understand the answers/recommendations in your paper by reading only the first paragraph of your paper and only the first sentence of each subsequent paragraph.
"""

🔄 Iterate Through Each File and Generate Feedback

In [10]:
num_files = len(files_to_grade)

for i in range(num_files):
    fp = files_to_grade[i]
    
    print('======================')
    print(f"Grading file: {fp}")
    
    path = Path(fp)

    feedback_txt_path = path.with_name(path.stem + "-gpt-feedback").with_suffix(".txt")
    
    # skip if the feedback file has already been generated
    if feedback_txt_path.exists():
        print(f"{feedback_txt_path} already exists")
        print('SKIPPING')
        continue
    
    submission_file = client.files.create(
        file=open(path, "rb"),
        purpose="user_data"
    )

    completion = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {
                "role": "developer", 
                "content": f"You are a teaching assistant for a University course. \
                Use the following writing guidelines to provide comments:\n\n{rubric}"
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "file",
                        "file": {
                            # only works with a PDF file
                            "file_id": submission_file.id,
                        }
                    },
                    {
                        "type": "text",
                        "text": "Can you provide feedback on this written report? \
You are not grading the paper, only providing comments. \
Do not use markdown formatting or syntax such as ** or ###. \
Focus on mostly positive feedback, but include one or two areas for improvement if applicable."
                    }
                ]
            }
        ],
        user=str(uuid.uuid4())  # pass a unique user ID per request to avoid cached response
    )

    gpt_feedback = completion.choices[0].message.content

    # strip markdown format
    gpt_feedback = strip_markdown(gpt_feedback)

    # replace three or more consecutive line breaks into two
    gpt_feedback = replace_extra_linebreaks(gpt_feedback)

    # convert “” quotes to ""
    gpt_feedback = standardize_quotes(gpt_feedback)

    with open(feedback_txt_path, 'w') as feedback_file:
        feedback_file.write(gpt_feedback)

    print(f"Completed {i + 1} out of {num_files} submissions ({round(i + 1 / num_files * 100, 1)}%)")

print("**********************")
print(f"Complete")
======================
Grading file: submissions/submission-01.pdf
Completed 1 out of 1 submissions (100.0%)
**********************
Complete

💬 Sample Output

The gpt-5-mini model provided the following feedback.

Overall impression
- Strong, practical report that clearly targets a business audience. The paper concisely explains the selected use case (market research and summarization) and ties capabilities to business workflows and risks. The tone and examples (e.g., using 10-Ks, analyst reports, and earnings calls) make the write-up relevant and usable for managers.
Content (strengths)
- Clear coverage of the chosen use case: pages 2–3 explain industry/competitor analysis, consumer insights, and summarization well and relate each to real business needs.
- The team tested the tool (page 3: summarizing 10-Ks, analyst reports, earnings calls) — that original testing is a major plus because it moves beyond literature summary to applied evaluation.
- The risks and limitations section (page 4) shows appropriate caution about overreliance, accuracy, and the need for human validation.
Judgment and originality (strengths and one suggested improvement)
- Strength: The report goes beyond basic description by describing how ChatGPT can be integrated into workflows (e.g., turning results into visuals and drafting follow-up emails), which demonstrates useful professional judgment.
- Improvement: Expand the original analysis of limitations. For example, quantify or give concrete examples from your tests where ChatGPT produced outdated or incorrect information. A brief example (one or two concrete test cases) would strengthen your independent judgment and make the recommendations more persuasive.
Writing / grammar (strengths and two small fixes)
- Strength: Language is generally clear and businesslike; sections are focused on practical outcomes (what the tool can do and potential impacts).
- Fix 1: Be consistent with product naming: you use both "Chat GPT" and "ChatGPT"; pick one form (ChatGPT) and use it consistently.
- Fix 2: Minor punctuation/usage: change "10-K’s" to "10-Ks" (no possessive). Also check the TechRadar URL on page 5 — it appears to contain a stray character (an odd symbol before "explained").
Professional impression / organization (strengths and one suggestion)
- Strength: The paper is concise and oriented toward a business reader. The recommendation-style sentences and practical examples make it easy for a manager to act on.
- Suggestion: Add a one-paragraph executive summary at the top (2–3 sentences) that states the main conclusion and recommended next steps (e.g., "Use ChatGPT for rapid summarization and initial trend detection, but require human validation and documented sourcing for any decisions"). This gives busy readers immediate guidance.
References and sourcing (note)
- Good that you cite multiple sources (Harvard Business School, TechRadar, Wikipedia, OpenAI).
- Suggestion: Improve citation formatting and ensure links are clickable and accurate. Also be cautious about asserting specific model versions (e.g., "GPT‑5") without corroborated, dated sources — hedge language or verify the exact model and capabilities at time of writing.
Summary of key actionable improvements (prioritized)
1) Add one or two concrete examples from your testing (specific errors or summarization outputs) to support claims about accuracy/limitations.
2) Add a short executive summary and tighten naming/citation consistency (ChatGPT vs Chat GPT; fix URL character; consistent citation format).
Overall, well done: practical, well-targeted, and useful for business readers. With a couple of concrete examples from your tests and small editorial cleanups, this will be an even stronger, more persuasive report.

Explaining the code

  • Iterate through each file in files_to_grade.
  • Construct the file path for the feedback .txt file.
  • Skip the file if the feedback file already exists.
  • Upload the PDF file to OpenAI using File inputs API (client.files.create()).
  • Call the OpenAI API to generate feedback:
    • Specify the gpt-5-mini model.
    • Set up a conversation with two roles:
      • developer: Define the AI's role as a teaching assistant and provide the rubric.
      • user: Ask for feedback on the report, instructing the AI to avoid markdown, provide positive feedback, and suggest areas for improvement.
    • Include the uploaded PDF file by its file.id.
    • Generate a unique user ID for each request.
  • Process the feedback from the OpenAI API:
    • Strip Markdown formatting.
    • Replace extra line breaks.
    • Standardize double quotes.
  • Save the feedback to a .txt file.
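The feedback path construction is worth a closer look: pathlib derives the output name from the submission name in a single chain.

```python
from pathlib import Path

path = Path("submissions/submission-01.pdf")

# "submission-01" + "-gpt-feedback", then swap the extension for .txt
feedback_txt_path = path.with_name(path.stem + "-gpt-feedback").with_suffix(".txt")

print(feedback_txt_path)  # submissions/submission-01-gpt-feedback.txt
```

Because the feedback file sits next to the submission with a predictable name, the existence check at the top of the loop makes the whole script safe to re-run.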


💡 What about Microsoft Word documents?

OpenAI's file input API only accepts PDF files. If the submissions include other, unsupported file formats like Microsoft Word documents, you have to extract the text before calling the chat.completions API.

However, this approach does not yield results as good as the file input API's, because manual extraction introduces spacing issues into the input text. As a workaround, I've included this prompt:

Because the text has been extracted from Word documents and PDF files using Python, it will contain inconsistent spacing. Please ignore it.
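A complementary option (not in the original notebook) is to normalize the worst of the spacing in Python before sending the text, so the model has less noise to ignore:

```python
import re

def normalize_spacing(text):
    # Collapse runs of spaces/tabs into a single space
    text = re.sub(r'[ \t]+', ' ', text)
    # Drop trailing spaces at the ends of lines
    return re.sub(r' +\n', '\n', text)

print(normalize_spacing('Extracted   text  with \t odd   spacing \nnext line'))
```

This keeps line breaks intact, so it composes cleanly with the replace_extra_linebreaks() helper defined earlier.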

Select Student Submission Files

Uses glob to find all .docx files in the target directory.

In [11]:
# find Word Document files
submission_files_path = "submissions/"
supported_file_extensions = [".docx"]

files_to_grade = []

for ext in supported_file_extensions:
    glob_path = os.path.join(submission_files_path, f'*{ext}')
    files_to_grade.extend(glob.glob(glob_path))

files_to_grade
Out[11]:
[]
Create a function to extract text from a .docx file, reading each paragraph and joining them with newline characters.
In [12]:
def extract_text_from_docx(file_path):
    doc = Document(file_path)
    return "\n".join([para.text for para in doc.paragraphs])

You can optionally extract the text from PDF files yourself to reduce the number of input tokens. Create a function to extract text from a .pdf file, reading each page and concatenating the extracted text.

In [13]:
def extract_text_from_pdf(file_path):
    with open(file_path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        
        # Extract text from all pages
        text = ""
        for page in reader.pages:
            text += page.extract_text()
            
        return text

🔄 Iterate Through Each File and Generate Feedback

In [14]:
num_files = len(files_to_grade)

for i in range(num_files):
    fp = files_to_grade[i]
    
    print('======================')
    print(f"Grading file: {fp}")
    
    path = Path(fp)

    feedback_txt_path = path.with_name(path.stem + "-gpt-feedback").with_suffix(".txt")
    
    # skip if the feedback file has already been generated
    if feedback_txt_path.exists():
        print(f"{feedback_txt_path} already exists")
        print('SKIPPING')
        continue
    
    if path.suffix == '.docx':
        text = extract_text_from_docx(fp)
    elif path.suffix == '.pdf':
        text = extract_text_from_pdf(fp)
    else:
        # skip unsupported file types
        continue

    completion = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {
                "role": "developer", 
                "content": f"You are a teaching assistant for a graduate-level accounting course. \
                Use the following writing guidelines to provide comments:\n\n{rubric}"
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Can you provide feedback on this written report? \
You are not grading the paper. You are only providing comments. \
Do not output markdown format. Avoid using markdown syntax like ** and ###. \
Because the text has been extracted from Word documents and PDF files using Python, \
they will contain inconsistent spacings. Please ignore them. \
Provide mostly positive feedbacks, with one or two areas of improvement if there are any. \
Here is the full report text:\n\n" \
+ text
                    },
                ]
            }
        ],
        user=str(uuid.uuid4())  # pass a unique user ID per request to avoid cached response
    )

    gpt_feedback = completion.choices[0].message.content

    # strip markdown format
    gpt_feedback = strip_markdown(gpt_feedback)

    # replace three or more consecutive line breaks into two
    gpt_feedback = replace_extra_linebreaks(gpt_feedback)

    # convert “” quotes to ""
    gpt_feedback = standardize_quotes(gpt_feedback)

    with open(feedback_txt_path, 'w') as feedback_file:
        feedback_file.write(gpt_feedback)

    print(f"Completed {i + 1} out of {num_files} submissions ({round(i + 1 / num_files * 100, 1)}%)")

print("**********************")
print(f"Complete")
**********************
Complete

Explaining the code

  • Iterate through each file in files_to_grade.
  • Construct the file path for the feedback .txt file.
  • Skip the file if the feedback file already exists.
  • Extract the text from the file based on its extension (.docx or .pdf).
  • Call the OpenAI API to generate feedback:
    • Specify the gpt-5-mini model.
    • Set up a conversation with two roles:
      • developer: Define the AI's role as a teaching assistant and provide the rubric.
      • user: Ask for feedback on the report, instructing the AI to avoid markdown, provide positive feedback, and suggest areas for improvement.
    • Include the extracted text in the user message.
    • Generate a unique user ID for each request.
  • Process the feedback from the OpenAI API:
    • Strip Markdown formatting.
    • Replace extra line breaks.
    • Standardize double quotes.
  • Save the feedback to a .txt file.
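If more submission formats need support later, the if/elif extension check can be generalized into a dispatch dictionary. This is a sketch, not part of the original notebook; the stub extractor stands in for the real extract_text_from_docx / extract_text_from_pdf helpers:

```python
from pathlib import Path

def _stub_extractor(file_path):
    # Placeholder for a real extractor such as extract_text_from_docx
    return f"text of {Path(file_path).name}"

# Map each supported suffix to its extractor function
EXTRACTORS = {
    ".docx": _stub_extractor,
    ".pdf": _stub_extractor,
}

def extract_text(file_path):
    extractor = EXTRACTORS.get(Path(file_path).suffix.lower())
    if extractor is None:
        raise ValueError(f"Unsupported file type: {file_path}")
    return extractor(file_path)

print(extract_text("submissions/example.docx"))  # text of example.docx
```

Adding a new format then only requires one new entry in EXTRACTORS rather than another branch in the loop.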

✅ Closing Thoughts

LLMs like GPT-5 show great promise in supporting teaching, but they are not yet reliable enough to replace human judgment in grading. Their real strength lies in drafting feedback, spotting patterns, and reducing the repetitive workload for instructors. By combining automation with human oversight, we can achieve both efficiency and fairness, ensuring that feedback remains constructive and student-centered.