Neshise Insights / guide

Reading a model card with a critical eye

A short guide to reading the documentation that ships alongside an AI model — what to look for, what to ignore, and what to ask if something is missing.

By Neshise

A model card is a document published by an AI lab to describe what a model is, what it was trained on, how it was evaluated, and where it should and should not be used. The format was proposed by Mitchell and colleagues in 2019, and most major labs now publish something in the genre.

Model cards are useful. They are also, sometimes, a way of looking transparent without being transparent. This guide is a short list of questions to bring to one.

Start with the intended-use section

Most model cards lead with a section on intended use. Read this first, and read it for what it excludes.

A line like “this model is intended for English-language tasks” is honest if the rest of the card supports it. It is concerning if the card later reports benchmark scores in twelve languages without acknowledging the gap between “supported” and “intended.”

Look for what the lab is willing to put in writing about where the model should not be used. Vagueness here is itself a signal.

Ask what the evaluation set looks like

Every model card reports numbers. The interesting question is what was measured.

  • Was the evaluation set drawn from the same distribution as the training data?
  • Was it constructed after the training data was frozen, to reduce contamination?
  • Were the people the model is most likely to fail on represented in the eval?

If a model claims to work for accessibility use cases, the eval should include users of assistive technology, non-mainstream language varieties, and the kinds of inputs those users actually produce. Few cards meet this bar.
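The contamination question above can sometimes be probed directly, when a lab releases its eval set and a sample of training text. A minimal sketch of the idea, using exact word n-gram overlap (the function names and the n-gram length are our own choices, not any lab's actual pipeline):

```python
# Minimal n-gram overlap check for eval/training contamination.
# A sketch, not a production decontamination pipeline; the n-gram
# length is illustrative and real pipelines are more careful
# (normalization, fuzzy matching, per-field checks).

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(eval_examples, training_sample, n: int = 8) -> float:
    """Fraction of eval examples sharing at least one n-gram with training text."""
    train_grams = set()
    for doc in training_sample:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for ex in eval_examples if ngrams(ex, n) & train_grams)
    return flagged / len(eval_examples) if eval_examples else 0.0
```

A card that reports a number like this, or at least describes an equivalent check, is telling you something a bare benchmark score does not.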

Look for what is missing

Two omissions are common and worth flagging:

  1. Energy and compute disclosures. A surprising number of cards still do not report the compute used to train the model. This matters for reproducibility and for environmental accountability.
  2. Bias evaluations beyond the obvious axes. Race and gender are often reported. Disability, age, dialect, and the intersection of these are often not.
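When a card does report parameter count and training tokens but not compute, you can make a back-of-envelope estimate yourself with the standard C ≈ 6ND approximation from the scaling-law literature (N parameters, D training tokens, for a dense transformer). The figures below are hypothetical, not drawn from any real model card:

```python
# Back-of-envelope training compute via the standard C ≈ 6 * N * D
# approximation (N = parameter count, D = training tokens), valid for
# dense transformers. Example numbers are hypothetical.

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens

# A hypothetical 7e9-parameter model trained on 2e12 tokens:
# 6 * 7e9 * 2e12 = 8.4e22 FLOPs
```

It is an estimate, not a disclosure, but it lets you sanity-check a card's claims and frame a question back to the lab.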

If a card is silent on something you care about, that silence is itself information.

A small reading checklist

When we read a new model card at Neshise, we walk through six questions:

  1. What is the intended use, and what is excluded?
  2. What was the model trained on, and at what scale?
  3. What evaluations were run, and on what data?
  4. What known limitations are disclosed?
  5. What is not reported that you would expect to see?
  6. Who can you contact with concerns?

The last question is the one most often missing. A model card that does not name a person or team you can reach is, in a real sense, not yet documentation.
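The checklist can also be run as a quick mechanical pass. A minimal sketch, assuming the card has been parsed into a dict of section names to text; the section names here are our own shorthand, not a standard model-card schema:

```python
# Walk a parsed model card against the six-question checklist.
# Section names are our own shorthand, not a standard schema;
# question 5 is a reader's-notes field rather than a card section.

EXPECTED_SECTIONS = [
    "intended_use",     # 1. intended use, and what is excluded
    "training_data",    # 2. training data and scale
    "evaluations",      # 3. evaluations run, and on what data
    "limitations",      # 4. known limitations disclosed
    "omissions_notes",  # 5. reader's notes on what is not reported
    "contact",          # 6. a person or team to reach with concerns
]

def missing_sections(card: dict) -> list:
    """Return the checklist sections a card leaves empty or absent."""
    return [s for s in EXPECTED_SECTIONS
            if not str(card.get(s, "")).strip()]
```

Running it on a card that answers everything except the contact question returns only `"contact"`, which matches our experience of what is most often missing.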


Reading a model card well takes ten or fifteen minutes. Building the habit is worth the time.