Guidelines

Expose a distribution summary for quantitative attributes

When analysts examine grouped results for a quantitative attribute, give them interactive access to a distribution summary in the information visualization system. The summary improves insight by letting them judge what is normal and where a given case sits within the spread.

  • purpose:refine
  • basis:empirical
  • task:distribute
  • scope:grouped-result
  • data:quantitative
  • quality:insight
  • lever:interaction-access
  • audience:analyst

advice

Add a distribution view

Add a distribution summary for a quantitative attribute over the current set of cases. For example, show the spread of calories, ages, or film lengths across a selected set so viewers can judge what is typical and where one case sits within that spread.
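One way to picture such a summary is a five-number summary computed over whatever set of cases is currently selected. The sketch below is illustrative only: the function name `distribution_summary` and the calorie values are hypothetical, and a real system might prefer a histogram or box plot over a table of numbers.

```python
from statistics import median, quantiles


def distribution_summary(values):
    """Return a five-number summary for a list of numeric values.

    Hypothetical helper for illustration; not part of any library.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to describe a spread")
    # The "inclusive" method treats the data as the full set of cases,
    # so the quartiles always lie between the observed min and max.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
    }


# Made-up calorie values for a selected set of cases.
selected_calories = [70, 90, 100, 110, 110, 120, 150]
summary = distribution_summary(selected_calories)
print(summary)
```

Recomputing the summary whenever the selection changes keeps it tied to the current set of cases rather than to the whole dataset.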

reason

Why distribution matters beyond single values

A spread tells viewers what is normal in the data and where a case falls relative to that normal pattern. Some seemingly simple comparison questions are actually questions about position within a full distribution.

Mechanism: Distribution summaries support judgments about normalcy and relative position that cannot be answered from one exact value or one aggregate alone.

Evidence: The taxonomy defines Characterize Distribution as describing the distribution of a quantitative attribute over a set of cases. It states that users characterize distributions to understand normalcy versus anomaly, and it notes that some comparison questions are really questions about location within a distribution (Amar et al., 2005).

context

Use when the user needs spread or typicality

  • User Goal: Understand what values are typical, how values are spread, or where one case falls relative to others.
  • Task: Characterize the distribution of a quantitative attribute over a set.
  • Data: A quantitative attribute measured across multiple cases.
  • Chart Setting: An information visualization system that can operate on a selected subset of cases.
  • Audience: Analysts exploring a dataset for structure and normalcy.
  • Success Criterion: The system reveals the spread of values, not only exact lookups or one-number summaries.

exceptions

Do not use a distribution summary for span-only questions

Break it when: The user needs only the span of values or the list of unique values.
Why: That is a range task, not a full distribution task.

costs

Tradeoffs of a distribution summary

Sacrifice: The output summarizes a set of values rather than returning one exact case value.
Risk: A distribution summary alone may not identify the exceptional cases that motivate follow-up.
Mitigation: Use a separate anomaly operation when the goal is to find exceptions.
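The guideline does not say how the separate anomaly operation should work. As one possible illustration, the sketch below flags values outside Tukey's 1.5 × IQR fences. That rule is an assumption chosen for this example, not something the source prescribes, and the function name `flag_exceptions` and the data are hypothetical.

```python
from statistics import quantiles


def flag_exceptions(values, k=1.5):
    """Return values outside the fences q1 - k*IQR and q3 + k*IQR.

    Hypothetical helper; the 1.5 * IQR rule is an assumed choice.
    """
    q1, _, q3 = quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up calorie values with one unusually high case.
calories = [70, 90, 100, 110, 110, 120, 250]
exceptions = flag_exceptions(calories)
print(exceptions)
```

Keeping this as its own operation leaves the distribution summary free to describe the spread while the anomaly step names the specific cases that need follow-up.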

mistakes

Common failure modes

Mistake: Replacing the full spread with only a single average when the real question is “how does this case compare to others.”
Why it fails: That question is about location within a distribution, not only about the mean.
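A small worked example shows why the mean alone is not enough. The two made-up sets below share the same mean, yet the same case value sits in a very different position in each. The helper `fraction_below` and all numbers are hypothetical.

```python
from statistics import mean


def fraction_below(values, case_value):
    """Share of values strictly below the case value (hypothetical helper)."""
    return sum(v < case_value for v in values) / len(values)


# Two invented sets with identical means but different spreads.
tight = [95, 98, 100, 102, 105]
wide = [20, 60, 100, 140, 180]
case_value = 106

mean_tight = mean(tight)
mean_wide = mean(wide)
below_tight = fraction_below(tight, case_value)
below_wide = fraction_below(wide, case_value)

print(mean_tight, mean_wide)    # both means are 100
print(below_tight, below_wide)  # 1.0 versus 0.6
```

Compared with the mean, the case looks identical in both sets (6 above average). Compared with the spread, it exceeds every case in the tight set but only three of five in the wide set.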

check

How to test distribution support

Failure Sign: Reviewers can read individual records or a single aggregate, but not the spread of a quantitative attribute.
Quick Check: Ask for the distribution of one quantitative attribute on a selected subset.
Stronger Test: Select one case and verify that the system can place it relative to the current spread.
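The stronger test can be sketched as a check that the position of a selected case is computed against the current subset, not the whole dataset. Everything below is an assumption made for illustration: the `Film` record, the `position_in_spread` helper, and the film lengths are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Film:
    """Hypothetical record type for this example."""
    title: str
    genre: str
    length_min: int


def position_in_spread(cases, selected, attribute):
    """Share of cases whose attribute value is at or below the selected case's."""
    values = [getattr(c, attribute) for c in cases]
    target = getattr(selected, attribute)
    return sum(v <= target for v in values) / len(values)


# Invented film lengths in minutes.
films = [
    Film("A", "drama", 95),
    Film("B", "drama", 120),
    Film("C", "drama", 150),
    Film("D", "comedy", 85),
    Film("E", "comedy", 90),
    Film("F", "comedy", 100),
]
case = films[0]  # the 95-minute drama

# Position of the same case within two different current sets.
all_position = position_in_spread(films, case, "length_min")
dramas = [f for f in films if f.genre == "drama"]
drama_position = position_in_spread(dramas, case, "length_min")

print(all_position, drama_position)
```

Here the same film is mid-range among all films but the shortest among dramas, so a system that passes the test should report a different position once the selection narrows.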

fix

What to change

  • Add a distribution summary that updates on the current set of cases.
  • Let viewers inspect where a selected case falls within the current spread.
  • Add a separate anomaly operation if the real goal is to identify exceptions.

References

Amar, R., Eagan, J., & Stasko, J. (2005). Low-level components of analytic activity in information visualization. IEEE Symposium on Information Visualization (INFOVIS 2005), 111–117. https://doi.org/10.1109/INFVIS.2005.1532136