Paper: Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

Problem

Large Language Models (LLMs) are known to harbor biases, but those biases can be hard to spot. Traditional ways of checking LLM outputs, such as inspecting single responses or relying on automated metrics, often miss subtle biases hidden in the model’s probability distributions. The reason is that LLMs generate text stochastically: they do not always choose the most likely word, so significant bias can sit in less common generation paths that a single response never reveals.
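To make the point about stochastic generation concrete, here is a minimal sketch comparing greedy decoding with sampling. The next-token distribution and the function names are invented for illustration; they are not taken from the paper or from any real model.

```python
import random
from collections import Counter

# Hypothetical next-token distribution for some prompt; the numbers are invented.
NEXT_TOKEN_PROBS = {"she": 0.6, "he": 0.3, "they": 0.1}

def greedy_pick(probs):
    """Return the single most likely token (deterministic decoding)."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one token in proportion to its probability (stochastic decoding)."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def sample_counts(probs, n, seed=0):
    """Count how often each token is drawn over n independent samples."""
    rng = random.Random(seed)
    return Counter(sample_pick(probs, rng) for _ in range(n))

if __name__ == "__main__":
    print("greedy:", greedy_pick(NEXT_TOKEN_PROBS))
    print("sampled:", sample_counts(NEXT_TOKEN_PROBS, 1000))
```

Greedy decoding only ever surfaces the top token, while repeated sampling also surfaces the less likely continuations. That is why inspecting one response can hide what the full distribution contains.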

Method

This paper introduces “TreeTracer,” a new visual analytics tool to tackle this problem. The core approach involves:

  1. Perturbation Analysis: The authors systematically swap out key terms (defined by an ontology) within input prompts.
  2. Aggregated Generation: TreeTracer gathers hundreds of different outputs from the LLM for each perturbed prompt.
  3. Hierarchical Structuring: These generations are organized into a syntax-aligned, hierarchical “tree” structure.
  4. Node Merging & Classification Awareness: The tool uses another language model to merge nodes in the tree based on their semantic similarity, taking classification tasks into account (presumably to highlight bias related to specific topics).
  5. Sankey Diagram Visualization: Finally, a custom Sankey diagram visualizes these complex structures and allows direct comparison between different contexts. It also uses “contrastive inference” to display counterfactual token probabilities, which helps confirm that observed patterns reflect bias rather than random fluctuation.
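The first three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `ONTOLOGY`, `TEMPLATE`, and `toy_generate` are hypothetical stand-ins (a real setup would sample from an actual LLM), and a plain position-by-position prefix tree substitutes for the syntax-aligned structure. Semantic node merging and contrastive inference are not covered.

```python
import random

# Hypothetical ontology: one slot and the terms swapped into it.
ONTOLOGY = {"PERSON": ["The engineer", "The nurse"]}
TEMPLATE = "{PERSON} said that"

def perturb(template, ontology):
    """Yield (term, prompt) pairs, one per ontology term for each slot."""
    for slot, terms in ontology.items():
        for term in terms:
            yield term, template.replace("{" + slot + "}", term)

def toy_generate(prompt, rng):
    """Stand-in for sampling one continuation from an LLM (invented behaviour)."""
    pronoun_weights = [0.7, 0.3] if "engineer" in prompt else [0.3, 0.7]
    pronoun = rng.choices(["he", "she"], weights=pronoun_weights, k=1)[0]
    verb = rng.choice(["was", "felt"])
    adjective = rng.choice(["tired", "busy"])
    return [pronoun, verb, adjective]

def new_node():
    return {"count": 0, "children": {}}

def add_path(root, tokens):
    """Insert one generation into the prefix tree, incrementing counts along its path."""
    root["count"] += 1
    node = root
    for tok in tokens:
        node = node["children"].setdefault(tok, new_node())
        node["count"] += 1

def build_trees(n_samples=200, seed=0):
    """Aggregate n_samples generations per perturbed prompt into one tree each."""
    rng = random.Random(seed)
    trees = {}
    for term, prompt in perturb(TEMPLATE, ONTOLOGY):
        root = new_node()
        for _ in range(n_samples):
            add_path(root, toy_generate(prompt, rng))
        trees[term] = root
    return trees

def branch_shares(node):
    """Share of generations flowing into each child, i.e. relative Sankey link widths."""
    return {tok: child["count"] / node["count"] for tok, child in node["children"].items()}

if __name__ == "__main__":
    for term, root in build_trees().items():
        print(term, branch_shares(root))
```

Each node count records how many sampled generations passed through that token, so the share of a branch corresponds to how wide its link would be drawn in a Sankey diagram.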

Results & Limitations

According to the authors, TreeTracer makes LLM biases easier to detect by visually comparing the tree structures produced by different prompts. They validated the tool through case studies comparing GPT-2 XL (an unaligned model) against Apertus models (which have been aligned using a constitutional framework).
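A comparison between two contexts can also be expressed numerically. The sketch below assumes invented branch counts for two prompts and uses total variation distance as the summary measure. That metric is a choice made for this example and is not described in the paper.

```python
def shares(counts):
    """Convert raw branch counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def compare_branches(counts_a, counts_b):
    """Return per-token share differences (A minus B) and total variation distance."""
    shares_a, shares_b = shares(counts_a), shares(counts_b)
    tokens = sorted(set(shares_a) | set(shares_b))
    diffs = {t: shares_a.get(t, 0.0) - shares_b.get(t, 0.0) for t in tokens}
    tv_distance = 0.5 * sum(abs(d) for d in diffs.values())
    return diffs, tv_distance

# Invented first-token counts for two perturbed prompts (200 samples each).
CONTEXT_A = {"he": 140, "she": 60}
CONTEXT_B = {"he": 55, "she": 130, "they": 15}

if __name__ == "__main__":
    diffs, tv = compare_branches(CONTEXT_A, CONTEXT_B)
    print(diffs, tv)
```

A distance near 0 means the two prompts lead to almost the same branch distribution, while a distance near 1 means they diverge almost completely. Tokens that appear in only one context are treated as having a share of zero in the other.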

However, the abstract states that any visualization represents only a subset of the model’s learned behavior. Biases may therefore remain hidden outside what TreeTracer visualizes, and the findings depend on the accuracy of the auxiliary language model used for node merging and classification-aware analysis.

Why It Matters

For data scientists and ML practitioners working with LLMs, this tool offers a new way to understand and potentially mitigate bias. The ability to visualize the nuanced probability distributions and identify less obvious biases could lead to fairer and more reliable applications of these powerful models. The focus on aggregated data and visual comparison makes it potentially accessible beyond just core NLP researchers.
