Framing Analytics with Large Language Models in Congressional Transcripts
Large Language Models (LLMs) enable detailed analysis of thematic structures in political text, but reproducible validation of frame extraction remains limited. This study examines Chilean congressional debates and develops a validation methodology based on frame-evidence coherence. An LLM extracts frames together with supporting evidence quotes and maps each frame to a nine-category taxonomy, while a rule-based regex extractor provides a baseline. Frame quality is validated with an LLM-as-a-judge approach that rates the coherence between frames and evidence on a 1–5 scale. The judge exhibits high inter-rater reliability, with ICC(2,1) = 0.927 for single judges and ICC(2,k) = 0.975 for averaged scores. The mean coherence score of 4.06 ± 0.82 indicates strong alignment between frames and their supporting evidence. Perturbation tests confirm the validity of the coherence metric: scores drop significantly when evidence is mismatched (Δ = 2.15, p < 0.001) or paired with the wrong frames (Δ = 2.57, p < 0.001). Finally, a multi-model comparison shows that larger models generally achieve higher extraction consistency, reaching a Jaccard similarity of 0.91 for GPT-5.
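
For reference, the reliability coefficients reported above follow the standard two-way random-effects definitions of Shrout and Fleiss, and the consistency measure is the usual Jaccard index. In the expressions below, MS_R, MS_C, and MS_E denote the target (row), rater (column), and error mean squares from a two-way ANOVA over n rated frame-evidence pairs and k judges; reading the rated pairs as targets and the judge runs as raters is our assumption about the setup, not something the abstract states. For two extraction runs producing frame sets A and B, the Jaccard similarity is the overlap of the two sets:

\[
\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \frac{k}{n}\,(MS_C - MS_E)},
\qquad
\mathrm{ICC}(2,k) = \frac{MS_R - MS_E}{MS_R + \frac{1}{n}\,(MS_C - MS_E)},
\qquad
J(A,B) = \frac{|A \cap B|}{|A \cup B|}.
\]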
