Hazel Kim

I am a DPhil student in computer science at the University of Oxford, conducting research in Natural Language Processing and Machine Learning. I am grateful to be advised by Philip Torr and Yarin Gal from Oxford and Hinrich Schütze from LMU Munich as a student of the European Laboratory for Learning and Intelligent Systems (ELLIS Society).

I have served on the program committee for AAAI 2023-2026 and as a reviewer for ACL ARR 2023-2025 and NeurIPS 2025.

Language Acquisition from Limited Sources: Language models shape the world with the data they are given. They work well with abundant, high-quality data but give biased perspectives when data is scarce or poor. However, quality data is often expensive to collect or private and hard to access. This challenge has inspired me to study how to encourage language models to acquire language proficiency and reasoning ability from resources of limited quantity or quality.
Causal Inference, not Correlation: Heavy reliance on correlations between input observations and output predictions limits the ability of language models to reason about cause-and-effect relationships that are not explicitly present in the text. High correlation does not always reveal the causality of events. Causal inference is therefore a crucial tool for controlling language models against biased data.
Information Quantification: Computation creates valuable information. The information usable by a model depends on the interaction between its input and output resources. Exploring how to quantify information is essential for interpreting model behaviors.
Emergent Knowledge of Language Models: As scale grows, language models exhibit new knowledge that is not present in smaller models. This emergent knowledge is a valuable yet underexplored resource. This has motivated me to investigate how trustworthy these emergent capabilities are for transparent usage in real-world scenarios.

Publications

Selected and Recent Papers

Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias PDF

Hazel Kim, Philip Torr

Preprint

Measuring what Matters: Construct Validity in Large Language Model Benchmarks PDF

Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili-May Liu, Lennart Luettgau, Jabez Magomere, Jonathan Rystrøm, Anna Sotnikova, Yushi Yang, Yilun Zhao, Adel Bibi, Antoine Bosselut, Ronald Clark, Arman Cohan, Jakob Nicolaus Foerster, Yarin Gal, Scott A. Hale, Inioluwa Deborah Raji, Christopher Summerfield, Philip Torr, Cozmin Ududec, Luc Rocher, Adam Mahdi

NeurIPS 2025 Datasets & Benchmarks

Detecting LLM Hallucination through Layer-wise Information Deficiency PDF

Hazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr, Yarin Gal

EMNLP 2025

How Ambiguous Are the Rationales For Natural Language Reasoning? PDF

Hazel Kim

COLING 2025

ATHENA: Mathematical Reasoning with Thought Expansion PDF

JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han

EMNLP 2023

ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification PDF

Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han

AAAI 2022

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification PDF

{Hazel Kim, Jaeman Son}*, Yo-Sub Han

arXiv

Google Scholar

Wonderful People I've Met: I was fortunate to start my research with consistent support from Yo-Sub Han. I gained great insights into what to research and how from Seong Joon Oh. I was delighted to work with Sangdoo Yun, whose advice was humble yet astute, and lucky to have inspiring discussions with Kangmin Yoo. I was happy to mentor the enthusiastic JB. Kim in writing his first paper. I enjoyed working on my first paper with Daecheol Woo, whose sincerity and positive mindset I valued. I am thankful to have met many great people while conducting research! I appreciate all of my collaborators for supporting me in many different ways :)

Research Areas: Interpretable and Controllable Language Models
