Hazel Kim

DPhil Student, Oxford

{firstname}.kimh [AT] gmail

I am a DPhil student in computer science at the University of Oxford, conducting research in Natural Language Processing and Machine Learning. I am grateful to be advised by Philip Torr and Yarin Gal from Oxford and Hinrich Schütze from LMU Munich as a student of the European Laboratory for Learning and Intelligent Systems (ELLIS Society).

Much of the direction I've taken was shaped by my early interaction with Jae C. Choe and his advisor E.O. Wilson's book "Consilience: The Unity of Knowledge". I cannot express enough gratitude that Jae took my curiosity seriously at such a young age.

Their encouragement opened more doors than I could have expected. My interests have wandered widely, from preparing for mathematics olympiads (AIME 10 & 12, and IMO), to a growing curiosity about law while studying Latin and Roman law, to human rights work with Amnesty International's youth events. My early interest in biodiversity and climate change owes a great deal to Jae's close collaborator Jane Goodall, whose stories deepened my empathy for animals and their behavior. Through my undergraduate years, alongside my computer science studies, which included competing in the ICPC programming contest (North America Division, 2018), I pursued political science (minor), philosophy (first-, second-, and senior-year seminars), economics (advanced macroeconomics), and biology (evolution and genetics). I was also fortunate to study violin with conductor and violinist Andrew Koehler; one of our most memorable performances was a concert-requiem at Harris Theater Chicago commemorating the Holodomor, the Ukrainian famine-genocide of the 1930s. Music tells stories that are not far from the subjects we study today: math, emotions, human beings, history, philosophy, politics, and more.

These experiences now converge in my research on human cognition-inspired artificial intelligence. My starting point has always been language, out of a belief that reasoning grows from language acquisition; words shape our thoughts. That conviction has guided me since I began working in deep learning in 2019, and it still shapes the questions I ask today.

Research Areas:
Interpretable and Controllable Language Models

Language Acquisition from Limited Sources
Language models form their picture of the world from the data they are given. They work well with abundant, high-quality data but produce biased perspectives when data is scarce or poor. However, high-quality data is often expensive to collect or private and hard to access. This challenge has inspired me to study how language models can acquire language proficiency and reasoning ability from resources that are limited in quantity or quality.
Causal Inference, not Correlation
Heavy reliance on correlations between input observations and output predictions limits the ability of language models to reason about cause-and-effect relationships that are not explicitly present in the text. High correlation does not always reveal the causal structure of events. Causal inference is therefore a crucial tool for keeping language models in check when the data is biased.
Information Quantification
Computation creates valuable information. The information a model can actually use varies with the interaction between its input and output resources. Exploring how to quantify information is essential for interpreting model behaviors.
Emergent Knowledge of Language Models
As language models grow in scale, new knowledge emerges that is not present in smaller models. This emergent knowledge is a valuable yet underexplored resource. It has motivated me to investigate how trustworthy these model capabilities are for transparent use in real-world scenarios.

Publications

Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias PDF

Hazel Kim, Philip Torr

Preprint

Measuring what Matters: Construct Validity in Large Language Model Benchmarks PDF

Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili-May Liu, Lennart Luettgau, Jabez Magomere, Jonathan Rystrøm, Anna Sotnikova, Yushi Yang, Yilun Zhao, Adel Bibi, Antoine Bosselut, Ronald Clark, Arman Cohan, Jakob Nicolaus Foerster, Yarin Gal, Scott A. Hale, Inioluwa Deborah Raji, Christopher Summerfield, Philip Torr, Cozmin Ududec, Luc Rocher, Adam Mahdi

NeurIPS 2025 Datasets & Benchmarks

Detecting LLM Hallucination through Layer-wise Information Deficiency PDF

Hazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr, Yarin Gal

EMNLP 2025

How Ambiguous Are the Rationales For Natural Language Reasoning? PDF

Hazel Kim

COLING 2025

ATHENA: Mathematical Reasoning with Thought Expansion PDF

JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han

EMNLP 2023

ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification PDF

Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han

AAAI 2022

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification PDF

{Hazel Kim, Jaeman Son}*, Yo-Sub Han

arXiv

Google Scholar

Resume

The Timeline

Acknowledgement

This website uses the website design and template by Martin Saveski