I am a DPhil student in computer science at the University of Oxford, conducting research in Natural Language Processing and Machine Learning.
I am grateful to be advised by Philip Torr and Yarin Gal from Oxford and Hinrich Schütze from LMU Munich as a student of the European Laboratory for Learning and Intelligent Systems (ELLIS Society).
Much of the direction I've taken was shaped by my early interaction with Jae C. Choe and by "Consilience: The Unity of Knowledge", a book by his advisor E.O. Wilson. I cannot express enough gratitude that Jae took my curiosity seriously at such a young age.
Their encouragement opened more doors than I could have expected. My interests have wandered widely, from preparing for mathematics olympiads (AIME 10 & 12, and IMO), to a growing curiosity about law while studying Latin and Roman law, to human rights work with Amnesty International's youth events. My early interest in biodiversity and climate change owes a great deal to Jae's close collaborator Jane Goodall, whose stories deepened my empathy for animals and their behavior.
Through my undergraduate years, alongside my computer science studies, which included competing in the ICPC programming contest (North America Division, 2018), I pursued political science (minor), philosophy (first-, second-, and senior-year seminars), economics (advanced macroeconomics), and biology (evolution and genetics). I was also fortunate to study violin with conductor and violinist Andrew Koehler; one of our most memorable performances was a concert-requiem at the Harris Theater in Chicago commemorating the Holodomor, the Ukrainian famine-genocide of the 1930s. Music tells stories that are not far from the subjects we study today: math, emotions, human beings, history, philosophy, politics, and more.
These experiences now converge in my research on artificial intelligence inspired by human cognition. My starting point has always been language, out of a belief that reasoning grows from language acquisition; words shape our thoughts. That conviction has guided me since I began working in deep learning in 2019, and it still shapes the questions I ask today.
Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias PDF
Hazel Kim, Philip Torr
Preprint
Measuring what Matters: Construct Validity in Large Language Model Benchmarks PDF
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili-May Liu, Lennart Luettgau, Jabez Magomere, Jonathan Rystrøm, Anna Sotnikova, Yushi Yang, Yilun Zhao, Adel Bibi, Antoine Bosselut, Ronald Clark, Arman Cohan, Jakob Nicolaus Foerster, Yarin Gal, Scott A. Hale, Inioluwa Deborah Raji, Christopher Summerfield, Philip Torr, Cozmin Ududec, Luc Rocher, Adam Mahdi
NeurIPS 2025 Datasets & Benchmarks
Detecting LLM Hallucination through Layer-wise Information Deficiency PDF
Hazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr, Yarin Gal
EMNLP 2025
ATHENA: Mathematical Reasoning with Thought Expansion PDF
JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han
EMNLP 2023
ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification PDF
Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han
AAAI 2022
LST: Lexicon-Guided Self-Training for Few-Shot Text Classification PDF
{Hazel Kim, Jaeman Son}*, Yo-Sub Han
arXiv
Acknowledgement
This website uses the design and template by Martin Saveski.