Tamanna Hossain-Kay

Hi, I’m a PhD candidate in the Computer Science department at UC Irvine, advised by Sameer Singh. My research is generously funded by the Hasso Plattner Institute.

Rigorous model evaluation remains a critical challenge in NLP: unreliable or biased systems can still achieve high scores on standard benchmarks, underscoring the gap between evaluation and real-world performance. Moreover, what counts as “good” model behavior is inherently tied to human values, which are pluralistic and evolving, and rapidly developed LLM methods can require specialized evaluation. My research quantifies model biases and failures to support the responsible deployment of NLP technologies in real-world environments, concentrating on (1) social safety evaluation and (2) the evaluation of new LLM architectures and training methods.

Social Safety

LLM Methods

Blog Posts