
Hi, I’m a PhD candidate in the Computer Science department at UC Irvine, advised by Sameer Singh. My research is generously funded by the Hasso-Plattner Institute.
Rigorous model evaluation remains a critical challenge in NLP, as unreliable or biased systems can still achieve high scores on standard benchmarks, underscoring the gap between evaluation and real-world performance. Moreover, what counts as “good” model behavior is inherently tied to human values, which are pluralistic and evolving, and the rapid development of new LLM methods can require specialized evaluation. My research quantifies model biases and failures to ensure the responsible deployment of NLP technologies in real-world environments, concentrating on (1) social safety evaluation, and (2) the evaluation of new LLM architectures and training methods.
Social Safety
- Non-Binary and Transgender Inclusion: In my latest paper, I conducted a survey of gender-diverse individuals in the US to understand their perspectives on automated interventions for text-based misgendering, and based on the survey insights introduced a misgendering-interventions task and evaluation dataset, MisgenderMender (Hossain, 2024). Prior to this, I developed Misgendered, a framework for evaluating masked and autoregressive language models on their ability to use pronouns, revealing their inability to correctly use gender-neutral pronouns and neopronouns (Hossain, 2023).
- Misinformation Detection: During the height of the COVID-19 pandemic, I created an award-winning evaluation dataset for COVID-19 misinformation detection, COVIDLies (Hossain, 2020), and assessed the generalizability of top-performing neural rumor verification models from the perspectives of both topic and temporal robustness (Hossain, 2023).
LLM Methods
- Architecture: State Space Models (SSMs) have recently emerged as an efficient alternative to transformers for language modeling, due to their fixed-size states and linear scaling with sequence length. However, since the entire context is compressed into a single hidden representation, the fidelity of that compression—and consequently, their recall ability—is critical to their viability as transformer alternatives. In my current work-in-progress, I demonstrate that, through the lens of text reconstruction, the ability of SSMs to recall information from their hidden states is more limited than previously suggested.
- Training: Given the high costs and challenges of collecting vast amounts of data, synthetic data generation presents a cost-effective and straightforward alternative for training machine learning models. However, training on synthetic data, particularly when done recursively, raises concerns such as error propagation and reduced data diversity. In my current work-in-progress, I am studying the effect of incorporating synthetic data into preference alignment of LLMs.
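To make the fixed-size-state property mentioned under Architecture concrete, here is a minimal toy sketch of a linear state-space recurrence. It is purely illustrative (my own simplified example with made-up dimensions, not the learned, structured dynamics of models like Mamba): however long the input sequence, everything is folded into one hidden state of constant size, which is exactly why recall fidelity becomes the bottleneck.

```python
import numpy as np

def ssm_scan(xs, A, B, h0=None):
    """Fold a sequence of input vectors into a single fixed-size state.

    h_t = A @ h_{t-1} + B @ x_t  (toy linear recurrence)
    """
    h = np.zeros(A.shape[0]) if h0 is None else h0
    for x in xs:
        # Old context decays through A; the new token is mixed in through B.
        h = A @ h + B @ x
    return h  # size == A.shape[0], independent of len(xs)

rng = np.random.default_rng(0)
d_state, d_in = 8, 4                      # hypothetical, tiny dimensions
A = 0.9 * np.eye(d_state)                 # simple decaying dynamics
B = rng.normal(size=(d_state, d_in))

short_seq = rng.normal(size=(10, d_in))
long_seq = rng.normal(size=(1000, d_in))

h_short = ssm_scan(short_seq, A, B)
h_long = ssm_scan(long_seq, A, B)
print(h_short.shape, h_long.shape)        # same fixed size for both sequences
```

Because both a 10-token and a 1000-token context end up in the same 8-dimensional vector, the compression is lossy by construction; probing how much of the original text can be reconstructed from that state is the lens my work-in-progress uses.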
Blog Posts
- 2025/02/08 Yet Another DeepSeek Overview
- 2023/08/05 Paper Summary: Whose Opinions Do Language Models Reflect?
- 2022/01/08 Paper Summary: Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies
- 2021/11/28 Weird Stuff in High Dimensions
- 2021/11/09 Box Embeddings: Paper Overviews
- 2021/06/28 Orthogonal Procrustes
- 2019/12/25 Five Year Anniversary Trip to Maui
- 2018/11/01 A Brief Introduction to fMRI Analysis
- 2018/05/18 Viz : The Rohingya Exodus
- 2018/05/16 Viz : Drawing [e]
- 2018/02/22 Viz : US Same Sex Marriage Laws
- 2018/02/11 Viz : The Words of Larry Nassar Survivors
- 2017/12/22 Viz : Encoded Photo - An Anniversary Card
- 2017/12/03 Viz : Text Mining Stranger Things (Season 1 vs. Season 2)
- 2017/11/19 Viz : #MeToo Twitter