About
I am a research scientist at Meta Superintelligence Labs. I work on scaling the pre-training and post-training of Meta’s next-generation foundation models, with a particular focus on training efficiency, both from the infrastructure perspective (training parallelism, communication-computation overlap, etc.) and from the model co-design perspective (e.g., linear attention). I was previously on the AI and Systems Co-Design team, where I also worked on LLM training systems.
I received my PhD in Computer Science from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Magdalena Balazinska. My PhD research focused on the intersection of data management and AI, including Tensor Query Processing.
Before UW, I obtained my B.S. (Honors) in Computer Science from Fudan University. In my younger years, I was a contestant in the ACM-ICPC and the Olympiad in Informatics.
