About
I am a research scientist at Meta Superintelligence Labs, where I work on scaling pre-training and post-training of Meta’s next-generation foundation models. My particular focus is training efficiency, from both the infrastructure perspective (training parallelism, communication-computation overlap, etc.) and the model co-design perspective (e.g., linear attention). I was previously on the AI and Systems Co-Design team, where I also worked on LLM training systems.
I received my PhD in Computer Science from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Magdalena Balazinska. My PhD research focused on the intersection of data management and AI, including Tensor Query Processing (Best Demo Award, VLDB’22).
Before UW, I obtained my B.S. (Honors) in Computer Science from Fudan University. In my younger years, I competed in the ACM-ICPC and the Olympiad in Informatics.
