About
I am a Member of Technical Staff at Microsoft AI Superintelligence. I work on scaling pre-training and post-training of foundation models, with a particular focus on training efficiency from both the infrastructure perspective (e.g., parallelism strategies and communication-computation overlap) and the model co-design perspective (e.g., linear attention). Before Microsoft AI, I was a research scientist at Meta Superintelligence Labs.
I received my PhD in Computer Science from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Magdalena Balazinska. My PhD research focused on the intersection of data management and AI, including Tensor Query Processing.
Before UW, I obtained my B.S. (Honors) in Computer Science from Fudan University. In my younger years, I competed in the ACM-ICPC and the Olympiad in Informatics.
