My name is Cehao Yang (杨策皓), a second-year PhD student in Artificial Intelligence. I am currently enrolled in the HKUST(GZ)-IDEA Joint PhD Program, proudly supervised by Prof. Hui Xiong and Prof. Jian Guo.

🤔 My current research focuses primarily on Large Language Models (LLMs), covering the following topics:

Knowledge-Augmented LLMs: utilizing external knowledge bases to enhance LLMs' reasoning
Synthetic Data: generating high-quality datasets for efficient LLM training
Knowledge Graphs: synergizing LLMs and knowledge graphs for downstream applications

🙌 Thanks for reading! If you are interested in my research, you are welcome to reach out for discussion and collaboration!

🔥 News

2026.01: 🎉🎉 One paper (Financial Wind Tunnel) about Market Simulation is accepted by Web 2026 Industry Track!
2026.01: 🎉🎉 One paper (Encrypted Synthetic Data) about Privacy Preservation is accepted by EACL 2026 Findings!
2025.12: 🎉🎉 Our open-source project (DataArc-SynData-Toolkit) has been released. Feel free to give it a try!
2025.10: 🎉🎉 One paper (SoG) about Synthetic Data Driven by Knowledge Graph is accepted by LoG 2025!
2025.08: 🎉🎉 One paper (Beyond Function-Level Search) about Code Retrieval is accepted by EMNLP 2025 Findings!
2025.05: 🎉🎉 One paper (LongFaith) about Synthetic Data for Long-Context Reasoning is accepted by ACL 2025 Findings!
2025.04: 🎉🎉 Our paper (KGR3) was covered by DeepTech (深科技); click (here) to read!
2025.01: 🎉🎉 One paper (ToG2.0) about Knowledge Graph-augmented LLM is accepted by ICLR 2025!
2025.01: 🎉🎉 One paper (KGR3) about LLM-driven Knowledge Graph Completion is accepted by NAACL 2025 Main!
2024.12: 🎉🎉 One paper (CATS) about LLM-driven Inductive Knowledge Graph Completion is accepted by AAAI 2025!

📝 Selected Publications

[ACL 2025 Findings] LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data
Cehao Yang*, Xueyuan Lin*, Chengjin Xu*, Xuhui Jiang, Shengjie Ma, Aofan Liu, Hui Xiong, Jian Guo
[ICLR 2025] Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation
Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, Jian Guo
[NAACL 2025 Main] Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion
Muzhi Li*, Cehao Yang*, Chengjin Xu*, Xuhui Jiang, Yiyan Qi, Jian Guo, Ho-fung Leung, Irwin King
[AAAI 2025] Context-aware Inductive Knowledge Graph Completion with Latent Type Constraints and Subgraph Reasoning
Muzhi Li*, Cehao Yang*, Chengjin Xu*, Zixing Song, Xuhui Jiang, Jian Guo, Ho-fung Leung, Irwin King

✍️ Selected Pre-prints

SELECT2REASON: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang*, Xueyuan Lin*, Chengjin Xu*, Xuhui Jiang, Xiaojun Wu, Honghao Liu, Hui Xiong, Jian Guo
GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation
Cehao Yang*, Xiaojun Wu*, Xueyuan Lin*, Chengjin Xu, Xuhui Jiang, Yuanliang Sun, Jia Li, Hui Xiong, Jian Guo
Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval
Xiaojun Wu*, Cehao Yang*, Xueyuan Lin*, Chengjin Xu, Xuhui Jiang, Yuanliang Sun, Hui Xiong, Jia Li, Jian Guo

🎖 Honors and Awards

2024.09 - 2028.09, Ph.D. Full Scholarship (￥720,000)
2022.09 - 2024.09, M.Phil. Full Scholarship (￥240,000)

📖 Education

2024.09 - Now, Ph.D. in Artificial Intelligence, Hong Kong University of Science and Technology (Guangzhou).
2022.09 - 2024.11, M.Phil. in Artificial Intelligence, Hong Kong University of Science and Technology (Guangzhou).
2018.09 - 2022.06, B.S. in Computer Science and Engineering, South China University of Technology.

💁 Volunteer

Reviewer: ACL’25, AAAI PDLM’25, ACM MM’25, NeurIPS’25, ACL SRW’25, EMNLP’25, AAAI’26, AACL’26, ICLR’26, EACL’26, ACL’26, ICML’26
2025.07 - 2025.08, Red Bird Challenge Camp, HKUST(GZ), Teaching Assistant
2024.09 - 2024.12, AIAA5088: Natural Language Processing and Its Applications (2024-2025 Fall), Teaching Assistant
2023.07 - 2023.08, Red Bird Challenge Camp, HKUST(GZ), Teaching Assistant

💻 Internships

2025.01 - Now, Research Intern, DataArcTech Ltd.
2023.08 - 2024.08, NLP Research Intern, IDEA FinAI, Shenzhen.
2021.06 - 2021.09, Back-end Research & Development Intern, ByteDance, Shenzhen.