Yang Hailong (杨海龙) - Beihang University

Biography

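Yang Hailong, male, Ph.D., is a doctoral supervisor and assistant to the dean. He received his Ph.D. in computer architecture from Beihang University and was a postdoctoral researcher at the University of Michigan, USA. He is a member of the CCF Technical Committee on Computer Architecture and a CCF Senior Member. His main research interests are high-performance computing, performance analysis and optimization, compiler optimization, runtime systems, and distributed and parallel computing.

In recent years he has undertaken dozens of projects funded by the National Natural Science Foundation of China, the National Key R&D Program, and the National 863 Program, mainly in high-performance numerical algorithms, compiler optimization and auto-tuning, large-scale performance analysis tools, and deep learning systems. He has also undertaken dozens of projects commissioned by companies including Alibaba, SenseTime, Huawei, CETC, CASIC, and CASC, mainly in large-scale elastic training systems, auto-tuning for deep learning compilers, compiler optimization for sparse operators, performance analysis tools for exascale programs, high-performance numerical algorithms for domestic processors, and parallel computing platforms for aeronautical multidisciplinary optimization.

He has published more than 40 papers in top-tier and well-known international conferences and journals, including SC, ISCA, ASPLOS, PLDI, ICSE, TPDS, TC, TOCS, TACO, ICS, ICPP, IPDPS, and CLUSTER, and has received one third-class Teaching Excellence Award. He serves as a young editorial board member of the CCF THPC journal; as a reviewer for TPDS, TC, PARCO, JPDC, FGCS, and FCS; as program committee chair of the architecture track of CLUSTER21; and as a program committee member of ICPP, CLUSTER, HPCC, NPC, and PMAM. The Beihang supercomputing team he coaches has won 24 awards in domestic and international competitions, including runner-up in the ASC17 finals, the ASC19 Highest Computing Performance Award, first prize in the ASC19 finals, and first prize in the ASC22 finals.

In the School of Computer Science he currently teaches the undergraduate course "Methodology of Computer Science" (《计算机科学方法论》), the graduate course "Frontier Technologies of Open-Source Operating Systems" (《开源操作系统前沿技术》), and the international-student course "Parallel Programming". He also co-teaches the graduate course "Advanced Computer Architecture" (《高等计算机体系结构》) and the international-student course "Computer Architecture".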

Research Areas

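His main research areas are high-performance computing, performance analysis and optimization, distributed and parallel computing, compiler optimization for deep learning, performance analysis and optimization of big data systems, resource management and task scheduling in cloud computing, and high-throughput computing.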

Recent Publications

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs (SC) 2023.
TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling (SC) 2023.
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs (ICPP) 2023.
BiRFIA: Selective Binary Rewriting for Function Interception on ARM (ICS) 2023.
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (IPDPS) 2023.
VClinic: A Portable and Efficient Framework for Fine-grained Value Profilers (ASPLOS) 2023.
Building a Domain-Specific Compiler for Emerging Processors with a Reusable Approach (SCIS) 2023.
Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor (FCS) 2022.
CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs (SC) 2022.
Vectorizing SpMV by Exploiting Dynamic Regular Patterns (ICPP) 2022.
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database (ICPP) 2022.
Toward accelerated stencil computation by adapting tensor core unit on GPU (ICS) 2022.
StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs (IPDPS) 2022.
PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling (IPDPS) 2022.
Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP (TC) 2021.
The Deep Learning Compiler: A Comprehensive Survey (TPDS) 2021.
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee (TOCS) 2021.
SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning (SC) 2020.
ZeroSpy: Exploring Software Inefficiency with Redundant Zeros (SC) 2020.
SympleGraph: Distributed Graph Processing with Precise Loop-Carried Dependency Guarantee (PLDI) 2020.
Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture (TPDS) 2020.
Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer (TPDS) 2020.
Temperature-Aware DRAM Cache Management - Relaxing Thermal Constraints in 3-D Systems (TCAD) 2020.
Redundant Loads: A Software Inefficiency Indicator (ICSE) 2019.
LWPTool: A Lightweight Profiler to Guide Data Layout Optimization (TPDS) 2018.
SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs (TPDS) 2018.
PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP (ISCA) 2017.
Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers (ASPLOS) 2017.
Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers (ASPLOS) 2016.
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers (ISCA) 2013.