Biography
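Hailong Yang (杨海龙), male, Ph.D., doctoral supervisor, Assistant Dean. He received his Ph.D. in computer architecture from Beihang University and was a postdoctoral researcher at the University of Michigan, USA. He is a member of the CCF Technical Committee on Computer Architecture and a CCF Senior Member. His main research interests are high-performance computing, performance analysis and optimization, compiler optimization techniques, runtime systems, and distributed and parallel computing. In recent years he has undertaken dozens of projects under the National Natural Science Foundation of China, the National Key R&D Program, and the National 863 Program, focusing mainly on high-performance numerical algorithms, compiler optimization and auto-tuning, large-scale performance analysis tools, and deep learning systems. He has also undertaken dozens of projects commissioned by companies including Alibaba, SenseTime, Huawei, CETC, CASIC, and CASC, focusing mainly on large-scale elastic training systems, auto-tuning for deep learning compilation, compiler optimization for sparse operators, performance analysis tools for exascale programs, high-performance numerical algorithms for domestic processors, and parallel computing platforms for aeronautical multidisciplinary optimization.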
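He has published more than 40 papers in top-tier and well-known international conferences and journals, including SC, ISCA, ASPLOS, PLDI, ICSE, TPDS, TC, TOCS, TACO, ICS, ICPP, IPDPS, and CLUSTER, and has received one Third Prize for Teaching Excellence. He serves as a youth editorial board member of the journal CCF THPC; as a reviewer for journals including TPDS, TC, PARCO, JPDC, FGCS, and FCS; as architecture program committee chair of the CLUSTER21 international conference; and as a program committee member of international conferences including ICPP, CLUSTER, HPCC, NPC, and PMAM. The Beihang supercomputing team he advises has won 24 awards in domestic and international competitions, including runner-up in the ASC17 finals, the Highest Computing Performance award at ASC19, first prize in the ASC19 finals, and first prize in the ASC22 finals.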
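He currently teaches the School of Computer Science undergraduate course "Methodology of Computer Science", the graduate course "Frontier Technologies in Open-Source Operating Systems", and the international-student course "Parallel Programming". He also assists in teaching the graduate course "Advanced Computer Architecture" and the international-student course "Computer Architecture".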
Research Areas
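His main research areas are high-performance computing, performance analysis and optimization, distributed and parallel computing, deep learning compiler optimization techniques, performance analysis and optimization of big data systems, resource management and task scheduling in cloud computing, and high-throughput computing.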
Recent Publications
EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs (SC) 2023.
TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling (SC) 2023.
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs (ICPP) 2023.
BiRFIA: Selective Binary Rewriting for Function Interception on ARM (ICS) 2023.
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (IPDPS) 2023.
VClinic: A Portable and Efficient Framework for Fine-grained Value Profilers (ASPLOS) 2023.
Building a Domain-Specific Compiler for Emerging Processors with a Reusable Approach (SCIS) 2023.
Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor (FCS) 2022.
CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs (SC) 2022.
Vectorizing SpMV by Exploiting Dynamic Regular Patterns (ICPP) 2022.
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database (ICPP) 2022.
Toward accelerated stencil computation by adapting tensor core unit on GPU (ICS) 2022.
StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs (IPDPS) 2022.
PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling (IPDPS) 2022.
Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP (TC) 2021.
The Deep Learning Compiler: A Comprehensive Survey (TPDS) 2021.
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee (TOCS) 2021.
SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning (SC) 2020.
ZeroSpy: Exploring Software Inefficiency with Redundant Zeros (SC) 2020.
SympleGraph: Distributed Graph Processing with Precise Loop-Carried Dependency Guarantee (PLDI) 2020.
Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture (TPDS) 2020.
Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer (TPDS) 2020.
Temperature-Aware DRAM Cache Management - Relaxing Thermal Constraints in 3-D Systems (TCAD) 2020.
Redundant Loads: A Software Inefficiency Indicator (ICSE) 2019.
LWPTool: A Lightweight Profiler to Guide Data Layout Optimization (TPDS) 2018.
SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs (TPDS) 2018.
PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP (ISCA) 2017.
Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers (ASPLOS) 2017.
Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers (ASPLOS) 2016.
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers (ISCA) 2013.