Chenghao Lyu

Ph.D.

UMass Amherst

About Me

My name is Chenghao Lyu. I am a researcher and engineer interested in databases, big data analytics systems, machine learning and multi-objective optimization.

I earned my Ph.D. in computer science from UMass Amherst, advised by Prof. Yanlei Diao and Prof. Prashant Shenoy. Before joining UMass Amherst, I obtained my BS in EE and MS in CS from Fudan University, where I was advised by Prof. X. Sean Wang. During my Ph.D., I also worked as a scientific collaborator in the CEDAR team at Ecole Polytechnique in France for 2.5 years.

My research lies at the intersection of big data analytics systems, machine learning, and multi-objective optimization, with a focus on designing optimizers that auto-configure parameters for large-scale systems to improve performance and reduce cost.

Email: {first-name}@cs.umass.edu

Interests
  • Adaptive Query Execution and Optimization
  • Big Data Analytics Systems
  • Machine Learning
  • Multi-objective Optimization
Education
  • MS/PhD in Computer Science, 2018 - 2025.01

    UMass Amherst, MA, USA

  • MSc in Electronic Engineering, 2018

    Fudan University, Shanghai, China

  • BSc in Electronic Engineering, 2015

    Fudan University, Shanghai, China

News

[2025.01] I am joining the Learned Systems Group at Amazon as an applied scientist.

[2025.01] I successfully defended my PhD! Many thanks to my committee members, Yanlei Diao, Prashant Shenoy, Peter Haas, and David Irwin, for their invaluable support.

[2025.01] I received the Dr. Phil Bernstein Graduate Scholarship in Computer Science.

[2024.06] Our paper “A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning” was accepted to VLDB 2024!

[2024.03] I returned to Amherst and defended my thesis proposal.

[2023.12] We released UDAO, the unified data analytics optimizer, to the public and on PyPI. Try “pip install udao”.

[2023.10] I presented my ongoing work “An Adaptive, Multi-Resolution, and Multi-Objective Parameter Tuning Approach for Spark SQL” at the ERC BigFastData Workshop.

[2023.05] We released our Spark-TPCH dataset and the MOO framework (to which I contributed the internal solver) as part of our UDAO project, a Unified Data Analytics Optimizer.

[2022.07] Our paper with Alibaba Cloud on “Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing” was accepted to VLDB 2022!

[2021.10] I started working as a scientific collaborator in the CEDAR project-team of Inria and LIX, at Ecole Polytechnique. Bonjour!

[2020.10] Our paper “Spark-based Cloud Data Analytics using Multi-Objective Optimization” was accepted to ICDE 2021!

[2020.02] I started my internship at Alibaba DAMO Academy.

Publications

(2024). A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning. In PVLDB, 17(11), 2024.

(2022). Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing. In PVLDB, 15(11), 2022.

(2021). Spark-based Cloud Data Analytics using Multi-Objective Optimization. In ICDE, 2021.

(2021). Neural-based Modeling for Performance Tuning of Spark Data Analytics. arXiv, 2021.

(2019). UDAO: A Next-Generation Unified Data Analytics Optimizer. In PVLDB, 12(12), 2019.

Experience

The CEDAR project-team, LIX, Ecole Polytechnique
Scientific Collaborator
Oct 2021 – May 2024, Paris
Developing a Unified Data Analytics Optimizer (UDAO) system.

DAMO Academy, Alibaba
Research Intern
Feb 2020 – Dec 2021, Remote & Hangzhou
Designed the new architecture of a resource optimizer for big data systems. Reduced latency by 36-37% and cost by 37-75% on production workloads of 0.6M jobs and in a simulator of the extended Alibaba MaxCompute environment.