Query Plan Representation (QPR) is central to workload modeling, with various deep-learning-based architectures proposed in the literature. Our work is motivated by two key observations: (i) the research community still lacks clarity on which model, if any, best suits the QPR problem; and (ii) while transformers have revolutionized many fields, their potential for QPR remains largely underexplored. This study examines the strengths and challenges of Graph Transformers for QPR. We introduce a new taxonomy that unifies deep-learning-based QPR techniques along key design axes. Our benchmark analysis of common QPR architectures reveals that Graph Transformer Networks (GTNs) consistently outperform alternatives, but can degrade under limited training data. To address this, we propose novel data augmentation techniques to enhance training diversity, and we refine GTN architectures by replacing ineffective language-model-inspired components with techniques better suited to query plans. Evaluation on the JOB, TPC-H, and TPC-DS benchmarks shows that, with sufficient training data, enhanced GTNs outperform existing models at capturing complex queries (JOB Full and TPC-DS) and enable the query embedder trained on TPC-DS to generalize to TPC-H queries out of the box.