The paperβs design intent is that a single flat tokenization forces one codebook to represent all structural scales, while BiHiToβs NHQ lets different levels specialize to different structural βtypesβ (global anchor vs backbone geometry vs full atoms).
Mechanistically, the pipeline is: point-cloud representation (atom coordinates) β BML-based encoding β hierarchical downsampling + per-level FSQ quantization β upsampling + linear fusion β Mamba-based decoding, trained with RMSD and an inter-atomic distance loss after alignment using Umeyama-Kabsch.
Skeptical caution: the paper text strongly argues for hierarchy priors, but it does not directly prove causality that βhierarchyβ is the primary driver rather than e.g. architectural capacity, hyperparameter tuning, or metric/conditioning effects. The ablations (especially L3 removal) support the hierarchy hypothesis internally, but they still only test within BiHiToβs design space.
Custom summaries of the latest cutting edge Science research. Every Friday. No Ads.