SONIVA presents a large, richly annotated pathological speech database featuring recordings from approximately 1,000 stroke survivors and 7,000 age-matched controls, and shows that it substantially improves automated speech recognition (ASR) for aphasia assessment. The work reports a word error rate (WER) reduction from 43.62% to 21.93% and 93% accuracy in acoustic classification, providing a critical resource for clinical and linguistic research.
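To put the headline numbers in context: WER is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the ASR output into the reference transcript, and N is the number of reference words. The drop from 43.62% to 21.93% is 21.69 percentage points in absolute terms, or roughly a 49.7% relative reduction. The sketch below shows the standard word-level edit-distance computation and the relative-reduction arithmetic. It is not the SONIVA authors' evaluation code; the function names are illustrative, and it assumes whitespace tokenization with no text normalization (casing, punctuation, fillers), which real aphasia evaluations would need to handle.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no normalization (illustrative only).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline` (same units)."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One deleted word out of six reference words -> WER = 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # The reported SONIVA WER figures, as percentages.
    print(relative_reduction(43.62, 21.93))
```

Note that WER can exceed 100% when the output contains many insertions, which is one reason relative and absolute changes are worth reporting side by side.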
This paper introduces the SONIVA database, which is designed to fill a critical gap in automated speech recognition (ASR) for post-stroke aphasia. The study is innovative in aggregating a large volume of speech recordings from approximately 1,000 stroke survivors (including 200 longitudinal cases) and 7,000 age-matched controls, which helps capture the heterogeneity of aphasic speech patterns. The authors provide detailed annotations, including linguistic coding, orthographic transcriptions, and International Phonetic Alphabet (IPA) representations, enabling robust analysis and the training of advanced ASR systems. This comprehensive dataset directly addresses the limitations of previous, smaller databases and has the potential to significantly improve clinical screening and monitoring of aphasia.
Despite the strengths of the SONIVA dataset, the authors acknowledge several challenges:
The accompanying graph visualizes the composition of the SONIVA dataset, showing the overall number of participants and annotated samples to clarify the scale and distribution of the data.
Overall, the SONIVA paper represents a significant advance in clinical speech pathology and ASR development. Its major strengths are the scale of the dataset, the detail of its annotations, and the demonstrated improvements in ASR performance metrics. Future work should address environmental and demographic biases to further generalize these promising results.