This study was approved by the Ethics Committee of the Toyama University Hospital, Toyama, Japan (Approval Number: R2019003). All methods were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained by publishing an opt-out document on the website, based on the Ethical Guidelines for Medical and Health Research Involving Human Subjects established by the Ministry of Health, Labour and Welfare of Japan.

Subjects

We retrieved the data of 1009 patients who underwent equilibrium examinations in our department (the Department of Otolaryngology, Head and Neck Surgery, University of Toyama) in the 10-year span from 2009 to 2019. Of these patients, 497 had PV disease and 512 had non-PV disease (611 males and 398 females overall; mean age, 55.6 years). PV disease and non-PV disease were diagnosed according to the International Classification of Vestibular Disorders of the Bárány Society14 and the guidelines of the Japan Society for Equilibrium Research15 (Table 1). For example, small acoustic neuromas corresponding to Koos16 grade I or II were classified in the PV disease group. Patients who were confirmed to have unilateral PV dysfunction but could not be diagnosed as having an established clinical entity were classified in the PV disease group and considered to have inner ear disorder. Patients who were judged to have normal PV function and in whom central nervous system disease was ruled out by neurological examinations and brain MRI/magnetic resonance angiography (MRA) or brain CT were classified in the non-PV disease group and considered to have dizziness syndromes of unclear etiology. However, even if brain MRI/MRA and brain CT did not show any abnormalities, patients with normal vestibular function but abnormalities in the optokinetic nystagmus test and eye tracking test were classified in the non-PV disease group and considered to have central balance disorder. These patients often showed downbeat nystagmus, failure of fixation suppression, and abnormal eye movement. Although the cause of persistent postural-perceptual dizziness may be rooted in the PV system, its symptoms are thought to be modified by other factors. For this reason, patients with these symptoms were classified in the non-PV disease group.

Table 1 Demographic data and clinical diagnosis of patients.

All patients underwent our standardized neuro-otological examinations, listed as number (No.) 1 to No. 16 in Table 2. These 16 examinations yielded a total of 44 features, which could be divided into two types: continuous and categorical. Continuous features with numerical values were used as they were, and categorical features were coded as integers from 0 to 3. Examinations No. 1 to No. 14 were performed as routine examinations, and examinations No. 15 and No. 16 were added as needed. In the caloric test (No. 6), we injected air at 24 °C and 50 °C (6 L/min) into each ear canal for 60 s with the patient’s eyes closed. The maximal slow-phase velocity (MSPV) of the caloric nystagmus was recorded after each irrigation, and the canal paresis percentage (CP%) and directional preponderance percentage (DP%) were calculated from the MSPV values according to Jongkees’ formula17. In our department, if the CP% is ≥ 20%, the ear with the lower response is assumed to have unilateral vestibular hypofunction, indicating an abnormal caloric reflex. Bilateral vestibular hypofunction is defined as an MSPV of < 6°/s in each ear after caloric stimulation18. The failure of fixation suppression test (No. 7) started at 80 s after the beginning of the air current and continued for 10 s. The patient, with both eyes open, stared at an optotype19,20. The pendular sinusoidal rotation test (No. 8) was performed with rotation of the chair at 0.1 Hz, an amplitude of 240°, and a maximum velocity of 75.4°/s, with the patient’s eyes closed21. In the eye tracking test (No. 9), the patient gazed at and pursued an optotype lamp (viewing angle 20 degrees, frequency 0.3 Hz) that moved left and right22. In the optokinetic nystagmus test (No. 10), 12 striations were projected onto a hemispherical drum. The striations began to rotate in the clockwise (CW) direction at 1°/s and accelerated until a velocity of 100°/s was reached. Next, the striations began to rotate in the counterclockwise (CCW) direction23. Two neuro-otology specialists (M.A. and H.S.) certified by the Japan Society for Equilibrium Research evaluated the waveforms from electronystagmography (ENG) by visual inspection and diagnosed all ENG findings. Stabilometry (No. 11) was performed according to the Japanese standard24. The Mann test (No. 12) was performed during tandem standing for 30 s with the eyes open and 30 s with the eyes closed, and then the positions of the front and back legs were reversed25. In the Fukuda stepping test (No. 13), the patient stood upright with eyes closed and arms extended forward and took 50 steps26,27. In the Schellong test (No. 14), blood pressure was measured twice in a recumbent position and 3 additional times: immediately after standing and 5 and 10 min later28. The galvanic body-sway test (GBST) (No. 15) evaluates the body-sway response induced by 0.2 mA and 0.4 mA electrical stimulation applied to the retroauricular area. Bipolar rectangular current stimulation lasting for 3 s was repeated 10 times, alternating between the left and right, as the patient stood on the stabilometer with his or her feet close together29. The stimulus conditions of the cervical vestibular evoked myogenic potential (cVEMP) test (No. 16) were a click sound of 0.1 ms, a frequency of 5 Hz, and a sound pressure level of 105 dB. Two hundred reaction waveforms were summed30.
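
For the caloric test (No. 6), the calculation of CP% and DP% can be illustrated with a minimal Python sketch that uses the commonly cited form of Jongkees’ formula. The function names, argument names, example MSPV values, and sign convention (a positive CP% indicating a weaker left ear) are assumptions made for illustration and are not taken from the software used in this study; only the 20% criterion comes from the text above.

```python
def jongkees(rw, rc, lw, lc):
    """Return (CP%, DP%) from four MSPV magnitudes in deg/s.

    rw/rc: right ear, warm/cool stimulus; lw/lc: left ear, warm/cool stimulus.
    Assumed sign convention: CP% > 0 means the left ear responded less;
    DP% > 0 means a preponderance of right-beating nystagmus.
    """
    total = rw + rc + lw + lc
    if total <= 0:
        raise ValueError("at least one stimulus must evoke a response")
    cp = 100.0 * ((rw + rc) - (lw + lc)) / total
    dp = 100.0 * ((rw + lc) - (lw + rc)) / total
    return cp, dp


def hypofunction_side(cp, threshold=20.0):
    """Apply the |CP%| >= 20% criterion; return the weaker ear or None."""
    if abs(cp) < threshold:
        return None
    return "left" if cp > 0 else "right"


# Illustrative values only (not patient data).
cp, dp = jongkees(rw=20.0, rc=20.0, lw=10.0, lc=10.0)
print(round(cp, 1), round(dp, 1), hypofunction_side(cp))  # 33.3 0.0 left
```

In this example the right ear responds twice as strongly as the left, so CP% exceeds the 20% criterion and the left ear is flagged, while DP% is zero because the right-beating and left-beating responses are balanced.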

Table 2 Equilibrium examinations and each feature’s name.

Steps in the machine learning classification method

In the present research, we used supervised ML to perform classification, which aims to predict the categories of new observations from a training set of data whose categories are known31. The program was created on Google Colaboratory using Python version (v) 3.7.12, scikit-learn32 v1.0.2, NumPy v1.21.5, SciPy v1.4.1, Pandas v1.3.5, and Matplotlib v3.2.2. We adopted five well-known algorithms: random forest (RF), AdaBoost (AB), gradient boosting (GB), support vector machine (SVM), and logistic regression (LR). These algorithms rest on established theory and are described extensively in the literature and in specialized textbooks33,34,35,36,37,38,39. The steps in classification are as follows.

Import the data

From the results of the 1009 patients, we created a CSV data file consisting of 44 features and target categories (PV = 0, non-PV = 1). After the CSV data were imported into the program, they were preprocessed to ensure the accuracy of future predictions40.
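
The import step can be sketched as follows. This is a minimal illustration only: the column names, the inline CSV text, and the three stand-in features (instead of the 44 real ones) are hypothetical, and the preprocessing applied in the study is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: 3 features instead of 44,
# plus the target column (PV = 0, non-PV = 1). Values are illustrative only.
csv_text = """age,cp_percent,mann_test,target
63,35.0,2,0
48,4.5,0,1
71,22.0,1,0
55,8.0,0,1
"""

# For a file on disk, pass its path to pd.read_csv instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))

X = df.drop(columns="target")  # feature matrix (continuous and integer-coded columns)
y = df["target"]               # class labels: 0 = PV, 1 = non-PV

print(X.shape, sorted(y.unique()))
```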

Split the data

The preprocessed dataset was randomly divided into 75% training data (n = 756) and 25% testing data (n = 253), as shown in Fig. 1. The randomness of splitting for training and testing data was controlled via the “random_state” parameter in scikit-learn.
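
A minimal sketch of such a split with scikit-learn's train_test_split is given below. The feature matrix is random placeholder data with the same shape as the study dataset (1009 × 44), and whether the original split was stratified by class is not stated, so no stratification is assumed here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the study's dimensions; values are random, not patient data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1009, 44))
y = np.r_[np.zeros(497, dtype=int), np.ones(512, dtype=int)]  # PV = 0, non-PV = 1

# test_size=0.25 gives ceil(0.25 * 1009) = 253 test samples and 756 training samples.
# random_state fixes the shuffling so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (756, 44) (253, 44)
```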

Figure 1 Overview of our machine learning process.

ML and predictions

ML was performed to create the best model using the training data. In the learning process, various parameters in the algorithm are automatically adjusted. However, some parameters need to be set by a human to achieve the best prediction41. These parameters, known as hyperparameters, can be tuned using GridSearchCV in scikit-learn32. By applying GridSearchCV to each algorithm, we selected the best hyperparameters and created the best model for each of the 5 algorithms. Thereafter, the best models were applied to the test data, as shown in Fig. 1, to create the final evaluation output. In the following description, the models obtained under the condition of random_state = 0 are presented as the best models. Furthermore, we repeated the whole process 10 times with different random_state values, from splitting the data to applying the new best model to the test data, and calculated the average of each test measure, such as accuracy.
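
The tuning and replication procedure can be sketched as follows for one of the five algorithms (RF). This is an illustration under stated assumptions, not the study's program: the data are synthetic, the hyperparameter grid and the number of cross-validation folds are hypothetical, and reusing the same seed for the split and the classifier is a simplification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 44-feature dataset (not patient data).
X, y = make_classification(
    n_samples=300, n_features=44, n_informative=10, random_state=0
)

# Hypothetical grid; the grid used in the study is not given in the text.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}


def run_once(seed):
    """Split, tune on the training data, and score the best model on the test data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=3,                # assumed number of folds
        scoring="accuracy",
    )
    search.fit(X_train, y_train)  # the test data are never seen during tuning
    return search.best_params_, search.best_estimator_.score(X_test, y_test)


# Ten replicates with different random states, then the average test accuracy.
results = [run_once(seed) for seed in range(10)]
accuracies = [acc for _, acc in results]
mean_accuracy = float(np.mean(accuracies))
print(results[0][0], round(mean_accuracy, 3))
```

Keeping the grid search inside each replicate means that the hyperparameters are reselected for every new training set, which corresponds to repeating the whole process rather than only the final evaluation.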

Test measures

In the binary classification, one of the two predicted groups was called the negative group (N), and the other was called the positive group (P). We defined PV disease as (N) and non-PV disease as (P). The confusion matrix is commonly used to evaluate the diagnostic ability of classifiers. In Table 3, the basic framework of the confusion matrix32 displays the number of predictions by each model in each of four categories: TP (true positive), FP (false positive), FN (false negative), and TN (true negative).

Table 3 Basic framework of the confusion matrix.

The six test measures used for evaluating the predictive performance of ML are as follows: accuracy, precision, recall (also known as sensitivity), area under the receiver operating characteristic curve (AUC-ROC), F1-score, and Matthews correlation coefficient (MCC). The first five measures are displayed as numerical values ranging from 0 to 1, whereas MCC is displayed as numerical values from − 1 to 1. The greater each value is, the higher the predictive performance.
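
Five of these measures follow directly from the four cells of the confusion matrix, as the short sketch below shows using their standard definitions. The function name and the example counts are illustrative assumptions, and returning 0 when a denominator is zero is one common convention among several. AUC-ROC is omitted because it requires the predicted scores of each sample rather than the four counts alone; in scikit-learn, the corresponding functions are accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef, and roc_auc_score.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1-score, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only (not study results).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.8, 'precision': 0.8, 'recall': 0.8, 'f1': 0.8, 'mcc': 0.6}
```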

Statistical analysis

The Mann‒Whitney U test was used for statistical evaluation of age, precision, recall, and F1-score between PV and non-PV. The Chi-square test was used for statistical evaluation of gender proportion. BellCurve for Excel v3.21 (Social Survey Research Information Co., Ltd., Japan) was used for the analysis, and P < 0.05 was considered statistically significant.
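
For readers who wish to run equivalent tests in Python, a minimal sketch with SciPy is shown below. The analysis in this study was performed with BellCurve for Excel, not SciPy, and all values in the example are invented for illustration. Note that chi2_contingency applies Yates' continuity correction to 2 × 2 tables by default, which may differ from the settings of other software.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# Invented ages for two small groups (not patient data).
ages_pv = [34, 45, 52, 58, 61, 67, 70, 73]
ages_non_pv = [29, 38, 41, 47, 50, 55, 62, 66]

# Two-sided Mann-Whitney U test comparing the two age distributions.
u_stat, p_age = mannwhitneyu(ages_pv, ages_non_pv, alternative="two-sided")

# Invented 2 x 2 table: rows = PV / non-PV, columns = male / female.
gender_table = [[30, 20], [25, 25]]

# Chi-square test of independence (Yates' correction is applied by default for 2 x 2).
chi2, p_gender, dof, expected = chi2_contingency(gender_table)

alpha = 0.05  # significance threshold used in the text
print(f"age: U = {u_stat}, P = {p_age:.3f}, significant = {p_age < alpha}")
print(f"gender: chi2 = {chi2:.3f}, dof = {dof}, P = {p_gender:.3f}")
```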