Automatic Detection of Atrial Fibrillation in Ambulatory Electrocardiograms using a Deep Neural Network

Oluwafemi Ogundare
7 min read · Sep 10, 2021
An Electrocardiogram.

Digital electrocardiogram (ECG) interpretation plays a pivotal role in the clinical ECG workflow. Publicly available digital ECG data and the algorithmic paradigm of deep learning present an opportunity to substantially improve the accuracy and scalability of automated ECG analysis. Here, I use a deep neural network (DNN) to classify ECG recordings from patients who used a single-lead ambulatory ECG monitoring device into four distinct categories: normal sinus rhythm (N), atrial fibrillation (A), other rhythm (O), or noise (~). When evaluated on a held-out test dataset, the DNN achieved a class-weighted average F1 score of 0.782. This finding suggests that an end-to-end deep learning approach can potentially classify arrhythmias from single-lead ECGs with diagnostic performance similar to that of cardiologists. If validated in clinical settings, this approach could reduce the rate of misdiagnosed digital ECG interpretations and improve the efficiency of human ECG interpretation.

Introduction

The electrocardiogram is a fundamental tool in the everyday practice of clinical medicine. It is crucial for diagnosing a wide spectrum of heart abnormalities from arrhythmias to myocardial infarction. The combination of widespread digitization of ECG data and the development of algorithmic paradigms that can benefit from large-scale processing of raw data may provide substantial improvements to ECG interpretation.

An ECG Machine.

Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, occurring in 1–2% of the general population. This arrhythmia is associated with significant morbidity, carrying a 4- to 5-fold increased risk for ischemic stroke. AF is often silent, with patients occasionally presenting with stroke as the first manifestation of the arrhythmia. Other patients have troubling symptoms such as palpitations or dizziness, but traditional monitoring fails to capture an arrhythmia. Despite the scale of this problem, AF detection remains difficult because the arrhythmia may be episodic. Periodic sampling of heart rate and rhythm could therefore help to establish a diagnosis in these cases.

Dataset

Data used in this project underwent expert annotation for 4 rhythm classes: normal sinus rhythm, atrial fibrillation, other rhythm, and noise. The dataset contained a total of 8528 ECG recordings, each lasting between 9 s and 61 s. Each ECG recording was taken by a patient using the AliveCor ECG device, a Food and Drug Administration (FDA)-cleared, single-lead, ambulatory ECG monitor that continuously records data from a single vector (modified Lead I [LA-RA]) at a sampling frequency of 300 Hz. In theory, the patient holds one of the monitor's two electrodes in each hand. Many of the ECG recordings were inverted (RA-LA) because the device did not require patients to hold it in any particular orientation.

The AliveCor ECG Monitor
The Einthoven Triangle showing the vectors of the 3 bipolar limb leads used in clinical practice.
Vector representation of ECG Lead I
ECGs of AF, Normal Sinus Rhythm, Other Rhythm and Noise.

Training

The DNN was trained on the dataset, with 10% of the recordings held out as a validation dataset for early stopping and a further 10% held out as a test dataset for algorithm evaluation.
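A split of this kind can be sketched as follows. This is a minimal illustration that assumes a simple random 80/10/10 partition of record indices; the function name `split_indices`, the fixed seed, and the absence of stratification by class are assumptions for the example, not details taken from the project's code.

```python
import random

def split_indices(n_records, val_frac=0.10, test_frac=0.10, seed=0):
    """Shuffle record indices and partition them into train/val/test lists.

    Hypothetical helper: the seed and the plain random split are assumptions.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_val = int(n_records * val_frac)    # floor: 8528 records -> 852
    n_test = int(n_records * test_frac)  # floor: 8528 records -> 852
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(8528)
print(len(train_idx), len(val_idx), len(test_idx))  # 6824 852 852
```

With 8528 recordings, flooring 10% gives 852 records for each held-out set, which matches the n=852 reported in the Results.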

Results

The performance of the DNN was first evaluated on the 10% test dataset (n=852) by calculating the F1 score.

The F1 score is the harmonic mean of the positive predictive value (precision) and sensitivity (recall); it was computed using the test dataset labels as the gold standard. The class-weighted average F1 score was 0.782. Normal sinus rhythm had the highest F1 score (0.857), followed by AF, other rhythm, and noise.
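For readers who want to see the metric spelled out, here is a small sketch of per-class F1 and a weighted average. It assumes that "class weighted" means weighting each class's F1 by the number of true records in that class; the labels in the example are toy values, not results from the project.

```python
def f1_per_class(y_true, y_pred, classes):
    """Return ({class: F1}, {class: number of true records}) for each class."""
    scores, support = {}, {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall (0 when both are 0).
        scores[c] = 2 * precision * recall / denom if denom else 0.0
        support[c] = tp + fn
    return scores, support

def weighted_f1(scores, support):
    """Average the per-class F1 scores, weighted by class support (assumption)."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total

# Toy labels for illustration only.
y_true = ["N", "N", "N", "A", "A", "O", "~"]
y_pred = ["N", "N", "O", "A", "N", "O", "~"]
scores, support = f1_per_class(y_true, y_pred, ["N", "A", "O", "~"])
print(scores)
print(round(weighted_f1(scores, support), 3))  # 0.714
```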

F1 scores on the 10% test dataset

F1 scores on the 10% validation dataset (n=852) were slightly higher than, but not materially different from, the test dataset results.

F1 scores on the 10% validation dataset

A confusion matrix was plotted to illustrate the discordance between the DNN’s predictions and the ground truth, that is, the expert annotations.

Confusion Matrix
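The counts behind such a matrix can be tallied in a few lines. The sketch below assumes rows index the true class and columns index the predicted class, in the order N, A, O, ~; the labels are toy values for illustration, not the project's predictions.

```python
CLASSES = ["N", "A", "O", "~"]

def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count records per (true class, predicted class) pair.

    Rows are true classes and columns are predicted classes (assumed layout).
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Toy labels for illustration only.
toy_true = ["N", "N", "N", "A", "A", "O", "~"]
toy_pred = ["N", "N", "O", "A", "N", "O", "~"]
cm = confusion_matrix(toy_true, toy_pred)
for label, row in zip(CLASSES, cm):
    print(label, row)
```

Off-diagonal entries are the discordant cases; in the toy data, one normal record is predicted as other rhythm and one AF record is predicted as normal.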

Methods

Algorithm development

A convolutional neural network was designed to detect heart rhythms. It takes as input the raw ECG data (sampled at 300 Hz, or 300 samples per second) and outputs one prediction every 256 samples (approximately every 0.85 s), a period referred to as the output interval. The network takes as input only the raw ECG samples and no other patient- or ECG-related features. The network architecture was the same as that proposed by Hannun et al. at Stanford University in their paper Cardiologist-level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms using a Deep Neural Network.
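The output-interval arithmetic can be checked with a short sketch. The constants come from the text; the helper `num_output_intervals` and its choice to drop any trailing partial interval are assumptions, since how the project pads or truncates records is not described here.

```python
SAMPLE_RATE_HZ = 300        # sampling frequency of the recordings
SAMPLES_PER_OUTPUT = 256    # one prediction per 256 input samples

def output_interval_seconds():
    """Duration covered by a single prediction."""
    return SAMPLES_PER_OUTPUT / SAMPLE_RATE_HZ

def num_output_intervals(duration_s):
    """Number of complete output intervals in a record (hypothetical helper).

    Assumes a trailing partial interval is dropped.
    """
    n_samples = int(round(duration_s * SAMPLE_RATE_HZ))
    return n_samples // SAMPLES_PER_OUTPUT

print(round(output_interval_seconds(), 3))  # 0.853
print(num_output_intervals(9))              # 10
print(num_output_intervals(61))             # 71
```

Under these assumptions, the shortest recordings (9 s) yield 10 predictions and the longest (61 s) yield 71.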

From Nature Medicine: Cardiologist-level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms using a Deep Neural Network (Hannun et al., 2019): “The network architecture has 34 layers; to make the optimization of such a network tractable, we employed shortcut connections in a manner similar to the residual network architecture. The network consists of 16 residual blocks with two convolutional layers per block. The convolutional layers have a filter width of 16 and 32*2^k filters, where k is a hyper-parameter which starts at 0 and is incremented by 1 every fourth residual block. Every alternate residual block subsamples its inputs by a factor of 2. Before each convolutional layer, we applied batch normalization and a rectified linear activation, adopting the pre-activation block design. The first and last layers of the network are special-cased due to this pre-activation block structure. We also applied Dropout between the convolutional layers and after the nonlinearity with a probability of 0.2.”
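The filter and subsampling schedule in the quoted description can be tabulated with a short sketch. It is not the network itself, only the bookkeeping: the choice of which blocks subsample (here the odd-indexed ones) is an assumption, because the quote only says "every alternate" block, and the function names are illustrative.

```python
NUM_BLOCKS = 16
BASE_FILTERS = 32

def block_schedule(num_blocks=NUM_BLOCKS, base_filters=BASE_FILTERS):
    """Return (block index, filters, stride) for each residual block."""
    schedule = []
    for i in range(num_blocks):
        k = i // 4                        # k starts at 0, +1 every fourth block
        filters = base_filters * 2 ** k   # 32 * 2^k filters
        stride = 2 if i % 2 == 1 else 1   # assumption: odd-indexed blocks subsample
        schedule.append((i, filters, stride))
    return schedule

def total_subsampling(schedule):
    """Product of all strides: input samples per output prediction."""
    factor = 1
    for _, _, stride in schedule:
        factor *= stride
    return factor

schedule = block_schedule()
for block in schedule:
    print(block)
print(total_subsampling(schedule))  # 256
```

Eight of the sixteen blocks halve the sequence length, so the overall subsampling factor is 2^8 = 256, which is consistent with one prediction per 256 input samples. The 16 blocks contribute 32 convolutional layers, and the special-cased first and last layers bring the total to the 34 layers mentioned in the quote.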

The final fully connected softmax layer of the network was modified to produce a distribution over the 4 output classes.

The network was trained de novo with random initialization of the weights. The Adam optimizer was used with the default parameters beta_1=0.9 and beta_2=0.999 and a mini-batch size of 2. The learning rate was initialized to 0.001 and was reduced by a factor of 10 when the validation loss failed to improve for two consecutive epochs. The model that achieved the lowest error on the validation set was chosen.
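The learning-rate rule and the model selection step can be sketched in plain Python, independent of any deep learning framework. The class name `PlateauSchedule` and the validation losses below are made up for illustration; only the initial rate (0.001), the reduction factor (10), and the two-epoch patience come from the text.

```python
class PlateauSchedule:
    """Reduce the learning rate when validation loss stops improving."""

    def __init__(self, lr=1e-3, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor          # multiply by 0.1, i.e. divide by 10
        self.patience = patience      # consecutive epochs without improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update state with one epoch's validation loss; return the new rate."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.80, 0.85, 0.82, 0.70, 0.75, 0.72]
schedule = PlateauSchedule()
rates = [schedule.step(loss) for loss in val_losses]

# Keep the checkpoint from the epoch with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print(rates)
print(best_epoch)  # 4
```

In this toy run the rate is reduced twice, after epochs 3 and 6 (counting from 0), and the checkpoint from epoch 4 would be the one retained.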

Algorithm evaluation

The given label of each ECG record was used as the label for every output interval (approximately 0.85 s each) of that record. To produce a single prediction for a variable-length record, a majority vote over the output-interval predictions was used.
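Both steps are simple enough to sketch directly. The helper names are illustrative, and the tie-breaking behaviour (the class seen first among tied classes wins) is a property of this sketch rather than something specified in the project description.

```python
from collections import Counter

def expand_label(record_label, n_intervals):
    """Repeat the record-level label once per output interval."""
    return [record_label] * n_intervals

def majority_vote(interval_predictions):
    """Return the most frequent per-interval prediction for a record.

    Ties go to the tied class that appears first in the sequence
    (an assumption of this sketch).
    """
    counts = Counter(interval_predictions)
    return counts.most_common(1)[0][0]

print(expand_label("A", 3))                             # ['A', 'A', 'A']
print(majority_vote(["N", "A", "A", "O", "A", "N"]))    # A
```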

Code

The code for the algorithm development and evaluation is available on GitHub.

Limitation

The major limitation of the methodology used in this project is that the input dataset is limited to single-lead ECG records obtained from an ambulatory monitor, which provides limited signal compared to a standard 12-lead ECG. It remains undetermined if the algorithm performance would be similar in 12-lead ECGs.

The standard ECG has 12 leads. Six of the leads, namely leads I, II, III, aVL, aVR and aVF, are considered ‘limb leads’ while the other six, namely leads V1, V2, V3, V4, V5 and V6, are called ‘precordial leads’.

Conclusion

An end-to-end DNN approach has the potential to improve the accuracy of ECG analysis and interpretation. Future algorithmic and computational advances could compel us to revisit the standard approaches to ECG interpretation. Furthermore, algorithmic approaches whose performance improves as more data become available, such as deep learning, can leverage the widespread digitization of ECG data and provide clear opportunities to bring us closer to the ideal of a learning health care system.

Bio

Oluwafemi Ogundare is a third-year medical student at the University of Ibadan, Nigeria. He is passionate about Artificial Intelligence, Genomics, and Bioinformatics because of their potential to usher mankind into an era of Precision/Personalized Medicine. He spends his free days improving his Machine Learning skills and learning new concepts in Genomics and Bioinformatics. He has worked on a number of health-related Machine Learning projects, including Tumor Segmentation in 3D MRIs of the Brain, Invasive Ductal Carcinoma Detection in Breast Cancer Histology Images, and Prediction of Patient Survival Rates using Random Forests. You can email him at femiogundare001@gmail.com or reach him on LinkedIn at https://www.linkedin.com/in/oluwafemi-ogundare-65b6a0185/.

References

[1] Schläpfer, J. & Wellens, H. J. (2017). Computer-interpreted electrocardiograms: benefits and limitations.

[2] Ganong’s Review of Medical Physiology, Twenty-Fifth Edition (by Kim Barrett, Susan Barman, Scott Boitano & Heddwen Brooks).

[3] Essentials of Medical Physiology, Eighth Edition (by K. Sembulingam & Prema Sembulingam).

[4] Clifford, G.D., Liu, C., Moody, B., Lehman, L.H., Silva, I., Li, Q., Johnson, A.E. & Mark, R.G. (2017). AF Classification From a Short Single Lead ECG Recording: The PhysioNet/Computing in Cardiology Challenge 2017.

[5] Hannun, A.Y., Rajpurkar, P., Haghpanahi, M., Tison, G.H., Bourn, C., Turakhia, M.P. & Ng, A.Y. (2019). Cardiologist-level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms using a Deep Neural Network. Nature Medicine.
