Northumbria Research Link
Citation: McCay, Kevin D., Ho, Edmond, Sakkos, Dimitris, Woo, Wai Lok, Marcroft, Claire,
Dulson, Patricia and Embleton, Nicholas D. (2021) Towards Explainable Abnormal Infant
Movements Identification: A Body-part Based Prediction and Visualisation Framework. In:
IEEE BHI 2021: IEEE International Conference on Biomedical and Health Informatics
(BHI) : Reshaping healthcare through advanced AI-enabled health informatics for a better
quality of life, 27-30 Jul 2021, Virtual. (In Press)

URL:

This version was downloaded from Northumbria Research Link:
http://nrl.northumbria.ac.uk/id/eprint/46441/

Northumbria University has developed Northumbria Research Link (NRL) to enable users
to access the University’s research output. Copyright © and moral rights for items on
NRL are retained by the individual author(s) and/or other copyright owners. Single copies
of full items can be reproduced, displayed or performed, and given to third parties in any
format or medium for personal research or study, educational, or not-for-profit purposes
without prior permission or charge, provided the authors, title and full bibliographic
details are given, as well as a hyperlink and/or URL to the original metadata page. The
content must not be changed in any way. Full items must not be sold commercially in any
format or medium without formal permission of the copyright holder. The full policy is
available online: http://nrl.northumbria.ac.uk/policies.html

This document may differ from the final, published version of the research and has been
made available online in accordance with publisher policies. To read and/or cite from the
published version of the research, please visit the publisher’s website (a subscription
may be required).
Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework

Kevin D. McCay1, Edmond S. L. Ho1, Dimitrios Sakkos1, Wai Lok Woo1, Claire Marcroft2, Patricia Dulson2, and Nicholas D. Embleton2

1 Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK
2 Newcastle Neonatal Service, NUTH NHS Foundation Trust, Newcastle upon Tyne, UK

Abstract—Providing early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA) have produced promising results in early diagnosis; however, these manual methods can be laborious.

In this paper, we propose a new framework for the automated classification of infant body movements, based upon the GMA, which, unlike previous methods, also incorporates a visualization framework to aid with interpretability. Our proposed framework segments extracted features to detect the presence of Fidgety Movements (FMs) associated with the GMA spatiotemporally. These features are then used to identify the body-parts with the greatest contribution towards a classification decision and highlight the related body-part segment, providing visual feedback to the user.

We quantitatively compare the proposed framework’s classification performance with several other methods from the literature and qualitatively evaluate the visualization’s veracity. Our experimental results show that the proposed method performs more robustly than comparable techniques in this setting whilst simultaneously providing relevant visual interpretability.

Index Terms—infants, cerebral palsy, general movements assessment, machine learning, explainable AI, visualization

This project is supported in part by the Royal Society (Ref: IES/R1/191147).
Corresponding author: Edmond S. L. Ho (e.ho@northumbria.ac.uk)

I. INTRODUCTION

Cerebral palsy (CP) is the term for a group of lifelong neurological conditions which can cause difficulties with mobility, posture and coordination. CP can also cause problems with swallowing, speech articulation and vision, and has been associated with a diminished ability to learn new skills. There is significant variance in the severity of CP, with some individuals showing very minor symptoms whilst others may be severely disabled [6]. CP is attributed to non-progressive damage to the brain in early infancy [4], [16] and is one of the most common physical disabilities in childhood. However, early diagnosis of CP can be difficult, with a confirmed diagnosis rarely made before 18 months of age [10]. The difficulty in providing an early diagnosis is problematic, as early intervention care is considered particularly important for those with emerging and diagnosed CP.

Currently, the General Movements Assessment (GMA) is used to evaluate infant movement by manually observing spontaneous infant movements at a specific stage in development. In a typically developing infant, “Fidgety Movements” (FMs) are detectable from 3 to 5 months post term [14] and consistently have a similar appearance. The absence of these movement characteristics consequently allows for abnormal FM patterns to be identified and classified [5]. However, the challenge of applying these assessments in practice depends upon the availability of appropriately trained clinicians. In order to address the issues surrounding manual clinical assessment, several studies have been carried out which attempt to assess the viability of automating assessments to predict motor impairment based upon observed motion quality using computer vision-based approaches [10]. Examples such as [2], [1], [3] explore a per-frame background subtraction method for analysis, whereas more recent methods [17], [13], [15] propose the use of Optical Flow-based methods to track and assess infant movements. Whilst reasonable results are obtained, these methods typically struggle to deal with intra-class variation, as well as anomalies within the recorded video footage such as illumination changes, camera movement, subject-scaling, and resolution inconsistencies. This makes it difficult for these approaches to be adopted in a real-world clinical setting.

On the other hand, with the advancement of pose estimation techniques, high-quality skeletal poses can be extracted from video automatically. Recent work such as [11], [12] proposed using histogram-based pose features to automate the GMA by classifying infant movements into FM+ (normal) and FM- (abnormal). The pose-based features, namely Histograms of Joint Orientation 2D (HOJO2D) [11] and Histograms of Joint Displacement 2D (HOJD2D) [11], are computed from the orientation of the body segments and the displacement of the joints, respectively. Encouraging classification performance on traditional classifiers [11] and deep learning frameworks [12] was demonstrated. Wu et al. [18] proposed the Movement Complexity Index, which determines the complexity of the body movements of the infant by computing the correlations between the movements of the joints using the Spearman
Correlation Coefficient Matrix (SCCM). Although the method focuses on analyzing the features to predict the risk level of CP of the infant without the need for the training process required in machine learning-based approaches, the features are computed from 3D skeletal data, which requires specialized image sensing devices to capture. Furthermore, the method requires the user to specify a threshold level of the computed index to separate normal/abnormal, and it is unclear if this can be generalized to other datasets.

The aforementioned studies suggest that an automated system could potentially help to reduce the time and cost associated with current manual clinical assessments, and also assist clinicians in making earlier and more confident diagnoses by providing additional information about the assessed infant movements. However, these methods are also not without their setbacks. One of the main issues with using machine-learning approaches in the medical domain is the problem of interpretable AI. Models are often seen as ‘black boxes’ in which the underlying structures can be difficult to understand. There is an increasing requirement for the mechanisms behind why systems are making decisions to be transparent, understandable and explainable [8]. As such, we propose a new motion classification and visualization framework, which takes an RGB video as the input and analyzes the movement of individual body parts to determine if FMs are present (FM+) or absent (FM-), subsequently identifying normal or abnormal general movements from segments of the sequence. To make our proposed framework fully interpretable, an important aspect is the integration of an automatically generated visualization capable of relaying pertinent information to the assessor. The visualization highlights body-parts which are showing movement abnormalities and are subsequently providing the most significant contribution towards the classification result. As such, our proposed contributions are summarized as:
   • A new body-part based classification framework for the automated prediction of CP based upon body movement extracted from videos.
   • A visualization feature to highlight pertinent body-parts in the video to improve the model interpretability.

Experimental results showed that our proposed fidgety movements prediction framework achieved 100% accuracy and outperforms the existing work on the benchmark MINI-RGBD dataset [7]. The details of our proposed classification and visualisation framework are discussed in Section II. Our evaluation is discussed in Section III. Our hope is that this contribution will aid in the adoption of such technologies in this domain, through accurate, quantifiable and explainable results. A demo video is available at https://youtu.be/6CZZmWnT4mo

II. METHODOLOGY

In this section, we discuss the proposed classification and visualization framework illustrated in Figure 1.

A. Pose-based motion features

The first step of our proposed framework is extracting features from input video data. McCay et al. [11] demonstrated the effectiveness of using histogram-based motion features, namely Histograms of Joint Orientation 2D (HOJO2D) and Histograms of Joint Displacement 2D (HOJD2D), extracted from 2D skeletal poses in detecting FMs from videos. In this paper, an early fusion (i.e. concatenation) of the HOJO2D and HOJD2D features is used as the input motion features, since better performance has been demonstrated [11], [12].

B. Spatiotemporal Fidgety Movement Detection

In order to detect the presence of FMs spatiotemporally, the motion features have to be extracted from 1) different body-parts and 2) different temporal segments individually. Inspired by this, we propose motion feature extraction from 5 different body-parts in the spatial domain, namely the left arm, right arm, left leg, right leg, and head-torso. For the temporal domain, we compute HOJO2D and HOJD2D features (8 bins) for the 5 body parts from every 100-frame segment. In doing so, each video is represented by multiple histogram-based motion features accordingly. For example, a 1000-frame video will be represented by 50 fused features of HOJO2D and HOJD2D.

In this work, we formulate FM detection as a binary classification problem. Since each video is annotated with FM+ or FM-, we label all the extracted fused features according to the holistic annotation of the video. When training the classifier all features are used, while the temporal location information is not used. In other words, no matter whether the features are extracted from the beginning or near the end of the video, they will be used to train a single classifier. This proposed approach provides distinct advantages over previous methods, i.e. 1) the classifier will be trained on more data samples rather than using only one histogram representation for the whole video as proposed in [11], [12], and 2) a focus on the presence/absence of FMs while ignoring the temporal information when training the classifier.

We follow McCay et al. [11] in using an ensemble classifier in MATLAB R2020a that consists of a wide range of classifiers to boost the performance of the classification results. Given the multiple fused features extracted from a video, all the features will be classified as FM+ or FM-; this information is then used in visualizing the results (Section II-D). As the features were extracted in sequential order in the temporal domain, the classification result on each histogram-based motion feature essentially detects FMs spatiotemporally.

C. Late Fusion for Cerebral Palsy Prediction

While the method presented in Section II-B provides precise information on the presence/absence of FMs spatiotemporally, directly using all motion features as a cerebral palsy prediction for the whole video will result in sub-optimal performance, since the temporal ordering is less important in the GMA than the presence/absence of sustained FMs at any point in the sequence. To tackle this problem, we propose representing each of the 5 body parts using a single scalar score s, with this being the average score of the classification result (FM+ as 0 and FM- as 1) across all temporal segments for each body-part.
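The segment-wise feature extraction (Section II-B) and the per-part score averaging just described can be sketched as follows. This is a minimal numpy sketch under stated assumptions, not the authors' implementation: the joint grouping is hypothetical, and the histograms are simplified stand-ins (the actual HOJO2D bins body-segment orientations, and the binning and normalisation details in [11] may differ).

```python
import numpy as np

N_BINS = 8      # histogram bins, as in Section II-B
SEG_LEN = 100   # frames per temporal segment

# Hypothetical joint-index grouping for the 5 body parts; the exact
# skeleton layout depends on the pose estimator used.
BODY_PARTS = {
    "left arm": [5, 6, 7], "right arm": [2, 3, 4],
    "left leg": [12, 13, 14], "right leg": [9, 10, 11],
    "head-torso": [0, 1, 8],
}

def fused_features(poses):
    """poses: (T, J, 2) array of 2D joint positions for one video.
    Returns one fused histogram feature per body part per 100-frame
    segment, e.g. 50 features for a 1000-frame video."""
    feats = []
    for start in range(0, poses.shape[0] - SEG_LEN + 1, SEG_LEN):
        seg = poses[start:start + SEG_LEN]
        for part, joints in BODY_PARTS.items():
            disp = np.diff(seg[:, joints, :], axis=0)  # frame-to-frame joint displacement
            # Orientation histogram (HOJO2D-like, simplified).
            ang = np.arctan2(disp[..., 1], disp[..., 0]).ravel()
            hojo, _ = np.histogram(ang, bins=N_BINS, range=(-np.pi, np.pi))
            # Displacement-magnitude histogram (HOJD2D-like, simplified).
            mag = np.linalg.norm(disp, axis=-1).ravel()
            hojd, _ = np.histogram(mag, bins=N_BINS, range=(0.0, mag.max() + 1e-8))
            # Early fusion: concatenate the two histograms.
            feats.append((part, start, np.concatenate([hojo, hojd])))
    return feats

def late_fusion_scores(segment_preds):
    """segment_preds: {body part: list of per-segment predictions},
    with FM+ coded as 0 and FM- as 1 (Section II-C).
    Returns the per-part average score s, each in [0, 1]."""
    return {part: float(np.mean(p)) for part, p in segment_preds.items()}
```

The 5 scores returned by `late_fusion_scores` form the video-level feature vector passed to the second (late-fusion) classifier.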
Fig. 1: The overview of the proposed prediction and visualization framework.

Therefore, the range of s will be between 0 and 1.

Here, we propose using a late fusion approach to train an ensemble classifier for cerebral palsy prediction. Specifically, each video is represented using the 5 scores obtained from the body parts. The binary classifier will predict whether the motion in the video is considered normal or abnormal.

D. Visualization

While machine learning-based frameworks have obtained excellent performance in a wide range of visual understanding tasks, most of the existing frameworks can be considered black-box approaches, since most classification frameworks only output the predicted label without specifying exactly what influences the classification decision. Whilst this is acceptable in typical computer vision tasks, it is less preferable in healthcare applications, since it is essential for the clinicians to verify the prediction as well.

To extract body part information from an input image, the CDCL [9] pre-trained body segmentation model is used in this work. The body is segmented into 6 parts: head, torso, upper arms, lower arms, pelvis and upper legs, and lower legs. An example of the segmentation result is illustrated in Figure 1 (bottom left-hand corner). Specifically, given an input infant image, CDCL [9] returns an image mask for segmentation. To align with the 5 body parts used in this work, we separate the segmentation masks for the arms and legs into left and right masks. Here, k-means clustering is used to divide the pixels on each segment mask into two groups.

In order to make our proposed framework more interpretable, we include a visualization module that highlights the body-parts that are contributing to the classification decision. Our proposed method highlights the body-parts in red to indicate the absence of fidgety movements, based on the scores computed in the body part abnormality detection explained in Section II-B, providing clinicians with an intuitive visualization such as the examples illustrated in Figure 2.

III. EVALUATION

In this section, we evaluate the effectiveness of our proposed method using the public dataset MINI-RGBD [7] with fidgety movement annotation by an experienced GMs assessor in [11]. We first compare the performance of our method on fidgety movement detection with baseline methods in Section III-A. Next, we present the visualization results as qualitative analysis in Section III-B. We follow the standard protocol as in [11], [12], [18] to conduct a leave-one-subject-out cross-validation to ensure the results presented in this section are obtained on data unseen during the training process.

A. Quantitative Evaluation on the Fidgety Movement Detection Results

To demonstrate the overall performance of our proposed framework, we first evaluate the cerebral palsy prediction of the whole input video as explained in Section II-C. We compared with the existing methods and the results are presented in Table I. Using our framework, we achieved a perfect prediction with 100% accuracy. This highlights the effectiveness of our proposed framework over the previous work [11], [12], [18].

Method                            Accuracy   Sensitivity   Specificity
[11] w/ LDA                        66.67%      50.00%        75.00%
[11] w/ SVM                        83.33%      50.00%       100.00%
[11] w/ Decision Tree              75.00%      50.00%        87.50%
[11] w/ kNN (k=1)                  75.00%      25.00%       100.00%
[11] w/ kNN (k=3)                  50.00%       0.00%        75.00%
[11] w/ Ensemble                   66.67%      50.00%        75.00%
FCNet [12]                         83.33%      75.00%        87.50%
Conv1D-1 [12]                      83.33%      75.00%        87.50%
Conv1D-2 [12]                      91.67%      75.00%       100.00%
Conv2D-1 [12]                      83.33%      75.00%        87.50%
Conv2D-2 [12]                      83.33%      75.00%        87.50%
Movement Complexity Index [18]     91.67%     100.00%        87.50%
Our method                        100.00%     100.00%       100.00%

TABLE I: Classification accuracy comparison between our proposed framework and baseline methods.

B. Qualitative Evaluation on the Visualization Results

We further provide qualitative results to demonstrate the effectiveness of our proposed framework. As presented in Section II-D, we detect the absence (FM-) or presence (FM+) of fidgety movements of each body part in each temporal segment (see Section II-B). The body parts with a prediction of FM- will be highlighted in red. An example is illustrated in Figure 2. Readers are referred to https://youtu.be/6CZZmWnT4mo to evaluate the visual quality of the results. From the results, it can be seen that the highlighted body-parts generally show less complex or more repetitive movements in the videos annotated as FM-.
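The left/right separation underlying these highlights (Section II-D) can be sketched as below. This is a simplified stand-in, not the authors' code: the paper does not specify which pixel features k-means clusters on, so `split_mask_left_right` is a hypothetical helper that clusters on the x-coordinate only, which suffices when the two limbs lie on opposite sides of the image.

```python
import numpy as np

def split_mask_left_right(mask, iters=20):
    """Split a binary segmentation mask (H, W) covering both limbs of a
    pair (e.g. both arms, as returned by CDCL [9]) into two masks via
    k-means (k=2) on the pixel x-coordinates."""
    ys, xs = np.nonzero(mask)
    # Initialise the two centroids at the extreme x-positions.
    c = np.array([xs.min(), xs.max()], dtype=float)
    for _ in range(iters):
        # Assign each pixel to the nearest centroid, then re-estimate.
        assign = np.abs(xs[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = xs[assign == k].mean()
    left = np.zeros_like(mask, dtype=bool)
    right = np.zeros_like(mask, dtype=bool)
    left[ys[assign == 0], xs[assign == 0]] = True
    right[ys[assign == 1], xs[assign == 1]] = True
    return left, right
```

Note that "left" here means image-left; mapping image sides to the infant's anatomical left and right depends on the camera viewpoint.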
Fig. 2: Examples of the video generated by our visualization module. Body-parts without fidgety movements are highlighted in red. (a) Positive (abnormal) example; (b) negative (normal) example; (c) positive (abnormal) example.

As shown in Figure 2 (a), the arms are showing a lack of movement and are subsequently predicted as FM- using our framework. On the other hand, the legs are predicted as FM+ as they are showing some movements in that temporal segment. Figure 2 (c) shows an example with monotonous arm and leg movements, and our method highlights those body parts as FM- accordingly. For the videos annotated as FM+, such as the example shown in Figure 2 (b), it can be seen that a much wider variety of movements can be observed. The visualization provides effective visual feedback to the user; as such, clinicians can pay greater attention to the highlighted segments for further analysis.

IV. CONCLUSION

In this paper, we presented a new framework for detecting fidgety movements of infants spatiotemporally using pose-based features extracted from RGB videos. Experimental results demonstrated that the new method not only achieves perfect prediction with 100% accuracy, but also provides the user with a visualization of how the machine learning-based framework made the overall prediction relating to the abnormality of the infant’s movement. Whilst our system is able to provide rudimentary visual feedback to the user, additional visualization tools would be useful to exploit the extracted spatiotemporal information and provide additional predictive aid to clinicians for this complex diagnostic task. We intend to further implement our method on data gathered in a real-world clinical setting, as well as explore other relevant visualisation methods to provide meaningful and explainable feedback to the user.

REFERENCES

[1] Lars Adde, Jorunn Helbostad, Alexander R. Jensenius, Mette Langaas, and Ragnhild Støen. Identification of fidgety movements and prediction of CP by the use of computer-based video analysis is more accurate when based on two video recordings. Physiotherapy Theory and Practice, 29(6):469–475, 2013.
[2] Lars Adde, Jorunn L. Helbostad, Alexander R. Jensenius, Gunnar Taraldsen, Kristine H. Grunewaldt, and Ragnhild Støen. Early prediction of cerebral palsy by computer-based video analysis of general movements: A feasibility study. Developmental Medicine and Child Neurology, 52(8):773–778, 2010.
[3] Lars Adde, Hong Yang, Rannei Sæther, Alexander Refsum Jensenius, Espen Ihlen, Jia Yan Cao, and Ragnhild Støen. Characteristics of general movements in preterm infants assessed by computer-based video analysis. Physiotherapy Theory and Practice, 34(4):286–292, 2018. PMID: 29064734.
[4] Martin Bax, Murray Goldstein, Peter Rosenbaum, Alan Leviton, Nigel Paneth, Bernard Dan, Bo Jacobsson, and Diane Damiano. Proposed definition and classification of cerebral palsy, April 2005. Developmental Medicine and Child Neurology, 47(8):571–576, 2005.
[5] Christa Einspieler, Arend F. Bos, Melissa E. Libertus, and Peter B. Marschik. The general movement assessment helps us to identify preterm infants at risk for cognitive dysfunction. Frontiers in Psychology, 7, Mar 2016.
[6] National Institute for Health and Care Excellence. NICE seeks to improve diagnosis and treatment of cerebral palsy, Jan 2017.
[7] Nikolas Hesse, Christoph Bodensteiner, Michael Arens, Ulrich G. Hofmann, Raphael Weinberger, and A. Sebastian Schroeder. Computer vision for medical infant motion analysis: State of the art and RGB-D data set. In Computer Vision – ECCV 2018 Workshops. Springer International Publishing, 2018.
[8] Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis, and Douglas B. Kell. What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923, 2017.
[9] Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, and Ming-Ting Sun. Cross-domain complementary learning using pose for multi-person part segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[10] Claire Marcroft, Aftab Khan, Nicholas D. Embleton, Michael Trenell, and Thomas Plötz. Movement recognition technology as a method of assessing spontaneous general movements in high risk infants. Frontiers in Neurology, 6:284, Jan 2015.
[11] K. D. McCay, E. S. L. Ho, C. Marcroft, and N. D. Embleton. Establishing pose based features using histograms for the detection of abnormal infant movements. In Proceedings of IEEE EMBC, pages 5469–5472, July 2019.
[12] K. D. McCay, E. S. L. Ho, H. P. H. Shum, G. Fehringer, C. Marcroft, and N. D. Embleton. Abnormal infant movements classification with deep learning on pose-based features. IEEE Access, pages 1–1, 2020.
[13] S. Orlandi, K. Raghuram, C. R. Smith, D. Mansueto, P. Church, V. Shah, M. Luther, and T. Chau. Detection of atypical and typical infant movements using computer-based video analysis. In Proceedings of IEEE EMBC, pages 3598–3601, 2018.
[14] Heinz F. R. Prechtl, Christa Einspieler, Giovanni Cioni, Arend F. Bos, Fabrizio Ferrari, and Dieter Sontheimer. An early marker for neurological deficits after perinatal brain lesions. The Lancet, 349(9062):1361–1363, May 1997.
[15] H. Rahmati, H. Martens, O. M. Aamo, Ø. Stavdahl, R. Støen, and L. Adde. Frequency analysis and feature reduction method for prediction of cerebral palsy in young infants. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 24(11):1225–1234, Nov 2016.
[16] Peter Rosenbaum, Nigel Paneth, Alan Leviton, Murray Goldstein, Martin Bax, Diane Damiano, Bernard Dan, and Bo Jacobsson. A report: The definition and classification of cerebral palsy April 2006. Developmental Medicine and Child Neurology. Supplement, 109:8–14, Mar 2007.
[17] Annette Stahl, Christian Schellewald, Øyvind Stavdahl, Ole Morten Aamo, Lars Adde, and Harald Kirkerod. An optical flow-based method to predict infantile cerebral palsy. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4):605–614, 2012.
[18] Q. Wu, G. Xu, F. Wei, L. Chen, and S. Zhang. RGB-D videos-based early prediction of infant cerebral palsy via general movements complexity. IEEE Access, 9:42314–42324, 2021.