Int'l Conf. Artificial Intelligence | ICAI'19 |                                                                                         323

                        A Data Mining Approach for Sepsis Validation
                          Venkata Mayukha Cheekati, Bhargav Pingle, and Ahmad Y. Javaid
                                Electrical Engineering and Computer Science Department
                                                 The University of Toledo
                                                Toledo, OH 43606, USA
                           Venkatamayukha.Cheekati, Bhargav.Pingle, ahmad.javaid@utoledo.edu

Abstract: This paper aims to assist medical doctors in decision-making by predicting the sepsis level of a patient. Sepsis is a life-threatening condition and a prevailing cause of death; it occurs when the immune system becomes unable to contain pathogenic growth in the body. This failure leads to severe damage to the tissues of various organs, which can be fatal. Early prediction of sepsis, followed by appropriate treatment, helps decrease the mortality of sepsis-affected patients. The large volumes of medical data now being generated can be analyzed to predict impending sepsis and raise an alarm. We present a methodology to predict the septic condition of a patient using data mining algorithms in the open-source tool Weka. We also integrate the preprocessed data with Sequential Organ Failure Assessment (SOFA) score analysis to enhance the results; the evaluation of organ dysfunction trends improved on integration with SOFA. The analysis uses the publicly available MIMIC-III database. Various experiments are performed to obtain a better prediction model, and the process followed to improve prediction accuracy is described in detail. The final model achieved an accuracy of about 92.34%.

Keywords: sepsis, prediction, data mining, SOFA score, MIMIC-III

1. Introduction

Sepsis is defined as a systemic response to an infection [1]. It may arise from extensive use of broad-spectrum antibiotics, intake of agents that can suppress the immune system, etc. It occurs when barriers to host invasion, such as the skin and the mucous membranes of the respiratory and gastrointestinal systems, fail. According to [2], sepsis may also result from insect bites that inject pathogens into the host body, and from skin burns and breaks. Sepsis has three stages: sepsis, severe sepsis, and septic shock. The severity of the condition increases with each stage. Septic shock occurs when there is a steep fall in blood pressure, which damages other organs such as the heart and the respiratory system. These phenomena are very dangerous and can be fatal. The higher the sepsis severity, the higher the mortality and morbidity. According to [3], around seven hundred and fifty thousand patients in the United States are affected by sepsis and septic shock each year. [4] reports sepsis to be the 10th leading cause of death in the United States, with mortality ranging from twenty to ninety percent during severe sepsis and septic shock. To reduce the effect of sepsis, it should be identified in its initial stages. [2] affirms that during the early stages, when mild sepsis prevails, it can be treated with vaccinations for particular pathogens. If sepsis is not detected in this early phase, however, the chances of mortality are very high and the critically ill patient is admitted to the Intensive Care Unit (ICU) for more vigorous treatment. Such organ dysfunction and urgency of treatment can be prevented with better healthcare facilities.

With increasing advancements in technology, healthcare is also changing for the better. With the advent of computers, sensors, and electronic health record systems, it has become easier for hospitals to store large amounts of data that can be studied and analyzed to predict the condition of a patient at a given point in time. For a critically ill patient affected by sepsis, the data generated in the ICU is vast and complex, and difficult for a human to interpret in real time. Using this kind of data, a wide variety of approaches have been proposed to predict the condition of a critically ill ICU patient; these are discussed in the literature review.

2. Related Work & Motivation

[5] reports that about US$14.6 billion is spent annually to treat sepsis. Systemic Inflammatory Response Syndrome (SIRS) results if sepsis has occurred due to infection. The SIRS criteria are: heart rate ≥ 90 beats/min, temperature either ≥ 38 °C or ≤ 36 °C, respiratory rate (RR) ≥ 20 breaths/min, and white blood cell count (WBC) either ≥ 12,000 or ≤ 4,000 cells/mm3 (or ≥ 10% bands). At least two of the SIRS criteria should be present for a patient to be diagnosed with sepsis. If there is induced hypotension, i.e., systolic blood pressure ≤ 90 mm Hg or 40 mm Hg below baseline, or Mean Arterial Pressure (MAP) ≤ 70 mm Hg, septic shock occurs.

Many methods have been proposed in the literature to predict the mortality of a person in the ICU, so that the patient does not

                                              ISBN: 1-60132-501-0, CSREA Press ©
fall ill due to sepsis. [6] and [7] show that one of the many ways to quantify organ failure is to calculate the Sequential Organ Failure Assessment (SOFA) score, which was introduced in 1994. The SOFA score presents the number of organs that have failed (which is termed "sepsis") and the severity of each organ's failure. It is calculated from the state of the respiratory system, nervous system (Glasgow coma scale), cardiovascular system, liver, renal system, and coagulation.

These values are considered over 24-hour cycles of the ICU stay. The SOFA score assigns a value in the range 0-4 to each of the organ systems mentioned above. To compute it, the condition should be treated as a continuous process rather than an event, i.e., the assessment should be made on a time scale instead of as an abrupt yes/no event. The ranges of the SOFA score expose the anomalies of an organ or system, which makes it suitable for contextual anomaly detection. The higher the SOFA score, the higher the chance of mortality. The initial, highest, and mean SOFA scores correlate well with mortality: initial and highest scores above 11, or mean scores above 5, correspond to mortality of more than 80%.

Apart from SOFA, quick SOFA (qSOFA) was introduced in February 2016. It takes fewer parameters and gives only a rough idea of whether a patient's condition is good or bad. The score uses only blood pressure (0/1), respiration rate (0/1), and the Glasgow coma scale (0/1) [8]. It does not evaluate the parameters continuously but assigns an index of zero or one to each. Since there are only three parameters, the score ranges from 0 to 3. If qSOFA ≥ 2, the mortality rate is very high. qSOFA can be performed in emergencies because it is fast and because, during emergencies, the values of these parameters are usually extreme. It serves as a screening mechanism that prompts the clinician to investigate further for sepsis or to transfer the patient to a higher level of care.

Like SOFA and qSOFA, other scores have been proposed, such as the Modified Early Warning Score (MEWS) [9], Simplified Acute Physiology Score (SAPS II) [10], VitalPAC Early Warning Score (ViEWS) [11], Simple Clinical Score [12], Mortality in Emergency Department Sepsis (MEDS) [13], Acute Physiology and Chronic Health Evaluation II (APACHE II) [14], and Rapid Emergency Medicine Score [15]. [16] proposed a targeted real-time early warning score (TREWScore), which has a specificity of 67% with 85% sensitivity. In September 2016, [17] developed a new machine learning technique named "InSight". Unlike SOFA, qSOFA, and other sepsis prediction scores such as MEWS, this technique shows and analyzes the trends in a patient's organ failure data instead of just classifying them. Instead of using fixed parameters for evaluation, one can include any other parameter if needed. The training and testing process for InSight has four stages: data partitioning, feature construction, classifier training, and classifier testing.

The InSight approach is superior to scores like SOFA and qSOFA. These experiments also show InSight to be relatively resistant to performance loss from reduced measurement availability; it performs better even in input dropout cases. Although it cannot be computed manually like conventional scores, it is a very good alternative in a hospital environment with improving and growing EHR use. InSight is a dynamic, active learning approach that is trained on older data to predict better. This can help healthcare providers assess the sepsis condition almost accurately and minimize potential harm to a patient.

[18] describes a more efficient tool for predicting patient stability, called Auto Triage, developed in January 2017. The tool claims an accuracy of 92.9% with 94.5% specificity, indicating its highly sensitive nature towards the data. The focus of our work is mainly on manipulating the data obtained from the ICU for better prediction. These scoring systems assist the clinician in estimating the damage that has occurred and augment the ability to serve the sepsis patient better. From the literature review, it is evident that machine learning techniques such as TREWScore, InSight, and Auto Triage have improved prediction accuracy considerably, but there has been no significant research on data mining techniques. A data mining study on predicting ICU patient deterioration [19] suggested the features that most affect accurate prediction. With the help of those attributes, further research is carried out here on improving the accuracy of sepsis prediction.

The following sections describe the process followed to improve the accuracy of the model. Section 3 describes the dataset used and the attributes used to predict sepsis. The classification algorithms are described in Section 4. The experiments carried out to achieve a higher prediction accuracy are explained in Section 5. Section 6 discusses the results obtained from these experiments.

3. Data collection & pre-processing

Our research is based on the Medical Information Mart for Intensive Care III (MIMIC-III) database. The contributions of [20], [21] made it freely available, with details of over forty thousand de-identified patients who were treated between 2001 and 2012 in the ICUs of Beth Israel Deaconess Medical Center. This work was done with the help of grants from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) and the National Institutes of Health (NIH). MIMIC-III contains a diverse range of high-temporal-resolution data, including lab results, bedside monitor trends and waveforms, and electronic documentation.
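As a concrete illustration of the SIRS thresholds listed in Section 2, which the gold standard in Section 3.1 also relies on, the check can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the `Vitals` record, its field names, and the two helper functions are hypothetical, and only the thresholds and the "at least two criteria" rule come from the text. The persistent 5-hour window used by the gold standard is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """One set of bedside measurements (hypothetical field names)."""
    heart_rate: float                  # beats/min
    temperature_c: float               # degrees Celsius
    resp_rate: float                   # breaths/min
    wbc: float                         # cells/mm^3
    bands_pct: Optional[float] = None  # band forms, percent (may be missing)


def sirs_count(v: Vitals) -> int:
    """Count how many of the four SIRS criteria quoted in Section 2 are met."""
    criteria = [
        v.heart_rate >= 90,
        v.temperature_c >= 38 or v.temperature_c <= 36,
        v.resp_rate >= 20,
        v.wbc >= 12000
        or v.wbc <= 4000
        or (v.bands_pct is not None and v.bands_pct >= 10),
    ]
    return sum(criteria)


def meets_sirs(v: Vitals) -> bool:
    """At least two of the four criteria must be present."""
    return sirs_count(v) >= 2
```

For example, `meets_sirs(Vitals(heart_rate=95, temperature_c=38.5, resp_rate=18, wbc=9000))` evaluates two criteria as met (heart rate and temperature), so the patient would be flagged under this rule.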


In our work, we mainly concentrate on a few tables: admissions, which has the patient age, primary diagnosis, admission time, and discharge/death time (if any); d_labitems, which has the names and definitions of all laboratory items that can be measured in the ICU; d_items, which has the definitions of items in the ICU databases; and chartevents, which holds the value of each measured item and the time at which the measurement was recorded. With the help of these tables, an approach is built to predict the condition of a sepsis patient. As described in the MIMIC-III documentation, we accessed the database through a PostgreSQL server.

3.1 Gold standard

The studies in [22] are helpful in identifying patients with sepsis using the ICD9 code 995.9. After patients in the retrospective dataset were selected for inclusion, each underwent a binary classification to designate them as positive or negative for having acquired in-hospital sepsis. A patient was classified as positive on meeting both of the following criteria: (1) the patient record contains an ICD9 code (995.9) indicating in-hospital contraction of sepsis; (2) the patient meets the 1991 Systemic Inflammatory Response Syndrome (SIRS) criteria for sepsis for a persistent 5-hour period. The beginning of the patient's first 5-hour SIRS event is defined as the zero hour.

3.2 Attribute Selection

The d_items and d_labitems tables have an exhaustive list of the measurements taken in the ICU, but considering all of these attributes gives lower accuracy. [19] proposed using feature selection, so only a limited set of attributes, called test attributes, is selected by this technique. The feature selection is carried out in Weka, an open-source data mining tool introduced in [23]. Weka is a collection of data mining and machine learning algorithms for preprocessing, classification, and regression analysis of data. It can also be extended with custom-designed algorithms implemented in Java.

[24] helped us with feature selection, for which we use supervised.attribute.InfoGainAttributeEval. This filter is a wrapper for the Weka class that computes the information gain on a class.
   • Attribute Subset Evaluator: InfoGainAttributeEval
   • Search Method: Ranker
   • Evaluation mode: evaluate all training data

The attributes selected for prediction are:
   From d_labitems:
   1) Urea Nitrogen, ItemId = 50851
   2) Creatinine, ItemId = 50912
   3) INR, ItemId = 51237
   4) Partial Thromboplastin Time (PTT), ItemId = 51275
   5) Prothrombin Time (PT), ItemId = 51274
   6) Glucose, ItemId = 51478
   From d_items:
   1) Partial Pressure of Oxygen (PaO2), ItemId = 490
   2) Fraction of Inhaled Oxygen (FiO2), ItemId = 3420
   3) Platelets, ItemId = 225170
   4) Bilirubin, ItemId = 225690
   5) Glasgow coma score Total, ItemId = 19
   6) Heart Rate, ItemId = 211
   7) Mean Arterial Pressure (MAP), ItemId = 438
   8) Blood Pressure, ItemId = 52
   9) Temperature, ItemId = 223761

With the help of the above-defined fifteen attributes and the proposed gold standard, datasets are created, and prediction and validation tests are carried out for various cases using Weka.

4. Classification algorithms

From the collection of algorithms present in Weka, the classification algorithms used in our approach are described below.
   1) ZeroR: [25] shows that this is a simple classification algorithm that does not consider any of its predictors; it only predicts the majority category. Although ZeroR does not give prediction results based on predictors, it serves as a baseline/benchmark against which the efficiency of other classification algorithms is compared.
   2) Naive Bayesian: [25] explains that this classifier is built on Bayes' theorem of probability with the assumption of independent predictors. A Bayesian model is easy to build because of this independence of attributes, which eases the classification of very large datasets.
   3) J48: As studied by [26], this algorithm is a decision tree based on the Iterative Dichotomiser 3 (ID3) algorithm. The decision tree consists of decision nodes and leaves. When the algorithm is applied, the dataset is broken down into smaller and smaller subsets while associations are developed depth-wise. Information gain and entropy are the driving factors of this algorithm: entropy is the degree of heterogeneity in the data, and information gain is the decrease in entropy after a split in the tree.
   4) Random Tree: According to [27], Random Tree's predictions depend on random vectors that are sampled independently. K attributes are chosen at random, several splits are made in the tree based on the selected K attributes, and the best split is chosen.
   5) Sequential minimal optimization (SMO): [28] describes SMO as a way of solving the Support Vector Machine (SVM) training problem that is more efficient than a conventional quadratic programming solver. By default, this algorithm normalizes the data. It takes the SVM outputs and converts them


      into probabilities based on its heuristics and applies a standard sigmoid function to optimize before classifying the data.

5. Experiments

Using the tables from the MIMIC-III dataset, a shortened table is built with the test attributes proposed in Section 3.2. Attributes such as the patient ID, patient age, gender, ICU admission time, ICU discharge time (or death time if the patient died), and the status of the patient (dead/alive) are also incorporated into the final shortened table. Several experiments are conducted by building the datasets in various ways. The manipulations of the dataset presented below are carried out using PostgreSQL and Python.

Experiment 1:
The first dataset was built by taking the average of all the test results of each distinct attribute from d_items and d_labitems for each patient, i.e., each patient comprises one row with the patient details mentioned above and the attributes taken from d_items and d_labitems, as shown in Table 1. If an attribute is not present in the patient's record, it is set to zero. If the patient is dead, the "Dead?" column is set to 1; if alive, it is set to 0.

Experiment 2:
Instead of taking the entire set of attributes from the MIMIC-III dataset, only the test attributes mentioned in Section 3.2 are used to build the dataset. Cutting down the features employed in training an algorithm is called dimensionality reduction; [29] states that this technique improves accuracy greatly. Each patient still has a single row as in Table 1, but with only 10 test attributes.

Experiment 3:
Instead of taking a single average for each attribute, the dataset in this experiment was built by binning the test attributes and averaging them for every hour from the admission time to the discharge/death time. In this dataset, a patient therefore has as many rows as the number of hours spent in the hospital.

Experiment 4:
Another kind of dataset is constructed by binning the test attributes for each hour when the difference between the attribute values at t_i and t_{i+1} is negligible, but averaging them over a 5-minute span when the difference between the values at t_i and t_{i+1} is significant and the change affects the patient's health. This experiment tests the hypothesis that the model might understand the data and predict better when such significant changes are considered.

Experiment 5:
In this experiment, we use the SOFA score analysis attributes for the dataset construction. The attributes are:
   1) Partial pressure of Oxygen / Fraction of Inhaled Oxygen (PaO2 / FiO2)
   2) Platelet Count
   3) Bilirubin
   4) Glasgow coma scale
   5) Creatinine
   6) Hypotension
All of the above attributes except hypotension are available in MIMIC-III; hypotension is calculated from Mean Arterial Pressure (MAP). If MAP is greater than 1 and the person is on vasopressors, then hypotension is calculated from Dopamine, Dobutamine, Epinephrine, and Norepinephrine [30]. The dataset is built by assigning SOFA scores to the above-mentioned attributes based on an hourly binned average, so all the columns are filled with SOFA scores instead of the actual measured values.

Experiment 6:
This experiment applies SOFA score analysis to the type of dataset used in experiment 2. SOFA score attributes are added to the experiment 2 dataset, which holds the averages of the fifteen test attributes. Some of the SOFA score attributes already exist among the test attributes. All attributes that can be given SOFA scores are assigned those scores, and the remaining test attributes keep their regular values. In this dataset, the SOFA scores are assigned to attributes of the averaged data, which has no hourly or 5-minute bins.

Experiment 7:
In this experiment, SOFA scores are applied to the experiment 4 dataset in the same fashion as in experiment 6, but this dataset has hourly and 5-minute bins. The SOFA score column is also added as an additional attribute alongside the other attributes. In this kind of dataset, we can clearly see how SOFA scores vary just before a patient's death due to sepsis.

6. Results

The datasets constructed in the above experiments are evaluated in Weka using the classifiers described in Section 4. The accuracy and other metrics of each experiment for each classifier are presented in Table 2, Figure 1, Figure 2, and Figure 3. All seven experiments are carried out with 10-fold cross-validation and the default settings of each classifier.

For experiment 1, the highest accuracy obtained is only 73.53%, from the Random Tree algorithm. Naive Bayes classification fell even below the baseline. Due to the consideration of all the existing features from both d_items and d_labitems,


Table 1: Sample dataset
      Patient ID   Avg(Heart rate)   Avg(Attribute 2)   Avg(Attribute 3)   ...   Avg(Attribute 1000)   Dead?
      10185        81                10                 1.2                ...   0                     1
      10185        78                11                 1.9                ...   0                     1
      10124        73                15                 0.5                ...   0.4                   0

Table 2: Experiment wise accuracy (%)
      Algorithm     Experiment 1   Experiment 2   Experiment 3   Experiment 4   Experiment 5   Experiment 6   Experiment 7
      ZeroR         71.36          71.36          74.85          73.43          74.85          71.36          73.43
      J48           72.31          76.78          81.67          77.86          87.60          77.18          92.34
      Random Tree   73.53          76.53          77.43          76.40          79.17          75.63          87.95
      SMO           71.95          72.12          73.59          75.79          81.52          74.68          82.59
      Naive Bayes   56.93          67.91          75.39          75.14          82.19          70.12          86.13
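
As a reading aid for Table 2, the best classifier in each experiment and its margin over the ZeroR baseline can be computed directly from the tabulated accuracies. The following is a small illustrative Python sketch, not part of the original study: the numbers are copied from Table 2, the last column is read as Experiment 7 (an assumption, since seven experiments are described), and the names `ACCURACY` and `best_per_experiment` are hypothetical.

```python
# Accuracy (%) per classifier, copied from Table 2.
# List positions correspond to Experiments 1..7; treating the last
# column as Experiment 7 is an assumption made for this sketch.
ACCURACY = {
    "ZeroR":       [71.36, 71.36, 74.85, 73.43, 74.85, 71.36, 73.43],
    "J48":         [72.31, 76.78, 81.67, 77.86, 87.60, 77.18, 92.34],
    "Random Tree": [73.53, 76.53, 77.43, 76.40, 79.17, 75.63, 87.95],
    "SMO":         [71.95, 72.12, 73.59, 75.79, 81.52, 74.68, 82.59],
    "Naive Bayes": [56.93, 67.91, 75.39, 75.14, 82.19, 70.12, 86.13],
}


def best_per_experiment(acc):
    """Return {experiment number: (classifier, accuracy, gain over ZeroR)}."""
    n_experiments = len(acc["ZeroR"])
    result = {}
    for i in range(n_experiments):
        # Classifier with the highest accuracy in column i.
        name = max(acc, key=lambda k: acc[k][i])
        gain = round(acc[name][i] - acc["ZeroR"][i], 2)
        result[i + 1] = (name, acc[name][i], gain)
    return result


summary = best_per_experiment(ACCURACY)
for exp, (name, value, gain) in summary.items():
    print(f"Experiment {exp}: {name} {value:.2f}% (+{gain:.2f} over ZeroR)")
```

Comparing against ZeroR rather than against zero matters here because the baseline itself changes between datasets (71.36% to 74.85%), so raw accuracies from different experiments are not directly comparable.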

                                                  Fig. 1: Experiment wise accuracy

                                                  Fig. 2: Experiment wise precision


      the classifier algorithms couldn’t predict the status of a            the usage of this model. This higher accuracy is obtained
      patient based on all the available attributes and hence the          by employing methods like dimensionality reduction, data
      accuracy is low. This agrees with the results of [31],             binning with respect to time, integrating SOFA score. [19]
      which show that Random Trees handle larger data better.            studies show that accuracy achieved by them using similar
   For experiment 2, accuracy improves slightly to 76.78%, obtained with the J48 decision tree algorithm. The improvement is due to the smaller number of features used for prediction, which reduces the number of classifications to be made compared with the previous experiment, where a much larger number of attributes required far more classifications.

   The prediction accuracy for experiment 3 improved because of the hourly binning of the data, which lets the system interpret each attribute over one-hour time spans and classify the data more effectively. The highest accuracy, 81.67%, is obtained with the J48 classifier. For experiment 4, we expected the accuracy to improve because the algorithm could learn from more varied instances. On the contrary, the accuracy decreased: the highest accuracy obtained in experiment 4 is 77.86%, with the J48 classifier. The decrease shows that the system was unable to learn patterns that yield better classification when the time intervals are unequal (i.e., some data covers a one-hour time span and some covers a 5-minute time span).

   With the dataset in experiment 5, the accuracy improved significantly. When the SOFA score is integrated with the hourly average dataset for certain attributes only, the J48 algorithm achieves an accuracy of 87.60%. The accuracy improved because it is easier for the algorithm to classify values of 0, 1, 2, 3, and 4 than numerous real values. The accuracy of experiment 6 fell below that of experiment 5: its highest accuracy is 77.18%, which is lower than the highest accuracy of experiment 5 but higher than that of experiment 2. Experiment 2 has no SOFA score analysis and so has lower accuracy, whereas experiment 5 has both hourly bins and SOFA score analysis and therefore has higher accuracy. So, when this model is used to predict a patient's outcome (dead or alive), it can give a prediction that is about 87 percent accurate.

   The model trained on the experiment 7 dataset performed best: its accuracy improved significantly to 92.34%, again with the J48 algorithm. This was somewhat unexpected, since the accuracy decreased in experiment 4. The improvement has the same cause as in experiment 5: the algorithm classifies values in the range 0 to 4.

7. Conclusion and future work

   In this paper, an improved model of sepsis prediction using data mining is presented. All the experiments conducted and the accuracy of each are discussed. The accuracy of 92.34% can be obtained on

classifier algorithms in Weka on the MIMIC II dataset is 77.58%. Our result can be considered an improvement over that study.

   Future work includes improving the accuracy of the early prediction of sepsis by developing a model that combines data mining with machine learning. Data mining helps in pre-processing the data, while machine learning techniques such as neural networks help in advance prediction. The final goal is to develop a Sepsis Assessment Module. This module should assist healthcare providers in identifying patients at risk of sepsis, providing a potential mechanism for earlier therapeutic interventions that may avoid or mitigate patient harm.

References

 [1] Roger C Bone, Robert A Balk, Frank B Cerra, R Phillip Dellinger, Alan M Fein, William A Knaus, Roland MH Schein, and William J Sibbald. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Chest, 101(6):1644–1655, 1992.
 [2] Andrew Lever and Iain Mackenzie. Sepsis: definition, epidemiology, and diagnosis. Bmj, 335(7625):879–883, 2007.
 [3] Lioudmila V Karnatovskaia and Emir Festic. Sepsis: A review for the neurohospitalist. The Neurohospitalist, 2(4):144–153, 2012.
 [4] Mani Chopra and Avadhesh C Sharma. Distinct cardiodynamic and molecular characteristics during early and late stages of sepsis-induced myocardial dysfunction. Life sciences, 81(4):306–316, 2007.
 [5] Eren Gultepe, Jeffrey P Green, Hien Nguyen, Jason Adams, Timothy Albertson, and Ilias Tagkopoulos. From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. Journal of the American Medical Informatics Association, 21(2):315–325, 2013.
 [6] Alan E Jones, Stephen Trzeciak, and Jeffrey A Kline. The sequential organ failure assessment score for predicting outcome in patients with severe sepsis and evidence of hypoperfusion at the time of emergency department presentation. Critical care medicine, 37(5):1649, 2009.
 [7] J-L Vincent, Rui Moreno, Jukka Takala, Sheila Willatts, Arnaldo De Mendonça, Hajo Bruining, CK Reinhart, Peter M Suter, and LG Thijs. The sofa (sepsis-related organ failure assessment) score to describe organ dysfunction/failure, 1996.
 [8] Eamon P Raith, Andrew A Udy, Michael Bailey, Steven McGloughlin, Christopher MacIsaac, Rinaldo Bellomo, and David V Pilcher. Prognostic accuracy of the sofa score, sirs criteria, and qsofa score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. Jama, 317(3):290–300, 2017.
 [9] VC Burch, G Tarr, and C Morroni. Modified early warning score predicts the need for hospital admission and inhospital mortality. Emergency Medicine Journal, 25(10):674–678, 2008.
[10] Jean-Roger Le Gall, Stanley Lemeshow, and Fabienne Saulnier. A new simplified acute physiology score (saps ii) based on a european/north american multicenter study. Jama, 270(24):2957–2963, 1993.
[11] David R Prytherch, Gary B Smith, Paul E Schmidt, and Peter I Featherstone. Views – towards a national early warning score for detecting adult inpatient deterioration. Resuscitation, 81(8):932–937, 2010.
[12] J Kellett and B Deane. The simple clinical score predicts mortality for 30 days after admission to an acute medical unit. Journal of the Association of Physicians, 99(11):771–781, 2006.


Fig. 3: Experiment-wise recall

[13] Nathan I Shapiro, Richard E Wolfe, Richard B Moore, Eric Smith, Elizabeth Burdick, and David W Bates. Mortality in emergency department sepsis (meds) score: a prospectively derived and validated clinical prediction rule. Critical care medicine, 31(3):670–675, 2003.
[14] William A Knaus, Jack E Zimmerman, Douglas P Wagner, Elizabeth A Draper, and Diane E Lawrence. Apache-acute physiology and chronic health evaluation: a physiologically based classification system. Critical care medicine, 9(8):591–597, 1981.
[15] Thomas Olsson and Lars Lind. Comparison of the rapid emergency medicine score and apache ii in nonsurgical emergency department patients. Academic emergency medicine, 10(10):1040–1048, 2003.
[16] Katharine E Henry, David N Hager, Peter J Pronovost, and Suchi Saria. A targeted realtime early warning score (trewscore) for septic shock. Science translational medicine, 7(299):299ra122–299ra122, 2015.
[17] Thomas Desautels, Jacob Calvert, Jana Hoffman, Melissa Jay, Yaniv Kerem, Lisa Shieh, David Shimabukuro, Uli Chettipally, Mitchell D Feldman, Chris Barton, et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR medical informatics, 4(3), 2016.
[18] Jacob S Calvert, Daniel A Price, Christopher W Barton, Uli K Chettipally, and Ritankar Das. Discharge recommendation based on a novel technique of homeostatic analysis. Journal of the American Medical Informatics Association, 24(1):24–29, 2016.
[19] Noura Alnuaimi, Mohammad M Masud, and Farhan Mohammed. Icu patient deterioration prediction: a data-mining approach. arXiv preprint arXiv:1511.06910, 2015.
[20] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database. Scientific data, 3:160035, 2016.
[21] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet. Circulation, 101(23):e215–e220, 2000.
[22] C Bouza, T Lopez-Cuadrado, and JM Amate-Blanco. Use of explicit icd9-cm codes to identify adult severe sepsis: impacts on epidemiological estimates. Critical Care, 20(1):313, 2016.
[23] Geoffrey Holmes, Andrew Donkin, and Ian H Witten. Weka: A machine learning workbench. In Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, pages 357–361. IEEE, 1994.
[24] Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th international conference on machine learning (ICML-03), pages 856–863, 2003.
[25] Pedro Domingos and Michael Pazzani. On the optimality of the simple bayesian classifier under zero-one loss. Machine learning, 29(2-3):103–130, 1997.
[26] Neeraj Bhargava, Girja Sharma, Ritu Bhargava, and Manish Mathuria. Decision tree analysis on j48 algorithm for data mining. Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering, 3(6), 2013.
[27] Chris Thornton, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 847–855. ACM, 2013.
[28] Lei Chen, Chen Chu, and Kaiyan Feng. Predicting the types of metabolic pathway of compounds using molecular fragments and sequential minimal optimization. Combinatorial Chemistry High Throughput Screening, 19(2):136–143, 2016.
[29] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Application of dimensionality reduction in recommender system-a case study. Technical report, Minnesota Univ Minneapolis Dept of Computer Science, 2000.
[30] Flavio Lopes Ferreira, Daliana Peres Bota, Annette Bross, Christian Mélot, and Jean-Louis Vincent. Serial evaluation of the sofa score to predict outcome in critically ill patients. Jama, 286(14):1754–1758, 2001.
[31] João Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, and Alexandre de Mendonça. Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC research notes, 4(1):299, 2011.
