Magnificent beasts of the Milky Way: Hunting down stars with unusual infrared properties using supervised machine learning

Julia Ahlvind¹

Supervisor: Erik Zackrisson¹
Subject reader: Eric Stempels¹
Examiner: Andreas Korn¹

Degree project E in Physics – Astronomy, 30 ECTS
¹ Department of Physics and Astronomy – Uppsala University

June 22, 2021
Contents

1 Background
  1.1 Introduction

2 Theory: Machine Learning
  2.1 Supervised machine learning
  2.2 Classification
  2.3 Various models
      2.3.1 k-nearest neighbour (kNN)
      2.3.2 Decision tree
      2.3.3 Support Vector Machine (SVM)
      2.3.4 Discriminant analysis
      2.3.5 Ensemble
  2.4 Hyperparameter tuning
  2.5 Evaluation
      2.5.1 Confusion matrix
      2.5.2 Precision and classification accuracy

3 Theory: Astronomy
  3.1 Dyson spheres
  3.2 Dust-enshrouded stars
  3.3 Gray Dust
  3.4 M-dwarf
  3.5 post-AGB stars

4 Data and program
  4.1 Gaia
  4.2 AllWISE
  4.3 2MASS
  4.4 MATLAB
      4.4.1 Decision trees
      4.4.2 Discriminant analysis
      4.4.3 Support Vector Machine
      4.4.4 k-nearest neighbour
      4.4.5 Ensemble

5 The general method
  5.1 Forming datasets and building DS models
      5.1.1 Training set stars
      5.1.2 Training set Dyson spheres
      5.1.3 Training set YSOs
  5.2 Finding and identifying the DS candidates
      5.2.1 Manual analysis of DS candidates
  5.3 Testing set
  5.4 Best fitted model

6 Process
  6.1 Limiting DS magnitudes
  6.2 Introducing a third class
  6.3 Coordinate dependence
  6.4 Malmquist bias
  6.5 cc flag
  6.6 Feature selection
  6.7 Proportions of the training sets

7 Result
  7.1 Frequent sources of reference
      7.1.1 Marton et al. (2016)
      7.1.2 Marton et al. (2019)
      7.1.3 Stassun et al. (2018)
      7.1.4 Stassun et al. (2019)
  7.2 A selection of intriguing targets
      7.2.1 J18212449-2536350 (Nr.1)
      7.2.2 J04243606+1310150 (Nr.8)
      7.2.3 J18242978-2946492 (Nr.10)
      7.2.4 J18170389+6433549 (Nr.22)
      7.2.5 J14492607-6515421 (Nr.26)
      7.2.6 J06110354-4711294 (Nr.30)
      7.2.7 J05261975-0623574 (Nr.37)
      7.2.8 J21173917+6855097 (Nr.46)
  7.3 Summary of the results

8 Discussion
  8.1 Evaluation of the approach
      8.1.1 The influence of various algorithms
      8.1.2 Training sets
  8.2 Challenges
  8.3 Follow-up observations
  8.4 Grid search
  8.5 Future prospects and improvements

9 Conclusion

A Appendix
  A.1 Candidates
  A.2 Uncertainty derivations
  A.3 Results of various models
  A.4 Algorithm
      A.4.1 linear SVM
      A.4.2 quadratic SVM
Abstract

The significant increase of astronomical data necessitates new strategies and tools to analyse large amounts of information, which is no longer efficient to do by hand. Supervised machine learning is one such modern strategy. In this work, we apply the classification technique to Gaia+2MASS+WISE data to explore the use of supervised machine learning on large astronomical archives. The idea is to create an algorithm that recognises entries with unusual infrared properties which could be interesting for follow-up observations. The programming is executed in MATLAB and the training of the algorithms in the classification learner application of MATLAB. The Gaia, 2MASS and WISE catalogues contain ~10⁹, 5×10⁸ and 7×10⁸ entries, respectively (The European Space Agency 2019, Skrutskie et al. 2006, R. M. Cutri IPAC/Caltech). The algorithms search through a sample from these archives consisting of 765266 entries, corresponding to objects within a range of < 500 pc. The project resulted in a list of 57 entries with unusual infrared properties, out of which 8 targets showed none of the four common features that provide a natural physical explanation for the unconventional energy distribution. After more comprehensive studies of the aforementioned targets, we deem further studies and observations necessary for 2 of the 8 targets (Nr.1 and Nr.8 in table 3) in order to establish their true nature. The results demonstrate the applicability of machine learning in astronomy, as well as suggesting a sample of intriguing targets for further studies.
Sammanfattning

In astronomy, large amounts of data are collected continuously and the growth rate increases every year. Manual analysis of the data therefore becomes less and less feasible, and new strategies and methods are instead required with which large datasets can be analysed more quickly. One example of such a strategy is supervised machine learning. In this work we use a supervised machine learning technique called classification. We apply the classification technique to data from the three large astronomical catalogues Gaia+2MASS+WISE in order to investigate the use of this technique on large astronomical archives. The idea is to create an algorithm that identifies objects with unconventional infrared properties which may be interesting for further observations and analysis. These unusual objects are expected to have lower emission in the optical wavelength range and higher emission in the infrared than is normally observed for a star. The programming is carried out in MATLAB and the training of the algorithms in MATLAB's classification learner application. The algorithms search through a dataset consisting of 765266 objects from the Gaia+2MASS+WISE catalogues. In total, these catalogues contain ~10⁹, 5×10⁸ and 7×10⁸ objects each (The European Space Agency 2019, Skrutskie et al. 2006, R. M. Cutri IPAC/Caltech). The limited dataset that the algorithms search through corresponds to objects within a radius of < 500 pc. Many of the objects that the algorithms identified as "unusual" turn out to be nebulous objects, where the natural explanation for the infrared excess is the surrounding dust, which gives rise to thermal radiation in the infrared. To eliminate this type of object and focus the search on more unconventional objects, the programs were modified. One of the main changes was to introduce a third class consisting of dust-enshrouded stars, which we call the "YSO" class. A further change that improved the results was to introduce the coordinates in the training as well as in the final classification and thereby in the identification of interesting candidates. These adjustments reduced the fraction of nebulous objects in the class of "unusual" objects identified by the algorithms. The project resulted in a list of 57 objects with unusual infrared properties. 8 of these objects showed none of the four commonly occurring features that can give a natural explanation for their excess of infrared radiation. These features are: nebulous surroundings or detected dust, variability, Hα emission, or maser emission. After further investigation of the 8 aforementioned objects, we conclude that 2 of them require further observations and analysis in order to establish their true nature (Nr.1 and Nr.8 in table 3); the infrared radiation of these 2 objects is thus not easily explained. The interesting objects, together with the other machine learning results, show that the classification technique of machine learning is useful for large astronomical datasets.
List of Acronyms
Machine learning
ML – Machine learning.
model – Another word for the machine learning system or algorithm that is trained on the training data.
training data - A dataset that consists of both input and output data, which is used in the training process
of the models.
predictions – The output of a model after it has been trained on training data.
input data - The data that is fed to the model, i.e. the first part of the training data.
labeled data - Another word for the output data, i.e. the second part of the training data.
test data - A dataset that is used to test the performance of the model and that is different from the training
set.
training a model - A process where the algorithms are provided training data, where it learns to recognise
patterns.
hyperparameter - A parameter whose value is used to control the learning process, thus, adjusting the model
in more detail.
classifier – A method for solving classification problems.
kNN – k-Nearest Neighbour
SVM – Support Vector Machine
classes/labels - The output of a classifier.
parametric – For parametric models, the training data is used to initially establish parameters. However once
the model has been trained, the training data can be discarded, since it is not explicitly used when making
predictions.
loss function – A function that measures how well the model’s predicted output fits the true output data.
TP – True positive.
FN – False negative.
FP – False positive.
TN – True negative.

Astronomy
SED – Spectral Energy Distribution.
YSO – Young Stellar Object.
PMS – Pre-main-sequence
MS – Main sequence
IR – Infrared
FIR – Far infrared
NIR – Near infrared
UV - Ultraviolet
DS – Dyson sphere

1 Background

1.1 Introduction

Progress in astronomy has been driven by a number of important revolutions, such as the telescope, photography, spectroscopy, electronics and computers, which have resulted in a major burst of data. Astronomers are now facing new challenges in handling the large quantities of astronomical data that accumulate every day. Large astronomical catalogues such as Gaia, with its library of ~1.6 billion stars, hold key information on stellar evolution, galaxy formation processes, the distribution of stars and much more. It is crucial for the progression of astronomical research not only to access this data but also to extract the relevant and important information. It is unrealistic to assume that all data will be manually analysed. Therefore, we are entering an era where we put more confidence in computers and machine learning (ML) processes to do the work at hand.

The astronomical data avalanche is not the sole cause of the increasing usage of ML. The technique is becoming more and more popular and important in numerous fields. Moreover, the growing interest is not isolated to astronomical research, but extends to other scientific research, financial computations, medical assistance and applications in our mobile phones. There are many subdivisions of machine learning that can be used in various fields. For the purposes of this work, we will adopt supervised machine learning (SML). This ML technique exploits the fact that the training data is labelled, so that the predictions made during training can be evaluated. This is in contrast to unsupervised machine learning, which trains models on unclassified training data. Simply put, the difference between unsupervised and supervised ML is that the training process is further supervised for the latter technique.

Supervised ML can be further divided into subgroups, namely regression and classification techniques. The technicalities of the latter are discussed in section 2.2, and it is the technique used within this work. The classifier method will be adapted in order to categorise stars with and without unusual infrared properties. Several types of stars are expected to have these kinds of properties, some of them being stars surrounded by debris disks and dust-enshrouded stars such as young stellar objects (YSOs) or protostars. Further intriguing objects with such infrared properties are the hypothetical Dyson spheres (DS). A Dyson sphere is an artificial mega-structure which encloses a star with the purpose of harvesting the stellar light. By narrowing down the search to the latter group of alluring objects, we hope to discover intriguing targets that are not directly explained by previously observed astronomical phenomena.

With this work, we aim to exploit the use of machine learning and thereby test its applicability on large astronomical data. The procedure is to create a sample of models of typical targets that contain these unusual infrared properties, as well as a sample of common stars. The unconventional objects, such as the ones that fit the Dyson sphere models, are limited to near-solar magnitudes within this study. Thereafter, we apply ML to train algorithm(s) to recognise and categorise stars with uncommon infrared properties from the Gaia+WISE+2MASS catalogues. The Gaia archive is a grand ESA (European Space Agency) catalogue comprising brightness measurements and positions of more than 1.6 billion stars with remarkable accuracy. The brightness is measured in the three passbands GBP (BP), G and GRP (RP), which cover wavelengths from the optical (~300 nm) to the near-infrared (NIR, ~1.1 µm). The 2MASS (Two Micron All Sky Survey) catalogue contains, besides astrometric data, ground-based photometric measurements in the NIR regime (~1.1-2.4 µm), with around 470 million sources. Complementary to the optical-NIR measurements, WISE (Wide-field Infrared Survey Explorer) photometry is recovered in four mid-infrared (mid-IR) bandpasses (~2.5-28 µm). The AllWISE source catalogue contains ~7×10⁸ entries (R. M. Cutri IPAC/Caltech). Note that this catalogue is not a point source catalogue, meaning that it contains some resolved sources such as galaxies and filaments in galactic nebulosity. Using these three archives enables the broader wavelength range necessary for our study. The project is expected to result in a list of intriguing objects that are suitable for follow-up observations. Questions that will be considered are: are there any objects with properties consistent with those expected for Dyson spheres in our night sky? If so, is it possible to determine their nature based on existing data outside the Gaia+WISE+2MASS catalogues?

In this report we include the relevant background theory of machine learning and astronomy in sections 2 and 3, respectively. In section 4, we review the databases used in the ML part of the work and the programs and applications exploited (section 4.4), and in section 5 the employed programming methods and scripts. Further on, in section 6, we discuss the process and the implementation steps taken throughout the work to improve the programs. The main results are appraised in section 7, together with a selection of the most promising candidates. Thereafter we discuss the overall progress, the results, the application of ML in astronomy, and future prospects and improvements in section 8. Finally, we summarise the conclusions of this work in section 9.

2 Theory: Machine Learning

Machine learning is the scientific area involving the study of computer algorithms that learn and improve from past experience. A ML algorithm or model is constructed on sample data, referred to as
training data, with the goal of predicting and assessing an outcome without being trained for that specific task. There are two main subgroups of ML which are used in various situations: unsupervised learning and supervised learning. In the former technique, the algorithm is trained to identify intrinsic patterns within the input training data without any knowledge of labeled responses (output data). A commonly used technique of unsupervised ML is clustering. It is used to find hidden patterns and groupings in data which have not been categorised or labelled. Instead of relying on labeled data, the algorithm looks for commonalities in the training data and categorises entries according to the presence or absence of these properties. The latter subgroup, supervised ML, is used within this work and presented in the following section.

2.1 Supervised machine learning

Supervised machine learning techniques utilise the whole training data, which contains some input data, output data and the relationship between the two groups. By using a model that has previously been adapted to the training data, one can predict the outcome, i.e. the output data, from a new set of data that is different from the training data (Lindholm et al. 2020). The process of adapting the model to the training data is referred to as training the model. Supervised machine learning is particularly useful for cases where the relationship between the input data x and the output data y is not explicit. For instance, the relationship may be too complicated or even unknown from the training data. Thus, the problem can not be solved with a traditional computer program that takes the input x and returns the output y based on some common set of rules. Supervised machine learning instead approaches the problem by learning the relationship between x and y from the training data.

As previously mentioned, which ML approach to use depends on the data and on what you want to achieve. SML is suitable for regression and classification tasks, since the algorithm needs training in order to make predictions on unseen data. Regression techniques predict continuous responses, e.g. changes in the flux of a star. In contrast, classification techniques predict discrete responses, e.g. what type of star it is. The latter approach will be used in this work and is more thoroughly described in the following section.

2.2 Classification

The classification method categorises the output into classes and can thus take a finite number M of values. In the simplest case, M = 2, which is referred to as binary classification; if M > 2 we instead call it multi-class classification. For the binary case, the labelled responses are denoted -1 and 1, instead of 1 and 2. Therefore, the classes of the outcome can be referred to as the positive and negative classes.

2.3 Various models

There are many different models or algorithms that can be used in machine learning. Similar to the choice between supervised and unsupervised machine learning, different models are suitable for different purposes. Furthermore, the classification model, which depends on the chosen machine learning algorithm, can be tuned further by changing the values of the hyperparameters (see section 2.4). Some of the models commonly used for classification, as described in the former section, are discussed below. Note, however, that these methods are also applicable to other machine learning techniques besides classification. The algorithms reviewed in the following sections are also the ones adopted in this work. By trial and error we apply the various algorithms to identify which algorithm(s) are most suited for this classification problem.

2.3.1 k-nearest neighbour (kNN)

The k-nearest neighbour method for supervised machine learning is, as many other SML models, based on the predictions of its neighbours. If an input data point x_i is close to a training data point x_t, then the prediction of the input data y_i(x_i) should be close to the prediction of the training set y_t(x_t). For example, if k = 1, then the object is assigned the class of its single nearest neighbour. For higher values of k, the data point is classified by assigning the label that is most prevalent among its k nearest neighbours. The kNN algorithm works in various metric spaces. A common choice is the Euclidean metric, where the distance between two data points (x_j, x_i) in n dimensions follows eq. 1 (Kataria & Singh 2013). Other examples of metrics that can be used are the Chebyshev or Minkowski metrics.

    D(x_j, x_i) = \lVert x_j - x_i \rVert = \sqrt{\sum_{k=1}^{n} (x_{i,k} - x_{j,k})^2}    (1)

As with all ML models, the parametric choices vary between problems. There is no optimal value of k which is profitable for the majority of cases, and few guidelines exist. One guideline, however, is that for a binary (two-class) classification it is beneficial to set k equal to an odd number, since this avoids tied classifications (Hall et al. 2008). But in truth, one has to test the options on a training set by trial and error to see what is suitable for the particular classification. In general, a large value of k reduces the effect of noise on the classification, but at the same time makes the boundaries between classes less distinct, thus risking overfitting (the model has adapted too much to the training data and will not be able to generalise well to new data).
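To make the kNN procedure concrete, the following MATLAB sketch classifies one query point both by computing the Euclidean distances of eq. 1 explicitly and by using the toolbox function fitcknn. It is a minimal illustration on synthetic data; the variable names (X, Y, xQuery) and the value k = 5 are placeholders and are not taken from the code used in this project.

    % Minimal kNN illustration (requires the Statistics and Machine Learning Toolbox).
    % X: n-by-2 training features, Y: n-by-1 labels (-1 = common, +1 = unusual),
    % xQuery: the point to classify. All names and values are illustrative only.
    rng(1);
    X = [randn(50,2); randn(50,2) + 3];     % two synthetic, well-separated clusters
    Y = [-ones(50,1); ones(50,1)];
    xQuery = [2.5 2.5];

    % Euclidean distances, written out as in eq. 1:
    d = sqrt(sum((X - xQuery).^2, 2));
    [~, idx] = mink(d, 5);                  % the k = 5 nearest neighbours
    predManual = mode(Y(idx));              % majority vote among their labels

    % The same classification with MATLAB's built-in kNN model:
    mdl = fitcknn(X, Y, 'NumNeighbors', 5, 'Distance', 'euclidean');
    predModel = predict(mdl, xQuery);

Replacing 'euclidean' with 'chebychev' or 'minkowski' switches to the other metrics mentioned above.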

2.3.2 Decision tree

Another method commonly used in machine learning is decision trees, or more specifically classification trees if the problem regards classification. The leaves in a classification tree represent the class labels, and the branches the combinations of features which are used in order to select the class. More specifically, the input variable (previously referred to as x_i) is known as the root node, the last nodes of the tree are known as leaf nodes or terminal nodes, and the intermediate nodes are called internal nodes.

The nodes of a decision tree are chosen by looking for the optimum split of the features. One function that measures the quality of a split is the Gini index. This function selects the optimal separation as the one for which the data is split into groups where one class dominates, minimising the impurity of the two children nodes (the nodes following a split of a parent node). The Gini impurity thus measures the frequency at which any element of the dataset would be miscategorised when labelled, and follows the formula seen in eq. 2. Here π̂_lm is the proportion of the training observations in the l-th region that belong to the m-th class, according to eq. 3. In eq. 3, n_l is the number of training data points in node l and y_i is the i-th label of the training data point (x_i, y_i). Therefore, π̂_lm is essentially the probability of an element y_i belonging to the class m (Lindholm et al. 2020).

    Q_l = \sum_{m=1}^{M} \hat{\pi}_{lm} (1 - \hat{\pi}_{lm}) = 1 - \sum_{m=1}^{M} \hat{\pi}_{lm}^2    (2)

    \hat{\pi}_{lm} = \frac{1}{n_l} \sum_{i} \mathbf{1}\{y_i = m\}    (3)

Trees can be sensitive to small changes in the training data, which can result in large changes in the tree and hence in the final outcome of the classification. Therefore, the number of splits in classification trees must be treated carefully in order to prevent overfitting, i.e. a tree that does not generalise well beyond the training data. In the use of decision trees, one can further improve the precision of the predictions by utilising so-called pruning. Pruning is essentially the inverse of splitting an internal node: a sub-node or internal node is removed after the tree has been trained on the data.
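As a concrete illustration of eq. 2, the following MATLAB sketch evaluates the Gini impurity of the two children nodes produced by one candidate split, together with their weighted sum, which a tree-growing algorithm would seek to minimise. The small label vectors are invented for the example, and the commented fitctree call only indicates where the equivalent toolbox functionality lives; X and Y there are placeholders.

    % Gini impurity of a candidate split (labels invented for illustration).
    % Suppose a split sends these class labels to the left and right child nodes:
    yLeft  = [1 1 1 2];        % mostly class 1
    yRight = [2 2 2 2 1];      % mostly class 2

    gini = @(y) 1 - sum((histcounts(categorical(y)) / numel(y)).^2);   % eq. 2
    QLeft  = gini(yLeft);      % 1 - (3/4)^2 - (1/4)^2 = 0.375
    QRight = gini(yRight);     % 1 - (4/5)^2 - (1/5)^2 = 0.32

    % Weighted impurity of the split; lower values indicate a better split:
    Qsplit = (numel(yLeft)*QLeft + numel(yRight)*QRight) / (numel(yLeft) + numel(yRight));

    % The toolbox grows a full classification tree with Gini splits ('gdi'):
    % mdl = fitctree(X, Y, 'SplitCriterion', 'gdi', 'MaxNumSplits', 20);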
2.3.3 Support Vector Machine (SVM)

Support vector machine is yet a further ML technique well suited for classification, but it can also be used in regression problems. In its most simple form, SVM does not support multiclass classification; however, the problem can be split into multiple binary classification problems and thus be solved. The primary objective of this technique is to project nonlinearly separable n-dimensional data samples onto a higher dimensional space with the use of various kernel functions, such that they can be separated by an n-dimensional hyperplane. This is referred to as a linear classifier, but higher orders such as quadratic, cubic and Gaussian also exist. The choice of the hyperplane in the dimensional space is important for the precision and accuracy of the model.

One common choice is the maximum-margin hyperplane, where the distance between the two closest data points of each group (the support vectors) is maximised and no misclassification occurs. The hyperplane, or the threshold which separates the two classes, will thus reside in between the two closest points, where the distance between the threshold and a data point is called the margin (Hastie et al. 2009). In two dimensions, one can visualise the data as two groups on the xy-plane, where one can draw two parallel lines representing the boundary of each group, orthogonal to the shortest distance between the data points of the different groups (see figure 1). For this case, the hyperplane is also represented as a straight line positioned in between the two boundary lines. A general hyperplane can be written as the set of points x satisfying eq. 4, where θ is the normal vector to the hyperplane and y the classification label following y ∈ {−1, +1}. The boundary lines (hyperplanes in higher dimensions) thus represent the positions where anything on or above eq. 5a is of the class with label 1, and anything on or below eq. 5c is of class -1. The distance between these two boundary hyperplanes in the θ-direction is 2/||θ|| (see figure 1), and to maximise this distance, ||θ|| needs to be minimised (Lindholm et al. 2020).

[Figure 1: An illustration of a two-dimensional support vector machine with the corresponding notation.]

    \theta^T x - b = y    (4)
    \theta^T x - b = 1     (5a)
    \theta^T x - b = 0     (5b)
    \theta^T x - b = -1    (5c)

Furthermore, if we also require that all classifications are correct, in addition to having the maximum distance as stated above, we enforce eq. 7.

    y_i (\theta^T x_i - b) \geq 1 \quad \forall i    (7)

When combining the two criteria, we get the so-called optimisation problem, where ||θ|| is minimised subject to the constraints y_i(θ^T x_i − b) ≥ 1 ∀i. Thus, we end up with the decision function sign(θ^T x − b). The maximum margin is often used by default by the program itself, since it is the most stable solution under perturbations from the input of further data points. However, for non-linearly separable data, the algorithm often resorts to soft margins. This lets the SVM algorithm deal with errors in the data by allowing a few outliers to fall on the wrong side of the hyperplane, thus "misclassifying" them without affecting the final result. In other words, the outlier(s) instead reside on the same side of the hyperplane as members of the opposite class. Clearly, we can not allow too many misclassifications, and we thus aim to minimise the misclassification rate. Hence, when introducing soft margins, it is necessary to control and check the process. This is done with a technique called cross validation, which is used to control the balance between misclassifications and overfitting. One can set the number of allowed misclassifications within the soft margins to get the best classification that is not overfitting. This happens automatically for some programs (like MATLAB's classification learner (CL)) or can be set manually.

In order to control the position of the boundary, SVM utilises a so-called cost function. The idea is to regulate the freedom of misclassified points in the act of maximising the margin by using the hinge loss function. A point belonging to the positive class that resides on or above the upper boundary or support vector (eq. 5a) has zero loss. A point of the positive class on the threshold (eq. 5b) has loss one, and further points of the positive class on the "wrong" side of the threshold have linearly increasing loss. The cost function thus accounts for the distance between the support vector and the misclassified point. The further the separation, the more costly it gets to correctly classify the point, and the less costly to shift the threshold, maximise the margin and miscategorise the point. The total cost function can be stated as seen in eq. 8, where the first term governs the maximisation of the margin and the second term accounts for the loss function (Smola & Schölkopf 1998). The parameter C is a regularisation parameter that adjusts how well the constraints are followed. In other words, if C is large, the constraints are hard to ignore and the margin is narrow. If C is small, the constraints are easily ignored and the margin is broad.

    \min_{\theta, b} \; \lVert \theta \rVert^2 + C \sum_i \max\left(0, \, 1 - y_i(\theta^T x_i - b)\right)    (8)

Furthermore, for non-linearly separable data, even introducing soft margins might not suffice. The kernel function provides a solution to this problem by projecting the data from a low-dimensional space to a higher dimensional space (Noble 2006). The kernel function derives the relationships between every pair of data points as if they were in the higher dimension, meaning that the function does not actually perform the transformation. This is referred to as the kernel trick. Common kernel functions are polynomials of degree two or three, as well as radial basis function kernels such as the Gaussian.
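As a minimal illustration of the soft-margin classifier described by eqs. 4-8, the MATLAB sketch below fits an SVM with a quadratic (polynomial, degree-2) kernel and shows the hinge-loss idea for a single point. The synthetic data, the variable names and the box-constraint value are illustrative assumptions, not the settings used in this project.

    % Soft-margin SVM with a polynomial kernel (Statistics and Machine Learning Toolbox).
    % X, Y and the parameter values below are synthetic and illustrative only.
    rng(2);
    X = [randn(100,2); randn(100,2) + 2.5];      % two partially overlapping classes
    Y = [-ones(100,1); ones(100,1)];             % labels y in {-1, +1}

    mdl = fitcsvm(X, Y, ...
        'KernelFunction', 'polynomial', ...      % the "kernel trick" projection
        'PolynomialOrder', 2, ...                % quadratic SVM
        'BoxConstraint', 1);                     % C in eq. 8: larger C = narrower margin

    % Hinge loss of one training point, as in the second term of eq. 8;
    % the second column of score is the decision-function value for the +1 class.
    [label, score] = predict(mdl, X(1,:));
    hinge = max(0, 1 - Y(1) * score(2));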
2.3.4 Discriminant analysis

Discriminant analysis uses the training set to determine the position of the boundaries that separate the response classes. The locations of the boundaries are determined by treating each individual class as a sample from a multidimensional Gaussian distribution. The boundaries are thereafter drawn at the points where the probability of classifying an element into either of the classes is equal. The boundary is thus a function that depends on the parameters of the fitted distributions. If the distributions of all classes are assumed to be equal, the boundary equations are simplified greatly and become linear. Otherwise, the boundaries are quadratic. Quadratic discriminant analysis (QDA) is slightly more demanding in terms of memory and calculations, but it is still seen as an efficient classification technique. The QDA can be expressed as seen in eq. 9 (Lindholm et al. 2020).

    p(y = m \mid x) = \frac{\hat{\pi}_m \, \mathcal{N}(x \mid \hat{\mu}_m, \hat{\Sigma}_m)}{\sum_{j=1}^{M} \hat{\pi}_j \, \mathcal{N}(x \mid \hat{\mu}_j, \hat{\Sigma}_j)}    (9)

Here π̂_{m/j} = n_{m/j}/n is the number of training points in class m/j over the total number of data points, and N indicates a Gaussian (normal) distribution. Eq. 10 shows the mean vector of each class, taken over all training data points within that class.

    \hat{\mu}_{m/j} = \frac{1}{n_{m/j}} \sum_{i: y_i = m} x_i    (10)

Finally, eq. 11 describes the covariance matrix of each class m, in other words a matrix that describes the covariance between each pair of elements of the given vectors x_i and μ̂_m.

    \hat{\Sigma}_m = \frac{1}{n_m} \sum_{i: y_i = m} (x_i - \hat{\mu}_m)(x_i - \hat{\mu}_m)^T    (11)
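The following MATLAB sketch mirrors eqs. 9-11 for quadratic discriminant analysis: the class-wise means, covariances and priors are estimated explicitly and combined into a posterior probability, and the equivalent toolbox model is fitted with fitcdiscr. The two-class synthetic data and all variable names are illustrative only (and cov uses the 1/(n-1) normalisation rather than the 1/n of eq. 11).

    % Quadratic discriminant analysis (QDA) on synthetic two-class data.
    rng(3);
    X = [mvnrnd([0 0], eye(2), 80); mvnrnd([3 3], [2 0.5; 0.5 1], 80)];
    Y = [ones(80,1); 2*ones(80,1)];

    % Class-wise estimates corresponding to eqs. 9-11:
    mu1 = mean(X(Y==1,:));  S1 = cov(X(Y==1,:));  p1 = mean(Y==1);
    mu2 = mean(X(Y==2,:));  S2 = cov(X(Y==2,:));  p2 = mean(Y==2);

    xNew = [1.5 1.5];
    num1 = p1 * mvnpdf(xNew, mu1, S1);      % numerator of eq. 9 for class 1
    num2 = p2 * mvnpdf(xNew, mu2, S2);
    posterior = [num1 num2] / (num1 + num2);

    % Equivalent toolbox model ('quadratic' = one covariance matrix per class):
    mdl = fitcdiscr(X, Y, 'DiscrimType', 'quadratic');
    [label, post] = predict(mdl, xNew);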

2.3.5 Ensemble

Ensemble methods are meta-algorithms that use several copies of a fundamental model. This set of multiple copies is referred to as an ensemble of base models, where a base model can e.g. be one of the models described in the sections above. The fundamental concept is to train each such base model in a slightly different way. Each base model makes its own prediction, and thereafter an average or majority vote is derived to obtain the final prediction (Lindholm et al. 2020). There are two main types of models when discussing ensemble classification: bagging and boosting. In bagging, or "bootstrap aggregation", multiple slightly different versions of the training set are created. These sets are random, overlapping subsets of the training data. This results in an ensemble of similar base models which are not identical to the core base model; thus, when training the models, one reduces the variance and so the risk of overfitting. One could summarise bagging as an ensemble of multiple models of the same type, where each one is trained on a different randomly generated subset of the training data. This method is not a new model technique, but a collection of the formerly mentioned models that uses a new approach to the data itself. After the ensemble has been trained, the results are aggregated by an average or weighted average of the predicted class probabilities, which results in the final prediction from the bagging.

The second ensemble classification technique mentioned is boosting. In contrast to bagging, the base models in boosting are trained sequentially, where each model aims to correct the mistakes that the former models have made. Furthermore, an effect of using boosting is a bias reduction of the base model, instead of the reduced variance in bagging. This allows boosting to turn an ensemble of weak base models into one stronger model, without the heavy calculations that normally would be required (Lindholm et al. 2020). Both boosting and bagging are ensemble methods that combine predictions from multiple models of classification (or regression) type. Thus, boosting also uses previously mentioned models through a new approach. The biggest difference is, as mentioned, that boosting is sequential. The idea is that each model tries to correct the mistakes made by the previous one by modifying the training dataset after each iteration, in order to highlight the data points for which the formerly trained models performed unsatisfactorily. The final prediction is then a weighted average or weighted majority vote of all models of the ensemble.
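The MATLAB sketch below illustrates the two ensemble flavours discussed above with the toolbox function fitcensemble, growing one bagged and one boosted ensemble of decision trees and comparing them by cross-validated error. The data, the number of learning cycles and the variable names are placeholder choices, not those of this work.

    % Bagging vs. boosting of decision trees (Statistics and Machine Learning Toolbox).
    rng(4);
    X = [randn(200,3); randn(200,3) + 1.5];
    Y = [zeros(200,1); ones(200,1)];

    bagged  = fitcensemble(X, Y, 'Method', 'Bag',        'NumLearningCycles', 50);
    boosted = fitcensemble(X, Y, 'Method', 'AdaBoostM1', 'NumLearningCycles', 50);

    % 5-fold cross-validated misclassification error, to compare the two ensembles:
    errBag   = kfoldLoss(crossval(bagged,  'KFold', 5));
    errBoost = kfoldLoss(crossval(boosted, 'KFold', 5));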
2.4 Hyperparameter tuning

Sometimes the performance of the classification can be optimised by setting the hyperparameters manually. Hyperparameters vary between models but can generally be seen as "settings" for the algorithm. A typical hyperparameter for SVM algorithms is the polynomial degree of the kernel function; for kNN it is the number of neighbours k. For decision trees there are two important hyperparameters to consider: the number of estimators (decision trees) and the maximum allowed depth of each tree, i.e. the maximum number of splits or branches. As with many things in machine learning, there is no universally optimal choice of hyperparameters. Therefore, empirical trials of the hyperparameters are the preferred way to optimise the algorithm; this is also known as hyperparameter tuning. Another approach is the so-called grid search, which builds and evaluates a model for each combination of parameters specified in a grid. A third option is a random search, which is based on a statistical distribution for each parameter, from which values are randomly sampled (Bergstra & Bengio 2012). This work is, as previously mentioned, executed in MATLAB and MATLAB's classification learner. Here the evaluation of the hyperparameter tuning is automatically done by the program and commonly not tweaked by us. However, hyperparameter tuning is indeed accessible in MATLAB and tested within this work. The available and relevant hyperparameters in MATLAB's CL are further discussed in section 4.4. Furthermore, we evaluate our experimentation with hyperparameter tuning by studying a confusion matrix, which is addressed in section 2.5.1 below.
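As a sketch of the tuning strategies mentioned above, the MATLAB code below performs a simple one-dimensional grid search over the kNN hyperparameter k, scoring each candidate by cross-validated loss. The grid and data are invented for the example, and the commented line indicates MATLAB's own automated tuning as an alternative.

    % Manual grid search over the number of neighbours k for a kNN classifier.
    rng(5);
    X = [randn(150,2); randn(150,2) + 2];
    Y = [zeros(150,1); ones(150,1)];

    kGrid = [1 3 5 7 9 15 25];                 % candidate hyperparameter values
    cvErr = zeros(size(kGrid));
    for i = 1:numel(kGrid)
        mdl = fitcknn(X, Y, 'NumNeighbors', kGrid(i));
        cvErr(i) = kfoldLoss(crossval(mdl, 'KFold', 5));   % 5-fold CV error
    end
    [~, best] = min(cvErr);
    bestK = kGrid(best);

    % Alternative: let MATLAB tune the hyperparameters automatically:
    % mdl = fitcknn(X, Y, 'OptimizeHyperparameters', 'auto');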
2.5 Evaluation

2.5.1 Confusion matrix

The most convenient way of evaluating the performance of a classifier is to study a so-called confusion matrix. For a binary classification problem, the confusion matrix is formed by separating the validation data into four groups depending on the true output or label y and the predicted output y(x). Figure 2 shows a general confusion matrix where the green boxes, true positive (TP) and true negative (TN), show the correctly classified elements, meaning that the model predicts the true class. The red boxes, false positive (FP) and false negative (FN), instead show the number of incorrectly predicted elements. Out of the latter two outcomes, the false positive is often the most critical one to consider and should, therefore, be minimised. This is true because, in most cases of classification, we are interested in discerning one specific class as accurately as possible. To illustrate this, we will use an example related to this work, namely the classification of common stars (A) and stars with unusual infrared properties (B). Since the
Since the goal is to identify targets belonging to class B, TP are targets predicted as, and truly are, unusual stars. TN are targets predicted as, and truly are, common stars. FN are targets predicted as common stars but that in reality belong to class B. Finally, the most critical outcome, FP, are targets predicted as group B but that in reality belong to group A (common stars falsely classified as unusual stars). If the number of FP is high, it means that the model will assort many common stars as unusual ones, thus making the result unreliable. The group of interest will then be contaminated by "common stars", thereby making the manual process of checking the list of interesting targets time-consuming. If instead FN is high while FP is low, it means that many uncommon targets are lost in the classification and get assorted into the common group. These targets will go unnoticed, but the targets that the model does classify as uncommon stars will have a smaller risk of being misclassified.

Figure 2: The figure shows a confusion matrix with true and predicted classes. The true positive (TP) and true negative (TN) (green squares) represent the correctly classified objects, while the false positive (FP) and the false negative (FN) (red squares) are the falsely classified objects.
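As a concrete illustration of figure 2, the MATLAB sketch below builds the confusion matrix for the two-class example above. The label vectors are hypothetical and only serve to show how the four quantities are read off.

    % Hypothetical true and predicted labels:
    % 'A' = common star, 'B' = star with unusual infrared properties.
    ytrue = {'A'; 'B'; 'A'; 'B'; 'A'; 'A'};
    ypred = {'A'; 'B'; 'B'; 'B'; 'A'; 'A'};
    % confusionmat: rows follow the true class, columns the predicted class.
    [C, order] = confusionmat(ytrue, ypred);   % order is {'A'; 'B'} here
    TN = C(1,1);  FP = C(1,2);                 % true A predicted as A / as B
    FN = C(2,1);  TP = C(2,2);                 % true B predicted as A / as B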
2.5.2    Precision and classification accuracy

Further parameters that can be used for assessing the overall performance of the classification are the precision and the classification accuracy. The precision of a model describes the ratio of true positives to all predicted positives, as stated in eq. 12. A high precision value (close to 1) is good, while a low value (close to 0) signals that there is a problem in the classification which yields many false positives.

    precision = TP / (TP + FP)    (12)

The accuracy of a classification model is similar to the precision, but includes all correct predictions, whereas the precision only considers the predicted positives. The accuracy thus evaluates the fraction of correct predictions over the total number of predictions, as seen in eq. 13. The ratio takes a value between 0 and 1, but the accuracy is commonly presented as a percentage, 0-100%. The accuracy metric is conceivably better suited for a uniformly distributed sample, since it gives a biased representation of the minority class: the majority class has a bigger impact on the accuracy than the minority class has. Therefore, the classification accuracy could be misleading if the problem involves large differences in the quantities of each class (Schütze et al. 2008).

    Accuracy = (TP + TN) / (TP + TN + FP + FN)    (13)

Naturally, a precision close to 1 and an accuracy near 100% are preferred; however, as discussed earlier (section 2.5.1), it is more important to reduce the number of FN. A model with high precision and a high ratio of FN is a worse model than one with lower precision and a low ratio of FN. Nevertheless, both parameters give a quick indication of how well the model classifies the entries.
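Continuing the sketch above (with the same hypothetical TP, TN, FP and FN counts), eqs. 12 and 13 translate directly into:

    % eq. 12: fraction of the predicted positives that are truly positive
    precision = TP / (TP + FP);
    % eq. 13: fraction of all predictions that are correct
    accuracy  = (TP + TN) / (TP + TN + FP + FN);
    fprintf('precision = %.2f, accuracy = %.1f %%\n', precision, 100*accuracy);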
3    Theory: Astronomy

The purpose of this work is, as previously mentioned, to utilise machine learning to select intriguing stellar targets with unusual infrared properties. A typical spectral energy distribution (SED) of a star, like a black-body spectrum, is characterised by a relatively smooth peak at some specific wavelength (e.g. see the SED of the Sun in figure 3). The placement of the peak is determined by the temperature of the star and indicates in what wavelength range the majority of the stellar radiation is emitted. The black-body curve of solar-like stars (spectral type FGK) commonly peaks in the optical wavelength range and decreases as we move to longer wavelengths in the infrared (IR). The specific position of the peak in the optical signifies the colour of the star. It should be noted that black-body radiation is often the first approximation for stellar emission even though stars are not perfect black bodies. The unconventional objects considered in this study are expected to deviate from this classical view. The targets of interest do indeed show a peak in the optical, but with a lower brightness than a common star. Furthermore, these targets are also expected to show an excess in the infrared regime. The underlying idea is that there is some component blocking a significant fraction of the stellar radiation, thus making the star dimmer. However, the stellar radiation cannot be retained within this component, but is re-emitted at another wavelength. This re-emitted radiation is the thermal radiation emitted in the IR. In the following section, we will discuss known astronomical objects that fit the description of these uncommon targets, as well as a hypothetical object which lays the ground for our investigation of the use of ML on large astronomical datasets.
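As a small aside (not part of the thesis argument), the link between stellar temperature and the position of the black-body peak is given by Wien's displacement law; the temperatures below are merely illustrative.

    % Wien's displacement law: lambda_peak = b / T for a black body.
    b = 2.898e-3;                      % Wien displacement constant [m K]
    T = [5772, 3000, 25000];           % illustrative temperatures [K]
    lambda_peak_um = 1e6 * b ./ T;     % peak wavelengths: ~0.50, ~0.97, ~0.12 micrometres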
Figure 3: The solar spectrum. Credit: Robert A. Rohde, licensed under CC BY-SA 3.0.

3.1    Dyson spheres

Perhaps the most speculative candidate for a target with a SED as described in the above section is a so-called Dyson sphere. A Dyson sphere is an artificial circumstellar structure first introduced by Dyson (1960), which harvests starlight from the enclosed star. Such a mega-structure covers parts of or the whole star, thus blocking the star's light. This causes the energy output in the optical part of the SED to drop. However, the blocked starlight is reradiated in the thermal infrared as the megastructure gets heated up and has turned into an ember the size of a star. This energy output is seen as a second peak or slope in the IR range of the SED. Stars encased by Dyson spheres are therefore expected to show abnormalities in their SED when compared to "normal" stars. A Dyson sphere is typically not visualised as a solid shell enclosing the star, but rather as a swarm of satellites, where each satellite absorbs a small fraction of the stellar radiation (Suffern 1977, Wright, Mullan, Sigurdsson & Povich 2014, Zackrisson et al. 2018). The covering fraction f_cov depends on the distribution and scales of the enshrouding satellites. If one assumes that this astro-engineered structure acts like a grey absorber (all wavelengths are equally affected), one expects only a general dimming of the observed flux and no further changes to the spectral shape. An ideal Dyson sphere with high efficiency can absorb all stellar light with minimal energy loss. In reality, that is not likely the case for such a construction. Some of the absorbed stellar light will be turned into waste heat and other forms of energy. The waste heat is the centrepiece of this work, since we are identifying potential candidates based on their IR excess from thermal radiation.

The AGENT formalism is one proposal for how the energy budget of a Dyson sphere would look. This formalism was first introduced by Wright, Griffith, Sigurdsson, Povich & Mullan (2014) and reads

    α + ε = γ + ν    (14)

where α is the collected starlight, ε is the non-starlight energy supply, γ represents the waste heat radiation and ν is all other forms of energy emission, e.g. neutrino radiation. For simplification of the formalism, one assumes negligible non-thermal losses (ν ∼ 0) and that the energy from starlight is much higher than that from other sources (ε ∼ 0), thus reducing eq. 14 to α = γ. The simplified black-body model of the Dyson sphere can be expressed using α and γ of the host star (see eq. 6 of Wright, Griffith, Sigurdsson, Povich & Mullan (2014)). We generalise it further by expressing the luminosity and magnitude of the DS as

    L_DS = L_Star × f_cov    (15)

    mag_DS = mag_Star - 2.5 log10(1 - f_cov)    (16)

3.2    Dust-enshrouded stars

One type of known astronomical object that fits part of the SED profile of our DS models is dust-enshrouded stars, or certain stars within nebulae. Young stellar objects (YSOs) are good candidates for nebulous objects. YSO denotes a star in its early stage of evolution, and the group can be divided into two subgroups: protostars and pre-main-sequence (PMS) stars. The term protostar refers to a point in time during the cloud-collapse phase of stellar formation. When the density at the centre of the collapsing cloud has reached around 10^-10 kg m^-3, the region becomes optically thick, which makes the process more adiabatic (no heat or mass is transferred between the system and its surroundings). The pressure increases in the central regions and eventually reaches near hydrostatic equilibrium (the gravitational force is balanced by the pressure gradient; Carroll & Ostlie (2017)). The protostar resides deep within the parent molecular cloud, enshrouded in a cocoon of dust. Since the star has yet to begin nuclear fusion, the generated energy does not come from the core. Instead, most of the energy comes from gravitational contraction, which heats the interior. The heated dust thereafter reradiates the photons at longer wavelengths, which accounts for the IR source of radiation associated with these stars. This IR excess might, therefore, resemble that of a Dyson sphere. However, the protostar is not expected to peak much in the optical, since the dust fully enshrouds the protostar and essentially all radiation is reprocessed.

As the name suggests, a PMS star is a star in the evolutionary stage just before the main sequence, where
most stars in their evolution reside today and spend most of their lifetime. The main sequence (MS) refers to a specific part of the evolutionary track of a star in the so-called Hertzsprung-Russell diagram (HR diagram), which plots the temperature, colour or spectral type of stars against their luminosity or absolute magnitude, to mention a few versions (see figure 4). After a protostar has blown away its envelope, or birth cradle, it is optically visible. The young star has acquired nearly all of its mass at this stage but has not yet started nuclear fusion. Thereafter, the star contracts, which results in an increase of the internal temperature and finally initiates fusion processes. When the star is in this phase of contraction, it is in the pre-main-sequence stage (Larson 2003). Two common types of PMS stars are T Tauri stars and Herbig Ae/Be stars. T Tauri stars are named after the first star of their class to be discovered (in the constellation of Taurus) and represent the transition between stars that are still shrouded in dust and stars on the MS. Dust that surrounds the young star (of both T Tauri and Herbig Ae/Be type) is the source of IR radiation and causes the IR excess in the SED which matches the DS models. However, T Tauri stars are expected to feature irregular luminosity variations due to variable accretion and shocks, with timescales on the order of days (Carroll & Ostlie 2017). Furthermore, these types of young stars exhibit strong hydrogen emission lines (the Balmer series). Thus, an Hα spectral line is a good signature for these types of stars. Herbig Ae/Be stars are likely high-mass counterparts to T Tauri stars. As the name suggests, these stars are of spectral type A or B and, in likeness to T Tauri stars, they also show strong emission lines. The major difference between T Tauri and Herbig Ae/Be stars is their masses, where the former typically have masses in the range 0.5 to 2 M⊙ (solar masses) and the latter group 2-10 M⊙.

Furthermore, YSOs are associated with early star evolution phenomena such as protoplanetary disks (also known as proplyds), astronomical jets (a beam of ionised matter that is emitted along the axis of rotation) and masers. Maser emission is commonly detected via emission from OH molecules (hydroxyl radicals) or water molecules (H2O). A maser is the molecular analogue of a laser and a source of stimulated spectral line emission. Stimulated emission is a process where a photon of a specific energy interacts with an excited atomic electron, causing the electron to drop to a lower energy level. The emitted photon and the incident photon will be in phase with each other, thus amplifying the radiation (Carroll & Ostlie 2017).

Figure 4: The HR diagram with absolute magnitude and luminosity on the vertical axis and spectral class and effective temperature on the horizontal axis. The diagram also shows the different evolutionary stages of stars and some well known stars. Image credit: R. Hollow, CSIRO.

3.3    Gray Dust

A potential source that could give rise to false high-f_cov Dyson sphere candidates is seemingly grey dust. Obscured starlight with no significant reddening of the spectrum is likely caused by material in the line of sight; this is what we refer to as grey dust. Micrometre-sized dust grains would essentially be grey at optical to near-infrared wavelengths and would thus reduce the optical peak that is seen for most stars, similarly to what is expected of a Dyson sphere (Zackrisson et al. 2018). Studies suggest that such dust grains have formed in circumstellar material around supernovae (Bak Nielsen et al. 2018), but they have also been detected in the interstellar medium (Wang et al. 2015). Observations have shown that µm-sized graphite grains, together with nano- and submicron-sized silicate and graphite grains, fit the observed interstellar extinction of the Galactic diffuse interstellar medium in the range from far-UV to mid-IR, along with the NIR to millimetre thermal emission (Wang et al. 2015). The µm-sized grains account for the flat or grey extinction in the UV, optical and NIR. Since they absorb little (nearly nothing) in the optical, they do not emit much radiation in the IR. The sources of smaller-sized grains could, however, potentially mimic the IR signature of a DS. The grey dust likely affects the optical part of the SED considerably more than the IR. Interstellar grey dust will block the dispersed stellar radiation from a distant star and emit IR radiation, while grey dust that surrounds a star prevents a larger fraction of the stellar radiation from escaping, hence also giving a higher IR emission. Therefore, the IR property created by circumstellar dust is the one that most resembles that of a Dyson sphere.

3.4    M-dwarfs

M-dwarfs, or red dwarfs, are small and cool main-sequence stars (and thus appear red in colour) and constitute the largest proportion of stars in the Milky Way, more than 70% (Henry et al. 2006). These low-mass stars evolve slowly and can therefore maintain a near-constant low luminosity for trillions of years. This makes them both among the oldest types of stars observed and very hard to detect at large distances. Evidently, not all M-dwarfs have debris disks. In fact, such systems are rare, and most observed M-dwarfs with debris disks are younger M-dwarfs (Binks & Jeffries 2017). Most studies of circumstellar debris disks concentrate on early-type or solar-type stars, and less so on M-dwarfs. However, some studies (Avenhaus et al. 2012, Lee et al. 2020) have shown that warm circumstellar dust disks are not only observed around M-dwarfs but that they also create an excess in the IR. The underlying sources of the dust disks are believed to be planetesimal belts where the dust is created through collisions. This also takes place in the solar system; however, the collisions are more frequent in other observed systems, which results in a higher IR excess than for solar system debris disks (Avenhaus et al. 2012). The combination of the low luminosity and temperature and an encompassing debris disk of an M-dwarf could create a SED that is consistent with a DS model. However, a debris disk is not expected to correspond to a high coverage, meaning that a DS model fitted to a target with a debris disk should show a low covering fraction. Typically, a debris disk affects the optical emission only at the level of L_IR/L_star ∼ 0.01 (Hales et al. 2014).

3.5    post-AGB stars

Post-asymptotic giant branch (post-AGB) stars are evolved stars with initial masses of 1-8 M⊙ (Engels (2005); values vary significantly between authors, ∼0.5-10 M⊙). These objects are expected to have a high brightness, where a solar-mass star can reach >1000 times the present solar luminosity (see figure 5). An AGB star is both very luminous and cool, thus having strong IR radiation, and occupies the upper-right region of the HR diagram seen in figure 5. A characteristic of these stars is the energy production in the double-shell structure (helium and hydrogen) that surrounds the degenerate carbon-oxygen core. The AGB phase can be divided into two parts: the early AGB, a quiescent burning phase, and the thermal pulse phase, where large energy releases are created by flashes of He-shell burning. In these phases, the outer layers of the envelope are extended to cooler regions by the pulsation, which facilitates dust formation. The newly formed dust is pushed outwards by radiation pressure, which drags the gas along and leads to high mass-loss rates. In turn, the high mass-loss rates lead to the formation of circumstellar dust shells with high optical depths. The dust absorbs the stellar light and re-emits it in the IR, thus producing similar SEDs to the ones expected for DSs. However, the obscured AGB stars, and the young stars previously discussed, are often associated with OH masers. The post-AGB phase starts when the star begins to contract and heat up. The increasing temperature and constant luminosity make the star traverse the HR diagram horizontally towards higher temperatures (see figure 5). When the temperature reaches ∼25 000 K, the radiation is energetic enough to ionise the remaining circumstellar envelope, which then appears as a planetary nebula (Engels 2005). It is possible that post-AGB stars or non-pulsating AGB stars have similar SEDs to the expected DSs, since both objects are assumedly cooler with stronger IR emission.

Figure 5: Evolutionary track of a solar mass star. Credit: Lithopsian, licensed under CC BY-SA 4.0.

4    Data and program

The data used within this work originate from three different catalogues: Gaia (Data Release 2, DR2; Gaia Collaboration (2016, 2018)), 2MASS (Skrutskie et al. 2006) and WISE/AllWISE (Wide-field Infrared Survey Explorer; Wright et al. (2010)). The three catalogues cover different wavelength ranges that, when used together, cover optical to mid-infrared wavelengths. Since the targets of interest display a somewhat lower energy output in the optical and an excess in the IR, this range of wavelengths is suitable for our project.

In this section, we will first go through the three data catalogues from which we recover our astronomical data. Thereafter, we give a more in-depth review of machine learning via MATLAB, its features and utilities.
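For reference, the central wavelengths quoted in sections 4.1-4.3 below can be collected into a single lookup table; this is only a convenience sketch, not code from the thesis.

    % Central wavelengths (micrometres) of the passbands used in this work,
    % taken from the catalogue descriptions in sections 4.1-4.3.
    bands     = {'BP','G','RP','J','H','Ks','W1','W2','W3','W4'};
    lambda_um = [0.532 0.673 0.797 1.247 1.645 2.162 3.4 4.6 12.0 22.0];
    passbands = table(bands', lambda_um', 'VariableNames', {'Band','CentralWavelength_um'});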

4.1    Gaia

The ESA space observatory Gaia, with its accompanying database, is one of today's greatest resources of stellar data. The satellite is constantly scanning the sky, thus creating a three-dimensional map of no less than ∼1.6 billion stars. This corresponds to around one per cent of the total number of stars within the Milky Way. Furthermore, the data will reach remarkable accuracy, where targets brighter than 15 mag will have a position accuracy of 24 microarcseconds (µas) and the distance to the nearest stars will be known to 0.001% (The European Space Agency 2019). Such precision will be achieved at the end of the mission, when the satellite has measured each target about 70 times. The satellite operates with three band-pass filters, BP, G and RP, with respective central wavelengths of 0.532, 0.673 and 0.797 µm (Gaia Collaboration 2016, 2018), which can be seen in figure 6.

Figure 6: The coloured lines show the passbands for G_BP (BP) (blue), G (green) and G_RP (RP) (red) that define the Gaia DR2 photometric system. The thin, grey lines show the nominal, pre-launch passbands used for DR1, published in Jordi et al. (2010). Credits: ESA/Gaia/DPAC, P. Montegriffo, F. De Angeli, C. Cacciari. The figure was acquired from Gaia's homepage, where it was published 16/03/2018.

New data releases are frequently published in the Gaia archive as the satellite repeatedly scans each target. The latest release, EDR3, was made public in December 2020. This work is based on its predecessor, DR2. Even though the (early) EDR3 has better accuracy, it is missing some properties (such as luminosity) and is partly missing other properties for some targets (e.g. G magnitudes). It is worth noting that the errors of the magnitudes are not given in the archive; these are manually derived from the respective fluxes (see eqs. 30 & 31). The error estimation is a simplification of the real uncertainty but is not expected to have an impact on the machine learning process. The estimated errors for Gaia photometry in the Vega system are based on the derivation found on the homepage of ESA Gaia: GDR2 External calibration¹ (Paolo Montegriffo 2020). The derivation can be seen in Appendix A.2.
4.2    AllWISE

WISE is a Medium Class Explorer mission by NASA that conducted a digital imaging survey of the full sky in the mid-IR bandpasses W1, W2, W3 and W4, with respective central wavelengths of 3.4, 4.6, 12.0 and 22.0 µm (Wright et al. 2010). The catalogue contains photometry and astrometry for over 500 million objects. In likeness to the Gaia satellite, the WISE space telescope accumulates data for each target multiple times. The number of independent exposures for each point on the ecliptic plane was typically 12 or more, while points at the ecliptic poles reached several hundred. In November 2013 the AllWISE Data Release was generated. This source catalogue is a combination of data from the WISE Full Cryogenic, 3-Band Cryo and NEOWISE Post-Cryo survey phases. This has enhanced the sensitivity in the W1 and W2 bands, thus making the AllWISE catalogue superior to the former WISE All-Sky Release Catalog. Furthermore, the photometric accuracy of AllWISE has improved in all four bands, due to corrections of the source flux bias and more robust background estimations. The only exception, where WISE All-Sky is better than AllWISE, is photometry of sources brighter than the saturation limits in the first two bands, W1 < 8 and W2 < 7. Both of these catalogues commonly contain entries for our Dyson sphere candidates when these are searched for in VizieR², where we pay extra attention to the variability flag for each band. From this catalogue we also utilise a parameter called cc-flag, which is discussed more in depth in section 6.5.

¹ https://gea.esac.esa.int/archive/documentation/GDR2/Data_processing/chap_cu5pho/sec_cu5pho_calibr/ssec_cu5pho_calibr_extern.html
² https://vizier.u-strasbg.fr/viz-bin/VizieR

4.3    2MASS

Between 1997 and 2001, the Two Micron All Sky Survey (2MASS) produced photometric and astrometric measurements over the entire celestial sphere. A pair of identical telescopes, at Mount Hopkins, Arizona, and Cerro Tololo, Chile, made NIR observations which resulted in a Point Source Catalogue containing more than 470 million sources. The 2MASS All-Sky Data Release includes the aforementioned Point Source Catalogue, 4.1 million compressed FITS images of the entire sky and an additional Extended Source Catalogue of 1.6 million objects. The NIR photometric bands used in 2MASS and utilised within this work were J, H and Ks, with corresponding central wavelengths of 1.247, 1.645 and 2.162 µm (Skrutskie et al. 2006). These passbands largely correspond to the common bands J, H and K, first introduced by Johnson (1962), with the adjustment that the 2MASS Ks filter (s for short) excludes wavelengths beyond 2.31 µm to minimise
