Big Data Research Infrastructure Collaboration Toward the SKA (BRICSKA) - SciELO
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
An Acad Bras Cienc (2021) 93(Suppl. 1): e20201027 DOI 10.1590/0001-3765202120201027
Anais da Academia Brasileira de Ciências | Annals of the Brazilian Academy of Sciences
Printed ISSN 0001-3765 I Online ISSN 1678-2690
www.scielo.br/aabc | www.fb.com/aabcjournal
PHYSICAL SCIENCES
Big Data Research Infrastructure Collaboration
Toward the SKA (BRICSKA)
RUSS TAYLOR, FABIO PORTO, CHENZHOU CUI, YOGESH WADADEKAR
& OLEG MALKOV
Abstract: Astronomy is entering an era of mega-data that will render conventional
research methods as well as data and visual analytics tools ineffective. The Square
Kilometre Array (SKA) drives one of the most significant big data challenges of the next
decades. South Africa, China and India are partners in the global SKA collaboration
and host recently completed, next generation radio astronomy facilities. South Africa,
Brazil, China and India are involved in the Large Synoptic Survey Telescope (LSST),
which represents a complementary mega-data challenge, vastly increasing the current
data volume of optical surveys, and providing critical multi-wavelength data set
for SKA analytics. Russian researchers are also engaged in radio astronomy and
multi-wavelength, multi-messenger projects driving increasing volumes of observational
data. This project brings together teams leading programs in data innovation in each
partner country to collaborate on the development of new technologies and systems
to meet the big data challenge of SKA pathfinder facilities and the multi-wavelength
projects that are critical to the advance of astronomy. In so doing we will prototype and
demonstrate scalable big data technologies for the new big data era, and establish a
BRICS multinational federated data intensive cloud network for collaborative programs
in data intensive astronomy.
Key words: BRICS, SKA, big data, large surveys.
INTRODUCTION endeavour, and that leadership in science
worldwide is inclusively distributed across the
Modern science faces several challenges that globe.
are transformative in how science operates and
in how innovations and new knowledge are Astronomy is a science at the nexus of
generated and exploited. The challenges are (i) these challenges and the BRICS nations have
Big Data, how to record, transport and deliver all made the strategic choice to invest in
it, (ii) Big Compute, how to process, analyse, astronomy and in the technologies underpinning
and visualise big data, (iii) Big Science; by its practice; each country with their specific
asking big questions as a global community advantages. This project aims to build a strong
facilitated by a suite of unique international international partnership to jointly develop
facilities, science can only progress in leaps technologies and systems that address the Big
through large international collaborations, and Data challenge posed by astronomy Big Science
(iv) Human Capacity Development; to make sure projects, of which the participating countries
all people are active participants in the scientific are strong members. Seizing the opportunity
An Acad Bras Cienc (2021) 93(Suppl. 1)RUSS TAYLOR et al. BRICSKA
presented by large international astronomy new sets of skills and technologies; skills that
collaborations, the project will significantly combine knowledge of astronomy with computer
contribute to Human Capacity Development in science, information science and statistics, and
BRICS to ensure participation and leadership technologies that adapt the innovations of the
in global science, and by linking those science fourth industrial revolution to the service of
and technology skills to society and industry, scientific research. The international scope and
contribute to stimulating innovation. scale of these next-generation astronomical
With BRICS countries home to some of endeavours, and the multi-national nature of the
the best astronomy facilities in the world, this collaborations on large science programs means
project aims to facilitate, through development that we must respond to these challenges as a
of people and technology, the synergies between global community.
these facilities, thereby addressing some of the Three of the BRICS partner countries
big challenges facing science today, and helping are part of the international SKA project
answer some of the big scientific questions of organisation. All partner countries are engaged
our time, which have mobilised international in data-intensive multi-wavelength and
resources into large scientific projects like the multi-messenger programs providing critical
SKA and the LSST. ancillary data to realise the scientific potential
The SKA drives one of the most significant of SKA pathfinders and the SKA itself. The LSST
big data challenges of the coming decade (An is the flagship of upcoming optical facilities
2019). New technologies under development that will create vast data sets that must be
on the pathway to the SKA are creating mined both for transient phenomena and to
commensurate advances in the data capacities create catalogues of multi-wavelength data
of radio telescopes. The data rates from for billions of objects that are interoperable
existing radio telescopes to the researcher with radio data. Four of the BRICS partner
are 10,000 times larger than what was typical countries are international collaborators of
only a few years ago. At the same time the the LSST project. The impact of MeerKAT and
enterprise of observational radio astronomy is the SKA will be dramatically increased by
changing. Projects undertaken by large global incorporating multi-wavelength data, and the
collaborations, in which large amounts of LSST dataset will be the widest, deepest and
observing time are devoted to major key science fastest optical survey ever done. Combining
programs that create vast data sets, is becoming these datasets is however a non-trivial task.
the new paradigm. This mode of observing This proposal brings together scientific leads
combined with the new instrumental capacities of large programs on SKA pathfinders, the LSST
is driving an exponential growth in the rate of and other multi-wavelength facilities along
data confronting researchers (see Figure 1). with computer scientists, data scientists and
To rise to the scientific opportunity opened experts in 4th industrial revolution technologies
up by this new generation of instruments to build solutions for big data astronomy
requires the research community to develop that will enable the full scientific potential
new infrastructure, algorithms, software systems of significant investment in major facilities,
and platforms to manage, process, analyse leverage investment in HPC systems for data
and mine these data sets. The enterprise of intensive research, and develop the expertise
science in this new data intensive age requires to adapt and innovate new technologies to meet
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 2 | 22RUSS TAYLOR et al. BRICSKA
Figure 1. Data volumes for large projects on radio astronomy facilities as a function of time. Current and planned
BRICS facilities are indicated in red. BRICS researchers are being confronted with a data deluge in the coming
decade characterized by exponential growth that is much faster than has been experienced by the rest of the world
to this point.
the exponential growth in astronomical data in Realizing the HCD and development potential
the next decade. of the flagship naturally calls for a coherent
effort across the BRICS nations. The proposal
Enabling access to globally competitive team and their respective national networks
research for smaller institutions previously have extensive and complementary experience
isolated from research; revealing and developing in all the topics mentioned above. Working
untapped talent in the BRICS countries; jointly on those topics prevents duplication
embedding young researchers into the global of effort in developing such platforms and
networks of science and industry; increasing promotes sustained collaboration between
direct economic benefits of science through partners countries beyond astronomy research.
interfaces with the private sector; increasing
the awareness of, and the support for science SCIENTIFIC RATIONALE
among the general public with outreach;
supporting development projects through The concept of the Square Kilometre Array (SKA)
interdisciplinary projects using the research emerged in the 1990s. Currently, it is one of
infrastructure; and developing communities the largest international science projects in the
through the development of astro-tourism at world, as it is set to answer fundamental open
astronomical facilities in the BRICS countries questions about our universe. Using the SKA
are some of the potential development benefits and other telescopes described in Table I the
of this flagship project that are detailed below. participants in this project are research leaders
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 3 | 22RUSS TAYLOR et al. BRICSKA
in most of the SKA key science projects (KSP) as Magnetism
detailed below.
Magnetic fields are an important element of
astrophysical phenomena that is yet to be
well understood. They contain energy and
influence physical processes on all scales, from
Science Goals
star formation to galaxies and to the largest
structures known in the universe. We don’t yet
Fundamental Physics know how these magnetic fields formed and
we are not in a position to explain how strong
The discovery of gravitational waves has they are. What SKA will be able to do is map
opened up a new window to the universe. magnetic fields in great detail. For this question
SKA will be crucial to look for counterparts and others, the MIGHTEE project on MeerKAT
in radio to the sources of those gravitational (Jarvis et al. 2016) will reach similar depth to
waves. For example, the uGMRT has already that reached by the larger area SKA sky survey,
demonstrated this with meaningful radio and thus will provide a pilot to the experiments
follow-up observations of gravitational wave that will be carried by the SKA on a much larger
events such as the GW170817 which was the survey volume. Similarly, when combined with
merger of two neutron stars (Kim et al. 2017). uGMRT data, the SuperMIGHTEE data set (Taylor
Combining the SKA with the Five-hundred-meter 2019) will become the premier data on the GHz
Aperture Spherical Telescope (FAST) in China radio properties of the deep sky thanks to its
(Gibney 2019) may also allow us to study the unique full spectral coverage and ultra-broad
low-frequency universe using high precision band nature.
pulsar timing, massive black holes, interacting
galaxies in galaxy pairs and in groups of
The Hydrogen Universe
galaxies. Some of the fastest and most energetic
phenomena hold the key to our understanding Neutral hydrogen is a powerful probe of the
of fundamental laws of physics in conditions of evolution of the universe. Neutral hydrogen is
extreme energy. The detection of powerful bursts found everywhere and can therefore be used
of gravitational waves, gamma rays, X-rays, or to probe anything from the early distribution
radio waves provide evidence for this; however, of matter to the detailed structure of evolved
we still know very little about their appearance in galaxies themselves. Radio telescopes are
optical light and therefore combining data from the best instruments to observe spontaneous
large modern facilities into multi-wavelength emission in radio waves of neutral hydrogen
data sets is key to understanding what is and therefore to map the distribution of neutral
happening in those conditions of extreme hydrogen in the universe. The SKA precursor
energy. Contributions to this science objective and pathfinder telescopes are enabling detailed
are expected directly from the ThunderKAT imaging of neutral hydrogen with unprecedented
project on MeerKAT (Fender et al. 2016), with the depth and detail, laying the groundwork for the
MeerLICHT telescope, FAST, transient detection in SKA. From detecting galaxies in the early universe
the era of the Large Synoptic Survey Telescope in emission and absorption as proposed by
(LSST), uGMRT follow-up of LIGO events, and LADUMA (Blyth et al. 2016) and MALS (Gupta
more. et al. 2016) to investigating galaxy formation and
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 4 | 22RUSS TAYLOR et al. BRICSKA
evolution with MIGHTEE, with these projects on environment. Because of this, radio transients
the MeerKAT telescope, the evolution of neutral are invaluable probes for subjects as diverse
hydrogen in the Universe will be studied. The as stellar evolution, relativistic astrophysics
low-frequency uGMRT receivers will complement and cosmology. ThunderKAT is a MeerKAT large
these by enabling detections at even higher survey program to detect and study such
redshifts, and many new results are already phenomena using the high sensitivity and wide
being reported from systematic surveys of HI in area imaging capability of the MeerKAT. As well
emission and absorption, including some of the as performing targeted programs, ThunderKAT
furthest detections to date of HI in emission. will co-observe with other MeerKAT large survey
The science programs of FAST also include a projects and search their data for transients. The
large scale neutral hydrogen survey. Matching uGMRT also provides for significant capabilities
detected galaxies with objects and signals in for detections of transient phenomena in the
other wavelengths will enable new discoveries, Universe.
and new science. These ambitious research In addition, optical observations of
projects will help us understand how this neutral transients may lead to important discoveries in
gas eventually turns into stars, and even how other fields of astronomy: Near-Earth Objects,
supermassive black holes at the centres of nearby asteroids potentially dangerous for Earth;
galaxies are fueled. exoplanets; variability of super massive black
holes in galaxies; stellar flares; gravitational
The Transient Universe lenses; and probably several other yet-unknown
types of fast-changing phenomena that have not
One of the key capacities of the new generation
yet been observed.
of radio telescopes and of the SKA is the
To enable such studies, there is a need
ability to measure and detect transient objects.
for Big Data and Big Compute facilities within
While the universe may appear to evolve only
the BRICS group to fully exploit the data
over scales of billions of years, astrophysical
resources following from transient science
phenomena are all dynamic and some can occur
observations obtained with the facilities within
over incredibly short timescales. We have only
the BRICS countries. The BRICS member
gained an awareness of this as the fields of
countries’ participation as global leaders in
view of telescopes and the ability to observe on
astrophysical transient research is expected to
short time scales have improved with advances
accelerate greatly over the next decade with the
in technology. Time domain astronomy, where
development of new facilities. A cornerstone
repeated observations can be undertaken over
of this will be the emergence of the LSST as
a range of timescales (from sub-second to
a generator of transient discoveries. Two of the
years), is key to obtaining data of sufficient
BRICS countries (Brazil and South Africa) are now
cadence to unlock the nature of transient objects
actively participating in LSST transient science
and understand the dynamics of astrophysical
collaborations and more may still do so in the
events.
future.
Transient radio emission is associated
with essentially all explosive phenomena
The Continuum Universe
and high-energy astrophysics in the universe.
It acts as a locator for such events, and Radio continuum surveys were identified as
a measure of their feedback to the local a scientific priority for the SKA for several
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 5 | 22RUSS TAYLOR et al. BRICSKA
reasons, including galaxy evolution, magnetism, cosmology is entering a new era where the
transients, and more. Among those priorities large-scale structure in the universe can be
are the understanding of the star formation mapped in three dimensions. Moreover, science
history of the universe - to which MIGHTEE collaboration between FAST and MeerKAT on
and SuperMIGHTEE will contribute by studying the intensity mapping of neutral hydrogen
the evolution of star-forming galaxies, and of will contribute to a specific cosmological
active galactic nuclei (AGN), where supermassive measurement; that of the so-called Baryonic
black holes reside, and how this affects the Acoustic Oscillation (BAO). This is fundamental,
evolution of galaxies and their environment. as the large-scale structure of the universe and
Another important aspect is the potential how it came to be, of which the BAO is a keystone
for serendipitous discoveries. Unplanned measure, is one of the pillars of our current
discoveries often lead to fundamental insights understanding of cosmology.
into new physics. A technique called Very Long
Baseline Interferometry (VLBI) is key for this Large projects on next-generation BRICS
last point. The technique combines signals facilities
from individual radio observatories to form a
The scientific journey between now and SKA1
single, Earth-sized radio interferometer with an
will be charted through large science programs
angular resolution several orders of magnitude
with new SKA pathfinder and precursor facilities.
finer than offered by connected-element
These programs map directly onto the key
interferometers. Many important results have
science goals of the SKA, and prototype not only
resulted from the use of the VLBI technique,
the technical and scientific capabilities of the
a recent example being the imaging of a
SKA but also the growth of big data in astronomy
black hole by the Event Horizon Telescope
and the new modality of global collaboration
(Kohler 2019). One of the science goals of the
around very large projects generating vast
FAST telescope is provide the largest aperture
amounts of data. At the same time developments
in international cm-wave VLBI networks,
and new projects in optical and infrared
enabling unprecedented ultra-sensitive imaging.
astronomy such as the LSST are set to create a
Similarly, MeerKAT will add an extremely
rapid growth of multi-wavelength data sets.
sensitive antenna to global networks, paving the
The proposal team includes leaders of large
way towards SKA-VLBI (Paragi et al. 2015).
programs on SKA pathfinder facilities and in
multi-wavelength astronomy, most of which
Cosmology
involves collaborations among researchers in
The SKA precursor and pathfinder telescopes the BRICS partners. The data solutions to be
are making possible new types of observations; developed will thus advance key science projects
neutral hydrogen intensity mapping surveys that that address the fundamental questions driving
do not detect individual galaxies. Combined multi-national investment in megascience
with radio continuum surveys that detect radio projects in the coming decade.
galaxies out to very high redshift as proposed by This flagship program and the complementary
LADUMA and to a certain extent MIGHTEE, and flagship program on transients are seen as
some of the ongoing and proposed programs synergistic, where together they can both
with the uGMRT, and with LSST and optical advance the science and also the associated
observations able to measure cosmic distances, development and spin-off benefit goals. This
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 6 | 22RUSS TAYLOR et al. BRICSKA
Table I. Current and Planned Facilities providing big data challenges for BRICS Astronomy Research.
Telescope Location Wavelengths Data rate
SKA1-mid South Africa Radio: 350 MHz - 14 GHz 1000 TB/day
MeerKAT South Africa Radio: 580 MHz - 3.5 GHz 10-100 TB/day
FAST China Radio: 70 MHz - 3 GHz 690 TB/day
uGMRT India Radio: 0.1 - 1.45 GHz 10 TB/day
PRAO Russia Radio: 111 MHz 0.1 TB/day
AVN, VLBI Global Radio: 1 - 100 GHz 10 TB/day
LSST Chile Optical - near Infrared 20 TB/night
MeerLICHT South Africa Optical 0.1 TB/night
will leverage existing and planned new facilities radio frequencies in the GHz range. The MeerKAT
within the BRICS countries and will also draw science program includes large survey projects
on the opportunities presented by other space- that will occupy about 70% of the observing time.
and ground-based facilities that exist within Each program requires typically thousands of
the BRICS group. As shown in FIgure 1 the hours of observing and will, over the course of
new generation of BRICS facilities that drive 5 years, create Petabyte scale data sets. These
the data challenge in the next decade are data sets must be converted to science products
MeerKAT, the upgraded Giant Metrewave Radio and then visualised, analysed and mined for
Telescope, the Five-hundred-metre Aperture the answers to the scientific questions. Large
Spherical Telescope and the SKA1-mid. Parallel programs relevant to this proposal include:
developments in wide-field Very Long Baseline
The MeerKAT International GHz Tiered
Interferometry and mult-wavelength astronomy
Extragalactic Exploration (MIGHTEE) project
with the LSST provide complementary data
is being undertaken by an international
challenges that are critical to our overall science
collaboration of researchers. The MIGHTEE
objectives.
observations will provide radio continuum,
spectral line and polarisation information.
MeerKAT MIGHTEE, along with multi-wavelength
data, will allow a range of science to be
The MeerKAT telescope is the precursor
achieved, as listed above.
to the SKA mid-frequency array. MeerKAT
was inaugurated in July 2018, and science The LADUMA (Looking At the Distant
operations have begun. MeerKAT will operate Universe with the MeerKAT Array) project
as a stand-alone South African facility for will study evolution of the neutral hydrogen
approximately five years until its 64 antennas are gas in galaxies over cosmic time spanning
merged with 135 SKA antennas to form SKA1-mid. over two-thirds the age of the Universe. The
The large number of antennas, high sensitivity, survey strategy is to observe a single area
and wide-field of view of MeerKAT make it the on the sky rich in existing multi-wavelength
premier imaging telescope in the world for data. The primary science goals of LADUMA
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 7 | 22RUSS TAYLOR et al. BRICSKA
include investigating, as a function of directly and detect transients in other
environment and look-back time: the MeerKAT large survey projects. This
distribution of mass of neutral hydrogen commensal use of the other surveys means
of galaxies and how it depends on stars that the combined MeerKAT large survey
and dark matter in their environment projects will produce by far the largest
and the evolution of the cosmic neutral GHz-frequency radio transient program
gas density. LADUMA will be the deepest before the SKA.
HI survey before the SKA comes online,
The associated MeerLICHT optical
thereby acting as a pilot project for future
telescope will simultaneous take large
wider-field deep observations of neutral
field optical images for all night-time
hydrogen to be carried out with the SKA.
MeerKAT observations and create a large
The MeerKAT Absorption Line Survey (MALS) optical data set for unique and powerful
will carry out the most sensitive search of multi-wavelength studies of transient
HI and OH absorption lines at 0 < z < 2, phenomena.
the redshift range over which most of the
evolution in the star formation rate density The Upgraded Giant Metrewave Radio
takes place. The key science themes of Telescope
the survey are: (1) Evolution of atomic and
The National Centre for Radio Astrophysics of
molecular gas in galaxies and relationship
the Tata Institute for Fundamental Research
with star formation rate density, (2) Fuelling
operates the Giant Metrewave Radio Telescope
of active galactic nucleus (AGN), AGN
(GMRT) on the Deccan plateau north of Pune
feedback and dust-obscured AGNs, (3)
India. The Giant Metrewave Radio Telescope
Variation of fundamental constants of
(GMRT) consists of 30 antennas each of 45-meter
physics, (4) Evolution of magnetic fields
diameter, spread over a region of 25 kilometres.
in galaxies, and (5) Physical modeling of
It is the world’s largest and most powerful radio
the ISM, Astrochemistry and Cosmology.
interferometer dish array in its frequency range
Due to the excellent sensitivity of the
(100 to 1450 MHz). The GMRT has undergone a
MeerKAT telescope, MALS will also deliver
major upgrade with new receivers, control, data
an extremely sensitive HI 21-cm emission,
transmission and wide band correlator systems
radio continuum and polarization survey
that has substantially improved performance
to address a wide range of issues at the
and increased the data rate from the array by
forefront of galaxy evolution research.
over a factor of 10. The data rate can be as
ThunderKAT is MeerKAT large survey high as 100 TB/hour at the raw voltage recording
program to detect transient radio signals. mode. The upgraded GMRT is an SKA pathfinder
ThunderKAT comprises a comprehensive and has the capability to demonstrate SKA kind
and complementary program of surveying of science with unprecedented sensitivity at low
and monitoring both galactic (within the radio frequencies
Milky Way) and extragalactic transients There are several large programs underway
such as microquasars, supernovae with the uGMRT including search for pulsars and
and possibly yet unknown transient transients (both targeted and general surveys);
phenomena. ThunderKAT will both survey precision timing of millisecond pulsars with
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 8 | 22RUSS TAYLOR et al. BRICSKA
the aim to contribute to the global effort for Commission (NDRC) and managed by the
the detection of gravitational waves; detailed National Astronomical Observatories (NAOC)
imaging of selected target deep fields; detailed of the Chinese Academy of Sciences (CAS),
studies of galaxy clusters; large surveys in with the government of Guizhou province
the neutral hydrogen line – both in emission as a cooperation partner. Being the largest
and absorption; radio follow-up of supernovae, filled-aperture telescope worldwide located at
gamma-ray bursts, and also gravitational wave a radio quiet site, the FAST science impact on
detections, etc. astronomy will be extraordinary, and has the
With interferometer baselines almost four potential to revolutionise many other areas
times larger than those of the MeerKAT, the of the natural sciences. Compared with its
uGMRT offers similar imaging angular resolution precursor Arecibo, FAST has an advantage of
at as MeerKAT at longer wavelengths. Moreover, a factor of two in raw sensitivity and a factor
the increased bandwidth and receivers of the of five to ten in surveying speed. FAST will
uGMRT provide similar sensitivity to MeerKAT. also cover two to three times more sky area
The uGMRT and MeerKAT together thus have thanks to its innovative design of an active
tremendous synergy as complementary facilities primary surface. A science instrument with an
for the study of the deep radio sky over an ultra order of magnitude improvement in any of its
broad band, opening a powerful new approach capacities, which is also able to explore new
to studies of the distant universe. dimensions in parameter space, is likely to
The SuperMIGHTEE project forms part of a generate unexpected discoveries.
bilateral Indo-South African Flagship program FAST produced its first light in September
in Astronomy, for which we have developed 2016 and following a commissioning, phase
a technical and scientific collaboration to normal operation to commenced in late 2019.
expand MIGHTEE to a joint MeerKAT and uGMRT The science programs of FAST mainly include
project. The unique full spectral coverage and a large scale neutral hydrogen survey, pulsar
ultra-broad band nature of the superMIGHTEE observations, leading the international very
data set will make it the premier data on the GHz long baseline interferometry (VLBI) network,
radio properties of the deep radio sky, and will detection of interstellar molecules, detecting
not be surpassed until well into the SKA phase 1 interstellar communication signals, and pulsar
era. timing arrays. FAST is currently the largest single
The Hz-MALS project uses uGMRT to extend dish telescope, while MeerKAT is the largest
the redshift coverage of MALS to 2 < z < 5.2. This aperture synthesis array telescope.
redshift coverage is unique to uGMRT and the
Science collaboration between the two
outcomes from the survey make it highly relevant
could probe pulsars, exotic events, the low
to design future surveys with SKA.
HI surface brightness, low mass galaxies and
the HI intensity mapping for cosmological BAO
The Five-hundred-meter Aperture Spherical
measurement. The combined surveys may also
Telescope
allow us to study low-frequency gravitational
The Five-hundred-meter Aperture Spherical waves using the high precision pulsar timing,
Telescope (FAST) is a next-generation extremely massive black holes, interacting galaxies in
large single dish radio telescope. It is funded galaxy pairs and in groups of galaxies. FAST is
by the National Development and Reform facing a similar situation of big data processing.
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 9 | 22RUSS TAYLOR et al. BRICSKA
The required data transmission rate of FAST is at time domain astronomy by detecting millions of
least 6GB/s, which represents a big challenge for transient and variable phenomena each night.
data transfer and processing. LSST will be the next big data generator of optical
transients, estimated to be of order several
Very Long Baseline Interferometry million detections of new transients or variability
per night. Analysing this enormous quantity of
Among other things, VLBI enables the study of data to allow rapid follow-up of transients with
AGNs and black holes, magnetic fields, galaxy other telescopes will require significant compute
formation, or even cosmological dark energy infrastructure, as well as novel algorithms. The
models. In this new survey-driven era, it is vital parallel proposal for a dedicated BRICS-wide
that global VLBI networks continue to increase Flagship program on transients shares many
their sensitivity and observing capabilities to of the Big Data and Big Compute challenges
enable high resolution followup, discovery, and discussed in this proposal in relation to the
statistical analyses at radio wavelengths. This SKA. The parallel proposal aims to develop a
is of particular importance in the realization network of ground-based optical telescopes for
of SKA1-mid, at which point VLBI is in effect the follow-up of astrophysical multi-wavelength
merged into a standard connected-element and multi-messenger transient objects. The
radio interferometer. SKA1-mid will thus carry demands for data handling associated
out high resolution wide-field surveys by design. with transient studies amongst the BRICS
The increasing wide-field VLBI capability poses collaboration will be met by the implementation
significant big data and big compute challenges. of this Big Data Flagship program.
VLBI has been at the heart of the
SKA since its conceptualisation. One of the
PROJECT OBJECTIVES
key science objectives of FAST is to lead
the international VLBI network. Shanghai The objectives of the proposed research are
Astronomical Observatory, Chinese Academy of to develop and foster collaborations between
Sciences (SHAO) have been developing VLBI researchers among the five BRICS partner
techniques since 1972. The longest baseline countries to address the data challenges of the
of the Chinese VLBI Network is 3249 km. With coming era. Specifically we have the following
wide-field VLBI being so closely tied with goals:
developments towards the SKA, as well as VLBI
itself being inter-continental in nature, it fits very Establish a prototype network of federated
naturally into the goals and aspirations of this cloud-based infrastructure and e-science
BRICS proposal. tools to support the next-generation
data intensive radio astronomy to serve
Large Synoptic Survey Telescope our national research communities, and
exploit synergies between partners and
The full science goals of the SKA will only be opportunities for enhanced research
achieved in synergy with other instruments. The through joint big data astronomy science
LSST will create a data set of multi-wavelength programs.
information for a vast number of cosmic
radiation sources that can be mined jointly with Create new technologies and modalities
radio survey data. The LSST will also transform for visualisation and machine learning
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 10 | 22RUSS TAYLOR et al. BRICSKA
enhanced visual analytics of cloud based HPC clusters within clouds systems. Areas being
remote big data, as well as cloud-based worked on include:
tools for collaborative exploration by
Elastic scheduling and workflow tools for
distributed research teams.
OpenStack and CloudStack
Develop cloud-based provisioning of HPC Distributed role based authorization to
for automated processing workflows for resources.
use by science teams. Provisioning flexible
containerised analytics environments such Scheduling of data movement between
as Jupyter notebook allowing interactive sites.
engagement with the data for user Scheduling of compute jobs to sites.
applications. Such platform and systems
will empower research teams to work Use of containerisation to support
with big data. Such workflow systems will movement of applications and reproducible
address the challenges of reproducible science.
science with big data. Teams in South Africa (IDIA) and China
(NAOC, SHAO, Alibaba Cloud) are also working on
Develop cloud-enabled systems and
science gateway and portal technologies for user
environments for the fusion of
access to data products, to cloud processing and
multi-wavelength data sets such as
analytics.
the LSST with radio data, and for
multi-messenger and transient science,
Algorithms and Machine Learning for Big Data
including a cloud-based LSST data capacity
Processing and Analytics Perhaps the largest
in South Africa to host and analyse
area of collaboration lies in the development
LSST transient events. An objective here
of novel algorithms and HPC software for
is to develop and incorporate machine
processing and analysis of big (radio) astronomy
learning to analyse data and enable us
data, and we include the following subsections
to characterise, classify (and discover) the
for collaborative purposes:
unknown and prioritise rapid response
programs. processing pipelines for raw data to science
products, including development and
Develop scalable knowledge bases and
integration of workflow systems, research
data lakes to provide access to integrated
data management and reproducible
and correlated knowledge transforming
science technologies (South Africa: IDIA,
raw data into scientific knowledge on a
Rhodes University; India: NCRA-TIFR, IUCAA,
global scale.
IISER; Brazil: LNCC, LIneA; China: NAOC,
Tianjin University, Guangzhou University,
Data intensive cloud architectures: Teams Russia). We aim to leverage the scientific
within IDIA (South Africa) will work closely and technical skills from our partners to
with researchers based in India (NCRA-TIFR), further develop workflows and recipes for
China (Tianjin University, SHAO, Alibaba Cloud) calibration and imaging pipelines.
and Brazil (LNCC) to build open-source based In addition, we will test and develop cloud
architecture and tools to enable deployment of interfaces to allow for seamless interaction
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 11 | 22RUSS TAYLOR et al. BRICSKA
with radio astronomy software, with the Cloud-based Visual Analytics and Exploration of
aim to facilitate collaborative development Big Data Cloud-based visual analytics of remote
for pipeline design and data inspection. big data (South Africa: IDIA; China: SHAO, Alibaba
Cloud, India: IUCAA). This will expand on an
initial framework being developed at IDIA for
algorithms for mining big data for
low-latency cloud server-client visualization of
information, including machine learning
remote big data including 2D and 3d (virtual
(South Africa: IDIA, SARAO; Brazil: LNCC,
reality) interfaces. We will expand on a new hdf5
LIneA; India: NCRA-TIFR, IUCAA, SPPU;
image data set framework that is optimized for
China: NAOC, Tianjin University, Guangzhou
big data and develop cloud provisioned HPC
University, Russia). We will use the
server-side rendering technologies and tools
techniques developed by e.g. the Spitzer
that interface to multiple remote clients. We
Data Fusion and the HELP EC-FP7-SPACE
will design interfaces and tools sets for visual
projects to merge multi-wavelength
analytics of big data sets from the cloud from
surveys and develop machine learning
MeerKAT, uGMRT and FAST in response to use
tools to best exploit the wealth of
cases that meet the needs of the BRICS data
multi-wavelength data available in
intensive programs.
upcoming radio deep survey fields and
enable new discovery space.
HCD and Development
Human capital development (HCD) is a key
data systems and distributed knowledge aspect of the project. An integral part of the
bases for management for fusion and new approach to science driven by big data, big
mining of diverse data sets. This work compute and large international collaborations
will be lead by LNCC advancing their is to create and embed the platforms within the
developments for LSST. research communities so that researchers may
work collaboratively on these large, diverse data
The novel processing challenges and sets. Such platforms are best developed through
extraordinary image sizes emerging from an extensive, international program such as
the field of wide-field VLBI presents a the one proposed here: it promotes sustained
new type of problem that foreshadows collaboration between partners from the BRICS
the SKA (South Africa: IDIA, SARAO; China: countries, and it prevents duplication of effort
SHAO, FAST; India, Russia). We will interface in developing such platforms in individual
disparate pieces of custom-built software countries.
into a more general, multi-purpose VLBI The project will develop the required
survey toolkit. A component that will training resources that can run in the cloud
require new development, based on an and give access to large data sets. The
existing pilot project, will be the area HCD and education platform development
of VLBI source-finding. At present, this and maintenance needs to be supported
does not include Bayesian inference by staff members embedded in the partner
and machine-learning that are relatively organisations that develop and maintain such
straightforward extensions of the pilot infrastructure for research. Material for running
source-finding project. workshops will be made openly available, and
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 12 | 22RUSS TAYLOR et al. BRICSKA
referenced extensively through the proposal exploiting the pivotal role of ICT and harnessing
participants’ global networks. the potential of big data; exploiting the full
By developing a dedicated platform to run potential of scientific knowledge and improving
training workshops, data science hackathons the performance of historically disadvantaged
and research schools that operates in the universities; focusing on inter-disciplinary
cloud, the HCD and skills development efforts research; expanding research infrastructure,
of this project will be greatly enhanced. including cyber-infrastructure; and exploiting
It will be possible to multiply this model the potential of STI for African development and
across BRICS countries, in smaller institutions continental integration.
as described above, and beyond. Relevant This research project is closely aligned with
experience of proposal participants includes the SA National Strategy for Multiwavelength
the support by IDIA of DARA Big Data Astronomy, building on the large investment
Africa research schools and hackathons South Africa has made in radio astronomy
(https://idia.ac.za/BigDataAfrica-2018/), the (through MeerKAT and the SKA), as well as at
Astronomy Data Science Toolkit developed by optical wavelengths (SALT, LSST). This proposal
the IAU OAD (http://datascience.astro4dev.org), seeks to build capacity in the development
and the expertise of our Chinese collaborators of solutions in processing, analysing, mining,
(NAOC, Alibaba Cloud). visualising and maximising the science coming
from increasing large, rich and complex
ALIGNMENT WITH NATIONAL STRATEGIES multi-wavelength data sets. By bringing the LSST
data to the SKA-mid at an SA LSST data centre,
South Africa we will enable multi-wavelength studies and
The South African government White Paper increase the impact of SKA science. In addition,
on Science Technology and Innovation (STI) ensuring access to the LSST alert stream will
envisions STI enabling inclusive and sustainable allow the full use of the rich range of telescopes
South African development in a changing within South Africa for transient follow-up.
world. This proposal addresses two of the Development of the hardware, software and
three high-level goals spelled out in the human capacity within the field of VLBI is a key
White Paper, namely to take advantage of part of SARAO and DST?s strategy to develop
opportunities presented by megatrends and the African VLBI Network and the expertise
technological change, and to contribute to within the SKA African partner countries to
a more inclusive economy at all levels. The become world-class users of this instrument.
grand data challenge in astronomy is part of Furthermore, this is directly aligned with the
the magatrend characterised as the fourth strategic goal of the AVN to form the backbone
industrial revolution, and the integrated HCD of the SKA Phase 2 to be built across the African
strategy is specifically aimed at making the continent.
development of skills high in demand more This project aligns with the National
inclusive, and to facilitate the translation Strategy for Human Capacity Development
of highly skilled young scientists into the for Research, Innovation and Scholarships
economy. Recognising that STI can contribute through the proposed collaborations on
to the Sustainable Development Goals, the globally competitive research and innovation.
White Paper highlights the importance of: These strategic international engagements will
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 13 | 22RUSS TAYLOR et al. BRICSKA
enhance the research and technological capacity scientific portal for the publication of DES data
for innovation within South Africa, as well as and the execution of scientific pipelines. Brazil
ensure the establishment and maintenance of is also participating in the LSST project, through
research infrastructure and platforms. In this the Brazilian Participation group (BPG), led by
way, the project is also aligned to the Strategic LIneA and the National Laboratory of Astronomy
Research Infrastructure Roadmap. (LNA). Through these different initiatives, Brazil
This project helps advance the SA National is contributing to the development of astronomy
Integrated Cyberinfrastructure System (NICIS). in South America through the development
It leverages the investment in the DIRISA of Big Data software and the availability of
Tier 2 data intensive research facility to HPC and network infrastructure. Moreover, it is
work with international partners on data investing in the education of multi-disciplinary
intensive challenges that will be applicable human resources founded in basic science and
to cyberinfrastructure solutions in SA for computer science.
astronomy and other disciplines such as
bioinformatics. China
China is one of the prominent participants in
Brazil
SKA. SKA global collaboration is an important
Brazil is engaged in establishing data centers in task for building a community of common
support to eScience. The initiative includes HPC destiny for mankind, as proposed by the
infrastructures, with the prominent availability Chinese government. The Chinese government
of the Santos Dumont supercomputer services put large investment in technology research,
available for the whole scientific community data center and human capital development.
and currently hosting more than 100 scientific Chinese scientists have been working closely
projects. In a complementary way, the RNP with members from other countries preparing
(National Research Network) is fostering the for the technology and science using SKA data,
improvement of links within Brazil and with and are involved in 13 associated science working
Africa, Europe and the US. In particular the groups and focus groups of SKA. The China SKA
Monet cable has been installed linking Chile to science team work together with the information,
Miami, and passing by São Paulo enabling the communication and computer industries aiming
transfer of data produced by the LSST project. at tackling the challenges associated with the
Additionally, the SAIL and SACS cables now link SKA big data, which will not only promote major
Brazil to Africa through Angola and Cameroon, original scientific discoveries, but also apply
respectively, improving the connection with the the obtained technological achievements for
African continent countries, including South stimulating the national economy. This project
Africa. Brazil is intensively involved in important is perfectly in step with the goals of China SKA
astronomy surveys. The LIneA laboratory hosts a team.
tertiary site of the Sloan SDSS III collaboration Exploring the synergy between FAST and the
providing a replica of the SDSS portal with MeerKAT/SKA projects is part of the research
access to published data from the Latin America content of this project, and will open up new
Community. Brazil additionally, supports the research areas for scientific users of both
Dark Energy Survey (DES) project, through FAST and SKA data. With the improvements in
the DES-Brazil consortium, making available a sensitivity and resolution by the combination
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 14 | 22RUSS TAYLOR et al. BRICSKA
of FAST and MeerKAT, the survey area of India is a full partner in the SKA, and
their common sky is still quite large. The NCRA-TIFR is the nodal institute leading the
science cooperation between the largest single Indian participation in the SKA. Indian scientists
dish telescope, FAST, and the largest aperture and engineers lead by NCRA, with significant
synthesis array telescope, MeerKAT/SKA, in the participation from Indian industry, are involved
world, will result in detection of hundreds or in several work packages of SKA. In particular,
even thousands of new pulsars and some exotic NCRA is leading the Telescope Manager (TM) work
events, as well as allowing the study in details package, which is the brain and nervous system
of the exotic events and the HI gas in nearby of the SKA. In addition, researchers from India
galaxies. This is also highly consistent with the are involved in significant roles in projects on
scientific goal of FAST. SKA precursor facilities like MALS on MeerKAT
National Astronomical Data Center (NADC) of in South Africa and Solar science on MWA in
China is recently established based in National Australia, many of which involve the need to
Astronomical Observatories, CAS (NAOC) on the handle large volumes of data. India also plans
purpose of managing astronomical data at the to set-up a SKA regional data centre to cater
national level and providing data services and to the specific needs of storing SKA data and
technologies for the whole life cycle of data developing specialised analysis pipelines that
management, including outreach and education. are of benefit to Indian researchers involved with
This project works on cloud-based computing key science projects of the SKA.
environments and scientific platforms for data Finally, much of the above activity envisaged
users, and on data techniques including machine in India is being carried out in collaboration
learning, visualisation and data mining, which is with industry partners, many of whom bring new
aligned with NADC’s goal. ideas to the table and are willing to (and are
well equipped to) take on the challenges of big
India data analytics that are entailed in these ongoing
The Giant Metrewave Radio Telescope is a and planned activities. This includes the exciting
SKA pathfinder facility. The recently completed new field of machine learning which is finding
upgrade of the GMRT has further enhanced increased role in the plans of researchers from
the front-line capabilities of the GMRT, and India.
several exciting new discoveries and results Given the above, this project is very strongly
are being produced. It has also resulted in aligned with both institutional and national
an order of magnitude increase in the data level strategies and plans. Indian researchers
rates from the observatory, making it imperative and institutions will be able to contribute
to adopt big data intensive techniques, and meaningfully to this initiative, and also stand to
improved, efficient data analysis pipelines to benefit significantly in their efforts on existing
process the data. In order to maximise the and planned Indian and other BRICS member
science returns from the uGMRT, NCRA plans to facilities. It will also allow Indian industry to
implement imaging pipelines to create science engage and grow in these high-tech areas
ready products for the benefit of users who do with a view to becoming a major player in
not have the resource to handle large data. There the global arena. The goals and activities
is significant scope for new developments, and of this collaborative project also align well
these are on the road-map going forward. with India’s plans for effective participation in
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 15 | 22RUSS TAYLOR et al. BRICSKA
the construction phase of the SKA, including size and scale. The development of the research
expediting the work on setting up of a SKA infrastructure and algorithms needed to tackle
regional data centre in India, and enhancing the big data and big compute challenges
meaningful partnership with BRICS members addressed in this proposal lends itself to joint
participating in the SKA. research and development projects with industry
for the local, regional and global markets. A
Russia multi country collaboration, involving industry
partners is critically needed for this activity.
In the future of multi-wavelength, multi-messenger
A successful collaboration will lead to the
astrophysics, it is important to develop
development of human and technical capacity
the ability to organise timely, coordinated
within academia and industry in all BRICS
observations across multiple facilities with ever
countries. For example, India has an ongoing
increasing volumes of observational data, as well
engagement with industry in several areas.
as the technical capabilities and expertise to
NCRA is already in collaboration with some
efficiently sort, search and analyze large sets of
industry partners like Thoughtworks India Ltd,
heterogeneous observations. Russia operates,
Persistent Systems Ltd and TCS Research on
or is involved with, many observational facilities,
handling large data, machine learning and other
including numerous ground-based optical and
technical aspects with regard to SKA regional
radio telescopes, the BDUNT and BNO neutrino
data centers. In Brazil Petrobras and DELL Brazil
telescopes, the Russian-German Spektr-RG
have expressed interest in the project. In South
space-based X-ray observatory, two international
Africa, there are ongoing conversations with
networks of optical telescopes (MASTER and
IBM Research - Africa and SAC, a local Space
ISON), and the proposed BRICS-based network
Engineering company. Underpinning the growth
of optical telescopes. Besides the technical
of industrial and economic activity is also the
requirements outlined above, the fusion of
need for trained talent. The inspirational aspect
these many astrophysical data sets also requires
of astronomy attracts young people to science
intensive training of new specialists and human
and technology fields but not all will become
capital development, with a particular focus
astronomers. Engaging in those studies, however,
on the role of exchange between the BRICS
leads to the acquisition of skills that are in high
countries. Besides direct application to the field
demand in industry. This flagship will play a
of astrophysics, the knowledge and techniques
key role in making that talent available for the
developed throughout the course of the project
growth of industry in the BRICS nations.
are a high priority for other fields faced with
similar challenges, including geophysics and
Revealing and developing untapped talent
climate research, telecommunications, energy
efficiency, material sciences and medicine, BRICS countries are characterised by a growing
among others. young population hungry for opportunities. HCD
will take place through graduate study support
BENEFITS BEYOND ASTRONOMY in the form of scholarships and participation in
the research and innovation that this flagship
Economic impact: collaborations with industry
will drive among other things. As part of their
Building and managing a SKA regional centre is training, young researchers supported by this
a formidable challenge due to its unprecedented network will learn communication skills and the
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 16 | 22RUSS TAYLOR et al. BRICSKA
international aspect of the project will equip The inclusive participation of all research
them with multicultural experiences critical to institutions in the BRICS countries needs to
being competitive in a global workforce. The joint be promoted. The international nature of this
experience of the proposal partners ensures a flagship guarantees that such efforts are not in
successful implementation of this approach to isolation and offer collaboration opportunities
HCD. previously not easily accessible for those
The benefits of an associated HCD program institutions. In South Africa for example, this
will be felt wider than astronomy and computing. means access for historically disadvantaged
Students and early career researchers have the institutions with the support of well established
technical and problem-solving skills that are institutions. In China, it may mean that smaller
vital to many sectors of a knowledge-driven universities in that vast country can join the
economy, and that are essential to using big endeavour. The experience of the IAU Office of
data to address some of the most pressing Astronomy for Development (OAD), the South
development issues. The exponential growth African Radio Astronomy Observatory (SARAO)
in data science and technology careers and the Inter-University Institute for Data
internationally means that research and Intensive Astronomy (IDIA) will ensure effective
collaboration in the areas of cloud-based and inclusion of smaller institutions.
high performance computing have high potential
for impact on human capital development. This Embedding young researchers in global
needs to be proactively nurtured as proposed in networks
this flagship project. Young researchers engaged in this broad
collaboration among leading institutions in
Access to research for smaller institutions BRICS partner countries will experience exciting
career opportunities. The will enhance the BRICS
In BRICS countries, many smaller research context as sources of scientific and technical
institutions and universities are not currently talent, and will strengthen the BRICS countries’
competitive on a global scale of scientific leadership in global science through investment
research. Many new researchers and students in strategic areas where BRICS displays particular
will have access to big data and big strength, such as radio astronomy. By developing
compute, through the gateway technologies and big data and big compute expertise and ensuring
cloud-based toolkit we propose to develop in participation of early-career researchers in this
this proposal. For example, researchers based flagship, this investment is guaranteed to pay
at the IAU/OAD, University of Zululand and IDIA off considering the roles those play in the
(South Africa) will collaborate with researchers development of economies in the context of the
in China (NAOC), India (NCRA-TIFR, IUCAA) and 4th industrial revolution.
Brazil (LNCC). Access will be established through Enabling young scientists to work
training workshops at institutions and through on interdisciplinary projects on research
the support of participation of students from infrastructure is an important element to
those institutions in research activities of facilitate the beneficial dissemination of trained
the project. In addition to this, it offers the big data and big compute scientists across other
opportunity to train IT staff and researchers in sectors and raises the participants’ awareness of
new technology. how they can contribute to development. Such
An Acad Bras Cienc (2021) 93(Suppl. 1) e20201027 17 | 22You can also read