About

We are an international network of researchers from Carnegie Mellon University, Laboratoire de Physique de Clermont Auvergne, Space Research Institute, Sternberg Astronomical Institute, and University of Surrey, joined together to solve the problem of detecting unusual objects in astronomical databases with machine learning methods. We are committed to applying our skills and knowledge to new fascinating problems in astrophysics. We are open to international exchanges and collaborations. We also welcome students, postgraduates, and researchers willing to discuss the future of machine learning applications in astronomical discovery.


Latest news

October 18, 2024

Congratulations to Dr. Etienne Russeil on an excellent defense

September 17, 2024

Real-bogus scores for active anomaly detection by Timofey Semenikhin et al. is now available on arXiv

August 14, 2023

First results from the SNAD-VI Workshop are now published in RNAAS

February 28, 2023

SNAD viewer paper is now published in PASP

July 18, 2022

Gaia DR3 light curves are now available in the SNAD viewer

June 16, 2022

Could SNAD160 be a Pair-instability Supernova?

Project

The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility of relying on human scanning to identify unusual astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high-resolution spectroscopic observations. The goal of this project is to develop a pipeline where human expertise and modern machine learning techniques can complement each other in the task of identifying unusual astronomical objects. Our approach consists of two steps. The first is data preparation and the search for the machine learning algorithm best suited to identifying outliers in the data set. In the second step, we analyse all the outliers found, using information from other wavelength ranges, additional observations, and theoretical modelling. The strategy can be applied to a large data set for which we only have photometric observations. Enabling reliable anomaly/outlier detection based solely on photometric observations is one of the fundamental puzzles to be solved before we can convert the full potential of large-scale surveys into scientific results. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.

Products

SNAD discoveries at TNS

SNAD ZTF DR viewer

SNAD catalog

SNAD catalog of artefacts

Publications:

Machine Learning Analysis of Supernova Light Curves

Pruzhinskaya M.V., Malanchev K., Kornilov M. et al., 2019, Proceedings of Science, 342, id. 51, DOI:10.22323/1.342.0051

Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility to rely on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high resolution spectroscopic observations. We introduce an analysis of anomaly detection in the Open Supernova Catalog (http://sne.space/) with the use of machine learning. We developed a strategy and pipeline in which anomalous objects are identified and then submitted to careful individual analysis. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.

Keywords. Supernovae: general, anomaly detection

Figure. Light curves of the binary microlensing event Gaia16aye, a case of misclassification in the Open Supernova Catalog. Solid lines are the results of our approximation by Multivariate Gaussian Process.

Machine learning techniques for analysis of photometric data from the Open Supernova catalog

Kornilov M.V., Pruzhinskaya M.V., Malanchev K.L. et al., 2019, Proceedings of the International Conference "The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves", Publishing house SNEG Pyatigorsk, pp. 100-110, DOI:10.26119/SAO.2019.1.35517

Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility to rely on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high resolution spectroscopic observations. The goal of this project is to detect the anomalies in the Open Supernova Catalog with the use of machine learning. We will develop a pipeline where human expertise and modern machine learning techniques can complement each other. Using supernovae as a case study, our proposal is divided in two parts: a first phase developing a strategy and pipeline where anomalous objects are identified, and a second phase where such anomalous objects are submitted to careful individual analysis. The strategy requires an initial data set for which spectroscopy is available for training purposes, but can be applied to a much larger data set for which we only have photometric observations. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.

Keywords. Machine learning techniques, supernovae, astronomical surveys

Figure. Light curves of SN 2016bln that belongs to rare sub-type Ia-91T.

Use of Machine Learning for Anomaly Detection Problem in Large Astronomical Databases

Malanchev K., Volnova A., Kornilov M. et al., 2019, CEUR Workshop Proceedings, 21st International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2019, 2523, pp. 205-216

Abstract. In this work, we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such study is justified by the presence of a large amount of astronomical data that cannot be processed only by human resource. We focus our attention on finding anomalous light curves in the Open Supernova Catalog. Few types of anomalies are considered: the artifacts in the data, the cases of misclassification and the presence of previously unclassified objects. On a dataset of ~2000 supernova (SN) candidates, we found several interesting anomalies: one active galactic nucleus (SN2006kg), one binary microlensing event (Gaia16aye), representatives of rare classes of SNe such as super-luminous supernovae, and highly reddened objects.

Keywords. Gaussian processes, isolation forest, machine learning, supernovae, transients

Figure. Three-dimensional t-SNE reduced data after application of the isolation forest algorithm. Each point represents a supernova light curve from the data set projected into the three-dimensional space with the coordinates (x1, x2, x3). The intensity of the colour indicates the anomaly score for each object as estimated by the isolation forest algorithm: bluer colour means more anomalous light curve behaviour.

Anomaly Detection in the Open Supernova Catalog

Pruzhinskaya M., Malanchev K., Kornilov M., Ishida E., Mondon F., Volnova A., Korolev V., 2019, MNRAS, 489, 3, pp. 3591-3608, DOI:10.1093/mnras/stz236

Abstract. In the upcoming decade, large astronomical surveys will discover millions of transients raising unprecedented data challenges in the process. Only the use of the machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to the known classes of astronomical objects. However, it is expected that some transients will be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis in the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large-scale surveys. The analysis consists of the following steps: (1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, (2) dimensionality reduction, (3) searching for outliers with the use of the isolation forest algorithm, and (4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were confirmed to be from astrophysically peculiar objects. Found anomalies correspond to a selected sample of 1.4 per cent of the initial automatically identified data sample of approximately 2000 objects. Among the identified outliers we recognized superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.

Keywords. Methods: data analysis, catalogues, supernovae: general

Figure. Light curves of superluminous supernova PTF10aagc. Solid lines are the results of our approximation by Multivariate Gaussian Process.
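
The outlier-search step (3) of the pipeline described in the abstract above can be illustrated with a toy isolation forest. This is a minimal from-scratch sketch using only the Python standard library, not the code used in the paper: the function names, the three-dimensional synthetic "features", and the parameter defaults are all assumptions made for illustration. In practice one would start from real light-curve features (after the Gaussian-process approximation and dimensionality reduction) and would more likely use an established implementation such as scikit-learn's IsolationForest.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which the point lands in a leaf, corrected for the leaf size."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(child, point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Isolation-forest score in (0, 1); values closer to 1 are more anomalous.

    Assumes at least two points.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    forest = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
              for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    scores = []
    for point in points:
        mean_depth = sum(path_length(tree, point) for tree in forest) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Synthetic stand-in for light-curve features: a Gaussian cloud plus one distant object.
data_rng = random.Random(42)
features = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(128)]
features.append((10.0, 10.0, 10.0))  # hypothetical anomalous object
outlier_index = len(features) - 1

scores = anomaly_scores(features)
ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
print("most anomalous index:", ranked[0])
```

Objects that are isolated after only a few random splits receive the highest scores; in the pipeline above, the top-ranked objects are the ones passed on to expert analysis in step (4).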

Active Anomaly Detection for time-domain discoveries

Ishida Emille E. O., Kornilov Matwey V., Malanchev Konstantin L., Pruzhinskaya Maria V., Volnova Alina A., Korolev Vladimir S., Mondon Florian, Sreejith Sreevarsha, Malancheva Anastasia, Das Shubhomoy, 2021, Astronomy & Astrophysics 650, A195, DOI:10.1051/0004-6361/202037709

Abstract. We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to 2 data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ~2% highest anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.

Keywords. Methods: data analysis, supernovae: general, stars: variables: general

Figure. Fraction of anomalies as a function of the total number of candidates for the simulated PLAsTiCC data set. The full line represents the mean and shaded regions mark 5-95 percentiles of results obtained from 2000 realizations with different random seeds.

Realization of Different Techniques for Anomaly Detection in Astronomical Databases

Konstantin Malanchev, Vladimir Korolev, Matwey Kornilov et al., 2020, In: Elizarov A., Novikov B., Stupnikov S. (eds), Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2019. Communications in Computer and Information Science, vol 1223. Springer, Cham, pp. 97-107, DOI:10.1007/978-3-030-51913-1_7

Abstract. In this work we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such study is justified by the existence of a large amount of astronomical data that can not be processed only by human resource. We evaluated five anomaly detection algorithms to find anomalies in the light curve data of the Open Supernova Catalog. Comparison of the algorithms revealed that expert supervised active anomaly detection method shows the best performance, while among purely unsupervised techniques Gaussian mixture model and one-class support vector machine methods outperform isolation forest and local outlier factor methods.

Keywords. Active anomaly detection, isolation forest, local outlier factor, machine learning, one-class SVM, supernovae

Figure. Fraction of anomalies as a function of the total number of candidates (outliers) scrutinised by the expert.

The Most Interesting Anomalies Discovered in ZTF DR3 from the SNAD-III Workshop

Patrick D. Aleo, Emille E. O. Ishida, Matwey Kornilov et al., 2020, Res. Notes AAS, 4, 112, DOI:10.3847/2515-5172/aba6e8

Abstract. The search for objects with unusual astronomical properties, or anomalies, is one of the most anticipated results to be delivered by the next generation of large scale astronomical surveys. Moreover, given the volume and complexity of current data sets, machine learning algorithms will undoubtedly play an important role in this endeavor. The SNAD team is specialized in the development, adaptation and improvement of such techniques with the goal of constructing optimal anomaly detection strategies for astronomy. We present here the preliminary results from the third annual SNAD workshop (https://snad.space/2020/) that was held on-line in 2020 July.

Keywords. Supernovae, transient detection, AGN host galaxies, outlier detection

Figure. The light curves of PSO J011.0457+41.5548 — luminous blue variable candidate from the M31 field.

Anomaly detection in the Zwicky Transient Facility DR3

Malanchev K. L., Pruzhinskaya M. V., Korolev V. S., Aleo P. D., Kornilov M. V., Ishida E. E. O., Krushinsky V. V., Mondon F., Sreejith S., Volnova A. A., Belinski A. A., Dodin A. V., Tatarnikov A. M., Zheltoukhov S. G., 2021, MNRAS, 502, 4, pp. 5147-5175, DOI:10.1093/mnras/stab316

Abstract. We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search of outliers with machine learning algorithms and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of 4 automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinised by an expert. From these, 188 (68%) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlapping between a star and a known asteroid), 66 (24%) were previously reported sources, whereas 23 (8%) correspond to non-catalogued objects, with the two latter cases of potential scientific interest (e.g. 1 spectroscopically confirmed RS Canum Venaticorum star, 4 supernovae candidates, 1 red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation which can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available at https://github.com/snad-space/zwad.

Keywords. Methods: data analysis, stars: variables: general, transients: supernovae, astronomical data bases: miscellaneous

Figure. The light curves of AT 2017ixs — nova candidate from the M31 field that behaves unusually for its suspected astrophysical type.

SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees

Aleo P. D., Malanchev K. L., Pruzhinskaya M. V., Ishida E. E. O., E. Russeil, Kornilov M. V., Korolev V. S., Sreejith S., Volnova A. A., G. S. Narayan, 2022, New Astronomy, Vol. 96, DOI:10.1016/j.newast.2022.101846

Abstract. We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 92 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.

Keywords. Transient sources (1851), Time domain astronomy (2109), Supernovae (1668), Active galactic nuclei (16)

Figure. SNAD150 light curves in zr- and zg-bands within the first 420 days of ZTF DR4, generated with the ZTF SNAD viewer.
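
The nearest-neighbour search described in the abstract above (features from a simulated light curve used as a query against features of real objects) can be sketched with a small k-D tree. This is an illustrative standard-library implementation, not the SNAD Transient Miner code: the three-dimensional random "catalogue", the `template` query point, and the function names are hypothetical, and only the choice of 15 neighbours echoes the abstract. A production search over roughly a million objects and 41 features would more likely rely on an optimized library implementation.

```python
import heapq
import math
import random


def build_kdtree(points, depth=0):
    """Build a k-D tree as nested tuples: (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def k_nearest(tree, query, k):
    """Return the k points closest to the query (Euclidean distance), nearest first."""
    heap = []  # max-heap emulated with negated distances: (-distance, point)

    def visit(node, depth):
        if node is None:
            return
        point, left, right = node
        dist = math.dist(point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, point))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, point))
        axis = depth % len(query)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near, depth + 1)
        # Descend into the far side only if the splitting plane is closer
        # than the current k-th neighbour (or we still have fewer than k).
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far, depth + 1)

    visit(tree, 0)
    return [point for _, point in sorted(heap, key=lambda item: -item[0])]


# Hypothetical feature vectors standing in for real catalogue objects.
rng = random.Random(0)
catalogue = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
template = (0.5, 0.5, 0.5)  # features of a hypothetical simulated light curve

tree = build_kdtree(catalogue)
neighbours = k_nearest(tree, template, k=15)
print("closest catalogue object:", neighbours[0])
```

The returned neighbours are the candidates that would then be inspected visually; the tree only narrows down where to look.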

Could SNAD160 be a Pair-instability Supernova?

Maria V. Pruzhinskaya, Alina Volnova, Matwey Kornilov et al., 2022, Res. Notes AAS, 6, 122, DOI:10.3847/2515-5172/ac76cf

Abstract. The SNAD team reports the discovery of SNAD160 (AT2018lzi) within the Zwicky Transient Facility third data release. The transient has been found using the active anomaly detection algorithm, an adaptive learning strategy aimed at incorporating expert knowledge into machine learning models. Our preliminary analysis shows that SNAD160 could be a superluminous supernova powered by a pair-instability mechanism — its light curve behavior is consistent with the observed slow rise and slow decay expected from these events.

Keywords. Supernovae; Transient detection; Transient sources; Light curve classification; Time series analysis; Astronomical object identification

Figure. SNAD160 zr-light curve (gray circles) – compared to the normal SN Ia Nugent's model (solid blue line), normal SN IIP 1999em model (solid red line) from Vincenzi et al. (2019) in zr-band, and synthetic R-band light curves of bright PISN models – He130 (green line) and R250 (purple line) from Kasen et al. (2011) at z=0.3 (dashed) and z=0.4 (solid). All models are shifted to the observer frame.

Supernova search with active learning in ZTF DR3

Maria V. Pruzhinskaya, Emille E. O. Ishida, Alexandra K. Novinskaya, Etienne Russeil, Alina A. Volnova, Konstantin L. Malanchev, Matwey V. Kornilov, Patrick D. Aleo, Vladimir S. Korolev, Vadim V. Krushinsky, Sreevarsha Sreejith, Emmanuel Gangler, 2023, A&A, Volume 672, A111, DOI: 10.1051/0004-6361/202245172

Abstract. In order to explore the potential of adaptive learning techniques for big data sets, the SNAD team used Active Anomaly Discovery (AAD) as a tool to search for new supernova (SN) candidates in the photometric data from the first 9.4 months of the Zwicky Transient Facility survey, between 2018 March 17 and December 31 (58194 < MJD < 58483). We analysed 70 ZTF fields with high galactic latitude and visually inspected 2100 outliers. This resulted in 104 supernova-like objects found, 57 of which were reported to the Transient Name Server for the first time and 47 of which were previously mentioned in other catalogues either as supernovae with known types or as supernova candidates. We visually inspected the multi-colour light curves of the non-catalogued transients and fitted them with different supernova models to assign each to a proper class: Ia, Ib/c, IIP, IIL, IIn. Moreover, we also identified unreported slow-evolving transients which are good superluminous SN candidates, and a few other non-catalogued objects, such as red dwarf flares and active galactic nuclei. Beyond confirming the effectiveness of human-machine integration underlying the AAD strategy, our results shed light on potential leaks in currently available pipelines and can help avoid similar losses in future large scale astronomical surveys. The algorithm enables directed search of any type of data and definition of anomaly chosen by the expert.

Keywords. supernovae: general; transients: supernovae; methods: data analysis

Figure. Light curve fit of SNAD137 by Nugent’s Type IIn supernova model. Observational data correspond to OIDs: 825102200009050 (zg), 825202200039582 (zr), 825302200018371 (zi).

The SNAD Viewer: Everything You Want to Know about Your Favorite ZTF Object

Konstantin Malanchev, Matwey V. Kornilov, Maria V. Pruzhinskaya, Emille E. O. Ishida, Patrick D. Aleo, Vladimir S. Korolev, Anastasia Lavrukhina, Etienne Russeil, Sreevarsha Sreejith, Alina A. Volnova, Anastasiya Voloshina, Alberto Krone-Martins, 2023, PASP, Volume 135, Number 1044, DOI: 10.1088/1538-3873/acb292

Abstract. We describe the SNAD Viewer, a web portal for astronomers which presents a centralized view of individual objects from the Zwicky Transient Facility's (ZTF) data releases, including data gathered from multiple publicly available astronomical archives and data sources. Initially built to enable efficient expert feedback in the context of adaptive machine learning applications, it has evolved into a full-fledged community asset that centralizes public information and provides a multi-dimensional view of ZTF sources. For users, we provide detailed descriptions of the data sources and choices underlying the information displayed in the portal. For developers, we describe our architectural choices and their consequences such that our experience can help others engaged in similar endeavors or in adapting our publicly released code to their requirements. The infrastructure we describe here is scalable and flexible and can be personalized and used by other surveys and for other science goals. The Viewer has been instrumental in highlighting the crucial roles domain experts retain in the era of big data in astronomy. Given the arrival of the upcoming generation of large-scale surveys, we believe similar systems will be paramount in enabling the materialization of scientific potential enclosed in current terabyte and future petabyte-scale data sets. The Viewer is publicly available online at https://ztf.snad.space

Keywords. Astronomy web services – Astronomy software

Figure. Diagram of the service infrastructure. Dashed rectangles represent individual services, circles are web modules and complimentary scripts, cylinders are database management systems, and the display is the Portal module. Parallelograms are external services used by the Viewer. Lines show data flow, double circles mark data receivers. Arrows show data exchange between modules of a single service.

RAINBOW: A colorful approach to multipassband light-curve estimation

Russeil E., Malanchev K. L., Aleo P. D., Ishida E. E. O., Pruzhinskaya M. V., Gangler E., Lavrukhina A. D., Volnova A. A., Voloshina A., Semenikhin T., Sreejith S., Kornilov M. V., Korolev V. S., 2024, A&A, Volume 683, id.A251, 13 pp., DOI: 10.1051/0004-6361/202348158

Abstract. Context. Time series generated by repeatedly observing astronomical transients are generally sparse, irregularly sampled, noisy, and multidimensional (obtained through a set of broad-band filters). In order to fully exploit their scientific potential, it is necessary to use this incomplete information to estimate a continuous light-curve behavior. Traditional approaches use ad hoc functional forms to approximate the light curve in each filter independently (hereafter, the MONOCHROMATIC method). Aims: We present RAINBOW, a physically motivated framework that enables simultaneous multiband light-curve fitting. It allows the user to construct a 2D continuous surface across wavelength and time, even when the number of observations in each filter is significantly limited. Methods: Assuming the electromagnetic radiation emission from the transient can be approximated by a blackbody, we combined an expected temperature evolution and a parametric function describing its bolometric light curve. These three ingredients allow the information available in one passband to guide the reconstruction in the others, thus enabling a proper use of multisurvey data. We demonstrate the effectiveness of our method by applying it to simulated data from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) as well as to real data from the Young Supernova Experiment (YSE DR1). Results: We evaluate the quality of the estimated light curves according to three different tests: goodness of fit, peak-time prediction, and ability to transfer information to machine-learning (ML) based classifiers. The results confirm that RAINBOW leads to an equivalent goodness of fit (supernovae II) or to a goodness of fit that is better by up to 75% (supernovae Ibc) than the MONOCHROMATIC approach. Similarly, the accuracy improves for all classes in our sample when the RAINBOW best-fit values are used as a parameter space in a multiclass ML classification. 
Conclusions: Our approach enables a straightforward light-curve estimation for objects with observations in multiple filters and from multiple experiments. It is particularly well suited when the light-curve sampling is sparse. We demonstrate its potential for characterizing supernova-like events here, but the same approach can be used for other classes by changing the function describing the light-curve behavior and temperature representation. In the context of the upcoming large-scale sky surveys and their potential for multisurvey analysis, this represents an important milestone in the path to enable population studies of photometric transients.

Keywords. methods: data analysis; stars: general; supernovae: general; Astrophysics - Instrumentation and Methods for Astrophysics; Physics - Data Analysis; Statistics and Probability

Figure. Quality of the light-curve fit to a PLAsTiCC SNIa light curve. The red diamonds represent points that were randomly removed and were only used to compute the nRMSEo error. The fits were performed considering only points shown as dark blue circles. The error bars are displayed, but are contained within the points.
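
The idea summarized in the abstract above, a blackbody spectrum combined with a temperature evolution and a bolometric light curve, can be sketched in a few lines. The blackbody normalization used here (the Planck function times pi, divided by sigma T^4, integrates to one over wavelength) is standard physics; everything else is an assumption for illustration, including the rise-and-decline form chosen for the bolometric flux, the logistic cooling law, all parameter values, and the rough band wavelengths. It is not the RAINBOW implementation.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C = 2.99792458e8         # speed of light, m / s
K_B = 1.380649e-23       # Boltzmann constant, J / K
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4


def planck(wavelength, temp):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength * K_B * temp)
    if x > 700.0:  # far on the Wien side: effectively zero, and exp would overflow
        return 0.0
    return 2.0 * H * C ** 2 / wavelength ** 5 / math.expm1(x)


def spectral_fraction(wavelength, temp):
    """Fraction of the bolometric flux emitted per metre of wavelength."""
    return math.pi * planck(wavelength, temp) / (SIGMA * temp ** 4)


def bolometric_flux(t, amplitude=1.0, t0=0.0, rise=3.0, fall=25.0):
    """Assumed rise-and-decline bolometric light curve (arbitrary units, t in days)."""
    return amplitude * math.exp(-(t - t0) / fall) / (1.0 + math.exp(-(t - t0) / rise))


def temperature_at(t, t_min=5000.0, t_max=12000.0, t0=0.0, scale=10.0):
    """Assumed logistic cooling from t_max to t_min, in kelvin."""
    return t_min + (t_max - t_min) / (1.0 + math.exp((t - t0) / scale))


def band_flux(t, wavelength):
    """Flux density at one wavelength: bolometric flux times the blackbody fraction."""
    return bolometric_flux(t) * spectral_fraction(wavelength, temperature_at(t))


G_BAND, R_BAND = 480e-9, 640e-9  # rough effective wavelengths in metres (illustrative)
for day in (-5.0, 0.0, 30.0):
    colour = band_flux(day, G_BAND) / band_flux(day, R_BAND)
    print(f"t = {day:+5.1f} d  T = {temperature_at(day):7.0f} K  g/r flux ratio = {colour:.2f}")
```

Because every band shares the same bolometric curve and temperature, observations in one filter constrain the predicted flux in the others, which is the property the abstract highlights for sparsely sampled light curves.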

SNAD catalogue of M-dwarf flares from the Zwicky Transient Facility

Voloshina A. S., Lavrukhina A. D., Pruzhinskaya M. V., Malanchev K. L., Ishida E. E. O., Krushinsky V. V., Aleo P. D., Gangler E., Kornilov M. V., Korolev V. S., Russeil E., Semenikhin T. A., Sreejith S., Volnova A. A., 2024, MNRAS, Volume 533, Issue 4, pp. 4309-4323, DOI: 10.1093/mnras/stae2031

Abstract. Most of the stars in the Universe are M spectral class dwarfs, which are known to be the source of bright and frequent stellar flares. In this paper, we propose new approaches to discover M-dwarf flares in ground-based photometric surveys. We employ two approaches: a modification of a traditional method of parametric fit search and a machine learning algorithm based on active anomaly detection. The algorithms are applied to Zwicky Transient Facility (ZTF) data release 8, which includes the data from the ZTF high-cadence survey, allowing us to reveal flares lasting from minutes to hours. We analyse over 35 million ZTF light curves and visually scrutinize 1168 candidates suggested by the algorithms to filter out artefacts, occultations of a star by an asteroid, and other types of known variable objects. The result of this analysis is the largest catalogue of ZTF flaring stars to date, representing 134 flares with amplitudes ranging from -0.2 to -4.6 mag, including repeated flares. Using Pan-STARRS DR2 colours, we assign a spectral subclass to each object in the sample. For 13 flares with well-sampled light curves and available geometric distances from Gaia DR3, we estimate the bolometric energy. This research shows that the proposed methods combined with the ZTF's cadence strategy are suitable for identifying M-dwarf flares and other fast transients, allowing for the extraction of significant astrophysical information from their light curves.

Keywords. methods: data analysis – surveys – stars: activity – stars: flare – stars: late-type

Figure. Flowchart of the active machine learning method used to discover M-dwarf flares.

Exploring the Universe with SNAD: Anomaly Detection in Astronomy

Alina A. Volnova, Patrick D. Aleo, Anastasia Lavrukhina, Etienne Russeil, Timofey Semenikhin, Emmanuel Gangler, Emille E. O. Ishida, Matwey V. Kornilov, Vladimir Korolev, Konstantin Malanchev, Maria V. Pruzhinskaya, and Sreevarsha Sreejith, 2024, DAMDID/RCDL 2023, CCIS 2086, pp. 195–208, DOI: 10.1007/978-3-031-67826-4_15

Abstract. SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the field of astrophysics. This paper provides a review of the SNAD project and summarizes the advancements and achievements made by the team over several years.

Keywords. Methods: data analysis, Supernovae: general, Transients, Astronomical data bases

Figure. SNAD-VII Workshop, Rio de Janeiro, Brazil, May 2024.

Real-bogus scores for active anomaly detection

T. A. Semenikhin, M. V. Kornilov, M. V. Pruzhinskaya, A. D. Lavrukhina, E. Russeil, E. Gangler, E. E. O. Ishida, V. S. Korolev, K. L. Malanchev, A. A. Volnova, S. Sreejith, 2024, arXiv:2409.10256

Abstract. In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts -- such as plane or satellite tracks, bad columns on CCDs, and ghosts -- often constitute significant contaminants in results from anomaly detection analysis. In such contexts, the Active Anomaly Discovery (AAD) algorithm allows tailoring the output of anomaly detection pipelines according to what the expert judges to be scientifically interesting. We demonstrate how the introduction of real-bogus scores, obtained from a machine learning classifier, improves the results from AAD. Using labeled data from the SNAD ZTF knowledge database, we train four real-bogus classifiers: XGBoost, CatBoost, Random Forest, and Extremely Randomized Trees. All the models perform real-bogus classification with similar effectiveness, achieving ROC-AUC scores ranging from 0.93 to 0.95. Consequently, we select the Random Forest model as the main model due to its simplicity and interpretability. The Random Forest classifier is applied to 67 million light curves from ZTF DR17. The output real-bogus score is used as an additional feature for two anomaly detection algorithms: static Isolation Forest and AAD. While results from Isolation Forest remained unchanged, the number of artifacts detected by the active approach decreases significantly with the inclusion of the real-bogus score, from 27 to 3 out of 100. We conclude that incorporating the real-bogus classifier result as an additional feature in the active anomaly detection pipeline significantly reduces the number of artifacts in the outputs, thereby increasing the incidence of astrophysically interesting objects presented to human experts.

Keywords. Astronomy data analysis, Classification, Outlier detection, Sky surveys

Figure. Results of running AAD on two feature sets: the dependence of the anomalies fraction on the number of verified candidates.

Presentations:

Malanchev, K. et al. "Anomaly detection in ZTF DR3", DESC supernova group meeting, on-line, 15 December 2020 (slides).

Malanchev, K. et al. "Anomaly detection in ZTF DR3", Rubin Research Bytes Project & Community Workshop 2020, on-line, 18 August 2020 (slides, video).

Kornilov, M. et al. "Algorithms for the active anomaly detection in the era of wide-field astronomical surveys", Highlights of the Russian astrophysics 2019, Moscow, Russia, 16 December 2019.

Malanchev K., Volnova A. et al. "Use of machine learning for anomaly detection in large astronomical databases", XXI international conference "Data Analytics and Management in Data Intensive Domains" (DAMDID), Kazan, Russia, 15-18 October 2019 (slides, poster).

Pruzhinskaya, M. et al. "Machine learning and new classes of astrophysical objects", 7th School-seminar "Magneto-Plasma Processes in Relativistic Astrophysics", Tarusa, Russia, 17-21 June 2019.

Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Lomonosov conference 2019, Moscow, Russia, 8-12 April 2019.

Pruzhinskaya, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", IAU100 Special Session "Women and Girls in Astronomy", Moscow, Russia, 11 February 2019 (slides, in Russian).

Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Highlights of the Russian astrophysics 2018, Moscow, Russia, 17 December 2018.

Kornilov, M. et al. "Machine learning techniques for analysis of photometric data from the Open Supernova Catalog", The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves, Nizhnij Arkhyz (SAO), Russia, 7-14 October 2018 (slides).

Pruzhinskaya, M. et al. "Machine Learning analysis of supernova light curves", Accretion Processes in Cosmic Sources II, Saint Petersburg, Russia, 3-8 September 2018 (slides).

Contact us