About

We are an international network of researchers from the Sternberg Astronomical Institute, Laboratoire de Physique de Clermont, Space Research Institute, Moscow Institute of Physics and Technology, and University of Illinois Urbana-Champaign, who have joined together to solve the problem of detecting unusual objects in astronomical databases with machine learning methods. We are committed to applying our skills and knowledge to fascinating new problems in astrophysics. We are open to international exchanges and collaborations. We also welcome students, postgraduates, and researchers willing to discuss the future of machine learning applications in astronomical discovery.


Latest news

November 24, 2021

SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees

May 25, 2021

Hidden supernova candidates identified in archival data

February 10, 2021

A new anomaly detection pipeline for astronomical discovery and recommendation systems

Project

The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is that we can no longer rely on human scanning to identify unusual astrophysical objects. Moreover, since most of the available data will be photometric, such identification cannot rely on high-resolution spectroscopic observations. The goal of this project is to develop a pipeline in which human expertise and modern machine learning techniques complement each other in the task of identifying unusual astronomical objects. Our approach consists of two steps. The first is data preparation and the search for the machine learning algorithm best suited to identifying outliers in the data set. In the second step, we analyse all identified outliers using information from other wavelength ranges, additional observations, and theoretical modelling. The strategy can be applied to large data sets for which only photometric observations are available. Enabling reliable anomaly (outlier) detection based solely on photometric observations is one of the fundamental puzzles to be solved before we can turn the full potential of large-scale surveys into scientific results. This project represents an effective strategy to guarantee that we do not overlook exciting new science hidden in the data we fought so hard to acquire.

Products

SNAD discoveries at TNS

SNAD ZTF DR viewer

SNAD catalog

Publications:

Machine Learning Analysis of Supernova Light Curves

Pruzhinskaya M.V., Malanchev K., Kornilov M. et al., 2019, Proceedings of Science, 342, id. 51, DOI:10.22323/1.342.0051

Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility of relying on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high-resolution spectroscopic observations. We introduce an analysis of anomaly detection in the Open Supernova Catalog (http://sne.space/) with the use of machine learning. We developed a strategy and pipeline in which anomalous objects are identified and then submitted to careful individual analysis. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.

Keywords. Supernovae: general, anomaly detection

Figure. Light curves of the binary microlensing event Gaia16aye, a case of misclassification in the Open Supernova Catalog. Solid lines are the results of our approximation by a multivariate Gaussian process.
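To illustrate what Gaussian-process light-curve approximation involves, here is a minimal single-band sketch in Python (standard library only). It assumes a zero-mean prior, a squared-exponential kernel with fixed, hand-picked amplitude and length scale, and known per-point flux errors; all function names are hypothetical. The published approach models several passbands jointly and fits the kernel hyperparameters, which this sketch does not attempt.

```python
import math


def rbf_kernel(t1, t2, amplitude=1.0, length_scale=10.0):
    """Squared-exponential covariance between two epochs (same time units as length_scale)."""
    return amplitude ** 2 * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b: forward substitution with L, then back substitution with L^T."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict_mean(times, fluxes, errors, query_times, amplitude=1.0, length_scale=10.0):
    """Posterior mean of a zero-mean GP given fluxes with Gaussian errors at `times`."""
    n = len(times)
    # Training covariance K + diag(errors^2): observation noise enters only on the diagonal.
    cov = [[rbf_kernel(times[i], times[j], amplitude, length_scale)
            + (errors[i] ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(cov), fluxes)  # alpha = (K + sigma^2 I)^{-1} y
    # Mean at each query epoch: k(t, times) . alpha
    return [sum(rbf_kernel(t, times[i], amplitude, length_scale) * alpha[i] for i in range(n))
            for t in query_times]
```

Evaluating `gp_predict_mean` on a regular grid of epochs turns an irregularly sampled light curve into values on a common time grid, which is one way to obtain fixed-length inputs for an anomaly detection algorithm.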

Machine learning techniques for analysis of photometric data from the Open Supernova catalog

Kornilov M.V., Pruzhinskaya M.V., Malanchev K.L. et al., 2019, Proceedings of the International Conference "The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves", Publishing house SNEG, Pyatigorsk, pp. 100-110, DOI:10.26119/SAO.2019.1.35517

Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility of relying on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high-resolution spectroscopic observations. The goal of this project is to detect anomalies in the Open Supernova Catalog with the use of machine learning. We will develop a pipeline where human expertise and modern machine learning techniques can complement each other. Using supernovae as a case study, our proposal is divided into two parts: the first develops a strategy and pipeline where anomalous objects are identified, and the second is a phase where such anomalous objects are submitted to careful individual analysis. The strategy requires an initial data set for which spectroscopy is available for training purposes, but can be applied to a much larger data set for which we only have photometric observations. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.

Keywords. Machine learning techniques, supernovae, astronomical surveys

Figure. Light curves of SN 2016bln, which belongs to the rare subtype Ia-91T.

Use of Machine Learning for Anomaly Detection Problem in Large Astronomical Databases

Malanchev K., Volnova A., Kornilov M. et al., 2019, CEUR Workshop Proceedings, 21st International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2019, 2523, pp. 205-216

Abstract. In this work, we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such a study is justified by the large amount of astronomical data that cannot be processed by human effort alone. We focus our attention on finding anomalous light curves in the Open Supernova Catalog. Several types of anomalies are considered: artifacts in the data, cases of misclassification, and previously unclassified objects. In a data set of ~2000 supernova (SN) candidates, we found several interesting anomalies: one active galactic nucleus (SN2006kg), one binary microlensing event (Gaia16aye), representatives of rare classes of SNe such as superluminous supernovae, and highly reddened objects.

Keywords. Gaussian processes, isolation forest, machine learning, supernovae, transients

Figure. Three-dimensional t-SNE reduced data after application of the isolation forest algorithm. Each point represents a supernova light curve from the data set projected into the three-dimensional space with the coordinates (x1, x2, x3). The intensity of the colour indicates the anomaly score for each object as estimated by the isolation forest algorithm; bluer colour means more anomalous light curve behaviour.

Anomaly Detection in the Open Supernova Catalog

Pruzhinskaya M., Malanchev K., Kornilov M., Ishida E., Mondon F., Volnova A., Korolev V., 2019, MNRAS, 489, 3, pp. 3591-3608, DOI:10.1093/mnras/stz236

Abstract. In the upcoming decade, large astronomical surveys will discover millions of transients, raising unprecedented data challenges in the process. Only machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to known classes of astronomical objects. However, some transients are expected to be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis of the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large-scale surveys. The analysis consists of the following steps: (1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, (2) dimensionality reduction, (3) searching for outliers with the isolation forest algorithm, and (4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were confirmed to be from astrophysically peculiar objects. The confirmed anomalies make up 1.4 per cent of the initial, automatically selected sample of approximately 2000 objects. Among the identified outliers we recognized superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.

Keywords. Methods: data analysis, catalogues, supernovae: general

Figure. Light curves of the superluminous supernova PTF10aagc. Solid lines are the results of our approximation by a multivariate Gaussian process.
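Step (3) of this pipeline searches for outliers with the isolation forest algorithm. The sketch below is a compact, standard-library illustration of the general isolation forest idea: random axis-parallel splits isolate points, a point's anomaly score is s = 2^(-E[h(x)]/c(psi)), where E[h(x)] is its mean path length over the trees and c(psi) the average path length for a subsample of size psi, and scores close to 1 flag outliers. The tree count, subsample size, and function names are our illustrative assumptions, not the code used in the paper; in practice a library implementation such as scikit-learn's `IsolationForest` would normally be preferred.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all values equal along the chosen feature: stop here (a simplification)
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for the points left unseparated there."""
    if "split" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s = 2^(-E[h(x)] / c(psi)) for every point; values near 1 are outliers."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # usual height limit for isolation trees
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    c = average_path_length(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / c) for x in data]
```

Here `data` is a list of equal-length feature tuples (for example, the reduced-dimension representation of each light curve); the objects with the highest scores are the candidates that would go on to expert analysis in step (4).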

Active Anomaly Detection for time-domain discoveries

Ishida Emille E. O., Kornilov Matwey V., Malanchev Konstantin L., Pruzhinskaya Maria V., Volnova Alina A., Korolev Vladimir S., Mondon Florian, Sreejith Sreevarsha, Malancheva Anastasia, Das Shubhomoy, 2021, Astronomy & Astrophysics 650, A195, DOI:10.1051/0004-6361/202037709

Abstract. We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to two data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis of all objects with the highest ~2% of anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.

Keywords. Methods: data analysis, supernovae: general, stars: variables: general

Figure. Fraction of anomalies as a function of the total number of candidates for the simulated PLAsTiCC data set. The full line represents the mean and shaded regions mark the 5-95 percentiles of results obtained from 2000 realizations with different random seeds.

Realization of Different Techniques for Anomaly Detection in Astronomical Databases

Konstantin Malanchev, Vladimir Korolev, Matwey Kornilov et al., 2020, In: Elizarov A., Novikov B., Stupnikov S. (eds), Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2019. Communications in Computer and Information Science, vol 1223. Springer, Cham, pp. 97-107, DOI:10.1007/978-3-030-51913-1_7

Abstract. In this work we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such a study is justified by the existence of a large amount of astronomical data that cannot be processed by human effort alone. We evaluated five anomaly detection algorithms to find anomalies in the light curve data of the Open Supernova Catalog. Comparison of the algorithms revealed that the expert-supervised active anomaly detection method shows the best performance, while among purely unsupervised techniques the Gaussian mixture model and one-class support vector machine methods outperform the isolation forest and local outlier factor methods.

Keywords. Active anomaly detection, isolation forest, local outlier factor, machine learning, one-class SVM, supernovae

Figure. Fraction of anomalies as a function of the total number of candidates (outliers) scrutinised by the expert.
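The comparison in this figure rests on a simple metric: after the expert has inspected the first N candidates, what fraction of them were true anomalies? A minimal sketch, assuming each candidate receives a binary expert verdict (the function names are hypothetical): for a static algorithm the review order is the ranking by anomaly score, while for an active method it is the order in which candidates were actually shown to the expert.

```python
from itertools import accumulate


def ranked_labels(scores, labels):
    """Expert labels reordered by decreasing anomaly score (most anomalous first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def anomaly_fraction_curve(labels_in_review_order):
    """For N = 1, 2, ...: fraction of true anomalies among the first N candidates reviewed."""
    confirmed_so_far = accumulate(int(bool(flag)) for flag in labels_in_review_order)
    return [count / n for n, count in enumerate(confirmed_so_far, start=1)]
```

For example, if the expert confirms the first, third, and fourth of four candidates, the curve is 1.0, 0.5, 0.67, 0.75; a method whose curve stays higher for the same N wastes less of the expert's time on nominal objects.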

The Most Interesting Anomalies Discovered in ZTF DR3 from the SNAD-III Workshop

Patrick D. Aleo, Emille E. O. Ishida, Matwey Kornilov et al., 2020, Res. Notes AAS, 4, 112, DOI:10.3847/2515-5172/aba6e8

Abstract. The search for objects with unusual astronomical properties, or anomalies, is one of the most anticipated results to be delivered by the next generation of large scale astronomical surveys. Moreover, given the volume and complexity of current data sets, machine learning algorithms will undoubtedly play an important role in this endeavor. The SNAD team is specialized in the development, adaptation and improvement of such techniques with the goal of constructing optimal anomaly detection strategies for astronomy. We present here the preliminary results from the third annual SNAD workshop (https://snad.space/2020/) that was held on-line in 2020 July.

Keywords. Supernovae, transient detection, AGN host galaxies, outlier detection

Figure. The light curves of PSO J011.0457+41.5548, a luminous blue variable candidate from the M31 field.

Anomaly detection in the Zwicky Transient Facility DR3

Malanchev K. L., Pruzhinskaya M. V., Korolev V. S., Aleo P. D., Kornilov M. V., Ishida E. E. O., Krushinsky V. V., Mondon F., Sreejith S., Volnova A. A., Belinski A. A., Dodin A. V., Tatarnikov A. M., Zheltoukhov S. G., 2021, MNRAS, 502, 4, pp. 5147-5175, DOI:10.1093/mnras/stab316

Abstract. We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search for outliers with machine learning algorithms, and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of 4 automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinised by an expert. From these, 188 (68%) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlap between a star and a known asteroid), 66 (24%) were previously reported sources, and 23 (8%) correspond to non-catalogued objects, with the two latter cases of potential scientific interest (e.g. 1 spectroscopically confirmed RS Canum Venaticorum star, 4 supernova candidates, 1 red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation which can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available at https://github.com/snad-space/zwad.

Keywords. Methods: data analysis, stars: variables: general, transients: supernovae, astronomical data bases: miscellaneous

Figure. The light curves of AT 2017ixs, a nova candidate from the M31 field that behaves unusually for its suspected astrophysical type.

SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees

Aleo P. D., Malanchev K. L., Pruzhinskaya M. V., Ishida E. E. O., Russeil E., Kornilov M. V., Korolev V. S., Sreejith S., Volnova A. A., Narayan G. S., 2021, arXiv:2111.11555

Abstract. We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 92 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.

Keywords. Transient sources (1851), Time domain astronomy (2109), Supernovae (1668), Active galactic nuclei (16)

Figure. SNAD150 light curves in zr- and zg-bands within the first 420 days of ZTF DR4, generated with the ZTF SNAD viewer.
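The transient mining strategy above retrieves the nearest neighbours of simulated light curves in a 41-dimensional feature space with a k-D tree. The sketch below shows the underlying data structure under our own assumptions: a median-split k-D tree over plain feature tuples and a Euclidean k-nearest-neighbour query that skips any subtree whose splitting plane lies farther away than the current k-th best match. The names are hypothetical and this is not the SNAD Transient Miner code; a production search would more likely rely on a library implementation such as SciPy's `scipy.spatial.KDTree`.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Median-split k-D tree; the splitting axis cycles through the coordinates."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def k_nearest(tree, query, k):
    """The k stored points closest to `query` in Euclidean distance, nearest first."""
    best = []  # max-heap of the k best so far, stored as (-distance, tiebreak, point)
    tiebreak = 0

    def visit(node):
        nonlocal tiebreak
        if node is None:
            return
        point, axis = node["point"], node["axis"]
        dist = math.dist(point, query)
        if len(best) < k:
            heapq.heappush(best, (-dist, tiebreak, point))
            tiebreak += 1
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, tiebreak, point))
            tiebreak += 1
        diff = query[axis] - point[axis]
        near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
        visit(near)
        # The far subtree can only help if the splitting plane is closer than the current k-th best.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(tree)
    return [point for _, _, point in sorted(best, key=lambda item: -item[0])]
```

In the setting described in the abstract, the tree would be built from the feature vectors of real light curves and queried with the feature vector of each simulated light curve and k = 15; the returned objects are the candidates an expert would then inspect visually.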

Presentations:

Malanchev, K. et al. "Anomaly detection in ZTF DR3", DESC supernova group meeting, on-line, 15 December 2020 (slides).

Malanchev, K. et al. "Anomaly detection in ZTF DR3", Rubin Research Bytes Project & Community Workshop 2020, on-line, 18 August 2020 (slides, video).

Kornilov, M. et al. "Algorithms for the active anomaly detection in the era of wide-field astronomical surveys", Highlights of the Russian astrophysics 2019, Moscow, Russia, 16 December 2019.

Malanchev K., Volnova A. et al. "Use of machine learning for anomaly detection in large astronomical databases", XXI international conference "Data Analytics and Management in Data Intensive Domains" (DAMDID), Kazan, Russia, 15-18 October 2019 (slides).

Pruzhinskaya, M. et al. "Machine learning and new classes of astrophysical objects", 7th School-seminar "Magneto-Plasma Processes in Relativistic Astrophysics", Tarusa, Russia, 17-21 June 2019.

Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Lomonosov conference 2019, Moscow, Russia, 8-12 April 2019.

Pruzhinskaya, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", IAU100 Special Session "Women and Girls in Astronomy", Moscow, Russia, 11 February 2019 (slides, in Russian).

Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Highlights of the Russian astrophysics 2018, Moscow, Russia, 17 December 2018.

Kornilov, M. et al. "Machine learning techniques for analysis of photometric data from the Open Supernova catalog", The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves, Nizhnij Arkhyz (SAO), Russia, 7-14 October 2018 (slides).

Pruzhinskaya, M. et al. "Machine Learning analysis of supernova light curves", Accretion Processes in Cosmic Sources II, Saint Petersburg, Russia, 3-8 September 2018 (slides).

Contact us

Sternberg Astronomical Institute, Moscow State University, Universitetsky pr., 13, Moscow 119234, Russia