We are an international network of researchers from Carnegie Mellon University, Laboratoire de Physique de Clermont Auvergne, Space Research Institute, Sternberg Astronomical Institute, and University of Surrey who have joined together to solve the problem of detecting unusual objects in astronomical databases with machine learning methods. We are committed to applying our skills and knowledge to fascinating new problems in astrophysics. We are open to international exchanges and collaborations. We also welcome students, postgraduates, and researchers willing to discuss the future of machine learning applications in astronomical discovery.
First results from the SNAD-VI Workshop are now published in RNAAS
SNAD viewer paper is now published in PASP
Gaia DR3 light curves are now available in the SNAD viewer
Could SNAD160 be a Pair-instability Supernova?
The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility of relying on human scanning to identify unusual astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high-resolution spectroscopic observations. The goal of this project is to develop a pipeline where human expertise and modern machine learning techniques can complement each other in the task of identifying unusual astronomical objects. Our approach consists of two steps. The first is data preparation and the search for the machine learning algorithm best suited to identifying outliers in the data set. In the second step, we analyse every outlier found using information from other wavelength ranges, additional observations and theoretical modelling. The strategy can be applied to a large data set for which we only have photometric observations. Enabling reliable anomaly/outlier detection based solely on photometric observations is one of the fundamental puzzles to be solved before we can convert the full potential of large-scale surveys into scientific results. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.
Machine Learning Analysis of Supernova Light Curves
Pruzhinskaya M.V., Malanchev K., Kornilov M. et al., 2019, Proceedings of Science, 342, id. 51, DOI:10.22323/1.342.0051
Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility to rely on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high resolution spectroscopic observations. We introduce an analysis of anomaly detection in the Open Supernova Catalog (http://sne.space/) with the use of machine learning. We developed a strategy and pipeline in which anomalous objects are identified and then submitted to careful individual analysis. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.
Keywords. Supernovae: general, anomaly detection
Figure. Light curves of the binary microlensing event Gaia16aye, a case of misclassification in the Open Supernova Catalog. Solid lines are the results of our approximation by Multivariate Gaussian Process.
Machine learning techniques for analysis of photometric data from the Open Supernova catalog
Kornilov M.V., Pruzhinskaya M.V., Malanchev K.L. et al., 2019, Proceedings of the International Conference "The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves", Publishing house SNEG Pyatigorsk, pp. 100-110, DOI:10.26119/SAO.2019.1.35517
Abstract. The next generation of astronomical surveys will revolutionize our understanding of the Universe, raising unprecedented data challenges in the process. One of them is the impossibility to rely on human scanning for the identification of unusual/unpredicted astrophysical objects. Moreover, given that most of the available data will be in the form of photometric observations, such characterization cannot rely on the existence of high resolution spectroscopic observations. The goal of this project is to detect the anomalies in the Open Supernova Catalog with the use of machine learning. We will develop a pipeline where human expertise and modern machine learning techniques can complement each other. Using supernovae as a case study, our proposal is divided in two parts: the first developing a strategy and pipeline where anomalous objects are identified, and a second phase where such anomalous objects are submitted to careful individual analysis. The strategy requires an initial data set for which spectroscopy is available for training purposes, but can be applied to a much larger data set for which we only have photometric observations. This project represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire.
Keywords. Machine learning techniques, supernovae, astronomical surveys
Figure. Light curves of SN 2016bln, which belongs to the rare sub-type Ia-91T.
Use of Machine Learning for Anomaly Detection Problem in Large Astronomical Databases
Malanchev K., Volnova A., Kornilov M. et al., 2019, CEUR Workshop Proceedings, 21st International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2019, 2523, pp. 205-216
Abstract. In this work, we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such study is justified by the presence of a large amount of astronomical data that cannot be processed by human resources alone. We focus our attention on finding anomalous light curves in the Open Supernova Catalog. A few types of anomalies are considered: the artifacts in the data, the cases of misclassification and the presence of previously unclassified objects. On a dataset of ~2000 supernova (SN) candidates, we found several interesting anomalies: one active galactic nucleus (SN2006kg), one binary microlensing event (Gaia16aye), representatives of rare classes of SNe such as super-luminous supernovae, and highly reddened objects.
Keywords. Gaussian processes, isolation forest, machine learning, supernovae, transients
Figure. Three-dimensional t-SNE reduced data after application of the isolation forest algorithm. Each point represents a supernova light curve from the data set projected into the three-dimensional space with the coordinates (x1, x2, x3). The intensity of the colour indicates the anomaly score for each object as estimated by the isolation forest algorithm – bluer colour means more anomalous light curve behaviour.
Anomaly Detection in the Open Supernova Catalog
Pruzhinskaya M., Malanchev K., Kornilov M., Ishida E., Mondon F., Volnova A., Korolev V., 2019, MNRAS, 489, 3, pp. 3591-3608, DOI:10.1093/mnras/stz236
Abstract. In the upcoming decade, large astronomical surveys will discover millions of transients raising unprecedented data challenges in the process. Only the use of the machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to the known classes of astronomical objects. However, it is expected that some transients will be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis in the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large-scale surveys. The analysis consists of the following steps: (1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, (2) dimensionality reduction, (3) searching for outliers with the use of the isolation forest algorithm, and (4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were confirmed to be from astrophysically peculiar objects. Found anomalies correspond to a selected sample of 1.4 per cent of the initial automatically identified data sample of approximately 2000 objects. Among the identified outliers we recognized superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to guarantee we shall not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.
Keywords. Methods: data analysis, catalogues, supernovae: general
Figure. Light curves of superluminous supernova PTF10aagc. Solid lines are the results of our approximation by Multivariate Gaussian Process.
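The outlier-search step (3) of the pipeline described in this abstract can be illustrated with a toy isolation forest. The sketch below is a simplified pure-Python re-implementation written only for illustration; it is not the code released with the paper. The two-dimensional synthetic features, the function names and the parameter values (100 trees, subsamples of 64 points) are all assumptions, and steps (1) and (2) are taken as already done, so each object is represented by a short feature tuple.

```python
import math
import random


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split],
                           depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split],
                            depth + 1, max_depth, rng),
    }


def average_path_length(n):
    """Expected path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # Euler-Mascheroni constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(tree, point, depth=0):
    """Depth at which point lands, plus a correction for unsplit leaves."""
    if "dim" not in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1); higher means easier to isolate, i.e. more anomalous.

    Assumes at least two points are supplied.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    scores = []
    for p in points:
        mean_depth = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Hypothetical data: 200 ordinary objects plus one planted outlier (index 200).
data_rng = random.Random(42)
features = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
features.append((8.0, 8.0))

scores = anomaly_scores(features)
ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
print(ranked[:5])  # indices of the candidates an expert would inspect first
```

In the workflow the abstract describes, the top of such a ranking is what would be handed to the expert for step (4).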
Active Anomaly Detection for time-domain discoveries
Ishida Emille E. O., Kornilov Matwey V., Malanchev Konstantin L., Pruzhinskaya Maria V., Volnova Alina A., Korolev Vladimir S., Mondon Florian, Sreejith Sreevarsha, Malancheva Anastasia, Das Shubhomoy, 2021, Astronomy & Astrophysics, 650, A195, DOI:10.1051/0004-6361/202037709
Abstract. We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to 2 data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ~2% highest anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.
Keywords. Methods: data analysis, supernovae: general, stars: variables: general
Figure. Fraction of anomalies as a function of the total number of candidates for the simulated PLAsTiCC data set. The full line represents the mean and shaded regions mark 5-95 percentiles of results obtained from 2000 realizations with different random seeds.
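The quantity plotted in this figure can be computed from a ranked candidate list in a few lines. The sketch below is one reading of the caption, namely the share of expert-confirmed anomalies among the first n candidates shown; the function name and the toy scores and labels are hypothetical. It uses a fixed ranking, as a static isolation forest would produce; in the active setting the ranking is updated after every expert label, but the curve is accumulated in the same way.

```python
def anomaly_fraction_curve(scores, is_anomaly):
    """Fraction of confirmed anomalies among the top-n candidates, for each n."""
    # Candidates are presented to the expert in order of decreasing score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fractions = []
    found = 0
    for n, idx in enumerate(order, start=1):
        if is_anomaly[idx]:
            found += 1
        fractions.append(found / n)
    return fractions


# Hypothetical anomaly scores and expert verdicts for five objects.
toy_scores = [0.9, 0.2, 0.8, 0.4, 0.7]
toy_labels = [True, False, False, False, True]

fractions = anomaly_fraction_curve(toy_scores, toy_labels)
print(fractions)  # first three values: 1.0, 0.5, 2/3
```

A method that places true anomalies earlier in the list keeps this curve higher for small n, which is the regime that matters when expert time is limited.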
Realization of Different Techniques for Anomaly Detection in Astronomical Databases
Konstantin Malanchev, Vladimir Korolev, Matwey Kornilov et al., 2020, In: Elizarov A., Novikov B., Stupnikov S. (eds), Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2019. Communications in Computer and Information Science, vol 1223. Springer, Cham, pp. 97-107, DOI:10.1007/978-3-030-51913-1_7
Abstract. In this work we address the problem of anomaly detection in large astronomical databases by machine learning methods. The importance of such study is justified by the existence of a large amount of astronomical data that cannot be processed by human resources alone. We evaluated five anomaly detection algorithms to find anomalies in the light curve data of the Open Supernova Catalog. Comparison of the algorithms revealed that the expert-supervised active anomaly detection method shows the best performance, while among purely unsupervised techniques the Gaussian mixture model and one-class support vector machine methods outperform the isolation forest and local outlier factor methods.
Keywords. Active anomaly detection, isolation forest, local outlier factor, machine learning, one-class SVM, supernovae
Figure. Fraction of anomalies as a function of the total number of candidates (outliers) scrutinised by the expert.
The Most Interesting Anomalies Discovered in ZTF DR3 from the SNAD-III Workshop
Patrick D. Aleo, Emille E. O. Ishida, Matwey Kornilov et al., 2020, Res. Notes AAS, 4, 112, DOI:10.3847/2515-5172/aba6e8
Abstract. The search for objects with unusual astronomical properties, or anomalies, is one of the most anticipated results to be delivered by the next generation of large scale astronomical surveys. Moreover, given the volume and complexity of current data sets, machine learning algorithms will undoubtedly play an important role in this endeavor. The SNAD team is specialized in the development, adaptation and improvement of such techniques with the goal of constructing optimal anomaly detection strategies for astronomy. We present here the preliminary results from the third annual SNAD workshop (https://snad.space/2020/) that was held on-line in 2020 July.
Keywords. Supernovae, transient detection, AGN host galaxies, outlier detection
Figure. The light curves of PSO J011.0457+41.5548 — luminous blue variable candidate from the M31 field.
Anomaly detection in the Zwicky Transient Facility DR3
Malanchev K. L., Pruzhinskaya M. V., Korolev V. S., Aleo P. D., Kornilov M. V., Ishida E. E. O., Krushinsky V. V., Mondon F., Sreejith S., Volnova A. A., Belinski A. A., Dodin A. V., Tatarnikov A. M., Zheltoukhov S. G., 2021, MNRAS, 502, 4, pp. 5147-5175, DOI:10.1093/mnras/stab316
Abstract. We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search of outliers with machine learning algorithms and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of 4 automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinised by an expert. From these, 188 (68%) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlapping between a star and a known asteroid), 66 (24%) were previously reported sources, whereas 23 (8%) correspond to non-catalogued objects, with the two latter cases of potential scientific interest (e.g. 1 spectroscopically confirmed RS Canum Venaticorum star, 4 supernovae candidates, 1 red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation which can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available at https://github.com/snad-space/zwad.
Keywords. Methods: data analysis, stars: variables: general, transients: supernovae, astronomical data bases: miscellaneous
Figure. The light curves of AT 2017ixs — nova candidate from the M31 field that behaves unusually for its suspected astrophysical type.
SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees
Aleo P. D., Malanchev K. L., Pruzhinskaya M. V., Ishida E. E. O., E. Russeil, Kornilov M. V., Korolev V. S., Sreejith S., Volnova A. A., G. S. Narayan, 2022, New Astronomy, Vol. 96, DOI:10.1016/j.newast.2022.101846
Abstract. We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 92 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.
Keywords. Transient sources (1851), Time domain astronomy (2109), Supernovae (1668), Active galactic nuclei (16)
Figure. SNAD150 light curves in zr- and zg-bands within the first 420 days of ZTF DR4, generated with the ZTF SNAD viewer.
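The nearest-neighbour search at the heart of this approach can be sketched with a small k-d tree. The code below is an illustrative pure-Python implementation, not the SNAD team's code. The three-dimensional random "features" stand in for the 41 light-curve features mentioned in the abstract, the query point plays the role of one simulated light curve, and all function and variable names are hypothetical. Only the choice of 15 neighbours comes from the abstract.

```python
import heapq
import random


def build_kdtree(points, depth=0):
    """Build a k-d tree; each node is (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(tree, query, k):
    """Return the k points closest to query, nearest first."""
    heap = []  # max-heap through negated squared distances, at most k entries

    def visit(node, depth):
        if node is None:
            return
        point, left, right = node
        dist = squared_distance(point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, point))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, point))
        axis = depth % len(query)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near, depth + 1)
        # Search the far side only if the splitting plane is closer than the
        # current k-th neighbour, or if fewer than k points were found so far.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far, depth + 1)

    visit(tree, 0)
    return [p for _, p in sorted(heap, key=lambda item: -item[0])]


# Hypothetical catalogue of 500 objects with 3 features each.
rng = random.Random(1)
catalog = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
template = (0.5, 0.5, 0.5)  # features of one simulated light curve

tree = build_kdtree(catalog)
neighbors = nearest_neighbors(tree, template, k=15)
print(len(neighbors))  # 15
```

The real objects returned for each simulated template are the ones that would then be inspected visually.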
Could SNAD160 be a Pair-instability Supernova?
Maria V. Pruzhinskaya, Alina Volnova, Matwey Kornilov et al., 2022, Res. Notes AAS, 6, 122, DOI:10.3847/2515-5172/ac76cf
Abstract. The SNAD team reports the discovery of SNAD160 (AT2018lzi) within the Zwicky Transient Facility third data release. The transient has been found using the active anomaly detection algorithm, an adaptive learning strategy aimed at incorporating expert knowledge into machine learning models. Our preliminary analysis shows that SNAD160 could be a superluminous supernova powered by a pair-instability mechanism — its light curve behavior is consistent with the observed slow rise and slow decay expected from these events.
Keywords. Supernovae; Transient detection; Transient sources; Light curve classification; Time series analysis; Astronomical object identification
Figure. SNAD160 zr-light curve (gray circles) – compared to the normal SN Ia Nugent's model (solid blue line), normal SN IIP 1999em model (solid red line) from Vincenzi et al. (2019) in zr-band, and synthetic R-band light curves of bright PISN models – He130 (green line) and R250 (purple line) from Kasen et al. (2011) at z=0.3 (dashed) and z=0.4 (solid). All models are shifted to the observer frame.
Supernova search with active learning in ZTF DR3
Maria V. Pruzhinskaya, Emille E. O. Ishida, Alexandra K. Novinskaya, Etienne Russeil, Alina A. Volnova, Konstantin L. Malanchev, Matwey V. Kornilov, Patrick D. Aleo, Vladimir S. Korolev, Vadim V. Krushinsky, Sreevarsha Sreejith, Emmanuel Gangler, 2022, arXiv:2208.09053
Abstract. In order to explore the potential of adaptive learning techniques to big data sets, the SNAD team used Active Anomaly Discovery (AAD) as a tool to search for new supernova (SN) candidates in the photometric data from the first 9.4 months of the Zwicky Transient Facility survey - between 2018 March 17 and December 31 (58194 < MJD < 58483). We analysed 70 ZTF fields with high galactic latitude and visually inspected 2100 outliers. This resulted in 104 supernova-like objects found, 57 of them were reported to the Transient Name Server for the first time and 47 were previously mentioned in other catalogues either as supernovae with known types or as supernova candidates. We visually inspected the multi-colour light curves of the non-catalogued transients and performed their fit with different supernova models to assign them to a proper class: Ia, Ib/c, IIP, IIL, IIn. Moreover, we also identified unreported slow-evolving transients which are good superluminous SN candidates, and a few other non-catalogued objects, such as red dwarf flares and active galactic nuclei. Beyond confirming the effectiveness of human-machine integration underlying the AAD strategy, our results shed light on potential leaks in currently available pipelines and can help avoid similar losses in future large scale astronomical surveys. The algorithm enables directed search of any type of data and definition of anomaly chosen by the expert.
Keywords. supernovae: general; transients: supernovae; methods: data analysis
Figure. Light curve fit of SNAD137 by Nugent’s Type IIn supernova model. Observational data correspond to OIDs: 825102200009050 (zg), 825202200039582 (zr), 825302200018371 (zi).
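The MJD limits quoted in this abstract follow directly from the calendar dates. As a small worked check, the sketch below converts the two dates to Modified Julian Date using only the Python standard library; the helper name to_mjd is hypothetical, and whole days are assumed (MJD 0 is 1858 November 17, 00:00 UT).

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 by definition (MJD = JD - 2400000.5)


def to_mjd(day):
    """Modified Julian Date at 00:00 UT of the given calendar day."""
    return (day - MJD_EPOCH).days


start = to_mjd(date(2018, 3, 17))
end = to_mjd(date(2018, 12, 31))
print(start, end, end - start)  # 58194 58483 289
```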
The SNAD Viewer: Everything You Want to Know about Your Favorite ZTF Object
Konstantin Malanchev, Matwey V. Kornilov, Maria V. Pruzhinskaya, Emille E. O. Ishida, Patrick D. Aleo, Vladimir S. Korolev, Anastasia Lavrukhina, Etienne Russeil, Sreevarsha Sreejith, Alina A. Volnova, Anastasiya Voloshina, Alberto Krone-Martins, 2022, arXiv:2211.07605
Abstract. We describe the SNAD Viewer, a web portal for astronomers which presents a centralized view of individual objects from the Zwicky Transient Facility's (ZTF) data releases, including data gathered from multiple publicly available astronomical archives and data sources. Initially built to enable efficient expert feedback in the context of adaptive machine learning applications, it has evolved into a full-fledged community asset that centralizes public information and provides a multi-dimensional view of ZTF sources. For users, we provide detailed descriptions of the data sources and choices underlying the information displayed in the portal. For developers, we describe our architectural choices and their consequences such that our experience can help others engaged in similar endeavors or in adapting our publicly released code to their requirements. The infrastructure we describe here is scalable and flexible and can be personalized and used by other surveys and for other science goals. The Viewer has been instrumental in highlighting the crucial roles domain experts retain in the era of big data in astronomy. Given the arrival of the upcoming generation of large-scale surveys, we believe similar systems will be paramount in enabling the materialization of scientific potential enclosed in current terabyte and future petabyte-scale data sets. The Viewer is publicly available online at https://ztf.snad.space
Keywords. Astronomy web services – Astronomy software
Figure. Diagram of the service infrastructure. Dashed rectangles represent individual services, circles are web modules and complementary scripts, cylinders are database management systems, and the display is the Portal module. Parallelograms are external services used by the Viewer. Lines show data flow, double circles mark data receivers. Arrows show data exchange between modules of a single service.
Malanchev, K. et al. "Anomaly detection in ZTF DR3", DESC supernova group meeting, on-line, 15 December 2020 (slides).
Malanchev, K. et al. "Anomaly detection in ZTF DR3", Rubin Research Bytes Project & Community Workshop 2020, on-line, 18 August 2020 (slides, video).
Kornilov, M. et al. "Algorithms for the active anomaly detection in the era of wide-field astronomical surveys", Highlights of the Russian astrophysics 2019, Moscow, Russia, 16 December 2019.
Malanchev K., Volnova A. et al. "Use of machine learning for anomaly detection in large astronomical databases", XXI international conference "Data Analytics and Management in Data Intensive Domains" (DAMDID), Kazan, Russia, 15-18 October 2019 (slides).
Pruzhinskaya, M. et al. "Machine learning and new classes of astrophysical objects", 7th School-seminar "Magneto-Plasma Processes in Relativistic Astrophysics", Tarusa, Russia, 17-21 June 2019.
Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Lomonosov conference 2019, Moscow, Russia, 8-12 April 2019.
Pruzhinskaya, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", IAU100 Special Session "Women and Girls in Astronomy", Moscow, Russia, 11 February 2019 (slides, in Russian).
Kornilov, M. et al. "Anomaly detection in the Open Supernova Catalog by machine learning algorithms", Highlights of the Russian astrophysics 2018, Moscow, Russia, 17 December 2018.
Kornilov, M. et al. "Machine learning techniques for analysis of photometric data from the Open Supernova catalog", The multi-messenger astronomy: gamma-ray bursts, search for electromagnetic counterparts to neutrino events and gravitational waves, Nizhnij Arkhyz (SAO), Russia, 7-14 October 2018 (slides).
Pruzhinskaya, M. et al. "Machine Learning analysis of supernova light curves", Accretion Processes in Cosmic Sources II, Saint Petersburg, Russia, 3-8 September 2018 (slides).
Sternberg Astronomical Institute, Moscow State University, Universitetsky pr., 13, Moscow 119234, Russia