Title :
Scaling up Prioritized Grammar Enumeration for scientific discovery in the cloud
Author :
Worm, Tony ; Chiu, Kenneth
Author_Institution :
Binghamton Univ., Binghamton, NY, USA
Abstract :
Symbolic Regression (SR) is the data-driven search for mathematical relations, performed by a computer. In essence, SR is a search over all possible equations to find those that best model the data at hand. Prioritized Grammar Enumeration (PGE) is a recently proposed algorithm that has been shown to be highly effective and efficient on the Symbolic Regression problem using just a single compute core. PGE reformulates the SR problem as a search over a grammar, reduces the size of the search space, and introduces mechanisms for exploring that space efficiently. Notably, PGE provides reliable and reproducible results, a key requirement for any system used by scientists at large. In this paper, we enhance the PGE algorithm in several ways. First, we extend PGE to discover differential equations. Second, we incorporate multiple prioritization heaps into PGE, reducing point evaluations while maintaining efficacy. Finally, we decouple the PGE subroutines into a set of services, containerize each with Docker, and deploy them onto the cloud. Our algorithm experiments cover a range of dynamical systems from a multitude of domains, and our cloud experiments explore a variety of architectural setups. Our results show PGE to have great promise and efficacy in automating the discovery of equations at the scales needed by tomorrow's scientific data problems.
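The abstract describes PGE as a search over a grammar that is explored in priority order rather than at random. The following minimal Python sketch illustrates that general idea under loudly stated assumptions; it is not the authors' implementation. The toy grammar (the variable `x`, the constant `1`, binary `+` and `*`), the helper names (`evaluate`, `canonical`, `expand`, `pge_sketch`, `show`), the lexicographic (error, size) priority, and the operand-sorting canonicalization are all hypothetical simplifications chosen for illustration. The sketch uses a single priority queue and omits the parameter fitting, algebraic simplification, multiple heaps, and differential-equation support mentioned in the abstract.

```python
import heapq
import itertools

# Hypothetical toy grammar: leaves are x and the constant 1; nodes are + and *.
LEAVES = ("x", "1")
OPS = ("+", "*")


def evaluate(expr, x):
    """Evaluate an expression tree (str leaf or (op, left, right)) at x."""
    if expr == "x":
        return x
    if expr == "1":
        return 1.0
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b


def size(expr):
    """Number of nodes in the expression tree."""
    if isinstance(expr, str):
        return 1
    return 1 + size(expr[1]) + size(expr[2])


def canonical(expr):
    """Simplified stand-in for canonical forms: sort commutative operands
    so that, e.g., x+1 and 1+x are treated as one candidate."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    a, b = canonical(left), canonical(right)
    return (op,) + tuple(sorted((a, b), key=repr))


def expand(expr):
    """Apply toy production rules: combine the whole tree with a leaf,
    or recursively expand one of its subtrees."""
    for op in OPS:
        for leaf in LEAVES:
            yield (op, expr, leaf)
    if not isinstance(expr, str):
        op, left, right = expr
        for new_left in expand(left):
            yield (op, new_left, right)
        for new_right in expand(right):
            yield (op, left, new_right)


def pge_sketch(xs, ys, max_size=7, max_pops=200):
    """Best-first enumeration: always expand the most promising candidate."""
    def error(expr):
        return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    tie = itertools.count()  # tie-breaker so heapq never compares trees
    heap, seen = [], set()   # 'seen' prevents re-evaluating duplicate forms

    def push(expr):
        key = canonical(expr)
        if key in seen or size(key) > max_size:
            return
        seen.add(key)
        heapq.heappush(heap, (error(key), size(key), next(tie), key))

    for leaf in LEAVES:
        push(leaf)
    best = None
    for _ in range(max_pops):
        if not heap:
            break
        err, sz, _tie, expr = heapq.heappop(heap)
        if best is None or (err, sz) < best[:2]:
            best = (err, sz, expr)
        if err == 0.0:  # exact fit found; stop early
            break
        for child in expand(expr):
            push(child)
    return best


def show(expr):
    """Render an expression tree as a fully parenthesized string."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({show(left)} {op} {show(right)})"


xs = [0, 1, 2, 3]
ys = [x * x + x for x in xs]
err, sz, expr = pge_sketch(xs, ys)
print(show(expr), err)  # (x + (x * x)) 0.0
```

Because candidates are popped in order of error and then size, the first exact fit found is also among the smallest, which mirrors in a very simplified way the deterministic, reproducible search order the abstract attributes to PGE.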
Keywords :
Big Data; differential equations; regression analysis; scientific information systems; Docker; PGE algorithm; data driven search; prioritized grammar enumeration; reliability; scientific discovery; search space; symbolic regression; Algebra; Benchmark testing; Equations; Grammar; Heuristic algorithms; Mathematical model; Runtime
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004284