- Poster presentation
- Open Access
Parsers for SMILES and SMARTS
Chemistry Central Journal volume 2, Article number: P40 (2008)
SMILES [1] and SMARTS [2] are two line notations developed by Daylight and implemented in a number of chemical informatics software tools. Most of these implementations use hand-written parsers, which are quite complicated and hard to read, modify, maintain, optimize or reuse. A traditional computer science approach would use a parser generator like lex/yacc, but that approach has not made much inroad into computational chemistry. The tools have been difficult to use, especially for error reporting and recovery, and most of the developers have a chemistry background and are unfamiliar with the language theory underlying this approach.
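To make the lexer half of a lex/yacc-style design concrete, here is a minimal sketch in Python of a table-driven tokenizer that reports the position of the first unrecognized character. The token names, the small SMILES subset they cover, and the `SmilesSyntaxError` class are assumptions made for illustration; they are not taken from Daylight, OpenSMILES, or any of the toolkits mentioned in this abstract.

```python
import re

# Token classes for a small, illustrative SMILES subset (an assumption of this
# sketch, not the full Daylight or OpenSMILES token set).
TOKEN_SPEC = [
    ("BRACKET_ATOM", r"\[[^\[\]]+\]"),        # e.g. [NH4+] or [13C], contents unchecked
    ("ATOM", r"Cl|Br|[BCNOPSFI]|[bcnops]"),   # organic subset; two-letter symbols first
    ("BOND", r"[-=#:/\\]"),
    ("RING_CLOSURE", r"%\d\d|\d"),
    ("OPEN_BRANCH", r"\("),
    ("CLOSE_BRANCH", r"\)"),
    ("DOT", r"\."),
]

# One combined pattern with a named group per token class, in the style of lex.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


class SmilesSyntaxError(ValueError):
    """Hypothetical error type that records where tokenizing failed."""

    def __init__(self, smiles, position):
        pointer = " " * position + "^"
        super().__init__(
            f"unexpected character at position {position}\n{smiles}\n{pointer}"
        )
        self.position = position


def tokenize(smiles):
    """Return a list of (token_name, text) pairs, or raise SmilesSyntaxError."""
    tokens = []
    position = 0
    while position < len(smiles):
        match = TOKEN_RE.match(smiles, position)
        if match is None:
            raise SmilesSyntaxError(smiles, position)
        tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Because the token table is data rather than control flow, adding or changing a token class is a one-line edit, and the error message can point at the offending character without any extra bookkeeping in the calling code.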
Modern programming languages and parser systems have made many of these difficulties disappear. I have been working with people from OpenSMILES [3] and several of the existing open source toolkits (Open Babel [4], CDK [5] and RDKit [6]) to develop valid, useful grammars for SMILES and SMARTS. I have also been evaluating how to implement those grammars using parsing systems like ANTLR [7], PLY [8] and Ragel [9]. My plan is to fold that work back into the different projects so that there is broader and more consistent support for these two important notations. I also expect that the resulting code will be faster, more maintainable, and more flexible for trying new ideas. By documenting the different parts, I hope that knowledge of how to use parser frameworks will spread through the computational chemistry development community and help in developing the next generation of chemistry toolkits and line notations like MQL [10].
This poster presents some of the preliminary results of that work, including a SMILES grammar, implementations for ANTLR and PLY, and an early performance analysis.
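As an illustration of what a grammar-driven SMILES parser can look like, the following Python sketch pairs a tiny grammar (in the comments) with a recursive-descent parser in which each grammar rule maps to one method. The grammar is a deliberately reduced subset written for this example (organic-subset atoms, simple bonds, branches, and single-digit ring closures); it is not the grammar presented on the poster, and the `SmilesSubsetParser` class and its method names are hypothetical. A parser generator such as ANTLR or PLY would produce equivalent code from a grammar file instead of having it written by hand.

```python
import re

# Illustrative grammar for a SMILES subset (an assumption of this sketch):
#
#   smiles       ::= chain
#   chain        ::= bond? atom ( branch | bond? ring_closure | bond? atom )*
#   branch       ::= "(" chain ")"
#   atom         ::= "Cl" | "Br" | [BCNOPSFI] | [bcnops]
#   bond         ::= "-" | "=" | "#" | ":"
#   ring_closure ::= [0-9]

ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")
BOND_CHARS = "-=#:"


class SmilesSubsetParser:
    """Hypothetical recursive-descent parser; one method per grammar rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.atoms = []        # atom symbols, indexed by order of appearance
        self.bonds = []        # (atom_index, atom_index, bond symbol or None)
        self.open_rings = {}   # ring digit -> (atom_index, bond symbol or None)

    def error(self, message):
        raise ValueError(f"{message} at position {self.pos} in {self.text!r}")

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse(self):
        self.chain(previous=None)
        if self.pos != len(self.text):
            self.error("unexpected character")
        if self.open_rings:
            self.error("unclosed ring")
        return self.atoms, self.bonds

    def optional_bond(self):
        ch = self.peek()
        if ch and ch in BOND_CHARS:
            self.pos += 1
            return ch
        return None  # None marks an implicit bond

    def atom(self):
        match = ATOM_RE.match(self.text, self.pos)
        if match is None:
            self.error("expected atom")
        self.pos = match.end()
        self.atoms.append(match.group())
        return len(self.atoms) - 1

    def ring_closure(self, current, bond):
        digit = self.peek()
        self.pos += 1
        if digit in self.open_rings:
            other, open_bond = self.open_rings.pop(digit)
            self.bonds.append((other, current, bond or open_bond))
        else:
            self.open_rings[digit] = (current, bond)

    def chain(self, previous):
        bond = self.optional_bond()
        if previous is None and bond is not None:
            self.error("bond without a preceding atom")
        current = self.atom()
        if previous is not None:
            self.bonds.append((previous, current, bond))
        while True:
            ch = self.peek()
            if ch == "(":
                self.pos += 1
                self.chain(previous=current)
                if self.peek() != ")":
                    self.error("expected ')'")
                self.pos += 1
            elif ch and (ch in BOND_CHARS or "0" <= ch <= "9"
                         or ATOM_RE.match(self.text, self.pos)):
                bond = self.optional_bond()
                if "0" <= self.peek() <= "9":
                    self.ring_closure(current, bond)
                else:
                    following = self.atom()
                    self.bonds.append((current, following, bond))
                    current = following
            else:
                return


def parse_smiles_subset(text):
    """Parse text and return (atoms, bonds); raise ValueError on bad input."""
    return SmilesSubsetParser(text).parse()
```

For example, `parse_smiles_subset("CC(=O)O")` yields four atoms and three bonds, one of them the explicit double bond inside the branch. Keeping the grammar next to the code in this way is what makes it practical to compare implementations across toolkits.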
References
Weininger D: J Chem Inf Comput Sci. 1988, 28: 31-36. 10.1021/ci00057a005.
[http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html]
Proschak E, Wegner J, Schüller A, Schneider G, Fechner U: J Chem Inf Model. 2007, 47 (2): 295-301. 10.1021/ci600305h.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Dalke, A. Parsers for SMILES and SMARTS. Chemistry Central Journal 2 (Suppl 1), P40 (2008). https://doi.org/10.1186/1752-153X-2-S1-P40
DOI: https://doi.org/10.1186/1752-153X-2-S1-P40
Keywords
- Development Community
- Chemistry Development
- Science Approach
- Language Theory
- Error Reporting