LAVINE RESEARCH SUMMARY
Chemical Sensors, Biosensors, Retention Mechanisms in High Performance Liquid Chromatography, Chemometrics, Olfaction, Computational Biology
Lavine’s research program is divided into two broad areas: sensors and computational biology. Biosensors in Lavine’s lab utilize self-assembled monolayers to immobilize antibodies and surface plasmon resonance (SPR) spectroscopy to detect changes in refractive index that occur when the target is captured by the modified surface. Chemical sensing includes both mechanistic studies and environmental applications of swellable molecularly imprinted polymers. Computational biology focuses on DNA microarrays and understanding the mechanics of DNA bending. Lavine has also been interested in the development of broadly based profiling techniques for fingerprinting complex biological samples. Large amounts of data are usually generated in fingerprinting experiments requiring chemometric methods for data analysis, and these methods are critically important in extracting information embedded in the data. Hence, data mining and knowledge discovery represent a third major thrust of Lavine’s research. Lavine’s research group has been a pioneer in the development and application of a large variety of multivariate data analysis techniques including genetic algorithms, factor analysis, curve resolution, and pattern recognition. On-going research projects in Lavine’s group are summarized below.
Compound Specific Imprinted Nanospheres for Optical Sensing
The objective of the proposed research is to investigate the use of molecularly imprinted polymers as the basis of a sensitive and selective sensing method for the detection of pharmaceutical and other emerging organic contaminants, which are present at parts per billion (ppb) levels in aquatic environments. The research will involve the preparation of moderately crosslinked, molecularly imprinted polymeric nanospheres (ca. 200 nm in diameter) that are designed to swell and shrink as a function of analyte concentration in aqueous media. These nanospheres will be incorporated into hydrogel membranes. Chemical sensing is based on changes in the optical properties of the membrane that accompany swelling of the molecularly imprinted nanospheres. Two effects contribute to this change. One is an increase in the size of the nanospheres, which will lead to an increase in the amount of light scattered. The other is a change in the refractive index. Because swelling leads to an increase in the percentage of water in the polymer, the refractive index of the nanospheres will decrease as they swell. This brings them closer to the refractive index of the hydrogel membrane, leading to a decrease in the amount of light scattered/reflected by the nanospheres. For the systems that we will be studying, the change in refractive index is the dominant effect. This change will be measured by surface plasmon resonance (SPR) spectroscopy, or by fluorescence spectroscopy for nanospheres that have been prepared from monomers that fluoresce. The prototype sensor will be capable of detecting pollutants and hazardous materials selectively at ppb levels.
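The refractive index argument can be illustrated with a small calculation. This is only a sketch: it assumes a linear volume-fraction mixing rule for the swollen polymer, and the index values (1.50 for dry polymer, 1.333 for water, 1.35 for the hydrogel) are illustrative assumptions, not measured values for the nanospheres described here.

```python
def effective_index(polymer_fraction, n_polymer=1.50, n_water=1.333):
    """Approximate index of a swollen particle by linear volume mixing (assumed rule)."""
    return polymer_fraction * n_polymer + (1.0 - polymer_fraction) * n_water


def index_mismatch(polymer_fraction, n_hydrogel=1.35):
    """Difference between particle and surrounding hydrogel index (assumed hydrogel value)."""
    return abs(effective_index(polymer_fraction) - n_hydrogel)


# Swelling lowers the polymer volume fraction, here from 0.6 to 0.3.
shrunken = index_mismatch(0.6)
swollen = index_mismatch(0.3)
print(f"mismatch before swelling: {shrunken:.4f}")
print(f"mismatch after swelling:  {swollen:.4f}")
```

Under these assumptions the mismatch with the hydrogel shrinks as the particle takes up water, which is the direction of the effect described above.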
An anthrax sensor will be developed using surface plasmon resonance (SPR) spectroscopy. SPR is a member of a family of spectroscopic techniques based on evanescent wave optics. SPR has been used for the determination of refractive indexes, dielectric constants, and layer thicknesses. The experimental set-up for SPR that will be used to detect template binding will be the so-called Kretschmann configuration, which consists of a thin metal film (typically 50 nm thick gold or silver) at the interface between a high and a low refractive index material. Excitation by laser light will result in the production of surface plasmons in the metal film at a given internal angle of incident light when the energy and momentum are matched between the photons and the surface plasmon waves. (A plasmon, or charge density wave, is a collective oscillation of the charge in a metal.) The surface plasmon resonance is extremely sensitive to changes in the optical architecture of the interface, which will occur after binding of anthrax spores to the antibody immobilized on the gold surface.
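The momentum-matching condition can be made concrete with a short calculation. The sketch below uses the standard two-medium approximation, in which resonance occurs when n_prism * sin(theta) equals the real part of sqrt(eps_metal * eps_sample / (eps_metal + eps_sample)). The prism index and the gold permittivity are assumed, illustrative values for red laser light; the finite film thickness and the monolayer are ignored.

```python
import cmath
import math


def spr_resonance_angle(n_sample, n_prism=1.515, eps_metal=complex(-11.6, 1.2)):
    """Approximate Kretschmann resonance angle in degrees (assumed optical constants)."""
    eps_sample = n_sample ** 2
    # Effective index of the surface plasmon at a metal/dielectric interface.
    n_plasmon = cmath.sqrt(eps_metal * eps_sample / (eps_metal + eps_sample)).real
    ratio = n_plasmon / n_prism
    if ratio >= 1.0:
        raise ValueError("prism index too low to match the plasmon momentum")
    return math.degrees(math.asin(ratio))


# A small rise in the sample-side refractive index shifts the resonance angle.
before = spr_resonance_angle(1.333)  # water-like medium (assumed)
after = spr_resonance_angle(1.340)   # slightly denser layer after binding (assumed)
print(f"resonance angle shift: {after - before:.3f} degrees")
```

The point of the sketch is the direction and sensitivity of the shift: a change in the third decimal place of the refractive index moves the resonance angle by a measurable amount.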
A suitable antibody will be mixed with an appropriate long-chain self-assembled monolayer to yield a formulation that will be directly deposited onto a gold substrate. Ideally, the antibody should contain an –SH moiety, but it should be possible to develop a suitable self-assembled monolayer formulation using an antibody that does not contain an –SH moiety. The SPR response of the reference Au substrate, which contains the antibody and self-assembled monolayer, will be compared to a control, which will consist of a gold substrate containing only the self-assembled monolayer. If the anthrax spores elicit an SPR response only on the reference Au substrate, the experiment will be judged a success, provided that an absence of anthrax spores in the test solution produces a negligible response for both the reference and the control. The specificity of this system will be determined by two factors: the selectivity of the antibody towards the anthrax spores and the magnitude of the refractive index change caused by the binding of the spores to the Au substrate, which should be very large. For this reason, we do not anticipate that interference due to nonspecific binding will be a problem.
Supervised Learning From Microarray Data
Microarrays have allowed the expression level of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of measurement variables (e.g., 10,000 genes or proteins). Each variable indicates whether a particular gene or protein is under or over expressed. The observations in these data sets have other attributes associated with them such as a class label denoting the pathology of the subject from which the sample was taken.
We would like to be able to analyze the large arrays of data from a microarray experiment at an intermediate level using pattern recognition techniques for interpretation. However, there are problems when applying pattern recognition methods to larger data sets. Classification success rates will vary with the pattern recognition method employed. Low classification success rates are often obtained for the prediction set despite a linearly separable training set. Automation of these techniques for larger data sets is difficult.
The underlying premise of the approach to data analysis described in this paper is that all classification methods will work well when a problem is simple. By identifying the appropriate features, a “hard” problem can be reduced to a “simple” one. Also, by selecting the most salient features of the data, a classifier can be developed that will obviate the need for a more detailed understanding of the system being investigated. At the very least, such an analysis could identify those genes or proteins worthy of further study. Our goal is, therefore, feature selection, in order to increase the signal-to-noise ratio of the data by discarding measurements that are not characteristic of the profile of the various classes in the data set. For gene expression data, it is important that a multivariate approach to feature selection be employed since genes usually work in groups to regulate biological processes. Any approach to feature selection must also take into account the existence of redundancies in the data because the features of interest are most likely small sets of highly interdependent genes.
We report on the development of a genetic algorithm (GA) that employs supervised learning to mine gene expression and proteomic data. Our pattern recognition GA selects features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the pattern recognition GA contain information primarily about the differences between classes in a data set. The principal component analysis routine embedded in the fitness function of the pattern recognition GA will act as an information filter, significantly reducing the size of the search space since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. In addition, the algorithm focuses on those classes and/or samples that are difficult to classify as it trains using a form of boosting. Samples that consistently classify correctly are not as heavily weighted as samples that are difficult to classify. Over time, the algorithm learns its optimal parameters in a manner similar to a neural network. The algorithm integrates aspects of artificial intelligence and evolutionary computations to yield a smart one pass procedure for feature selection and pattern recognition.
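The idea of scoring a feature subset by the class separation in its principal component plot can be sketched as follows. This is a minimal illustration, not the group's fitness function: the scoring rule (weighted leave-one-out nearest-neighbor agreement in the plane of the two largest principal components), the power-iteration PCA, and all function names are assumptions made for the example. The `weights` argument stands in for the boosting-style sample weights mentioned above.

```python
import math


def top_two_pc_scores(rows, iters=200):
    """Project mean-centered rows onto their two largest principal components."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)] for a in range(p)]
    components = []
    for _ in range(2):
        # Power iteration from a fixed, non-uniform start vector.
        v = [1.0 / (j + 1) for j in range(p)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        components.append(v)
        # Deflate so the next pass finds the following component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)] for a in range(p)]
    return [[sum(x[j] * c[j] for j in range(p)) for c in components] for x in X]


def pc_plot_fitness(data, labels, feature_subset, weights=None):
    """Weighted fraction of samples whose nearest neighbor in the PC plot shares their class."""
    rows = [[row[j] for j in feature_subset] for row in data]
    scores = top_two_pc_scores(rows)
    n = len(scores)
    weights = weights or [1.0 / n] * n
    hit = 0.0
    for i in range(n):
        nearest = min(
            (k for k in range(n) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(scores[i], scores[k])),
        )
        if labels[nearest] == labels[i]:
            hit += weights[i]
    return hit / sum(weights)
```

In a GA, each chromosome would encode one `feature_subset`, and subsets with higher fitness would be more likely to reproduce. Raising the weights of samples that are repeatedly misclassified is one way to make the search concentrate on the difficult samples.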
Recently, the fitness function of the pattern recognition GA has been enhanced. Transverse learning has been introduced by coupling a robustified version of the Hopkins statistic to the original fitness function of our pattern recognition GA. For training sets with small amounts of labeled data (i.e., data points tagged with a class label) and large amounts of unlabeled data (i.e., data points not tagged with a class label), this approach is preferred, as our results will show, because information in the unlabeled data is used by the fitness function to guide feature selection. With this approach, feature subsets are selected to increase clustering in the principal component plot (using both the labeled and unlabeled data points), while simultaneously optimizing the separation between classes (using only the labeled data points). Transverse learning ensures that features identified by the pattern recognition GA will produce a discriminant that will perform better than one developed from a set of features whose selection is based solely on the dichotomization power of the features for the labeled data points.
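For readers unfamiliar with the Hopkins statistic, the sketch below computes the plain textbook form: H = U / (U + W), where U sums the distances from random probe locations to their nearest data point and W sums the distances from randomly chosen data points to their nearest neighbor. Values near 0.5 suggest no clustering tendency and values near 1 suggest strong clustering. The robustified variant used in the fitness function is not specified here, so this is the ordinary version, and the function names and defaults are assumptions.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from point to the closest member of others."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(point, o))) for o in others)


def hopkins_statistic(data, n_probes=5, seed=0):
    """Plain Hopkins statistic for a list of equal-length coordinate tuples."""
    rng = random.Random(seed)
    dims = range(len(data[0]))
    lows = [min(row[j] for row in data) for j in dims]
    highs = [max(row[j] for row in data) for j in dims]
    # U: probes drawn uniformly from the bounding box of the data.
    u_total = 0.0
    for _ in range(n_probes):
        probe = [rng.uniform(lows[j], highs[j]) for j in dims]
        u_total += nearest_distance(probe, data)
    # W: real points sampled without replacement, compared to the other real points.
    w_total = 0.0
    for i in rng.sample(range(len(data)), n_probes):
        others = data[:i] + data[i + 1:]
        w_total += nearest_distance(data[i], others)
    return u_total / (u_total + w_total)


# Two tight, well-separated groups of points give a value close to 1.
clustered = [(0.001 * i, 0.001 * i) for i in range(10)]
clustered += [(10 + 0.001 * i, 10 - 0.001 * i) for i in range(10)]
print(hopkins_statistic(clustered))
```

In the setting described above, the statistic would be evaluated on the principal component scores of both labeled and unlabeled samples, so that a feature subset is rewarded for producing clustered plots even where class labels are missing.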
The chemical sense of olfaction is a complex and poorly understood phenomenon. While it is an integral part of everyday life, information about the relationship between chemical structure and odor quality is scarce. For a compound to have an odor, it is generally agreed that it must be volatile as well as both lipid and water-soluble. Beyond this general description of characteristics, there is no agreement among researchers as to which molecular properties and structural features are responsible for the olfactory impressions invoked by odorants.
Analysis of odor-structure relationships (OSR) using computer-assisted methods and pattern recognition techniques can provide a practical approach to the analysis of odorants. The heart of the approach is finding a set of molecular descriptors from which a discriminating relationship can be found. According to current theories of olfaction, the perception of odor is initiated by the interaction of the odorant with the olfactory receptor sites in the nose. Olfactory excitation only occurs if the size and shape of the stimulant are complementary to the receptor or if the stimulant possesses sufficient conformational flexibility to attain the correct shape. The spatial arrangement of the stimulant’s functional and steric groups must also conform to the overall 3-dimensional geometry of the receptor. It is logical to apply this information to a structure-olfaction study during the key step: the development of molecular descriptors. However, only topological and bulk geometric descriptors (e.g., molecular connectivity indices, substructures, substructural molecular connectivity environment descriptors, molecular volume, and principal moments of inertia) have been used to describe molecular shape in previously published OSR studies of musks and other odorants. Descriptors that contain information about the olfactory process need to be developed and tested in order to formulate more effective OSRs.
A methodology to facilitate the intelligent design of new odorants (e.g., musks) with specialized properties is being developed as part of our on-going research effort in machine learning. In a traditional framework, the introduction of a new odorant is a lengthy, costly, and laborious discovery, development, and testing process. We propose to streamline this process utilizing large existing olfactory databases available through the open scientific literature as input for a new structure/activity correlation methodology. The first step in this process is to characterize each molecule in the database by an appropriate set of descriptors. To accomplish this task, an enhanced version of the Transferable Atom Equivalent (TAE) descriptor methodology will be used to create a large set of electron density derived shape/property hybrid (PEST), wavelet coefficient (WCD) and TAE histogram descriptors. We have chosen these molecular property descriptors to represent the problem because they have been shown to contain pertinent shape and electronic properties of the molecule and correlate with key modes of intermolecular interactions. Traditional QSAR methodologies, which employ fragment based descriptors, have been shown to be effective for QSAR development within homologous sets of molecules but are less effective when applied to datasets containing a great deal of structural variation. In contrast to previous attempts at SAR, our use of shape-aware electron density based molecular property descriptors has removed many of the limitations brought about by the use of descriptors based on substructure fragments, molecular surface properties, or other whole molecule descriptors. Another reason for the mixed success of past QSAR efforts can be traced to the nature of the underlying modeling problem, which is often quite complex. 
To meet these challenges, a genetic algorithm for pattern recognition analysis has been developed that selects descriptors which create class separation in a plot of the two largest principal components of the data while simultaneously searching for features that increase clustering of the data.
Development of Computational Methods for DNA Bending
The objective of the proposed research is to develop a general and robust statistical procedure that allows for the theoretical prediction and ultimate synthesis of small nonprotein agents, which control gene expression through site-specific binding and induced bending of DNA. The research plan involves an iterative scheme of (1) long-time molecular dynamics simulations of DNA using modified versions of the conformational flooding technique and multiple trajectory method, (2) the development of multivariate statistical methods to extract the structural details necessary to discover which atoms are actively involved in computed conformational transitions leading to the desired bends and kinks, (3) molecular visualization of the computed DNA energy landscape to predict which small molecular agents have the propensity to “push or pull” the minor groove of DNA into bent regions along the free energy surface, (4) the synthesis of small molecular agents predicted by the computations, and (5) laboratory testing and evaluation of the binding specificity and structural deformation of DNA.
DNA bending has been implicated in the regulation of a number of important biological systems. For example, bending has been shown to be involved in gene expression, DNA recombination, and nuclear packaging. Our focus will be on the role of DNA bending in gene expression. Gene expression is one of the most fundamental processes in biology and, as such, is strictly regulated. At a macromolecular level, the selective binding of transcriptional proteins to sites on DNA either facilitates or hinders the binding of RNA polymerase, thus regulating gene expression. Wresting transcriptional control away from the cell would have benefits in the treatment of diseases such as cancer, sickle cell anemia, and heart disease, particularly if this could be accomplished using small man-made molecules. Studies on the control of gene expression by DNA bending have indicated that bends in DNA created by either intrinsic bending sequences or unrelated transcriptional proteins can cause either activation or repression of gene expression depending upon the phase of the bend. Synthetic agents have been demonstrated to affect transcription, but the lack of sequence specificity prevents the use of existing agents to regulate a specific gene.
This research, which is being performed in collaboration with Duquesne University’s Departments of Chemistry and Biochemistry, and Pharmaceutical Sciences, seeks to develop and test new sequence-specific DNA bending agents, predicted by molecular dynamics (MD) simulations coupled with the data reduction and sorting abilities of multivariate analysis. Central to this proposed study is the implementation of statistical methodology to extract useful information from data produced by MD simulations. The principal component analysis (PCA) method, a well-established branch of multivariate statistics, is known to be useful for molecular visualization and for improving conformational sampling on long time scales. This usefulness can be compromised by the large number of atoms in a macromolecule and the vast amount of data produced from MD simulations. Three nontrivial problems of concern arise: (1) how can atoms important in conformational transitions be identified, (2) what statistical summaries of atom positions are useful to represent changes in molecular conformations, and (3) what is the hierarchical arrangement of DNA conformational sub-state tiers.
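One simple way to approach the first of these questions is to look at how strongly each atom loads on the leading principal component of the trajectory. The sketch below is an illustration only, not the statistical methodology under development: it assumes the frames have already been aligned, that each frame is a flat list of coordinates (x1, y1, z1, x2, ...), and that squared loadings on the first component are a reasonable importance score. All names are hypothetical.

```python
import math


def first_principal_component(frames, iters=500):
    """Leading eigenvector of the coordinate covariance, found by power iteration."""
    n, p = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(p)]
    X = [[f[j] - mean[j] for j in range(p)] for f in frames]
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        # Apply the covariance matrix implicitly as X^T (X v).
        s = [sum(x[j] * v[j] for j in range(p)) for x in X]
        w = [sum(s[i] * X[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    return v


def atom_contributions(frames):
    """Sum of squared x, y, z loadings on the first component, one value per atom."""
    v = first_principal_component(frames)
    return [v[3 * a] ** 2 + v[3 * a + 1] ** 2 + v[3 * a + 2] ** 2
            for a in range(len(v) // 3)]


# Toy trajectory: three atoms, four frames, only the middle atom moves along x.
frames = [
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 3.0, 1.0, 0.0, 5.0, 5.0, 5.0],
]
contributions = atom_contributions(frames)
print(contributions.index(max(contributions)))
```

For a real DNA trajectory the covariance matrix is far larger and the leading components mix many atoms, which is why more careful statistical summaries are needed.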
For the first time, a systematic and predictive scheme exploiting the dynamics and conformational transitions in bent DNA will be implemented to create man-made molecules that bind to specific sequences in the minor groove of DNA. In addition to the obvious medicinal importance of the research, the proposed study will also lead to the development of statistical tools to aid scientists in classifying, summarizing, and describing the size and types of conformational changes in macromolecular energy landscapes. The project has the potential to impact both health-related and basic science issues.
Recent Publications (*undergraduate author, 1microchemical award winner)
- B. K. Lavine, J. Ritter, A. J. Moores*1, M. Wilson, A. Faruque, and H. T. Mayfield, “Source Identification of Underground Fuel Spills by Solid Phase Micro-extraction/High-Resolution Gas Chromatography/Genetic Algorithms,” Anal. Chem., 2000, 72(2), 423-431.
- B. K. Lavine, “Fundamental Reviews: Chemometrics,” Anal. Chem., 2000, 72(12), 91R-98R.
- B. K. Lavine, “Clustering and Classification of Analytical Data,” in the Encyclopedia of Analytical Chemistry: Instrumentation and Applications, John Wiley & Sons Ltd., Chichester 2000, pp. 9689-9710.
- B. K. Lavine, A. J. Moores*1, and J. P. Ritter, “Underground Fuel Spills, Source Identification,” in the Encyclopedia of Analytical Chemistry: Instrumentation and Applications, Environment – Water and Waste, 2000, Volume 4, Edited by R. A. Meyer, pp. 3495-3515.
- B. K. Lavine, D. Brzozowski*1, J. Ritter, A. J. Moores*1, and H. T. Mayfield, “Fuel Spill Identification by Selective Fractionation Prior to Gas Chromatography I. Water Soluble Components,” J. Chromat. Sci., 2001,
- B. K. Lavine, D. Brzozowski*1, A. J. Moores*1, C. E. Davidson, and H. T. Mayfield, “Genetic Algorithm for Fuel Spill Identification,” Anal. Chim. Acta, 2001, 437(2), 233-246.
- B. K. Lavine, C. E. Davidson, A. J. Moores*1, and P. R. Griffiths, “Raman Spectroscopy and Genetic Algorithms for the Classification of Wood Types,” Applied Spectroscopy, 2001, 55(8), 960-966.
- B. K. Lavine, A. Vesanen*, D. M. Brzozowski*1, and H. T. Mayfield, “Authentication of Fuel Standards using Gas Chromatography/Pattern Recognition Techniques,” Anal. Letters, 2001, 34(2), 281-294.
- Christine Johnson, Robert K. Vander Meer, and Barry K. Lavine, “Changes in the Cuticular Hydrocarbon Profile of the Slave-maker Ant Queen, Polyergus breviceps, After Killing a Formica Queen,” J. Chem. Ecology, 2001, 27(9), 1787-1804.
- B. K. Lavine, G. Auslander, and J. Ritter, “Polarographic Studies of Zero Valent Iron as a Reductant for Remediation of Nitroaromatics in the Environment,” Microchem. J., 2001, 70(2), 69-83.
- B. K. Lavine, C. E. Davidson, and A. J. Moores*1, “Innovative Genetic Algorithms for Chemoinformatics,” Chemometrics & Intelligent Laboratory Instrumentation, 2002, 60(1), 161-171.
- B. K. Lavine, C. E. Davidson, and A. J. Moores*1, “Genetic Algorithms for Spectral Pattern Recognition,” Vibrational Spectroscopy, 2002, 28(1), 83-95.
- B. K. Lavine, J. P. Ritter, and S. Peterson*, “Enhancement of Selectivity in Reversed Phase Liquid Chromatography,” J. Chromatog, 2002, 946(1-2), 83-90.
- Ruth Baltus, Barry K. Lavine, and Jason P. Ritter, “Modeling Solute Transport in Micellar Liquid Chromatography,” Separation Science & Technology, 2002, 37, 3443-3464.
- Barry K. Lavine, Jason P. Ritter, and Edward Voigtman, “Multivariate Curve Resolution in Liquid Chromatography – Resolving Two Way Multicomponent Data Using a Varimax Extended Rotation,” Microchemical J., 2002, 72(2), 163-178.
- C. R. Johnson, H. Topoff, R. K. Vander Meer, and B. K. Lavine, “Queens Ripe for the Killing: When a host queen becomes the target of aggression by the slave-maker ant queen, Polyergus breviceps,” Animal Behavior, 2002, 64, 807-815.
- B. K. Lavine, “Fundamental Reviews: Chemometrics,” Anal. Chem., 2002, 74(12), 2763-2770.
- B. K. Lavine, C. E. Davidson, Robert K. Vander Meer, S. Lahav, V. Soroker, and A. Hefetz, “Genetic Algorithms for Deciphering the Complex Chemosensory Code of Social Insects,” Chemometrics & Intelligent Laboratory Instrumentation, 2003, 66(1), 51-62.
- D. J. Westover, B. K. Lavine, and W. R. Seitz, “Synthesis and Evaluation of Nitrated Poly(4-Hydroxy-Styrene) Microspheres for pH Sensing,” Microchemical Journal, 2003, 74, 121-129.
- B. K. Lavine, C. E. Davidson, C. Breneman, and W. Katt, “Electronic Van der Waals Surface Property Descriptors and Genetic Algorithms for Developing Structure-Activity Correlations in Olfactory Databases,” J. Chem. Inf. Science, 2003, 43, 1890-1905.
- B. K. Lavine, C. E. Davidson, and W. T. Rayens, “Machine Learning Based Pattern Recognition Applied to Microarray Data,” Combinatorial Chemistry & High Throughput Screening, 2004, 7, 115-131.
- B. K. Lavine, C. E. Davidson, J. P. Ritter, D. Westover, and T. Hancewicz, “Varimax Extended Rotation Applied to Multivariate Spectroscopic Image Analysis,” Microchemical Journal, 2004, 76, 173-180.
- B. K. Lavine, C. E. Davidson, C. Breneman, and W. Katt, “Genetic Algorithms for Clustering and Classification of Olfactory Stimulants,” in Chemoinformatics: Methods and Protocols, J. Bajorath (Ed.), Methods Mol Biol., Humana Press, 2004, 275, 399-426.
- B. K. Lavine, “Classification and Pattern Recognition,” in Practical Handbook of Chemometrics, 2nd Edition, Paul Gemperline (Ed.), Marcel Dekker Press, 2004 IN PRESS.
- B. K. Lavine, C. E. Davidson, and D. J. Westover, “Spectral Pattern Recognition Using Self Organizing Maps,” J. Chem. Inf. Comp. Science, 2004, 44(3), 1056-1064.
- B. K. Lavine and J. R. Workman, “Fundamental Review of Chemometrics,” Analytical Chemistry, 2004, 76(12), 3365-3372.