Basis of the International Comparative Study

Phillip R. Westmoreland, University of Massachusetts Amherst

In the mid- to late 1990s, molecular and materials modeling has become a valuable tool of science and engineering. Accuracy, efficiency, computational power, and appreciation of these methods have reached the point where they give valid insights and allow sound decision-making.

Industrial use is still in its infancy, though. The most touted potential and successes have been in identification and design of pharmaceuticals, a high-profit application. Less well known is the impact occurring in design and development of homogeneous catalysts, where teamed modeling and experiment are leading to dramatic improvements. Still more unsung is the way that these approaches are transforming the routine tasks of finding physical and chemical properties and trends.

Theoretical advances have come from around the world, but clusters of researchers in different regions have established different strengths. The U.S. has demonstrated particular strengths in pharmaceuticals, force-field development, phase-equilibrium modeling derived from classical molecular thermodynamics, and homogeneous catalysis. Canadian researchers are also deeply involved in homogeneous catalysis and electronic density-functional theory. Among the areas of strength in Europe are the kinetics of heterogeneous catalysis and mesoscale modeling. Achievements in Japan, India, Australia, and other countries are recognized but less well known. Likewise, successes in industrial applications are known, but only the smaller successes have been documented publicly in detail. Dissolution of industrial groups shows that failures occur, too, yet the literature offers little insight into why those failures occurred.

The goals of the proposed international comparative study are (i) to assess the state of the art in molecular modeling, from quantum chemistry to empirical molecular simulation of macroscopic systems, and (ii) to assess how effectively and extensively theoretical advances are being transferred to industrial applications. The approach involves joint planning and site visits by a small committee of experts representing the diverse fields, followed by documentation and public presentation of the findings.

Areas to be considered

"Molecular simulations" and "computational quantum chemistry" are the two main thrusts of molecularly based modeling. Molecular simulation is based on classical Newtonian physics, modeling interactions within or between molecules using correlations of the interaction forces (force fields). Systems are modeled either deterministically (molecular dynamics, which integrates classical equations of motion) or as stochastically varied cases (Monte Carlo methods). In contrast, computational quantum chemistry is based on quantum physics, primarily applied to the electronic structure of atoms or molecules. The immediate results are wavefunctions or probability density functionals describing electron states.

The distinction is pragmatic. In molecular simulations, simpler models with many parameters make it possible to model large collections of atoms and molecules. Quantum chemistry can provide greater accuracy but is restricted to smaller molecular size by its complexity and cost.

As a result, molecular simulations are used to model ensemble properties and behaviors. Examples include P-V-T relations, phase equilibrium, transport properties, structures of synthetic and biological macromolecules, and docking of one molecule against another.

Quantum chemistry is most useful when force parameters are unknown or inapplicable. Density-functional methods are used increasingly, but configuration-interaction wavefunction methods with large atomic-orbital basis sets currently remain the reference standard. Semi-empirical methods are another important subset, which may be heavily parameterized. Force-field development and calculation of thermochemistry, kinetics, optical properties, and NMR shifts are some general areas of utility.
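As an illustration of this hierarchy, the sketch below runs Hartree-Fock, density-functional, and configuration-interaction (CISD) calculations on a small molecule. It assumes the modern open-source PySCF package purely as a stand-in; neither the package nor the molecule and basis set are drawn from the study itself.

    # Accuracy hierarchy sketch: HF -> DFT -> CISD for hydrogen fluoride.
    from pyscf import gto, scf, dft, ci

    mol = gto.M(atom="H 0 0 0; F 0 0 0.92", basis="cc-pvdz")

    mf = scf.RHF(mol)            # Hartree-Fock reference wavefunction
    e_hf = mf.kernel()

    ks = dft.RKS(mol)            # density-functional theory
    ks.xc = "b3lyp"
    e_dft = ks.kernel()

    cisd = ci.CISD(mf)           # configuration interaction, singles + doubles
    e_corr = cisd.kernel()[0]    # correlation energy beyond Hartree-Fock

    print(f"HF: {e_hf:.6f}  B3LYP: {e_dft:.6f}  CISD: {e_hf + e_corr:.6f}")

Each rung costs more than the one below it, which is exactly the accuracy/cost trade-off discussed under "Algorithms" below.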

The entire field may be summed up as "computational chemistry" or "molecular modeling," but some researchers in the area find these terms distastefully inaccurate. To many people, the term "computational chemistry" automatically implies computational quantum chemistry. On the other hand, the term may be helpful when used more inclusively, bringing in the use of reaction theories like RRKM, master-equation theory, or variational transition-state theory to compute kinetics. Similarly, it is a fair criticism to recognize that much of molecular simulation is "materials modeling" by classical physics, not modeling of molecules or of chemistry.

Improving the effectiveness of these methods requires improvements in theories, algorithms, computing hardware and operating systems, data management, the experiment-modeling interface, problem analysis, personnel infrastructure, and credibility.

  1. Theory. Unresolved theoretical issues range from simple to conceptually rich. For example, the seemingly simple task of properly treating internal rotations is a serious problem for prediction of thermochemistry and kinetics, complicated further in a condensed phase or in solution (a minimal hindered-rotor sketch appears after this list). Simulation of polymer distortions in flows relies on lumping multiple atoms, collapsing electronic structure and sensitivity to interatomic forces into pseudo-atoms. As a third example, the assignment of poly-Gaussian hydrogenic orbitals to heavy elements breaks down because of difficulty in modeling more complicated distortions in valence electrons, active participation of numerous non-ground-state electronic configurations, and relativistic effects on electrons near the massive charge of the nucleus.
  2. Algorithms. Improved computational algorithms are needed to cope with larger molecules or sets of atoms. More degrees of freedom mean more complicated optimizations, as in protein-configuration analysis, self-consistent-field solutions, or transition-state searches. Parallelization has great promise for large systems, yet problems may have few or no parallel features to exploit. Even if parallelizable, problems may be limited by memory usage or message passing. A different issue is automation of method choice. For example, quantum chemistry calculations may be carried out in a rough hierarchy of increasing accuracy (Pople's "model chemistry" approach), and molecular simulations may be carried out with force fields of increasing detail and accuracy. In both cases, there is a trade-off between accuracy and cost. BASF is involved in a "Crunchserver" project to build a semi-automatic selection algorithm that, based on the desired property and accuracy, finds the requested information in a database or refers the request for human assessment of the choices (a hypothetical dispatcher of this kind is sketched after this list).
  3. Computing hardware and operating systems. Faster CPUs, more memory, and parallel configurations offer new power and new challenges. As we become able to tackle big problems, we invariably want to tackle bigger ones. Scaling of cost with problem size as N^7 or even N^3 still quickly limits the feasible targets of ab initio calculations, and a molecular-dynamics simulation of 1000 atoms for a second might as well be infinity when computed in femtosecond time steps (the arithmetic is worked after this list). Different approaches to molecular modeling place different demands on hardware and operating systems. So do the varying needs of users. Base-level platforms must be integrated with the computing and visualization necessary to solve smaller but important problems effectively. User interfaces must be transparent windows on the task, methods, and results, despite use on diverse, constantly evolving platforms. Similarly, it is increasingly desirable to carry out code generation, translation, and documentation using cross-platform visual programming tools and techniques like literate programming.
  4. Data management. As computational power increases, the desire to probe real complexities also demands more sophisticated management and exploitation of data, both computed and measured. While some problems may reduce to a single key result (e.g., a heat of formation), most also have a large amount of accompanying detail. Polymer modeling is a good example. Consider the normal-mode analysis of frequencies for 100,000-MW polyethylene ("C7143H14288"). For any configuration, each of the 21,431 atoms has 3 time-dependent position coordinates, simplified to 3 coordinates and 3 components of an oscillation motion vector (a small-scale normal-mode sketch appears after this list). It gets worse. These motions may be anharmonic, the polymer may be highly branched, the polymer will exist in some molecular-weight distribution, and analysis of a single chain will be inadequate to capture the amorphous or crystalline morphology of the material. Furthermore, the interest may be in small-molecule diffusion in a polymer, a polymer melt, homopolymers of more complicated monomers than ethylene, block or random co-polymers, polymers with plasticizers and other additives, solvated polymers, stereochemistry, radical or ionic or condensation or metal-catalyzed polymerization, or polymer degradation. Biopolymers add another layer, as do comparisons with noisy data. The task seems impossible, yet even now polymer modeling is powerful, and as complex a behavior as protein folding can be attacked by crucial simplifications.
  5. The experiment-modeling interface. Pure correlations like QSPR and QSAR (quantitative structure-property or structure-activity relations) already allow prediction and interpretation of practical properties like toxicity or octanol-water partition coefficients (a minimal QSPR fit is sketched after this list). Prediction of measurable properties is crucial, but so is measurement of predictable properties. A challenge for molecular modeling is to keep up with the fast and effective approach of combinatorial chemistry, which presently is quite Edisonian. One role is systematizing the array of possibilities to be tested experimentally. Combinatorial chemistry can be a powerful way to avoid the limitations of imagination, but the best combinatorial chemistry also builds on relevant chemical principles. Computations need to aid the design and interpretation of such experiments, leveraging the appropriate time/accuracy balance of different theoretical approaches.
  6. Problem analysis. Successes in industrial application have rested on identifying the crucial issues or questions. A common experience is that described by Margl of Eastman Chemical, responding to a request from the process development group to identify a mechanism and rate of acylation. That group made the specific request because they assumed this step was crucial to an anhydride alcoholysis. After working through to an answer to the requested problem, the computational chemists returned to the process development group, uncovered the real goal - faster alcoholysis - and quickly identified equilibrium solubilities of the reactants as rate-limiting. Because of misunderstandings about what these methods can and cannot do, collaborative development of the tasks is crucial to successful problem analysis.
  7. Personnel infrastructure. Two personnel issues have been crucial to initial successes in industrial application of computational chemistry: (1) identifying people who know or are willing to learn both the computational chemistry and the applications and (2) having advocates among management or clients who recognize the appropriate uses of the methods. Even the largest industrial groups of molecular modelers are small at present. In the chemical process industries, it is common to find only one or two computational chemists in the company, frequently paired with a more technologically oriented chemist or chemical engineer. At the outset, they educate each other. As time proceeds, they educate others in the possibilities and limitations of the computational tools, most fruitfully by aggressive participation in development teams at the earliest stages. In contrast, operating as a technical service or consultancy has met with only limited success; it has just as often led to elimination of the activity when internal clients failed to take advantage of tools that they didn't know about or didn't trust.
  8. Credibility. A final key need is development of substantive credibility. Successes within an organization are the ultimate criteria, but awareness of outside successes has proved to be a powerful spur to using these methods. Two dangers have been the desire for overwhelming successes and the strong impact of attractive visuals. There are cases of computational chemistry groups being eliminated because of undue expectations from management and colleagues. These expectations had been heightened by promises to succeed at strictly long-term tasks like de novo catalyst design, reinforced by realistic-looking visual images. In contrast, Dow researchers note that their successes in modeling polymer properties were made possible in part by their shorter-term successes in calculating needed ideal-gas thermochemistry.
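For item 1 (Theory), the following is a minimal sketch of why internal rotations matter for thermochemistry: it diagonalizes a one-dimensional hindered-rotor Hamiltonian in a free-rotor basis and compares the resulting partition function with the harmonic-oscillator approximation. The barrier height, rotational constant, and symmetry number are illustrative values in units of kT.

    import numpy as np

    B, V0, n = 0.01, 4.0, 3      # rotational constant, barrier, symmetry number
    M = 100                      # basis exp(i*m*phi), m = -M..M
    m = np.arange(-M, M + 1)

    # Hamiltonian for V(phi) = (V0/2)(1 - cos(n*phi)) in the free-rotor basis:
    # diagonal B*m^2 + V0/2, off-diagonal -V0/4 coupling m with m +/- n
    H = np.diag(B * m**2 + V0 / 2.0)
    for i, mi in enumerate(m):
        for j, mj in enumerate(m):
            if abs(mi - mj) == n:
                H[i, j] -= V0 / 4.0

    E = np.linalg.eigvalsh(H)            # hindered-rotor energy levels
    q_hr = np.exp(-E).sum() / n          # partition function at kT = 1

    hw = n * np.sqrt(B * V0)             # harmonic frequency at the well bottom
    k = np.arange(200)
    q_ho = np.exp(-(k + 0.5) * hw).sum() # harmonic-oscillator partition function

    print(f"hindered rotor q = {q_hr:.3f}, harmonic q = {q_ho:.3f}")

The two partition functions diverge as the barrier drops toward free rotation, which is why treating a torsion as a harmonic vibration can noticeably bias computed entropies and rate constants.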
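For item 2 (Algorithms), the sketch below shows the general idea of semi-automatic method selection. The method names, error estimates, and cost scalings are invented for illustration; the internals of BASF's actual Crunchserver project are not public and are not described here.

    from dataclasses import dataclass

    @dataclass
    class Method:
        name: str
        error_kcal: float    # typical error for the requested property
        cost_exponent: int   # cost grows roughly as N**cost_exponent

    LADDER = [               # a rough "model chemistry" hierarchy, cheapest first
        Method("semi-empirical", 10.0, 3),
        Method("DFT", 3.0, 4),
        Method("large-basis CI", 1.0, 7),
    ]

    def select(n_atoms: int, target_error_kcal: float, budget: float):
        """Return the cheapest method meeting the accuracy target within the
        cost budget, or None to signal escalation to a human expert."""
        for method in LADDER:
            affordable = n_atoms ** method.cost_exponent <= budget
            if method.error_kcal <= target_error_kcal and affordable:
                return method
        return None

    choice = select(n_atoms=20, target_error_kcal=2.0, budget=1e10)
    print(choice.name if choice else "escalate to human assessment")

A production version would first query a database of previously computed results, exactly as the Crunchserver description suggests, and only dispatch a new calculation when no stored answer meets the request.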
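For item 3 (Computing hardware), the arithmetic behind the "might as well be infinity" remark is worth making explicit. One second of simulated time at femtosecond resolution requires 10^15 time steps; the throughput figure below is an assumed round number, chosen only to show the order of magnitude of the problem.

    steps = 1.0 / 1e-15                   # 1 s of simulated time in 1 fs steps
    print(f"time steps required: {steps:.0e}")

    steps_per_second = 1e3                # assumed MD throughput, 1000 atoms
    years = steps / steps_per_second / (3600 * 24 * 365)
    print(f"wall-clock time at {steps_per_second:.0e} steps/s: {years:,.0f} years")

    for p in (3, 7):                      # ab initio cost scaling as N^p
        print(f"N^{p} scaling: doubling system size multiplies cost by {2**p}x")

Even a thousand-fold hardware speedup would still leave such a run at decades of wall-clock time, which is why coarse-graining in time and in particle count matters as much as raw CPU speed.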
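For item 4 (Data management), the normal-mode analysis mentioned for polyethylene reduces, for any one configuration, to diagonalizing a mass-weighted Hessian. The toy system below is a one-dimensional three-atom chain with an illustrative force constant, followed by the bookkeeping for the 21,431-atom chain from the text.

    import numpy as np

    k_bond = 1.0                              # illustrative force constant
    masses = np.array([12.0, 12.0, 12.0])     # three carbon-like atoms, 1-D
    H = k_bond * np.array([[ 1., -1.,  0.],
                           [-1.,  2., -1.],
                           [ 0., -1.,  1.]])  # second derivatives of energy

    # Mass-weighted Hessian: H_ij / sqrt(m_i * m_j)
    Hmw = H / np.sqrt(np.outer(masses, masses))
    w2 = np.linalg.eigvalsh(Hmw)              # eigenvalues = squared frequencies
    print("frequencies:", np.sqrt(np.clip(w2, 0.0, None)))  # one zero: translation

    n_atoms = 21_431                          # the C7143H14288 chain
    print("Cartesian coordinates:", 3 * n_atoms)       # 64,293
    print("vibrational modes:", 3 * n_atoms - 6)       # 64,287

Storing 64,287 frequencies plus a 64,293-component eigenvector for each mode, for each configuration, is where the data-management burden described above begins.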
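For item 5 (The experiment-modeling interface), a QSPR correlation in its simplest form is just a least-squares fit of a property against molecular descriptors. In the sketch below, the descriptor matrix and property values are synthetic stand-ins invented for illustration, not measured data.

    import numpy as np

    # Rows are molecules; columns are descriptors (molecular weight, polar-group
    # count, computed dipole). All values are synthetic placeholders.
    X = np.array([[ 72.1, 0, 0.1],
                  [ 74.1, 1, 1.7],
                  [ 88.1, 1, 1.6],
                  [ 60.1, 2, 2.9],
                  [102.2, 0, 0.2]])
    y = np.array([3.0, 0.9, 1.3, -0.4, 3.9])  # e.g. log Kow, synthetic values

    A = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("fitted coefficients:", coef)

    x_new = np.array([86.2, 1, 1.5, 1.0])     # hypothetical new molecule
    print("predicted property:", x_new @ coef)

The same pattern scales to hundreds of descriptors and to regression methods far beyond ordinary least squares, and it is the kind of fast screen that can help prioritize a combinatorial library before anything is synthesized.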