SAMPL overview, SAMPL7-10
Beginning with SAMPL7 and continuing through SAMPL10, our NIH funding allows us to plan challenges more strategically, so we are organizing a series of challenges encompassing:
- Physical property prediction (e.g., pKa, logP, logD)
- Host-guest binding
- Protein-ligand binding
SAMPL is now broken into phases, each running roughly a year, with each phase encompassing several component challenges. Workshop plans are detailed separately.
The Gibb and Isaacs groups plan to collect host-guest binding data on a regular basis, and protein-ligand binding assays are coming online in the Chodera lab. Physical property data, however, depend on planned industry internships and other partnerships.
For physical property prediction, we will likely revisit partitioning and distribution prediction at least two more times. As discussed at the recent SAMPL6 logP virtual workshop, predicting distribution between a relatively polar organic phase (such as octanol) and water involves at least three problems:
- pKa prediction
- neutral species partitioning
- charged species partitioning, since ions are known to partition into octanol to a significant extent.
For nonpolar solvents (e.g., dodecane, heptane, cyclohexane), the third problem is minimal because ions partition into them only negligibly. Thus we will likely revisit nonpolar-solvent/water distribution in SAMPL7 and polar-solvent/water distribution in subsequent challenges. We also plan to progress gradually on pKa, initially providing pKa values (if we can get these measured) and later requiring participants to predict them.
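The three-part decomposition above can be made concrete with a simplified model. The sketch below is a minimal illustration, not a method used by SAMPL organizers or participants: it assumes a monoprotic compound with one neutral and one charged form, uses the Henderson-Hasselbalch relation for the aqueous ratio of the two forms, and assumes each form partitions independently with its own partition coefficient. The function name `log_d` and its parameters are hypothetical. Passing `log_p_ion=None` drops the charged-species term, which approximates the nonpolar-solvent case.

```python
import math


def log_d(log_p_neutral, pka, ph, acid=True, log_p_ion=None):
    """Hypothetical helper: logD of a monoprotic compound at a given pH.

    Simplified two-species model (an assumption, not a SAMPL protocol):
      - pka:           pKa of the ionizable group (problem 1)
      - log_p_neutral: logP of the neutral form (problem 2)
      - log_p_ion:     logP of the charged form (problem 3);
                       None means the charged form does not partition.
    """
    # Aqueous ratio [charged]/[neutral] from Henderson-Hasselbalch.
    exponent = (ph - pka) if acid else (pka - ph)
    ion_ratio = 10.0 ** exponent

    p_neutral = 10.0 ** log_p_neutral
    p_ion = 0.0 if log_p_ion is None else 10.0 ** log_p_ion

    # D = total organic-phase concentration / total aqueous concentration,
    # with each species partitioning independently.
    d = (p_neutral + p_ion * ion_ratio) / (1.0 + ion_ratio)
    return math.log10(d)


# Illustrative numbers only: an acid with logP 2.0 and pKa 4.0 at pH 7.4.
print(round(log_d(2.0, 4.0, 7.4), 2))                  # neutral only: -1.4
print(round(log_d(2.0, 4.0, 7.4, log_p_ion=-1.0), 2))  # with ion term: -0.85
```

With these hypothetical values, letting the charged form partition (here three log units less favorably than the neutral form) raises logD by more than half a log unit, which is why the charged-species term matters for octanol but can largely be neglected for alkane solvents.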
SAMPL7 plans (present through late 2019)
SAMPL7 is planned to include:
- Host-guest binding challenges on Isaacs’ TrimerTrip, Gibb’s GDCCs, and cyclodextrin derivatives from the Gilson lab
- Physical property prediction (logD, likely between water and a non-octanol solvent), given pKa values, with data available in Summer or Fall 2019; this is a partnership with GSK.
Depending on the progress of data collection, the logD challenge could be pushed back to early in SAMPL8.
The SAMPL7 physical property challenge
The measurements are being generated in partnership with GlaxoSmithKline. The logD challenge will focus on an octanol-water logD prediction set, with pKa values possibly made available. The prediction set will contain around 50 molecules, focusing on fragment-like compounds, matched molecular pairs, a large dynamic range, and functional group diversity.
We are also exploring future physical property challenges on partition coefficients for a range of solvent systems, including water/octanol, water/heptane, and water/cyclohexane, and possibly non-aqueous pairs such as cyclohexane/methanol and heptane/acetonitrile. We are working with GSK on logD and potentially pKa measurements for a diverse set of molecules partitioning into multiple solvents. This same set will likely be revisited over several challenges, with SAMPL7 focusing on logD into a nonpolar solvent, potentially given pKa values.
SAMPL8 plans (late 2019 through Aug. 2020)
- NanoLuc binding challenge, Fall 2019
- HSA binding challenge, Winter 2020
- Physical properties challenge, Spring 2020, likely revisiting the same compound set as the logD challenge but extending to more polar solvents such as octanol (and perhaps withholding pKa values).
- Host-guest challenges, prior to Summer 2020
The SAMPL8 HSA challenge
The HSA series of challenges will focus on predicting the binding affinities and binding sites of compounds that bind human serum albumin (HSA). HSA is known to bind ligands at a variety of sites with significant affinity: small soluble molecules resembling drug fragments are highly likely to bind HSA (≥90% of such fragments bind with Kd tighter than ~480 µM). Collaborating with the National Center for Advancing Translational Sciences (NCATS) makes it possible to run several different assays, each providing different kinds of binding information. For example, thermofluor assays, microscale thermophoresis, fluorescence competition assays, and tryptophan fluorescence quenching are all currently under consideration.
Wild-type HSA has more than seven binding sites. Two of these, Sudlow Site I and Sudlow Site II, are the major known drug-binding sites, and this challenge will focus on predicting binding to them. Participants may be asked to predict both the binding site and the binding affinity of each ligand. Ligands are expected to be fragment-like or drug-like molecules.
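To place the ~480 µM threshold on the free-energy scale on which binding predictions are commonly reported, the standard relation ΔG° = RT ln(Kd/c°) can be applied. The sketch below assumes a 1 M standard state and a temperature of 298.15 K; the assay temperature for the HSA measurements is not specified here, and the function name `kd_to_dg` is hypothetical.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def kd_to_dg(kd_molar, temperature=298.15):
    """Hypothetical helper: standard binding free energy (kcal/mol) from Kd.

    Assumes a 1 M standard state, so dG = RT * ln(Kd / 1 M).
    Tighter binding (smaller Kd) gives a more negative dG.
    """
    return R_KCAL * temperature * math.log(kd_molar)


# Kd of ~480 uM, the fragment threshold quoted above.
print(round(kd_to_dg(480e-6), 2))  # -4.53
```

Under these assumptions, a fragment binding right at the quoted threshold has ΔG° of roughly -4.5 kcal/mol, i.e., weak but measurable binding.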
The NanoLuc challenges
We are collaborating with the National Center for Advancing Translational Sciences (NCATS) to use standard assays to measure the binding affinities of large compound libraries to an engineered form of luciferase known as NanoLuc. These data will be divided into sets of ligands of similar complexity for sequential blind challenges. Because NanoLuc can be expressed in E. coli, future challenges may also involve mutated forms.
The NanoLuc challenge will likely involve predicting the affinities of drug-like molecules for a single binding site. However, since NCATS has screened hundreds of thousands of compounds for binding to NanoLuc, the affinity prediction component may be preceded by a virtual screening challenge in which participants attempt to pick active compounds out of a library that also contains verified nonbinders.
SAMPL9 plans (Sept. 1, 2020 through Aug. 31, 2021)
- Physical property challenge (likely on pKa, with Paul Czodrowski; late 2020); whether we revisit logD will depend on the outcomes of SAMPL7 and SAMPL8.
- Host-guest binding challenges, summer 2021
- Protein-ligand binding challenge TBD
SAMPL10 plans (Sept. 1, 2021 through Aug. 31, 2022)
- Physical property challenge
- Host-guest binding challenge
- Protein-ligand binding challenge
Properties of interest
Properties of particular interest for future SAMPL challenges include:
- Solubility (absolute thermodynamic solubility, but also relative solubilities for the same solute in different solvents)
- LogP and logD data
- Passive membrane permeability
- Tautomer ratio prediction (estimated 2020 or 2021)
We are exploring internships to generate such data, but if you have suitable resources and would like to contribute, we would love your help.