suneelbvs · Oct 13, 2025
diff --git a/10_Chemical_Format_Conversion_and_Metadata.ipynb b/10_Chemical_Format_Conversion_and_Metadata.ipynb
@@ -0,0 +1,151 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial 10: Chemical Format Conversion and Metadata Handling\n",
+    "\n",
+    "Round-trip molecules between common chemical formats while preserving metadata fields.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Objectives\n",
+    "\n",
+    "- Load structure-data files (SDF) that contain rich metadata.\n",
+    "- Convert the records into pandas DataFrames for analysis.\n",
+    "- Export SMILES and SDF files with selected properties.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from rdkit import Chem\n",
+    "from rdkit.Chem import PandasTools\n",
+    "import pandas as pd\n",
+    "from io import StringIO\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Read SDF Data\n",
+    "\n",
+    "The snippet below emulates reading from disk by loading a single-record SDF string into an `SDMolSupplier`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# The leading newline supplies the (empty) title line of the mol block, so the string\n",
+    "# must not be stripped. V2000 records are fixed-width, so the column spacing matters.\n",
+    "sdf_block = \"\"\"\n",
+    "  Mrv2108 07152116512D\n",
+    "\n",
+    "  6  6  0  0  0  0            999 V2000\n",
+    "    1.2990   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "    0.0000   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "   -1.2990   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "   -1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "    0.0000    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n",
+    "  1  2  2  0  0  0  0\n",
+    "  2  3  1  0  0  0  0\n",
+    "  3  4  2  0  0  0  0\n",
+    "  4  5  1  0  0  0  0\n",
+    "  5  6  2  0  0  0  0\n",
+    "  6  1  1  0  0  0  0\n",
+    "M  END\n",
+    ">  <Name>\n",
+    "Benzene\n",
+    "\n",
+    ">  <Source>\n",
+    "Example\n",
+    "\n",
+    "$$$$\n",
+    "\"\"\"\n",
+    "supplier = Chem.SDMolSupplier()\n",
+    "supplier.SetData(sdf_block, sanitize=True)\n",
+    "molecules = [mol for mol in supplier if mol is not None]\n",
+    "len(molecules)\n"
+   ]
+  },
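+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cell below is an illustrative sketch rather than part of the main workflow. It assembles a one-record SDF string by hand to show the layout (mol block, `>  <Field>` data items, `$$$$` terminator) and then reads the data fields back. The molecule and the field values are made-up example data.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from rdkit import Chem\n",
+    "\n",
+    "# Hypothetical example record: a mol block followed by two data items.\n",
+    "example_mol = Chem.MolFromSmiles('c1ccccc1')\n",
+    "example_sdf = (\n",
+    "    Chem.MolToMolBlock(example_mol)\n",
+    "    + '>  <Name>\\nBenzene\\n\\n'\n",
+    "    + '>  <Source>\\nExample\\n\\n'\n",
+    "    + '$$$$\\n'\n",
+    ")\n",
+    "\n",
+    "example_supplier = Chem.SDMolSupplier()\n",
+    "example_supplier.SetData(example_sdf)\n",
+    "parsed = example_supplier[0]\n",
+    "# GetPropsAsDict() returns the SD data fields attached to the parsed molecule.\n",
+    "parsed.GetPropsAsDict()\n"
+   ]
+  },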
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Convert to a DataFrame\n",
+    "\n",
+    "`PandasTools.LoadSDF` turns every SD data field into a DataFrame column next to the molecule column, which makes downstream analysis straightforward.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
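+    "from io import BytesIO\n",
+    "\n",
+    "# LoadSDF reads through ForwardSDMolSupplier, which needs a binary file-like object.\n",
+    "sdf_buffer = BytesIO(sdf_block.encode('utf-8'))\n",
+    "df = PandasTools.LoadSDF(sdf_buffer, smilesName='smiles', molColName='ROMol')\n",
+    "df\n"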
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Write SMILES and SDF Outputs\n",
+    "\n",
+    "Export the curated data to SMILES or SDF files. String buffers allow inspection without touching disk.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "smiles_buffer = StringIO()\n",
+    "# SaveSMILESFromFrame writes the SMILES of each row followed by the value in NamesCol.\n",
+    "PandasTools.SaveSMILESFromFrame(df, smiles_buffer, molCol='ROMol', NamesCol='Name')\n",
+    "smiles_buffer.getvalue()\n"
+   ]
+  },
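+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`PandasTools.WriteSDF` can write straight from a DataFrame and restrict the output to chosen columns. The sketch below uses a small stand-in DataFrame with hypothetical values so that it runs on its own; the `properties` list names the columns to keep as SD data fields, and `idName` picks the column used as the record title.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from io import StringIO\n",
+    "import pandas as pd\n",
+    "from rdkit import Chem\n",
+    "from rdkit.Chem import PandasTools\n",
+    "\n",
+    "# Stand-in data (assumption); in this tutorial it would be the DataFrame loaded above.\n",
+    "demo_df = pd.DataFrame({\n",
+    "    'Name': ['Benzene'],\n",
+    "    'Source': ['Example'],\n",
+    "    'ROMol': [Chem.MolFromSmiles('c1ccccc1')],\n",
+    "})\n",
+    "\n",
+    "demo_out = StringIO()\n",
+    "# Only 'Source' is exported as a data field; 'Name' becomes the record title.\n",
+    "PandasTools.WriteSDF(demo_df, demo_out, molColName='ROMol', idName='Name', properties=['Source'])\n",
+    "print(demo_out.getvalue())\n"
+   ]
+  },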
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# SDWriter takes its output target (a file name or file-like object) at construction.\n",
+    "sdf_output = StringIO()\n",
+    "sdf_writer = Chem.SDWriter(sdf_output)\n",
+    "for mol in molecules:\n",
+    "    mol.SetProp('Processed', 'True')\n",
+    "    sdf_writer.write(mol)\n",
+    "sdf_writer.close()\n",
+    "sdf_output.getvalue().splitlines()[:10]\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/5_Conformer_Generation_and_3D_Analysis.ipynb b/5_Conformer_Generation_and_3D_Analysis.ipynb
@@ -0,0 +1,160 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial 5: 3D Conformer Generation and Analysis\n",
+    "\n",
+    "Learn how to generate three-dimensional conformers for a molecule, optimise their geometry, and compare the resulting ensemble.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Objectives\n",
+    "\n",
+    "- Prepare a molecule with explicit hydrogens so that the force field has the atoms it expects.\n",
+    "- Embed several conformers with the ETKDG algorithm and perform force-field minimisation.\n",
+    "- Analyse conformer energies and pairwise RMS values to identify the most representative structures.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from rdkit import Chem\n",
+    "from rdkit.Chem import AllChem, Draw\n",
+    "from rdkit.Chem import rdMolAlign\n",
+    "import pandas as pd\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prepare an Example Molecule\n",
+    "\n",
+    "We will work with ibuprofen, a small drug-like molecule that exhibits several low-energy conformations.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "ibuprofen = Chem.AddHs(Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)[C@@H](C)C(=O)O'))\n",
+    "ibuprofen\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generate Conformers with ETKDG\n",
+    "\n",
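+    "The ETKDG method (Experimental-Torsion basic Knowledge Distance Geometry) combines distance geometry with torsion-angle preferences derived from crystal structures, which gives a robust starting point for 3D coordinates.\n"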
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "params = AllChem.ETKDGv3()\n",
+    "params.randomSeed = 0xF00D\n",
+    "conformer_ids = list(AllChem.EmbedMultipleConfs(ibuprofen, numConfs=10, params=params))\n",
+    "print(f\"Generated {len(conformer_ids)} conformers\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Optimise with a Force Field\n",
+    "\n",
+    "Each conformer is refined with the Universal Force Field (UFF). The final energy (in kcal/mol) helps rank conformers.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "energy_records = []\n",
+    "for cid in conformer_ids:\n",
+    "    AllChem.UFFOptimizeMolecule(ibuprofen, confId=cid)\n",
+    "    ff = AllChem.UFFGetMoleculeForceField(ibuprofen, confId=cid)\n",
+    "    energy_records.append((cid, ff.CalcEnergy()))\n",
+    "energy_df = pd.DataFrame(energy_records, columns=['conformer_id', 'uff_energy_kcal'])\n",
+    "energy_df.sort_values('uff_energy_kcal').reset_index(drop=True)\n"
+   ]
+  },
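+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an alternative sketch, `AllChem.UFFOptimizeMoleculeConfs` minimises every conformer in one call and returns a `(not_converged, energy)` pair per conformer, which makes it easy to spot minimisations that stopped at the iteration limit. The stand-in molecule, the random seed, and the `maxIters` value below are illustrative assumptions.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from rdkit import Chem\n",
+    "from rdkit.Chem import AllChem\n",
+    "\n",
+    "demo_mol = Chem.AddHs(Chem.MolFromSmiles('CCCCO'))  # small stand-in molecule\n",
+    "demo_cids = list(AllChem.EmbedMultipleConfs(demo_mol, numConfs=5, randomSeed=42))\n",
+    "\n",
+    "# One (not_converged, energy) tuple per conformer; not_converged == 0 means converged.\n",
+    "results = AllChem.UFFOptimizeMoleculeConfs(demo_mol, maxIters=500)\n",
+    "opt_df = pd.DataFrame(results, columns=['not_converged', 'uff_energy_kcal'])\n",
+    "opt_df.insert(0, 'conformer_id', demo_cids)\n",
+    "# Energies relative to the lowest-energy conformer are easier to compare.\n",
+    "opt_df['relative_kcal'] = opt_df['uff_energy_kcal'] - opt_df['uff_energy_kcal'].min()\n",
+    "opt_df.sort_values('relative_kcal')\n"
+   ]
+  },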
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Compare Conformer Geometries\n",
+    "\n",
+    "The pairwise RMSD values quantify structural differences between conformers. Smaller values (in Å) indicate more similar geometries.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
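+    "import numpy as np\n",
+    "\n",
+    "# GetConformerRMSMatrix returns the lower triangle as a flat list:\n",
+    "# [rms(1,0), rms(2,0), rms(2,1), ...]. It also aligns the conformers to the first one in place.\n",
+    "rms_flat = AllChem.GetConformerRMSMatrix(ibuprofen, prealigned=False)\n",
+    "n_confs = len(conformer_ids)\n",
+    "rms_square = np.zeros((n_confs, n_confs))\n",
+    "rms_square[np.tril_indices(n_confs, k=-1)] = rms_flat\n",
+    "rms_square = rms_square + rms_square.T  # mirror into the upper triangle\n",
+    "labels = [f\"conf_{i}\" for i in conformer_ids]\n",
+    "rms_df = pd.DataFrame(rms_square, index=labels, columns=labels)\n",
+    "rms_df.round(3)\n"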
+   ]
+  },
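+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One common way to pick representative structures from the RMSD values is Butina clustering, sketched below. It works directly on the flat lower-triangle list that `GetConformerRMSMatrix` returns. The stand-in molecule and the 0.5 Å threshold are assumptions chosen for illustration; a suitable threshold depends on the molecule.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from rdkit import Chem\n",
+    "from rdkit.Chem import AllChem\n",
+    "from rdkit.ML.Cluster import Butina\n",
+    "\n",
+    "demo_mol = Chem.AddHs(Chem.MolFromSmiles('CCCCCO'))  # small stand-in molecule\n",
+    "demo_cids = list(AllChem.EmbedMultipleConfs(demo_mol, numConfs=10, randomSeed=42))\n",
+    "demo_rms = AllChem.GetConformerRMSMatrix(demo_mol, prealigned=False)\n",
+    "\n",
+    "# Each cluster is a tuple of conformer indices; the first entry is the cluster centroid.\n",
+    "clusters = Butina.ClusterData(demo_rms, len(demo_cids), 0.5, isDistData=True, reordering=True)\n",
+    "representatives = [demo_cids[cluster[0]] for cluster in clusters]\n",
+    "representatives\n"
+   ]
+  },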
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Visualise the Lowest-Energy Conformers\n",
+    "\n",
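+    "Drawing the lowest-energy conformers helps communicate which geometry the force field prefers. `Draw.MolsToGridImage` renders a flat projection of the 3D coordinates, so differences along the viewing axis are not visible.\n"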
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "ranked = energy_df.sort_values('uff_energy_kcal').head(4)['conformer_id'].tolist()\n",
+    "mols = [Chem.Mol(ibuprofen) for _ in ranked]\n",
+    "for new_conf, cid in zip(mols, ranked):\n",
+    "    new_conf.RemoveAllConformers()\n",
+    "    new_conf.AddConformer(ibuprofen.GetConformer(id=cid), assignId=True)\n",
+    "Draw.MolsToGridImage([Chem.RemoveHs(m) for m in mols], legends=[f\"conf {cid}\" for cid in ranked], molsPerRow=2)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}