Tags: Cheminformatics, SMARTS, RDKit, Python, API, ELN

Retrosynthetic AI

A hybrid MCS and Tanimoto similarity system for querying reaction databases and automatically generating retrosynthetic pathways, logging results to ELN via API.

Overview

Retrosynthetic AI is a hybrid chemistry system that automates the process of identifying viable synthetic pathways for target molecules. By combining Maximum Common Substructure (MCS) analysis with Tanimoto similarity scoring, the system navigates reaction databases to propose evidence-backed routes and programmatically logs them into the Electronic Lab Notebook.

The Problem

Retrosynthetic analysis — working backwards from a target molecule to identify available starting materials — is one of the most intellectually demanding tasks in medicinal chemistry. Manual literature searches are time-consuming, inconsistent, and don’t scale to high-throughput discovery campaigns.

The Solution

Hybrid Similarity Scoring

The system implements a two-pronged query strategy against NextMove’s Pistachio reaction database:

  • MCS (Maximum Common Substructure) — identifies reactions involving structurally analogous scaffold cores
  • Tanimoto Similarity Scoring — captures broader fingerprint-level similarity for less obvious structural analogs

Each candidate receives a confidence score that combines both similarity metrics, and the candidates are ranked by that score into a shortlist of viable precedents.
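A minimal sketch of how such a combined score might be computed with RDKit follows. The Morgan fingerprint settings, the MCS coverage measure, the equal weighting, and the function names (`tanimoto`, `mcs_fraction`, `hybrid_score`, `rank_candidates`) are illustrative assumptions rather than the system's actual parameters, and candidates are passed in as plain SMILES instead of being retrieved from Pistachio.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFMCS, rdFingerprintGenerator

# Morgan fingerprints; radius and size are illustrative choices (assumption).
_FP_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def tanimoto(mol_a, mol_b):
    """Tanimoto coefficient between Morgan bit-vector fingerprints."""
    fp_a = _FP_GEN.GetFingerprint(mol_a)
    fp_b = _FP_GEN.GetFingerprint(mol_b)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


def mcs_fraction(mol_a, mol_b, timeout=5):
    """Fraction of the larger molecule's heavy atoms covered by the MCS."""
    result = rdFMCS.FindMCS([mol_a, mol_b], timeout=timeout)
    larger = max(mol_a.GetNumAtoms(), mol_b.GetNumAtoms())
    return result.numAtoms / larger if larger else 0.0


def hybrid_score(target_smiles, candidate_smiles, mcs_weight=0.5):
    """Weighted blend of MCS coverage and Tanimoto similarity (assumed form)."""
    target = Chem.MolFromSmiles(target_smiles)
    candidate = Chem.MolFromSmiles(candidate_smiles)
    if target is None or candidate is None:
        raise ValueError("could not parse SMILES")
    return (mcs_weight * mcs_fraction(target, candidate)
            + (1.0 - mcs_weight) * tanimoto(target, candidate))


def rank_candidates(target_smiles, candidates, mcs_weight=0.5):
    """Return (smiles, score) pairs sorted from most to least similar."""
    scored = [(smi, hybrid_score(target_smiles, smi, mcs_weight))
              for smi in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Acetanilide as a toy target; the candidates are arbitrary examples.
    for smi, score in rank_candidates("CC(=O)Nc1ccccc1",
                                      ["CCO", "CC(=O)Nc1ccc(C)cc1"]):
        print(f"{score:.2f}  {smi}")
```

A linear blend is only one way to combine the two metrics. It keeps the score in the range 0 to 1 and lets the weight shift emphasis between scaffold overlap and fingerprint-level similarity.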

Iterative Pathway Parsing

Once sufficiently similar precedents are identified, their reaction schemes are iteratively matched against the original target using SMARTS patterns, which pinpoint the bonds at which each retrosynthetic disconnection is made.
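As an illustration of SMARTS-based matching, the sketch below locates candidate disconnection bonds in a target with RDKit. The pattern table `DISCONNECTION_SMARTS`, the convention that atom maps `:1` and `:2` mark the bond to cut, and the function name `find_disconnections` are assumptions made for this example. In the system described here the patterns would presumably be derived from the matched precedent reactions.

```python
from rdkit import Chem

# Hypothetical disconnection patterns. Atom maps :1 and :2 flag the two
# atoms whose connecting bond is the proposed retrosynthetic cut.
DISCONNECTION_SMARTS = {
    "amide": "[CX3:1](=O)[NX3:2]",
    "ester": "[CX3:1](=O)[OX2:2][#6]",
}


def find_disconnections(target_smiles, patterns=DISCONNECTION_SMARTS):
    """Return (pattern name, atom index, atom index, bond index) per hit."""
    mol = Chem.MolFromSmiles(target_smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {target_smiles!r}")

    hits = []
    for name, smarts in patterns.items():
        query = Chem.MolFromSmarts(smarts)
        # Position inside the query of each mapped atom, keyed by map number.
        mapped = {atom.GetAtomMapNum(): atom.GetIdx()
                  for atom in query.GetAtoms() if atom.GetAtomMapNum()}
        for match in mol.GetSubstructMatches(query):
            begin, end = match[mapped[1]], match[mapped[2]]
            bond = mol.GetBondBetweenAtoms(begin, end)
            hits.append((name, begin, end, bond.GetIdx()))
    return hits


if __name__ == "__main__":
    # Acetanilide contains one amide bond and no ester.
    for hit in find_disconnections("CC(=O)Nc1ccccc1"):
        print(hit)
```

Returning atom and bond indices rather than fragments keeps this step separate from the later enumeration of starting materials.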

The decomposed components are then enumerated into the corresponding starting materials at each reaction step, building a full synthetic tree.
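One way to represent the resulting tree is sketched below in plain Python. The `RouteNode` class, the `build_tree` function, and the `TOY_PRECEDENTS` lookup table are hypothetical stand-ins: a real run would obtain each disconnection from the parsed precedent reactions rather than from a hard-coded dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RouteNode:
    """A molecule in the route plus the step that would produce it."""
    smiles: str
    reaction: Optional[str] = None
    precursors: List["RouteNode"] = field(default_factory=list)

    def starting_materials(self) -> List[str]:
        """Collect the leaves of the tree, i.e. molecules with no precursors."""
        if not self.precursors:
            return [self.smiles]
        leaves: List[str] = []
        for child in self.precursors:
            leaves.extend(child.starting_materials())
        return leaves


# A disconnection function maps a SMILES string to (reaction name,
# precursor SMILES) or to None when no further disconnection is known.
Disconnect = Callable[[str], Optional[Tuple[str, List[str]]]]


def build_tree(smiles: str, disconnect: Disconnect, max_depth: int = 5) -> RouteNode:
    """Recursively expand a target until no disconnection applies."""
    node = RouteNode(smiles)
    if max_depth == 0:
        return node
    step = disconnect(smiles)
    if step is None:
        return node
    node.reaction, precursor_smiles = step
    node.precursors = [build_tree(s, disconnect, max_depth - 1)
                       for s in precursor_smiles]
    return node


# Toy precedent table (assumption), standing in for database lookups.
TOY_PRECEDENTS = {
    "CC(=O)Nc1ccccc1": ("amide coupling", ["CC(=O)O", "Nc1ccccc1"]),
    "Nc1ccccc1": ("nitro reduction", ["O=[N+]([O-])c1ccccc1"]),
}

if __name__ == "__main__":
    tree = build_tree("CC(=O)Nc1ccccc1", TOY_PRECEDENTS.get)
    print(tree.starting_materials())
```

The `max_depth` limit is a simple guard against unbounded recursion when a lookup keeps returning further disconnections.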

ELN Integration

The final output is programmatically recorded into the Electronic Lab Notebook via API, ready for the chemist to review and act on. Each record includes:

  • Starting materials (as SMILES strings)
  • Reaction conditions and catalysts at each step
  • Confidence scores for each proposed pathway
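The sketch below shows how such a record might be assembled and serialized before being sent to the notebook. The `RouteStep` and `RouteRecord` classes and every field name in the payload are hypothetical: the ELN's real request schema, endpoint, and authentication are not described here, so the example stops at producing JSON and makes no network call.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RouteStep:
    """One reaction step of a proposed route (hypothetical structure)."""
    product: str
    starting_materials: List[str]
    conditions: str
    catalyst: str = ""


@dataclass
class RouteRecord:
    """A full proposed pathway with its confidence score."""
    target: str
    confidence: float
    steps: List[RouteStep] = field(default_factory=list)


def to_eln_payload(record: RouteRecord) -> str:
    """Serialize a route to JSON; field names are placeholders, not a real schema."""
    if not 0.0 <= record.confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    payload = {
        "title": f"Proposed route for {record.target}",
        "route": asdict(record),
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    record = RouteRecord(
        target="CC(=O)Nc1ccccc1",
        confidence=0.82,  # illustrative value only
        steps=[RouteStep(
            product="CC(=O)Nc1ccccc1",
            starting_materials=["CC(=O)O", "Nc1ccccc1"],
            conditions="illustrative conditions",
        )],
    )
    print(to_eln_payload(record))
```

Validating the record before serialization means a malformed route fails locally instead of producing a partial notebook entry.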

Technical Stack

  • RDKit for molecular manipulation and SMARTS operations
  • NextMove Pistachio as the reaction database
  • Python for orchestration logic
  • Signals ELN API for notebook integration

Impact

This system dramatically compressed the time from target nomination to having an actionable synthetic plan in the ELN, allowing medicinal chemists to focus on evaluating and executing the most promising routes rather than mining the literature.
