Özkırşehirli Group

The Özkırşehirli Group is a student-led research organization founded and led by Kemal Özkırşehirli. As Principal Investigator, he sets the scientific focus, the research methodologies, and the project priorities.

Our research currently centers on computational methods and on artificial intelligence/machine learning theory and pipelines for scientific discovery. Specific interests include computer-aided drug design (CADD) focused on chordoma and TBXT, novel small-molecule discovery, geometric deep learning and 3D mesh methodology, biomolecular modeling, protein design, and scientific workflow development.

Membership is selective, but we do not restrict it solely on the basis of educational or professional background. We look for passion for the work, intellectual curiosity, a drive to solve problems, honesty about one's limitations, and fortitude. These values determine whether you are a good fit. If your strongest interest lies in a field outside those listed above, we still encourage you to apply, provided you are willing to develop expertise in that field, contribute consistently to the research effort, and ask scientifically relevant questions.

Application form - LinkedIn launch post

Public-content boundary: project descriptions on this page use public-release materials only. Non-public archives, hidden evaluation material, credentials, raw private data, and internal strategy documents are excluded from the website.

Research Projects

MeshAnyOrder — Order-Agnostic 3D Mesh Generation for Life Sciences

Kemal serves as Principal Investigator for a seven-member independent research collaboration that includes a Google-affiliated research lead. MeshAnyOrder is an order-agnostic autoregressive transformer for point-cloud-conditioned 3D mesh generation: it represents mesh faces as quantized tokens and predicts unvisited adjacent faces from arbitrary traversal seeds instead of committing the model to a single canonical face ordering.
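The idea of predicting unvisited adjacent faces from an arbitrary seed can be illustrated without any model at all. The sketch below only builds the face-adjacency graph and lists the candidate next faces for a given visited set; it is not the MeshAnyOrder implementation, and the function names and the toy mesh are assumptions made for illustration.

```python
from collections import defaultdict


def face_adjacency(faces):
    """Map each face index to the faces that share an edge with it."""
    edge_to_faces = defaultdict(list)
    for idx, face in enumerate(faces):
        for i in range(len(face)):
            # Undirected edge between consecutive vertices (wraps around),
            # so triangles and quads are handled the same way.
            edge = frozenset((face[i], face[(i + 1) % len(face)]))
            edge_to_faces[edge].append(idx)
    adjacency = {idx: set() for idx in range(len(faces))}
    for sharing in edge_to_faces.values():
        for a in sharing:
            adjacency[a].update(b for b in sharing if b != a)
    return adjacency


def frontier(adjacency, visited):
    """Unvisited faces adjacent to at least one visited face."""
    reachable = set()
    for face in visited:
        reachable |= adjacency[face]
    return reachable - set(visited)


# Toy strip of three triangles: faces 0-1 and 1-2 share an edge.
STRIP = [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
ADJ = face_adjacency(STRIP)
for seed in range(len(STRIP)):
    # Any face can serve as the seed; the frontier is what a model
    # would be asked to predict next from that starting point.
    print(seed, sorted(frontier(ADJ, {seed})))
```

Because the frontier is defined relative to the visited set rather than to a fixed face index, the same mesh yields valid training targets for every choice of seed.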

The core architecture is being extended with 3D rotary positional encoding for translation-invariant attention, heterogeneous triangle/quad tokenization, topology-aware validity constraints, frontier-parallel decoding, and local mesh completion or remeshing. The methodology comes first. Once the core baseline is stable, scientific extensions can test protein and molecular surfaces, enzyme and antibody topology, binding-pocket and interface geometry, and other biomolecular complexes as demanding applications of a general mesh model.
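A minimal sketch of why rotary encoding gives translation-invariant attention: if each coordinate axis rotates its own pair of features, the query-key dot product depends only on the difference between the two positions. The per-axis layout, the single frequency, and the six-dimensional feature size are assumptions for illustration; the project's actual encoding is not specified here.

```python
import math


def rope3d(vec, pos, freq=1.0):
    """Rotate three feature pairs by angles proportional to x, y and z."""
    if len(vec) != 6 or len(pos) != 3:
        raise ValueError("expected a 6-dim feature vector and a 3D position")
    out = []
    for axis in range(3):
        a, b = vec[2 * axis], vec[2 * axis + 1]
        theta = freq * pos[axis]
        c, s = math.cos(theta), math.sin(theta)
        out += [a * c - b * s, a * s + b * c]  # 2D rotation of (a, b)
    return out


def attention_score(q, k, pos_q, pos_k):
    """Unnormalized attention logit between a rotated query and key."""
    rq, rk = rope3d(q, pos_q), rope3d(k, pos_k)
    return sum(x * y for x, y in zip(rq, rk))


Q = [0.3, -1.2, 0.7, 0.1, -0.5, 0.9]
K = [1.1, 0.4, -0.6, 0.8, 0.2, -0.3]
SHIFT = (10.0, -4.0, 2.5)
P_Q, P_K = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0)

base = attention_score(Q, K, P_Q, P_K)
moved = attention_score(
    Q, K,
    tuple(p + t for p, t in zip(P_Q, SHIFT)),
    tuple(p + t for p, t in zip(P_K, SHIFT)),
)
# Shifting both positions by the same vector leaves the logit unchanged
# up to floating-point error.
print(abs(base - moved) < 1e-9)
```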

The experimental plan includes random, axis-based, breadth-first, and depth-first traversal orders; causal, adjacency-aware, and bidirectional attention masks; and publication-grade comparisons against leading autoregressive and diffusion-based mesh generators. Evaluation covers reconstruction quality, manifoldness, watertightness, topology preservation, inference latency, memory consumption, mesh complexity, and high-resolution scaling.
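Two of the listed evaluation criteria, manifoldness and watertightness, have a simple edge-counting form. The sketch below checks only that necessary edge condition (it ignores vertex-manifoldness and face orientation), and the function names are illustrative rather than taken from the project.

```python
from collections import Counter


def edge_use_counts(faces):
    """Count how many faces use each undirected edge."""
    counts = Counter()
    for face in faces:
        for i in range(len(face)):
            counts[frozenset((face[i], face[(i + 1) % len(face)]))] += 1
    return counts


def is_edge_manifold(faces):
    """No edge is shared by more than two faces."""
    return all(c <= 2 for c in edge_use_counts(faces).values())


def is_watertight(faces):
    """Every edge is shared by exactly two faces (no boundary edges)."""
    counts = edge_use_counts(faces)
    return bool(counts) and all(c == 2 for c in counts.values())


TETRAHEDRON = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
OPEN_TRIANGLE = [(0, 1, 2)]
FAN = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]  # three faces on edge 0-1

for name, mesh in [("tetrahedron", TETRAHEDRON),
                   ("open triangle", OPEN_TRIANGLE),
                   ("fan", FAN)]:
    print(name, is_edge_manifold(mesh), is_watertight(mesh))
```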

TBXT / Brachyury Small-Molecule Discovery for Chordoma

The TBXT project is an eleven-person chordoma-focused computational hit-identification effort led by Kemal, targeting PDB 6F59 chain A and the TBXT G177D site-F region. The team compressed 2,274 prior-art compounds and 737 raw analogs into 503 filtered analogs, generated 30,000 BRICS recombinations, retained 67 novel QSAR-pass proposals, and assembled a 570-compound novelty-filtered pool using site-F/A/G grids, Tanimoto novelty control, and sourceability-aware generation.
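Tanimoto novelty control can be sketched as a nearest-neighbor similarity cutoff against the prior-art set. In the example below, fingerprints are plain Python sets of on-bits standing in for real molecular fingerprints, and the 0.7 cutoff is a hypothetical value, not the threshold used in the project.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def novelty_filter(candidates, reference, max_similarity=0.7):
    """Keep candidates whose nearest reference neighbor is below the cutoff.

    Returns {name: nearest_similarity} for the retained candidates.
    The cutoff is a hypothetical example value.
    """
    kept = {}
    for name, fp in candidates.items():
        nearest = max((tanimoto(fp, ref) for ref in reference), default=0.0)
        if nearest < max_similarity:
            kept[name] = nearest
    return kept


PRIOR_ART = [{1, 2, 3, 4}, {10, 11, 12}]
CANDIDATES = {
    "close_analog": {1, 2, 3, 4, 5},   # 4 shared bits out of 5 in the union
    "new_scaffold": {1, 9, 20},        # 1 shared bit out of 6 in the union
}
print(novelty_filter(CANDIDATES, PRIOR_ART))
```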

The pipeline combines Vina ensemble docking, GNINA CNN pose and pKd scoring, Vina-trap detection, RF/XGBoost TBXT QSAR, Boltz-2 co-folding, MMGBSA/FEP scaffolding, T-box paralog selectivity, Rowan IC50/affinity analysis, RDKit descriptors and BRICS, onepot/muni catalog checks, and Bash/HPC automation. A QSAR model trained on 650 RDKit-valid SPR-derived compounds from 14 decrypted XLSX files, 15 campaigns, and 1,620 Kd fits reached Spearman ρ ≈ 0.49 and MAE ≈ 0.5 pKd; GNINA screened 569 of 570 candidates and identified 40 Tier-A, 51 Tier-B, and 73 Vina-trap candidates.
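The two QSAR metrics quoted above, Spearman ρ and MAE in pKd units, are straightforward to compute. The sketch below uses only the standard library and made-up toy values; it is not the project's evaluation code, and the numbers have no connection to the TBXT data.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy pKd values, invented for this example.
MEASURED = [4.0, 5.0, 6.0, 7.0]
PREDICTED = [4.5, 4.4, 6.5, 7.5]
print(spearman(MEASURED, PREDICTED), mean_absolute_error(MEASURED, PREDICTED))
```

Spearman ρ rewards correct ranking even when absolute values are offset, which is why it is commonly reported alongside an absolute-error metric such as MAE.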

The final computational funnel moved from 570 compounds to 137 strict-pass candidates, 24 submission-ready candidates, and four judge-facing site-F selections. Filters included exact catalog matching, non-covalent chemistry, PAINS and forbidden-motif exclusion, lead-likeness, ESOL/logS, Tanimoto novelty, cost, supplier risk, and selectivity across 16 paralogs. The final four produced Boltz Kd estimates of 3.2–8.8 µM, Jack/SCC agreement of 1.01–1.34×, GNINA Vina scores of −5.01 to −6.19 kcal/mol, GNINA pKd values of 3.94–4.69, and Rowan IC50-style predictions of 1.82–6.11 µM. The public research release is available on GitHub.
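Because the results mix Kd in µM with pKd, it helps to have the conversion at hand: pKd = −log10(Kd in mol/L). The helper names below are illustrative. Under this conversion, Kd values of 3.2–8.8 µM correspond to pKd values of roughly 5.06–5.49, so the pKd range of 3.94–4.69 quoted alongside is evidently the output of a separate scoring model rather than a restatement of those Kd estimates.

```python
import math


def pkd_from_kd_micromolar(kd_um):
    """Convert a dissociation constant in micromolar to pKd."""
    if kd_um <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_um * 1e-6)  # 1 uM = 1e-6 mol/L


def kd_micromolar_from_pkd(pkd):
    """Convert pKd back to a dissociation constant in micromolar."""
    return 10.0 ** (-pkd) * 1e6


for kd in (3.2, 8.8):
    print(kd, round(pkd_from_kd_micromolar(kd), 2))
```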

Deep Reinforcement Learning for Antibody–Antigen Interactions

The public project is an execution-ready research scaffold combining ESM-2-compatible antigen embeddings, a structure-informed cross-attention transformer for antibody CDR generation, OAS/SAbDab/IEDB/CoV-AbDab-style data curation, PPO-oriented candidate ranking, and PyTorch DDP / MIT SuperCloud execution artifacts.

The reward interface supports developability, validity, novelty, CDR-length, IGFold-style geometry, and AlphaFold-Multimer-style interface proxies. Reproducible smoke workflows use deterministic mock structure backends; production runs require real dependency-gated models or external structure services. The public release does not claim wet-lab binding validation, therapeutic performance, or equivalence to real production AlphaFold/IGFold inference.
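One plausible shape for such a reward interface is a validity gate followed by a weighted average of the remaining proxy scores. Everything in the sketch below is hypothetical: the class name, the assumption that each proxy is scaled to [0, 1], and the weights are invented for illustration and are not taken from the public release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardTerms:
    """Per-candidate proxy scores, each assumed to lie in [0, 1]."""
    validity: float
    developability: float
    novelty: float
    cdr_length: float
    geometry: float    # e.g. from an IGFold-style backend or a mock
    interface: float   # e.g. from an AlphaFold-Multimer-style proxy


# Hypothetical weights; a real configuration would be tuned and documented.
DEFAULT_WEIGHTS = {
    "developability": 0.25,
    "novelty": 0.15,
    "cdr_length": 0.10,
    "geometry": 0.20,
    "interface": 0.30,
}


def combined_reward(terms, weights=DEFAULT_WEIGHTS):
    """Zero for invalid sequences, otherwise a weighted average of proxies."""
    if terms.validity <= 0.0:
        return 0.0
    total_weight = sum(weights.values())
    weighted = sum(w * getattr(terms, name) for name, w in weights.items())
    return weighted / total_weight


EXAMPLE = RewardTerms(validity=1.0, developability=0.8, novelty=0.4,
                      cdr_length=1.0, geometry=0.5, interface=0.6)
print(combined_reward(EXAMPLE))
```

Gating on validity first keeps a policy from trading an invalid sequence for high scores on the other proxies.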

VeriQSM + QSMBench — Artifact-Grounded Scientific Verification

VeriQSM asks whether a scientific agent merely completed a workflow or produced a scientifically credible result. It distinguishes nominal completion, verified scientific success, and false success, then uses typed workflow contracts, allowlisted execution, independently recomputed physical and numerical checks, provenance requirements, and bounded repair, retry, or refusal decisions.
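The three-way distinction can be sketched as a small classifier over a run's completion status, the agent's own claim, and the independently recomputed checks. The enum, the function signature, and the exact decision rules below are one reading of the description above, not the VeriQSM API.

```python
from enum import Enum


class Outcome(Enum):
    INCOMPLETE = "incomplete"
    NOMINAL_COMPLETION = "nominal completion"
    VERIFIED_SUCCESS = "verified scientific success"
    FALSE_SUCCESS = "false success"


def classify(run_finished, agent_claims_success, checks):
    """Classify a run from independently recomputed checks.

    `checks` maps a check name to True/False. An empty mapping never
    counts as verification (assumption: no evidence means no success).
    """
    if not run_finished:
        return Outcome.INCOMPLETE
    if checks and all(checks.values()):
        return Outcome.VERIFIED_SUCCESS
    if agent_claims_success:
        # The workflow ran and reported success, but verification disagrees.
        return Outcome.FALSE_SUCCESS
    return Outcome.NOMINAL_COMPLETION


# Hypothetical check names for illustration only.
passing = {"energy_conserved": True, "scf_converged": True}
failing = {"energy_conserved": False, "scf_converged": True}
print(classify(True, True, passing).value)
print(classify(True, True, failing).value)
print(classify(True, False, failing).value)
```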

The public VeriQSM v0.3.0-dev / QSMBench v0.3-draft release provides 240 authoring seeds, a 24-case evaluator-conformance mini, PySCF-compatible quantum-chemistry modules, PBC-aware statistical-mechanics analyses, baseline protocols, and release-audit tooling. It is intentionally not presented as QSMBench 1.0, a sealed benchmark, a completed multi-model comparison, or evidence of scientific superiority. The linked repository remains the current public ChemAgent-QSM alias while the release identity is migrated.

Kadanoff-GNN-RG — ML-Augmented Renormalization Group

Kadanoff-GNN-RG is public alpha research software for combining symmetry-adapted real-space coarse-graining, periodic four-color Metropolis spin configurations, typed-edge graph neural networks, calibrated phase classification, empirical RG-flow reconstruction, and finite-grid fixed-point candidate discovery in the square-lattice antiferromagnetic J1–J2 Ising model.

The workflow preserves uniform, Néel, and stripe ordering channels before blocking and compares learned representations with majority-rule baselines and conventional observables. Phase boundaries and fixed-point outputs are finite-size, sampler-dependent, and representation-dependent estimates; scientific use requires autocorrelation analysis, finite-size scaling, held-out seeds, parameter-grid refinement, Binder-cumulant and susceptibility checks, and robustness across coarse-graining choices.
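For orientation, here is a minimal sketch of the majority-rule baseline and the Binder cumulant mentioned above. It assumes odd block sizes so that majority votes cannot tie, and it is not the Kadanoff-GNN-RG code; the symmetry-adapted coarse-graining in the project is more involved.

```python
def majority_block(spins, b=3):
    """Majority-rule block-spin transform on a square +1/-1 lattice."""
    n = len(spins)
    if b % 2 == 0 or n % b != 0 or any(len(row) != n for row in spins):
        raise ValueError("need a square lattice divisible by an odd block size")
    coarse = []
    for i in range(0, n, b):
        row = []
        for j in range(0, n, b):
            total = sum(spins[i + di][j + dj]
                        for di in range(b) for dj in range(b))
            row.append(1 if total > 0 else -1)  # odd b means total != 0
        coarse.append(row)
    return coarse


def binder_cumulant(magnetizations):
    """U = 1 - <m^4> / (3 <m^2>^2) over sampled magnetizations."""
    count = len(magnetizations)
    m2 = sum(m ** 2 for m in magnetizations) / count
    m4 = sum(m ** 4 for m in magnetizations) / count
    return 1.0 - m4 / (3.0 * m2 ** 2)


# A 6x6 Neel (checkerboard) configuration as a toy input.
NEEL = [[1 if (i + j) % 2 == 0 else -1 for j in range(6)] for i in range(6)]
print(majority_block(NEEL))
print(binder_cumulant([1.0, -1.0, 1.0, -1.0]))
```

For a fully ordered sample with |m| = 1 in every configuration, the cumulant takes its ordered-phase value of 2/3.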

Kupcinet–Getz Reaction–Diffusion AI

This public v0.3 research release develops a solver-faithful scientific-AI benchmark for nonlinear chemical oscillators, traveling waves, and synthetic morphogenesis. It combines shared reaction-network and graph-compartment models with RK4, BDF/Radau/LSODA/RK45, explicit finite differences, chemical-Langevin dynamics, Gillespie SSA, spatial RDME, physical convergence audits, scientific-AI baselines, calibration, abstention, matched-compute evaluation, and immutable evidence manifests.
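As a small illustration of one listed solver and of what a convergence audit measures, the sketch below implements a classical RK4 step and checks its fourth-order error scaling on a problem with a known solution. The Brusselator is used as a textbook chemical oscillator and is an assumption for illustration; the models in the release may differ.

```python
import math


def brusselator(state, a=1.0, b=3.0):
    """Brusselator rate equations; oscillatory when b > 1 + a**2."""
    x, y = state
    return (a - (b + 1.0) * x + x * x * y, b * x - x * x * y)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(f, state, dt, steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


def decay_error(dt):
    """Absolute error at t = 1 for dy/dt = -y, y(0) = 1."""
    steps = round(1.0 / dt)
    final = integrate(lambda s: (-s[0],), (1.0,), dt, steps)[-1][0]
    return abs(final - math.exp(-1.0))


# Halving the step should cut the error by about 2**4 for a 4th-order method.
print(decay_error(0.1) / decay_error(0.05))
print(integrate(brusselator, (1.2, 2.8), 0.01, 5)[-1])
```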

The central question is whether numerical representation and intrinsic reaction noise change inferred morphology, phase boundaries, or scientific-AI confidence. Current headline claims remain untested or partial: smoke runs establish execution contracts only, not paper-figure reproduction, chemical validation, or completed benchmark conclusions. Production evidence, release authority, and independent clean-room validation remain explicit gates.

DFT → kMC — Auditable Multiscale Reaction Kinetics

The public implementation scaffold parses Gaussian and ORCA thermochemistry, checks stationary points and imaginary frequencies, converts activation Gibbs free energies into forward and reverse Eyring rates, compiles exact stochastic propensities, runs reproducible Gillespie kinetic-Monte-Carlo ensembles, compares solvent pathways, and records resolved configurations, package versions, hashes, uncertainty outputs, and human-readable reports.
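The core of that chain, from activation free energy to Eyring rate to stochastic simulation, can be sketched for a single reversible step A ⇌ B. The Eyring expression k = κ(k_B T/h)·exp(−ΔG‡/RT) is standard; the function names, the two-state system, and the barrier heights below are hypothetical and are not drawn from the public scaffold.

```python
import math
import random

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # gas constant, J/(mol*K)


def eyring_rate(dg_kj_mol, temperature=298.15, kappa=1.0):
    """First-order rate constant (1/s) from an activation Gibbs energy."""
    return (kappa * K_B * temperature / H
            * math.exp(-dg_kj_mol * 1000.0 / (R * temperature)))


def gillespie_two_state(k_f, k_r, n_a, n_b, t_end, seed=0):
    """Gillespie direct method for A <-> B; returns [(t, n_a, n_b), ...]."""
    rng = random.Random(seed)  # seeded for reproducible ensembles
    t = 0.0
    trajectory = [(t, n_a, n_b)]
    while True:
        a_f, a_r = k_f * n_a, k_r * n_b  # propensities of each channel
        a_total = a_f + a_r
        if a_total == 0.0:
            break
        # Exponential waiting time; 1 - random() lies in (0, 1].
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        if rng.random() * a_total < a_f:
            n_a, n_b = n_a - 1, n_b + 1
        else:
            n_a, n_b = n_a + 1, n_b - 1
        trajectory.append((t, n_a, n_b))
    return trajectory


# Hypothetical forward and reverse barriers in kJ/mol.
K_FWD, K_REV = eyring_rate(80.0), eyring_rate(85.0)
RUN = gillespie_two_state(K_FWD, K_REV, n_a=100, n_b=0, t_end=60.0, seed=1)
print(K_FWD, K_REV, RUN[-1])
```

Seeding the generator per trajectory is what makes an ensemble reproducible: the same seed and rates always yield the same event sequence.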

The bundled five-solvent example is synthetic and demonstrates software contracts rather than reproduction of the original historical project. Original molecular structures, target identity, raw quantum-chemistry outputs, unpublished mechanism, and non-public research data are deliberately excluded from the public release.