Li Hongmin
I am a researcher at the Sato Laboratory for Biomedical Data Science, Institute of Science Tokyo. I am also a guest researcher at the Frith Lab, University of Tokyo. My current research interests center on AI-automated scientific workflows, AI research, bioinformatics, and AI for Science, with a focus on building computational systems that support scientific discovery and biological sequence analysis.
CV
Email: lihongmin[at]edu.k.u-tokyo.ac.jp
Recent Activities
- 2026.06 - Received Google Cloud TPU Builders Awards totaling USD 5,500 in GCP credits to support TPU-based AI and scientific workflow experiments.
- 2026.05 - Posted “Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol” on arXiv, proposing an audit-constrained protocol for evaluating LLM reasoning under controlled prompt variation.
- 2026.05 - Posted “Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model” on arXiv, analyzing how shortcut learning, shortcut-rule transition, and cross-family OOD failure can separate.
- 2026.05 - Posted “A Controlled Counterexample to Strong Proxy-Based Explanations of OOD Performance: in a Fixed Pretraining-and-Probing Setup” on arXiv, showing a boundary case where proxy-based structure explanations fail to track task-relevant OOD performance.
- 2026.05 - Posted “FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling” on arXiv, introducing a fast landmark-based dimensionality reduction method for repeated exploratory embedding.
- 2026.04 - Awarded a KAKENHI Grant-in-Aid for Early-Career Scientists for the ID3 project on input-data differentiable integrated frameworks for biomolecular sequence design.
- 2026.04 - Joined the Sato Laboratory for Biomedical Data Science at the Institute of Science Tokyo as a researcher; also started as a guest researcher at the Frith Lab, University of Tokyo.
- 2025.12 - Published preprint “Large generative mRNA language foundation model for efficient coding sequence generation and design with mRNA-GPT” on bioRxiv.
- 2025.10 - Published preprint “Gradient-based Optimization for mRNA Sequence Design” on bioRxiv.
Professional Experience
- 2026.4 - Present - Researcher, Sato Laboratory for Biomedical Data Science, Institute of Science Tokyo.
- 2026.4 - Present - Guest Researcher, Frith Lab, University of Tokyo.
- 2023.5 - 2026.3 - Postdoctoral Researcher, University of Tokyo.
- 2022.10 - Joined HAOMO.AI as a machine learning engineer for the “蓝色空间领航者” (“Blue Space Navigator”) project.
- 2022.4 - Appointed as a researcher at the University of Tokyo (Details).
- 2022.3 - Received Ph.D. from the University of Tsukuba.
Publications
- 2026.05 - Published “Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol” as an arXiv preprint. arXiv:2605.11599 [cs.LG]
- 2026.05 - Published “Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model” as an arXiv preprint. arXiv:2605.12945 [cs.LG]
- 2026.05 - Published “A Controlled Counterexample to Strong Proxy-Based Explanations of OOD Performance: in a Fixed Pretraining-and-Probing Setup” as an arXiv preprint. arXiv:2605.11554 [cs.LG]
- 2026.05 - Published “FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling” as an arXiv preprint. arXiv:2605.11428 [cs.LG]
- 2025.12 - Co-authored “Large generative mRNA language foundation model for efficient coding sequence generation and design with mRNA-GPT” on bioRxiv. Link
- 2025.10 - Published “Gradient-based Optimization for mRNA Sequence Design” on bioRxiv. Link, Code
- 2023.9 - Co-authored “An Integrated Physical Approach to Earthquake-Induced Landslide Susceptibility Incorporating Geological Structure: A Case Study of the Diexi Catchment, Sichuan, China” in Engineering Geology. Link
- 2023.1 - Published “LSEC: Large-scale spectral ensemble clustering” in Intelligent Data Analysis. arXiv, Code
- 2022.7 - Published “Missing Value Imputation With Low-Rank Matrix Completion in Single-Cell RNA-Seq Data by Considering Cell Heterogeneity” in Frontiers in Genetics. Code
- 2022.6 - “Divide-and-conquer based Large-Scale Spectral Clustering” accepted by Neurocomputing. arXiv, Code
- 2020.11 - “Ensemble Learning for Spectral Clustering” published in ICDM 2020. PDF, Code
- 2020.11 - “Hubness-based Sampling Method for Nyström Spectral Clustering” published in IJCNN 2020. Link
- 2020.2 - “An Oversampling Framework for Imbalanced Classification Based on Laplacian Eigenmaps” published in Neurocomputing. Link
- 2019.8 - “Distributed Collaborative Feature Selection Based on Intermediate Representation” published in IJCAI 2019. Link
- 2019.8 - “Large Scale Spectral Clustering Using Sparse Representation Based on Hubness” published in CBDCom 2018. Link, Code
Funding
Research Grants
- 2026.06 - Google Cloud TPU Builders Award - USD 5,500 in GCP credits for TPU-based AI and scientific workflow experiments
- 2026.04 - KAKENHI Grant-in-Aid for Early-Career Scientists: Development of an Input Data Differentiable Integrated Framework (ID3) for Biomolecular Sequence Design - ¥4,550,000 (Project Info)
- 2025.10 - Google Grant: Input Data Differentiable Designer: A Novel ML Algorithm for Biological Sequence Optimization - $30,000
- 2024.4 - KAKENHI Young Researcher Grant for developing a large-scale language model integrating RNA sequences and text (Project Info)
Scholarships
- 2021.10 - JST SPRING Fellowship (Pioneering Research Initiated by the Next Generation)
Conferences
Conference Activities
Peer Review Activities
- 2026.01 - The Journal of Supercomputing
- 2024.11 - Pattern Analysis and Applications
- 2024.11 - Knowledge and Information Systems
- 2024.10 - Neurocomputing
- 2024.09 - The Visual Computer
- 2024.03 - International Journal of Machine Learning and Cybernetics
- 2022.11 - IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022.10 - Briefings in Functional Genomics
Awards
- 2026.06 - Google Cloud TPU Builders Award - USD 5,500 in GCP credits
- 2020.10 - Second prize, AETA Earthquake Prediction AI Algorithm Competition 2019 (News, Slide)
- 2019.10 - Special award, 3rd Analysys International Algorithm Competition - PV, UV Prediction Competition
- 2019.8 - Best Paper Award at CBDCom 2018 for “Large Scale Spectral Clustering Using Sparse Representation Based on Hubness”