

Yiming Ma

Hi, my name is Yiming Ma.

I currently work at King's College London (School of Biomedical Engineering & Imaging Sciences) as a postdoctoral researcher. I recently completed my PhD in the MathSys CDT at the University of Warwick, where I was part of the Signal and Information Processing (SIP) Lab. My PhD research focused on machine learning and computer vision, especially crowd counting / density estimation and multimodal representation learning.

⚠️ Notice: This page is archived and no longer actively maintained. For the latest information, please visit .


Preprints

  • 2025 – arXiv:
    • Motivation: Ground-truth blockwise density maps are highly sparse (>95% zero blocks); MSE poorly models blockwise counts.
    • Method: Proposes ZIP that models blockwise counts with a Zero-Inflated Poisson.
    • Results: SOTA results on ShanghaiTech A & B, UCF-QNRF and NWPU with models ranging from <1M to ~100M parameters.
  • 2022 – arXiv:
    • Motivation: 3D CNNs are costly while adjacent frames in in-cabin video are highly similar.
    • Method: Image-level DMS with 2D encoders and (feature/decision) fusion across views/modalities; evidence for dropping explicit temporal modeling; open-set handling.
    • Results: AUC-ROC 95.6% / Accuracy 92.4% on DAD with 243M–1.01G FLOPs.
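As a rough illustration of the ZIP idea above, here is a minimal NumPy sketch of a zero-inflated Poisson negative log-likelihood over blockwise counts. The function name `zip_nll` and the parameterisation (per-block Poisson rate `lam` and zero-inflation probability `pi`) are my own illustrative choices, not the paper's implementation.

```python
import numpy as np
from math import lgamma

def zip_nll(counts, lam, pi):
    """Mean negative log-likelihood of blockwise counts under a
    Zero-Inflated Poisson:
        P(0) = pi + (1 - pi) * exp(-lam)
        P(k) = (1 - pi) * Poisson(k; lam)   for k >= 1
    counts: integer counts per block; lam, pi: same-shape predictions."""
    counts = np.asarray(counts, dtype=float)
    lam = np.asarray(lam, dtype=float)
    pi = np.asarray(pi, dtype=float)
    log_gamma = np.vectorize(lgamma)
    # Zero blocks: structural zeros (prob. pi) plus Poisson zeros.
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    # Occupied blocks: log(1 - pi) + Poisson log-pmf.
    ll_pos = (np.log1p(-pi) - lam + counts * np.log(lam)
              - log_gamma(counts + 1.0))
    ll = np.where(counts == 0, ll_zero, ll_pos)
    return -ll.mean()
```

For an all-zero block the loss rewards a large `pi`, which is how the model can absorb the >95% of blocks with zero count without distorting the Poisson rate fitted to occupied blocks.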

Publications

  • 2025 – ICME:
    • Motivation: CLIP in regression-style dense prediction is under-explored; existing blockwise counting models rely on Gaussian-smoothed labels, causing ambiguity.
    • Method: Introduces EBC based on integer-valued bins; proposes CLIP-EBC, the first fully CLIP-based crowd-counting model.
    • Results: Large gains over prior blockwise methods; competitive results on NWPU-Test (MAE 58.2).
  • 2025 – ICME (co-author):
    • Motivation: Assistive/AR systems need early forecasts of social intent and actions from egocentric videos.
    • Method: A joint forecasting framework that shares an egocentric encoder and learns multi-task heads for intent, attitude, and social actions with temporal modelling.
    • Results: Benchmarked on standard egocentric datasets, with quantitative and qualitative analyses showing the benefit of joint modelling for consistent forecasts.
  • 2023 – CVPRW:
    • Motivation: Real-world DMS must be robust to view/modality collapse and occlusions.
    • Method: Masked multi-head self-attention to fuse Top/Front × IR/Depth streams; supervised contrastive learning and robustness regularization.
    • Results: On DAD, four-stream fusion reaches AUC-ROC 97.0% / mAP 97.8%, outperforming decision-level fusion and alternative feature-level baselines.
  • 2022 – ICIP:
    • Motivation: Encoder-decoder counters underuse low-level features and add heavy multiscale modules.
    • Method: Contrast-aware group-wise fusion of encoder features plus a dual-branch channel-reduction decoder (1×1 + dilated conv).
    • Results: ShanghaiTech-B MAE 6.9 / RMSE 11.8 with ~815 GFLOPs, surpassing or matching VGG-based peers (CSRNet, CAN, BL, DM-Count) at lower compute.
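To illustrate the masked-attention fusion in the CVPRW 2023 entry above, here is a minimal single-head sketch in NumPy. The random Q/K/V projections stand in for learned weights, and the masking convention (setting attention logits of absent streams to -inf) is a generic assumption about the technique, not the paper's exact design.

```python
import numpy as np

def masked_self_attention(tokens, present, seed=0):
    """Fuse per-stream features with single-head self-attention while
    masking out absent views/modalities (e.g. a dropped Top-IR camera).

    tokens:  (S, D) array, one feature vector per stream
    present: (S,)  boolean, False where a stream is missing
    Returns an (S, D) array; absent streams contribute no keys/values."""
    tokens = np.asarray(tokens, dtype=float)
    present = np.asarray(present, dtype=bool)
    S, D = tokens.shape
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(D)      # (S, S) pairwise attention logits
    scores[:, ~present] = -np.inf      # absent streams get zero attention
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                 # (S, D) fused stream features
```

Because masked streams contribute no keys or values, corrupting or dropping one view leaves the fused features of the remaining streams unchanged, which is the robustness to view/modality collapse described above.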

Experience

Research Associate, King's College London; London, UK — 2025–Now
  • Multimodal patient fingerprinting: Built a Multi-Modal Fingerprint (MMF) by integrating imaging, demographic, clinicopathological variables, and radiology reports to support patient-level risk stratification and personalised surveillance planning.
  • Longitudinal AI-driven clinical decision support: Developed and benchmarked deep learning models for AS enrolment/risk profiling, automated prostate/lesion assessment on bp-MRI, and longitudinal progression modelling; prioritised robustness to scanner/site shift and incomplete follow-up data.
  • Clinical translation & reporting: Prototyped a web-based standardised report aligned with PRECISE-style longitudinal assessment to streamline clinical review and improve consistency of follow-up decisions.
Research Assistant, University of Warwick; Coventry, UK — 2022–2023
  • Data curation: Refined and extended DAD annotations, adding 9 non-driving-related activities; prepared data for robust benchmarking.
  • Multiview multimodal fusion: Designed a multi-view multimodal driver monitoring system based on masked multi-head self-attention; improved AUC-ROC from 88% to 97% on DAD and increased robustness to view/modality collapse.
Teaching Assistant, University of Warwick; Coventry, UK — 2023
  • Lab sessions: Assisted delivery of an undergraduate Python & Introductory ML module; led labs and tutorials guiding students to implement regression, classification, and neural networks in Python.
  • Tutoring: Provided one-to-one and small-group academic support, clarifying core programming/ML concepts and troubleshooting code and experiment design.

Education

2021~2025 (Doctor of Philosophy): University of Warwick, Coventry, UK 🇬🇧.

  • Programme: Mathematics of Systems.
  • Supervisors: & .
  • Research Interests: crowd counting & driver distraction detection.

2020~2021 (Master of Science): University of Warwick, Coventry, UK 🇬🇧.

2016~2020 (Bachelor of Science): , Shenzhen, China 🇨🇳.

  • Programme: Mathematics and Applied Mathematics.
  • Graduation Research Project: Contraction Methods for Composite Convex Optimisation.
    • Supervisor: .



📰 Recent News

2026-01-05: Joined King's College London and started to work with  as a Research Associate.

2025-07-31: Implemented a HuggingFace Space for .

2025-07-31: Released the code of on GitHub.

2025-07-31: Released a new paper on arXiv.

2025-07-04: Attended @ Nantes, France.

2025-03-20: CLIP-EBC and Interact with me got accepted by .

2025-02-03: Released the code of on GitHub.

2024-12-21: Released a new paper (co-author) on arXiv.

2024-07-17: Released the code of on GitHub.

2024-03-14: Released a new paper on arXiv.

2023-06-18: Attended online.

2023-04-13: Released the code and dataset of .

2023-04-13: Uploaded the paper of on arXiv.

2023-03-21: The paper Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention (MHSA) got accepted by the MULA Workshop at CVPR 2023.

2022-10-19: Attended online.

2022-10-17: Released a new paper on arXiv.

2022-06-20: FusionCount got accepted by ICIP 2022.

2022-04-05: Released the code of on GitHub.

2022-02-27: Released a new paper on arXiv.

2021-10-04: Started my PhD journey at the MathSys CDT of the University of Warwick.

⚙️ Services

Reviewer for TNNLS, TMM, SPL, ECCV, ACM MM, CVPRW, ICME, WACV, BMVC, ICIP.


🧰 Skills

Deep Learning Concepts: Attention Mechanism, ViT, Prompt Tuning, CLIP, Contrastive Learning, Multimodal Alignment, Multimodal Fusion.

PyTorch: TIMM, OpenCLIP, Transformers, TensorBoard, Optuna.

Python: NumPy, SciPy, Scikit-learn, Pandas, OpenCV, Matplotlib / Seaborn.

Maths & Stats: Probability Theory, Statistical Inference, Optimization, Stochastic Processes, Time Series Modeling, Survival Analysis, Computational Statistics, Real / Complex / Functional / Fourier Analysis, Measure Theory.

Development & Tools: SSH, Git, Linux, LaTeX, Markdown, MS Word.

Languages: Mandarin Chinese (native), English (IELTS: 8.0/9.0).
