shravan-18/README.md

Hi there 👋

I'm an MSc Computer Vision student at MBZUAI, doing research in machine learning and computer vision. I work on self-evolving large multimodal models for generalizable multimodal intelligence, within the broader context of multimodal representation learning for reasoning. I'm also interested in unified large-scale models for image understanding and generation, and in controllable generation of extended, coherent video sequences.

I also build AI technology for computer-aided diagnostics at Zestral, in collaboration with multiple hypergrowth startups. If you're passionate about building cutting-edge tech backed by deep research, let's connect!

Portfolio · LinkedIn · Medium · Instagram · YouTube

Pinned

  1. mbzuai-oryx/EvoLMM Public

    Self Evolving Large Multimodal Models with Continuous Rewards

    Python 19 1

  2. SAG-ViT Public

    [CAIS, Springer] Repository for the paper, "SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers."

    Python 8 1

  3. UGPL Public

    [ICCVW'25] Official code implementation for the paper, "UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography".

    Python 3

  4. SPROUT Public

    [Neurocomputing, Elsevier] Official implementation of the paper, "SPROUT: Symptom-centric Prototypical Representation Optimization and Uncertainty-aware Tuning for Few-Shot Precision Agriculture."

    Python 2 1

  5. FUSION Public

    [CVPRW'25] Official code implementation for the paper, "FUSION: Frequency-guided Underwater Spatial Image recOnstructioN".

    JavaScript 2

  6. AVTCA Public

    Repository for the paper, "Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention."

    Python 9