
Aru Sharma
B.E. Information Technology Student at
University Institute of Engineering and Technology, PU
About Me: I am passionate about building intelligent systems that bridge the gap between human communication and machine understanding. I build multimodal AI systems, contribute to open-source projects, and explore the intersection of NLP and computer vision. My approach to engineering is to study where current solutions fall short in real-world applications and to develop practical improvements that make AI more accessible and useful.
Experience: My work includes significant contributions to open-source ecosystems. I worked as an OSS contributor with the Mifos Initiative through Google Summer of Code, developing multi-agent bots. I interned through Summer of Bitcoin, contributing to Bitcoin Transcripts. I was also an LFX Mentee at CNCF WasmEdge and contributed to DocETL at UC Berkeley's EPIC Lab. I have also worked as an AI Engineer with Bennett Legal and HomeHive AI, and conducted risk analysis for crypto tokens.
Community: I lead the OSS club (Pclub) at my college to promote open-source software. I have also hosted events such as Software Freedom Day and OSS hackathons such as FOSSHACK, and started AISOC to help students learn how to start contributing to open source.
Achievements: Selected for the first edition of ESOC'25 under the Open-Source AI for Drug Discovery project. Ranked 15th globally in the NTIRE Image Dehazing and Denoising challenge at CVPR 2024. Research on Speech Emotion Recognition accepted for publication at the 16th ICCCNT 2025.
About
Education
B.E. Information Technology
University Institute of Engineering and Technology, PU
Mathematics and Computer Science
Little Scholars, Kashipur (CBSE)
Current Focus
Interested in memory-augmented AI systems that can learn and evolve over time, much as humans do.
Currently building multimodal AI systems and exploring how AI fits into today's software landscape.
Experience
- Tested and deployed state-of-the-art (SOTA) vision models for classification, segmentation, and pose detection.
- Deployed open-source text-to-video generation models for in-house testing and benchmarking against Veo3.
- Working on Tetrix, building AI agents for infrastructure management, including cloud services such as AWS.
- Developed the Tetrix CLI, a tool that reviews a project's architecture and security issues and enforces code quality.
- Developed a multi-agent bot that reports the status of Jira tickets and answers questions about Slack discussions.
- Developed a full-stack web application using FastAPI and Next.js, with Firestore as the database and Auth client.
- Designed and prototyped AI-assisted coding tools for Bitcoin using small language models and domain-specific Retrieval-Augmented Generation (RAG).
- Developed data pipelines to ingest knowledge from Bitcoin developer calls, YouTube talks, IRC logs, mailing lists, and forums.
- Contributed user-defined functions, LLM-based data parsing, and OCR modules to improve the usability and capabilities of DocETL.
- Added structured-generation support to the open-source model backend using Outlines.
- Developed a RAG-based chatbot for code assistance using open-source LLMs on the WasmEdge runtime.
- Created a pipeline that ingests data from GitHub repositories, augments it with Q&A pairs and summaries, and embeds the result into a Qdrant vector database.
Projects
Building a personalised agent that reasons over long-term memory to store and recall information from past interactions.
Key Features:
- Implementation of the EverMemOS paper from first principles
- Hybrid keyword and semantic retrieval combined with a reranking mechanism
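The hybrid retrieval idea above can be illustrated with a small, dependency-free sketch. It is a hypothetical illustration, not this project's code: keyword scoring is a plain term-overlap count standing in for BM25, semantic scoring is cosine similarity over caller-supplied toy vectors, and the two ranked lists are merged with reciprocal rank fusion (RRF, with the commonly used constant k = 60). A reranker (for example a cross-encoder) would then rescore the fused top-k; it is omitted here.

```python
import math
from typing import Dict, List, Sequence


def keyword_score(query: str, doc: str) -> float:
    """Count query terms present in the document (a crude stand-in for BM25)."""
    doc_terms = set(doc.lower().split())
    return float(sum(term in doc_terms for term in query.lower().split()))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rrf_merge(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def hybrid_search(query: str, query_vec: Sequence[float], docs: Dict[str, str],
                  doc_vecs: Dict[str, Sequence[float]], top_k: int = 3) -> List[str]:
    """Rank memories by keyword overlap and by embedding similarity, then fuse."""
    by_keyword = sorted(docs, key=lambda d: keyword_score(query, docs[d]), reverse=True)
    by_vector = sorted(docs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return rrf_merge([by_keyword, by_vector])[:top_k]


memories = {
    "m1": "user prefers python for scripting",
    "m2": "meeting about gpu budget on friday",
    "m3": "user dislikes java",
}
vectors = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.7, 0.3]}
print(hybrid_search("which language does the user prefer", [1.0, 0.1], memories, vectors))
```

RRF is used here because it only needs ranks, so keyword and vector scores on different scales never have to be normalised against each other.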
Implemented a multimodal emotion recognition system using late and gated fusion techniques on audio and video embeddings to classify emotional states.
Key Features:
- Whisper-large-v3 for audio feature extraction
- V-JEPA for video embedding extraction
- Gated Fusion Network for combining modalities
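A common formulation of gated fusion learns a per-dimension gate g = sigmoid(W_g [a; v] + b_g) from the concatenated audio and video embeddings, then mixes them as h = g * a + (1 - g) * v (element-wise). The sketch below is a hypothetical, pure-Python illustration of that formula, not this project's network: the weights are fixed toy values rather than trained parameters, and the audio and video vectors are assumed to be already projected to the same dimension (in the real system they would come from Whisper-large-v3 and V-JEPA).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_fusion(audio: Vector, video: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Per-dimension gate decides how much of each modality to keep."""
    if len(audio) != len(video):
        raise ValueError("audio and video embeddings must share a dimension")
    concat = audio + video  # [a; v], length 2 * d
    gate = [sigmoid(z + b) for z, b in zip(matvec(w_gate, concat), b_gate)]
    return [g * a + (1.0 - g) * v for g, a, v in zip(gate, audio, video)]


# With zero weights and bias the gate is 0.5 everywhere, i.e. a plain average.
zeros = [[0.0] * 4 for _ in range(2)]
print(gated_fusion([1.0, 0.0], [0.0, 1.0], zeros, [0.0, 0.0]))  # [0.5, 0.5]
```

Unlike late fusion, which combines the predictions of separate per-modality classifiers, the gate lets the model weight audio or video differently for each feature dimension and each input.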
Developed an autonomous multi-agent system that facilitates interaction and collaboration of specialized agents to perform comprehensive research tasks.
Key Features:
- DuckDuckGo Search Agent for web articles
- ArXiv Agent for academic papers
- Supervisor Agent for task coordination
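The supervisor pattern can be shown with a minimal, framework-free sketch. Everything here is hypothetical: the worker agents are stubs that make no real DuckDuckGo or arXiv calls, and the supervisor routes with a simple keyword rule where a real system would typically ask an LLM to plan and delegate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Supervisor:
    """Routes each sub-task to a named worker agent and records the results."""
    agents: Dict[str, Callable[[str], str]]
    transcript: List[str] = field(default_factory=list)

    def route(self, task: str) -> str:
        # Hypothetical rule: academic-sounding tasks go to the arXiv agent.
        academic = any(word in task.lower() for word in ("paper", "arxiv", "study"))
        return "arxiv" if academic else "web"

    def run(self, tasks: List[str]) -> List[str]:
        for task in tasks:
            name = self.route(task)
            result = self.agents[name](task)
            self.transcript.append(f"{name}: {result}")
        return self.transcript


def web_agent(task: str) -> str:
    """Stub for a web search agent."""
    return f"[stub web results for '{task}']"


def arxiv_agent(task: str) -> str:
    """Stub for an academic paper search agent."""
    return f"[stub arXiv results for '{task}']"


supervisor = Supervisor(agents={"web": web_agent, "arxiv": arxiv_agent})
for line in supervisor.run(["latest news on RAG", "find papers on RAG evaluation"]):
    print(line)
```

Keeping the agents behind a plain `Callable[[str], str]` interface means a stub can be swapped for a real search tool without changing the supervisor.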
Benchmarked models from the Open ASR Leaderboard on transcribing talks from Bitcoin conferences, with GPU acceleration support.
Key Features:
- Multi-model support and evaluation
- GPU acceleration for efficient processing
- Chunked processing for long audio files
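Chunked processing of long recordings usually means splitting the waveform into fixed-length windows with a small overlap so that words at chunk boundaries are not cut in half. The sketch below only computes the sample ranges; the 30-second window (Whisper's native input length) and the 2-second overlap are assumptions, and the transcription call and the merging of overlapping text are left out.

```python
from typing import List, Tuple


def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_s: float = 30.0, overlap_s: float = 2.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


# 70 seconds of 16 kHz audio -> windows at 0-30 s, 28-58 s and 56-70 s.
print(chunk_bounds(70 * 16_000, 16_000))
```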
A weekend sprint project that benchmarks the performance of LLM inference providers.
Key Features:
- Uses snippets from the ShareGPT dataset as benchmark prompts
- Compares latency and throughput across providers
- Simulates real-world usage patterns at different concurrency levels
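Latency and throughput at a given concurrency level can be measured with `asyncio` and a semaphore that caps the number of in-flight requests. In this sketch `fake_provider` is a hypothetical stand-in that sleeps instead of calling a real inference API, the prompts are placeholders rather than ShareGPT samples, and the p95 figure is a rough nearest-rank estimate.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def fake_provider(prompt: str) -> str:
    """Hypothetical stand-in for an inference API call (about 10 ms of latency)."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def benchmark(call: Callable[[str], Awaitable[str]],
                    prompts: List[str], concurrency: int) -> Dict[str, float]:
    sem = asyncio.Semaphore(concurrency)  # limits concurrent requests
    latencies: List[float] = []

    async def one(prompt: str) -> None:
        async with sem:
            t0 = time.perf_counter()
            await call(prompt)
            latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    await asyncio.gather(*(one(p) for p in prompts))
    wall = time.perf_counter() - start
    ordered = sorted(latencies)
    return {
        "concurrency": concurrency,
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": ordered[int(0.95 * (len(ordered) - 1))],
        "requests_per_s": len(prompts) / wall,
    }


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    for level in (1, 4, 16):
        print(asyncio.run(benchmark(fake_provider, prompts, level)))
```

Per-request latency is timed inside the semaphore so it excludes queueing time, while throughput is computed from wall-clock time for the whole batch.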
Created a custom extractor for vLLM to extract hidden states from LLMs for downstream tasks.
Key Features:
- Uses PyTorch forward hooks to extract hidden states from a specific layer.
- Stores tensors in a GPU buffer that is released via time-to-live (TTL) logic.
- Tensors can be consumed by a separate process or thread in near real time.
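The TTL buffer can be sketched independently of vLLM and PyTorch. In the hypothetical class below, a producer calls `put` (in the real extractor that producer would be a hook registered with `torch.nn.Module.register_forward_hook`, which receives `(module, input, output)`), a consumer calls `pop`, and entries older than `ttl_s` are evicted on every access. Plain Python lists stand in for GPU tensors, so no device memory is actually freed here; the key format is also an assumption.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLBuffer:
    """Thread-safe key/value store whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._items = OrderedDict()  # key -> (timestamp, value), oldest first

    def _evict_expired(self) -> None:
        now = self._clock()
        while self._items:
            key, (stamp, _) = next(iter(self._items.items()))
            if now - stamp < self.ttl_s:
                break
            del self._items[key]  # dropping the reference lets a real tensor be freed

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._evict_expired()
            self._items[key] = (self._clock(), value)
            self._items.move_to_end(key)  # keep oldest-first order on overwrite

    def pop(self, key: str) -> Optional[Any]:
        with self._lock:
            self._evict_expired()
            entry = self._items.pop(key, None)
            return None if entry is None else entry[1]


# A fake clock makes expiry deterministic for the example.
fake_now = [0.0]
buf = TTLBuffer(ttl_s=5.0, clock=lambda: fake_now[0])
buf.put("req-1/layer-12", [0.1, 0.2])
fake_now[0] = 3.0
buf.put("req-2/layer-12", [0.3])
fake_now[0] = 6.0  # req-1 is now 6 s old and expires; req-2 is 3 s old
print(buf.pop("req-1/layer-12"), buf.pop("req-2/layer-12"))  # None [0.3]
```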
Interaccionismo
Shared thoughts on AI, open-source, and building systems
Contact
Get in Touch
Open to full-time positions and collaborations in AI/ML.