
Aru Sharma
B.E. Information Technology Student at
University Institute of Engineering and Technology, PU
About Me: I am passionate about building intelligent systems that bridge the gap between human communication and machine understanding. I build multimodal AI systems, contribute to open-source projects, and explore the intersection of NLP and Computer Vision. My approach to engineering is to study where current solutions fall short in real-world applications and then develop practical improvements that make AI more accessible and useful.
Experience: My journey involves significant contributions to open-source ecosystems. I worked as an OSS contributor at Google Summer of Code with the Mifos Initiative, developing multi-agent bots. I interned with Summer of Bitcoin, contributing to Bitcoin Transcripts. I was also an LFX Mentee at CNCF WasmEdge, and contributed to DocETL at UC Berkeley's EPIC Lab. I have also worked as an AI Engineer with Bennett Legal and HomeHive AI, and conducted risk analysis for crypto tokens.
Community: I lead the OSS club (Pclub) at my college to promote OSS. I have also hosted events such as Software Freedom Day and OSS hackathons such as FOSSHACK, and started AISOC to help students learn how to start contributing to OSS.
Achievements: Selected for the first edition of ESOC'25 under the Open-Source AI for Drug Discovery project. Ranked 15th globally in the NTIRE Image Dehazing and Denoising challenge at CVPR 2024. Published research on Speech Emotion Recognition, accepted at the 16th ICCCNT 2025.
About
Education
B.E. Information Technology
University Institute of Engineering and Technology, PU
Mathematics and Computer Science
Little Scholars, Kashipur (CBSE)
Current Focus
Interested in memory-augmented AI systems that can learn and evolve over time, as humans do.
Currently building multimodal AI systems and exploring the intersection of AI with current software landscapes.
Experience
- Developed a multi-agent bot that reports the status of Jira tickets and answers questions about Slack discussions.
- Developed a full-stack web application using FastAPI and Next.js, with Firestore as the database and auth client.
- Designed and prototyped AI-assisted coding tools for Bitcoin using small language models and domain-specific Retrieval-Augmented Generation (RAG).
- Developed data pipelines to ingest knowledge from Bitcoin developer calls, YouTube talks, IRC logs, mailing lists, and forums.
- Contributed user-defined functions, LLM-based data parsing, and OCR modules to improve the usability and capabilities of DocETL.
- Added structured generation support for the open-source model backend using Outlines.
- Developed a RAG-based chatbot for code assistance using open-source LLMs on the WasmEdge runtime.
- Created a pipeline that ingests data from a GitHub repository, augments it with Q&A pairs and summaries, and embeds the result into a Qdrant vector database.
Projects
Implemented a multimodal emotion recognition system using late and gated fusion techniques on audio and video embeddings to classify emotional states.
Key Features:
- Whisper-large-v3 for audio feature extraction
- V-JEPA for video embedding extraction
- Gated Fusion Network for combining modalities
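
As a rough illustration of the gated fusion idea named above, here is a minimal sketch in plain Python. It assumes both embeddings have already been projected to the same dimension, and that the gate is a single linear layer followed by a sigmoid; the function name `gated_fusion` and the `weights` and `bias` parameters are hypothetical and not taken from the project, which may well use a different gate design.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(audio, video, weights, bias):
    """Blend two same-sized embeddings with a per-dimension gate.

    gate  = sigmoid(W @ [audio; video] + b)
    fused = gate * audio + (1 - gate) * video

    `weights` (shape: dim x 2*dim) and `bias` (length: dim) stand in for
    learned parameters; they are assumptions made for this sketch.
    """
    if len(audio) != len(video):
        raise ValueError("embeddings must share a dimension (project them first)")
    concat = list(audio) + list(video)
    fused = []
    for i, (a, v) in enumerate(zip(audio, video)):
        # One gate value per output dimension, computed from both modalities.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * v)
    return fused
```

With all-zero weights and bias every gate equals 0.5, so the output is the plain average of the two embeddings; a trained gate would instead lean toward whichever modality is more informative for each dimension.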
Developed an autonomous multi-agent system that facilitates interaction and collaboration of specialized agents to perform comprehensive research tasks.
Key Features:
- DuckDuckGo Search Agent for web articles
- ArXiv Agent for academic papers
- Supervisor Agent for task coordination
Benchmarked models from the Open ASR Leaderboard on transcribing talks from Bitcoin conferences, with GPU acceleration support.
Key Features:
- Multi-model support and evaluation
- GPU acceleration for efficient processing
- Chunked processing for long audio files
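
Chunked processing of long audio can be sketched as follows. This is an illustrative example only: the function name `chunk_spans`, the 30-second window, and the 2-second overlap are assumptions chosen for the sketch, not values taken from the project.

```python
def chunk_spans(total_s, chunk_s=30.0, overlap_s=2.0):
    """Return (start, end) times in seconds that cover a long recording.

    Consecutive chunks overlap by `overlap_s` so that words cut at a
    boundary appear whole in at least one chunk. Defaults are assumptions.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    spans = []
    step = chunk_s - overlap_s
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last chunk reached the end of the recording
        start += step
    return spans
```

Each span can then be transcribed independently, for example in batches on a GPU, and the transcripts merged by de-duplicating the text that falls in the overlapping regions.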
Interaccionismo
Shared thoughts on AI, open-source, and building systems
Contact
Get in Touch
Open to full-time positions and collaborations in AI/ML.