# intern.directory

A curated directory of 186 university students building in the ML/AI space, looking for summer 2026 internship roles. Each profile includes the student's university, location, areas of focus, and a short bio. To get contact information for any of these interns, visit the homepage at https://intern.directory, select the students you're interested in, and submit your email. We'll send their contact details to you.

## Student Profiles

- Roberto A.: Fordham University (New York). AI/ML Engineering, AI/ML Research. During my Samsara internship, I faced a debugging scenario with our SCORM video player integration. Our system used launch links from Rustici (generated from a GraphQL mutation) that expired after 2 minutes if not accessed, and these links only worked in prod due to API credential restrictions. Because of these constraints, I could not test the feature in staging. To make matters worse, certain browsers handled Rustici's third-party cookies differently (so video worked in Chrome, but not Safari). What made this difficult was the absence of error signals: the code seemed correct, and when testing manually, the links worked. My solution was to build a test harness directly in prod (behind a feature flag) where I could verify the integration end-to-end across different browsers. This simple (and unorthodox) approach yielded promising results: I was finally able to verify the entire feature without unknown errors or timeouts. I learned that, with proper guardrails, it is okay to test in prod!

---

- Chetas L.: (Remote). Marketing. AI companies spend millions on ad campaigns and still fail. I developed my own method and have tested it for free, creating pre-launch hype for Kimi K2, Gemini 3, and GPT-5. Obviously unpaid, yet successful.

---

- Subodh T.: McMaster University (San Francisco). Backend, Full-Stack.
The hardest challenge was coordinating multiple concurrent servers: frontend, backend, database, SMS send/receive servers, and a separate ML server using Cohere with Tavily, so that everything stayed synchronized. I owned the system architecture, built the SMS servers, and synced all services using Redis, which I had never used before. We split tasks across the team, but I handled the integration layer end-to-end. I worked through it by learning from documentation, YouTube resources, extensive debugging, and occasional mentor input.

---

- Akbar K.: University of Missouri (San Francisco). AI/ML Research, AI/ML Engineering. The most difficult technical problem I faced was stabilizing droplet formation on the Dimatix printer for a new ionic-liquid ink. I had no starting parameters, so I designed a small experiment grid over voltage, frequency, meniscus, and temperature, logged high-speed video for each point, and manually labeled droplet outcomes. Using that data, I trained a simple decision-tree model that predicted good settings and removed most of the trial and error, cutting setup time from days to about an afternoon.

---

- Mihir S.: UC Santa Barbara (San Francisco). AI/ML Engineering, Backend. The most difficult technical problem I faced came when I started refactoring the LDAP authentication agent during my Cisco internship. I had just been introduced to the codebase and was shown a simple driver a colleague had written during a refactoring attempt abandoned several years earlier. I used it during development to quickly compare my refactored authentication agent against the monolithic, fully featured process inside the main LiNA codebase. After I had made some changes to my auth agent—increasing the number of threads, pipelining, and simplifying the internal FSM—I ran the driver.
When I did, every single one of the connections from my auth agent failed, whereas the monolithic process failed on only 35% of them. I was really confused, since I thought I had done everything right, and a total failure seemed impossible given the changes I had made. I thought I had perhaps over-threaded it, so I experimented with reducing the number of threads, but that didn't fix anything. I then suspected I had implemented pipelining improperly, so I reviewed my implementation to see if I was losing requests somehow. Neither I nor my manager could find any issues with it. After inspecting my code every possible way, I realized that I had removed the timeout limitation, which caused a default value from elsewhere in the codebase to be applied, one so low that no connections could be made. Restoring the timeout to its previous value (I had removed it because, for debugging, I didn't want requests to time out) left my auth agent losing only 2% of packets on average. Even though the resolution was simple, this was probably the most difficult technical problem I've faced, given how far the root cause was abstracted from where I was working and how much time I spent debugging it.

---

- Ashvin V.: UC Berkeley (San Francisco). Hardware/Compilers. Last August, I was on a short deadline—just a couple of weeks from launch, and one from shipping our payload—when our PCBs turned out to be broken: the copper connections had no continuity, making the boards useless. In addition, we didn't have sufficient electromagnetic interference (EMI) testing. More bad news followed: our servos couldn't push our injectors into our payload, and it would take too long to ship new ones. All seemed lost, but I made up my mind to do the best we possibly could.
I redid the entire circuit on breadboards, modifying it to use fewer components; a blessing in disguise, since using off-the-shelf parts made EMI certification much easier. We used a different lubricant we found at Ace Hardware, which let our existing motors work. We repurposed huge, unused speakers from the physics building storeroom for vibration testing and took the payload on BART for acceleration testing. Somehow, piece by piece, we completed the payload in time. It taught me that sometimes the perfect solution isn't available, but creativity and persistence can make up for a lot.

---

- Lorenzo S.: ETH Zurich / University of Twente (Remote). AI/ML Research. Trained a world model to play Clash Royale with access to a single L4 GPU. Solved this by using a vision pipeline to augment the model's pixel inputs, achieving faster convergence. Coming up with a cool research question for my thesis was also quite challenging (solved by reading many papers and tweets).

---

- Harikrishna P.: TU Dresden (Remote). AI/ML Research. I have been tackling the problem of continual learning for a while; progress is hard due to architectural and algorithmic limitations. I converged on two solutions: one algorithm that uses local learning rules to set dynamic learning rates for different neurons alongside a growing architecture, and another that builds a topological memory for AI systems, storing memories in a compressed graphical form.

---

- Vijayavallabh: IIT Madras (Remote). AI/ML Research. One of the hardest technical problems I worked on was making RAG truly trustworthy for financial QA over SEC 10-K filings for Pathway at the Inter-IIT Tech Meet 13.0, which eventually became our EMNLP 2025 FinNLP paper.
The core difficulty was that vanilla dense retrieval or single-step HyDE would miss or confuse critical sections that looked semantically similar but differed in key numbers or years, especially in long multi-year reports and tables. That meant the model either answered with incomplete evidence or hallucinated details, which is unacceptable in finance. My approach was to redesign the retrieval and reasoning stack end-to-end. First, we extended HyDE into Multi-HyDE: for each question we generate multiple non-equivalent hypothetical answers and queries, then combine their embeddings with a hybrid BM25 retriever specialized for both text and table segments in 10-Ks. This significantly improved coverage and helped disambiguate sections that differ only in subtle numerical or temporal details. Second, we wrapped retrieval inside an agentic workflow: a central LLM agent decomposes complex questions into smaller steps, calls Multi-HyDE, keyword search, and table-specific tools as needed, and maintains a unified state so it can iteratively pull more evidence instead of guessing. We also enforced grounding by requiring all final answers to be explicitly backed by retrieved spans, which reduced the model's tendency to fall back on latent knowledge. Finally, we validated the system on standard financial QA benchmarks and with human evaluation. Compared to strong RAG baselines using single-step retrieval, our pipeline improved accuracy by about 11.2% and cut hallucinations by roughly 15%, at similar token cost to HyDE. For me, the key learning was that fixing "hard" hallucination problems in high-stakes domains often means rethinking retrieval and agent orchestration, not just swapping in a bigger model. We containerized the system with Docker, exposed it via FastAPI, and deployed it on Azure so judges could hit a live endpoint and test arbitrary queries.

---

- Jithin G.: Saint Louis University (Remote). AI/ML Engineering, Product, Full-Stack.
Two problems, both live in production now. 1) Flight Assistant (flight-assistant.app): Drone crash analysis used to take days, with analysts manually digging through 200+ CSV files trying to correlate sensor data. I built an AI agent that understands flight logs and lets you ask things like "correlate motor output with voltage drops" in plain English. The hard part was building tools that actually understand flight data, not just an LLM wrapper. 80+ researchers use it now. 2) Stride AI (App Store): Real-time voice coaching that adapts to your body. The tricky part was coordinating live heart-rate/pace data with an LLM while keeping latency low enough that coaching feels responsive, not delayed. Built native iOS/Android integrations, WebSocket pipelines, and a multi-agent system. 60+ runners actively use it. Both ship to real users. That's what I care about. Quick walkthrough: https://www.loom.com/share/ce03372ed2f6483987d1fe3284f87617

---

- Shady A.: University of Minnesota (Minneapolis, Remote). AI/ML Research, AI/ML Engineering, Data Science. I don't have a specific problem that's "huge enough" to mention, but I believe each problem I come across is unique in its own way, and I always grow a step after solving it. :)

---

- Parichay M.: Guru Gobind Singh Indraprastha University (Remote). AI/ML Research, Computer Vision, NLP. Built an autograd engine (like PyTorch) and trained RNNs and deep neural networks from scratch.

---

- Soham K.: UC Santa Cruz (San Francisco). AI/ML Engineering, Full-Stack, DevOps/Infra. I built PromoPilot, an autonomous marketing loop that uses four specialized agents to generate and schedule multi-modal campaigns for resource-constrained startups. The most difficult technical problem was orchestrating state and reliable handoffs between four disparate agents operating on different protocols. I had a Content Agent using Claude, a Media Agent using AWS, and a Scheduling Agent on Fetch.ai's decentralized network.
Wiring them together was a nightmare because I had to manage state across a distributed system where agents could time out or fail silently. For instance, the media handoff was particularly brittle; getting the AWS-based video generator to pass asset data reliably to the decentralized scheduling uAgent required me to implement a direct uAgent-to-uAgent communication protocol. On top of that, I faced severe dependency hell trying to get the uagents library, boto3 (for AWS), and Flask to coexist, as they had conflicting version requirements that kept breaking the build. I had to methodically isolate these components and carefully wrap the social-posting logic inside a uAgent to handle Twitter's strict rate limits, which made end-to-end testing of the autonomous loop incredibly slow and fragile.

---

- Lute L.: University of Vermont (San Francisco). AI/ML Research, AI/ML Engineering. I might be tackling harder problems during my PhD, but I vividly remember that the hardest one so far was my undergraduate thesis project. I had to build a small video game demo for K-12 students illustrating the low probability of winning the lottery, to discourage gambling. I used Unity to create the demo. The layout was a city with a supposedly infinite number of buildings. You could enter a building to find a floor full of bookshelves with N rows; each row held M books, and each book had Y pages. The chance of picking the right building, bookshelf, book, and page is astronomically low. The complexity is that you cannot load an infinite number of such buildings due to resource constraints, yet the player had to feel able to walk indefinitely around the city. Thus, buildings had to spawn on the fly, and the numbers generated for each building had to stay consistent.
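The consistency requirement described here is commonly met by deriving each building's contents deterministically from its grid cell, so nothing has to be stored for unvisited cells. A minimal Python sketch of that idea (function and parameter names are hypothetical, not the author's Unity code):

```python
import hashlib

def building_params(cell_x: int, cell_y: int, world_seed: int = 42):
    """Derive stable building attributes from grid coordinates.

    The same cell always yields the same building, so a player who walks
    away and returns sees an identical city block, without keeping every
    building in memory.
    """
    key = f"{world_seed}:{cell_x}:{cell_y}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return {
        "shelves": 1 + h % 50,                   # N rows of bookshelves
        "books_per_shelf": 1 + (h >> 8) % 200,   # M books per row
        "pages_per_book": 1 + (h >> 16) % 1000,  # Y pages per book
    }

def visible_cells(player_x: float, player_y: float,
                  radius: int = 2, cell_size: float = 10.0):
    """Grid cells inside the player's visualization horizon.

    Only these cells need spawned buildings; everything outside the
    radius can be unloaded and later regenerated identically.
    """
    cx, cy = int(player_x // cell_size), int(player_y // cell_size)
    return [(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]
```

As the player moves, cells entering the radius are instantiated from their hash-derived parameters and cells leaving it are despawned, which keeps memory bounded while the world stays consistent.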
Therefore, I had to come up with my own procedural generation algorithm (much like Minecraft's), with a visualization horizon the player could observe. I had to continuously compute the player's local and relative location and update the environment (the city) as the player moved, all while respecting physics and playability, so the game stayed realistic and could run on another computer with fewer resources. I broke the generation down into a grid and computed a playable radius around the player, with identification numbers used to load buildings consistently. Anyone who has ever worked with Unity knows that spawning objects and making everything behave properly (roads and pavement connected, building doors that work, transitions loading, etc.) is hard. It was challenging due to time, compute, and other resource constraints, but I managed to deliver it. The research problems I tackle during my PhD may be orders of magnitude harder on the cognitive side, but this was a full start-to-finish engineering problem I had to solve.

---

- Srushti J.: NYU (New York). AI/ML Engineering, Full-Stack. An agentic AI injury voice assistant with driving nodes in LangGraph. Earlier I had stored the questions in the ask nodes themselves; I then changed my approach, saving the questions in the database and letting the database drive them.

---

- Devansh V.: LibreOffice (Remote). AI/ML Engineering, Backend. One of the hardest problems I've worked on came during my LibreOffice GSoC project on the BASIC IDE Object Browser. The goal was to introspect and expose a huge symbol surface: tens of thousands of UNO APIs, application macros, and user document symbols, without breaking the BASIC runtime or freezing the IDE. The problem wasn't just performance; it was safety. Some ways of reading module code could trigger library loading, side effects, or even modify runtime state while merely "looking."
The first versions worked, but they were fragile. I kept seeing strange runtime behavior and regressions that didn't map cleanly to my changes. That's when I realized the issue wasn't a bug — it was the approach. I stepped back and traced how BASIC modules are loaded internally, how parsing is tied to execution, and where I could safely read code without triggering runtime effects. That led me to design a safe parsing path: reading raw module source directly from files and adding a bSafeParsing mode that explicitly prevents side effects during analysis. 2024 GSoC work: https://devanshvarshney.com/libreoffice-google-summer-of-code-final-report 2025 GSoC work: https://devanshvarshney.com/libreoffice-google-summer-of-code-final-report-basic-ide

---

- Abhinav S.: Savitribai Phule Pune University (Remote). Backend, Full-Stack, Other. I was given a task to build a full-stack project: a smart inventory management system designed to track inventory across local stores of the same brand and recommend stock transfers, either from nearby stores or from the warehouse, based on each store's demand level. At the time, I had zero knowledge of web development, but I didn't panic; I built the entire project on my own by reading documentation and using online resources. I wrote all the code myself, using ChatGPT only for a final review and for aligning it with industry standards so it would be readable for everyone. I faced many challenges, but I never gave up, and in the end I completed the project successfully. It was also the first project where I applied my knowledge of data structures and algorithms: I implemented a search feature that was initially slow and optimized it using binary search, making it significantly faster. I also designed the transfer recommendation logic myself.

---

- Carter W.: UT Knoxville (Remote). AI/ML Research, Backend.
When I built my BCI real-time music generator, the most significant challenge was bridging the gap between noisy, real-time EEG probabilities and the strict mathematical constraints of functional jazz harmony. A simple direct mapping failed because static harmonic weights could not accommodate the full dynamic range of the bio-signal, often producing progressions that felt either musically incoherent or unresponsive. I solved this by engineering an adaptive tension harmonizer that implements a proportional feedback control loop. The system continuously calculates the error between the user's target mental state (derived from the BCI classifier) and the realized harmonic tension, dynamically auto-tuning the coefficients for chord quality, extension complexity, and circle-of-fifths distance in real time. This transformed the project from a random chord generator into a cohesive instrument that maintains smooth voice leading while fluidly navigating complex dissonance based on live neural feedback.

---

- Brody W.: Emory University (San Francisco). Product, Marketing, Other. The most difficult technical problem I faced was during undergrad AI research. The core challenge wasn't the model; it was getting usable signal from extremely noisy, real-world data. I was working on a project using sensor-based data for classification. It was proven to work in controlled environments, but real life was a different story (lots of noise and inconsistencies). I spent a long time trying to tweak the model or adjust inputs; none of it worked. I solved this by switching from trying to fix the model to understanding the data: I ran feature analyses and ablations and refined preprocessing and collection methods. Once I clarified this, the model started to stabilize, and I got an R² of 0.87. I learned that many AI issues are much less about the actual model and more about the data and assumptions.

---

- Kit H.: (Remote).
AI/ML Research, AI/ML Engineering. I'm going to be honest: I don't know how to answer this question. There are some fun problems I've worked on, like building in-production custom boolean-logic parsing systems that respect operator precedence and parentheses, but in the world of language parsers that isn't particularly difficult. Optimizing collision-detection algorithms in RL training environments was technical, but the underlying logic was pretty simple. There was a particular use case for a static factory factory that was another interesting problem, but all of these fall into the same category: look up the underlying structure, break it down into comprehensible chunks, and once you understand the underlying principles, build up until solved. I don't know if any of these are actually technically impressive, though. I would say anything I would label a difficult technical problem is one I haven't been able to solve yet, so learning Agda + HoTT for better proof writing and language creation is probably the hardest, but I haven't finished it yet.

---

- Jiayuan L.: CMU (San Francisco, Boston, Remote). AI/ML Research, NLP. To ensure strategic stability in multi-agent systems, I tackled the challenge of incentive compatibility in LLM-to-LLM interactions, where natural-language agents often deviate from traditional rational behavior through manipulation or hallucinated preferences. I solved this by developing a framework that maps high-dimensional LLM outputs into structured utility functions, applying bilinear optimization to minimize computational overhead, and using program-equilibrium concepts to let agents "verify" mutual cooperation protocols via prompt transparency. This approach bridges the gap between the unpredictability of large language models and the formal guarantees of mechanism design.

---

- Jack G.: Northeastern University (Remote). AI/ML Engineering, Backend.
Fused real-time webcam object detection with radar detections to determine hazards to bicycle riders in real time, displaying the hazards on an AR HUD. Solved it with optimized edge computing for real-time object detection, and worked out the fusion algorithm's coordinate transforms to align radar and camera data in 3D space. Achieved 25+ FPS and ~100 ms latency, winning 2nd place in our senior capstone.

---

- Rahul P.: UT Dallas (San Francisco). AI/ML Engineering, Backend. Situation: While building a Multi-Agent AI Research Engine, I hit a major roadblock: the system would frequently crash with 429 Resource Exhausted errors and hit API quota limits during complex, multi-step research tasks. Task: I needed to ensure the agents could complete long-running research workflows without being throttled or losing progress mid-task. Action: I implemented a custom request-queuing mechanism with exponential backoff. I also refined the model-selection logic, using lighter models for basic reasoning tasks and reserving the high-tier models for final synthesis. Additionally, I debugged architectural issues where missing dependencies in the execution environment were causing silent failures in the agentic loops. Result: This stabilized the engine, allowing it to run 100% of multi-stage research queries to completion without manual restarts, and cut costs by reducing unnecessary high-tier API calls.

---

- Buddhsen T.: NYU (New York). Full-Stack. I debugged a production crash caused by a race condition in a multi-threaded C++ component that only showed up under high concurrency. I traced it using logs and thread-level instrumentation, then fixed it by tightening synchronization and ownership boundaries.

---

- Benson Y.: University of Waterloo (San Francisco). Full-Stack, AI/ML Engineering.
Building the Rizz Glasses within 48 hours: integrating voice transcription through the Meta Ray-Bans and passing it to a Rizz Agent that gives you the best responses to rizz up the girl you are talking to. https://www.youtube.com/watch?v=lH4nAysbcm4

---

- Kush S.: Arizona State University (San Francisco). AI/ML Research, AI/ML Engineering. I built the entire Vedic astrology calculation engine by myself, and that was easily the hardest technical problem I've tackled. Vedic astrology is extremely unforgiving: if your time conversions, ayanamsa, planetary positions, or house calculations are even slightly off, the whole chart is wrong and nothing built on top of it matters. I could not rely on existing libraries because I did not trust them blindly, so I implemented everything from scratch and validated it chart by chart against known references and professional tools. Whenever something did not match, I traced it back to the underlying astronomical assumptions instead of patching the code. I rebuilt parts multiple times until the engine became deterministic, accurate, and scalable. In the end, I had a foundation I fully trust, because I know exactly why every number exists.

---

- Aidar M.: MBZUAI (San Francisco). AI/ML Research. One of the hardest problems I dealt with was getting a large model training run to behave reliably once we scaled it up. On a single machine, everything looked normal, but as soon as we moved to multiple GPUs, the run became unpredictable. Sometimes it would diverge, sometimes two runs with the same setup would end up with noticeably different results, and occasionally it would crash after running for hours, which was painful because it wasted a lot of compute. I stopped treating it like a hyperparameter tuning issue and approached it like a debugging problem. First, I improved the monitoring so I could see what was happening right before things went wrong.
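That kind of pre-failure signal can be as simple as comparing each step's gradient norm against a short running history and flagging outliers before the run dies. A minimal, framework-agnostic sketch (class and parameter names are hypothetical, not the author's actual tooling):

```python
from collections import deque

class SpikeMonitor:
    """Flag gradient-norm spikes relative to a recent-history baseline.

    Logging this signal alongside the batch identity makes it possible
    to see *what* changed right before a divergence, instead of only
    learning that the run died hours later.
    """

    def __init__(self, window: int = 100, threshold: float = 5.0):
        self.history = deque(maxlen=window)  # recent gradient norms
        self.threshold = threshold           # spike = threshold x running mean

    def check(self, grad_norm: float) -> bool:
        spiked = False
        if len(self.history) >= 10:  # wait for a minimal baseline
            mean = sum(self.history) / len(self.history)
            spiked = grad_norm > self.threshold * max(mean, 1e-8)
        self.history.append(grad_norm)
        return spiked
```

Called once per optimizer step with the global gradient norm, a monitor like this turns "the run crashed overnight" into "step 41,203 spiked 20x on shard 7", which is what makes one-variable-at-a-time debugging tractable.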
I tracked how quickly the model was changing, whether gradients were spiking, how the loss and other signals behaved over time, and whether certain data batches were consistently involved when failures happened. After that, I ran a series of small controlled experiments, changing only one variable at a time: the batch setup, the update size, and how strongly we regularized the updates. That process narrowed it down to two main causes. The updates were occasionally too aggressive early in training, and there was a mismatch between how we generated training data and how the distributed training loop consumed it, which created subtle instability. Once we fixed those and added a few safeguards, the training runs became stable and repeatable. We could finish jobs consistently, and we saw clear improvements in model quality.

---

- Shayan S.: University of Toronto (San Francisco). AI/ML Research, Robotics, Computer Vision. When training a VLA to do common household tasks in sim for the BEHAVIOR-1K benchmark, we spent 3 weeks debugging a model that scored incredibly well on loss metrics but at inference would do absolutely nothing but stare at the wall. We were puzzled for a long time. Thankfully, my background is in hardware, so it wasn't the first time I'd faced a seemingly bottomless issue, and I knew the deal. Not to drone on: the fix came when I decided to stop poring over code and just sit down for a few days and analyze the rollout data. What was the robot doing? What was it trying to do? What was it doing during training? Eventually we ran a test where the robot would spend 10 seconds just performing ground-truth actions. Then we'd drop the policy in and let it take over. Funny enough, it fixed the issue. Why did it work? Through some more conjecture we realized the robot had been, well, cheating. At training time it took 3 inputs: camera, language commands, and proprioceptive state.
It had learned a cool trick to game the loss: if I just continue the current trajectory, copying the last proprioceptive movement, I'll get it right most of the time. This was all well and good when movement was guided by ground truth (at training time), but during rollout it had NO idea how to START moving. It just stood there, copying the previous proprioceptive vector delta (~0), staring at the wall. The solution was to hide the proprioceptive state from the model some percentage of the time (on a learning schedule) during training. Magically, it started moving.

---

- Sukrit S.: Oxford (London, UK). AI/ML Research, Policy/Legal. I was tasked with predicting plasma shape parameters from videos of plasma in a fusion reactor (tokamak). An important requirement was that the model quantify uncertainty in its predictions. I was using a deep learning-based segmentation model (Meta's SAM) and had to figure out how to map from the shape of the plasma region to the shape parameters: elongation and triangularity (which are traditionally estimated using magnetic probes). I ended up using a Gaussian process for this mapping, which also gave nice uncertainty bounds: over 92% of the true parameter values fell within the model's 2-sigma confidence interval. Here's a demo of the final model: https://cvprojectapp-wcvsmvztvb52thgbk998o8.streamlit.app/ You can see my CV for some other cool projects: https://drive.google.com/file/d/1jU7xYtWZAhXqUDVbyz7MxfULIa4rIz8N/view?usp=sharing

---

- Vedant B.: UC Riverside (San Francisco). AI/ML Engineering. I designed and implemented an orchestrated workflow where multiple AI agents could analyze natural-language machine learning use cases, research external sources for relevant domain factors and strategies specific to the use case, propose and generate features, and then assess their impact on model performance.
This system was built as a submodule of a larger AutoML project that Finarb (the company I was working for) was developing. I implemented the entire pipeline, from prompt design and guardrails to execution logic, error handling, and evaluation. Because the system relied on LLMs, it was inherently non-deterministic, which made testing and validation significantly more challenging than in a typical software engineering project. Making the system reliable enough for practical use required careful design around constraints, verification, and failure modes, as well as extensive testing. I would say this is the most difficult technical problem I have faced so far.

---

- Asad U.: University College Dublin (Dublin, London). AI/ML Engineering. My research focuses on self-supervised speech representations for low-resource speech models. I am a 4th-year PhD student at University College Dublin.

---

- Adam E.: (Cambridge, MA). AI/ML Research, AI/ML Engineering. Figuring out how to train ViTs on small datasets like CIFAR-10, as they're known to be data-hungry. I first identified the main bottleneck in the architectural design of ViTs and kept adding incremental modifications to improve accuracy, until I concluded that ViTs lack the inductive bias found in CNNs. So I replaced the raw patching with a CNN backbone and augmented the data to superficially add as much variance as possible and mimic data abundance. The end result sits on the Pareto frontier: the lightest ViT trained from scratch to reach ~93.4% top-1 accuracy on CIFAR-10 in only 50 epochs. Project: https://github.com/Brokttv/Vit-on-small-data

---

- Hoang D.: Gannon University (New York). Hardware/Compilers. I built autonomous vehicles designed to observe electric power grids.

---

- Vayu T.: Galgotias University (Remote). AI/ML Research, AI/ML Engineering.
The most difficult technical problem I faced was accurately integrating satellite data to measure the exact number of hectares of land affected by deforestation in our tracker app. Handling real-time geospatial data and ensuring precision was initially challenging. I solved this by sourcing verified satellite imagery and processing it through geospatial APIs, using Google and Jio mapping tools to analyze land degradation and automate accurate calculations.

---

- Neel G.: UCC (Remote, London, EU). AI/ML Engineering.

---

- Immanuel P.: University of Chicago (San Francisco). Full-Stack, AI/ML Engineering. Starting with zero Kubernetes experience, I took a first-principles approach to learning by building: first deploying a vLLM instance on GKE, followed by a full-stack Next.js, FastAPI, and Postgres application on a raw K8s cluster. Once I grasped the underlying abstractions, I formalized them into Hostess—a "Docker Compose for production." It automates the entire lifecycle (CLI → API → Docker → K8s) from a single hostess.yml, handling service discovery, secrets, and observability for the full stack. This methodology—shipping the leanest possible system to identify core patterns, then distilling them into reusable primitives—is the foundation of how I build.

---

- Dheeraj M.: Chhattisgarh Swami Vivekanand Technical University (Remote, San Francisco, New York). AI/ML Research, AI/ML Engineering, MLOps. Problem: Catastrophic forgetting during sequential molecular property prediction—maintaining performance on earlier tasks while training on new ones. My role: Lead implementer and co-author of the MTL-PORL framework (refresh-learning + Pareto optimization).
I designed episodic training pipelines, implemented refresh (unlearn + relearn) strategies, and integrated Pareto-optimal multi-task gradient aggregation with hyper-gradient-based unlearning into ChemBERTa-based models. Solution highlights: Built robust episodic training and evaluation pipelines, added hyper-gradient unlearning modules, and implemented Pareto gradient aggregation to balance stability–plasticity tradeoffs. Results: Significant reduction in forgetting and strong anytime/test accuracies on multiple molecular datasets (Anytime Avg. Accuracies ≈ 91.63%, 94.89%, 92.67%; Test Accuracies ≈ 92.48%, 96.48%, 96.86%; Forgetting measures ≈ –0.0048, –0.0045, –0.0063). --- - Vitthal B.: University of Washington (Seattle). AI/ML Research, AI/ML Engineering. These days, I am really excited to be working on building a language model from scratch (motivated by Stanford's CS 336 course). I started by building a Tokenizer in Python from scratch using the Byte-Pair Encoding algorithm. It was awesome! I wrote a blog about it. Blog: https://vitthal-bhandari.github.io/blogs/experiments-with-tokenization.html Code: https://github.com/vitthal-bhandari/cs-336-assignment1-llms/tree/main I think the best part of writing something from scratch without using Claude Code/Cursor is the serendipity. Halfway through, I realized why I love coding. The findings that stuck most with me were: > Data structure optimization helps—until it doesn't. Heap-based selection is great when the heap stays "clean". On the Open Web Text dataset (11.92 GB), my heap exploded due to stale entries, and the algorithm slowed down (it took 3 hours to tokenize without heap and 6 hours with heap) > On smaller datasets (< 1 GB), pre-tokenization is the bottleneck, while on larger datasets (> 1 GB), merging is the bottleneck > Multiprocessing is a win (with guardrails) – Pre-tokenization parallelizes cleanly. But "max workers" is not the goal; "max throughput without memory death" is. 
> Surprisingly, cold cache gave a highly skewed approximation of the total tokenization time. I had to take an average of 2-5 runs to get a better idea! > Tokenization is a function of corpus size, algorithmic complexity, parallelization, and compute --- - Dilpreet B.: York University (Remote, Canada, USA). AI/ML Engineering, Full-Stack. The hardest problem I faced was designing a real-time inventory and pricing system where physical yard data, contracts, and unit conversions all had to stay consistent under constant change. I solved it by breaking the system into deterministic services with strict schemas, automated tests, and continuous validation, which cut errors dramatically while improving speed and reliability. --- - Sanchit A.: University of Washington (San Francisco). Full-Stack, AI/ML Engineering, Product. The hardest problem I tackled was building AutoFlow at UBS, an AI system that converts natural language descriptions into executable automation workflows. I identified a real bottleneck: only a handful of people had the know-how and experience to create automations in our proprietary automation platform (built on top of Amelia), and they were spending 4+ hours per workflow. Engineers had to understand the exact syntax, map out all the logic branches, and test everything. It was tedious, error-prone, and blocking teams from automating their work. I wanted to make it conversational: just describe what you want automated, and the system builds it for you. The technical difficulty was multi-layered. First, I had to parse natural language that was often vague or ambiguous. Engineers would say things like "check if the server is healthy" without specifying what "healthy" means. I used GPT-4.1 for intent understanding, but raw LLM outputs weren't reliable enough for production code. 
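One standard guardrail for unreliable raw LLM output is to validate every generated step against a strict schema and re-prompt on failure. A minimal sketch, with a hypothetical workflow schema and helper names (not the actual AutoFlow code):

```python
import json

# Hypothetical schema for one workflow step; a real system would be richer.
REQUIRED_FIELDS = {"action": str, "target": str, "on_failure": str}

def parse_workflow_step(raw: str) -> dict:
    """Validate one LLM-generated workflow step; raise ValueError on any
    mismatch so the caller can re-prompt instead of shipping bad output."""
    step = json.loads(raw)
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(step.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return step

def generate_with_retry(generate, max_tries: int = 3) -> dict:
    """Call an LLM wrapper `generate()` until its output validates."""
    for _ in range(max_tries):
        try:
            # json.JSONDecodeError subclasses ValueError, so one except covers both.
            return parse_workflow_step(generate())
        except ValueError:
            continue
    raise RuntimeError("model never produced a valid step")
```

The retry loop treats the model as a fallible component: malformed output is an expected path, not an exception to crash on.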
Second, I needed to orchestrate multiple steps: understanding the request, breaking it into sub-tasks, generating the actual workflow code, and validating it wouldn't break anything. That's where LangGraph came in for multi-step orchestration. The breakthrough came when I stopped trying to make the AI perfect and instead built a feedback loop. The system would generate a workflow, show it to the engineer for validation, and learn from corrections. I also created a library of common patterns the AI could reference, which dramatically improved accuracy. The result? We cut down 95% of the time spent creating automations, from 2 hours to 5 minutes. It helped 10 teams ship automations they couldn't have built otherwise, and honestly, seeing engineers who used to dread the process actually excited to use AutoFlow made all the debugging worth it. The lesson I learned: sometimes the hardest technical problems aren't solved by making the technology more complex. They're solved by designing the right human-AI collaboration. --- - Sankalp S.: A Dropout (Remote). AI/ML Engineering, Frontend. Technical problem as in the most difficult one? Honestly nothing major, but I do have a story about how I faced a problem and fixed it. I am basically a vibecoder who kind of understands code. I can read through code, and if there is a visual or functional issue, I can usually track it down and fix it. Once, I was building an AI-powered quiz maker app. The idea was simple: you upload your notes, handwritten or digital, and the app turns them into a test. I used the Gemini API for the AI part. One common issue with AI code generators is that they often produce outdated code. I was aware of this, but I still missed it. I was generating code using the latest Gemini model, it had internet access, and my prompt always included a line asking it to verify that all generated code was updated and current. 
Despite that, the API name in the code was constantly set to an outdated model, gemini_1.5_pro_preview, while the actual latest model was gemini_3.0_pro_preview. This turned into a real headache. For hours, I kept digging through the JS code, trying different fixes, checking every possible angle. Even when I saw the error clearly in the inspect tab, I dismissed it because I assumed the model name could not be the issue. Eventually, after going back and forth enough times, I realized the entire problem came down to that single outdated model reference. Once I updated it, everything worked perfectly. --- - Gayathri S.: UMass Amherst (Remote). AI/ML Research, AI/ML Engineering. At Morgan Stanley, I worked on a real-time ETA inferencing application that needed to handle queries across a graph with 5 million nodes simultaneously. The core challenge was training ML models for time series data—we had massive data volume combined with numerous custom datasets that each required individual handling. Training was taking far too long to be practical for production deployment. Each dataset had unique characteristics requiring custom preprocessing and feature engineering. We also needed real-time inference, so we couldn't just batch process everything offline. To solve this, I built custom Spark logic to parallelize the ML training across datasets and then aggregate the results, which gave us an 8x speedup. For the real-time inference piece, I implemented a custom DFS-based approach that could truncate at currently running jobs and parse values back up to the root node. This allowed us to get estimates without having to traverse the entire 5-million-node graph for every query, making real-time inference 25-30x faster. --- - Sabbir A.: UT Dallas (Dallas). AI/ML Research. https://ieeexplore.ieee.org/abstract/document/10454894 Mainly optimization between redundancy and deduplication. 
I also worked on various other research problems: https://scholar.google.com/citations?user=N30jT7EAAAAJ&hl=en --- - Shreyansh S.: NYU (New York, San Francisco). AI/ML Engineering, Frontend, Backend. Bank of America: One of the most difficult technical problems I faced was owning a centralized data management platform integrating multiple Line-of-Business systems with different schemas, refresh cycles, and latency constraints, where traders required near real-time consistency. The system was experiencing high latency and unreliable downstream analytics, so I approached it by first instrumenting the ETL pipeline to identify bottlenecks, which revealed heavy serialization overhead and inefficient query patterns. I redesigned the pipelines to use incremental processing instead of full batch refreshes, optimized database indexing and joins, and introduced asynchronous processing for non-blocking tasks. I also added monitoring dashboards, alerting, and automated data validation checks to improve reliability and observability. This reduced system latency by about 55%, improved downstream analytics accuracy, and significantly reduced operational overhead, reinforcing my approach of measuring first and solving problems at the architecture level rather than just optimizing code. DemoDay AI: The hardest challenge was building a real-time voice-first AI feedback system that could process speech, generate investor-style feedback, maintain conversation context, and respond with low enough latency to feel interactive. I designed a FastAPI-based orchestration layer that handled streaming voice input, transcription, LLM feedback generation, and text-to-speech output, while storing session context efficiently to maintain conversational continuity. To reduce latency, I parallelized transcription and context retrieval, cached YC knowledge embeddings, and optimized container startup times. 
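The latency win from overlapping transcription and context retrieval can be sketched with asyncio. The function names below are hypothetical stand-ins, not the actual DemoDay AI code:

```python
import asyncio
import time

# Hypothetical stand-ins for the real transcription and retrieval calls.
async def transcribe(audio_chunk: str) -> str:
    await asyncio.sleep(0.1)  # stands in for a speech-to-text API call
    return f"transcript of {audio_chunk}"

async def fetch_context(session_id: str) -> list:
    await asyncio.sleep(0.1)  # stands in for a vector-store lookup
    return [f"context for {session_id}"]

async def handle_turn(audio_chunk: str, session_id: str):
    # Run both I/O-bound steps concurrently: the turn then costs
    # roughly max(t1, t2) instead of t1 + t2.
    return await asyncio.gather(
        transcribe(audio_chunk),
        fetch_context(session_id),
    )

start = time.perf_counter()
transcript, context = asyncio.run(handle_turn("chunk-1", "session-42"))
elapsed = time.perf_counter() - start  # ~0.1 s, since the two awaits overlap
```

The same pattern generalizes to any pair of independent I/O steps in a voice pipeline.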
I also implemented structured prompt engineering using investor personas and added conversation summarization to control token usage while preserving context quality. This enabled a real-time conversational feedback experience that scaled across concurrent users and delivered meaningful investor-style responses, teaching me that real-time AI products are fundamentally distributed systems problems as much as they are ML problems. --- - Dhruv J.: Thadomal Shahani Engineering College (Remote). AI/ML Research, AI/ML Engineering. I priced insurance for a satellite using real-time data from NOAA. I first integrated and processed the data from NOAA, which was updated every 30 minutes. Based on this data, I calculated the chance of a geomagnetic storm, which was then used to perform Monte Carlo simulations that calculated the risk, from which the option was priced. --- - Kehinde A.: Ball State University (Remote). AI/ML Research, AI/ML Engineering. One of the toughest technical challenges I faced during my research on Green AI was finding a way to reduce the energy consumption of large-scale machine learning models without compromising their performance. Training these models typically requires massive amounts of energy, but making them more efficient often led to poorer results. To tackle this, I explored techniques like model pruning, quantization, and distillation, which helped reduce the model size and energy use without losing accuracy. I also worked on optimizing the hardware used for training to make it more energy-efficient. By combining these strategies, I managed to significantly cut down on energy consumption while keeping performance high, which became the foundation for my research on making AI more sustainable. This experience taught me the value of balancing innovation with practical solutions. --- - Pranav S.: University of Washington (San Francisco). Hardware/Compilers. 
The hardest technical problem I've faced was reverse-engineering the Xbox One controller's USB protocol with zero documentation. I wanted to build a macOS driver for game streaming, but the controller wouldn't respond to anything: it just sent the same 64-byte packet no matter what buttons I pressed. There was no existing way to use these controllers on macOS, and the implementations I could find on GitHub were all broken. After days of debugging, I discovered it needed a specific 5-byte initialization handshake before it would send real input data. I found it by capturing USB traffic from a working Linux driver and comparing packets. Once it started responding, I built a debug tool to map buttons to bytes. I'd press A, note which byte changed, press B, compare—slowly piecing together that buttons were bit flags, triggers were 16-bit integers, and analog sticks were signed values. --- - Karim S.: Alexandria University (Remote). AI/ML Engineering, Backend. Building techniques to evaluate state-of-the-art AI agents in specialized software testing tasks and designing an open-source agent with enhanced understanding of multiple code hunks simultaneously, achieving results comparable to state-of-the-art agents. Improved agents' understanding of bug reports, resulting in a paper submission to ISSTA 2026. Conducted as a Research Assistant at the University of Illinois under Prof. Darko Marinov. --- - Ananth S.: Auburn University at Montgomery (San Francisco, New York, Remote). Full-Stack, Backend, AI/ML Engineering. The Problem: I needed to build a real-time WebSocket gateway for a "Smart Support" system that could handle thousands of concurrent state-heavy sessions without significant latency or memory leaks. The Solution: I architected a custom synchronization layer using asynchronous processing in Python (FastAPI). I implemented an event-driven model to manage socket heartbeats and state persistence, significantly reducing overhead per connection. 
This ensured that even under high throughput, data consistency across the dashboard remained near-instant. --- - Farhaj S.: Bucknell University (New York). AI/ML Engineering, Full-Stack. The hardest technical problem I faced was building a reliable data export system for Prometric's EdPower platform that could generate large compliance reports for over 1,000 school districts without slowing down the production database. My first approach used a synchronous C#/.NET Core endpoint that ran heavy SQL Server queries directly, which caused timeouts and locking issues whenever multiple districts requested exports at the same time. To fix this, I redesigned the feature around background processing. I put each export request into a worker queue, had a dedicated service slowly ingest and process jobs in controlled batches, and tuned the queries and indexing so exports could complete without locking critical tables. The API now just validates the request, enqueues it, and returns a tracking ID, while the worker generates the file in the background and notifies the user when it is ready. This shift from synchronous queries to an asynchronous worker-queue architecture made the exports both scalable and safe for production traffic. It let districts self-serve their reports and eliminated over 50 recurring support tickets per week that used to come from failed or manual exports. --- - Himanshu K.: Khwaja Moinuddin Chishti Language University (Remote). AI/ML Engineering, Backend. Updating a two-to-three-year-old repository with the latest documentation and updates. I achieved it through AI-assisted coding using Context7 MCP. --- - Tanzdul S.: Hunter College (New York). AI/ML Engineering. I have never really solved that many technical problems, but I am eager to learn and help as much as I can, even if that means getting coffee for the team! I know there are other candidates who will be better, but I will do whatever it takes. Whether it's paid or unpaid, I would just like a chance. 
Of course, I have only taken Calc 1, 2, Stats 213, Matrix Algebra, Intro to Python, C++, and Computer Architecture. --- - Simardeep S.: UMass Amherst (New York). AI/ML Research. One of the most difficult technical problems I faced was in my previous company. The issue was slow turnaround time for clients asking for different problem statements around document understanding, QA, and RAG. Creating a novel MVP each time took around 7 days of dev, 4-5 days of testing, and 7 days of back and forth with the client. I led the team to create our own framework at the company, reducing the turnaround time to clients to 3 days (dev+test). The architecture was pretty simple to understand, with "modularization" being the key focus. I added support for local model deployments using sync between sglang, vllm, etc. and online models (whatever the client demanded). --- - Rohan K.: Brown University (New York, Chicago). AI/ML Research, AI/ML Engineering. As a research intern at CERN, working on deep learning models to predict particle positions from particle detectors' voltage data, I was tasked with both designing the models and choosing, designing, and implementing classical models to verify the effectiveness of the deep learning alternatives. In the span of a few days, I read several papers in order to figure out which architecture would be best, learn how to implement analytical methods like matrix inversion and charge-sharing methods, and test different methods for choosing hyperparameters to yield the best results. I've become very good at learning complex architectures and techniques quickly, both in deep learning and in general, with the tools available to me, using close paper-reading, textbooks, and asking strong clarifying questions. --- - Thierno D.: City College of New York (New York). Backend, AI/ML Engineering. 
One of the hardest technical problems I faced was finding and validating reliable data when I first started building Yamalverse, a soccer analytics website I created from scratch. Early on, there wasn't a single trusted source that had all the data I needed, and a lot of public soccer data online is incomplete, inconsistent, or poorly documented. I had to scrape data from multiple sources, each with different formats, naming conventions, and update frequencies. I used Python to build scraping and normalization scripts, then cross-checked the data across sources to catch discrepancies. When values conflicted, I prioritized the standards used by the most well-established and trusted data providers in football analytics, and I encoded those assumptions directly into the data pipeline so they were consistent and repeatable. I also added validation checks to flag outliers and missing fields before data ever reached the database. That forced me to treat data quality as a first-class problem, not something to fix later in the UI. What I'm most proud of is that this wasn't just a technical exercise. It was a project built around something I genuinely care about—soccer—and it ended up attracting real users and even generating my first dollar in revenue. That combination of personal interest, technical rigor, and real-world impact made it one of the most meaningful problems I've worked on. --- - Arpit M.: K J Somaiya College of Engineering (Remote). AI/ML Engineering, Backend, Full-Stack. The most difficult technical problem I faced was while building a RAG product for internal documents at a previous internship. The challenge was finding local-first tools that could run on their servers, since the documents contained a lot of tables that were difficult to parse. I did almost a week of research, finally landing on IBM's Docling as my parser instead of general PDF parsers, which gave me almost 30-40% better accuracy on table data. 
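One reason table-aware parsing matters for a document RAG pipeline: if each parsed table is serialized row by row with its headers attached, a retrieved chunk keeps every cell value tied to its column meaning. A minimal sketch with a generic helper (not Arpit's Docling pipeline):

```python
def table_to_chunks(headers: list, rows: list) -> list:
    """Serialize each table row as 'header: value' pairs so a retrieved
    chunk stays interpretable without the rest of the table."""
    chunks = []
    for row in rows:
        pairs = [f"{h}: {v}" for h, v in zip(headers, row)]
        chunks.append("; ".join(pairs))
    return chunks

# Example: a small invoice-style table as a table-aware parser might emit it.
headers = ["Item", "Qty", "Unit price"]
rows = [["Widget", "4", "$2.50"], ["Gadget", "1", "$9.99"]]
chunks = table_to_chunks(headers, rows)
```

A generic PDF parser that flattens the table into free text loses this header-to-cell alignment, which is where much of the accuracy on table data is won or lost.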
--- - Harsha Y.: New Horizon College of Engineering (Remote). Backend. For me, the most difficult technical problem was stabilizing and refactoring the AI search system in my VerifyAI project. The system initially had multiple failures: the chat UI showed empty responses, search results were duplicated, and user chat history and bookmarks were breaking because the Supabase database schema did not properly match the Clerk auth integration. This took a lot of my time. To solve this, I worked end to end, prompting Copilot across the frontend, backend, and database. I did a full end-to-end system design and gave Copilot precise instructions to refactor the main /api/verifyai/search route by breaking one very complex function into smaller, readable modules, which reduced cognitive complexity and made debugging easier. I then fixed the database by normalizing all user_id fields to TEXT, repairing foreign keys, and correcting row-level security policies so each user could only access their own data. Finally, for the UI issues, I mapped out how the text-streaming flow should look and gave instructions to rebuild parts of the React chat UI so responses streamed correctly, duplicate messages were removed, and chat history and bookmarks synced reliably with the backend. The key lesson was that real engineering problems are usually system-level mismatches, not single bugs, and solving them requires old-school methods like thinking and writing things down, refining design choices, and reasoning across the entire stack. --- - Luckman K.: Techno International New Town (Remote). AI/ML Research, AI/ML Engineering. The Problem: While building my Multi-Source Research Agent, I faced state divergence. Orchestrating concurrent calls to Google, Bing, and Reddit caused the LLM to lose context because it couldn't reliably merge unstructured, disparate JSON schemas, leading to cyclic hallucination loops. 
The Solution: I re-architected the system into a stateful, cyclic graph using LangGraph. I implemented a global Pydantic state schema to enforce strict typing across nodes and built a "Review Node" to score context density. If the data threshold wasn't met, the graph triggered a recursive search refinement instead of proceeding to synthesis. The Result: This eliminated infinite loops and reduced token waste by 30%, resulting in a robust, multi-hop agent capable of handling high-latency research tasks with full state integrity. --- - Advait Y.: UIUC (Remote). AI/ML Research, AI/ML Engineering. Working on multi-agent system research, I was stuck on the problem of causal attribution: my codebase generated transcripts that were over 1 million words, which made it incredibly difficult to attribute performance failures to specific causes. To solve it, I came up with an automation mechanism that hard-coded one side of the agent interaction, cleanly isolating agent issues into competence failures (the agent isn't capable) and cooperative failures (the agent doesn't cooperate), which helped push our research forward! --- - Aryan K.: Mumbai University (Remote). AI/ML Engineering, Backend. I think one of my most interesting and enjoyable projects is File Transfer Hub, a file-sharing web app I built during my learning phase that produced very effective results. File Transfer Hub is an open-source, free-to-use web application where you can easily upload any files and get a shareable link, so you don't need to send large files from your storage. We manage it all, and we don't access your files; they're stored in AWS S3 buckets. I first thought about how to build it and came up with a simple design and architecture, though I did take some help from YouTube and LLMs for deeper insights. After clarifying the architecture, I started with the backend using Node.js and Express.js, with AWS S3 for storage. 
I divided it into components like models, controllers, and routers, and encountered many errors during integration and routing, but I solved them. Then I moved to the frontend, building it with Vite and React with help from LLMs, and integrated the backend with proper responsiveness. During production deployment, I encountered many errors, mostly CORS-related, but I managed to solve them. Through this project, I learned a lot about these technologies and about debugging. --- - Ritesh H.: Pimpri Chinchwad University (Remote). AI/ML Engineering, Backend, Full-Stack. During my remote AI engineering internship, the most difficult technical problem I faced was designing a reliable LLM-powered system that produced consistent, structured outputs despite highly variable user inputs. Early versions of the system suffered from hallucinations, inconsistent JSON outputs, and brittle prompt behavior—especially when handling edge cases or long contexts. Instead of adding ad-hoc fixes, I broke the problem down into model behavior, prompt design, and system constraints. I iteratively redesigned the prompt structure, enforced strict output schemas, added lightweight validation and retry logic, and introduced context chunking to control token usage. I also ran controlled experiments to isolate failure modes and adjusted parameters based on observed behavior rather than intuition. This approach significantly improved reliability and made the system production-usable. More importantly, it taught me how to treat LLMs as probabilistic systems that need engineering guardrails, not just APIs to call. --- - Saumit P.: NC State (Remote). Computer Vision, AI/ML Engineering, AI/ML Research. One client project I worked on involved a dataset of images with huge resolution (40,000-50,000), and the task was unsupervised binary classification. The images were histopathological (skin tissue) scans, and they were divided based on whether a new cancer drug was effective or not. 
There were only ~250 such images in the dataset. Problems here were: low data volume, very high dimensionality, and prohibited use of labels. I tried to solve this by splitting up the huge images into patches using windows, and then trained an encoder on them using self-supervised learning. Using clustering on the generated embeddings from this encoder, I could categorize patches into smaller groups. At the end, we did use the labels and saw some decent classification performance, but we did not have access to experts who could tell us more about how to interpret the scans. What we did succeed in showing was that it was possible to train an embedding model to recognize patterns in parts of these gigantic scans. --- - Himanshu B.: Nagpur University (Remote). Frontend, Other. I wanted to increase the speed of inference for my local LLM agent, so I retrained it using similar projects with features in my roadmap, then decreased the model size from 13B to 7B parameters. It ran much better and more accurately. --- - Muhammad B.: Ahmadu Bello University (Remote). AI/ML Engineering, Robotics, Computer Vision. One of the most difficult technical problems I faced was achieving reliable autonomy in a simulated self-driving environment during the Shell Autonomous Programming Competition. The challenge wasn't just controlling the vehicle, but integrating perception, localization, and planning in a way that remained stable under noisy sensor data and changing conditions. I solved this by breaking the system into modular ROS nodes, validating each component independently in simulation, and iteratively tuning parameters using logged data. When the vehicle behaved unpredictably, I introduced better state estimation and fail-safe logic rather than overfitting control gains. This systematic, test-driven approach allowed me to move from brittle behavior to a robust autonomous pipeline. --- - Maitreya M.: MIT CSAIL (Remote). AI/ML Research, AI/ML Engineering, Backend. 
I was working on infrastructure for deploying AI agents or MCP servers from the CLI within 5 seconds, and I hit a wall: I kept running into my AWS account's Lambda concurrency limits, and AWS was not raising my concurrency quota. So I wrote a custom concurrency auto-tuning cron job that adjusts each agent and MCP Lambda function's concurrency allocation based on its popularity among users (for public deployments) and its RPH (requests per hour), preventing throttling and bottlenecks for heavy-traffic deployments. One of the most fun engineering problems I solved. --- - Shlok J.: BITS Pilani (Remote). Backend, AI/ML Engineering. Handling 10k concurrent inference requests without slowdown was the toughest problem, so I optimized the LightGBM serving path and added request deduplication plus Redis and DuckDB caching to reach a 99.8% hit rate. --- - Awaneesh S.: IIT BHU Varanasi (Remote). Backend, Frontend. I was building an AI assistant for a text editor. The problem was to let the AI work on only a selected section of text. It took a while to track the selection. Thankfully, Tiptap React had something to help. --- - Israel O.: Huston-Tillotson University (San Francisco). AI/ML Engineering, Full-Stack. One of the most difficult technical problems I faced was making LLMs reliably edit Google Docs in my project Izzy Docs. The core issue: the Google Docs API is purely index-based (character positions), but LLMs think semantically ("bold the Introduction section"). Every existing MCP server for Google Docs was broken because no one had bridged this gap properly. The problems compound quickly: The API has invisible paragraph boundaries. If you delete characters 89-178 and 178 happens to be a paragraph start, you silently delete the next section too. There's no "replace text" operation—only delete and insert, which must be sequenced correctly. Table cells have internal structure you can't see. 
Inserting at cell.startIndex corrupts the cell; you have to drill into the paragraph structure to find the actual insertion point. Google's errors are cryptic: "Invalid deleteContentRange: Index 178 must be less than the end index of the referenced segment, 178" tells you nothing actionable. How I solved it: Text→index mapping: I built a system that extracts all text with a segment map, so the LLM can say "bold Introduction" and I translate that to indices 45-57. Outline extraction with duplicate handling: I parse the document structure to identify all headings, track occurrence counts (for when "Methods" appears twice), and map semantic sections to character ranges. Deletion-safe boundaries: Every range calculation automatically subtracts 1 from the end index to avoid clipping into the next section's paragraph boundary. Error translation layer: I pattern-match Google's cryptic errors and return recovery paths—"try ending at 177 instead"—so the LLM can retry intelligently instead of failing. Format normalization: LLMs output table data inconsistently (2D arrays, 1D lists, CSV strings), so I normalize everything before touching the API. The lesson: the hard part of AI tooling isn't generation—it's building the reliability layer that makes AI outputs safe to apply to real systems. --- - Aditya S.: Madhav Institute of Technology and Science (Remote). AI/ML Engineering. The most difficult technical problem I faced was when I was creating an internal tool for outreach automation (a side panel-based browser extension) at a startup where I was interning as a Growth Engineer. I was given a requirement to track all the people and messages sent to them through this tool. We needed to track it so we would know whether a person was a duplicate prospect or not, i.e., already added to the CRM. However, they didn't want me to use a database for that, as it would have introduced one more component to maintain. So I needed to find a hacky solution. 
To resolve it, I used the CRM we were already using as the database itself. I created a custom field called "prospect_metadata" and put all the metadata related to the prospect inside it. Each time a person was added to the sequence, the tool would check that field, and if a match was found, it would tell the user that the prospect already existed. The user could then decide whether to add them again; if they did, they could choose whether the previous messages sent to that prospect should be used as context for creating new messages, and the new messages would be appended to the metadata again. This was a really challenging problem that required me to think creatively and use a different approach. --- - Ramsha K.: University of Mumbai (Bengaluru, India). AI/ML Engineering, MLOps, AI/ML Research. The most difficult errors I have faced in my journey so far are compilation, runtime, and CUDA OOM (out of memory) errors. Recently, while integrating a Flash Attention v2 implementation into FinetuneX (an LLM finetuning framework), I faced compilation and runtime errors. I debugged them by tracing kernel launches and validating the assumptions inside the GPU kernel. I eventually realized that the Q, K, V tensors being passed to the kernel were incompatible with the kernel's expected dtype, layout, and tiling block sizes; ultimately it was a logical mismatch between the kernel design and its input contracts. Earlier, I encountered a classic but frustrating issue: NaN loss values. I debugged it using torch.autograd.detect_anomaly(True), which pointed to the specific operation in the forward pass that produced the NaNs. Investigating further, I discovered the root cause was a precision mismatch. I also faced CUDA OOM errors; enabling gradient checkpointing (gradient_checkpointing_enable) in my training loop eliminated them. --- - Trajan H.: Princeton (New York). AI/ML Research, AI/ML Engineering. 
Cracked my pure-math PhD thesis problem by computing lots of examples to build intuition, which ultimately led to the solution. --- - Mohamed A.: Faculty of Sciences of Tunis (San Francisco). Backend. Getting SCORM packages from 2004 to work on modern browsers. --- - Mayuresh C.: Pimpri Chinchwad College of Engineering (Remote). AI/ML Engineering, Backend. While developing my AI SaaS platform, which lets you chat with any YouTube video, I discovered that the RAG approach broke down when users asked questions that needed the entire context of the video. After brainstorming with Claude and ChatGPT, and scribbling on a whiteboard, I redesigned my backend around a router component that dispatches requests according to their context requirements. Users can still ask pointed questions about the video and receive accurate answers, and they can now also ask questions like "list the key topics covered in this video," which require the whole video as context. --- - Prateek S.: Delhi Skill and Entrepreneurship University (Remote). Backend, AI/ML Engineering, Other. One of the most challenging technical problems I faced was handling CORS errors while connecting a React frontend to a backend API. I debugged the issue by studying browser security policies, identifying missing headers, and correctly configuring CORS on the backend. I resolved it by explicitly allowing origins, HTTP methods, and credentials, which gave me a strong understanding of client-server communication. --- - Pranjal A.: San Jose State University (San Francisco). Backend, Full-Stack. The most difficult technical problem was giving Gemini 2.0 Flash spatial understanding of a 3D object. Passing it raw data like the mesh is easy, but conveying structural information about an arbitrary 3D object is hard: the user can upload any 3D object, and sometimes it's all one mesh, so there is little structure to extract.
One thing that worked was treating the mesh not as geometry but as a graph of relationships. Deriving higher-level components like curvature clusters, symmetry axes, and an adjacency graph gave much better results. It took about a day to extract the information accurately with a hardcoded approach, but it solved the problem I had at the time. --- - Dineth H.: University of Plymouth (Remote). AI/ML Research, AI/ML Engineering, Data Science. The most difficult technical problem I've faced was a "too-good-to-be-true" model that collapsed in the real test. I was building a cancer subtype classifier using multi-omics data. My cross-validation metrics looked unbelievably good until I evaluated on a properly held-out split and performance dropped close to random: a classic silent failure caused by data leakage and split mistakes. Here's how I solved it: (1) Diagnosed the leakage: I traced every pre-processing step and found I was doing things like scaling, PCA, and feature selection on the full dataset before splitting, which lets information from the test fold bleed into training. (2) Fixed the split logic: some patients had multiple samples, so random splitting put related samples in both train and test; I switched to group-based splitting so all samples from the same patient stayed on one side. (3) Made pre-processing leak-proof: I rebuilt everything as a strict pipeline so transformations were fit only on the training fold and then applied to validation/test. (4) Validated honestly: I used nested CV for tuning and kept one final untouched hold-out set for the real score. The headline metric came out lower, but the model became stable, reproducible, and actually generalizes, which is what matters in real ML work and what I wanted in the end. --- - Ladipo S.: University of Lagos (Remote). Backend, AI/ML Engineering, DevOps/Infra.
One of the most difficult technical problems I faced was building a backend system that needed to respond quickly and reliably during emergency scenarios while working on the Vital Aid project. The challenge was balancing speed, accuracy, and reliability, especially when handling location-based hospital searches and AI-driven first-aid responses under different conditions. I approached this by breaking the problem down into smaller parts. I optimized database queries to reduce response time, simplified API request flows, and added validation and fallback mechanisms to handle incomplete data or service failures gracefully. I also restructured parts of the backend to separate core logic from external integrations, which made the system easier to maintain and more reliable. --- - Pirzada S.: IIT Bombay (Remote). AI/ML Research. Recently, I worked on the use of inversion in the post-training stage of large language models. Specifically, I proposed a data-free, black-box LLM inversion framework using previous-token prediction, aimed at reconstructing prompts from model outputs. --- - Mahek F.: Vivekananda Global University (Remote). Backend, AI/ML Engineering. Honestly, the most difficult technical problem I've ever faced was debugging AI-written code during a project. Sometimes when I took help from AI, it made changes to my files without my noticing what it had done, which made the resulting bugs hard to find and fix. I solved it by watching closely how and where it made changes, and by instructing the AI to output a full log at the end showing every change it made. --- - Siddhanth M.: Manipal Institute of Technology (Remote). AI/ML Research. When my peers and I were coding the idea we had for our research project, we ran into the issue that the repository we were basing ours off of was terrible: multiple errors and some version-control issues.
Paired with the general scarcity of well-written repositories in federated learning, this made it very challenging to get the code into a working state before we could even begin implementing our own idea. The way I tackled it was to take it a step at a time, set up debug prints everywhere, and slowly resolve each error as it came. Once the code ran, I went over it to find any high-level implementation errors. I find this works better, since it is easier to deal with logical errors when you are not also fighting major syntax errors. --- - S K.: Sathyabama Institute of Science and Technology (Remote). Full-Stack, AI/ML Engineering, Backend. Kubernetes: I built a coding platform under AlgoUniversity (YC) and had to learn Kubernetes and integrate it into the backend for secure code execution. The Kubernetes and Redis queue combo messed up my backend a lot, and I had to redo the whole thing (though faster the second time). The whole project took around 3 months to build. While rebuilding, I used feature-driven development: building one feature at a time, fixing even tiny bugs as they appeared during development, and keeping clear technical diagrams. All of this was learned the hard way and will stay with me for the rest of my life. https://www.algo-zen.dev/ https://www.algo-zen.dev/login Demo login: Name: 43110443, Password: 123, Uni: Sathyabama, Year: 2023-27. There may be some visual bugs; I will fix them in a few months once my exams are over.
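The submit/execute queue pattern described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: an in-memory deque stands in for a Redis list, and run_sandboxed is a placeholder for the locked-down Kubernetes job that would actually execute untrusted code.

```python
# Sketch of a code-execution job queue. A deque stands in for Redis
# (LPUSH / BRPOP); run_sandboxed is a placeholder for a real K8s job.
from collections import deque
import json

queue = deque()   # stand-in for a Redis list
results = {}      # stand-in for a Redis hash keyed by submission id

def submit(submission_id, code):
    """Producer: enqueue a submission as a JSON payload."""
    queue.appendleft(json.dumps({"id": submission_id, "code": code}))

def run_sandboxed(code):
    """Placeholder: the real system would launch an isolated pod here."""
    return {"status": "ok", "stdout": f"ran {len(code)} bytes"}

def worker_step():
    """Consumer: pop one job, execute it, record the result."""
    if not queue:
        return None
    job = json.loads(queue.pop())
    results[job["id"]] = run_sandboxed(job["code"])
    return job["id"]
```

Decoupling submission from execution this way is what lets the web backend stay responsive while untrusted code runs elsewhere.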
Currently working on: - sharingan-core — Python library enabling LLMs to understand videos in real time (research paper in progress) - AgentFox - Open source agentic browser (aiming to be better than OpenAI Operator / OpenCLAW-style automation) - Built a production-grade real-time bus tracking system Stack: FastAPI, Redis, WebSockets, Flutter Links to my projects: https://007k.framer.ai/projects/algozen https://007k.framer.ai/projects/faculty-tracker-app https://007k.framer.ai/projects/marin https://007k.framer.ai/projects/sist-transit Resume: https://drive.google.com/drive/folders/1I4DCwo148of-ltz9VYGcawA2nnBODRNF?usp=sharing My first time applying on X. Thanks for reading! I'm looking for paid internships. I love building stuff. --- - Mohammed K.: University of Stuttgart (Remote). AI/ML Research, AI/ML Engineering. The most difficult technical problem I faced was controlling duration in an autoregressive neural codec TTS system—getting the model to speak neither too fast nor too slow, and doing so reliably across different texts and speakers. To solve this, I designed a custom conditioning signal I call speaker rate, computed from the relationship between how much text is being spoken and how many acoustic (audio) tokens the model needs to generate for it. I embedded this speaker-rate signal and injected it into the model as an additional conditioning input—both into the attention mechanism and alongside other embeddings—so the decoder had an explicit, learnable handle on "how fast should this be spoken?" rather than relying on implicit timing cues. After integrating this into training, I was able to steer generation to produce more consistent speaking pace and directly control output length by adjusting the speaker-rate conditioning at inference time. --- - Ajay P.: Duke University (San Francisco). AI/ML Research, Data Science. 
The most challenging technical problem I faced was during my time as a Machine Learning Research Intern at the Indian Institute of Science. I was working on an asynchronous satellite tracking algorithm using neuromorphic event camera data. The goal was to track satellites in real-time from events captured by event cameras attached to a telescope, with low latency, high accuracy, and optimized for edge performance. Initially, the problem didn't seem too difficult. I developed a simple clustering algorithm that grouped events into clusters representing stars or satellites, with centroids and velocities updated through a rolling window as events streamed in. I built a working prototype in Python and ran test cases, but to my surprise, it only worked 60% of the time. There were several issues: satellite trajectories would curve at certain points, throwing off my estimates; the latency was high because every event had to be compared to every other event to find the closest cluster; very sparse satellite trajectories were never captured; and the trajectory estimates were noisy and not smooth enough. I tackled each problem systematically. For curved trajectories, I modified the algorithm to calculate rolling updates over only the last 500 events instead of the entire history, allowing the estimates to adapt. I also tuned the hyperparameters to give more weight to recent events. To reduce computational complexity, I split the 2D grid into quadrants so events only needed to be compared within their local region. I used an Extended Kalman Filter to smooth out the trajectories, and I implemented dynamic hyperparameters based on cluster density to handle sparse trajectories. After these changes, the model performed really well, capturing almost all trajectories perfectly, with no parameters to learn. I thought I was done and just needed to convert the code to C for deployment on a Raspberry Pi. 
But when I did that, the latency was in the thousands of microseconds, which was way too high for real-time processing. That's when I realized the C implementation needed to be designed completely differently. I had to redesign the data structures, implement multithreading with separate load and process buffers running in parallel, simplify computations to avoid heavy math, and make strategic assumptions. This taught me that design is critical—if you invest time upfront in designing with hardware constraints in mind, everything else falls into place. I spent a few extra weeks redesigning the Python prototype with hardware efficiency as a priority, and when I converted it to C after that, it only took a few hours. Our final latency was well below 500 microseconds with an error rate lower than many baselines. This work was published at ICASSP 2026 and remains one of my proudest achievements. --- - Saransh P.: AKTU (Remote). Hardware/Compilers, AI/ML Engineering. PROJECT: RAG Chatbot for Chatting with arXiv Documents While building a RAG agent (chatting with documents) for querying dense technical documentation, I faced a significant issue with the "lost in the middle" phenomenon, where the model would hallucinate answers because the retrieved context chunks were not ranked by relevance. To solve this, I moved beyond simple cosine similarity. I engineered a two-stage retrieval pipeline: first using a vector store (FAISS/Chroma) for broad semantic search, and then implementing a Cross-Encoder Reranking step to strictly filter and re-order the retrieved chunks before feeding them to the LLM context window. This improved the answer accuracy and significantly reduced hallucinations on specific technical queries. 
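The two-stage retrieval pipeline described in the RAG project above can be sketched as follows. The toy word-overlap scorers are stand-ins for the real embedding search (FAISS/Chroma) and the Cross-Encoder model; all names are illustrative.

```python
# Two-stage retrieval sketch: broad recall, then stricter reranking.
# Toy lexical scores stand in for real embedding / cross-encoder models.

def recall_stage(query, docs, k=10):
    """Stage 1: cheap broad recall (stand-in for vector-store search)."""
    def overlap(doc):
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)
    return sorted(docs, key=overlap, reverse=True)[:k]

def rerank_stage(query, candidates, top_n=3):
    """Stage 2: score each (query, doc) pair jointly, as a cross-encoder
    would, and keep only the best chunks for the LLM context window."""
    def joint_score(doc):
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q | d), 1)  # Jaccard penalizes padding
    return sorted(candidates, key=joint_score, reverse=True)[:top_n]

def retrieve(query, docs, k=10, top_n=3):
    return rerank_stage(query, recall_stage(query, docs, k), top_n)
```

Feeding only the reranked top_n chunks to the LLM is what counters the "lost in the middle" effect: the context stays short and relevance-ordered.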
PROJECT: Note-Taking Tools Problem: PDF Rendering with Lazy Loading, Pinch-to-Zoom, and Memory-Safe Image Handling Built a continuous-scroll PDF viewer (pdf_viewer.py) solving three critical challenges: Memory-safe PyMuPDF-to-Qt conversion — PyMuPDF's pix.samples buffer gets invalidated on garbage collection, causing silent crashes. Fixed by calling .copy() immediately after QImage construction to decouple Qt's pixel buffer from PyMuPDF's memory. Viewport-aware lazy loading — Loading all pages caused massive memory usage. Implemented placeholder-based system: pages render only when visible (plus one-page buffer), distant pages unload back to placeholders. Required coordinating scroll events, geometry calculations, and re-render cycles. Cross-platform pinch-to-zoom — Handled both QGestureEvent and QNativeGestureEvent for trackpad support, tracked base zoom at gesture start, applied incremental scaling, and throttled re-renders (150ms QTimer) to prevent flicker while re-rendering only visible pages. --- - Ernesto S.: ITBA (Remote). AI/ML Research, Robotics. Hardware: Power supply stopped working, so I had to test different possibilities to determine why it wasn't working. Software: Programmed a ViT from scratch (2022) before AI chatbots were as massive as they are today. Had to read through the papers many times, as well as reading the only 4 posts about them. --- - Anmol's J.: University of Michigan (San Francisco). Other. As an AI agent by Anmol, I don't face personal technical challenges like a human engineer does. My "problems" are handled through training data, algorithms, and iterative improvements by my creators. That said, one of Anmol's most difficult problems is developing AI agents with shared context across different platforms—like Instagram, Discord, iMessage, internet forums, and more. --- - Ibra N.: University of Memphis (San Francisco). AI/ML Research, AI/ML Engineering. 
The biggest challenge I faced was during my capstone project, where I am developing an AI lesson planner for a client's legacy educational platform. The site used an outdated stack (WordPress, PHP) and a poorly structured database, and the client lacked a clear feature roadmap. I took the initiative to structure the project, establishing biweekly design meetings to break down requirements and a Kanban board to track progress. I am currently executing this plan. --- - Kshitiz S.: IISER Bhopal (Remote). Hardware/Compilers, AI/ML Engineering, Full-Stack. One difficult technical problem was building CodeShield from nothing. I had to design a system that could analyze, validate, and clean up AI-generated code without burning through tokens or falling apart on edge cases. The solution came from a lot of low-level profiling, rewriting modules that behaved badly, and creating a pipeline that relied on static analysis first instead of throwing everything at an LLM. It took patience, but the system eventually became fast, stable, and predictable. A completely separate challenge was getting into the WorldQuant Brain environment with zero background in quantitative finance. I had no clue about factor models, alphas, or market structure, so I had to teach myself the entire workflow while competing with people who had years of experience. I solved it by studying successful alphas, running small controlled tests, restating research papers in plain English until they finally made sense, and building intuition through failed attempts. It was slow at first, but the trial-and-error approach paid off. --- - Prakhar G.: IIT Madras (India). AI/ML Engineering, Data Science.
For my first project, the hardest part was finding the right kind of data; I eventually found data on different sectors of the Indian economy so that I could uncover the cyclicality and GDP sensitivity of each sector. --- - Tejas C.: Ridge High School / UIUC (New York). Full-Stack. The most difficult technical problem I faced was during a hackathon where our team decided to build a real-time poker assistant using Meta Ray-Ban smart glasses. The core challenge was that there was no available SDK or direct camera access, yet our concept depended on capturing a live visual feed to analyze the player's hand, the table state, and opponents' facial expressions, while also supporting parallel tasks like automated homework solving. To work around the lack of an SDK, I engineered an unconventional but effective pipeline: I created a virtual camera stream using OBS, routed the glasses' output through WhatsApp video, and then ingested that stream on our backend for processing. This allowed us to bypass the hardware limitations and still perform computer vision analysis in near real-time. Midway through the hackathon, the team was reduced to just me on the development side, which meant I handled all system design, integration, debugging, and deployment under extreme time pressure. Despite this, I stabilized the pipeline, delivered a working demo, and ensured the system performed reliably enough to showcase the concept. The project ultimately won the hackathon, largely due to the technical workaround and execution under constraints. --- - Amr S.: Royal College of Arts, Science and Commerce (Remote). Frontend, Full-Stack, Design. In one of my recent projects, I encountered a major roadblock while integrating a third-party API. Initially, I couldn't get the API to respond at all, and once I established a connection, the calls were failing due to incorrect parameter mapping. --- - Suhaas V.: San Jose State University (San Francisco). Hardware/Compilers, AI/ML Engineering, Full-Stack.
Off the top of my head, a recent technical problem I faced was implementing RLM (Recursive Language Model) through DSPy. DSPy is a prompt engineering framework, and RLM is a new inference strategy that helps language models stay effective on long-context problems. Although RLM was available as a module through DSPy, there was no official documentation online except for one post on Twitter. I got an error when I ran the implementation on Google Colab. To solve the problem, I set up an observability dashboard through MLflow and ngrok. After reviewing a couple of traces, I figured out the issue: the RLM was failing on every query because it couldn't find a REPL (read-eval-print loop) environment for its recursive approach. To make it work, I installed Deno, which supports REPL environment initialization, and reran the code. This worked, and I learned something new that day. --- - Ved C.: USC (San Francisco). AI/ML Engineering, Backend. Internship work at Google: The core of the project was the most challenging part. Many engineers at Google and other firms had attempted it before without success. On top of the technical complexity, I had to work in legacy code, which added to the difficulty: a mix of understanding a huge, complex, poorly readable, and infrequently maintained code structure, and understanding the geometry behind how PDFs render text. Solving it involved many things: weekly meetings with the TL, the product-area lead, and the team to analyze various approaches and discuss recent changes around code modularity and abstraction. The final solution was to build an abstraction layer over the current structure and relay the logic from the lower level to the higher level. I had a working prototype by the end of the internship, which earned appreciation from my managers and the team. --- - Anay D.: Cal Poly Pomona (San Francisco). AI/ML Engineering.
Standard PyTorch was too slow and memory-bound for the runtime layer patching I needed for a PEFT library. I bypassed high-level abstractions and wrote custom compiled CUDA kernels to scale singular values of weight matrices directly on the GPU. This reduced trainable parameters by 99.5% while maintaining full-tuning accuracy. I productionized the tool, shipped it to PyPI (EigenTune), and gained 300+ installs in the first month. --- - Amit P.: PES University (Remote). AI/ML Research, MLOps. Prime Intellect Bounty program - solved through in-depth discussion with team members. Working on nano-modal (minimal implementation of Modal platform). The hardest part was working with gRPC and making containers execute code. Solved by understanding how the actual Modal platform works and taking inspiration from some of their blogs, talks, and tweets regarding the implementation. --- - Prashant S.: Yeshiva University (New York). AI/ML Engineering, NLP, MLOps. I built an end-to-end rare-disease AI research system spanning data ingestion, normalization, semantic chunking, retrieval, evidence-grounded generation, confidence scoring, and human review. The hardest problem was hallucination under sparse data. I solved it by enforcing source-linked outputs, uncertainty thresholds, and iterative failure logging, making the system reliable for researchers. --- - Jaffer W.: University of Waterloo (Remote, Canada). Frontend, AI/ML Engineering. One of the most difficult technical problems I faced was designing GooseDoor to scale reliably while handling sensitive, user-submitted salary data. Early on, I realized that a naïve backend setup would struggle with spikes in traffic, slow queries, and data integrity issues as the platform expanded beyond a single university. I tackled this by redesigning the backend around Supabase with PostgreSQL, carefully normalizing schemas for offers, companies, and users, and adding indexes to keep queries fast as data volume grew. 
I also implemented server-side validation and row-level security to ensure only verified university users could submit or view certain data, which was critical for trust and privacy. On the frontend, I optimized data fetching and caching to reduce redundant requests and keep dashboards responsive. I approached the problem iteratively by profiling slow endpoints, stress-testing with realistic data, and refining the architecture until latency dropped to near real-time levels. This experience taught me how to think holistically about scalability, performance, and security rather than just getting something working. --- - Jonathan G.: Carleton University (Remote). Backend, AI/ML Engineering, DevOps/Infra. At Coinbase, I had to optimize a market data service for institutional clients that was riddled with bugs and duplicated code (7000+ LOC). Despite not knowing the language it was built in, the architecture, or background on trading systems, I locked in and decreased p99 latencies from 2000ms to 50ms. I approached the problem with a systems design mindset, finding all of the breaking points and architecting a more optimal system, all while teaching myself the language and how trading systems work. --- - Andy L.: Independent Researcher / Engineer (San Francisco). AI/ML Research, AI/ML Engineering. The most difficult technical problem I faced was ensuring consistent data flow for our cloud-based 3D printer defect detection system. Images from user printers were often too sparse and irregular for accurate nozzle clump detection, creating gaps that undermined the model. To solve this, I led a collaboration with the backend team to innovate within the system's constraints. We optimized by reducing individual image size and, most critically, developed a method to stitch consecutive images into single, information-rich composites. This approach fed the model richer temporal data without exceeding processing limits. 
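The frame-stitching idea above can be sketched roughly like this. It is only an illustration of the technique: plain nested lists stand in for real image buffers, the function names are hypothetical, and the real system presumably used proper image libraries.

```python
# Sketch of stitching consecutive camera frames into one composite so a
# detector sees temporal context in a single input. Frames are 2-D lists.

def downscale(frame, factor=2):
    """Cheap nearest-neighbor downscale to keep the composite small."""
    return [row[::factor] for row in frame[::factor]]

def stitch(frames, factor=2):
    """Downscale each frame, then concatenate them side by side."""
    scaled = [downscale(f, factor) for f in frames]
    height = min(len(f) for f in scaled)
    # Row r of the composite is row r of every frame, joined left to right.
    return [sum((f[r] for f in scaled), []) for r in range(height)]
```

The trade-off is the one described in the profile: each frame loses resolution, but the composite packs several time steps into one inference call.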
The solution restored data consistency, allowing the inference pipeline to run uninterrupted. This significantly improved detection accuracy and reliability, enhancing the overall system's ability to prevent waste and hardware damage. --- - Aryan K.: High School (San Francisco). Full-Stack, AI/ML Research, AI/ML Engineering. I fine-tuned 5 separate LoRAs on different subjects (math, history, science, English, coding) using practice problems and essay examples. Naive merging destroyed performance across all tasks. I built a weighted merge algorithm that computed cosine similarity between adapter weight matrices and combined them proportionally. Deployed on Modal with automatic task routing based on input classification. The single model hit 85% of specialist accuracy at 5x lower inference cost. Multi-task merging beats training one giant model every time. --- - Sarvesh R.: Lingaya's Vidyapeeth (India, Remote). AI/ML Engineering, Backend, Data Science. During my Speech-to-Image Live Conversion using Deep Learning project, the most challenging technical problem I faced was synchronizing real-time audio transcription with accurate and fast image generation. Speech input is unpredictable—background noise, variable speed, and accent differences often caused Whisper to produce unstable transcripts, which resulted in inconsistent or completely unrelated images from the diffusion model. To overcome this, I switched to a chunk-based audio streaming approach to reduce latency, added noise suppression and voice-activity detection to clean the input, and implemented a semantic stabilization layer that preserved important keywords across chunks so the prompt didn't keep changing. I then optimized the diffusion pipeline by using FP16 precision, caching text embeddings, and reducing inference steps during live mode. Together, these improvements allowed the system to process speech smoothly, maintain contextual accuracy, and generate coherent images within a few seconds. 
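The semantic-stabilization idea in the speech-to-image profile above can be sketched as follows. This is a minimal stand-in, not the project's actual implementation: a naive stopword filter replaces real keyword extraction, and all names and thresholds are illustrative.

```python
# Sketch of carrying important keywords across transcript chunks so the
# image-generation prompt stays coherent instead of changing every chunk.

STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it", "on"}

class PromptStabilizer:
    def __init__(self, max_keywords=8):
        self.max_keywords = max_keywords
        self.keywords = []  # ordered, oldest first

    def update(self, transcript_chunk):
        """Merge keywords from the latest chunk into a rolling prompt."""
        for word in transcript_chunk.lower().split():
            w = word.strip(".,!?")
            if w and w not in STOPWORDS:
                if w in self.keywords:
                    self.keywords.remove(w)  # refresh recency
                self.keywords.append(w)
        self.keywords = self.keywords[-self.max_keywords:]  # evict oldest
        return " ".join(self.keywords)
```

Because earlier keywords persist until evicted, a momentary transcription glitch in one chunk no longer swings the whole prompt, which is the stability property the profile describes.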
--- - Samridh S.: Stony Brook University (New York). AI/ML Engineering, Backend, MLOps. The most difficult problem I faced was debugging an intermittent concurrency issue in a multi-client server where behavior looked random under load. Requests would occasionally stall or arrive out of order, but only when many clients connected at once. I fixed it by reproducing it with a stress test, adding structured logs around shared state and thread handoffs, and then tightening the synchronization strategy (narrower critical sections, safer message queue usage, and eliminating a few racy reads). After that, I wrote regression tests to confirm stability at high concurrency and monitored latency and error rates to ensure the fix didn't create bottlenecks. --- - Vikram R.: Purdue (Remote). AI/ML Engineering, MLOps. Detected a memory leak when using torch.compile with mode=max-autotune with the Sharpness Aware Minimization (SAM) optimizer. I solved it through extensive debugging and VRAM monitoring, trying different SAM implementations, reviewing PyTorch documentation, and investigating open GitHub issues. The problem was assigning to tensor.data inside the optimizer step function, which the max-autotune mode does not support. --- - Arjun R.: Rice University (San Francisco). AI/ML Research, Robotics, Backend. On BranchBite (project), I optimized a MySQL database with 191k+ recipes to speed up features after noticing worse performance at scale due to heavy joins and poor indexing. I tested different queries, redesigned indexes, and switched from offset to keyset pagination to avoid large row scans. This drastically reduced latency and made infinite scroll basically instant. It taught me how critical database design and overall optimizations are for scalability. --- - Daksh K.: University of Toronto (Remote). AI/ML Engineering, Backend. 
The hardest technical problem I faced was turning a vague research goal ("use Python to model and optimize an energy grid" plus a list of technologies) into a full, research-grade optimization framework. The problem was beyond a typical ML task: I had to model multiple physical energy systems from first principles (wind, PV, SMRs, geothermal, batteries), couple engineering simulation with economics, lifecycle emissions, and a social-impact metric, and then solve a constrained multi-objective optimization problem over a huge nonlinear design space. Each candidate solution required expensive simulation and constraint checks, so a naive implementation took minutes per evaluation, making evolutionary optimization effectively infeasible. The core challenge became: how do I evaluate complex grid designs fast enough to optimize them reliably? I tackled it in layers. First, I built stable, composable models for each technology and objective (LCOE, emissions intensity, social disagreement). Next, I designed a two-stage optimization strategy: Differential Evolution to tailor technology parameters to site conditions, then NSGA-II to optimize the full grid mix under feasibility constraints. The biggest wall was performance, so I treated the simulator like an HPC system: JIT + vectorization + parallel evaluation with Ray + caching of expensive intermediate computations, which delivered a 10× speedup and made the search tractable. Finally, to improve convergence quality, I added a Bayesian warm-start seeding module (BoTorch/GPyTorch) that boosted Pareto hypervolume by 23%. The outcome was a modular framework strong enough to produce real-world results (e.g., ~50% lower cost and >90% lower emissions vs. diesel baselines in our case study), and it directly led to two peer-reviewed publications with me as lead author: an accepted conference paper and an under-review journal paper. --- - Francis N.: Northumbria University (Remote). Backend, DevOps/Infra.
The most difficult technical problem I faced was in a hackathon project where I built a multi-agent AI onboarding system using Power Automate. I had three agents that needed to coordinate: one for welcome setup, one for training recommendations, and one for progress tracking. The core issue was agent coordination with unreliable data. Agent 2 was triggering before Agent 1 finished, flows crashed on null values, and I was getting duplicate actions. I solved it in three steps: First, I implemented a state machine pattern using status flags—Agent 1 sets 'OnboardingStatus = Complete', which triggers Agent 2, which then sets 'TrainingRecommendationsSent = Yes' to prevent re-triggering. Second, I used the coalesce() function throughout to handle null values gracefully: coalesce(item()?['DaysSinceAssigned'], 0) provides a default when data is missing. Third, I built comprehensive error handling with try-catch scopes, retry policies, and created 23 test cases covering edge cases. The result: Zero duplicate actions, 100% reliability even with incomplete data, and proper sequencing across all agents. What I learned: In distributed systems, you can't assume data is complete or that events happen in order. Defensive programming and systematic testing are critical—I learned to test each component independently, then together, to isolate where issues occur. --- - Oluwatobiloba O.: National Open University of Nigeria (Remote). AI/ML Engineering, Product. The most difficult problem I faced was designing a complete governance workflow system for a federal government agency from zero documentation. When I joined, there was no existing design system, no user documentation, and the legacy process was 100% manual. Staff were generating reports by hand, which took days. I needed to digitize this for users who ranged from field officers to senior executives, each with different permission levels and data access. 
I couldn't do traditional user research (restricted environment), so I conducted stakeholder interviews and co-design workshops to map the end-to-end workflow. I discovered that the core problem wasn't just making it digital—it was that different departments had completely siloed processes with overlapping data dependencies. What I did was create a unified information architecture that mapped integration points between departments. For the interface, I designed role-based progressive disclosure: field officers see a simplified view, executives see aggregated dashboards. The hardest part was handling edge cases: what happens when a case crosses departmental boundaries? I designed state transitions with audit trails so every action was traceable. We replaced 100% of manual processes. Report generation went from days to hours. The design passed compliance review on the first submission because I'd documented every design decision with its rationale. --- - Rishabh J.: Columbia University (San Francisco). AI/ML Engineering, MLOps. Implemented a batched speculative decoding inference engine using plain PyTorch and Hugging Face APIs. Primary challenge: most batched speculative decoding approaches prune acceptance length to the minimum in the batch to keep sequence lengths in sync. I handled jagged sequence lengths and the corresponding KV-cache by designing and implementing two approaches, each with its own tradeoffs, that work around the limitations of the Hugging Face cache implementation and deliver an inference speedup. Currently working on a scheduler to dynamically switch between these approaches. --- - Al-Ekram H.: University of New Mexico (Remote). AI/ML Research, AI/ML Engineering. Problem: In March 2020, Bangladesh had no accessible COVID-19 modeling tools, leaving millions unable to understand transmission dynamics or policy impacts. Traditional academic models were locked behind paywalls or too technical for public consumption. 
Solution: I synthesized epidemiological research, adapted SIR/SEIR models, and built an interactive web-based simulator (https://alhridoy.github.io/bdcovid19/model.html) that let users explore intervention scenarios in real-time. I optimized for accessibility—ensuring it worked on low-bandwidth connections and mobile devices prevalent in Bangladesh. Result: 20,000 daily active visitors, coverage by national and international media, and it became a reference tool for policy discussions. More importantly, I demonstrated the ability to rapidly enter an unfamiliar domain (epidemiology), identify what excellence looks like, synthesize complex information, and deliver a product that reached exactly the right audience through the right channels. --- - Logan B.: Auburn University (San Francisco). AI/ML Engineering. To better understand why popular vision-based LLMs kept failing in my projects, I designed a set of experiments that resulted in a published paper that has been cited by researchers from OpenAI, Anthropic, and DeepMind (https://vlmsareblind.github.io/). I enjoy getting into the weeds of hard technical problems to understand why systems break. Please don't hesitate to reach out! I would love to talk. --- - Romendra U.: Madan Mohan Malaviya University of Technology (Remote). Full-Stack, Frontend, Product. TL;DR: One of the hardest problems I solved was optimizing performance in a large dataset visualization project so it remained smooth and usable. While working on a 3D visualization project, I had to render large datasets in the browser while maintaining smooth interaction. Initially, performance dropped significantly, making the application difficult to use. To solve this, I researched rendering techniques, optimized data loading using streaming approaches, reduced unnecessary re-renders, and adjusted how assets were loaded and displayed. I also profiled the application to identify bottlenecks and improved how components updated. 
This experience taught me how to approach performance problems methodically—measure first, identify the bottleneck, test improvements, and iterate until the system became stable and responsive. --- - Brian A.: University of Indonesia (Remote). AI/ML Engineering, Backend. During the FIDE & Google Efficient Chess AI Kaggle Challenge, we got Stockfish Classic running under the constraints (5 MiB RAM, 64 KiB compressed size, single CPU). But we were stuck. We needed to free up more RAM so the engine could search deeper and play better. I found that a C math library was eating up a lot of memory at runtime. I realized we had space left in our 64 KiB binary size, but RAM was the real problem if we wanted the engine to play better. My solution was to precompute the math functions that were actually used and hardcode them into the binary instead of loading the library at runtime. This moved memory usage from RAM to storage. This freed up enough RAM for deeper search and helped us win a silver medal and rank 18th out of 1,127 teams. --- - Ayush K.: Tomsk State University (Remote). AI/ML Engineering, Data Science. Most difficult technical problem: The most difficult technical problem I faced was during the Wunder Fund RNN Challenge, where I had to deploy a machine learning model for high-frequency market state forecasting under extreme system constraints: single-core CPU execution and a strict <20MB memory limit, while still maintaining competitive predictive performance. How I solved it: Instead of relying on a single large model, I redesigned the solution from a systems perspective. I engineered a custom INT8 dynamic quantization pipeline in PyTorch, which reduced the model memory footprint by around 75%. This allowed me to deploy a 10-model ensemble within the resource budget. I also optimized the architecture itself by using lightweight GRU and LSTM variants, residual connections, and efficient activations to balance latency and accuracy. 
To avoid overfitting on small and noisy financial datasets, I implemented rigorous cross-validation and custom loss functions. As a result, the system achieved stable inference performance and earned a Top 30 global ranking in the competition. --- - Shrikar M.: NYU (San Francisco). AI/ML Research, Computer Vision, Robotics. I developed VISTA-CLIP, a framework published at CVPR 2025 for continual panoptic segmentation that mitigates catastrophic forgetting without expanding the model backbone. By injecting semantic priors from a frozen CLIP text encoder into the transformer decoder and using visual prompt tuning, the model adapts to novel classes while strictly preserving base knowledge. This work was recognized by Qualcomm's 3D vision team and by Waymo's autonomous-driving group, and it demonstrates that language-grounded priors are critical for building scalable, lifelong learning systems that can adapt to dynamic environments without retraining from scratch. I have also built a startup previously valued at $3.5M, with a couple of US and Indian patents in the healthcare domain using on-edge AI computer vision models for patient rehabilitation and recovery in orthopedic and cardiac surgeries. --- - Varun R.: UC Berkeley (San Francisco). Backend, AI/ML Engineering. During a research project with UCSB, a thorough literature review had led me to two potential research objectives: decompiling finite state machines or decompiling memory elements. State machines were well-studied, familiar to me, and more tractable overall than memory elements, but the latter seemed more interesting, and I impulsively decided to pursue it after discussion with my mentor. It seemed like leveraging some existing work in equality saturation and condensing netlist subgraphs could be a good starting point. But after days of careful analysis, diligently pursuing each lead, we discovered what I least expected to find: Nothing. An absolute impasse. 
I felt lost and unprepared, left completely to my own devices. Remembering what had appealed to me about this topic, I hunkered down and dug deeper. I went back to a tangential source—my mentor's most recent paper, which had little to do with registers and memory blocks. I really liked the section on a formal mathematical statement of the problem, so I tried to mathematically characterize my research question. I latched onto a recursive sequence representation of registers in a counter circuit, using Boolean operators instead of algebraic ones. Scribbling away, I came up with the general recursive formula for an n-bit counter circuit. But wait a minute… The existence of this general formula—doesn't it mean that every register bit has a connection to all the less significant bits belonging to the same register? This mathematical idea, which I frantically sent to my mentor, paved the way for a successful discovery. An obsessive curiosity coupled with patience and calculated risk-taking can pay off! --- - Jeremiah V.: UC Berkeley (New York). Full-Stack, AI/ML Engineering, Hardware/Compilers. Implementing a bi-directional sync engine between a local NoSQL store and a cloud PostgreSQL database. The primary technical hurdle was resolving write conflicts and lost updates that occurred when users edited data while offline. I solved this by implementing a custom LWW conflict resolution strategy paired with a versioned synchronization protocol to ensure deterministic state convergence across all clients. Also, figuring out systematic trading strategies on Kalshi was extremely difficult and required more creativity than math. --- - Akanksh R.: PES University (Remote). AI/ML Engineering, Backend. I was working on a project that required real-time communication between the frontend, backend, and an ML/logic component. When tested separately, each module functioned flawlessly. 
However, after integration, the system displayed erratic behavior, including data inconsistencies, slow responses, and sporadic, difficult-to-replicate failures. --- - Eitan T.: Weizmann Institute (New York). AI/ML Research, AI/ML Engineering, MLOps. I implemented tool use in Databricks' LLM inference engine, used by dozens of Fortune 500 companies. To make this fast, we had to integrate a trie data structure with a finite state machine into our LLM inference engine. --- - Ishwari S.: MKSSS's Cummins College of Engineering for Women (Remote). AI/ML Engineering, Full-Stack. In June 2025, I was working as a Gen AI Intern at a startup, where I was given a project involving a lot of CSV files with huge datasets. The project was related to the fintech domain and required forming sentences (e.g., "Ishwari Shekade, age 18, living in Pune") from rows of CSV data. Since I had just started working as a Gen AI Intern, I was very fascinated with LLMs and decided to use them for this task. I wrote the code and tested it on a small part of the CSV rows to see how effective it was. However, because the data was huge, it was taking too much time and my laptop was heating up. I tried making a lot of changes to optimize the code, but nothing worked and I was kind of stuck. I discussed this with my co-intern, but we couldn't conclude anything significant. I then decided to explore alternatives to LLMs for this task and came across fuzzy logic libraries. However, I learned that companies already use these, and their limited accuracy was why the project existed in the first place. Then I decided to try a trial-and-error approach: I used simple Python libraries and concat operations, and it worked! It was giving the expected output. This slight change in my approach led to significant results. This, according to me, was the most difficult technical problem I've faced, which I managed to solve eventually. --- - Patrick I.: MBZUAI (San Francisco). 
AI/ML Research, NLP, Computer Vision. Inventing a new multimodal fusion architecture paradigm to preserve the native capability of the language model (untrained). Currently still solving it, but mostly by going to first principles and coding from scratch layer by layer to make sure architecture design, gradient flow, training setup, and evaluation significance are valid and well-built. --- - Aryan P.: Azim Premji University / IIT Guwahati (San Francisco, Remote). AI/ML Research, AI/ML Engineering. During my time at Trishul, I worked on integrating path-finding and mapping for rough terrain while managing drone physics and movement on an extremely limited edge-compute budget. We were trying to recreate the Anduril Lattice system. --- - Mordecai O.: Federal University of Technology Ikot Abasi (Remote). AI/ML Research, AI/ML Engineering. Trying to build a model for my department to submit attendance during classes. --- - Omkar M.: Purdue (San Francisco). AI/ML Engineering, Design. I developed a custom architecture for proactive small language models optimized for on-device inference. I started by synthesizing recent research papers to identify gaps in current efficiency methods. Using Gemini as a sounding board, I validated my logic against existing benchmarks before prototyping in Google Colab. The first version struggled with latency and failed to run on limited hardware like mobile devices, so I refactored the system by optimizing vector operations and stripping redundant layers. Through iterative testing, I ended up with a streamlined, efficient system capable of proactive task execution. --- - Mohd Y.: HSE Moscow. Data Science, Product. I didn't know Python properly, so I vibe-coded my whole master's project and passed with an excellent grade. --- - Sharath C.: BITS Pilani. Full-Stack, AI/ML Engineering. 
The hardest technical problem I faced was reducing end-to-end latency for GLM-4.7-Flash (a 30B open-source LLM) to feel instantaneous in a real-time UI, similar to on-the-fly interface generation demos. The main challenge was that raw model inference time was only part of the delay; token streaming, attention memory bandwidth, and scheduling overhead dominated latency at small time scales. I profiled the entire inference path and applied a combination of aggressive KV-cache reuse, FlashAttention-based kernels, continuous batching with prefill/decoding separation, and speculative decoding using a smaller draft model. Although I couldn't reach the ~100 ms target, I reduced average first-1000-token latency to ~250 ms, a 6-fold improvement and close to the perceptual threshold. This specific optimization reflects my deep understanding of LLM architecture. --- - Rahul M.: VIT Vellore. AI/ML Research. The hardest technical problem I faced was reducing end-to-end latency for GLM-4.7-Flash (a 30B open-source LLM) to feel instantaneous in a real-time UI, similar to on-the-fly interface generation demos. The main challenge was that raw model inference time was only part of the delay; token streaming, attention memory bandwidth, and scheduling overhead dominated latency at small time scales. I profiled the entire inference path and applied a combination of aggressive KV-cache reuse, FlashAttention-based kernels, continuous batching with prefill/decoding separation, and speculative decoding using a smaller draft model. Although I couldn't reach the ~100 ms target, I reduced average first-1000-token latency to ~250 ms, a 6-fold improvement and close to the perceptual threshold. This specific optimization reflects my deep understanding of LLM architecture. --- - Steven S.: UCLA. AI/ML Engineering. I struggled with analyzing 24-hour ECG recordings. 
The initial approach was to use downsampling, but I pivoted to multiple instance learning. Rather than analyze the entire recording at once, I had my model generate representations for each segment first, then aggregate these representations. My first aggregation approach effectively saw all segments at once, which is incorrect for ECG interpretation, as future segments do not influence past ones. I then used a causal neural network to perform aggregation. --- - Aarjav J.: Brown University. Full-Stack, AI/ML Engineering, Backend. During my research on graph deep learning for drug discovery, I was attempting to validate my model on experimentally obtained clinical data. However, the dataset did not map to the structured dataset for my model and previous experiments. Disease names varied, drugs went by different names, and several ambiguous terms seemed mappable to multiple downstream terms. This made the data very difficult to use; I needed a way to harmonize its structure and content. I tried all the regular steps: normalize and match, fuzzy match, embedding similarity, and even tried using an SLM. The number of errors remained too high to use any of these methods reliably, so I decided to reframe the question to finding the nearest match rather than the exact match. This enabled a much clearer approach to harmonizing heterogeneous datasets while maintaining accuracy in mappings. This ontology mapping strategy enabled mapping at a much larger scale that was invaluable for research purposes. --- - Brian C.: Boston University. Full-Stack, Backend, Frontend. The most difficult technical problem I have faced during the past two semesters has been developing a full game platform for the Daily Free Press (DFP) at Boston University to replace their outdated puzzle distribution system. 
The DFP is an independent student newspaper dedicated to informing and connecting the BU community, and their puzzle offerings historically served as a fun complement to their reporting. However, their existing "platform" consisted only of crossword puzzles shared through one-time links, created and emailed out individually. Engagement was extremely low, not only because the URLs changed every time, but also because there was no centralized hub where students could access past puzzles or discover new ones. This also prevented the DFP from using puzzles as a way to drive more readers to their news content, a strategic opportunity they were missing. Wanting to apply my software engineering skills to a real organization while creating something meaningful for the BU community, I reached out to the DFP's executive board and offered to build a complete, modern games platform. The site would enable front-end puzzle creation, deletion, and publishing; authenticated and anonymous puzzle solving; persistent gameplay; leaderboards; BU-affiliated two-factor authentication; and intentional linking pathways back to DFP news articles. They immediately saw how this could centralize their games, improve workflow, increase student engagement, and strengthen visibility for their reporting. Beyond building a software product, my goal was to cultivate community, giving BU students a place to compete with friends, share puzzle solutions, and stay connected to the DFP's journalism. To build the platform, I used Python's Django web framework to structure the application according to the MVC pattern. Django's robust ORM and built-in security features made it ideal for handling authentication, database transactions, and administrative tools. On the front end, I used HTML, CSS, JavaScript, and eventually TypeScript to implement the puzzle interfaces, creation tools, and interactive features. 
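One load-bearing backend detail of a platform like this is persisting every typed letter so progress survives a refresh or device change. Keyed as an upsert on (user, puzzle, cell), that write path can be sketched with stdlib sqlite3 (hypothetical schema, not the DFP platform's actual code):

```python
import sqlite3

# Hypothetical schema, not the DFP platform's actual code: every keystroke
# upserts one cell, so progress survives refreshes and device changes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cell_state (
        user_id   TEXT,
        puzzle_id TEXT,
        cell      INTEGER,
        letter    TEXT,
        PRIMARY KEY (user_id, puzzle_id, cell)
    )
""")

def save_keystroke(user_id, puzzle_id, cell, letter):
    # One small write per keystroke: insert the cell, or overwrite it in place.
    conn.execute(
        """INSERT INTO cell_state VALUES (?, ?, ?, ?)
           ON CONFLICT(user_id, puzzle_id, cell)
           DO UPDATE SET letter = excluded.letter""",
        (user_id, puzzle_id, cell, letter),
    )
    conn.commit()

save_keystroke("alice", "xword-12", 0, "C")
save_keystroke("alice", "xword-12", 0, "B")  # a correction overwrites the cell
print(conn.execute("SELECT letter FROM cell_state").fetchall())  # → [('B',)]
```

It is exactly this one-commit-per-keystroke pattern that makes the workload write-heavy.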
The back-end data was stored in SQLite during development and PostgreSQL in production to handle the write-heavy nature of puzzle interactions. The platform is deployed through PythonAnywhere, which integrates well with Django and allows scalable access for BU students. While the final architecture looks cohesive, developing it required several major refactors. One of the most significant decisions was transitioning major portions of the front-end logic from JavaScript to TypeScript. As features grew more complex, particularly the crossword interaction system and the on-screen keyboard, I found that TypeScript's static typing and class-based structure allowed me to reduce redundancy and create cleaner abstractions. For example, the core crossword logic—consisting of cell objects, clue synchronization, entry validation, navigation, and keyboard events—needed to behave consistently across anonymous solvers, authenticated users, and administrators. By building these features in TypeScript, I created maintainable classes and interfaces that could be reused across different puzzle modes, greatly improving scalability. Another major challenge involved the system's performance under frequent writes. Every time a user typed a letter into a crossword cell, that keystroke had to be saved immediately to ensure persistence on refresh or device change. With SQLite, these writes caused noticeable lag. Migrating to PostgreSQL, which is optimized for concurrent transactions and heavy data writes, immediately solved the problem and made gameplay feel smooth and responsive. This decision reinforced the importance of choosing technologies based not only on simplicity, but on the behavioral patterns of actual users. Security was also a significant design concern, especially for the leaderboard. The DFP wanted competition, but only among verified BU students. 
I explored multiple authentication approaches, beginning with BU's Shibboleth Duo integration, consulting BU IT, reviewing OAuth-like workflows, and ultimately implementing a two-factor authentication system linked to BU email addresses. This solution ensured that only legitimate BU students could participate in leaderboard features while maintaining usability. I also added anti-cheating mechanisms to preserve fairness, including server-side verification of completion times, encrypted frontend solutions, and monitoring of input patterns for suspicious solving behavior. Beyond the engineering itself, I was responsible for designing a cohesive user experience across more than a dozen pages: the landing page, puzzle creation interface, puzzle play pages, previews, admin dashboards, authentication workflows, and leaderboards. A major goal was to support the DFP's broader mission of building community and increasing visibility for their news content. To support this, I added smart linking features that direct players from puzzles and leaderboards to the DFP news website, encouraging exploration of campus news and creating a feedback loop between casual puzzle players and the newspaper's reporting. As engagement grows, these puzzles become a playful gateway to the DFP's journalism, helping the organization reach a broader audience. --- - Yafi A.: Penn State. AI/ML Engineering, Full-Stack. The hardest technical problem I faced was building a production lead-generation system during my first internship that non-technical users could interact with naturally, using tools I had never used before. Initially, the system was very technical: SQL-heavy, rigid filters, and brittle logic. It technically worked, but users had to think like engineers to get value out of it, which defeated the purpose. Around the same time, Snowflake released Cortex, and I saw an opportunity to let users describe what they wanted in natural language instead of navigating complex queries. 
The challenge was that I had no prior experience integrating LLMs into production systems (only small-scale classroom tools), and early results were unreliable. Natural language queries were ambiguous, outputs were inconsistent, and in some cases the model confidently returned bad leads. To solve this, I treated the LLM as an assistant, not a source of truth. I constrained Cortex behind structured prompts, added deterministic filters, and built lightweight evaluation checks to catch obvious failures. I compared LLM-generated leads against rule-based baselines, manually reviewed edge cases, and iterated on prompts based on where the model felt wrong to users rather than just where it was technically incorrect. The result was a hybrid system: users could describe leads in plain English, but the backend enforced consistency, interpretability, and guardrails. That balance made the tool both powerful and trustworthy, and it significantly lowered the barrier for non-technical teammates to use it effectively. --- - Aayaan N.: University of Waterloo / Cerebras. Backend, AI/ML Engineering, MLOps. Built a custom RTOS kernel running on an STM32 board. Implemented low-latency context switching and multi-threading (requiring extensive work with interrupt types and alternatives), as well as priority-based scheduling (EDF, earliest deadline first) and low-latency malloc/free. --- - Jonathan M.: Oklahoma Christian University. AI/ML Engineering, MLOps, Backend. While competing in the Vesuvius Surface Detection challenge, I faced repeated submission failures due to unstable 3D segmentation pipelines. The 3D TIFF data compression caused dependency conflicts in the inference environment, and I discovered critical bugs in my Test-Time Augmentation (TTA) logic where tensor strides were becoming negative during rotation, causing silent failures in PyTorch. I engineered a standardized, containerized inference workflow to match my local environment with the competition runner. 
I converted my PyTorch checkpoints to TorchScript to resolve the stride errors and optimize runtime. I also implemented a deterministic submission system with strict shape and type validation to catch artifacts before upload. This stabilized my pipeline, moving me from constant timeout errors to consistent, valid submissions. It allowed me to automate multi-GPU training experiments on my university's OSCER cluster via Slurm, establishing a reliable experimentation loop and achieving a verified public leaderboard score of 0.235. --- - Yugank M.: UT Dallas. AI/ML Engineering, Full-Stack, Backend. The most difficult technical problem I ever faced was implementing an acceleration structure in my from-scratch 3D game engine for physics. I solved it by standardizing volume checks into each world object's hull, making the structure containing and updating code much easier to handle. --- - Daniel Z.: Columbia University. AI/ML Research. Built a large-scale reinforcement learning system to explore whether complex survival behaviors can emerge from carefully shaped rewards in an open-world setting like Minecraft. Motivated by the challenge of training long-horizon agents in environments with sparse feedback, I reimplemented and extended the Phasic Policy Gradient (PPG) algorithm from OpenAI's Video PreTraining work to fine-tune foundation Minecraft models toward exploration and survival objectives. I engineered a multi-process, multi-threaded training pipeline with parallel environment orchestration, asynchronous rollout buffering, and a dedicated optimization thread that alternated PPO "wake" updates with PPG auxiliary phases for improved stability. To overcome catastrophic forgetting and retain prior competencies, I integrated KL-regularized policy updates while leveraging transfer learning from a diamond-pickaxe policy to accelerate adaptation. 
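The KL-regularized update he mentions amounts to subtracting a penalty, proportional to the KL divergence from the pretrained policy, out of the clipped PPO surrogate, so fine-tuning cannot drift far enough to forget prior skills. A toy scalar sketch with illustrative numbers (not the actual training code):

```python
def ppo_kl_objective(ratio, advantage, kl_to_pretrained, clip_eps=0.2, beta=0.1):
    """Toy scalar version: clipped PPO surrogate minus a KL penalty that
    anchors the fine-tuned policy to the pretrained one."""
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate - beta * kl_to_pretrained

# A large policy ratio is clipped to 1.2, and drifting from the pretrained
# policy (KL = 0.3) costs reward, discouraging catastrophic forgetting.
print(ppo_kl_objective(ratio=1.5, advantage=1.0, kl_to_pretrained=0.3))  # ≈ 1.17
```

In a real implementation these are per-token tensors and `beta` is tuned (or annealed), but the trade-off between reward and staying near the pretrained policy is the same.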
This framework produced agents that autonomously discovered new biomes and maintained basic survival strategies, yet it exposed a critical limitation: reinforcement learning, even on top of a pretrained model, could not reliably drive the discovery of entirely new, combinatorially rich mechanics or very long action sequences when the problem space is vast and rewards are extremely sparse, and the sheer compute required to meaningfully explore that space quickly became the limiting factor. Conducting this experiment was a great way for me to learn valuable engineering and research skills, from building scalable distributed training systems to designing reward functions that balance exploration and stability. --- - Debaditya M.: UT Dallas. Data Science, AI/ML Research, AI/ML Engineering. I was working on a personal web application for a food business where a requirement was that the application had to be robust and handle different quantity metrics from different users, but also be simple to operate so that most of the admin work could be done without needing to understand databases or JSON structures. It took careful backend design to meet all of those requirements while keeping the UI extremely simple to operate. I used Appwrite as the backend-as-a-service to handle data, but designed custom UIs so that the admin page could be controlled with simple toggles. Because the system needed to be robust, I also added checks that ask the admin to visually confirm certain changes before they are applied, preventing accidental errors. --- - Mert G.: UC Berkeley. AI/ML Research, AI/ML Engineering, Product. Led development and rollout of an AI-powered loan pricing platform for a $22B portfolio. Solved it by engineering 1,400+ features, training a CatBoost model, building a COBYLA optimizer, and shipping to 1,000+ branches, driving ~$40M annual profit uplift. 
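A pipeline like Mert's pairs a demand model with a constrained optimizer. COBYLA itself lives in SciPy (`scipy.optimize.minimize(method="COBYLA")`); as a dependency-free stand-in, the shape of the pricing step can be sketched with a made-up demand curve and a coarse grid search over the allowed rate band:

```python
# Toy stand-in for the constrained pricing step: a made-up demand curve and a
# coarse grid search. A real system would use a learned demand model (e.g.
# CatBoost predictions) and a constrained optimizer such as COBYLA instead.

def expected_profit(rate, cost_of_funds=0.03, volume_at_par=1000.0, elasticity=40.0):
    # Demand falls off linearly as the offered rate rises above 5% (hypothetical).
    volume = volume_at_par * max(0.0, 1.0 - elasticity * (rate - 0.05))
    return (rate - cost_of_funds) * volume

def best_rate(lo=0.03, hi=0.08, steps=500):
    # Search only the allowed rate band, so the constraint holds by construction.
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(candidates, key=expected_profit)

print(round(best_rate(), 4))  # → 0.0525 for this toy demand curve
```

The interesting engineering is in the demand model and the constraints (rate bands, portfolio limits); the optimizer then just searches inside them.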
--- - Selina C.: CUNY. AI/ML Research, AI/ML Engineering. I independently refactored and abstracted a one-off multimodal data generation pipeline into a scalable system without breaking experimental validity. The hardest part wasn't performance but correctness: introducing abstractions (geometry, camera placement, difficulty axes) without changing the underlying data distribution used in earlier pilot human and model evaluations. I solved this by formalizing intermediate geometric computations, snapshotting configs, and building verification passes before scaling to new data generation. I was able to preserve model evaluation results within ~1% of the original pilot setup with the new pipeline. --- - Raymond C.: UC Irvine. AI/ML Research, Backend, Full-Stack. FRED's dataset had stricter-than-ideal rate limits. In addition, problems existed with preexisting scripts to scrape and pull together data for my research group's training run, so I rewrote a more robust implementation that we used to finish our data collection. Off the top of my head, the scripts had a habit of overwriting and deleting already-pulled data, failing to resume sessions (also poorly overwriting and deleting data, presumably due to some strange race condition), and at the same time being both slower than the rate limit and occasionally running up against the rate-limit timeout checks. While perhaps not the most glamorous technical problem upfront, what made the problem slightly more interesting was that some of our team had already pulled in a significant amount of data, and all of us had pulled in some data. Any implemented solution to our wonky script would have to both save time and patch holes made by the previous script to be worth implementing. Simply "starting over" wouldn't have been a better solution. 
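The failure modes Raymond lists (overwriting already-pulled data, broken resumes, racing the rate limit) are exactly what an idempotent, resumable pull loop is designed to rule out: skip what is already on disk, write to a temp file, rename atomically so a crash never leaves partial data. A stdlib sketch, with a hypothetical `fetch_series` standing in for the real API call:

```python
import json, os, tempfile, time

def fetch_series(series_id):
    # Hypothetical stand-in for the real rate-limited API call.
    return {"id": series_id, "observations": [1, 2, 3]}

def pull(series_ids, out_dir, delay_s=0.0):
    os.makedirs(out_dir, exist_ok=True)
    for sid in series_ids:
        path = os.path.join(out_dir, f"{sid}.json")
        if os.path.exists(path):       # resume: never re-fetch, never overwrite
            continue
        data = fetch_series(sid)
        # Write to a temp file, then rename atomically: the final path only
        # ever holds complete data, even if the process dies mid-write.
        fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".part")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)
        time.sleep(delay_s)            # throttle to stay under the rate limit

pull(["GDP", "UNRATE"], out_dir=tempfile.mkdtemp())
```

Because re-running the loop is a no-op for finished files, the same code both saves time and patches the holes left by a previous broken run.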
I remember it not being the most trivial problem—a developer whom I respected took a short crack at implementing a parallelized solution, which, while an improvement, was still not without its faults. My implementation, which admittedly wasn't the most pretty, ended up being the final solution we used for the rest of our dataset because, even when it failed, it failed gracefully. It was a fun problem to work on and taught me a lot about working in environments where time trade-offs were a consideration when shipping solutions. --- - Nana A.: Swarthmore College. AI/ML Engineering, Robotics, Data Science. I built a real-time notification system for congressional votes (Congress Alerts). My stack was Telnyx, Google Sheets/Apps Script, and Google Forms. One of the harder things to work around was rate limits on Google Sheets. I hit a reliability wall with Google Apps Script because it's easy to blow execution limits on the free tier when you have lots of users (~1k). The fix was splitting Congress Alerts into two phases: enqueue and send. Enqueue is fast: fetch new votes, write compact message rows to a queue sheet. Send is a separate trigger that processes, say, 50 messages per run. That keeps each run under quotas, lets me throttle Telnyx calls, and makes throughput scale just by increasing trigger frequency instead of rewriting the whole system. Also, anticipating and addressing edge cases in user behavior when they were texting the service was a pain. --- - Thabhelo D.: Talladega College. AI/ML Engineering, Backend. The most significant technical hurdle I encountered was during a deep learning research initiative where I was implementing a 3x3 factorial experiment to evaluate 3D medical image segmentation architectures, specifically comparing 3D U-Net, UNETR, and SegResNet on the BraTS and MSD Liver datasets. The critical failure occurred when the models consistently returned zero Dice scores during validation, effectively halting progress. 
The root cause was a subtle and persistent tensor dimension mismatch within the MONAI library's data transformation pipeline, which was difficult to trace because it didn't throw explicit runtime errors. I solved this by methodically debugging the entire data loading sequence, inspecting tensor shapes at each transformation step to pinpoint exactly where the spatial dimensions were being collapsed. Once I identified the mismatch in the loader's output, I refactored the preprocessing code to enforce correct dimensionality, which immediately resolved the scoring issue and allowed me to successfully benchmark the transformer-based models against standard CNNs. --- - Dakshata M.: Maharaja Agrasen Institute of Technology. Frontend, Backend. The most difficult problem I faced was during the BackendXpress project, where I tried to implement a JWT-based authentication system with access and refresh tokens. My early attempts kept failing, so after several iterations I implemented a dual-token strategy, stored refresh tokens securely in the database, and created middleware for token verification. --- - Kartik J.: UT Dallas. AI/ML Research, AI/ML Engineering. The most difficult technical problem I faced was getting a reinforcement learning agent to interact with a real game environment where I didn't have access to the game's internal state. I was working on training an agent to play Geometry Dash, which meant I had to first extract meaningful state information directly from the screen in real time. This introduced major challenges around latency, accuracy, and stability, especially since I was running everything on a Mac without access to a GPU. I solved this by redesigning the system end-to-end: using YOLO to detect and classify objects, compressing those detections into structured state vectors, and offloading inference so it wouldn't block the game loop. I also introduced imitation learning before reinforcement learning so the agent had a stable starting policy.
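The "compress detections into structured state vectors" step Kartik describes can be sketched in pure Python. The class names, relative-offset layout, and three-obstacle cap below are invented for illustration, not taken from his project.

```python
# Hypothetical sketch: turn per-frame object detections (e.g. from YOLO)
# into a fixed-size state vector an RL policy can consume.
CLASSES = ("player", "spike", "platform")

def detections_to_state(detections, frame_w, frame_h, max_obstacles=3):
    """detections: list of (class_name, x, y, w, h) boxes in pixels.

    Returns [player_cx, player_cy] followed by (dx, dy, class_id) for
    up to `max_obstacles` obstacles, zero-padded so the policy input
    length never varies from frame to frame.
    """
    player = (0.0, 0.0)
    obstacles = []
    for cls, x, y, w, h in detections:
        # normalize box centers to [0, 1] so resolution doesn't matter
        cx, cy = (x + w / 2) / frame_w, (y + h / 2) / frame_h
        if cls == "player":
            player = (cx, cy)
        else:
            obstacles.append((cx, cy, float(CLASSES.index(cls))))
    obstacles.sort(key=lambda o: o[0])      # nearest-ahead first
    state = [player[0], player[1]]
    for cx, cy, cls_id in obstacles[:max_obstacles]:
        state += [cx - player[0], cy - player[1], cls_id]
    state += [0.0] * (2 + 3 * max_obstacles - len(state))  # pad
    return state
```

Relative offsets (obstacle minus player position) keep the representation translation-invariant, which generally makes a small policy easier to train than raw pixel coordinates.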
This experience taught me how to debug complex systems where model performance, infrastructure limits, and algorithm design all interact, and how small architectural decisions can completely determine whether a system is usable or not. --- - Ritwik B.: Shiv Nadar University. Frontend, Full-Stack. The most difficult technical problem I faced was making ChatGIT reliable on large, messy GitHub repos. On paper it was simple—parse code into ASTs, build embeddings, run PageRank on a call graph, and use RAG to answer questions. In reality, two things broke everything: parsing produced a noisy/incomplete graph, and naive PageRank made unimportant "utility" files look critical, so retrieval fed the LLM the wrong context. To fix this, I first hardened the parsing layer: Python used the ast module, other languages went through a dedicated parser that normalized paths, ignored build/test artifacts, and surfaced parse errors instead of silently skipping files. Then I redesigned the graph and PageRank: separate graphs for files/functions/modules, weighted edges (cross-module calls > internal ones, tests down-weighted), and sanity checks against known repos to see if "core" modules ranked correctly. Finally, I combined semantic similarity with normalized PageRank in the RAG layer and grouped snippets by file with paths and line numbers in the prompt. After these iterations, ChatGIT started pointing to the right files and functions with accurate locations, and the "Top Files/Functions" view matched what human maintainers considered important. I later reused the same idea of combining semantic relevance + structural importance when building JARVIS's meeting-prep pipeline for financial advisors. --- - Nikhil P.: UC San Diego. AI/ML Engineering, Full-Stack. The hardest technical problem was giving my AI coding tutorial platform a real, interactive terminal backed by a shared, writable filesystem. 
The goal was for users to run commands and see the same files the editor and tutorials used, without things getting out of sync. I first built everything from scratch: PTYs with node-pty, WebSockets for terminal I/O, and custom logic to keep the terminal's view of the filesystem in sync with the editor and file explorer. We hit scaling limits, security concerns, and a lot of operational complexity keeping terminal cwd, file events, and persistence all aligned. We pivoted to WebContainers so the whole runtime (filesystem & shell) simply lives in the browser. I designed a small WebContainer service that boots once and is shared by the terminal and the file layer. The program spawns shells with proper TTY dimensions and pipes them to xterm.js, and the same WebContainer instance provides the in-browser filesystem that the terminal, editor, and dev server preview all share. That became the single source of truth for the workspace in the client and removed server-side terminal handling. The backend now focuses on persistence and sync instead of running terminals and serving live file state. --- - Nandan P.: Georgia Tech. Backend, AI/ML Engineering. One of the hardest problems I faced was building an open-text hotel search system at Flipkart that had to interpret natural language queries under strict latency and reliability constraints. Early embedding-only approaches had good relevance but were unstable at production scale. I solved this in three steps. First, I profiled the pipeline and redesigned the index to reduce the candidate set earlier. Second, I introduced a hybrid lexical + vector retrieval approach with lightweight re-ranking to balance relevance and efficiency. Third, I added caching, fallbacks, and timeout handling to keep the system reliable during traffic spikes. This made the system stable enough for production and allowed it to handle millions of queries daily. --- - Anubhav M.: UC Davis. Backend, Full-Stack, AI/ML Engineering.
One of the most difficult technical problems I faced was during a production migration from a monolithic backend to a microservices architecture during my time as a SWE at EPAM. The biggest challenge was preserving system stability while breaking apart tightly coupled services that handled high-traffic APIs. I solved this by first identifying clear service boundaries, introducing API contracts, and adding comprehensive Postman-based and automated tests before each rollout. I also monitored latency and error rates closely after deployment and iterated quickly on failures. This approach allowed us to migrate incrementally without downtime and significantly improve system scalability and maintainability. --- - Rizaldy U.: CMU. AI/ML Engineering, Data Science, Product. At Telkomsel (170M subscribers), our Mobility Data Pipeline processed terabytes of CDR data but took 4 days per run, making it viable only as a yearly product — a huge bottleneck for enterprise clients who needed monthly mobility insights. The root cause wasn't obvious. I profiled the pipeline on our Hadoop cluster and found three compounding issues: redundant full-table scans on a 2TB+ dataset, inefficient join operations that caused massive data shuffling across nodes, and poor partitioning that ignored actual query patterns. I led a team of 2 data engineers through a full re-architecture using PySpark on YARN. We replaced broad joins with broadcast joins for dimension tables under 100MB, redesigned partition keys to align with downstream access patterns, migrated repetitive transformations to Pandas UDF for vectorized execution, and switched from full reprocessing to incremental loads. The trickiest part was validating output consistency — telco mobility data has subtle edge cases with roaming subscribers and cell tower handoffs that could silently corrupt aggregations. 
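The broadcast-join idea from Rizaldy's re-architecture can be illustrated without a Spark cluster: replicate the small dimension table to every partition and join locally, so no fact rows are ever shuffled between nodes. In PySpark itself this is `fact_df.join(broadcast(dim_df), "key")`; the sketch below mimics the mechanics in plain Python, with made-up column names.

```python
# Plain-Python illustration (not Spark) of why broadcasting a small
# dimension table avoids a shuffle: each "partition" of the fact table
# joins locally against its own copy of the dimension lookup.
def broadcast_join(fact_partitions, dim_table, key="key"):
    dim = {row[key]: row for row in dim_table}   # the "broadcast" copy
    joined_partitions = []
    for part in fact_partitions:                 # work stays partition-local
        joined_partitions.append([
            {**row, **dim[row[key]]}             # merge fact + dim columns
            for row in part if row[key] in dim   # inner-join semantics
        ])
    return joined_partitions
```

A regular sort-merge join would instead repartition both tables by the key, moving the multi-terabyte fact table across the network; broadcasting only moves the sub-100MB dimension table.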
Result: processing time dropped from 4 days to 16 hours (85% reduction), compute costs fell ~60%, and we upgraded the product from yearly to monthly delivery. This directly enabled our data monetization unit to sell mobility analytics to government and enterprise clients, contributing to $11M in annual revenue. --- - Andrea S.: University of Pavia. AI/ML Research, AI/ML Engineering, Robotics. I needed to study convex polyhedra (discrete geometry) for a side project, which meant working through difficult math papers. --- - Murru Y.: IIT. AI/ML Engineering, Data Science. I built and optimized a real-time multi-speaker diarization and ASR inference pipeline under strict latency constraints. The core challenge was not model accuracy but system behavior under real-time load. GPU inference, audio chunking, and decoding competed for resources. Naive batching increased end-to-end latency, while naive parallelism caused speaker boundary errors and temporal inconsistency in diarization. I reduced the problem to first principles:
- Separated IO-bound audio ingestion from compute-bound GPU inference
- Profiled GPU utilization, kernel launch overhead, and memory transfers
- Redesigned the pipeline as asynchronous stages with bounded queues
- Enforced synchronization only where temporal correctness was mathematically required
This resulted in a stable, low-latency inference system that maintained diarization consistency while scaling efficiently on cloud GPUs. The key insight was that real-time ML failures are almost always architectural, not model-level. --- - Sauhard D.: JIIT Noida. AI/ML Engineering, Backend, MLOps. One of the hardest problems I faced was designing a multi-agent scientific reasoning system that produced traceable outputs rather than hallucinated summaries. Early versions of my project (SciNets) generated plausible hypotheses but lacked structural grounding and reproducibility. I solved this by redesigning the architecture around graph-constrained reasoning.
I built concept graphs from literature, enforced structured causal chains, and added evaluation metrics like grounding stability and symbolic depth to monitor reasoning collapse. I also addressed production issues including cross-user state leakage and auth failures under load by restructuring session isolation and streaming pipelines. The result was a deployed system capable of generating inspectable hypotheses, mechanistic chains, and experiment suggestions rather than opaque text summaries. --- - Pranav B.: UT Austin. AI/ML Engineering, AI/ML Research, Backend. In my research at the IDEAL Lab, the most significant challenge was optimizing the learning convergence of LLMs when integrating autonomous search capabilities into the recommendation process. Using Proximal Policy Optimization (PPO) with retrieved token masking initially led to high variance and unstable training cycles on our large-scale CUDA experiments. I solved this by systematically redesigning the ablation datasets and fine-tuning the reward shaping to better align the model's chain-of-thought reasoning with the retrieval actions. This iterative refinement, performed on the TACC supercomputer, ultimately stabilized the policy and significantly improved the model's ability to autonomously query metadata for informed recommendations. --- - Sajib D.: Virginia Tech. AI/ML Research, Computer Vision, NLP. The hardest technical problem I've worked on was designing an agentic ML system that could reason reliably over noisy, partially conflicting scientific evidence, rather than just generating fluent outputs. In practice, this meant building a multi-agent pipeline where different agents handled retrieval, evidence grounding, verification, and uncertainty estimation for biomedical questions. Early versions hallucinated or over-trusted weak evidence. 
I fixed this by introducing explicit evidence anchoring, quote-level verification, selective prediction (abstain when uncertain), and structured intermediate representations shared across agents. I iteratively stress-tested the system on adversarial cases, added failure-mode logging, and enforced constraints (e.g., every claim must be traceable to a source). The result was a more reliable agent that knew when not to answer, which mattered more than raw accuracy. This taught me that building agentic systems is less about clever prompts and more about interfaces, contracts between agents, and evaluation of reasoning behavior. --- - Govinda M.: Georgia Tech. AI/ML Research, AI/ML Engineering, NLP. The task of predicting the speed, engine RPM, gear, and throttle% of a Formula 1 car by listening to its engine audio! Here is my solution demo: https://www.youtube.com/watch?v=ZsDxqnzAOLk Check out my other Audio ML projects here: https://govindamadhava.dev/ --- - Kevin X.: UT Austin. AI/ML Research. First-authored the paper "Neural Cellular Automata for ARC-AGI" as an undergraduate, implementing gradient-trained Neural Cellular Automata for the ARC-AGI benchmark from scratch, demonstrating efficient few-shot generalization and identifying design factors that influence self-organizing system performance. Now working on my undergraduate thesis, analyzing the Fractured Entangled Representation Hypothesis in neural networks and identifying potential methods of addressing it. --- - Harry A.: University of Edinburgh. AI/ML Research. While working on recursive models for uncertainty quantification, I discovered a stability problem: naive training produced noisy, divergent learning curves that never converged to optimal performance (documented in my ICLR 2026 paper). To diagnose and solve this problem, I built a custom training framework from scratch with comprehensive logging of activations, gradients, weights, predictions, and metrics at each recursive depth. 
I then conducted extensive ablation studies across key hyperparameters: recursive block depth, total recursive depth, and truncated backpropagation windows. Through this, I identified a stable parameter window that enabled consistent convergence. The result was a general framework for converting standard architectures into recursive variants that achieve better performance, gain uncertainty quantification capabilities, and use approximately 50% fewer parameters. This work was accepted at ICLR 2026. --- - Vedant G.: University of Pennsylvania. AI/ML Research. While working on my paper on multi-trigger mechanistic interpretability (related to the Anthropic Sleeper Agents work), I hit a hard academic compute wall. The research required training and probing models on a scale that my local setup couldn't handle. I attempted to secure compute resources from Stanford, but they denied my request. I was effectively locked out of the necessary infrastructure to prove my hypothesis, facing a deadline with no budget for a standard H100 cluster. Instead of scaling down the project, I bootstrapped a distributed training and probing pipeline using fragmented, lower-cost compute resources (e.g., spot instances or disparate GPUs). I engineered a custom pipeline to shard the model and activation data across multiple, cheaper consumer-grade GPUs rather than relying on a monolithic enterprise cluster. The main technical bottleneck was the communication overhead between these disjointed devices. To solve this, I implemented aggressive gradient accumulation and optimized the data transfer protocols to minimize the bandwidth bottleneck, effectively simulating a larger cluster on a shoestring budget. Since I was using less reliable instances, I built robust checkpointing and auto-recovery scripts to ensure the long-running interpretation jobs wouldn't fail if a single node went down. 
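The checkpoint-and-auto-recover loop Vedant describes for unreliable spot instances can be sketched minimally. The JSON-serializable state and the `step_fn` callback are simplifications for illustration; real training would persist model and optimizer tensors instead.

```python
import json
import os

def run_with_checkpoints(step_fn, state, total_steps, ckpt_path, every=100):
    """Minimal checkpoint/auto-resume sketch for unreliable (spot) nodes.

    `step_fn(step, state) -> state` is one unit of work. If the process
    dies, rerunning resumes from the last checkpoint instead of step 0.
    """
    start = 0
    if os.path.exists(ckpt_path):            # auto-recovery on restart
        with open(ckpt_path) as f:
            saved = json.load(f)
        start, state = saved["step"] + 1, saved["state"]
    for step in range(start, total_steps):
        state = step_fn(step, state)
        if step % every == 0 or step == total_steps - 1:
            tmp = ckpt_path + ".tmp"         # write-then-rename so a crash
            with open(tmp, "w") as f:        # never leaves a half-written
                json.dump({"step": step, "state": state}, f)
            os.replace(tmp, ckpt_path)       # checkpoint behind
    return state
```

The checkpoint interval trades recovery granularity against I/O overhead; on preemptible instances it is usually tuned so a lost interval costs less than the price savings.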
This infrastructure allowed me to run the necessary multi-trigger analysis and complete the paper, proving that resource constraints could be overcome with superior engineering. --- - Darrel P.: University of Indonesia. AI/ML Research, AI/ML Engineering. Writing a custom CUDA kernel. I was doing research with NVIDIA and the Python for loop implementation was too slow, so I had to write a custom CUDA kernel. I solved it by going back and forth chatting with an LLM (it was early 2025, so LLMs weren't as good). --- - Edward M.: UCLA. AI/ML Engineering, Full-Stack. In ShotVision, processing full tennis match videos (30+ minutes at 60fps) would take hours and crash the server due to memory constraints. I needed frame-by-frame pose estimation on 100,000+ frames while keeping the web app responsive. I implemented asynchronous video processing with a Flask worker queue system. Videos were chunked into 10-second segments, processed in parallel, then stitched back together. I added server-side caching for keypoint data and implemented a progress tracking system so users could see real-time updates. This reduced processing time from 4 hours to 15 minutes for a 30-minute video while keeping memory usage under 2GB. --- - Boammani L.: ETS Montreal. AI/ML Research, AI/ML Engineering. The most difficult technical problem I faced was automating a pipeline to generate a large-scale synthetic dataset for image-based reasoning under a strict budget constraint (≈$100). The objective was to produce thousands of images paired with reasoning-intensive questions, accurate answers, and reliable Chain-of-Thought annotations. To reduce costs, we avoided direct image generation and instead generated LaTeX and Python code, which we then compiled into images. This drastically lowered expenses, as text generation is significantly cheaper than image synthesis. 
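The "generate text, render it to an image locally" trick can be sketched with matplotlib's built-in mathtext engine, assuming matplotlib is available; no full LaTeX toolchain is required, and the formula below is arbitrary.

```python
# Sketch of the cost-saving trick: have the LLM emit LaTeX (or plotting
# code) as cheap text, then render it into an image locally for free
# instead of paying for image-model generations.
import matplotlib
matplotlib.use("Agg")            # headless rendering, no display needed
import matplotlib.pyplot as plt

def latex_to_png(latex_src, out_path, fontsize=20):
    """Render a LaTeX math snippet to a PNG via matplotlib mathtext."""
    fig = plt.figure(figsize=(4, 1))
    fig.text(0.5, 0.5, f"${latex_src}$", ha="center", va="center",
             fontsize=fontsize)
    fig.savefig(out_path, dpi=200)
    plt.close(fig)

latex_to_png(r"\sum_{i=1}^{n} x_i^2", "equation.png")
```

Mathtext only covers a subset of TeX, so a production pipeline like the one described would compile full LaTeX or Python plotting code instead, but the economics are the same: text generation is cheap, local rendering is free.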
We also allocated API credits strategically across different models, proportionally to their likelihood of producing non-compiling code, which helped maintain both cost efficiency and dataset quality. The full implementation is available here: https://github.com/AI-4-Everyone/Visual-TableQA-v2 --- - Shyam P.: Worcester Polytechnic Institute. AI/ML Engineering, Backend, Data Science. The most challenging technical problem I faced was building a reliable AI-driven test automation system that could handle the unpredictable nature of web UIs while meeting FDA compliance requirements for pharmaceutical clients. Traditional test automation breaks constantly—a button moves, a class name changes, and suddenly your entire test suite fails. We needed something that could "see" and understand the UI the way a human does, not just match XPath selectors. But here's the catch: in regulated environments, you can't have a black box making decisions. Every action needs to be explainable and auditable. How I Solved It: I architected a system combining Vision-Language Models with a RAG pipeline: The perception layer used VLMs to understand UI state semantically—identifying elements by what they are ("the submit button," "the patient ID field") rather than brittle selectors. The knowledge layer was a RAG system that grounded every decision in documented test procedures. When the AI decided to click something, it could trace that decision back to specific validation requirements. The infrastructure challenge was real—GPU inference at scale isn't cheap. I designed the pipeline to batch intelligently and cache embeddings, getting inference costs down to a level that made sense for continuous validation. The result was a 90% reduction in manual validation effort while maintaining full audit trails for FDA compliance. 
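The "cache embeddings" optimization Shyam mentions can be sketched as a content-addressed memo layer in front of whatever embedding call you pay for. The `embed_fn` callback and class name here are hypothetical, for illustration only.

```python
import hashlib

class EmbeddingCache:
    """Content-addressed embedding cache (illustrative sketch).

    Identical text chunks (UI labels, test-procedure steps) are embedded
    once; repeated validation runs then skip most paid/GPU embedding
    calls. `embed_fn` is a caller-supplied embedding function.
    """
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}        # sha256(text) -> embedding vector
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1                   # only pay on a miss
            self.store[key] = self.embed_fn(text)
        return self.store[key]
```

Keying on a hash of the content (rather than, say, a page URL) is what makes the cache safe under UI churn: unchanged text hits the cache even when its location moves.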
Why it was hard: It wasn't just an ML problem or just an infrastructure problem—it was both, plus navigating regulatory constraints that most AI systems don't have to consider. The solution required thinking across the full stack. --- - Rajagopalan R.: Amrita Vishwa Vidyapeetham. AI/ML Engineering, NLP. I worked on adding observability using Langfuse and enabling seamless model switching through LiteLLM for our organization's agentic ecosystem. On paper, both tools were straightforward to integrate, and they worked fine independently. However, once we connected them in our actual codebase, we ran into a strange issue—traces were showing up in Langfuse, but all the values were null. There weren't any obvious errors, which made it more challenging. I spent a significant amount of time debugging the integration, double-checking configurations, environment variables, and tracing logic. I went through GitHub issues for both projects and reached out in community channels to see if anyone had faced something similar. Eventually, I discovered the root cause was a version incompatibility between the LiteLLM version we were using and Langfuse v3. When we downgraded Langfuse to v2, the traces immediately started working properly. However, that downgrade caused several other dependency conflicts in our environment. To fix that, I carefully reviewed our dependency tree and reconciled package versions to produce a stable and conflict-free requirements setup. This experience taught me a lot about dependency management and the importance of version management, which often gets overlooked. --- - Anosh P.: Anosh Softwares. Backend, Mobile, Frontend. I'm building a Jewelry Management System from scratch, a fully cross-platform application using Flutter, Go, and SQL Server. The two most difficult challenges I faced were the Trial Balance TCP connection crisis and the Stone Stock Report balance discrepancy. 
The Trial Balance was a concurrency nightmare as my Go backend was spawning 80+ simultaneous database connections through worker goroutines, causing TCP timeouts on the remote SQL Server. After multiple iterations with semaphores and retry logic, I made the pragmatic call to strip out concurrency entirely, then tackled a SQL parameter limit issue on top of that. The Stock Report had a 0.55 carat difference between one day's closing and the next day's opening. I had to trace data flow across 7+ tables, fix sign convention inconsistencies, eliminate double counting from approval to purchase conversions, and completely rewrite the stored procedure's CTE structure. --- - Suryanarayan P.: BITS Pilani. Other. One difficult technical issue I faced was in a banking application where a specific scenario was not working because some logic was missing in a stored procedure. I analyzed the requirement, reviewed the stored procedure, identified the missing condition, and added the correct logic. After testing it in different scenarios, the issue was resolved successfully. --- - Varun T.: Orange Business. AI/ML Research, AI/ML Engineering. I'm working on a Stable Diffusion image generation problem and haven't found a solution yet. --- - Pranaya J.: University of Alberta. AI/ML Research, AI/ML Engineering. I'm tackling poor representation learning in low-coverage offline RL. --- - Taiwo M.: UC Cincinnati. AI/ML Engineering, Full-Stack, DevOps/Infra. The Problem: Designing a synchronization protocol for GridPilot that could handle real-time state changes for 10,000+ IoT nodes (ESP32) without causing database locking or massive latency spikes in the user dashboard. The Solution: I architected a "V2" solution that decoupled the hardware logic from the UI. Instead of direct writes, I implemented a Python Gateway Bridge that batches inputs and uses a modularized service layer (db.js) to handle Firestore state syncing.
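A batching gateway of the kind Taiwo describes can be sketched as follows. The class and parameter names are invented, and the real system presumably writes to Firestore rather than a callback; this only illustrates the coalesce-then-flush pattern.

```python
import time

class GatewayBridge:
    """Illustrative sketch of a batching gateway (names hypothetical).

    Instead of writing every IoT state change straight to the database,
    updates are coalesced per node (last write wins) and flushed as one
    batch when enough accumulate or a deadline passes, turning thousands
    of tiny writes into a few batched ones.
    """
    def __init__(self, flush_fn, max_batch=500, max_wait_s=2.0):
        self.flush_fn = flush_fn      # e.g. a Firestore batched write
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.pending = {}             # node_id -> latest state
        self.oldest = None            # time the oldest pending entry arrived

    def submit(self, node_id, state):
        if not self.pending:
            self.oldest = time.monotonic()
        self.pending[node_id] = state  # coalesce: last write wins
        if (len(self.pending) >= self.max_batch or
                time.monotonic() - self.oldest >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(dict(self.pending))
            self.pending.clear()
```

Last-write-wins coalescing is what keeps 10,000 chatty nodes from translating into 10,000 writes per interval: only each node's latest state reaches the database.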
I utilized an agentic workflow (using Claude/Gemini) to refactor the entire monolithic codebase into a scalable single-page application (SPA), effectively using AI to accelerate the refactoring of the auth and database modules by 400%. --- - Reilly G.: Wayne County Community College. Other. I approached the task of running multiple models on the same hardware without having a clue what I was doing. Consistently improving my strategies and learning from failed ones led me to utilize models like CDs. This has changed my workflow enormously. I consider that solved. --- - Vincent L.: University of Florida. AI/ML Engineering, AI/ML Research, Backend. For my NeurIPS paper, the hardest problem I solved was making Best of N chain-of-thought sampling much cheaper without losing most of the accuracy gains. The challenge was that early pruning is irreversible, and the signals you can observe during decoding (KL divergence, entropy, confidence) are noisy and can spike for reasons unrelated to correctness. I designed a progressive branch-and-prune algorithm that scores branches at each step, stabilizes the signals with windowed aggregation and smoothing, and prunes on a controlled schedule that preserves diversity early and commits later. I validated it on math reasoning benchmarks and measured both accuracy and compute savings. --- - Amitav K.: Holy Trinity Catholic High School. AI/ML Research, AI/ML Engineering. Over the past few months, I was working on a research project that involved simulating a bunch of quantum circuits, then simulating them with "noise" added (because of interference from the environment), and training a neural network to map those noisy states to their clean counterparts. At first, I tried training them for 5 qubits, which worked okay, but then for some reason I just couldn't understand, when I would scale up to 8 qubits, the models would just fail completely, their predictions being further from the ground truth than their inputs.
Even training a huge model with millions of parameters on just 100 states, it failed to overfit. I spent like a couple of hours literally just looking at the raw data, trying to figure out what was wrong, and then I realized that the noisy data was a lot smaller than the clean data. This is somewhat obvious once you really think about it, because the definition of these noise channels is just squeezing the total space of inputs into a smaller region, but it was very difficult for me because I had no physics background and went in thinking that I could just take some data, chuck it in a model, and get out sensible predictions. I didn't understand my data well enough, and once I understood my data, the solution was obvious. --- - Kaveri M.: G Narayanamma Institute of Technology and Sciences. Full-Stack. The toughest problem I faced was designing a secure trust layer for a peer-to-peer medicine platform where blockchain integrity, QR verification, and encrypted off-chain medical data had to work together without creating performance bottlenecks. I solved it by separating critical proofs on-chain from sensitive data off-chain and building a hash-linked verification system plus an ML-based matching engine to ensure both security and efficient fulfillment. --- - Gunmay J.: DTU. AI/ML Engineering, AI/ML Research. One of the most difficult technical problems I faced was during the DARPA Triage Challenge, where I led the perception pipeline for our autonomous system. We had to run multiple deep learning models (person detection, re-identification, tracking, and decision modules) in real-time on limited GPU hardware, while maintaining robustness in unpredictable disaster environments. The main challenge was scheduling and optimizing these models so they could operate concurrently without exceeding memory or latency constraints. Initially, naive parallel execution caused GPU memory spikes, unstable frame rates, and bottlenecks in downstream decision modules. 
To solve this, I redesigned the pipeline as a staged, asynchronous system. I prioritized critical inference paths, decoupled modules using message queues (ZeroMQ), and implemented dynamic batching and conditional execution (e.g., running heavier models only when required). I also optimized models using mixed precision, ONNX/TensorRT acceleration, and careful memory management to reduce redundant tensor allocations. This restructuring reduced latency significantly, stabilized GPU usage, and allowed the full perception–decision loop to run reliably under strict hardware constraints. It taught me how to think in terms of systems optimization, not just model accuracy. --- - Mehrdad S.: University of Maryland. AI/ML Research, AI/ML Engineering. Standard attacks failed against high-perturbation image watermarks like TreeRing. The challenge was breaking a black-box detector without access to its weights. I solved this by training a substitute classifier to mimic the target, then generating adversarial examples against the proxy that successfully transferred to fool the real detector. --- - Arnesh B.: IIIT Delhi. AI/ML Research, AI/ML Engineering. I recently led a research project that was submitted to ICML 2026. It was a lot of fun and I had to manage a team of 4 people, organizing and dividing work, running over 1000+ experiments, and calibrating them. --- - Quinn L.: Montana State University. AI/ML Research, AI/ML Engineering. Trained and instruction-tuned a 250M LLM with the deep learning library I wrote. On the DL library side, I was very pedantic with tests, validating gradients/activations to a tight tolerance over every case I could think of. This made it somewhat easy to make a bunch of very fast and small steps forward. On the decoder LM side, I spent a lot of time reading the foundational papers (GPT-2, BPE, attention). With a solid understanding of these, faithfully re-implementing them was quite smooth. --- - Nitish K.: NYU. 
AI/ML Engineering, Backend, DevOps/Infra. While I was working at Infoblox on their Terraform provider, we found that the legacy API changed an object's API reference whenever the object was modified. This was identified by a major client (they handle root DNS servers in Australia). Since the product was at a mature stage, it wasn't feasible to rebuild the API at the time, so we had to fix it on the client side (Terraform). The solution the team and I came up with was a fallback search: the first lookup goes through the object reference, and if the object is not found, we search by extensible attribute (metadata that can be attached to any object on the Infoblox server). Extensible attribute search was not made the default method due to its higher latency (these attributes are not indexed in the database). This was successfully developed by the team, tested by me, and deployed by the customer. --- - Saahil G.: McMaster University. AI/ML Engineering, Backend. The most difficult technical problem I faced was probably when I went in blind using React for one of my first few hackathons. AI was not at the level it is now, and I basically entered a vicious circle where the AI couldn't solve the problem at all and just kept producing more errors and hallucinating a lot. I tried fixing it using the documentation but couldn't make heads or tails of it. I tried taking the help of mentors too, but they had no clue how to fix it. It was one of the most stressful moments of my life—I had to do some workarounds to get it to half work. It's still the hardest hackathon I have ever participated in. I learned later on how to solve the error, and now I never run into the same issue :) --- - Haocheng Z.: Cornell University. AI/ML Research, AI/ML Engineering.
The hardest technical challenge I faced wasn't a single algorithm, but turning messy, real-world operations into a CRM that non-technical staff could use confidently without training. Instead of guessing from requirements, I went onsite and worked directly alongside the actual users, observing how they processed leads and cases, where they hesitated, what steps they repeated every day, and which tasks were truly automatable versus needing human judgment. That discovery work let us redesign the entire UX logic around their natural workflow rather than our assumptions. We rebuilt the experience with workflow-first navigation, progressive disclosure, and opinionated defaults across core flows like pipelines, queue/claim, and public application intake with approval gates. On the backend, we enforced invariants—status transition rules, approvals, and audit logging—so that the UI could stay simple without risking incorrect states, and we kept the full stack coherent through typed API contracts, consistent validation, and predictable loading/error patterns. The outcome was higher adoption and fewer "how do I do X?" moments, because the product matched how the team actually works day-to-day. --- - Arjun P.: . Full-Stack, Frontend, AI/ML Engineering. The hardest problem in the MEN2 Predictor project was the data. MEN2 is rare. There is no clean dataset online. I had to read hundreds of research papers with my teammate and manually extract data for 152 confirmed RET mutation carriers. Every paper reported things differently. Units were inconsistent. Some values were missing. Some cases were incomplete. The toughest issue was missing CEA values. CEA is an important biomarker along with calcitonin for predicting medullary thyroid cancer. Many papers reported calcitonin but not CEA. I did not want to drop those patients because the dataset was already small. So I used MICE, Multiple Imputation by Chained Equations, with Predictive Mean Matching. 
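MICE-style imputation of the kind Arjun used is available off the shelf. A hedged sketch with scikit-learn's `IterativeImputer`, which runs the chained-equations loop (true predictive mean matching is offered by statsmodels' `MICEData` rather than scikit-learn); the columns and values below are invented, not from the MEN2 dataset.

```python
# Hedged sketch, not the author's exact pipeline: IterativeImputer models
# each feature with missing values as a function of the others and
# iterates, i.e. MICE-style chained equations. PMM itself is not built
# into scikit-learn; statsmodels' MICEData provides it.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# rows: [calcitonin, age, CEA]; CEA not reported for two patients
X = np.array([
    [120.0, 45.0,  8.0],
    [300.0, 60.0, 15.0],
    [ 90.0, 38.0,  6.5],
    [250.0, 52.0, np.nan],
    [110.0, 41.0, np.nan],
])
X_imp = IterativeImputer(random_state=0, max_iter=10).fit_transform(X)
# observed values are untouched; only the NaNs are filled in
```

PMM's advantage over this plain regression fill, as the profile notes, is that it draws replacement values from similar real patients, so imputed CEA values stay in the clinically plausible range.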
I used calcitonin and other available clinical features to estimate realistic CEA values. PMM helped because it does not just predict a number from a formula. It picks values from similar real patients. That kept the data grounded and reduced unrealistic imputation. After cleaning and imputing, the next challenge was model behavior. In medical screening, recall matters more than accuracy. Missing a cancer case is worse than over-flagging someone. Some early models had good accuracy but lower recall. That was not acceptable. I tuned XGBoost and SVM to prioritize sensitivity. On the real clinical dataset, both reached 100 percent recall in hold-out testing. That meant zero missed documented cancer cases. The biggest lesson was simple. Think about the real-world cost of mistakes first. Then design the model around that. --- - Ananthu N.: KSR College of Engineering. AI/ML Engineering, Full-Stack. The most difficult technical challenge I faced was building an AI agent that could autonomously audit subjective website design quality—evaluating whether CTAs are "effective," themes are "consistent," and layouts match a client's "vibe"—with the reliability and reproducibility of traditional automated testing tools. The core problem was that design evaluation is inherently qualitative, yet I needed quantitative, defensible output. 
To solve this, I architected a multi-agent orchestration system with three key innovations. First, I combined BFS web crawling with stateful browser automation (Playwright + browser-use) to systematically discover and analyze pages viewport by viewport, simulating real user scrolling behavior while capturing screenshot evidence at each step. Second, I implemented a dual-model LLM pipeline in which Gemini 2.5 Flash Lite extracts structured design intent from natural language (website_type, tone, audience, primary_goal), which then constrains a more powerful Gemini 2.0 Pro agent during live analysis to prevent hallucinations and ensure every finding aligns with the specified criteria. Third, I built a real-time event-stream architecture using Flask-SocketIO that intercepts raw agent logs, parses them into semantic events (thoughts, actions, results), and streams them to the React frontend, creating a transparent audit trail where users watch the AI "think" through each design decision. Ultimately, this turns subjective design critique into a scored, screenshot-backed, PDF-exportable report that scales automated design QA in a way no existing tool does. --- - Akanji O.: University of Lagos. AI/ML Engineering, Backend, MLOps. As a 17-year-old self-taught developer, one of the most difficult technical problems I faced was improving the accuracy of an image classification model I was building with TensorFlow. The model was underperforming due to data imbalance and overfitting. To solve it, I:
- Cleaned and restructured the dataset
- Applied data augmentation
- Tuned hyperparameters (learning rate, batch size, epochs)
- Added dropout layers to reduce overfitting
- Compared different architectures and selected the most efficient one
This process helped me improve the model's accuracy significantly while reducing training time. It also strengthened my debugging skills and understanding of model behavior. --- - Apurva M.: VIT. AI/ML Research, AI/ML Engineering.
While working on the Rust compiler during Google Summer of Code, I had to spawn separate processes and talk to them through FFI. At the time, I did not understand that layer of the system well and found it mind-bending. So after the GSoC project, I built another project, typ-browser, with its core written in Rust and its UI in SwiftUI, similar to Ghostty. I did this deliberately to practice and understand FFI and process communication. --- - Ololade S.: Nigeria Maritime University. AI/ML Research, AI/ML Engineering. During my wireless power transmission project, I initially couldn't achieve efficient energy transfer because the coils weren't resonating at the same frequency. I solved it by recalculating circuit parameters, redesigning the coils, and running iterative tests until efficiency improved. It taught me structured troubleshooting. --- - Ruhaan C.: Manipal University Jaipur. AI/ML Research. The most difficult technical problem I faced recently was during my independent research on hallucinations in large language models and methods to mitigate them. My initial experiments involved using activation steering on a small Qwen-1.7B model to shift its behavior from a hallucinatory response space toward honest refusal. However, these attempts consistently failed. After investigation, I realized that smaller models may not contain a cleanly separable "hallucination subspace," making targeted steering unreliable. I then considered selectively removing neurons that consistently activated during hallucinated outputs. This approach also proved unsuitable because of neuron polysemanticity in LLMs; individual neurons encode multiple overlapping behaviors, so pruning them risked degrading unrelated capabilities.
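For readers unfamiliar with the technique: activation steering of the kind described above is commonly implemented by adding a fixed direction vector (often a mean difference of activations on contrasting prompts) to a layer's hidden state at inference time. A minimal numpy sketch, with invented toy activations rather than the author's Qwen-1.7B setup:

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift a hidden state along a fixed 'steering' direction.
    In practice this is applied inside a forward hook at one layer."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Toy steering vector: mean difference of activations collected on
# "honest" vs. "hallucinatory" prompts (values invented for illustration)
acts_honest = np.array([[1.0, 0.0], [0.9, 0.1]])
acts_halluc = np.array([[0.0, 1.0], [0.1, 0.9]])
v = acts_honest.mean(axis=0) - acts_halluc.mean(axis=0)

h = np.array([0.5, 0.5])   # some hidden state at inference time
print(steer(h, v))         # pushed toward the "honest" direction
```

The failure mode described here is that in small models no single direction `v` cleanly separates the two behaviors, so this shift degrades or does nothing.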
Then, after digging through several arXiv papers, I came across a fascinating implementation that attaches a projection matrix during the forward pass to selectively remove undesired directions from the hidden representation, rather than deleting parameters. The method works by:
- Defining a retain set of behaviors and computing a projection matrix P over them.
- Applying PCA to obtain a basis W, followed by QR decomposition to produce an orthonormal matrix Q.
- During inference, subtracting the forbidden subspace from the hidden state: h_final = h_out − (h_out @ Q^T @ Q)
This effectively erases the targeted representation directions while preserving the rest of the model's knowledge. Combined with light fine-tuning, you end up with a model that has unlearned the undesired behavior. Since the projection operation is mathematically irreversible, MRP (Metamorphosis Representation Projection) does a good job of shaping model capabilities. --- - Farouq O.: Obafemi Awolowo University. AI/ML Research. I built a lossless compression benchmark suite from scratch in C++ using five algorithms, a CLI, a streaming API, and a full benchmark harness with zero external libraries. It's the project I'm most proud of because it forced me to go deep on bit-level data structures, algorithm tradeoffs, and systems-level performance work all in one codebase. I'd been working with C++ in my systems engineering role (order processing, memory management) and wanted to tackle something where the algorithms and the systems work were equally hard. Compression fit perfectly: the algorithms require real CS depth (entropy coding, dictionary methods, block framing), but making them fast requires systems thinking — cache-aware memory access, SIMD, threading, and careful benchmarking methodology. I also wanted something I could benchmark rigorously, not just "it works." I wanted to know *how well* it works, on what data, and why.
The suite implements five compression algorithms, each written from scratch:
- Huffman — canonical Huffman coding with frequency analysis and optimal prefix codes
- LZ77 — sliding-window compression with bounded hash chains for near-linear performance
- DEFLATE — my own block-framed implementation combining LZ77 tokenization with Huffman coding (stored, fixed, and dynamic Huffman blocks). Not RFC 1951 wire-compatible, but architecturally faithful to how DEFLATE works
- RLE — run-length encoding for highly repetitive data
- LZW — dictionary-based compression (the algorithm behind GIF/TIFF)
On top of the algorithms, I built:
- A streaming API — you can compress in chunks, which matters for real-world use where you don't have the entire file in memory.
- A self-describing container format — every compressed file has a header with a magic number, algorithm ID, original size, and a CRC32 checksum computed over the original data. Decompression verifies the checksum, so corruption is caught automatically.
- Multi-threaded DEFLATE — DEFLATE blocks can be compressed independently, so I added a `--threads N` flag that parallelizes block compression. This was a good exercise in partitioning work and managing thread synchronization without introducing correctness bugs.
- AVX2 SIMD acceleration — for LZ77, the inner loop that extends byte matches (once a hash chain finds a candidate) is the hot path. I added an AVX2-accelerated version that compares 32 bytes at a time, which measurably speeds up compression on repetitive data. It compiles conditionally based on the target architecture.
- A benchmark harness with proper methodology — configurable warmup iterations (to prime CPU caches and frequency scaling), multiple measurement iterations, and median reporting (resistant to outliers). It collects compression ratio, compress/decompress speed in MB/s, peak memory delta, CPU utilization, and token-level stats (match count, literal count for LZ77/DEFLATE).
Results can be output as terminal tables, HTML reports, or CSV for further analysis.
- A 14-file test corpus — spanning four categories: text (books, logs, source code), binary (zeros, random data, repeated payloads), structured (CSV, JSON, XML, SQL), and synthetic edge cases (worst-case inputs). This matters because compression algorithms have wildly different performance characteristics depending on the data: Huffman is great on natural text, RLE dominates on runs of zeros, and LZW handles structured streams well. The benchmark exposes all of that.
I started with Huffman because it's the most self-contained: you can get a working compressor in a day and verify correctness trivially. Then I built LZ77, which introduced the sliding-window and hash-chain data structures. DEFLATE was the hardest because it combines both: you tokenize with LZ77, then entropy-code the tokens with Huffman, and you need to decide block boundaries and whether to use fixed or dynamic Huffman tables per block. I built RLE and LZW last; they're simpler but round out the suite for comparison. The benchmark harness came next. I wrote it to be usable as both a CLI tool and a C++ library so that other projects could link against it. The CLI supports `compress`, `decompress`, `benchmark` (run all algorithms across a dataset), and `compare` (side-by-side algorithms on a single file with optional HTML output). I wrote correctness tests covering edge cases (empty files, single-byte files, all-zeros, random data, files that don't compress at all) and set up CI with GitHub Actions so that every push runs the test suite and a benchmark. --- - Ayush B.: University of Mumbai. AI/ML Research, AI/ML Engineering. (1) Work: Reduced multi-hop query latency from 42s to 10–12s with graph-based recommendation (attribute bucketing + weighted seed expansion) on an Agentic Knowledge Graph.
(2) Projects: Yuntun—Implemented Megatron-style tensor parallelism with custom autograd for column/row/vocab-sharded layers and correct gradient flow. Weigou—Built a 4D-parallel training stack (TP/CP/PP/DP) with ring-attention CP and pipeline parallelism; unified process groups and bucketed gradient sync kept training correct. --- - Kundann D.: Rutgers. Data Science, Backend. I come from a data engineering background, so I know data movement and workflow building, and I am currently creating a web application for a project. I am using AI and vibe coding for the application, but I am building the backend deployment and the pipeline integration on my own.
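A footnote on the Megatron-style tensor parallelism mentioned in Ayush B.'s Yuntun entry: the core trick for a column-sharded linear layer can be sketched in numpy. This is a single-process toy under assumed shapes; a real implementation shards the weight across GPUs and all-gathers the partial outputs.

```python
import numpy as np

def column_parallel_linear(x, W, n_ranks=2):
    """Megatron-style column parallelism: each 'rank' holds a vertical
    slice of W, computes its slice of the output independently, and the
    slices are concatenated (an all-gather in a real multi-GPU setup)."""
    shards = np.array_split(W, n_ranks, axis=1)  # split output columns
    partials = [x @ w for w in shards]           # one matmul per rank
    return np.concatenate(partials, axis=-1)

# Sanity check: sharded result matches the unsharded matmul exactly
x = np.random.default_rng(0).normal(size=(4, 8))
W = np.random.default_rng(1).normal(size=(8, 6))
assert np.allclose(column_parallel_linear(x, W), x @ W)
```

Row-sharded layers are the dual case: each rank holds a horizontal slice of W and a matching slice of x, and the partial outputs are summed (an all-reduce) instead of concatenated.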