A Sophomore Intern's Playbook for Agent Dev Interviews • Joye Personal Blog

After interviewing at nearly 10 AI startups, I want to talk about what Agent roles actually test — and what matters 100x more than “memorizing answers.”

Preface#

My name is Joye. I am a sophomore at the University of Melbourne, studying Computing and Software Systems. Over the past few months I have been an AIGC R&D intern at Tezign in Shanghai, mainly working on Atypica — a multi-agent system for business research.

I have recently been looking for new opportunities and have interviewed at close to 10 companies, all AI startups and overseas startups, with roles focused on Agent development and LLM application engineering. Looking back after all those interviews, I realized that interview experiences for this direction are still quite scarce on the Chinese internet — unlike frontend or backend, where there is a systematic set of “eight-part essays” to grind through. Agent development interviews feel more like a deep conversation about “whether you truly understand what you are building.”

So I am writing this article partly to organize and reflect on my own interview notes, and partly to offer some reference for others exploring this same direction.

But I want to say one thing first: this article is not just for people preparing for interviews; it is equally for those who are learning Agent development and building their own Agent projects.

Why? Because there is a fundamental difference between Agent interview questions and frontend “eight-part essays” — you might never use the frontend trivia you memorized, but the questions asked in Agent interviews are real design decisions you genuinely need to consider while building projects. “What do you do when context overflows?” is not an interview question; it is an engineering problem your Agent will definitely hit on its 20th turn of conversation. “How do you prevent an LLM from misusing tools?” is not an interviewer trying to stump you; it is the bug report you will receive in the first week after deploying to production.

Think about it from another angle: if you have already taken context engineering seriously, built robust safeguards for tool-calling, and thought through prompt-injection defenses in your own project — then these are not “questions to prepare for,” but real battle stories you can tell naturally. That is what truly earns you bonus points. Going forward, I will carry these questions with me into every new project I build.

What Agent Role Interviews Actually Test#

Differences from Traditional Development Interviews#

After so many rounds, my biggest feeling is this: Agent development interviews almost never have “standard answers.”

In a traditional frontend interview you can memorize closures, the event loop, and virtual DOM diff algorithms, and that gets you 70–80% of the way there. But the Agent direction is different. Many of the questions interviewers ask are open-ended. They are not testing what you remember; they are testing what you have been through. “What do you do when your Agent gets stuck in an infinite loop?” — someone who has not stepped in that puddle simply cannot make up a convincing answer.

From what I have observed, interviewers care about three layers:

Can you use it? — Have you used LangChain? How did you set up RAG? How did you design your MCP server?
Do you understand why it was designed this way? — Why do structured prompts work better? Why do memory hierarchies matter? Why did you choose the Vercel AI SDK?
Can you make trade-offs under engineering constraints? — What if your token budget is tight? How do you balance response speed versus quality? How do you handle model hallucinations?

The first layer comes from experience, the second from understanding, and the third from judgment. The further down you go, the harder it is to prepare for, but the more it reveals your engineering maturity.

Full Taxonomy of Interview Questions#

I have grouped the questions I encountered by topic. Together they cover the main directions Agent development interviews tend to explore:

RAG and Retrieval Augmentation

Differences and applicable scenarios for multi-hop versus single RAG
How to do retrieval augmentation
When to use vector retrieval versus graph-database retrieval in RAG
How to do data cleaning for RAG
The evolution path from RAG → Agentic Search → Agentic Memory

Agent Architecture and Orchestration

Agent routing design: how to decide which sub-agent to call
Fallback mechanism design for Agents
Are sub-agents parallel or asynchronous? Can they communicate with each other?
How to prevent an LLM from misusing tools
How to handle infinite tool-calling loops
How to design an MCP server
Overall framework design for Agent workflows
How to implement callback mechanisms

Context Engineering and Token Management

How to save tokens
What to do when context overflows (besides FIFO)
How to convert short-term memory into long-term memory
Prompt compression strategies
Where to place prompt caches

Prompt Engineering and Safety

Why structured prompts improve response quality
How to design subtle emotional chain-of-thought
How to prevent prompt injection attacks
How to turn abstract requirements into concrete ones (making AI “smarter” or “more human”)
Whether prompt-injection defenses are in place at the start and end of interviews
Negative effects of one-shot prompting: insufficiently differentiated examples can limit the model’s thinking

LLM Foundations and Model Engineering

Attention mechanism (common knowledge)
Pretraining versus LoRA fine-tuning
Design and limitations of LLM-as-a-Judge
Trade-offs between response speed and quality
How to avoid OOC (Out of Character)
How to handle LLM hallucinations

AI Infra and Engineering

Design of model gateway / Agent gateway
Building an eval platform
DPO / APO algorithm evaluation
Why use the Vercel AI SDK

Industry Awareness and System Design

How Manus, OpenClaw, and OpenCode approach architecture design
Major Agent application scenarios: AI search, Chat-to-BI chart generation, Vibe Coding
Relationship and differences between Skill and MCP
Experience with LangChain / LangGraph

If you look at this list, you will notice that none of these can be solved by “memorizing.” They test your holistic understanding of the system and your judgment in real engineering situations.

Interview Questions That Grew Me the Most#

Rather than listing “reference answers” for each question, I want to share a few moments from interviews that genuinely changed how I think about Agent development. Some I answered well at the time, some I did not, but all of them continued to shape my understanding long after the interview ended.

”What do you do when context overflows? What besides FIFO?”#

Almost every company asked this, and interviewers are usually not satisfied with “discard the oldest messages with FIFO.”

The first time I was asked, I really did only mention FIFO. The interviewer pushed back: what if the user stated a very important constraint at the beginning of the conversation, and your FIFO just threw it away? I froze.

Only when I reflected afterward did I realize that this question is fundamentally about context engineering with a global view. In the Atypica project, we were already doing this — memory tiering (working memory / short-term / long-term), summarization compression, retrieval stitching, token-budget control — but I had not articulated them as a unified pipeline at the time.

This experience taught me one thing: having done something is not the same as being able to explain it. Many times you have already solved a problem in a project, but if you have not abstracted it into a systematic solution, you will stumble in an interview. After that, I deliberately started restructuring every design decision in Atypica using the “problem → solution → trade-off → result” framework.

”How do you prevent an LLM from misusing tools?”#

At first this sounds simple — just add permission controls, right? But the interviewer wants far more than that.

In Atypica, we built an asynchronous state machine for tool-calling, including checkpoints, retries, idempotency, timeout rollbacks, and state recovery. But “preventing misuse” is not just a post-hoc fallback problem. The key is upfront constraint: Is the tool schema design clear enough? Are the function-calling descriptions unambiguous? Does the routing logic correctly dispatch requests to the right tools?

In one interview the interviewer kept pushing: if the LLM enters an infinite loop repeatedly calling the same tool, what do you do? I mentioned our watermark and deduplication mechanisms, plus checkpoint-resume design. The interviewer nodded, but then asked something that stuck with me: “Have you considered that, rather than doing so much runtime defense, you could clarify the boundaries of tool usage at the prompt level?”

That woke me up. Many Agent reliability issues do not originate at the runtime layer but at the prompt layer — if the instructions you give the LLM are inherently ambiguous, no amount of engineering fallback will do anything other than clean up after a bad prompt. That is why I have come to place more and more weight on structured prompt design and guardrails as upfront constraints.

”Why do structured prompts work better than natural-language prompts?”#

I initially thought this was easy to answer: “Because structured information is easier for the model to parse.” But the interviewer followed up: “From the perspective of the Attention mechanism, can you explain why?”

At the time my understanding of Attention stopped at “Q, K, and V do a dot product to compute relevance.” I had not thought deeply about how it relates to prompt format. The interviewer gave a very enlightening angle: structured prompts (using XML tags, JSON Schema, or clear delimiters) actually help the model allocate attention — they give key information clearer “anchors” in the token sequence, reducing the difficulty of “finding the point” in long contexts. Natural-language phrasing tends to bury key information in large amounts of modifiers and transition sentences, increasing attention dispersion.

This conversation made me realize that many of the prompt optimizations we make at the engineering layer have solid model-mechanism foundations behind them. Knowing the what is not enough; knowing the why matters — especially in interviews, where an interviewer can quickly tell whether you are “reciting experience” or “truly understand."

"How do you turn abstract requirements into concrete ones?”#

This might be the “softest” hard question I encountered.

The interviewer’s original words were roughly: “When a user says ‘make the AI a bit smarter’ or ‘give it more of a human feel,’ how do you turn that vague request into a deliverable technical plan?”

This happened to be something I spent a lot of time on in Atypica. We designed a counter-questioning system: when a user’s need is vague, the Agent does not guess blindly but uses explicit and implicit intent recognition to break the fuzzy requirement down into structured constraints — what is the goal, who is the audience, what are the style preferences, what are the limitations.

In the interview I walked through the full design of this system. After listening, the interviewer asked what I thought was a very sophisticated question: “How do you evaluate the quality of counter-questioning? How do you know the questions you ask are effective and not just wasting the user’s patience?”

Honestly, I did not answer this well at the time. I thought about it for a long time afterward and felt it might need to be evaluated through LLM-as-a-Judge or indirect user-satisfaction metrics — but I have not deeply practiced in that direction yet. That is what interviews do: they pinpoint the exact boundary of your knowledge system.

”How do you prevent prompt injection attacks?”#

This was asked in multiple interviews, but what truly benefited me was not the interview itself — it was the homework I did after.

In the interviews I roughly discussed some basic defense ideas: input filtering, role locking, output validation, and so on. But interviewers clearly expected a more systematic answer. Afterward I went and carefully read the technical blogs on prompt safety from OpenAI and Anthropic, and discovered that this field actually has two very clear core ideas:

One is OpenAI’s Instruction Hierarchy — the core idea is to set priorities for instructions from different sources: system prompt > developer instruction > user input, so the model knows whom to listen to when instructions conflict. The other is Anthropic’s sandboxing approach — isolating untrusted external content in a restricted execution environment to limit the scope of malicious instructions at the architecture level.

Why do I highlight this? Because I genuinely recommend that everyone building Agents take the time to read OpenAI’s and Anthropic’s technical blogs. They are not abstract academic papers; they are excellent, first-hand engineering-practice recommendations. After reading them, your understanding of many issues will level up. Papers are worth reading too — I will later publish a curated paper series on my personal blog, selecting and interpreting the papers most valuable to Agent development practice. I am also building an RSS subscription push that aggregates the latest Agent development practices from major AI vendors every day or week, plus my own commentary and filtering — not pure reposting, but opinionated curation. If you are interested, follow my blog at joyehuang.me ↗ for updates.

”Do you know the designs of Manus / OpenClaw / OpenCode?”#

This is not a question of technical depth but of industry breadth. The interviewer wants to know: are you paying attention to what is happening in this industry? Are you only heads-down in your own project, or are you also looking up to see how others are solving similar problems?

At the time I knew something about Manus and OpenCode (I had studied OpenCode’s architecture), but I did not know much about OpenClaw and my answer was weak. This question reminded me: Agent development is still evolving rapidly, and testing industry awareness is essentially testing your learning speed and information sensitivity.

If you usually do not look at GitHub Trending, do not follow AI discussions on Twitter/X, and do not study newly released open-source projects, you will be very passive when you run into these questions. I later developed a habit: spend fixed time every week looking at changelogs and design docs for projects like Manus and OpenCode. You do not need to go deep, but you should at least know which way the industry’s wind is blowing.

What Matters More Than “Grinding Questions”#

About the Interview Itself#

Reflection is the biggest lever in interviewing.

This is the point I want to stress most: an interview without reflection is meaningless. If recording is not allowed, then the moment the interview ends, find a place to sit down while your memory is still fresh and write down every question and your answer. Especially the ones you could not answer or answered poorly — those are the most valuable reflection material. The interview-question list you see in this article was built up round by round exactly this way.

Ask the interviewer to reflect for you.

Many people overlook the value of the Q&A round. I later developed a habit of always asking two questions in that round: “What did I not do well today that I should improve?” and “What did I do okay?” You might think this is too direct, but most interviewers are actually very willing to answer. Their feedback is far more precise than your own guesswork, and it also shows that you are someone actively seeking growth.

Besides those two questions, you can also ask “What would be the first project I would work on if I joined?” — this shows your sincerity and helps you judge whether the actual work matches the job description.

Interviewers are also your network.

After the interview, regardless of the outcome, try to add them on WeChat or LinkedIn. You never know when that connection will become valuable. I have a real story: I interviewed at a fintech Agent company focused on B2B, passed the first round, but the interviewer felt their product direction might not match my interests and proactively referred me to another company. I am currently in that company’s written-test pipeline. Opportunities sometimes come exactly from an “unsuccessful” interview — you never know where a door leads.

On Choosing Offers: Advice for Interns and Startup-Minded People#

Use one question to filter out unreliable startups.

When interviewing at startups, I always ask one question in the Q&A round: “Compared with competitors, what is your product’s core advantage?” This question seems simple, but its filtering power is excellent — if a startup’s interviewer cannot even clearly articulate their own product’s differentiation, the company’s direction is probably vague. If a startup is not confident in its own product, what will you learn there? Avoid them.

Do not be dazzled by “angel round of $X million.”

Funding amount is only one dimension for evaluating a startup, absolutely not the only one. Of course, a company with zero funding deserves extra caution. But having funding does not mean everything is fine — you also need to look at whether the product direction makes sense, the team’s background, whether you can actually learn things there, and whether there is a mentor to guide you. An internship is not just selling your time; it is a learning investment. Think about where your time will yield the highest return.

For myself, the things I care about most when choosing an offer are: is this product interesting? Am I genuinely interested? Is it going overseas? Is it toC? These preferences vary from person to person, but the important thing is to have your own decision framework rather than being led around by funding amounts or company names.

Take the initiative; do not be intimidated by “full-time” labels.

On Boss Zhipin and various recruiting platforms, many positions are listed as full-time. But if you feel the role direction is a great fit, you can absolutely reach out proactively, explain your situation: “Although I am still studying, if your company has internship needs, I hope you will take a look at my resume.” The worst outcome is simply being rejected, but very often they actually do have internship headcount — it just was not posted separately.

As a side note: if your internship experience is solid enough, you can even interview directly for full-time roles. I once interviewed for a full-time Agent development position, got grilled hard, and somehow passed — the interviewer only realized at the end that I had not yet graduated. This made me realize: do not limit yourself. Other people’s expectations of you are often not as low as your own.

Industry Observations: Reading the Wind in Agent Development from Interview Questions#

Interview questions themselves are an industry barometer — what interviewers ask, to some extent, reflects what the industry cares about most right now. From my nearly 10 interviews, I noticed several clear trends.

RAG Is Evolving Toward Agentic Directions#

Almost every interview asked about RAG, but none stopped at the basic “retrieve + generate” level. Interviewers care more about Agentic Search and Agentic Memory — that is, retrieval is no longer a passive pipeline but something the Agent itself decides: “when to search, what to search for, and how to use the results.” This is a qualitative leap from tool to capability.

Three Major Application Scenarios Keep Coming Up#

In conversations with interviewers from different companies, three application scenarios were mentioned most frequently: AI search (the upgrade from keyword matching to semantic understanding), Chat-to-BI (using natural language to generate data-analysis charts), and Vibe Coding (using AI to assist or even lead code writing). These three directions each impose different requirements on Agent architecture design, but their common thread is the pursuit of greater “autonomy” and “reliability.”

The AI Infra Layer Is Starting to Be Taken Seriously#

In the early days of Agent development, people focused more on upper-layer application logic — how to write prompts, how to chain steps together. But now interviews increasingly include AI Infra layer questions: how do you design a model gateway? How does an Agent gateway handle routing and rate limiting? How do you build an eval platform? How much do you know about alignment algorithms like DPO/APO? This shows the industry is moving from “demo phase” to “production grade,” and infrastructure maturity is becoming a key factor in product reliability.

MCP and Skill Systems Are Standardizing#

MCP (Model Context Protocol) came up frequently in interviews. It represents a standardization direction for Agent capability invocation. At the same time, the concept of Skill is emerging from products like Manus — packaging Agent capabilities into reusable, composable modules. For developers, this trend means: in the future, Agent development may no longer be building from scratch, but assembling and orchestrating within a standardized capability marketplace.

Toolchain Choices Reflect Engineering Taste#

Why use the Vercel AI SDK instead of LangChain? Questions like this may seem to test technology choices, but they are actually testing your understanding of the toolchain ecosystem and your own engineering judgment. In an interview, being able to clearly explain “why I chose this, what the trade-offs are, and when I would switch to something else” is far more convincing than simply saying “I have used xxx.”

Closing Thoughts#

After so many rounds, my biggest feeling is this: an interview is a mirror — it does not lie.

It will precisely illuminate every gap in your knowledge system, and it will make you discover things you thought you understood but had merely “used.” The distance between “I have used Agents in a project” and “I can design a reliable Agent system” is not more project experience, but deeper understanding and more systematic thinking.

As a sophomore still in school, I know I still have a lot to learn. But this intensive interview experience gave me a far clearer panoramic view of Agent development than I had before — not just technically, but also in terms of industry judgment.

One last line for friends on the same path:

Build fast, learn faster. The interview is not the finish line; it is the starting point of the next stage of growth.

If this article helped you, feel free to share your own interview experiences in the comments. You can also find me on GitHub ↗.