Byteboard’s Philosophy on AI Assistance and Its Impact on the Byteboard Interview
In the immediate term, here’s how the Byteboard team is thinking about tools like ChatGPT and Copilot:
- We’ve seen cases where candidates seemingly attempted to rely on Copilot or other AI tools without fully understanding the context of the questions as asked; those candidates received unambiguous Poor ratings under our rubrics.
- [Part 1-specific] We’ve found that ChatGPT is not well-suited for the type of ambiguity in our Design Doc exercises.
  - We ask candidates to weigh trade-offs, handle ambiguity, and make decisions within the context of the design document. ChatGPT does not respond well to that context; its answers tend to dance around the questions or offer suggestion lists that are ill-suited to the specific situation.
  - At best, a candidate could use ChatGPT to seed some potential ideas, but we’ve found it can’t generate responses that would result in a strong performance on our assessment.
- [Part 2-specific] We’ve tested GitHub Copilot on some of our Coding exercises and found mixed results. Our coding exercises consist of a short series of tasks of increasing complexity and ambiguity.
  - With a bit of prompting, Copilot can quickly generate strong answers to the easier opening tasks, but by Task 3 a candidate needs significant expertise, and clarity about their goals and implementation plan, to use GitHub Copilot well.
  - As its name suggests, GitHub Copilot works well as an assistant, not as a leader or decision-maker. When questions require decision-making and trade-off analysis (as our later questions always do), candidates can’t fake those skills with Copilot.
- [Also Part 2-specific] It’s also worth noting that both Copilot and ChatGPT are prone to making technical errors that require expertise to catch and fix
  - e.g., calling functions that don’t exist or missing edge cases (see the illustration just after this list)
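To make those failure modes concrete, here is a hypothetical illustration in Python (invented for this post, not drawn from an actual Byteboard exercise). The commented-out version resembles plausible assistant output: it calls a list method that doesn’t exist and mishandles two edge cases. Catching and fixing both requires exactly the kind of expertise our rubrics reward.

```python
# Hypothetical illustration of common AI-assistant errors; the function name
# and scenario are invented for this example, not taken from a real exercise.

# An AI-suggested version might look plausible but fail in two ways:
#
#   def median_latency(samples):
#       samples.sort_ascending()           # no such list method (sort() exists)
#       return samples[len(samples) // 2]  # wrong for even-length input, crashes on []
#
# A candidate with real expertise would notice and fix both issues:

def median_latency(samples: list[float]) -> float:
    """Return the median of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("median_latency requires at least one sample")
    ordered = sorted(samples)  # sorted() is the real built-in
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # average the two middle values


if __name__ == "__main__":
    print(median_latency([12.0, 3.5, 7.2]))       # 7.2
    print(median_latency([12.0, 3.5, 7.2, 9.9]))  # 8.55
```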
Ultimately, these are the tools of the future, and we believe that an engineer who knows how to use them successfully in a Byteboard assessment would also know how to use them successfully on the job.
But, like real-world work, the Byteboard assessment is designed to require a level of flexible thinking, decision-making, and context-awareness that these tools can’t provide on their own. Even with these tools widely available, we’re confident that achieving a Strong rating still requires true job-related expertise.
Byteboard Initiatives to Test and Mitigate the Impact of AI Tools
Additionally, here are some of the initiatives we are undertaking in the short and longer term as the impact of ChatGPT and similar tools grows:
- We continue to add complexity and ambiguity to the coding tasks for our SWE assessments
- We’re rethinking how we assess code reading, possibly replacing the simpler coding tasks this year
- Our team is constantly creating new interviews and updating existing ones; part of that process is testing them with AI tools to ensure every question requires candidates to guide the AI assistance and understand its responses in context
- We’re in the midst of an in-depth revision of our Skills Map, and part of that is thinking through the skills required of a Software Engineer in the AI-assisted age and ensuring we give you clear recommendations on what those are
- We’re considering a variety of mechanisms to more deeply assess how well a candidate understands the goals and context of a question (e.g., recording a video of the candidate discussing their solution), particularly if they used AI assistance
- We’re also looking into additional plagiarism-detection methods and can keep you updated on those as well
  - For example, our team reviews logs of a candidate’s copy/paste behavior, checking for text contained in our interview materials, to help determine potential use of AI assistance (a simplified sketch of such a check follows this list).
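As a rough sketch of what such a review step could look like, the Python below flags paste events whose text appears verbatim in the interview materials. This is a simplified illustration under stated assumptions; the event format, field names, and length threshold are all invented here and do not describe Byteboard’s actual detection pipeline.

```python
# Simplified, hypothetical sketch of a copy/paste review step. The PasteEvent
# shape, normalization scheme, and min_length threshold are illustrative
# assumptions, not Byteboard's real pipeline.

from dataclasses import dataclass


@dataclass
class PasteEvent:
    candidate_id: str
    pasted_text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide a match."""
    return " ".join(text.lower().split())


def flag_suspicious_pastes(
    events: list[PasteEvent],
    interview_materials: list[str],
    min_length: int = 40,
) -> list[PasteEvent]:
    """Flag pastes whose text appears verbatim in the interview materials."""
    corpus = [normalize(doc) for doc in interview_materials]
    flagged = []
    for event in events:
        snippet = normalize(event.pasted_text)
        if len(snippet) < min_length:
            continue  # short pastes are too noisy to be meaningful
        if any(snippet in doc for doc in corpus):
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    materials = ["Task 3: Extend the rate limiter to support per-user quotas."]
    events = [PasteEvent("cand-123", "Extend the rate limiter to support per-user quotas")]
    print([e.candidate_id for e in flag_suspicious_pastes(events, materials, min_length=20)])
    # -> ['cand-123']
```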
What happens if Byteboard Graders identify a candidate who may have used AI assistance during their interview?
In addition to the work above, our team regularly conducts audits to find and remove any leaked material available online. Graders are trained to identify responses that may have been plagiarized or generated using AI tools, and we review each potential case we find.