Harender Bisht on Explainable AI and Decision Making

In finance, in reinsurance, in healthcare, the moment that matters most is rarely the moment a system produces an answer. It is the moment afterward, when a regulator, an auditor, or a customer asks why. A model that denies a claim, flags a transaction, or scores a risk has not finished its job when it returns a number. It has finished its job when a human being can stand behind that number and explain how it was reached. Where the stakes are high, an output nobody can account for is not an asset. It is a liability waiting for the question that exposes it.

Harender Bisht has spent his career on the answering side of that question. A Solution Architect with more than a decade of experience across IT, finance, and reinsurance, he designs enterprise systems in which decisions have consequences and a paper trail. He is now completing a PhD in artificial intelligence, with published research on explainable AI, ethics, and high-stakes decision systems. That combination, the architect who has shipped the systems and the researcher who studies how to make them legible, shaped how he judged Code Olympics 2026: not by whether a tool produced output, but by whether the output could be understood, trusted, and defended by someone who had to act on it.

Bisht evaluates software the way a regulated industry forces you to, asking not only whether the result is right, but whether anyone can prove why it is right.

Code Olympics 2026, organized by Hackathon Raptors, challenged teams to ship working software in 72 hours under four simultaneous constraints: a core technical rule, a strict line budget, an assigned project domain, and a programming language each team did not choose. The constraints produce small programs, but they also strip away the abstractions that normally let an engineer hide an opaque decision behind a framework. What is left is the logic itself, exposed, and that is exactly where Bisht’s attention goes.

His batch gave him an unusually clean set of cases to read through that lens. Several teams, knowingly or not, had built tools whose entire value rested on whether their output could be trusted, and Bisht assessed them as he would assess a decision system bound for production in a regulated domain.

A Score Means Nothing Without Calibration

The submission that most directly matched Bisht’s research interests was star trek’s MindMeld, a quiz platform whose premise is a quiet rebuke to most assessment software. It measures not only whether an answer is correct, but how certain the person was when they gave it, and then reports back where confidence and competence diverge: the lucky guesses, the confident errors, the genuine gaps a simple score would have hidden.

To most reviewers that is a clever feature. To Bisht it is the heart of trustworthy decision-making. “A system that reports an answer without reporting its confidence is telling you half the truth,” runs the logic he applies to his own field. “The dangerous output is not the one that is wrong. It is the one that is wrong and certain.” Calibration, the relationship between how sure a system is and how often it is right, is one of the central problems of high-stakes machine learning, because a model that is confidently wrong does far more damage than one that signals its own doubt. MindMeld built that idea into a 72-hour project, surfacing the difference between knowing and guessing rather than collapsing both into a single grade. In Bisht’s reading, a team that understood why calibration matters had understood something most production systems still get wrong.
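The idea can be made concrete in a few lines. What follows is a minimal sketch in Python, not MindMeld's actual code: the function names, the quadrant labels, and the 0.7 confidence threshold are all assumptions chosen for illustration. It sorts each response by confidence and correctness, and reports the gap between average stated confidence and observed accuracy.

```python
def classify(confidence, correct, threshold=0.7):
    """Sort one response into a quadrant of confidence versus correctness.

    The 0.7 threshold is an illustrative assumption, not a standard value.
    """
    if confidence >= threshold:
        return "confident_and_right" if correct else "confident_error"
    return "lucky_guess" if correct else "known_gap"


def calibration_gap(responses):
    """Mean stated confidence minus observed accuracy.

    Positive means overconfident overall, negative means underconfident.
    """
    if not responses:
        raise ValueError("need at least one response")
    mean_confidence = sum(conf for conf, _ in responses) / len(responses)
    accuracy = sum(1 for _, ok in responses if ok) / len(responses)
    return mean_confidence - accuracy


# Hypothetical quiz results: (stated confidence in [0, 1], answered correctly)
responses = [(0.9, True), (0.9, False), (0.3, True), (0.3, False)]
labels = [classify(conf, ok) for conf, ok in responses]
```

A plain score would report this quiz-taker as 50 percent correct. The labels show that one of the two correct answers was a guess and one of the two errors was made with high confidence, which is the distinction the paragraph above describes.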

Ethics Is an Architecture Decision, Not a Disclaimer

State Matrix’s Privacy Redactor drew his attention for a different reason, one that sits at the intersection of his architecture work and his published interest in ethics. The tool is a zero-dependency utility that detects and strips sensitive information out of logs, API responses, and crash reports before they are shared with teammates, bug trackers, or the public.

Bisht has spent years in domains where data handling is not a preference but a legal obligation, and he reads a tool like this as a structural answer to a structural problem. “Most data leaks are not the result of an attack,” he observes of the pattern. “They are the result of ordinary people sharing a log that contained something it should not have. The fix is not a policy telling them to be careful. It is a tool that makes the careful path the default one.” That distinction, between governance imposed as a rule and governance built into the workflow, is the difference between compliance that holds and compliance that depends on everyone remembering. Privacy Redactor put the safeguard at the point of leakage, before the sensitive data ever left the building, and that placement is the part Bisht rewarded. Ethics, in his architecture, is something you design into the path of least resistance, not something you write into a disclaimer nobody reads.
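A safeguard of this kind can be sketched briefly. The Python below is an illustration only, not Privacy Redactor's implementation: the three patterns (email addresses, IPv4 addresses, and key=value secrets) and the placeholder labels are assumptions, and a real redactor would need a far broader and more carefully tested rule set.

```python
import re

# Each rule pairs a compiled pattern with its replacement text.
# These patterns are deliberately simple, illustrative assumptions.
RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IPV4]"),
    # Keep the key name so the log stays readable, drop only the value.
    (re.compile(r"(?i)\b(password|token|api_key)=\S+"), r"\1=[SECRET]"),
]


def redact(text):
    """Apply every rule in order and return the sanitized text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


line = "login failed for ana@example.com from 10.0.0.7 password=hunter2"
safe_line = redact(line)
```

Running the log line through `redact` before it is pasted anywhere is the "careful path as default" in miniature: the person sharing the log does not have to remember what to look for.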

Auditability Is the Feature Finance Cannot Live Without

If MindMeld spoke to his research and Privacy Redactor to his ethics, TeamHM’s Chronofold spoke directly to the finance and reinsurance systems he has built. Chronofold is a zero-dependency, fault-tolerant, event-sourced state engine: feed it a log of events (banking transactions, blockchain transfers, inventory updates) and it replays them, validating each against declarative rules to reconstruct the current state.

Event sourcing is not a flashy choice, and Bisht’s regard for it is precisely the point. “In a financial system, the current balance is not the truth. The sequence of events that produced it is the truth,” he explains, framing the architecture in the terms his industry uses. An event-sourced design keeps every change as an immutable record, which means the state is not just known but explainable: you can replay exactly how it was reached, prove it against the rules, and answer an auditor’s question without guesswork. For a team to choose that pattern under a hackathon’s time pressure, when a simpler mutable approach would have demoed just as well, signaled to Bisht an instinct for the kind of accountability that regulated systems are built on. Chronofold was not the most visually striking project in the batch. It was one of the most defensible, and in his domain those are not the same thing.
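The pattern is small enough to show. The following Python is a sketch under stated assumptions, not Chronofold's code: the event shape (`type`, `account`, `amount`), the rule names, and the choice to skip and record rejected events are all hypothetical. It replays a log, checks each event against a list of declarative rules, and rebuilds the balances from nothing but the events.

```python
# Declarative rules: a name plus a predicate over (current state, event).
# Rule names and the event fields are illustrative assumptions.
RULES = [
    ("amount must be positive", lambda state, e: e["amount"] > 0),
    ("known event type", lambda state, e: e["type"] in ("deposit", "withdraw")),
    (
        "sufficient funds",
        lambda state, e: e["type"] != "withdraw"
        or state.get(e["account"], 0) >= e["amount"],
    ),
]


def apply_event(state, event):
    """Return a new state with the event applied; the old state is untouched."""
    delta = event["amount"] if event["type"] == "deposit" else -event["amount"]
    new_state = dict(state)
    new_state[event["account"]] = state.get(event["account"], 0) + delta
    return new_state


def replay(events):
    """Rebuild state from the log, recording which events broke which rules."""
    state, rejected = {}, []
    for index, event in enumerate(events):
        failed = [name for name, check in RULES if not check(state, event)]
        if failed:
            rejected.append((index, failed))  # skip the event, keep going
            continue
        state = apply_event(state, event)
    return state, rejected


log = [
    {"type": "deposit", "account": "A", "amount": 100},
    {"type": "withdraw", "account": "A", "amount": 30},
    {"type": "withdraw", "account": "A", "amount": 500},
    {"type": "refund", "account": "A", "amount": 10},
]
state, rejected = replay(log)
```

The returned `rejected` list is the auditable part: for every event that did not change the state, it names the position in the log and the rule that stopped it.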

Making the Invisible Process Visible

A recurring strength Bisht credited across the batch was a willingness to expose process rather than hide it, which is explainability expressed in a different register. Two projects in particular turned an algorithm’s hidden reasoning into something a person could watch.

PathFinder’s Recursive Maze generates a maze and then runs three pathfinding algorithms, breadth-first search, depth-first search, and A-star, in parallel, visualizing how each one explores the space differently in real time. Code Warriors’ Sortviz does the same for sorting, rendering the step-by-step process of an algorithm in a zero-dependency Go terminal program. Neither is a novel algorithm. What both do is make the decision process legible, and that, to Bisht, is a discipline worth more than novelty. “The reason most people distrust an algorithm is that it is a black box to them,” he notes. “The moment you can see it work, see why A-star reaches the goal faster than depth-first search, the box opens and the trust follows.” A tool that shows its reasoning is a tool that can be taught, debugged, and defended, and in high-stakes settings the ability to see why a system did what it did is often more valuable than a marginal gain in the result itself.
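The difference Bisht describes can be observed without any graphics. This Python sketch is not taken from Recursive Maze; the grid encoding (`#` for walls), the function names, and the Manhattan-distance heuristic are assumptions for illustration. It records the order in which each strategy expands cells, which is the raw material a visualizer would animate.

```python
import heapq
from collections import deque


def neighbors(grid, cell):
    """Yield the open cells directly right, below, left of, and above a cell."""
    row, col = cell
    for d_row, d_col in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
            yield (r, c)


def explore(grid, start, goal, strategy):
    """Return cells in the order they were expanded by 'bfs', 'dfs', or 'astar'."""

    def heuristic(cell):
        # Manhattan distance to the goal; admissible on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    if strategy == "astar":
        # Heap entries: (estimated total cost, heuristic as tie-break, cell).
        frontier = [(heuristic(start), heuristic(start), start)]
    else:
        frontier = deque([start])
    cost = {start: 0}
    expanded, trace = set(), []

    while frontier:
        if strategy == "astar":
            _, _, cell = heapq.heappop(frontier)
        elif strategy == "bfs":
            cell = frontier.popleft()  # oldest first: a queue
        else:
            cell = frontier.pop()  # newest first: a stack
        if cell in expanded:
            continue
        expanded.add(cell)
        trace.append(cell)
        if cell == goal:
            break
        for nxt in neighbors(grid, cell):
            if nxt in expanded:
                continue
            if strategy == "astar":
                new_cost = cost[cell] + 1
                if new_cost < cost.get(nxt, float("inf")):
                    cost[nxt] = new_cost
                    estimate = new_cost + heuristic(nxt)
                    heapq.heappush(frontier, (estimate, heuristic(nxt), nxt))
            else:
                frontier.append(nxt)
    return trace


open_grid = ["...", "...", "..."]
traces = {
    name: explore(open_grid, (0, 0), (2, 2), name)
    for name in ("bfs", "dfs", "astar")
}
```

On this small open grid, comparing `len(traces["bfs"])` with `len(traces["astar"])` shows A-star expanding fewer cells before it reaches the goal, because the heuristic pulls it toward the target instead of fanning out evenly. Results on other mazes will differ, and depth-first search in particular depends heavily on neighbor order.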

He extended similar credit to beTheNoob’s DevOps Visual Toolkit, which turns opaque structured data into plain-text visualizations an engineer can paste into a README, a Slack message, or a CI log. The intellectual move is the same one running through his highest marks: take something hard to interpret and make it legible to a human who has to act on it. Whether the data is a model’s confidence, a financial event stream, or an algorithm’s search path, Bisht’s standard held constant. The work is not finished when the computer understands. It is finished when a person can.

Judging by Substance, Not by the Label

A subtler thread ran through Bisht’s strongest scores, and it connects the projects that on the surface have nothing in common. The tools he rated highest all shared a refusal to be fooled by appearances, an insistence on judging a thing by what it actually is rather than what it claims to be. It is the same discipline, expressed in different domains, and it is the discipline at the core of his research.

The Code Slayers’ rust-dedupe is the clearest small example. It finds duplicate files not by comparing their names but by comparing their actual content, so that two identical files are caught even after one has been renamed, and two files with the same name are correctly left alone if their contents differ. A naive deduplicator trusts the label. A trustworthy one reads the substance. “The name is a claim. The content is the fact,” is how Bisht frames the distinction his field lives by. “Every system that matters has to decide which one it is going to believe, and the ones that believe the label are the ones that fail in ways nobody sees coming.”
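The distinction between label and substance translates directly into code. The sketch below is written in Python for consistency with the other examples here and is not the team's Rust implementation; the function names and the choice of SHA-256 as the content fingerprint are assumptions. It groups files by a hash of their bytes, so names play no part in the decision.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(data):
    """Fingerprint raw bytes; identical content gives an identical digest."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(named_contents):
    """Take (name, bytes) pairs and return lists of names sharing content."""
    by_digest = defaultdict(list)
    for name, data in named_contents:
        by_digest[content_digest(data)].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


def scan(root):
    """Find duplicate files under a directory by reading their contents.

    Reads each file fully into memory, which is fine for a sketch but
    would need chunked hashing for very large files.
    """
    files = (p for p in sorted(Path(root).rglob("*")) if p.is_file())
    return group_duplicates((str(p), p.read_bytes()) for p in files)


# Hypothetical files: a renamed copy, and a same-named file that differs.
sample = [
    ("report.txt", b"quarterly numbers"),
    ("report_final_v2.txt", b"quarterly numbers"),
    ("backup/report.txt", b"different numbers"),
]
duplicates = group_duplicates(sample)
```

Here the renamed copy is caught and the same-named file with different bytes is left alone, which is the behavior the paragraph above describes. Two files with equal digests are treated as identical; a stricter tool might confirm with a byte-by-byte comparison.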

That is the same move MindMeld makes when it looks past the answer to the confidence behind it, the same move Chronofold makes when it looks past the current balance to the events that produced it, and the same move Privacy Redactor makes when it inspects what a log actually contains rather than trusting that it is safe to share. Across a batch of unrelated tools, Bisht kept rewarding the same underlying instinct: that a system earns trust by reading reality rather than its summary. For an architect whose decisions carry regulatory and financial weight, and a researcher who studies how automated judgments go wrong, that instinct is not a stylistic preference. It is the line between a system that holds up under scrutiny and one that only looks like it does.

A Reviewer’s Standard for Systems People Have to Trust

Read across his batch, Bisht’s evaluations resolve into a set of questions drawn straight from the discipline of building decision systems that survive scrutiny. They apply far beyond a hackathon, to any tool whose output someone will be asked to justify.

Does the system report its own confidence, not just its answer? A result without a sense of its own certainty hides exactly the failures, the confident errors, that do the most damage in high-stakes use.

Is the safeguard built into the workflow or bolted on as a rule? Governance that depends on people remembering will eventually fail. Governance built into the default path holds because it asks nothing of them.

Can the outcome be reconstructed and proven? Immutable, event-sourced, auditable designs let you answer the question every regulated system eventually faces: show me exactly how you arrived at this.

Is the process visible, or is it a black box? A system whose reasoning can be watched can be taught, debugged, and defended. One that cannot is trusted only until the first time it is wrong.

Does the output respect the human who must act on it? The job is not done when the machine has computed an answer. It is done when a person can understand it well enough to stand behind it.

Why Explainability Is About to Matter More, Not Less

There is a reason someone who is both an architect from finance and a researcher in explainable AI reads a 72-hour submission the way Bisht does. The industry is rushing to put machine-generated decisions into places where the cost of being wrong is measured in money, in health, or in someone’s liberty, and it is doing so faster than it is building the means to explain those decisions. The gap between what these systems can decide and what we can account for is widening, and regulation is beginning to close it from the other side, demanding that high-stakes automated decisions be explainable by law.

This is not a distant concern. Lending decisions, fraud flags, insurance pricing, and clinical triage are already being shaped by models whose reasoning their own operators struggle to articulate, and each unexplained decision is a small debt that comes due the moment it is challenged. The organizations that will weather that reckoning are the ones treating explainability as a design requirement now, rather than a feature to retrofit once a regulator or a court insists on it.

The Code Olympics format, by reducing teams to a few hundred legible lines, surfaces the instinct that will matter most as that pressure grows. The teams Bisht ranked highest were not the ones with the cleverest algorithm. They were the ones who understood that a decision is only as useful as it is defensible: that confidence must be reported, that ethics must be architected, that outcomes must be auditable, and that a process kept invisible is a process no one will trust for long. A judge who has built the systems that have to answer for themselves spent his hours at Code Olympics looking for the teams that already knew it, and rewarding the ones who did.

That these instincts showed up at all under a 72-hour clock is the part Bisht found genuinely encouraging. Calibration, auditability, governance by design, and visible reasoning are usually presented to engineers as advanced concerns, the things you bolt on once a system is mature and a compliance team starts asking questions. Seeing teams reach for them under constraint, when the easy path was a polished demo that hid its own logic, suggested to him that the next generation of builders may be internalizing earlier than their predecessors did the idea that trust is not a property you add to a system later. It is a property you either design in from the first decision or spend years failing to retrofit.

Code Olympics 2026 was organized by Hackathon Raptors, a Community Interest Company supporting innovation in software development. The event challenged teams to build working software across 72 hours under four simultaneous constraints: a core technical rule, a line budget, an assigned project domain, and a programming language teams did not choose. Harender Bisht served as a judge evaluating projects for functionality, constraint mastery, language adaptation, code quality, and innovation.