Building your own search intent with Bionoculars: A nuclear medicine use case

The scenario

A nuclear medicine researcher wants to find articles that evaluate the reliability of calcium scores estimated from nuclear medicine imaging (such as SPECT/CT) without running a full dedicated calcium score CT. Their starting query: "calcium score in nuclear medicine".

The challenge: understanding the user's intent is one of the hardest problems in information retrieval. In research, this is even harder because the user's intent is often not fully formed when they begin exploring a new topic (Peltonen et al., 2017).

Methodology

This test started with an interview with our nuclear medicine researcher. The interviewer explained how Bionoculars works, and the researcher guided the search on their own.

We did not know beforehand where the session would lead. It turned out to be a clear example of how Bionoculars helps users express and develop their search intent.

Comparisons with other tools were written afterwards with the interviewee's help, to avoid biasing the Bionoculars test by unconsciously looking for articles found by other tools.

Disclaimer: We are not evaluating the ranking algorithms of the tools discussed here. Our goal is to show how Bionoculars components, in particular keyword groups, combined with the principles of transparent and controllable AI, enable a different bibliographic research process, one that makes it easier for users to express their intent and build a comprehensive list of relevant literature.

What happens with conventional search

PubMed:

Of the top 10 results, only 5 mention both calcium score and nuclear medicine:

Google Scholar:

A few more relevant hits appear in the top 10, for example:

To be clear, with careful query crafting these tools can go further. The problem here is about intent: the researcher has a specific need that a broad initial query cannot communicate, and these tools offer no help in refining it. The main option is to rephrase the query and try again, hoping for a better match. This trial-and-error process adds friction that compounds quickly in large bibliographic projects, and it is not always obvious how to reword a query.

What happens with chatbots

Claude Opus 4.6: Claude covers the topic correctly but frames it primarily through the standard CT-based approach, giving nuclear medicine aspects secondary treatment. The main drawback: the mentions of SPECT/CT and PET/CT come without supporting articles, so the researcher can neither verify the claims nor check that the answer matches what they are looking for.

Claude Opus 4.6 initial response about calcium scoring in nuclear medicine, framing it primarily through CT-based approaches

Claude Opus 4.6 initial response: accurate but CT-centric, nuclear medicine aspects are secondary.

When asked to provide citations supporting its initial answer, the LLM revisits and expands that answer by adding new elements. This raises concerns about reliability: the first response was incomplete, and there is no way to know whether the second one is complete or what it missed.

Claude's follow-up response with citations, showing new elements added that were not present in the first answer

Claude's follow-up response with citations: new elements appear that were absent from the first answer.

  • Citations are not provided automatically. Initial answers can be incomplete, and we cannot see what the model has considered or overlooked.

  • A single citation per statement is not enough to support a research decision. More iteration is needed, with no guarantee the AI will produce a thorough overview.

  • As Messeri & Crockett argue, AI tools can create illusions of understanding, a risk that is particularly acute when users cannot see the evidence behind a confident-sounding answer (Messeri & Crockett, 2024).

Consensus: When it comes to citations, Consensus does significantly better: citations are provided by default, with multiple references per statement.

Consensus overview for calcium score in nuclear medicine, showing an answer with multiple citations per statement and a dose-reduction table

Consensus overview with citations: better than Claude, but shortcomings remain on closer inspection.

However, digging into the results reveals shortcomings:

Again, we could likely obtain a good overview by iterating. But each iteration adds friction, and the outcome depends on the user's experience with LLMs and their ability to push beyond the tool's default behavior. Consensus has a query suggestion component after the overview that is meant to help in these cases, but it still misses the core user intent.

Consensus query suggestion panel showing options such as GET A LIT REVIEW, CONSENSUS METER, Calcium scoring CT protocols, and Impact of deep learning on calcium scoring accuracy

Consensus query suggestions: meant to help refine the search, but the nuclear medicine angle is still missing.

The same search in Bionoculars

In Bionoculars, as in PubMed and Google Scholar, the first two results focus on calcium score broadly. The difference is that the user can see the keyword groups: the indexing data and logic behind each result.

Bionoculars initial search results for calcium score in nuclear medicine, with a keyword groups panel on the left showing Agatston Score, Nuclear Medicine, Coronary Artery Calcium, and Coronary Artery Disease

Initial Bionoculars results: the keyword groups panel makes the indexing logic visible.

This makes it easy to see where the mismatch between query and intent happened. We prefer showing these imperfections openly: it helps us improve the tool and gives users better control over the product they pay for.

The principle behind Bionoculars' keyword groups is consistent with previous research showing that interactive keyword-based intent modeling improves search task performance (Peltonen et al., 2017).

At position 3, the researcher finds the same article as PubMed and Google Scholar. But this time, among its keyword groups, they notice:

  • Attenuation Correction: This points toward what they are actually looking for.

  • Myocardial Perfusion Imaging: Exactly what they would have added to the query if they had rephrased it manually.

The path forward is intuitive. The researcher immediately thought of adding the missing keywords and did not even feel the need to keep checking the rest of the top 10. Instead of rewriting the query, they selected the keyword groups for "Attenuation Correction" and "Myocardial Perfusion Imaging" and added similar ones. This single step reshapes the results.

Bionoculars results after selecting Attenuation Correction and Myocardial Perfusion Imaging keyword groups, showing more focused results including articles on SPECT/CT calcium scoring

After selecting "Attenuation Correction" and "Myocardial Perfusion Imaging": the results are immediately more focused.

Note: for the comparison to be fair, we should have iterated for the other tools as well. But what we want to highlight here is that the iteration was automatic, intuitive and did not add any friction. The researcher did not have to guess or rewrite their query: they just reordered the first results using the data shown to them.
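One rough way to picture this reordering is a stable sort on keyword group overlap. The sketch below is an illustration only, not Bionoculars' actual ranking: the Article class, the rerank function, the article titles and the scoring rule (count of shared keyword groups) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record: a title plus the keyword groups shown next to it
    title: str
    keyword_groups: set = field(default_factory=set)


def rerank(articles, selected):
    """Put articles sharing more keyword groups with the selection first.

    Python's sort is stable, so articles with the same overlap keep
    their original relative order.
    """
    selected = set(selected)
    return sorted(
        articles,
        key=lambda article: len(article.keyword_groups & selected),
        reverse=True,
    )


# Made-up results for illustration; not taken from the session
articles = [
    Article("A: calcium score overview", {"Agatston Score", "Coronary Artery Calcium"}),
    Article("B: calcium score and risk", {"Coronary Artery Calcium", "Coronary Artery Disease"}),
    Article("C: calcium score from SPECT/CT",
            {"Agatston Score", "Attenuation Correction", "Myocardial Perfusion Imaging"}),
    Article("D: perfusion imaging protocol", {"Myocardial Perfusion Imaging", "Nuclear Medicine"}),
]
selected = {"Attenuation Correction", "Myocardial Perfusion Imaging"}

reranked = rerank(articles, selected)
for article in reranked:
    print(article.title)
```

In this toy data, article C shares both selected groups and article D shares one, so they move ahead of A and B, which keep their original order.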

The key idea: Bionoculars makes its reasoning visible so you can correct it. It gives you the building blocks to express and refine what you mean, at your own pace, using your own expertise.

Learn how keyword groups work in the documentation or follow the tutorial.

Building the search iteratively

By selecting and combining keyword groups, the researcher starts finding exactly what they were looking for:

Among the new top 10, seven matched the researcher's intent and directly addressed the use of SPECT/CT attenuation correction CT for calcium scoring. These articles were not new: PubMed and Google Scholar also returned them within their top 100 results. What Bionoculars did differently is that it gave the user enough control to surface them quickly, without scrolling through pages of less relevant results.

The researcher can then add these articles to a project for later reading.

Bionoculars modal for adding 7 selected articles to a project named calcium score in nuclear medicine

Adding the relevant articles to a project in one step.

Each step is a small, deliberate decision. The researcher is not hoping the algorithm gets it right, and does not need to construct complex boolean expressions from the start.

Where to go from here

At this point, the researcher has several options:

  • Explore the knowledge graph, which can surface additional relevant articles through concept relationships (documentation).

  • Use AI to summarize the articles collected so far, using their abstracts as input (documentation).

  • Use AI to find new articles via the Select and Generate action.

  • Run a new search, building on the keyword groups already identified.

We recommend the last option. With a more precise keyword set in hand, a new search will surface a fresh set of relevant articles. This is why we added the "Add to Search" button for each keyword group: it builds a refined search using Bionoculars' index keywords directly.

Bionoculars refined search with keywords Agatston Score, Coronary Calcium Score, Attenuation Correction, Myocardial Perfusion Imaging, SPECT MPI, and Nuclear Medicine selected

A new search built from the keyword groups identified in the session.
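For comparison, here is roughly the boolean expression a user would otherwise have to write by hand to express the same keyword set. This is a sketch under assumptions: build_query is a hypothetical helper, the way the six keywords are split into groups is our own guess, and the OR-within-group, AND-across-groups rule is a common convention, not a description of how Bionoculars combines keyword groups internally.

```python
def build_query(groups):
    """Join terms with OR inside a group and AND across groups."""
    clauses = []
    for terms in groups:
        if not terms:
            continue  # skip empty groups
        quoted = [f'"{term}"' for term in terms]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Keywords from the refined search; the grouping itself is an assumption
groups = [
    ["Agatston Score", "Coronary Calcium Score"],
    ["Attenuation Correction"],
    ["Myocardial Perfusion Imaging", "SPECT MPI"],
    ["Nuclear Medicine"],
]

query = build_query(groups)
print(query)
```

Even for six keywords the expression is already long, which is the friction the "Add to Search" button is meant to remove.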

After this, the researcher quickly reorders and groups keywords to further express their intent. Five new articles appear:

Working with a project

Once the articles are saved to a project, the researcher can run AI queries restricted to that set. This helps prioritize which papers to read first and filter down to the most relevant subset.

Bionoculars project view with the Select and Generate AI action panel open, showing a summary of calcium scoring in nuclear medicine with numbered citations linked to the 12 collected articles

AI action (Select and Generate) restricted to the project's collected articles.
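The idea of restricting an AI query to a fixed set of articles can be sketched as two small steps: number the project's abstracts to form the model's context, then check that every citation in the answer points back into that set. The helper names, the [n] citation format and the placeholder texts below are assumptions for illustration; they do not describe how the Select and Generate action is implemented.

```python
import re


def build_context(abstracts):
    """Number each abstract so an answer can cite it as [1], [2], ..."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(abstracts, start=1))


def out_of_project_citations(answer, n_articles):
    """Return cited numbers that do not match any article in the project."""
    cited = {int(num) for num in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= n_articles)


# Placeholder abstracts and answer; not real article content
abstracts = [
    "Placeholder abstract of project article 1.",
    "Placeholder abstract of project article 2.",
    "Placeholder abstract of project article 3.",
]
context = build_context(abstracts)
answer = "Placeholder claim [1][3]. Another placeholder claim [5]."

stray = out_of_project_citations(answer, len(abstracts))
print(stray)
```

A check like this only confirms that citations stay inside the project. Whether each cited abstract actually supports the claim still requires the researcher's judgment.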

Using LLMs at the end of the process instead of the start mitigates the drawbacks we described earlier. The AI action is used to assist in building a focused overview for better decision making and is not meant to replace the expertise of the researcher.

Summary

The tools we compared each have specific shortcomings for this kind of exploratory research, where intent is not fully expressed or developed from the start:

Search engines (PubMed, Google Scholar):

  • Do not help the user express or develop their intent.

  • Require the user to craft the right query to match the engine's index.

  • Return results dominated by one facet of the query (standard calcium scoring) while underrepresenting the nuclear medicine angle, with no clear way to control this balance.

LLM chatbots (Claude Opus 4.6 and Consensus):

  • Lack of reliability: initial answers can be incomplete, citations are not always present, and when present they are in some cases not used correctly.

  • Lack of transparency: there is no way to see which articles were considered or overlooked, or why certain claims were made.

  • Lack of control: the user cannot steer the output without reprompting and hoping the model interprets the correction correctly.

Bionoculars: Makes its reasoning visible so you can correct it. It gives you the building blocks to express and refine what you mean, at your own pace, using your own expertise. Each step is a small, deliberate decision. The researcher is not hoping the algorithm gets it right, and does not need to construct complex boolean expressions from the start.

References

[1] Peltonen, J., Oulasvirta, A., & Kaski, S. (2017). Interactive Intent Modeling from Multiple Feedback Signals. ACM Transactions on Interactive Intelligent Systems, 7(2), 1-36. https://doi.org/10.1145/3231593

[2] Messeri, L. & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature, 627(8002), 49-58. https://doi.org/10.1038/s41586-024-07146-0