Chapter 7 Planning your analysis with Ask mode
7.1 Stages of analysis
There are two overall decisions you need to make when developing your analysis:
- What types of statistics to use.
- How to implement those statistics in R code.
It’s worthwhile separating these two decisions, because they are different issues: one is a science question, the other is a programming question.
When using AI assistants, it’s also worthwhile using separate chat sessions to explore each of these questions.
7.2 Ask mode
Ask mode helps you plan analysis and implementation, using context from your project.
In VS Code, click the ‘octocat’ symbol, which should be near the top right of the window. This will open the chat sidebar.
The chat input panel appears at the bottom of this new sidebar. Confirm that the chatbot is currently set to ‘Ask’ mode.
Your current file will automatically be included as context for the prompt. You can drag and drop any other files here as well.
Start by asking the chatbot for guidance on a statistical analysis. We are interested in how the abundance of Topa relates to coral cover. For instance you could ask:
How can I test the relationship between pres.topa and CB_cover?
Evaluate the quality of its response and we will discuss.
7.3 The jagged frontier of LLM progress
LLMs were created to write text. But it soon became apparent that they excel at writing programming code in many different languages.
Since then AI companies have been optimising their training and development for coding and logic.
There is a series of standardized benchmarks used to compare the quality of LLMs. A common one is the SWE-bench benchmark, which measures the ability of LLMs to autonomously create bug fixes. Current models resolve about 50% of the issues on this benchmark.
Their progress on math and logic is a bit more controversial. Some of the math benchmarks (like the AIME, an annual test for the top 5% of high-school students) appear to be saturated, with LLMs scoring close to 100%. So newer benchmarks based on unsolved math problems are being developed.
However, others are finding that the abilities of LLMs in math and logic are overstated, perhaps because the LLMs have been trained on the questions and the answers. It’s also clear that AI companies have a strong financial incentive to find ways (real and otherwise) of improving on the benchmarks. At the moment there is tough competition to be ‘industry leaders’ and grab market share with impressive benchmark results.
Either way, it does seem that the current areas of progress are programming, math and logic.
Evaluations on statistics and the R software are less common.
The limited evaluations of LLMs’ ability to identify the correct statistical procedure are less impressive than those other benchmarks. One evaluation (published in 2025) of several models, with GPT-4 as the most up-to-date model, found accuracy at suggesting the correct statistical test ranged from 8% to 90%.
In general, LLMs were good at choosing descriptive statistics (accuracy of up to 90% for GPT-4), whereas accuracy at choosing inferential tests was much less impressive: GPT-4 scored between 20% and 43% on questions for which a contingency table was the correct answer.
The results also indicate the improvements that can be gained through better prompts (i.e. a doubling in accuracy for GPT-4).
The lesson is two-fold. Just because LLMs excel at some tasks doesn’t mean they will excel at others. Second, good prompting strategies pay off.
For us in the niche R world there is another lesson: LLMs should be good at helping us implement analyses (i.e. write the R code), but they are less reliable as statisticians who can guide us on the scientific question of what type of analysis to do.
7.4 How to prompt for better statistical advice
The limited number of evaluations of LLMs for statistics has found the biggest improvements for prompts that:
- Include domain knowledge in the prompt
- Include data or summary data in the prompt
- Combine domain knowledge with Chain of Thought (CoT) reasoning (CoT on its own doesn’t help)
In addition, larger and more up-to-date models tend to be better; for example, try Claude 4.0 over GPT-mini.
7.4.1 What LLMs don’t do that real statisticians do…
If you consult a human statistician they’ll usually ask you lots of questions. LLMs, in contrast, will tend to just give you an answer, whether or not they have enough context.
Say you asked me the same question you put in your LLM prompt, like “how do I see if fish are related to coral”. There’s no way I’d jump in with an answer given so little information. But the LLM will.
So be aware of this shortcoming, and come to prompting prepared with the context the model will need to give you a better answer.
7.4.2 Guidelines for prompting for statistical advice
Attach domain knowledge: Try to find quality written advice from recognized researchers to include in your prompts.
Always provide context on the data: For instance, the model will give better advice for the prompt above if we tell it that pres.topa is integer counts (it will probably then recommend a Poisson GLM straight away). Likewise, if your replicates are different sites, tell the model, so it has the opportunity to recommend approaches that are appropriate for spatial analysis.
Attach data to your prompts: You can attach the whole dataset if it’s in plain text (e.g. csv), or write a summary() and/or head() to file and attach that.
Combine the above approaches with Chain of Thought: Just add ‘use Chain of Thought reasoning’ to your prompt. It’s that easy.
Double up on chain of thought with self-evaluation: After the initial suggestion, try prompts like “Are you sure?”, “Take a deep breath, count to ten and think deeply”, or “Evaluate the quality of the options on a 1-5 scale”.
Tip: Make a library of reference material for your prompting. If you see vignettes, blogs, or supplemental sections of papers that explain an analysis well, save them as text files to use in prompts.
7.4.3 Improving our initial prompt by attaching data
Recall our initial prompt was:
How can I statistically test the relationship between pres.topa and CB_cover?
Try some of the strategies above (start a new chat by clicking the ‘+’ button) and compare the quality of advice.
For instance, you can save a data summary like this:
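A minimal sketch (it assumes your data are already loaded into a data frame called dat; swap in your own object and file names):

```r
# Write the first rows and a summary to plain-text files you can attach
# to the chat. `dat` is a placeholder for your own data frame.
capture.output(head(dat), file = "data-head.txt")
capture.output(summary(dat), file = "data-summary.txt")
```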
Then drag and drop it into the ask window and add something like:
How can I statistically test the relationship between pres.topa and CB_cover? Here are the first 6 rows of data
7.4.4 Improving our initial prompt by attaching domain knowledge
You can further improve the response by attaching a trusted resource, e.g. save this webpage on count models for ecology to your computer. Then you can attach the html file. For me that turned out to be slow to process (the file may have been too large), so it is better to have the resource in plain text (e.g. copy and paste the text into a file, or use an extraction tool to pull the text out of the html).
If you installed the websearch tool (which will likely become default in future) then you could add a prompt like this:
How can I statistically test the relationship between pres.topa and CB_cover? Here are the first 6 rows of data. pres.topa is my response and it is count data. Use @websearch to find robust recommendations for ecologists to analyse count data before proceeding with your recommendations.
That worked well for me. I then followed up with:
Great. Evaluate the robustness of each suggestion on a 1-10 scale
And it gave me a nice summary suggesting to try overdispersion models first (which is a good suggestion).
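To make that advice concrete, here is a hedged sketch of what trying overdispersion models first could look like in R. It assumes a data frame called dat with columns pres.topa and CB_cover, and uses the MASS package for the negative binomial GLM; treat it as an illustration rather than the assistant’s actual output.

```r
library(MASS)  # provides glm.nb() for negative binomial GLMs

# Start with a Poisson GLM for the count response (dat is a placeholder)
m_pois <- glm(pres.topa ~ CB_cover, family = poisson, data = dat)

# A residual deviance much larger than the residual degrees of freedom
# hints at overdispersion
summary(m_pois)

# Negative binomial GLM as an overdispersion-tolerant alternative
m_nb <- glm.nb(pres.topa ~ CB_cover, data = dat)
AIC(m_pois, m_nb)
```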
The absolute best practice would be to give the assistant all the context for your study and observational design. Let’s see how doing that can work in our favour when planning implementation.
7.5 Planning implementation
The other main way to use Ask mode is for help in implementing an analysis. Many of our workflows are complex and involve multiple data wrangling steps.
To get the best out of GitHub Copilot I recommend creating a detailed README.md file with project context. Let’s try that and use it to plan our project.
Save the README.md that is here to a local file. (Remember that we are going to be using this as a prompt, so read it first.)
Now you can attach it (or open it then click new chat). Given all the context you’ve provided you can just write something simple like:
Help me plan R code to implement this analysis.
Or
Help me plan the workflow and scripts to implement this analysis
I did this. It suggested both code (that looked approximately correct) and a directory structure, sticking to my guideline in the README about being modular.
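As a purely hypothetical illustration (the script names below are mine, not what Copilot will necessarily suggest), a modular plan often ends up as a set of numbered scripts sourced from one entry point:

```r
# run_analysis.R: hypothetical entry point; file names are illustrative only
source("scripts/01_load_data.R")     # read the raw csv files
source("scripts/02_clean_data.R")    # wrangle into an analysis-ready table
source("scripts/03_fit_models.R")    # e.g. GLMs for pres.topa ~ CB_cover
source("scripts/04_make_figures.R")  # figures and tables for reporting
```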
You should iterate with Ask mode if there are any refinements you want to make.
Let’s move on to Edit mode to see how to put this plan into action.