Resources

Answers to frequently asked questions about medical statistics, plus practical guides for clients working with us.

FAQ

Frequently Asked Questions about Medical Statistics

Here we collect answers to the questions we most commonly receive from researchers and clinicians.

Can't find the answer you're looking for? Contact us – we're happy to help!

Before the study starts

Contact a statistician as early as possible, ideally before the study starts. We recommend getting in touch while you are writing your grant application (e.g. to a funding body). We can help with study design, estimate the cost of statistical work for your budget, and perform power calculations.
The sample size you need depends on the expected effect size, the variability in the data, and the acceptable risk of error (the significance level and the desired statistical power). A power calculation takes all of this into account and provides a scientifically justified sample size. We strongly recommend performing this calculation before data collection begins.
Pre-specify your analyses as far as possible. Stating the primary research question, outcome measures, and analytical method in advance (preferably in a study protocol) is good scientific practice. It reduces the risk of results being questioned after the fact.
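To give a feel for the power calculation mentioned above, here is a minimal sketch in Python. It assumes the simplest textbook case: a two-sided comparison of means between two equally sized groups, using a normal approximation. The function name and the example effect sizes are our own illustration. A real study often needs a more tailored calculation, and an exact t-based calculation typically gives a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardised difference: the expected difference in
    means divided by the (assumed common) standard deviation.
    Normal approximation only; illustrative, not a full power analysis.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes, chosen only to show how strongly n depends on them.
n_medium = sample_size_per_group(0.5)  # 63 per group
n_small = sample_size_per_group(0.2)   # 393 per group
print(n_medium, n_small)
```

Under these assumptions, cutting the expected effect from 0.5 to 0.2 standard deviations raises the required sample size more than sixfold. This is why the assumed effect size should be settled before data collection begins.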

Collaboration & Practical matters

How long an analysis takes depends on data quality, data volume, and the complexity of the research question. A significant portion of the time is often spent on cleaning the data and restructuring it into a form suitable for analysis. Please contact us well in advance so we can schedule the work.
The price depends on the scope of the project. Get in touch with a brief description of your project and we will put together a quote.
We follow the ICMJE guidelines. If we contribute substantially to the study design, perform and interpret the analyses, and participate in manuscript work, co-authorship is appropriate. For more limited contributions and pure advisory input, we are happy to be mentioned in the Acknowledgements instead. We always discuss this openly before starting.

Data, security and transfer

Yes, data must be de-identified before you send it to us. The file must not contain names, national identification numbers, or other direct identifiers. Use a running number (e.g. Study_ID) for each patient. You keep the code key yourself and store it securely according to your institution's rules. We work only with the de-identified file.
Never send research data via ordinary email, even if it has been de-identified. Use approved platforms for secure file transfer (for example, services provided by your university, hospital, or institution). If you do not have access to a secure system, we can set up a transfer link for you.
We can read most formats – Excel, CSV, SPSS, and REDCap exports. Empty cells are always interpreted as missing values; simply leave them blank. If you collect data in Excel, see our Excel checklist for detailed recommendations.
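The de-identification step described above (a running Study_ID in the data file, with the code key kept separately by you) could be sketched as follows. This is only an illustration. The column names `name` and `national_id`, the `S0001` ID format and the two example rows are made up, and your institution's rules determine what counts as an identifier and how the key must be stored.

```python
import csv
import io

# Hypothetical names of the columns holding direct identifiers; adapt to your file.
IDENTIFIER_COLUMNS = ("name", "national_id")


def deidentify(rows, identifier_columns=IDENTIFIER_COLUMNS):
    """Split records into a de-identified table and a separate code key."""
    deidentified, code_key = [], []
    for number, row in enumerate(rows, start=1):
        study_id = f"S{number:04d}"  # running number that carries no personal information
        key_entry = {"Study_ID": study_id}
        data_entry = {"Study_ID": study_id}
        for column, value in row.items():
            # Identifiers go to the code key, everything else to the analysis file.
            target = key_entry if column in identifier_columns else data_entry
            target[column] = value
        code_key.append(key_entry)
        deidentified.append(data_entry)
    return deidentified, code_key


# Made-up example data standing in for a CSV export.
raw = io.StringIO(
    "name,national_id,age,sbp\n"
    "Patient A,000000-0001,44,128\n"
    "Patient B,000000-0002,49,141\n"
)
analysis_rows, key_rows = deidentify(list(csv.DictReader(raw)))
print(analysis_rows[0])  # {'Study_ID': 'S0001', 'age': '44', 'sbp': '128'}
```

Only `analysis_rows` would be shared. `key_rows` stays with you.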

Statistical methods

The appropriate statistical method depends on your research question, study design, and variables. You do not need to know this in advance – that is precisely what we are here to help you with!
A p-value is the probability of observing a result at least as extreme as yours, assuming the null hypothesis is true. It says nothing about the magnitude or clinical importance of the effect. A confidence interval gives a range of plausible values for the true effect and is often far more informative, which is why journals increasingly require it.
Data do not always need to be normally distributed. Many methods are robust to deviations from normality, especially with larger samples. For smaller datasets or marked skewness, we use non-parametric alternatives. We always assess the distribution of the data before selecting a method.
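The difference between a p-value and a confidence interval is easiest to see side by side. The sketch below compares two group means using a normal (large-sample) approximation. The function name and the summary statistics are invented for illustration, and for small samples a t-distribution would be used instead.

```python
from math import sqrt
from statistics import NormalDist


def compare_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Difference in means with a two-sided p-value and a confidence interval.

    Large-sample normal approximation; illustrative only.
    """
    diff = mean_a - mean_b
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)       # standard error of the difference
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # probability of a result at least this extreme
    margin = NormalDist().inv_cdf(1 - (1 - confidence) / 2) * se
    return diff, (diff - margin, diff + margin), p_value


# Invented summary statistics: systolic blood pressure (mmHg) in two groups of 200.
diff, ci, p = compare_means(138.0, 15.0, 200, 134.0, 15.0, 200)
print(f"difference = {diff:.1f} mmHg, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, p = {p:.4f}")
```

With these made-up numbers the p-value is below 0.01, yet the interval (roughly 1 to 7 mmHg) shows that the true difference could be anything from marginal to clinically relevant. The p-value alone does not convey that.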

Reporting and publication

Established reporting guidelines exist for each study type, for example CONSORT for randomised trials and STROBE for observational studies. We help ensure that the reporting meets current requirements.
Yes. We can both write and review the methods section, compile results tables, and create polished, publication-ready figures.
Yes, this is a very common situation. Send us the reviewer's comments and we will work together to determine what is needed to address the criticism in the best possible way.

Checklist

Things to consider when entering research data in Excel

We generally recommend entering data into a dedicated database via a purpose-built interface (e.g. Medicase, REDCap, Viedoc). If Excel is the tool being used, this checklist helps minimise the risk of errors and creates a data structure that works smoothly for statistical analysis.

We can of course read any file you send us, but following these guidelines makes our work easier, reduces the time we spend on your project (and therefore your cost), and improves data quality.

1. Basic structure

The most important thing is that the file has a simple and consistent structure.

  • One variable per column: Each column represents one variable (e.g. age, weight, treatment group).
  • One observation per row: Each row represents a unique observation (e.g. one patient, one sample, one time point).
  • Variable names in the first row: Row 1 should contain only short, descriptive variable names (e.g. pat_id, age_at_diagnosis, gender). Do not use special characters or spaces – use underscores instead (my_variable).
  • Data from row 2: The actual data values should start directly in row 2. Do not add extra headings, summaries or blank rows between the variable names and the data.
  • One data table per sheet: Do not mix multiple tables, images or charts on the same sheet as the raw data. One sheet = one data table.
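If you export the sheet to CSV, the naming rule for the first row can be checked automatically. The sketch below is one possible check, not a tool we require. The exact pattern (a letter first, then letters, digits or underscores) is our assumption about what "no special characters or spaces" means in practice.

```python
import re

# Assumed rule: starts with a letter, then only letters, digits or underscores.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_header(header):
    """Return a list of problems found in the first-row variable names."""
    problems = []
    seen = set()
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {position}: empty variable name")
        elif not VALID_NAME.fullmatch(name):
            problems.append(f"column {position}: '{name}' contains spaces or special characters")
        if name in seen:
            problems.append(f"column {position}: '{name}' is duplicated")
        seen.add(name)
    return problems


# Hypothetical header row with three typical mistakes.
issues = check_header(["pat_id", "age at diagnosis", "gender", "weight(kg)", "gender"])
for issue in issues:
    print(issue)
```

An empty list means the header row follows the naming rule. The other points in this section (one table per sheet, data starting in row 2) still need a visual check.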

2. Handling data and values

  • One value per cell: Enter only one value per cell. Do not write "180 cm" – instead split it into two variables: height (value: 180) and height_unit (value: "cm").
  • Empty cells for missing values: If a value is missing, leave the cell completely blank. Do not write "missing", "not measured", "NA", "–", "?", "*", "999" or similar. A blank cell is the standard for missing values and is understood by virtually all statistical software, whereas a code such as "999" can be mistaken for a real measurement.
  • Be consistent: If you use text codes, make sure they are identical every time. "Yes", "yes" and "YES" are treated as three different values by software. Choose one form and stick to it. The Filter function in Excel is a useful tool for finding inconsistencies.
  • No formulas: Avoid formulas in the raw data file, especially if you are unsure how they work. Enter the original values only; we can help create calculated variables later.
  • Date format: Always use the same format for dates, preferably the international format YYYY-MM-DD (e.g. 2023-11-28). This format is unambiguous and avoids confusion with other formats.
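Several of the rules in this section can be checked mechanically once a column has been read in as text. The helpers below are a rough sketch with names of our own choosing. The list of placeholder codes is taken from the bullet above and is not exhaustive. A hit on "999" may still need a manual look, since it could be a genuine value in some columns.

```python
from collections import defaultdict
from datetime import datetime

# Placeholder codes named in the checklist; extend as needed.
MISSING_CODES = {"missing", "not measured", "na", "–", "?", "*", "999"}


def find_missing_codes(values):
    """Return cells that look like placeholder codes for missing data."""
    return [v for v in values if v.strip().lower() in MISSING_CODES]


def find_case_variants(values):
    """Group spellings that differ only in letter case or surrounding spaces."""
    groups = defaultdict(set)
    for v in values:
        if v.strip():
            groups[v.strip().lower()].add(v)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


def is_iso_date(text):
    """True if text is a valid date written exactly as YYYY-MM-DD."""
    try:
        # The round trip rejects valid dates in a looser form such as 2023-1-5.
        return datetime.strptime(text, "%Y-%m-%d").strftime("%Y-%m-%d") == text
    except ValueError:
        return False


print(find_missing_codes(["12", "NA", "", "999", "7"]))       # ['NA', '999']
print(find_case_variants(["Yes", "yes", "No", "YES", "No"]))  # {'yes': ['YES', 'Yes', 'yes']}
print(is_iso_date("2023-11-28"), is_iso_date("28/11/2023"))   # True False
```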

3. Variable types and formatting

  • Do not mix text and numbers: A column that is supposed to contain numbers (e.g. age) must not contain text (e.g. "approx. 45"). This forces the entire column to be interpreted as text, making calculations impossible.
  • Numbers should be numbers: Excel sometimes formats numbers as text (often shown with a small green triangle in the corner and left-aligned). Select the column, click the warning symbol, and choose "Convert to Number".
  • Code categories numerically: For categorical variables (e.g. sex, yes/no questions), use numbers instead of text. Create a separate sheet that serves as a code key / data dictionary.
    Example: smoker: 1 = Yes, 0 = No  |  sex: 1 = Male, 2 = Female
  • No cosmetic formatting: Do not use colours, bold or italic text to encode information. Red highlighting on an outlying cell, for example, is lost when the data is read into statistical software. Instead, create a new column and describe the issue there.
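Two of the points above lend themselves to a short illustration: finding text in a column that should be numeric, and keeping a code key for numerically coded categories. The sketch below uses hypothetical function names. The code key simply mirrors the smoker/sex example given in this section.

```python
def non_numeric_entries(values):
    """Return (row_number, value) for non-blank cells that are not numbers.

    Row numbers assume variable names in row 1 and data starting in row 2.
    """
    bad = []
    for row_number, value in enumerate(values, start=2):
        if value.strip() == "":
            continue  # blank cell = missing value, which is allowed
        try:
            float(value)
        except ValueError:
            bad.append((row_number, value))
    return bad


# Code key as it might appear on a separate data-dictionary sheet.
CODE_KEY = {
    "smoker": {1: "Yes", 0: "No"},
    "sex": {1: "Male", 2: "Female"},
}


def decode(variable, code):
    """Translate a numeric code back to its label using the code key."""
    return CODE_KEY[variable][code]


print(non_numeric_entries(["45", "approx. 45", "", "52"]))  # [(3, 'approx. 45')]
print(decode("smoker", 1), decode("sex", 2))                # Yes Female
```

Note that a decimal comma (e.g. "4,5") is also reported here, because `float` expects a decimal point. Whether that counts as an error depends on your regional settings.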

4. Layout and "Do not" rules

  • Do not merge cells: Never use "Merge Cells". It causes problems when sorting, filtering and importing data.
  • Do not hide rows or columns: Hidden data can easily be forgotten or cause errors. All data to be analysed should be visible.
  • No summaries in the file: Do not add totals, averages or other calculations at the bottom of your data table. These may accidentally be read in as a final observation.
  • Use Freeze Panes: If you have many columns and want to keep the patient ID visible at all times, use Freeze Panes rather than repeating the ID column in multiple places.
  • Be careful when deleting: Use the Delete key to clear the contents of cells. If you right-click and choose "Delete…", be careful not to select "Shift cells up/left" – this shifts the entire data structure and can lead to catastrophic errors.

5. Data quality checks – tips for finding errors

Before sending the file, perform a few simple checks:

  • Use Filter: Activate filters on your header row. Click the arrow for a column to see all unique values. This lets you quickly spot typos ("Male", "male", "Mlae") or implausible values.
  • Check min/max: Select a numeric column. The status bar at the bottom of the Excel window shows Min, Max and Average (right-click the status bar if they are not visible). Does the maximum value for age look reasonable?
  • Find duplicates: Use Excel's conditional formatting to highlight duplicate values in an ID column and ensure that each row is unique.
  • Zoom out: Zoom out significantly to get a visual overview. You may then spot rows or columns that differ in format or are unintentionally empty.
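The same checks can be run outside Excel once the columns have been read in as text. This is a minimal sketch with invented example values and helper names of our own. It mirrors the Filter, min/max and duplicate checks above but does not replace a visual review of the file.

```python
from collections import Counter


def unique_values(values):
    """Count each distinct non-blank value, much like the Filter drop-down list."""
    return Counter(v for v in values if v != "")


def numeric_range(values):
    """Return (min, max) of the non-blank cells in a numeric column."""
    numbers = [float(v) for v in values if v != ""]
    return min(numbers), max(numbers)


def duplicate_ids(ids):
    """Return the IDs that occur more than once."""
    return sorted(i for i, count in Counter(ids).items() if count > 1)


# Invented example columns containing one typo, one implausible age, one repeated ID.
sex = ["Male", "male", "Mlae", "Female", "Male"]
age = ["34", "", "58", "430", "61"]
pat_id = ["P01", "P02", "P03", "P02", "P04"]

print(dict(unique_values(sex)))  # {'Male': 2, 'male': 1, 'Mlae': 1, 'Female': 1}
print(numeric_range(age))        # (34.0, 430.0)
print(duplicate_ids(pat_id))     # ['P02']
```

In this example the spelling variants, the age of 430 and the repeated ID P02 would each prompt a look back at the source records.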

Questions about your data?

Don't hesitate to get in touch – we are happy to help you prepare and structure your dataset.

Contact us