Crosstabs (cross tabulation): complete guide

IdSurvey’s Crosstab section offers tools for conducting multivariate analysis by intersecting two or more variables. By creating crosstabs (cross-tabulated tables or contingency tables), you can identify associations and correlations between variables using statistical tests like Chi-square, Z-Test, and ANOVA.

In IdSurvey’s crosstabs, you can choose variables such as questions, open-ended answers, or fields from respondent details.

For instance, in a survey, we inquired about the level of education and the frequency of social media usage. The hypothesis was that “the level of education is correlated with the frequency of social media usage.” Through the crosstab, it is possible to analyze the data to identify any correlation between the level of education and the frequency of social media usage, thus confirming or rejecting the formulated hypothesis.

Interface

Despite the complexity of the tool and data processing, the Crosstab interface has been designed to be intuitive and fast.

Crosstab

  • 1. Basic settings (Confidence level and number of decimals)
  • 2. Views management
  • 3. Side bar
    • A. Banner (columns)
    • B. Stubs (rows)
    • C. Cell metrics
  • 4. Export
  • 5. Select “All completed interviews / Only validated interviews”
  • 6. Crosstab area
    • D. Stub title (row variable)
    • E. Crosstab title (column variables)
    • F. Stub variables (rows)
    • G. Banner values

Create a new crosstab

To create a new crosstab table, you have to select at least one column variable (also called a banner) and at least one row variable (stub). Additionally, you have to choose at least one metric to display in the cells of the crosstab table.

  • 1. Click on “+ Add new banner” in the sidebar, then from the dropdown, search for and select the desired banner question. The answer options of the banner question will represent the columns of the crosstab.
  • 2. Now, select the “stub” question by clicking in the “+ Add new stub” area. The answer options of the stub question will represent the rows of the crosstab.
  • 3. Finally, from the “Cell metrics” section, select the desired metric. For example, “Count.”

A crosstab will be displayed showing the count of responses distributed for each combination of answer options between the stub question and the banner question.
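
As an illustration of what the Count metric computes, the sketch below tallies a small, made-up set of responses in plain Python. The data, the variable names, and the `count_crosstab` helper are hypothetical and are not part of IdSurvey.

```python
from collections import Counter

# Hypothetical responses: (banner answer, stub answer) for each respondent.
responses = [
    ("Primary", "Little"),
    ("Primary", "Little"),
    ("Primary", "A lot"),
    ("Secondary", "Little"),
    ("Secondary", "A lot"),
    ("Higher education", "A lot"),
]

def count_crosstab(pairs):
    """Count how many cases fall in each (stub row, banner column) cell."""
    return Counter((stub, banner) for banner, stub in pairs)

counts = count_crosstab(responses)
banner_values = ["Primary", "Secondary", "Higher education"]
stub_values = ["Little", "A lot"]

# Print one line per stub value, with one count per banner value.
for stub in stub_values:
    print(stub, [counts[(stub, banner)] for banner in banner_values])
```

Each printed line corresponds to one row of the crosstab, and each number to one cell.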

Crosstab

Add more banner variables (columns)

IdSurvey’s Crosstab allows you to add multiple variables to the banner, either in parallel (to add a new table alongside existing ones) or nested (to divide each column of the first variable into as many sub-columns as there are response options in the nested variable).

Add a parallel variable

  • Click on “+ Add new banner” and select the other desired banner question

A second table will be displayed alongside the previous one. This second table will show the distribution of data related to the new question, still cross-tabulated with the set stub variable.

Add a nested variable

  • Click on “+ Add new banner” and select the other desired banner question.
  • Click and drag the new variable, moving it slightly to the right and below the main variable. When you drag it to a compatible area, a symbol will appear, highlighting the hierarchy of the variable you are moving.

A table will be displayed with columns representing the values of the main variable (e.g., answer options, open-ended values, or contact field values), each of which will be subdivided by the values of the sub-variable.
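
To make the nesting concrete, the following sketch builds the columns of a nested banner as the combinations of the main variable's values and the sub-variable's values. The data and names are invented for illustration; the sketch only mirrors the layout described above, not IdSurvey's internal implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical data: (main banner value, nested banner value, stub value).
rows = [
    ("Italy", "Female", "Tablet"),
    ("Italy", "Male", "Tablet"),
    ("Italy", "Male", "Smartphone"),
    ("France", "Female", "Smartphone"),
]

main_values = ["Italy", "France"]
nested_values = ["Female", "Male"]

# Each main column is split into one sub-column per nested value.
columns = list(product(main_values, nested_values))

counts = Counter((stub, (main, nested)) for main, nested, stub in rows)
table = {
    stub: [counts[(stub, column)] for column in columns]
    for stub in ["Tablet", "Smartphone"]
}

print(columns)
print(table)
```

The number of columns is the product of the number of values at each level, which is why deep nesting quickly produces very wide tables.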

Nested variable

Please note

  • You can add an unlimited number of parallel banner variables and set multiple nested variables, up to a maximum of 5 levels.
  • Nested variables can generate tables that are difficult to read and compromise the performance of the page, especially if beyond 3 levels or if the number of variable values is excessive.

Add more stub variables (rows)

You can add multiple stubs to quickly switch between crosstabs while preserving the distribution of the crosstab’s banner. It is possible to add a virtually unlimited number of stub variables; however, only one stub variable at a time can be displayed cross-tabulated with the banner variables.

  • 1. To add a stub variable, click on “+ Add new stub” and choose the desired question, open-ended answer, or contact field.
The selected stub is highlighted with a blue background. The name of the selected stub variable (e.g., the text of the question) is also visible as the title of the crosstab table. It is not possible to add nested variables as stubs.

Order, delete, and perform other actions on variables

  • You can click and drag variables to arrange them in the desired sequence.
  • Variables can also be dragged from the stub section to the banner section or vice versa.
  • Stubs do not allow the addition of nested variables. Therefore, if you drag a nested variable from the banner to the stub, the child variables will be removed.
  • To delete a variable, simply click on the ‘x’ that appears to the right of the name when hovering over the variable.
  • You can add the same variable multiple times and customize the settings uniquely.
  • Note that when you delete a variable, the associated settings (data source, appearance, and buckets) will be lost.

Select cell metrics

  • To add a new metric to the crosstab, click “+ Add new cell metrics” and select the desired metrics from the list.
  • To remove an active metric, click on the corresponding tag or the “x” symbol next to the metric’s name. You can delete an active metric either from the sidebar or from the dropdown menu.
Please note

  • The selected metrics are displayed in a dark blue color.
  • Metrics that are selected but cannot be represented are shown in a lighter shade of blue.
  • Unselected metrics in the dropdown list are displayed in gray.
  • Unselected and non-representable metrics in the dropdown list are shown in a lighter shade of gray.

Variables

In the previous examples, we considered questions as variables. However, it’s essential to know that various types of data can be used as variables.

  • Simple question, Row in a matrix, Cell in a 3D matrix.

    These three types of variables are treated the same way because the data structure is identical. A matrix corresponds to a series of simple questions. Selecting a specific row in a matrix, therefore, yields a data structure analogous to that of a simple question. The columns of the matrix are effectively the answer options for the row.

    Single-choice questions in the crosstab generate as many columns or rows (depending on whether used as a banner or a stub) as there are response options.
  • Answer option for a multiple-choice question.
    Any response from a multiple-choice question can also be used as a variable. In this case, it is referred to as a dichotomous variable, meaning the variable can only take on two values, “selected” or “not selected.” In the crosstab, it will be represented by two columns: one with a heading that reflects the label of the response option (where the selections of the checkbox will be counted) and the other named “blank” (where the non-selected options will be counted).

  • Open-ended answer
    IdSurvey’s crosstabs also allow the use of the open-ended answer as a variable. In this case, the crosstab represents as many columns or rows (depending on whether used as a banner or a stub) as there are different values provided by respondents. Although there are no imposed limits on using open-ended answers as variables, it is advisable to do so when the open-ended response contains a limited number of values. For example, when the open-ended answer is set as a dropdown, and the respondent selects one of the expected values. Another strategy may be to group different options into a limited number of categories using the bucket function described below.
  • Contact field

    IdSurvey allows the use of any contact field as a variable. Similar to open-ended answers, it is advisable to use contact fields that contain a limited number of different values for convenient analysis in a crosstab. This enables aggregation into a few easily representable and analyzable columns or rows in the crosstab. Each value taken by the contact field variable will be represented as a row or column, depending on whether it is set as a stub or a banner.
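
The dichotomous variable described above for an answer option of a multiple-choice question can be sketched as follows. The answers and the `dichotomize` helper are hypothetical. How IdSurvey treats respondents who never saw the question is not covered here, so the sketch simply counts every respondent who did not tick the option as “blank”.

```python
# Hypothetical multiple-choice answers: the set of devices each respondent ticked.
answers = [
    {"Tablet", "Smartphone"},
    {"Smartphone"},
    set(),
    {"Smartwatch", "Tablet"},
]

def dichotomize(selections, option):
    """Map each respondent to the option label if ticked, otherwise to 'blank'."""
    return [option if option in chosen else "blank" for chosen in selections]

tablet = dichotomize(answers, "Tablet")
print(tablet.count("Tablet"), tablet.count("blank"))
```

The two counts correspond to the two columns (or rows) that the option produces in the crosstab.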
Important note on multiple-choice question variables

It’s important to keep in mind that when working with variables created from single-choice questions, we can consider the “cases” as the number of respondents who answered the question (which essentially equals the number of selected responses). When selecting a multiple-choice question, however, the number of cases is always to be understood as the number of selected answers. This rule applies to every type of metric available in IdSurvey’s Crosstab. Below are three scenarios to clarify the interpretation of crosstabs with multiple-choice questions as variables.

  • Single-choice question on the banner and multiple-choice question on the stub:

    For each case in the banner, multiple cases in the stub are possible. If a respondent selected their country of residence (single choice) and then selected 3 devices they own from a multiple-choice question, you will find 1 case in the “Valid cases” count in the crosstab, corresponding to 3 cases in 3 different rows of the stub. For example, for the country Italy (column), there will be 3 cases in the rows corresponding to tablet, smartphone, and smartwatch, respectively.
  • Multiple-choice question on the banner and single-choice question on the stub:
    
    For multiple cases in the banner, there are multiple cases in the stub row. If a respondent selected 3 devices they own from a multiple-choice question (banner) and then selected their country of residence (single choice), you will find 3 cases in the “Valid cases” count in the crosstab, corresponding to 3 units in the row of the stub corresponding to the country (one case for each intersection of device – country). Reversing the questions from the previous example, you will find 3 cases in the “Italy” row, one for each related column (tablet, smartphone, and smartwatch).
  • Multiple-choice question on both banner and stub:

    This type of cross-variable intersection is not supported and cannot be represented in the crosstab.
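
The device and country example from the scenarios above can be reproduced with a short sketch. The respondent records are invented, and the counting rule (one case per selected answer) follows the description in this section rather than any documented IdSurvey internals.

```python
from collections import Counter

# Hypothetical respondents: one country (single choice), several devices (multiple choice).
respondents = [
    {"country": "Italy", "devices": ["Tablet", "Smartphone", "Smartwatch"]},
    {"country": "France", "devices": ["Smartphone"]},
]

# One case per selected device: a respondent with 3 devices fills 3 cells.
cells = Counter(
    (device, person["country"])
    for person in respondents
    for device in person["devices"]
)

italy_cases = sum(n for (device, country), n in cells.items() if country == "Italy")
print(italy_cases)
```

A single Italian respondent therefore contributes three cells, one for each device selected.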

Variables settings

Moving the mouse over any variable allows you to view and click the gear icon to access the settings.

You can always restore the default settings of the variable by clicking on the icon found at the top right of the settings window.

Data source
This section allows you to set the data source of the variable.
Using the “Include” switch, you can select or deselect the values you want to include in the crosstab analysis. For example, you might need to exclude one or more response options from the selected question used as a variable. The deselected values will be excluded from the counts of any metric.

The score parameter allows assigning scores to individual values. If custom scores are not set, the crosstab will use the response option codes. This is particularly useful when the chosen variable is a Likert scale question or any Likert-type scale where the codes set in the questionnaire are sequential numbers.

Note
Banner variables are always considered categorical, and scores are therefore ignored. If the stub variable is a numeric variable, and you want to display calculations based on scores in the crosstab, select one or more compatible metrics, such as mean, median, standard deviation, etc.

Data source

Appearance
This section allows you to order or hide values without affecting the crosstab calculations.
 


  • To change the order of values click on the icon and drag the item to the desired position. The crosstab will display the columns or rows according to the set order.
  • To hide a value from the crosstab, deactivate the corresponding switch.
Buckets

This feature allows you to group variable values. For example, if you choose a numeric variable like a Likert scale question from 1 to 10, you can create groups such as “Low,” “Medium,” and “High” and drag the respective values into these groups. The numeric variable can be analyzed as categorical. You can use the same strategy to group values of a question with many response options or group many values of an open-ended response to make crosstabs easier to analyze.

The crosstab’s banner variables will be represented with columns corresponding to any set groupings instead of individual values.

Stub variables will be displayed either grouped or expanded, depending on the chosen metric. For instance, selecting the Count metric will show a table with one row for each value, while choosing the Bucketed counts metric will display another table with rows grouped as set in the variable’s settings. This allows for simultaneously presenting data for individual values and grouped data, providing both detailed and aggregated information for easier comprehension.

For all categorical analysis metrics, both the bucketed and standard versions are available.

  • To create a grouping, click on “+ Add new bucket,” then drag the desired values into the area of the new bucket. To change the name of the grouping, click on the automatically generated name and write the desired one.
  • Create additional groups until all values are distributed into the buckets.
  • If you click the “Ok” button before all values are placed in groups, the system will create a special bucket called “Other” and automatically place any unassigned values into it.
  • To delete a bucket, click on the X that appears in the top right corner of the bucket.
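
The grouping logic, including the automatic “Other” bucket, can be sketched like this. The ratings, the bucket names, and the `apply_buckets` helper are hypothetical examples, not IdSurvey code.

```python
def apply_buckets(values, buckets, other_label="Other"):
    """Replace each value with its bucket name; unassigned values go to 'Other'."""
    lookup = {value: name for name, members in buckets.items() for value in members}
    return [lookup.get(value, other_label) for value in values]

# Hypothetical 1-10 ratings; 9 and 10 are deliberately left out of every bucket.
ratings = [1, 3, 5, 7, 9, 10, 4]
buckets = {"Low": [1, 2, 3], "Medium": [4, 5, 6], "High": [7, 8]}

bucketed = apply_buckets(ratings, buckets)
print(bucketed)
```

Once the values are replaced by bucket names, the variable can be cross-tabulated like any other categorical variable.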

Bucket

Main bar

From the main toolbar, you can access the general settings and functions of the section.

Settings
By clicking on the gear icon, you can set the confidence level and the number of decimal places for the data displayed in the table.

The confidence level determines how strong the evidence must be before a difference or an association is reported as statistically significant. For example, a confidence level of 95% corresponds to a p-value threshold of 0.05.

The confidence level is a crucial parameter for interpreting the significance of pairwise Z-Tests and for interpreting the p-value obtained with the Chi-square test. It thus influences the results displayed in the crosstab when using various “stats test” metrics. Refer to the list of metrics for more details.
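
The relationship between the confidence level and the thresholds used by the tests can be sketched with Python's standard library. The function names are hypothetical, and the two-sided critical value is an assumption: this article does not state whether IdSurvey's z-tests are one-sided or two-sided.

```python
from statistics import NormalDist

def significance_threshold(confidence_level):
    """Convert a confidence level such as 0.95 into a p-value threshold."""
    return 1.0 - confidence_level

def critical_z(confidence_level):
    """Two-sided critical value of the standard normal distribution (assumed)."""
    alpha = significance_threshold(confidence_level)
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(significance_threshold(level), 2), round(critical_z(level), 3))
```

Raising the confidence level lowers the p-value threshold and raises the critical value, so fewer differences are flagged as significant.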

Views
Just like in other sections of IdSurvey, Crosstabs’ views allow you to save and edit multiple crosstabs, including their settings, preferences, and metrics. All changes made to the Crosstab page are automatically saved in a temporary view called “View not saved.” This allows you to resume the analysis at any time, even if the work has not been manually saved in a new view previously.

You can create private views or team views, which are available to all Crosstabs users. A view can only be edited or deleted by the user who created it: open the list of views and click on the wrench icon that appears to the right of the view name.

Export
It is possible to export the created crosstabs in Excel format. There are three export modes:

  • Export current stub: Exports exclusively the selected stub, i.e., the crosstab displayed on the screen.
  • Export all stubs: Exports all the stubs set in the current view, in a single sheet of an Excel file.
  • Export views: Allows selecting and exporting multiple views simultaneously, generating one sheet for each view in a single Excel file.

All complete interviews / Only validated interviews
This selector allows you to specify whether to include in the analysis all completed interviews or only those validated in the Quality Control section.

List of Cell Metrics

Basic Metrics

  • Show cases
    It activates the display of Valid cases and, when applicable, Total cases.

    • Valid cases

      It is the number of relatable cases, i.e., the number of values present both in the banner and in the stub. 

      In the case of crossing two single-choice questions, it can be considered as the number of respondents who answered both the banner question and the stub question. To understand how to interpret the data in the case of crossing with multiple-choice questions, refer to the section ‘Important Note on Multiple-Choice Question Variables’.
    • Total cases
      It is the number of cases present in the stub, regardless of the number of cases present in the banner.
      In the case of crossing two single-choice questions, it can be considered as the number of respondents who answered the stub question, regardless of how many answered the banner question (which, for example, may not have been shown in some interviews due to skips or display conditions). To understand how to interpret the data in the case of crossing with multiple-choice questions, refer to the section ‘Important Note on Multiple-Choice Question Variables’.

      If Show cases is selected, Total cases is automatically displayed in the crosstab whenever you select any ‘total’ metric, i.e., a metric whose percentages are based on the total number of cases rather than on valid cases only.
  • Missing cases
    It is the number of non-relatable cases, i.e., the number of values present in the stub that have no corresponding value in the banner. Missing cases are equal to Total cases minus Valid cases.

    In the case of the intersection of two single-choice questions, it can be considered as the number of respondents who answered the stub question but did not answer the banner question (which, for example, might not have been displayed in some interviews due to skips or display conditions). To understand how to interpret the data in the case of the intersection with multiple-choice questions, refer to the section ‘Important Note on Multiple-Choice Question Variables’.
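
A minimal sketch of the three case counts for two single-choice questions follows. It relies on the definitions of Valid cases and Total cases given above and on the stated relation Missing cases = Total cases minus Valid cases. The interview records are invented, and `None` is used here as a hypothetical marker for an unanswered question.

```python
# Hypothetical single-choice answers; None means the question was not answered.
interviews = [
    {"banner": "Primary", "stub": "Little"},
    {"banner": None, "stub": "A lot"},       # banner question skipped
    {"banner": "Secondary", "stub": None},   # stub question skipped
    {"banner": "Secondary", "stub": "A lot"},
]

# Total cases: cases present in the stub, regardless of the banner.
total_cases = sum(1 for i in interviews if i["stub"] is not None)

# Valid cases: cases present in both the banner and the stub.
valid_cases = sum(
    1 for i in interviews if i["stub"] is not None and i["banner"] is not None
)

# Missing cases: Total cases minus Valid cases.
missing_cases = total_cases - valid_cases
print(total_cases, valid_cases, missing_cases)
```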

Distributed Count Metrics

The following metrics are also available in the “bucketed” version. The bucketed versions of these metrics add tables related to the groupings set in the stub variable.

  • Count
    Shows the cross-tabulation table with the number of valid cases distributed across the columns.
  • Row percentage
    Shows the cross-tabulation table with row distribution percentages. The sum of all row percentages is equal to 100% (in the case of cross-tabulation with multiple-choice questions as banners, the row percentage may exceed 100%). By convention, 0% is indicated in the row total if there are no cases in the row.
  • Column percentage (valid) and Column percentage (total)
    Shows the cross-tabulation table with column distribution percentages. The sum of column percentages is equal to 100%.
    The “valid” version considers only relatable cases present in both the banner and the stub (valid cases). The “total” version considers all cases present in the stub, regardless of whether there is a corresponding response in the banner (Total cases, equal to valid cases + missing cases). In the case of cross-tabulation with multiple-choice questions as stub, the percentage may exceed 100%.
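
The row and column percentages can be sketched from a table of counts as shown below. The counts and function names are hypothetical. The column version uses each column's valid cases as its base, and the 0% convention for empty rows follows the description above. The “total” variant is not reproduced because its exact base is not specified in this article.

```python
def row_percentages(counts):
    """Share of each cell within its row; rows with no cases are shown as 0%."""
    result = {}
    for stub, row in counts.items():
        row_total = sum(row.values())
        result[stub] = {
            banner: (100.0 * n / row_total if row_total else 0.0)
            for banner, n in row.items()
        }
    return result

def column_percentages(counts):
    """Share of each cell within its column, based on the column's valid cases."""
    banners = list(next(iter(counts.values())).keys())
    column_totals = {b: sum(row[b] for row in counts.values()) for b in banners}
    return {
        stub: {
            b: (100.0 * n / column_totals[b] if column_totals[b] else 0.0)
            for b, n in row.items()
        }
        for stub, row in counts.items()
    }

# Hypothetical counts: stub values as rows, banner values as columns.
counts = {
    "Little": {"Primary": 6, "Secondary": 2},
    "A lot": {"Primary": 4, "Secondary": 8},
}
row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
print(row_pct["Little"], col_pct["Little"])
```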

Metrics of statistical processing based on distributed counts

The following metrics are also available in the “bucketed” version. The bucketed versions of the following metrics add the processing results to the tables showing the bucketed counts of the stub variable.

  • Column stat test (valid) and Column stat test (total)
    The Column stat test applies the “pairwise z-test.” A z-test uses the standard error of the difference to assess whether a statistic measured on two data samples (here, the percentages of two columns) is significantly different. In practical terms, this test identifies columns in the crosstab with significant differences, suggesting, for example, that a specific response option to the banner question (column) leads to significant differences in responses to the stub question (rows).

    In the crosstab, a new row is inserted for each value of the stub variable. If the test detects significant differences, the letter corresponding to the column associated with these differences will be indicated.

    The “valid” version considers only the relatable cases present in both the banner and stub (valid cases). The “total” version considers all cases present in the stub question, regardless of whether there is a corresponding response in the banner question (Total cases, equal to valid cases + missing cases).
  • Overall stat test of percentages

    The Overall stat test of percentages calculates the p-value resulting from the Chi-square test. The Chi-square test is designed to assess the statistical significance of the relationship between two categorical variables. However, you can also leverage the Chi-square test with numeric banner variables by grouping values into buckets, where each grouping represents a category.

    The calculated p-value is displayed in a dedicated row at the bottom of the percentage crosstab. To activate the Overall stat test of percentages metric, it is necessary to have a Column percentage metric activated as well.
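
To give an idea of how a pairwise z-test on column percentages can produce column letters, here is a sketch based on the textbook pooled two-proportion z-test. This is an assumption: the exact formula IdSurvey applies is not given in this article. The counts, the function names, and the choice to mark a column with the letters of the columns it significantly exceeds are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # identical, degenerate proportions: no evidence of a difference
    z = (hits_a / n_a - hits_b / n_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def letters_for_row(hits, bases, confidence_level=0.95):
    """For each column, list the letters of columns with a significantly lower share."""
    alpha = 1 - confidence_level
    names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    marks = []
    for i in range(len(hits)):
        beaten = ""
        for j in range(len(hits)):
            if i == j:
                continue
            z, p = two_proportion_z(hits[i], bases[i], hits[j], bases[j])
            if z > 0 and p < alpha:
                beaten += names[j]
        marks.append(beaten)
    return marks

# Hypothetical stub row: cases in the row and valid cases for columns A, B, C.
marks = letters_for_row(hits=[60, 30, 25], bases=[100, 100, 100])
print(marks)
```

In this made-up row, column A is significantly higher than both B and C, while B and C do not differ significantly from each other.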

What it means and how to assess p-value
A p-value less than 0.05 (with a confidence level set at 95%) suggests that the observed frequencies in the crosstab are unlikely to be due to chance alone (technically, the null hypothesis of independence is rejected) and that there is a statistically significant association between the variables. The higher the confidence level, the smaller the p-value must be to suggest an association. For example, if the confidence level is set at 99%, the p-value must be less than 0.01 to reject the null hypothesis, meaning that frequencies at least this far from independence would occur by chance less than 1% of the time if the variables were unrelated. You can configure the significance threshold of the p-value by adjusting the Confidence Level from the Settings gear in the main bar of the Crosstab page.
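
The sketch below shows one way to compute a Chi-square p-value from a table of counts and compare it with the threshold implied by the confidence level. It uses the textbook Pearson statistic and only the standard library. The counts and function names are hypothetical, and the sketch is not a description of IdSurvey's internal computation. It assumes that no row or column of the table is empty.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def chi_square_p_value(statistic, dof):
    """Upper-tail probability of the chi-square distribution (integer dof >= 1)."""
    half = statistic / 2.0
    # Start from a closed form (dof 1 or 2), then climb with the
    # recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1).
    if dof % 2 == 0:
        a, tail = 1.0, math.exp(-half)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))
    while a < dof / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail

# Hypothetical counts: rows are stub values, columns are banner values.
table = [[60, 30, 25], [40, 70, 75]]
statistic, dof = chi_square_statistic(table)
p_value = chi_square_p_value(statistic, dof)

confidence_level = 0.95
print(p_value < 1 - confidence_level)
```

With these invented counts the p-value is far below 0.05, so the association would be reported as significant at the 95% confidence level.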

Central Tendency and Dispersion Metrics Based on Scores

To enable any of the following metrics, it is necessary to have scores set for the values of the stub variable (see Variable Settings). In the case of questions with response options with numerical codes, these will be automatically used as scores (e.g., Likert scale or other rating scales).

  • Mean
    It is the arithmetic mean of the scores. A row is displayed at the bottom of the crosstab showing the averages for each column.
  • Median
    It is the central value of the ordered score series. A row is displayed at the bottom of the crosstab showing the median for each column.
  • Variance
    It is a statistical parameter that measures dispersion or variability. It is equal to the square of the standard deviation. A row is displayed at the bottom of the crosstab showing the variance for each column.
  • Standard deviation
    It is a statistical parameter that measures dispersion or variability. It is equal to the square root of the variance. A row is displayed at the bottom of the crosstab showing the standard deviation for each column.
  • Standard error

    The standard error is a statistical measure of how precisely the sample mean estimates the population mean. It is derived from the dispersion of the scores and the number of cases. In other words, the standard error quantifies how much the mean of a sample can vary from one sample to another.
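
The score-based metrics for a single column can be sketched with Python's `statistics` module. The scores are invented, and the use of the sample formulas (n - 1 denominator) is an assumption, since this article does not state whether IdSurvey uses the sample or the population variance.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

# Hypothetical scores (e.g., Likert codes) of the cases in one banner column.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

column_mean = mean(scores)
column_median = median(scores)
column_variance = variance(scores)  # sample variance, n - 1 denominator (assumed)
column_sd = stdev(scores)           # square root of the sample variance
column_se = column_sd / sqrt(len(scores))  # standard error of the mean

print(column_mean, column_median, round(column_variance, 3),
      round(column_sd, 3), round(column_se, 3))
```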

Statistical processing metrics based on the averages of scores

  • Stat test of column means

    The stat test of column means applies the ‘pairwise z-test’ to column score values. Z-tests use standard deviation to assess whether the means of two sets of data samples are significantly different. In practical terms, this test identifies columns in the crosstab with statistically significant score differences, suggesting, for example, that a specific response option to the banner question (column) leads to significant differences in the score of a stub variable (rows).
  • Overall stat test mean

    The overall stat test mean functions as Analysis of Variance (ANOVA). ANOVA evaluates the relationship between a categorical variable associated with the banner and a numerical variable related to the stub, testing differences among two or more means of scores. This test generates a p-value to determine whether the relationship is significant or not. The p-value is displayed below the table of mean scores. For further insights, read the paragraph in this article titled ‘What it means and how to assess p-value.’
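
As a rough illustration of what a one-way ANOVA computes, the sketch below derives the F statistic from the scores of each banner column. The scores and the `anova_f` helper are hypothetical. Turning the F statistic into a p-value requires the F distribution (for instance `scipy.stats.f.sf` in SciPy), which is left out so that the sketch depends only on the standard library.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic with its degrees of freedom (between, within)."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the column means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own column mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores of the stub variable, one list per banner column.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = anova_f(groups)
print(round(f_stat, 3), df_between, df_within)
```

A large F statistic means that the column means differ more than the spread within each column would suggest.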

How to read statistical metric data

If we add the ‘Column stat test’ metric, we observe that new information is added to the table. This involves a pairwise Z-Test.

Read statistical metric data

In this example, we see that in the ‘Primary’ column, under the ‘Little’ row, the letters B and C are indicated. This signifies that respondents with a primary education level are significantly more likely to report spending little time on social media than those with ‘Secondary’ (B) or ‘Higher education’ (C) education levels.

In this example, an association emerges between the level of education and the declared time spent on social media. The strategy for interpreting the data in this example is common to all pairwise Z-Test metrics (Column stat test, Bucket column stat test, Stat test of column means).