`table_one`

Synopsis

"Table 1" is the common name for the first table in a paper, which summarizes demographic and other individual-level data of the population being studied. In general terms, it is a table in which different columns of the source table are summarized separately, with the summaries stacked along the rows. The analysis for each column can be chosen manually; otherwise it is selected automatically based on the column type. Optionally, the summaries can also be split into groups along the columns.

In this example, several variables of a hypothetical population are analyzed, split by sex.

using SummaryTables
using DataFrames

data = DataFrame(
    sex = ["m", "m", "m", "m", "f", "f", "f", "f", "f", "f"],
    age = [27, 45, 34, 85, 55, 44, 24, 29, 37, 76],
    blood_type = ["A", "0", "B", "B", "B", "A", "0", "A", "A", "B"],
    smoker = [true, false, false, false, true, true, true, false, false, false],
)

table_one(
    data,
    [:age => "Age (years)", :blood_type => "Blood type", :smoker => "Smoker"],
    groupby = :sex => "Sex",
    show_n = true
)


		Sex
	Total (n=10)	f (n=6)	m (n=4)

Age (years)
Mean (SD)	45.6 (20.7)	44.2 (19.1)	47.8 (25.9)
Median [Min, Max]	40.5 [24, 85]	40.5 [24, 76]	39.5 [27, 85]
Blood type
0	2 (20%)	1 (16.7%)	1 (25%)
A	4 (40%)	3 (50%)	1 (25%)
B	4 (40%)	2 (33.3%)	2 (50%)
Smoker
false	6 (60%)	3 (50%)	3 (75%)
true	4 (40%)	3 (50%)	1 (25%)

As a shortcut for quickly summarizing all columns of your dataset, you can omit the second argument. The columns used in groupby are excluded automatically:

table_one(
    data,
    groupby = :blood_type,
)


		blood_type
	Total	0	A	B

sex
f	6 (60%)	1 (50%)	3 (75%)	2 (50%)
m	4 (40%)	1 (50%)	1 (25%)	2 (50%)
age
Mean (SD)	45.6 (20.7)	34.5 (14.8)	34.2 (7.8)	62.5 (22.8)
Median [Min, Max]	40.5 [24, 85]	34.5 [24, 45]	33 [27, 44]	65.5 [34, 85]
smoker
false	6 (60%)	1 (50%)	2 (50%)	3 (75%)
true	4 (40%)	1 (50%)	2 (50%)	1 (25%)
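The default column selection used by this shortcut can be sketched in plain Julia. The helper `default_analysis_columns` below is hypothetical and not part of SummaryTables; it only illustrates the rule "all column names except those referenced in groupby" for a NamedTuple of vectors.

```julia
# Hypothetical helper, not part of SummaryTables: keeps every column name
# that is not used as a grouping column.
function default_analysis_columns(table::NamedTuple, groupby)
    group_columns = groupby isa Symbol ? [groupby] : collect(groupby)
    return [name for name in keys(table) if name ∉ group_columns]
end

data = (;
    sex = ["m", "f"],
    age = [27, 55],
    blood_type = ["A", "B"],
    smoker = [true, false],
)

default_analysis_columns(data, :blood_type)          # [:sex, :age, :smoker]
default_analysis_columns(data, [:sex, :blood_type])  # [:age, :smoker]
```

The first call yields the same row sections, in the same order, as the table above.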

Argument 1: `table`

The first argument can be any object that is a table compatible with the Tables.jl API. Here are some common examples:

`DataFrame`

using DataFrames
using SummaryTables

data = DataFrame(x = [1, 2, 3], y = ["4", "5", "6"])

table_one(data, [:x, :y])


	Total

x
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
4	1 (33.3%)
5	1 (33.3%)
6	1 (33.3%)

`NamedTuple` of `Vector`s

using SummaryTables

data = (; x = [1, 2, 3], y = ["4", "5", "6"])

table_one(data, [:x, :y])


	Total

x
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
4	1 (33.3%)
5	1 (33.3%)
6	1 (33.3%)

`Vector` of `NamedTuple`s

using SummaryTables

data = [(; x = 1, y = "4"), (; x = 2, y = "5"), (; x = 3, y = "6")]

table_one(data, [:x, :y])


	Total

x
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
4	1 (33.3%)
5	1 (33.3%)
6	1 (33.3%)

Optional argument 2: `analyses`

The second argument takes a vector specifying analyses, with one entry for each "row section" of the resulting table. If only one analysis is passed, the vector can be omitted. Each analysis can have up to three parts: the variable, the analysis function and the label.

For convenience, the analyses argument can be omitted. This is equivalent to passing Tables.columnnames(table), except that all columns referenced in groupby are filtered out.

The variable is passed as a Symbol or String, corresponding to a column in the input data, and must always be specified. The other two parts are optional.

If you specify only variables, the analysis functions are chosen automatically based on the column types, and the labels are equal to the variable names. Numeric variables show the mean, standard deviation, median, minimum and maximum. String variables and other non-numeric variables show the count and percentage of each distinct value.

using SummaryTables

data = (; x = [1, 2, 3], y = ["a", "b", "a"])

table_one(data, [:x, :y])


	Total

x
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
a	2 (66.7%)
b	1 (33.3%)
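The automatic choice can be approximated with the following sketch. The function `default_summary` is hypothetical, and the actual implementation inside SummaryTables may differ. Note that in the synopsis the Bool column smoker is summarized with counts, although Bool is a subtype of Number in Julia, so the sketch treats Bool as categorical.

```julia
using Statistics

# Hypothetical stand-in for the automatic analysis choice; returns a Tuple of
# `result => label` pairs, one per row of the section.
function default_summary(column)
    T = eltype(column)
    if T <: Number && !(T <: Bool)
        return (
            (mean(column), std(column)) => "Mean (SD)",
            (median(column), minimum(column), maximum(column)) => "Median [Min, Max]",
        )
    else
        n = length(column)
        # One row per distinct value: (count, percentage) => value
        return Tuple(
            (count(==(v), column), 100 * count(==(v), column) / n) => string(v)
            for v in sort(unique(column))
        )
    end
end

default_summary([1, 2, 3])        # mean 2.0, SD 1.0, median 2.0, range 1 to 3
default_summary(["a", "b", "a"])  # "a": 2 of 3 values, "b": 1 of 3 values
```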

In the next example, we rename the x variable by passing a String in a Pair.

using SummaryTables

data = (; x = [1, 2, 3], y = ["a", "b", "a"])

table_one(data, [:x => "Variable X", :y])


	Total

Variable X
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
a	2 (66.7%)
b	1 (33.3%)

Labels can be any type except <:Function (that type signals that an analysis function has been passed). One example of a non-string label is Concat in conjunction with Superscript.

using SummaryTables

data = (; x = [1, 2, 3], y = ["a", "b", "a"])

table_one(data, [:x => Concat("X", Superscript("with superscript")), :y])


	Total

X^{with superscript}
Mean (SD)	2 (1)
Median [Min, Max]	2 [1, 3]
y
a	2 (66.7%)
b	1 (33.3%)

Any object that is a subtype of Function is treated as an analysis function. An analysis function takes a data column as input and returns a Tuple in which each entry corresponds to one analysis row. Each entry is a Pair with the analysis result on the left and the row label on the right.

Here is an example of a custom analysis function for a numeric column. Note the use of Concat to build content out of multiple parts. This is preferred to interpolating into a string, because interpolation destroys the original objects and, with them, the possibility of automatic rounding or other special post-processing or display behavior.

using SummaryTables
using Statistics

data = (; x = [1, 2, 3])

function custom_analysis(column)
    (
        minimum(column) => "Minimum",
        maximum(column) => "Maximum",
        Concat(mean(column), " (", std(column), ")") => "Mean (SD)",
    )
end

table_one(data, :x => custom_analysis)


	Total

x
Minimum	1
Maximum	3
Mean (SD)	2 (1)
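Because an analysis function is an ordinary Julia function, it can be developed and checked on a plain vector before it is used in a table. The sketch below is a hypothetical function (its name and row labels are illustrative) that follows the same contract as custom_analysis and skips missing values before computing the mean.

```julia
using Statistics

# Hypothetical analysis function: takes a column and returns a Tuple of
# `result => label` pairs, one per analysis row.
function analysis_skipping_missing(column)
    present = collect(skipmissing(column))
    (
        length(present) => "N non-missing",
        count(ismissing, column) => "N missing",
        mean(present) => "Mean",
    )
end

analysis_skipping_missing([1, missing, 3])
# (2 => "N non-missing", 1 => "N missing", 2.0 => "Mean")
```

Assuming the column is handed to the function unchanged, such a function could be passed in the same way as custom_analysis above, for example as `:x => analysis_skipping_missing`.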

Finally, all three parts (variable, analysis function and label) can be combined:

using SummaryTables
using Statistics

data = (; x = [1, 2, 3])

function custom_analysis(column)
    (
        minimum(column) => "Minimum",
        maximum(column) => "Maximum",
        Concat(mean(column), " (", std(column), ")") => "Mean (SD)",
    )
end

table_one(data, :x => custom_analysis => "Variable X")


	Total

Variable X
Minimum	1
Maximum	3
Mean (SD)	2 (1)
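The four accepted forms of an analysis entry can be told apart by type alone, which is why a label must not be a Function. The helper `split_spec` below is a hypothetical sketch of that rule, not the parser used by SummaryTables. It relies on `=>` being right-associative in Julia, so `:x => f => label` is a Pair whose second element is itself a Pair.

```julia
# Hypothetical helper mirroring the rules of this section. `func = nothing`
# stands for "choose the analysis automatically".
split_spec(variable::Union{Symbol,AbstractString}) =
    (variable = Symbol(variable), func = nothing, label = variable)

function split_spec(spec::Pair)
    variable, rest = spec
    if rest isa Function        # :x => f
        return (variable = Symbol(variable), func = rest, label = variable)
    elseif rest isa Pair        # :x => f => label
        return (variable = Symbol(variable), func = first(rest), label = last(rest))
    else                        # :x => label (anything that is not a Function)
        return (variable = Symbol(variable), func = nothing, label = rest)
    end
end

split_spec(:y)                            # automatic analysis, label :y
split_spec(:x => "Variable X")            # automatic analysis, custom label
split_spec(:x => maximum)                 # custom analysis, label :x
split_spec(:x => maximum => "Largest X")  # custom analysis and custom label
```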

Keyword: `groupby`

The groupby keyword takes a vector of column name symbols with optional labels. If there is only one grouping column, the vector can be omitted. Each analysis is then computed separately for each group.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "b", "b", "b"])

table_one(data, :x, groupby = :y)


		y
	Total	a	b

x
Mean (SD)	3.5 (1.87)	2 (1)	5 (1)
Median [Min, Max]	3.5 [1, 6]	2 [1, 3]	5 [4, 6]

In this example, we rename the grouping column:

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "b", "b", "b"])

table_one(data, :x, groupby = :y => "Column Y")


		Column Y
	Total	a	b

x
Mean (SD)	3.5 (1.87)	2 (1)	5 (1)
Median [Min, Max]	3.5 [1, 6]	2 [1, 3]	5 [4, 6]

If there are multiple grouping columns, they are shown in a nested fashion, with the first group at the highest level:

using SummaryTables

data = (;
    x = [1, 2, 3, 4, 5, 6],
    y = ["a", "a", "b", "b", "c", "c"],
    z = ["d", "e", "d", "e", "d", "e"],
)

table_one(data, :x, groupby = [:y, :z => "Column Z"])


		y
		a		b		c
		Column Z		Column Z		Column Z
	Total	d	e	d	e	d	e

x
Mean (SD)	3.5 (1.87)	1 (NaN)	2 (NaN)	3 (NaN)	4 (NaN)	5 (NaN)	6 (NaN)
Median [Min, Max]	3.5 [1, 6]	1 [1, 1]	2 [2, 2]	3 [3, 3]	4 [4, 4]	5 [5, 5]	6 [6, 6]
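The NaN entries in this table follow from the data: every combination of y and z contains a single observation, and the sample standard deviation of one value is undefined. The sketch below reproduces this with the Statistics standard library, assuming the default (corrected) `std` is what produces the SD column.

```julia
using Statistics

x = [1, 2, 3, 4, 5, 6]
y = ["a", "a", "b", "b", "c", "c"]
z = ["d", "e", "d", "e", "d", "e"]

# Number of observations per (y, z) combination: each occurs exactly once.
combos = collect(zip(y, z))
group_sizes = Dict(key => count(==(key), combos) for key in unique(combos))

# The sample standard deviation divides by n - 1, which is 0 for n = 1.
std([1])  # NaN
std(x)    # ≈ 1.87, as in the "Total" column
```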

Keyword: `show_n`

When show_n is set to true, the size of each group is shown under its name.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "a", "b", "b"])

table_one(data, :x, groupby = :y, show_n = true)


		y
	Total (n=6)	a (n=4)	b (n=2)

x
Mean (SD)	3.5 (1.87)	2.5 (1.29)	5.5 (0.707)
Median [Min, Max]	3.5 [1, 6]	2.5 [1, 4]	5.5 [5, 6]

Keyword: `show_total`

When show_total is set to false, the column summarizing all groups together is hidden. Use this only when groupby is set; otherwise, the resulting table will be empty.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "a", "b", "b"])

table_one(data, :x, groupby = :y, show_total = false)


	y
	a	b

x
Mean (SD)	2.5 (1.29)	5.5 (0.707)
Median [Min, Max]	2.5 [1, 4]	5.5 [5, 6]

Keyword: `total_name`

The object that is used to label total columns. It can be any value that SummaryTables knows how to display.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "a", "b", "b"])

table_one(data, :x, groupby = :y, total_name = "Overall")


		y
	Overall	a	b

x
Mean (SD)	3.5 (1.87)	2.5 (1.29)	5.5 (0.707)
Median [Min, Max]	3.5 [1, 6]	2.5 [1, 4]	5.5 [5, 6]

Keyword: `group_totals`

A Symbol or String, or a Vector{Symbol} or Vector{String}, specifying one or more groups for which to add subtotals. Any group except the topmost can be chosen here, because the topmost group is already handled by show_total.

using SummaryTables

data = (; x = 1:12, y = repeat(["a", "b"], 6), z = repeat(["c", "d"], inner = 6))

table_one(data, :x, groupby = [:y, :z], group_totals = :z)


		y
		a			b
		z			z
	Total	c	d	Total	c	d	Total

x
Mean (SD)	6.5 (3.61)	3 (2)	9 (2)	6 (3.74)	4 (2)	10 (2)	7 (3.74)
Median [Min, Max]	6.5 [1, 12]	3 [1, 5]	9 [7, 11]	6 [1, 11]	4 [2, 6]	10 [8, 12]	7 [2, 12]
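To see what the subtotal columns contain, the values can be recomputed by hand. The helpers `cell` and `subtotal` below are hypothetical and only pool the observations; the subtotal for z within a level of y is the summary over all z levels of that y level.

```julia
using Statistics

x = 1:12
y = repeat(["a", "b"], 6)
z = repeat(["c", "d"], inner = 6)

# Observations of one innermost cell, i.e. one (y, z) combination.
cell(yv, zv) = [xi for (xi, yi, zi) in zip(x, y, z) if yi == yv && zi == zv]
# Observations of one level of y, pooled over all levels of z.
subtotal(yv) = [xi for (xi, yi) in zip(x, y) if yi == yv]

mean(cell("a", "c")), std(cell("a", "c"))  # (3.0, 2.0)
mean(subtotal("a")), std(subtotal("a"))    # (6.0, ≈ 3.74)
mean(subtotal("b")), std(subtotal("b"))    # (7.0, ≈ 3.74)
```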

This example shows group totals on multiple levels. To keep the resulting table from becoming too wide, the topmost factor q has only one level, which would otherwise make it redundant.

using SummaryTables

data = (; x = 1:12, y = repeat(["a", "b"], 6), z = repeat(["c", "d"], inner = 6), q = repeat(["e"], 12))

table_one(data, :x, groupby = [:q, :y, :z], group_totals = [:y, :z])


		q
		e
		y
		a			b			Total
		z			z
	Total	c	d	Total	c	d	Total

x
Mean (SD)	6.5 (3.61)	3 (2)	9 (2)	6 (3.74)	4 (2)	10 (2)	7 (3.74)	6.5 (3.61)
Median [Min, Max]	6.5 [1, 12]	3 [1, 5]	9 [7, 11]	6 [1, 11]	4 [2, 6]	10 [8, 12]	7 [2, 12]	6.5 [1, 12]

Keyword: `sort`

By default, group entries are sorted. If you need to maintain the order of entries from your dataset, set sort = false.

Notice how, in the following two examples, the groups appear as "dos", "tres", "uno" when sorted, but as "uno", "dos", "tres" when not sorted. These words mean "two", "three" and "one" in Spanish, so their alphabetical order differs from their natural order. To preserve the natural order in which they appear in the data, we need to set sort = false.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["uno", "uno", "dos", "dos", "tres", "tres"])

table_one(data, :x, groupby = :y)


		y
	Total	dos	tres	uno

x
Mean (SD)	3.5 (1.87)	3.5 (0.707)	5.5 (0.707)	1.5 (0.707)
Median [Min, Max]	3.5 [1, 6]	3.5 [3, 4]	5.5 [5, 6]	1.5 [1, 2]

table_one(data, :x, groupby = :y, sort = false)


		y
	Total	uno	dos	tres

x
Mean (SD)	3.5 (1.87)	1.5 (0.707)	3.5 (0.707)	5.5 (0.707)
Median [Min, Max]	3.5 [1, 6]	1.5 [1, 2]	3.5 [3, 4]	5.5 [5, 6]
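The two column orders correspond to the following plain Julia operations on the grouping column. This is only an illustration of the resulting order, not a description of how SummaryTables computes it.

```julia
y = ["uno", "uno", "dos", "dos", "tres", "tres"]

sort(unique(y))  # ["dos", "tres", "uno"]: alphabetical order, as with the default
unique(y)        # ["uno", "dos", "tres"]: order of first appearance, as with sort = false
```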

Warning

If you have multiple groups, sort = false can lead to splitting of higher-level groups if they are not correctly ordered in the source data.

Compare the following two tables. In the second one, the group "A" is interrupted by "B", so its label appears twice.

using SummaryTables

data = (; x = [1, 2, 3, 4, 5, 6], y = ["A", "A", "B", "B", "B", "A"], z = ["C", "C", "C", "D", "D", "D"])

table_one(data, :x, groupby = [:y, :z])


		y
		A		B
		z		z
	Total	C	D	C	D

x
Mean (SD)	3.5 (1.87)	1.5 (0.707)	6 (NaN)	3 (NaN)	4.5 (0.707)
Median [Min, Max]	3.5 [1, 6]	1.5 [1, 2]	6 [6, 6]	3 [3, 3]	4.5 [4, 5]

table_one(data, :x, groupby = [:y, :z], sort = false)


		y
		A	B		A
		z	z		z
	Total	C	C	D	D

x
Mean (SD)	3.5 (1.87)	1.5 (0.707)	3 (NaN)	4.5 (0.707)	6 (NaN)
Median [Min, Max]	3.5 [1, 6]	1.5 [1, 2]	3 [3, 3]	4.5 [4, 5]	6 [6, 6]
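Whether a dataset will produce such a split can be checked before building the table. The sketch below assumes that, with sort = false, the columns follow the order in which each group combination first appears in the data, which matches the table above. The helper `header_runs` is hypothetical.

```julia
y = ["A", "A", "B", "B", "B", "A"]
z = ["C", "C", "C", "D", "D", "D"]

# Group combinations in order of first appearance and in sorted order.
unsorted = unique(zip(y, z))  # [("A", "C"), ("B", "C"), ("B", "D"), ("A", "D")]
sorted = sort(unsorted)       # [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]

# Collapse consecutive repeats of the top-level label to get the header cells.
function header_runs(groups)
    labels = first.(groups)
    return [l for (i, l) in enumerate(labels) if i == 1 || l != labels[i - 1]]
end

header_runs(unsorted)  # ["A", "B", "A"]: "A" is split
header_runs(sorted)    # ["A", "B"]
```

If the split is unwanted, reordering the rows so that each top-level group is contiguous before calling table_one with sort = false should avoid it.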