table_one
Synopsis
"Table 1" is a common term for the first table in a paper, which summarizes demographic and other individual data of the population being studied. In general terms, it is a table where different columns from the source table are summarized separately, stacked along the rows. The analyses can be chosen manually; otherwise they are selected automatically based on the column types. Optionally, grouping can be applied along the columns as well.
In this example, several variables of a hypothetical population are analyzed, split by sex.
using SummaryTables
using DataFrames
data = DataFrame(
    sex = ["m", "m", "m", "m", "f", "f", "f", "f", "f", "f"],
    age = [27, 45, 34, 85, 55, 44, 24, 29, 37, 76],
    blood_type = ["A", "0", "B", "B", "B", "A", "0", "A", "A", "B"],
    smoker = [true, false, false, false, true, true, true, false, false, false],
)
table_one(
    data,
    [:age => "Age (years)", :blood_type => "Blood type", :smoker => "Smoker"],
    groupby = :sex => "Sex",
    show_n = true
)
Sex | |||
Overall (n=10) | f (n=6) | m (n=4) | |
Age (years) | |||
Mean (SD) | 45.6 (20.7) | 44.2 (19.1) | 47.8 (25.9) |
Median [Min, Max] | 40.5 [24, 85] | 40.5 [24, 76] | 39.5 [27, 85] |
Blood type | |||
0 | 2 (20%) | 1 (16.7%) | 1 (25%) |
A | 4 (40%) | 3 (50%) | 1 (25%) |
B | 4 (40%) | 2 (33.3%) | 2 (50%) |
Smoker | |||
false | 6 (60%) | 3 (50%) | 3 (75%) |
true | 4 (40%) | 3 (50%) | 1 (25%) |
Argument 1: table
The first argument can be any object that is a table compatible with the Tables.jl API. Here are some common examples:
DataFrame
using DataFrames
using SummaryTables
data = DataFrame(x = [1, 2, 3], y = ["4", "5", "6"])
table_one(data, [:x, :y])
Overall | |
x | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
4 | 1 (33.3%) |
5 | 1 (33.3%) |
6 | 1 (33.3%) |
NamedTuple of Vectors
using SummaryTables
data = (; x = [1, 2, 3], y = ["4", "5", "6"])
table_one(data, [:x, :y])
Overall | |
x | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
4 | 1 (33.3%) |
5 | 1 (33.3%) |
6 | 1 (33.3%) |
Vector of NamedTuples
using SummaryTables
data = [(; x = 1, y = "4"), (; x = 2, y = "5"), (; x = 3, y = "6")]
table_one(data, [:x, :y])
Overall | |
x | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
4 | 1 (33.3%) |
5 | 1 (33.3%) |
6 | 1 (33.3%) |
Argument 2: analyses
The second argument takes a vector specifying analyses, with one entry for each "row section" of the resulting table. If only one analysis is passed, the vector can be omitted. Each analysis can have up to three parts: the variable, the analysis function and the label.
The variable is passed as a Symbol corresponding to a column in the input data, and must always be specified. The other two parts are optional.
If you specify only variables, the analysis functions are chosen automatically based on the column types, and the labels are equal to the variable names. Numeric variables show the mean, standard deviation, median, minimum and maximum. String variables and other non-numeric variables show the count and percentage of each distinct value.
using SummaryTables
data = (; x = [1, 2, 3], y = ["a", "b", "a"])
table_one(data, [:x, :y])
Overall | |
x | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
a | 2 (66.7%) |
b | 1 (33.3%) |
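The numbers in this table can be checked by hand. The following is a minimal sketch in plain Julia, using only the Statistics standard library; it is not how SummaryTables computes its defaults internally, and the names numeric_summary, counts and percentages are made up for illustration.

```julia
using Statistics

x = [1, 2, 3]
y = ["a", "b", "a"]

# The five statistics shown for a numeric column.
numeric_summary = (
    mean = mean(x),
    sd = std(x),
    median = median(x),
    min = minimum(x),
    max = maximum(x),
)

# Count and percentage of each distinct value in a non-numeric column.
counts = Dict(v => count(==(v), y) for v in unique(y))
percentages = Dict(v => 100 * n / length(y) for (v, n) in counts)
```

The table rounds the percentages for display, which is why 100 * 2 / 3 appears as 66.7%.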
In the next example, we rename the x variable by passing a String in a Pair.
using SummaryTables
data = (; x = [1, 2, 3], y = ["a", "b", "a"])
table_one(data, [:x => "Variable X", :y])
Overall | |
Variable X | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
a | 2 (66.7%) |
b | 1 (33.3%) |
Labels can be any type except <:Function (that type signals that an analysis function has been passed). One example of a non-string label is Concat in conjunction with Superscript.
using SummaryTables
data = (; x = [1, 2, 3], y = ["a", "b", "a"])
table_one(data, [:x => Concat("X", Superscript("with superscript")), :y])
Overall | |
Xwith superscript | |
Mean (SD) | 2 (1) |
Median [Min, Max] | 2 [1, 3] |
y | |
a | 2 (66.7%) |
b | 1 (33.3%) |
Any object which is a subtype of Function is assumed to be an analysis function. An analysis function takes a data column as input and returns a Tuple where each entry corresponds to one analysis row. Each of these rows consists of a Pair where the left side is the analysis result and the right side is the label. Here's an example of a custom analysis function for a numeric column. Note the use of Concat to build content out of multiple parts. This is preferred to interpolating into a string because interpolation destroys the original objects and takes away the possibility of automatic rounding or other special post-processing or display behavior.
using SummaryTables
using Statistics
data = (; x = [1, 2, 3])
function custom_analysis(column)
    (
        minimum(column) => "Minimum",
        maximum(column) => "Maximum",
        Concat(mean(column), " (", std(column), ")") => "Mean (SD)",
    )
end
table_one(data, :x => custom_analysis)
Overall | |
x | |
Minimum | 1 |
Maximum | 3 |
Mean (SD) | 2 (1) |
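To illustrate why interpolation loses information, here is a small sketch in plain Julia. It does not use SummaryTables at all: a plain Tuple stands in for Concat, and the rounding to three significant digits is an assumption chosen for the demonstration, not a statement about the package's internal rounding rules.

```julia
using Statistics

x = [1, 2, 3, 4, 5, 6]

# Interpolation converts the numbers to text immediately, at full precision.
as_string = "$(mean(x)) ($(std(x)))"

# Keeping the parts separate (a Tuple as a stand-in for Concat) preserves
# the Float64 values, so they can still be rounded at display time.
as_parts = (mean(x), " (", std(x), ")")
rounded = join((p isa AbstractFloat ? round(p; sigdigits = 3) : p) for p in as_parts)
```

Here as_string is already a String and its numbers can no longer be formatted individually, while rounded is built only at the end and comes out as "3.5 (1.87)".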
Finally, all three parts (variable, analysis function and label) can be combined:
using SummaryTables
using Statistics
data = (; x = [1, 2, 3])
function custom_analysis(column)
    (
        minimum(column) => "Minimum",
        maximum(column) => "Maximum",
        Concat(mean(column), " (", std(column), ")") => "Mean (SD)",
    )
end
table_one(data, :x => custom_analysis => "Variable X")
Overall | |
Variable X | |
Minimum | 1 |
Maximum | 3 |
Mean (SD) | 2 (1) |
Keyword: groupby
The groupby keyword takes a vector of column name symbols with optional labels. If there is only one grouping column, the vector can be omitted. Each analysis is then computed separately for each group.
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "b", "b", "b"])
table_one(data, :x, groupby = :y)
y | |||
Overall | a | b | |
x | |||
Mean (SD) | 3.5 (1.87) | 2 (1) | 5 (1) |
Median [Min, Max] | 3.5 [1, 6] | 2 [1, 3] | 5 [4, 6] |
In this example, we rename the grouping column:
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "b", "b", "b"])
table_one(data, :x, groupby = :y => "Column Y")
Column Y | |||
Overall | a | b | |
x | |||
Mean (SD) | 3.5 (1.87) | 2 (1) | 5 (1) |
Median [Min, Max] | 3.5 [1, 6] | 2 [1, 3] | 5 [4, 6] |
If there are multiple grouping columns, they are shown in a nested fashion, with the first group at the highest level:
using SummaryTables
data = (;
    x = [1, 2, 3, 4, 5, 6],
    y = ["a", "a", "b", "b", "c", "c"],
    z = ["d", "e", "d", "e", "d", "e"],
)
table_one(data, :x, groupby = [:y, :z => "Column Z"])
y | |||||||
a | b | c | |||||
Column Z | Column Z | Column Z | |||||
Overall | d | e | d | e | d | e | |
x | |||||||
Mean (SD) | 3.5 (1.87) | 1 (NaN) | 2 (NaN) | 3 (NaN) | 4 (NaN) | 5 (NaN) | 6 (NaN) |
Median [Min, Max] | 3.5 [1, 6] | 1 [1, 1] | 2 [2, 2] | 3 [3, 3] | 4 [4, 4] | 5 [5, 5] | 6 [6, 6] |
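The NaN entries in this table arise because every nested group contains a single observation, and the sample standard deviation of one value is undefined. A quick check with the Statistics standard library, independent of SummaryTables:

```julia
using Statistics

# With one observation, the corrected sample variance divides by n - 1 = 0.
sd_single = std([1])

# Two or more observations give a finite result.
sd_pair = std([1, 2])
```

sd_single is NaN, whereas sd_pair is a finite number (about 0.707, as seen for two-element groups in other tables on this page).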
Keyword: show_n
When show_n is set to true, the size of each group is shown under its name.
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "a", "b", "b"])
table_one(data, :x, groupby = :y, show_n = true)
y | |||
Overall (n=6) | a (n=4) | b (n=2) | |
x | |||
Mean (SD) | 3.5 (1.87) | 2.5 (1.29) | 5.5 (0.707) |
Median [Min, Max] | 3.5 [1, 6] | 2.5 [1, 4] | 5.5 [5, 6] |
Keyword: show_overall
When show_overall is set to false, the column summarizing all groups together is hidden. Use this only when groupby is set; otherwise the resulting table will be empty.
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["a", "a", "a", "a", "b", "b"])
table_one(data, :x, groupby = :y, show_overall = false)
y | ||
a | b | |
x | ||
Mean (SD) | 2.5 (1.29) | 5.5 (0.707) |
Median [Min, Max] | 2.5 [1, 4] | 5.5 [5, 6] |
Keyword: sort
By default, group entries are sorted. If you need to maintain the order of entries from your dataset, set sort = false.
Notice how in the following two examples, the groups appear as "dos", "tres", "uno" when sorted, but as "uno", "dos", "tres" when not sorted. To preserve the natural order of these groups ("uno", "dos", "tres" mean "one", "two", "three" in Spanish, but sort differently alphabetically), we need to set sort = false.
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["uno", "uno", "dos", "dos", "tres", "tres"])
table_one(data, :x, groupby = :y)
y | ||||
Overall | dos | tres | uno | |
x | ||||
Mean (SD) | 3.5 (1.87) | 3.5 (0.707) | 5.5 (0.707) | 1.5 (0.707) |
Median [Min, Max] | 3.5 [1, 6] | 3.5 [3, 4] | 5.5 [5, 6] | 1.5 [1, 2] |
table_one(data, :x, groupby = :y, sort = false)
y | ||||
Overall | uno | dos | tres | |
x | ||||
Mean (SD) | 3.5 (1.87) | 1.5 (0.707) | 3.5 (0.707) | 5.5 (0.707) |
Median [Min, Max] | 3.5 [1, 6] | 1.5 [1, 2] | 3.5 [3, 4] | 5.5 [5, 6] |
If you have multiple groups, sort = false can lead to splitting of higher-level groups if they are not correctly ordered in the source data.
Compare the following two tables. In the second one, the group "A" is split by "B" so the label appears twice.
using SummaryTables
data = (; x = [1, 2, 3, 4, 5, 6], y = ["A", "A", "B", "B", "B", "A"], z = ["C", "C", "C", "D", "D", "D"])
table_one(data, :x, groupby = [:y, :z])
y | |||||
A | B | ||||
z | z | ||||
Overall | C | D | C | D | |
x | |||||
Mean (SD) | 3.5 (1.87) | 1.5 (0.707) | 6 (NaN) | 3 (NaN) | 4.5 (0.707) |
Median [Min, Max] | 3.5 [1, 6] | 1.5 [1, 2] | 6 [6, 6] | 3 [3, 3] | 4.5 [4, 5] |
table_one(data, :x, groupby = [:y, :z], sort = false)
y | |||||
A | B | A | |||
z | z | z | |||
Overall | C | C | D | D | |
x | |||||
Mean (SD) | 3.5 (1.87) | 1.5 (0.707) | 3 (NaN) | 4.5 (0.707) | 6 (NaN) |
Median [Min, Max] | 3.5 [1, 6] | 1.5 [1, 2] | 3 [3, 3] | 4.5 [4, 5] | 6 [6, 6] |
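One possible way to avoid such a split while keeping first-appearance order is to reorder the rows before building the table, so that all rows of each top-level group are contiguous. The sketch below uses only Base Julia on a NamedTuple of Vectors; the names first_seen, perm and reordered are illustrative, and it handles only the outermost grouping column. The final table_one call is left as a comment, on the assumption that contiguous input is enough to keep the "A" group together.

```julia
data = (;
    x = [1, 2, 3, 4, 5, 6],
    y = ["A", "A", "B", "B", "B", "A"],
    z = ["C", "C", "C", "D", "D", "D"],
)

# Rank each top-level group by its first appearance in the data.
first_seen = Dict(level => i for (i, level) in enumerate(unique(data.y)))

# sortperm returns a stable permutation, so rows within a group keep their order.
perm = sortperm([first_seen[v] for v in data.y])

# Apply the same permutation to every column of the NamedTuple.
reordered = map(col -> col[perm], data)

# The reordered data could then be passed on, for example:
# table_one(reordered, :x, groupby = [:y, :z], sort = false)
```

After reordering, the y column reads "A", "A", "A", "B", "B", "B", with the row that had x = 6 moved next to the other "A" rows.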