`summarytable`

Synopsis

A summary table summarizes the raw data from one column of a source table for different groups defined by grouping columns. It is similar to a listingtable, but each cell shows summary statistics in place of the raw values.

Here is an example of a hypothetical clinical trial with drug concentration measurements from two participants at five time points each.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    concentration = [1.2, 4.5, 2.0, 1.5, 0.1, 1.8, 3.2, 1.8, 1.2, 0.2],
    id = repeat([1, 2], inner = 5),
    time = repeat([0, 0.5, 1, 2, 3], 2)
)

summarytable(
    data,
    :concentration => "Concentration (ng/mL)",
    cols = :time => "Time (hr)",
    summary = [
        length => "N",
        mean => "Mean",
        std => "SD",
    ]
)


	Time (hr)
	0	0.5	1	2	3
	Concentration (ng/mL)

N	2	2	2	2	2
Mean	1.5	3.85	1.9	1.35	0.15
SD	0.424	0.919	0.141	0.212	0.0707
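To make the grouping step concrete, the numbers in the table above can be reproduced by hand with Base Julia and Statistics. This is only a sketch of the idea, not a description of how SummaryTables computes its cells; the name by_hand is made up for illustration.

```julia
using Statistics

concentration = [1.2, 4.5, 2.0, 1.5, 0.1, 1.8, 3.2, 1.8, 1.2, 0.2]
time = repeat([0, 0.5, 1, 2, 3], 2)

# For each distinct time point (in sorted order), collect the matching
# concentrations and reduce them with the same three summary functions.
by_hand = map(sort(unique(time))) do t
    vals = concentration[time .== t]
    (time = t, N = length(vals), Mean = mean(vals), SD = std(vals))
end

# Print one line per time point; each line corresponds to one table column.
for row in by_hand
    println(row)
end
```

Each printed row corresponds to one column of the table: one group, with one value per summary function.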

Argument 1: `table`

The first argument can be any object that is a table compatible with the Tables.jl API. Here are some common examples:

`DataFrame`

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:6,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value, cols = :group, summary = [mean, std])


	group
	A	B	C
	value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

`NamedTuple` of `Vector`s

using SummaryTables
using Statistics

data = (; value = 1:6, group = repeat(["A", "B", "C"], 2))

summarytable(data, :value, cols = :group, summary = [mean, std])


	group
	A	B	C
	value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

`Vector` of `NamedTuple`s

using SummaryTables
using Statistics

data = [
    (value = 1, group = "A"),
    (value = 2, group = "B"),
    (value = 3, group = "C"),
    (value = 4, group = "A"),
    (value = 5, group = "B"),
    (value = 6, group = "C"),
]

summarytable(data, :value, cols = :group, summary = [mean, std])


	group
	A	B	C
	value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12
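If you are unsure whether an object qualifies as a table, one way to check is with the Tables.jl package itself. The sketch below assumes Tables.jl is installed; the variable names columns and rows are illustrative only.

```julia
using Tables

# Column-oriented table: a NamedTuple of vectors.
columns = (; value = 1:6, group = repeat(["A", "B", "C"], 2))

# Row-oriented table: a Vector of NamedTuples holding the same data.
rows = [(value = v, group = g) for (v, g) in zip(columns.value, columns.group)]

# Both shapes are recognized as tables by Tables.jl.
println(Tables.istable(columns))
println(Tables.istable(rows))

# Rows can be converted to the column-oriented form.
as_columns = Tables.columntable(rows)
println(as_columns.value)
```

Both forms carry the same data, which is consistent with the identical output tables shown above.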

Argument 2: `variable`

The second argument primarily selects the table column whose data should populate the cells of the summary table. The column name is specified with a Symbol or String:

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value1 = 1:6,
    value2 = 7:12,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value1, cols = :group, summary = [mean, std])


	group
	A	B	C
	value1

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

Here we choose to summarize column :value2 instead:

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value1 = 1:6,
    value2 = 7:12,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value2, cols = :group, summary = [mean, std])


	group
	A	B	C
	value2

mean	8.5	9.5	10.5
std	2.12	2.12	2.12

By default, the variable name is also used as the label. You can pass a different label as the second element of a Pair, using the => operator. The label can be of any type (refer to Types of cell values for a list).

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value1 = 1:6,
    value2 = 7:12,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value1 => "Value", cols = :group, summary = [mean, std])


	group
	A	B	C
	Value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

Keyword: `rows`

The rows keyword determines the grouping structure along the rows. It can either be a Symbol or String specifying a grouping column, a Pair{Symbol,Any} or Pair{String,Any} where the second element overrides the group's label, or a Vector with multiple groups of the aforementioned format.

This example uses a single group with default label.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:6,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value, rows = :group, summary = [mean, std])


group		value

A	mean	2.5
A	std	2.12
B	mean	3.5
B	std	2.12
C	mean	4.5
C	std	2.12

The label can be overridden by passing a Pair, written with the => operator.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:6,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value, rows = :group => "Group", summary = [mean, std])


Group		value

A	mean	2.5
A	std	2.12
B	mean	3.5
B	std	2.12
C	mean	4.5
C	std	2.12

Multiple groups are possible as well. In that case you get a nested display where the last group changes the fastest.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:12,
    group1 = repeat(["A", "B"], inner = 6),
    group2 = repeat(["C", "D", "E"], 4),
)

summarytable(data, :value, rows = [:group1, :group2 => "Group 2"], summary = [mean, std])


group1	Group 2		value

A	C	mean	2.5
	C	std	2.12
	D	mean	3.5
	D	std	2.12
	E	mean	4.5
	E	std	2.12
B	C	mean	8.5
	C	std	2.12
	D	mean	9.5
	D	std	2.12
	E	mean	10.5
	E	std	2.12
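The phrase "the last group changes the fastest" describes the same order you get when sorting the group combinations as tuples. The following sketch uses only Base Julia and reproduces the row order of the table above; it illustrates the ordering and is not taken from the package's implementation. The name combos is made up.

```julia
group1 = repeat(["A", "B"], inner = 6)
group2 = repeat(["C", "D", "E"], 4)

# Tuples compare element by element, so sorting the unique
# (group1, group2) pairs varies group2 fastest within each group1.
combos = sort(unique(collect(zip(group1, group2))))

for combo in combos
    println(combo)
end
```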

Keyword: `cols`

The cols keyword determines the grouping structure along the columns. It can either be a Symbol or String specifying a grouping column, a Pair{Symbol,Any} or Pair{String,Any} where the second element overrides the group's label, or a Vector with multiple groups of the aforementioned format.

This example uses a single group with default label.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:6,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value, cols = :group, summary = [mean, std])


	group
	A	B	C
	value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

The label can be overridden by passing a Pair, written with the => operator.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:6,
    group = repeat(["A", "B", "C"], 2),
)

summarytable(data, :value, cols = :group => "Group", summary = [mean, std])


	Group
	A	B	C
	value

mean	2.5	3.5	4.5
std	2.12	2.12	2.12

Multiple groups are possible as well. In that case you get a nested display where the last group changes the fastest.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:12,
    group1 = repeat(["A", "B"], inner = 6),
    group2 = repeat(["C", "D", "E"], 4),
)

summarytable(data, :value, cols = [:group1, :group2 => "Group 2"], summary = [mean, std])


	group1
	A			B
	Group 2			Group 2
	C	D	E	C	D	E
	value

mean	2.5	3.5	4.5	8.5	9.5	10.5
std	2.12	2.12	2.12	2.12	2.12	2.12

Keyword: `summary`

This keyword takes a list of aggregation functions which are used to summarize the chosen variable. A summary function should take a vector of values (usually that will be numbers) and output one summary value. This value can be of any type that SummaryTables can show in a cell (refer to Types of cell values for a list).

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:24,
    group1 = repeat(["A", "B", "C", "D"], 6),
    group2 = repeat(["E", "F", "G"], inner = 8),
)

mean_sd(values) = Concat(mean(values), " (", std(values), ")")

summarytable(
    data,
    :value,
    rows = :group1,
    cols = :group2,
    summary = [
        mean,
        std => "SD",
        mean_sd => "Mean (SD)",
    ]
)


		group2
		E	F	G
group1		value

A	mean	3	11	19
	SD	2.83	2.83	2.83
	Mean (SD)	3 (2.83)	11 (2.83)	19 (2.83)
B	mean	4	12	20
	SD	2.83	2.83	2.83
	Mean (SD)	4 (2.83)	12 (2.83)	20 (2.83)
C	mean	5	13	21
	SD	2.83	2.83	2.83
	Mean (SD)	5 (2.83)	13 (2.83)	21 (2.83)
D	mean	6	14	22
	SD	2.83	2.83	2.83
	Mean (SD)	6 (2.83)	14 (2.83)	22 (2.83)
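Any function that maps a vector to a single value fits this pattern. As a small sketch, here are two hypothetical helpers (value_range and iqr_width are made-up names, not part of SummaryTables) that can be tried on a plain vector before being used in a table.

```julia
using Statistics

# Hypothetical helpers: each takes a vector and returns one number.
value_range(vals) = maximum(vals) - minimum(vals)
iqr_width(vals) = quantile(vals, 0.75) - quantile(vals, 0.25)

# Try them on a plain vector first.
vals = [1, 5, 9, 13]
println(value_range(vals))
println(iqr_width(vals))
```

Functions like these could then be passed in the same way as mean and std above, for example summary = [value_range => "Range", iqr_width => "IQR"].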

Keyword: `variable_header`

Setting variable_header = false hides the header cell that contains the variable label, which makes the table layout a little more compact.

Here is a table with the header cell:

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:24,
    group1 = repeat(["A", "B", "C", "D"], 6),
    group2 = repeat(["E", "F", "G"], inner = 8),
)

summarytable(
    data,
    :value,
    rows = :group1,
    cols = :group2,
    summary = [mean, std],
)


		group2
		E	F	G
group1		value

A	mean	3	11	19
A	std	2.83	2.83	2.83
B	mean	4	12	20
B	std	2.83	2.83	2.83
C	mean	5	13	21
C	std	2.83	2.83	2.83
D	mean	6	14	22
D	std	2.83	2.83	2.83

And here is a table without it:

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:24,
    group1 = repeat(["A", "B", "C", "D"], 6),
    group2 = repeat(["E", "F", "G"], inner = 8),
)

summarytable(
    data,
    :value,
    rows = :group1,
    cols = :group2,
    summary = [mean, std],
    variable_header = false,
)


		group2
group1		E	F	G

A	mean	3	11	19
A	std	2.83	2.83	2.83
B	mean	4	12	20
B	std	2.83	2.83	2.83
C	mean	5	13	21
C	std	2.83	2.83	2.83
D	mean	6	14	22
D	std	2.83	2.83	2.83

Keyword: `sort`

By default, group entries are sorted. If you need to maintain the order of entries from your dataset, set sort = false.

Notice how in the following two examples the group labels appear in the order "dos", "tres", "uno" when sorted, but "uno", "dos", "tres" when not sorted. These are the Spanish words for "one", "two" and "three", whose alphabetical order differs from their numeric order. To preserve the natural order of these groups, we need to set sort = false.

using DataFrames
using SummaryTables
using Statistics

data = DataFrame(
    value = 1:18,
    group1 = repeat(["uno", "dos", "tres"], inner = 6),
    group2 = repeat(["cuatro", "cinco"], 9),
)

summarytable(data, :value, rows = :group1, cols = :group2, summary = [mean, std])


		group2
		cinco	cuatro
group1		value

dos	mean	10	9
dos	std	2	2
tres	mean	16	15
tres	std	2	2
uno	mean	4	3
uno	std	2	2

summarytable(data, :value, rows = :group1, cols = :group2, summary = [mean, std], sort = false)


		group2
		cuatro	cinco
group1		value

uno	mean	3	4
uno	std	2	2
dos	mean	9	10
dos	std	2	2
tres	mean	15	16
tres	std	2	2
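The difference between the two orders can be seen on the grouping column alone. This sketch uses only Base Julia; the variable names are illustrative.

```julia
group1 = repeat(["uno", "dos", "tres"], inner = 6)

# Alphabetical order of the distinct labels (what sorting produces).
sorted_order = sort(unique(group1))

# Order of first appearance in the data (what is kept without sorting).
appearance_order = unique(group1)

println(sorted_order)
println(appearance_order)
```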

Warning

If you have multiple groups, sort = false can lead to splitting of higher-level groups if they are not correctly ordered in the source data.

Compare the following two tables. In the second one, the group "A" is split by "B", so "A" appears in two separate places (in the first and the last row) instead of forming one contiguous block.

using SummaryTables
using DataFrames
using Statistics

data = DataFrame(
    value = 1:4,
    group1 = ["A", "B", "B", "A"],
    group2 = ["C", "D", "C", "D"],
)

summarytable(data, :value, rows = [:group1, :group2], summary = [mean])


group1	group2		value

A	C	mean	1
A	D	mean	4
B	C	mean	3
B	D	mean	2

data = DataFrame(
    value = 1:4,
    group1 = ["A", "B", "B", "A"],
    group2 = ["C", "D", "C", "D"],
)

summarytable(data, :value, rows = [:group1, :group2], summary = [mean], sort = false)


group1	group2		value

A	C	mean	1
B	D	mean	2
B	C	mean	3
A	D	mean	4
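One possible way to avoid the split while still keeping the first-appearance order is to reorder the rows so that equal higher-level groups are contiguous before building the table. The sketch below uses only Base Julia; first_seen and perm are made-up names, and it relies on sortperm returning a stable permutation.

```julia
value  = [1, 2, 3, 4]
group1 = ["A", "B", "B", "A"]
group2 = ["C", "D", "C", "D"]

# Rank each higher-level group by where it first appears in the data.
first_seen = Dict(g => i for (i, g) in enumerate(unique(group1)))

# A stable permutation keeps the original row order within each group.
perm = sortperm([first_seen[g] for g in group1])

println(group1[perm])
println(group2[perm])
println(value[perm])
```

Building the source table from the reordered vectors should keep all "A" rows together when sort = false is used, since the groups are then already contiguous in the data.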