| Title: | Miscellanea Functions For Intro Stats Classes |
|---|---|
| Description: | Utilities for introductory statistics labs: plot tail or interval areas for the normal or t distributions, highlight binomial probabilities, and summarise discrete random variables. Calculate pooled standard deviations and Welch-Satterthwaite degrees of freedom, and run teaching-friendly tests such as Fligner-Killeen plus one-way ANOVA, and t tests with an Ansari-Bradley variance check alongside pooled and Welch variants. |
| Authors: | Adrian Dusa [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-3525-9253>) |
| Maintainer: | Adrian Dusa <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.14 |
| Built: | 2026-05-29 13:10:44 UTC |
| Source: | https://github.com/dusadrian/statistics |
Plot tail or interval areas for the normal or t distributions, and draw binomial distributions with highlighted outcome probabilities. Summarise discrete random variables via their mean and standard deviation. Compute pooled standard deviations and Welch-Satterthwaite degrees of freedom, then run teaching-friendly tests such as a Fligner-Killeen variance check plus one-way ANOVA, and t tests that pair an Ansari-Bradley variance check with both pooled and Welch variants.
| Package: | statistics |
| Type: | Package |
| Version: | 0.14 |
| Date: | 2026-05-12 |
| License: | GPL-v3 |
Adrian Dusa
Maintainer: Adrian Dusa ([email protected])
dnorm, pnorm, dbinom
Performs one-way ANOVA together with a Fligner-Killeen test for the homogeneity of variances. When the group variances are not considered homogeneous, the print method displays a Welch approximation table.

    anovahv(x, ...)

    ## Default S3 method:
    anovahv(
      x, y, var.equal = NULL, conf.level = 0.95, ...
    )

    ## S3 method for class 'formula'
    anovahv(
      formula, data, subset, na.action = na.omit,
      var.equal = NULL, conf.level = 0.95, ...
    )

| Argument | Description |
|---|---|
| x | A numeric response vector. |
| y | A grouping variable. Character variables and categorical variables of class |
| formula | A formula of the form |
| data | An optional data frame, matrix or list containing the variables in |
| subset | An optional vector specifying a subset of observations. |
| na.action | A function specifying how missing values should be handled by the formula method. |
| var.equal | Logical or |
| conf.level | Numeric value between 0 and 1. When |
| ... | Additional arguments passed to methods. |
The function is a teaching-oriented wrapper around aov,
lm, oneway.test and
fligner.test. It always stores both the classical ANOVA
result and the Welch approximation result. The print method chooses which one
to display.
The argument var.equal follows the same general idea as in
t.testhv. If it is left as NULL, the function performs a
Fligner-Killeen homogeneity of variances test and prints the classical ANOVA
table when the variances are considered homogeneous. If var.equal = TRUE
or var.equal = FALSE, the user explicitly chooses the classical ANOVA or
Welch output, respectively.
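The decision rule described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name choose_anova and the 0.05 threshold are hypothetical, since the source does not say which cut-off the Fligner-Killeen p-value is compared against.

```r
# Hypothetical helper illustrating the var.equal decision; not part of the package.
choose_anova <- function(values, groups, var.equal = NULL, alpha = 0.05) {
  groups <- as.factor(groups)
  homog <- fligner.test(values, groups)                     # homogeneity of variances
  classic <- aov(values ~ groups)                           # classical one-way ANOVA
  welch <- oneway.test(values ~ groups, var.equal = FALSE)  # Welch approximation

  # With var.equal = NULL, let the Fligner-Killeen p-value decide (assumed rule).
  if (is.null(var.equal)) {
    var.equal <- homog$p.value > alpha
  }

  list(
    homog_test = homog,
    test = classic,
    welch = welch,
    var.equal = var.equal,
    shown = if (var.equal) "classical" else "Welch"
  )
}

values <- c(
  15,  8, 17,  7, 26, 12,  8,
  11, 16,  9, 16, 24, 20, 19,
   9, 17, 11,  8, 15,  6, 14
)
groups <- rep(1:3, each = 7)

res <- choose_anova(values, groups)
res$shown
choose_anova(values, groups, var.equal = FALSE)$shown
```

Both results are always computed and kept in the list, which mirrors the idea that only the printed table depends on the variance decision.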
Grouping variables of class "declared" are detected and coerced with
as.factor() before fitting the models. This allows declared categorical
variables to be used directly as group variables in both the default and formula
interfaces.
An object of class "anovahv", containing:
homog_test: the Fligner-Killeen homogeneity of variances test;
test: the classical ANOVA model produced by aov;
welch: the Welch one-way test produced by oneway.test;
output_table: a compact Welch approximation table;
var.equal and conf.level: the analysis settings.
Adrian Dusa
aov, anova,
oneway.test, fligner.test,
t.testhv

    values <- c(
      15,  8, 17,  7, 26, 12,  8,
      11, 16,  9, 16, 24, 20, 19,
       9, 17, 11,  8, 15,  6, 14
    )
    groups <- rep(1:3, each = 7)

    anovahv(values, groups)
    anovahv(values ~ groups)

    # Force a specific output, bypassing the variance decision
    anovahv(values, groups, var.equal = TRUE)
    anovahv(values, groups, var.equal = FALSE)

    # Using a dataset
    vgdf <- data.frame(values, groups)
    anovahv(values ~ groups, data = vgdf)
    using(vgdf, anovahv(values ~ groups))

    # Declared categorical grouping variables are detected
    vgdf$declared_groups <- declared(
      vgdf$groups,
      labels = c(A = 1, B = 2, C = 3)
    )
    anovahv(values ~ declared_groups, data = vgdf)
    anovahv(vgdf$values, vgdf$declared_groups)

    # Class example
    cls <- data.frame(
      values = c(
        22, 27, 32, 30, 29, 27, 33, 24, 24, 30,
        28, 22, 24, 18, 21, 26, 25, 20, 24, 28,
        20, 28, 31, 26, 26, 30, 21, 25, 29, 27
      ),
      groups = rep(1:3, each = 10)
    )
    using(cls, anovahv(values ~ groups))

    # Post-hoc test, for example Bonferroni
    using(cls, pairwise.t.test(values, groups, p.adj = "bonferroni"))

The function daria ("draw area") draws the area under the normal curve for given values of z.

    daria(area, z1, z2, draw = FALSE)

| Argument | Description |
|---|---|
| area | The required area |
| z1 | First z value, in the interval +/- 4 |
| z2 | Second z value, in the interval +/- 4 |
| draw | Logical; if TRUE, draw the area |
In the argument area, the function accepts:
"l", "u", "left" and "under" for the area to the left of z;
"r", "o", "a", "right", "over" and "above" for the area to the right of z;
"b" and "between" for the area between two z values.
z values smaller than -4 or greater than +4 are truncated to these limits.
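The areas involved can be reproduced with pnorm from base R. The sketch below is only an illustration: z_area is a hypothetical helper name, and it assumes the truncation to +/- 4 is applied before the area is computed, which the source does not confirm.

```r
# Hypothetical stand-in for the area computation; not the package's code.
z_area <- function(area, z1, z2 = NULL) {
  clamp <- function(z) max(-4, min(4, z))   # truncate to the interval [-4, 4]
  z1 <- clamp(z1)
  if (area %in% c("l", "u", "left", "under")) {
    pnorm(z1)                                # area to the left of z1
  } else if (area %in% c("r", "o", "a", "right", "over", "above")) {
    pnorm(z1, lower.tail = FALSE)            # area to the right of z1
  } else if (area %in% c("b", "between")) {
    z2 <- clamp(z2)
    pnorm(max(z1, z2)) - pnorm(min(z1, z2))  # area between the two values
  } else {
    stop("unknown area: ", area)
  }
}

z_area("between", -1.96, 1.96)   # roughly 0.95
z_area("over", -1)
z_area("under", -1)
```

The "over" and "under" areas for the same z always add up to 1, which is a quick sanity check for lab exercises.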
Adrian Dusa

    daria("between", -1.96, 1.96)
    daria("over", -1)
    daria("under", -1)
    daria("over", 2, draw = TRUE)

A function similar to daria; the only difference is that it uses the t distribution
instead of the normal (z) distribution. It therefore expects an additional parameter for the
degrees of freedom.

    dariat(area, t1, t2, df, draw = FALSE)

| Argument | Description |
|---|---|
| area | The required area |
| t1 | First t value, in the interval +/- 4 |
| t2 | Second t value, in the interval +/- 4 |
| df | Degrees of freedom |
| draw | Logical; if TRUE, draw the area |
In the argument area, the function accepts:
"l", "u", "left" and "under" for the area to the left of t;
"r", "o", "a", "right", "over" and "above" for the area to the right of t;
"b" and "between" for the area between two t values.
t values smaller than -4 or greater than +4 are truncated to these limits.
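For comparison with the normal case, the same areas can be obtained in base R with pt. The short sketch below is an illustration, not the package's code. It shows that the central area between -1.96 and 1.96 is slightly smaller under a t distribution than under the normal curve, and that the gap narrows as the degrees of freedom grow.

```r
df <- 100

# Central area between -1.96 and 1.96 under t (df = 100) and under the normal curve
between_t <- pt(1.96, df) - pt(-1.96, df)
between_z <- pnorm(1.96) - pnorm(-1.96)
c(t = between_t, z = between_z)

# The same t area for increasing degrees of freedom
areas <- sapply(c(5, 30, 100, 1000), function(d) pt(1.96, d) - pt(-1.96, d))
areas
```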
Adrian Dusa

    # for 100 degrees of freedom
    dariat("between", -1.96, 1.96, df = 100)
    dariat("over", -1, df = 100)
    dariat("under", -1, df = 100)
    dariat("over", 2, df = 100, draw = TRUE)

This function draws the binomial distribution for a given number of repetitions of an experiment and a given probability of success, and calculates the probability of obtaining one or more values of the random variable.

    dbinoms(
      x, size, prob, log = FALSE, draw = FALSE,
      zoom = FALSE, new = FALSE, text = FALSE
    )

| Argument | Description |
|---|---|
| x | Number of favourable outcomes: a value or a vector of values |
| size | Number of repetitions |
| prob | Probability of success |
| log | Logical; if TRUE, the probability is returned as log(p) |
| draw | Logical; if TRUE, draws the binomial distribution |
| zoom | Logical; if TRUE, eliminates from the graphic all numbers with probability equal to zero |
| new | Logical; if TRUE, a new window will be created for each graphic |
| text | Logical; if TRUE, display the probability above each bar |
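The probability of obtaining "one or more values" can be checked by hand with dbinom from base R. The sketch below assumes that a vector x is handled by adding up the individual probabilities; the source does not spell this out, so treat it as an illustration rather than a description of dbinoms itself.

```r
size <- 8
prob <- 0.5

# Between 2 and 4 favourable outcomes: (28 + 56 + 70) / 256
p_2_to_4 <- sum(dbinom(2:4, size, prob))
p_2_to_4

# Fewer than 7 favourable outcomes, computed in two equivalent ways
p_lt_7 <- sum(dbinom(0:6, size, prob))
all.equal(p_lt_7, pbinom(6, size, prob))
```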
Adrian Dusa

    # 8 repetitions, with a 0.5 probability of success, calculate the
    # probability of obtaining between 2 and 4 favourable outcomes
    dbinoms(2:4, 8, 0.5)

    # less than 7 favourable outcomes
    dbinoms(0:6, 8, 0.5)

    # at most 7 favourable outcomes
    dbinoms(0:7, 8, 0.5)

    # above 5 favourable outcomes
    dbinoms(6:8, 8, 0.5)

    # at least 5 favourable outcomes
    dbinoms(5:8, 8, 0.5)

    # exactly 6 favourable outcomes
    dbinoms(6, 8, 0.5)

    # 1, 3 or 6 favourable outcomes
    dbinoms(c(1, 3, 6), 8, 0.5)

    # same, drawing the graphic
    dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE)

    # same, drawing the probabilities in the graphic
    dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE, text = TRUE)

The function dfcalc is used only for the two-sample t test, when the group
variances are not equal.
For small, independent samples with unknown but equal population variances,
the variances of the two samples are used instead. As the sample variances are
almost never exactly equal, the function spooled combines them into a pooled
estimate, based on the two standard deviations and their respective sample sizes.

    dfcalc(x, y, n1, n2)
    spooled(x, y, n1, n2)

| Argument | Description |
|---|---|
| x | The values of the standard deviation for the first group |
| y | The values of the standard deviation for the second group |
| n1 | Size of the first group |
| n2 | Size of the second group |
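The two quantities can be written out in base R using the textbook formulas for the pooled standard deviation and the Welch-Satterthwaite degrees of freedom. This is a sketch for checking results by hand; it assumes dfcalc and spooled follow these standard formulas, which the source names but does not print.

```r
group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

s1 <- sd(group1)
s2 <- sd(group2)
n1 <- length(group1)
n2 <- length(group2)

# Pooled standard deviation (equal population variances assumed)
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Welch-Satterthwaite degrees of freedom (unequal variances)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Cross-check both values against base R's t.test()
all.equal(df_welch, unname(t.test(group1, group2)$parameter))
t_pooled <- (mean(group1) - mean(group2)) / (sp * sqrt(1 / n1 + 1 / n2))
all.equal(t_pooled, unname(t.test(group1, group2, var.equal = TRUE)$statistic))
```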
Adrian Dusa

    group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
    group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

    sd1 <- sd(group1)
    sd2 <- sd(group2)
    n1 <- length(group1)
    n2 <- length(group2)

    # more direct
    dfcalc(group1, group2)

    # if the standard deviations and group sizes are known
    dfcalc(sd1, sd2, n1, n2)

    # the pooled standard deviation
    spooled(sd1, sd2, n1, n2)

    # more direct
    spooled(group1, group2)

Draws a histogram with a normal curve that approximates the
distribution.
When ylim is not provided, it is chosen to accommodate both the
histogram bars and the superimposed curve.
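The ylim choice can be illustrated with base graphics. The sketch below is an assumption about how such a plot might be built, not the code of histc: it draws a frequency histogram, rescales the normal density to the frequency scale, and sets the vertical limit to the larger of the tallest bar and the peak of the curve. The variable names are illustrative.

```r
set.seed(1)
ages <- sample(18:93, 150, replace = TRUE)

# Compute the bins first, without drawing
h <- hist(ages, plot = FALSE)
bin_width <- diff(h$breaks)[1]

# Normal density rescaled to the frequency scale (assumed scaling)
curve_height <- function(v) {
  dnorm(v, mean = mean(ages), sd = sd(ages)) * length(ages) * bin_width
}

# Vertical limit tall enough for both the bars and the peak of the curve
y_max <- max(h$counts, curve_height(mean(ages)))

plot(h, ylim = c(0, y_max), main = "Histogram with normal curve", xlab = "Age")
curve(curve_height(x), add = TRUE)
```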

    histc(x, from, to, size = 15, ...)

| Argument | Description |
|---|---|
| x | Numeric vector |
| from | Starting point on the horizontal axis. |
| to | End point on the horizontal axis. |
| size | Size of the graphic, in centimeters. |
| ... | Other parameters, specific to the base |
Adrian Dusa

    x <- sample(18:93, 150, replace = TRUE)
    histc(x)
    histc(x, 10, 100)
    histc(
      x, 10, 100,
      xlab = "Age", ylab = "Frequency",
      main = "Histogram for age in years"
    )

The functions expect a table (a data frame or a matrix) with exactly two columns: the values of a random variable in the first column and their associated probabilities in the second.

    mbinom(x)
    sbinom(x)

| Argument | Description |
|---|---|
| x | The data table. |
If the probabilities in the second column do not sum to 1, the function interprets them as absolute frequencies and recalculates the relative frequencies.
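The mean and standard deviation of a discrete random variable, including the rescaling of absolute frequencies, can be sketched in a few lines of base R. The helper names rv_probs, rv_mean and rv_sd are hypothetical, and the population formula for the standard deviation is an assumption about what the package computes.

```r
# Hypothetical helpers; not the package's mbinom() and sbinom().
rv_probs <- function(tab) {
  p <- tab[, 2]
  p / sum(p)   # leaves proper probabilities unchanged, rescales counts
}

rv_mean <- function(tab) {
  sum(tab[, 1] * rv_probs(tab))
}

rv_sd <- function(tab) {
  mu <- rv_mean(tab)
  sqrt(sum((tab[, 1] - mu)^2 * rv_probs(tab)))
}

probs  <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
counts <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))

rv_mean(probs)    # 2.14
rv_mean(counts)   # same value: the counts sum to 800 and rescale to the same probabilities
rv_sd(probs)
```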
Adrian Dusa

    data <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
    mbinom(data)
    sbinom(data)

    data <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))
    mbinom(data)
    sbinom(data)

Performs one-sample, two-sample and paired t tests, with additional support for choosing between the classical Student t test and the Welch t test. For two independent samples, the default behavior is to first run the Ansari-Bradley test for the homogeneity of variances.

    ## S3 method for class 'testhv'
    t(x, ...)

    ## Default S3 method:
    t.testhv(
      x, y = NULL,
      alternative = c("two.sided", "less", "greater"),
      mu = 0, paired = FALSE, var.equal = NULL,
      conf.level = 0.95, ...
    )

    ## S3 method for class 'formula'
    t.testhv(
      formula, data, subset, na.action = na.pass, ...
    )

    ## S3 method for class 'Pair'
    t.testhv(
      x,
      alternative = c("two.sided", "less", "greater"),
      mu = 0, var.equal = NULL, conf.level = 0.95, ...
    )

| Argument | Description |
|---|---|
| x | A numeric vector, a |
| y | An optional second numeric vector. If |
| formula | A formula. Use |
| data | An optional data frame, matrix or list containing the variables in |
| subset | An optional vector specifying a subset of observations. |
| na.action | A function specifying how missing values should be handled by the formula method. |
| alternative | Character string specifying the alternative hypothesis. See Details. |
| mu | A number indicating the true value of the mean, difference in means, or mean difference under the null hypothesis. |
| paired | Logical, indicating whether the two supplied numeric vectors are paired. This argument is not accepted by the formula method; use |
| var.equal | Logical or |
| conf.level | Numeric value between 0 and 1, giving the confidence level of the interval. |
| ... | Additional arguments passed to the relevant method. |
The main difference from base R's t.test is the default
treatment of var.equal. In t.test(), this argument is logical and
defaults to FALSE, meaning Welch's unequal-variance test is used unless
the user explicitly requests the pooled-variance Student test.
In t.testhv(), var.equal defaults to NULL. This leaves the
choice to the function: for two independent samples it first applies
ansari.test as a homogeneity of variances test, then prints
the pooled-variance Student test when the variances are considered homogeneous,
or the Welch test otherwise. Users can still set var.equal = TRUE or
var.equal = FALSE to mimic the explicit behavior of
t.test.
Internally, both the pooled-variance and Welch tests are computed and stored in
the result object. The print method decides which one to display, based on
var.equal and, when var.equal = NULL, the p-value of the
homogeneity test.
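The compute-both-then-choose behaviour described above can be imitated with base R. The sketch below is an illustration under an assumed 0.05 threshold for the Ansari-Bradley p-value; the source does not state the cut-off used by the print method.

```r
group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

# exact = FALSE uses the normal approximation, which avoids the ties warning
homog  <- ansari.test(group1, group2, exact = FALSE)
pooled <- t.test(group1, group2, var.equal = TRUE)
welch  <- t.test(group1, group2, var.equal = FALSE)

alpha <- 0.05   # assumed threshold, not taken from the package
shown <- if (homog$p.value > alpha) pooled else welch
shown$method
```

Keeping both test objects around makes it easy to show students how the degrees of freedom and the p-value change between the two variants.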
The alternative argument accepts the same standard values as
t.test: "two.sided", "less" and
"greater". In addition, "!=" and "two.tailed" are accepted
for a two-sided test, "<" and "lower" for a left-tailed test, and
">", "higher" and "upper" for a right-tailed test.
Formula calls support categorical grouping variables of class "declared".
Such variables are coerced to factors before splitting the response into two
groups. The default method also supports the shortcut
t.testhv(values, group) when group is a "declared"
categorical variable with exactly two levels.
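The synonyms accepted by the alternative argument, listed earlier in this section, amount to a lookup onto the three values understood by base R's t.test. The helper below is hypothetical and only illustrates that mapping; the package's own parsing may differ.

```r
# Hypothetical helper; normalise_alternative is not part of the package.
normalise_alternative <- function(alternative) {
  synonyms <- c(
    "two.sided" = "two.sided", "!=" = "two.sided", "two.tailed" = "two.sided",
    "less" = "less", "<" = "less", "lower" = "less",
    "greater" = "greater", ">" = "greater",
    "higher" = "greater", "upper" = "greater"
  )
  if (!alternative %in% names(synonyms)) {
    stop("unknown alternative: ", alternative)
  }
  unname(synonyms[alternative])
}

normalise_alternative("!=")
normalise_alternative("upper")

# The normalised value can be passed straight to base R's t.test()
res <- t.test(c(8, 7, 6, 9, 10), mu = 7, alternative = normalise_alternative("<"))
res$alternative
```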
Paired t tests can be requested either with two numeric vectors and
paired = TRUE, or by supplying a Pair object. In the formula
method, paired tests should be written as Pair(before, after) ~ 1;
paired = TRUE is deliberately rejected in formula calls.
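A paired t test is equivalent to a one-sample t test on the pairwise differences, which is why the formula form has a single term on the right-hand side. The base R sketch below checks that equivalence on the small before/after data used in the examples; it does not call the package.

```r
before <- c(8, 7, 6, 9, 10)
after  <- c(7, 6, 7, 6, 8)

# Paired test on the two vectors
paired_test <- t.test(before, after, paired = TRUE)

# One-sample test on the differences, against a mean difference of 0
diff_test <- t.test(before - after, mu = 0)

all.equal(unname(paired_test$statistic), unname(diff_test$statistic))
all.equal(paired_test$p.value, diff_test$p.value)
```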
An object of class "ttesthv", containing:
homogtest: the Ansari-Bradley homogeneity of variances test, or
NULL when it is not relevant;
ttest: the pooled-variance Student t test;
ttestWelch: the Welch t test;
paired, var.equal and conf.level: the corresponding
analysis settings.
Adrian Dusa

    group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
    group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

    t.testhv(group1, group2)

    # Force the same variance choice as base R's t.test()
    t.testhv(group1, group2, var.equal = FALSE)
    t.testhv(group1, group2, var.equal = TRUE)

    # Formula interface
    dataset <- data.frame(
      values = c(group1, group2),
      group = c(rep(1, 11), rep(2, 12))
    )
    t.testhv(values ~ group, data = dataset)
    using(dataset, t.testhv(values ~ group))

    # Declared categorical grouping variables are detected
    dataset$declared_group <- declared(
      dataset$group,
      labels = c(Group1 = 1, Group2 = 2)
    )
    t.testhv(values ~ declared_group, data = dataset)
    t.testhv(dataset$values, dataset$declared_group)

    # Paired tests
    before <- c(8, 7, 6, 9, 10)
    after <- c(7, 6, 7, 6, 8)

    t.testhv(before, after, paired = TRUE)
    t.testhv(Pair(before, after))

    paired_data <- data.frame(before = before, after = after)
    t.testhv(Pair(before, after) ~ 1, data = paired_data)
