Package 'statistics' reference manual

Title:	Miscellanea Functions For Intro Stats Classes
Description:	The majority of functions in this package are designed to facilitate understanding the statistical concepts taught in the class (such as the functions to create graphics or areas under the normal curve), while some are designed to ease calculations for the lab exercises (for instance the degrees of freedom or the pooled variation for two independent samples).
Authors:	Adrian Dusa [aut, cre, cph]
Maintainer:	Adrian Dusa <[email protected]>
License:	GPL (>= 3)
Version:	0.9
Built:	2025-02-11 06:12:56 UTC
Source:	https://github.com/dusadrian/statistics

A package containing useful functions to teach introductory statistics.

Description

For discrete random variables, the package draws and calculates the probability of a certain number of favourable outcomes out of a number of repetitions of an experiment. For a continuous random variable, the graphic represents a normal curve with the area to the left or to the right of a certain z or t value, or between two such values. The package also contains functions to calculate the degrees of freedom and the pooled standard deviation used with the t distribution, among others.

Details

Package:	statistics
Type:	Package
Version:	0.9
Date:	2025-01-12
License:	GPL-v3

Author(s)

Adrian Dusa

Maintainer: Adrian Dusa ([email protected])

ANOVA including the homogeneity of variance test

Description

The function 'anovaFK' combines two separate tests: in a first stage, the Fligner-Killeen test for the homogeneity of variances is run and, depending on the result of this test, the Welch approximation is applied if the group variances are not homogeneous.

Usage

anovaFK(x, y = NULL, data)

Arguments

`x`	A vector of values or a formula object as in 'lhs ~ rhs', where 'lhs' contains the values and 'rhs' contains the groups. Both can be vectors or variables from a dataset.
`y`	An optional vector of values, when the two variables are not specified using a formula object.
`data`	A dataset containing the variables specified in the formula object, in case they don't exist as separate objects.

Details

When the variances are not equal, the output differs from the one presented by oneway.test, but the table is similar.

If the degrees of freedom are not what they should be ($k - 1$ and $n - k$, respectively), something must be wrong. Specifically, the grouping variable should be declared as a factor (in case it is not already character), otherwise it is considered metric and a regression model is applied instead of ANOVA.

Declaring a variable as a factor is done using the command: as.factor
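
The two-stage logic described above can be approximated with base R alone. What follows is a minimal sketch, not the package's own implementation: the helper name anova_fk_sketch is hypothetical, and the 0.05 threshold for the Fligner-Killeen test is an assumption, since the manual does not state one.

```r
# Hypothetical helper (not part of the package): Fligner-Killeen test first,
# then a one-way ANOVA with or without the Welch approximation.
# The alpha = 0.05 threshold is an assumption.
anova_fk_sketch <- function(values, groups, alpha = 0.05) {
  groups <- as.factor(groups)  # the grouping variable must be a factor
  fk <- fligner.test(values, groups)
  homogeneous <- fk$p.value > alpha
  # var.equal = FALSE makes oneway.test() apply the Welch approximation
  aov_test <- oneway.test(values ~ groups, var.equal = homogeneous)
  list(fligner = fk, homogeneous = homogeneous, anova = aov_test)
}

values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16, 24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
groups <- rep(1:3, each = 7)

res <- anova_fk_sketch(values, groups)
res$homogeneous       # whether the variances were treated as equal
res$anova$parameter   # numerator and denominator degrees of freedom
```

With three groups, the numerator degrees of freedom should be 2 in either branch; only the denominator changes when the Welch approximation is used.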

Author(s)

Adrian Dusa

Examples


values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16, 24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
groups <- rep(1:3, each = 7)

anovaFK(values ~ groups)

# same thing with:
anovaFK(values, groups)

# using a dataset
vgdf <- data.frame(values, groups)
using(
  vgdf,
  anovaFK(values ~ groups)
)

# class example
cls <- data.frame(
  values = c(
    22, 27, 32, 30, 29, 27, 33, 24, 24, 30,
    28, 22, 24, 18,21, 26, 25, 20, 24, 28,
    20, 28, 31, 26, 26, 30, 21, 25, 29, 27
  ),
  groups = rep(1:3, each = 10)
)

using(
  cls,
  anovaFK(values ~ groups)
)

# post-hoc test, e.g. Bonferroni
using(
  cls,
  pairwise.t.test(values, groups, p.adjust.method = "bonferroni")
)

Calculate and draw the area under the normal curve z

Description

The function "daria" - 'd'raws the 'area' under the normal curve for certain values of z.

Usage

daria(area, z1, z2, draw = FALSE)

Arguments

`area`	The required area
`z1`	First z value, in the interval +/- 4
`z2`	Second z value, in the interval +/- 4
`draw`	Logical; if TRUE, draw the area

Details

In the argument area, the function accepts:

"l", "u", "left" and "under" for the area to the left of z,

"r", "o", "a", "right", "over" and "above" for the area to the right of z,

"b" and "between" for the area between two z values.

z values smaller than -4 and greater than +4 are truncated to these values, since the area to the left and to the right of these values is practically equal to zero.
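
The areas described above can be cross-checked with the base R function pnorm(). The sketch below is only an illustration under stated assumptions: the helper normal_area is hypothetical, it accepts only the full words "left", "right" and "between", and it assumes z1 is not greater than z2.

```r
# Hypothetical helper (not part of the package) that mirrors the three
# kinds of area using pnorm(); assumes z1 <= z2 for "between".
normal_area <- function(area, z1, z2 = NULL) {
  clip <- function(z) max(min(z, 4), -4)  # truncate to the interval [-4, 4]
  z1 <- clip(z1)
  switch(area,
    left    = pnorm(z1),
    right   = pnorm(z1, lower.tail = FALSE),
    between = pnorm(clip(z2)) - pnorm(z1)
  )
}

normal_area("between", -1.96, 1.96)  # approximately 0.95
normal_area("right", -1)             # approximately 0.84
normal_area("left", -1)              # approximately 0.16
```

The areas to the left and to the right of the same z value always add up to 1, which is a quick sanity check for any result.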

Author(s)

Adrian Dusa

Examples

daria("between", -1.96, 1.96) 

daria("over", -1)

daria("under", -1)

daria("over", 2, draw = TRUE)

Calculate and draw the area under the t distribution

Description

A function similar to "daria", the only difference being that it uses the t distribution instead of the z distribution. The function also expects an additional argument for the degrees of freedom.

Usage

dariat(area, t1, t2, df, draw = FALSE)

Arguments

`area`	The required area
`t1`	First t value, in the interval +/- 4
`t2`	Second t value, in the interval +/- 4
`df`	Degrees of freedom
`draw`	Logical; if TRUE, draw the area

Details

In the argument area, the function accepts:

"l", "u", "left" and "under" for the area to the left of t,

"r", "o", "a", "right", "over" and "above" for the area to the right of t,

"b" and "between" for the area between two t values.

t values smaller than -4 and greater than +4 are truncated to these values, since the area to the left and to the right of these values is practically equal to zero.
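
The same kinds of area can be cross-checked against the base R function pt(), which takes the degrees of freedom as its second argument. This is a sketch of the underlying calculation, not the package's code; the variable names are chosen here for illustration.

```r
df <- 100

pt(1.96, df) - pt(-1.96, df)     # area between -1.96 and 1.96
pt(-1, df, lower.tail = FALSE)   # area to the right of -1
pt(-1, df)                       # area to the left of -1

# With few degrees of freedom the tails are heavier than those of the
# normal curve, so the same interval covers a smaller central area.
between_t <- pt(1.96, 10) - pt(-1.96, 10)
between_z <- pnorm(1.96) - pnorm(-1.96)
between_t < between_z            # TRUE
```

As the degrees of freedom grow, the t distribution approaches the normal curve and the two areas converge.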

Author(s)

Adrian Dusa

Examples

# for 100 degrees of freedom
dariat("between", -1.96, 1.96, df = 100)

dariat("over", -1, df = 100)

dariat("under", -1, df = 100)

dariat("over", 2, df = 100, draw = TRUE)

Calculate probabilities and draw graphics for a binomial distribution

Description

This function draws graphics for a certain number of repetitions of an experiment, at a certain probability of success, and calculates the probability of obtaining one or more values from a random variable.

Usage

dbinoms(x, size, prob, log = FALSE, draw = FALSE,
        zoom = FALSE, new = FALSE, text = FALSE)

Arguments

`x`	Number of favourable outcomes: a value or a vector of values
`size`	Number of repetitions
`prob`	Probability of success
`log`	Logical; if TRUE, the probability is returned as log(p)
`draw`	Logical; if TRUE, draws the binomial distribution
`zoom`	Logical; if TRUE, eliminates from the graphic all numbers with probability equal to zero
`new`	Logical; if TRUE, a new window will be created for each graphic
`text`	Logical; if TRUE, display the probability above each bar
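
The probabilities involved can be cross-checked with the base R functions dbinom() and pbinom(). The sketch below assumes that the probability of "one or more values" means the sum of the individual probabilities; the variable names are illustrative only.

```r
size <- 8     # number of repetitions
prob <- 0.5   # probability of success

# probability of each individual number of favourable outcomes
dbinom(2:4, size, prob)

# probability of obtaining between 2 and 4 favourable outcomes
p_2_to_4 <- sum(dbinom(2:4, size, prob))   # 154 / 256 = 0.6015625

# the same probability from the cumulative distribution: P(X <= 4) - P(X <= 1)
p_check <- pbinom(4, size, prob) - pbinom(1, size, prob)

# "at least 5" is the complement of "at most 4"
p_at_least_5 <- sum(dbinom(5:8, size, prob))
p_at_most_4  <- pbinom(4, size, prob)
```

Summing over all possible outcomes, 0 to size, always gives 1, which is a useful check when building the vector of favourable outcomes.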

Author(s)

Adrian Dusa

Examples

# 8 repetitions, with a 0.5 probability of success, calculate the
# probability of obtaining between 2 and 4 favourable outcomes
dbinoms(2:4, 8, 0.5)

# less than 7 favourable outcomes
dbinoms(0:6, 8, 0.5)

# at most 7 favourable outcomes
dbinoms(0:7, 8, 0.5)

# above 5 favourable outcomes
dbinoms(6:8, 8, 0.5)

# at least 5 favourable outcomes
dbinoms(5:8, 8, 0.5)

# exactly 6 favourable outcomes
dbinoms(6, 8, 0.5)

# 1, 3 or 6 favourable outcomes
dbinoms(c(1, 3, 6), 8, 0.5)

# same, drawing the graphic
dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE)

# same, drawing the probabilities in the graphic
dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE, text = TRUE)

Calculates the degrees of freedom and the pooled variation for a t test

Description

The function dfcalc is used only for the two-sample t test, when the group variances are NOT equal. For small, independent samples with unknown but equal population variances, the variances of the two samples are used instead. As the sample variances are practically never exactly equal, the function spooled calculates the pooled standard deviation based on the two standard deviations and their respective sample sizes.

Usage

dfcalc(x, y, n1, n2)
spooled(x, y, n1, n2)

Arguments

`x`	The standard deviation of the first group, or a numeric vector with the values of the first group
`y`	The standard deviation of the second group, or a numeric vector with the values of the second group
`n1`	Size of the first group
`n2`	Size of the second group
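
For reference, the two quantities can be written out from the standard textbook formulas. The sketch below assumes that dfcalc corresponds to the Welch-Satterthwaite degrees of freedom and spooled to the usual pooled standard deviation; the manual does not spell out the formulas, and the helper names welch_df and pooled_sd are hypothetical.

```r
# Hypothetical helpers (not the package's code), from textbook formulas.

# Welch-Satterthwaite degrees of freedom, for unequal variances
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Pooled standard deviation, for equal population variances
pooled_sd <- function(s1, s2, n1, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

group1 <- c(13, 14,  9, 12,  8, 10,  5, 10,  9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

df_w <- welch_df(sd(group1), sd(group2), length(group1), length(group2))
sp   <- pooled_sd(sd(group1), sd(group2), length(group1), length(group2))

# cross-check: t.test() uses the Welch degrees of freedom by default
all.equal(df_w, unname(t.test(group1, group2)$parameter))
```

The Welch degrees of freedom are generally not an integer, and they never exceed n1 + n2 - 2, the value used when the variances are pooled.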

Author(s)

Adrian Dusa

Examples

group1 <- c(13, 14,  9, 12,  8, 10,  5, 10,  9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)
sd1 <- sd(group1)
sd2 <- sd(group2)
n1 <- length(group1)
n2 <- length(group2)

# more direct
dfcalc(group1, group2)

# if the standard deviations and group sizes are known
dfcalc(sd1, sd2, n1, n2)

# the pooled standard deviation
spooled(sd1, sd2, n1, n2)

# more direct
spooled(group1, group2)

Histogram with a superimposed normal curve

Description

Draws a histogram with a normal curve that approximates the distribution.

Usage

histc(x, from, to, size = 15, ...)

Arguments

`x`	Numeric vector
`from`	Starting point on the horizontal axis.
`to`	End point on the horizontal axis.
`size`	Size of the graphic, in centimeters.
`...`	Other parameters, specific to the base `hist()` function.
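
A comparable graphic can be built from base R's hist() and curve(). The sketch below is an approximation under stated assumptions, not the package's implementation: the helper hist_with_normal is hypothetical, it ignores the size argument, and it plots densities rather than frequencies so that the normal curve shares the scale of the bars.

```r
# Hypothetical helper (not part of the package): histogram on a density
# scale, with the normal curve that has the same mean and standard deviation.
hist_with_normal <- function(x, from = min(x), to = max(x), ...) {
  m <- mean(x)
  s <- sd(x)
  # freq = FALSE puts densities on the vertical axis
  h <- hist(x, freq = FALSE, xlim = c(from, to), ...)
  # inside curve(), 'x' stands for the grid of points on the horizontal axis,
  # which is why the mean and standard deviation are computed beforehand
  curve(dnorm(x, mean = m, sd = s), from = from, to = to, add = TRUE)
  invisible(h)
}

x <- sample(18:93, 150, replace = TRUE)
hist_with_normal(x, 10, 100, xlab = "Age", main = "Histogram for age in years")
```

Computing the mean and standard deviation before calling curve() matters, because curve() reuses the name x for its own evaluation points.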

Author(s)

Adrian Dusa

Examples

x <- sample(18:93, 150, replace = TRUE)

histc(x)

histc(x, 10, 100)

histc(x, 10, 100, xlab = "Age", ylab = "Frequency",
      main = "Histogram for age in years")

Calculates the mean and the standard deviation of a discrete random variable

Description

The function expects a table (a data frame or a matrix) with exactly two columns: the first containing the values of a random variable, and the second containing the associated probabilities.

Usage

mbinom(x)
sbinom(x)

Arguments

`x`	The data table.

Details

If the sum of the probabilities in the second column is not equal to 1, the function interprets them as absolute frequencies and recalculates them as relative frequencies.
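
The calculation itself can be sketched with the textbook formulas for a discrete random variable: the mean is the sum of the values weighted by their probabilities, and the standard deviation is the square root of the weighted squared deviations from that mean. The helpers discrete_mean and discrete_sd below are hypothetical, and always dividing by the column total is a simplification of the rescaling rule described above.

```r
# Hypothetical helpers (not the package's code), from textbook formulas.
discrete_mean <- function(tab) {
  x <- tab[, 1]
  p <- tab[, 2] / sum(tab[, 2])   # rescale counts to probabilities
  sum(x * p)
}

discrete_sd <- function(tab) {
  x <- tab[, 1]
  p <- tab[, 2] / sum(tab[, 2])
  mu <- sum(x * p)
  sqrt(sum((x - mu)^2 * p))
}

probs  <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
counts <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))

discrete_mean(probs)    # 2.14
discrete_mean(counts)   # 2.14, the counts rescale to the same probabilities
discrete_sd(probs)
```

The counts in the second table sum to 800, and dividing each by 800 gives exactly the probabilities of the first table, which is why both calls agree.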

Author(s)

Adrian Dusa

Examples


data <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
mbinom(data)
sbinom(data)

data <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))
mbinom(data)
sbinom(data) 

Student's t test with a preliminary testing for the homogeneity of variances

Description

This function executes the t test for one or two groups. In case of two independent groups, the function verifies if the group variances are equal, using the Ansari-Bradley test.
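
For two independent groups, the sequence described above can be approximated with the base R functions ansari.test() and t.test(). This is a minimal sketch, not the package's code: the helper t_test_ab_sketch is hypothetical, and the 0.05 threshold for the Ansari-Bradley test is an assumption.

```r
# Hypothetical helper (not part of the package): Ansari-Bradley test first,
# then a t test with var.equal set accordingly. alpha = 0.05 is an assumption.
t_test_ab_sketch <- function(x, y, alpha = 0.05, ...) {
  # with tied values, ansari.test() warns that it cannot compute exact
  # p-values and falls back to a normal approximation
  ab <- suppressWarnings(ansari.test(x, y))
  equal_var <- ab$p.value > alpha
  list(
    ansari = ab,
    var.equal = equal_var,
    t.test = t.test(x, y, var.equal = equal_var, ...)
  )
}

group1 <- c(13, 14,  9, 12,  8, 10,  5, 10,  9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

res <- t_test_ab_sketch(group1, group2)
res$var.equal   # whether the variances were treated as equal
res$t.test
```

When the variances are treated as equal, the t test uses n1 + n2 - 2 degrees of freedom; otherwise the Welch approximation yields a smaller, usually fractional value.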

Usage

t_testAB(
  x, y = NULL,
  alternative = c("two.sided", "less", "greater"), var.equal = FALSE,
  mu = 0, paired = FALSE, conf.level = 0.95, data = NULL
)

Arguments

`x`	A numeric vector, or a formula such as values ~ group.
`y`	An optional numeric vector, corresponding to the second group.
`alternative`	Character, for the alternative hypothesis. See details below.
`var.equal`	Logical argument indicating whether to treat the two variances as being equal
`mu`	A number indicating the true value of the mean (or difference in means if performing a two sample test).
`paired`	Logical indicating whether to perform a paired t-test.
`conf.level`	Confidence level of the interval
`data`	An optional matrix or data frame containing the variables from a formula

Details

The argument alternative follows the standard in the base function t.test(), and it can be "two.sided", "less" or "greater". In addition to those options, this function also allows for "!=" and "two.tailed" for the bidirectional alternative hypothesis, as well as "<" and "lower" for the one tailed test on the left tail, and ">" and "higher" for the right tailed test, respectively.
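
One way to picture the accepted aliases is as a lookup table that maps each of them to the value understood by the base function t.test(). The sketch below is illustrative only: the helper normalize_alternative is hypothetical and does not reflect how the package handles the argument internally.

```r
# Hypothetical helper (not part of the package): map each accepted alias
# to the corresponding value of the 'alternative' argument in t.test().
normalize_alternative <- function(alternative) {
  aliases <- c(
    "two.sided" = "two.sided", "!=" = "two.sided", "two.tailed" = "two.sided",
    "less" = "less", "<" = "less", "lower" = "less",
    "greater" = "greater", ">" = "greater", "higher" = "greater"
  )
  if (!alternative %in% names(aliases)) {
    stop("Unknown alternative: ", alternative)
  }
  unname(aliases[alternative])
}

normalize_alternative("!=")       # "two.sided"
normalize_alternative("higher")   # "greater"

group1 <- c(13, 14,  9, 12,  8, 10,  5, 10,  9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

# a left-tailed test, written with the "<" alias
t.test(group1, group2, alternative = normalize_alternative("<"))
```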

Author(s)

Adrian Dusa

Examples


group1 <- c(13, 14,  9, 12,  8, 10,  5, 10,  9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

t_testAB(group1, group2)


# or, if the variables are inside a dataset
dataset <- data.frame(
  values = c(group1, group2),
  group = c(rep(1,11), rep(2,12))
)

t_testAB(values ~ group, data = dataset)
