Title: | Miscellanea Functions For Intro Stats Classes |
---|---|
Description: | The majority of functions in this package are designed to facilitate understanding the statistical concepts taught in the class (such as the functions to create graphics or areas under the normal curve), while some are designed to ease calculations for the lab exercises (for instance the degrees of freedom or the pooled variation for two independent samples). |
Authors: | Adrian Dusa [aut, cre, cph] |
Maintainer: | Adrian Dusa <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.8 |
Built: | 2024-10-30 14:13:51 UTC |
Source: | https://github.com/dusadrian/statistics |
For discrete random variables, draws and calculates the probability of a certain number of favourable outcomes out of a number of repetitions of an experiment. For a continuous random variable, the graphic represents a normal curve with the area to the left or to the right of a certain z or t value, or between two such values. The package also contains functions to calculate the degrees of freedom and the pooled standard deviation for tests using the t distribution.
Package: | statistics |
Type: | Package |
Version: | 0.8 |
Date: | 2024-10-29 |
License: | GPL-v3 |
Adrian Dusa
Maintainer: Adrian Dusa ([email protected])
dnorm, pnorm, dbinom
The function 'anovaFK' contains two separate tests: in a first stage, the Fligner-Killeen test for the homogeneity of variances is run and, depending on its result, the Welch approximation is applied if the groups are not homogeneous.
anovaFK(x, y = NULL, data)
x |
A vector of values or a formula object as in 'lhs ~ rhs', where 'lhs' contains the values and 'rhs' contains the groups. Both can be vectors or variables from a dataset. |
y |
An optional vector of values, when the two variables are not specified using a formula object. |
data |
A dataset containing the variables specified in the formula object, in case they don't exist as separate objects. |
When the variances are not equal, the output differs from the one presented by oneway.test, but the table is similar.
If the degrees of freedom are not what they should be (k - 1 between groups and N - k within groups, respectively, for k groups and N observations), something must be wrong. Specifically, the grouping variable should be declared as a factor (in case it is not already character), otherwise it is considered metric and a regression model is applied instead of ANOVA.
Declaring a variable as a factor is done using the command: as.factor
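The two-stage logic described above can be approximated with base R alone. This is only a sketch and not the package's implementation: the 0.05 threshold and the object names fk, homogeneous and result are assumptions made for illustration.

```r
values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16,
            24, 20, 19, 9, 17, 11, 8, 15, 6, 14)

# declare the grouping variable as a factor, as advised above
groups <- as.factor(rep(1:3, each = 7))

# stage 1: Fligner-Killeen test for the homogeneity of variances
fk <- fligner.test(values ~ groups)
homogeneous <- fk$p.value > 0.05  # assumed significance threshold

# stage 2: classical one-way ANOVA if the variances look homogeneous,
# Welch approximation otherwise
result <- oneway.test(values ~ groups, var.equal = homogeneous)
print(result)
```

With three groups, the numerator degrees of freedom reported by oneway.test should be 2 in either branch; a different value suggests the grouping variable was not treated as a factor.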
Adrian Dusa
aov, anova, oneway.test, fligner.test
values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16, 24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
groups <- rep(1:3, each = 7)
anovaFK(values ~ groups)
# same thing with:
anovaFK(values, groups)
# using a dataset
vgdf <- data.frame(values, groups)
using(vgdf, anovaFK(values ~ groups))
The function "daria" - 'd'raws the 'area' under the normal curve for certain values of z.
daria(area, z1, z2, draw = FALSE)
area |
The required area |
z1 |
First z value, in the interval +/- 4 |
z2 |
Second z value, in the interval +/- 4 |
draw |
Logical; if TRUE, draw the area |
In the argument area, the function accepts:
"l", "u", "left" and "under" for the area to the left of z,
"r", "o", "a", "right", "over" and "above" for the area to the right of z,
"b" and "between" for the area between two z values.
z values smaller than -4 and greater than +4 are truncated to these values, since the area to the left and to the right of these values is practically equal to zero.
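The three kinds of area can be reproduced with the base function pnorm. The following is a minimal sketch of the underlying calculation, not the code of daria itself; the helper clip and the object names are hypothetical.

```r
# truncate z values to the interval [-4, 4], as described above
clip <- function(z) pmin(pmax(z, -4), 4)

z1 <- clip(-1.96)
z2 <- clip(1.96)

# area to the left of z1 ("left", "under")
area_left <- pnorm(z1)

# area to the right of z1 ("right", "over", "above")
area_right <- pnorm(z1, lower.tail = FALSE)

# area between z1 and z2 ("between")
area_between <- pnorm(z2) - pnorm(z1)

round(c(left = area_left, right = area_right, between = area_between), 4)
```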
Adrian Dusa
daria("between", -1.96, 1.96)
daria("over", -1)
daria("under", -1)
daria("over", 2, draw = TRUE)
A function similar to "daria", the only difference being that it uses the t distribution instead of the z distribution. Consequently, the function expects an additional parameter for the degrees of freedom.
dariat(area, t1, t2, df, draw = FALSE)
area |
The required area |
t1 |
First t value, in the interval +/- 4 |
t2 |
Second t value, in the interval +/- 4 |
df |
Degrees of freedom |
draw |
Logical; if TRUE, draw the area |
In the argument area, the function accepts:
"l", "u", "left" and "under" for the area to the left of t,
"r", "o", "a", "right", "over" and "above" for the area to the right of t,
"b" and "between" for the area between two t values.
t values smaller than -4 and greater than +4 are truncated to these values, since the area to the left and to the right of these values is practically equal to zero.
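The same areas can be obtained from the base function pt, which takes the degrees of freedom as its second argument. This is a sketch of the calculation under that assumption, not the code of dariat; the object names are illustrative.

```r
dof <- 100  # degrees of freedom

# area to the left of t = -1
area_left <- pt(-1, dof)

# area to the right of t = -1
area_right <- pt(-1, dof, lower.tail = FALSE)

# area between t = -1.96 and t = 1.96
area_between <- pt(1.96, dof) - pt(-1.96, dof)

# the t distribution has heavier tails than the normal curve, so the
# central area is slightly smaller than its z counterpart
area_between < pnorm(1.96) - pnorm(-1.96)
```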
Adrian Dusa
# for 100 degrees of freedom
dariat("between", -1.96, 1.96, df = 100)
dariat("over", -1, df = 100)
dariat("under", -1, df = 100)
dariat("over", 2, df = 100, draw = TRUE)
This function draws graphics for a certain number of repetitions of an experiment, at a certain probability of success, and calculates the probability of obtaining one or more values from a random variable.
dbinoms(x, size, prob, log = FALSE, draw = FALSE, zoom = FALSE, new = FALSE, text = FALSE)
x |
Number of favourable outcomes: a value or a vector of values |
size |
Number of repetitions |
prob |
Probability of success |
log |
Logical; if TRUE, the probability is returned as log(p) |
draw |
Logical; if TRUE, draws the binomial distribution |
zoom |
Logical; if TRUE, eliminates from the graphic all numbers with probability equal to zero |
new |
Logical; if TRUE, a new window will be created for each graphic |
text |
Logical; if TRUE, display the probability above each bar |
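For a vector of favourable outcomes, the probability of obtaining any one of them is the sum of the individual binomial probabilities. The sketch below shows that calculation with the base functions dbinom and pbinom; it assumes, without confirmation from the package source, that dbinoms reports this kind of sum.

```r
size <- 8    # number of repetitions
prob <- 0.5  # probability of success

# probability of each of 2, 3 and 4 favourable outcomes
p_each <- dbinom(2:4, size, prob)

# probability of obtaining between 2 and 4 favourable outcomes
p_total <- sum(p_each)

# "at least 5" computed in two equivalent ways
p_at_least_5 <- sum(dbinom(5:8, size, prob))
p_upper_tail <- pbinom(4, size, prob, lower.tail = FALSE)
```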
Adrian Dusa
# 8 repetitions, with a 0.5 probability of success, calculate the
# probability of obtaining between 2 and 4 favourable outcomes
dbinoms(2:4, 8, 0.5)
# less than 7 favourable outcomes
dbinoms(0:6, 8, 0.5)
# at most 7 favourable outcomes
dbinoms(0:7, 8, 0.5)
# above 5 favourable outcomes
dbinoms(6:8, 8, 0.5)
# at least 5 favourable outcomes
dbinoms(5:8, 8, 0.5)
# exactly 6 favourable outcomes
dbinoms(6, 8, 0.5)
# 1, 3 or 6 favourable outcomes
dbinoms(c(1, 3, 6), 8, 0.5)
# same, drawing the graphic
dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE)
# same, drawing the probabilities in the graphic
dbinoms(c(1, 3, 6), 8, 0.5, draw = TRUE, text = TRUE)
The function dfcalc is used only for the two sample t test, when the group variances are NOT equal.
For small and independent samples with unknown but equal population variances, the variances of the two samples are used. As the sample variances are never exactly equal, the function spooled calculates their pooled variance based on the two standard deviations and their respective sample sizes.
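For reference, the textbook formulas behind these two quantities are the Welch-Satterthwaite approximation for the degrees of freedom and the weighted pooling of the two variances. The helpers welch_df and pooled_sd below are hypothetical names written for this sketch; whether dfcalc and spooled use exactly these formulas is an assumption.

```r
# Welch-Satterthwaite degrees of freedom for unequal variances
welch_df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# pooled standard deviation for equal population variances
pooled_sd <- function(s1, s2, n1, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

df_welch <- welch_df(sd(group1), sd(group2), length(group1), length(group2))
s_pooled <- pooled_sd(sd(group1), sd(group2), length(group1), length(group2))
```

The value of df_welch can be checked against the degrees of freedom reported by the base function t.test with its default var.equal = FALSE.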
dfcalc(x, y, n1, n2)
spooled(x, y, n1, n2)
x |
The values of the standard deviation for the first group |
y |
The values of the standard deviation for the second group |
n1 |
Size of the first group |
n2 |
Size of the second group |
Adrian Dusa
group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)
sd1 <- sd(group1)
sd2 <- sd(group2)
n1 <- length(group1)
n2 <- length(group2)
# more direct
dfcalc(group1, group2)
# if the standard deviations and group sizes are known
dfcalc(sd1, sd2, n1, n2)
# the pooled standard deviation
spooled(sd1, sd2, n1, n2)
# more direct
spooled(group1, group2)
Draws a histogram with a normal curve that approximates the distribution.
histc(x, from, to, size = 15, ...)
x |
Numeric vector |
from |
Starting point on the horizontal axis. |
to |
End point on the horizontal axis. |
size |
Size of the graphic, in centimeters. |
... |
Other parameters, specific to the base |
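A comparable picture can be produced with base graphics by drawing the histogram on the density scale and overlaying a normal curve with the sample mean and standard deviation. This is a sketch under those assumptions, not the code of histc: the seed, the object names and the choice of the density scale are illustrative, and histc itself may scale the curve differently.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
ages <- sample(18:93, 150, replace = TRUE)

# histogram on the density scale, so the bar areas sum to 1
h <- hist(ages, freq = FALSE, xlim = c(10, 100),
          xlab = "Age", main = "Histogram for age in years")

# normal curve using the sample mean and standard deviation
m <- mean(ages)
s <- sd(ages)
curve(dnorm(x, mean = m, sd = s), from = 10, to = 100, add = TRUE)
```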
Adrian Dusa
x <- sample(18:93, 150, replace = TRUE)
histc(x)
histc(x, 10, 100)
histc(x, 10, 100, xlab = "Age", ylab = "Frequency", main = "Histogram for age in years")
The function expects a table (a data frame or a matrix) with just two columns: the first containing the values of a random variable, and the second containing the associated probabilities.
mbinom(x)
sbinom(x)
x |
The data table. |
If the sum of the probabilities in the second column is not equal to 1, the function interprets them as absolute frequencies and recalculates the relative frequencies.
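The rescaling step, followed by the mean and standard deviation of a discrete random variable, can be sketched in base R. The helper rv_summary is hypothetical, and the sketch assumes (the names suggest it, but the text above does not state it) that mbinom returns the mean and sbinom the standard deviation.

```r
# hypothetical helper, not part of the package
rv_summary <- function(tab) {
  x <- tab[, 1]
  p <- tab[, 2]
  # rescale absolute frequencies to relative frequencies if needed
  if (!isTRUE(all.equal(sum(p), 1))) {
    p <- p / sum(p)
  }
  mu <- sum(x * p)                    # expected value
  sigma <- sqrt(sum((x - mu)^2 * p))  # standard deviation
  c(mean = mu, sd = sigma)
}

probs <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
counts <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))

res_probs <- rv_summary(probs)
res_counts <- rv_summary(counts)
```

Both tables describe the same distribution, since the counts sum to 800 and reduce to the probabilities in the first table, so the two results should coincide.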
Adrian Dusa
data <- matrix(c(0:4, 0.015, 0.235, 0.425, 0.245, 0.080), ncol = 2)
mbinom(data)
sbinom(data)
data <- data.frame(X = 0:4, P_X = c(12, 188, 340, 196, 64))
mbinom(data)
sbinom(data)
This function executes the t test for one or two groups. In case of two independent groups, the function verifies if the group variances are equal, using the Ansari-Bradley test.
t_testAB(
  x,
  y = NULL,
  alternative = c("two.sided", "less", "greater"),
  var.equal = FALSE,
  mu = 0,
  paired = FALSE,
  conf.level = 0.95,
  data = NULL
)
x |
A numeric vector. |
y |
An optional numeric vector, corresponding to the second group. |
alternative |
Character, for the alternative hypothesis. See details below. |
var.equal |
Logical argument indicating whether to treat the two variances as being equal |
mu |
A number indicating the true value of the mean (or difference in means if performing a two sample test). |
paired |
Logical indicating whether to perform a paired t-test. |
conf.level |
Confidence level of the interval |
data |
An optional matrix or data frame containing the variables from a formula |
The argument alternative follows the standard in the base function t.test(), and it can be "two.sided", "less" or "greater". In addition to those options, this function also allows for "!=" and "two.tailed" for the bidirectional alternative hypothesis, as well as "<" and "lower" for the one tailed test on the left tail, and ">" and "higher" for the right tailed test, respectively.
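For two independent groups, the sequence of an Ansari-Bradley test followed by a t test can be approximated with the base functions ansari.test and t.test. This is only a sketch and not the code of t_testAB: the 0.05 threshold and the object names are assumptions.

```r
group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)

# Ansari-Bradley test for a difference in scale; exact = FALSE requests
# the normal approximation, since the data contain ties
ab <- ansari.test(group1, group2, exact = FALSE)
equal_var <- ab$p.value > 0.05  # assumed significance threshold

# pooled t test if the variances look equal, Welch t test otherwise
tt <- t.test(group1, group2, var.equal = equal_var)
print(tt)
```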
Adrian Dusa
group1 <- c(13, 14, 9, 12, 8, 10, 5, 10, 9, 12, 16)
group2 <- c(16, 18, 11, 19, 14, 17, 13, 16, 17, 18, 22, 12)
t_testAB(group1, group2)
# or, if the variables are inside a dataset
dataset <- data.frame(
  values = c(group1, group2),
  group = c(rep(1, 11), rep(2, 12))
)
t_testAB(values ~ group, data = dataset)