This page is part of the University of Colorado-Anschutz Medical Campus’ BIOS 6618 Recitation collection. To view other questions, you can view the BIOS 6618 Recitation collection page or use the search bar to look for keywords.
t.test Functionality for Data Entry
We saw in some of our earlier examples and HW1 that the t-test in R can have its data entered using arguments x, y. If you pull up the help function for the t-test (e.g., ?t.test), you will see that it includes a “Default S3 method” that does use x, y, but also an “S3 method for class ‘formula’”. We will examine the use of the sleep dataset in R for a few ways we could implement our t-test:
For this approach, we will manually create the vectors of data to use. Given that we have the sleep data frame already, this is an inefficient approach. However, it isn’t a terrible approach if you have a small data set that isn’t already in R.
Welch Two Sample t-test
data: group1_extra and group2_extra
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean of x mean of y
0.75 2.33
Approach 2: Subset the data from sleep to use
In this approach we will extract the information from our sleep data frame and then enter it using the “Default S3 method” approach with x=...,y=.... This works well if you already have data entered into R as a data frame. There are a host of ways one can extract this information (and we’ll dig into some of these later as well):
Code
# Method 1: Pull out the column "extra" from the data frame sleep using the $ operator and subset by groupgroup1_extra_extracted1 <- sleep$extra[ which(sleep$group==1) ]group2_extra_extracted1 <- sleep$extra[ which(sleep$group==2) ]t.test(x=group1_extra_extracted1, y=group2_extra_extracted1)
Welch Two Sample t-test
data: group1_extra_extracted1 and group2_extra_extracted1
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean of x mean of y
0.75 2.33
Code
# Method 2: Subset the data from the columns using dataframe[row,column] notation# NOTE: the "which(sleep$group==1)" identifies the row numbers where group is equal to 1# NOTE: the "'extra'" piece is requesting the column of data called extra from the data framegroup1_extra_extracted2 <- sleep[which(sleep$group==1), 'extra' ]group2_extra_extracted2 <- sleep[which(sleep$group==2), 'extra' ]t.test(x=group1_extra_extracted2, y=group2_extra_extracted2)
Welch Two Sample t-test
data: group1_extra_extracted2 and group2_extra_extracted2
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean of x mean of y
0.75 2.33
If you have an object that is a matrix instead of a data frame object, the $ operator no longer works to extract a column. However, you can still reference with the matrix[row,column] approach. There is some wonky behavior in our case where we are coercing the data frame into a matrix (i.e., it made everything a character), so we have to add a step to make it numeric again. However, if we created the data in a matrix where everything was already numeric it would work fine:
Code
sleep_mat <-as.matrix(sleep)sleep_mat # NOTICE IT HAS COERCED EVERYTHING TO BE A CHARACTER INSTEAD OF A NUMBER!!!
Welch Two Sample t-test
data: group1_extra_extracted3 and group2_extra_extracted3
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean of x mean of y
0.75 2.33
Approach 3: Use the formula
Oftentimes we have sample sizes that are too large to enter manually, and we are already reading in the data from either an external file or existing R data frame, we can just leverage the formula representation:
Code
t.test(extra ~ group, data=sleep)
Welch Two Sample t-test
data: extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean in group 1 mean in group 2
0.75 2.33
In this case, we put our outcome (extra) on the left side of the ~ operator and our group variable (group) on the right side. We’ll see this notation frequently used when we get to our linear regression programming approaches in the latter portion fo the semester. There are also other functions beyond the t-test that may have this functionality, so feel free to look at the help file or other code examples that are out there to play around with.
Source Code
---title: Ways to Enter Data for `t.test`author: name: Alex Kaizer roles: "Instructor" affiliation: University of Colorado-Anschutz Medical Campustoc: truetoc_float: truetoc-location: leftformat: html: code-fold: show code-overflow: wrap code-tools: true---```{r, echo=F, message=F, warning=F}library(kableExtra)library(dplyr)```This page is part of the University of Colorado-Anschutz Medical Campus' [BIOS 6618 Recitation](/recitation/index.qmd) collection. To view other questions, you can view the [BIOS 6618 Recitation](/recitation/index.qmd) collection page or use the search bar to look for keywords.# `t.test` Functionality for Data EntryWe saw in some of our earlier examples and HW1 that the t-test in R can have its data entered using arguments `x, y`. If you pull up the help function for the t-test (e.g., `?t.test`), you will see that it includes a "Default S3 method" that does use `x, y`, but also an "S3 method for class 'formula'". We will examine the use of the `sleep` dataset in R for a few ways we could implement our t-test:```{r}sleep```## Approach 1: Manually enter the data to useFor this approach, we will manually create the vectors of data to use. Given that we have the `sleep` data frame already, this is an inefficient approach. However, it isn't a terrible approach if you have a small data set that isn't already in R.```{r}group1_extra <-c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)group2_extra <-c(1.9,0.8,1.1,0.1,-0.1,4.4,5.5,1.6,4.6,3.4)t.test(x=group1_extra, y=group2_extra)```## Approach 2: Subset the data from `sleep` to useIn this approach we will extract the information from our `sleep` data frame and then enter it using the "Default S3 method" approach with `x=...,y=...`. This works well if you already have data entered into R as a data frame. There are a host of ways one can extract this information (and we'll dig into some of these later as well):```{r}# Method 1: Pull out the column "extra" from the data frame sleep using the $ operator and subset by groupgroup1_extra_extracted1 <- sleep$extra[ which(sleep$group==1) ]group2_extra_extracted1 <- sleep$extra[ which(sleep$group==2) ]t.test(x=group1_extra_extracted1, y=group2_extra_extracted1)# Method 2: Subset the data from the columns using dataframe[row,column] notation# NOTE: the "which(sleep$group==1)" identifies the row numbers where group is equal to 1# NOTE: the "'extra'" piece is requesting the column of data called extra from the data framegroup1_extra_extracted2 <- sleep[which(sleep$group==1), 'extra' ]group2_extra_extracted2 <- sleep[which(sleep$group==2), 'extra' ]t.test(x=group1_extra_extracted2, y=group2_extra_extracted2)```If you have an object that is a *matrix* instead of a data frame object, the `$` operator no longer works to extract a column. However, you can still reference with the `matrix[row,column]` approach. There is some wonky behavior in our case where we are coercing the data frame into a matrix (i.e., it made everything a character), so we have to add a step to make it numeric again. However, if we created the data in a matrix where everything was already numeric it would work fine:```{r}sleep_mat <-as.matrix(sleep)sleep_mat # NOTICE IT HAS COERCED EVERYTHING TO BE A CHARACTER INSTEAD OF A NUMBER!!!group1_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==1), 'extra']group1_extra_extracted3 <-as.numeric(group1_extra_extracted3)group2_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==2), 'extra']group2_extra_extracted3 <-as.numeric(group2_extra_extracted3)t.test(x=group1_extra_extracted3, y=group2_extra_extracted3)```## Approach 3: Use the formulaOftentimes we have sample sizes that are too large to enter manually, and we are already reading in the data from either an external file or existing R data frame, we can just leverage the formula representation:```{r}t.test(extra ~ group, data=sleep)```In this case, we put our outcome (`extra`) on the left side of the `~` operator and our group variable (`group`) on the right side. We'll see this notation frequently used when we get to our linear regression programming approaches in the latter portion fo the semester. There are also other functions beyond the t-test that may have this functionality, so feel free to look at the help file or other code examples that are out there to play around with.