Ways to Enter Data for `t.test`

Author: Alex Kaizer

Affiliation: University of Colorado-Anschutz Medical Campus

This page is part of the University of Colorado-Anschutz Medical Campus’ BIOS 6618 Recitation collection. To view other questions, visit the BIOS 6618 Recitation collection page or use the search bar to look for keywords.

`t.test` Functionality for Data Entry

We saw in some of our earlier examples and HW1 that the t-test in R can have its data entered using the arguments x and y. If you pull up the help page for the t-test (e.g., ?t.test), you will see that it includes a “Default S3 method” that uses x and y, but also an “S3 method for class ‘formula’”. We will use the sleep dataset in R to examine a few ways we could implement our t-test:

sleep

   extra group ID
1    0.7     1  1
2   -1.6     1  2
3   -0.2     1  3
4   -1.2     1  4
5   -0.1     1  5
6    3.4     1  6
7    3.7     1  7
8    0.8     1  8
9    0.0     1  9
10   2.0     1 10
11   1.9     2  1
12   0.8     2  2
13   1.1     2  3
14   0.1     2  4
15  -0.1     2  5
16   4.4     2  6
17   5.5     2  7
18   1.6     2  8
19   4.6     2  9
20   3.4     2 10
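
Before choosing how to pass the data to t.test, it can help to check how each column of sleep is stored. The following is a minimal sketch; the object name col_classes is illustrative and not part of the original example.

```r
# Show the structure of the built-in sleep data frame
str(sleep)

# Store the class of each column: extra is numeric, group and ID are factors
col_classes <- sapply(sleep, class)
col_classes

# List the S3 methods registered for t.test (default and formula)
methods(t.test)
```

Knowing that group and ID are factors is useful later, since factors affect how the data frame behaves when it is converted to other object types.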

Approach 1: Manually enter the data to use

For this approach, we will manually create the vectors of data to use. Given that we have the sleep data frame already, this is an inefficient approach. However, it isn’t a terrible approach if you have a small data set that isn’t already in R.

group1_extra <- c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)
group2_extra <- c(1.9,0.8,1.1,0.1,-0.1,4.4,5.5,1.6,4.6,3.4)

t.test(x=group1_extra, y=group2_extra)


    Welch Two Sample t-test

data:  group1_extra and group2_extra
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y 
     0.75      2.33
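
When data are typed in by hand, a quick comparison against a trusted copy can catch transcription errors. The following is a minimal sketch, assuming the built-in sleep data frame serves as that trusted copy; the names matches_group1 and matches_group2 are illustrative.

```r
# Hand-entered vectors, as in Approach 1
group1_extra <- c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)
group2_extra <- c(1.9,0.8,1.1,0.1,-0.1,4.4,5.5,1.6,4.6,3.4)

# Compare each hand-entered vector with the corresponding rows of sleep
matches_group1 <- isTRUE(all.equal(group1_extra, sleep$extra[sleep$group == 1]))
matches_group2 <- isTRUE(all.equal(group2_extra, sleep$extra[sleep$group == 2]))

# Both should be TRUE if no values were mistyped
c(matches_group1, matches_group2)

# The group means should agree with the "sample estimates" in the t.test output
c(mean(group1_extra), mean(group2_extra))
```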

Approach 2: Subset the data from `sleep` to use

In this approach we will extract the information from our sleep data frame and then enter it using the “Default S3 method” approach with x=...,y=.... This works well if you already have data entered into R as a data frame. There are a host of ways one can extract this information (and we’ll dig into some of these later as well):

# Method 1: Pull out the column "extra" from the data frame sleep using the $ operator and subset by group
group1_extra_extracted1 <- sleep$extra[ which(sleep$group==1) ]
group2_extra_extracted1 <- sleep$extra[ which(sleep$group==2) ]
t.test(x=group1_extra_extracted1, y=group2_extra_extracted1)


    Welch Two Sample t-test

data:  group1_extra_extracted1 and group2_extra_extracted1
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y 
     0.75      2.33

# Method 2: Subset the data from the columns using dataframe[row,column] notation
# NOTE: the "which(sleep$group==1)" identifies the row numbers where group is equal to 1
# NOTE: the "'extra'" piece is requesting the column of data called extra from the data frame
group1_extra_extracted2 <- sleep[which(sleep$group==1), 'extra' ]
group2_extra_extracted2 <- sleep[which(sleep$group==2), 'extra' ]
t.test(x=group1_extra_extracted2, y=group2_extra_extracted2)


    Welch Two Sample t-test

data:  group1_extra_extracted2 and group2_extra_extracted2
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y 
     0.75      2.33
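
A few other base R idioms give the same vectors as the two methods above. The following is a sketch of some alternatives rather than a recommended approach; the object names (group1_logical, extra_by_group, group2_subset, result_split) are illustrative.

```r
# Logical indexing: the comparison itself selects the rows, so which() is optional
# (with missing values in group, which() drops them while logical indexing returns NA)
group1_logical <- sleep$extra[sleep$group == 1]

# split() returns a list with one vector of extra per level of group
extra_by_group <- split(sleep$extra, sleep$group)

# subset() filters rows and selects columns in a single call
group2_subset <- subset(sleep, group == 2, select = extra)$extra

# The list elements are named after the levels of group ("1" and "2")
result_split <- t.test(x = extra_by_group[["1"]], y = extra_by_group[["2"]])
result_split
```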

If you have an object that is a matrix instead of a data frame, the $ operator no longer works to extract a column. However, you can still reference columns with the matrix[row,column] approach. In our case, coercing the data frame into a matrix converts every value to a character string, because a matrix can hold only one data type and the group and ID columns of sleep are factors. We therefore have to add a step to make the extracted values numeric again. However, if we created the data in a matrix where everything was already numeric, it would work fine:

sleep_mat <- as.matrix(sleep)
sleep_mat # NOTICE IT HAS COERCED EVERYTHING TO BE A CHARACTER INSTEAD OF A NUMBER!!!

      extra  group ID  
 [1,] " 0.7" "1"   "1" 
 [2,] "-1.6" "1"   "2" 
 [3,] "-0.2" "1"   "3" 
 [4,] "-1.2" "1"   "4" 
 [5,] "-0.1" "1"   "5" 
 [6,] " 3.4" "1"   "6" 
 [7,] " 3.7" "1"   "7" 
 [8,] " 0.8" "1"   "8" 
 [9,] " 0.0" "1"   "9" 
[10,] " 2.0" "1"   "10"
[11,] " 1.9" "2"   "1" 
[12,] " 0.8" "2"   "2" 
[13,] " 1.1" "2"   "3" 
[14,] " 0.1" "2"   "4" 
[15,] "-0.1" "2"   "5" 
[16,] " 4.4" "2"   "6" 
[17,] " 5.5" "2"   "7" 
[18,] " 1.6" "2"   "8" 
[19,] " 4.6" "2"   "9" 
[20,] " 3.4" "2"   "10"

group1_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==1), 'extra']
group1_extra_extracted3 <- as.numeric(group1_extra_extracted3)

group2_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==2), 'extra']
group2_extra_extracted3 <- as.numeric(group2_extra_extracted3)

t.test(x=group1_extra_extracted3, y=group2_extra_extracted3)


    Welch Two Sample t-test

data:  group1_extra_extracted3 and group2_extra_extracted3
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y 
     0.75      2.33
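
To illustrate the case where the matrix is numeric from the start, one option is data.matrix, which converts factors to their integer codes instead of to character strings. The following is a minimal sketch; the names sleep_num, group1_extra_num, and group2_extra_num are illustrative. Note that the integer codes match the group labels here only because the levels of group are "1" and "2"; in general the codes are level positions, not the labels themselves.

```r
# data.matrix() keeps numeric columns and replaces factors with integer codes
sleep_num <- data.matrix(sleep)
is.numeric(sleep_num)

# No as.numeric() step is needed because the matrix is already numeric
group1_extra_num <- sleep_num[sleep_num[, "group"] == 1, "extra"]
group2_extra_num <- sleep_num[sleep_num[, "group"] == 2, "extra"]

t.test(x = group1_extra_num, y = group2_extra_num)
```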

Approach 3: Use the formula

Oftentimes our sample sizes are too large to enter manually, and we are already reading in the data from either an external file or an existing R data frame. In that case, we can just leverage the formula representation:

t.test(extra ~ group, data=sleep)


    Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2 
           0.75            2.33

In this case, we put our outcome (extra) on the left side of the ~ operator and our group variable (group) on the right side. We’ll see this notation frequently used when we get to our linear regression programming approaches in the latter portion of the semester. There are also other functions beyond the t-test that may have this functionality, so feel free to look at the help file or other code examples that are out there to play around with.
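
As a brief sketch of the same outcome ~ group notation in other base R functions, the example below uses aggregate and lm. The object names group_means and fit are illustrative, and this is only a preview rather than the regression approach covered later in the course.

```r
# Group means computed with the same formula used in t.test
group_means <- aggregate(extra ~ group, data = sleep, FUN = mean)
group_means

# Linear regression with the same formula; the coefficient named "group2"
# is the difference in mean extra between group 2 and group 1
fit <- lm(extra ~ group, data = sleep)
coef(fit)
```

The group means returned by aggregate match the sample estimates reported by t.test, and the group2 coefficient equals their difference.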
