---
title: Ways to Enter Data for `t.test`
author:
name: Alex Kaizer
roles: "Instructor"
affiliation: University of Colorado-Anschutz Medical Campus
toc: true
toc_float: true
toc-location: left
format:
html:
code-fold: show
code-overflow: wrap
code-tools: true
---
```{r, echo=F, message=F, warning=F}
library(kableExtra)
library(dplyr)
```
This page is part of the University of Colorado-Anschutz Medical Campus' [BIOS 6618 Recitation](/recitation/index.qmd) collection. To view other questions, you can view the [BIOS 6618 Recitation](/recitation/index.qmd) collection page or use the search bar to look for keywords.
# `t.test` Functionality for Data Entry
We saw in some of our earlier examples and HW1 that the t-test in R can have its data entered using arguments `x, y`. If you pull up the help function for the t-test (e.g., `?t.test`), you will see that it includes a "Default S3 method" that does use `x, y`, but also an "S3 method for class 'formula'". We will examine the use of the `sleep` dataset in R for a few ways we could implement our t-test:
```{r}
sleep
```
## Approach 1: Manually enter the data to use
For this approach, we will manually create the vectors of data to use. Given that we have the `sleep` data frame already, this is an inefficient approach. However, it isn't a terrible approach if you have a small data set that isn't already in R.
```{r}
group1_extra <- c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)
group2_extra <- c(1.9,0.8,1.1,0.1,-0.1,4.4,5.5,1.6,4.6,3.4)
t.test(x=group1_extra, y=group2_extra)
```
## Approach 2: Subset the data from `sleep` to use
In this approach we will extract the information from our `sleep` data frame and then enter it using the "Default S3 method" approach with `x=...,y=...`. This works well if you already have data entered into R as a data frame. There are a host of ways one can extract this information (and we'll dig into some of these later as well):
```{r}
# Method 1: Pull out the column "extra" from the data frame sleep using the $ operator and subset by group
group1_extra_extracted1 <- sleep$extra[ which(sleep$group==1) ]
group2_extra_extracted1 <- sleep$extra[ which(sleep$group==2) ]
t.test(x=group1_extra_extracted1, y=group2_extra_extracted1)
# Method 2: Subset the data from the columns using dataframe[row,column] notation
# NOTE: the "which(sleep$group==1)" identifies the row numbers where group is equal to 1
# NOTE: the "'extra'" piece is requesting the column of data called extra from the data frame
group1_extra_extracted2 <- sleep[which(sleep$group==1), 'extra' ]
group2_extra_extracted2 <- sleep[which(sleep$group==2), 'extra' ]
t.test(x=group1_extra_extracted2, y=group2_extra_extracted2)
```
If you have an object that is a *matrix* instead of a data frame object, the `$` operator no longer works to extract a column. However, you can still reference with the `matrix[row,column]` approach. There is some wonky behavior in our case where we are coercing the data frame into a matrix (i.e., it made everything a character), so we have to add a step to make it numeric again. However, if we created the data in a matrix where everything was already numeric it would work fine:
```{r}
sleep_mat <- as.matrix(sleep)
sleep_mat # NOTICE IT HAS COERCED EVERYTHING TO BE A CHARACTER INSTEAD OF A NUMBER!!!
group1_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==1), 'extra']
group1_extra_extracted3 <- as.numeric(group1_extra_extracted3)
group2_extra_extracted3 <- sleep_mat[ which(sleep_mat[,'group']==2), 'extra']
group2_extra_extracted3 <- as.numeric(group2_extra_extracted3)
t.test(x=group1_extra_extracted3, y=group2_extra_extracted3)
```
## Approach 3: Use the formula
Oftentimes we have sample sizes that are too large to enter manually, and we are already reading in the data from either an external file or existing R data frame, we can just leverage the formula representation:
```{r}
t.test(extra ~ group, data=sleep)
```
In this case, we put our outcome (`extra`) on the left side of the `~` operator and our group variable (`group`) on the right side. We'll see this notation frequently used when we get to our linear regression programming approaches in the latter portion fo the semester. There are also other functions beyond the t-test that may have this functionality, so feel free to look at the help file or other code examples that are out there to play around with.