How Do I Select a Column in a Data File I Uploaded to R

How to Work With Data Frames and CSV Files in R — A Detailed Introduction with Examples

Welcome! If you want to start diving into data science and statistics, then data frames, CSV files, and R will be essential tools for you. Let's see how you can use their amazing capabilities.

In this article, you will learn:

  • What CSV files are and what they are used for.
  • How to create CSV files using Google Sheets.
  • How to read CSV files in R.
  • What data frames are and what they are used for.
  • How to access the elements of a data frame.
  • How to modify a data frame.
  • How to add and delete rows and columns.

We will use RStudio, an open-source IDE (Integrated Development Environment) to run the examples.

Let's begin! ✨

🔹 Introduction to CSV Files

CSV (Comma-separated Values) files can be considered one of the building blocks of data analysis because they are used to store data represented in the form of a table.

In this file, values are separated by commas to represent the different columns of the table, like in this example:

image-153
CSV File

We will generate this file using Google Sheets.

🔸 How to Create a CSV File Using Google Sheets

Let's create your first CSV file using Google Sheets.

Step 1: Go to the Google Sheets Website and click on "Go to Google Sheets":

image-227

💡 Tip: You can access Google Sheets by clicking on the button located at the top-right corner of Google's Home Page:

image-228

If we zoom in, we see the "Sheets" button:

image-156

💡 Tip: To use Google Sheets, you need to have a Gmail account. Alternatively, you can create a CSV file using MS Excel or another spreadsheet editor.

You will see this panel:

image-157

Step 2: Create a blank spreadsheet by clicking on the "+" button.

image-158

Now you have a new empty spreadsheet:

image-159

Step 3: Change the name of the spreadsheet to students_data. We will need to use the name of the file to work with data frames. Write the new name and press Enter to confirm the change.

image-162

Step 4: In the first row of the spreadsheet, write the titles of the columns.

image-160

When you import a CSV file in R, the titles of the columns are called variables. We will define six variables: first_name, last_name, age, num_siblings, num_pets, and eye_color, as you can see right here below:

image-163

💡 Tip: Notice that the names are written in lowercase and words are separated with an underscore. This is not mandatory, but since you will need to access these names in R, it's very common to use this format.

Step 5: Enter the data for each one of the columns.

When you read the file in R, each row is called an observation, and it corresponds to data taken from an individual, animal, object, or entity that we collected data from.

In this example, each row corresponds to the information of a student:

image-164

Step 6: Download the CSV file by clicking on File -> Download -> Comma-separated values, as you can see below:

image-165

Step 7: Rename the CSV file. You will need to remove "Sheet1" from the default name because Google Sheets automatically adds it to the name of the file.

image-169

Great work! Now you have your CSV file and it's time to start working with it in R.
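If you would rather skip the spreadsheet editor, the same file can be sketched directly from R. This is only a minimal sketch and not part of the steps above: the `csv_path` name and the use of `tempdir()` are assumptions made for illustration, so that nothing in your own folders is overwritten. It writes the example rows as plain text and prints them, which makes the comma-separated layout visible.

```r
# Assumption: the file is written to a temporary folder so nothing
# in your working directory is overwritten.
csv_path <- file.path(tempdir(), "students_data.csv")

csv_lines <- c(
  "first_name,last_name,age,num_siblings,num_pets,eye_color",
  "Emily,Dawson,15,2,5,BLUE",
  "Rose,Patterson,14,5,0,GREEN",
  "Alexander,Smith,16,0,2,BROWN",
  "Nora,Navona,16,4,10,GREEN",
  "Gino,Sand,17,3,8,BLUE"
)

# Write one line of text per row of the table.
writeLines(csv_lines, csv_path)

# Print the raw file: the first line holds the column titles and
# every value is separated by a comma.
raw_text <- readLines(csv_path)
print(raw_text)
```

The first line holds the titles of the columns, and each of the following lines is one student.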

🔹 How to Read a CSV file in R

In RStudio, the first step before reading a CSV file is making sure that your current working directory is the directory where the CSV file is located.

💡 Tip: If this is not the case, you will need to use the full path to the file.

Change Current Working Directory

You can change your current working directory in this panel:

image-172

If we zoom in, you can see the current path (1) and select the new one by clicking on the ellipsis (...) button to the right (2):

image-171

💡 Tip: You can also check your current working directory with getwd() in the interactive console.

Then, click "More" and "Set As Working Directory".

image-175
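The working directory can also be changed from the console with setwd(). Here is a minimal sketch, assuming that `tempdir()` stands in for the folder that contains your CSV file (the `data_dir` and `old_dir` names are hypothetical):

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# Assumption: tempdir() stands in for the folder that contains your CSV file.
data_dir <- tempdir()
setwd(data_dir)

# getwd() now reports the new working directory.
new_dir <- getwd()
print(new_dir)

# Restore the original working directory.
setwd(old_dir)
```

In your own session you would replace `data_dir` with the path to the folder where you saved students_data.csv.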

Read the CSV File

Once you have your current working directory set up, you can read the CSV file with this command:

image-176

In R code, we have this:

                > students_data <- read.csv("students_data.csv")              

💡 Tip: We assign it to the variable students_data to access the data of the CSV file with this variable. In R, we can separate words using dots ., underscores _, UpperCamelCase, or lowerCamelCase.

After running this command, you will see this in the top right panel:

image-177
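If the file is not in your working directory, you can pass the full path to read.csv instead. This is a minimal sketch, assuming a temporary folder and a two-row copy of the data so that it runs on its own (the `csv_path` name is hypothetical):

```r
# Assumption: a small copy of the file is created in a temporary folder
# so the example runs on its own.
csv_path <- file.path(tempdir(), "students_data.csv")
writeLines(
  c("first_name,last_name,age,num_siblings,num_pets,eye_color",
    "Emily,Dawson,15,2,5,BLUE",
    "Rose,Patterson,14,5,0,GREEN"),
  csv_path
)

# Pass the full path instead of relying on the working directory.
students_data <- read.csv(csv_path)

# The first line of the file became the column names.
print(names(students_data))
```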

Now you have a variable defined in the environment! Let's see what data frames are and how they are closely related to CSV files.

🔸 Introduction to Data Frames

Data frames are the standard digital format used to store statistical data in the form of a table. When you read a CSV file in R, a data frame is generated.

We can confirm this by checking the type of the variable with the class function:

                > class(students_data)
                [1] "data.frame"

It makes sense, right? CSV files contain data represented in the form of a table and data frames represent that tabular data in your code, so they are closely connected.

If you enter this variable in the interactive console, you will see the content of the CSV file:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

More Information About the Data Frame

You have several different alternatives to see the number of variables and observations of the data frame:

  • Your first option is to look at the top right panel that shows the variables that are currently defined in the environment. This data frame has 5 observations (rows) and 6 variables (columns):
image-178
  • Another alternative is to use the functions nrow and ncol in the interactive console or in your program, passing the data frame as an argument. We get the same results: 5 rows and 6 columns.
                > nrow(students_data)
                [1] 5
                > ncol(students_data)
                [1] 6
  • You can also see more information about the data frame using the str function:
                > str(students_data)
                'data.frame':	5 obs. of  6 variables:
                 $ first_name  : Factor w/ 5 levels "Alexander","Emily",..: 2 5 1 4 3
                 $ last_name   : Factor w/ 5 levels "Dawson","Navona",..: 1 3 5 2 4
                 $ age         : int  15 14 16 16 17
                 $ num_siblings: int  2 5 0 4 3
                 $ num_pets    : int  5 0 2 10 8
                 $ eye_color   : Factor w/ 3 levels "BLUE","BROWN",..: 1 3 2 3 1

This function (applied to a data frame) tells you:

  • The number of observations (rows).
  • The number of variables (columns).
  • The names of the variables.
  • The data types of the variables.
  • More information about the variables.

You can see that this function is really great when you want to know more about the data that you are working with.
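A few other base R functions give the same kind of overview. The sketch below is only an illustration: it rebuilds the example data frame inline with data.frame() (an assumption, so that it runs without the CSV file) and keeps the text columns as plain character values.

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# dim() returns the number of rows and the number of columns in one call.
print(dim(students_data))

# names() lists the variables (column names).
print(names(students_data))

# sapply() with class() reports the data type of each column.
column_types <- sapply(students_data, class)
print(column_types)
```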

💡 Tip: In R, a "Factor" is a qualitative variable, which is a variable whose values represent categories. For example, eye_color has the values "BLUE", "BROWN", "GREEN" which are categories, so as you can see in the output of str above, this variable is automatically defined as a "factor" when the CSV file is read in R. This automatic conversion is the default only in R versions before 4.0.0. In R 4.0.0 and later, read.csv keeps text columns as character values unless you pass stringsAsFactors = TRUE.
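To see what a factor stores, here is a small sketch that builds one by hand from the eye colors in the example (the variable names are hypothetical):

```r
# Character vector with the eye colors from the example.
eye_color_text <- c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE")

# factor() turns the text into a categorical variable.
eye_color <- factor(eye_color_text)

# levels() lists the categories, sorted alphabetically by default.
eye_levels <- levels(eye_color)
print(eye_levels)

# as.integer() shows the position of the level stored for each value.
eye_codes <- as.integer(eye_color)
print(eye_codes)
```

The integer codes are the same numbers that str prints after the levels of a factor column.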

🔹 Data Frames: Key Operations and Functions

Now you know how to see more information about the data frame. But the magic of data frames lies in the amazing capabilities and functionality that they offer, so let's see this in more detail.

How to Access A Value of a Data Frame

Data frames are like matrices, so you can access individual values using two indices surrounded by square brackets and separated by a comma to indicate which rows and which columns you would like to include in the result, like this:

image-181

For example, if we want to access the value of eye_color (column 6) of the fourth student in the data (row 4):

image-182

We need to use this command:

                > students_data[4, 6]

💡 Tip: In R, indices start at 1 and the first row with the names of the variables is not counted.

This is the output:

                [1] GREEN
                Levels: BLUE BROWN GREEN

You can see that the value is "GREEN". Variables of type "factor" have "levels" that represent the different categories or values that they can take. This output tells us the levels of the variable eye_color.
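The column can also be given by name instead of by position. This is a minimal sketch, assuming the example data frame is rebuilt inline with plain character columns (so the value prints as text, without the "Levels" line):

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Row 4, column 6, using numeric indices.
by_index <- students_data[4, 6]

# The same cell, using the name of the column.
by_name <- students_data[4, "eye_color"]

print(by_index)
print(by_name)
```

Using the name can be easier to read, and it keeps working if the order of the columns changes.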

How to Access Rows and Columns of a Data Frame

We can also use this syntax to access a range of rows and columns to get a portion of the original data frame, like this:

image-179

For example, if we want to get the age and number of siblings of the third, fourth, and fifth student in the list, we would use:

                > students_data[3:5, 3:4]
                  age num_siblings
                3  16            0
                4  16            4
                5  17            3

💡 Tip: The basic syntax to define an interval in R is <start>:<end>. Note that these indices are inclusive, so the third and fifth elements are included in the example above when we write 3:5.

If we want to get all the rows or columns, we simply omit the interval and include the comma, like this:

                > students_data[3:5,]
                  first_name last_name age num_siblings num_pets eye_color
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

We did not include an interval for the columns after the comma in students_data[3:5,], so we get all the columns of the data frame for the three rows that we specified.

Similarly, we can get all the rows for a specific range of columns if we omit the rows:

                > students_data[, 1:3]
                  first_name last_name age
                1      Emily    Dawson  15
                2       Rose Patterson  14
                3  Alexander     Smith  16
                4       Nora    Navona  16
                5       Gino      Sand  17

💡 Tip: Notice that you still need to include the comma in both cases.
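The rows and columns do not have to be next to each other. As a small sketch that goes one step beyond the ranges above, a vector built with c() can pick specific rows, and a vector of names can pick specific columns. The data frame is rebuilt inline here as an assumption, so that the code runs on its own:

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# A range of rows and a range of columns.
ages_and_siblings <- students_data[3:5, 3:4]
print(ages_and_siblings)

# Specific rows (1 and 5) and specific columns selected by name.
first_and_last <- students_data[c(1, 5), c("first_name", "age")]
print(first_and_last)
```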

How to Access a Column

There are three ways to access an entire column:

  • Option #1: to access a column and return it as a data frame, you can use this syntax:
image-184

For example:

                > students_data["first_name"]
                  first_name
                1      Emily
                2       Rose
                3  Alexander
                4       Nora
                5       Gino
  • Option #2: to get a column as a vector (sequence), you can use this syntax:
image-185

💡 Tip: Notice the use of the $ symbol.

For example:

                > students_data$first_name
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
  • Option #3: You can also use this syntax to get the column as a vector (see below). This is equivalent to the previous syntax:
                > students_data[["first_name"]]
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
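The difference between the three options is easier to see if we check the type of each result. This is a minimal sketch, assuming the example data frame is rebuilt inline with plain character columns (the names of the result variables are hypothetical):

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Option #1: single square brackets return a data frame with one column.
as_data_frame <- students_data["first_name"]
print(class(as_data_frame))

# Option #2: the $ symbol returns the column as a vector.
as_vector_dollar <- students_data$first_name
print(class(as_vector_dollar))

# Option #3: double square brackets also return the column as a vector.
as_vector_brackets <- students_data[["first_name"]]
print(identical(as_vector_dollar, as_vector_brackets))
```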

How to Filter Rows of a Data Frame

You can filter the rows of a data frame to get a portion of the data frame that meets certain conditions.

For this, we use this syntax, passing the condition as the first element within square brackets, then a comma, and finally leaving the second element empty.

image-190

For example, to get all rows for which students_data$age > 16, we would use:

                > students_data[students_data$age > 16,]
                  first_name last_name age num_siblings num_pets eye_color
                5       Gino      Sand  17            3        8      BLUE

We get a data frame with the rows that meet this condition.

Filter Rows and Choose Columns

You can combine this condition with a range of columns:

                > students_data[students_data$age > 16, 3:6]
                  age num_siblings num_pets eye_color
                5  17            3        8      BLUE

We get the rows that meet the condition and the columns in the range 3:6.
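Behind the scenes, the condition produces a vector of TRUE and FALSE values, one per row, and only the rows marked TRUE are kept. The sketch below shows this and, as a small extension that is not covered above, combines two conditions with the & operator. The data frame is rebuilt inline as an assumption, so that the code runs on its own:

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# The condition is a logical vector with one value per row.
is_older <- students_data$age > 16
print(is_older)

# Rows marked TRUE are kept.
older_students <- students_data[is_older, ]
print(older_students)

# Two conditions combined with &, plus columns selected by name.
sixteen_with_pets <- students_data[
  students_data$age == 16 & students_data$num_pets > 5,
  c("first_name", "num_pets")
]
print(sixteen_with_pets)
```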

🔸 How to Modify Data Frames

You can modify individual values of a data frame, add columns, add rows, and remove them. Let's see how you can do this!

How to Modify A Value

To change an individual value of the data frame, you need to use this syntax:

image-191

For example, if we want to change the value that is currently at row 4 and column 6, denoted in blue right here:

image-182

We need to use this line of code:

                students_data[4, 6] <- "BROWN"

💡 Tip: You can also use = as the assignment operator.

This is the output. The value was changed successfully.

image-193

💡 Tip: Remember that the first row of the CSV file is not counted as the first row because it has the names of the variables.
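Here is the same change as a small sketch that you can run and check in the console. It assumes the example data frame is rebuilt inline with plain character columns (so any text value can be assigned):

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Value before the change.
old_value <- students_data[4, 6]
print(old_value)

# Assign the new value to row 4, column 6.
students_data[4, 6] <- "BROWN"

# Print the row to confirm the change.
print(students_data[4, ])
```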

How to Add Rows to a Data Frame

To add a row to a data frame, you need to use the rbind function:

image-194

This function takes two arguments:

  • The data frame that you want to change.
  • A list with the data of the new row. To create the list, you can use the list() function with each value separated by a comma.

This is an example:

                > rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))

The output is:

                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     BROWN
                5       Gino      Sand  17            3        8      BLUE
                6       <NA>     Smith  14            7        3     BROWN

But wait! A warning message was displayed:

                Warning message:
                In `[<-.factor`(`*tmp*`, ri, value = "William") :
                  invalid factor level, NA generated

And notice the first value of the sixth row, it is <NA>:

                6       <NA>     Smith  14            7        3     BROWN

This occurred because the variable first_name was defined automatically as a factor when we read the CSV file and factors have fixed "categories" (levels).

You cannot add a new level (value - "William") to this variable unless you read the CSV file with the value FALSE for the parameter stringsAsFactors, as shown below:

                > students_data <- read.csv("students_data.csv", stringsAsFactors = FALSE)              
image-196

Now, if we try to add this row, the data frame is modified successfully.

                > students_data <- rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))
                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE
                6    William     Smith  14            7        3     BROWN

💡 Tip: Note that if you read the CSV file again and assign it to the same variable, all the changes made previously will be removed and you will see the original data frame. You need to add this argument to the first line of code that reads the CSV file and then make changes to it.
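Another way to add a row, shown here only as an alternative sketch and not as the approach used above, is to build the new row as a one-row data frame with the same column names and pass it to rbind. The data frame is rebuilt inline as an assumption, and `new_student` is a hypothetical name:

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# The new row is a one-row data frame with the same column names.
new_student <- data.frame(
  first_name   = "William",
  last_name    = "Smith",
  age          = 14L,
  num_siblings = 7L,
  num_pets     = 3L,
  eye_color    = "BROWN",
  stringsAsFactors = FALSE
)

# rbind() appends the new row at the end of the data frame.
students_data <- rbind(students_data, new_student)
print(students_data)
```

Naming every value makes it harder to put a value in the wrong column by accident.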

How to Add Columns to a Data Frame

Adding columns to a data frame is much simpler. You need to use this syntax:

image-197

For instance:

                > students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0)

💡 Tip: The number of elements has to be equal to the number of rows of the data frame.

The output shows the data frame with the new GPA column:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color  GPA
                1      Emily    Dawson  15            2        5      BLUE 4.00
                2       Rose Patterson  14            5        0     GREEN 3.50
                3  Alexander     Smith  16            0        2     BROWN 3.20
                4       Nora    Navona  16            4       10     GREEN 3.15
                5       Gino      Sand  17            3        8      BLUE 2.90
                6    William     Smith  14            7        3     BROWN 3.00
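The sketch below shows what happens when the lengths do not match. It is only an illustration: it uses the original five-row data frame rebuilt inline (an assumption, so that it runs on its own), and the `attempt` name and the `bad_column` name are hypothetical.

```r
# Assumption: the five-row data frame is rebuilt inline instead of being
# read from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Five values for five rows: the column is added.
students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9)
print(names(students_data))

# Two values for five rows: R stops with an error.
# tryCatch() captures the error message instead of halting the script.
attempt <- tryCatch(
  {
    students_data$bad_column <- c(1, 2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(attempt)
```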

How to Remove Columns

To remove columns from a data frame, you need to use this syntax:

image-198

When you assign the value NULL to a column, that column is removed from the data frame automatically.

For example, to remove the age column, we use:

                > students_data$age <- NULL

The output is:

                > students_data
                  first_name last_name num_siblings num_pets eye_color  GPA
                1      Emily    Dawson            2        5      BLUE 4.00
                2       Rose Patterson            5        0     GREEN 3.50
                3  Alexander     Smith            0        2     BROWN 3.20
                4       Nora    Navona            4       10     GREEN 3.15
                5       Gino      Sand            3        8      BLUE 2.90
                6    William     Smith            7        3     BROWN 3.00
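To drop several columns at once, one possible approach (a sketch, not the syntax shown above) is to keep every column whose name is not in a list of names to remove. The data frame is rebuilt inline as an assumption, and `columns_to_remove` and `keep` are hypothetical names:

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# Remove a single column by assigning NULL to it.
students_data$age <- NULL
print(names(students_data))

# Remove several columns by keeping all the other ones.
columns_to_remove <- c("num_siblings", "num_pets")
keep <- !(names(students_data) %in% columns_to_remove)
students_data <- students_data[, keep]
print(names(students_data))
```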

How to Remove Rows

To remove rows from a data frame, you can use indices and ranges. For example, to remove the first row of a data frame:

image-200

The [-1,] takes a portion of the data frame that doesn't include the first row. Then, this portion is assigned to the same variable.

If we have this data frame and we want to delete the first row:

image-230

The output is a data frame that doesn't include the first row:

image-231

In general, to remove a specific row, you need to use this syntax where <row_num> is the row that you want to remove:

image-229

💡 Tip: Notice the - sign before the row number.

For instance, if we want to remove row 4 from this data frame:

image-232

The output is:

image-233

As you can see, row 4 was successfully removed.
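The removals described in this section can be sketched in code as follows. The example data frame is rebuilt inline as an assumption, and each result is stored in a new, hypothetically named variable so that the original stays intact. The last line goes one step further and removes two rows at once with c().

```r
# Assumption: the data frame is rebuilt inline instead of being read
# from the CSV file, so the example runs on its own.
students_data <- data.frame(
  first_name   = c("Emily", "Rose", "Alexander", "Nora", "Gino"),
  last_name    = c("Dawson", "Patterson", "Smith", "Navona", "Sand"),
  age          = c(15L, 14L, 16L, 16L, 17L),
  num_siblings = c(2L, 5L, 0L, 4L, 3L),
  num_pets     = c(5L, 0L, 2L, 10L, 8L),
  eye_color    = c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE"),
  stringsAsFactors = FALSE
)

# A negative index keeps every row except the one indicated.
without_first <- students_data[-1, ]
print(without_first)

# Remove row 4. The remaining rows keep their original row labels.
without_fourth <- students_data[-4, ]
print(rownames(without_fourth))

# Remove rows 1 and 4 in a single step.
without_two <- students_data[-c(1, 4), ]
print(without_two)
```

To update the data frame itself, assign the result back to the same variable, for example students_data <- students_data[-1, ].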

🔹 In Summary

  • CSV files are Comma-Separated Values files used to represent data in the form of a table. These files can be read using R and RStudio.
  • Data frames are used in R to represent tabular data. When you read a CSV file, a data frame is created to shop the data.
  • You can access and modify the values, rows, and columns of a data frame.

I really hope that you liked my article and found it helpful. Now you can work with data frames and CSV files in R.

If you liked this article, consider enrolling in my new online course "Introduction to Statistics in R - A Practical Approach".




Source: https://www.freecodecamp.org/news/how-to-work-with-data-frames-and-csv-files-in-r/
