Coding Help

Saving Plots

Base R Plots

In R, there are several ways to save and export plots. One of the most common ways is to use the png() function, which creates a PNG file and opens a plotting device. The syntax for this function is as follows:

png(filename, width = 480, height = 480, units = "px", res = NA)
  • filename is the file name and path where the PNG file should be saved.
  • width and height are the width and height of the image, in the units given by units. The default is 480 for both.
  • units is the units for the width and height: "px" (pixels), "in" (inches), "cm" or "mm". The default is "px".
  • res is the nominal resolution of the PNG file in pixels per inch (ppi). The default, NA, is treated as 72 ppi. res must be set explicitly whenever units is not "px".

Once the plotting device is open, create your plot with any standard R plotting function, such as plot(), barplot(), or hist(). The plot is drawn into the file rather than shown on screen. When you are finished, call dev.off() to close the device; only then is the PNG file written completely.

For example, imagine we have a data.frame called data_for_plots:

head(data_for_plots)
  variable_a variable_b
1  0.1703222  0.6700134
2  0.1612494  1.5436437
3 -2.0870488  1.1083646
4  0.6901655  0.9038989
5 -1.0462657 -0.3617654
6 -0.1199232 -0.4250342

We can create a plot and save it as follows:

png("path_to_folder/file_name.png", width = 500, height = 500)
plot(x = data_for_plots$variable_a,
     y = data_for_plots$variable_b,
     xlab = "X-axis variable name",
     ylab = "Y-axis variable name")
dev.off()
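
For print-quality output it is often easier to give the size in inches and set the resolution explicitly. The following is a minimal sketch of that approach; the simulated data frame and the output path in a temporary folder are assumptions made so the example runs on its own.

```r
# Simulated stand-in for data_for_plots (values are made up for illustration).
set.seed(42)
data_for_plots <- data.frame(variable_a = rnorm(50),
                             variable_b = rnorm(50))

# Hypothetical output location: a temporary folder, so nothing is overwritten.
out_file <- file.path(tempdir(), "file_name_300ppi.png")

# 5 x 5 inches at 300 ppi gives a 1500 x 1500 pixel image.
# res must be supplied whenever units is not "px".
png(out_file, width = 5, height = 5, units = "in", res = 300)
plot(x = data_for_plots$variable_a,
     y = data_for_plots$variable_b,
     xlab = "X-axis variable name",
     ylab = "Y-axis variable name")
dev.off()  # close the device so the file is written completely
```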

You can also save plots in other file formats, such as PDF, JPEG, or BMP, by using the corresponding functions: pdf(), jpeg(), and bmp(). Note that pdf() takes its width and height in inches, not pixels. Whichever device you use, always call dev.off() after the plotting commands so that the device is closed and the file is written properly.
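
As an illustration of a different device, here is a short sketch that writes a histogram to a PDF file. The simulated values and the temporary output path are assumptions for the sake of a runnable example.

```r
# Hypothetical output location in a temporary folder.
out_file <- file.path(tempdir(), "histogram.pdf")

# For pdf(), width and height are given in inches.
pdf(out_file, width = 6, height = 4)
set.seed(1)
hist(rnorm(200),
     main = "",
     xlab = "Simulated values")
dev.off()  # close the PDF device so the file is complete
```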

ggplot2 Plots

Another way to save a plot is with the ggsave() function from the ggplot2 package. By default, ggsave() saves the last ggplot that you displayed, and it picks the file type from the extension of the file name.

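ggsave(filename, plot = last_plot(), width = NA, height = NA, units = c("in", "cm", "mm", "px"), dpi = 300)
  • filename is the file name and path where the plot should be saved.
  • plot is the plot to save. The default is the last plot displayed.
  • width and height are the width and height of the plot in the specified units. If they are not given, ggsave() uses the size of the current graphics device.
  • units is the units for the width and height. The default is "in" for inches.
  • dpi is the resolution of the plot in dots per inch (DPI), used for raster formats such as PNG. The default is 300 DPI.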

For example:

ggplot(data_for_plots,
       aes(x = variable_a,
           y = variable_b)) + 
  geom_point() + 
  xlab("X-axis variable name") + 
  ylab("Y-axis variable name")
ggsave("path_to_folder/file_name.png", width = 6, height = 6)
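
Relying on the last plot displayed can be fragile in longer scripts. A common alternative, sketched here, is to store the plot in an object and pass it to ggsave() through its plot argument. The simulated data, the object name scatter_plot, and the temporary output path are assumptions for illustration.

```r
library(ggplot2)

# Simulated stand-in for data_for_plots (values are made up for illustration).
set.seed(42)
data_for_plots <- data.frame(variable_a = rnorm(50),
                             variable_b = rnorm(50))

# Store the plot in an object instead of only printing it.
scatter_plot <- ggplot(data_for_plots,
                       aes(x = variable_a,
                           y = variable_b)) +
  geom_point() +
  xlab("X-axis variable name") +
  ylab("Y-axis variable name")

# Hypothetical output location in a temporary folder.
out_file <- file.path(tempdir(), "file_name.png")

# Name the plot explicitly so the saved file does not depend on
# which plot happened to be displayed last.
ggsave(out_file, plot = scatter_plot,
       width = 6, height = 6, units = "in", dpi = 300)
```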

group_by() and summarise()

The group_by() and summarise() functions are part of the dplyr package which is loaded when you call library(tidyverse), and are commonly used to perform data aggregation and summarization.

The group_by() function is used to group a data frame by one or more variables. For example, if you have a data frame called data with columns "A", "B", and "C", you can group the data by column "A" using the following code:

data %>%
  group_by(A)

You can also group by multiple columns, by passing multiple column names as arguments. For example:

data %>% 
  group_by(A, B)
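
On its own, group_by() does not change the rows or columns; it only records the grouping for later steps. The sketch below uses a small made-up data frame and inspects the grouping with the dplyr helpers group_vars() and n_groups().

```r
library(dplyr)

# Made-up data for illustration only.
data <- data.frame(A = c("x", "x", "y"),
                   B = c("p", "q", "q"),
                   C = c(1, 2, 3))

grouped <- data %>%
  group_by(A, B)

group_vars(grouped)  # names of the grouping columns: "A" "B"
n_groups(grouped)    # number of distinct A/B combinations: 3
nrow(grouped)        # the rows themselves are unchanged: 3
```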

The summarise() function is then used to create a summary of the grouped data. This function takes one or more arguments, which are the summary statistics that you want to calculate. For example, you can calculate the mean of column "C" for each group using the following code:

data %>% 
  group_by(A) %>% 
  summarise(mean_C = mean(C))
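
To make the result concrete, here is the same pipeline run on a small made-up data frame (the values are assumptions chosen so the means are easy to check by hand).

```r
library(dplyr)

# Made-up data for illustration only.
data <- data.frame(A = c("x", "x", "y", "y", "y"),
                   B = c(1, 2, 3, 4, 5),
                   C = c(10, 20, 30, 40, 50))

group_means <- data %>%
  group_by(A) %>%
  summarise(mean_C = mean(C))

# One row per group:
#   A = "x": mean_C = 15  (mean of 10 and 20)
#   A = "y": mean_C = 40  (mean of 30, 40 and 50)
group_means
```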

You can also calculate several summary statistics in one call, using functions such as sum(), max(), min(), or median() as needed:

data %>%
  group_by(A) %>%
  summarise(mean_C = mean(C),
            median_B = median(B))

group_by() must be called before summarise() for the summary to be calculated per group; without it, summarise() collapses the whole data frame to a single row. summarise() does not have to be the last step in a chain: it returns a new data frame that can be piped into further functions. It’s good practice to assign the result to a new object:

my_summary_object <- data %>% 
  group_by(A) %>% 
  summarise(mean_C = mean(C),
            median_B = median(B))
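
Because summarise() returns an ordinary data frame, its result can be piped into further dplyr verbs, and summarise() also works on ungrouped data. A minimal sketch of both cases, assuming a small made-up data frame and using arrange() and desc() from dplyr to sort the result:

```r
library(dplyr)

# Made-up data for illustration only.
data <- data.frame(A = c("x", "x", "y", "y", "y"),
                   B = c(1, 2, 3, 4, 5),
                   C = c(10, 20, 30, 40, 50))

# summarise() followed by another verb: sort groups by mean, largest first.
sorted_summary <- data %>%
  group_by(A) %>%
  summarise(mean_C = mean(C)) %>%
  arrange(desc(mean_C))

# summarise() without group_by(): a single row for the whole data frame.
overall_summary <- data %>%
  summarise(mean_C = mean(C))
```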