A: Apply the function to two files in folders with for loop or lapply and save the results in a dataframe
Try this solution:
Get all the folders with list.dirs.
For each folder, read the "alpha" and "beta" files and return a 3-column tibble with alpha, beta and alpha_beta values.
Combine all the dataframes, with an id column to know which folder each value comes from.
all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)
result <- purrr::map_df(all_folders, function(x) {
  # list.files() sorts alphabetically, so the "alpha" file comes before the "beta" file
  all_files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_files[1])
  df2 <- read.csv(all_files[2])
  tibble::tibble(alpha = df1$media, beta = df2$media, alpha_beta = alpha/beta)
}, .id = "id")
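If you would rather not depend on purrr and tibble, the same steps can be sketched in base R. This is only an illustration under assumptions: the folder names (run1, run2), the file names alpha.csv and beta.csv, and the generated values are all made up so the example can run on its own. The column name media is taken from the code above.

```r
# Build a throwaway folder tree with made-up data (hypothetical names and values)
root <- tempfile("Data_")
for (run in c("run1", "run2")) {
  dir.create(file.path(root, run), recursive = TRUE)
  write.csv(data.frame(media = c(2, 4)), file.path(root, run, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(media = c(1, 2)), file.path(root, run, "beta.csv"), row.names = FALSE)
}

all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

per_folder <- lapply(all_folders, function(folder) {
  # Look the two files up by name instead of relying on their sort order
  alpha <- read.csv(file.path(folder, "alpha.csv"))
  beta  <- read.csv(file.path(folder, "beta.csv"))
  data.frame(id = basename(folder),       # folder name, so each row can be traced back
             alpha = alpha$media,
             beta = beta$media,
             alpha_beta = alpha$media / beta$media)
})

# Stack the per-folder data frames into one
result <- do.call(rbind, per_folder)
```

Here the id column holds the folder name itself, which may be easier to read than a running index.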
Skip the last N lines with lapply and then read.csv
Something like this should get you on the right track. This first reads the files, then removes the last 5 rows, and finally binds them together. I would also suggest not using variable names that might conflict with function names: file and c, for example, are functions in base R. Here I use all_files instead of files.
all_files <- list.files(path = "./saved files", full.names = TRUE)
do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5) # change as needed
  })
)
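As a quick check of what the negative n in head() does, here is a tiny sketch on a made-up data frame (the data are an assumption, only for illustration):

```r
df <- data.frame(x = 1:8, y = letters[1:8])

# A negative n means "everything except the last |n| rows"
trimmed <- head(df, -5)
nrow(trimmed)

# With 5 or fewer rows the result is an empty data frame, not an error
short <- head(df[1:4, ], -5)
nrow(short)
```

So files shorter than the number of skipped lines silently contribute zero rows to the combined result.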
Using the lapply variable in read.csv
In general, it is often more useful to write a function that processes each list item and let lapply return a list of results, which can be given names, rather than creating separate variables. Example (edit: use split to process each site's files together):
files <- list.files(path = "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
# group the file paths by site letter, giving one list element per site
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
process_file <- function(x) {
  all_df <- read.csv(x[grep("_all.csv", x)])
  rsid <- read.csv(x[grep("_rsid.csv", x)])
  tabl <- read.csv(x[grep("_tbl.csv", x)])
  # do more, generate df, return(df)
}
res <- lapply(files, process_file)
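To see what the split step produces without touching the disk, here is a small sketch on made-up file paths (the paths are hypothetical, and pick is a helper name introduced only for this illustration):

```r
files <- c("I:/Results/site_a_all.csv", "I:/Results/site_a_rsid.csv",
           "I:/Results/site_b_all.csv", "I:/Results/site_b_rsid.csv")

# Extract the site letter from each path and use it as the grouping key
site <- gsub(".*site_([abcd]).*", "\\1", files)
by_site <- split(files, site)
names(by_site)

# Hypothetical helper: pick the one file of a group with a given suffix
pick <- function(x, suffix) x[grep(suffix, x, fixed = TRUE)]
pick(by_site$a, "_rsid.csv")
```

Each element of by_site is the character vector that the processing function receives as x.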
How to load multiple CSV files into separate objects (dataframes) based on filename in R?
Solution for the curious...
files <- list.files(pattern = ".*csv")
for (file in seq_along(files)) {
  # build "file00<i>" and "exfile00<i>" from the loop index
  file_name <- paste(c("file00", file), collapse = " ")
  file_name <- gsub(" ", "", file_name, fixed = TRUE)
  ex_file_name <- paste(c("exfile00", file), collapse = " ")
  ex_file_name <- gsub(" ", "", ex_file_name, fixed = TRUE)
  file_object <- read.csv(file = paste(file_name, ".csv", sep = ""), fileEncoding = "UTF-8-BOM")
  exfile_object <- read.csv(file = paste(ex_file_name, ".csv", sep = ""), fileEncoding = "UTF-8-BOM")
}
Essentially, build the file name inside the loop and then pass it to the read.csv function on each iteration.
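The paste/gsub pair used to build the name can probably be shortened. A small sketch, assuming the files really are named file001.csv, file002.csv and so on (that naming scheme is an assumption based on the loop above):

```r
i <- 3

# The two-step version from the loop: paste with a space, then strip the space
a <- gsub(" ", "", paste(c("file00", i), collapse = " "), fixed = TRUE)

# paste0() concatenates without a separator, so no cleanup is needed
b <- paste0("file00", i)
identical(a, b)

# Note the difference once the index reaches two digits:
prefixed <- paste0("file00", 10L, ".csv")            # fixed "00" prefix
padded   <- sprintf("file%03d.csv", c(1L, 10L))      # zero-padded to 3 digits
```

Which of the two forms is right depends on how the files on disk are actually numbered.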
How do I import multiple .csv files at once?
Something like the following should result in each dataframe as a separate item in a single list:
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
This assumes that you have these CSVs in a single directory, your current working directory, and that they all have the lowercase extension .csv.
If you then want to combine those dataframes into a single dataframe, you can find solutions in other answers with things like do.call(rbind, ...), dplyr::bind_rows() or data.table::rbindlist().
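As a rough base-R sketch of the combining step, here is one way to stack a list of dataframes while keeping track of where each row came from. The list frames and its contents are made up for illustration; in practice it would be the list returned by lapply, named after the files.

```r
# Stand-in for a named list of data frames read from files (made-up data)
frames <- list(a.csv = data.frame(x = 1:2), b.csv = data.frame(x = 3:4))

# Add a "source" column to each data frame, taken from the list names
tagged <- Map(function(df, nm) cbind(source = nm, df), frames, names(frames))

# Stack everything; unname() keeps the row names as plain 1..n
combined <- do.call(rbind, unname(tagged))
combined
```

dplyr::bind_rows() and data.table::rbindlist() offer similar bookkeeping through their .id and idcol arguments.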
If you really want each dataframe in a separate object, although this is often not recommended, here's what you can do with assign:
temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))
Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) how to use list2env, you can try the following:
temp = list.files(pattern="*.csv")
list2env(
  lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))),
         read.csv), envir = .GlobalEnv)
But again, it is often better to just leave them in a single list.
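A minimal sketch of that "keep them in one list" approach, with cleaned-up names so each dataframe can be reached by name. The temporary directory and the two small files are made up so the example runs on its own. Note that the pattern argument of list.files is a regular expression rather than a shell glob, so "\\.csv$" is used here.

```r
# Create two throwaway CSV files (hypothetical names and contents)
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
write.csv(data.frame(a = 1:2), file.path(demo_dir, "first file.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:5), file.path(demo_dir, "second.csv"), row.names = FALSE)

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Drop the directory and extension, then make syntactically valid names
clean_names <- make.names(tools::file_path_sans_ext(basename(paths)))

# lapply keeps the names of its input, so the result is a named list
data_list <- lapply(setNames(paths, clean_names), read.csv)
names(data_list)
sapply(data_list, nrow)
```

Individual dataframes are then available as data_list$second or data_list[["first.file"]], and the whole list can still be passed to lapply or combined later.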
A: Bugs and problems when executing functions with lapply/map - it doesn't read the input list and doesn't write the output files
To run the second part you need to use mapply like in this example.
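Before the full script, a minimal sketch of how mapply walks over two vectors in parallel. The function describe_pair and the file names are hypothetical stand-ins for illustration, not the real processing code.

```r
# Hypothetical stand-in for a two-argument processing function
describe_pair <- function(m, y) paste(m, "->", y)

replicate_files <- c("liver_replicates.tsv", "lung_replicates.tsv")
cleaned_files   <- c("liver_cleaned.csv", "lung_cleaned.csv")

# mapply recycles shorter arguments, so check that the vectors really pair up
stopifnot(length(replicate_files) == length(cleaned_files))

# Calls describe_pair(replicate_files[1], cleaned_files[1]), then [2], and so on
out <- mapply(describe_pair, replicate_files, cleaned_files, SIMPLIFY = FALSE)
out
```

With SIMPLIFY = FALSE the result stays a list, named after the first vector, which is usually what you want when each call returns a dataframe.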
###import .tsv of PCR results, format, natural order. Export the clean file as .csv.
###Import .tsv from Replicates file, split $Samples column in two and merge them back with Replicates.
###Export modified replicates tibble to a new .csv file.
###Environment setup; change the folder accordingly. If necessary, install tidyverse.
setwd("C:/Users/asmit/Desktop/Exercise Files")
#install.packages("tidyverse")
library(tidyverse)
###import .tsv of PCR results, format, natural order. Export the clean file as .csv.
singlet_files <- list.files(path = ".", pattern = "[^replicates]\\.tsv")
tibble_singlet <- function(x) { ###Function to create tibble from singlet files
  cleanup_tibble <- as_tibble(read_tsv(x, col_names = TRUE, skip = 1))
}
singlet_cleanup <- function(x) { ##Function to clean up singlet files
  new_file <- str_replace(x, "(.*).tsv", "\\1_cleaned.csv")
  tibble_singlet(x) %>%
    select("Pos", "Name", "Cp", "Concentration") %>%
    .[str_order(.$Pos, numeric = TRUE),] %>%
    write_csv(file = new_file)
}
lapply(singlet_files, singlet_cleanup) ##run singlet_cleanup on the files in singlet_files
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> [[1]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    17.2        894000
#>  2 A2    1E6    17.2        877000
#>  3 A3    23     NA              NA
#>  4 A4    23     NA              NA
#>  5 A5    79     35.1          8.73
#>  6 A6    79     36.2          4.26
#>  7 A7    144    35.7          6.09
#>  8 A8    144    36.7          3.19
#>  9 A9    229    39.2         0.633
#> 10 A10   229    37.7          1.64
#> # ... with 86 more rows
#>
#> [[2]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    19.1        769000
#>  2 A2    1E6    18.9        906000
#>  3 A3    319    33.5           103
#>  4 A4    319    33.8          86.3
#>  5 A5    370    35.8          23.4
#>  6 A6    370    40            1.79
#>  7 A7    415    35.6          27.2
#>  8 A8    415    36.8            13
#>  9 A9    486    34.5          55.3
#> 10 A10   486    36.0          21.1
#> # ... with 86 more rows
#>
#> [[3]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    18.2        568000
#>  2 A2    1E6    17.0       1210000
#>  3 A3    23     35.7          12.3
#>  4 A4    23     35.9          10.9
#>  5 A5    67     35.6          13.3
#>  6 A6    67     35.5          14.5
#>  7 A7    129    38.3           2.6
#>  8 A8    129    NA              NA
#>  9 A9    172    NA              NA
#> 10 A10   172    37.3          4.69
#> # ... with 86 more rows
###Import .tsv from Replicates file, split $Samples column in two and merge them back with Replicates.
singlet_cleaned <- list.files(path = ".", pattern = "[_cleaned]\\.csv")
matching_pair_files <- list.files(path = ".", pattern = "[replicates]\\.tsv")
clean_tibble <- function(y) { ##Function to read clean .csv files as tibble
  Pos_tibble <- as_tibble(read_csv(y, col_names = TRUE))
}
matches <- function(m) { ##Function to read a replicates file as tibble
  match_tibble <- as_tibble(read_tsv(m, col_names = TRUE, skip = 1))
}
merged <- function(m, y) { ##Function to merge the replicates tibble with clean_tibble on a specific column
  organ <- regmatches(m, regexpr("(liver|lung|kidney|spleen)", m))
  output_file <- str_replace(m, "(.*)_replicates.tsv", "\\1_final.csv")
  matches(m) %>%
    mutate("R1" = gsub(x = .$Samples, pattern = "^(.*),.*", replacement = "\\1")) %>%
    mutate("R2" = gsub(x = .$Samples, pattern = ".*,\\s(.*)", replacement = "\\1")) %>%
    pivot_longer(cols = c("R1", "R2"), names_to = "Pot Pairs", values_to = "Wells") %>%
    select("MeanCp", "STD Cp", "Mean conc", "STD conc", "Wells") %>%
    relocate("Wells", 1) %>%
    right_join((clean_tibble(y)), by = c("Wells" = "Pos")) %>%
    .[str_order(.$Wells, numeric = TRUE),] %>%
    select("Name", "MeanCp", "STD Cp", "Mean conc", "STD conc") %>%
    distinct(Name, .keep_all = TRUE) %>%
    add_column(organ = organ) %>%
    write_csv(file = output_file) ###Export modified replicates tibble to a new .csv file.
}
mapply(merged, matching_pair_files, singlet_cleaned, SIMPLIFY = FALSE)
#> Rows: 47 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 4
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Pos, Name
#> dbl (2): Cp, Concentration
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 46 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 48 Columns: 6
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Name, organ
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Error: Join columns must be present in data.
#> x Problem with `Pos`.
I don't know what that last error message is about, though... all of my files have the expected output. I... won't worry about it for now.
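One possible cause, offered as a guess rather than something confirmed in the post: in a regular expression, square brackets match a single character from a set, not a whole word. A pattern such as "[_cleaned]\\.csv" therefore matches any file whose last character before ".csv" is one of _, c, l, e, a, n or d, which would include freshly written _final.csv files that have no Pos column. A small sketch on made-up file names:

```r
# Hypothetical directory listing after the first output file has been written
files <- c("liver.tsv", "liver_replicates.tsv", "liver_cleaned.csv", "liver_final.csv")

# Bracket expression: one character from the set, then ".csv"
loose <- files[grepl("[_cleaned]\\.csv", files)]
loose

# Literal suffix anchored at the end of the name
strict <- files[grepl("_cleaned\\.csv$", files)]
strict
```

If that is what happened, the loose pattern would hand a _final.csv file to the join on a later run or later call, which fits a read with columns Name and organ followed by a complaint about `Pos`.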
Created on 2021-09-22 by the reprex package (v2.0.1)