Thomas Leo Scherer edited this page Jul 21, 2022 · 5 revisions

Generating Tables from PDF Files

Using regular expressions in R

  • Regular expressions ("regex") are patterns that describe sets of character strings. In R, we often use regex to find and replace strings; by matching strings, we can also subset and reformat data.
  • Base R has several functions for regex. grep, grepl, regexpr, gregexpr and regexec search for matches to a pattern within each element of a character vector; sub and gsub replace the first match and all matches, respectively. The example below is adapted from the R documentation for grep:
haystack <- c("red", "blue", "green", "blue", "green forest")

grep("green", haystack) # indices of elements containing "green": 3 5
grep("r", haystack, value = TRUE) # the matching elements themselves: "red" "green" "green forest"
grepl("r", haystack) # logical vector, one value per element: TRUE FALSE TRUE FALSE TRUE

sub("e", "+", haystack) # replaces only the first "e" in each element: "r+d" "blu+" "gr+en" "blu+" "gr+en forest"
gsub("e", "+", haystack) # replaces every "e" in each element: "r+d" "blu+" "gr++n" "blu+" "gr++n for+st"
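The example above covers grep, grepl, sub and gsub. As a rough sketch of the "subset and reformat" use case, the snippet below combines grepl with regexpr, gregexpr and the base R helper regmatches to turn lines of text into a small table. The input lines and the column names are made up for illustration; real text extracted from a PDF will likely need different patterns.

```r
# Hypothetical lines of text, standing in for text read from a PDF
lines <- c("Alpha 2019 12.5", "Notes: see appendix", "Beta 2020 7.25")

# Subset: keep only lines that end in a decimal number
rows <- lines[grepl("[0-9]+\\.[0-9]+$", lines)]

# regexpr finds the first match per element; regmatches extracts it
first_num <- regmatches(rows, regexpr("[0-9]+", rows))

# gregexpr finds all matches per element; regmatches returns a list
nums <- regmatches(rows, gregexpr("[0-9]+(\\.[0-9]+)?", rows))

# Reformat: assemble the extracted pieces into a data frame
tab <- data.frame(
  name  = sub(" .*$", "", rows),             # drop everything after the first space
  year  = as.integer(sapply(nums, `[`, 1)),  # first number on each line
  value = as.numeric(sapply(nums, `[`, 2)),  # second number on each line
  stringsAsFactors = FALSE
)
print(tab)
```

Filtering with grepl before extracting keeps non-data lines (here, the "Notes" line) from producing empty matches that would otherwise have to be handled row by row.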
Further regex resources
  • grep - documentation for grep, the most commonly used base R function for regular expressions.
  • RegExplain - useful resource for covering the basics of regular expressions.
  • rex - GitHub repository for the rex package, which lets you build complex regular expressions from human-readable expressions.
  • Terminal Tools
