minutemaker

This R package provides tools for transcribing and summarizing meeting and conference recordings.

Installation

You can install the development version of minutemaker from GitHub:

# install.packages("remotes")
remotes::install_github("bakaburg1/minutemaker")

Usage

This package allows transcribing audio recordings using different speech-to-text APIs and then summarizing the transcripts using remote or local Large Language Models. You can also start directly with existing transcripts from the major meeting platforms (WebEx, Zoom, MS Teams, etc.) to bypass speech-to-text entirely.

The package supports the Whisper API from OpenAI or Azure, or local models via whisper-ctranslate2 (install separately) for speech transcription.

For the transcript summarization, the package can use the chat endpoint from either OpenAI, Azure or a local model with an OpenAI compatible REST API (for example as served by LMStudio or Ollama).

We suggest to use GPT-4 grade models for long meetings/talks (~ 1h) since they provide a longer context window and better summarization quality. Otherwise, the “rolling window” summarization method can be used to produce summaries of consistent quality on long transcripts also with smaller models.

In general, we suggest to use an LLM with a >32K long context window, to avoid loss of information.

Here is an example workflow.

Template workflow script

You can start from the packaged workflow template and customize it for your meeting. The helper below writes a copy to the target directory.

generate_workflow_template("path/to/meeting/folder")

If you prefer to inspect the template without copying it:

system.file(
  "templates",
  "meeting_summary_template.R",
  package = "minutemaker"
)

Setting up the package

You need to set up the infrastructure for the speech-to-text and text summarization APIs. The LLM-powered summarization requires the bakaburg1/llmR package which is installed (from GitHub) together with minutemaker.

You can do this by setting the following options:

# Load the necessary packages
library(minutemaker)

# Set up LLM model of choice using llmR functions

# Example: Setting up OpenAI GPT-4 model
llmR::record_llmr_model(
  label = "openai",
  provider = "openai",
  model = "gpt-4",
  api_key = "your_openai_api_key"
)

# Example: Setting up Azure GPT model
llmR::record_llmr_model(
  label = "azure_gpt",
  provider = "azure",
  model = "your_azure_deployment_id",
  endpoint = "https://your-resource-name.openai.azure.com",
  api_key = "your_azure_api_key",
  api_version = "2024-06-01"
)

# Set the preferred LLM globally using one of the labels defined above
llmR::set_llmr_model("openai")

# Set up the speech-to-text (STT) options

options(
  # Choose the STT model among online models: "azure_whisper" or "openai_whisper"
  # or local models:  "whisper_local", "mlx_whisper_local" (python based),
  # (use "mlx_whisper_local" for Apple Silicon)
  # "whisper_ctranslate2" (cli based, install ctranslate2 separately)
  minutemaker_stt_model = "whisper_local",
  
  # OpenAI Whisper API Key (for remote OpenAI whisper model)
  minutemaker_openai_api_key = "your_openai_api_key",
  
  # Azure Whisper API credentials (for remote Azure whisper model)
  minutemaker_azure_resource_whisper = "your_azure_resource_name",
  minutemaker_azure_deployment_whisper = "your_azure_deployment_id",
  minutemaker_azure_api_key_whisper = "your_azure_api_key",
  minutemaker_azure_api_version = "2024-06-01",
  
  # LLM for transcript correction
  minutemaker_correction_llm_model = "your_preferred_correction_llm_label",
  
  # Whether to add explicit chain-of-thought reasoning for non-reasoning LLMs;
  # set this TRUE to improve corrections on those models, and FALSE for models
  # that already reason internally (e.g., OpenAI o3 or Google Gemini 2.5)
  minutemaker_include_llm_reasoning = TRUE 
)

These settings can also be passed manually to the various functions, but the option system is more convenient. You only need to set the options for the APIs you want to use (e.g., you don’t need the speech-to-text API options if you already have a transcript).

Automated Context Generation

For enhanced summarization quality, you can automatically generate meeting context from documentation materials using the generate_context() function. This extracts expected agendas, event descriptions, audience information, vocabulary lists, and speech-to-text prompts from meeting materials (PDFs, DOCX, PPTX, Excel files).

# Generate context from documentation in a materials folder
generate_context(
  target_dir = "meeting_folder",
  material_dir = "documentation",  # Folder with meeting materials
  strategy = "agentic"  # Multi-pass context generation (recommended)
)

# Context is automatically used by speech_to_summary_workflow() when generate_context = TRUE
speech_to_summary_workflow(
  target_dir = "meeting_folder",
  generate_context = TRUE,  # Enable automatic context generation
  # ... other arguments
)

Model recommendations: Use large, state-of-the-art models like GPT-5.2 and the agentic strategy for best results. The one_pass strategy is suitable for a cost-effective solution.

Configuration Options: Several global options control context generation behavior:

minutemaker_context_material_dir: Directory name for documentation materials (default: documentation)
minutemaker_context_gen_strategy: Generation strategy - agentic (recommended) or one_pass (default: agentic)
minutemaker_context_gen_llm_model: Specific llmR model label for context generation (optional, uses active model if unset)
minutemaker_overwrite_context: Whether to overwrite existing context files (default: FALSE)

Set these options globally to customize context generation:

options(
  minutemaker_context_material_dir = "documentation",
  minutemaker_context_gen_strategy = "agentic",
  minutemaker_context_gen_llm_model = "your_context_model",
  minutemaker_overwrite_context = FALSE
)

Transcript-First Workflow

If you already have a transcript from your meeting platform (such as WebEx VTT files, SRT files, or MS Teams DOCX exports), you can skip the speech-to-text transcription entirely and start directly with the transcript. The package supports:

WebEx transcripts (.vtt files)
Generic SRT subtitles (.srt files)
MS Teams transcripts (.docx files exported from Teams meetings)

This approach saves time and API costs while maintaining all downstream functionality like LLM-based correction, agenda inference, and summarization.

# Use an external transcript directly
speech_to_summary_workflow(
  target_dir = "path/to/your/project",

  # Skip audio processing entirely by providing external transcript
  external_transcript = "path/to/transcript.vtt", # or .srt, or .docx
  source_audio = NULL, # No audio file needed

  # Enable speaker diarization import if available in the transcript
  import_diarization = TRUE,

  # All other arguments work the same as with audio input
  use_agenda = "yes",
  llm_provider = "openai",
  formatted_output_file = "meeting_summary.txt"
)

The workflow will automatically detect and standardize your external transcript into the package’s internal format, then proceed with LLM correction and summarization as usual.

Alternatively, simply place your transcript file (.vtt, .srt, or .docx) in your working directory - the workflow will auto-detect it and use it as the primary source.

Transcribing the audio

If you don’t have an existing transcript, the first step is transcribing the audio recording of a meeting.

Since the Whisper API can process files only up to 25MB (~ 50-60 minutes in mp3), it could be useful to split the audio file into smaller chunks. This can be done using the split_audio() function.

# Construct the path to the source audio/video file

audio_file <- "path-to-audio-file.mp3"
stt_audio_dir <- "path-to-audio-files-folder"

# Split the audio files into 40 minutes segments

split_audio(
  audio_file = audio_file,
  segment_duration = 40,
  output_folder = stt_audio_dir
)

The function will ask you to install the av package to split the audio file split_audio() should works only with video files via the ffmpeg command line tool (a separated installation is required). Only the “mp4” format was tested.

The second step is transcribing the audio files. This can be done using the transcribe_audio() function.

# Construct the path to the recording parts directory

stt_audio_dir <- "path-to-audio-files-folder"

transcript <- perform_speech_to_text(
  # Can also be the path to a single audio file
  audio_path = stt_audio_dir,
  
  # A folder where the transcript for each audio file will be saved in JSON
  # format. Defaults to "transcription_output_data" in the same directory as
  # `audio_path`.
  output_dir = "transcription_output_data",
  
  # Already transcribed files will be skipped if this is FALSE
  overwrite = FALSE,
  
  # Optional; useful to recognize uncommon words, acronyms, and names
  initial_prompt = "Keep these terms in mind: term1, name2, etc..."
  
  # Takes more arguments to pass to the API functions. See the documentation for
  # details.
)

# In alternative, the transcript data frame can be generated at will using the
# parse_transcript_json() function and the folder where the trascription outputs
# are stored.
transcript <- parse_transcript_json("transcription_output_data")

# This allows you also to pass in the starting time of the event in
# the HH:MM(:SS)( AM/PM) format which allows to compute the timing of the speech
# segment in real clock time.
# The event start time can be set globally using the
# minutemaker_event_start_time option
transcript_with_clocktime <- parse_transcript_json(
  "transcription_output_data",
  event_start_time = "10:00" # or 10:30:12, or 15:30, or 03:30 PM, etc...
)

# Then, you can save the transcript to a file
readr::write_csv(transcript, file.path(work_dir, day, "transcript.csv"))

NOTE: this process sometimes may fails for issues on the API side. Some times the process stops and with the creation with a JSON file with an error inside; just remove this file and restart the process. For other API related error, restarting the process is usually enough. Otherwise, feel free to open an issue.

Manual Transcript Import and Enhancement

For advanced users who want to work with transcripts manually, the package provides functions to import and enhance transcript data.

The import_transcript_from_file() function can read transcripts in various formats:

# Import a transcript file
transcript <- import_transcript_from_file(
  transcript_file = "path/to/transcript.vtt", # Supports .vtt, .srt, .docx
  import_diarization = TRUE # Include speaker information if available
)

For transcripts that include chat messages (such as WebEx exports), you can integrate them using add_chat_transcript():

# Add chat messages to an existing transcript
enhanced_transcript <- add_chat_transcript(
  transcript_data = transcript,
  chat_transcript = "path/to/chat_export.txt",
  start_time = "12:00 PM" # To align chat timestamps
)

Note: For most users, the transcript-first workflow (see above) handles all of this automatically. Manual import functions are mainly useful for custom processing or when working with unusual transcript formats.

LLM-based Transcript Correction

After obtaining a transcript, you may want to use a Large Language Model to correct potential errors in spelling, grammar, or term usage. The function apply_llm_correction() is designed for this purpose. It processes transcript JSON files (either a single file or all JSON files in a directory), applies corrections using an LLM, and saves the corrected version.

Key features:

Selective Processing: It can skip files already marked as corrected if overwrite = FALSE.
Term Guidance: You can provide a list of important terms (names, acronyms, etc.) to guide the LLM.
Model Choice: The LLM used for correction is determined by the R option minutemaker_correction_llm_model.
Reasoning Output: If minutemaker_include_llm_reasoning is TRUE (or passed directly as an argument), the LLM’s reasoning for corrections can be included in the output, which is useful for debugging.

# Example: Correcting transcripts in a directory
corrected_files <- apply_llm_correction(
  input_path = "transcription_output_data", # Directory with transcript JSONs
  terms = c(
    "Specific Term",
    "Participant name",
    "ACRONYM: explanation"), # Explanations give context to the LLM but are not used for the corrections
  overwrite = TRUE # Set to FALSE to skip already corrected files
)

# Example: Correcting a single transcript file
corrected_file <- apply_llm_correction(
  input_path = "transcription_output_data/segment_1.json",
  terms = vocabulary, # Use a pre-defined vocabulary
  overwrite = FALSE
)

This step is also integrated into the main speech_to_summary_workflow() function, as shown later.

Creation of the “Agenda” object

The next step is the creation of the “Agenda” object. This is a list of lists, where each element represents a talk in the meeting.

Here’s an example of Agenda object with 2 talks:

agenda <- list(
  list(
    session = "Session 1",
    title = "Opening",
    
    # Information to provide context to the summarization process
    type = "Conference introduction",
    description = "A description of the topic",
    
    # The start and end time of a talk in seconds from the beginning of the
    # event
    from = 0,
    to = 334,
    
    # Speakers and moderators as vectors of names
    speakers = c("Speaker 1", "Speaker 2"),
    moderators = "Moderator name",
  ),
  list(
    session = "Session 1",
    title = "Presentation 1",
    type = "conference presentation",
    description = "This presentation is about this and that.",
    from = 334,
    to = 600,
    speakers = c("Speaker 3"),
    moderators = "Moderator name",
  )
)

# Alternatively, actual clock times can be used, according to the HH:MM(:SS)(
# AM/PM) format
list(
  list(
    session = "Session 1",
    title = "Presentation 1",
    type = "conference presentation",
    description = "This presentation is about this and that.",
    from = "10:10",
    to = "10:50",
    speakers = c("Speaker 3"),
    moderators = "Moderator name",
  )
)

This object can be saved in an R file with the dput() function and then loaded with the dget() function.

The most important components of the agenda are the start and end time of each talk. These needs to be manually extracted by reading the transcript. The timings are not mandatory if the meeting consists of only one talk. The agenda object itself is not strictly necessary, but can be helpful for long meetings with multiple talks.

An alternative approach is to generate the agenda automatically via the infer_agenda_from_transcript() function. This function uses the transcript to infer the different sessions of the talk, their start and end times and also generate a description and a title. The function can use contextual information to improve the quality of the generated agenda, such as the event description, the audience, a vocabulary, and the expected agenda.

It’s important to review and correct the inferred agenda, since the function might not be able to infer the correct structure of the meeting.

Summarizing a single meeting transcript

The final step is summarizing the transcript. This can be done using the summarise_transcript() function. The first choice is to decide which summarization method to use. The package currently supports two summarization methods: “simple” and “rolling”. The “simple” method ingest the whole transcript at once and produces a summary of the whole transcript; it’s a faster method but the quality may drop with very long transcripts. The “rolling” method ingest the transcript in chunks of a given size and produces a summary for each chunk; these summaries are then aggregated into a final summary of the whole transcript; it’s a slower method but the quality is more consistent across different transcript lengths. If the “rolling” method is used, the user can also specify the size of the chunks to use (window_size argument, default is 15 minutes in transcript time).

The user can also define the expected output length in number of pages (output_length, default is 3 pages). The summarization process will try to produce a summary of the given length, but the actual length may vary depending on the transcript content (and the LLM idiosyncrasies).

The following code shows how to set all the arguments for the summarization process; but it’s possible to use the function with only the transcript data to get a summary. The other arguments only provide additional contextual information to improve the summary.

For example an agenda generated via infer_agenda_from_transcript() can be formatted into text using format_agenda() and then added to summary_structure argument (see example below).

# Construct the path to the transcript file
transcript_data <- readr::read_csv(
  "path-to-transcript.csv",
  show_col_types = FALSE)

# Extract the text from the transcript file at a given time interval
transcript_text <- extract_text_from_transcript(
  transcript_data, start_time = 0, end_time = 334
)

# Define the summarization method
method <- "rolling" # or "simple"

# Alternatively, you can extract the text from the transcript file using the
# agenda object which has the start and end time of each talk
agenda <- dget("path-to-agenda.R")

transcript_text <- extract_text_from_transcript(
  transcript_data,
  agenda_element = agenda[[1]] # First talk in the agenda
)

# You may want to define some other contextual information about the meeting
# to improve the summary.

## A description of the meeting/conference with some background
event_description <- "***Title of the Conference***
  
  Description of the conference."

## A vocabulary of terms to keep in mind when summarizing the transcript AND
## for LLM-based transcript correction. Can be useful also to make the 
## summarization robust to transcription errors.
vocabulary <- c("Term1", "ACRONYM: description", "Jack Black", "etc")

## Specific talk details, like title and speakers. Can be built using
## `generate_recording_details()` from an agenda element:
recording_details <- generate_recording_details(
  agenda[[1]],
)

## or manually:
recording_details <- generate_recording_details(
  session = "Session 1",
  title = "Opening",
  type = "Conference introduction",
  speakers = c("Speaker 1", "Speaker 2"),
)

## The audience of the meeting/conference, which helps the summarisation to
## focus on specific topics and helps setting the tone of the summary.
audience <- "An audience with some expertise in the topic
    of the conference and with a particular interest in this and that."

# Not mandatory but may help, for example with hybrid events where the room
# speakers are not reported in the transcript. Your instructions will be
# appended to the default ones. See get_prompts("diarization_template") for
# the defaults.
diarization_instructions <- "The event is hybrid, with both online and
in-presence participants and speakers. Extrapolate the speakers from the
context if not reported"

# The user can modify the expected structure of the summary (check it using
# get_prompts("summary_structure")). For example, one can add an extra
# summarisation section with:
summary_structure <- paste0(
  get_prompts("summary_structure"),
  "
- My extra summarisation instruction"
)

# The use can also use the summarisation instruction to add and agenda to drive
# the summarisation focus:
agenda_text <- format_agenda(agenda) # Renamed from agenda to avoid conflict
summary_structure_with_agenda <- stringr::str_glue("
  {get_prompts('summary_structure')}
  Here is an agenda of the event to keep into account while summarizing:
  {agenda_text}
  Strictly follow the agenda to understand which information is worth summarizing.
")


# Finally, the user can add extra output instructions to the default ones (check
# them using get_prompts("output_summarisation") for the summarisation and
# get_prompts("output_rolling_aggregation") for the rolling aggregation). For
# example, one can add an extra output instruction with:
output_instructions <- "
- Focus on and report quantitative details when they
are discussed."

# Get the summary

talk_summary <- summarise_transcript(
  transcript_data = transcript_text,
  method = method,
  event_description = event_description,
  recording_details = recording_details,
  audience = audience,
  vocabulary = vocabulary,
  
  summary_structure = summary_structure_with_agenda, # Or don't pass it to use the default
  extra_diarization_instructions = diarization_instructions,
  extra_output_instructions = output_instructions,
  
  provider = "my-LLM-provider-of-choice",
)

Use cat(talk_summary) to print the summary to the console.

Summarizing multiple meeting transcripts

If your transcript covers multiple talks, you can use the summarise_full_meeting() function to generate a summary for each talk in the transcript. The various contextual information (e.g. event description and audience) and custom prompt instructions will be applied to the whole set of talks.

In this context, the use of an agenda object is what brings the most value, since it clearly defines the timing of each talk and provide extra context as a title, the recording type (e.g., conference talk, meeting, lesson, etc), description of the topic, the list of speakers, etc…

The output of the function is a “summary tree”, i.e. a list with a different summary for each talk in the agenda.

conference_summaries <- summarise_full_meeting(
  transcript_data = transcript_data,
  method = method,
  agenda = agenda,
  output_file = "path-to-output-file.R",

  event_description = event_description,
  audience = audience,
  vocabulary = vocabulary,

  summary_structure = summary_structure_with_agenda, # Using the one with agenda
  extra_diarization_instructions = diarization_instructions,
  extra_output_instructions = output_instructions,

  provider = "my-LLM-provider-of-choice",
)

The “summary tree” will be saved at the location indicated by output_file. Unless the overwrite argument is set to TRUE, the function will stop if the output_file already exists.

Formatting the summary

To generate a more human-readable document with all the summaries, you can use the format_summary_tree() function. This function will generate a formatted text file with the summaries of all the talks in the agenda. The agenda provide information about each talk.

formatted_summary <- format_summary_tree(
  summary_tree = conference_summaries, # Use the output from summarise_full_meeting
  agenda = agenda,
  output_file = "path-to-output-file.txt" # Optional file to save the summary
)

cat(formatted_summary)

The output will be saved at the location indicated by output_file if not NULL.

Workflow shortcut function

The package is built around a flexible workflow that can start from either audio files or existing transcripts, ending with a formatted summary file. The workflow includes all the steps described above:

Audio-first path: Splitting audio files → Speech-to-text transcription → LLM correction → Summarization
Transcript-first path: External transcript import → Standardization → LLM correction → Summarization

Both paths support chat message integration, agenda inference, and formatted output generation.

The speech_to_summary_workflow() function implements this workflow and can be used to perform all the steps in one go.

# initial_prompt, event_description, audience,
# vocabulary, diarization_instructions, summary_structure_with_agenda
# are defined in the previous examples

# Set a workspace directory (replace with your own project path if needed)
work_dir <- "path/to/your/project"

# Perform the whole audio files to formatted summary workflow. Most arguments
# are omitted since the defaults are fine but some (e.g. the output paths and
# the overwrite arguments) are shown for clarity.
speech_to_summary_workflow(
  target_dir = work_dir,

  # Arguments for `split_audio`
  source_audio = file.path(work_dir, "recording.mp3"),
  split_audio = TRUE,
  split_audio_duration = 40,
  stt_audio_dir = file.path(work_dir, "audio_to_transcribe"),
  overwrite_stt_audio = FALSE,

  # Arguments for `perform_speech_to_text`
  stt_output_dir = file.path(work_dir, "transcription_output_data"),
  stt_model = getOption("minutemaker_stt_model"),
  stt_initial_prompt = initial_prompt,
  stt_language = "en",
  overwrite_transcription_files = FALSE,

  # Arguments for LLM-based transcript correction (runs after STT)
  enable_llm_correction = TRUE, # Defaults to TRUE
  # `vocabulary` is also used for `terms` in `apply_llm_correction`

  # Arguments for `parse_transcript_json`
  transcript_file = file.path(work_dir, "transcript.csv"),
  event_start_time = "09:00 AM",
  overwrite_transcript = FALSE,

  # Arguments for external transcript input (transcript-first workflow)
  # Set to NULL to skip external transcripts, or provide a path
  # Auto-detection example (returns NA if no match, which is treated as NULL):
  external_transcript = list.files(
    work_dir,
    pattern = "\\.(vtt|srt|docx)$",
    ignore.case = TRUE,
    full.names = TRUE
  )[1], # Auto-detects external transcripts in working directory
  import_diarization = TRUE, # Include speaker info from transcript

  # Arguments for `add_chat_transcript`
  chat_file = list.files(work_dir, pattern = "Chat.*\\.txt$", full.names = TRUE, ignore.case = TRUE)[1],
  chat_format = "webex",

  # Arguments for `summarise_full_meeting` and `infer_agenda_from_transcript`
  use_agenda = "yes", # or "ask"/"no"
  agenda_file = file.path(work_dir, "agenda.R"),
  expected_agenda = NULL,
  agenda_generation_window_size = 3600,
  extra_agenda_generation_args = NULL,

  # Arguments for the summarization
  multipart_summary = TRUE,
  event_description = event_description,
  audience = audience,
  vocabulary = vocabulary, # Used for summarization AND LLM correction
  consider_diarization = TRUE,
  summary_structure = summary_structure_with_agenda,
  extra_diarization_instructions = diarization_instructions,
  extra_output_instructions = output_instructions,
  llm_provider = getOption("llmr_llm_provider"),
  summarization_window_size = 15,
  summarization_output_length = 1,
  summarization_method = "simple",

  summarization_output_file = file.path(work_dir, "event_summary.R"),
  overwrite_summary_tree = FALSE,

  # Arguments for `format_summary_tree`
  formatted_output_file = file.path(work_dir, "event_summary.txt"),
  overwrite_formatted_output = FALSE
)

Changing the default prompts

The summarisation process is built upon a number of “technical prompt” under the hood.

You can use the default prompts or define your own through a number of options.

You can see the current prompts by running:

prompts <- get_prompts() # get all the prompts in a named vector
prompt <- get_prompts("output_instructions") # get a specific prompt

# Print one of the prompts, cat() improves the formatting
cat(prompts[[1]])

Or you can use set_prompts() function to edit the defaults.

set_prompts(audience = "My custom audience description")

The package uses “structural prompt” (a “base_task” prompt and all the promp ending with “_template”) which organize the prompt general implementation and then uses the “instruction prompt” (e.g. “diarization_instructions” or “summarisation_sections”) to specifically instruct the LLM.

The user is mostly expected to change the second type, while changing the first is an expert option since it can break the prompt generation.

Use set_promts(force = TRUE) to restore all prompts to their default values.

Package options

This section lists every minutemaker_* option defined in the package code. You can set any of them with options(...) in your R session.

Core workflow and formatting

Option	Default	Format	Description
`minutemaker_event_start_time`	`NULL`	Clock-time string `HH:MM[:SS][ AM/PM]` (e.g., `"09:00"` or `"03:30 PM"`)	Converts clock times to seconds to align transcripts and agendas.
`minutemaker_stt_model`	`NULL` (functions default to `"whisper_local"`)	String model label: `"openai_whisper"`, `"azure_whisper"`, `"whisper_local"`, `"mlx_whisper_local"`, or `"whisper_ctranslate2"`	Default STT backend used by workflow helpers and `perform_speech_to_text()`.

Speech-to-text (STT)

Option	Default	Format	Description
`minutemaker_openai_api_key`	`NULL`	Non-empty string API key	Used by OpenAI Whisper requests.
`minutemaker_azure_resource_whisper`	`NULL`	String resource name (not a URL)	Used to build the Azure Whisper endpoint.
`minutemaker_azure_deployment_whisper`	`NULL`	String deployment ID	Selects the Azure Whisper deployment.
`minutemaker_azure_api_key_whisper`	`NULL`	Non-empty string API key	Used for Azure Whisper authentication.
`minutemaker_azure_api_version`	`NULL`	String version (e.g., `"2024-06-01"`)	Passed as the Azure `api-version`.
`minutemaker_whisper_package`	`"openai-whisper"` or `"mlx_whisper"`	String python package name: `"openai-whisper"` or `"mlx_whisper"`	Selects the local Whisper python backend package.
`minutemaker_split_audio_parallel`	`FALSE`	Logical `TRUE`/`FALSE`	Enables `mirai` workers for `split_audio()`.
`minutemaker_split_audio_parallel_timeout`	`120`	Positive integer seconds	Timeout for parallel audio splitting before aborting.

Transcript correction

Option	Default	Format	Description
`minutemaker_correction_llm_model`	`NULL` (falls back to `llmr_current_model`)	String llmR model label	Model used by `apply_llm_correction()`; falls back to `llmr_current_model`.
`minutemaker_include_llm_reasoning`	`TRUE`	Logical `TRUE`/`FALSE`	Requests explicit reasoning before JSON corrections.

Context generation

Option	Default	Format	Description
`minutemaker_context_material_dir`	`"documentation"`	String directory name or relative path	Folder (relative to `target_dir`) containing meeting materials for `generate_context()`.
`minutemaker_context_gen_strategy`	`"agentic"`	String `"agentic"` or `"one_pass"`	Selects multi-pass vs single-call context generation.
`minutemaker_context_gen_llm_model`	`NULL`	String llmR model label	Overrides the active model only for context generation.
`minutemaker_overwrite_context`	`FALSE`	Logical `TRUE`/`FALSE`	Overwrite existing `context/` files when `TRUE`.
`minutemaker_context_generation_max_retries`	`3`	Positive integer	Max retry attempts per generated context field.

Prompt customization (`set_prompts()` / `get_prompts()`)

Option	Default	Format	Description
`minutemaker_prompt_persona`	Built-in prompt text	Character string	Persona block that sets overall voice/role.
`minutemaker_prompt_base_task`	Built-in prompt text	Character string	Base task instruction for summarization.
`minutemaker_prompt_aggregate_task_rolling`	Built-in prompt text	Character string	Task for rolling-window aggregation.
`minutemaker_prompt_agenda_inference_task`	Built-in prompt text	Character string	Task used when inferring agenda structure.
`minutemaker_prompt_event_description_template`	Built-in prompt text	Character string template with `{event_description}`	Wraps event description content.
`minutemaker_prompt_recording_details_template`	Built-in prompt text	Character string template with `{recording_details}`	Wraps recording details content.
`minutemaker_prompt_transcript_template`	Built-in prompt text	Character string template with `{transcript}`	Wraps transcript segment text.
`minutemaker_prompt_aggregate_template_rolling`	Built-in prompt text	Character string	Intro wrapper before rolling summaries.
`minutemaker_prompt_agenda_inference_template`	Built-in prompt text	Character string template	Frames transcript text for agenda inference.
`minutemaker_prompt_vocabulary_template`	Built-in prompt text	Character string template with `{vocabulary}`	Injects vocabulary and term guidance.
`minutemaker_prompt_diarization_template`	Built-in prompt text	Character string template with `{extra_diarization_instructions}`	Wraps diarization instructions and extra hints.
`minutemaker_prompt_audience_template`	Built-in prompt text	Character string template with `{audience}`	Injects audience focus guidance.
`minutemaker_prompt_summarisation_template`	Built-in prompt text	Character string template with `{summary_structure}`	Wraps summary structure instructions.
`minutemaker_prompt_summary_structure`	Built-in prompt text	Character string or vector of bullet lines	Default summary sections when none are supplied.
`minutemaker_prompt_output_template`	Built-in prompt text	Character string template with `{output_instructions}` and `{output_length}`	Wraps output rules and length requirement.
`minutemaker_prompt_output_summarisation`	Built-in prompt text	Character vector of instruction lines	Default output rules for summarization.
`minutemaker_prompt_output_rolling_aggregation`	Built-in prompt text	Character vector of instruction lines	Default output rules for rolling aggregation.

For the full default prompt texts, run get_prompts() in R (or inspect set_prompts() in the source).

Internal cache/state (advanced)

Option	Default	Format	Description
`minutemaker_temp_agenda`	`NULL`	List of numeric seconds	In-session cache of inferred agenda breakpoints.
`minutemaker_temp_agenda_last_bp`	`NULL`	Integer index	Last processed breakpoint index for resume.
`minutemaker_temp_agenda_hash`	`NULL`	String hash	Cache key used to invalidate agenda inference results.
`minutemaker_data_merge_hash`	`NULL`	String hash	Cache key for deprecated GloVe utilities.
`minutemaker_data_merge_model`	`NULL`	Numeric matrix	Cached GloVe embeddings for deprecated functions.

Name		Name	Last commit message	Last commit date
Latest commit History 450 Commits
.cursor/rules		.cursor/rules
R		R
inst/templates		inst/templates
man		man
renv		renv
tests		tests
.Rbuildignore		.Rbuildignore
.Rprofile		.Rprofile
.gitignore		.gitignore
AGENTS.md		AGENTS.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
minutemaker.Rproj		minutemaker.Rproj
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

minutemaker

Installation

Usage

Template workflow script

Setting up the package

Automated Context Generation

Transcript-First Workflow

Transcribing the audio

Manual Transcript Import and Enhancement

LLM-based Transcript Correction

Creation of the “Agenda” object

Summarizing a single meeting transcript

Summarizing multiple meeting transcripts

Formatting the summary

Workflow shortcut function

Changing the default prompts

Package options

Core workflow and formatting

Speech-to-text (STT)

Transcript correction

Context generation

Prompt customization (`set_prompts()` / `get_prompts()`)

Internal cache/state (advanced)

About

Licenses found

Uh oh!

Releases 18

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

Licenses found

bakaburg1/minutemaker

Folders and files

Latest commit

History

Repository files navigation

minutemaker

Installation

Usage

Template workflow script

Setting up the package

Automated Context Generation

Transcript-First Workflow

Transcribing the audio

Manual Transcript Import and Enhancement

LLM-based Transcript Correction

Creation of the “Agenda” object

Summarizing a single meeting transcript

Summarizing multiple meeting transcripts

Formatting the summary

Workflow shortcut function

Changing the default prompts

Package options

Core workflow and formatting

Speech-to-text (STT)

Transcript correction

Context generation

Prompt customization (set_prompts() / get_prompts())

Internal cache/state (advanced)

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 18

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Prompt customization (`set_prompts()` / `get_prompts()`)

Packages