Questions and doubts about the dataset #7

@fernandobperezm

Hi!

First and foremost, thanks for your contribution.

I'm using this dataset in my research; however, I'm having trouble using it after reading the SIGIR paper "RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System". I hope you can answer the following questions:

  1. Could you please explain the meaning of the a_ and b_ prefixes in the data files? e.g., rl4rs_dataset_a_rl vs rl4rs_dataset_b_rl.
  2. Could you please explain the meaning of the _rl and _sl suffixes in the data files? e.g., rl4rs_dataset_a_rl vs rl4rs_dataset_a_sl.
  3. Do users have unique numerical identifiers? I tried a .unique() operation on the user_protrait column, but I got far more unique strings than the number of users reported in Table 2.
  4. Inside the item_feature column, how can I identify the item's numerical identifier? The paper says that the ID is inside this column but does not specify its position in the array.
  5. If I want to perform an offline evaluation using a traditional user-rating matrix, can I join those datasets into a single matrix? Or should I instead keep four separate matrices (one for each data file)?
  6. Could you please provide or point to the code that computes the statistics of the dataset?
  7. I'm currently trying to replicate Table 2; however, I do not know how to map Slate-SL, Slate-RL, SeqSlate-SL, and SeqSlate-RL to the data files.
  8. Related to question 7, how can I create the Slate and SeqSlate datasets shown in the same table?
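
For question 3, the check I describe is roughly equivalent to the minimal sketch below. It assumes the data file has been loaded into a pandas DataFrame; the toy rows here are made up for illustration and are not real values from the dataset.

```python
import pandas as pd

# Toy stand-in for one of the data files. The values are hypothetical;
# only the column name user_protrait comes from the dataset.
df = pd.DataFrame({
    "user_protrait": [
        "0.1,0.2,0.3",
        "0.1,0.2,0.3",  # same string as the first row
        "0.4,0.5,0.6",
        "0.7,0.8,0.9",
    ],
})

# Count distinct profile strings, treating each distinct string as one user.
unique_profiles = df["user_protrait"].unique()
n_unique = len(unique_profiles)
print(n_unique)
```

On the real files, this count is the number I compared against the user count in Table 2.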

Thanks in advance!
