caisa-lab/SPINOS-dataset

SPINOS¹: A Dataset of Subtle Polarity and INtensity Opinion Shifts

An example of the annotation template, for the topic of abortion, is available here.

The dataset is introduced and analyzed in our paper: Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion, by Flora Sakketou, Allison Lahnala, Liane Vogel and Lucie Flek.

A blog post about our paper is available here.

Loading the dataset requires Python>=3.8 and the following package:

pip install pandas==1.4.1

You can read the data with:

import pandas as pd

spinos_official = pd.read_pickle('data/Spinos_official_dataset_v1.1.pkl')
thread_posts = pd.read_pickle('data/Spinos_context_posts_v1.1.pkl')

The row index corresponds to the ids of the posts in the Reddit API.
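Because the index holds the Reddit post ids, a single post can be looked up directly by id. The sketch below uses a small toy DataFrame in place of spinos_official; the ids, the label strings, and the helper name get_post are made up for illustration and are not part of the dataset or this repository.

```python
import pandas as pd

# Toy stand-in for spinos_official. The ids and label strings are placeholders.
toy = pd.DataFrame(
    {"topic": ["abortion", "brexit"], "annotation": ["label_a", "label_b"]},
    index=["abc123", "def456"],
)


def get_post(df, post_id):
    """Return the row for a Reddit post id, or None if the id is not in the index.

    Hypothetical helper, shown only to illustrate index-based lookup.
    """
    if post_id not in df.index:
        return None
    return df.loc[post_id]


post = get_post(toy, "abc123")
print(post["topic"])
```

With the real data, the same call would take spinos_official or thread_posts as its first argument.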

The dataframe contains the following columns:

  • author_id (str): The user who wrote the post. Note: The usernames are anonymized.

  • title (str): The title of the post, if it exists.

  • content (str): The content of the post, if it exists.

  • annotation (str): Majority vote of the non-expert annotators.

  • topic (str): The topic this post is about. Possible values: 'abortion' 'feminism' 'brexit' 'veganism' 'guns' 'nuclear-energy' 'capitalism' 'climate-change'

  • subreddit (str): The subreddit this post was posted on.

  • is_sarcastic (str): The stance is sarcastic. Values = ['', 'No', 'Yes'].

  • is_unsure (str): How many annotators stated that they were unsure of the annotation. Values = ['', 'No', '1/3', '2/3'].

  • is_explicit (str): The stance is explicitly stated. Values = ['No', 'Yes'].

  • top_level_post: How many annotators requested to read the top-level post in order to do the annotation. Values = ['No', '1', '2', '3']

  • toplevel_id (str): ID of the top-level post in the thread (prev. top_level_post_id)

  • parents: How many annotators requested to read the parent posts in order to do the annotation. Values = ['No', '1', '2', '3']

  • parent_ids (list): List of the parent posts' ids that were used for the annotation

  • timestamp (pandas Timestamp): Date and time when the post was posted.

  • parent_id (str): ID of post's parent

  • n_parents (int): number of posts between root and post

  • n_children (int): number of children the post has (1 level down)

  • children_ids_limited (list): ids of the post's children; we share up to two children per post
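Several of the columns above store counts and fractions as strings, so they may need converting before analysis. The following is a minimal sketch on toy data. The helper names are hypothetical, and two readings are assumptions rather than documented behavior: that 'No' means zero annotators, and that an empty string means the value is missing.

```python
import pandas as pd

# Toy rows that mimic the documented columns. Ids, authors, and values are made up.
toy = pd.DataFrame(
    {
        "author_id": ["u1", "u2", "u1"],
        "topic": ["guns", "guns", "guns"],
        "is_unsure": ["No", "1/3", ""],
        "parents": ["No", "2", "1"],
        "timestamp": pd.to_datetime(["2020-03-01", "2020-01-15", "2020-01-10"]),
    },
    index=["p1", "p2", "p3"],
)


def unsure_fraction(value):
    """Convert an is_unsure string to a float: 'No' -> 0.0, '1/3' -> 1/3, '2/3' -> 2/3.

    An empty string is returned as None (assumed here to mean "missing").
    """
    if value == "":
        return None
    if value == "No":
        return 0.0
    numerator, denominator = value.split("/")
    return int(numerator) / int(denominator)


def annotator_count(value):
    """Convert a top_level_post / parents string to an int, reading 'No' as 0 (an assumption)."""
    return 0 if value == "No" else int(value)


def author_timeline(df, author_id, topic):
    """Return one author's posts on one topic, ordered from oldest to newest."""
    mask = (df["author_id"] == author_id) & (df["topic"] == topic)
    return df[mask].sort_values("timestamp")


print([unsure_fraction(v) for v in toy["is_unsure"]])
print([annotator_count(v) for v in toy["parents"]])
print(author_timeline(toy, "u1", "guns").index.tolist())
```

Ordering an author's posts by timestamp within a topic is one way to line up annotations over time, which is the kind of view needed to look for opinion shifts.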

You can construct the context with:

def get_thread_df(post_id):
    """
    Given a post_id in spinos_official, return the thread as a DataFrame
    in order (parents -> post -> children), with all columns.
    Looks in thread_posts first, then spinos_official for missing posts.
    """
    if post_id not in spinos_official.index:
        raise ValueError(f"{post_id} not in spinos_official")
    
    # Get the IDs for the thread
    parent_ids = spinos_official.loc[post_id, 'parent_ids']
    children_ids = spinos_official.loc[post_id, 'children_ids_limited']
    
    # Full thread order
    thread_ids = parent_ids + [post_id] + children_ids
    
    rows = []
    for tid in thread_ids:
        if tid in thread_posts.index:
            row = thread_posts.loc[tid]
        elif tid in spinos_official.index:
            # Take only the relevant columns
            row = spinos_official.loc[tid, ['parent_id','author_id', 'title', 'content', 'topic', 'subreddit', 'timestamp']]
        else:
            print(f"post {tid} not found")
            continue  # skip if missing everywhere
        rows.append(row)
    
    # Combine into a DataFrame
    thread_df = pd.DataFrame(rows)
    thread_df.index = [tid for tid in thread_ids if tid in thread_df.index]
    
    return thread_df

test_post_id = spinos_official.index[0]
thread_df = get_thread_df(test_post_id)
print(test_post_id)
thread_df
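Once a thread DataFrame is available, it can be handy to print it as plain text with the annotated post marked. The sketch below is one possible way to do that. format_thread is a hypothetical helper, not part of this repository, and the toy thread only mimics the author_id, title, and content columns described above.

```python
import pandas as pd


def format_thread(thread_df, post_id):
    """Render a thread DataFrame as text, one line per post, marking post_id with '>>'."""
    lines = []
    for tid, row in thread_df.iterrows():
        marker = ">>" if tid == post_id else "  "
        # Keep the title and content only when they are non-empty strings,
        # since missing values may show up as None, NaN, or ''.
        parts = [
            text
            for text in (row.get("title"), row.get("content"))
            if isinstance(text, str) and text
        ]
        lines.append(f"{marker} [{tid}] {row['author_id']}: {' | '.join(parts)}")
    return "\n".join(lines)


# Toy thread in parents -> post -> children order. All values are made up.
toy_thread = pd.DataFrame(
    {
        "author_id": ["u1", "u2", "u3"],
        "title": ["Thread title", None, None],
        "content": ["Opening post", "Reply", "Reply to the reply"],
    },
    index=["t1", "t2", "t3"],
)

print(format_thread(toy_thread, "t2"))
```

With the real data, the call would be format_thread(thread_df, test_post_id) on the output of get_thread_df.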

Footnotes

  1. Fun fact: SPINOS in Greek (σπίνος) means chaffinch, which is our logo. Interestingly, this name brings together some of the authors of the paper, since the first author is Greek, the second author is an aspiring birdwatcher and the third author's surname means "bird" in German.
