SPINOS¹: A Dataset of Subtle Polarity and INtensity Opinion Shifts

An example of the annotation template for the abortion topic can be found here.

The dataset is introduced and analyzed in our paper "Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion" by Flora Sakketou, Allison Lahnala, Liane Vogel and Lucie Flek.

A blog post about our paper is available here.

Loading the dataset requires Python>=3.8 and the following package:

pip install pandas==1.4.1

You can read the data with:

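import pandas as pd

# Adjust the paths to where the .pkl files are stored; in the repository
# they are named SPINOS_official_dataset_v1.1.pkl and SPINOS_context_posts_v1.1.pkl
spinos_official = pd.read_pickle('data/Spinos_official_dataset_v1.1.pkl')  # annotated posts
thread_posts = pd.read_pickle('data/Spinos_context_posts_v1.1.pkl')  # context posts used to rebuild threads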

The row index of each DataFrame corresponds to the post IDs in the Reddit API.

The official dataset (spinos_official) contains the following columns:

author_id (str): The user who wrote the post. Note: usernames are anonymized.
title (str): The title of the post, if it exists.
content (str): The content of the post, if it exists.
annotation (str): Majority vote of the non-expert annotators.
topic (str): The topic of the post. Possible values: 'abortion', 'feminism', 'brexit', 'veganism', 'guns', 'nuclear-energy', 'capitalism', 'climate-change'.
subreddit (str): The subreddit the post was posted on.
is_sarcastic (str): Whether the stance is sarcastic. Values = ['', 'No', 'Yes'].
is_unsure (str): How many annotators stated that they were unsure of the annotation. Values = ['', 'No', '1/3', '2/3'].
is_explicit (str): Whether the stance is explicitly stated. Values = ['No', 'Yes'].
top_level_post (str): How many annotators requested to read the top-level post in order to annotate. Values = ['No', '1', '2', '3'].
toplevel_id (str): ID of the top-level post in the thread (previously named top_level_post_id).
parents (str): How many annotators requested to read the parent posts in order to annotate. Values = ['No', '1', '2', '3'].
parent_ids (list): IDs of the parent posts that were used for the annotation.
timestamp (pandas Timestamp): Date and time when the post was posted.
parent_id (str): ID of the post's parent.
n_parents (int): Number of posts between the root and the post.
n_children (int): Number of direct children of the post (one level down).
children_ids_limited (list): IDs of the post's children; at most two are shared.
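Several of the columns above store annotator counts as strings. The following is a minimal sketch, not part of the dataset's own tooling, with two hypothetical helpers (unsure_share and context_requests) that turn these values into numbers. It assumes three annotators per post (suggested by the '1/3' and '2/3' values) and that an empty string means the field was left blank.

```python
from fractions import Fraction
from typing import Optional


def unsure_share(value) -> Optional[Fraction]:
    """Hypothetical helper: share of annotators who were unsure (is_unsure).

    Assumption: '' (or a missing value) means the field is blank -> None;
    'No' -> 0; '1/3' and '2/3' are shares of three annotators.
    """
    if not isinstance(value, str) or value == '':
        return None
    if value == 'No':
        return Fraction(0)
    return Fraction(value)  # Fraction parses strings such as '2/3'


def context_requests(value) -> Optional[int]:
    """Hypothetical helper: number of annotators who asked for context
    (top_level_post or parents columns). 'No' -> 0, '1' to '3' -> int."""
    if not isinstance(value, str) or value == '':
        return None
    return 0 if value == 'No' else int(value)


# Usage with the loaded DataFrame (Series.map applies the helper to each value):
# spinos_official['is_unsure'].map(unsure_share)
# spinos_official['parents'].map(context_requests)
```

Returning None for blank values keeps "not filled in" distinct from "no annotator", which a plain 0 would hide.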

The context of an annotated post (its parents, the post itself and its shared children) can be reconstructed with:

def get_thread_df(post_id):
    """
    Given a post_id in spinos_official, return its thread as a DataFrame
    ordered as parents -> post -> children.
    Each post is looked up in thread_posts first and, if missing there,
    in spinos_official (keeping only the columns that describe the post).
    Requires spinos_official and thread_posts to be loaded as shown above.
    """
    if post_id not in spinos_official.index:
        raise ValueError(f"{post_id} not in spinos_official")

    def as_list(ids):
        # Treat a missing value (None/NaN) as an empty list of IDs
        if ids is None or (isinstance(ids, float) and pd.isna(ids)):
            return []
        return list(ids)

    # Get the IDs of the parents and of the (at most two) shared children
    parent_ids = as_list(spinos_official.loc[post_id, 'parent_ids'])
    children_ids = as_list(spinos_official.loc[post_id, 'children_ids_limited'])

    # Full thread order
    thread_ids = parent_ids + [post_id] + children_ids

    rows, found_ids = [], []
    for tid in thread_ids:
        if tid in thread_posts.index:
            row = thread_posts.loc[tid]
        elif tid in spinos_official.index:
            # Take only the columns that describe the post itself
            row = spinos_official.loc[tid, ['parent_id', 'author_id', 'title', 'content', 'topic', 'subreddit', 'timestamp']]
        else:
            print(f"post {tid} not found")
            continue  # skip posts missing from both DataFrames
        rows.append(row)
        found_ids.append(tid)

    # Combine into a DataFrame indexed by post ID, in thread order
    thread_df = pd.DataFrame(rows)
    thread_df.index = found_ids

    return thread_df

test_post_id = spinos_official.index[0]
thread_df = get_thread_df(test_post_id)
print(test_post_id)
print(thread_df)  # in a notebook, evaluating thread_df alone renders it as a table
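To skim a reconstructed thread, the DataFrame returned by get_thread_df can be rendered as plain text. The following is a minimal sketch with a hypothetical helper, format_thread; it assumes the thread has the author_id, title and content columns described above and marks the annotated post with '>>'.

```python
import pandas as pd


def format_thread(thread_df: pd.DataFrame, post_id: str) -> str:
    """Hypothetical helper: render a thread (as returned by get_thread_df)
    one post per line, marking the annotated post with '>>'."""
    lines = []
    for tid, row in thread_df.iterrows():
        # Join title and content, skipping missing or empty fields
        parts = []
        for col in ('title', 'content'):
            value = row.get(col)
            if pd.notna(value) and str(value).strip():
                parts.append(str(value).strip())
        marker = '>>' if tid == post_id else '  '
        lines.append(f"{marker} [{tid}] {row.get('author_id')}: {' '.join(parts)}")
    return '\n'.join(lines)


# Usage, continuing the example above:
# print(format_thread(thread_df, test_post_id))
```

Because rows keep the parents -> post -> children order, the marked line shows where the annotated post sits in the conversation.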

¹ Fun fact: SPINOS in Greek (σπίνος) means chaffinch, which is our logo. Interestingly, this name brings together some of the authors of the paper, since the first author is Greek, the second author is an aspiring birdwatcher and the third author's surname means "bird" in German.
