Data collection #3

@saurabh-khanna

Data we can collect using public resources

  1. Engagement Metrics:

    • Data: Likes, comments, and shares for posts/accounts.
    • Source: Instagram Graph API (requires an access token with the relevant permissions, and only covers professional accounts that have granted access).
    • Python Resources:
      • instagrapi: An unofficial Python client for Instagram's private API (not the Graph API), usable for automation and data retrieval.
      • requests/urllib for direct API calls.
  2. Metadata:

    • Data: Post timing, hashtags, geotags, follower counts, bio details.
    • Source: Instagram Graph API or publicly visible profile information (if scraping is allowed).
    • Python Resources:
      • instagrapi or beautifulsoup4 for web scraping (if permitted).
  3. Content Data:

    • Data: Post captions, hashtags, or image/video metadata.
    • Source: Instagram Graph API or scraping public posts.
    • Python Resources:
      • instagrapi for API-based caption and hashtag extraction.
      • Pillow or OpenCV for analyzing downloaded images.
  4. Hashtag and Topic Trends:

    • Data: Frequency and reach of hashtags or topics.
    • Source: Manual hashtag collection or third-party tools.
  5. Sentiment and Keyword Analysis:

    • Data: Extracted captions and comments for text analysis.
    • Python Resources:
      • NLTK (e.g. its VADER sentiment analyzer) or spaCy for NLP tasks like sentiment analysis or keyword extraction.
      • Hugging Face Transformers for more advanced sentiment analysis or topic modeling.
  6. Sensitive Content Analysis:

    • Data: Engagement trends for content flagged as sensitive.
    • Source: API data, comparing engagement before and after flagging.
  7. Historical Policy Data:

    • Data: Changes in Instagram's guidelines over time.
    • Source: Archive platforms like the Wayback Machine or scraping public policy pages.
    • Python Resources:
      • selenium or requests for automating data retrieval from policy archives.
  8. Adversarial Testing Data:

    • Data: Results from posting test content systematically (different hashtags, metadata).
    • Source: Automated posting experiments.
    • Python Resources:
      • instagrapi or custom bots for automated content posting and engagement tracking.
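
For items 1 and 2, a direct Graph API call could look roughly like the sketch below. It assumes a professional account ID and an access token with the needed permissions; the `api_version`, `ig_user_id`, and `access_token` values are placeholders, and the helper names are hypothetical. `like_count` and `comments_count` are media fields in the Graph API; share counts are not among the fields requested here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_BASE = "https://graph.facebook.com"

# Media fields requested per post: engagement counts plus basic metadata.
MEDIA_FIELDS = ["id", "caption", "like_count", "comments_count", "timestamp", "permalink"]


def build_media_url(api_version, ig_user_id, access_token, limit=25):
    """Build the URL that lists media for one Instagram professional account."""
    query = urlencode({
        "fields": ",".join(MEDIA_FIELDS),
        "limit": limit,
        "access_token": access_token,
    })
    return f"{GRAPH_BASE}/{api_version}/{ig_user_id}/media?{query}"


def fetch_media(url):
    """Fetch one page of media records (makes a network call)."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # The response wraps records in a "data" list.
    return payload.get("data", [])


# Placeholder values for illustration only; nothing is fetched here.
example_url = build_media_url("v19.0", "1234567890", "YOUR_TOKEN")
print(example_url)
```

Calling `fetch_media(example_url)` with real credentials would return a list of dicts keyed by the requested fields. Pagination and rate limiting are left out of this sketch.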
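
For items 3 and 4, hashtag frequency can be computed from captions once they are collected. This is a minimal sketch using only the standard library; the sample captions are made up, and `hashtag_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Made-up sample captions; real ones would come from whichever
# collection route (API or permitted scraping) is used.
captions = [
    "Sunset run #fitness #running",
    "New recipe! #food #Fitness",
    "Rest day. #running",
]

# A hashtag is treated here as '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count hashtag occurrences across captions, ignoring case."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


counts = hashtag_counts(captions)
for tag, n in counts.most_common():
    print(f"#{tag}: {n}")
```

This only measures frequency within the collected sample. Reach would need impression data, which captions alone do not provide.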
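
For item 6, one way to measure the pre- and post-flagging difference is to compare how fast a post gains engagement before and after the flag time. The sketch below assumes we have periodic snapshots of cumulative engagement (likes plus comments) for a single post; the snapshot format, the sample numbers, and the function names are all illustrative assumptions.

```python
from datetime import datetime


def rate_per_hour(snapshots):
    """Average engagement gained per hour between first and last snapshot."""
    (t0, e0), (t1, e1) = snapshots[0], snapshots[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("snapshots must span a positive time interval")
    return (e1 - e0) / hours


def pre_post_rates(snapshots, flagged_at):
    """Return (pre_rate, post_rate) of engagement growth around flagged_at."""
    ordered = sorted(snapshots)
    # A snapshot taken exactly at the flag time counts toward both windows.
    pre = [s for s in ordered if s[0] <= flagged_at]
    post = [s for s in ordered if s[0] >= flagged_at]
    if len(pre) < 2 or len(post) < 2:
        raise ValueError("need at least two snapshots on each side of the flag time")
    return rate_per_hour(pre), rate_per_hour(post)


# Made-up snapshots: (time, cumulative likes + comments) for one post.
snapshots = [
    (datetime(2024, 1, 1, 0), 0),
    (datetime(2024, 1, 1, 2), 100),
    (datetime(2024, 1, 1, 4), 200),
    (datetime(2024, 1, 1, 6), 220),
    (datetime(2024, 1, 1, 8), 240),
]
flagged_at = datetime(2024, 1, 1, 4)

pre_rate, post_rate = pre_post_rates(snapshots, flagged_at)
print(f"before flag: {pre_rate:.1f}/h, after flag: {post_rate:.1f}/h")
```

Engagement usually decays as a post ages, so a fair comparison would set these rates against unflagged posts of similar age rather than attribute the whole drop to flagging.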
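
For item 7, the Wayback Machine's CDX API can list archived snapshots of a policy page over a date range. The sketch below builds the query and parses the JSON response format, in which the first row holds the field names. The page URL and the sample payload are placeholders, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, start, end):
    """Build a CDX query for successful captures of page_url between two dates."""
    params = {
        "url": page_url,
        "output": "json",
        "from": start,                 # e.g. "2018" or "20180101"
        "to": end,
        "filter": "statuscode:200",    # keep successful captures only
        "collapse": "digest",          # drop consecutive captures with identical content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(payload):
    """Turn CDX JSON (header row followed by records) into a list of dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record):
    """URL of the archived copy for one CDX record."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"


# Placeholder page URL; substitute the actual policy page.
query_url = build_cdx_url("https://example.com/policy", "2018", "2024")

# Made-up payload in the CDX JSON shape, standing in for a real response.
sample_payload = json.dumps([
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/policy", "20200101000000", "https://example.com/policy",
     "text/html", "200", "ABC", "1234"],
])
records = parse_cdx_rows(sample_payload)
print(query_url)
print(snapshot_url(records[0]))
```

Fetching `query_url` with requests or urllib and passing the response body to `parse_cdx_rows` would give one record per distinct capture. Those captures can then be downloaded and diffed to track guideline changes.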
