Commit 617cb52

Clear dataset-site placeholders from GitHub metadata (#21)
* Remove dataset-site GitHub placeholders

* Update datasets/kvasir-instrument.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 800de62 commit 617cb52

34 files changed, with 96 additions and 54 deletions

README.md

Lines changed: 47 additions & 42 deletions
@@ -4,51 +4,56 @@ A collection of open datasets published by Simula Research Laboratory and Simula
 Currently, we have published the following datasets:
 
 **Medical and Biology Datasets**
-* Depresjon, The Depresjon Dataset. [ [publication](https://dl.acm.org/doi/10.1145/3204949.3208125) ]
-* HyperKvasir, The Largest Gastrointestinal Dataset. [ [publication](https://www.nature.com/articles/s41597-020-00622-y) ]
-* HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [ [publication](https://dl.acm.org/doi/10.1145/3458305.3478454) ]
-* KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [ [publication](https://arxiv.org/pdf/2104.11138) ]
-* Cellular, A cell autophagy dataset. [ [publication](https://github.com/simula/cellular) ]
-* GastroVision, A multicenter dataset. [ [publication](https://arxiv.org/abs/2307.08140) ]
-* Nerthus, A Bowel Preparation Quality Video Dataset. [ [publication](https://dl.acm.org/do/10.1145/3193165/abs/) ]
-* Kvasir-VQA: A Text-Image Pair GI Tract Dataset
-* Kvasir Capsule, The largest gastrointestinal PillCAM dataset. [ [publication](https://www.nature.com/articles/s41597-021-00920-z) ]
-* Kvasir Instrument, A gastrointestinal instrument Dataset. [ [publication](https://link.springer.com/chapter/10.1007/978-3-030-67835-7_19) ]
-* Kvasir SEG, Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [ [publication](https://dl.acm.org/doi/10.1007/978-3-030-37734-2_37) ]
-* Kvasir, A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection. [ [publication](https://dl.acm.org/do/10.1145/3193289/abs/) ]
-* Psykose, A Motor Activity Database of Patients with Schizophrenia. [ [publication](https://ieeexplore.ieee.org/document/9182896) ]
-* VISEM QC, A sperm quality control dataset.
-* VISEM, A Multimodal Video Dataset of Human Spermatozoa. [ [publication](https://dl.acm.org/doi/10.1145/3304109.3325814) ]
+* Cellular, A cell autophagy dataset. [[project](https://github.com/simula/cellular)]
+* Depresjon, The Depresjon Dataset. [[publication](https://dl.acm.org/doi/10.1145/3204949.3208125) | [project](https://datasets.simula.no/depresjon/)]
+* GastroVision, A multicenter dataset. [[publication](https://arxiv.org/abs/2307.08140) | [project](https://github.com/DebeshJha/GastroVision)]
+* HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [[publication](https://link.springer.com/chapter/10.1007/978-3-030-67835-7_17) | [project](https://osf.io/4dnh8/)]
+* HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [[publication](https://dl.acm.org/doi/10.1145/3458305.3478454) | [project](https://github.com/simula/hyperaktiv)]
+* HyperKvasir, The Largest Gastrointestinal Dataset. [[publication](https://www.nature.com/articles/s41597-020-00622-y) | [project](https://github.com/simula/hyper-kvasir)]
+* Kvasir, A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection. [[publication](https://doi.org/10.1145/3083187.3083212) | [project](https://datasets.simula.no/kvasir/)]
+* Kvasir Capsule, The largest gastrointestinal PillCAM dataset. [[publication](https://www.nature.com/articles/s41597-021-00920-z) | [project](https://github.com/simula/kvasir-capsule)]
+* Kvasir Instrument, A gastrointestinal instrument Dataset. [[publication](https://doi.org/10.1007/978-3-030-67835-7_19) | [project](https://osf.io/kp6my/)]
+* Kvasir SEG, Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [[publication](https://dl.acm.org/doi/10.1007/978-3-030-37734-2_37) | [project](https://datasets.simula.no/kvasir-seg/)]
+* Kvasir-VQA, A Text-Image Pair GI Tract Dataset. [[publication](https://doi.org/10.1145/3689096.3689458) | [project](https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA)]
+* Kvasir-VQA-x1, A Large-Scale Multi-Task Benchmark for GI Tract Visual Question Answering. [[publication](https://doi.org/10.1007/978-3-032-08009-7_6) | [project](https://github.com/simula/Kvasir-VQA-x1)]
+* KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [[publication](https://arxiv.org/abs/2104.11138) | [project](https://github.com/DebeshJha/NanoNet)]
+* MedMultiPoints, A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging. [[publication](https://arxiv.org/abs/2505.16647) | [project](https://github.com/Simula/PointDetectCount)]
+* Medico Multimedia - VISEM Tracking, A sperm tracking dataset. [[publication](https://doi.org/10.1145/3304109.3325814) | [project](https://multimediaeval.github.io/editions/2022/)]
+* Nerthus, A Bowel Preparation Quality Video Dataset. [[publication](https://doi.org/10.1145/3083187.3083216) | [project](https://datasets.simula.no/nerthus/)]
+* Psykose, A Motor Activity Database of Patients with Schizophrenia. [[publication](https://ieeexplore.ieee.org/document/9182896) | [project](https://osf.io/dgjzu/)]
+* VISEM, A Multimodal Video Dataset of Human Spermatozoa. [[publication](https://dl.acm.org/doi/10.1145/3304109.3325814) | [project](https://datasets.simula.no/visem/)]
+* VISEM QC, A sperm quality control dataset. [[project](https://datasets.simula.no/visem-qc/)]
 
-**Sport Datasets**
-* Alfheim, Soccer video and player position dataset. [ [publication](https://dl.acm.org/doi/10.1145/2557642.2563677) ]
-* ARX, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [ [publication](https://ieeexplore.ieee.org/abstract/document/8877417/) ]
-* ExposureEngine, Oriented Logo Detection & Sponsor Visibility Analytics (Dataset).
-* Heimdallr, A Dataset For Sport Analysis.
-* HockeyAI, A Multi-Class Ice Hockey Dataset for Object Detection. [ [publication](https://dl.acm.org/doi/10.1145/3712676.3718335) ]
-* HockeyRink: A Dataset for Precise Ice Hockey Rink Keypoint Mapping and Analytics. [ [publication](https://dl.acm.org/doi/10.1145/3712676.3718338) ]
-* HockeyOrient, A Dataset for Ice Hockey Player Orientation Classification. [ [publication](https://dl.acm.org/doi/10.1145/3712676.3718342) ]
-* ScopeSense, A 8.5-month sport, nutrition, and lifestyle lifelogging dataset.
-* Soccer Summarization, Soccer game captions and summary in English for game summarization. [ [publication](https://dl.acm.org/doi/10.1145/3552463.3557019) ]
-* SoccerMon, Subjective and objective data collected over two years from two different elite women´s soccer teams.
-* SoccerSum, The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch [ [publication](http://localhost:3000/---) ]
-* SoccerNet-Echoes, SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset [ [publication](https://arxiv.org/abs/2405.07354) ]
-* PMData , A lifelogging dataset of 16 persons during 5 months using Fitbit, Google Forms and PMSys.
-* TACDEC, TACDEC: Dataset of Tackle Events in Soccer Game Videos [ [publication](https://dl.acm.org/doi/10.1145/3625468.3652166) ]
+**Sport and Activity Datasets**
+* Alfheim, Soccer video and player position dataset. [[publication](https://dl.acm.org/doi/10.1145/2557642.2563677) | [project](https://datasets.simula.no/alfheim/)]
+* Arx, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [[publication](https://ieeexplore.ieee.org/abstract/document/8877417/) | [project](https://datasets.simula.no/arx/)]
+* ExposureEngine, Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts. [[project](https://huggingface.co/datasets/SimulaMet-HOST/ExposureEngine)]
+* Heimdallr, A Dataset For Sport Analysis. [[project](https://datasets.simula.no/heimdallr/)]
+* HockeyAI, A Multi-Class Ice Hockey Dataset for Object Detection. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718335) | [project](https://github.com/acmmmsys/2025-HockeyAI)]
+* HockeyOrient, A Dataset for Ice Hockey Player Orientation Classification. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718342) | [project](https://github.com/acmmmsys/2025-HockeyOrient)]
+* HockeyRink, A Dataset for Precise Ice Hockey Rink Keypoint Mapping and Analytics. [[publication](https://dl.acm.org/doi/10.1145/3712676.3718338) | [project](https://github.com/acmmmsys/2025-HockeyRink)]
+* PMData, A lifelogging dataset of 16 persons during 5 months using Fitbit, Google Forms and PMSys. [[publication](https://dl.acm.org/doi/10.1145/3339825.3394926) | [project](https://osf.io/vx4bk/)]
+* ScopeSense, A 8.5-month sport, nutrition, and lifestyle lifelogging dataset. [[project](https://osf.io/v5acr/)]
+* Soccer Summarization, Soccer game captions and summary in English for game summarization. [[publication](https://dl.acm.org/doi/10.1145/3552463.3557019) | [project](https://github.com/simula/soccer-summarization)]
+* SoccerChat, A Multimodal Video-Text Dataset for Natural Language Soccer Game Understanding. [[publication](https://arxiv.org/abs/2505.16630) | [project](https://github.com/simula/SoccerChat)]
+* SoccerMon, Subjective and objective data collected over two years from two different elite women´s soccer teams. [[project](https://osf.io/uryz9/)]
+* SoccerNet-Echoes, A Soccer Game Audio Commentary Dataset. [[publication](https://arxiv.org/abs/2405.07354) | [project](https://github.com/SoccerNet/sn-echoes)]
+* SoccerSum, The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch. [[publication](https://doi.org/10.1145/3625468.3652180) | [project](https://github.com/simula/SoccerSum)]
+* TACDEC, TACDEC: Dataset of Tackle Events in Soccer Game Videos. [[publication](https://doi.org/10.1145/3625468.3652166) | [project](https://github.com/simula/tacdec)]
 
 **Other Datasets**
-* Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [ [publication](https://datasets.simula.no/ao/mmsys2012-dataset.pdf) ]
-* European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from METeosat Second Generation. [ [publication](https://www.mdpi.com/2504-2289/5/4/62/pdf) ]
-* Eye Tracker, A Serious Game Based Dataset. [ [publication](http://ceur-ws.org/Vol-1345/gamifir15_5.pdf) ]
-* HSDPA, HSDPA-bandwidth logs for mobile HTTP streaming scenarios.
-* HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [ [publication](https://link.springer.com/chapter/10.1007/978-3-030-67835-7_17) ]
-* Image Sentiment, A dataset for image sentiment analysis. [ [publication](https://arxiv.org/pdf/2009.03051.pdf) ]
-* Njord, A fishing boat dataset.
-* Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation.
-* THREAT, A Large Annotated Corpus for Detection of Violent Threats.
-* Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [ [publication](https://dl.acm.org/doi/10.1145/3339825.3394939) ]
-* WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [ [publication](https://dl.acm.org/doi/10.1145/3472720.3483617) ]
-* WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. [ [publication](https://dl.acm.org/doi/abs/10.1145/3472720.3483617) ]
+* Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [[publication](https://datasets.simula.no/ao/mmsys2012-dataset.pdf) | [project](https://datasets.simula.no/ao/)]
+* European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from METeosat Second Generation. [[publication](https://www.mdpi.com/2504-2289/5/4/62/pdf) | [project](https://osf.io/kqdgx/)]
+* Eye Tracker, A Serious Game Based Dataset. [[publication](http://ceur-ws.org/Vol-1345/gamifir15_5.pdf) | [project](https://datasets.simula.no/eye-tracker/)]
+* HSDPA, HSDPA-bandwidth logs for mobile HTTP streaming scenarios. [[publication](http://home.ifi.uio.no/paalh/publications/files/mmsys2013-dataset.pdf) | [project](https://datasets.simula.no/hsdpa/)]
+* Image Sentiment, A dataset for image sentiment analysis. [[publication](https://arxiv.org/pdf/2009.03051.pdf) | [project](https://osf.io/xakp2/)]
+* Njord, A fishing boat dataset. [[project](https://github.com/simula/njord)]
+* Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation. [[project](https://zenodo.org/record/1118338)]
+* THREAT, A Large Annotated Corpus for Detection of Violent Threats. [[project](https://datasets.simula.no/threat/)]
+* Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [[publication](https://dl.acm.org/doi/10.1145/3339825.3394939) | [project](https://github.com/simula/toadstool)]
+* WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [[publication](https://dl.acm.org/doi/10.1145/3472720.3483617) | [project](https://osf.io/5m3by/)]
+* WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. [[publication](https://doi.org/10.1145/3472720.3483617) | [project](https://datasets.simula.no/wico-text/)]
+
 
 ## How to contribute
 To add a new **dataset**, follow these steps:

datasets/alfheim.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'Alfheim'
 desc: 'Soccer video and player position dataset.'
 thumbnail: /thumbnails/alfheim.png
 publication: https://dl.acm.org/doi/10.1145/2557642.2563677
+github: ''
 tags:
 - soccer
 - video analysis

datasets/ao.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'Anarchy Online'
 desc: 'Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications.'
 thumbnail: /thumbnails/anarchy-online.png
 publication: https://datasets.simula.no/ao/mmsys2012-dataset.pdf
+github: ''
 tags:
 - climate change
 - sensor

datasets/arx.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'Arx'
 desc: 'A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2.'
 thumbnail: /thumbnails/arx.jpg
 publication: https://ieeexplore.ieee.org/abstract/document/8877417/
+github: ''
 tags:
 - soccer
 - text

datasets/cellular.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: 'Cellular'
 desc: 'A cell autophagy dataset.'
 thumbnail: /thumbnails/cellular.png
+publication: ''
 github: https://github.com/simula/cellular
 hidden: false
 tags:

datasets/depresjon.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'Depresjon'
 desc: 'The Depresjon Dataset.'
 thumbnail: /thumbnails/depresjon.png
 publication: https://dl.acm.org/doi/10.1145/3204949.3208125
+github: ''
 tags:
 - mental health
 - sensor

datasets/ecc-dataset.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'European Cloud Cover'
 desc: 'A dataset containing reanalysis data from ERA5 and satellite retrievals from METeosat Second Generation.'
 thumbnail: /thumbnails/european-cloud-cover.jpg
 publication: https://www.mdpi.com/2504-2289/5/4/62/pdf
+github: https://osf.io/kqdgx/
 tags:
 - climate change
 - sensor

datasets/exposure-engine.md

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 title: 'ExposureEngine'
 desc: 'Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts'
 thumbnail: /thumbnails/exposure-engine.jpg
-publication: ---
-github: ---
+publication: ''
+github: https://huggingface.co/datasets/SimulaMet-HOST/ExposureEngine
 hidden: false
 tags:
 - Logo Detection
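
The ExposureEngine hunk replaces `---` placeholder values in the `publication` and `github` fields. A check for leftover placeholders of that kind might look like the following Python sketch. The function name `find_placeholders`, the dash-only pattern, and the line-by-line parsing are illustrative assumptions, not code from the repository.

```python
import re

# Assumption: a "placeholder" is a value made only of dashes, such as '---'.
PLACEHOLDER = re.compile(r"^-{2,}$")


def find_placeholders(front_matter, keys=("publication", "github")):
    """Return the keys (in file order) whose value is a dash-only placeholder."""
    flagged = []
    for line in front_matter.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. a bare '---' front-matter delimiter has no key
        key = key.strip()
        value = value.strip().strip("'\"")
        if key in keys and PLACEHOLDER.match(value):
            flagged.append(key)
    return flagged


# Field values copied from the ExposureEngine hunk, before and after the commit.
before = """title: 'ExposureEngine'
publication: ---
github: ---
hidden: false"""

after = """title: 'ExposureEngine'
publication: ''
github: https://huggingface.co/datasets/SimulaMet-HOST/ExposureEngine
hidden: false"""

if __name__ == "__main__":
    print(find_placeholders(before))
    print(find_placeholders(after))
```

The sketch reads the fields as plain text rather than with a YAML parser, since an unquoted `---` value is ambiguous in YAML and may not load cleanly.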

datasets/eye-tracker.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: 'Eye Tracker'
 desc: 'A Serious Game Based Dataset.'
 thumbnail: /thumbnails/eye-tracker.png
 publication: http://ceur-ws.org/Vol-1345/gamifir15_5.pdf
+github: ''
 tags:
 - climate change
 - sensor

datasets/gastrovision.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: 'GastroVision'
 desc: 'A multicenter dataset.'
 thumbnail: /thumbnails/gastrovision.jpg
+publication: https://arxiv.org/abs/2307.08140
 github: https://github.com/DebeshJha/GastroVision
 hidden: false
 tags:
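
The GastroVision front matter lines up with its updated README entry: `title`, `desc`, `publication`, and `github` correspond to the name, the description, and the `[[publication](...) | [project](...)]` links. The Python sketch below illustrates that entry format. The helper `readme_entry` is hypothetical, and nothing in this commit shows that the README is generated from the front matter. Some entries (for example Alfheim) have an empty `github` field but still carry a project link in the README.

```python
def readme_entry(title, desc, publication="", project=""):
    """Format one README bullet, leaving out whichever link is empty."""
    links = []
    if publication:
        links.append(f"[publication]({publication})")
    if project:
        links.append(f"[project]({project})")
    line = f"* {title}, {desc}"
    if links:
        # The outer brackets wrap all links, separated by ' | '.
        line += " [" + " | ".join(links) + "]"
    return line


if __name__ == "__main__":
    # Values copied from the GastroVision front matter in this diff.
    print(readme_entry(
        "GastroVision",
        "A multicenter dataset.",
        publication="https://arxiv.org/abs/2307.08140",
        project="https://github.com/DebeshJha/GastroVision",
    ))
```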
