Colony Hackathon Submission
Project Title
Distributed Data Science Colony
Project Description
A decentralized way to gather Data Science teams around publicly available data sets (www.data.gov and other initiatives), organize their work, and reward it accordingly. It covers the five basic phases of a data science project: Question, Exploratory Data Analysis, Formal Modeling, Interpretation, and Communication.
A single colony can manage various Data Science Projects (defined as Domains).
Some of the issues tackled by the solution:
- Establishing clear goals (Question)
- Avoiding data dredging (trying everything under the sun in search of a correlation), since data sources are tasks and require approval
- Providing random seeds for reproducible results, while preventing peers from selecting seeds that yield specific results when applying non-deterministic algorithms (e.g., K-Means Clustering)
- Documenting not only positive results but negative ones too. There is currently an enormous bias towards publishing only positive results, while negative ones are equally valid and useful. The start of every research effort is public, and so are its results
- Coordinating geographically dispersed teams (mostly Civic Hackers). Data is more widely available today than ever before, at unprecedented speed and openness, and only a fully distributed solution will allow such teams to coordinate work and contribute research.
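
As an illustration of the random seed point above, here is a minimal Python sketch of one way a seed could be derived so that no team member chooses it. It is not taken from the project repository: the function names, the task id, and the "public entropy" string (imagined here as something like a block hash fixed when the task is approved) are all assumptions made for the example.

```python
import hashlib
import random


def derive_seed(task_id: str, public_entropy: str) -> int:
    """Derive a 32-bit seed from a task id and a public value no peer picks.

    Both inputs are hypothetical: the task id stands for a colony task and
    public_entropy for a value that is published when the task is approved.
    """
    digest = hashlib.sha256(f"{task_id}:{public_entropy}".encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


def sample_initial_centroids(points, k, seed):
    """Pick k starting centroids reproducibly, as a K-Means initialisation might."""
    rng = random.Random(seed)
    return rng.sample(points, k)


# Made-up data and placeholder identifiers, for illustration only.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.3), (8.8, 0.1)]
seed = derive_seed("task-42", "0xabc123")

first = sample_initial_centroids(points, 3, seed)
second = sample_initial_centroids(points, 3, seed)
print(first == second)  # the same seed gives the same starting centroids
```

Because the seed is a function of values that are already public, anyone can recompute it and rerun the analysis, and nobody can quietly try several seeds and report only the one that gives a preferred clustering.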
** Very Late submission, I know **
I only got a chance to start serious work on Friday the 22nd, but I plan on continuing the work and eventually publishing the fully operational solution. It uses Electron to allow browser-free use, relying only on the Blockchain client of choice.
Project Repository
https://github.com/mduske/colonyHackathon
Team Members and Contact info
https://github.com/mduske
https://twitter.com/MarkDuske
https://www.linkedin.com/in/markduske/