Data processing: primary data processing
We use the TaskTracker plugin and the activity tracker plugin to gather the source data.
During data gathering, we collect code snapshots and actions while students solve various programming tasks.
The data also contains the student profile (age, programming experience, and so on)
and the task that the student is currently solving.
At this stage, the test files created during the testing phase are deleted; they have the value ON in the test
mode column of the TaskTracker file. A student can also send several files with the history of solving a task,
each of which may include the previous ones. Such redundant files are deleted as well, so that only one
file with a unique history of solving the current task remains. In addition, a separate activity tracker file
is sent for each TaskTracker file. In this step, all activity tracker files are combined into one.
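The cleanup steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: the column names `testMode` and `timestamp`, all function names, and the reading of "includes the previous ones" as "is a row-for-row prefix of a longer history" are assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical column names; the real TaskTracker headers may differ.
TEST_MODE = "testMode"
TIMESTAMP = "timestamp"


def is_test_file(rows: List[Row]) -> bool:
    """A file counts as a test file if any row has test mode ON."""
    return any(row.get(TEST_MODE) == "ON" for row in rows)


def is_prefix(short: List[Row], long: List[Row]) -> bool:
    """True if `long` starts with every row of `short`, in order."""
    return len(short) <= len(long) and long[:len(short)] == short


def keep_unique_histories(files: List[List[Row]]) -> List[List[Row]]:
    """Drop test files and histories already contained in a longer one."""
    candidates = [rows for rows in files if not is_test_file(rows)]
    # Longest first, so shorter histories are compared against kept ones.
    candidates.sort(key=len, reverse=True)
    kept: List[List[Row]] = []
    for rows in candidates:
        if not any(is_prefix(rows, other) for other in kept):
            kept.append(rows)
    return kept


def merge_activity_files(files: List[List[Row]]) -> List[Row]:
    """Combine activity tracker files into one list ordered by timestamp."""
    merged = [row for rows in files for row in rows]
    # Assumes timestamps are integer-like strings.
    merged.sort(key=lambda row: int(row[TIMESTAMP]))
    return merged
```

Each file is represented here as a list of rows, for example as read with `csv.DictReader`. If a student's histories diverge instead of extending one another, this sketch keeps more than one file, so such cases would need extra handling.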
The TaskTracker plugin collects the sequence of code snapshots and asks the user about their age, programming experience, gender, and country. TaskTracker files can have any names and contain the following information:
- date of an action;
- timestamp of an action;
- name of the edited file;
- hash code of the edited file;
- the current code fragment;
- the current chosen programming task;
- test mode;
- student's id;
- student’s age. Available values: 1-100;
- student's programming experience in years. Available values: >= 0;
- student's programming experience in months (if programming experience in years is zero). Available values: 0-11;
- student’s country;
- student’s gender.
An example of the TaskTracker file can be found here.
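The value ranges listed above can be checked with a small helper like the one below. It is only a sketch: the function name and its parameters are hypothetical, and it assumes that the months field is required only when the experience in years is zero.

```python
from typing import List, Optional


def validate_profile(age: int, years: int, months: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means all values are in range."""
    problems: List[str] = []
    if not 1 <= age <= 100:
        problems.append("age must be in 1-100")
    if years < 0:
        problems.append("experience in years must be >= 0")
    # Assumption: months are only checked when the experience in years is zero.
    if years == 0 and (months is None or not 0 <= months <= 11):
        problems.append("experience in months must be in 0-11")
    return problems
```

For example, `validate_profile(20, 0, 6)` returns an empty list, while `validate_profile(0, 0, 6)` reports the out-of-range age.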
The activity tracker plugin tracks and records IDE user activity. The list of columns and activities can be found here.
An example of the activity tracker file can be found here.
Use the preprocess_data method from preprocessing.py.
| Argument | Description |
|---|---|
| path | path to the directory with files |
The root directory must have the following structure:
-root
--user_N1
---task1
----user_N1_files
--user_N2
---task1
----user_N2_files
The requirements for the user's files:
- All files have to be in the .csv format.
- Activity tracker files should have the prefix ide-events.
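Before running the preprocessing, it can help to check a directory against these requirements. The sketch below is not part of the project: `classify_files` is a hypothetical helper that walks the root directory, flags files that are not .csv, and separates activity tracker files from TaskTracker files by the ide-events prefix.

```python
from pathlib import Path
from typing import Dict, List

ACTIVITY_PREFIX = "ide-events"


def classify_files(root: Path) -> Dict[str, List[Path]]:
    """Group all files under `root` into activity, TaskTracker, and invalid."""
    groups: Dict[str, List[Path]] = {
        "activity": [],
        "task_tracker": [],
        "invalid": [],
    }
    # Walk every nesting level, so the exact depth of the files does not matter.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix != ".csv":
            groups["invalid"].append(path)
        elif path.name.startswith(ACTIVITY_PREFIX):
            groups["activity"].append(path)
        else:
            groups["task_tracker"].append(path)
    return groups
```

A non-empty `invalid` list means that some files violate the .csv requirement and should be fixed or removed before the data is processed.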