This repository was archived by the owner on Dec 15, 2025. It is now read-only.

JetBrains-Research/task-tracker-post-processing



Status: Archived

No longer maintained


TaskTracker postprocessing

Overview

This tool prepares raw data collected by the TaskTracker plugin for further analysis. The data contains snapshots of code captured during the solution process and records of user interaction with the IDE.

The tool consists of two major modules:

  • data processing
  • data visualization

Data processing

Requirements for the source data

  1. The source data has to be in the .csv format.
  2. Activity-tracker files have the ide-events prefix. We use the activity-tracker plugin.
  3. TaskTracker files can have any name that starts with the key of the task whose data is collected in that file. We use the TaskTracker plugin together with the activity-tracker plugin.
  4. The columns for activity-tracker files are listed in the const file (the ACTIVITY_TRACKER_COLUMN const).
  5. The columns for task-tracker files are listed in the const file (the TASK_TRACKER_COLUMN const).
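The naming rules above can be checked with a small helper. This is an illustrative sketch, not the repository's API: `classify`, `ACTIVITY_TRACKER_PREFIX`, and the task keys shown are assumptions made for the example.

```python
# Illustrative only (not the repo's code): classify raw .csv files
# by the naming rules described above.
ACTIVITY_TRACKER_PREFIX = "ide-events"   # prefix from rule 2 above
TASK_KEYS = ["pies", "zero"]             # hypothetical task keys

def classify(filename: str) -> str:
    """Return 'ati', 'tt', or 'unknown' for a raw data file name."""
    if not filename.endswith(".csv"):
        return "unknown"
    if filename.startswith(ACTIVITY_TRACKER_PREFIX):
        return "ati"
    if any(filename.startswith(key) for key in TASK_KEYS):
        return "tt"
    return "unknown"
```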

Processing

The correct order for data processing is:

  1. Do primary data preprocessing (use preprocess_data function from preprocessing.py).
  2. Merge task-tracker files and activity-tracker files (use merge_tt_with_ati function from merging_tt_with_ati.py).
  3. Find tests results for the tasks (use run_tests function from tasks_tests_handler.py).
  4. Reorganize files structure (use reorganize_files_structure function from task_scoring.py).
  5. [Optional] Remove intermediate diffs (use remove_intermediate_diffs function from intermediate_diffs_removing.py).
  6. [Optional, only for Python language] Remove inefficient statements (use remove_inefficient_statements function from inefficient_statements_removing.py).
  7. [Optional] Add int experience column (use add_int_experience function from int_experience_adding.py).

Note: although you can run the steps independently, the data for the Nth step must already have passed all the steps before it.
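Step 2 above pairs each code snapshot with the IDE events recorded up to that moment. The following is a minimal sketch of that idea only, assuming timestamp-sorted inputs; it is not the implementation of merge_tt_with_ati.

```python
# Sketch of the merge idea (not merge_tt_with_ati itself): attach to each
# TaskTracker snapshot the activity-tracker events recorded up to its timestamp.
def merge_by_timestamp(snapshots, events):
    """snapshots/events: lists of (timestamp, payload), sorted by timestamp."""
    merged, i = [], 0
    for ts, code in snapshots:
        batch = []
        # Consume all IDE events that happened at or before this snapshot.
        while i < len(events) and events[i][0] <= ts:
            batch.append(events[i][1])
            i += 1
        merged.append((ts, code, batch))
    return merged
```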

Available languages

  • C++
  • Java
  • Kotlin
  • Python

Visualization

You can visualize different parts of the pipeline.

Participants distribution

Note: run before 'reorganize_files_structure', because the old file structure is used to count unique users.

Use the get_profile_statistics function from statistics_gathering.py to get the age and experience statistics. After that, run the plot_profile_statistics function from profile_statistics_plots.py with the necessary column and options, passing the serialized statistics files as a parameter.

Two column types are available:

  1. STATISTICS_KEY.AGE
  2. STATISTICS_KEY.EXPERIENCE

Two chart types are available:

  1. CHART_TYPE.BAR
  2. CHART_TYPE.PIE

Other options:

  1. to_union_rare: merge the rare values. A value is rare if its frequency is less than or equal to STATISTICS_RARE_VALUE_THRESHOLD from consts.py (default: 2).
  2. format: save the output to a file in different formats. The default value is html because the plots are interactive.
  3. auto_open: open plots automatically.
  4. x_category_order: choose the sort order for the X axis. Available values are stored in PLOTTY_CATEGORY_ORDER from consts.py. The default value is PLOTTY_CATEGORY_ORDER.TOTAL_ASCENDING.
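The to_union_rare option can be pictured as follows. This sketch only illustrates the idea of merging infrequent values into one bucket; the function name `union_rare` and the "other" label are assumptions, not the repository's code.

```python
from collections import Counter

# Illustration of the to_union_rare idea: values whose frequency is at or
# below the threshold (STATISTICS_RARE_VALUE_THRESHOLD, default 2) are
# merged into a single "other" bucket.
def union_rare(values, threshold=2, other="other"):
    counts = Counter(values)
    merged = Counter()
    for value, count in counts.items():
        merged[other if count <= threshold else value] += count
    return dict(merged)
```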

Tasks distribution

Note: Run after 'reorganize_files_structure'.

Use plot_tasks_statistics function from tasks_statistics_plots.py to plot tasks statistics.

Available options:

  1. plot_name: choose the file name. The default value is task_distribution_plot.
  2. format: save the output in different formats. The default value is html because the plots are interactive.
  3. auto_open: open plots automatically.

Activity tracker plots

Use the create_ati_data_plot function from ati_data_plots.py to plot the length of the current fragment together with the actions performed in the IDE.

Scoring solutions plots

Note: Run after 'run_tests'.

Use plot_scoring_solutions function from scoring_solutions_plots.py to plot scoring solutions.


Installation

Simply clone the repository and run the following commands:

  1. pip install -r requirements.txt
  2. pip install -r dev-requirements.txt
  3. pip install -r test-requirements.txt

Usage

Run the necessary file for available modules:

| File | Module | Description |
|------|--------|-------------|
| processing.py | Data processing module | Includes all steps from the Data processing section |
| plots.py | Plots module | Includes all plots from the Visualization section |

A simple configuration: python <file> <args>

Use the -h option to show help for each module.

Data processing module

See description: usage

File for running: preprocessing.py

Required arguments:

  1. path — the path to data.

Optional arguments:

--level — use to set the level for the preprocessing. Available levels:

| Value | Description |
|-------|-------------|
| 0 | primary data processing |
| 1 | merge task-tracker files and activity-tracker files |
| 2 | find tests results for the tasks |
| 3 | reorganize files structure |
| 4 | remove intermediate diffs |
| 5 | [only for Python language] remove inefficient statements |
| 6 | add int experience column |

Note: the Nth level runs all the levels before it. The default value is 3.
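The path argument and the --level flag described above could be parsed roughly as follows. This is a hedged sketch based on the table, assuming argparse; the actual preprocessing.py may differ.

```python
import argparse

# Sketch of the preprocessing CLI described above (assumption: argparse-based).
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TaskTracker preprocessing")
    parser.add_argument("path", help="the path to data")
    parser.add_argument("--level", type=int, default=3, choices=range(0, 7),
                        help="preprocessing level; level N runs all levels before it")
    return parser
```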

Plots module

See description: usage

File for running: plots.py

Required arguments:

  1. path — the path to data.
  2. plot_type — the type of plot. Available values:
| Value | Description |
|-------|-------------|
| participants_distr | use to visualize Participants distribution |
| tasks_distr | use to visualize Tasks distribution |
| ati | use to visualize Activity tracker plots |
| scoring | use to visualize Scoring solutions plots |

Optional arguments:

| Parameter | Description |
|-----------|-------------|
| --type_distr | The distribution type. Only for plot_type participants_distr. Available values are programExperience and age. The default value is programExperience. |
| --chart_type | The chart type. Only for plot_type participants_distr. Available values are bar and pie. The default value is bar. |
| --to_union_rare | Use to merge the rare values. Only for plot_type participants_distr. |
| --format | Use to save the output to a file in different formats (all plots except plot_type ati). Available values are html and png. The default value is html. |
| --auto_open | Use to open plots automatically. |
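The argument table above could be parsed with argparse roughly as follows. This is a sketch built from the table, not the actual plots.py implementation, and the function name `build_plots_parser` is an assumption.

```python
import argparse

# Sketch of the plots CLI described above (assumption: argparse-based).
def build_plots_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TaskTracker plots")
    parser.add_argument("path", help="the path to data")
    parser.add_argument("plot_type",
                        choices=["participants_distr", "tasks_distr", "ati", "scoring"])
    parser.add_argument("--type_distr", choices=["programExperience", "age"],
                        default="programExperience")
    parser.add_argument("--chart_type", choices=["bar", "pie"], default="bar")
    parser.add_argument("--to_union_rare", action="store_true")
    parser.add_argument("--format", choices=["html", "png"], default="html")
    parser.add_argument("--auto_open", action="store_true")
    return parser
```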

Tests running

We use the pytest library for tests.

Note: if you get a ModuleNotFoundError when running tests, run pip install -e . before using the test system.

Note: we use several compilers to check tasks; you can find all of them in the Dockerfile. We also use the Kotlin compiler to check Kotlin tasks, so you need to install it separately if you have Kotlin files.

Use python setup.py test from the root directory to run ALL tests. If you want to run only some tests, use the --test_level param.

You can use different test levels for param --test_level:

| Param | Description |
|-------|-------------|
| all | all tests from all modules (the default value) |
| plots | tests from the plots module |
| process | tests from the preprocessing module |
| test_scoring | tests from the test scoring module |
| util | tests from the util module |
| cli | tests from the cli module |
