No longer maintained
This tool prepares raw data collected by the TaskTracker plugin for further analysis. The data contains snapshots of the code collected during the solution process and records of the user's interaction with the IDE.
The tool consists of two major modules:
- data processing
- data visualization
- The source data has to be in the .csv format.
- Activity-tracker files have the ide-events prefix. We use the activity-tracker plugin.
- Codetracker (task-tracker) files can have any name prefixed with the key of the task whose data is collected in that file. We use the TaskTracker plugin at the same time as the activity-tracker plugin.
- Columns for the activity-tracker files can be found in the const file (the ACTIVITY_TRACKER_COLUMN const).
- Columns for the task-tracker files can be found in the const file (the TASK_TRACKER_COLUMN const).
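As an illustration of the naming rules above, here is a minimal sketch of splitting a folder of raw .csv files into the two kinds of files by their name prefixes. The helper name, the task_keys parameter, and the flat folder layout are assumptions for illustration, not part of the tool.

```python
import os

ACTIVITY_TRACKER_PREFIX = "ide-events"

def split_raw_files(data_dir: str, task_keys: list) -> tuple:
    """Hypothetical helper: split raw .csv files into activity-tracker
    and task-tracker files by their name prefixes."""
    ati_files, tt_files = [], []
    for name in os.listdir(data_dir):
        if not name.endswith(".csv"):
            continue
        if name.startswith(ACTIVITY_TRACKER_PREFIX):
            ati_files.append(os.path.join(data_dir, name))
        elif any(name.startswith(key) for key in task_keys):
            tt_files.append(os.path.join(data_dir, name))
    return ati_files, tt_files
```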
The correct order for data processing is:
- Do primary data preprocessing (use preprocess_data function from preprocessing.py).
- Merge task-tracker files and activity-tracker files (use merge_tt_with_ati function from merging_tt_with_ati.py).
- Find tests results for the tasks (use run_tests function from tasks_tests_handler.py).
- Reorganize files structure (use reorganize_files_structure function from task_scoring.py).
- [Optional] Remove intermediate diffs (use remove_intermediate_diffs function from intermediate_diffs_removing.py).
- [Optional, only for Python language] Remove inefficient statements (use remove_inefficient_statements function from inefficient_statements_removing.py).
- [Optional] Add int experience column (use add_int_experience function from int_experience_adding.py).
Note: you can run the actions independently, but the data for the Nth step must have passed all the steps before it.
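A minimal sketch of the whole pipeline, assuming each function takes the path to the output of the previous step and returns the path to its own output; the exact signatures and the package layout of the modules are assumptions, so check the files listed above for the real parameters.

```python
# Module paths follow the file names listed above; the real package layout may differ.
from preprocessing import preprocess_data
from merging_tt_with_ati import merge_tt_with_ati
from tasks_tests_handler import run_tests
from task_scoring import reorganize_files_structure
from intermediate_diffs_removing import remove_intermediate_diffs
from inefficient_statements_removing import remove_inefficient_statements
from int_experience_adding import add_int_experience

path = "path/to/raw/data"
path = preprocess_data(path)                # primary data preprocessing
path = merge_tt_with_ati(path)              # merge task-tracker and activity-tracker files
path = run_tests(path)                      # find tests results for the tasks
path = reorganize_files_structure(path)     # reorganize files structure
path = remove_intermediate_diffs(path)      # [optional] remove intermediate diffs
path = remove_inefficient_statements(path)  # [optional, Python only] remove inefficient statements
path = add_int_experience(path)             # [optional] add int experience column
```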
Supported languages:
- C++
- Java
- Kotlin
- Python
You can visualize different parts of the pipeline.
Note: Run before 'reorganize_files_structure' because the old files structure is used to count unique users.
Use the get_profile_statistics function from statistics_gathering.py to get the age and experience statistics. After that, run the plot_profile_statistics function from profile_statistics_plots.py with the necessary column and options, passing the serialized statistics files as a parameter (see the sketch after the options list below).
Two column types are available:
- STATISTICS_KEY.AGE
- STATISTICS_KEY.EXPERIENCE
Two chart types are available:
- CHART_TYPE.BAR
- CHART_TYPE.PIE
Other options:
- to_union_rare: use to merge the rare values. A value is rare if its frequency is less than or equal to STATISTICS_RARE_VALUE_THRESHOLD from consts.py. The default value for STATISTICS_RARE_VALUE_THRESHOLD is 2.
- format: use to save the output into a file in different formats. The default value is html because the plots are interactive.
- auto_open: use to open plots automatically.
- x_category_order: use to choose the sort order for the X axis. Available values are stored in PLOTTY_CATEGORY_ORDER from consts.py. The default value is PLOTTY_CATEGORY_ORDER.TOTAL_ASCENDING.
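A minimal sketch of the two calls described above; it assumes get_profile_statistics takes the path to the data and returns the serialized statistics, and that the options listed above are passed to plot_profile_statistics as keyword arguments. Both signatures are assumptions.

```python
from statistics_gathering import get_profile_statistics
from profile_statistics_plots import plot_profile_statistics
from consts import CHART_TYPE, STATISTICS_KEY  # the consts file mentioned above

# Gather and serialize the age and experience statistics
# (run this before 'reorganize_files_structure').
statistics = get_profile_statistics("path/to/data")

# Plot the age distribution as a bar chart, merging rare values.
plot_profile_statistics(
    statistics,
    column=STATISTICS_KEY.AGE,
    chart_type=CHART_TYPE.BAR,
    to_union_rare=True,
    auto_open=True,
)
```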
Note: Run after 'reorganize_files_structure'.
Use plot_tasks_statistics function from tasks_statistics_plots.py to plot tasks statistics.
Available options:
- plot_name: use to choose the filename. The default value is task_distribution_plot.
- format: use to save the output into different formats. The default value is html because the plots are interactive.
- auto_open: use to open plots automatically.
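A minimal sketch, assuming plot_tasks_statistics takes the path to the data and the options above as keyword arguments (the exact signature is an assumption):

```python
from tasks_statistics_plots import plot_tasks_statistics

# Plot the tasks distribution (run after 'reorganize_files_structure').
plot_tasks_statistics(
    "path/to/data",
    plot_name="task_distribution_plot",
    auto_open=True,
)
```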
Use create_ati_data_plot function from ati_data_plots.py to plot the length of the current fragment together with the actions performed in the IDE.
Note: Run after 'run_tests'.
Use plot_scoring_solutions function from scoring_solutions_plots.py to plot scoring solutions.
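A minimal sketch for the two plots above, assuming both functions take the path to the data as their only required argument (the signatures are assumptions):

```python
from ati_data_plots import create_ati_data_plot
from scoring_solutions_plots import plot_scoring_solutions

# Fragment length together with the actions performed in the IDE.
create_ati_data_plot("path/to/data")

# Scoring of the solutions (run after 'run_tests').
plot_scoring_solutions("path/to/data")
```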
Simply clone the repository and run the following commands:

    pip install -r requirements.txt
    pip install -r dev-requirements.txt
    pip install -r test-requirements.txt
Run the file for the module you need:
| File | Module | Description |
|---|---|---|
| processing.py | Data processing module | Includes all steps from the Data processing section |
| plots.py | Plots module | Includes all plots from the Visualization section |
A simple configuration: python <file> <args>
Use -h option to show help for each module.
See description: usage
File for running: preprocessing.py
Required arguments:
- path — the path to data.
Optional arguments:
--level — use to set the level for the preprocessing. Available levels:
| Value | Description |
|---|---|
| 0 | primary data processing |
| 1 | merge codetracker files and activity-tracker files |
| 2 | find tests results for the tasks |
| 3 | reorganize files structure |
| 4 | remove intermediate diffs |
| 5 | [only for Python language] remove inefficient statements |
| 6 | add int experience column, default value |
Note: the Nth level runs all the levels before it. The default value is 3.
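For example, to run every step up to merging the task-tracker and activity-tracker files (level 1), assuming path is a positional argument and the data lies in path/to/data:

```bash
python processing.py path/to/data --level 1
```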
See description: usage
File for running: plots.py
Required arguments:
- path — the path to data.
- plot_type — the type of plot. Available values:
| Value | Description |
|---|---|
| participants_distr | use to visualize Participants distribution |
| tasks_distr | use to visualize Tasks distribution |
| ati | use to visualize Activity tracker plots |
| scoring | use to visualize Scoring solutions plots |
Optional arguments:
| Parameter | Description |
|---|---|
| --type_distr | distribution type. Only for plot_type: participants_distr. Available values are programExperience and age. The default value is programExperience. |
| --chart_type | chart type. Only for plot_type: participants_distr. Available values are bar and pie. The default value is bar. |
| --to_union_rare | use to merge the rare values. Only for plot_type: participants_distr. |
| --format | use to save the output into a file in different formats (for all plots except plot_type: ati). Available values are html and png. The default value is html. |
| --auto_open | use to open plots automatically. |
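For example, to plot the participants distribution by age as a pie chart with the rare values merged, assuming path and plot_type are positional arguments:

```bash
python plots.py path/to/data participants_distr --type_distr age --chart_type pie --to_union_rare
```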
We use the pytest library for tests.
Note: if you get a ModuleNotFoundError when trying to run tests, please call pip install -e . before using the test system.
Note: we use different compilers for checking tasks; you can find all of them in the Dockerfile. We also use the Kotlin compiler for checking Kotlin tasks, so you need to install it as well if you have Kotlin files.
Use python setup.py test from the root directory to run ALL tests.
If you want to run only some of the tests, please use the --test_level param.
You can use the following test levels for the --test_level param:
| Param | Description |
|---|---|
| all | all tests from all modules (the default value) |
| plots | tests from the plots module |
| process | tests from the preprocessing module |
| test_scoring | tests from the test scoring module |
| util | tests from the util module |
| cli | tests from the cli module |
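For example, to run only the tests from the plots module, assuming --test_level is passed straight to the test command:

```bash
python setup.py test --test_level plots
```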