diff --git a/README.rst b/README.rst
index 8fc8f149..d61cad76 100644
--- a/README.rst
+++ b/README.rst
@@ -4,8 +4,8 @@ Platform Plugin Aspects
 Purpose
 *******
 
-This repository holds various Aspects plugins for the Open edX platform including the
-event sinks that move data from the LMS to ClickHouse and the embbedding of Superset
+This repository holds various Aspects plugins for the Open edX platform, including the
+event sinks that move data from the LMS to ClickHouse and the embedding of Superset
 dashboards back into the platform.
 
 Version Compatibility
@@ -23,70 +23,82 @@ events are emitted by the Open edX platform via `Open edX events`_ or Django sig
 Available Sinks
 ===============
 
-Below are the existing sink names, and their corresponding object names (as needed for the
+Below are the existing sink names and their corresponding object names (as needed for the
 ``dump_data_to_clickhouse`` command below.
 
-- ``CourseOverviewSink`` - Listens for the `COURSE_PUBLISHED` event and stores the
-  course structure including ordering data through `XBlockSink` in ClickHouse. Object name:
+- ``CourseOverviewSink`` - Listens for the ``COURSE_PUBLISHED`` event and stores the
+  course structure, including ordering data through ``XBlockSink`` in ClickHouse. Object name:
   ``course_overviews``
-- ``ExternalIdSink`` - Listens for the `post_save` Django signal on the `ExternalId`
+- ``ExternalIdSink`` - Listens for the ``post_save`` Django signal on the ``ExternalId``
   model and stores the external id data in ClickHouse. This model stores the relationships
   between users and their xAPI unique identifiers. Object name: ``external_id``
-- ``UserProfile`` - Listens for the `post_save` Django signal on the ``UserProfile``
+- ``UserProfile`` - Listens for the ``post_save`` Django signal on the ``UserProfile``
   model and stores the user profile data in ClickHouse. Object name: ``user_profile``
-- ``CourseEnrollmentSink`` - Listen for the `ENROLL_STATUS_CHANGE` event and stores
+- ``CourseEnrollmentSink`` - Listen for the ``ENROLL_STATUS_CHANGE`` event and stores
   the course enrollment data. Object name: ``course_enrollment``
-- ``UserRetirementSink`` - Listen for the `USER_RETIRE_LMS_MISC` Django signal and
-  remove the user PII information from ClickHouse. This is a special sink and has no object name.
+- ``UserRetirementSink`` - Listen for the ``USER_RETIRE_LMS_MISC`` Django signal and
+  remove the user PII information from ClickHouse. This is a special sink with no assigned object name.
 
 Commands
 ========
 
 In addition to being an event listener, this package provides the following commands:
 
-- `dump_data_to_clickhouse` - This command allows bulk export of the data from the Sinks.
+- ``dump_data_to_clickhouse`` - This command allows bulk export of the data from the Sinks.
   Allows bootstrapping a new data platform or backfilling lost or missing data. Each sink object
-  is dumped individually. For large dumps you can use the --batch_size and --sleep_time to control
+  is dumped individually. For large dumps, you can use the ``--batch_size`` and ``--sleep_time`` to control
   how much load is placed on your LMS / Studio servers.
 
   Examples:
 
-  Dump any courses that the systems thinks are out of date (last publish time is newer than the
-  last dump time in ClickHouse):
+  Dump any courses that the system thinks are out of date (last publish time is newer than the
+  last dump time in ClickHouse):
 
-  ``python manage.py cms dump_data_to_clickhouse --object course_overviews``
+  .. code-block:: bash
 
-  The ``force`` option willl dump all objects, regardless of the data ClickHouse currently has
-  so this command will push all course data for all courses:
+      python manage.py cms dump_data_to_clickhouse --object course_overviews
 
-  ``python manage.py cms dump_data_to_clickhouse --object course_overviews --force``
+  The ``force`` option will dump all objects, regardless of the data ClickHouse currently has
+  so this command will push all course data for all courses:
 
-  These commands will dump the user data Aspects uses when PII is turned on:
+  .. code-block:: bash
 
-  ``python manage.py cms dump_data_to_clickhouse --object external_id``
-  ``python manage.py cms dump_data_to_clickhouse --object user_profile``
+      python manage.py cms dump_data_to_clickhouse --object course_overviews --force
 
-  To reduce server load, this command will dump 1000 user profiles at a time, with a 5 second
-  sleep in between:
+  These commands will dump the user data Aspects uses when PII is turned on:
 
-  ``python manage.py cms dump_data_to_clickhouse --object user_profile --batch_size 1000 --sleep_time 5``
+  .. code-block:: bash
 
-  There are many more options that can be used for different circumstances. Please refer to
-  the commands help for more information. There is also a Tutor command that wraps this, so
-  that you don't need to get shell on a container to execute this command. More information on
-  that can be found in the `Aspects backfill documentation`_.
+      python manage.py cms dump_data_to_clickhouse --object external_id
+      python manage.py cms dump_data_to_clickhouse --object user_profile
 
-- `load_test_tracking_events` - This command allows loading test tracking events into
+  To reduce server load, this command will dump 1000 user profiles at a time, with a 5-second
+  sleep in between:
+
+  .. code-block:: bash
+
+      python manage.py cms dump_data_to_clickhouse --object user_profile --batch_size 1000 --sleep_time 5
+
+  There are many more options that can be used for different circumstances. Please refer to
+  the commands help for more information. There is also a Tutor command that wraps this, so
+  that you don't need to get shell on a container to execute this command. More information on
+  that can be found in the `Aspects backfill documentation`_.
+
+- ``load_test_tracking_events`` - This command allows loading test tracking events into
   ClickHouse. This is useful for testing the ClickHouse connection to measure the performance of the
-  different data pipelines such as Vector, Event Bus (Redis and Kafka), and Celery.
+  different data pipelines, such as Vector, Event Bus (Redis and Kafka), and Celery.
 
-  Do not use this command in production as it will generate a large amount of data
+  **Do not use this command in production**, as it will generate a large amount of data
   and will slow down the system.
 
-  ``python manage.py cms load_test_tracking_events``
+  .. code-block:: bash
+
+      python manage.py cms load_test_tracking_events
 
-- `monitor_load_test_tracking` - Monitors the load test tracking script and saves
+- ``monitor_load_test_tracking`` - Monitors the load test tracking script and saves
   output for later analysis.
 
-  ``python manage.py cms monitor_load_test_tracking``
+  .. code-block:: bash
+
+      python manage.py cms monitor_load_test_tracking
 
 Instructor Dashboard Integration
 ================================
@@ -102,10 +114,10 @@ Please see the Open edX documentation for `guidance on Python development