@cnlucas cnlucas commented Apr 19, 2024

Summary (required)

This PR adds commands to manually check for and clear long-running queries. I added logic to make sure that, in prod, we only kill queries on the read replica.

Here's the dashboard for check_queries

If we use db.engine, our logic routes to the default engine created by flask-sqlalchemy, which points to SQLA_CONN. If we use db.session, our logic goes through the follower logic, and if the session is non-flushing, the command goes to the read replica. We can test this locally by setting SQLA_FOLLOWERS. If SQLA_FOLLOWERS is not set, the command runs against SQLA_CONN; if neither is set, it runs against cfdm_test.
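The fallback order described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the routing code in this PR: `resolve_target`, the `read_only` flag, the comma-separated format of SQLA_FOLLOWERS, and the `DEFAULT_DB` URL are all hypothetical stand-ins for what flask-sqlalchemy and the follower logic actually do.

```python
# Hypothetical default URL; the PR only names the cfdm_test database.
DEFAULT_DB = "postgresql:///cfdm_test"


def resolve_target(env, read_only=True):
    """Return (label, url) for the database a command would run against.

    env is a mapping such as os.environ. read_only stands in for the
    "non-flushing session" condition described above (an assumption).
    """
    # Assumption: SQLA_FOLLOWERS may hold several comma-separated URLs.
    followers = [
        url.strip()
        for url in env.get("SQLA_FOLLOWERS", "").split(",")
        if url.strip()
    ]
    # Fall back to the local test database when SQLA_CONN is not set.
    primary = env.get("SQLA_CONN") or DEFAULT_DB
    if read_only and followers:
        return "follower", followers[0]
    return "primary", primary
```

Calling `resolve_target(os.environ)` after exporting or unsetting the variables locally is a quick way to reason about which database a command is expected to hit before running it.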

Required reviewers

3 devs

Impacted areas of the application

General components of the application that this PR will affect:

  • RO replica long running queries

How to test

Locally:

  • You can test the db routing logic by exporting SQLA_FOLLOWERS so that it points to your local cfdm_test. If you don't do this and just unset SQLA_CONN, the commands will run against your local test db.
    'export SQLA_FOLLOWERS=postgresql://:@/cfdm_test'
    WARNING: Unsetting SQLA_CONN takes priority over exporting SQLA_FOLLOWERS.
  • If you haven't already, export the Slack hook (you can grab it from cf env api)
    'export SLACK_HOOK="slack hook here"'
  • In each function, set SLACK_BOTS="#test-bot"
  • Remove the datname and usename portions of the SQL for both commands
    check_long_queries (lines 160-171):
    SQL = """
    SELECT *
    FROM pg_stat_activity
    WHERE state = 'active'
    and lower(query) like 'select %'
    and lower(query) not like '%refresh%'
    and lower(query) not like '%rollback%'
    and (now() - pg_stat_activity.query_start) >= interval '{} minutes'
    order by pg_stat_activity.query_start desc;
    """
    clear_long_queries (lines 199-210):
    SQL = """
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'active'
    and lower(query) like 'select %'
    and lower(query) not like '%refresh%'
    and lower(query) not like '%rollback%'
    and (now() - pg_stat_activity.query_start) >= interval '{} minutes'
    order by pg_stat_activity.query_start desc;
    """
  • In DBeaver, create a long-running test query such as "select pg_sleep(5 * 60);"
  • Wait 2 minutes
  • 'python cli.py check_long_queries 2'
    You should see the output in test-bot and in your terminal
  • 'python cli.py clear_long_queries 2'
    You should see the output in test-bot
    You can also test intervals lower than 2 (this produces an error) or omit the interval (it defaults to 5)
    You can also run multiple long queries.
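The interval behavior in the steps above (an error below 2 minutes, a default of 5) can be illustrated with a small helper. This is a sketch, not the code in this PR: `build_check_sql`, `MIN_MINUTES`, and `DEFAULT_MINUTES` are hypothetical names, and the SQL is the trimmed test version of check_long_queries shown above.

```python
MIN_MINUTES = 2      # intervals below this are rejected
DEFAULT_MINUTES = 5  # used when no interval is given

# Trimmed test version of the check_long_queries SQL (no datname/usename).
CHECK_SQL = """
SELECT *
FROM pg_stat_activity
WHERE state = 'active'
and lower(query) like 'select %'
and lower(query) not like '%refresh%'
and lower(query) not like '%rollback%'
and (now() - pg_stat_activity.query_start) >= interval '{} minutes'
order by pg_stat_activity.query_start desc;
"""


def build_check_sql(minutes=DEFAULT_MINUTES):
    """Validate the interval and return the formatted SQL string."""
    # int() raises ValueError for non-numeric input, so only an integer
    # is ever interpolated into the SQL text.
    minutes = int(minutes)
    if minutes < MIN_MINUTES:
        raise ValueError(
            "interval must be at least {} minutes".format(MIN_MINUTES)
        )
    return CHECK_SQL.format(minutes)
```

Converting the argument with `int()` before formatting is a deliberate choice here: the interval is interpolated into the SQL string rather than bound as a parameter, so the value should never reach the query as raw text.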

Deploy to a space:

  • Remove the datname and usename portions of the SQL for both commands, as above
  • Switch the bots to test-bot for both commands
  • Deploy to dev
  • Run a long query in dev, such as 'select pg_sleep(5 * 60);', and wait 2 minutes
  • 'cf run-task api --command "python cli.py check_long_queries 2" --name check_queries'
  • Search for check_queries (or whatever you named the task) in the logs
  • You should see the number of queries show up in the test-bot channel, and the query information in the logs
  • 'cf run-task api --command "python cli.py clear_long_queries 2" --name clear_queries'
  • Search for clear_queries (or whatever you named the task) in the logs
  • You should see the number of queries show up in the test-bot channel
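For reviewers who want a picture of what lands in the test-bot channel, the sketch below builds a Slack webhook payload carrying the query count. It is an assumption-heavy illustration: `build_slack_payload` and the message wording are hypothetical, and the actual text posted by the commands in this PR may differ. It only builds the JSON body and does not contact Slack.

```python
import json


def build_slack_payload(count, minutes, channel="#test-bot"):
    """Return the JSON body for a Slack incoming-webhook post.

    The message wording is hypothetical; only the count and the
    interval come from the commands described above.
    """
    noun = "query" if count == 1 else "queries"
    text = "{} long-running {} found (active for {} minutes or more)".format(
        count, noun, minutes
    )
    return json.dumps({"channel": channel, "text": text})
```

Printing `build_slack_payload(3, 2)` locally shows the shape of the message without needing SLACK_HOOK to be set.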

@cnlucas cnlucas changed the title 5755-add long query commands [WIP] 5755-add long query commands Apr 19, 2024
@cnlucas cnlucas force-pushed the feature/5755-add-long-running-query-commands branch from 9dc47d6 to 9dfcfd7 Compare April 21, 2024 15:11
@cnlucas cnlucas changed the title [WIP] 5755-add long query commands 5755-add long query commands Apr 21, 2024
codecov bot commented Apr 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.81%. Comparing base (49e8553) to head (846dc5f).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5800   +/-   ##
========================================
  Coverage    85.81%   85.81%           
========================================
  Files           81       81           
  Lines         8594     8594           
========================================
  Hits          7375     7375           
  Misses        1219     1219           


@pkfec pkfec left a comment

Works as expected, thanks @cnlucas

@cnlucas cnlucas closed this Apr 30, 2024
@cnlucas cnlucas reopened this Apr 30, 2024
@cnlucas cnlucas force-pushed the feature/5755-add-long-running-query-commands branch from 9dfcfd7 to 846dc5f Compare April 30, 2024 21:51
@pkfec pkfec changed the title 5755-add long query commands [Do Not Merge]5755-add long query commands May 23, 2024
@pkfec pkfec marked this pull request as draft September 10, 2024 14:13

Development

Successfully merging this pull request may close these issues.

Research Creating a Celery Task to Check for Long Running Database Queries
