Skip to content

Update to use Prometheus Pusher context manager, #129#132

Closed
voetberg wants to merge 1 commit intorucio:masterfrom
voetberg:common_context_manager_update
Closed

Update to use Prometheus Pusher context manager, #129#132
voetberg wants to merge 1 commit intorucio:masterfrom
voetberg:common_context_manager_update

Conversation

@voetberg
Copy link
Contributor

Update some of the common probes to use utils.common.PrometheusPusher. Uses #89 as a basis, but swaps out the prometheus_client context manager. Includes some metric name changes to match file names, and printing out some error info for debugging.

CC: @ericvaandering

@voetberg voetberg force-pushed the common_context_manager_update branch from ef2d176 to 072758c Compare February 15, 2024 16:30
result = query.scalar() or 0
# Possible check against a threshold. If result > max_value then sys.exit(CRITICAL)

manager.gauge('expired_dids.total',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does change the metric value. IIRC, ATLAS is ok with this in theory as we wanted to make things more sensical. But I'd like @dchristidis thoughts before we start using this for CMS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be generally acceptable. But please add a comprehensive list of such changes in the commit message.


# Add to metrics
backlog_count = summary['submitted'] + summary['active'] + summary['staging'] + summary['started']
manager.gauge(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes names too, but in a slightly more subtle way. However, I think the variable things should always come at the end of the statsd metric.

On a style note, I'd prefer combining lines 138 and 139. You could also split that long line (140) just before the first closing paren and it would look nice.

print(f"Messages in queue: {message_count}")

manager.gauge(
"messages_to_submit.queues.messages",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this changes the metric value. The job label in prometheus will say where this comes from, but it still might be a good idea.

.filter(models.DataIdentifier.is_new.isnot(None)))
result = query.scalar() or 0

manager.gauge("new_dids").set(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a new metric name

result = session.execute(sql_text(count_sql)).fetchone()[0]

with PrometheusPusher() as manager:
manager.gauge("unevaluated_dids").set(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a change

push_to_gateway(server.strip(), job='check_updated_dids', registry=registry)
except:
continue
manager.gauge('updated_dids').set(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another change

@voetberg voetberg force-pushed the common_context_manager_update branch 2 times, most recently from 7992510 to 174bfd1 Compare February 20, 2024 22:06
…ge uniformly rucio#129

Changes to metric names include:
                    * undertaker_expired_dids -> expired_dids.total
                    * fts3.{hostname}.submitted -> fts_backlog.submitted.{hostname}
                    * hermes_queues_messages.queues.messages -> messages_to_submit.queues.messages
                    * transmogrifier_new_dids -> new_dids
                    * judge_stuck_rules_without_missing_source_replica -> stuck_rules.{source_status} (source_status = [without_missing_source_replica, with_missing_source_replica])
                    * check_transfer_queues_status -> transfer_queues_status
                    * judge.waiting_dids -> unevaluated_dids
                    * reaper.unlocked_replicas -> unlocked_replicas.{replica_status} (replica_status = [expired, unlocked])
                    * judge.updated_dids -> updated_dids
print(f'Busy channels (>{busylimit} submitted):')
for bc in busy_channels:
activities_str = ", ".join([(f"{key}: {val}") for key, val in bc['activities'].items()])
print(f'{bc['src']} to {bc['dst']} : {bc['submitted']} submitted jobs {activities_str}')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not compile. Did you mean to use double quotes here?
Also at other places for fstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a linter got a little excited and undid my work - thanks for catching it

@dchristidis
Copy link
Contributor

@ericvaandering and @dynamic-entropy, please resolve the conversations that you believe have been addressed to your satisfaction. I will begin my review then.

@voetberg
Copy link
Contributor Author

voetberg commented Aug 5, 2024

@/ericvaandering and @/dynamic-entropy, please resolve the conversations that you believe have been addressed to your satisfaction. I will begin my review then.

@dchristidis I think I'll just close this PR and restart each probe as its own PR. All our sql2.0 guidelines have been documented since this PR was put up, so a lot of things are (obviously) wrong here.

@voetberg voetberg closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants