Update to use Prometheus Pusher context manager, #129#132
Update to use Prometheus Pusher context manager, #129#132voetberg wants to merge 1 commit intorucio:masterfrom
Conversation
ef2d176 to
072758c
Compare
| result = query.scalar() or 0 | ||
| # Possible check against a threshold. If result > max_value then sys.exit(CRITICAL) | ||
|
|
||
| manager.gauge('expired_dids.total', |
There was a problem hiding this comment.
This does change the metric value. IIRC, ATLAS is ok with this in theory as we wanted to make things more sensical. But I'd like @dchristidis thoughts before we start using this for CMS.
There was a problem hiding this comment.
Yes, that should be generally acceptable. But please add a comprehensive list of such changes in the commit message.
common/check_fts_backlog
Outdated
|
|
||
| # Add to metrics | ||
| backlog_count = summary['submitted'] + summary['active'] + summary['staging'] + summary['started'] | ||
| manager.gauge( |
There was a problem hiding this comment.
This changes names too, but in a slightly more subtle way. However, I think the variable things should always come at the end of the statsd metric.
On a style note, I'd prefer combining lines 138 and 139. You could also split that long line (140) just before the first closing paren and it would look nice.
| print(f"Messages in queue: {message_count}") | ||
|
|
||
| manager.gauge( | ||
| "messages_to_submit.queues.messages", |
There was a problem hiding this comment.
Again this changes the metric value. The job label in prometheus will say where this comes from, but it still might be a good idea.
| .filter(models.DataIdentifier.is_new.isnot(None))) | ||
| result = query.scalar() or 0 | ||
|
|
||
| manager.gauge("new_dids").set(result) |
There was a problem hiding this comment.
Also a new metric name
| result = session.execute(sql_text(count_sql)).fetchone()[0] | ||
|
|
||
| with PrometheusPusher() as manager: | ||
| manager.gauge("unevaluated_dids").set(result) |
| push_to_gateway(server.strip(), job='check_updated_dids', registry=registry) | ||
| except: | ||
| continue | ||
| manager.gauge('updated_dids').set(result) |
7992510 to
174bfd1
Compare
…ge uniformly rucio#129 Changes to metric names include: * undertaker_expired_dids -> expired_dids.total * fts3.{hostname}.submitted -> fts_backlog.submitted.{hostname} * hermes_queues_messages.queues.messages -> messages_to_submit.queues.messages * transmogrifier_new_dids -> new_dids * judge_stuck_rules_without_missing_source_replica -> stuck_rules.{source_status} (source_status = [without_missing_source_replica, with_missing_source_replica]) * check_transfer_queues_status -> transfer_queues_status * judge.waiting_dids -> unevaluated_dids * reaper.unlocked_replicas -> unlocked_replicas.{replica_status} (replica_status = [expired, unlocked]) * judge.updated_dids -> updated_dids
174bfd1 to
4f8992e
Compare
| print(f'Busy channels (>{busylimit} submitted):') | ||
| for bc in busy_channels: | ||
| activities_str = ", ".join([(f"{key}: {val}") for key, val in bc['activities'].items()]) | ||
| print(f'{bc['src']} to {bc['dst']} : {bc['submitted']} submitted jobs {activities_str}') |
There was a problem hiding this comment.
This will not compile. Did you mean to use double quotes here?
Also at other places for fstrings.
There was a problem hiding this comment.
I think a linter got a little excited and undid my work - thanks for catching it
|
@ericvaandering and @dynamic-entropy, please resolve the conversations that you believe have been addressed to your satisfaction. I will begin my review then. |
@dchristidis I think I'll just close this PR and restart each probe as its own PR. All our sql2.0 guidelines have been documented since this PR was put up, so a lot of things are (obviously) wrong here. |
Update some of the common probes to use
utils.common.PrometheusPusher. Uses #89 as a basis, but swaps out theprometheus_clientcontext manager. Includes some metric name changes to match file names, and printing out some error info for debugging.CC: @ericvaandering