Global Zone Memory Support#4

Open
Napsty wants to merge 1 commit into Voxer:master from Napsty:gzpatch

@Napsty

@Napsty Napsty commented Jan 10, 2014

The check_mem plugin works great in SmartMachines (zones) but unfortunately not on a physical SmartOS host (global zone). I patched the plugin so that it also works in a global zone. Unfortunately this is much more complicated in a global zone because kstat does not show all the necessary information.

There may be better ways to accomplish this, but even though several commands are needed to obtain all the necessary information in the global zone, it works and it is sufficiently fast (0.1s).

@bahamas10
Contributor

Can you show me what happens when you run check_mem from the global zone? I've been using it for a while with no problems.

root @ [ datadyne :: (SunOS) ] ~ # zonename 
global
root @ [ datadyne :: (SunOS) ] ~ # ./check_mem 
ok: 3% used (warning=90%, critical=95%)|mem_used=38940672;mem_cap=1073741824

Here's the platform I'm running this on:

root @ [ datadyne :: (SunOS) ] ~ # uname -v
joyent_20131102T215831Z

@Napsty
Author

Napsty commented Jan 11, 2014

Actually it doesn't work at all in the global zone because the rss value is always 0. See the output on http://www.claudiokuenzler.com/blog/434/get-real-memory-usage-statistics-physical-smartos-global.

Besides that, for the physical host I would want to see the full physical memory usage of the whole server, including all SmartMachines, not just that of the global zone.

It may be a SmartOS issue, too. Our PIs are older.

@Napsty
Author

Napsty commented Jan 13, 2014

Hi Dave,

Back at work. Here's the output on the physical SmartOS host. It works, but it displays the memory usage of a single zone rather than that of the host:

./check_mem
ok: 11% used (warning=90%, critical=95%)|mem_used=241262592;mem_cap=2147483648

As you can see, the reported memory capacity is 2GB, but the physical host has 128GB installed. The value is taken from the last kstat entry found:

kstat -p :::rss | grep 241262592
memory_cap:19:74fbcd92-085b-47fd-87c0-87c302:rss        241262592

zoneadm list -v | egrep "^  19"
  19 74fbcd92-085b-47fd-87c0-87c3026a0eb1 running    /zones/74fbcd92-085b-47fd-87c0-87c3026a0eb1 joyent   excl  
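
The mismatch above can be illustrated with a short sketch. It is not the plugin's code; it only assumes that the plugin effectively keeps the last `rss` value it sees in `kstat -p :::rss` output. The zone 19 line is copied from the output above. The other two sample lines and the helper names `parse_rss` and `last_entry` are hypothetical.

```python
# Sample `kstat -p :::rss` output. Only the zone 19 line comes from this thread;
# the first two lines are made-up placeholders for illustration.
SAMPLE = (
    "memory_cap:0:global:rss\t0\n"
    "memory_cap:7:hypothetical-zone:rss\t104857600\n"
    "memory_cap:19:74fbcd92-085b-47fd-87c0-87c302:rss\t241262592\n"
)


def parse_rss(text):
    """Return a list of (zone_id, zone_name, rss_bytes) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "module:instance:name:statistic<whitespace>value".
        key, value = line.split()
        module, instance, name, stat = key.split(":")
        if module == "memory_cap" and stat == "rss":
            entries.append((int(instance), name, int(value)))
    return entries


def last_entry(entries):
    """Mimic a loop that overwrites its result on every line it reads."""
    return entries[-1]


entries = parse_rss(SAMPLE)
zone_id, zone_name, rss = last_entry(entries)
print(zone_id, zone_name, rss)
```

With more than one zone on the host, the value that survives is whichever zone happens to be listed last, which is why the result matches zone 19 and not the host as a whole.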

If I use check_mem with my patch, the output is:

./check_mem
ok: 13% used (warning=90%, critical=95%)|mem_used=18439311360;mem_cap=137427419136

@bahamas10
Contributor

I see the issue now, good catch! I pushed commit c224b7e, which solves the problem for the global zone without relying on the heavyweight call to prstat(1M) or depending on a temporary file. Check it out and let me know if it works.

I've tested this on my personal global zone machine (we don't have global zone access at Voxer) and both variations produce the same output. I took the logic for availrmem - freemem etc. from http://www.opensource.apple.com/source/dtrace/dtrace-78/DTTk/Mem/swapinfo.d
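
As a rough sketch of the `availrmem - freemem` idea mentioned above (not the code in the commit, which is not shown in this thread), the calculation could look like the following. It assumes the counters come from `kstat -p unix:0:system_pages` and that they are page counts that must be multiplied by the page size. The sample numbers, the 4096-byte page size, the use of `physmem` as the capacity, and the helper names are all assumptions for illustration.

```python
# Hypothetical sample in `kstat -p` format; the values are invented.
SAMPLE = (
    "unix:0:system_pages:availrmem\t800000\n"
    "unix:0:system_pages:freemem\t300000\n"
    "unix:0:system_pages:physmem\t1000000\n"
)

PAGESIZE = 4096  # assumed page size in bytes


def parse_kstat(text):
    """Parse `kstat -p` lines into {statistic_name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        # Keep only the last component of "module:instance:name:statistic".
        stats[parts[0].rsplit(":", 1)[1]] = int(parts[1])
    return stats


def used_bytes(stats, pagesize):
    """Used memory taken as (availrmem - freemem) pages, in bytes."""
    return (stats["availrmem"] - stats["freemem"]) * pagesize


def used_percent(stats):
    """Integer percentage of physmem (treating physmem as capacity is an assumption)."""
    return (stats["availrmem"] - stats["freemem"]) * 100 // stats["physmem"]


stats = parse_kstat(SAMPLE)
print(used_bytes(stats, PAGESIZE), used_percent(stats))
```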

@Napsty
Author

Napsty commented Jan 16, 2014

Excellent. Yes, I came across the kstat/pages information as well. I actually patched another check_mem plugin with exactly these values :) (see justintime/nagios-plugins#8).

I will test and let you know.

@Napsty
Author

Napsty commented Jan 16, 2014

Actually I see a big difference now, but it's not correct yet:

Your commit:

./check_mem.new
ok: 1% used (warning=90%, critical=95%)|mem_used=1908801536;mem_cap=137418817536

My patch:

./check_mem
ok: 26% used (warning=90%, critical=95%)|mem_used=36535824384;mem_cap=137427419136

Note the big difference in mem_used.

I have made some changes, using totalpages instead of availrmem and also subtracting the ZFS ARC size. This produces yet another result (lol):

./check_mem.new2
ok: 35% used (warning=90%, critical=95%)|mem_used=49280824512;mem_cap=137418809344
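
One possible reading of that variant, sketched under stated assumptions: the script itself is not shown here, so the formula below is a guess. It assumes "totalpages" refers to the `pagestotal` counter in `unix:0:system_pages`, that the ARC size is the byte value of `zfs:0:arcstats:size`, and that the page size is 4096 bytes. The function name and the numbers are hypothetical.

```python
def used_excluding_arc(pagestotal, freemem, pagesize, arc_size_bytes):
    """Used bytes as (pagestotal - freemem) pages minus the ARC size.

    pagestotal and freemem are page counts; arc_size_bytes is already in bytes,
    so it is subtracted only after converting the pages to bytes.
    """
    used = (pagestotal - freemem) * pagesize - arc_size_bytes
    # Clamp at zero in case the counters were sampled at different moments.
    return max(used, 0)


# Invented values for illustration only.
result = used_excluding_arc(
    pagestotal=1000000,
    freemem=300000,
    pagesize=4096,
    arc_size_bytes=1000000000,
)
print(result)
```

Mixing units is the easy mistake here: the page counters and the ARC size are reported in different units, so the subtraction only makes sense after the conversion to bytes.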

If I compare the used memory with the output from SDC (Used RAM 37376 MB), then it's still the first version (my patch) that comes closest. Unfortunately I do not know how SDC calculates the node's memory usage. And the value doesn't seem to change much during the day...
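
The comparison with the SDC figure can be reproduced from the three output lines above. This is only a sketch: it assumes SDC's "MB" means 1024 * 1024 bytes, which the thread does not confirm, and the labels and helper names are made up for illustration.

```python
import re

# The three plugin outputs quoted in this comment, keyed by hypothetical labels.
OUTPUTS = {
    "commit c224b7e": "ok: 1% used (warning=90%, critical=95%)"
                      "|mem_used=1908801536;mem_cap=137418817536",
    "original patch": "ok: 26% used (warning=90%, critical=95%)"
                      "|mem_used=36535824384;mem_cap=137427419136",
    "totalpages minus ARC": "ok: 35% used (warning=90%, critical=95%)"
                            "|mem_used=49280824512;mem_cap=137418809344",
}
SDC_USED_MB = 37376  # "Used RAM" as reported by SDC
MIB = 1024 * 1024    # assumption: SDC's MB is a mebibyte


def parse_perfdata(line):
    """Return the key=value pairs after the '|' as a dict of ints."""
    perf = line.split("|", 1)[1]
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", perf)}


def percent(perf):
    """Integer (floored) percentage of capacity in use."""
    return perf["mem_used"] * 100 // perf["mem_cap"]


def closest_to_sdc(outputs, sdc_mb, unit=MIB):
    """Return the label whose mem_used is nearest to the SDC figure."""
    return min(
        outputs,
        key=lambda name: abs(parse_perfdata(outputs[name])["mem_used"] / unit - sdc_mb),
    )


for name, line in OUTPUTS.items():
    perf = parse_perfdata(line)
    print(name, percent(perf), round(perf["mem_used"] / MIB))
print("closest:", closest_to_sdc(OUTPUTS, SDC_USED_MB))
```

Flooring the ratio reproduces the printed 1%, 26% and 35%, and the original patch remains the closest match whether MB is read as 1024 * 1024 or 1000 * 1000 bytes.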
