collectl-interest Mailing List for collectl

This is now also available here github.com/sharkcz/collectl.git

Brought to you by: loberman, markseger

collectl-interest — General collectl discussions

[Collectl-interest] Not able to get lustre stats using collectl!

From: Puneet B. <bak...@gm...> - 2020-02-07 05:26:59

Hi,

I want to gather lustre stats from collectl but I am getting only blank
records. What am I doing wrong?

Collectl version (3.6.9)

collectl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9#


Perl version (v5.26.1)

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# perl --version

This is perl 5, version 26, subversion 1 (v5.26.1) built for
x86_64-linux-gnu-thread-multi
(with 67 registered patches, see perl -V for more detail)
:::

Lustre version (2.12.2_178_ga0680fe_dirty)

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# lctl get_param version
version=2.12.2_178_ga0680fe_dirty

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# uname -a
Linux dgx1 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# uname -r
4.15.0-45-generic


Collectl lustre run (getting blank lines)

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# ./collectl.pl --verbose --import lustreMDS,s
Use of uninitialized value $strace in pattern match (m//) at ./formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at ./formatit.ph line 181.
waiting for 1 second sample...

### RECORD    1 >>> dgx1 <<< (1581052775.001) (Fri Feb  7 10:49:35 2020) ###

### RECORD    2 >>> dgx1 <<< (1581052776.001) (Fri Feb  7 10:49:36 2020) ###

### RECORD    3 >>> dgx1 <<< (1581052777.001) (Fri Feb  7 10:49:37 2020) ###

### RECORD    4 >>> dgx1 <<< (1581052778.001) (Fri Feb  7 10:49:38 2020) ###
^COuch!


root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# ./collectl.pl --verbose --import lustreOSS,s
Use of uninitialized value $strace in pattern match (m//) at ./formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at ./formatit.ph line 181.
waiting for 1 second sample...

### RECORD    1 >>> dgx1 <<< (1581053094.001) (Fri Feb  7 10:54:54 2020) ###

### RECORD    2 >>> dgx1 <<< (1581053095.001) (Fri Feb  7 10:54:55 2020) ###

### RECORD    3 >>> dgx1 <<< (1581053096.001) (Fri Feb  7 10:54:56 2020) ###
^COuch!


When I tried a newer version of collectl (V4.3.1-1), the same problem
persisted.

root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1# collectl --version
collectl V4.3.1-1 (zlib:2.074,HiRes:1.9741)

Copyright 2003-2018 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1#
root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1# ./collectl --verbose --import lustreMDS,s
Use of uninitialized value $strace in pattern match (m//) at /root/pb/collectl/4.3.1/collectl-4.3.1/formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at /root/pb/collectl/4.3.1/collectl-4.3.1/formatit.ph line 181.
waiting for 1 second sample...

### RECORD    1 >>> dgx1 <<< (1581052937.001) (Fri Feb  7 10:52:17 2020) ###

### RECORD    2 >>> dgx1 <<< (1581052938.001) (Fri Feb  7 10:52:18 2020) ###


Regards,
~Puneet

Re: [Collectl-interest] collectl says "system does not have lustre modules installed"!!

From: Puneet B. <bak...@gm...> - 2020-02-07 04:58:37

Why does collectl (V4.1.0-1) not have a "lustre" subsystem? What am I missing?


root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0# collectl --version
collectl V4.1.0-1 (zlib:2.074,HiRes:1.9741)

Copyright 2003-2016 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0#

root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0# collectl --showsubsys
The following subsystems can be specified in any combinations with -s or
--subsys in both record and playbackmode.  [default=bcdfijmnstx]

These generate summary, which is the total of ALL data for a particular type
  b - buddy info (memory fragmentation)
  c - cpu
  d - disk
  f - nfs
  i - inodes
  j - interrupts by CPU
  m - memory
  n - network
  s - sockets
  t - tcp
  x - interconnect (currently supported: OFED/Infiniband)
  y - slabs

These generate detail data, typically but not limited to the device level

  C -  individual CPUs, including interrupts if -sj or -sJ
  D -  individual Disks
  E -  environmental (fan, power, temp) [requires ipmitool]
  F -  nfs data
  J -  interrupts by CPU by interrupt number
  M -  memory numa/node
  N -  individual Networks
  T -  tcp details (lots of data!)
  X -  interconnect ports/rails (Infiniband/Quadrics)
  Y -  slabs/slubs
  Z -  processes

An alternative format lets you add and/or subtract subsystems to the defaults by
immediately following -s with a + and/or -
  eg: -s+YZ-x adds slabs & processes and removes interconnet summary data
      -s-n removes network summary data
      -s-all removes ALL subsystems, something that can handy when playing back
             data collected with --import and you ONLY want to see that data
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0#

[Collectl-interest] collectl says "system does not have lustre modules installed"!!

From: Puneet B. <bak...@gm...> - 2020-02-06 10:05:31

Hi,

I want to use collectl (V4.1.0-1) to get lustre
(version=2.12.2_178_ga0680fe_dirty) specific stats, but it says "-sl
disabled because this system does not have lustre modules installed"! The
system does have the necessary lustre modules. Can somebody help me
resolve this issue?

root@dgx1:~# collectl -sL
Use of uninitialized value $strace in pattern match (m//) at /usr/share/collectl/formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at /usr/share/collectl/formatit.ph line 181.
-sl disabled because this system does not have lustre modules installed
Error: no subsystems selected
type 'collectl -h' for help
root@dgx1:~#

root@dgx1:~# collectl -sl
Error: invalid subsystem 'l'
type 'collectl -h' for help
root@dgx1:~#


Following are the system details.

root@dgx1:~# uname -r
4.15.0-45-generic

root@dgx1:~# uname -a
Linux dgx1 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@dgx1:~# lctl get_param version
version=2.12.2_178_ga0680fe_dirty

root@dgx1:~# lsmod | grep lustre
lustre                737280  2093
lmv                   180224  3 lustre
mdc                   237568  3 lustre
lov                   311296  1397 lustre
ptlrpc               1306624  8 fld,osc,fid,mgc,lov,mdc,lmv,lustre
obdclass             2158592  1421 fld,osc,fid,ptlrpc,mgc,lov,mdc,lmv,lustre
lnet                  557056  7 osc,ko2iblnd,obdclass,ptlrpc,mgc,lmv,lustre
libcfs                471040  12 fld,lnet,osc,fid,ko2iblnd,obdclass,ptlrpc,mgc,lov,mdc,lmv,lustre

root@dgx1:~# collectl --version
collectl V4.1.0-1 (zlib:2.074,HiRes:1.9741)
Copyright 2003-2016 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the source kit

Regards,
~Puneet

Re: [Collectl-interest] Collectl - Environment data exportation to graphite daemon formatting bug

From: Mark S. <mj...@gm...> - 2019-10-25 12:54:56

Thanks for the patch, but unfortunately for collectl I retired a few months
ago and have not been able to convince anyone at HPE to pick it up and
continue to support it. I do keep hoping someone will and I'd be more than
happy to help answer any development questions.
-mark

[Collectl-interest] Collectl - Environment data exportation to graphite daemon formatting bug

From: Florian, B. <flo...@at...> - 2019-10-23 12:48:02

Attachments: collect.graphite.ph_bugformat.patch

Hello,

The information sent by collectl to graphite_exporter triggers a bug in the formatting of environment data:
Aug 18 01:45:33 <HOSTNAME> graphite_exporter[15097]: time="2019-08-18T01:45:33+02:00" level=info msg="Invalid value in line: <HOSTNAME>.env.BTemp1 sf 1566085530" source="main.go:112"

The value of the environment data is set to "sf", which produces the "Invalid value" notification.

A single run of the environment subsystem (the E parameter of the -s option) between a host and the graphite daemon shows this behaviour:

<host># collectl -sE -i::1 -c1 --export graphite,<graphite_host>

<graphite_host># tcpdump
[...]
       0x0000: 4500 0057 9ca7 4000 4006 4545 0a82 21b3 E..W..@.@.EE..!.
       0x0010: 0a82 21fe dafe 07d3 57d9 6ddd fbf7 1fb5 ..!.....W.m.....
       0x0020: 8018 00de cdde 0000 0101 080a a2e9 f028 ...............(
       0x0030: a2f4 333b 0000 0000 0000 0000 002e 656e ..3;<HOSTNAME>.en
       0x0040: 762e 4254 656d 7032 2073 6620 3135 3636 v.BTemp2.sf.1566
       0x0050: 3339 3231 3835 0a 392185.


<host># tcpdump
[...]
       0x0000: 4500 0057 b54f 4000 4006 2c9d 0a82 21b3 E..W.O@.@.,...!.
       0x0010: 0a82 21fe dafe 07d3 57de dd56 fbf7 1fb5 ..!.....W..V....
       0x0020: 8018 00de 2e44 0000 0101 080a a2f1 09af .....D..........
       0x0030: a2fb 4cc2 0000 0000 0000 0000 002e 656e ..L.<HOSTNAME>.en
       0x0040: 762e 4254 656d 7032 2073 6620 3135 3636 v.BTemp2.sf.1566
       0x0050: 3339 3236 3530 0a 392650.

These two network dumps show that the data is already corrupt when the host sends it, so the graphite host receives the corrupt value.
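To read the payload without counting hex digits, the last two rows of the first dump can be decoded directly. This is a small Python sketch for illustration only: the hex digits are copied from rows 0x0040 and 0x0050 of the dump, and the helper name decode_payload is made up here.

```python
# Decode the tail of the tcpdump payload (rows 0x0040 and 0x0050 of the first dump).
# decode_payload is a hypothetical helper; the hex digits are copied from the dump.

def decode_payload(hex_rows):
    """Join tcpdump-style hex groups and decode them as ASCII."""
    hex_digits = "".join("".join(row.split()) for row in hex_rows)
    return bytes.fromhex(hex_digits).decode("ascii")

rows = [
    "762e 4254 656d 7032 2073 6620 3135 3636",
    "3339 3231 3835 0a",
]

line = decode_payload(rows)
name, value, timestamp = line.split()
print(repr(line))  # the tail of the graphite plaintext line as sent on the wire
print(value)       # the field that should be a number
```

The middle field of the decoded line is the metric value, which is where a temperature reading would be expected.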

Running without exporting data to the graphite host shows good values:

<host># collectl -sE -i::1 -c1 -f /var/log/collectl/
<host># ls -l /var/log/collectl/<host>2-20190821-145917.raw.gz
-rw-r--r--. 1 root root 975 Aug 21 14:59 /var/log/collectl/<host>-20190821-145917.raw.gz
<host># zgrep "Blade Temp" /var/log/collectl/<host>-20190821-145917.raw.gz
ipmi: Blade Temp1,28,degrees C,ok
ipmi: Blade Temp2,31,degrees C,ok
ipmi: Blade Temp1,28,degrees C,ok
ipmi: Blade Temp2,31,degrees C,ok

The collected data is sent to the graphite daemon by the sendData function, which is called at line 445 of the /usr/share/collectl/graphite.ph file and defined at line 460. This function takes 4 arguments. The 4th argument is the float precision used when formatting the data:
445 sendData("env.$name$inst", $name, $ipmiData->{$key}->[$i]->{value}, '%s');
[...]
460 sub sendData
461 {
462 my $name= shift;
463 my $units=shift;
464 my $value=shift;
465 my $numpl=shift; # number of decimal places
[...]
516 my $valString=(!defined($numpl)) ? sprintf('%d', $value) : sprintf("%.${numpl}f", $value);
517 my $message=sprintf("$graphiteBefore$graphiteMyHost$graphitePost.$name $valString %d\n", $graphiteIntTimeLast);
518 print $message if $graphiteDebug & 1;
519 if (!($graphiteDebug & 8))
520 {
521 my $bytes=syswrite($graphiteSocket, $message, length($message), 0);
522 }

The precision is passed as the string '%s', which cannot be used inside the float format built at line 516: interpolating it yields the format "%.%sf", so the value sent is "sf" instead of a floating-point number.

The fix is to pass an integer constant as the precision parameter instead of the string '%s'. A patch is attached (its second change only makes the construction of the format string more readable).
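The way the format string goes wrong can be reproduced outside collectl. The following is a Python analogue of the construction at line 516, not the Perl code and not the attached patch (which is not shown in this archive); the names build_format and format_value are hypothetical, and the integer check is just one possible guard.

```python
# Python analogue of the format construction at line 516 of graphite.ph.
# Not the attached patch; build_format and format_value are hypothetical names.

def build_format(numpl):
    """Mirror "%.${numpl}f": splice the precision into a float format."""
    return "%." + str(numpl) + "f"

def format_value(value, numpl=None):
    """Format value with numpl decimal places, or as an integer if numpl is None."""
    if numpl is None:
        return "%d" % value
    if not isinstance(numpl, int) or numpl < 0:
        raise ValueError("precision must be a non-negative integer, got %r" % (numpl,))
    return build_format(numpl) % value

print(build_format("%s"))     # the malformed format produced by the '%s' argument
print(build_format(0))        # a valid format once the precision is an integer
print(format_value(28, 0))    # an ipmi temperature such as 28 formatted with 0 decimals
print(format_value(28.5, 1))
```

Rejecting a non-integer precision early makes the mistake visible at the call site instead of in the data received by graphite.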


Thank you

Florian

[Collectl-interest] Collectl support ending, looking for someone to take over

From: Mark S. <mj...@gm...> - 2019-08-13 20:03:57

After developing and supporting collectl for close to 20 years, I have
retired and as a result no longer have access to systems suitable for
providing support. I have contacted several people at my previous employers
looking for someone to take over but to date have not had any responses.

Therefore if someone wants to raise their hand and just fork a new instance
by all means do so and I'll be happy to help in any way I can. I would
recommend a reincarnation happen in github.

Meanwhile I will continue to provide email support, I just don't have the
means to make/test code changes. Of course if someone would like me to
provide some personal handholding or even training, shoot me an email and
we can discuss.

-mark

Re: [Collectl-interest] Clarification on counters

From: Mark S. <mj...@gm...> - 2017-06-20 20:34:42

Quite honestly, there are so many stats that I'm not familiar with all of
them in detail. For example, memory stats come from /proc/meminfo and
/proc/vmstat, as they do for virtually all utilities that report the same
data. It might be helpful to look at some other utilities to confirm they're
reporting the same numbers and to see how they've documented them. If there
are better words I can add to the collectl descriptions, I'll be happy to do so.
-mark
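
One way to make that comparison is to read the swap counters from /proc/vmstat directly. The Python sketch below assumes the pswpin and pswpout fields (cumulative pages swapped in and out) and a 4 KiB page size; the function names and the sample text are invented for illustration, and it says nothing about how collectl itself computes its columns.

```python
# Hypothetical helpers: parse /proc/vmstat text and turn two samples of the
# cumulative pswpin/pswpout page counters into KB/s rates.

PAGE_KB = 4  # assumption: 4 KiB pages, typical on x86_64

def parse_vmstat(text):
    """Return {name: int} for each 'name value' line of /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def swap_rates_kb(before, after, interval_secs, page_kb=PAGE_KB):
    """Swap-in and swap-out rates in KB/s between two parsed samples."""
    swap_in = (after["pswpin"] - before["pswpin"]) * page_kb / interval_secs
    swap_out = (after["pswpout"] - before["pswpout"]) * page_kb / interval_secs
    return swap_in, swap_out

# Made-up sample text, not taken from the system in this thread.
sample_1 = "pgfault 1000\npswpin 100\npswpout 50\n"
sample_2 = "pgfault 1100\npswpin 600\npswpout 50\n"

rates = swap_rates_kb(parse_vmstat(sample_1), parse_vmstat(sample_2), 1)
print("swap in %.0f KB/s, swap out %.0f KB/s" % rates)
```

On a live system the two samples would come from reading /proc/vmstat twice, interval_secs apart, and the result can be set next to the figures other tools print.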

[Collectl-interest] Clarification on counters

From: Fulvio S. <tra...@gm...> - 2017-06-20 08:38:24

Hello.

I was emptying out a swap volume (swapoff) and looking at the output of
collectl  -sm  --verbose -oT , while trying to make sense of a system with
a continuously rising swap occupation despite vm.swappiness set to 0.

Since the total swap memory was progressively shrinking down from a few
hundred MB, I can't quite understand the values of swapped-in memory in
the GB range (e.g. 6220M, 8536M).
Is it a collectl bug or just a misunderstanding of mine?

# uname -a
Linux mail 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux


# MEMORY SUMMARY
#         <------------------------------------Physical Memory------------------------------------------><-----------Swap------------><-------Paging------>
#            Total    Used    Free    Buff  Cached    Slab  Mapped    Anon   AnonH  Commit  Locked Inact Total  Used  Free   In  Out Fault MajFt   In  Out
10:24:46    28143M  27931M 216776K   1504M   5453M   1089M 348180K  19686M  17818M  21190M   3656K 3186M  247M  247M     0 6220M    0    15     0 6240  164
10:24:47    28143M  27936M 211496K   1504M   5454M   1089M 348180K  19691M  17818M  21190M   3656K 3186M  240M  240M     0 6472M    0     1     0 6472 1860
10:24:48    28143M  27945M 203004K   1504M   5455M   1089M 348180K  19698M  17818M  21190M   3656K 3186M  232M  232M     0 8484M    0    44     0 9020  664
10:24:49    28143M  27927M 220972K   1500M   5437M   1089M 347972K  19702M  17818M  21190M   3656K 3180M  227M  227M     0 4768M    0   113     0 4772 2876
10:24:50    28143M  27934M 214276K   1500M   5438M   1089M 347972K  19708M  17818M  21190M   3656K 3179M  221M  221M     0 6752M    0    67     0 7464 1468
10:24:51    28143M  27943M 204968K   1500M   5439M   1089M 347976K  19715M  17818M  21190M   3656K 3179M  212M  212M     0 8536M    0   278     0 9184  364
10:24:52    28143M  27927M 220716K   1498M   5419M   1089M 347908K  19721M  17816M  21190M   3656K 3173M  205M  205M     0 7428M    0   388     0 7724 1444
10:24:53    28143M  27934M 213628K   1498M   5421M   1089M 347908K  19728M  17816M  21190M   3656K 3173M  197M  197M     0 7812M    0    95     0 7864  320
10:24:54    28143M  27941M 206700K   1499M   5423M   1089M 347908K  19732M  17816M  21190M   3656K 3173M  192M  192M     0 5576M    0    77     0 6080  388
10:24:55    28143M  27925M 223432K   1496M   5404M   1089M 347940K  19736M  17816M  21190M   3656K 3166M  186M  186M     0 5876M    0    53     7 5960  268
10:24:56    28143M  27933M 215256K   1497M   5407M   1089M 347824K  19739M  17816M  21190M   3656K 3167M  181M  181M     0 5720M    0  3721     0 6444 5568
10:24:57    28143M  27938M 209808K   1497M   5409M   1089M 347828K  19743M  17816M  21190M   3656K 3167M  176M  176M     0 4756M    0    67     0 4904 1312
10:24:58    28143M  27922M 225808K   1495M   5390M   1089M 347868K  19748M  17816M  21190M   3656K 3162M  171M  171M     0 4588M    0   559     1 5540 3680
10:24:59    28143M  27928M 220016K   1495M   5391M   1089M 347712K  19753M  17816M  21190M   3656K 3162M  166M  166M     0 5696M    0    45     0 5700 3056
10:25:00    28143M  27934M 213708K   1495M   5392M   1089M 347712K  19757M  17816M  21190M   3656K 3162M  161M  161M     0 4537M    0    77    12 4661  740
10:25:01    28143M  27939M 208256K   1495M   5393M   1089M 347768K  19761M  17816M  21155M   3656K 3162M  157M  157M     0 4000M    0  2374     2 4253 1122
10:25:02    28143M  27943M 204628K   1496M   5395M   1089M 347828K  19764M  17816M  21155M   3656K 3162M  153M  153M     0 4768M    0   724     8 5052 4244
10:25:03    28143M  27929M 218600K   1493M   5377M   1089M 347800K  19769M  17816M  21155M   3656K 3156M  146M  146M     0 6524M    0    90     4 6544 2748
10:25:04    28143M  27930M 217436K   1494M   5376M   1089M 347848K  19772M  17816M  21155M   3656K 3155M  142M  142M     0 4012M    0   103     0 4456 8308
10:25:05    28143M  27935M 212584K   1494M   5377M   1089M 347848K  19777M  17816M  21155M   3656K 3155M  138M  138M     0 4964M    0    64     0 4968  928
10:25:06    28143M  27940M 207656K   1494M   5379M   1089M 347872K  19780M  17816M  21155M   3656K 3155M  133M  133M     0 4900M    0    31     0 4928  316
10:25:07    28143M  27923M 224816K   1488M   5363M   1089M 347872K  19785M  17816M  21155M   3656K 3149M  125M  125M     0 5828M    0   147    10 5872  392



Thanks in advance and also thanks a lot for collectl as a tool.


Regards,
Fulvio Scapin
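
A quick arithmetic check on the samples above, sketched in Python: the drop in swap Used between consecutive rows is compared with the swap In figure of the later row. The values are copied from the first six rows of the table. Reading the In figure as KB rather than MB is only a hypothesis that happens to fit these numbers; it is not something confirmed in the thread.

```python
# Values copied from the first six samples of the table:
# (time, swap Used in MB, swap In figure as printed, without its suffix)
samples = [
    ("10:24:46", 247, 6220),
    ("10:24:47", 240, 6472),
    ("10:24:48", 232, 8484),
    ("10:24:49", 227, 4768),
    ("10:24:50", 221, 6752),
    ("10:24:51", 212, 8536),
]

def compare(samples):
    """Yield (time, drop in Used in MB, In figure divided by 1024) per interval."""
    for (_, used_before, _), (time, used_after, swap_in) in zip(samples, samples[1:]):
        yield time, used_before - used_after, swap_in / 1024

for time, used_drop_mb, in_if_kb in compare(samples):
    print("%s  Used dropped %d MB, In read as KB is %.1f MB" % (time, used_drop_mb, in_if_kb))
```

For these intervals the two figures agree to within about 1 MB (Used is itself rounded to whole MB), whereas an In figure of several thousand MB per second would exceed the total swap size many times over.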

[Collectl-interest] patch for Mellanox cards in Ethernet mode

From: Frederik F. <fre...@di...> - 2017-05-25 12:23:17

Attachments: 0001-Ignore-ethernet-ports-when-querying-performance-on-M.patch 0002-Infiniband-received-traffic-recording.patch

Hi Mark, All,

We've recently started deploying a few systems with Mellanox ConnectX
cards used as Ethernet adaptors and a second Mellanox card as Infiniband
HBA. In this configuration we have been seeing errors in syslog such as the one below.

kernel: infiniband mlx4_0: ib_register_mad_agent: QP 0 not supported

This seems to be a bit similar to the issue discussed here:

https://sourceforge.net/p/collectl/discussion/696865/thread/16e495ae/

The attached patch based on the discussions above seems to fix the issue
for us.

I'll also include a second patch for an error message I noticed while
investigating this. Note that I'm not that fluent in perl, so I'm not
sure if this is the right fix or if this just hides something else...

I would appreciate if these patches could be included in future releases
if appropriate.

Kind regards,
Frederik
--
Frederik Ferner
Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624
Diamond Light Source Ltd. mob: +44 7917 08 5110

Duty Sys Admin can be reached on x8596

[Collectl-interest] Question about formatting of HCA names on multi-HCA systems

From: Craig t. <crt...@gm...> - 2017-02-28 20:49:38

Hello,

I am trying to run collectl on a multi-HCA system.  When I run with -sX,
the output returned is:

# INFINIBAND STATISTICS (/sec)
#Date    Time      HCA       KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors
20170228 12:46:35  mlx5         0       0       0       0       0       0     0
20170228 12:46:35  mlx5         0       0       0       0       0       0     0
20170228 12:46:35  mlx5         0       0       0       0       0       0     0
20170228 12:46:35  mlx5         0       0       0       0       0       0     0
20170228 12:46:36  mlx5         0       3      63       0       2      90     0
20170228 12:46:36  mlx5         0       0       0       0       0       0     0
20170228 12:46:36  mlx5         0       0       0       0       0       0     0
20170228 12:46:36  mlx5         0       0       0       0       0       0     0

I cannot identify which HCA is being used.  With the patch below, I get
something that makes sense (to me).

# INFINIBAND STATISTICS (/sec)
#Date    Time      HCA       KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors
20170228 12:47:32  mlx5_0       0       3      77       0       3      64       0
20170228 12:47:32  mlx5_1       0       0       0       0       0       0       0
20170228 12:47:32  mlx5_2       0       0       0       0       0       0       0
20170228 12:47:32  mlx5_3       0       0       0       0       0       0       0
20170228 12:47:33  mlx5_0       0       0       0       0       0       0       0
20170228 12:47:33  mlx5_1       0       0       0       0       0       0       0
20170228 12:47:33  mlx5_2       0       0       0       0       0       0       0
20170228 12:47:33  mlx5_3       0       0       0       0       0       0       0

Is there a reason for stripping off the _ and not adding the device number?

Thanks,
Craig

--- collectl/usr/share/collectl/formatit.ph 2017-02-28 09:29:53.859849557 -0800
+++ debug/collectl/usr/share/collectl/formatit.ph 2017-02-28 12:43:09.431284802 -0800
@@ -7208,9 +7208,11 @@
         # this is messy.  some HCSa end with _ which we don't want to print BUT we
         # need to preserve the full name in the array so do a non-greedy match so
         # we see everything except the optional _ at the end.
-        $HCAName[$i]=~/(\S+?)_*$/;
+
+ $name=$HCAName[$i].$i;
+
         $line=sprintf("$datetime %-6s %7s %7s %7s %7s %7s %7s %7s\n",
-          $1,
+          $name,
           cvt($ibRxKB[$i]/$intSecs,7,0,1), cvt($ibRx[$i]/$intSecs,6),
           $ibRx[$i] ? cvt($ibRxKB[$i]*1024/$ibRx[$i],4,0,1) : 0,
           cvt($ibTxKB[$i]/$intSecs,7,0,1), cvt($ibTx[$i]/$intSecs,6),
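
To see why the unpatched output gives every adapter the same label, here is a minimal Python sketch of the two naming rules in the patch above. It assumes the stored names look like "mlx5_" (inferred from the before/after output, not checked against the collectl source), and the function names are hypothetical.

```python
import re


def stripped_name(stored: str) -> str:
    """Mimic the original rule: a non-greedy match that drops trailing underscores."""
    return re.match(r"(\S+?)_*$", stored).group(1)


def indexed_name(stored: str, index: int) -> str:
    """Mimic the patched rule: the stored name followed by its array index."""
    return f"{stored}{index}"


# Assumption: four adapters, each stored under the same base name "mlx5_".
hcas = ["mlx5_"] * 4

print([stripped_name(n) for n in hcas])
# ['mlx5', 'mlx5', 'mlx5', 'mlx5']
print([indexed_name(n, i) for i, n in enumerate(hcas)])
# ['mlx5_0', 'mlx5_1', 'mlx5_2', 'mlx5_3']
```

In this sketch a stored name without a trailing underscore would come out as, for example, "mthca0", so the patched rule only reproduces the device name when the underscore is present.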

[Collectl-interest] Patch for reversing the host fqdn in graphite.ph

From: Laban M. <lm...@gm...> - 2016-11-30 19:51:54

Hello all,
 I have opened a pull request that provides a reversed hostname for metrics being sent to graphite. This allows a better hierarchical tree when navigating a large group of metrics for related services.
Please have a look at: https://github.com/labeneator/collectl/pull/1

I’d love if this could be pulled into the main codebase after code review. Please let me know if there are improvements to be made.
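
For readers unfamiliar with the idea, a reversed hostname can be sketched in Python as below. This is only an illustration of the concept, not the code in the pull request; the function names, the example hostnames, and the metric name are assumptions.

```python
def reverse_fqdn(hostname: str) -> str:
    """Reverse the dot-separated labels of a hostname."""
    return ".".join(reversed(hostname.split(".")))


def metric_path(hostname: str, metric: str) -> str:
    """Build a dotted metric path with the reversed hostname first (hypothetical layout)."""
    return f"{reverse_fqdn(hostname)}.{metric}"


# Hypothetical hosts: both end up under the same com.example.dc1 branch.
for host in ("web01.dc1.example.com", "db01.dc1.example.com"):
    print(metric_path(host, "cputotals.user"))
# com.example.dc1.web01.cputotals.user
# com.example.dc1.db01.cputotals.user
```

Because the most general label comes first, hosts in the same domain or site share a common prefix, which is what makes the tree easier to browse.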

Cheers,
Laban

Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

From: <Sop...@sm...> - 2016-10-31 12:12:56

Thank you Mark. This is not urgent so please don't put yourself out for me.

Kind regards,
Sophie Loewenthal

Server Infrastructure
Smals.be

Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in 
haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur 
of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals.
Indien dit bericht niet voor u bestemd is, verzoeken wij u dit 
onmiddellijk aan ons te melden en het bericht te vernietigen.

Conformément aux dispositions relatives à la représentation de l'asbl dans 
ses statuts, seul l'administrateur délégué, le directeur général ou son 
mandataire exprès est habilité à souscrire des engagements au nom de 
Smals.
Si ce message ne vous est pas destiné, nous vous prions de nous le 
signaler immédiatement et de détruire le message.

According to the provisions regarding representation of the non profit 
association in its bylaws, only the chief executive officer, the general 
manager or his explicit agent can enter into engagements on behalf of 
Smals.
If you are not the addressee of this message, we kindly ask you to signal 
this to us immediately and to delete the message.

Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

From: Mark S. <mj...@gm...> - 2016-10-31 11:57:18

Looks like 3.6.3 has that format, but that is also 4 years old, so there is
no telling what has broken with newer kernels.  I will try to do something in
the newer versions to restore these, but that won't be immediate.
-mark


Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

From: <Sop...@sm...> - 2016-10-31 08:33:40

Hi Mark,

> it came from an older version collectl, is that right?  
Yes it did, but copied from a blog entry. 

Do you recall the version with the old network stats?  I can use this 
instead.

Kind regards,
Sophie Loewenthal

Server Infrastructure
Smals.be

Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

From: Mark S. <mj...@gm...> - 2016-10-28 14:34:28

I'm a little confused.  The output in your first example looks like it came
from an older version of collectl, is that right?  A while back I changed
things to allow collectl to collect more network stats, so now in brief
format it just shows error counts for 4 different kinds of network stats. To
get the specific fields, of which there are now well over 50, you need to
specify a filter for the type of data you're interested in using
--tcpfilt.  Furthermore you need to tell collectl to report in --verbose
rather than brief.  This data is now reported as TCP Extended stats, which
you access via --tcpfilt T.

To make things a little more complicated (sorry about this), since there is
so much data and I wanted to use fixed-width fields, I needed to come up with
different names that were also more reflective of the data elements, which
you can see if you look at /proc/net/snmp and /proc/net/netstat.

Looking more closely at both the code and the description of the output in
http://collectl.sourceforge.net/Data-verbose.html I can see where
making/documenting all those changes has led to some confusion.
Specifically, look at this:

mjs@blkjak:~$ collectl -c1 -st --tcpfilt T --verbose
waiting for 1 second sample...

# TCP STACK SUMMARY (/sec)
#<------------------------------------------TcpExt----------------------------------------->
# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos  SackS
       0      0      0      0      0      0      0      1      0      0      0      0      0

As it turns out, PureAcks and HPAcks are now the AkNoPy and PreAck fields.
I've no idea why I called them that but can certainly clean up the
documentation to say so.  Also as for loss and fastretrans, I'm not
reporting them at all here but do so in plot format.  I'm not really sure
how that happened:

collectl -c1 -st -P
waiting for 1 second sample...
#Date Time [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans
20161028 10:33:04 0 0 0 0 0 0
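
As a rough illustration of the mapping described here, the Python sketch below picks the old-style columns out of the two sample outputs in this message by header name. The sample lines are copied from above (with the wrapped lines rejoined); the helper name and the dictionaries are assumptions for illustration, and the two samples come from separate runs.

```python
def parse_columns(header: str, values: str) -> dict:
    """Pair whitespace-separated column names with the values on the next line."""
    names = header.lstrip("#").split()
    return dict(zip(names, values.split()))


verbose_header = "# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos  SackS"
verbose_values = "       0      0      0      0      0      0      0      1      0      0      0      0      0"
plot_header = "#Date Time [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans"
plot_values = "20161028 10:33:04 0 0 0 0 0 0"

# Old brief-format names mapped to where the message says they live now.
VERBOSE_NAMES = {"PureAcks": "AkNoPy", "HPAcks": "PreAck"}
PLOT_NAMES = {"Loss": "[TCP]Loss", "FTrans": "[TCP]FTrans"}

verbose = parse_columns(verbose_header, verbose_values)
plot = parse_columns(plot_header, plot_values)

old_style = {old: verbose[new] for old, new in VERBOSE_NAMES.items()}
old_style.update({old: plot[new] for old, new in PLOT_NAMES.items()})
print(old_style)
# {'PureAcks': '1', 'HPAcks': '0', 'Loss': '0', 'FTrans': '0'}
```

Looking fields up by name rather than by position keeps the lookup working if the column order changes between versions.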

This will take some more thinking on my part.  Sorry about all this...

-mark

[Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

From: <Sop...@sm...> - 2016-10-28 12:01:24

Dear Mark,

How can I get this readout from collectl?

[root@poker ~]# collectl -st
waiting for 1 second sample...
#<------------TCP------------->
#PureAcks HPAcks   Loss FTrans
        3      0      0      0
        1      0      0      0

When I run this I have, 

# collectl -s t
waiting for 1 second sample...
#<-------TCP-------->
#  IP  Tcp  Udp Icmp 
    0    0    0    0 
    0    0    0    0 
    0    0    0    0 


Version #
# collectl  -v
collectl V4.1.0-1 (zlib:2.021)

Thanks for letting me know.

Kind regards,
Sophie Loewenthal

Server Infrastructure
Smals.be

Re: [Collectl-interest] colmux time format error

From: Hernan L. <her...@gm...> - 2016-06-23 02:38:06

Hello Mark,

I downloaded the latest version from Sourceforge and it seems to fix these
issues, even with RAW files generated by the (older?) version available on
Debian. I will use this version going forward, so we can declare the problem
resolved.

Thanks for your help,

Hernan


On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <mj...@gm...> wrote:

> Wow, that's a tricky one.  Quite honestly colmux has been so solid for me
> I haven't looked at the code in ages, but that doesn't mean anything
> either.  It's also amusing to note I had totally forgotten it supported the
> hostname address syntax you're using.  ;)  That allowed me to essentially
> use the same command you are, with one note.  I also added -test and see
> columns 10 and 20 are different from what you're saying.  Maybe you have a
> different kernel?  I'm on 4.4.7-1-amd64-hpelinux which is the linux we use
> for our Helion Cloud and is essentially debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20
>
>          [CPU:0]Idle%                  [CPU:1]Soft%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
> 12:08:27     -1     -1     -1 |      -1     -1     -1
> 12:08:28     -1     -1     -1 |      -1     -1     -1
> 12:08:29     95     -1    100 |       0     -1      0
> 12:08:30     95     97     98 |       0      0      0
> 12:08:31     97    100    100 |       0      0      0
> 12:08:32     87    100     89 |       0      0      0
> 12:08:33    100    100    100 |       0      0      0
> 12:08:34    100    100     99 |       0      0      0
> 12:08:35    100     97     97 |       0      0      0
> 12:08:36     99     98    100 |       0      0      0
>
> What you didn't say is does this fail all the time or intermittently.  If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
> Have you tried playing back a file with colmux yet?  If not, you can
> simply rerun the command but include -p and point it to the raw files.  The
> one thing I did discover is I think I introduced a bug some time in the
> past and you need to have the hostname portion of the string start with a
> wild card rather than anywhere in the middle.  And then to make matters
> worse I found a second bug and am using the wrong column during playback.
>  more digging into that required too.  ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P -p
> '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
>          [CPU:0]Totl%                  [CPU:1]Steal%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
>      99     99    100 |       0      0      0
>      98     99     97 |       0      0      0
>      94     98     94 |       0      0      0
>      94     93     92 |       0      0      0
>      99     94     98 |       0      0      0
>      99    100     99 |       0      0      0
>      99    100    100 |       0      0      0
>
> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so it may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one].  then, maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> her...@gm...> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>>    collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPU's).
>>
>> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
>> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>           'lasttime' => [
>>                           '',
>>                           '20160615'
>>                         ],
>>           'maxinst' => [
>>                          -1,
>>                          0
>>                        ],
>>           'lastinst' => [
>>                           -1,
>>                           0
>>                         ],
>>           'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>>
>>
>

[Collectl-interest] Fwd: colmux time format error

From: Hernan L. <her...@gm...> - 2016-06-23 00:05:25

---------- Forwarded message ----------
From: Hernan Laffitte <her...@gm...>
Date: Wed, Jun 22, 2016 at 4:45 PM
Subject: Re: [Collectl-interest] colmux time format error
To: Mark Seger <mj...@gm...>

Hello Mark,

Thanks for the reply! I finally had some time to run the additional tests
you requested. Some comments below...

On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <mj...@gm...> wrote:

> maybe you have a different kernel?
>

The machine where I am having this issue is running Debian "jessie/sid".
Kernel is:

   Linux spaa-1 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

The version of colmux I have installed is "colmux: 4.7.1 (Term::ReadKey:
V2.31 Threads: 1.86)"

When running the command in 'test' mode, the columns 10, 20, 30, ... were
the "%Idle" of the CPU's. Columns 11, 21, 31,... were the "%Total" of the
CPU's.
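
The two sets of column numbers in this thread can be reconciled with a little arithmetic. The sketch below is only an illustration: the per-CPU field names and their order are assumptions (chosen to agree with the 282 columns reported for 28 CPUs), the two "Extra" names are placeholders for whatever additional per-CPU fields a newer kernel exposes, and columns are assumed to be counted from 1 starting at the date field. Looking a column up by header name, rather than hard-coding 10 and 20, avoids depending on the kernel's field count.

```python
# Assumed per-CPU fields in plot format; names and order are not confirmed by the thread.
FIELDS_10 = ["User%", "Nice%", "Sys%", "Wait%", "Irq%",
             "Soft%", "Steal%", "Idle%", "Totl%", "Intrpt"]
# Hypothetical layout with two extra per-CPU fields (placeholder names).
FIELDS_12 = FIELDS_10[:9] + ["ExtraA%", "ExtraB%"] + FIELDS_10[9:]


def make_header(ncpu, fields):
    """Build a sample header line: date, time, then the fields of each CPU."""
    cols = ["#Date", "Time"]
    for cpu in range(ncpu):
        cols += ["[CPU:%d]%s" % (cpu, name) for name in fields]
    return " ".join(cols)


def column_of(header, name):
    """Return the 1-based column number of a named field."""
    return header.split().index(name) + 1


def name_at(header, column):
    """Return the field name found at a 1-based column number."""
    return header.split()[column - 1]


h10 = make_header(28, FIELDS_10)
h12 = make_header(28, FIELDS_12)
print(len(h10.split()))                 # total columns with 10 fields per CPU
print(column_of(h10, "[CPU:1]Idle%"))   # where CPU 1 idle lands in that layout
print(name_at(h12, 20))                 # what column 20 holds with 12 fields per CPU
```

With 10 fields per CPU, columns 10 and 20 are both Idle%; with 12 fields per CPU, column 20 falls on a different field of CPU 1, which would explain why the same -cols value shows different headers on different kernels.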

In both cases, the commands give this error all the time (not an
intermittent error). One or two "-1" rows appear, followed by the message:

   Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.

> What you didn't say is does this fail all the time or intermittently.  If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
>
The error occurs every time I try this command.

> Have you tried playing back a file with colmux yet?
>

I am gathering the output of collectl from all the machines into an NFS
directory. All the machines in the cluster have /var/log/collectl
 symlinked to /nfs/mnt/path/to/collectl

If I run the command via replay, it doesn't fail:

colmux -addr 'spaa-[1-3]' -command "-sC -oT -P -p
'/nfs/mnt/path/to/collectl/*20160621*raw.gz'" -cols 11,21 | less

However, in every row the 3 machines all have the same values for CPU0 and
CPU1. Something like:

#Time    spaa-1 spaa-2 spaa-3 |  spaa-1 spaa-2 spaa-3
...
      1      1      1 |       3      3      3
      0      0      0 |      11     11     11
...
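
One thing that may be worth checking, purely as a guess that the thread does not confirm: because every machine sees the same NFS directory, a pattern such as *20160621*raw.gz matches the raw files of all hosts on each host. The sketch below groups a directory listing by hostname so it is easy to see how many files a pattern picks up per machine. The filename layout (host-YYYYMMDD-HHMMSS.raw.gz) is inferred from the pattern Mark used, and the sample names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed raw-file naming: <host>-<YYYYMMDD>-<HHMMSS>.raw or .raw.gz
RAW_NAME = re.compile(r"^(?P<host>.+)-(?P<date>\d{8})-(?P<time>\d{6})\.raw(\.gz)?$")


def files_by_host(filenames):
    """Group raw-file names by the hostname embedded in each name."""
    groups = defaultdict(list)
    for name in filenames:
        match = RAW_NAME.match(name)
        if match:
            groups[match.group("host")].append(name)
    return dict(groups)


# Hypothetical listing of the shared directory.
listing = [
    "spaa-1-20160621-000000.raw.gz",
    "spaa-2-20160621-000000.raw.gz",
    "spaa-3-20160621-000000.raw.gz",
]
for host, names in sorted(files_by_host(listing).items()):
    print(host, names)
```

If the listing shows more than one host per pattern, a pattern that includes the hostname would narrow playback on each machine to its own file.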

> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so it may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one].  then, maybe I can track down why this is happening.
>
> -mark
>
>
Thanks Mark, I will send a copy of the raw files in a private email.

Regards,

Hernan

Re: [Collectl-interest] colmux time format error

From: Mark S. <mj...@gm...> - 2016-06-16 17:12:25

so while I haven't been able to repeat what you've reported I did find a
few bugs with playing back raw files in plot mode, so this has been a good
thing.  the biggest challenge is there are a lot of switch combinations in
native collectl and tossing colmux into the mix makes it even more
complicated, especially when you fear breaking something that already
works, but I think I've figured it out.  The other complication is the lack
of testing as I often feel like I'm the only one who uses some of the more
obscure, but useful, features.  Good to see you doing so too and if you
haven't yet tried playing back files across multiple machines I think
you'll discover a whole new power.  ;)
-mark

On Thu, Jun 16, 2016 at 8:30 AM, Mark Seger <mj...@gm...> wrote:

> Wow, that's a tricky one.  quite honestly colmux has been so solid for me
> I haven't looked at the code in ages, but that doesn't mean anything
> either.  It's also amusing to note I had totally forgotten it supported the
> hostname address syntax you're using.  ;)  That allowed me to essentially
> use the same command you are, with one note.  I also added -test and see
> columns 10 and 20 are different than you're saying.  maybe you have a
> different kernel?  I'm on 4.4.7-1-amd64-hpelinux which is the linux we use
> for our Helion Cloud and is essentially debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P" -cols 10,20
>
>          [CPU:0]Idle%                  [CPU:1]Soft%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
> 12:08:27     -1     -1     -1 |      -1     -1     -1
> 12:08:28     -1     -1     -1 |      -1     -1     -1
> 12:08:29     95     -1    100 |       0     -1      0
> 12:08:30     95     97     98 |       0      0      0
> 12:08:31     97    100    100 |       0      0      0
> 12:08:32     87    100     89 |       0      0      0
> 12:08:33    100    100    100 |       0      0      0
> 12:08:34    100    100     99 |       0      0      0
> 12:08:35    100     97     97 |       0      0      0
> 12:08:36     99     98    100 |       0      0      0
>
> What you didn't say is does this fail all the time or intermittently.  If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
> Have you tried playing back a file with colmux yet?  If not, you can
> simply rerun the command but include -p and point it to the raw files.  The
> one thing I did discover is I think I introduced a bug some time in the
> past and you need to have the hostname portion of the string start with a
> wild card rather than anywhere in the middle.  And then to make matters
> worse I found a second bug and am using the wrong column during playback.
>  more digging into that required too.  ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P -p
> '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
>          [CPU:0]Totl%                  [CPU:1]Steal%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
>      99     99    100 |       0      0      0
>      98     99     97 |       0      0      0
>      94     98     94 |       0      0      0
>      94     93     92 |       0      0      0
>      99     94     98 |       0      0      0
>      99    100     99 |       0      0      0
>      99    100    100 |       0      0      0
>
> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so it may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one].  then, maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> her...@gm...> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>>    collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPU's).
>>
>> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
>> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>           'lasttime' => [
>>                           '',
>>                           '20160615'
>>                         ],
>>           'maxinst' => [
>>                          -1,
>>                          0
>>                        ],
>>           'lastinst' => [
>>                           -1,
>>                           0
>>                         ],
>>           'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>>
>>
>

Re: [Collectl-interest] colmux time format error

From: Mark S. <mj...@gm...> - 2016-06-16 12:30:26

Wow, that's a tricky one.  quite honestly colmux has been so solid for me I
haven't looked at the code in ages, but that doesn't mean anything either.
It's also amusing to note I had totally forgotten it supported the hostname
address syntax you're using.  ;)  That allowed me to essentially use the
same command you are, with one note.  I also added -test and see columns 10
and 20 are different than you're saying.  maybe you have a different
kernel?  I'm on 4.4.7-1-amd64-hpelinux which is the linux we use for our
Helion Cloud and is essentially debian as well.

stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
-command "-sC -oT -P" -cols 10,20

         [CPU:0]Idle%                  [CPU:1]Soft%
#Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
12:08:27     -1     -1     -1 |      -1     -1     -1
12:08:28     -1     -1     -1 |      -1     -1     -1
12:08:29     95     -1    100 |       0     -1      0
12:08:30     95     97     98 |       0      0      0
12:08:31     97    100    100 |       0      0      0
12:08:32     87    100     89 |       0      0      0
12:08:33    100    100    100 |       0      0      0
12:08:34    100    100     99 |       0      0      0
12:08:35    100     97     97 |       0      0      0
12:08:36     99     98    100 |       0      0      0

What you didn't say is does this fail all the time or intermittently.  If
intermittent it will indeed be hard to track down, but there is hope too ;)

Have you tried playing back a file with colmux yet?  If not, you can simply
rerun the command but include -p and point it to the raw files.  The one
thing I did discover is I think I introduced a bug some time in the past
and you need to have the hostname portion of the string start with a wild
card rather than anywhere in the middle.  And then to make matters worse I
found a second bug and am using the wrong column during playback.  more
digging into that required too.  ;(

BUT if I add 1 to each column I think this looks right if you ignore what
the headers say:

stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
-command "-sC -oT -P -p
'/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more

         [CPU:0]Totl%                  [CPU:1]Steal%
#Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
     99     99    100 |       0      0      0
     98     99     97 |       0      0      0
     94     98     94 |       0      0      0
     94     93     92 |       0      0      0
     99     94     98 |       0      0      0
     99    100     99 |       0      0      0
     99    100    100 |       0      0      0

and since this is a playback command, you can use time ranges as well to
limit what is being displayed so it may help zero in on where in the data
the problem is and then maybe even send me a subset of the problem raw file
[use collectl --extract to create a new raw file from the time slice of an
old one].  then, maybe I can track down why this is happening.

-mark

On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <her...@gm...>
wrote:

> Hello,
>
> We are trying to gather detailed CPU usage from a number of machines in
> our cluster. In particular, we want to see usage of every individual CPU in
> a group of machines.
>
> With collectl, on a single machine, the command we can run is:
>
>    collectl -sC -oT -P
>
> Which gives us 282 columns (the machines have 28 CPU's).
>
> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
> "[CPU:1]Idle%"). The command we use is:
>
>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>
> This generates the error:
>
>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>
> The error occurs when parsing the field "lasttime" of a data structure
> $hostVars, which has the following content at the time of the error:
>
> {
>           'lasttime' => [
>                           '',
>                           '20160615'
>                         ],
>           'maxinst' => [
>                          -1,
>                          0
>                        ],
>           'lastinst' => [
>                           -1,
>                           0
>                         ],
>           'bufptr' => 1
> };
>
> I am currently running version "collectl V3.6.9-1
> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
> here?
>
>
> Thanks in advance,
>
> Hernan
>
>
>
>
>

[Collectl-interest] colmux time format error

From: Hernan L. <her...@gm...> - 2016-06-16 00:35:29

Hello,

We are trying to gather detailed CPU usage from a number of machines in our
cluster. In particular, we want to see usage of every individual CPU in a
group of machines.

With collectl, on a single machine, the command we can run is:

   collectl -sC -oT -P

Which gives us 282 columns (the machines have 28 CPU's).

Now we want to run a colmux command to see the idle time of CPU's 0 and 1
on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
"[CPU:1]Idle%"). The command we use is:

   colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20

This generates the error:

   Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.

The error occurs when parsing the field "lasttime" of a data structure
$hostVars, which has the following content at the time of the error:

{
          'lasttime' => [
                          '',
                          '20160615'
                        ],
          'maxinst' => [
                         -1,
                         0
                       ],
          'lastinst' => [
                          -1,
                          0
                        ],
          'bufptr' => 1
};

I am currently running version "collectl V3.6.9-1 (zlib:2.06,HiRes:1.9725)"
on Debian. Any idea of what may be the problem here?


Thanks in advance,

Hernan

Re: [Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)

From: Mark S. <mj...@gm...> - 2016-03-15 11:45:42

I was wondering if the tool might run something and measure times, clearly
something collectl can't do given the way it currently works - you wouldn't
want it to stall waiting for something.

If you want to get an idea what it takes to write a collectl plugin, have a
look at /usr/share/collectl/misc.ph which is a fairly simple plugin that
does a lot of things one could even exclude depending on your needs.  To
use it you simply do something like this:

$ collectl --import misc
waiting for 1 second sample...
#<------Misc------>
# UTim  MHz MT Log
     5 1272  0   5
     5 1800  0   5
     5 1236  0   5

and what you see are the cpu speed, nfs mounts (I actually forgot it did
this ;)) and how many users are logged in.  Naturally being fully
integrated, you can also combine its output with other things collectl
knows about, like this, noting collectl also has its own version of
'hello world':

$ collectl --import hello:misc -sc
waiting for 1 second sample...
#<----CPU[HYPER]-----><-Hello-><------Misc------>
#cpu sys inter  ctxsw   Total   UTim  MHz MT Log
   2   0  7877  32674     140       5 1218  0   5
   1   0  5161  16135     230       5 1416  0   5


and you can also get the output included in the tab file so you can plot it.

-mark



On Mon, Mar 14, 2016 at 11:45 AM, Thomas Oliw <tho...@er...>
wrote:

> Hi Mark,
>
>
>
> Thanks!
>
>
>
> Yes, another switch.. How I have waited! ☺
>
>
>
> I looked at /proc/self/mountstats and could not see any timing data
> either.. So where nfsiostat gets the RTT values is a bit of a mystery.
>
> I did find some references to a nfs-iostat.py script that might give a
> clue.
>
>
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfs-iostat/nfs-iostat.py;h=9626d42609b9485c7fda0c9ef69d698f9fa929fd;hb=HEAD
>
> I think it runs several times and calculates delta?!
>
>
>
> If it helps, this is output from nfsiostat on one of our RedHat 6.5
> servers:
>
> (As you can see the RTT for write operations is very high).
>
>
>
> [root@myserver collectl]# nfsiostat 10 2 /proj/eiffel002_config_fem001
>
> nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
>
>    op/s         rpc bklog
> 218.81            0.00
> read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
>                  12.293         697.798          56.763        0 (0.0%)           7.972          15.149
> write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
>                  38.325         1322.385         34.505      392 (0.0%)          35.313         1237.445
>
> nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
>
>    op/s         rpc bklog
>   68.20            0.00
> read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
>                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000
> write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
>                  69.700         4481.492         64.297        0 (0.0%)         658.139         71380.063
>
>
>
>
>
> I understand that  this is hard to build and test without NFS systems!
>
> I just wanted to throw out the suggestion and see what happens.
>
>
>
> Learning to write a plugin for collectl is tempting. I am not a
> programmer, but have fiddled around with some simple perl scripts in the
> past.
>
> I’ll do some reading on the webpage and try to get an idea of how the
> plugin stuff works.
>
> You’ll never get rid of me if I start.. ☺
>
>
>
> Kind Regards,
>
>
>
> Thomas
>
>
>
>
>
>
>
> *From:* Mark Seger [mailto:mj...@gm...]
> *Sent:* den 14 mars 2016 14:56
> *To:* Thomas Oliw
> *Cc:* col...@li...
> *Subject:* Re: [Collectl-interest] Suggestion: Additional NFS data from
> /proc/self/mountstats (same as nfsiostat-command)
>
>
>
> Always happy to hear from happy users.
>
>
>
> I just looked at /proc/xx/mountstats, which actually applies to all pids,
> self is just a shortcut to yourself.  The problem with pid-based stats is
> it can be a lot of overhead to read any more stats than collectl already
> reads, but my thought was I might be able to add something optionally.  Oh
> boy, another switch!  ;)
>
>
>
> But when I looked at these stats I didn't see anything about timing and only
> saw info on what is mounted.  That said, I'd think since nfs is a shared
> resource, there might be timing data for nfs in general, but my systems
> currently don't use nfs and I might need to do some experiments to see what
> happens if/when I do configure it.
>
>
>
> Worst case, especially if you're a collectl fan, you might be able to
> write your own plugin if you're a perl user.  The benefit there is once you
> see how easy it is to write a plugin you then might be able to add even
> more metrics, possibly at the application level if you find that useful.
> If so, I'm always ready to help...
>
>
>
> I'm out of town this week but I'll try to revisit next week when I return.
>
>
>
> -mark
>
>
>
> On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...>
> wrote:
>
> Hi,
>
>
>
> I love collectl and use it extensively for many performance related
> troubleshooting/monitoring  tasks in our server park.
>
> The possibility to run live and/or record to file is a fantastic mix of
> features and very useful!
>
>
>
> However, one thing that I miss, is NFS Response time data…
>
> We use lots of NFS shares in our environment, and that particular metric
> is one of the most useful ones in my opinion.
>
>
>
> As a complement to collectl, I use “nfsiostat” when NFS is suspected to be
> a performance bottleneck.
>
> It shows me a number of good metrics and has a “RTT” (Round Trip Time)
> field, that at least gives me a hint of the NFS server response time.
>
> If I read the documentation correctly, it gets its data from  /proc/self/mountstats.
>
>
>
> I think it would be very useful if those metrics could be collected in collectl as well.
>
> The nfsiostat tool itself is a bit crude, at least in our somewhat aged
> RedHat environment, and for us it would be convenient to have these metrics
> managed with collectl instead.
>
>
>
> Just a suggestion…
>
> Thanks for the collectl tool!!
>
>
>
> Kind Regards,
>
>
>
> Thomas Oliw
>
>
>
>
>
>
>
>
>
>
>
>
>
>

Re: [Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)

From: Thomas O. <tho...@er...> - 2016-03-14 15:45:13

Hi Mark,

Thanks!

Yes, another switch.. How I have waited! ☺

I looked at /proc/self/mountstats and could not see any timing data either.. So where nfsiostat gets the RTT values is a bit of a mystery.
I did find some references to a nfs-iostat.py script that might give a clue.
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfs-iostat/nfs-iostat.py;h=9626d42609b9485c7fda0c9ef69d698f9fa929fd;hb=HEAD
I think it runs several times and calculates delta?!
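
The delta idea can be sketched as follows. As far as I understand the format, each NFS mount in /proc/self/mountstats carries per-operation lines (READ:, WRITE:, ...) whose fields are cumulative counters: operations, transmissions, timeouts, bytes sent, bytes received, queue time, RTT and execute time, the last three in milliseconds. Average RTT over an interval is then the RTT delta divided by the operations delta. The field positions are an assumption about the kernel's format and the two sample lines are made up for illustration.

```python
def parse_op_line(line):
    """Split one per-operation line into its name and the counters used here."""
    name, _, rest = line.strip().partition(":")
    fields = [int(value) for value in rest.split()]
    # Assumed positions: 0 = operations, 6 = cumulative RTT (ms), 7 = cumulative execute (ms)
    return name, {"ops": fields[0], "rtt_ms": fields[6], "exe_ms": fields[7]}


def avg_per_op(before, after):
    """Return (avg RTT ms, avg execute ms) per operation between two samples."""
    _, first = parse_op_line(before)
    _, second = parse_op_line(after)
    delta_ops = second["ops"] - first["ops"]
    if delta_ops == 0:
        return 0.0, 0.0  # no operations in the interval
    return ((second["rtt_ms"] - first["rtt_ms"]) / delta_ops,
            (second["exe_ms"] - first["exe_ms"]) / delta_ops)


# Hypothetical samples of the same WRITE line taken one interval apart.
sample_1 = "WRITE: 1000 1000 0 500000 40000 2000 30000 900000"
sample_2 = "WRITE: 1400 1400 0 700000 56000 2500 44000 1400000"
print(avg_per_op(sample_1, sample_2))
```

Because only two reads of a file and a subtraction are involved, nothing here has to stall waiting for an NFS operation to complete.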

If it helps, this is output from nfsiostat on one of our RedHat 6.5 servers:
(As you can see the RTT for write operations is very high).

[root@myserver collectl]# nfsiostat 10 2 /proj/eiffel002_config_fem001

nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:

   op/s         rpc bklog
218.81            0.00
read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                 12.293         697.798          56.763        0 (0.0%)           7.972          15.149
write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                 38.325         1322.385         34.505      392 (0.0%)          35.313         1237.445

nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:

   op/s         rpc bklog
  68.20            0.00
read:             ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                  0.000           0.000           0.000        0 (0.0%)           0.000           0.000
write:            ops/s            kB/s           kB/op         retrans         avg RTT (ms)    avg exe (ms)
                 69.700         4481.492         64.297        0 (0.0%)         658.139         71380.063


I understand that  this is hard to build and test without NFS systems!
I just wanted to throw out the suggestion and see what happens.

Learning to write a plugin for collectl is tempting. I am not a programmer, but have fiddled around with some simple perl scripts in the past.
I’ll do some reading on the webpage and try to get an idea of how the plugin stuff works.
You’ll never get rid of me if I start.. ☺

Kind Regards,

Thomas



From: Mark Seger [mailto:mj...@gm...]
Sent: den 14 mars 2016 14:56
To: Thomas Oliw
Cc: col...@li...
Subject: Re: [Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)

Always happy to hear from happy users.

I just looked at /proc/xx/mountstats, which actually applies to all pids, self is just a shortcut to yourself.  The problem with pid-based stats is it can be a lot of overhead to read any more stats than collectl already reads, but my thought was I might be able to add something optionally.  Oh boy, another switch!  ;)

But when I looked at these stats I didn't see anything about timing and only saw info on what is mounted.  That said, I'd think since nfs is a shared resource, there might be timing data for nfs in general, but my systems currently don't use nfs and I might need to do some experiments to see what happens if/when I do configure it.

Worst case, especially if you're a collectl fan, you might be able to write your own plugin if you're a perl user.  The benefit there is once you see how easy it is to write a plugin you then might be able to add even more metrics, possibly at the application level if you find that useful.  If so, I'm always ready to help...

I'm out of town this week but I'll try to revisit next week when I return.

-mark

On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...<mailto:tho...@er...>> wrote:
Hi,

I love collectl and use it extensively for many performance related troubleshooting/monitoring  tasks in our server park.
The possibility to run live and/or record to file is a fantastic mix of features and very useful!

However, one thing that I miss, is NFS Response time data…
We use lots of NFS shares in our environment, and that particular metric is one of the most useful ones in my opinion.

As a complement to collectl, I use “nfsiostat” when NFS is suspected to be a performance bottleneck.
It shows me a number of good metrics and has a “RTT” (Round Trip Time) field, that at least gives me a hint of the NFS server response time.

If I read the documentation correctly, it gets its data from  /proc/self/mountstats.



I think it would be very useful if those metrics could be collected in collectl as well.
The nfsiostat tool itself is a bit crude, at least in our somewhat aged RedHat environment, and for us it would be convenient to have these metrics managed with collectl instead.

Just a suggestion…
Thanks for the collectl tool!!

Kind Regards,

Thomas Oliw







Re: [Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)

From: Mark S. <mj...@gm...> - 2016-03-14 13:55:51

Always happy to hear from happy users.

I just looked at /proc/xx/mountstats, which actually exists for every pid;
self is just a shortcut to your own process.  The problem with pid-based
stats is it can be a lot of overhead to read any more stats than collectl
already reads, but my thought was I might be able to add something
optionally.  Oh boy, another switch!  ;)

But when I looked at these stats I didn't see anything about timing and only
saw info on what is mounted.  That said, I'd think since nfs is a shared
resource, there might be timing data for nfs in general, but my systems
currently don't use nfs and I might need to do some experiments to see what
happens if/when I do configure it.

Worst case, especially if you're a collectl fan, you might be able to write
your own plugin if you're a perl user.  The benefit there is that once you
see how easy it is to write a plugin, you might then be able to add even
more metrics, possibly at the application level if you find that useful.
If so, I'm always ready to help...

I'm out of town this week but I'll try to revisit next week when I return.

-mark

On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...>
wrote:

> Hi,
>
> I love collectl and use it extensively for many performance-related
> troubleshooting/monitoring tasks in our server park.
> The possibility to run live and/or record to file is a fantastic mix of
> features and very useful!
>
> However, one thing that I miss is NFS response time data…
> We use lots of NFS shares in our environment, and that particular metric
> is one of the most useful ones in my opinion.
>
> As a complement to collectl, I use “nfsiostat” when NFS is suspected to be
> a performance bottleneck.
> It shows me a number of good metrics and has an “RTT” (Round Trip Time)
> field that at least gives me a hint of the NFS server response time.
>
> If I read the documentation correctly, it gets its data from /proc/self/mountstats.
>
> I think it would be very useful if those metrics could be collected in collectl as well.
> The nfsiostat tool itself is a bit crude, at least in our somewhat aged
> Red Hat environment, and for us it would be convenient to have these metrics
> managed with collectl instead.
>
> Just a suggestion…
> Thanks for the collectl tool!!
>
> Kind Regards,
>
> Thomas Oliw

[Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)

From: Thomas O. <tho...@er...> - 2016-03-14 12:13:09

Hi,

I love collectl and use it extensively for many performance-related troubleshooting/monitoring tasks in our server park.
The possibility to run live and/or record to file is a fantastic mix of features and very useful!

However, one thing that I miss is NFS response time data...
We use lots of NFS shares in our environment, and that particular metric is one of the most useful ones in my opinion.

As a complement to collectl, I use "nfsiostat" when NFS is suspected to be a performance bottleneck.
It shows me a number of good metrics and has an "RTT" (Round Trip Time) field that at least gives me a hint of the NFS server response time.

If I read the documentation correctly, it gets its data from /proc/self/mountstats.
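
For illustration, here is a minimal Python sketch of how an average RTT per NFS operation could be derived from mountstats-style text. It is not taken from nfsiostat or collectl. It assumes the per-op line layout used by Linux kernels (operation count first, cumulative RTT in milliseconds as the seventh counter, with newer kernels possibly appending more counters), and the function name, mount point, and sample values are all made up for the example.

```python
from typing import Dict, Iterable

# Assumed positions of the counters that follow the "OP:" label on a per-op line.
OPS, RTT_MS = 0, 6  # operation count, cumulative round-trip time in ms

# Made-up sample in the assumed mountstats layout (values are illustrative only).
SAMPLE = """device proc mounted on /proc with fstype proc
device server:/export mounted on /mnt/data with fstype nfs statvers=1.1
\topts:\trw,vers=3
\tage:\t1000
\tper-op statistics
\t        NULL: 0 0 0 0 0 0 0 0
\t        READ: 10 10 0 1200 40960 5 250 260
\t       WRITE: 4 4 0 8192 512 2 100 110
"""


def parse_nfs_rtt(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return {mount point: {operation: average RTT in ms}} for NFS mounts."""
    result: Dict[str, Dict[str, float]] = {}
    mount = None       # mount point of the NFS device currently being read
    in_per_op = False  # True once the "per-op statistics" header was seen
    for raw in lines:
        line = raw.strip()
        if line.startswith("device "):
            # "device <src> mounted on <dir> with fstype <type> [statvers=...]"
            words = line.split()
            fstype = words[words.index("fstype") + 1] if "fstype" in words else ""
            mount = words[4] if fstype in ("nfs", "nfs4") else None
            in_per_op = False
            if mount is not None:
                result[mount] = {}
        elif mount is not None and line.startswith("per-op statistics"):
            in_per_op = True
        elif mount is not None and in_per_op and ":" in line:
            op, _, rest = line.partition(":")
            counters = rest.split()
            # Skip operations that never ran to avoid dividing by zero.
            if len(counters) > RTT_MS and int(counters[OPS]) > 0:
                result[mount][op.strip()] = int(counters[RTT_MS]) / int(counters[OPS])
    return result


if __name__ == "__main__":
    print(parse_nfs_rtt(SAMPLE.splitlines()))
    # prints {'/mnt/data': {'READ': 25.0, 'WRITE': 25.0}}
```

On a host with NFS mounts, the same function could be fed the real file, for example `parse_nfs_rtt(open("/proc/self/mountstats"))`. The counters are cumulative since mount, so this yields a lifetime average; a per-interval figure like the one nfsiostat shows would need two samples and a division of the RTT delta by the operation-count delta.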



I think it would be very useful if those metrics could be collected in collectl as well.
The nfsiostat tool itself is a bit crude, at least in our somewhat aged Red Hat environment, and for us it would be convenient to have these metrics managed with collectl instead.

Just a suggestion...
Thanks for the collectl tool!!

Kind Regards,

Thomas Oliw

3 messages have been excluded from this view by a project administrator.
