From: Puneet B. <bak...@gm...> - 2020-02-07 05:26:59
Hi,

I want to gather Lustre stats from collectl, but I am only getting blank lines. What am I doing wrong?

Collectl version (3.6.9)
collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9#

Perl version (v5.26.1)
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# perl --version
This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi
(with 67 registered patches, see perl -V for more detail)

Lustre version (2.12.2_178_ga0680fe_dirty)
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# lctl get_param version
version=2.12.2_178_ga0680fe_dirty
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# uname -a
Linux dgx1 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# uname -r
4.15.0-45-generic

Collectl Lustre run (getting blank lines)
root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# ./collectl.pl --verbose --import lustreMDS,s
Use of uninitialized value $strace in pattern match (m//) at ./formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at ./formatit.ph line 181.
waiting for 1 second sample...
### RECORD 1 >>> dgx1 <<< (1581052775.001) (Fri Feb 7 10:49:35 2020) ###
### RECORD 2 >>> dgx1 <<< (1581052776.001) (Fri Feb 7 10:49:36 2020) ###
### RECORD 3 >>> dgx1 <<< (1581052777.001) (Fri Feb 7 10:49:37 2020) ###
### RECORD 4 >>> dgx1 <<< (1581052778.001) (Fri Feb 7 10:49:38 2020) ###
^COuch!

root@dgx1:~/pb/collectl/3.6.9/collectl-3.6.9# ./collectl.pl --verbose --import lustreOSS,s
Use of uninitialized value $strace in pattern match (m//) at ./formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at ./formatit.ph line 181.
waiting for 1 second sample...
### RECORD 1 >>> dgx1 <<< (1581053094.001) (Fri Feb 7 10:54:54 2020) ###
### RECORD 2 >>> dgx1 <<< (1581053095.001) (Fri Feb 7 10:54:55 2020) ###
### RECORD 3 >>> dgx1 <<< (1581053096.001) (Fri Feb 7 10:54:56 2020) ###
^COuch!

When I tried the updated version of collectl (V4.3.1-1), the same problem persisted.

root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1# collectl --version
collectl V4.3.1-1 (zlib:2.074,HiRes:1.9741)
Copyright 2003-2018 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1#
root@dgx1:~/pb/collectl/4.3.1/collectl-4.3.1# ./collectl --verbose --import lustreMDS,s
Use of uninitialized value $strace in pattern match (m//) at /root/pb/collectl/4.3.1/collectl-4.3.1/formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at /root/pb/collectl/4.3.1/collectl-4.3.1/formatit.ph line 181.
waiting for 1 second sample...
### RECORD 1 >>> dgx1 <<< (1581052937.001) (Fri Feb 7 10:52:17 2020) ###
### RECORD 2 >>> dgx1 <<< (1581052938.001) (Fri Feb 7 10:52:18 2020) ###

Regards,
~Puneet
From: Puneet B. <bak...@gm...> - 2020-02-07 04:58:37
Why does collectl (V4.1.0-1) not have a "lustre" subsystem? What am I missing?
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0# collectl --version
collectl V4.1.0-1 (zlib:2.074,HiRes:1.9741)
Copyright 2003-2016 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the source kit
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0#
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0# collectl --showsubsys
The following subsystems can be specified in any combinations with -s or
--subsys in both record and playback mode. [default=bcdfijmnstx]
These generate summary, which is the total of ALL data for a particular type
b - buddy info (memory fragmentation)
c - cpu
d - disk
f - nfs
i - inodes
j - interrupts by CPU
m - memory
n - network
s - sockets
t - tcp
x - interconnect (currently supported: OFED/Infiniband)
y - slabs
These generate detail data, typically but not limited to the device level
C - individual CPUs, including interrupts if -sj or -sJ
D - individual Disks
E - environmental (fan, power, temp) [requires ipmitool]
F - nfs data
J - interrupts by CPU by interrupt number
M - memory numa/node
N - individual Networks
T - tcp details (lots of data!)
X - interconnect ports/rails (Infiniband/Quadrics)
Y - slabs/slubs
Z - processes
An alternative format lets you add and/or subtract subsystems to the defaults by
immediately following -s with a + and/or -
eg: -s+YZ-x adds slabs & processes and removes interconnet summary data
-s-n removes network summary data
-s-all removes ALL subsystems, something that can handy when playing back
data collected with --import and you ONLY want to see that data
root@dgx1:~/pb/collectl/4.1.0/collectl-4.1.0#
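The add/subtract form described in the help text can be illustrated with a small Perl sketch. This is a hypothetical resolver written for illustration, not collectl's own option parser; it assumes, as the help text says, that a plain -s value replaces the defaults, that +/- edit the default set, and that -all clears it.

```perl
use strict;
use warnings;

# Hypothetical resolver for the -s add/subtract syntax (not collectl's code).
# A value starting with + or - edits the default set; anything else replaces it.
sub resolve_subsys {
    my ($defaults, $spec) = @_;
    return $spec unless $spec =~ /^[+-]/;
    my %on = map { $_ => 1 } split //, $defaults;
    while ($spec =~ /([+-])([^+-]+)/g) {
        my ($op, $chars) = ($1, $2);
        if ($op eq '-' && $chars eq 'all') {   # -all clears every subsystem
            %on = ();
            next;
        }
        for my $c (split //, $chars) {
            if ($op eq '+') { $on{$c} = 1 } else { delete $on{$c} }
        }
    }
    return join '', sort keys %on;   # ASCII order: upper case sorts first
}

print resolve_subsys('bcdfijmnstx', '+YZ-x'), "\n";   # prints YZbcdfijmnst
```

With the default string bcdfijmnstx from the help output, -s+YZ-x adds Y and Z and drops x, matching the "adds slabs & processes and removes interconnect summary data" description.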
From: Puneet B. <bak...@gm...> - 2020-02-06 10:05:31
Hi,

I want to use collectl (V4.1.0-1) to get Lustre-specific stats (Lustre version=2.12.2_178_ga0680fe_dirty). However, it says "-sl disabled because this system does not have lustre modules installed", even though the system does have the necessary Lustre modules. Can somebody help me resolve this issue?

root@dgx1:~# collectl -sL
Use of uninitialized value $strace in pattern match (m//) at /usr/share/collectl/formatit.ph line 178.
Use of uninitialized value $speed in numeric gt (>) at /usr/share/collectl/formatit.ph line 181.
-sl disabled because this system does not have lustre modules installed
Error: no subsystems selected
type 'collectl -h' for help
root@dgx1:~#

root@dgx1:~# collectl -sl
Error: invalid subsystem 'l'
type 'collectl -h' for help
root@dgx1:~#

Following are the system details.

root@dgx1:~# uname -r
4.15.0-45-generic

root@dgx1:~# uname -a
Linux dgx1 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@dgx1:~# lctl get_param version
version=2.12.2_178_ga0680fe_dirty

root@dgx1:~# lsmod | grep lustre
lustre 737280 2093
lmv 180224 3 lustre
mdc 237568 3 lustre
lov 311296 1397 lustre
ptlrpc 1306624 8 fld,osc,fid,mgc,lov,mdc,lmv,lustre
obdclass 2158592 1421 fld,osc,fid,ptlrpc,mgc,lov,mdc,lmv,lustre
lnet 557056 7 osc,ko2iblnd,obdclass,ptlrpc,mgc,lmv,lustre
libcfs 471040 12 fld,lnet,osc,fid,ko2iblnd,obdclass,ptlrpc,mgc,lov,mdc,lmv,lustre

root@dgx1:~# collectl --version
collectl V4.1.0-1 (zlib:2.074,HiRes:1.9741)
Copyright 2003-2016 Hewlett-Packard Development Company, L.P.
collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit

Regards,
~Puneet
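The thread does not show how collectl decides whether Lustre is present, so a first diagnostic step is to compare what the node actually exposes. The Perl sketch below is a hypothetical helper, not part of collectl: it checks /proc/modules (the data `lsmod` formats) for a loaded lustre module and reports which of the Lustre parameter trees /proc/fs/lustre and /sys/fs/lustre exist. Newer Lustre releases have been moving parameters from /proc to /sys; whether that is what trips collectl's check here is an assumption, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Hypothetical diagnostic (not collectl's detection logic).
# True if $name is listed in /proc/modules-style text, which is the data
# `lsmod` formats; the module name is the first field of each line.
sub module_loaded {
    my ($fh, $name) = @_;
    while (my $line = <$fh>) {
        my ($mod) = split ' ', $line;
        return 1 if defined $mod && $mod eq $name;
    }
    return 0;
}

if (open(my $fh, '<', '/proc/modules')) {
    printf "lustre module loaded: %s\n", module_loaded($fh, 'lustre') ? 'yes' : 'no';
    close $fh;
}

# Report which Lustre parameter trees exist on this node.
for my $dir ('/proc/fs/lustre', '/sys/fs/lustre') {
    printf "%-16s %s\n", $dir, (-d $dir ? 'present' : 'absent');
}
```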
From: Mark S. <mj...@gm...> - 2019-10-25 12:54:56
Thanks for the patch, but unfortunately for collectl, I retired a few months
ago and have not been able to convince anyone at HPE to pick it up and
continue supporting it. I do keep hoping someone will, and I'd be more than
happy to help answer any development questions.
-mark
From: Florian, B. <flo...@at...> - 2019-10-23 12:48:02
Hello,
The data sent by collectl to graphite_exporter triggers a bug during the formatting of environment data:
Aug 18 01:45:33 <HOSTNAME> graphite_exporter[15097]: time="2019-08-18T01:45:33+02:00" level=info msg="Invalid value in line: <HOSTNAME>.env.BTemp1 sf 1566085530" source="main.go:112"
The environment data value was set to « sf », which triggers an "Invalid value" notification.
A single run of the Environment subsystem (the E parameter of the -s option) between a host and the graphite daemon shows this behaviour:
<host># collectl -sE -i::1 -c1 --export graphite,<graphite_host>
<graphite_host># tcpdump
[...]
0x0000: 4500 0057 9ca7 4000 4006 4545 0a82 21b3 E..W..@.@.EE..!.
0x0010: 0a82 21fe dafe 07d3 57d9 6ddd fbf7 1fb5 ..!.....W.m.....
0x0020: 8018 00de cdde 0000 0101 080a a2e9 f028 ...............(
0x0030: a2f4 333b 0000 0000 0000 0000 002e 656e ..3;<HOSTNAME>.en
0x0040: 762e 4254 656d 7032 2073 6620 3135 3636 v.BTemp2.sf.1566
0x0050: 3339 3231 3835 0a 392185.
<host># tcpdump
[...]
0x0000: 4500 0057 b54f 4000 4006 2c9d 0a82 21b3 E..W.O@.@.,...!.
0x0010: 0a82 21fe dafe 07d3 57de dd56 fbf7 1fb5 ..!.....W..V....
0x0020: 8018 00de 2e44 0000 0101 080a a2f1 09af .....D..........
0x0030: a2fb 4cc2 0000 0000 0000 0000 002e 656e ..L.<HOSTNAME>.en
0x0040: 762e 4254 656d 7032 2073 6620 3135 3636 v.BTemp2.sf.1566
0x0050: 3339 3236 3530 0a 392650.
These two network dumps show that the data is already corrupt when the host sends it, which is why the graphite host receives bad values.
Running the same collection without exporting to the graphite host shows correct values:
<host># collectl -sE -i::1 -c1 -f /var/log/collectl/
<host># ls -l /var/log/collectl/<host>2-20190821-145917.raw.gz
-rw-r--r--. 1 root root 975 Aug 21 14:59 /var/log/collectl/<host>-20190821-145917.raw.gz
<host># zgrep "Blade Temp" /var/log/collectl/<host>-20190821-145917.raw.gz
ipmi: Blade Temp1,28,degrees C,ok
ipmi: Blade Temp2,31,degrees C,ok
ipmi: Blade Temp1,28,degrees C,ok
ipmi: Blade Temp2,31,degrees C,ok
The collected data is sent to the graphite daemon by the sendData function, called at line 445 of /usr/share/collectl/graphite.ph and defined at line 460. This function takes 4 arguments; the 4th is the float precision used when formatting the data:
445 sendData("env.$name$inst", $name, $ipmiData->{$key}->[$i]->{value}, '%s');
[...]
460 sub sendData
461 {
462 my $name= shift;
463 my $units=shift;
464 my $value=shift;
465 my $numpl=shift; # number of decimal places
[...]
516 my $valString=(!defined($numpl)) ? sprintf('%d', $value) : sprintf("%.${numpl}f", $value);
517 my $message=sprintf("$graphiteBefore$graphiteMyHost$graphitePost.$name $valString %d\n", $graphiteIntTimeLast);
518 print $message if $graphiteDebug & 1;
519 if (!($graphiteDebug & 8))
520 {
521 my $bytes=syswrite($graphiteSocket, $message, length($message), 0);
522 }
The precision is passed as the string '%s', which cannot be used inside the float format built at line 516: interpolating "%s" produces the format "%.%sf". The values sent are therefore "sf" instead of a floating-point number.
The fix is to set the precision parameter to an integer constant instead of the string '%s'. A patch is attached (the second change just makes the format-string generation more readable).
Thank you
Florian
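The failure mode described above can be reproduced and guarded against in a few lines of Perl. This is a minimal sketch, assuming a hypothetical `format_value` helper that stands in for the formatting step inside sendData; it is not the attached patch. Instead of interpolating the precision into the format string, it validates it and passes it through sprintf's `%.*f`, which takes the precision from the argument list.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for the formatting step in sendData.
# $numpl is the number of decimal places; undef means "format as an integer".
sub format_value {
    my ($value, $numpl) = @_;
    return sprintf('%d', $value) if !defined $numpl;
    # Reject precisions such as '%s', which would otherwise be interpolated
    # into the format string as "%.%sf".
    die "numpl must be a non-negative integer, got '$numpl'\n"
        unless $numpl =~ /\A\d+\z/;
    # '%.*f' takes the precision from the argument list.
    return sprintf('%.*f', $numpl, $value);
}

# The format string that the '%s' argument at line 445 ends up building:
my $numpl  = '%s';
my $badFmt = "%.${numpl}f";
print "$badFmt\n";                   # prints %.%sf

print format_value(28, 0), "\n";     # prints 28
print format_value(31.4, 1), "\n";   # prints 31.4
```

Validating the argument makes a bad caller fail loudly at the source rather than silently shipping "sf" to graphite.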
From: Mark S. <mj...@gm...> - 2019-08-13 20:03:57
After developing and supporting collectl for close to 20 years, I have retired and, as a result, no longer have access to systems suitable for providing support. I have contacted several people at my previous employers looking for someone to take over, but to date have not had any responses. Therefore, if someone wants to raise their hand and fork a new instance, by all means do so, and I'll be happy to help in any way I can. I would recommend that the reincarnation happen on GitHub. Meanwhile, I will continue to provide email support; I just don't have the means to make or test code changes. Of course, if someone would like some personal handholding or even training, shoot me an email and we can discuss.
-mark
From: Mark S. <mj...@gm...> - 2017-06-20 20:34:42
Quite honestly, there are so many stats that I'm not familiar with all of them in detail. For example, memory stats come from /proc/meminfo and /proc/vmstat, the same sources virtually all utilities that report this data use. It might be helpful to look at some other utilities to confirm they're reporting the same numbers and to see how they've documented them. If there are better words I can add to collectl's descriptions, I'll be happy to do so.
-mark
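Following the suggestion to cross-check collectl against the raw counters, here is a minimal Perl sketch that samples /proc/vmstat twice, one second apart, and prints per-second swap-in and swap-out rates. It assumes the kernel's `pswpin`/`pswpout` counters (cumulative pages swapped in and out since boot) and a 4 KiB page size for the KB conversion; the helper names are hypothetical, and it does not reproduce collectl's own calculation.

```perl
use strict;
use warnings;

# Read "name value" lines (the /proc/vmstat format) into a hash.
sub read_vmstat {
    my ($fh) = @_;
    my %v;
    while (my $line = <$fh>) {
        my ($key, $val) = split ' ', $line;
        $v{$key} = $val if defined $val;
    }
    return \%v;
}

# Per-second rate of a cumulative counter between two samples.
sub rate {
    my ($before, $after, $key, $secs) = @_;
    return (($after->{$key} // 0) - ($before->{$key} // 0)) / $secs;
}

sub sample {
    open(my $fh, '<', '/proc/vmstat') or return;
    my $v = read_vmstat($fh);
    close $fh;
    return $v;
}

# Two samples one second apart. ASSUMPTION: 4 KiB pages.
my $t0 = sample();
if ($t0) {
    sleep 1;
    my $t1 = sample() or die "could not re-read /proc/vmstat\n";
    for my $key (qw(pswpin pswpout)) {
        my $pages = rate($t0, $t1, $key, 1);
        printf "%-8s %8.0f pages/s (~%.0f KB/s)\n", $key, $pages, $pages * 4;
    }
}
```

Comparing these page rates with collectl's swap In/Out columns (and with the paging In/Out columns next to them) shows whether the displayed unit matches the raw counter.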
From: Fulvio S. <tra...@gm...> - 2017-06-20 08:38:24
Hello.

I was emptying out a swap volume (swapoff) and looking at the output of collectl -sm --verbose -oT while trying to make sense of a system whose swap usage kept rising despite vm.swappiness being set to 0. Since the total swap was progressively shrinking from a few hundred MB, I can't quite understand the swapped-in values in the GB range (e.g. 6220M, 8536M). Is it a collectl bug or just a misunderstanding on my part?

# uname -a
Linux mail 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# MEMORY SUMMARY
#<------------------------------------Physical Memory------------------------------------------><-----------Swap------------><-------Paging------>
# Total Used Free Buff Cached Slab Mapped Anon AnonH Commit Locked Inact Total Used Free In Out Fault MajFt In Out
10:24:46 28143M 27931M 216776K 1504M 5453M 1089M 348180K 19686M 17818M 21190M 3656K 3186M 247M 247M 0 6220M 0 15 0 6240 164
10:24:47 28143M 27936M 211496K 1504M 5454M 1089M 348180K 19691M 17818M 21190M 3656K 3186M 240M 240M 0 6472M 0 1 0 6472 1860
10:24:48 28143M 27945M 203004K 1504M 5455M 1089M 348180K 19698M 17818M 21190M 3656K 3186M 232M 232M 0 8484M 0 44 0 9020 664
10:24:49 28143M 27927M 220972K 1500M 5437M 1089M 347972K 19702M 17818M 21190M 3656K 3180M 227M 227M 0 4768M 0 113 0 4772 2876
10:24:50 28143M 27934M 214276K 1500M 5438M 1089M 347972K 19708M 17818M 21190M 3656K 3179M 221M 221M 0 6752M 0 67 0 7464 1468
10:24:51 28143M 27943M 204968K 1500M 5439M 1089M 347976K 19715M 17818M 21190M 3656K 3179M 212M 212M 0 8536M 0 278 0 9184 364
10:24:52 28143M 27927M 220716K 1498M 5419M 1089M 347908K 19721M 17816M 21190M 3656K 3173M 205M 205M 0 7428M 0 388 0 7724 1444
10:24:53 28143M 27934M 213628K 1498M 5421M 1089M 347908K 19728M 17816M 21190M 3656K 3173M 197M 197M 0 7812M 0 95 0 7864 320
10:24:54 28143M 27941M 206700K 1499M 5423M 1089M 347908K 19732M 17816M 21190M 3656K 3173M 192M 192M 0 5576M 0 77 0 6080 388
10:24:55 28143M 27925M 223432K 1496M 5404M 1089M 347940K 19736M 17816M 21190M 3656K 3166M 186M 186M 0 5876M 0 53 7 5960 268
10:24:56 28143M 27933M 215256K 1497M 5407M 1089M 347824K 19739M 17816M 21190M 3656K 3167M 181M 181M 0 5720M 0 3721 0 6444 5568
10:24:57 28143M 27938M 209808K 1497M 5409M 1089M 347828K 19743M 17816M 21190M 3656K 3167M 176M 176M 0 4756M 0 67 0 4904 1312
10:24:58 28143M 27922M 225808K 1495M 5390M 1089M 347868K 19748M 17816M 21190M 3656K 3162M 171M 171M 0 4588M 0 559 1 5540 3680
10:24:59 28143M 27928M 220016K 1495M 5391M 1089M 347712K 19753M 17816M 21190M 3656K 3162M 166M 166M 0 5696M 0 45 0 5700 3056
10:25:00 28143M 27934M 213708K 1495M 5392M 1089M 347712K 19757M 17816M 21190M 3656K 3162M 161M 161M 0 4537M 0 77 12 4661 740
10:25:01 28143M 27939M 208256K 1495M 5393M 1089M 347768K 19761M 17816M 21155M 3656K 3162M 157M 157M 0 4000M 0 2374 2 4253 1122
10:25:02 28143M 27943M 204628K 1496M 5395M 1089M 347828K 19764M 17816M 21155M 3656K 3162M 153M 153M 0 4768M 0 724 8 5052 4244
10:25:03 28143M 27929M 218600K 1493M 5377M 1089M 347800K 19769M 17816M 21155M 3656K 3156M 146M 146M 0 6524M 0 90 4 6544 2748
10:25:04 28143M 27930M 217436K 1494M 5376M 1089M 347848K 19772M 17816M 21155M 3656K 3155M 142M 142M 0 4012M 0 103 0 4456 8308
10:25:05 28143M 27935M 212584K 1494M 5377M 1089M 347848K 19777M 17816M 21155M 3656K 3155M 138M 138M 0 4964M 0 64 0 4968 928
10:25:06 28143M 27940M 207656K 1494M 5379M 1089M 347872K 19780M 17816M 21155M 3656K 3155M 133M 133M 0 4900M 0 31 0 4928 316
10:25:07 28143M 27923M 224816K 1488M 5363M 1089M 347872K 19785M 17816M 21155M 3656K 3149M 125M 125M 0 5828M 0 147 10 5872 392

Thanks in advance, and also thanks a lot for collectl as a tool.

Regards,
Fulvio Scapin
From: Frederik F. <fre...@di...> - 2017-05-25 12:23:17
Hi Mark, All,

We've recently started deploying a few systems with Mellanox ConnectX cards used as Ethernet adapters and a second Mellanox card as an InfiniBand HBA. In this configuration we have been seeing errors in syslog like the one below.

kernel: infiniband mlx4_0: ib_register_mad_agent: QP 0 not supported

This seems similar to the issue discussed here: https://sourceforge.net/p/collectl/discussion/696865/thread/16e495ae/

The attached patch, based on that discussion, seems to fix the issue for us. I'll also include a second patch for an error message I noticed while investigating this. Note that I'm not that fluent in Perl, so I'm not sure whether this is the right fix or whether it just hides something else...

I would appreciate it if these patches could be included in future releases, if appropriate.

Kind regards,
Frederik

--
Frederik Ferner
Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624
Diamond Light Source Ltd. mob: +44 7917 08 5110
Duty Sys Admin can be reached on x8596
From: Craig t. <crt...@gm...> - 2017-02-28 20:49:38
Hello,
I am trying to run collectl on a multi-HCA system. When I run with -sX,
the output returned is:
# INFINIBAND STATISTICS (/sec)
#Date Time HCA KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
20170228 12:46:35 mlx5 0 0 0 0 0 0 0
20170228 12:46:35 mlx5 0 0 0 0 0 0 0
20170228 12:46:35 mlx5 0 0 0 0 0 0 0
20170228 12:46:35 mlx5 0 0 0 0 0 0 0
20170228 12:46:36 mlx5 0 3 63 0 2 90 0
20170228 12:46:36 mlx5 0 0 0 0 0 0 0
20170228 12:46:36 mlx5 0 0 0 0 0 0 0
20170228 12:46:36 mlx5 0 0 0 0 0 0 0
I cannot identify which HCA is being used. With the patch below, I get
something that makes sense (to me).
# INFINIBAND STATISTICS (/sec)
#Date Time HCA KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
20170228 12:47:32 mlx5_0 0 3 77 0 3 64 0
20170228 12:47:32 mlx5_1 0 0 0 0 0 0 0
20170228 12:47:32 mlx5_2 0 0 0 0 0 0 0
20170228 12:47:32 mlx5_3 0 0 0 0 0 0 0
20170228 12:47:33 mlx5_0 0 0 0 0 0 0 0
20170228 12:47:33 mlx5_1 0 0 0 0 0 0 0
20170228 12:47:33 mlx5_2 0 0 0 0 0 0 0
20170228 12:47:33 mlx5_3 0 0 0 0 0 0 0
Is there a reason for stripping the _ and not appending the device number?
Thanks,
Craig
--- collectl/usr/share/collectl/formatit.ph 2017-02-28 09:29:53.859849557 -0800
+++ debug/collectl/usr/share/collectl/formatit.ph 2017-02-28 12:43:09.431284802 -0800
@@ -7208,9 +7208,11 @@
# this is messy. some HCSa end with _ which we don't want to print BUT we
# need to preserve the full name in the array so do a non-greedy match so
# we see everything except the optional _ at the end.
- $HCAName[$i]=~/(\S+?)_*$/;
+
+ $name=$HCAName[$i].$i;
+
$line=sprintf("$datetime %-6s %7s %7s %7s %7s %7s %7s %7s\n",
- $1,
+ $name,
cvt($ibRxKB[$i]/$intSecs,7,0,1), cvt($ibRx[$i]/$intSecs,6),
$ibRx[$i] ? cvt($ibRxKB[$i]*1024/$ibRx[$i],4,0,1) : 0,
cvt($ibTxKB[$i]/$intSecs,7,0,1), cvt($ibTx[$i]/$intSecs,6),
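The difference between the original label and the patched one can be shown in isolation. This Perl sketch uses hypothetical sample data (four HCAs whose stored names end in `_`, as the formatit.ph comment describes) and hypothetical helper names. Its patched variant is slightly more conservative than the diff: it appends the index only when the stored name ends in `_`, and like the diff it assumes the array index matches the device number.

```perl
use strict;
use warnings;

# Hypothetical sample data: stored HCA names ending in '_'.
my @HCAName = ('mlx5_') x 4;

# Original behaviour: the non-greedy match drops any trailing '_',
# so every port of the same HCA type prints the same label.
sub label_original {
    my ($i) = @_;
    return $HCAName[$i] =~ /(\S+?)_*$/ ? $1 : $HCAName[$i];
}

# Variant of the patch: append the index only when the stored name ends
# in '_' (ASSUMPTION: array index == device number).
sub label_patched {
    my ($i) = @_;
    my $name = $HCAName[$i];
    return $name =~ /_\z/ ? $name . $i : $name;
}

for my $i (0 .. $#HCAName) {
    printf "%-6s -> %s\n", label_original($i), label_patched($i);
}
```

Guarding on the trailing `_` avoids turning a name that already carries its number into something like `qib00`.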
From: Laban M. <lm...@gm...> - 2016-11-30 19:51:54
Hello all,

I have opened a pull request that uses a reversed hostname in the metric names sent to graphite. This gives a better hierarchical tree when navigating a large group of metrics for related services. Please have a look at: https://github.com/labeneator/collectl/pull/1

I'd love for this to be pulled into the main codebase after code review. Please let me know if there are improvements to be made.

Cheers,
Laban
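The idea behind reversing the hostname can be sketched in a few lines of Perl. This is an illustration, not the code in the pull request: the helper name is hypothetical, and the host and metric name are made up. The output line follows the `<host>.<name> <value> <time>` shape built by the sendData excerpt earlier in this archive.

```perl
use strict;
use warnings;

# Reverse the dot-separated labels of a hostname so Graphite's dot-delimited
# tree groups hosts by domain before host name (hypothetical helper).
sub reverse_hostname {
    my ($host) = @_;
    return join '.', reverse split /\./, $host;
}

# Illustrative host and metric name; prints e.g.
# com.example.dc1.web01.cputotals.user 3 <epoch seconds>
my $host = 'web01.dc1.example.com';
printf "%s.%s %s %d\n", reverse_hostname($host), 'cputotals.user', 3, time;
```

With reversed names, all hosts under example.com land beneath the same com.example branch, so related services sit next to each other in the tree.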
From: <Sop...@sm...> - 2016-10-31 12:12:56
Thank you Mark. This is not urgent, so please don't put yourself out for me.

Kind regards,
Sophie Loewenthal
Server Infrastructure
Smals.be

From: Mark Seger <mj...@gm...>
To: Sop...@sm...,
Cc: "col...@li..." <col...@li...>
Date: 31/10/2016 13:07
Subject: Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

Looks like 3.6.3 has that format, but that is also 4 years old, so there's no telling what has broken with newer kernels. I will try to do something in the newer versions to restore these, but that won't be immediate.
-mark

On Mon, Oct 31, 2016 at 4:33 AM, <Sop...@sm...> wrote:
Hi Mark,
> it came from an older version collectl, is that right?
Yes it did, but copied from a blog entry. Do you recall the version with the old network stats? I can use that instead.
Kind regards,
Sophie Loewenthal
Server Infrastructure
Smals.be

From: Mark Seger <mj...@gm...>
To: Sop...@sm...,
Cc: "col...@li..." < col...@li...>
Date: 28/10/2016 16:34
Subject: Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?

I'm a little confused. The output in your first example looks like it came from an older version of collectl, is that right? A while back I changed things to allow collectl to collect more network stats, so now in brief format it just shows error counts for 4 different kinds of network stats. To get the specific fields, of which there are now well over 50, you need to specify a filter for the type of data you're interested in using --tcpfilt. Furthermore, you need to tell collectl to report in --verbose rather than brief format. This data is now reported as TCP Extended stats, which you access via --tcpfilt T. To make things a little more complicated (sorry about this), since there is so much data and I wanted to use fixed-width fields, I needed to come up with different names that were also more reflective of the data elements, which you can see if you look at /proc/net/snmp and /proc/net/netstat.

Looking more closely at both the code and the description of the output in http://collectl.sourceforge.net/Data-verbose.html, I can see where making and documenting all those changes has led to some confusion. Specifically, look at this:

mjs@blkjak:~$ collectl -c1 -st --tcpfilt T --verbose
waiting for 1 second sample...
# TCP STACK SUMMARY (/sec)
#<------------------------------------------TcpExt----------------------------------------->
# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos SackS
0 0 0 0 0 0 0 1 0 0 0 0 0

As it turns out, PureAcks and HPAcks are now the AkNoPy and PreAck fields. I've no idea why I called them that, but I can certainly clean up the documentation to say so. Also, as for loss and fastretrans, I'm not reporting them at all here but do so in plot format. I'm not really sure how that happened:

collectl -c1 -st -P
waiting for 1 second sample...
#Date Time [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans
20161028 10:33:04 0 0 0 0 0 0

This will take some more thinking on my part. Sorry about all this...
-mark

On Fri, Oct 28, 2016 at 7:43 AM, <Sop...@sm...> wrote:
Dear Mark,
How can I get this output from collectl?

[root@poker ~]# collectl -st
waiting for 1 second sample...
#<------------TCP------------->
#PureAcks HPAcks Loss FTrans
3 0 0 0
1 0 0 0

When I run this, I get:

# collectl -s t
waiting for 1 second sample...
#<-------TCP-------->
# IP Tcp Udp Icmp
0 0 0 0
0 0 0 0
0 0 0 0

Version:
# collectl -v
collectl V4.1.0-1 (zlib:2.021)

Thanks for letting me know.
Kind regards,
Sophie Loewenthal
Server Infrastructure
Smals.be
Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals. Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message. According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals. If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message. ------------------------------------------------------------------------------ The Command Line: Reinvented for Modern Developers Did the resurgence of CLI tooling catch you by surprise? Reconnect with the command line and become more productive. Learn the new .NET and ASP.NET CLI. Get your free copy! http://sdm.link/telerik _______________________________________________ Collectl-interest mailing list Col...@li... https://lists.sourceforge.net/lists/listinfo/collectl-interest Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals. Indien dit bericht niet voor u bestemd is, verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te vernietigen. Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals. Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message. 
According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals. If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message. ------------------------------------------------------------------------------ The Command Line: Reinvented for Modern Developers Did the resurgence of CLI tooling catch you by surprise? Reconnect with the command line and become more productive. Learn the new .NET and ASP.NET CLI. Get your free copy! http://sdm.link/telerik_______________________________________________ Collectl-interest mailing list Col...@li... https://lists.sourceforge.net/lists/listinfo/collectl-interest Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals. Indien dit bericht niet voor u bestemd is, verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te vernietigen. Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals. Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message. According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals. If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message. |
|
From: Mark S. <mj...@gm...> - 2016-10-31 11:57:18
|
It looks like 3.6.3 has that format, but it is also 4 years old, so there's no telling what has broken with newer kernels. I will try to do something in the newer versions to restore these, but that won't happen immediately.

-mark
|
From: <Sop...@sm...> - 2016-10-31 08:33:40
|
Hi Mark,

> it came from an older version collectl, is that right?

Yes, it did, but I copied it from a blog entry.

Do you recall which version had the old network stats? I can use that one instead.

Kind regards,
Sophie Loewenthal
Server Infrastructure
Smals.be

From: Mark Seger <mj...@gm...>
To: Sop...@sm...,
Cc: "col...@li..." <col...@li...>
Date: 28/10/2016 16:34
Subject: Re: [Collectl-interest] collectl : how can I display columns : PureAcks HPAcks Loss FTrans ?
|
From: Mark S. <mj...@gm...> - 2016-10-28 14:34:28
|
I'm a little confused. The output in your first example looks like it came from an older version of collectl, is that right? A while back I changed things to allow collectl to collect more network stats, so now in brief format it just shows error counts for 4 different kinds of network stats. To get the specific fields, of which there are now well over 50, you need to specify a filter for the type of data you're interested in using --tcpfilt. Furthermore, you need to tell collectl to report in --verbose rather than brief. This data is now reported as TCP Extended stats, which you access via --tcpfilt T.

To make things a little more complicated (sorry about this), since there is so much data and I wanted to use fixed-width fields, I needed to come up with different names that were also more reflective of the data elements, which you can see if you look at /proc/net/snmp and /proc/net/netstat.

Looking more closely at both the code and the description of the output in http://collectl.sourceforge.net/Data-verbose.html, I can see where making/documenting all those changes has led to some confusion. Specifically, look at this:

mjs@blkjak:~$ collectl -c1 -st --tcpfilt T --verbose
waiting for 1 second sample...

# TCP STACK SUMMARY (/sec)
#<------------------------------------------TcpExt----------------------------------------->
# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos SackS
0 0 0 0 0 0 0 1 0 0 0 0 0

As it turns out, PureAcks and HPAcks are now the AkNoPy and PreAck fields. I've no idea why I called them that, but I can certainly clean up the documentation to say so. As for loss and fastretrans, I'm not reporting them at all here, but I do so in plot format. I'm not really sure how that happened:

collectl -c1 -st -P
waiting for 1 second sample...
#Date Time [TCP]IpErr [TCP]TcpErr [TCP]UdpErr [TCP]IcmpErr [TCP]Loss [TCP]FTrans
20161028 10:33:04 0 0 0 0 0 0

This will take some more thinking on my part. Sorry about all this...

-mark
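The TcpExt counters discussed above can also be read straight from /proc/net/netstat, which may help when checking what collectl's AkNoPy and PreAck columns should show. The following is a minimal Python sketch, not part of collectl. It assumes the kernel prints these counters as TCPPureAcks and TCPHPAcks (names and availability vary between kernel versions), and treating TCPFastRetrans as the source of the old FTrans column is likewise an assumption. Because collectl reports these stats per second, the sketch differences two snapshots taken one interval apart.

```python
import time

# Hypothetical selection of TcpExt counters: TCPPureAcks/TCPHPAcks are assumed
# to be the kernel counters behind collectl's AkNoPy/PreAck, and TCPFastRetrans
# is assumed to correspond to the old FTrans column.
FIELDS = ("TCPPureAcks", "TCPHPAcks", "TCPFastRetrans")


def parse_tcpext(text):
    """Return the TcpExt counters from /proc/net/netstat-style text as a dict.

    Each section is two lines with the same prefix: first the field names,
    then the values in the same order.
    """
    rows = [line.split() for line in text.splitlines()
            if line.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def rates(before, after, interval, fields=FIELDS):
    """Per-second change of each selected counter; absent counters are skipped."""
    return {f: (after[f] - before[f]) / interval
            for f in fields if f in before and f in after}


def sample_rates(path="/proc/net/netstat", interval=1.0):
    """Take two snapshots `interval` seconds apart and return their rates."""
    with open(path) as fh:
        first = parse_tcpext(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_tcpext(fh.read())
    return rates(first, second, interval)
```

On a Linux host, `print(sample_rates())` should print one set of per-second rates for whichever of these counters the running kernel exposes.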
|
From: <Sop...@sm...> - 2016-10-28 12:01:24
|
Dear Mark,
How can I get this read out from collectl?
[root@poker ~]# collectl -st
waiting for 1 second sample...
#<------------TCP------------->
#PureAcks HPAcks Loss FTrans
3 0 0 0
1 0 0 0
When I run this, I get:
# collectl -s t
waiting for 1 second sample...
#<-------TCP-------->
# IP Tcp Udp Icmp
0 0 0 0
0 0 0 0
0 0 0 0
Version #
# collectl -v
collectl V4.1.0-1 (zlib:2.021)
Thanks for letting me know.
Kind regards,
Sophie Loewenthal
Server Infrastructure
Smals.be
According to the provisions regarding representation of the non profit
association in its bylaws, only the chief executive officer, the general
manager or his explicit agent can enter into engagements on behalf of Smals.
If you are not the addressee of this message, we kindly ask you to signal
this to us immediately and to delete the message.
|
|
From: Hernan L. <her...@gm...> - 2016-06-23 02:38:06
|
Hello Mark,
I downloaded the latest version from SourceForge and it seems to fix these
issues, even with RAW files generated by the (older?) version available on
Debian. I will use this version going forward, so we can declare the problem
resolved.
Thanks for your help,
Hernan
On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <mj...@gm...> wrote:
> Wow, that's a tricky one. quite honestly colmux has been so solid for me
> I haven't looked at the code in ages, but that doesn't mean anything
> either. It's also amusing to note I had totally forgotten it supported the
> hostname address syntax you're using. ;) That allowed me to essentially
> use the same command you are, with one note. I also added -test and see
> columns 10 and 20 are different than you're saying. maybe you have a
> different kernel? I'm on 4.4.7-1-amd64-hpelinux which is the linux we use
> for our Helion Cloud and is essentially debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P" -cols 10,20
>
> [CPU:0]Idle% [CPU:1]Soft%
> #Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
> 12:08:27 -1 -1 -1 | -1 -1 -1
> 12:08:28 -1 -1 -1 | -1 -1 -1
> 12:08:29 95 -1 100 | 0 -1 0
> 12:08:30 95 97 98 | 0 0 0
> 12:08:31 97 100 100 | 0 0 0
> 12:08:32 87 100 89 | 0 0 0
> 12:08:33 100 100 100 | 0 0 0
> 12:08:34 100 100 99 | 0 0 0
> 12:08:35 100 97 97 | 0 0 0
> 12:08:36 99 98 100 | 0 0 0
>
> What you didn't say is does this fail all the time or intermittently. If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
> Have you tried playing back a file with colmux yet? If not, you can
> simply rerun the command but include -p and point it to the raw files. The
> one thing I did discover is I think I introduced a bug some time in the
> past and you need to have the hostname portion of the string start with a
> wild card rather than anywhere in the middle. And then to make matters
> worse I found a second bug and am using the wrong column during playback.
> more digging into that required too. ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P -p
> '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
> [CPU:0]Totl% [CPU:1]Steal%
> #Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
> 99 99 100 | 0 0 0
> 98 99 97 | 0 0 0
> 94 98 94 | 0 0 0
> 94 93 92 | 0 0 0
> 99 94 98 | 0 0 0
> 99 100 99 | 0 0 0
> 99 100 100 | 0 0 0
>
> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so I may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one]. then, maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> her...@gm...> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>> collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPUs).
>>
>> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
>> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>> colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>> Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>> 'lasttime' => [
>> '',
>> '20160615'
>> ],
>> 'maxinst' => [
>> -1,
>> 0
>> ],
>> 'lastinst' => [
>> -1,
>> 0
>> ],
>> 'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>> _______________________________________________
>> Collectl-interest mailing list
>> Col...@li...
>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>
>>
>
|
|
From: Hernan L. <her...@gm...> - 2016-06-23 00:05:25
|
---------- Forwarded message ----------
From: Hernan Laffitte <her...@gm...>
Date: Wed, Jun 22, 2016 at 4:45 PM
Subject: Re: [Collectl-interest] colmux time format error
To: Mark Seger <mj...@gm...>
Hello Mark,
Thanks for the reply! I finally had some time to run the additional tests
you requested. Some comments below...
On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <mj...@gm...> wrote:
> maybe you have a different kernel?
>
The machine where I am having this issue is running Debian "jessie/sid".
Kernel is:
Linux spaa-1 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
The version of colmux I have installed is "colmux: 4.7.1 (Term::ReadKey:
V2.31 Threads: 1.86)"
When running the command in 'test' mode, columns 10, 20, 30, ... were
the "%Idle" of the CPUs. Columns 11, 21, 31, ... were the "%Total" of the
CPUs.
In both cases, the commands give this error all the time (not an
intermittent error). One or two "-1" rows appear, followed by the message:
Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
> What you didn't say is does this fail all the time or intermittently. If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
>
The error occurs every time I try this command.
> Have you tried playing back a file with colmux yet?
>
I am gathering the output of collectl from all the machines into an NFS
directory. All the machines in the cluster have /var/log/collectl
symlinked to /nfs/mnt/path/to/collectl.
If I run the command via replay, it doesn't fail:
colmux -addr 'spaa-[1-3]' -command "-sC -oT -P -p
'/nfs/mnt/path/to/collectl/*20160621*raw.gz'" -cols 11,21 | less
However, in every row the 3 machines all have the same values for CPU0 and
CPU1. Something like:
#Time spaa-1 spaa-2 spaa-3 | spaa-1 spaa-2 spaa-3
...
1 1 1 | 3 3 3
0 0 0 | 11 11 11
...
> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so I may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one]. then, maybe I can track down why this is happening.
>
> -mark
>
>
Thanks, Mark. I will send a copy of the raw files in a private email.
Regards,
Hernan
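Since the meaning of a given -cols number evidently depends on the system (columns 10 and 20 were Idle% for both CPUs here, but Idle% and Soft% on Mark's system), one option is to look positions up by name in the header collectl prints with -P instead of counting by hand. The following Python sketch is hypothetical and not part of colmux. It assumes positions are counted from 0 over the whitespace-separated header fields after the leading '#'. Whether that matches colmux's own numbering (note the off-by-one Mark saw during playback) is unverified, so check the result against colmux -test before relying on it.

```python
def column_positions(header, wanted):
    """Map each wanted field name to its 0-based position in a plot header.

    Assumes `header` is a '#'-prefixed, whitespace-separated header line
    such as collectl prints in plot (-P) mode. Names not found are omitted.
    """
    fields = header.lstrip("#").split()
    return {name: fields.index(name) for name in wanted if name in fields}


def cols_argument(header, wanted):
    """Join the found positions, in the requested order, into '10,20' form."""
    positions = column_positions(header, wanted)
    return ",".join(str(positions[name]) for name in wanted if name in positions)
```

For example, with the made-up header "#Date Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%", cols_argument(header, ["[CPU:0]Idle%", "[CPU:1]Idle%"]) returns "3,5".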
|
|
From: Mark S. <mj...@gm...> - 2016-06-16 17:12:25
|
So while I haven't been able to repeat what you've reported, I did find a
few bugs with playing back raw files in plot mode, so this has been a good
thing. The biggest challenge is that there are a lot of switch combinations
in native collectl, and tossing colmux into the mix makes it even more
complicated, especially when you fear breaking something that already
works, but I think I've figured it out. The other complication is the lack
of testing, as I often feel like I'm the only one who uses some of the more
obscure, but useful, features. Good to see you doing so too, and if you
haven't yet tried playing back files across multiple machines, I think
you'll discover a whole new power. ;)
-mark
On Thu, Jun 16, 2016 at 8:30 AM, Mark Seger <mj...@gm...> wrote:
> Wow, that's a tricky one. quite honestly colmux has been so solid for me
> I haven't looked at the code in ages, but that doesn't mean anything
> either. It's also amusing to note I had totally forgotten it supported the
> hostname address syntax you're using. ;) That allowed me to essentially
> use the same command you are, with one note. I also added -test and see
> columns 10 and 20 are different than you're saying. maybe you have a
> different kernel? I'm on 4.4.7-1-amd64-hpelinux which is the linux we use
> for our Helion Cloud and is essentially debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20
>
> [CPU:0]Idle% [CPU:1]Soft%
> #Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
> 12:08:27 -1 -1 -1 | -1 -1 -1
> 12:08:28 -1 -1 -1 | -1 -1 -1
> 12:08:29 95 -1 100 | 0 -1 0
> 12:08:30 95 97 98 | 0 0 0
> 12:08:31 97 100 100 | 0 0 0
> 12:08:32 87 100 89 | 0 0 0
> 12:08:33 100 100 100 | 0 0 0
> 12:08:34 100 100 99 | 0 0 0
> 12:08:35 100 97 97 | 0 0 0
> 12:08:36 99 98 100 | 0 0 0
>
> What you didn't say is whether this fails all the time or intermittently. If
> intermittent it will indeed be hard to track down, but there is hope too ;)
>
> Have you tried playing back a file with colmux yet? If not, you can
> simply rerun the command but include -p and point it to the raw files. The
> one thing I did discover is I think I introduced a bug some time in the
> past and you need to have the hostname portion of the string start with a
> wild card rather than anywhere in the middle. And then to make matters
> worse I found a second bug and am using the wrong column during playback.
> more digging into that is required too. ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P -p '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
> [CPU:0]Totl% [CPU:1]Steal%
> #Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
> 99 99 100 | 0 0 0
> 98 99 97 | 0 0 0
> 94 98 94 | 0 0 0
> 94 93 92 | 0 0 0
> 99 94 98 | 0 0 0
> 99 100 99 | 0 0 0
> 99 100 100 | 0 0 0
>
> and since this is a playback command, you can use time ranges as well to
> limit what is being displayed so I may help zero in on where in the data
> the problem is and then maybe even send me a subset of the problem raw file
> [use collectl --extract to create a new raw file from the time slice of an
> old one]. then, maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> her...@gm...> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>> collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPUs).
>>
>> Now we want to run a colmux command to see the idle time of CPUs 0 and 1
>> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>> colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>> Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>   'lasttime' => [
>>     '',
>>     '20160615'
>>   ],
>>   'maxinst' => [
>>     -1,
>>     0
>>   ],
>>   'lastinst' => [
>>     -1,
>>     0
>>   ],
>>   'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>> ------------------------------------------------------------------------------
>> What NetFlow Analyzer can do for you? Monitors network bandwidth and
>> traffic
>> patterns at an interface-level. Reveals which users, apps, and protocols
>> are
>> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>> J-Flow, sFlow and other flows. Make informed decisions using capacity
>> planning
>> reports.
>> http://pubads.g.doubleclick.net/gampad/clk?id=1444514421&iu=/41014381
>> _______________________________________________
>> Collectl-interest mailing list
>> Col...@li...
>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>
>>
>
|
|
From: Mark S. <mj...@gm...> - 2016-06-16 12:30:26
|
Wow, that's a tricky one. quite honestly colmux has been so solid for me I
haven't looked at the code in ages, but that doesn't mean anything either.
It's also amusing to note I had totally forgotten it supported the hostname
address syntax you're using. ;) That allowed me to essentially use the
same command you are, with one note. I also added -test and see columns 10
and 20 are different than you're saying. maybe you have a different
kernel? I'm on 4.4.7-1-amd64-hpelinux which is the linux we use for our
Helion Cloud and is essentially debian as well.
stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P" -cols 10,20
[CPU:0]Idle% [CPU:1]Soft%
#Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
12:08:27 -1 -1 -1 | -1 -1 -1
12:08:28 -1 -1 -1 | -1 -1 -1
12:08:29 95 -1 100 | 0 -1 0
12:08:30 95 97 98 | 0 0 0
12:08:31 97 100 100 | 0 0 0
12:08:32 87 100 89 | 0 0 0
12:08:33 100 100 100 | 0 0 0
12:08:34 100 100 99 | 0 0 0
12:08:35 100 97 97 | 0 0 0
12:08:36 99 98 100 | 0 0 0
What you didn't say is whether this fails all the time or intermittently. If
intermittent it will indeed be hard to track down, but there is hope too ;)
Have you tried playing back a file with colmux yet? If not, you can simply
rerun the command but include -p and point it to the raw files. The one
thing I did discover is I think I introduced a bug some time in the past
and you need to have the hostname portion of the string start with a wild
card rather than anywhere in the middle. And then to make matters worse I
found a second bug and am using the wrong column during playback. more
digging into that is required too. ;(
BUT if I add 1 to each column I think this looks right if you ignore what
the headers say:
stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt -command "-sC -oT -P -p '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
[CPU:0]Totl% [CPU:1]Steal%
#Time 1-mgmt 2-mgmt 3-mgmt | 1-mgmt 2-mgmt 3-mgmt
99 99 100 | 0 0 0
98 99 97 | 0 0 0
94 98 94 | 0 0 0
94 93 92 | 0 0 0
99 94 98 | 0 0 0
99 100 99 | 0 0 0
99 100 100 | 0 0 0
and since this is a playback command, you can use time ranges as well to
limit what is being displayed so I may help zero in on where in the data
the problem is and then maybe even send me a subset of the problem raw file
[use collectl --extract to create a new raw file from the time slice of an
old one]. then, maybe I can track down why this is happening.
-mark
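Since the column layout of `collectl -sC -oT -P` evidently differs between systems (columns 10 and 20 were Idle% on Hernan's machines but not on Mark's), one way to avoid miscounting is to look the column numbers up by name in the plot header instead of hard-coding them. Below is a minimal Python sketch, assuming a whitespace-separated header line. `find_columns` is a hypothetical helper, the sample header is made up and heavily shortened, and whether colmux numbers `-cols` from 0 or 1 is an assumption you should confirm against colmux's `-test` output (hence the `base` parameter).

```python
def find_columns(header_line, wanted, base=1):
    """Return {column_name: position} for each name in `wanted`.

    header_line -- whitespace-separated plot-format header, e.g. the line
                   starting with '#Time' (format is an assumption).
    base        -- numbering origin of the returned positions; whether
                   colmux -cols counts from 0 or 1 should be checked
                   against `colmux ... -test` output.
    """
    names = header_line.lstrip("#").split()
    positions = {}
    for name in wanted:
        if name not in names:
            raise KeyError(f"column {name!r} not found in header")
        positions[name] = names.index(name) + base
    return positions


if __name__ == "__main__":
    # Made-up, abbreviated header; a real -sC header has far more columns.
    header = "#Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%"
    cols = find_columns(header, ["[CPU:0]Idle%", "[CPU:1]Idle%"])
    print(",".join(str(n) for n in cols.values()))  # prints 3,5
```

Looking columns up by name keeps a `-cols` value correct even when a different kernel or collectl version inserts extra per-CPU fields.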
On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <her...@gm...>
wrote:
> Hello,
>
> We are trying to gather detailed CPU usage from a number of machines in
> our cluster. In particular, we want to see usage of every individual CPU in
> a group of machines.
>
> With collectl, on a single machine, the command we can run is:
>
> collectl -sC -oT -P
>
> Which gives us 282 columns (the machines have 28 CPUs).
>
> Now we want to run a colmux command to see the idle time of CPUs 0 and 1
> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
> "[CPU:1]Idle%"). The command we use is:
>
> colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>
> This generates the error:
>
> Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>
> The error occurs when parsing the field "lasttime" of a data structure
> $hostVars, which has the following content at the time of the error:
>
> {
>   'lasttime' => [
>     '',
>     '20160615'
>   ],
>   'maxinst' => [
>     -1,
>     0
>   ],
>   'lastinst' => [
>     -1,
>     0
>   ],
>   'bufptr' => 1
> };
>
> I am currently running version "collectl V3.6.9-1
> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
> here?
>
>
> Thanks in advance,
>
> Hernan
>
|
|
From: Hernan L. <her...@gm...> - 2016-06-16 00:35:29
|
Hello,
We are trying to gather detailed CPU usage from a number of machines in our
cluster. In particular, we want to see usage of every individual CPU in a
group of machines.
With collectl, on a single machine, the command we can run is:
collectl -sC -oT -P
Which gives us 282 columns (the machines have 28 CPUs).
Now we want to run a colmux command to see the idle time of CPUs 0 and 1
on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
"[CPU:1]Idle%"). The command we use is:
colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
This generates the error:
Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
The error occurs when parsing the field "lasttime" of a data structure
$hostVars, which has the following content at the time of the error:
{
  'lasttime' => [
    '',
    '20160615'
  ],
  'maxinst' => [
    -1,
    0
  ],
  'lastinst' => [
    -1,
    0
  ],
  'bufptr' => 1
};
I am currently running version "collectl V3.6.9-1 (zlib:2.06,HiRes:1.9725)"
on Debian. Any idea of what may be the problem here?
Thanks in advance,
Hernan
|
|
From: Mark S. <mj...@gm...> - 2016-03-15 11:45:42
|
I was wondering if the tool might run something and measure times, which is
clearly something collectl can't do given the way it currently works - you
wouldn't want it to stall waiting for something.
If you want to get an idea of what it takes to write a collectl plugin, have a
look at /usr/share/collectl/misc.ph which is a fairly simple plugin that
does a lot of things one could even exclude depending on your needs. To
use it you simply do something like this:
$ collectl --import misc
waiting for 1 second sample...
#<------Misc------>
# UTim MHz MT Log
5 1272 0 5
5 1800 0 5
5 1236 0 5
and what you see are the cpu speed, nfs mounts (I actually forgot it did
this ;)) and how many users are logged in. Naturally, being fully
integrated, you can also combine its output with other things collectl
knows about, such as this, noting collectl also has its own version of
'hello world':
$ collectl --import hello:misc -sc
waiting for 1 second sample...
#<----CPU[HYPER]-----><-Hello-><------Misc------>
#cpu sys inter ctxsw Total UTim MHz MT Log
2 0 7877 32674 140 5 1218 0 5
1 0 5161 16135 230 5 1416 0 5
and you can also get the output included in the tab file so you can plot it.
-mark
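The misc plugin described above reports, among other things, CPU speed and the number of NFS mounts. As a rough illustration of the kind of /proc parsing a plugin's sampling step might do, here is a Python sketch. It is not misc.ph's code: the use of /proc/mounts and the `cpu MHz` lines of /proc/cpuinfo as data sources, and averaging MHz across CPUs, are assumptions, and both function names are hypothetical. The functions take text rather than reading files so they can be exercised on sample data.

```python
def count_nfs_mounts(mounts_text):
    """Count NFS mounts in text formatted like /proc/mounts."""
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            count += 1
    return count


def average_cpu_mhz(cpuinfo_text):
    """Average the 'cpu MHz' values in /proc/cpuinfo-style text (0.0 if none)."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return sum(speeds) / len(speeds) if speeds else 0.0
```

On a Linux host you would call them as `count_nfs_mounts(open("/proc/mounts").read())` and `average_cpu_mhz(open("/proc/cpuinfo").read())` once per sampling interval. Some architectures do not expose `cpu MHz` at all, in which case the sketch reports 0.0.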
On Mon, Mar 14, 2016 at 11:45 AM, Thomas Oliw <tho...@er...>
wrote:
> Hi Mark,
>
> Thanks!
>
> Yes, another switch.. How I have waited! ☺
>
> I looked at /proc/self/mountstats and could not see any timing data
> either.. So where nfsiostat gets the RTT values is a bit of a mystery.
>
> I did find some references to a nfs-iostat.py script that might give a
> clue:
>
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfs-iostat/nfs-iostat.py;h=9626d42609b9485c7fda0c9ef69d698f9fa929fd;hb=HEAD
>
> I think it runs several times and calculates delta?!
>
> If it helps, this is output from nfsiostat on one of our RedHat 6.5
> servers (as you can see, the RTT for write operations is very high):
>
> [root@myserver collectl]# nfsiostat 10 2 /proj/eiffel002_config_fem001
>
> nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
>
>      op/s   rpc bklog
>    218.81        0.00
> read:        ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
>             12.293   697.798    56.763     0 (0.0%)         7.972        15.149
> write:       ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
>             38.325  1322.385    34.505   392 (0.0%)        35.313      1237.445
>
> nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
>
>      op/s   rpc bklog
>     68.20        0.00
> read:        ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
>              0.000     0.000     0.000     0 (0.0%)         0.000         0.000
> write:       ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
>             69.700  4481.492    64.297     0 (0.0%)       658.139     71380.063
>
> I understand that this is hard to build and test without NFS systems!
>
> I just wanted to throw out the suggestion and see what happens.
>
> Learning to write a plugin for collectl is tempting. I am not a
> programmer, but have fiddled around with some simple perl scripts in the
> past.
>
> I'll do some reading on the webpage and try to get an idea of how the
> plugin stuff works.
>
> You'll never get rid of me if I start.. ☺
>
> Kind Regards,
>
> Thomas
>
> *From:* Mark Seger [mailto:mj...@gm...]
> *Sent:* den 14 mars 2016 14:56
> *To:* Thomas Oliw
> *Cc:* col...@li...
> *Subject:* Re: [Collectl-interest] Suggestion: Additional NFS data from
> /proc/self/mountstats (same as nfsiostat-command)
>
>
> Always happy to hear from happy users.
>
> I just looked at /proc/xx/mountstats, which actually applies to all pids;
> self is just a shortcut to yourself. The problem with pid-based stats is
> it can be a lot of overhead to read any more stats than collectl already
> reads, but my thought was I might be able to add something optionally. Oh
> boy, another switch! ;)
>
> But when I looked at these stats I didn't see anything about timing and only
> saw info on what is mounted. That said, I'd think since nfs is a shared
> resource, there might be timing data for nfs in general, but my systems
> currently don't use nfs and I might need to do some experiments to see what
> happens if/when I do configure it.
>
> Worst case, especially if you're a collectl fan, you might be able to
> write your own plugin if you're a perl user. The benefit there is once you
> see how easy it is to write a plugin you then might be able to add even
> more metrics, possibly at the application level if you find that useful.
> If so, I'm always ready to help...
>
> I'm out of town this week but I'll try to revisit next week when I return.
>
> -mark
>
> On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...>
> wrote:
>
> Hi,
>
> I love collectl and use it extensively for many performance related
> troubleshooting/monitoring tasks in our server park.
>
> The possibility to run live and/or record to file is a fantastic mix of
> features and very useful!
>
> However, one thing that I miss is NFS response time data…
>
> We use lots of NFS shares in our environment, and that particular metric
> is one of the most useful ones in my opinion.
>
> As a complement to collectl, I use “nfsiostat” when NFS is suspected to be
> a performance bottleneck.
>
> It shows me a number of good metrics and has an “RTT” (Round Trip Time)
> field, that at least gives me a hint of the NFS server response time.
>
> If I read the documentation correctly, it gets its data from /proc/self/mountstats.
>
> I think it would be very useful if those metrics could be collected in collectl as well.
>
> The nfsiostat tool itself is a bit crude, at least in our somewhat aged
> RedHat environment, and for us it would be convenient to have these metrics
> managed with collectl instead.
>
> Just a suggestion…
>
> Thanks for the collectl tool!!
>
> Kind Regards,
>
> Thomas Oliw
>
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785231&iu=/4140
> _______________________________________________
> Collectl-interest mailing list
> Col...@li...
> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>
>
>
|
|
From: Thomas O. <tho...@er...> - 2016-03-14 15:45:13
|
Hi Mark,
Thanks!
Yes, another switch.. How I have waited! ☺
I looked at /proc/self/mountstats and could not see any timing data
either.. So where nfsiostat gets the RTT values is a bit of a mystery.
I did find some references to a nfs-iostat.py script that might give a clue:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=tools/nfs-iostat/nfs-iostat.py;h=9626d42609b9485c7fda0c9ef69d698f9fa929fd;hb=HEAD
I think it runs several times and calculates delta?!
If it helps, this is output from nfsiostat on one of our RedHat 6.5 servers
(as you can see, the RTT for write operations is very high):
[root@myserver collectl]# nfsiostat 10 2 /proj/eiffel002_config_fem001
nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
     op/s   rpc bklog
   218.81        0.00
read:        ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
            12.293   697.798    56.763     0 (0.0%)         7.972        15.149
write:       ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
            38.325  1322.385    34.505   392 (0.0%)        35.313      1237.445
nfsserv.somedomain.se:/vol/volp01234/data_config mounted on /proj/data_config_server001:
     op/s   rpc bklog
    68.20        0.00
read:        ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
             0.000     0.000     0.000     0 (0.0%)         0.000         0.000
write:       ops/s      kB/s     kB/op      retrans  avg RTT (ms)  avg exe (ms)
            69.700  4481.492    64.297     0 (0.0%)       658.139     71380.063
I understand that this is hard to build and test without NFS systems! I
just wanted to throw out the suggestion and see what happens.
Learning to write a plugin for collectl is tempting. I am not a programmer,
but have fiddled around with some simple perl scripts in the past. I'll do
some reading on the webpage and try to get an idea of how the plugin stuff
works. You'll never get rid of me if I start.. ☺
Kind Regards,
Thomas
From: Mark Seger [mailto:mj...@gm...]
Sent: den 14 mars 2016 14:56
To: Thomas Oliw
Cc: col...@li...
Subject: Re: [Collectl-interest] Suggestion: Additional NFS data from /proc/self/mountstats (same as nfsiostat-command)
Always happy to hear from happy users.
I just looked at /proc/xx/mountstats, which actually applies to all pids;
self is just a shortcut to yourself. The problem with pid-based stats is it
can be a lot of overhead to read any more stats than collectl already reads,
but my thought was I might be able to add something optionally. Oh boy,
another switch! ;)
But when I looked at these stats I didn't see anything about timing and only
saw info on what is mounted. That said, I'd think since nfs is a shared
resource, there might be timing data for nfs in general, but my systems
currently don't use nfs and I might need to do some experiments to see what
happens if/when I do configure it.
Worst case, especially if you're a collectl fan, you might be able to write
your own plugin if you're a perl user. The benefit there is once you see how
easy it is to write a plugin you then might be able to add even more
metrics, possibly at the application level if you find that useful. If so,
I'm always ready to help...
I'm out of town this week but I'll try to revisit next week when I return.
-mark
On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...> wrote:
Hi,
I love collectl and use it extensively for many performance related
troubleshooting/monitoring tasks in our server park. The possibility to run
live and/or record to file is a fantastic mix of features and very useful!
However, one thing that I miss is NFS response time data… We use lots of
NFS shares in our environment, and that particular metric is one of the most
useful ones in my opinion.
As a complement to collectl, I use “nfsiostat” when NFS is suspected to be a
performance bottleneck. It shows me a number of good metrics and has an
“RTT” (Round Trip Time) field, that at least gives me a hint of the NFS
server response time. If I read the documentation correctly, it gets its
data from /proc/self/mountstats.
I think it would be very useful if those metrics could be collected in
collectl as well. The nfsiostat tool itself is a bit crude, at least in our
somewhat aged RedHat environment, and for us it would be convenient to have
these metrics managed with collectl instead.
Just a suggestion… Thanks for the collectl tool!!
Kind Regards,
Thomas Oliw
|
|
From: Mark S. <mj...@gm...> - 2016-03-14 13:55:51
|
Always happy to hear from happy users.
I just looked at /proc/xx/mountstats, which actually applies to all pids;
self is just a shortcut to yourself. The problem with pid-based stats is it
can be a lot of overhead to read any more stats than collectl already reads,
but my thought was I might be able to add something optionally. Oh boy,
another switch! ;)
But when I looked at these stats I didn't see anything about timing and only
saw info on what is mounted. That said, I'd think since nfs is a shared
resource, there might be timing data for nfs in general, but my systems
currently don't use nfs and I might need to do some experiments to see what
happens if/when I do configure it.
Worst case, especially if you're a collectl fan, you might be able to write
your own plugin if you're a perl user. The benefit there is once you see how
easy it is to write a plugin you then might be able to add even more
metrics, possibly at the application level if you find that useful. If so,
I'm always ready to help...
I'm out of town this week but I'll try to revisit next week when I return.
-mark
On Mon, Mar 14, 2016 at 8:12 AM, Thomas Oliw <tho...@er...> wrote:
> Hi,
>
> I love collectl and use it extensively for many performance related
> troubleshooting/monitoring tasks in our server park.
>
> The possibility to run live and/or record to file is a fantastic mix of
> features and very useful!
>
> However, one thing that I miss is NFS response time data…
>
> We use lots of NFS shares in our environment, and that particular metric
> is one of the most useful ones in my opinion.
>
> As a complement to collectl, I use “nfsiostat” when NFS is suspected to be
> a performance bottleneck.
>
> It shows me a number of good metrics and has an “RTT” (Round Trip Time)
> field, that at least gives me a hint of the NFS server response time.
>
> If I read the documentation correctly, it gets its data from /proc/self/mountstats.
>
> I think it would be very useful if those metrics could be collected in collectl as well.
>
> The nfsiostat tool itself is a bit crude, at least in our somewhat aged
> RedHat environment, and for us it would be convenient to have these metrics
> managed with collectl instead.
>
> Just a suggestion…
>
> Thanks for the collectl tool!!
>
> Kind Regards,
>
> Thomas Oliw
|
|
From: Thomas O. <tho...@er...> - 2016-03-14 12:13:09
|
Hi,
I love collectl and use it extensively for many performance related
troubleshooting/monitoring tasks in our server park. The possibility to run
live and/or record to file is a fantastic mix of features and very useful!
However, one thing that I miss is NFS response time data... We use lots of
NFS shares in our environment, and that particular metric is one of the most
useful ones in my opinion.
As a complement to collectl, I use "nfsiostat" when NFS is suspected to be a
performance bottleneck. It shows me a number of good metrics and has an "RTT"
(Round Trip Time) field, that at least gives me a hint of the NFS server
response time. If I read the documentation correctly, it gets its data from
/proc/self/mountstats.
I think it would be very useful if those metrics could be collected in
collectl as well. The nfsiostat tool itself is a bit crude, at least in our
somewhat aged RedHat environment, and for us it would be convenient to have
these metrics managed with collectl instead.
Just a suggestion... Thanks for the collectl tool!!
Kind Regards,
Thomas Oliw
|
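Thomas's guess that nfsiostat "runs several times and calculates delta" matches how the per-op counters in /proc/self/mountstats are usually consumed. Those counters, which include cumulative RTT and execute times, are running totals, so averages come from the difference between two samples. Below is a minimal Python sketch of that calculation. It assumes the per-op field order that nfs-iostat.py reads (ops, transmissions, major timeouts, bytes sent, bytes received, queue ms, RTT ms, execute ms, optionally followed by more fields on newer kernels). The function names and sample data are hypothetical, and the sketch handles a single mount's section only, whereas the real file lists every mount.

```python
# Assumed per-op field order (as read by nfs-iostat.py; verify on your kernel):
# ops, transmissions, major timeouts, bytes sent, bytes received,
# cumulative queue ms, cumulative RTT ms, cumulative execute ms [, ...]
OPS, TRANS, RTT_MS, EXE_MS = 0, 1, 6, 7


def parse_per_op(mount_section):
    """Map op name (READ, WRITE, ...) -> list of cumulative integer counters."""
    stats = {}
    for line in mount_section.splitlines():
        name, sep, rest = line.strip().partition(":")
        fields = rest.split()
        # Per-op lines: an upper-case op name followed by 8 or more integers.
        if (sep and name.isupper() and len(fields) >= 8
                and all(f.isdigit() for f in fields)):
            stats[name] = [int(f) for f in fields]
    return stats


def op_averages(before, after, op):
    """nfsiostat-style averages for one op over the interval between samples."""
    delta = [a - b for a, b in zip(after[op], before[op])]
    ops = delta[OPS]
    if ops == 0:
        return {"ops": 0, "retrans": 0, "avg_rtt_ms": 0.0, "avg_exe_ms": 0.0}
    return {
        "ops": ops,
        "retrans": delta[TRANS] - ops,      # transmissions beyond one per op
        "avg_rtt_ms": delta[RTT_MS] / ops,  # mean round-trip time per op
        "avg_exe_ms": delta[EXE_MS] / ops,  # mean total execute time per op
    }


# Made-up excerpts of one mount's section, sampled an interval apart.
SAMPLE_BEFORE = """device nfsserv:/vol/data mounted on /proj/data with fstype nfs statvers=1.1
\tper-op statistics
\t        NULL: 0 0 0 0 0 0 0 0
\t        READ: 100 100 0 12800 409600 10 800 1500
\t       WRITE: 200 202 0 819200 25600 50 7000 250000
"""
SAMPLE_AFTER = """device nfsserv:/vol/data mounted on /proj/data with fstype nfs statvers=1.1
\tper-op statistics
\t        NULL: 0 0 0 0 0 0 0 0
\t        READ: 150 150 0 19200 614400 15 1200 2300
\t       WRITE: 260 265 0 1064960 33280 60 9100 320000
"""
```

With the made-up samples, READ averages 8.0 ms RTT and 16.0 ms execute time over the interval, and WRITE shows 3 retransmissions and 35.0 ms RTT. Against a live system you would read /proc/self/mountstats twice an interval apart, cut out the section for the mount of interest, and pass both parsed samples to `op_averages`. The READ and WRITE results then play the role of the "avg RTT (ms)" and "avg exe (ms)" columns in the nfsiostat output quoted in this thread.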