Sysdig

From wiki.mikejung.biz

What is a system call?

A system call is how a program requests a service from the operating system kernel. For example, httpd makes system calls to create a new process or to kill an idle thread. Tracing system calls is nothing new; strace has been able to do this for a long time. Sysdig is relatively new to the club and makes it easier to narrow down results with many easy-to-understand filters.

Linux Syscall reference sheet

Install Sysdig on CentOS 6

To install on CentOS

rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public  
curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo
rpm -i http://mirror.us.leaseweb.net/epel/6/i386/epel-release-6-8.noarch.rpm
yum -y install kernel-devel-$(uname -r)
yum -y install sysdig

Sysdig output formatting

By default sysdig will output using the following format

%evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args

Below is an example of what the output looks like when I run "sysdig --summary fd.ip=$Client_IP and fd.port=80"

1 11:40:03.058806870 3 httpd (32199) < accept fd=24(<4t>$Client_IP:59657->$Server_IP:80) tuple=$Client_IP:59657->$Server_IP:80 queuepct=0


##The first field is the event number. This is the first event, so the value is 1
evt.num = 1

##The second field is the timestamp for the event
evt.time = 11:40:03.058806870

##The third field is the number of the CPU where the event was captured
evt.cpu = 3

##The fourth field is the name of the process the event was recorded under
proc.name = httpd

##The fifth field is the ID of the thread that generated the event. For single-threaded processes this is the same as the PID
thread.tid = (32199)

##The sixth field is the direction of the event ">" is enter, "<" is exit
evt.dir = < 

##The seventh field is the event type
evt.type = accept

##The rest of the output is the list of arguments for the event. This can vary depending on the syscall
evt.args = fd=24(<4t>$Client_IP:59657->$Server_IP:80) tuple=$Client_IP:59657->$Server_IP:80 queuepct=0


  • evt.num is the incremental event number
  • evt.time is the event timestamp
  • evt.cpu is the CPU number where the event was captured
  • proc.name is the name of the process that generated the event
  • thread.tid is the TID that generated the event, which corresponds to the PID for single thread processes
  • evt.dir is the event direction, > for enter events and < for exit events
  • evt.type is the name of the event, e.g. 'open' or 'read'
  • evt.args is the list of event arguments. In case of system calls, these tend to correspond to the system call arguments, but that’s not always the case: some system call arguments are excluded for simplicity or performance reasons.
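
Because the default format is whitespace separated, a captured line can be split back into these fields with plain shell. The sketch below is illustrative only: the sample line reuses the example above with made-up documentation IP addresses in place of the placeholders, the variable names are arbitrary, and it assumes the process name contains no spaces.

```shell
# One line of default sysdig output. The IP addresses are made-up
# documentation addresses standing in for the client and server.
line='1 11:40:03.058806870 3 httpd (32199) < accept fd=24(<4t>192.0.2.10:59657->198.51.100.5:80) tuple=192.0.2.10:59657->198.51.100.5:80 queuepct=0'

# read splits on whitespace; the last variable (args) receives the rest of the line.
read -r num ts cpu proc tid dir evtype args <<EOF
$line
EOF

# thread.tid is printed in parentheses by the default format; strip them.
tid=${tid#"("}
tid=${tid%")"}

echo "num=$num cpu=$cpu proc=$proc tid=$tid dir=$dir type=$evtype"
```

Everything after the event type ends up in args, which matches how evt.args varies from one syscall to another.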

You can change the output format by using -p followed by the fields you want to display

sysdig -p"%$Field_1 %$Field_2"

You can also label the fields by putting a name and a colon in front of each one

sysdig -p"lolcats:%$Field_1 doge:%$Field_2"
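
As a concrete instance of the placeholders above, the following sketch prints the process name and file name for open events. The field names come from the field lists on this page; the FORMAT and FILTER variable names, the labels, and the 20-event limit are assumptions made for illustration.

```shell
# Illustrative names: FORMAT and FILTER are just shell variables for this sketch.
FORMAT='proc:%proc.name file:%fd.name'   # the text before each ':' is a free-form label
FILTER='evt.type=open'

# Show the command that will be run.
echo "sysdig -p\"$FORMAT\" $FILTER"

# Run it only when sysdig is installed and we are root; -n 20 stops after 20 events.
if command -v sysdig >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    sysdig -n 20 -p"$FORMAT" "$FILTER"
fi
```

Single quotes keep the shell from touching the format string, so the field names reach sysdig unchanged.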

Sysdig Fields

This list may change with newer releases of sysdig. Please visit this page to make sure you have the latest list:

http://www.sysdig.org/wiki/sysdig-user-guide/

Sysdig IO Related Fields

fd.directory        If the fd is a file, the directory that contains it.
fd.filename         If the fd is a file, the filename without the path.
proc.cwd            the current working directory of the event.
evt.is_io           'true' for events that read or write to FDs, like read(), send(), recvfrom(), etc.
evt.is_io_read      'true' for events that read from FDs, like read(), recv(), recvfrom(), etc.
evt.is_io_write     'true' for events that write to FDs, like write(), send(), etc.
evt.io_dir          'r' for events that read from FDs, like read(); 'w' for events that write to FDs, like write().
evt.is_wait         'true' for events that make the thread wait, e.g. sleep(), select(), poll().

Sysdig Memory Related Fields

proc.vmsize         total virtual memory for the process (as kb).
proc.vmrss          resident non-swapped memory for the process (as kb).
proc.vmswap         swapped memory for the process (as kb).

Sysdig IP and Port Related Fields

fd.ip               matches the ip address (client or server) of the fd.
fd.cip              client IP address.
fd.sip              server IP address.
fd.port             (FILTER ONLY) matches the port (either client or server) of the fd.
fd.cport            for TCP/UDP FDs, the client port.
fd.sport            for TCP/UDP FDs, server port.
fd.l4proto          the IP protocol of a socket. Can be 'tcp', 'udp', 'icmp' or 'raw'.
fd.sockfamily       the socket family for socket events. Can be 'ip' or 'unix'.
fd.is_server        'true' if the process owning this FD is the server endpoint in the connection.

Sysdig Name, Number and Type Related Fields

fd.num              the unique number identifying the file descriptor.
fd.type             type of FD. Can be 'file', 'directory', 'ipv4', 'ipv6', 'unix', 'pipe', 'event', 'signalfd', 'eventpoll' or 'inotify'.
fd.typechar         type of FD as a single character. Can be 'f' for file, '4' for IPv4 socket, '6' for IPv6 socket, 'u' for unix socket, 'p' for pipe, 'e' for eventfd, 's' for signalfd, 'l' for eventpoll, 'i' for inotify, 'o' for unknown.
fd.name             FD full name. If the FD is a file, this field contains the full path. If the FD is a socket, this field contains the connection tuple.

proc.pid            the id of the process generating the event.
proc.exe            the full name (including the path) of the executable generating the event.
proc.name           the name (excluding the path) of the executable generating the event.
proc.ppid           the pid of the parent of the process generating the event.
proc.pname          the name (excluding the path) of the parent of the process 
                    generating the event.
proc.apid           the pid of one of the process ancestors. E.g. proc.apid[1]
                    returns the parent pid, proc.apid[2] returns the
                    grandparent pid, and so on. proc.apid[0] is the pid of the
                    current process. proc.apid without arguments can be used
                    in filters only and matches any of the process ancestors,
                    e.g. proc.apid=1234.
proc.aname          the name (excluding the path) of one of the process
                    ancestors. E.g. proc.aname[1] returns the parent name,
                    proc.aname[2] returns the grandparent name, and so on.
                    proc.aname[0] is the name of the current process.
                    proc.aname without arguments can be used in filters only
                    and matches any of the process ancestors, e.g.
                    proc.aname=bash.
proc.loginshellid   the pid of the oldest shell among the ancestors of the
                    current process, if there is one. This field can be used
                    to separate different user sessions, and is useful in
                    conjunction with chisels like spy_users.

Sysdig Process and Thread Related Fields

proc.args           the arguments passed on the command line when starting the 
                    process generating the event.
proc.cmdline        full process command line, i.e name + arguments.
proc.nchilds        the number of child threads that the process generating
                    the event currently has.
proc.duration       number of nanoseconds since the process started.
proc.fdopencount    number of open FDs for the process.
proc.fdlimit        maximum number of FDs the process can open.
proc.fdusage        the ratio between open FDs and maximum available FDs for
                    the process.
thread.pfmajor      number of major page faults since thread start.
thread.pfminor      number of minor page faults since thread start.
thread.tid          the id of the thread generating the event.
thread.ismain       'true' if the thread generating the event is the main one
                    in the process.
thread.exectime     CPU time spent by the last scheduled thread, in
                    nanoseconds. Exported by switch events only.
thread.totexectime  Total CPU time, in nanoseconds since the beginning of the
                    capture, for the current thread. Exported by switch
                    events only.

Sysdig Event Related Fields

evt.num             event number.
evt.time            event timestamp as a time string that includes the
                    nanosecond part.
evt.time.s          event timestamp as a time string with no nanoseconds.
evt.datetime        event timestamp as a time string that includes the date.
evt.rawtime         absolute event timestamp, i.e. nanoseconds from epoch.
evt.rawtime.s       integer part of the event timestamp (e.g. seconds since
                    epoch).
evt.rawtime.ns      fractional part of the absolute event timestamp.
evt.reltime         number of nanoseconds from the beginning of the capture.
evt.reltime.s       number of seconds from the beginning of the capture.
evt.reltime.ns      fractional part (in ns) of the time from the beginning of
                    the capture.
evt.latency         delta between an exit event and the corresponding enter
                    event.
evt.latency.s       integer part of the event latency delta.
evt.latency.ns      fractional part of the event latency delta.
evt.deltatime       delta between this event and the previous event.
evt.deltatime.s     integer part of the delta between this event and the
                    previous event.
evt.deltatime.ns    fractional part of the delta between this event and the
                    previous event.
evt.dir             event direction can be either '>' for enter events or '<'
                    for exit events.
evt.type            For system call events, this is the name of the system
                    call (e.g. 'open').
evt.cpu             number of the CPU where this event happened.
evt.args            all the event arguments, aggregated into a single string.
evt.arg             (FILTER ONLY) one of the event arguments specified by name
                    or by number. Some events (e.g. return codes or FDs) will
                    be converted into a text representation when possible.
                    E.g. 'resarg.fd' or 'resarg[0]'.
evt.rawarg          (FILTER ONLY) one of the event arguments specified by
                    name. E.g. 'arg.fd'.
evt.info            for most events, this field returns the same value as
                    evt.args. However, for some events (like writes to
                    /dev/log) it provides higher level information coming
                    from decoding the arguments.
evt.buffer          the binary data buffer for events that have one, like
                    read(), recvfrom(), etc. Use this field in filters with
                    'contains' to search into I/O data buffers.
evt.res             event return value, as an error code string (e.g.
                    'ENOENT').
evt.rawres          event return value, as a number (e.g. -2). Useful for
                    range comparisons.
evt.failed          'true' for events that returned an error status.
evt.is_syslog       'true' for events that are writes to /dev/log.
evt.count           This filter field always returns 1 and can be used to
                    count events from inside chisels.
evt.around          (FILTER ONLY) Accepts the event if it's around the
                    specified time interval. The syntax is evt.around[T]=D,
                    where T is the value returned by %evt.rawtime for the
                    event and D is a delta in milliseconds. For example,
                    evt.around[1404996934793590564]=1000 will return the
                    events with timestamps from one second before that
                    timestamp to one second after it, for a total of two
                    seconds of capture.

Sysdig User and Group Related Fields

user.uid            user ID.
user.name           user name.
user.homedir        home directory of the user.
user.shell          user's shell.
group.gid           group ID.
group.name          group name.


Sysdig Syslog Related Fields

syslog.facility.str facility as a string.
syslog.facility     facility as a number (0-23).
syslog.severity.str severity as a string. Can have one of these values: emerg, 
                    alert, crit, err, warn, notice, info, debug
syslog.severity     severity as a number (0-7).
syslog.message      message sent to syslog.

Sysdig example commands

How to Analyze Apache, MySQL and PHP-FPM with Sysdig

The following command displays all events for Apache, MySQL and PHP-FPM. If you use memcached or Varnish, you can add them to the filter as well.

sysdig proc.name=httpd or proc.name=mysqld or proc.name=php-fpm

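If you want to reduce some of the "noisy" system calls you can filter out gettimeofday and switch. The parentheses group the process names so that the exclusions apply to all three processes, and the quotes keep the shell from interpreting the parentheses.

sysdig "(proc.name=httpd or proc.name=mysqld or proc.name=php-fpm) and evt.type!=gettimeofday and evt.type!=switch"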
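
A long filter like the one above is easier to maintain when it is assembled from lists. The following is a minimal sketch, assuming a POSIX shell; the PROCS, NOISE and FILTER variable names are illustrative and not part of sysdig. Parentheses group the process names so that the exclusions apply to all of them.

```shell
# Hypothetical helper variables; adjust the lists to your own stack.
PROCS="httpd mysqld php-fpm"
NOISE="gettimeofday switch"

# Join the process names with "or".
proc_filter=""
for p in $PROCS; do
    proc_filter="${proc_filter:+$proc_filter or }proc.name=$p"
done

# Join the excluded event types with "and".
noise_filter=""
for e in $NOISE; do
    noise_filter="${noise_filter:+$noise_filter and }evt.type!=$e"
done

# Explicit grouping: (any of the processes) and none of the noisy event types.
FILTER="($proc_filter) and $noise_filter"
echo "$FILTER"
```

Pass the result to sysdig as one quoted argument, for example sysdig "$FILTER", so that the shell does not interpret the parentheses.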

How to Analyze MySQLd with Sysdig

This command will filter out most of the "noisy" output and will mainly show you when MySQL sends and receives data

sysdig -v -F proc.name=mysqld and evt.type!=gettimeofday and evt.type!=switch and evt.type!=io_getevents and evt.type!=futex and evt.type!=clock_gettime and evt.type!=select

How to Analyze Apache and PHP-FPM with Sysdig

This filters out a lot of noise. Use it if you just want to focus on read- and write-like events for Apache and PHP. Start the command, then create a phpinfo.php file and load it in a browser or with curl. The parentheses make the exclusions apply to both processes.

sysdig -v -F "(proc.name=php-fpm or proc.name=httpd) and evt.type!=gettimeofday and evt.type!=switch and evt.type!=clock_gettime and evt.type!=epoll_wait and evt.type!=getsockopt and evt.type!=wait4 and evt.type!=select and evt.type!=semop"

How to analyze a website with sysdig

You can use this command to see exactly what happens when you load a webpage in your browser. Just replace the path with the actual path to your www/ content.

sysdig --summary fd.name contains /$path/to/www/

You can also look for a single IP address if you want to only gather information on a single connection. If you want to see how Apache processes your request to load a page you can specify port 80.

sysdig --summary fd.ip=$your_IP and fd.port=80

You can also filter on a single IP address and exclude one specific service. For instance, if you want to view syscalls while you load certain files in the browser and want to see every service except, say, ssh, use this command. Otherwise you will get a ton of SSH info.

sysdig --summary fd.ip=$your_IP and proc.name!=sshd

If you want to record the results for later review, use the command below. Replace $name.scap with whatever you want to name the file. Keeping the .scap extension is suggested, since it makes it obvious that the file contains a sysdig capture.

sysdig --summary -w $name.scap fd.ip=$your_IP and fd.port=80

To view that file after capture is done

sysdig -r $name.scap
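
A capture can be recorded with a broad filter and replayed later with a narrower one. The sketch below illustrates that workflow under a few assumptions: the file name web.scap, the variable names, and the 5000-event limit are made up for the example, and the commands only run when sysdig is installed and the shell is running as root.

```shell
CAPTURE="web.scap"                                   # hypothetical file name
CAPTURE_FILTER="fd.port=80"                          # what gets recorded
REPLAY_FILTER="proc.name=httpd and evt.type=accept"  # what gets displayed later

if command -v sysdig >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # -n 5000 ends the capture after 5000 events instead of waiting for Ctrl-C.
    sysdig -n 5000 -w "$CAPTURE" "$CAPTURE_FILTER"
    # Replay the saved events with a narrower filter; nothing new is captured.
    sysdig -r "$CAPTURE" "$REPLAY_FILTER"
else
    echo "sysdig not available or not running as root; skipping capture"
fi
```

Replaying from a file means the same events can be examined several times with different filters.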

If you want to view the delta between timestamps for each event, use -t d. Each event is then shown with the time elapsed since the previous event, which is useful for spotting slow steps.

sysdig -t d --summary fd.ip=$your_IP and proc.name!=sshd

View available sysdig chisels

This command will list all the available chisels

sysdig -cl

By default, these are the available chisels.

Category: CPU Usage
-------------------
topprocs_cpu        Top processes by CPU usage

Category: Errors
----------------
topfiles_errors     top files by number of errors
topprocs_errors     top processes by number of errors

Category: I/O
-------------
echo_fds            Print the data read and written by processes.
fdbytes_by          I/O bytes, aggregated by an arbitrary filter field
fdcount_by          FD count, aggregated by an arbitrary filter field
iobytes             Sum of I/O bytes on any type of FD
iobytes_file        Sum of file I/O bytes
spy_file            Echo any read/write made by any process to all files.
                    Optionally, you can provide the name of one file to only
                    intercept reads/writes to that file.
stderr              Print stderr of processes
stdin               Print stdin of processes
stdout              Print stdout of processes
topfiles_bytes      Top files by R+W bytes
topfiles_time       Top files by time
topprocs_file       Top processes by R+W disk bytes

Category: Logs
--------------
spy_logs            Echo any write made by any process to a log file.
                    Optionally, export the events around each log message to
                    file.
spy_syslog          Print every message written to syslog. Optionally, export
                    the events around each syslog message to file.

Category: Misc
--------------
around              Export to file the events around where the given filter
                    matches.

Category: Net
-------------
iobytes_net         Show total network I/O bytes
spy_ip              Show the data exchanged with the given IP address
spy_port            Show the data exchanged using the given IP port number
topconns            top network connections by total bytes
topports_server     Top TCP/UDP server ports by R+W bytes
topprocs_net        Top processes by network I/O

Category: Performance
---------------------
bottlenecks         Slowest system calls
fileslower          Trace slow file I/O
netlower            Trace slow network I/O
proc_exec_time      Show process execution time
scallslower         Trace slow syscalls
topscalls           Top system calls by number of calls
topscalls_time      Top system calls by time

Category: Security
------------------
list_login_shells   List the login shell IDs
shellshock_detect   print shellshock attacks
spy_users           Display interactive user activity

Category: System State
----------------------
lsof                List (and optionally filter) the open file descriptors.
netstat             List (and optionally filter) network connections.
ps                  List (and optionally filter) the machine processes.

Use the -i flag to get detailed information about a specific chisel

For more information on a chisel you can use -i

sysdig -i $chisel_name


Use a Sysdig chisel

To use a specific chisel, just use -c and then the name

sysdig -c $chisel_name
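
Some chisels take an argument after their name, and chisels can be run against a saved capture as well as live. The sketch below is only an illustration: the chisel names and their arguments follow the descriptions in the chisel list above, while the capture file name web.scap is a made-up example that is assumed to have been written earlier with -w.

```shell
CAPTURE="web.scap"   # hypothetical capture file written earlier with -w

if command -v sysdig >/dev/null 2>&1 && [ -r "$CAPTURE" ]; then
    # Top processes by network I/O in the capture.
    sysdig -r "$CAPTURE" -c topprocs_net
    # Data exchanged on port 80; the chisel takes the port as its argument.
    sysdig -r "$CAPTURE" -c spy_port 80
    # I/O bytes aggregated by directory; the chisel takes a filter field.
    sysdig -r "$CAPTURE" -c fdbytes_by fd.directory
else
    echo "no sysdig binary or no readable $CAPTURE; nothing to analyze"
fi
```

Use sysdig -i with the chisel name to confirm which arguments a chisel expects on your version.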

View incoming network connections

If you want to view all incoming network connections but don't want to see a certain service listed, you can use the command below. Replace "apache" with the name of the process you want to filter out.

sysdig evt.type=accept and proc.name!=apache

Common types of syscalls

sysdig is newer than strace and there is a lot you can do with it. By default it shows all system calls on a server, but you can restrict the output to a single program if you want.

sysdig proc.name=$program

For more information on how to install and use sysdig:

Futex

"The futex() system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address (while the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so the same memory mapped in different locations will correspond for futex() calls). This system call is typically used to implement the contended case of a lock in shared memory, as described in futex(7)."

In general, futexes were created to improve performance by avoiding system calls whenever possible, since each call can consume several hundred instructions. The other goal was to avoid unnecessary context switches.

In order to avoid system calls in the uncontended case, there must be shared state in user space that is accessible to all processes and tasks. This shared state is known as the "user lock", and it indicates its status so that all processes know whether it is currently held.

The user lock is located in shared memory, created by mmap. Because of where the lock lives, it can be used by multiple processes and can be mapped at different virtual addresses in different address spaces.

A lock can be globally identified by [B,O]

B) The memory object backing the virtual address
O) The offset within that object

There are 3 memory types:

Anonymous Memory          (only usable by threads in the same process)
Shared Memory Segment     (usable by multiple processes)
Memory Mapped Files       (usable by multiple processes)

sysdig filter for futex

sysdig evt.type=futex

Sysdig shows three arguments for a futex call. The first is the address of the futex (addr=), the second is the operation (op=), and the third is the value passed to the operation (val=).

futex addr=198977C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1
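
The op value is a bit field: a command number combined with option flags. The short sketch below decodes the 133 from the line above with shell arithmetic. The constants are the values defined in linux/futex.h (FUTEX_PRIVATE_FLAG is 128, FUTEX_CLOCK_REALTIME is 256, and command 5 is FUTEX_WAKE_OP); the variable names cmd and private are only illustrative.

```shell
op=133                     # op value taken from the sysdig line above
FUTEX_PRIVATE_FLAG=128     # flag values from linux/futex.h
FUTEX_CLOCK_REALTIME=256

# Clear the flag bits to recover the command number.
cmd=$(( op & ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME) ))
# 1 when the private flag is set, 0 otherwise.
private=$(( (op & FUTEX_PRIVATE_FLAG) != 0 ))

echo "command=$cmd private=$private"
```

This prints command=5 private=1, which agrees with the FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP decoding that sysdig shows in parentheses.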

Clone

The clone syscall creates new processes and threads. It is one of the more complex system calls and can be expensive to run. If you notice tons of these syscalls and performance is low, you may want to reduce how often it happens by increasing the process lifetime or reducing the number of processes in general.

sysdig filter for clone

sysdig evt.type=clone

Execve

This syscall executes programs. Typically you will see this call after the clone syscall. Everything that gets executed goes through this call.

sysdig filter for execve

sysdig evt.type=execve

Chdir

This syscall changes the process working directory. If anything changes directory you can see it by filtering this syscall.

sysdig filter for chdir

sysdig evt.type=chdir

open/creat

These syscalls open files and can also create them. If you trace them you can view file creation and see who is touching what.

sysdig filter for open and creat

sysdig evt.type=open
sysdig evt.type=creat

connect

This syscall initiates a connection on a socket. It is the call a client uses to establish an outgoing network connection.

sysdig filter for connect. You can also specify a port or IP to view specific services or IPs.

sysdig evt.type=connect
sysdig evt.type=connect and fd.port=80

accept

This syscall accepts a connection on a socket. On the server side, you will see it for each incoming connection that a client opened with connect.

sysdig filter for accept. You can also specify a port or IP to view specific services or IPs.

sysdig evt.type=accept
sysdig evt.type=accept and fd.port=80

read/write

These syscalls read data from or write data to a file descriptor.

sysdig filter for IO

sysdig evt.is_io=true

You can also use a chisel to view IO for certain files, ports, or programs, for example

sysdig -c echo_fds fd.name contains /var/lib/mysql/

sysdig -c echo_fds proc.name=httpd and fd.port!=80

unlink/rename

These syscalls delete or rename files.

sysdig evt.type=unlink
sysdig evt.type=rename