- 1 What is a system call?
- 2 Install Sysdig on CentOS 6
- 3 Sysdig output formatting
- 4 Sysdig Fields
- 5 Sysdig example commands
- 6 Common types of syscalls
What is a system call?
A system call is how a program asks the operating system to do something on its behalf. For example, httpd makes a system call when it requests that a new process be created or that an idle thread be killed. Tracing system calls is nothing new; strace has been able to do this for a long time. Sysdig is relatively new to the club and makes it easier to filter the results using many easy-to-understand filters.
Install Sysdig on CentOS 6
To install Sysdig on CentOS 6, run the following commands as root:
rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public
curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo
rpm -i http://mirror.us.leaseweb.net/epel/6/i386/epel-release-6-8.noarch.rpm
yum -y install kernel-devel-$(uname -r)
yum -y install sysdig
Sysdig output formatting
By default, sysdig prints each event using the following format:
%evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args
Below is an example of what the output looks like when I run "sysdig --summary fd.ip=$Client_IP and fd.port=80"
1 11:40:03.058806870 3 httpd (32199) < accept fd=24(<4t>$Client_IP:59657->$Server_IP:80) tuple=$Client_IP:59657->$Server_IP:80 queuepct=0
##The first field is the event number. Since this is the first event, it is 1
evt.num = 1
##The second field is the timestamp of the event
evt.time = 11:40:03.058806870
##The third field is the number of the CPU the event was captured on
evt.cpu = 3
##The fourth field is the name of the process that generated the event
proc.name = httpd
##The fifth field is the ID of the thread that generated the event. For single-threaded processes this is the PID
thread.tid = 32199
##The sixth field is the direction of the event: ">" is enter, "<" is exit
evt.dir = <
##The seventh field is the event type
evt.type = accept
##The rest of the line is the list of arguments for the event, which varies depending on the syscall
evt.args = fd=24(<4t>$Client_IP:59657->$Server_IP:80) tuple=$Client_IP:59657->$Server_IP:80 queuepct=0
- evt.num is the incremental event number
- evt.time is the event timestamp
- evt.cpu is the CPU number where the event was captured
- proc.name is the name of the process that generated the event
- thread.tid is the TID that generated the event, which corresponds to the PID for single-threaded processes
- evt.dir is the event direction, > for enter events and < for exit events
- evt.type is the name of the event, e.g. 'open' or 'read'
- evt.args is the list of event arguments. In case of system calls, these tend to correspond to the system call arguments, but that’s not always the case: some system call arguments are excluded for simplicity or performance reasons.
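The following sketch makes the field order above concrete. It uses plain shell to split one line of default output into those fields. parse_sysdig_line is a hypothetical helper, not part of sysdig, and it assumes the process name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: split one line of sysdig's
# default output into named fields. Assumes proc.name has no spaces.
parse_sysdig_line() {
    printf '%s\n' "$1" | {
        # The first seven words are fixed columns; read assigns whatever
        # is left of the line (the event arguments) to the last variable.
        read -r num ts cpu name tid dir etype args
        tid=${tid#\(}   # strip the parentheses around the thread ID
        tid=${tid%\)}
        printf 'evt.num=%s\n' "$num"
        printf 'evt.time=%s\n' "$ts"
        printf 'evt.cpu=%s\n' "$cpu"
        printf 'proc.name=%s\n' "$name"
        printf 'thread.tid=%s\n' "$tid"
        printf 'evt.dir=%s\n' "$dir"
        printf 'evt.type=%s\n' "$etype"
        printf 'evt.args=%s\n' "$args"
    }
}
```

When you need specific fields, -p is the more robust option. This sketch only shows how the default columns line up.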
You can change the output format with -p, followed by a format string listing the fields you want to display:
sysdig -p"%$Field_1 %$Field_2"
You can also label each field by putting a label and a colon in front of it:
sysdig -p"lolcats:%$Field_1 doge:%$Field_2"
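A label is just literal text placed before a %field. The sketch below shows a hypothetical shell helper, sysdig_format (not part of sysdig), that assembles such a format string. It takes field names, each with an optional label: prefix.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: build a -p format string.
# Each argument is either a field name (evt.num) or label:field
# (user:user.name); labels are printed literally before the value.
sysdig_format() {
    fmt=""
    for spec in "$@"; do
        case $spec in
            *:*) item="${spec%%:*}:%${spec#*:}" ;;  # label:%field
            *)   item="%$spec" ;;                  # %field
        esac
        fmt="${fmt:+$fmt }$item"                   # space-separate items
    done
    printf '%s\n' "$fmt"
}
```

Usage: sysdig -p"$(sysdig_format evt.time user:user.name cmd:proc.cmdline)" evt.type=execve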
This list may change with newer releases of sysdig. Run sysdig -l to print the fields supported by the version you have installed.
Sysdig IO Related Fields
- fd.directory: if the fd is a file, the directory that contains it.
- fd.filename: if the fd is a file, the filename without the path.
- proc.cwd: the current working directory of the event.
- evt.is_io: 'true' for events that read or write to FDs, like read(), send(), recvfrom(), etc.
- evt.is_io_read: 'true' for events that read from FDs, like read(), recv(), recvfrom(), etc.
- evt.is_io_write: 'true' for events that write to FDs, like write(), send(), etc.
- evt.io_dir: 'r' for events that read from FDs, like read(); 'w' for events that write to FDs, like write().
- evt.is_wait: 'true' for events that make the thread wait, e.g. sleep(), select(), poll().
Sysdig Memory Related Fields
- proc.vmsize: total virtual memory for the process (in KB).
- proc.vmrss: resident non-swapped memory for the process (in KB).
- proc.vmswap: swapped memory for the process (in KB).
Sysdig IP and Port Related Fields
- fd.ip: matches the IP address (client or server) of the fd.
- fd.cip: client IP address.
- fd.sip: server IP address.
- fd.port: (FILTER ONLY) matches the port (either client or server) of the fd.
- fd.cport: for TCP/UDP FDs, the client port.
- fd.sport: for TCP/UDP FDs, the server port.
- fd.l4proto: the IP protocol of a socket. Can be 'tcp', 'udp', 'icmp' or 'raw'.
- fd.sockfamily: the socket family for socket events. Can be 'ip' or 'unix'.
- fd.is_server: 'true' if the process owning this FD is the server endpoint in the connection.
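For a socket, fd.name holds the connection tuple, with the client on the left (for example 10.0.0.5:59657->10.0.0.1:80). The fields fd.cip, fd.cport, fd.sip and fd.sport expose its pieces directly. The sketch below shows that mapping using a hypothetical helper, split_tuple (not part of sysdig). It assumes an IPv4 tuple in the client->server form seen in the sample accept output.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: split a "client->server"
# socket tuple into the pieces sysdig exposes as separate fields.
split_tuple() {
    client=${1%%->*}   # everything before "->"
    server=${1#*->}    # everything after "->"
    # The port is whatever follows the last colon on each side
    printf 'fd.cip=%s fd.cport=%s fd.sip=%s fd.sport=%s\n' \
        "${client%:*}" "${client##*:}" "${server%:*}" "${server##*:}"
}
```

Quote the tuple when calling it, because ">" is a redirection in the shell: split_tuple '10.0.0.5:59657->10.0.0.1:80'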
Sysdig Name, Number and Type Related Fields
- fd.num: the unique number identifying the file descriptor.
- fd.type: type of FD. Can be 'file', 'directory', 'ipv4', 'ipv6', 'unix', 'pipe', 'event', 'signalfd', 'eventpoll' or 'inotify'.
- fd.typechar: type of FD as a single character. Can be 'f' for file, '4' for IPv4 socket, '6' for IPv6 socket, 'u' for unix socket, 'p' for pipe, 'e' for eventfd, 's' for signalfd, 'l' for eventpoll, 'i' for inotify, 'o' for unknown.
- fd.name: FD full name. If the fd is a file, this field contains the full path. If the FD is a socket, this field contains the connection tuple.
- proc.pid: the id of the process generating the event.
- proc.exe: the full name (including the path) of the executable generating the event.
- proc.name: the name (excluding the path) of the executable generating the event.
- proc.ppid: the pid of the parent of the process generating the event.
- proc.pname: the name (excluding the path) of the parent of the process generating the event.
- proc.apid: the pid of one of the process ancestors. E.g. proc.apid[1] returns the parent pid, proc.apid[2] returns the grandparent pid, and so on. proc.apid[0] is the pid of the current process. proc.apid without arguments can be used in filters only and matches any of the process ancestors, e.g. proc.apid=1234.
- proc.aname: the name (excluding the path) of one of the process ancestors. E.g. proc.aname[1] returns the parent name, proc.aname[2] returns the grandparent name, and so on. proc.aname[0] is the name of the current process. proc.aname without arguments can be used in filters only and matches any of the process ancestors, e.g. proc.aname=bash.
- proc.loginshellid: the pid of the oldest shell among the ancestors of the current process, if there is one. This field can be used to separate different user sessions, and is useful in conjunction with chisels like spy_users.
Sysdig Process and Thread Related Fields
- proc.args: the arguments passed on the command line when starting the process generating the event.
- proc.cmdline: full process command line, i.e. name + arguments.
- proc.nchilds: the number of child threads that the process generating the event currently has.
- proc.duration: number of nanoseconds since the process started.
- proc.fdopencount: number of open FDs for the process.
- proc.fdlimit: maximum number of FDs the process can open.
- proc.fdusage: the ratio between open FDs and maximum available FDs for the process.
- thread.pfmajor: number of major page faults since thread start.
- thread.pfminor: number of minor page faults since thread start.
- thread.tid: the id of the thread generating the event.
- thread.ismain: 'true' if the thread generating the event is the main one in the process.
- thread.exectime: CPU time spent by the last scheduled thread, in nanoseconds. Exported by switch events only.
- thread.totexectime: total CPU time, in nanoseconds, since the beginning of the capture, for the current thread. Exported by switch events only.
Sysdig Event Related Fields
- evt.num: event number.
- evt.time: event timestamp as a time string that includes the nanosecond part.
- evt.time.s: event timestamp as a time string with no nanoseconds.
- evt.datetime: event timestamp as a time string that includes the date.
- evt.rawtime: absolute event timestamp, i.e. nanoseconds from epoch.
- evt.rawtime.s: integer part of the event timestamp (e.g. seconds since epoch).
- evt.rawtime.ns: fractional part of the absolute event timestamp.
- evt.reltime: number of nanoseconds from the beginning of the capture.
- evt.reltime.s: number of seconds from the beginning of the capture.
- evt.reltime.ns: fractional part (in ns) of the time from the beginning of the capture.
- evt.latency: delta between an exit event and the corresponding enter event.
- evt.latency.s: integer part of the event latency delta.
- evt.latency.ns: fractional part of the event latency delta.
- evt.deltatime: delta between this event and the previous event.
- evt.deltatime.s: integer part of the delta between this event and the previous event.
- evt.deltatime.ns: fractional part of the delta between this event and the previous event.
- evt.dir: event direction. Can be either '>' for enter events or '<' for exit events.
- evt.type: for system call events, the name of the system call (e.g. 'open').
- evt.cpu: number of the CPU where this event happened.
- evt.args: all the event arguments, aggregated into a single string.
- evt.arg: (FILTER ONLY) one of the event arguments specified by name or by number. Some events (e.g. return codes or FDs) will be converted into a text representation when possible. E.g. 'resarg.fd' or 'resarg'.
- evt.rawarg: (FILTER ONLY) one of the event arguments specified by name. E.g. 'arg.fd'.
- evt.info: for most events, this field returns the same value as evt.args. However, for some events (like writes to /dev/log) it provides higher-level information coming from decoding the arguments.
- evt.buffer: the binary data buffer for events that have one, like read(), recvfrom(), etc. Use this field in filters with 'contains' to search into I/O data buffers.
- evt.res: event return value, as an error code string (e.g. 'ENOENT').
- evt.rawres: event return value, as a number (e.g. -2). Useful for range comparisons.
- evt.failed: 'true' for events that returned an error status.
- evt.is_syslog: 'true' for events that are writes to /dev/log.
- evt.count: this filter field always returns 1 and can be used to count events from inside chisels.
- evt.around: (FILTER ONLY) accepts the event if it's around the specified time interval. The syntax is evt.around[T]=D, where T is the value returned by %evt.rawtime for the event and D is a delta in milliseconds. For example, evt.around[1404996934793590564]=1000 will return the events from one second before that timestamp to one second after it, for a total of two seconds of capture.
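The time fields are related by simple arithmetic. evt.rawtime is nanoseconds since the epoch, and evt.rawtime.s and evt.rawtime.ns are its integer and fractional parts. evt.around[T]=D covers D milliseconds on either side of T. The sketch below uses two hypothetical shell helpers (not part of sysdig) and the timestamp from the evt.around example. It assumes 64-bit shell arithmetic, as in bash and dash on typical 64-bit Linux systems.

```shell
#!/bin/sh
# Hypothetical helpers, not part of sysdig; they rely on 64-bit shell
# arithmetic, which is enough for nanosecond epoch timestamps.

# evt.rawtime -> "evt.rawtime.s evt.rawtime.ns"
split_rawtime() {
    echo "$(( $1 / 1000000000 )) $(( $1 % 1000000000 ))"
}

# evt.around[T]=D -> "start end" of the window in nanoseconds,
# where D is in milliseconds (1 ms = 1,000,000 ns)
around_window() {
    delta_ns=$(( $2 * 1000000 ))
    echo "$(( $1 - delta_ns )) $(( $1 + delta_ns ))"
}
```

For example, around_window 1404996934793590564 1000 describes a window that starts one second before the event and ends one second after it.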
Sysdig User and Group Related Fields
- user.uid: user ID.
- user.name: user name.
- user.homedir: home directory of the user.
- user.shell: user's shell.
- group.gid: group ID.
- group.name: group name.
Sysdig Syslog Related Fields
- syslog.facility.str: facility as a string.
- syslog.facility: facility as a number (0-23).
- syslog.severity.str: severity as a string. Can have one of these values: emerg, alert, crit, err, warn, notice, info, debug.
- syslog.severity: severity as a number (0-7).
- syslog.message: message sent to syslog.
Sysdig example commands
How to Analyze Apache, MySQL and PHP-FPM with Sysdig
sysdig proc.name=httpd or proc.name=mysqld or proc.name=php-fpm
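If you want to reduce some of the "noisy" system calls, you can filter out gettimeofday and switch. The process names are grouped in parentheses so that the exclusions apply to all three processes:
sysdig "(proc.name=httpd or proc.name=mysqld or proc.name=php-fpm) and evt.type!=gettimeofday and evt.type!=switch"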
How to Analyze MySQLd with Sysdig
This command will filter out most of the "noisy" output and will mainly show you when MySQL sends and receives data
sysdig -v -F proc.name=mysqld and evt.type!=gettimeofday and evt.type!=switch and evt.type!=io_getevents and evt.type!=futex and evt.type!=clock_gettime and evt.type!=select
How to Analyze Apache and PHP-FPM with Sysdig
This filters out a lot of noise. Use it if you just want to focus on read- and write-like events for Apache and PHP. Create a phpinfo.php file, run this command, and then load the page in a browser or with curl.
sysdig -v -F "(proc.name=php-fpm or proc.name=httpd) and evt.type!=gettimeofday and evt.type!=switch and evt.type!=clock_gettime and evt.type!=epoll_wait and evt.type!=getsockopt and evt.type!=wait4 and evt.type!=select and evt.type!=semop"
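The filters in these examples follow one pattern: a list of process names combined with a list of event types to exclude. The sketch below shows a hypothetical helper, quiet_filter (not part of sysdig), that builds such a filter. It wraps the process names in parentheses so that the evt.type exclusions clearly apply to every listed process.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: build a filter that keeps the
# given processes and drops the given noisy event types.
# $1: space-separated process names, $2: space-separated event types
quiet_filter() {
    filter=""
    for p in $1; do                      # word-splitting is intentional
        filter="${filter:+$filter or }proc.name=$p"
    done
    filter="($filter)"                   # group the process names
    for t in $2; do
        filter="$filter and evt.type!=$t"
    done
    printf '%s\n' "$filter"
}
```

Usage: sysdig -v "$(quiet_filter 'httpd mysqld php-fpm' 'gettimeofday switch futex')"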
How to analyze a website with sysdig
You can use this command to see exactly what happens when you load a webpage in your browser. Just replace the path with the actual path to your www/ content.
sysdig --summary fd.name contains /$path/to/www/
You can also filter on a single IP address if you only want to gather information about connections from one client. If you want to see how Apache processes your request to load a page, you can also specify port 80.
sysdig --summary fd.ip=$your_IP and fd.port=80
You can also filter on a single IP while excluding one specific service. For instance, you may want to view the syscalls made when you load certain files in the browser while seeing every service except ssh. In that case, use this command; otherwise your SSH session will flood the output.
sysdig --summary fd.ip=$your_IP and proc.name!=sshd
If you want to record the results for later review, use this command. Replace $name.scap with whatever you want to name the file; keeping the .scap extension makes it obvious that the file contains a sysdig capture.
sysdig --summary fd.ip=$your_IP and fd.port=80 -w $name.scap
To view that file after capture is done
sysdig -r $name.scap
If you want to see how long each event took, use -t d, which shows the delta between each event's enter and exit. To see the time elapsed between one event and the next, use -t D instead.
sysdig -t d --summary fd.ip=$your_IP and proc.name!=sshd
sysdig -t D --summary fd.ip=$your_IP and proc.name!=sshd
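The delta between two consecutive events is the difference between their evt.time values. The sketch below illustrates that arithmetic with a hypothetical helper, evt_time_delta (not part of sysdig). It converts two HH:MM:SS.nnnnnnnnn timestamps from the same day to nanoseconds and subtracts them, assuming the nine-digit nanosecond part that sysdig prints.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: nanoseconds between two
# evt.time values (HH:MM:SS.nnnnnnnnn) taken on the same day.
evt_time_delta() {
    printf '%s %s\n' "$1" "$2" | awk '
        # Convert HH:MM:SS.nnnnnnnnn to nanoseconds since midnight.
        function to_ns(t,    p) {
            split(t, p, /[:.]/)
            return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000000000 + p[4]
        }
        { printf "%.0f\n", to_ns($2) - to_ns($1) }'
}
```

For example, evt_time_delta 11:40:03.058806870 11:40:03.059001234 gives the gap, in nanoseconds, between an event at the first timestamp and one at the second.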
View available sysdig chisels
This command lists all the available chisels:
sysdig -cl
By default, these are the available chisels.
Category: CPU Usage
- topprocs_cpu: Top processes by CPU usage
Category: Errors
- topfiles_errors: Top files by number of errors
- topprocs_errors: Top processes by number of errors
Category: I/O
- echo_fds: Print the data read and written by processes.
- fdbytes_by: I/O bytes, aggregated by an arbitrary filter field
- fdcount_by: FD count, aggregated by an arbitrary filter field
- iobytes: Sum of I/O bytes on any type of FD
- iobytes_file: Sum of file I/O bytes
- spy_file: Echo any read/write made by any process to all files. Optionally, you can provide the name of one file to only intercept reads/writes to that file.
- stderr: Print stderr of processes
- stdin: Print stdin of processes
- stdout: Print stdout of processes
- topfiles_bytes: Top files by R+W bytes
- topfiles_time: Top files by time
- topprocs_file: Top processes by R+W disk bytes
Category: Logs
- spy_logs: Echo any write made by any process to a log file. Optionally, export the events around each log message to file.
- spy_syslog: Print every message written to syslog. Optionally, export the events around each syslog message to file.
Category: Misc
- around: Export to file the events around the time range where the given filter matches.
Category: Net
- iobytes_net: Show total network I/O bytes
- spy_ip: Show the data exchanged with the given IP address
- spy_port: Show the data exchanged using the given IP port number
- topconns: Top network connections by total bytes
- topports_server: Top TCP/UDP server ports by R+W bytes
- topprocs_net: Top processes by network I/O
Category: Performance
- bottlenecks: Slowest system calls
- fileslower: Trace slow file I/O
- netlower: Trace slow network I/O
- proc_exec_time: Show process execution time
- scallslower: Trace slow syscalls
- topscalls: Top system calls by number of calls
- topscalls_time: Top system calls by time
Category: Security
- list_login_shells: List the login shell IDs
- shellshock_detect: Print shellshock attacks
- spy_users: Display interactive user activity
Category: System State
- lsof: List (and optionally filter) the open file descriptors.
- netstat: List (and optionally filter) network connections.
- ps: List (and optionally filter) the machine processes.
Use the -i flag to get detailed information about a specific chisel.
For more information on a chisel you can use -i
sysdig -i $chisel_name
Use a Sysdig chisel
To use a specific chisel, just use -c and then the name
sysdig -c $chisel_name
View incoming network connections
If you want to view all incoming network connections but don't want to see a certain service listed, use the command below. Replace httpd (the Apache process name on CentOS) with the process you want to filter out.
sysdig evt.type=accept and proc.name!=httpd
Common types of syscalls
Sysdig is newer than strace, and there is a lot you can do with it. By default it shows every system call on the server, but you can filter the output down to the applications you are interested in.
For more information on how to install and use sysdig:
futex
The futex(2) man page describes this syscall as follows:
"The futex() system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address (while the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so the same memory mapped in different locations will correspond for futex() calls). This system call is typically used to implement the contended case of a lock in shared memory, as described in futex(7)."
In general, futexes were created to improve performance by avoiding system calls whenever possible, since each call can cost several hundred instructions. The other goal was to avoid unnecessary context switches.
To avoid system calls in the uncontended case, there must be shared state in user space that is accessible to all the processes and tasks involved. This shared state is known as the "user lock". It indicates whether the lock is currently held, so every process can check its status without entering the kernel.
The user lock is located in shared memory created by mmap. Because of this, it can be used by multiple processes, and it may be located at different virtual addresses in different address spaces.
A lock can therefore be identified globally by the pair [B,O]:
- B: the memory object backing the virtual address
- O: the offset within that object
There are 3 memory types:
- Anonymous memory (only usable by threads in the same process)
- Shared memory segment (usable by multiple processes)
- Memory-mapped files (usable by multiple processes)
The futex system call takes up to six arguments, but sysdig's futex event shows three of them: the address of the futex (addr=), the operation (op=) and the value (val=). For example:
futex addr=198977C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1
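The op value is a bitmask. 133 is FUTEX_WAKE_OP (5) plus FUTEX_PRIVATE_FLAG (128), which is why sysdig prints both names. Below is a sketch of a hypothetical decoder, decode_futex_op (not part of sysdig). It takes its constant values from the Linux kernel's linux/futex.h header.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysdig: decode a futex op value.
# FUTEX_PRIVATE_FLAG (128) and FUTEX_CLOCK_REALTIME (256) are flag bits;
# the remaining bits select the command (values from linux/futex.h).
decode_futex_op() {
    op=$1
    cmd=$(( op & ~(128 | 256) ))
    case $cmd in
        0)  name=FUTEX_WAIT ;;
        1)  name=FUTEX_WAKE ;;
        2)  name=FUTEX_FD ;;
        3)  name=FUTEX_REQUEUE ;;
        4)  name=FUTEX_CMP_REQUEUE ;;
        5)  name=FUTEX_WAKE_OP ;;
        6)  name=FUTEX_LOCK_PI ;;
        7)  name=FUTEX_UNLOCK_PI ;;
        8)  name=FUTEX_TRYLOCK_PI ;;
        9)  name=FUTEX_WAIT_BITSET ;;
        10) name=FUTEX_WAKE_BITSET ;;
        11) name=FUTEX_WAIT_REQUEUE_PI ;;
        12) name=FUTEX_CMP_REQUEUE_PI ;;
        *)  name="unknown($cmd)" ;;
    esac
    flags=""
    [ $(( op & 128 )) -ne 0 ] && flags="$flags|FUTEX_PRIVATE_FLAG"
    [ $(( op & 256 )) -ne 0 ] && flags="$flags|FUTEX_CLOCK_REALTIME"
    printf '%s%s\n' "$name" "$flags"
}
```

For the sample line above, decode_futex_op 133 reports the same two names sysdig shows, with the command first.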
clone
The clone syscall creates new processes and threads. It is one of the more complex system calls and can be expensive to run. If you see a large number of these calls and performance is poor, you may want to reduce how often it happens, for example by increasing process lifetime or reducing the number of processes overall.
sysdig filter for clone:
sysdig evt.type=clone
execve
This syscall executes programs; you will typically see it right after a clone syscall. Every program that gets executed goes through this call.
sysdig filter for execve:
sysdig evt.type=execve
chdir
This syscall changes the working directory of a process. If anything changes directory, you can see it by filtering on this syscall.
sysdig filter for chdir:
sysdig evt.type=chdir
open and creat
These syscalls open files and can also create them. Tracing them shows you which files are created and which processes are touching what.
sysdig filter for open and creat:
sysdig evt.type=open
sysdig evt.type=creat
connect
This syscall initiates a connection on a socket. It is how a client opens an outgoing network connection.
sysdig filter for connect. You can also specify a port or IP to view specific services or IPs:
sysdig evt.type=connect
sysdig evt.type=connect and fd.port=80
accept
This syscall accepts a connection on a listening socket. When a client calls connect to a service on this server, the server process picks up the connection with accept.
sysdig filter for accept. You can also specify a port or IP to view specific services or IPs:
sysdig evt.type=accept
sysdig evt.type=accept and fd.port=80
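read and write
These syscalls read data from or write data to a file descriptor, such as a file, socket or pipe.
sysdig filter for I/O:
sysdig evt.type=read or evt.type=write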
You can also use a chisel to view I/O for certain files, ports or programs, for example:
sysdig -c echo_fds fd.name contains /var/lib/mysql/
sysdig -c echo_fds proc.name=httpd and fd.port!=80
unlink and rename
These syscalls delete or rename files.
sysdig evt.type=unlink
sysdig evt.type=rename