We live in a rainbow of chaos. -Paul Cézanne
This post was intended to go up sooner, but life and work being what they are, I fell a bit behind. It is here now, though, and continues through my experiences with Gremlin and Chaos Engineering.
If you have not visited their site and requested a demo of Gremlin, this next bit will be incredibly difficult: running the commands requires a Team ID and a corresponding Team Secret.
Alright, there are a few basics to cover real quick before we dive into the actual attacks. First is how you will monitor the attacks. If you are on a Linux or Mac machine locally, this is an easy task because you can simply open a second terminal session and connect to your server for monitoring purposes. Windows folks, we have a bit of explaining to do.
There are two simple options at this point, as PuTTY does not play well with multiple terminals. They are tmux and screen. Since both do roughly the same thing, for the purpose of this discussion I am going to show just enough commands to use tmux effectively and quickly.
TMUX: QUICK & DIRTY
First, create a new tmux session named “gremlin”:
tmux new -s gremlin
Now, this will launch into a tmux screen. You should see the name of your tmux session in the lower left corner. In order to detach from the session, but leave the work in that session running, do the following:
Press <CTRL> and “b”, release and then press “d”.
You may see commands similar to this represented as:
^b d
The “^” before the “b” means to press the <CTRL> key along with the letter. Just some fancy shorthand.
In order to reattach to the session:
tmux attach-session -t gremlin
This will now give you the ability to have a second session to run tests inside of and then jump back to our main session for monitoring. Now to catch back up with everyone else.
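If you jump in and out of the session a lot, the create and reattach steps can be folded into one helper. This is just a minimal sketch: the function name attach_or_create is my own invention, and it assumes tmux is installed and on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of tmux or Gremlin): attach to a tmux
# session, creating it first if it does not exist yet.
attach_or_create() {
    session="${1:-gremlin}"   # default to the session name used in this post

    # has-session exits non-zero when no session with that name exists.
    if ! tmux has-session -t "$session" 2>/dev/null; then
        # -d creates the session detached so we can attach explicitly below.
        tmux new-session -d -s "$session"
    fi

    tmux attach-session -t "$session"
}
```

Calling attach_or_create with no arguments lands you in the "gremlin" session whether or not it already exists.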
If you have not already, please go ahead and follow the short walk-through over here to get the base install created. Building on that foundation, we need to add one more package to help with monitoring and then initialize our Gremlin application.
The package we need is called “iotop” and it shows disk read/write usage. This will be important when we run the disk attacks later. To install it onto the Ubuntu 18.04 server it is just a matter of:
apt install iotop
After that is done, we need to run the init process with Gremlin. When you do this, it will ask for your Team ID and Team Secret. If you have not requested a demo, then you will need to do so at this point.
Now that all the planning and prepping is done, on to the fun stuff!
All of the attacks have their own help file that can be accessed with:
gremlin help attack <TYPE>
If you can’t remember the attack name, try the following to see a list:
gremlin help attack
In this post I am just going to cover the 4 Resource attacks: CPU, Disk, I/O and Memory. Each one will include its help file at the start, then any notes I made while playing with them, an example command and finally a way to monitor the command in action.
gremlin help attack cpu
Usage: gremlin attack cpu [-l LENGTH] [-c CORES]
An attack which consumes CPU resources
Options:
  -l, --length LENGTH  The length of the attack (seconds)
  -c, --cores CORES    The number of cores to try to utilize
If you do not know how many cores your server has, you can use top to see them. Once the system information is pulled up, hit the number 1 and the cores will be listed at the top of the screen under the section “%Cpu”. The count starts with 0.
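If you would rather get the core count without going through top, something like the following should work on a Linux box. It is only a sketch: it assumes the nproc utility from GNU coreutils is installed, and nproc reports the cores available to the current process, which can be fewer than the machine physically has.

```shell
#!/bin/sh
# Assumption: nproc (GNU coreutils) is available, as it is on stock Ubuntu.
cores="$(nproc)"

# top numbers the cores from 0, so the last one is cores - 1.
echo "This machine reports ${cores} core(s); top lists them as 0 through $((cores - 1))."
```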
Here is a simple attack that lasts sixty seconds and stresses a single core:
gremlin attack cpu -l 60 -c 1
To monitor this, in your alternate session, go back to top and pull up the core list again. You should see one of the cores gaining a fair amount of usage over the course of sixty seconds.
gremlin help attack disk
Usage: gremlin attack disk [-l LENGTH] [-d DIR] [-w WORKERS] [-b SIZE] [-p PERCENTAGE]
An attack which consumes disk resources
Options:
  -l, --length LENGTH       The length of the attack (seconds)
  -d, --dir DIR             The root directory to run the disk attack
  -w, --workers WORKERS     The number of disk-write workers to execute
  -b, --block-size SIZE     Number of Kilobytes (KB) that are read/written at a time
  -p, --percent PERCENTAGE  Percent of Volume to fill (0-100)
Some recommendations from my mistakes, if you do not have a specific use-case planned out ahead of time:
1) Use /tmp as the root directory. That way, if something does go wrong and the server crashes, it will empty the directory automatically on reboot.
2) Unless you have partitioning set up so that your entire disk drive does not get filled, do not set the percentage to 100.
gremlin attack disk -l 60 -d /tmp -w 1 -b 4 -p 75
To monitor this, use df -h to keep an eye on disk space and, to avoid having to type it over and over, work in the watch command:
watch -n5 df -h
This reruns the command every five seconds until you break out of it (^c).
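Before launching a disk attack, it can help to check which volume the target directory actually lives on and how full it already is. If /tmp turns out to sit on the root volume, filling it means filling your whole disk. The sketch below is my own; the two function names are hypothetical, and it assumes a df that supports the POSIX -P output format and a mount point without spaces in its name.

```shell
#!/bin/sh
# Hypothetical helpers built on POSIX df output (df -P):
# columns are Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.

# Print how full the volume holding a directory is, as a bare number.
volume_use_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print the mount point of the volume holding a directory.
volume_mount_point() {
    df -P "$1" | awk 'NR==2 { print $6 }'
}

dir=/tmp
echo "$dir lives on $(volume_mount_point "$dir"), which is $(volume_use_percent "$dir")% full"
```

If the mount point printed is /, treat the -p value as a percentage of your root volume and pick it accordingly.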
gremlin help attack io
Usage: gremlin attack io [-l LENGTH] [-w WORKERS] [-d DIR] [-m MODE] [-s SIZE] [-c COUNT]
An attack which consumes IO resources
Options:
  -l, --length LENGTH      The length of the attack (seconds)
  -w, --workers WORKERS    The number of io workers to execute
  -d, --dir DIR            The root directory to run the io attack
  -m, --mode MODE          The io mode to execute [r,w,rw]
  -s, --block-size SIZE    Number of Kilobytes (KB) that are read/written at a time
  -c, --block-count COUNT  The number of blocks read/written by workers
Just like with the Disk attack, it is best to have this target the /tmp directory. Also, on a smaller cloud server, it is best to keep the block-size around 4 KB and simply increase the worker count if you want to hyper-stress the server.
gremlin attack io -l 60 -w 1 -d /tmp -m rw -s 4 -c 1
Now, this is the part where iotop comes into play. We will use it to monitor the read and write levels on the disk in our alternate session.
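To make the "keep the block size, raise the workers" idea concrete, here is a small dry-run sketch. It only prints the commands instead of running them, so nothing gets attacked until you copy one out. The function name io_attack_cmd and the worker counts 1, 2 and 4 are arbitrary choices of mine; the flags are the ones from the help output above.

```shell
#!/bin/sh
# Hypothetical helper: build an io attack command for a given worker count.
# Block size stays at 4 KB and only -w changes between runs.
io_attack_cmd() {
    echo "gremlin attack io -l 60 -w $1 -d /tmp -m rw -s 4 -c 1"
}

# Dry run: print one command per worker count (counts chosen for illustration).
for workers in 1 2 4; do
    io_attack_cmd "$workers"
done
```

Run them one at a time with iotop open in the other session and compare the read and write levels between runs.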
gremlin help attack memory
Usage: gremlin attack memory [-l LENGTH] [-g GBS] [-m MBS]
An attack which consumes memory
Options:
  -l, --length LENGTH  The length of the attack (seconds)
  -g, --gigabytes GBS  The number of gigabytes to allocate
  -m, --megabytes MBS  The number of megabytes to allocate
The only thing to really pay attention to here is that -g and -m cannot both be used in the same command. It will default to MBs, regardless of the order the two are placed in the command.
gremlin attack memory -l 60 -g 1
Similar to the disk monitoring, we will use the watch command again:
watch -n5 free -m
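It is also worth checking how much memory is actually free before choosing a size, so the attack does not ask for more than the server has. A rough sketch, assuming a Linux kernel that exposes MemAvailable in /proc/meminfo; the function name available_mb and the planned size are my own placeholders.

```shell
#!/bin/sh
# Hypothetical helper: report available memory in MB.
# Assumes /proc/meminfo has a MemAvailable line (value is in kB).
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

planned_mb=1024   # placeholder: the same size as the 1 GB example
avail="$(available_mb)"

if [ "$planned_mb" -ge "$avail" ]; then
    echo "warning: ${planned_mb} MB requested but only ${avail} MB available"
else
    echo "ok: ${planned_mb} MB requested, ${avail} MB available"
fi
```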
That covers the basics of the first four attacks. Of course, all of these can be launched from inside the dashboard, but personally, I like tearing apart the commands in the terminal first. I believe that once I have a grasp of how the process works, then I can use a GUI or dashboard, if I really feel it will save me time. Until next time!