requestctl
requestctl is a command-line tool to control the configuration that manages access and routing of web requests. Wikimedia SREs use this tool to throttle and block certain request patterns in our edge caching, either in the HAProxy TLS terminator or in the Varnish frontend.
Get started
Tutorials
Essential concepts
requestctl uses a custom schema that defines four types of objects:
- pattern objects describe specific patterns of an HTTP request.
- ipblock objects group specific IP ranges into logical groups.
- action objects describe an action to be performed on a request that matches specific combinations of patterns and ipblocks. Actions are what is enabled on Varnish.
- haproxy_action objects are similar to action objects, but allow a different set of actions because of the capabilities of haproxy.
For more details of how requestctl works, read the Overview.
User guide
For a full list of commands, see the Command line reference.
Identify problematic traffic patterns
Use one or more of the following tools to find and confirm the traffic patterns you need to block:
- Filter requests in Turnilo and Superset live data, based on which requestctl actions match them
- Inspect the current or potential impact of requestctl actions in Turnilo/Superset
- Check Varnish logs for matching requests
Look for existing request patterns
To explore patterns that already exist, you can:
- Check the git repository dumped on the puppetservers at /srv/git/conftool/auditlog; all requestctl objects there have a request- prefix.
- Query the backend using requestctl's get command:
# All patterns
requestctl get pattern
# Request a specific pattern, like 'ua/requests'
requestctl get pattern ua/requests
To list all actions to which a pattern is applied, use the find command. For example:
$ requestctl find ua/requests
action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
haproxy_action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
Add a new pattern
If no existing request pattern matches what you need, add a new one by creating a YAML file on any puppetserver. The pattern object described by your file should capture, with good flexibility, most of the characteristics you want to match in a request.
See the tutorial for examples of adding new patterns.
You can declare entries in your YAML file for the following fields:
- method: the HTTP method.
- request_body: a regex to match in the HTTP body. CURRENTLY UNSUPPORTED IN VARNISH.
- url_path: the path part of the URL, used as a regexp.
- header: a header name to match, using the regexp in header_value.
- header_value: the regexp to match the value of header against. If left blank when a header is defined, the pattern means “the header is not present”.
- query_parameter and query_parameter_value: a parameter and a regexp for the value of a query parameter to match. An empty value will be interpreted as “for any value”.
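The field semantics above can be illustrated with a small Python model. This is only a sketch and not requestctl's implementation: the Pattern class and matches function are hypothetical names, combining all set fields with AND and using unanchored regex search are assumptions, and request_body is left out because it is unsupported in Varnish.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern:
    """Illustrative stand-in for a requestctl pattern (not the tool's own class)."""
    method: str = ""
    url_path: str = ""
    header: str = ""
    header_value: str = ""
    query_parameter: str = ""
    query_parameter_value: str = ""


def matches(pattern: Pattern, method: str, path: str,
            headers: dict[str, str], query: dict[str, str]) -> bool:
    """Return True if the request satisfies every field set on the pattern."""
    if pattern.method and pattern.method != method:
        return False
    if pattern.url_path and not re.search(pattern.url_path, path):
        return False
    if pattern.header:
        value = headers.get(pattern.header)
        if pattern.header_value == "":
            # Blank header_value with a header set: the header must be absent.
            if value is not None:
                return False
        elif value is None or not re.search(pattern.header_value, value):
            return False
    if pattern.query_parameter:
        value = query.get(pattern.query_parameter)
        if value is None:
            return False
        # An empty query_parameter_value means "any value".
        if pattern.query_parameter_value and not re.search(
                pattern.query_parameter_value, value):
            return False
    return True


# Same fields as the ua/requests pattern shown elsewhere on this page.
ua_requests = Pattern(header="User-Agent", header_value="^python-requests")
print(matches(ua_requests, "GET", "/w/api.php",
              {"User-Agent": "python-requests/2.31"}, {}))
```

Header lookup here is case-sensitive for brevity; real HTTP header names are case-insensitive.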
Each pattern has an associated “scope” tag, which is an arbitrary grouping of different patterns. For example, the "ua" scope applies to patterns matching specific user agents.
Sync the pattern object to etcd
After you create the YAML file, use the apply command to sync to etcd so you can use the pattern in an action:
puppetserver1001:~$ sudo requestctl apply pattern <scope>/<name> -f <yaml_file>
Add a new ipblock
ipblocks let you group specific IP ranges into logical groups. For example: the ipblock with scope=cloud,name=aws includes all the IP ranges used by AWS.
To add a new ipblock: on any puppetserver, create a YAML file with the entries you want to declare. ipblocks have two fields:
- comment: a concise and precise description of the ipblock
- cidrs: a list of IPv4 or IPv6 CIDRs
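Before writing an ipblock file, it can help to sanity-check the two fields locally. The sketch below is a hypothetical helper, not part of requestctl: validate_ipblock is an invented name, the example data uses documentation-reserved ranges, and rejecting CIDRs with host bits set is an assumption about what counts as a clean entry.

```python
import ipaddress


def validate_ipblock(ipblock: dict) -> list[str]:
    """Return a list of problems found in an ipblock definition (empty if none)."""
    problems = []
    if not str(ipblock.get("comment", "")).strip():
        problems.append("comment is empty")
    cidrs = ipblock.get("cidrs") or []
    if not cidrs:
        problems.append("cidrs is empty")
    for cidr in cidrs:
        try:
            # strict=True rejects networks with host bits set, e.g. 192.0.2.1/24
            ipaddress.ip_network(cidr, strict=True)
        except ValueError as exc:
            problems.append(f"invalid CIDR {cidr!r}: {exc}")
    return problems


# Hypothetical ipblock using documentation-reserved ranges (RFC 5737 / RFC 3849).
example = {
    "comment": "Example abusive crawler (hypothetical)",
    "cidrs": ["192.0.2.0/24", "2001:db8::/32"],
}
print(validate_ipblock(example))
```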
See the tutorial for an example of adding an ipblock.
When adding a new ipblock, you should typically add it to one of the three scopes (basically: categories) that already exist:
- abuse: this category should gather all the small groups of abusers we add manually. Each list should be kept reasonably small as they're implemented as ACLs in varnish, and not netmapper files, which are more efficient.
- known-clients: this category should include most large client groups that are not cloud providers. This includes e.g. googlebot. Some of these are updated automatically.
- cloud: this category should only include cloud providers and all entries should be updated automatically.
Example scenario: a couple of annoying clients are causing issues and you want to create a new ipblock for them. Because it's just two IPs, and you're defining a bespoke rule based on abusive behaviour, they should go in the abuse category.
To add a new scope (category), you must contact the Traffic team, because that requires modifications to both varnish and haproxy configurations.
Sync the ipblock object to etcd
After you create the YAML file, use the apply command to sync to etcd so you can use the ipblock in an action:
puppetserver1001:~$ sudo requestctl apply ipblock <scope>/<name> -f <yaml_file>
Define an action
Action objects define what happens to the traffic matching a given pattern: should matching traffic be blocked, or rate limited? What HTTP status should be served? What message?
Actions are dumped to YAML files on the puppetservers at /srv/git/conftool/auditlog/request-actions.
- To list all or a subset of actions, use the get command.
- To modify an existing action, use the modify command.
- To define a new action, create a new YAML file on any puppetserver, then run apply to sync to etcd, enable on Varnish, and commit.
Action objects are associated with a specific cluster (cache-text or cache-upload at the time of writing) and have a name. Their fields are:
- enabled: boolean. If false, the action will not be included in VCL.
- sites: a list of datacenters where to apply the rule. If empty, the rule will be applied to all datacenters.
- cache_miss_only: boolean. If false, the action will also be applied to cache hits. Not applicable to haproxy_actions.
- comment: a comment to describe what this action does.
- resp_status: the HTTP status code to send as a response.
- resp_reason: the text to send as a reason with the response.
- do_throttle: boolean that says whether to throttle requests matching the expression (true) or just respond with resp_status unconditionally (false).
- throttle_requests, throttle_interval, throttle_duration: the three arguments of vsthrottle in VCL that control the rate-limiting behaviour. Not available for haproxy_actions.
- throttle_per_ip: boolean. Makes the rate-limiting per-IP rather than per-cache-server. Not available for haproxy_actions.
- log_matching: if true, records in X-Requestctl whether a request matches the rule. The action will thus be included in the vcl objects even if disabled; it will just not perform any banning or rate-limiting action.
- expression: a string describing the combination of patterns and ipblocks that should be matched. The BNF of the grammar is described in cli.Requestctl.grammar, but in short:
  - A pattern is referenced with the keyword pattern@<scope>/<name>
  - An ipblock is referenced with the keyword ipblock@<scope>/<name>
  - Patterns and ipblocks can be combined with AND, AND NOT, OR, OR NOT logic.
  - Organize statements into groups by using parentheses.
Example valid expressions:
( pattern@ua/requests OR pattern@ua/curl ) AND ipblock@cloud/aws AND NOT pattern@site/commons
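To see which objects an expression depends on, the references can be pulled out with a regular expression. This is a rough Python sketch, not the grammar in cli.Requestctl.grammar: the regex and the references function are assumptions, and the sketch only assumes that scopes and names consist of letters, digits, underscores, and hyphens.

```python
import re

# Matches references such as pattern@ua/requests or ipblock@known-clients/googlebot.
REFERENCE = re.compile(r"\b(pattern|ipblock)@([\w-]+/[\w-]+)")


def references(expression: str) -> list[tuple[str, str]]:
    """Return (object_type, scope/name) pairs in order of appearance."""
    return REFERENCE.findall(expression)


expr = ("( pattern@ua/requests OR pattern@ua/curl ) "
        "AND ipblock@cloud/aws AND NOT pattern@site/commons")
for kind, name in references(expr):
    print(kind, name)
```

The sketch deliberately ignores the AND / OR / NOT operators and parentheses; it lists dependencies and does not evaluate the expression.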
Sync action objects to etcd
After you create an action object in a file, sync it to the datastore:
puppetserver1001:~$ sudo requestctl apply action <cluster>/<name> -f <yaml_file>
Enable / disable and commit action
To actually get changes to an action injected into the Varnish configuration, run enable or disable and then commit to etcd the resulting VCL snippet:
# Writes to the datastore, needs sudo
sudo requestctl enable cache-text/generic_ua_clouds && sudo requestctl commit
sudo requestctl disable cache-text/generic_ua_clouds && sudo requestctl commit
Define haproxy_actions
To act on requests before they touch the caching layer, you must inject actions in the HAProxy configuration instead of Varnish. Haproxy_action objects are very similar to action objects, and share many (but not all) of the same fields. Because they're targeting HAProxy, they allow you to perform different actions on the request. Specifically:
- No rate-limiting is enforced at this level, as we were wary of the performance implications of adding potentially one stick-table per rule. Rate-limiting will have to happen at the Varnish layer, so none of the throttle_* fields are present.
- Given we're not caching anything in HAProxy, cache_miss_only has no meaning.
- silent_drop: HAProxy can either deny or silently drop a request. To silently drop a request, set silent_drop: true.
- HAProxy can limit the bandwidth available for requests that match a certain pattern. This is controlled by:
  - bw_throttle (boolean)
  - bw_throttle_rate (the rate limit in bandwidth)
  - bw_duration (duration of the limit)
To define a new haproxy action, create a YAML file on any puppetserver, then run apply to sync to etcd, enable and commit.
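When a haproxy_action starts life as a copy of a Varnish action, it is easy to leave Varnish-only fields behind. The following Python sketch flags them. It is a hypothetical local check, not something requestctl provides: varnish_only_fields and the candidate data are invented for illustration, and the field set is taken from the fields this page marks as not available or not applicable for haproxy_actions.

```python
# Fields this page marks as not available / not applicable for haproxy_actions.
VARNISH_ONLY_FIELDS = {
    "cache_miss_only",
    "throttle_requests",
    "throttle_interval",
    "throttle_duration",
    "throttle_per_ip",
}


def varnish_only_fields(action: dict) -> list[str]:
    """Return the Varnish-only fields present in a would-be haproxy_action."""
    return sorted(VARNISH_ONLY_FIELDS.intersection(action))


# Hypothetical definition copied from a Varnish action and only partly adapted.
candidate = {
    "enabled": False,
    "comment": "Silently drop requests from an example client",
    "expression": "pattern@ua/requests AND ipblock@cloud/aws",
    "silent_drop": True,
    "cache_miss_only": True,
    "throttle_per_ip": False,
}
print(varnish_only_fields(candidate))
```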
List existing haproxy_actions
# All actions.
requestctl get haproxy_action -o yaml
# A specific action
requestctl get haproxy_action cache-text/generic_ua_clouds
# All enabled actions
requestctl get haproxy_action -o json | jq 'to_entries[] | select(.value.enabled == true)'
Sync haproxy_action objects to etcd
After you create an action object in a file, sync it to the datastore:
puppetserver1001:~$ sudo requestctl apply haproxy_action <cluster>/<name> -f <yaml_file>
Enable / disable and commit haproxy_action
To actually get changes to an action injected into the HAProxy configuration, run enable or disable and then commit to etcd the resulting DSL snippet:
# Writes to the datastore, needs sudo
sudo requestctl enable -s haproxy cache-text/generic_ua_clouds && sudo requestctl commit
sudo requestctl disable -s haproxy cache-text/generic_ua_clouds && sudo requestctl commit
Remove an object
If you're removing an action object, be sure to follow it with a commit to avoid issues later on if referenced objects (e.g., ipblocks) are deleted while the action remains live in the production config. For example:
sudo requestctl delete action cache-text/some-action-to-delete
sudo requestctl commit
If you're removing a pattern / ipblock, first ensure it's not referenced by any action object:
# Find all actions containing a specific pattern or ipblock - both will be searched!
requestctl find ua/foobar
# Same for haproxy-actions
requestctl find -s haproxy ua/foobar
requestctl doesn't allow you to remove a pattern / ipblock if it's still referenced in an action: it will terminate with exit code 1 and print an error like pattern ua/foobar is used by the following action: cache-text/baz.
Once you've ensured your object can be removed, you can run:
sudo requestctl delete pattern ua/foobar
Note: requestctl will happily remove a pattern / ipblock if an action that references them has been deleted, but there has been no subsequent commit. As noted above, be sure to promptly commit after deleting the action to avoid this pitfall.
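The reference check described above can be modelled in a few lines of Python. This is only an illustration of the idea, not requestctl's own logic: referencing_actions and the action data are hypothetical, and expressions are assumed to use the pattern@<scope>/<name> and ipblock@<scope>/<name> syntax.

```python
import re


def referencing_actions(obj_type: str, name: str,
                        actions: dict[str, str]) -> list[str]:
    """Return names of actions whose expression references obj_type@name."""
    # The lookahead stops ua/foo from matching inside ua/foobar.
    needle = re.compile(rf"\b{re.escape(obj_type)}@{re.escape(name)}(?![\w/-])")
    return sorted(a for a, expr in actions.items() if needle.search(expr))


# Hypothetical mapping of action name -> expression.
actions = {
    "cache-text/baz": "pattern@ua/foobar AND ipblock@cloud/aws",
    "cache-text/qux": "pattern@ua/curl",
}

blockers = referencing_actions("pattern", "ua/foobar", actions)
if blockers:
    print("pattern ua/foobar is still used by:", ", ".join(blockers))
else:
    print("safe to delete pattern ua/foobar")
```

A check like this only sees the action definitions it is given, which is why an action that was deleted but not yet committed no longer blocks the removal.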
Modify and sync any object
- SSH to a puppetserver frontend.
- Find the YAML file corresponding to your object under /srv/git/conftool/auditlog; make a copy.
- Modify the copied file.
- Run sudo requestctl apply <object-type> <tag>/<name> -f <filename>.
- If you've modified an action object or a haproxy_action object, don't forget to inject the change into the production systems by running sudo requestctl commit.
Command line reference
The ultimate source of truth for requestctl commands is the code in GitLab.
apply
Writes (or "syncs") an object to the datastore from data in a YAML file.
$ sudo requestctl apply haproxy_action cache-upload/bwlimit_google_cloud -f file.yaml
The enabled field in actions is explicitly excluded from syncing.
If you've modified an action object or a haproxy_action object, don't forget to also inject the change into the production systems by running sudo requestctl commit.
commit
Commits changes to actions to the compiled datastores. Interactive by default; pass -b if you want to run in batch mode.
$ requestctl enable cache-text/requests_ua_api
$ requestctl commit
--- cache-text/global.old
+++ cache-text/global.new
@@ -1,3 +1,12 @@
+
+// FILTER requests_ua_api
+// Disallow python-requests to access restbase or the action api
+// This filter is generated from data in etcd. To disable it, run the following command:
+// sudo requestctl disable 'cache-text/requests_ua_api'
+if (req.http.User-Agent ~ "^python-requests" && (req.url ~ "^/api/rest_v1/" || req.url ~ "/w/api.php") && vsthrottle.is_denied("requestctl:requests_ua_api", 500, 30s, 1000s)) {
+ set req.http.Requestctl = req.http.Requestctl + ",requests_ua_api";
+ return (synth(429, "Please see our UA policy"));
+}
+
// FILTER enwiki_api_cloud
// Limit access to the enwiki api from the clouds
==> Ok to commit these changes?
Type "go" to proceed or "abort" to interrupt the execution
>
Varnish rules that also apply to cache hits will be shown as cache-text/hit-global or cache-text/hit-<datacenter> when committing.
Once all Varnish changes are merged, the haproxy actions will be committed as well.
dump
For each category of objects, copies the directory from a specific repository:
# This generates a tree of action objects under dump_dir
$ requestctl dump -g dump_dir action
Parameters:
- -g, --git-repo: identifies the base directory
enable / disable
Enables / disables actions. Note: the enabled field in actions is explicitly excluded from syncing with the apply command.
$ requestctl enable cache-text/foobar # enables cache-text/foobar in varnish
$ requestctl disable -s varnish cache-text/foobar # disables the same action in Varnish.
$ requestctl enable -s haproxy cache-text/foobar # enables a similarly named haproxy_action
find
Finds which actions include a specific pattern or ipblock:
requestctl find pattern
Example:
$ requestctl find ua/requests
action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
haproxy_action: generic_ua_aws, expression: (pattern@ua/requests OR pattern@ua/curl) AND ipblock@cloud/aws
find-ip
Finds which ipblocks include a specific IP address:
$ sudo requestctl find-ip -g /srv/git/conftool/auditlog 99.117.1.2
IP 99.117.1.2 is not part of any ipblock on disk
$ sudo requestctl find-ip -g /srv/git/conftool/auditlog 116.129.226.12
IP 116.129.226.12 is part of prefix 116.129.226.0/25 in ipblock cloud/aws
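The lookup that find-ip reports can be approximated with Python's ipaddress module. This is a sketch of the idea, not the tool's code: the find_ip function and the in-memory ipblocks mapping are hypothetical, and the data simply reuses the prefix from the example output above.

```python
import ipaddress


def find_ip(ip: str, ipblocks: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (ipblock name, matching prefix) pairs that contain the address."""
    address = ipaddress.ip_address(ip)
    hits = []
    for name, cidrs in ipblocks.items():
        for cidr in cidrs:
            network = ipaddress.ip_network(cidr)
            # An IPv4 address cannot belong to an IPv6 prefix (and vice versa).
            if network.version == address.version and address in network:
                hits.append((name, cidr))
    return hits


# Toy data reusing the prefix from the example output above.
ipblocks = {"cloud/aws": ["116.129.226.0/25"]}
print(find_ip("116.129.226.12", ipblocks))
print(find_ip("99.117.1.2", ipblocks))
```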
get
Gets the data from the datastore and displays them in the desired format. Can be used to fetch all objects or just one. See the sections below for example usage with different object types.
get action
# All actions.
requestctl get action -o yaml
# A specific action
requestctl get action cache-text/generic_ua_clouds
# All enabled actions
requestctl get action -o json | jq 'to_entries[] | select(.value.enabled == true)'
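Where jq is not at hand, the same selection can be done in Python. The sketch below assumes, as the jq filter implies, that the JSON output is an object keyed by action name whose values carry an enabled field. The sample document and the enabled_actions function are hypothetical, trimmed-down stand-ins for real output.

```python
import json

# Hypothetical, trimmed-down stand-in for `requestctl get action -o json` output.
raw = """
{
  "cache-text/generic_ua_clouds": {"enabled": true, "resp_status": 429},
  "cache-text/enwiki_api_cloud": {"enabled": false, "resp_status": 429}
}
"""


def enabled_actions(document: str) -> list[str]:
    """Return the names of actions whose "enabled" field is true."""
    actions = json.loads(document)
    return sorted(name for name, body in actions.items()
                  if body.get("enabled") is True)


print(enabled_actions(raw))
```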
get haproxy_action
# All actions.
requestctl get haproxy_action -o yaml
# A specific action
requestctl get haproxy_action cache-text/generic_ua_clouds
# All enabled actions
requestctl get haproxy_action -o json | jq 'to_entries[] | select(.value.enabled == true)'
get ipblock
$ requestctl get ipblock -o json | jq -r 'keys[]'
abuse/blocked_nets
abuse/bot_blocked_nets
abuse/bot_posts_blocked_nets
abuse/phabricator_abusers
abuse/text_abuse_nets
cloud/akamai
cloud/aws
cloud/azure
cloud/digitalocean
cloud/gcp
cloud/linode
cloud/oci
cloud/public_cloud_nets
known-clients/googlebot
get pattern
$ requestctl get pattern
+------------------------+-------------------------------+
| name                   | pattern                       |
+------------------------+-------------------------------+
| cache-text/docroot     | url:^/[\?$]                   |
| cache-text/bad_param_q | ?q=\w{12}                     |
| cache-text/enwiki      | Host: en.wikipedia.org        |
| cache-text/restbase    | url:^/api/rest_v1/            |
| cache-text/action_api  | url:/w/api.php                |
| cache-text/requests_ua | User-Agent: python-requests.* |
| cache-text/wiki_page   | url:/wiki/[^:]+(\?$)          |
| ua/requests            | User-Agent: ^python-requests  |
+------------------------+-------------------------------+
$ requestctl get pattern ua/requests -o json | jq .
{
  "ua/requests": {
    "method": "",
    "request_body": "",
    "url_path": "",
    "header": "User-Agent",
    "header_value": "^python-requests",
    "query_parameter": "",
    "query_parameter_value": ""
  }
}
$ requestctl get pattern ua/requests -o yaml
ua/requests:
  header: User-Agent
  header_value: ^python-requests
  method: ''
  query_parameter: ''
  query_parameter_value: ''
  request_body: ''
  url_path: ''
haproxycfg
Outputs the haproxy configuration fragment generated from the haproxy_action:
$ requestctl haproxycfg cache-text/requests_ua_api
# ACLs generated for requestctl actions
acl ua_python_requests hdr_reg(User-Agent) -i "^python\-requests"
acl url_rest_api path_reg -i "^/api/rest_v1/"
acl url_action_api path_reg -i "/w/api.php"
# requestctl haproxy action cache-text/requests_ua_api
# Disallow python-requests to access restbase or the action api
# This action is generated from data in etcd. To disable it, run the following command:
# sudo requestctl disable -s haproxy 'cache-text/requests_ua_api'
http-request deny status 429 reason "Please see our UA policy" if ua_python_requests url_rest_api || url_rest_api url_action_api
log
Outputs the varnishncsa command to run on a cache host to see requests matching a given action.
$ requestctl log cache-text/requests_ua_api
You can monitor requests matching this action using the following command:
sudo varnishncsa -n frontend -g request \
-F '"%{X-Client-IP}i" %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" "%{X-Public-Cloud}i"' \
-q 'ReqHeader:User-Agent ~ "^python-requests" and ( ReqURL ~ "^/api/rest_v1/" or ReqURL ~ "/w/api.php" ) and not VCL_ACL eq "MATCH wikimedia_nets"'
There is no corresponding functionality for haproxy actions.
validate
Validates objects written in a directory tree. Useful for testing new actions.
$ for what in pattern ipblock action haproxy_action; do requestctl dump -g base_dir $what; done
# Edit whatever action/haproxy_action you want to modify
$ edit base_dir/requestctl-actions/foo/bar.yaml
$ requestctl validate base_dir
$ requestctl apply action foo/bar -f base_dir/requestctl-actions/foo/bar.yaml
It will exit with non-zero exit status if any error is present.
vcl
Outputs the vcl fragment generated from the action.
$ requestctl vcl cache-text/requests_ua_api
// FILTER requests_ua_api
// Disallow python-requests to access restbase or the action api
// This filter is generated from data in etcd. To disable it, run the following command:
// sudo requestctl disable 'cache-text/requests_ua_api'
if (req.http.User-Agent ~ "^python-requests" && (req.url ~ "^/api/rest_v1/" || req.url ~ "/w/api.php") && vsthrottle.is_denied("requestctl:requests_ua_api", 500, 30s, 1000s)) {
return (synth(429, "Please see our UA policy"));
}
Code reference
Auditlog repository
In production, Conftool2git synchronizes requestctl objects to a git repository on the puppetservers under /srv/git/conftool/auditlog. Don't update object definitions inside that repository; instead, copy the file over to your home directory and run apply. Your changes modify data that resides in our main Etcd cluster under /conftool/v1/request-{ipblock,action,pattern}s/.
Schema
Related playbooks
- (restricted access) (D)Dos Playbook