Command Line Client

The command line client is a thin wrapper over the REST API. It makes command invocations more succinct and formats the responses. The executable is named scrapy-do-cl. It takes optional global parameters, followed by the name of the command to execute and that command’s parameters:

scrapy-do-cl [global parameters] command [command parameters]
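
When the client is driven from a script, the ordering shown above (global parameters first, then the command, then its parameters) can be made explicit with a small helper. The following is a minimal sketch in Python; build_command is a hypothetical helper name, not part of scrapy-do, and it only assembles the argument list without running anything.

```python
import shlex


def build_command(command, global_params=None, command_params=None):
    """Assemble a scrapy-do-cl argument list.

    Hypothetical helper for illustration; it is not part of scrapy-do.
    Global parameters go before the command name, command parameters after it.
    """
    argv = ["scrapy-do-cl"]
    for name, value in (global_params or {}).items():
        argv += [f"--{name}", str(value)]
    argv.append(command)
    for name, value in (command_params or {}).items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_command(
    "list-spiders",
    global_params={"url": "http://localhost:7654"},
    command_params={"project": "quotesbot"},
)
# Print the shell-quoted command line that the list corresponds to.
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run as-is, which avoids quoting problems with values such as scheduling specs that contain spaces.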

Global parameters and the configuration file

  • --url - the URL of the scrapy-do server, e.g. http://localhost:7654
  • --username - user name, in case the server is configured to perform authentication
  • --password - user password; if the password is neither given on the command line nor set in the configuration file, the client prompts for it in the terminal
  • --print-format - the format of the output; valid options are simple, grid, fancy_grid, presto, psql, pipe, orgtbl, jira, rst, mediawiki, html, latex; defaults to psql.
  • --verify-ssl - a boolean that determines whether SSL certificate verification is enabled; defaults to True

The defaults for some of these parameters may be specified in the scrapy-do section of the ~/.scrapy-do.cfg file. The parameters configurable this way are: url, username, password, and print-format.
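
As an illustration, the sketch below uses Python's standard configparser module to generate such a file. The section and key names come from the description above; the user name alice is a placeholder, and the INI layout is an assumption based on the .cfg extension. The file is written to a temporary directory so that an existing ~/.scrapy-do.cfg is left untouched.

```python
import configparser
import os
import tempfile

config = configparser.ConfigParser()
# Section name and keys as listed in the text; values are examples only.
config["scrapy-do"] = {
    "url": "http://localhost:7654",
    "username": "alice",  # placeholder user name (assumption)
    "print-format": "simple",
}

# Write to a scratch location instead of the real home directory.
path = os.path.join(tempfile.mkdtemp(), ".scrapy-do.cfg")
with open(path, "w") as f:
    config.write(f)

# Show the generated file contents.
with open(path) as f:
    print(f.read())
```

Leaving password out of the file, as done here, means the client falls back to prompting for it when authentication is required.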

Commands and their parameters


status

Get information about the daemon and its environment.


$ scrapy-do-cl status
| key             | value                      |
| cpu-usage       | 0.0                        |
| memory-usage    | 42.9765625                 |
| jobs-canceled   | 0                          |
| timezone        | UTC; UTC                   |
| uptime          | 1m 12d 8h 44m 58s          |
| jobs-run        | 761                        |
| status          | ok                         |
| jobs-failed     | 0                          |
| hostname        | ip-172-31-35-215           |
| time            | 2018-01-27 08:28:55.625109 |
| jobs-successful | 761                        |


push-project

Push a project archive to the server, replacing any existing project of the same name.


  • --project-path - path to the project that you intend to push; defaults to the current working directory


$ scrapy-do-cl push-project
| spiders        |
| toscrape-css   |
| toscrape-xpath |


list-projects

Get a list of the projects registered with the server.


$ scrapy-do-cl list-projects
| name      |
| quotesbot |


list-spiders

List spiders provided by the given project.


  • --project - name of the project


$ scrapy-do-cl list-spiders --project quotesbot
| name           |
| toscrape-css   |
| toscrape-xpath |


schedule-job

Schedule a job.


  • --project - name of the project
  • --spider - name of the spider
  • --when - a scheduling spec, see Scheduling Specs; defaults to now


$ scrapy-do-cl schedule-job --project quotesbot \
    --spider toscrape-css --when 'every 10 minutes'
| identifier                           |
| 2abf7ff5-f5fe-47d2-96cd-750f8701aa27 |


list-jobs

Get information about a job or jobs.


  • --status - status of the jobs to list, see Jobs; additionally, ACTIVE and COMPLETED are accepted to list jobs with related statuses; defaults to ACTIVE
  • --job-id - id of the job to list; supersedes --status

Query by status:

$ scrapy-do-cl list-jobs --status SCHEDULED
| identifier                           | project   | spider       | status    | schedule              | actor   | timestamp                  | duration   |
| 2abf7ff5-f5fe-47d2-96cd-750f8701aa27 | quotesbot | toscrape-css | SCHEDULED | every 10 minutes      | USER    | 2018-01-27 09:44:19.764036 |            |
| 0a3db618-d8e1-48dc-a557-4e8d705d599c | quotesbot | toscrape-css | SCHEDULED | every 5 to 15 minutes | USER    | 2018-01-27 08:29:24.749770 |            |

Query by id:

$ scrapy-do-cl list-jobs --job-id 2abf7ff5-f5fe-47d2-96cd-750f8701aa27
| identifier                           | project   | spider       | status    | schedule         | actor   | timestamp                  | duration   |
| 2abf7ff5-f5fe-47d2-96cd-750f8701aa27 | quotesbot | toscrape-css | SCHEDULED | every 10 minutes | USER    | 2018-01-27 09:44:19.764036 |            |


cancel-job

Cancel a job.


  • --job-id - id of the job to cancel


$ scrapy-do-cl cancel-job --job-id 2abf7ff5-f5fe-47d2-96cd-750f8701aa27


get-log

Retrieve the log file of a job that has either completed or is still running.


  • --job-id - id of the job
  • --log-type - out for standard output; err for standard error output


$ scrapy-do-cl get-log --job-id b37be5b0-24bc-4c3c-bfa8-3c8e305fd9a3 \
    --log-type err


remove-project

Remove a project.


  • --project - name of the project


$ scrapy-do-cl remove-project --project quotesbot