REST API
This section describes the REST API provided by Scrapy Do. The responses to all requests except get-log are JSON dictionaries. Error responses look like this:
{ "msg": "Error message", "status": "error" }
Successful responses have the status key set to ok, plus a variety of query-dependent keys described below. The request examples use curl and jq.
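The status convention above can be checked in client code before any other key is read. The following is a minimal sketch, not part of Scrapy Do itself: the names `ScrapyDoError` and `parse_response` are hypothetical, and the sketch assumes the response body has already been fetched as text.

```python
import json


class ScrapyDoError(Exception):
    """Hypothetical exception for responses whose status is not "ok"."""


def parse_response(body):
    """Decode a JSON response body; raise if the daemon reported an error."""
    data = json.loads(body)
    if data.get("status") != "ok":
        # Error responses carry their explanation under the "msg" key.
        raise ScrapyDoError(data.get("msg", "unknown error"))
    return data


# A successful response exposes its query-dependent keys directly.
result = parse_response('{"status": "ok", "projects": ["quotesbot"]}')
print(result["projects"])
```

Raising an exception keeps the error path separate from the query-dependent keys, so callers only ever see dictionaries whose status is ok.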
status.json
Get information about the daemon and its environment.
- Method: GET
Example request:
$ curl -s "http://localhost:7654/status.json" | jq -r
{ "status": "ok", "memory-usage": 39.89453125, "cpu-usage": 0, "time": "2017-12-11 15:20:42.415793", "timezone": "CET; CEST", "hostname": "host", "uptime": "1d 12m 24s", "jobs-run": 24, "jobs-successful": 24, "jobs-failed": 0, "jobs-canceled": 0 }
push-project.json
Push a project archive to the server, replacing any existing project of the same name.
- Method: POST
- Parameters:
  - archive: a binary buffer containing the project archive
$ curl -s http://localhost:7654/push-project.json \
    -F archive=@quotesbot.zip | jq -r
{ "status": "ok", "name": "quotesbot", "spiders": [ "toscrape-css", "toscrape-xpath" ] }
list-projects.json
Get a list of the projects registered with the server.
- Method: GET
$ curl -s http://localhost:7654/list-projects.json | jq -r
{ "status": "ok", "projects": [ "quotesbot" ] }
list-spiders.json
List spiders provided by the given project.
- Method: GET
- Parameters:
  - project: name of the project
$ curl -s "http://localhost:7654/list-spiders.json?project=quotesbot" | jq -r
{ "status": "ok", "project": "quotesbot", "spiders": [ "toscrape-css", "toscrape-xpath" ] }
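For the GET endpoints, parameters travel in the query string, as in the curl example above. The sketch below shows one way to build such URLs with proper escaping; the helper name `endpoint_url` is hypothetical, and the default base address is simply the one used in the examples on this page.

```python
from urllib.parse import urlencode


def endpoint_url(endpoint, base="http://localhost:7654", **params):
    """Build the URL for a GET endpoint, escaping any query parameters."""
    query = urlencode(params)
    url = f"{base}/{endpoint}"
    # Append the query string only when parameters were supplied.
    return f"{url}?{query}" if query else url


print(endpoint_url("list-spiders.json", project="quotesbot"))
print(endpoint_url("list-projects.json"))
```

Using `urlencode` rather than string concatenation matters once a project name contains spaces or other characters that are not URL-safe.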
schedule-job.json
Schedule a job.
- Method: POST
- Parameters:
  - project: name of the project
  - spider: name of the spider
  - when: a scheduling spec, see Scheduling Specs
  - description: a short description of the job instance (optional)
  - payload: a valid JSON object holding a user-specified payload that will be passed as a Scrapy named argument to the spider code (optional)
$ curl -s http://localhost:7654/schedule-job.json \
    -F project=quotesbot \
    -F spider=toscrape-css \
    -F "when=every 10 minutes" | jq -r
{ "status": "ok", "identifier": "5b30c8a2-42e5-4ad5-b143-4cb0420955a5" }
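Because the payload parameter must be a valid JSON object, a client has to serialize it before sending the form. The following sketch only assembles the form fields and leaves the HTTP request to whatever client is in use; the function name `schedule_job_fields` is hypothetical, and it assumes the payload is submitted as a JSON-encoded string field alongside the others.

```python
import json


def schedule_job_fields(project, spider, when, description=None, payload=None):
    """Assemble the form fields for schedule-job.json, omitting unset optionals."""
    fields = {"project": project, "spider": spider, "when": when}
    if description is not None:
        fields["description"] = description
    if payload is not None:
        # Only a dict serializes to a JSON object; lists or scalars would not.
        if not isinstance(payload, dict):
            raise TypeError("payload must be a dict")
        fields["payload"] = json.dumps(payload)
    return fields


fields = schedule_job_fields(
    "quotesbot", "toscrape-css", "every 10 minutes", payload={"test": [1, 2, 3]}
)
print(fields)
```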
list-jobs.json
Get information about a job or jobs.
- Method: GET
- Parameters (one required):
  - status: status of the jobs to list, see Jobs; additionally, ACTIVE and COMPLETED are accepted to get lists of jobs with related statuses
  - id: id of the job to list
Query by status:
$ curl -s "http://localhost:7654/list-jobs.json?status=ACTIVE" | jq -r
{ "status": "ok", "jobs": [ { "identifier": "5b30c8a2-42e5-4ad5-b143-4cb0420955a5", "status": "SCHEDULED", "actor": "USER", "schedule": "every 10 minutes", "project": "quotesbot", "spider": "toscrape-css", "description": "test #1", "timestamp": "2017-12-11 15:34:13.008996", "duration": null, "payload": "{\n\"test\": [1, 2, 3]\n}" }, { "identifier": "451e6083-54cd-4628-bc5d-b80e6da30e72", "status": "SCHEDULED", "actor": "USER", "schedule": "every minute", "project": "quotesbot", "spider": "toscrape-css", "description": "", "timestamp": "2017-12-09 20:53:31.219428", "duration": null, "payload": "{}" } ] }
Query by id:
$ curl -s "http://localhost:7654/list-jobs.json?id=317d71ea-ddea-444b-bb3f-f39d82855e19" | jq -r
{ "status": "ok", "jobs": [ { "identifier": "317d71ea-ddea-444b-bb3f-f39d82855e19", "status": "SUCCESSFUL", "actor": "SCHEDULER", "schedule": "now", "project": "quotesbot", "spider": "toscrape-css", "description": "test #1", "timestamp": "2017-12-11 15:40:39.621948", "duration": 2, "payload": "{\n\"test\": [1, 2, 3]\n}" } ] }
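In the sample responses above, each job's payload arrives as a JSON-encoded string rather than as a nested object, so it needs a second decoding step. The sketch below illustrates this on a trimmed copy of the query-by-id response; the function name `job_payloads` is hypothetical.

```python
import json

# Trimmed copy of the query-by-id response; a raw string keeps the JSON escapes intact.
body = r'''{"status": "ok", "jobs": [{"identifier": "317d71ea-ddea-444b-bb3f-f39d82855e19",
"status": "SUCCESSFUL", "duration": 2, "payload": "{\n\"test\": [1, 2, 3]\n}"}]}'''


def job_payloads(response):
    """Map each job identifier to its decoded payload object."""
    return {
        job["identifier"]: json.loads(job["payload"])  # second decode: string -> dict
        for job in response["jobs"]
    }


payloads = job_payloads(json.loads(body))
print(payloads["317d71ea-ddea-444b-bb3f-f39d82855e19"]["test"])
```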
cancel-job.json
Cancel a job.
- Method: POST
- Parameters:
  - id: id of the job to cancel
$ curl -s http://localhost:7654/cancel-job.json \
    -F id=451e6083-54cd-4628-bc5d-b80e6da30e72 | jq -r
{ "status": "ok" }
get-log
Retrieve the log file of a job that has either completed or is still running.
- Method: GET
Get the log of the standard output:
$ curl -s http://localhost:7654/get-log/data/bf825a9e-b0c6-4c52-89f6-b5c8209e7977.out
Get the log of the standard error output:
$ curl -s http://localhost:7654/get-log/data/bf825a9e-b0c6-4c52-89f6-b5c8209e7977.err
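The two requests above differ only in the file extension, which selects the output stream. A small sketch of building these URLs follows; the helper name `log_url` is hypothetical, and the path layout is inferred solely from the two example requests on this page.

```python
def log_url(job_id, stream="out", base="http://localhost:7654"):
    """Return the get-log URL for a job's standard output or standard error."""
    if stream not in ("out", "err"):
        raise ValueError("stream must be 'out' or 'err'")
    # Path layout mirrors the curl examples: /get-log/data/<job id>.<stream>
    return f"{base}/get-log/data/{job_id}.{stream}"


print(log_url("bf825a9e-b0c6-4c52-89f6-b5c8209e7977"))
print(log_url("bf825a9e-b0c6-4c52-89f6-b5c8209e7977", "err"))
```

Unlike the other endpoints, the body returned from these URLs is the log text itself, so it should not be passed through a JSON decoder.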
remove-project.json
Remove a project.
- Method: POST
- Parameters:
  - name: name of the project
$ curl -s http://localhost:7654/remove-project.json \
    -F name=quotesbot | jq -r
{ "status": "ok" }