Introduction
Caesar (https://github.com/zooniverse/caesar) is Zooniverse’s decision engine and real-time data processing pipeline. Caesar monitors volunteer classifications as they are submitted (a Lambda script monitors the Kinesis data stream and forwards them to Caesar's HTTP API; see kinesis-to-http), then extracts, reduces, and acts on these data in real time.
Project teams primarily interact with and configure Caesar using the application's Web UI at https://caesar.zooniverse.org. The Panoptes Python Client can also be used for programmatic interactions with the Caesar API; see examples in the Python Client docs.
Caesar interacts with and is extended by the Zooniverse's Aggregations application, specifically through the use of online extractors and reducers made available through the Zooniverse-hosted Aggregations app that are integrated using Caesar's external extractor and reducer functionality.
Data Flow
- Extract: For each classification, extractors generate extracts, which pull essential information out of the full classification record.
- Reduce: Whenever extracts change, Caesar then runs reducers that generate reductions. Each reducer receives all the extracts and aggregates the data from multiple classifications into key-value pairs.
- Act: Whenever a reduction changes, Caesar evaluates rules that can trigger effects. A rule is a boolean statement that reflects on reductions (referenced by reducer key) and other information (e.g., subject metadata), and uses logic clauses like and/or/not in its defined condition. When the rule condition evaluates to true, the effects associated with that rule are performed. For instance, an effect might retire a subject.
Kinesis
  │
  ├─▶ Classification 1 ─┬─▶ FlaggedExtractor ─▶ {flagged: true}
  │                     └─▶ SurveyExtractor ──▶ {raccoon: 1}
  │
  └─▶ Classification 2 ─┬─▶ FlaggedExtractor ─▶ {flagged: false}
                        └─▶ SurveyExtractor ──▶ {beaver: 1, raccoon: 1}

EXTRACTS ──▶ VoteCountReducer ──▶ REDUCTION {votes_flagged: 1, votes_beaver: 1, votes_raccoon: 2}

Some script run by project owner (externally) ──HTTP API call──▶ ExternalReducer ──▶ REDUCTION {swap_confidence: 0.23}

REDUCTIONS ──▶ Rule ──POST /subjects/retire──▶ Panoptes
To make this more concrete, here is an example for a survey task workflow where:
- An extractor emits key-value pairs like lion=1 when the user tagged a lion in the image.
- A reducer combines multiple classifications by adding up the lion counts, emitting lion=5, coyote=1.
- A rule then checks lion > 4, which returns true, and therefore Caesar retires the image.
See also the diagram above, which provides a visual demonstration of an example flow.
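The same survey example can be sketched as a toy pipeline in Python. It only models the logic of the three stages: survey_extract, sum_reducer, and lion_rule are hypothetical names, and the {"species": [...]} input is a simplified stand-in for real classification data, not Caesar's format.

```python
from collections import Counter


def survey_extract(classification: dict) -> dict:
    """Hypothetical survey extractor: emit {species: 1} for each species tagged."""
    return {species: 1 for species in classification["species"]}


def sum_reducer(extracts: list) -> dict:
    """Hypothetical reducer: add up the counts for each key across all extracts."""
    totals = Counter()
    for extract in extracts:
        totals.update(extract)  # Counter.update adds counts rather than replacing them
    return dict(totals)


def lion_rule(reduction: dict) -> bool:
    """Hypothetical rule condition: lion > 4."""
    return reduction.get("lion", 0) > 4


# Five volunteers tag a lion; one of them also tags a coyote.
classifications = [{"species": ["lion"]}] * 4 + [{"species": ["lion", "coyote"]}]
extracts = [survey_extract(c) for c in classifications]
reduction = sum_reducer(extracts)
print(reduction)  # {'lion': 5, 'coyote': 1}
if lion_rule(reduction):
    print("retire subject")  # stand-in for Caesar's retire_subject effect
```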
Usage
Caesar listens to classification events for workflows from the event stream. The tasks and subject sets connected to a specific workflow are configured via the project builder. To configure the data handling:
- Go to the Caesar Web UI and log in.
- Click "Workflows", click "Add", and enter the workflow ID (you can find this on the Project Builder page).
- Configure extractors and reducers.
- Configure rules and effects.
Configure Extractors
To create an extractor:
- From the workflow summary page, click on the ‘Extractors’ tab. Press the ‘+Create Extractor’ button. You will be prompted to choose a type of extractor.

- Fill out the form for the new extractor. The generic fields for all extractors are:
  - The key is an alphanumeric identifier for this extractor that is unique to this workflow. Set a short but descriptive string, e.g., galaxy-type-extract.
  - The task key is the identifier of the task in the workflow. You can get this information from the project builder page (see image below).
  - The if missing entry lets you decide what should be done if the classification data is missing. By default, an error is raised for that extract.
  - The minimum workflow version lets you filter out early versions of the workflow, which is useful for limiting the data domain to post-development or post-launch classifications.
  - Each extractor also has unique fields that need to be filled out, as detailed below.
Configure Reducers
Reducers can be created from the "Reducers" tab in the workflow configuration page. As with extractors, Caesar provides a set of standard reducers, which are task-dependent. To add a reducer to your workflow, click the 'Create' button and choose one from the dropdown:

This will take you to a configuration window for that reducer:

All reducers share the same set of configuration parameters, but because those parameters are so flexible, choosing values can be tricky. See the detailed documentation of all parameters below. The default values work in most cases.
Configure Rules & Effects
Rules and effects can be added from the "Rules" tab in the workflow configuration page. Rules typically apply to subjects, but can also apply to users.
As an example, we will create a subject rule for early retirement. From the workflow's Rules page (see below), create a new rule by clicking the "Create" button.

Clicking create will open the rule editing page (see below). Enter the condition; here we use ["gte", ["lookup", "CSSIDemoReducer.NOTHINGHERE", "0"], ["const", 5]] to define that effects should trigger when the CSSIDemoReducer tracks five or more NOTHINGHERE classifications. Next, we select a corresponding effect that triggers when the condition is met; here we select the "Retire Subject" option.

Selecting the effect option will open the effect editing page (see below). Confirm the action type (tied to the effect chosen from the dropdown) and, for the retirement effect, the "Reason" parameter. Click "Create Subject rule effect" to complete effect configuration.

Once the condition and effects are configured (see below), click "Update Subject rule" button in the subject rule editing page to complete the configuration.

By default, all configured rules on a workflow are evaluated each time the reducers run. Rule evaluation may be disabled from the settings panel, for example, to temporarily halt early retirements.
Extracts
Extractors are tools that allow Caesar to extract specific data from the full classification output. Caesar (and the aggregations-for-caesar app) feature a collection of extractors for specific tasks.
Extractor types
There are different types of extractors built into Caesar for specific tasks. The following sections show the types of tools that each extractor supports.
Blank extractor
This extractor checks whether a text entry (or some drawing tasks) in the classification is blank. The extractor outputs blank=true if the classification is empty, and blank=false otherwise.
Question extractor
Suited for question tasks, this extractor retrieves the index of the answer from the classification. Indices are zero-based (C-style), i.e., the first answer has index 0.
Pluck field extractor
This extractor is used to retrieve a value from the classification/subject metadata. For example, if the filename of the subject is used during aggregation, this extractor would pass it as an extracted value.
Survey extractor
Shape extractor
External extractor
The External Extractor API passes the classification data to an external (HTTPS) URL, which responds with the extracted data in a JSON format. See the External API section below for more information.
Get extracts
GET /workflows/$WORKFLOW_ID/extractors/$EXTRACTOR_KEY/extracts?subject_id=$SUBJECT_ID HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Bearer $TOKEN
The above command returns JSON structured like this:
[
{
"classification_at": "2017-05-16T15:51:13.544Z",
"classification_id": 54376560,
"created_at": "2017-05-16T20:37:39.124Z",
"data": null,
"extractor_key": "c",
"id": 411083,
"subject_id": 458033,
"updated_at": "2017-05-16T20:37:39.124Z",
"user_id": 108,
"workflow_id": 4084
}
]
Extracts are pieces of information relating to a specific classification (and therefore to a specific subject as well).
Path and Query Parameters
| Parameter | Default | Description |
|---|---|---|
| WORKFLOW_ID | null | Required · Specifies which workflow |
| SUBJECT_ID | null | Required · Specifies which subject |
| EXTRACTOR_KEY | null | Required · Specifies which extractor to fetch extracts from. |
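For scripted access, the request above can be assembled with Python's standard library. This is a sketch under assumptions: the API host is taken to be https://caesar.zooniverse.org (the Web UI address), token stands for a valid bearer token, and the helper name get_extracts_request is hypothetical. The function only builds the request, so it can be inspected before anything is sent.

```python
import urllib.parse
import urllib.request

# Assumed host, taken from the Web UI address; point this at staging if needed.
CAESAR_URL = "https://caesar.zooniverse.org"


def get_extracts_request(workflow_id, extractor_key, subject_id, token):
    """Build (but do not send) GET .../extracts?subject_id=... with the headers above."""
    key = urllib.parse.quote(extractor_key, safe="")
    query = urllib.parse.urlencode({"subject_id": subject_id})
    url = f"{CAESAR_URL}/workflows/{workflow_id}/extractors/{key}/extracts?{query}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = get_extracts_request(4084, "c", 458033, token="YOUR_TOKEN")
print(req.full_url)
# https://caesar.zooniverse.org/workflows/4084/extractors/c/extracts?subject_id=458033
```

Passing req to urllib.request.urlopen sends it; decoding the response with json.load should give a list of extracts shaped like the sample response above.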
Create & update extracts
Extracts are inserted and updated through the same API endpoint, which performs an "upsert".
POST /workflows/$WORKFLOW_ID/extractors/$EXTRACTOR_KEY/extracts HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Bearer $TOKEN
{
"subject_id": 458033,
"classification_at": "2017-05-16T15:51:13.544Z",
"classification_id": 54376560,
"user_id": 108,
"data": {"PENGUIN": 1, "POLARBEAR": 4}
}
Body fields
The request body should be a JSON object with the following fields:
| Parameter | Default | Description |
|---|---|---|
| subject_id | null | Required · Specifies which subject this extract is about |
| classification_id | null | Required · Specifies which classification this extract is about. May be omitted if known to be an update rather than a create. |
| classification_at | null | Required · Specifies what time the classification happened. This is used to sort extracts by classification time when reducing them. May be omitted if known to be an update rather than a create. |
| user_id | null | User that made the classification. null signifies anonymous. |
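A matching sketch for the upsert, again assuming the https://caesar.zooniverse.org host and using a hypothetical helper name. The classification fields are only added when given, mirroring the table: they may be left out only when the call is known to be an update.

```python
import json
import urllib.request

CAESAR_URL = "https://caesar.zooniverse.org"  # assumed host


def upsert_extract_request(workflow_id, extractor_key, token, *, subject_id, data,
                           classification_id=None, classification_at=None,
                           user_id=None):
    """Build (but do not send) the POST that creates or updates one extract.
    user_id=None is serialized as null, which signifies an anonymous user."""
    body = {"subject_id": subject_id, "user_id": user_id, "data": data}
    # Both classification fields are required for a create; omit them only
    # when this call is known to update an existing extract.
    if classification_id is not None:
        body["classification_id"] = classification_id
    if classification_at is not None:
        body["classification_at"] = classification_at
    return urllib.request.Request(
        f"{CAESAR_URL}/workflows/{workflow_id}/extractors/{extractor_key}/extracts",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = upsert_extract_request(
    4084, "c", "YOUR_TOKEN",
    subject_id=458033,
    classification_id=54376560,
    classification_at="2017-05-16T15:51:13.544Z",
    user_id=108,
    data={"PENGUIN": 1, "POLARBEAR": 4},
)
```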
External API calls
When an ExternalExtractor or ExternalReducer is called, the classification data is sent as JSON to the given URL (HTTPS is required). The external API then does the processing and returns a response to Caesar. The response status from the external endpoint must be one of:
- 200 (OK)
- 201 (Resource Created)
- 202 (Processing Started)
- 204 (No Data)
Any other response results in an error in Caesar. The format of the classification data sent to an external extractor is shown below:
Classification data format
Sample classification data
{
"id": 356374099,
"project_id": 16747,
"workflow_id": 19487,
"workflow_version": "20.23",
"subject_id": 67913886,
"user_id": 2245813,
"annotations": {
main task data here
},
"metadata": {
"started_at": "2021-08-31T19:24:09.056Z",
"finished_at": "2021-08-31T19:24:25.576Z",
"live_project": false,
"interventions": {"opt_in": true, "messageShown": false},
"user_language": "en",
"user_group_ids": [],
"workflow_version": "20.23",
"subject_dimensions": [{"clientWidth": 700, "clientHeight": 390, "naturalWidth": 700, "naturalHeight": 390}],
"subject_selection_state": {
"retired": false,
"selected_at": "2021-08-31T19:24:08.886Z",
"already_seen": false,
"selection_state": "normal",
"finished_workflow": false,
"user_has_finished_workflow": false
},
"workflow_translation_id": "48794"
},
"subject": {
"id": 67913886,
"metadata": {
subject metadata here
},
"created_at": "2021-08-31T19:24:26.032Z",
"updated_at": "2021-08-31T19:24:26.032Z"
}
}
The extractor receives the raw data from the classification. A set of standard fields is common across all task types, while each task's annotation data uses a format specific to that task type. The common fields are:
- id: The unique ID for the classification
- project_id: The ID of the project that this classification belongs to
- workflow_id: The workflow attached to the classification
- workflow_version: The version of the workflow that was classified
- subject_id: The ID of the subject that was classified
- user_id: The unique ID of the user who classified this subject
- annotations: Dictionary containing the actual classification data (differs based on the number of tasks and the task types)
- metadata: Additional data for this classification. Most are standard HTTP headers, except:
  - started_at, finished_at: The start and finish times for this classification
  - live_project: Whether the project is live
  - interventions: Data on whether the volunteer was shown any feedback messages
  - subject_dimensions: The size of the subject (in pixels) on the screen
  - subject_selection_state: Data about the subject's retirement state and whether it has been seen before
- subject: Data about the subject, including:
  - id: The unique subject ID in the database
  - metadata: Additional data about the subject (including the filename, and whether it is gold_standard data)
Task-specific data
Example of annotation data
"annotations": {
"T0": [
{
"task": "T0",
"value": 0
}
],
"T1": [
{
"task": "T1",
"value": [
{
"x": 315.75,
"y": 151.96665954589844,
"toolIndex": 3,
"tool": 3,
"frame": 0,
"details": []
}
]
}
],
"T2": [
{
"task": "T2",
"value": "ffdddsssaaa"
}
]
}
The data for each task is passed in the annotations key of the JSON dictionary. Tasks are listed by task name, and each entry contains information related to the type of task. The name of the task is stored in the task key, while the data associated with the task is stored in the value key. The value can vary from a simple text string or number to a dictionary, depending on the task type. In the example above, the first task is a question, the second is a point tool, and the third is a text tool.
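Putting the pieces together, an external extractor is an HTTPS endpoint that receives the classification JSON and replies with the extract as JSON. The sketch below uses only Python's standard library. The extraction logic is hypothetical and keyed to the example annotations above (answer index for a question task, number of marks for a drawing task, a blank flag for a text task), and the output key names are our own invention.

```python
import json
from http.server import BaseHTTPRequestHandler


def extract(classification: dict) -> dict:
    """Hypothetical extractor: summarize each task in the annotations by value type."""
    result = {}
    for task_key, entries in classification.get("annotations", {}).items():
        for entry in entries:
            value = entry["value"]
            if isinstance(value, int):        # question task: index of the chosen answer
                result[f"{task_key}.answer"] = value
            elif isinstance(value, list):     # drawing task: one item per mark
                result[f"{task_key}.marks"] = len(value)
            elif isinstance(value, str):      # text task: flag empty entries
                result[f"{task_key}.blank"] = value.strip() == ""
    return result


class ExtractorHandler(BaseHTTPRequestHandler):
    """Accept Caesar's POSTed classification JSON and reply with the extract."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        classification = json.loads(self.rfile.read(length))
        body = json.dumps(extract(classification)).encode("utf-8")
        self.send_response(200)  # Caesar accepts 200, 201, 202 or 204
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

For local testing, the handler can be served with HTTPServer(("", 8000), ExtractorHandler).serve_forever(), where HTTPServer also comes from http.server. Caesar only calls HTTPS URLs, so a real deployment would need TLS in front of it.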
Reducers
Reducers are used to compile a set of extracts together to create an aggregated result. For example, a set of answers from a question task can be combined to get the "best" answer (i.e., the one with the most votes).
Reducer Parameters
Key
This is the unique ID for this reducer. Use something that defines the functionality of the reducer. For example, a reducer that generates the consensus of a question task of galaxy morphology could be galaxy-morphology-consensus.
Topic
Extracts are always implicitly grouped before being combined. There are two different ways of doing this:
reduce_by_subject
This filters all classifications by subject ID. Consequently, the aggregation will run on all classifications of a given subject. This is a useful way to get information about a specific subject.
reduce_by_user
This filters all classifications by user ID. Therefore, aggregation is done on all classifications done by that user in the current workflow. This is useful in getting statistics about specific users.
The default is reduce_by_subject.
Grouping
This setting can be confusing, because extracts are already grouped according to the topic. It adds a second, optional grouping pass which, crucially, can be based on the value of a specified field. To configure it, set the name of the field to group by (in the format extractor_key.field_name) and a flag indicating how to handle classifications whose extracts are missing that field. The value of the grouping field becomes the name of the group, stored in the subgroup field. By default, no secondary grouping is performed.
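A minimal sketch of what this secondary grouping does, under stated assumptions: extracts are dicts with classification_id, extractor_key, and data; the "who.user_type" field is a made-up example; and the missing-field flag values "error" and "ignore" are placeholders rather than Caesar's option names.

```python
from collections import defaultdict


def group_extracts(extracts, field, if_missing="error"):
    """Bucket extracts by the value of `field` ("extractor_key.field_name"),
    looked up per classification. "error" raises on a missing value; "ignore"
    leaves that classification out (both flag names are hypothetical)."""
    extractor_key, field_name = field.split(".", 1)
    # First pass: find the grouping value for each classification.
    group_of = {}
    for ex in extracts:
        if ex["extractor_key"] == extractor_key and field_name in ex["data"]:
            group_of[ex["classification_id"]] = str(ex["data"][field_name])
    # Second pass: every extract of a classification joins that classification's subgroup.
    subgroups = defaultdict(list)
    for ex in extracts:
        value = group_of.get(ex["classification_id"])
        if value is None:
            if if_missing == "error":
                raise KeyError(f"classification {ex['classification_id']} has no {field}")
            continue
        subgroups[value].append(ex)
    return dict(subgroups)


extracts = [
    {"classification_id": 1, "extractor_key": "who", "data": {"user_type": "expert"}},
    {"classification_id": 1, "extractor_key": "survey", "data": {"lion": 1}},
    {"classification_id": 2, "extractor_key": "who", "data": {"user_type": "novice"}},
    {"classification_id": 2, "extractor_key": "survey", "data": {"lion": 1}},
    {"classification_id": 3, "extractor_key": "survey", "data": {"coyote": 1}},
]
groups = group_extracts(extracts, "who.user_type", if_missing="ignore")
print({name: len(members) for name, members in groups.items()})  # {'expert': 2, 'novice': 2}
```

Each subgroup would then be reduced separately, with its name recorded as the subgroup.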
Filters
This tab allows you to filter what classifications are combined together. Caesar will search and retrieve all classifications based on the topic key defined above. In the filters tab, you can further refine which classifications in this subset you want to use (default: all), and which extracts to use for that classification. These keys are described below:
From/To
These keys let you subset the list of extracts to use: from and to define the (zero-based) start and end index in the list of classifications. By default, Caesar uses all the retrieved extracts. For example, to use everything from index 5 to the end, set from=5 and to=-1.
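Read as an inclusive range (our interpretation, consistent with to=-1 meaning "through the end"), the from/to filter behaves like the Python sketch below. The function and argument names are hypothetical; from_index is used because from is a Python keyword.

```python
def filter_by_range(extracts, from_index=0, to_index=-1):
    """Keep extracts[from_index..to_index] with both ends inclusive;
    negative indices count from the end, so -1 is the last extract."""
    n = len(extracts)
    start = from_index if from_index >= 0 else n + from_index
    stop = (to_index if to_index >= 0 else n + to_index) + 1  # +1: inclusive end
    return extracts[start:stop]


extracts = list(range(10))  # stand-ins for ten extracts in classification order
print(filter_by_range(extracts, 5, -1))  # [5, 6, 7, 8, 9]
print(filter_by_range(extracts, 0, 2))   # [0, 1, 2]
```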
Extractor Keys
This entry restricts which extracts (defined in the extractor configuration) are used for this reducer. Several extractors may be defined while a particular reducer only cares about, or can only work with, one type of extract; in that case, use the extractor keys property to restrict the extracts sent to this reducer. The value is either a string (for a single extractor key) or an array of strings (for multiple extractor keys), in the format ["extractor-key-1", "extractor-key-2", "extractor-key-3"]. The default, a blank string or nil, sends all extracts.
Repeated classifications
This prescribes what Caesar should do when there are multiple classifications by the same user ID. keep_first is the default: Caesar removes everything but the first time the user saw the subject. keep_last keeps the latest classification. keep_all does not remove any classifications. We recommend ‘keep_first’ unless you have a strong reason to prefer another option. Repeat classifications are rare, but it is good to have a rule in place for them.
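The extractor keys and repeated classifications filters can be sketched together in Python. The extract field names follow the sample extract JSON above; keeping anonymous classifications (user_id of null) untouched, and sorting by classification_at to decide which classification came first, are assumptions of this sketch, and filter_extracts is a hypothetical name.

```python
def filter_extracts(extracts, extractor_keys=None, repeated_classifications="keep_first"):
    """Apply the extractor-keys filter, then keep one classification per user."""
    # Extractor keys: None or "" keeps everything; a string or a list restricts.
    if extractor_keys:
        if isinstance(extractor_keys, str):
            extractor_keys = [extractor_keys]
        wanted = set(extractor_keys)
        extracts = [e for e in extracts if e["extractor_key"] in wanted]

    if repeated_classifications == "keep_all":
        return extracts

    # Walk extracts in classification order and choose one classification per user.
    ordered = sorted(extracts, key=lambda e: e["classification_at"])
    chosen = {}
    for e in ordered:
        if e["user_id"] is None:
            continue  # anonymous classifications are kept as-is (an assumption)
        if repeated_classifications == "keep_first":
            chosen.setdefault(e["user_id"], e["classification_id"])
        else:  # "keep_last": later classifications overwrite earlier ones
            chosen[e["user_id"]] = e["classification_id"]
    keep = set(chosen.values())
    return [e for e in ordered if e["user_id"] is None or e["classification_id"] in keep]


extracts = [
    {"extractor_key": "survey", "user_id": 1, "classification_id": 10, "classification_at": "2017-05-16T10:00:00Z"},
    {"extractor_key": "survey", "user_id": 1, "classification_id": 11, "classification_at": "2017-05-16T11:00:00Z"},
    {"extractor_key": "survey", "user_id": 2, "classification_id": 12, "classification_at": "2017-05-16T12:00:00Z"},
    {"extractor_key": "flagged", "user_id": None, "classification_id": 13, "classification_at": "2017-05-16T13:00:00Z"},
]
print([e["classification_id"] for e in filter_extracts(extracts)])  # [10, 12, 13]
```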
Training behavior
This configures what Caesar should do with training data (subjects whose metadata has #training_subject = true). The default behavior is ignore_training, in which Caesar does not filter reduction inputs based on training metadata. With training_only, the reduction runs only on classifications of training subjects; conversely, with experiment_only, all training data is removed before aggregation. See training subject metadata for more info on training subjects.
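A sketch of the three training behaviors in Python, assuming subject metadata is available as a dict per subject and that #training_subject holds a string such as "true" (the function name is hypothetical).

```python
def filter_by_training(extracts, subject_metadata, training_behavior="ignore_training"):
    """`subject_metadata` maps subject IDs to metadata dicts. Metadata values are
    stored as strings by default, so "true" in any case marks a training subject."""
    def is_training(subject_id):
        value = subject_metadata.get(subject_id, {}).get("#training_subject", False)
        return str(value).strip().lower() == "true"

    if training_behavior == "training_only":
        return [e for e in extracts if is_training(e["subject_id"])]
    if training_behavior == "experiment_only":
        return [e for e in extracts if not is_training(e["subject_id"])]
    return list(extracts)  # ignore_training: no filtering at all


metadata = {1: {"#training_subject": "true"}, 2: {"#training_subject": "false"}, 3: {}}
extracts = [{"subject_id": 1}, {"subject_id": 2}, {"subject_id": 3}]
print(filter_by_training(extracts, metadata, "training_only"))    # [{'subject_id': 1}]
print(filter_by_training(extracts, metadata, "experiment_only"))  # [{'subject_id': 2}, {'subject_id': 3}]
```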
Reduction Mode
This is probably the least understood part of configuring reducers. Briefly, the system offers two very different modes of performing reduction. These are:
- default_reduction
- running_reduction
Default Reduction
In "default reduction" mode, each time a new extract is created, we fetch all of the other extracts for that subject (or user) and send them all to the reducer for processing. In cases where extracts are coming in very quickly, this can create some extra work fetching extracts, but is guaranteed to be free of race conditions because each new reduction will get a chance to reduce across all relevant extracts. This mode is much simpler and is preferred in almost every case. However, in the case where a given subject (or user) is likely to have thousands of associated extracts, it is recommended to use "running reduction" mode.
Running Reduction
"Running reduction" mode was created to support the Notes for Nature use case, where we are reducing across a user's entire classification history within a given project, which could run to tens of thousands of items for power users. In this use case, fetching all 10,000 extracts each time a new extract is created is impractical and the operations we want to perform are relatively simple to perform using only the new extracts created in a given extraction pass.
When a reducer is configured for running reduction, each time a new classification produces new extracts, the reducer is invoked with only those new extracts. Any additional information it needs to compute the reduction correctly must be kept in a field on the reduction, called a store. Using the new extracts and the store, the reducer computes an updated value and updates its store accordingly. However, this can't be done in a multithreaded way, or else the reduction might be visible in an inconsistent state (for example, its store has been updated but its value has not). Accordingly, we use optimistic locking semantics: we prefetch all potentially relevant extracts and reductions before reducing, and throw a sync error if the object versions don't match when we try to save. Further, we need to avoid updating the reduction multiple times with the same extract, which is not a concern with default reduction. Therefore, this mode populates a relation tracking which extracts have been incorporated into which reductions. Between this tracking and the synchronization retries, there is considerable added complexity and overhead compared to default reduction mode. Running reduction mode is not recommended with external reducers, because of the added complexity of writing reducers that reduce from a store.
Reduction Mode Example
This example is to clarify the difference between how default reduction and running reduction work. Imagine the extract from each classification produces a number from 0 to 10 and the reducer computes the average of these numbers.
The same extracts are processed by each reducer in the same order, and the tables below show how the values in the system change as each extract arrives. Extract 2 is received twice, to show how each mode handles a repeated extract.
Default Reduction
| Extract ID | Extract Value | Extracts to reducer | Store Value In | Calculation | Store Value | Items in Association |
|---|---|---|---|---|---|---|
| 1 | 5 | 1 | nil | 5/1 | nil | 0 |
| 2 | 3 | 1, 2 | nil | (5+3)/2 | nil | 0 |
| 2 | 3 | 1, 2 | nil | (5+3)/2 | nil | 0 |
| 3 | 4 | 1, 2, 3 | nil | (5+3+4)/3 | nil | 0 |
Running Reduction
| Extract ID | Extract Value | Extracts to reducer | Store Value In | Calculation | Store Value | Items in Association |
|---|---|---|---|---|---|---|
| 1 | 5 | 1 | nil | (0*0+5)/(0+1) | 1 | 1 |
| 2 | 3 | 2 | 1 | (5*1+3)/(1+1) | 2 | 2 |
| 2 | 3 | nil | N/A | N/A | 2 | 2 |
| 3 | 4 | 3 | 2 | (4*2+4)/(2+1) | 3 | 3 |
Points of Note
Note that in default reduction mode, re-reduction is always triggered, regardless of whether an extract is being processed twice. Also notice that each computation in default reduction consumes all of the extracts. We calculate an average by summing together the values of all of the extracts and then dividing by the number of extracts.
In running reduction, on the other hand, the store keeps a running count of how many items the reducer has seen. Together with the previous value of the reduction, this store lets the reducer compute the new average from only the new value, using the formula ((old average * old count) + new value) / (old count + 1), after which the store is updated with the new count (old count + 1).
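The two tables can be replayed in a few lines of Python. This is an illustrative sketch, not Caesar's code: default_average recomputes from every stored extract, running_average applies the formula above with the store holding the count, and a set stands in for the relation that tracks which extracts are already incorporated into the reduction. All names are hypothetical.

```python
def default_average(values):
    """Default reduction: recompute the average from every extract, every time."""
    return sum(values) / len(values)


def running_average(new_value, old_average, store):
    """Running reduction: combine one new extract with the previous reduction value
    and the store (count of extracts seen so far). Returns (new_average, new_store)."""
    old_count = store or 0          # the store starts out as nil/None
    old_average = old_average or 0  # likewise the reduction value
    new_average = (old_average * old_count + new_value) / (old_count + 1)
    return new_average, old_count + 1


# Replay the example: extract 2 (value 3) arrives twice.
arrivals = [(1, 5), (2, 3), (2, 3), (3, 4)]
extracts = {}          # default mode: every extract so far, keyed by ID (an upsert)
average, store = None, None
incorporated = set()   # running mode: IDs already folded into the reduction
for extract_id, value in arrivals:
    extracts[extract_id] = value
    default_value = default_average(list(extracts.values()))
    if extract_id not in incorporated:  # running mode never re-applies an extract
        average, store = running_average(value, average, store)
        incorporated.add(extract_id)
    print(extract_id, default_value, average, store)
# 1 5.0 5.0 1
# 2 4.0 4.0 2
# 2 4.0 4.0 2
# 3 4.0 4.0 3
```

Both modes end with the same average; the difference is that the running reducer only ever touched one extract per step.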
When using running reducers for performance reasons, please keep in mind that the performance benefits of running reduction are only realized if every reducer for that reducible is executed in running mode. The primary advantage of running reduction is that it eliminates the need to load large numbers of extracts for a given subject or user.
Reducer Types
Caesar features a set of standard reducers that are useful for most projects. These are described below:
Given the following extracts
extract_list = [
{"data":
{"ZEBRA": 1}
},
{"data":
{"ZEBRA": 1}
},
{"data":
{"AARDVARK": 1}
},
{"data":
{"ZEBRA": 1}
}
]
The consensus, count and simple stats reducers will output
consensus_reducer_return = {
"most_likely": "ZEBRA",
"num_votes": 3,
"agreement": 0.75
}
Consensus
Uses a counting hash to sum the unique extracted key:value pairs.
The reducer selects the key with the highest summed value as the most likely (most_likely) answer.
It also returns the total number of votes (num_votes) for this most_likely answer.
Finally, it returns an agreement value, which is num_votes / the number of all submitted classifications.
An example output is shown above.
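The consensus behavior described above can be sketched with collections.Counter. This is an illustration, not Caesar's implementation: it assumes one extract per classification, and how Caesar breaks ties or handles an empty list of extracts is not covered in this document, so those branches are assumptions.

```python
from collections import Counter


def consensus(extracts):
    """Sum the extracted key:value pairs, pick the key with the largest total,
    and report its share of all classifications."""
    votes = Counter()
    for extract in extracts:
        votes.update(extract["data"])
    if not votes:
        return {}  # nothing to agree on; Caesar's own behavior here may differ
    most_likely, num_votes = votes.most_common(1)[0]  # ties: first key seen wins here
    return {
        "most_likely": most_likely,
        "num_votes": num_votes,
        "agreement": num_votes / len(extracts),
    }


extract_list = [{"data": {"ZEBRA": 1}}, {"data": {"ZEBRA": 1}},
                {"data": {"AARDVARK": 1}}, {"data": {"ZEBRA": 1}}]
print(consensus(extract_list))
# {'most_likely': 'ZEBRA', 'num_votes': 3, 'agreement': 0.75}
```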
count_reducer_return = {
"classifications": 4,
"extracts": 4
}
Count
The count reducer will simply return a count of the number of classifications (accounting for the rules set up for repeated classifications). The classifications entry shows the number of classifications, and the extracts key shows the number of corresponding extracts.
simple_stats_reducer_return = {
"ZEBRA": 3,
"AARDVARK": 1
}
Simple Stats
Sums the extracted key:value pair data from the classification annotations. This reducer relies on the annotation data being in the correct format for summation, e.g. [["ZEBRA", 1]]. Note that if the annotation shape doesn't include a summable value (e.g. the 1 in the example above), this reducer requires a suitably configured extractor to supply the value to be summed.
This reducer can also count True and False values: True increments by 1, and False does not increment.
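A similar sketch for the simple stats reducer, which also shows the True/False counting. It is an illustration with a hypothetical function name; it relies on Python treating True as 1 and False as 0 when adding.

```python
from collections import Counter


def simple_stats(extracts):
    """Sum each key across all extracts; True adds 1 and False adds 0."""
    totals = Counter()
    for extract in extracts:
        for key, value in extract["data"].items():
            totals[key] += value
    return dict(totals)


extract_list = [{"data": {"ZEBRA": 1}}, {"data": {"ZEBRA": 1}},
                {"data": {"AARDVARK": 1}}, {"data": {"ZEBRA": 1}}]
print(simple_stats(extract_list))  # {'ZEBRA': 3, 'AARDVARK': 1}
print(simple_stats([{"data": {"flagged": True}},
                    {"data": {"flagged": False}}]))  # {'flagged': 1}
```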
First Extract
This reducer will return the output of the first extract in the list of extracts. This is useful when extracting data that is common to the subject or the user (e.g., subject metadata).
SQS
Setting up an SQS reducer instructs Caesar to send the output of the extractors to an AWS SQS queue. Remote aggregation code can then consume and process those extracts asynchronously, without having to maintain a dedicated server to accept extracted data. The reducer needs to be configured (through the admin console) with the URL and name of an AWS SQS queue that will receive and temporarily store the classifications from the workflow.
Rectangle
This reducer is used to cluster extracts from the Rectangle tool. It uses the DBSCAN algorithm to aggregate the shapes together.
External
This is similar to an external extractor and is configured by providing a URL (HTTPS is required) that serves as an endpoint for the extract data from Caesar.
Subject Metadata
Caesar stores subject metadata that is fetched the first time that subject is operated on by an active Caesar workflow. Note: metadata updates made to Panoptes subjects are not automatically reflected in the Caesar stored entry. Metadata updates can only be fetched by admin-triggered actions: running a workflow backfill (via Caesar's BackfillWorkflowWorker), or running update_cache() on each Subject resource.
For Rule Conditions
Caesar can reflect on subject metadata as part of a rule condition. This can be particularly helpful to make use of ML predictions, where early retirement rules take into account agreement or disagreement with a machine prediction and adjust the required number of classifications accordingly.
As an example, this rule condition checks whether #machine_confidence_is_empty is greater than or equal to 0.5 in the case where volunteers are in 100% agreement on "NOANIMALSPRESENT" after at least two classifications:
["and", ["eq",["lookup","consensus.most_likely",""], ["const","NOANIMALSPRESENT"]], ["gte",["lookup","consensus.num_votes",0],["const",2]], ["gte",["lookup","consensus.agreement",0],["const",1]],["gte",["to_f",["lookup","subject.#machine_confidence_is_empty",0]],["const",0.5]]]
Note: subject metadata fields will be stored as strings by default, hence the use of the to_f operator to convert the string value to a float for numerical comparison.
Special Metadata
Two attributes in a subject's metadata have special significance to Caesar.
#training_subject
- Boolean. If true, subject is a training subject.
- Used to funnel training subjects to a separate reduction pathway.
- Example: TESS user weighting
- ExtractFilter allows filtering by training behavior.
- To use: set a filter on the reducer to include training_behavior: training_only or experiment_only.
- See Subject#training_subject? and Filters::FilterByTrainingBehavior for use.
#previous_subject_ids
- Array of Zooniverse subject ids
- Subjects whose ids are included in array will be passed by RunsReducers to FetchExtractsBySubject
- Used to indicate that one or more prior subjects' extracts should be included when reducing a new subject.
- Example: TESS takes a new image of the same piece of the sky as a previous subject on a subsequent pass. The previous subject's Zooniverse id is included in the subject metadata and all extracts for both subjects are included in the new subject's reduction.
- See Subject#additional_subject_ids_for_reduction for use.
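A sketch of the #previous_subject_ids behavior: gather the new subject's extracts together with those of the listed previous subjects. The function names and the in-memory extracts_by_subject mapping are hypothetical. Because subject metadata is stored as strings by default, the sketch accepts either a list or a JSON-encoded string; the exact storage format is an assumption.

```python
import json


def subject_ids_for_reduction(subject_id, metadata):
    """The subject itself plus any subjects named in #previous_subject_ids."""
    previous = metadata.get("#previous_subject_ids") or []
    if isinstance(previous, str):
        previous = json.loads(previous)  # e.g. "[101, 102]"
    return [subject_id] + [int(s) for s in previous if int(s) != subject_id]


def extracts_for_reduction(subject_id, metadata, extracts_by_subject):
    """Collect extracts for the new subject and every listed previous subject."""
    ids = subject_ids_for_reduction(subject_id, metadata)
    return [e for sid in ids for e in extracts_by_subject.get(sid, [])]


extracts_by_subject = {101: ["extract from pass 1"], 200: ["extract from pass 2"]}
metadata = {"#previous_subject_ids": "[101]"}
print(extracts_for_reduction(200, metadata, extracts_by_subject))
# ['extract from pass 2', 'extract from pass 1']
```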
Rules
A workflow can configure one or many rules. Each rule has a condition; when that condition evaluates to true, one or more effects linked to that rule are triggered. Conditions can be nested to achieve complicated logical statements.
Rules may pertain to either subjects or users. Rules have an evaluation order that can be set in the database if need be, and then rules can either be all evaluated, or evaluated until the first true condition is reached, or none evaluated.
Conditions
The condition is a single operation, but some types of operations can be nested. The general syntax is like writing Lisp in JSON: a condition is always an array whose first item is a string identifying the operator, and whose other values are operations in themselves: [operator, arg1, arg2, ...].
- ["lt", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a < b < c < d.
- ["lte", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a <= b <= c <= d.
- ["gt", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a > b > c > d.
- ["gte", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a >= b >= c >= d.
- ["eq", operation, operation, ...] - Performs equality comparison (of numbers or strings). You can specify more than two arguments, and it will evaluate as a == b == c == d.
- ["const", value] - Always returns the configured value.
- ["lookup", key, fallback_value] - Looks up a reduction value by the given key. If the lookup fails, the fallback_value is used.
- ["not", operation] - Negates the operation.
- ["and", operation, operation, ...] - Returns true if all of the given operations evaluate to logical true.
- ["or", operation, operation, ...] - Returns true if any of the given operations evaluates to logical true.
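To make these semantics concrete, here is a minimal Python evaluator for the operators listed above, plus the to_f conversion used in the subject metadata example. It is not Caesar's implementation: the lookup rule (the first dotted segment names the reducer key, and "subject." reads subject metadata) is an assumption inferred from the examples in this document, and the function names are hypothetical.

```python
import operator

# Chainable comparison operators from the list above.
COMPARISONS = {"lt": operator.lt, "lte": operator.le, "gt": operator.gt,
               "gte": operator.ge, "eq": operator.eq}


def lookup(key, fallback, reductions, subject_metadata):
    """Resolve "reducer_key.field" in the reductions or "subject.field" in the
    subject metadata, returning the fallback when any part is missing."""
    head, _, rest = key.partition(".")
    if head == "subject":
        return (subject_metadata or {}).get(rest, fallback)
    value = reductions.get(head, fallback)
    for part in rest.split(".") if rest else []:
        if not isinstance(value, dict) or part not in value:
            return fallback
        value = value[part]
    return value


def evaluate(condition, reductions, subject_metadata=None):
    """Evaluate a nested [operator, arg, ...] condition and return its value."""
    name, *args = condition
    if name == "const":
        return args[0]
    if name == "lookup":
        return lookup(args[0], args[1], reductions, subject_metadata)
    values = [evaluate(arg, reductions, subject_metadata) for arg in args]
    if name == "to_f":
        return float(values[0])
    if name in COMPARISONS:
        # ["lt", a, b, c] means a < b < c: every adjacent pair must satisfy it.
        compare = COMPARISONS[name]
        return all(compare(x, y) for x, y in zip(values, values[1:]))
    if name == "not":
        return not values[0]
    if name == "and":
        return all(values)
    if name == "or":
        return any(values)
    raise ValueError(f"unknown operator: {name!r}")


condition = ["and",
             ["eq", ["lookup", "consensus.most_likely", ""], ["const", "NOANIMALSPRESENT"]],
             ["gte", ["lookup", "consensus.num_votes", 0], ["const", 2]],
             ["gte", ["lookup", "consensus.agreement", 0], ["const", 1]],
             ["gte", ["to_f", ["lookup", "subject.#machine_confidence_is_empty", 0]],
              ["const", 0.5]]]
reductions = {"consensus": {"most_likely": "NOANIMALSPRESENT", "num_votes": 2, "agreement": 1.0}}
print(evaluate(condition, reductions, {"#machine_confidence_is_empty": "0.8"}))  # True
```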
Condition Examples
- If one or more vehicles is detected
["gte", ["lookup", "survey-total-VHCL", 0], ["const", 1]]
- If the most likely identification is "HUMAN"
["eq", ["lookup", "consensus.most_likely", ""], ["const", "HUMAN"]]
Effects
Each rule can have one or more effects associated with it. Those effects will be performed when that rule's condition evaluates to true. Subject Rules have effects that affect subjects (and implicitly receive subject_id as a parameter) and User Rules have effects that affect users (user_id).
Subject Rule Effects
| effect_type | config parameters | Effect code |
|---|---|---|
| retire_subject | reason (string)* | Effects::RetireSubject |
| add_subject_to_set | subject_set_id (string) | Effects::AddSubjectToSet |
| add_subject_to_collection | collection_id (string) | Effects::AddSubjectToCollection |
| external_effect | url (string)** | Effects::ExternalEffect |
* Panoptes API validates reason against a list of permitted values. Choose from blank, consensus, or other
** url must be HTTPS
User Rule Effects
| effect_type | config parameters | Effect code |
|---|---|---|
| promote_user | workflow_id (string) | Effects::ExternalEffect |
How to do SWAP
The Space Warps Analysis Pipeline (SWAP; repo, docs) is a probabilistic framework for quantifying the probability that a candidate object is a strong gravitational lens that accounts for volunteer-specific performance, as described in the Space Warps overview publication. This document is a reference to the current state of affairs on doing SWAP on the Panoptes platform (by which we mean the Panoptes API, Caesar, and Designator).
To do SWAP, one must:
- Track the confusion matrix of users. We currently expect this to be done by some entity outside the Panoptes platform. This could be a script that runs periodically on someone's laptop, or an external web service that has classifications streamed to it in real time by Caesar (this is what Darryl is doing). We don't currently have a good place to store the confusion matrix itself inside the Panoptes platform. However, if the matrix identifies an expert classifier, post that into Panoptes under the project_preferences resource (API calls are explained in a later section).
- Calculate the likelihood of subjects. This is done in the same place that calculates the confusion matrices. The resulting likelihood should be posted into Caesar as a reduction.
- Retire subjects when we know the answer. Once the likelihood is posted into Caesar, we can set rules on it. For instance:
  - IF likelihood < 0.1 AND classifications_count > 5 THEN retire()
  - IF likelihood > 0.9 AND classifications_count > 5 THEN retire()
  - IF likelihood > 0.1 AND likelihood < 0.9 AND not seen_by_expert AND classifications_count > 10 THEN move to expert_set
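The rules above only need the subject's likelihood, which the external SWAP process computes. As a rough illustration of that computation, here is a minimal sketch of the two-class Bayesian update described in the Space Warps work, assuming each volunteer's confusion matrix is summarized by two numbers, P("LENS" | lens) and P("NOT" | not a lens). The function names are hypothetical, this is not code from the SWAP repository, and swap_action simply restates the example rules.

```python
def swap_update(prior, says_lens, p_lens_given_lens, p_not_given_not):
    """Update a subject's lens probability after one classification, using
    Bayes' rule and the volunteer's confusion matrix."""
    if says_lens:
        like_lens, like_not = p_lens_given_lens, 1 - p_not_given_not
    else:
        like_lens, like_not = 1 - p_lens_given_lens, p_not_given_not
    numerator = like_lens * prior
    return numerator / (numerator + like_not * (1 - prior))


def swap_action(likelihood, classifications_count, seen_by_expert):
    """The example rules above as a plain function (hypothetical helper)."""
    if classifications_count > 5 and (likelihood < 0.1 or likelihood > 0.9):
        return "retire"
    if 0.1 < likelihood < 0.9 and not seen_by_expert and classifications_count > 10:
        return "move_to_expert_set"
    return None


# A skilled volunteer (right 90% of the time on both classes) says "LENS"
# about a subject currently at 50/50.
p = swap_update(0.5, True, 0.9, 0.9)
print(round(p, 3))  # 0.9
```

In practice the updated value would be posted to Caesar as the reduction, and Caesar's own rules would take the place of swap_action.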
When Caesar moves subjects into an expert-only subject set, Designator can then serve subjects from that set only to users marked as experts by their project_preferences. Designator serves subjects from sets with specified chances, so we can avoid the situation where experts only ever see the really hard subjects by mixing, e.g., 50% hard images with 50% "general population" images.
In Panoptes, set workflow.configuration to something like:
{"subject_set_chances": {"EXPERT_SET_ID": 0}}
In Caesar, set the workflow like so (the external "swap" extractor is optional):
{
"extractors_config": {
"who": {"type": "who"},
"swap": {"type": "external", "url": "https://darryls-server.com"}
},
"reducers_config": {
"swap": {"type": "external"},
"count": {"type": "count"}
},
"rules_config": [
{"if": [RULES], "then": [{"action": "retire_subject"}]}
]
}
When you detect an expert user, update their probabilities like this:
POST /api/project_preferences/update_settings?project_id=PROJECT_ID&user_id=USER_ID HTTP/1.1
Host: panoptes-staging.zooniverse.org
Authorization: Bearer TOKEN
Content-Type: application/json
Accept: application/vnd.api+json; version=1
{
"project_preferences": {
"designator": {
"subject_set_chances": {
"WORKFLOW_ID": {"SUBJECT_SET_ID": 0.5}
}
}
}
}
Then store the likelihood and whether an expert has seen the subject in Caesar as a reduction, so that you can use them in the rules:
POST /workflows/WORKFLOW_ID/reducers/REDUCER_KEY/reductions HTTP/1.1
Host: caesar-staging.zooniverse.org
Authorization: Bearer TOKEN
Content-Type: application/json
Accept: application/json
{
"likelihood": 0.864,
"seen_by_expert": false
}