Introduction

Caesar is an evolution of the Nero codebase, made more generic. In essence, Caesar receives classifications from the event stream (a Lambda script sends them to Caesar's HTTP API).

For each classification, it runs zero or more extractors defined in the workflow to generate "extracts". These extracts specify information summarized out of the full classification.

Whenever extracts change, Caesar will then run zero or more reducers defined in the workflow. Each reducer receives all the extracts, merged into one hash per classification. The task of the reducer is to aggregate results from multiple classifications into key-value pairs, where values are simple data types: integers or booleans. The output of each reducer is stored in the database as a Reduction.

Whenever a reduction changes, Caesar will then run zero or more rules defined in the workflow. Each rule is a boolean statement that can look up values produced by reducers (by key) and compare them. Rules support logic clauses like and / or / not. When the rule evaluates to true, all of the effects associated with that rule will be performed. For instance, an effect might be to retire a subject.

┏━━━━━━━━━━━━━━━━━━┓
┃     Kinesis      ┃
┗━━━┳━━━━━━━━━━━━━━┛
    │                                                       ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
    │                                                         EXTRACTS:
    │   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐         ┌──────────────────┐    │                           │
    ├──▶ Classification 1  ────┬───▶│ FlaggedExtractor │──────▶{flagged: true}
    │   └ ─ ─ ─ ─ ─ ─ ─ ─ ┘    │    └──────────────────┘    │                           │
    │                          │    ┌──────────────────┐
    │                          └───▶│ SurveyExtractor  │────┼─▶{raccoon: 1}             │
    │                               └──────────────────┘
    │   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐         ┌──────────────────┐    │                           │
    └──▶ Classification 2  ────┬───▶│ FlaggedExtractor │──────▶{flagged: false}
        └ ─ ─ ─ ─ ─ ─ ─ ─ ┘    │    └──────────────────┘    │                           │
                               │    ┌──────────────────┐
                               └───▶│ SurveyExtractor  │────┼─▶{beaver: 1, raccoon: 1}  │
                                    └──────────────────┘
   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐                          └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
     REDUCTIONS:                                                          │
   │                             │                                        │
      {                                                                   │
   │    votes_flagged: 1,        │  ┌──────────────────┐                  │
        votes_beaver: 1,      ◀─────│ VoteCountReducer │◀─────────────────┘
   │    votes_raccoon: 2         │  └──────────────────┘
      }
   │                             │
                                                                              ┏━━━━━━━━━━━━━━━━┓
   │  {                          │  ┌──────────────────┐                      ┃Some script run ┃
        swap_confidence: 0.23 ◀─────│ ExternalReducer  │◀────HTTP API call────┃by project owner┃
   │  }                          │  └──────────────────┘                      ┃  (externally)  ┃
                                                                              ┗━━━━━━━━━━━━━━━━┛
   └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                  │
                  │
                  │                 ┌──────────────────┐         POST         ┏━━━━━━━━━━━━━━━━┓
                  └────────────────▶│       Rule       │───/subjects/retire──▶┃    Panoptes    ┃
                                    └──────────────────┘                      ┗━━━━━━━━━━━━━━━━┛

To make this more concrete, an example would be a survey-task workflow where:

An extractor emits key-value pairs like lion=1 when the user tagged a lion in the image.
A reducer combines multiple classifications by adding up the counts for each key, emitting lion=5, coyote=1.
A rule then checks lion > 4, which returns true, and therefore Caesar retires the image.
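
The three steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not Caesar's implementation: the function names, the `species` field, and the threshold are assumptions made for the example.

```python
from collections import Counter


def extract(classification):
    """Hypothetical survey extractor: emit species=1 for each tagged species."""
    return {species: 1 for species in classification["species"]}


def reduce_counts(extracts):
    """Hypothetical reducer: add up the per-key counts across all extracts."""
    totals = Counter()
    for extract_data in extracts:
        totals.update(extract_data)
    return dict(totals)


def should_retire(reduction, key="lion", threshold=4):
    """Hypothetical rule: true when the count for `key` exceeds the threshold."""
    return reduction.get(key, 0) > threshold


# Five volunteers tag a lion, one tags a coyote.
classifications = [{"species": ["lion"]}] * 5 + [{"species": ["coyote"]}]
reduction = reduce_counts(extract(c) for c in classifications)
retire = should_retire(reduction)
```

Here `reduction` holds the summed counts and `retire` is the boolean a rule would act on by triggering its effects.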

Reducers can reduce across multiple subjects' extracts if the following is included in the new subject's metadata (when uploaded to Panoptes): { previous_subject_ids: [1234] }. Extracts whose subject ids match an id in that array will be included in reductions for the new subject.

Usage

Caesar listens to classification events for workflows from the event stream. The tasks and subject sets connected to a specific workflow are configured via the project builder. To configure the data handling from classifications:

Go to the Caesar Web UI and log in.
Click on "Workflows" and click "Add" and enter the workflow ID (you can find this in the Project Builder page)
You can use the Extractors and Reducers to configure the extraction and reduction pipeline (as detailed below).
Use the UI to configure your rules and effects as per the rules & effects docs.

Extracts

Extractors are tools that allow Caesar to extract specific data from the full classification output. Caesar (and the aggregations-for-caesar app) feature a collection of extractors for specific tasks.

Creating an extractor

To create an extractor:

From the workflow summary page, click on the ‘Extractors’ tab. Press the ‘+Create Extractor’ button. You will be prompted to choose a type of extractor.

new-extractor

Fill out the form for the new extractor. The generic fields for all extractors are:
- The key is an alpha-numeric identifier for this extractor that is unique to this workflow. Set a short, but descriptive string for this, e.g., galaxy-type-extract.
- The task key is the identifier of the task in the workflow. You can get this information from the project builder page (see image below)
- The if missing entry allows you to decide what should be done if the classification data is missing. The default choice is to error out of that extract.
- The minimum workflow version provides the choice to filter out early versions of the workflow, useful for limiting the data domain to post-development or post-launch classifications.
- Each extractor will also have unique fields that need to be filled out, as detailed below.

Extractor types

There are different types of extractors built into Caesar for specific tasks. The following sections show the types of tools that each extractor supports.

Blank extractor

This extractor checks whether a text entry (or some drawing tasks) in the classification is blank. The extractor outputs blank=true if the classification is empty, and blank=false otherwise.

Question extractor

Suited for question tasks, this extractor retrieves the index of the answer from the classification. Indices are C-style, i.e. the first index is "0".

Pluck field extractor

This extractor is used to retrieve a value from the classification/subject metadata. For example, if the filename of the subject is used during aggregation, this extractor would pass it as an extracted value.

Survey extractor

Shape extractor

External extractor

The External Extractor API passes the classification data to an external (HTTPS) URL, which responds with the extracted data in a JSON format. See the External API section below for more information.

Get extracts

GET /workflows/$WORKFLOW_ID/extractors/$EXTRACTOR_KEY/extracts?subject_id=$SUBJECT_ID HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Bearer $TOKEN

The above command returns JSON structured like this:

[
    {
        "classification_at": "2017-05-16T15:51:13.544Z",
        "classification_id": 54376560,
        "created_at": "2017-05-16T20:37:39.124Z",
        "data": null,
        "extractor_key": "c",
        "id": 411083,
        "subject_id": 458033,
        "updated_at": "2017-05-16T20:37:39.124Z",
        "user_id": 108,
        "workflow_id": 4084
    }
]

Extracts are pieces of information relating to a specific classification (and therefore to a specific subject as well).

Query Parameters

Parameter	Default	Description
WORKFLOW_ID	null	Required · Specifies which workflow
SUBJECT_ID	null	Required · Specifies which subject
EXTRACTOR_KEY	null	Required · Specifies which extractor to fetch extracts from.
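
As a rough sketch, the request above could be built from Python using only the standard library. The helper name is hypothetical, the host is the staging host used elsewhere in this document and is an assumption for your deployment, and the token is a placeholder.

```python
import urllib.parse
import urllib.request

# Assumption: staging host taken from the SWAP examples in this document.
CAESAR_HOST = "https://caesar-staging.zooniverse.org"


def build_get_extracts_request(workflow_id, extractor_key, subject_id, token,
                               host=CAESAR_HOST):
    """Build (but do not send) the GET request for one subject's extracts."""
    query = urllib.parse.urlencode({"subject_id": subject_id})
    url = f"{host}/workflows/{workflow_id}/extractors/{extractor_key}/extracts?{query}"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_get_extracts_request(4084, "c", 458033, "TOKEN")
# To actually send it (requires network access and a valid token):
#   import json
#   with urllib.request.urlopen(request) as response:
#       extracts = json.load(response)
```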

Create & update extracts

Inserting and updating extracts happens through one and the same API endpoint, which performs an "upsert".

POST /workflows/$WORKFLOW_ID/extractors/$EXTRACTOR_KEY/extracts HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Bearer $TOKEN

{
    "subject_id": 458033,
    "classification_at": "2017-05-16T15:51:13.544Z",
    "classification_id": 54376560,
    "user_id": 108,
    "data": {"PENGUIN": 1, "POLARBEAR": 4}
}

Body fields

The request body should be encoded as a JSON with the following fields:

Parameter	Default	Description
subject_id	null	Required · Specifies which subject this extract is about
classification_id	null	Required · Specifies which classification this extract is about. May be omitted if known to be an update rather than a create.
classification_at	null	Required · Specifies what time the classification happened. This is used to sort extracts by classification time when reducing them. May be omitted if known to be an update rather than a create.
user_id	null	User that made the classification. `null` signifies anonymous.
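
A minimal sketch of the upsert call, again using only the Python standard library. The helper name and the host default are assumptions; the body fields mirror the example request above.

```python
import json
import urllib.request


def build_upsert_extract_request(workflow_id, extractor_key, extract, token,
                                 host="https://caesar-staging.zooniverse.org"):
    """Build (but do not send) the POST request that upserts one extract.

    Only subject_id is checked here, because classification_id and
    classification_at may be omitted when the call is known to be an update.
    """
    if "subject_id" not in extract:
        raise ValueError("subject_id is required")
    url = f"{host}/workflows/{workflow_id}/extractors/{extractor_key}/extracts"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(extract).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


extract = {
    "subject_id": 458033,
    "classification_at": "2017-05-16T15:51:13.544Z",
    "classification_id": 54376560,
    "user_id": 108,
    "data": {"PENGUIN": 1, "POLARBEAR": 4},
}
upsert_request = build_upsert_extract_request(4084, "c", extract, "TOKEN")
```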

External API calls

When an ExternalExtractor or ExternalReducer is called, the classification data is sent to the given URL (requires HTTPS) as JSON data. The external API then does the processing and returns a response to Caesar. The response status from the external endpoint must be one of:

200 (OK)
201 (Resource Created)
202 (Processing Started)
204 (No Data)

All other responses will result in an error on Caesar. The data format for the classification data sent to an external extractor is shown below:

Classification data format

Sample classification data

{
  "id": 356374099,
  "project_id": 16747,
  "workflow_id": 19487,
  "workflow_version": "20.23",
  "subject_id": 67913886,
  "user_id": 2245813,
  "annotations": {
    main task data here
  },
  "metadata": {
    "started_at": "2021-08-31T19:24:09.056Z",
    "finished_at": "2021-08-31T19:24:25.576Z",
    "live_project": false,
    "interventions": {"opt_in": true, "messageShown": false},
    "user_language": "en",
    "user_group_ids": [],
    "workflow_version": "20.23",
    "subject_dimensions": [{"clientWidth": 700, "clientHeight": 390, "naturalWidth": 700, "naturalHeight": 390}],
    "subject_selection_state": {
      "retired": false,
      "selected_at": "2021-08-31T19:24:08.886Z",
      "already_seen": false,
      "selection_state": "normal",
      "finished_workflow": false,
      "user_has_finished_workflow": false
    },
    "workflow_translation_id": "48794"
  },
  "subject": {
    "id": 67913886,
    "metadata": {
        subject metadata here
    },
    "created_at": "2021-08-31T19:24:26.032Z",
    "updated_at": "2021-08-31T19:24:26.032Z"
  }
}

Extractors get the raw data from the classification. There is a set of standard fields that are common across all task types, but individual tasks contain specific data formats tailored to the data that they send. The common fields are:

id : The unique ID for the classification
project_id : The ID for the project that this classification belongs to
workflow_id : The workflow attached to the classification
workflow_version : The version for the workflow (is this something that project builders can set?)
subject_id : The ID for the subject that was classified
user_id : The unique ID for the user who classified this subject
annotations : Dictionary containing the actual classification data (differs based on the number of tasks, and the task types)
metadata : Additional data for this classification. Most are standard HTTP headers, except
- started_at, finished_at : The start and end times for this classification
- live_project : whether the project is live
- interventions : data on whether the volunteer was shown any feedback messages
- subject_dimensions : The size of the subject (in pixels) on the screen
- subject_selection_state : Data about the subject's retirement state and whether it has been seen before.
subject : Data about the subject, including
- id : The unique subject ID in the database
- metadata : Additional data about the subject (including filename, and whether it is gold_standard data)

Task specific data

Example of annotation data

  "annotations": {
    "T0": [
      {
        "task": "T0",
        "value": 0
      }
    ],
    "T1": [
      {
        "task": "T1",
        "value": [
          {
            "x": 315.75,
            "y": 151.96665954589844,
            "toolIndex": 3,
            "tool": 3,
            "frame": 0,
            "details": []
          }
        ]
      }
    ],
    "T2": [
      {
        "task": "T2",
        "value": "ffdddsssaaa"
      }
    ]
  }

The data for each task is passed into the annotations key in the JSON dictionary. The tasks are listed by the task name, with each entry containing information related to the type of task. The name of the task is stored in the task key, while the data associated with the task is stored in the value key. The value can vary from a simple text/number to a dictionary depending on the task type. In the example above, the first task is a question, the second is a point tool, and the third is a text tool.
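
To illustrate how an external extractor might read this structure, here is a small Python sketch. The function names are hypothetical and the task keys (T0, T1) simply follow the example annotation data; a real extractor would use the task keys of its own workflow.

```python
def extract_question_answer(classification, task_key="T0"):
    """Return the answer index of a question task, or None if it is absent."""
    entries = classification.get("annotations", {}).get(task_key, [])
    if not entries:
        return None
    return entries[0]["value"]


def extract_point_count(classification, task_key="T1"):
    """Count the marks a volunteer drew for a point-tool task."""
    entries = classification.get("annotations", {}).get(task_key, [])
    return sum(len(entry["value"]) for entry in entries)


# Trimmed-down version of the example annotation data.
classification = {
    "annotations": {
        "T0": [{"task": "T0", "value": 0}],
        "T1": [{"task": "T1", "value": [
            {"x": 315.75, "y": 151.97, "toolIndex": 3, "tool": 3,
             "frame": 0, "details": []},
        ]}],
        "T2": [{"task": "T2", "value": "ffdddsssaaa"}],
    }
}

extract_data = {
    "answer": extract_question_answer(classification),
    "points": extract_point_count(classification),
}
```

The resulting dictionary is the kind of JSON body an external extractor could return to Caesar as its extract.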

Reducers

Reducers are used to compile a set of extracts together to create an aggregated result. For example, a set of answers from a question task can be combined to get the "best" answer (i.e. one with the most votes).

Creating Reducers

Reducers can be created from the "Reducers" tab in the workflow configure page. Like extractors, Caesar features a set of standard reducers, which are task dependent. To add a reducer to your workflow, click on the 'Create' button and choose from the dropdown:

new-reducer

This will take you to a configuration window for that reducer:

reducer-config

All reducers share the same set of keys, but configuring reducers can be tricky because they are flexible in so many different ways. These keys will be described below:

Key

This is the unique ID for this reducer. Use something that defines the functionality of the reducer. For example, a reducer that generates the consensus of a question task of galaxy morphology could be galaxy-morphology-consensus.

Topic

Extracts are always implicitly grouped before being combined. There are two different ways of doing this:

reduce_by_subject:

This filters all classifications by subject ID. Consequently, the aggregation will run on all classifications of a given subject. This is a useful way to get information about a specific subject.

reduce_by_user

This filters all classifications by user ID. Therefore, aggregation is done on all classifications done by that user in the current workflow. This is useful in getting statistics about specific users.

The default is reduce_by_subject.

Grouping

This is a confusing setting because extracts are already obviously grouped according to the topic. This allows an additional grouping pass, which, crucially, can be done on the basis of the value of a specified field. So to configure this, you need to set the name of the field to group by (in format extractor_key.field_name) and then a flag indicating how to handle when the extracts for a given classification are missing that field. The value of the grouping field will be reflected in the name of the group, stored in the subgroup field. The default behavior is not to perform this secondary grouping.

Filters

This tab allows you to filter what classifications are combined together. Caesar will search and retrieve all classifications based on the topic key defined above. In the filters tab, you can further refine which classifications in this subset you want to use (default: all), and which extracts to use for that classification. These keys are described below:

From/To

These keys allow you to subset the list of extracts to use, where from and to define the (zero-based) start and end index of the list of classifications. By default, Caesar will use all the retrieved extracts. For example, if you want everything from the 5th index to the end, set from=5 and to=-1.
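
The selection can be pictured with the following Python sketch. It assumes that the end index is inclusive and that negative values count from the end of the list (so -1 is the last item); the function and parameter names are hypothetical.

```python
def select_range(extracts, from_index=0, to_index=-1):
    """Return extracts[from_index..to_index], treating to_index as inclusive.

    Assumption: a negative to_index counts back from the end of the list,
    so -1 refers to the last extract.
    """
    end = len(extracts) + to_index if to_index < 0 else to_index
    return extracts[from_index:end + 1]


extracts = list(range(8))            # stand-ins for eight extracts
tail = select_range(extracts, 5, -1)  # everything from index 5 to the end
```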

Extractor Keys

This entry allows you to subset which extracts (defined in the extractor configuration) should be used for this reducer. Sometimes multiple extractors will be defined but a particular reducer only cares about or can only work with a particular type of extract. In this case, you can use the extractor keys property to restrict the extracts that are sent to this reducer. The format of this value is either a string (for a single extractor key) or an array of strings (for multiple extractors) of the extractor keys defined in the extractor configuration in the format ["extractor-key-1", "extractor-key-2", "extractor-key-3"]. The default, a blank string or a nil, sends all extracts.
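
A sketch of how such a filter could behave, assuming each extract carries its extractor_key as in the Get extracts response. The function name is hypothetical.

```python
def select_by_extractor_keys(extracts, extractor_keys=None):
    """Keep only extracts produced by the listed extractor keys.

    A blank string or None keeps every extract; a single string is treated
    as a one-element list.
    """
    if not extractor_keys:
        return list(extracts)
    if isinstance(extractor_keys, str):
        extractor_keys = [extractor_keys]
    wanted = set(extractor_keys)
    return [e for e in extracts if e["extractor_key"] in wanted]


extracts = [
    {"extractor_key": "extractor-key-1", "data": {"ZEBRA": 1}},
    {"extractor_key": "extractor-key-2", "data": {"blank": False}},
]
only_first = select_by_extractor_keys(extracts, "extractor-key-1")
```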

Repeated classifications

This prescribes what Caesar should do in case there are multiple classifications by the same user ID. keep_first is the default value, and Caesar will remove everything but the first time the user saw the subject. keep_last chooses the latest classification. keep_all will not delete any classifications. We recommend keep_first unless you feel strongly that you'd prefer another of those options. It's a rare event, but good to have a rule in place for it.
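
The three options can be sketched as follows. This is an illustration rather than Caesar's code: it assumes one extract per classification, ISO 8601 timestamps in a uniform format (so they sort as strings), and that anonymous classifications (user_id of None) are never treated as repeats.

```python
def filter_repeats(extracts, mode="keep_first"):
    """Drop repeated classifications by the same user according to `mode`."""
    if mode == "keep_all":
        return list(extracts)
    if mode not in ("keep_first", "keep_last"):
        raise ValueError(f"unknown mode: {mode}")

    # Walk oldest-first for keep_first, newest-first for keep_last.
    ordered = sorted(extracts, key=lambda e: e["classification_at"],
                     reverse=(mode == "keep_last"))
    seen_users = set()
    kept = []
    for extract in ordered:
        user_id = extract["user_id"]
        if user_id is not None:      # assumption: anonymous is never a repeat
            if user_id in seen_users:
                continue
            seen_users.add(user_id)
        kept.append(extract)
    # Return the survivors in chronological order.
    return sorted(kept, key=lambda e: e["classification_at"])


a = {"user_id": 1, "classification_at": "2017-05-16T15:00:00.000Z"}
b = {"user_id": 1, "classification_at": "2017-05-16T16:00:00.000Z"}
c = {"user_id": 2, "classification_at": "2017-05-16T17:00:00.000Z"}
```

With these inputs, keep_first keeps a and c, keep_last keeps b and c, and keep_all keeps all three.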

Training behavior

This configures what Caesar should do about training data (subjects whose metadata has #training_subject = true). The default behaviour is ignore_training, where Caesar does not actively filter reduction inputs based on training metadata. It can instead be set to training_only, where reductions are run only on classifications of training subjects, or to the converse, experiment_only, where all training data is removed before aggregation. See training subject metadata for more info on training subjects.

Reduction Mode

This is probably the least understood part of configuring reducers. Briefly, the system offers two very different modes of performing reduction. These are:

default_reduction
running_reduction

Default Reduction

In "default reduction" mode, each time a new extract is created, we fetch all of the other extracts for that subject (or user) and send them all to the reducer for processing. In cases where extracts are coming in very quickly, this can create some extra work fetching extracts, but is guaranteed to be free of race conditions because each new reduction will get a chance to reduce across all relevant extracts. This mode is much simpler and is preferred in almost every case. However, in the case where a given subject (or user) is likely to have thousands of associated extracts, it is recommended to use "running reduction" mode.

Running Reduction

"Running reduction" mode was created to support the Notes for Nature use case, where we are reducing across a user's entire classification history within a given project, which could run to tens of thousands of items for power users. In this use case, fetching all 10,000 extracts each time a new extract is created is impractical and the operations we want to perform are relatively simple to perform using only the new extracts created in a given extraction pass.

When a reducer is configured for running reduction, each time a new classification produces new extracts, the reducer is invoked with only those new extracts. Any additional information it would need in order to correctly compute the reduction should be present in a field on the reduction, called a store. With the new extracts and the store, the reducer will compute an updated value and update its store appropriately. However, this can't be done in a multithreaded way, or else the object might be available while in an inconsistent state (example: its store has been updated but its value has not). Accordingly, we use optimistic locking semantics: we prefetch all possibly relevant extracts and reductions before reducing, and throw a sync error if the object versions don't match when we try to save. Further, we need to avoid updating the reduction multiple times with the same extract, which is not a concern with default reduction. Therefore, this mode populates a relation tracking which extracts have been incorporated into which reductions. Between this and the synchronization retries, there is considerable added complexity and overhead compared to default reduction mode. It's not recommended to use running reduction mode with external reducers, because of the added complexity of writing reducers that reduce from a store.

Reduction Mode Example

See Reduction Mode Example

Reducer types

Caesar features a set of standard reducers that are useful for most projects. These are described below:

Given the following extracts

extract_list = [
    {"data": 
        {"ZEBRA": 1}
    },
    {"data": 
        {"ZEBRA": 1}
    },
    {"data": 
        {"AARDVARK": 1}
    },
    {"data": 
        {"ZEBRA": 1}
    }
]

The outputs of the consensus, count and simple stats reducers for these extracts are shown alongside each reducer's description, starting with consensus:

consensus_reducer_return = {
    "most_likely": "ZEBRA",
    "num_votes": 3,
    "agreement": 0.75
}

Consensus

Uses the counting hash to sum the unique extracted key:value pairs. The reducer will select the key with the highest summed value as the most likely (most_likely) answer. It will also return the total number of votes (num_votes) for this most_likely answer. Finally, it will return an agreement value, which is num_votes divided by the number of all submitted classifications. An example is shown above.
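
The calculation can be sketched in Python as below. This is an illustration under the assumption that each classification contributes exactly one extract, so the number of extracts stands in for the number of classifications; the function name is hypothetical.

```python
from collections import Counter


def consensus(extracts):
    """Pick the key with the most votes and report how strongly it won."""
    totals = Counter()
    for extract in extracts:
        totals.update(extract["data"])   # adds each key's value to its total
    if not totals:
        return {}
    most_likely, num_votes = totals.most_common(1)[0]
    return {
        "most_likely": most_likely,
        "num_votes": num_votes,
        # Assumption: one extract per classification.
        "agreement": num_votes / len(extracts),
    }


extract_list = [
    {"data": {"ZEBRA": 1}},
    {"data": {"ZEBRA": 1}},
    {"data": {"AARDVARK": 1}},
    {"data": {"ZEBRA": 1}},
]
result = consensus(extract_list)
```

For the four example extracts this gives ZEBRA with 3 votes and an agreement of 0.75.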

count_reducer_return = {
    "classifications": 4,
    "extracts": 4
}

Count

The count reducer will simply return a count of the number of classifications (accounting for the rules set up for repeated classifications). The classifications entry shows the number of classifications, and the extracts key shows the number of corresponding extracts.

simple_stats_reducer_return = {
    "ZEBRA": 3,
    "AARDVARK": 1
}

Simple Stats

Sums the extracted classification annotations' key:value pair data. This reducer relies on the annotation data being in the correct format for summation, e.g. [["ZEBRA", 1]]. Please note that if the annotation shape doesn't include a summable value (e.g. the 1 in the example above), this reducer requires an aligned extractor that configures the key's value to be summed.

Note that this reducer can count True and False values as well: True increments by 1, and False does not increment.
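
A minimal Python sketch of this summation, including the boolean handling. The function name and the sample keys are hypothetical.

```python
def simple_stats(extracts):
    """Sum the values of each key across all extracts.

    Booleans are counted as votes: True adds 1, False adds 0.
    """
    totals = {}
    for extract in extracts:
        for key, value in extract["data"].items():
            totals[key] = totals.get(key, 0) + int(value)
    return totals


sample = [
    {"data": {"flagged": True, "raccoon": 1}},
    {"data": {"flagged": False, "raccoon": 2}},
]
summed = simple_stats(sample)
```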

First Extract

This reducer will return the output of the first extract in the list of extracts. This is useful when extracting data that is common to the subject or the user (e.g., subject metadata).

SQS

Setting up an SQS reducer instructs Caesar to send the output of our extractor to an AWS SQS queue. We can then use remote aggregation code to consume and process those extracts asynchronously and without having to maintain a dedicated server to accept extracted data. The reducer needs to be configured (through the admin console) with the URL and name of an AWS SQS queue that will receive and temporarily store the classifications from the workflow.

Rectangle

This reducer is used to cluster extracts from the Rectangle tool. It uses the DBSCAN algorithm to aggregate the shapes together.

External

This is similar to an external extractor, and is configured by providing a URL (requires HTTPS) that serves as an endpoint for the extractor data from Caesar.

Reduction Mode Examples

This example is to clarify the difference between how default reduction and running reduction work. Imagine the extract from each classification produces a number from 0 to 10 and the reducer computes the average of these numbers.

The same extracts are processed by each reducer in the same order and we illustrate the changing values in the system as they arrive. For clarity, the values of extracts are indicated in bold.

Default Reduction

Extract ID	Extract Value	Extracts to reducer	Store Value In	Calculation	Store Value
1	5	1	nil	5/1	nil
2	3	1, 2	nil	(5+3)/2	nil
2	3	1, 2	nil	(5+3)/2	nil
3	4	1, 2, 3	nil	(5+3+4)/3	nil

Running Reduction

Extract ID	Extract Value	Extracts to reducer	Store Value In	Calculation	Store Value	Items in Association
1	5	1	nil	(0*0+5)/(0+1)	1	1
2	3	2	1	(5*1+3)/(1+1)	2	2
2	3	nil	N/A	N/A	2	2
3	4	3	2	(4*2+4)/(2+1)	3	3

Points of Note

Note that in default reduction mode, re-reduction is always triggered, regardless of whether an extract is being processed twice. Also notice that each computation in default reduction consumes all of the extracts. We calculate an average by summing together the values of all of the extracts and then dividing by the number of extracts.

In running reduction, on the other hand, the store keeps a running count of how many items the reducer has seen. This store, together with the previous value of the reduction, can be used to compute the new average using only the new value, with the formula ((old average * old count) + new value) / (old count + 1). The store can then be updated with the new count (old count + 1).
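
The two approaches can be compared with a short Python sketch. The function names and the shape of the store (a dictionary with a count) are assumptions for the example, not the format Caesar uses internally.

```python
def default_average(all_values):
    """Default reduction: recompute from every extract each time."""
    return sum(all_values) / len(all_values)


def running_average(old_average, store, new_value):
    """Running reduction: update from the previous value, the store, and
    only the new extract. Returns the new average and the new store."""
    old_count = store.get("count", 0)
    previous = old_average if old_average is not None else 0
    new_average = (previous * old_count + new_value) / (old_count + 1)
    return new_average, {"count": old_count + 1}


# Replay the extract values from the tables above: 5, 3, 4.
average, store = None, {}
history = []
for value in [5, 3, 4]:
    average, store = running_average(average, store, value)
    history.append(average)
```

Both approaches arrive at the same final average; the running version never needs to load the earlier extracts.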

When using running reducers for performance reasons, please keep in mind that the performance benefits of running reduction are only realized if every reducer for that reducible is executed in running mode. The primary advantage of running reduction is that it eliminates the need to load large numbers of extracts for a given subject or user.

Subject Metadata

Caesar can reflect on several attributes in a subject's metadata to know how to perform certain actions.

`#training_subject`:

Boolean. If true, subject is a training subject.
Used to funnel training subjects to a separate reduction pathway.
Example: TESS user weighting
ExtractFilter allows filtering by training behavior.
To use: set a filter on reducer to include: training_behavior: training_only or experiment_only
See Subject#training_subject? and Filters::FilterByTrainingBehavior for use.

`#previous_subject_ids`:

Array of Zooniverse subject ids
Subjects whose ids are included in array will be passed by RunsReducers to FetchExtractsBySubject
Used to indicate that one or more prior subjects' extracts should be included when reducing a new subject.
Example: TESS takes a new image of the same piece of the sky as a previous subject on a subsequent pass. The previous subject's Zooniverse id is included in the subject metadata and all extracts for both subjects are included in the new subject's reduction.
See Subject#additional_subject_ids_for_reduction for use.

Rules

A workflow can configure one or many rules. Each rule has a condition and one or more effects that happen when that condition evaluates to true. Conditions can be nested to achieve complicated if statements.

Rules may pertain to either subjects or users. Rules have an evaluation order that can be set in the database if need be, and then rules can either be all evaluated or evaluated until the first true condition is reached.

Conditions

The condition is a single operation, but some types of operations can be nested. The general syntax resembles Lisp written in JSON: a condition is always an array whose first item is a string identifying the operator. The other items are the operator's arguments, which are usually operations themselves: [operator, arg1, arg2, ...].

["lt", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a < b < c < d.
["lte", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a <= b <= c <= d.
["gt", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a > b > c > d.
["gte", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a >= b >= c >= d.
["eq", operation, operation, ...] - Performs numerical comparison. You can specify more than two arguments, and it will evaluate as a == b == c == d.
["const", value] - Always returns the configured value.
["lookup", key] - Look up a reduction value by the given key.
["not", operation] - Negates the operation
["and", operation, operation, ...] - Returns true if all of the given operations evaluate to logical true
["or", operation, operation, ...] - Returns true if any of the given operations evaluates to logical true
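
To show how these operators compose, here is a small Python evaluator for the syntax listed above. It is a sketch for understanding the nesting, not Caesar's implementation. One assumption is labeled in the code: a second argument to lookup is treated as a fallback for missing keys, since one of the sample conditions in this document passes an extra argument.

```python
import operator

COMPARISONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "eq": operator.eq,
}


def evaluate(condition, reductions):
    """Evaluate a nested [operator, arg1, arg2, ...] condition."""
    op, *args = condition
    if op == "const":
        return args[0]
    if op == "lookup":
        # Assumption: an optional second argument is a fallback value.
        default = args[1] if len(args) > 1 else None
        return reductions.get(args[0], default)
    if op == "not":
        return not evaluate(args[0], reductions)
    if op == "and":
        return all(evaluate(arg, reductions) for arg in args)
    if op == "or":
        return any(evaluate(arg, reductions) for arg in args)
    if op in COMPARISONS:
        values = [evaluate(arg, reductions) for arg in args]
        # Chained comparison: a < b < c means (a < b) and (b < c).
        return all(COMPARISONS[op](a, b) for a, b in zip(values, values[1:]))
    raise ValueError(f"unknown operator: {op}")


reductions = {"survey-total-VHCL": 2, "count": 7}
vehicle_seen = evaluate(
    ["gte", ["lookup", "survey-total-VHCL"], ["const", 1]], reductions)
in_range = evaluate(
    ["lt", ["const", 5], ["lookup", "count"], ["const", 10]], reductions)
```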

Sample conditions

If one or more vehicles is detected

From the console:

SubjectRule.new workflow_id: 123, condition: ['gte', ['lookup', 'survey-total-VHCL'], ['const', 1]], row_order: 1

Input into UI:

["gte", ["lookup", "survey-total-VHCL"], ["const", 1]]

If the most likely identification is "HUMAN"

From the console:

SubjectRule.new workflow_id: 123, condition: ['eq', ['lookup', 'consensus.most_likely', ''], ['const', 'HUMAN']], row_order: 3

Input into UI:

["eq", ["lookup", "consensus.most_likely", ""], ["const", "HUMAN"]]

Effects

Each rule can have one or more effects associated with it. Those effects will be performed when that rule's condition evaluates to true. Subject Rules have effects that affect subjects (and implicitly receive subject_id as a parameter) and User Rules have effects that affect users (user_id).

Subject Rule Effects

effect_type	`config` Parameters	Effect Code
`retire_subject`	`reason` (string)*	Effects::RetireSubject
`add_subject_to_set`	`subject_set_id` (string)	Effects::AddSubjectToSet
`add_subject_to_collection`	`collection_id` (string)	Effects::AddSubjectToCollection
`external_effect`	`url` (string)**	Effects::ExternalEffect

* Panoptes API validates reason against a list of permitted values. Choose from blank, consensus, or other.

** url must be HTTPS.

User Rule Effects

effect_type	`config` Parameters	Effect Code
`promote_user`	`workflow_id` (string)	Effects::ExternalEffect

Sample Effects

Retire a subject

From the console:

SubjectRuleEffect.new(
  rule_id: 123,
  effect_type: 'retire_subject',
  config: { reason: 'consensus' }
)

In the UI:

These can be configured in the UI normally; there is nothing as complicated as the condition field.

Promote a user to a new workflow

From the console:

UserRuleEffect.new rule_id: 234, effect_type: 'promote_user', config: { 'workflow_id': '555' }

How to do SWAP

In Panoptes, set workflow.configuration to something like:

{"subject_set_chances": {"EXPERT_SET_ID": 0}}

In Caesar, set the workflow like so:

{
  "extractors_config": {
    "who": {"type": "who"},
    "swap": {"type": "external", "url": "https://darryls-server.com"} # OPTIONAL
  },
  "reducers_config": {
    "swap": {"type": "external"},
    "count": {"type": "count"}
  },
  "rules_config": [
    {"if": [RULES], "then": [{"action": "retire_subject"}]}
  ]
}

When you detect an expert user, update their probabilities like this:

POST /api/project_preferences/update_settings?project_id=PROJECT_ID&user_id=USER_ID HTTP/1.1
Host: panoptes-staging.zooniverse.org
Authorization: Bearer TOKEN
Content-Type: application/json
Accept: application/vnd.api+json; version=1

{
  "project_preferences": {
    "designator": {
      "subject_set_chances": {
        "WORKFLOW_ID": {"SUBJECT_SET_ID": 0.5}
      }
    }
  }
}

And store expert-seenness in Caesar so that you can use it in the rules:

POST /workflows/WORKFLOW_ID/reducers/REDUCER_KEY/reductions HTTP/1.1
Host: caesar-staging.zooniverse.org
Authorization: Bearer TOKEN
Content-Type: application/json
Accept: application/json

{
  "likelyhood": 0.864,
  "seen_by_expert": false
}

This document is a reference to the current state of affairs on doing SWAP on the Panoptes platform (by which we mean the Panoptes API, Caesar, and Designator).

To do SWAP, one must:

Track the confusion matrix of users. We currently expect this to be done by some entity outside the Panoptes platform. This could be a script that runs periodically on someone's laptop, or it can be an external webservice that gets classifications streamed to it in real-time by Caesar (this is what Darryl is doing). We don't currently have a good place to store the confusion matrix itself inside the Panoptes platform. But, if the matrix identifies an expert classifier, post that into Panoptes under the project_preferences resource (API calls explained in later section)
Calculate the likelihood of subjects. This is done in the same place that also calculates the confusion matrices. The resulting likelihood should be posted into Caesar as a reduction (under the likelyhood key used in the request above).
Retire subjects when we know the answer. By posting the likelihood into Caesar, we can set rules on it. For instance:
- IF likelyhood < 0.1 AND classifications_count > 5 THEN retire()
- IF likelyhood > 0.9 AND classifications_count > 5 THEN retire()
- IF likelyhood > 0.1 AND likelyhood < 0.9 AND not seen_by_expert AND classifications_count > 10 THEN move to expert_set
When Caesar moves subjects into an expert-only subject set, Designator can then serve subjects from that set only to users marked as experts by the project_preferences. Designator is all about serving subjects from sets with specific chances, which means that we avoid the situation where experts only ever see the really hard subjects by mixing e.g. 50% hard images with 50% "general population".
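
The example rules above can be written out as a single decision function. This Python sketch only restates the thresholds from the list; the function name and the returned labels are hypothetical, and in practice the logic would be expressed as Caesar rules and effects rather than as a script.

```python
def swap_action(likelyhood, classifications_count, seen_by_expert):
    """Decide what to do with a subject based on its SWAP reduction.

    The parameter name `likelyhood` follows the key used in the reduction.
    """
    confident = likelyhood < 0.1 or likelyhood > 0.9
    if confident and classifications_count > 5:
        return "retire"
    undecided = 0.1 < likelyhood < 0.9
    if undecided and not seen_by_expert and classifications_count > 10:
        return "move_to_expert_set"
    return None  # keep collecting classifications


decision = swap_action(0.864, 12, seen_by_expert=False)
```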
