Configuration

- name: <string>
  kind: RealtimeAPI
  predictor: # detailed configuration below
  compute: # detailed configuration below
  autoscaling: # detailed configuration below
  update_strategy: # detailed configuration below
  networking: # detailed configuration below
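
For reference, a minimal concrete configuration might look like the following; the API name and predictor file path are hypothetical placeholders:

- name: text-generator  # hypothetical API name
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py  # hypothetical path to the file defining PythonPredictor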

Predictor

Python Predictor

predictor:
  type: python
  path: <string>  # path to a python file with a PythonPredictor class definition, relative to the Cortex root (required)
  multi_model_reloading:  # use this to serve one or more models with live reloading (optional)
    path: <string> # S3/GCS path to an exported model directory (e.g. s3://my-bucket/exported_model/) (either this, 'dir', or 'paths' must be provided if 'multi_model_reloading' is specified)
    paths:  # list of S3/GCS paths to exported model directories (either this, 'dir', or 'path' must be provided if 'multi_model_reloading' is specified)
      - name: <string>  # unique name for the model (e.g. text-generator) (required)
        path: <string>  # S3/GCS path to an exported model directory (e.g. s3://my-bucket/exported_model/) (required)
      ...
    dir: <string>  # S3/GCS path to a directory containing multiple models (e.g. s3://my-bucket/models/) (either this, 'path', or 'paths' must be provided if 'multi_model_reloading' is specified)
    cache_size: <int>  # the number of models to keep in memory (optional; all models are kept in memory by default)
    disk_cache_size: <int>  # the number of models to keep on disk (optional; all models are kept on disk by default)
  server_side_batching:  # (optional)
    max_batch_size: <int>  # the maximum number of requests to aggregate before running inference
    batch_interval: <duration>  # the maximum amount of time to spend waiting for additional requests before running inference on the batch of requests
  processes_per_replica: <int>  # the number of parallel serving processes to run on each replica (default: 1)
  threads_per_process: <int>  # the number of threads per process (default: 1)
  config: <string: value>  # arbitrary dictionary passed to the constructor of the Predictor (optional)
  python_path: <string>  # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
  image: <string>  # docker image to use for the Predictor (default: quay.io/cortexlabs/python-predictor-cpu:0.28.0, quay.io/cortexlabs/python-predictor-gpu:0.28.0, or quay.io/cortexlabs/python-predictor-inf:0.28.0 based on compute)
  env: <string: string>  # dictionary of environment variables
  log_level: <string>  # log level: one of "debug", "info", "warning", or "error" (default: "info")
  shm_size: <string>  # size of shared memory (/dev/shm) for sharing data between multiple processes, e.g. 64Mi or 1Gi (default: Null)
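
For reference, here is a minimal sketch of the class that path is expected to point to. The constructor receives the dictionary defined in the config field above; the "template" key used here is a hypothetical example, not part of the Cortex API:

# predictor.py -- a minimal PythonPredictor sketch (hypothetical example)
class PythonPredictor:
    def __init__(self, config):
        # runs once per serving process; `config` is the arbitrary
        # dictionary from the `config` field above ("template" is a
        # hypothetical key chosen for this example)
        self.template = config.get("template", "Hello, {}!")

    def predict(self, payload):
        # runs for every request; `payload` is the parsed request body
        return self.template.format(payload["name"])

When multi_model_reloading is configured, the constructor also receives a client object for loading models by name; its exact signature is version-specific, so consult the Cortex docs that match your cluster version.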

TensorFlow Predictor

ONNX Predictor

Compute

Autoscaling

Update strategy

Networking
