Configuration
- name: <string>
  kind: RealtimeAPI
  predictor: # detailed configuration below
  compute: # detailed configuration below
  autoscaling: # detailed configuration below
  update_strategy: # detailed configuration below
  networking: # detailed configuration below
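For orientation, a minimal API definition that fills in this skeleton might look like the following sketch; the API name and predictor path are placeholders for illustration, not defaults:

- name: iris-classifier # assumed example name
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py # hypothetical path to a PythonPredictor file, relative to the Cortex root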
Predictor

Python Predictor
predictor:
  type: python
  path: <string> # path to a python file with a PythonPredictor class definition, relative to the Cortex root (required)
  multi_model_reloading: # use this to serve one or more models with live reloading (optional)
    path: <string> # S3/GCS path to an exported model directory (e.g. s3://my-bucket/exported_model/) (either this, 'dir', or 'paths' must be provided if 'multi_model_reloading' is specified)
    paths: # list of S3/GCS paths to exported model directories (either this, 'dir', or 'path' must be provided if 'multi_model_reloading' is specified)
      - name: <string> # unique name for the model (e.g. text-generator) (required)
        path: <string> # S3/GCS path to an exported model directory (e.g. s3://my-bucket/exported_model/) (required)
      ...
    dir: <string> # S3/GCS path to a directory containing multiple models (e.g. s3://my-bucket/models/) (either this, 'path', or 'paths' must be provided if 'multi_model_reloading' is specified)
    cache_size: <int> # the number of models to keep in memory (optional; all models are kept in memory by default)
    disk_cache_size: <int> # the number of models to keep on disk (optional; all models are kept on disk by default)
  server_side_batching: # (optional)
    max_batch_size: <int> # the maximum number of requests to aggregate before running inference
    batch_interval: <duration> # the maximum amount of time to spend waiting for additional requests before running inference on the batch of requests
  processes_per_replica: <int> # the number of parallel serving processes to run on each replica (default: 1)
  threads_per_process: <int> # the number of threads per process (default: 1)
  config: <string: value> # arbitrary dictionary passed to the constructor of the Predictor (optional)
  python_path: <string> # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
  image: <string> # docker image to use for the Predictor (default: quay.io/cortexlabs/python-predictor-cpu:0.28.0, quay.io/cortexlabs/python-predictor-gpu:0.28.0, or quay.io/cortexlabs/python-predictor-inf:0.28.0 based on compute)
  env: <string: string> # dictionary of environment variables
  log_level: <string> # log level; one of "debug", "info", "warning", or "error" (default: "info")
  shm_size: <string> # size of shared memory (/dev/shm) for sharing data between multiple processes, e.g. 64Mi or 1Gi (default: Null)
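To make the fields above concrete, here is a hypothetical predictor section that serves two models with live reloading and enables server-side batching; the bucket, model names, and all values are assumptions for illustration, not defaults:

predictor:
  type: python
  path: predictor.py # hypothetical PythonPredictor file
  multi_model_reloading:
    paths: # 'paths' is used here, so 'path' and 'dir' are omitted
      - name: text-generator # assumed model name
        path: s3://my-bucket/models/text-generator/ # assumed bucket
      - name: sentiment-analyzer # assumed model name
        path: s3://my-bucket/models/sentiment-analyzer/ # assumed bucket
    cache_size: 1 # keep at most one model in memory at a time
    disk_cache_size: 2 # keep both models on disk
  server_side_batching:
    max_batch_size: 8 # run inference once 8 requests have accumulated
    batch_interval: 0.1s # or after 100ms, whichever comes first
  processes_per_replica: 1 # a single serving process (the default)
  threads_per_process: 8 # enough threads for a full batch to accumulate (assumption)
  config:
    max_length: 100 # arbitrary key passed to the Predictor's constructor
  log_level: info

Because cache_size is smaller than the number of models in this sketch, a request for a model that is not currently in memory incurs a load on first use; this trades higher tail latency on cache misses for a smaller memory footprint.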
TensorFlow Predictor

ONNX Predictor
Compute
Autoscaling
Update strategy
Networking