# Predictor

The `AsyncAPI` kind currently only supports the `python` predictor type.
## Project files
Cortex makes all files in the project directory (i.e. the directory which contains `cortex.yaml`) available for use in your Predictor implementation. Python bytecode files (`*.pyc`, `*.pyo`, `*.pyd`), files or folders that start with `.`, and the API configuration file (e.g. `cortex.yaml`) are excluded.
The following files can also be added at the root of the project's directory:
- `.cortexignore` file, which follows the same syntax and behavior as a `.gitignore` file.
- `.env` file, which exports environment variables that can be used in the predictor. Each line of this file must follow the `VARIABLE=value` format.
For example, if your directory looks like this:
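(An illustrative layout; file names other than `cortex.yaml` and `values.json` are placeholders for your own files.)

```text
./my-api/
├── cortex.yaml
├── values.json
├── predictor.py
└── requirements.txt
```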
You can access `values.json` in your Predictor like this:
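A minimal sketch, assuming the project directory is the working directory at runtime:

```python
import json


class PythonPredictor:
    def __init__(self, config):
        # project files are accessible relative to the working directory
        with open("values.json") as values_file:
            self.values = json.load(values_file)

    def predict(self, payload):
        return self.values
```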
## Python Predictor

### Interface
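The interface follows this pattern (a sketch of the standard Python Predictor skeleton; consult the reference for your Cortex version for exact signatures):

```python
# initialization code and variables can be declared here in global scope

class PythonPredictor:
    def __init__(self, config):
        # Called once before the API becomes available. Perform setup here,
        # such as downloading or initializing the model. `config` is the
        # dictionary defined in your API configuration (if specified).
        pass

    def predict(self, payload):
        # Called once per request. Preprocess the payload (if necessary),
        # run inference, and postprocess the prediction (if necessary).
        # The return value must be JSON-serializable.
        pass
```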
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor.
Your API can accept requests with different types of payloads. Navigate to the API requests section to learn about how headers can be used to change the type of payload that is passed into your `predict` method.
At this moment, the AsyncAPI `predict` method can only return JSON-serializable objects. Navigate to the API responses section for details.
## TensorFlow Predictor

Uses TensorFlow version 2.3.0 by default

### Interface
Cortex provides a `tensorflow_client` to your Predictor's constructor. `tensorflow_client` is an instance of `TensorFlowClient` that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your `predict()` function should call `tensorflow_client.predict()` to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your `predict()` function as well.
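A minimal sketch of this pattern (the pre/postprocessing steps are illustrative):

```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        # save the TensorFlow Serving client for use in predict()
        self.client = tensorflow_client

    def predict(self, payload):
        # preprocess the JSON payload here if necessary
        prediction = self.client.predict(payload)
        # postprocess the prediction here if necessary
        return prediction
```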
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor.
Your API can accept requests with different types of payloads. Navigate to the API requests section to learn about how headers can be used to change the type of payload that is passed into your `predict` method.
At this moment, the AsyncAPI `predict` method can only return JSON-serializable objects. Navigate to the API responses section for details.
## ONNX Predictor

Uses ONNX Runtime version 1.6.0 by default

### Interface
Cortex provides an `onnx_client` to your Predictor's constructor. `onnx_client` is an instance of `ONNXClient` that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your `predict()` function should call `onnx_client.predict()` to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your `predict()` function as well.
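A minimal sketch of this pattern (mirroring the TensorFlow example above):

```python
class ONNXPredictor:
    def __init__(self, onnx_client, config):
        # save the ONNX Runtime client for use in predict()
        self.client = onnx_client

    def predict(self, payload):
        # preprocess the JSON payload here if necessary
        prediction = self.client.predict(payload)
        # postprocess the prediction here if necessary
        return prediction
```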
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor.
Your API can accept requests with different types of payloads. Navigate to the API requests section to learn about how headers can be used to change the type of payload that is passed into your `predict` method.
At this moment, the AsyncAPI `predict` method can only return JSON-serializable objects. Navigate to the API responses section for details.
## API requests
The type of the `payload` parameter in `predict(self, payload)` can vary based on the content type of the request. The `payload` parameter is parsed according to the `Content-Type` header in the request. Here are the parsing rules (see below for examples):
- For `Content-Type: application/json`, `payload` will be the parsed JSON body.
- For `Content-Type: text/plain`, `payload` will be a string. `utf-8` encoding is assumed, unless specified otherwise (e.g. via `Content-Type: text/plain; charset=us-ascii`).
- For all other `Content-Type` values, `payload` will be the raw `bytes` of the request body.
Here are some examples:
### JSON data

#### Making the request
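For example, using the `requests` library (the endpoint below is a placeholder; use your API's endpoint, e.g. as shown by `cortex get <api_name>`):

```python
import requests

api_endpoint = "http://localhost:8888"  # placeholder; use your API's endpoint

# `json=` serializes the body and sets the Content-Type: application/json header
response = requests.post(api_endpoint, json={"key": "value"})
print(response.text)
```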
#### Reading the payload
When sending a JSON payload, the `payload` parameter will be a Python object:
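A sketch of the corresponding predictor:

```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        print(payload["key"])  # prints "value" for the request above
```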
### Binary data

#### Making the request
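For example (the endpoint and file name are placeholders):

```python
import requests

api_endpoint = "http://localhost:8888"  # placeholder; use your API's endpoint

with open("sample.bin", "rb") as f:  # hypothetical binary file
    response = requests.post(
        api_endpoint,
        data=f.read(),
        headers={"Content-Type": "application/octet-stream"},
    )
```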
#### Reading the payload
Since the `Content-Type: application/octet-stream` header is used, the `payload` parameter will be a `bytes` object:
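A sketch:

```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        print(type(payload))  # <class 'bytes'>
```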
Here's an example if the binary data is an image:
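A sketch using Pillow, assuming the request body is an encoded image (e.g. PNG or JPEG):

```python
import io

from PIL import Image


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        img = Image.open(io.BytesIO(payload))  # decode the raw bytes as an image
        print(img.size)
```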
### Text data

#### Making the request
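For example (placeholder endpoint):

```python
import requests

api_endpoint = "http://localhost:8888"  # placeholder; use your API's endpoint

response = requests.post(
    api_endpoint,
    data="hello world",
    headers={"Content-Type": "text/plain"},
)
```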
#### Reading the payload
Since the `Content-Type: text/plain` header is used, the `payload` parameter will be a string object:
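A sketch:

```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        print(payload)  # prints "hello world" for the request above
```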
## API responses
Currently, the response returned by your `predict()` method in an AsyncAPI must be a JSON-serializable dictionary.
## Chaining APIs
It is possible to make requests from one API to another within a Cortex cluster. All running APIs are accessible from within the predictor at `http://api-<api_name>:8888/predict`, where `<api_name>` is the name of the API you are making a request to.
For example, if there is an API named `text-generator` running in the cluster, you could make a request to it from a different API like this:
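A sketch (the request body is illustrative; send whatever the target API expects):

```python
import requests


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        response = requests.post(
            "http://api-text-generator:8888/predict",
            json={"text": "machine learning is"},  # illustrative request body
        )
        return response.json()
```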
## Structured logging
You can use Cortex's logger in your predictor implementation to log in JSON. This will enrich your logs with Cortex's metadata, and you can add custom metadata to the logs by adding key-value pairs to the `extra` key when using the logger. For example:
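A sketch, assuming Cortex's structured logger is importable from the `cortex_internal.lib.log` module (verify the import path for your Cortex version):

```python
from cortex_internal.lib.log import logger as cortex_logger


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        # the "payload" key becomes custom metadata attached to this log line
        cortex_logger.info("received payload", extra={"payload": payload})
```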
The dictionary passed in via the `extra` key will be flattened by one level.
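For example (the surrounding log fields shown are illustrative):

```python
cortex_logger.info("received payload", extra={"payload": {"text": "hello world"}})
# the resulting log line (illustrative) contains the flattened key "payload.text":
# {"asctime": "...", "levelname": "INFO", "message": "received payload", "payload.text": "hello world", ...}
```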
To avoid overriding essential Cortex metadata, please refrain from specifying the following extra keys: `asctime`, `levelname`, `message`, `labels`, and `process`. Log lines greater than 5 MB in size will be ignored.