Predictor
Which Predictor you use depends on how your model is exported:
TensorFlow Predictor if your model is exported as a TensorFlow SavedModel
ONNX Predictor if your model is exported in the ONNX format
Python Predictor for all other cases
The response type of the predictor can vary depending on your requirements; see HTTP API responses and gRPC API responses below.
Project files
Cortex makes all files in the project directory (i.e. the directory which contains cortex.yaml) available for use in your Predictor implementation. Python bytecode files (*.pyc, *.pyo, *.pyd), files or folders that start with ., and the API configuration file (e.g. cortex.yaml) are excluded.
The following files can also be added at the root of the project's directory:
.cortexignore file, which follows the same syntax and behavior as a .gitignore file.
.env file, which exports environment variables that can be used in the predictor. Each line of this file must follow the VARIABLE=value format.
For example, if your directory looks like this:
You can access values.json in your Predictor like this:
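As a minimal sketch, assume the project directory holds cortex.yaml, predictor.py, and values.json side by side, and that the predictor runs with the project root as its working directory. The values_path config key is hypothetical and only makes the path overridable.

```python
import json


class PythonPredictor:
    def __init__(self, config):
        # "values_path" is a hypothetical config key; by default the file is
        # read relative to the working directory (assumed: the project root).
        path = config.get("values_path", "values.json")
        with open(path, "r") as values_file:
            self.values = json.load(values_file)

    def predict(self, payload):
        # Return the loaded values alongside the request payload.
        return {"values": self.values, "payload": payload}
```

Loading the file once in the constructor avoids re-reading it on every request.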
HTTP
Python Predictor
Interface
When explicit model paths are specified in the Python predictor's API configuration, Cortex provides a python_client to your Predictor's constructor. python_client is an instance of PythonClient that is used to load model(s) (it calls the load_model() method of your predictor, which must be defined when using explicit model paths). It should be saved as an instance variable in your Predictor, and your predict() function should call python_client.get_model() to load your model for inference. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the python_client.get_model() method expects an argument model_name which must hold the name of the model that you want to load (for example: self.client.get_model("text-generator")). There is also an optional second argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define config in your API configuration, and it is passed through to your Predictor's constructor.
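The flow described above can be sketched as follows. StubPythonClient is a hypothetical stand-in for the client that Cortex passes in, so the example runs outside a cluster. The constructor argument order, the load_model(model_path) signature, the "latest" default version, and the prefix config key are assumptions made for illustration.

```python
class StubPythonClient:
    """Hypothetical stand-in for the python_client that Cortex passes in."""

    def __init__(self, model_paths):
        self.model_paths = model_paths  # model name -> path of the model files
        self.load_model = None          # set to the predictor's load_model() below

    def get_model(self, model_name, model_version="latest"):
        # The version is ignored in this stub; a real client would resolve it.
        return self.load_model(self.model_paths[model_name])


class PythonPredictor:
    def __init__(self, config, python_client):
        self.client = python_client             # keep the client for predict()
        self.prefix = config.get("prefix", "")  # hypothetical config parameter

    def load_model(self, model_path):
        # A real implementation would deserialize the files at model_path;
        # here the "model" is a function that echoes its input.
        return lambda text: f"{model_path} -> {text}"

    def predict(self, payload):
        model = self.client.get_model("text-generator")
        return self.prefix + model(payload["text"])


client = StubPythonClient({"text-generator": "/models/text-generator"})
predictor = PythonPredictor({"prefix": "out: "}, client)
client.load_model = predictor.load_model  # the client calls back into load_model()
```

Keeping tunable values such as prefix in config rather than in the code is the separation of concerns the paragraph above recommends.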
Your API can accept requests with different types of payloads such as JSON-parseable, bytes or starlette.datastructures.FormData data. See HTTP API requests to learn about how headers can be used to change the type of payload that is passed into your predict method.
Your predict method can return different types of objects such as JSON-parseable, string, and bytes objects. See HTTP API responses to learn about how to configure your predict method to respond with different response codes and content-types.
TensorFlow Predictor
Uses TensorFlow version 2.3.0 by default
Interface
Cortex provides a tensorflow_client to your Predictor's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call tensorflow_client.predict() to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the tensorflow_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(payload, "text-generator")). There is also an optional third argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
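A minimal sketch of this pattern, with preprocessing and postprocessing around the client call. StubTensorFlowClient is a hypothetical stand-in for the real client (it upper-cases text instead of calling TensorFlow Serving), and the constructor argument names and the max_length config key are assumptions.

```python
class StubTensorFlowClient:
    """Hypothetical stand-in for the tensorflow_client that Cortex passes in."""

    def predict(self, model_input, model_name=None, model_version="latest"):
        # A real client would forward model_input to TensorFlow Serving.
        return {"output": model_input["text"].upper()}


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client
        self.max_length = config.get("max_length", 20)  # hypothetical config key

    def predict(self, payload):
        # Preprocess: normalize whitespace before inference.
        model_input = {"text": payload["text"].strip()}
        prediction = self.client.predict(model_input, "text-generator")
        # Postprocess: truncate the output to the configured length.
        return prediction["output"][: self.max_length]


predictor = TensorFlowPredictor(
    tensorflow_client=StubTensorFlowClient(), config={"max_length": 5}
)
```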
Your API can accept requests with different types of payloads such as JSON-parseable, bytes or starlette.datastructures.FormData data. See HTTP API requests to learn about how headers can be used to change the type of payload that is passed into your predict method.
Your predict method can return different types of objects such as JSON-parseable, string, and bytes objects. See HTTP API responses to learn about how to configure your predict method to respond with different response codes and content-types.
If you need to share files between your predictor implementation and the TensorFlow Serving container, you can create a new directory within /mnt (e.g. /mnt/user) and write files to it. The entire /mnt directory is shared between containers, but do not write to any of the directories in /mnt that already exist (they are used internally by Cortex).
ONNX Predictor
Uses ONNX Runtime version 1.6.0 by default
Interface
Cortex provides an onnx_client to your Predictor's constructor. onnx_client is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call onnx_client.predict() to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the onnx_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(model_input, "text-generator")). There is also an optional third argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
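The sketch below focuses on the optional third argument, the model version. StubONNXClient is a hypothetical stand-in that sums its input and records which model and version were requested. The constructor argument names, the "latest" default, and the model_version config key are assumptions.

```python
class StubONNXClient:
    """Hypothetical stand-in for the onnx_client that Cortex passes in."""

    def __init__(self):
        self.calls = []  # (model_name, model_version) of every predict() call

    def predict(self, model_input, model_name=None, model_version="latest"):
        self.calls.append((model_name, model_version))
        # A real client would run an ONNX Runtime session here.
        return [[sum(model_input)]]


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client
        self.version = config.get("model_version", "latest")  # hypothetical key

    def predict(self, payload):
        model_input = [float(v) for v in payload["features"]]
        prediction = self.client.predict(model_input, "text-generator", self.version)
        return {"score": prediction[0][0]}


client = StubONNXClient()
predictor = ONNXPredictor(onnx_client=client, config={"model_version": "2"})
```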
Your API can accept requests with different types of payloads such as JSON-parseable, bytes or starlette.datastructures.FormData data. See HTTP API requests to learn about how headers can be used to change the type of payload that is passed into your predict method.
Your predict method can return different types of objects such as JSON-parseable, string, and bytes objects. See HTTP API responses to learn about how to configure your predict method to respond with different response codes and content-types.
HTTP requests
The type of the payload parameter in predict(self, payload) can vary based on the content type of the request. The payload parameter is parsed according to the Content-Type header in the request. Here are the parsing rules (see below for examples):
For Content-Type: application/json, payload will be the parsed JSON body.
For Content-Type: multipart/form-data / Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the values are strings for text data, or starlette.datastructures.UploadFile for file uploads; see Starlette's documentation).
For Content-Type: text/plain, payload will be a string. utf-8 encoding is assumed, unless specified otherwise (e.g. via Content-Type: text/plain; charset=us-ascii).
For all other Content-Type values, payload will be the raw bytes of the request body.
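To make the rules above concrete, here is an illustrative function that mirrors them for a raw request body. It is not Cortex's implementation: parse_payload is a hypothetical helper, and form data is left out because it requires Starlette.

```python
import json


def parse_payload(content_type, body):
    """Illustrative only: mirror the parsing rules above for a raw request body."""
    # Split "text/plain; charset=us-ascii" into the media type and its parameters.
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    if media_type == "application/json":
        return json.loads(body)  # parsed JSON body
    if media_type == "text/plain":
        charset = "utf-8"  # assumed unless the header says otherwise
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset" and value:
                charset = value.strip()
        return body.decode(charset)
    if media_type in ("multipart/form-data", "application/x-www-form-urlencoded"):
        raise NotImplementedError("form data is parsed into Starlette's FormData")
    return body  # any other content type: raw bytes
```

A predict() implementation can rely on the same mapping: check the type of payload it receives, or fix the expected Content-Type in the client.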
Here are some examples:
JSON data
Making the request
Reading the payload
When sending a JSON payload, the payload parameter will be a Python object:
Binary data
Making the request
Reading the payload
Since the Content-Type: application/octet-stream header is used, the payload parameter will be a bytes object:
Here's an example if the binary data is an image:
Form data (files)
Making the request
Reading the payload
When sending files via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are starlette.datastructures.UploadFile; see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: multipart/form-data is used for files, and is the default in the examples above).
Form data (text)
Making the request
Reading the payload
When sending text via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are strings; see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: application/x-www-form-urlencoded is used for text, and is the default in the examples above).
Text data
Making the request
Reading the payload
Since the Content-Type: text/plain header is used, the payload parameter will be a string object:
HTTP responses
The response of your predict() function may be:
A JSON-serializable object (lists, dictionaries, numbers, etc.)
A string object (e.g. "class 1")
A bytes object (e.g. bytes(4) or pickle.dumps(obj))
An instance of starlette.responses.Response
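As a sketch, a single predict() can return any of the first three kinds of response. The format request field and the labels are hypothetical; the starlette.responses.Response case is omitted so the example needs only the standard library.

```python
import pickle


class PythonPredictor:
    def __init__(self, config):
        self.labels = ["class 0", "class 1"]  # hypothetical labels

    def predict(self, payload):
        label = self.labels[payload["class_id"]]
        fmt = payload.get("format", "json")  # hypothetical request field
        if fmt == "json":
            return {"label": label}  # JSON-serializable object
        if fmt == "text":
            return label             # string object
        return pickle.dumps(label)   # bytes object
```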
gRPC
To serve your API using the gRPC protocol, make sure the predictor.protobuf_path field in your API configuration points to a protobuf file. When the API is deployed, Cortex compiles the protobuf file and uses it to serve the API.
Python Predictor
Interface
When explicit model paths are specified in the Python predictor's API configuration, Cortex provides a python_client to your Predictor's constructor. python_client is an instance of PythonClient that is used to load model(s) (it calls the load_model() method of your predictor, which must be defined when using explicit model paths). It should be saved as an instance variable in your Predictor, and your predict() function should call python_client.get_model() to load your model for inference. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the python_client.get_model() method expects an argument model_name which must hold the name of the model that you want to load (for example: self.client.get_model("text-generator")). There is also an optional second argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define config in your API configuration, and it is passed through to your Predictor's constructor.
Your API can only accept the type that has been specified in the protobuf definition of your service's method. See gRPC API requests for how to construct gRPC requests.
Your predict method can only return the type that has been specified in the protobuf definition of your service's method. See gRPC API responses for how to handle gRPC responses.
TensorFlow Predictor
Uses TensorFlow version 2.3.0 by default
Interface
Cortex provides a tensorflow_client to your Predictor's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call tensorflow_client.predict() to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the tensorflow_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(payload, "text-generator")). There is also an optional third argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
Your API can only accept the type that has been specified in the protobuf definition of your service's method. See gRPC API requests for how to construct gRPC requests.
Your predict method can only return the type that has been specified in the protobuf definition of your service's method. See gRPC API responses for how to handle gRPC responses.
If you need to share files between your predictor implementation and the TensorFlow Serving container, you can create a new directory within /mnt (e.g. /mnt/user) and write files to it. The entire /mnt directory is shared between containers, but do not write to any of the directories in /mnt that already exist (they are used internally by Cortex).
ONNX Predictor
Uses ONNX Runtime version 1.6.0 by default
Interface
Cortex provides an onnx_client to your Predictor's constructor. onnx_client is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your predict() function should call onnx_client.predict() to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your predict() function as well.
When multiple models are defined using the Predictor's models field, the onnx_client.predict() method expects a second argument model_name which must hold the name of the model that you want to use for inference (for example: self.client.predict(model_input, "text-generator")). There is also an optional third argument to specify the model version.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as configurable model parameters or download links for initialization files. You define config in your API configuration, and it is passed through to your Predictor's constructor.
Your API can only accept the type that has been specified in the protobuf definition of your service's method. See gRPC API requests for how to construct gRPC requests.
Your predict method can only return the type that has been specified in the protobuf definition of your service's method. See gRPC API responses for how to handle gRPC responses.
gRPC requests
Assuming the following service:
The type of the payload parameter passed into predict(self, payload) will match that of the Sample message defined in the predictor.protobuf_path file. For this example, we'll assume that the above protobuf file was specified for the API.
Simple request
The service method must look like this:
Making the request
Reading the payload
In the predict method, you'll read the value like this:
Streaming request
The service method must look like this:
Making the request
Reading the payload
In the predict method, you'll read the streamed values like this:
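A minimal sketch, assuming the streamed payload arrives as an iterator of Sample messages. The Sample and Response dataclasses are hypothetical stand-ins for the classes generated from the protobuf file, and the field names a and b are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for the generated Sample message."""
    a: str


@dataclass
class Response:
    """Hypothetical stand-in for the generated Response message."""
    b: str


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        # Streaming request: payload is assumed to be an iterator of Sample.
        values = [sample.a for sample in payload]
        # Reply with a single Response once the stream is exhausted.
        return Response(b=",".join(values))
```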
gRPC responses
Assuming the following service:
The type of the value that you return in your predict() method must match the Response message defined in the predictor.protobuf_path file. For this example, we'll assume that the above protobuf file was specified for the API.
Simple response
The service method must look like this:
Making the request
Returning the response
In the predict method, you'll return the value like this:
Streaming response
The service method must look like this:
Making the request
Returning the response
In the predict method, you'll return the streamed values like this:
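A minimal sketch, assuming a streamed response is produced by writing predict() as a generator that yields one Response per message. The Sample and Response dataclasses are hypothetical stand-ins for the classes generated from the protobuf file, and the field names a and b are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for the generated Sample message."""
    a: str


@dataclass
class Response:
    """Hypothetical stand-in for the generated Response message."""
    b: str


class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        # Streaming response: yield one Response per word of the request.
        for word in payload.a.split():
            yield Response(b=word)
```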
Chaining APIs
It is possible to make requests from one API to another within a Cortex cluster. All running APIs are accessible from within the predictor at http://api-<api_name>:8888/predict, where <api_name> is the name of the API you are making a request to.
For example, if there is an API named text-generator running in the cluster, you could make a request to it from a different API by using:
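A minimal sketch using only the standard library. build_chained_request is a hypothetical helper that builds the request without sending it, the JSON body shape and the timeout config key are assumptions, and the call in predict() only succeeds inside a cluster where text-generator is running.

```python
import json
import urllib.request


def build_chained_request(api_name, payload):
    """Build (but do not send) a POST request to another API in the cluster."""
    url = f"http://api-{api_name}:8888/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class PythonPredictor:
    def __init__(self, config):
        self.timeout = config.get("timeout", 10)  # hypothetical config key (seconds)

    def predict(self, payload):
        request = build_chained_request("text-generator", {"text": payload["text"]})
        # This call blocks until the second API responds.
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.loads(response.read())
```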
Note that the autoscaling configuration (i.e. target_replica_concurrency) for the API that is making the request should account for the fact that a request remains "in-flight" with the first API while it is being fulfilled by the second API (during which it is also considered "in-flight" with the second API).
Structured logging
You can use Cortex's logger in your predictor implementation to log in JSON. This will enrich your logs with Cortex's metadata, and you can add custom metadata to the logs by adding key-value pairs to the extra key when using the logger. For example:
The dictionary passed in via extra will be flattened by one level. For example:
To avoid overriding essential Cortex metadata, please refrain from specifying the following extra keys: asctime, levelname, message, labels, and process. Log lines greater than 5 MB in size will be ignored.
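The following sketch illustrates what one-level flattening and the reserved keys mean in practice. It does not use Cortex's logger: render_log_line is a hypothetical helper that only imitates the shape of the resulting JSON line, and raising an error on a reserved key is a choice made for this example.

```python
import json

# Keys that the text above asks you not to pass via extra.
RESERVED_KEYS = {"asctime", "levelname", "message", "labels", "process"}


def render_log_line(message, extra=None):
    """Imitate a JSON log line in which extra is flattened by one level."""
    extra = extra or {}
    clashes = RESERVED_KEYS.intersection(extra)
    if clashes:
        raise ValueError(f"reserved keys in extra: {sorted(clashes)}")
    record = {"levelname": "INFO", "message": message}
    # One-level flattening: each key of extra becomes a top-level key,
    # while nested dictionaries are kept as they are.
    record.update(extra)
    return json.dumps(record)
```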
Cortex Python client
A default Cortex Python client environment has been configured for your API. This can be used for deploying/deleting/updating or submitting jobs to your running cluster based on the execution flow of your predictor. For example: