Handler
Last updated
Realtime APIs respond to requests in real-time and autoscale based on in-flight request volumes. They can be used for realtime inference or data processing workloads.
If you plan on deploying ML models and running realtime inference, check out the models page. Cortex provides out-of-the-box support for a variety of frameworks, such as PyTorch, ONNX, scikit-learn, XGBoost, and TensorFlow.
The response type of the handler can vary depending on your requirements; see the descriptions of HTTP and gRPC responses below.
Cortex makes all files in the project directory (i.e. the directory which contains cortex.yaml) available for use in your Handler class implementation. Python bytecode files (*.pyc, *.pyo, *.pyd), files or folders whose names start with a dot (.), and the API configuration file (e.g. cortex.yaml) are excluded.
The following files can also be added at the root of the project's directory:
.cortexignore file, which follows the same syntax and behavior as a .gitignore file.
.env file, which exports environment variables that can be used in the handler. Each line of this file must follow the VARIABLE=value format.
For example, if your directory looks like this:
You can access values.json in your handler class like this:
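A minimal sketch of reading a project file from the constructor. It assumes that values.json sits next to cortex.yaml and that the handler runs with the project directory as its working directory; the attribute name self.values is illustrative only.

```python
import json


class Handler:
    def __init__(self, config):
        # Assumption: the working directory is the project directory
        # (the one containing cortex.yaml), so a relative path resolves.
        with open("values.json", "r") as values_file:
            self.values = json.load(values_file)
```

Loading the file once in the constructor keeps the per-request methods free of disk access.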
Your Handler class can implement methods for each of the following HTTP methods: POST, GET, PUT, PATCH, DELETE. The respective methods in the Handler definition are handle_post, handle_get, handle_put, handle_patch, and handle_delete.
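As a rough sketch, a handler that serves two of these HTTP methods could look like the following. The method names come from the text above; the return values are placeholders chosen for illustration.

```python
class Handler:
    def __init__(self, config):
        # config is the dictionary from the API configuration (unused here).
        self.config = config

    def handle_post(self, payload):
        # Called for POST requests; echoes the parsed request body back.
        return {"received": payload}

    def handle_get(self, payload):
        # Called for GET requests; payload is unused in this sketch.
        return {"status": "ok"}
```

HTTP methods without a matching handle_<method> function are simply not implemented by this class.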
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as where to download the model and initialization files from, or any configurable model parameters. You define config in your API configuration, and it is passed through to your Handler's constructor.
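The sketch below shows one way to consume config in the constructor. The keys model_path and threshold are hypothetical; they stand for whatever you define under config in your API configuration.

```python
class Handler:
    def __init__(self, config):
        # Hypothetical keys, assumed to be set under `config` in the API configuration.
        self.model_path = config["model_path"]
        self.threshold = float(config.get("threshold", 0.5))

    def handle_post(self, payload):
        # Uses the configured threshold instead of a hard-coded constant.
        score = float(payload["score"])
        return {"positive": score >= self.threshold}
```

Keeping such values in config means they can be changed in the API configuration without touching the handler code.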
A callback is a function that starts running in the background after the results have been sent back to the client. Callbacks are meant to be short-lived.
Each handler method of your class can implement callbacks. To do this, return a 2-element tuple from your handler method: the first element is the result you want to return, and the second element is a callable object that takes no arguments.
You can implement a callback like in the following example:
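A minimal sketch of the tuple-return convention described above. The counter and the response shape are illustrative assumptions.

```python
class Handler:
    def __init__(self, config):
        self.processed = 0

    def handle_post(self, payload):
        def update_stats():
            # Runs in the background after the response was sent; keep it short.
            self.processed += 1

        result = {"length": len(payload)}
        # First element: the response. Second element: a no-argument callable.
        return result, update_stats
```

The callback closes over self, so it can update handler state without taking any arguments.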
The type of the payload parameter in handle_<HTTP-VERB>(self, payload) can vary based on the content type of the request. The payload parameter is parsed according to the Content-Type header in the request. Here are the parsing rules (see below for examples):
For Content-Type: application/json, payload will be the parsed JSON body.
For Content-Type: text/plain, payload will be a string. utf-8 encoding is assumed, unless specified otherwise (e.g. via Content-Type: text/plain; charset=us-ascii).
For all other Content-Type values, payload will be the raw bytes of the request body.
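To make the three rules above concrete, here is a small standalone emulation. It is not Cortex's implementation; parse_payload is a hypothetical helper written only to illustrate the mapping from Content-Type to payload type, and it does not cover form data.

```python
import json
from email.message import Message


def parse_payload(content_type: str, body: bytes):
    # Use the standard library to split the media type from its parameters.
    header = Message()
    header["Content-Type"] = content_type
    media_type = header.get_content_type()

    if media_type == "application/json":
        return json.loads(body)  # parsed JSON body
    if media_type == "text/plain":
        charset = header.get_content_charset() or "utf-8"
        return body.decode(charset)  # string
    return body  # raw bytes for everything else
```

For instance, parse_payload("application/octet-stream", body) hands the body back unchanged, while a text/plain request is decoded with the declared charset.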
Here are some examples:
Making the request
Reading the payload
When sending a JSON payload, the payload parameter will be a Python object:
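A short sketch of the handler side, assuming the request body is a JSON object with a hypothetical "key" field:

```python
class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        # payload is already a Python object (here assumed to be a dict),
        # so no json.loads() call is needed.
        return {"value": payload["key"]}
```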
Making the request
Reading the payload
Since the Content-Type: application/octet-stream header is used, the payload parameter will be a bytes object:
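A minimal sketch of handling a bytes payload. What the bytes mean is up to your API; this example only inspects their length and first byte.

```python
class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        # payload is a bytes object; decoding it is the handler's job.
        first_byte = payload[0] if payload else None
        return {"num_bytes": len(payload), "first_byte": first_byte}
```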
Here's an example if the binary data is an image:
Making the request
Reading the payload
Since the Content-Type: text/plain header is used, the payload parameter will be a string object:
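A minimal sketch of handling a text payload; the word count is only an illustrative computation.

```python
class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        # payload is a str, already decoded from the request body.
        return {"word_count": len(payload.split())}
```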
The response of your handle_<HTTP-VERB>() method may be:
A JSON-serializable object (lists, dictionaries, numbers, etc.)
A string object (e.g. "class 1")
A bytes object (e.g. bytes(4) or pickle.dumps(obj))
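The three response types listed above can be sketched as follows. Spreading them over different HTTP methods is only a way to show them side by side, not a recommendation.

```python
import pickle


class Handler:
    def __init__(self, config):
        pass

    def handle_get(self, payload):
        # A JSON-serializable object.
        return {"labels": ["class 1", "class 2"]}

    def handle_post(self, payload):
        # A string object.
        return "class 1"

    def handle_put(self, payload):
        # A bytes object.
        return pickle.dumps({"label": "class 1"})
```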
To serve your API using the gRPC protocol, make sure the handler.protobuf_path field in your API configuration points to a protobuf file. When the API is deployed, Cortex compiles the protobuf file for use when serving the API.
Your Handler class must implement the RPC methods found in the protobuf. Your protobuf must have a single service defined, which can have any name. If your service has two RPC methods called Info and Predict, then your Handler class must also implement these methods, as in the Handler template above.
For proper separation of concerns, it is recommended to use the constructor's config parameter for information such as where to download the model and initialization files from, or any configurable model parameters. You define config in your API configuration, and it is passed through to your Handler class's constructor.
Assuming the following service:
The handler implementation will also have a corresponding Predict method defined that represents the RPC method in the above protobuf service. The names of the RPC methods are not enforced by Cortex.
The type of the payload parameter passed into Predict(self, payload) will match that of the Sample message defined in the handler.protobuf_path file. For this example, we'll assume that the above protobuf file was specified for the API.
The service method must look like this:
Making the request
Reading the payload
In the Predict method, you'll read the value like this:
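A sketch of reading a single message. Several things here are assumptions: the Sample field a and the Response field b are hypothetical, the dataclasses are local stand-ins for the compiled protobuf messages, and the compiled module is assumed to be passed to the constructor as proto_module_pb2.

```python
from dataclasses import dataclass
from types import SimpleNamespace


class Handler:
    def __init__(self, config, proto_module_pb2):
        # Assumption: module holding the compiled message classes.
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # Message fields are read as plain attributes of the payload.
        text = payload.a
        return self.proto_module_pb2.Response(b=f"received: {text}")


# Local stand-ins so the sketch runs without a compiled protobuf module.
@dataclass
class Sample:
    a: str


@dataclass
class Response:
    b: str


fake_pb2 = SimpleNamespace(Sample=Sample, Response=Response)
```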
The service method must look like this:
Making the request
Reading the payload
In the Predict method, you'll read the streamed values like this:
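For a client-streaming method, the payload can be sketched as an iterator over messages. As before, the field names a and b, the dataclass stand-ins, and the proto_module_pb2 constructor argument are assumptions made for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace


class Handler:
    def __init__(self, config, proto_module_pb2):
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # payload is iterated to consume each streamed message in turn.
        values = [sample.a for sample in payload]
        return self.proto_module_pb2.Response(b=", ".join(values))


# Local stand-ins so the sketch runs without a compiled protobuf module.
@dataclass
class Sample:
    a: str


@dataclass
class Response:
    b: str


fake_pb2 = SimpleNamespace(Sample=Sample, Response=Response)
```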
Assuming the following service:
The handler implementation will also have a corresponding Predict method defined that represents the RPC method in the above protobuf service. The names of the RPC methods are not enforced by Cortex.
The type of the value that you return in your Predict() method must match the Response message defined in the handler.protobuf_path file. For this example, we'll assume that the above protobuf file was specified for the API.
The service method must look like this:
Making the request
Returning the response
In the Predict method, you'll return the value like this:
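A sketch of returning a single Response message. The field names, the dataclass stand-ins, and the proto_module_pb2 constructor argument are assumptions; in a deployed API the Response class would come from the compiled protobuf module.

```python
from dataclasses import dataclass
from types import SimpleNamespace


class Handler:
    def __init__(self, config, proto_module_pb2):
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # Build the return value from the message class named in the service.
        length = len(payload.a)
        return self.proto_module_pb2.Response(b=str(length))


# Local stand-ins so the sketch runs without a compiled protobuf module.
@dataclass
class Sample:
    a: str


@dataclass
class Response:
    b: str


fake_pb2 = SimpleNamespace(Sample=Sample, Response=Response)
```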
The service method must look like this:
Making the request
Returning the response
In the Predict method, you'll return the streamed values like this:
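For a server-streaming method, one natural shape is a generator that yields one Response per item. This is a sketch under the same assumptions as above (hypothetical fields a and b, dataclass stand-ins, proto_module_pb2 passed to the constructor).

```python
from dataclasses import dataclass
from types import SimpleNamespace


class Handler:
    def __init__(self, config, proto_module_pb2):
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # Each yielded message is one element of the response stream.
        for word in payload.a.split():
            yield self.proto_module_pb2.Response(b=word)


# Local stand-ins so the sketch runs without a compiled protobuf module.
@dataclass
class Sample:
    a: str


@dataclass
class Response:
    b: str


fake_pb2 = SimpleNamespace(Sample=Sample, Response=Response)
```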
It is possible to make requests from one API to another within a Cortex cluster. All running APIs are accessible from within the handler implementation at http://api-<api_name>:8888/, where <api_name> is the name of the API you are making a request to.
For example, if there is an API named text-generator running in the cluster, you could make a request to it from a different API by using:
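A sketch using only the standard library. The URL pattern comes from the text above; the helper build_request, the JSON body with a "text" field, and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_request(api_name: str, body: dict) -> urllib.request.Request:
    # In-cluster address of another running API.
    url = f"http://api-{api_name}:8888/"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        request = build_request("text-generator", {"text": payload["text"]})
        # This address only resolves from inside the cluster.
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read())
```

Any HTTP client can be used in place of urllib; the only Cortex-specific part is the address.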
Note that the autoscaling configuration (i.e. target_replica_concurrency) for the API that is making the request should account for the fact that a request remains "in-flight" in the first API while it is being fulfilled by the second API (during which it is also considered "in-flight" in the second API).
You can use Cortex's logger in your handler implementation to log in JSON. This will enrich your logs with Cortex's metadata, and you can add custom metadata to the logs by adding key-value pairs to the extra key when using the logger. For example:
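A sketch of logging with custom metadata. The import path cortex_internal.lib.log is an assumption about where Cortex exposes its logger inside a running API; the fallback to the standard logging module is added only so the sketch also runs outside of Cortex.

```python
import logging

try:
    # Assumed import path for Cortex's logger inside an API container.
    from cortex_internal.lib.log import logger as cortex_logger
except ImportError:
    # Fallback so the sketch can run where Cortex is not installed.
    cortex_logger = logging.getLogger("cortex-sketch")


class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        # Custom metadata is attached through the `extra` dictionary.
        cortex_logger.info("received payload", extra={"payload": payload})
        return "ok"
```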
The dictionary passed in via extra will be flattened by one level, so each of its keys becomes a top-level field of the JSON log line.
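The effect of one-level flattening can be illustrated with a small standalone emulation. This is not Cortex's logger code; flatten_extra is a hypothetical helper, and the base record holds only two of the metadata fields.

```python
import json


def flatten_extra(base_record: dict, extra: dict) -> dict:
    # Keys of `extra` are lifted to the top level of the record;
    # nested values are kept as they are (one level only).
    record = dict(base_record)
    record.update(extra)
    return record


record = flatten_extra(
    {"levelname": "INFO", "message": "received payload"},
    {"payload": "this movie is awesome", "meta": {"user": "abc"}},
)
print(json.dumps(record))
```

Here payload and meta end up next to message, while the dictionary under meta stays nested.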
To avoid overriding essential Cortex metadata, please refrain from specifying the following extra keys: asctime, levelname, message, labels, and process. Log lines greater than 5 MB in size will be ignored.
Your API can accept requests with different types of payloads, such as JSON-parseable, bytes, or starlette.datastructures.FormData data. See the payload parsing rules above to learn how headers can be used to change the type of payload that is passed into your handler method.
Your handler method can return different types of objects, such as JSON-parseable, string, and bytes objects. See the description of handler responses above to learn how to configure your handler method to respond with different response codes and content types.
For Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded, payload will be starlette.datastructures.FormData (key-value pairs where the values are strings for text data, or starlette.datastructures.UploadFile for file uploads; see Starlette's documentation).
When sending files via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are starlette.datastructures.UploadFile; see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: multipart/form-data is used for files, and is the default in the examples above).
When sending text via form data, the payload parameter will be starlette.datastructures.FormData (key-value pairs where the values are strings; see Starlette's documentation). Either Content-Type: multipart/form-data or Content-Type: application/x-www-form-urlencoded can be used (typically Content-Type: application/x-www-form-urlencoded is used for text, and is the default in the examples above).
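A sketch of reading form data in a handler, covering both text fields and file uploads. It relies only on FormData behaving like a mapping and on uploaded files exposing a file-like .file attribute; the field names are whatever the client sends.

```python
class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        summary = {}
        for key, value in payload.items():
            if hasattr(value, "file"):
                # File upload: read the bytes from the underlying file object.
                summary[key] = len(value.file.read())
            else:
                # Text field: the value is a plain string.
                summary[key] = value
        return summary
```

The handler returns the text values unchanged and the size in bytes of each uploaded file.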
An instance of
Your API can only accept the type that has been specified in the protobuf definition of your service's method. See the gRPC request examples above for how to construct gRPC requests.
Your handler methods can only return the type that has been specified in the protobuf definition of your service's methods. See the gRPC response examples above for how to handle gRPC responses.
A default environment has been configured for your API. This can be used for deploying/deleting/updating or submitting jobs to your running cluster based on the execution flow of your handler. For example:
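A sketch of using the Cortex Python client from a handler, assuming the cortex package is importable inside the API and that the default environment points at the running cluster. The helper default_client and the response shape are illustrative.

```python
def default_client():
    # Imported lazily so this module also loads where `cortex` is absent.
    import cortex

    # With no argument, the client targets the default environment.
    return cortex.client()


class Handler:
    def __init__(self, config):
        self.client = None

    def handle_post(self, payload):
        if self.client is None:
            self.client = default_client()
        # List the APIs currently known to the cluster.
        existing_apis = self.client.list_apis()
        return {"num_apis": len(existing_apis)}
```

Creating the client on first use, rather than at import time, keeps the handler loadable in environments without cluster access.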