TensorFlow Models
In addition to the standard Python Handler, Cortex also supports another handler called the TensorFlow handler, which can be used to deploy TensorFlow models exported in the SavedModel format.
Interface
The TensorFlow handler uses TensorFlow version 2.3.0 by default.
Cortex passes a tensorflow_client to your handler's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container and uses it to make predictions with your model. It should be saved as an instance variable in your handler class, and your handle_async() function should call tensorflow_client.predict() to run inference against your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can also be implemented in your handle_async() function.
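For example, a minimal handler might look like the following sketch (it assumes the constructor also receives the API's config dictionary, and that the payload is already in the format expected by the model):

```python
# handler.py -- minimal sketch of a TensorFlow handler class
class Handler:
    def __init__(self, tensorflow_client, config):
        # save the client so handle_async() can use it for inference
        self.client = tensorflow_client

    def handle_async(self, payload):
        # preprocess the JSON payload here if necessary
        prediction = self.client.predict(payload)
        # postprocess the prediction here if necessary
        return prediction
```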
When multiple models are defined using the handler's models field, the tensorflow_client.predict() method expects a second argument, model_name, which must hold the name of the model that you want to use for inference (for example: self.client.predict(payload, "text-generator")). There is also an optional third argument to specify the model version.
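For instance, a handler could select the model based on a field in the request (the "model" and "input" keys below are illustrative conventions, not part of Cortex's API):

```python
class Handler:
    def __init__(self, tensorflow_client, config):
        self.client = tensorflow_client

    def handle_async(self, payload):
        # route the request to the named model; fall back to a default name
        model_name = payload.get("model", "text-generator")
        return self.client.predict(payload["input"], model_name)
```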
If you need to share files between your handler implementation and the TensorFlow Serving container, you can create a new directory within /mnt (e.g. /mnt/user) and write files to it. The entire /mnt directory is shared between containers, but do not write to any of the directories in /mnt that already exist (they are used internally by Cortex).
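For example, a handler could write intermediate artifacts to its own subdirectory of /mnt (the file name here is illustrative):

```python
import os

# /mnt is shared with the TensorFlow Serving container; create our own
# subdirectory instead of writing into Cortex's internal directories
shared_dir = "/mnt/user"
os.makedirs(shared_dir, exist_ok=True)

with open(os.path.join(shared_dir, "notes.txt"), "w") as f:
    f.write("visible to both containers")
```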
predict method
Inference is performed by using the predict method of the tensorflow_client that's passed to the handler's constructor:
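Its signature is roughly as follows (a sketch reconstructed from the description above; the exact argument names may differ):

```python
def predict(model_input, model_name=None, model_version=None) -> dict:
    """
    Run inference against the TensorFlow Serving container.

    model_input:   the input to the model (e.g. the preprocessed payload)
    model_name:    required when multiple models are defined in the handler's
                   models field; selects the model to use
    model_version: optional version of the model to use (defaults to the
                   highest available version)

    Returns the TensorFlow Serving response converted to a dictionary.
    """
```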
Specifying models
Whenever a model path is specified in an API configuration file, it should be a path to an S3 prefix which contains your exported model. Directories may include a single model, or multiple folders each with a single model (note that a "single model" need not be a single file; there can be multiple files for a single model). When multiple folders are used, the folder names must be integer values, and will be interpreted as the model version. Model versions can be any integer, but are typically integer timestamps. It is always assumed that the highest version number is the latest version of your model.
API spec
Single model
The most common pattern is to serve a single model per API. The path to the model is specified in the path field of the handler.models configuration. For example:
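A sketch of such a configuration (the API name, handler file, and bucket path are illustrative):

```yaml
- name: text-generator
  kind: AsyncAPI
  handler:
    type: tensorflow
    path: handler.py
    models:
      path: s3://my-bucket/models/text-generator/
```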
Multiple models
It is possible to serve multiple models from a single API. The paths to the models are specified in the API configuration, either via the models.paths or models.dir field in the handler configuration. For example:
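A sketch using models.paths (names and bucket paths are illustrative):

```yaml
- name: multi-model-generator
  kind: AsyncAPI
  handler:
    type: tensorflow
    path: handler.py
    models:
      paths:
        - name: text-generator
          path: s3://my-bucket/models/text-generator/
        - name: sentiment-analyzer
          path: s3://my-bucket/models/sentiment-analyzer/
```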
or:
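A sketch using models.dir, pointing at a directory that contains one subdirectory per model:

```yaml
- name: multi-model-generator
  kind: AsyncAPI
  handler:
    type: tensorflow
    path: handler.py
    models:
      dir: s3://my-bucket/models/
```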
When using the models.paths field, each path must be a valid model directory (see the Structure section below for valid model directory layouts).
When using the models.dir field, the directory provided may contain multiple subdirectories, each of which is a valid model directory. For example:
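A possible layout (the bucket name and file names are illustrative):

```text
s3://my-bucket/models/
├── text-generator/
│   └── * (model files)
└── sentiment-analyzer/
    └── 1562112345/ (version number, usually a timestamp)
        └── * (model files)
```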
In this case, there are two models in the directory, one of which is named "text-generator", and the other is named "sentiment-analyzer".
Structure
On CPU/GPU
The model path must be a SavedModel export:
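For example (file names are illustrative):

```text
s3://my-bucket/models/text-generator/
├── saved_model.pb
└── variables/
    ├── variables.index
    ├── variables.data-00000-of-00002
    └── variables.data-00001-of-00002
```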
or for a versioned model:
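For example (file names are illustrative):

```text
s3://my-bucket/models/text-generator/
└── 1562112345/ (version number, usually a timestamp)
    ├── saved_model.pb
    └── variables/
        ├── variables.index
        ├── variables.data-00000-of-00002
        └── variables.data-00001-of-00002
```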
On Inferentia
When Inferentia models are used, the directory structure is slightly different:
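A sketch of the layout, assuming the model has been compiled for Inferentia into a single saved_model.pb (the bucket path is illustrative):

```text
s3://my-bucket/models/text-generator/
└── saved_model.pb
```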
or for a versioned model:
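For example:

```text
s3://my-bucket/models/text-generator/
└── 1562112345/ (version number, usually a timestamp)
    └── saved_model.pb
```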