
Example

HTTP

Create HTTP APIs that respond to requests in real time.

Implement

mkdir text-generator && cd text-generator
touch handler.py requirements.txt text_generator.yaml

# handler.py

from transformers import pipeline

class Handler:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def handle_post(self, payload):
        return self.model(payload["text"])[0]

# requirements.txt

transformers
torch

# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  handler:
    type: python
    path: handler.py
  compute:
    gpu: 1
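
Before deploying, it can help to exercise the handler contract locally. The sketch below assumes the platform constructs the class once with `config` and then calls `handle_post` with the parsed JSON body, as the handler above suggests. `StubHandler` and its lambda model are hypothetical stand-ins for `pipeline(task="text-generation")`, so nothing is downloaded; the stub mimics the pipeline's list-of-dicts output with a `generated_text` key.

```python
# Local smoke test of the handler contract (a sketch, not part of the deployed API).

class StubHandler:
    def __init__(self, config):
        # Hypothetical stand-in for pipeline(task="text-generation"):
        # returns a list with one dict per generated sequence.
        self.model = lambda text: [{"generated_text": text + " ..."}]

    def handle_post(self, payload):
        # Same body as the real handler: take the first generated sequence.
        return self.model(payload["text"])[0]


handler = StubHandler(config={})
response = handler.handle_post({"text": "hello world"})
print(response)
```

Swapping the stub for the real pipeline gives the same call sequence, at the cost of downloading the model.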

Deploy

cortex deploy text_generator.yaml

Monitor

cortex get text-generator --watch

Stream logs

cortex logs text-generator

Make a request

curl http://***.elb.us-west-2.amazonaws.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
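
The same request can be built from Python with only the standard library. This is a minimal sketch: `build_request` is a hypothetical helper name, and the host is the elided placeholder from the curl command, which must be replaced with your API's real endpoint before sending.

```python
import json
import urllib.request


def build_request(endpoint: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint copied from the curl example; substitute your own.
request = build_request(
    "http://***.elb.us-west-2.amazonaws.com/text-generator", "hello world"
)

# To send it once the endpoint is real:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```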

Delete

cortex delete text-generator

gRPC

To make the above API use gRPC as its protocol, make the following changes (the rest of the steps are the same):

Add protobuf file

Create a handler.proto file in your project's directory:

// handler.proto

syntax = "proto3";
package text_generator;

service Handler {
    rpc Predict (Message) returns (Message);
}

message Message {
    string text = 1;
}

Set the handler.protobuf_path field in the API spec to point to the handler.proto file:

# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  handler:
    type: python
    path: handler.py
    protobuf_path: handler.proto
  compute:
    gpu: 1

Match RPC service name

Name the handler method after the RPC method defined in the protobuf service (in this case Predict):

# handler.py

from transformers import pipeline

class Handler:
    def __init__(self, config, proto_module_pb2):
        self.model = pipeline(task="text-generation")
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        return self.proto_module_pb2.Message(text="returned message")
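
The handler above returns a fixed string. A variant that feeds the request text to the model might look like the sketch below. It assumes `payload` arrives as the decoded `Message` object with a `text` attribute, which this page does not confirm. The `Message` dataclass, the `SimpleNamespace` module, and the lambda model are hypothetical stand-ins for the generated protobuf code and the transformers pipeline, so the example runs without either.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Message:
    # Hypothetical stand-in for the Message class generated from handler.proto.
    text: str = ""


class Handler:
    def __init__(self, config, proto_module_pb2):
        # Stub in place of pipeline(task="text-generation").
        self.model = lambda text: [{"generated_text": text + " ..."}]
        self.proto_module_pb2 = proto_module_pb2

    def Predict(self, payload):
        # Assumption: payload is a Message, so the input is payload.text.
        generated = self.model(payload.text)[0]["generated_text"]
        return self.proto_module_pb2.Message(text=generated)


# Stand-in for the generated protobuf module passed to the constructor.
stub_pb2 = SimpleNamespace(Message=Message)
handler = Handler(config={}, proto_module_pb2=stub_pb2)
reply = handler.Predict(Message(text="hello world"))
print(reply.text)
```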

Make a gRPC request

grpcurl -plaintext -proto handler.proto -d '{"text": "hello-world"}' ***.elb.us-west-2.amazonaws.com:80 text_generator.Handler/Predict

Last updated 4 years ago