Cloud - Azure

Azure Execution Provider (Preview)

The Azure execution provider enables ONNX Runtime to invoke a remote Azure endpoint for inference. The endpoint must be deployed or otherwise available beforehand.

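Since 1.16, pluggable operators for calling such endpoints, including OpenAIAudioToText and AzureTritonInvoker, are available from onnxruntime-extensions.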

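With these operators, the Azure execution provider supports two usage modes:

Edge and Azure side by side
Merge and run the hybrid

The Azure execution provider is in the preview stage, and all APIs and usage are subject to change.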

Install

Since 1.16, the Azure execution provider is shipped by default in the Python and NuGet packages.
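One way to confirm that an installed package actually ships the provider is to look for its name in the list returned by `onnxruntime.get_available_providers()`. The sketch below is a minimal check under that assumption; the helper names `has_azure_ep` and `installed_providers` are hypothetical and exist only for illustration.

```python
from typing import Iterable, List, Optional

# Provider name as used in the sample on this page.
AZURE_EP = "AzureExecutionProvider"


def has_azure_ep(providers: Iterable[str]) -> bool:
    """Return True if the Azure execution provider is in the given list."""
    return AZURE_EP in set(providers)


def installed_providers() -> Optional[List[str]]:
    """Return the providers reported by onnxruntime, or None if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return None
    return list(onnxruntime.get_available_providers())


if __name__ == "__main__":
    providers = installed_providers()
    if providers is None:
        print("onnxruntime is not installed")
    else:
        print("Azure EP available:", has_azure_ep(providers))
```

If the check reports that the provider is missing, the installed package is probably older than 1.16 or was built without it.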

Requirements

Since 1.16, all Azure execution provider operators are shipped with the onnxruntime-extensions (>=v0.9.0) Python and NuGet packages. Make sure the correct onnxruntime-extensions package is installed before using the Azure execution provider.
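The version requirement can be checked before any session is created. The following is a rough sketch that reads the installed version with the standard library's `importlib.metadata` and compares it numerically against 0.9.0. The helpers `parse_version`, `meets_minimum`, and `installed_extensions_version` are hypothetical names, and the parser deliberately ignores pre-release and local suffixes, so treat it as an approximation rather than a full version comparison.

```python
from importlib import metadata
from typing import Optional, Tuple

# Minimum onnxruntime-extensions version stated in the requirement above.
MIN_EXTENSIONS_VERSION = "0.9.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. 'v0.10.1+cpu' -> (0, 10, 1)."""
    parts = []
    for piece in text.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_EXTENSIONS_VERSION) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_extensions_version() -> Optional[str]:
    """Return the installed onnxruntime-extensions version, or None if absent."""
    try:
        return metadata.version("onnxruntime-extensions")
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.9.0", which would wrongly reject a newer release.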

Build

For build instructions, see the BUILD page.

Usage

Edge and Azure side by side

In this mode, two models run at the same time. The Azure model is run asynchronously through the RunAsync API, which is also available from Python and C#.

import os
import onnx
from onnx import helper, TensorProto
from onnxruntime_extensions import get_library_path
from onnxruntime import SessionOptions, InferenceSession
import numpy as np
import threading


# Generate the local model with:
# https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/whisper_e2e.py
def get_whiper_tiny():
    return '/onnxruntime-extensions/tutorials/whisper_onnx_tiny_en_fp32_e2e.onnx'


# Generate the Azure model
def get_openai_audio_azure_model():
    auth_token = helper.make_tensor_value_info('auth_token', TensorProto.STRING, [1])
    model = helper.make_tensor_value_info('model_name', TensorProto.STRING, [1])
    response_format = helper.make_tensor_value_info('response_format', TensorProto.STRING, [-1])
    file = helper.make_tensor_value_info('file', TensorProto.UINT8, [-1])

    transcriptions = helper.make_tensor_value_info('transcriptions', TensorProto.STRING, [-1])

    invoker = helper.make_node('OpenAIAudioToText',
                               ['auth_token', 'model_name', 'response_format', 'file'],
                               ['transcriptions'],
                               domain='com.microsoft.extensions',
                               name='audio_invoker',
                               model_uri='https://api.openai.com/v1/audio/transcriptions',
                               audio_format='wav',
                               verbose=False)

    graph = helper.make_graph([invoker], 'graph', [auth_token, model, response_format, file], [transcriptions])
    model = helper.make_model(graph, ir_version=8,
                              opset_imports=[helper.make_operatorsetid('com.microsoft.extensions', 1)])
    model_name = 'openai_whisper_azure.onnx'
    onnx.save(model, model_name)
    return model_name


if __name__ == '__main__':
    sess_opt = SessionOptions()
    sess_opt.register_custom_ops_library(get_library_path())

    azure_model_path = get_openai_audio_azure_model()
    azure_model_sess = InferenceSession(azure_model_path,
        sess_opt, providers=['CPUExecutionProvider', 'AzureExecutionProvider'])  # load the Azure EP

    with open('test16.wav', "rb") as _f:  # read raw audio data from a local wav file
        audio_stream = np.asarray(list(_f.read()), dtype=np.uint8)

    azure_model_inputs = {
        "auth_token": np.array([os.getenv('AUDIO', '')]),  # read the auth token from an environment variable
        "model_name": np.array(['whisper-1']),
        "response_format":  np.array(['text']),
        "file": audio_stream
    }


    class RunAsyncState:
        def __init__(self):
            self.__event = threading.Event()
            self.__outputs = None
            self.__err = ''

        def fill_outputs(self, outputs, err):
            self.__outputs = outputs
            self.__err = err
            self.__event.set()

        def get_outputs(self):
            if self.__err != '':
                raise Exception(self.__err)
            return self.__outputs

        def wait(self, sec):
            self.__event.wait(sec)


    def azureRunCallback(outputs: np.ndarray, state: RunAsyncState, err: str) -> None:
        state.fill_outputs(outputs, err)


    run_async_state = RunAsyncState()
    # infer the Azure model asynchronously
    azure_model_sess.run_async(None, azure_model_inputs, azureRunCallback, run_async_state)

    # run the edge model in the meantime
    edge_model_path = get_whiper_tiny()
    edge_model_sess = InferenceSession(edge_model_path,
        sess_opt, providers=['CPUExecutionProvider'])

    edge_model_outputs = edge_model_sess.run(None, {
        'audio_stream': np.expand_dims(audio_stream, 0),
        'max_length': np.asarray([200], dtype=np.int32),
        'min_length': np.asarray([0], dtype=np.int32),
        'num_beams': np.asarray([2], dtype=np.int32),
        'num_return_sequences': np.asarray([1], dtype=np.int32),
        'length_penalty': np.asarray([1.0], dtype=np.float32),
        'repetition_penalty': np.asarray([1.0], dtype=np.float32)
    })

    print("\noutput from whisper tiny: ", edge_model_outputs)
    run_async_state.wait(10)  # wait up to 10 seconds for the Azure callback
    print("\nresponse from openAI: ", run_async_state.get_outputs())
    # compare the results and pick the better one
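
The sample leaves the final comparison step open. As an illustration only, the sketch below assumes both results have already been decoded to plain strings and applies a simple policy: prefer the Azure transcription when it arrived and is non-empty, otherwise fall back to the edge result. The function `pick_transcription` and this policy are assumptions, not part of the original sample; a real application might compare confidence scores or latency instead.

```python
from typing import Optional


def pick_transcription(edge_text: Optional[str], azure_text: Optional[str]) -> str:
    """Choose between the edge and Azure transcriptions.

    Assumed policy: use the Azure text if it is present and non-empty,
    otherwise use the edge text, otherwise return an empty string.
    """
    if azure_text is not None and azure_text.strip():
        return azure_text.strip()
    if edge_text is not None and edge_text.strip():
        return edge_text.strip()
    return ""
```

Passing `None` for `azure_text` models the case where the asynchronous call did not finish within the wait timeout, so the edge output is still usable on its own.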

Merge and run the hybrid

Alternatively, the local and Azure models can be merged into a hybrid model and then run like an ordinary ONNX model. A sample script is available here.

Current Limitations

Builds and runs only on Windows, Linux, and Android.
On Android, AzureTritonInvoker is not supported.