tasks.classification

All text classification models share the same API.

__init__

def __init__(self,
             embedding: Optional[Embedding] = None,
             hyper_parameters: Optional[Dict[str, Dict[str, Any]]] = None)

Args:

  • embedding: model embedding
  • hyper_parameters: a dict of hyper_parameters.

You can customize hyper_parameters like this:

# import the model class
from kashgari.tasks.classification import BiLSTM_Model

# get default hyper_parameters
hyper_parameters = BiLSTM_Model.get_default_hyper_parameters()
# change lstm hidden unit to 12
hyper_parameters['layer_blstm']['units'] = 12
# init new model with customized hyper_parameters
classification_model = BiLSTM_Model(hyper_parameters=hyper_parameters)
# x and y are your training features and labels
classification_model.fit(x, y)
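
Because hyper_parameters is a nested dict, it can be convenient to apply overrides to a copy and leave the defaults untouched. The following is a minimal sketch in plain Python, not Kashgari code: `DEFAULTS` is a hypothetical stand-in for whatever `get_default_hyper_parameters()` returns, and `customize` is a helper invented for this illustration.

```python
import copy
from typing import Any, Dict

HyperParameters = Dict[str, Dict[str, Any]]

# Hypothetical defaults standing in for get_default_hyper_parameters();
# the real keys and values depend on the model class.
DEFAULTS: HyperParameters = {
    "layer_blstm": {"units": 128, "return_sequences": False},
    "layer_dense": {"activation": "softmax"},
}


def customize(defaults: HyperParameters, overrides: HyperParameters) -> HyperParameters:
    """Return a deep copy of `defaults` with per-layer overrides applied."""
    params = copy.deepcopy(defaults)
    for layer, values in overrides.items():
        if layer not in params:
            # Fail early on a misspelled layer name instead of silently adding it.
            raise KeyError(f"unknown layer config: {layer!r}")
        params[layer].update(values)
    return params


custom = customize(DEFAULTS, {"layer_blstm": {"units": 12}})
print(custom["layer_blstm"])
```

The resulting dict can then be passed as the `hyper_parameters` argument of `__init__`.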

Properties

token2idx

Returns the model's token-to-index map. Type: Dict[str, int]

label2idx

Returns the model's label-to-index map. Type: Dict[str, int]
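
To illustrate the Dict[str, int] shape of these two properties, here is a small plain-Python sketch. It does not reproduce how Kashgari builds its maps: the helper names `build_index` and `encode` and the reserved tokens `<PAD>` and `<UNK>` are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Sequence


def build_index(symbols: Iterable[str], reserved: Sequence[str] = ()) -> Dict[str, int]:
    """Assign consecutive integer ids to symbols in order of first appearance."""
    index = {symbol: i for i, symbol in enumerate(reserved)}
    for symbol in symbols:
        if symbol not in index:
            index[symbol] = len(index)
    return index


def encode(sentence: List[str], token2idx: Dict[str, int], unknown: str = "<UNK>") -> List[int]:
    """Map tokens to ids, falling back to the (hypothetical) unknown token."""
    return [token2idx.get(token, token2idx[unknown]) for token in sentence]


corpus = [["hello", "world"], ["hello", "kashgari"]]
labels = ["greeting", "product"]

token2idx = build_index(
    (token for sentence in corpus for token in sentence),
    reserved=("<PAD>", "<UNK>"),
)
label2idx = build_index(labels)

print(token2idx)
print(label2idx)
print(encode(["hello", "there"], token2idx))
```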

Methods

get_default_hyper_parameters

Returns the default hyper parameters.

Attention: you must implement this function when customizing your own model.

Customization example: [customize-your-own-model](../tutorial/text-classification.md#customize-your-own-model)

@classmethod
def get_default_hyper_parameters(cls) -> Dict[str, Dict[str, Any]]:

Returns:

  • dict of the default hyper parameters

build_model_arc

Builds the model architecture. Define the model's structure in this function.

Attention: you must implement this function when customizing your own model.

Customization example: [customize-your-own-model](../tutorial/text-classification.md#customize-your-own-model)

def build_model_arc(self):
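
The contract described above (a custom model must supply both `get_default_hyper_parameters` and `build_model_arc`) can be sketched without TensorFlow. Everything below is a stand-in written for illustration: `SketchBaseModel` is not Kashgari's base class, and the `layers` list merely records what a real implementation would build with Keras layers. See the linked tutorial for an actual custom model.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class SketchBaseModel(ABC):
    """Hypothetical stand-in for a base model; not the real Kashgari class."""

    def __init__(self, hyper_parameters: Optional[Dict[str, Dict[str, Any]]] = None):
        # Start from the subclass defaults, then apply user overrides.
        self.hyper_parameters = self.get_default_hyper_parameters()
        if hyper_parameters:
            self.hyper_parameters.update(hyper_parameters)
        self.layers: List[Tuple[str, int, str]] = []

    @classmethod
    @abstractmethod
    def get_default_hyper_parameters(cls) -> Dict[str, Dict[str, Any]]:
        """Subclasses must return their default hyper parameters."""

    @abstractmethod
    def build_model_arc(self) -> None:
        """Subclasses must define the model structure here."""


class TinyModel(SketchBaseModel):
    @classmethod
    def get_default_hyper_parameters(cls) -> Dict[str, Dict[str, Any]]:
        return {"layer_dense": {"units": 64, "activation": "relu"}}

    def build_model_arc(self) -> None:
        config = self.hyper_parameters["layer_dense"]
        # A real model would create Keras layers from this config.
        self.layers = [("dense", config["units"], config["activation"])]


model = TinyModel(hyper_parameters={"layer_dense": {"units": 8, "activation": "tanh"}})
model.build_model_arc()
print(model.layers)
```

Marking both methods abstract makes a subclass that forgets either one fail at instantiation rather than later during training.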

build_model

Build the model with a corpus

def build_model(self,
                x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                y_train: Union[List[List[str]], List[str]],
                x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                y_validate: Union[List[List[str]], List[str]] = None)

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data

build_multi_gpu_model

Build multi-GPU model with corpus

def build_multi_gpu_model(self,
                            gpus: int,
                            x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                            y_train: Union[List[List[str]], List[str]],
                            cpu_merge: bool = True,
                            cpu_relocation: bool = False,
                            x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                            y_validate: Union[List[List[str]], List[str]] = None):

Args:

  • gpus: Integer >= 2, number of GPUs on which to create model replicas.
  • cpu_merge: A boolean value to identify whether to force merging model weights under the scope of the CPU or not.
  • cpu_relocation: A boolean value to identify whether to create the model’s weights under the scope of the CPU. If the model is not defined under any preceding device scope, you can still rescue it by activating this option.
  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data

build_tpu_model

Build TPU model with corpus

def build_tpu_model(self, strategy: tf.contrib.distribute.TPUStrategy,
                    x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                    y_train: Union[List[List[str]], List[str]],
                    x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                    y_validate: Union[List[List[str]], List[str]] = None):

Args:

  • strategy: TPUDistributionStrategy. The strategy to use for replicating model across multiple TPU cores.
  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data

compile_model

Configures the model for training.

Uses the compile() function of tf.keras.Model.

def compile_model(self, **kwargs):

Args:

  • **kwargs: arguments passed to compile() function of tf.keras.Model

Defaults:

  • loss: categorical_crossentropy
  • optimizer: adam
  • metrics: ['accuracy']
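
One way to read the defaults above is that any keyword argument you pass replaces the corresponding default, while the rest stay in place. The sketch below shows that merge in plain Python. The helper `resolve_compile_kwargs` is hypothetical, and the exact merge behavior inside Kashgari is assumed here rather than confirmed.

```python
from typing import Any, Dict

# Defaults as listed in this section.
DEFAULT_COMPILE_KWARGS: Dict[str, Any] = {
    "loss": "categorical_crossentropy",
    "optimizer": "adam",
    "metrics": ["accuracy"],
}


def resolve_compile_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Fill in a default for every compile argument the caller did not supply."""
    resolved = {key: value for key, value in DEFAULT_COMPILE_KWARGS.items()}
    # Copy the list so callers cannot mutate the shared default.
    resolved["metrics"] = list(resolved["metrics"])
    resolved.update(kwargs)
    return resolved


print(resolve_compile_kwargs(optimizer="sgd"))
```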

get_data_generator

Returns a data generator for use with fit_generator.

def get_data_generator(self,
                        x_data,
                        y_data,
                        batch_size: int = 64,
                        shuffle: bool = True)

Args:

  • x_data: Array of feature data (if the model has a single input), or tuple of feature data array (if the model has multiple inputs)
  • y_data: Array of label data
  • batch_size: Number of samples per gradient update, default to 64.
  • shuffle: Bool, whether to shuffle the data, default to True.

Returns:

  • data generator
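
For readers unfamiliar with the generator pattern that fit_generator expects, here is a minimal plain-Python sketch of an endless batch generator. It is not Kashgari's implementation: it skips the conversion of tokens and labels into numeric tensors, and the `seed` parameter is an addition made only to keep the example reproducible.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def batch_generator(x_data: Sequence,
                    y_data: Sequence,
                    batch_size: int = 64,
                    shuffle: bool = True,
                    seed: Optional[int] = None) -> Iterator[Tuple[List, List]]:
    """Yield (x_batch, y_batch) pairs forever, one pass over the data per cycle."""
    if len(x_data) != len(y_data):
        raise ValueError("x_data and y_data must have the same length")
    indices = list(range(len(x_data)))
    rng = random.Random(seed)
    while True:
        if shuffle:
            # Reshuffle at the start of every pass.
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [x_data[i] for i in batch], [y_data[i] for i in batch]


x = ["a", "b", "c", "d", "e"]
y = [0, 1, 0, 1, 0]
gen = batch_generator(x, y, batch_size=2, shuffle=False)
first = next(gen)
second = next(gen)
third = next(gen)    # the final, smaller batch of the first pass
fourth = next(gen)   # the generator wraps around and starts a new pass
print(first, second, third, fourth)
```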

fit

Trains the model for a given number of epochs with fit_generator (iterations on a dataset).

def fit(self,
        x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
        y_train: Union[List[List[str]], List[str]],
        x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
        y_validate: Union[List[List[str]], List[str]] = None,
        batch_size: int = 64,
        epochs: int = 5,
        callbacks: List[keras.callbacks.Callback] = None,
        fit_kwargs: Dict = None):

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data
  • batch_size: Number of samples per gradient update, default to 64.
  • epochs: Integer. Number of epochs to train the model. default 5.
  • callbacks: List of keras.callbacks.Callback instances to apply during training.
  • fit_kwargs: additional arguments passed to fit_generator() function from tensorflow.keras.Model

Returns:

  • A tf.keras.callbacks.History object.
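
As a rough guide to how batch_size and epochs interact, the arithmetic below counts gradient updates for a given dataset size. It is a sketch under one assumption: the last, partial batch of each epoch counts as a step. The library may round differently, and `training_steps` is a name made up for this example.

```python
import math
from typing import Tuple


def training_steps(num_samples: int, batch_size: int = 64, epochs: int = 5) -> Tuple[int, int]:
    """Return (steps per epoch, total steps), counting a partial final batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


per_epoch, total = training_steps(1000)
print(per_epoch, total)
```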

fit_without_generator

Trains the model for a given number of epochs (iterations on a dataset) without a data generator. This has a large memory cost.

def fit_without_generator(self,
                            x_train: Union[Tuple[List[List[str]], ...], List[List[str]]],
                            y_train: Union[List[List[str]], List[str]],
                            x_validate: Union[Tuple[List[List[str]], ...], List[List[str]]] = None,
                            y_validate: Union[List[List[str]], List[str]] = None,
                            batch_size: int = 64,
                            epochs: int = 5,
                            callbacks: List[keras.callbacks.Callback] = None,
                            fit_kwargs: Dict = None):

Args:

  • x_train: Array of train feature data (if the model has a single input), or tuple of train feature data array (if the model has multiple inputs)
  • y_train: Array of train label data
  • x_validate: Array of validation feature data (if the model has a single input), or tuple of validation feature data array (if the model has multiple inputs)
  • y_validate: Array of validation label data
  • batch_size: Number of samples per gradient update, default to 64.
  • epochs: Integer. Number of epochs to train the model. default 5.
  • callbacks: List of keras.callbacks.Callback instances to apply during training.
  • fit_kwargs: additional arguments passed to fit() function from tensorflow.keras.Model

Returns:

  • A tf.keras.callbacks.History object.

predict

Generates output predictions for the input samples. Computation is done in batches.

def predict(self,
            x_data,
            batch_size=None,
            multi_label_threshold: float = 0.5,
            debug_info=False,
            predict_kwargs: Dict = None):

Args:

  • x_data: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • batch_size: Integer. If unspecified, it will default to 32.
  • multi_label_threshold: Float, confidence threshold used in multi-label classification, default to 0.5.
  • debug_info: Bool, whether to print out the logging info.
  • predict_kwargs: Dict, arguments passed to predict() function of tensorflow.keras.Model

Returns:

  • array of predictions.
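
The role of a confidence threshold in multi-label classification can be shown with a few lines of plain Python. This is an illustration only: the helper `labels_above_threshold` is hypothetical, and whether the library compares with `>=` or `>` is an assumption. The confidence values are taken from the multi-label sample result in this document.

```python
from typing import List, Sequence


def labels_above_threshold(probabilities: Sequence[float],
                           idx2label: Sequence[str],
                           threshold: float = 0.5) -> List[str]:
    """Keep every label whose confidence reaches the threshold."""
    return [idx2label[i] for i, p in enumerate(probabilities) if p >= threshold]


idx2label = ["toxic", "severe_toxic", "obscene", "insult", "identity_hate"]
probabilities = [0.9959336, 0.13540423, 0.9358089, 0.6882098, 0.017219543]

print(labels_above_threshold(probabilities, idx2label))
print(labels_above_threshold(probabilities, idx2label, threshold=0.9))
```

Raising the threshold trades recall for precision: fewer labels are returned, but each one carries a higher confidence.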

predict_top_k_class

Generates output predictions with confidence for the input samples.

Computation is done in batches.

def predict_top_k_class(self,
                        x_data,
                        top_k=5,
                        batch_size=32,
                        debug_info=False,
                        predict_kwargs: Dict = None) -> List[Dict]:

Args:

  • x_data: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • top_k: Integer, number of top classes to return per sample, default to 5.
  • batch_size: Integer. If unspecified, it will default to 32.
  • debug_info: Bool, whether to print out the logging info.
  • predict_kwargs: Dict, arguments passed to predict() function of tensorflow.keras.Model

Returns:

array(s) of prediction result dict.

  • sample result of single-label classification:
[
  {
    "label": "chat",
    "confidence": 0.5801531,
    "candidates": [
      { "label": "cookbook", "confidence": 0.1886314 },
      { "label": "video", "confidence": 0.13805099 },
      { "label": "health", "confidence": 0.013852648 },
      { "label": "translation", "confidence": 0.012913573 }
    ]
  }
]
  • sample result of multi-label classification:
[
  {
    "candidates": [
      { "confidence": 0.9959336, "label": "toxic" },
      { "confidence": 0.9358089, "label": "obscene" },
      { "confidence": 0.6882098, "label": "insult" },
      { "confidence": 0.13540423, "label": "severe_toxic" },
      { "confidence": 0.017219543, "label": "identity_hate" }
    ]
  }
]
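
The single-label result shape shown above (a best label plus ranked candidates) can be derived from a probability vector as in the following sketch. It is written in plain Python for illustration and is not the library's code: `top_k_result` is a hypothetical helper and it assumes a non-empty probability list.

```python
from typing import Any, Dict, Sequence


def top_k_result(probabilities: Sequence[float],
                 idx2label: Sequence[str],
                 top_k: int = 5) -> Dict[str, Any]:
    """Rank classes by confidence and split them into best label and candidates."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    entries = [{"label": idx2label[i], "confidence": p} for i, p in ranked[:top_k]]
    best, candidates = entries[0], entries[1:]
    return {
        "label": best["label"],
        "confidence": best["confidence"],
        "candidates": candidates,
    }


result = top_k_result([0.05, 0.7, 0.25], ["video", "chat", "cookbook"], top_k=2)
print(result)
```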

evaluate

Evaluates the model on the given data.

def evaluate(self,
            x_data,
            y_data,
            batch_size=None,
            digits=4,
            debug_info=False) -> Tuple[float, float, Dict]:

Args:

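  • x_data: The feature data to evaluate on.
  • y_data: The true label data for x_data.
  • batch_size: Integer or None, default to None.
  • digits: Integer, default to 4.
  • debug_info: Bool, whether to print out the logging info, default to False.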

save

Saves the model info JSON and the model weights to the given folder path.

def save(self, model_path: str):

Args:

  • model_path: target model folder path

info

Returns a dictionary containing the configuration of the model.

def info(self)