Data Processors

SequenceProcessor

class kashgari.processors.SequenceProcessor(build_in_vocab='text', min_count=3, build_vocab_from_labels=False, **kwargs)[source]

Bases: kashgari.processors.abc_processor.ABCProcessor

Generic processor for sequence samples.

Parameters
  • build_in_vocab (str) –

  • min_count (int) –

  • build_vocab_from_labels (bool) –

  • kwargs (Any) –

Return type

None

__init__(build_in_vocab='text', min_count=3, build_vocab_from_labels=False, **kwargs)[source]
Parameters
  • vocab_dict_type – initial vocab dict type, one of 'text' or 'labeling'.

  • build_in_vocab (str) –

  • min_count (int) –

  • build_vocab_from_labels (bool) –

  • kwargs (Any) –

Return type

None

build_vocab(x_data, y_data)
Parameters
  • x_data (List[List[str]]) –

  • y_data (List[List[str]]) –

Return type

None
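
The role of min_count in vocabulary building can be illustrated with a small, library-independent sketch. This is not Kashgari's implementation: the function name build_vocab_sketch and the reserved tokens [PAD] and [UNK] are assumptions made for the example, and it only assumes that tokens seen fewer than min_count times are left out of the vocabulary.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical reserved tokens; the library's actual special tokens may differ.
PAD, UNK = "[PAD]", "[UNK]"


def build_vocab_sketch(x_data: List[List[str]], min_count: int = 3) -> Dict[str, int]:
    """Map every token seen at least `min_count` times to an integer id."""
    counts = Counter(token for sample in x_data for token in sample)
    vocab = {PAD: 0, UNK: 1}
    # most_common() yields tokens by descending frequency;
    # tokens with equal counts keep their first-seen order.
    for token, count in counts.most_common():
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


x_data = [["a", "b", "a"], ["a", "c", "b"], ["b", "d"]]
vocab = build_vocab_sketch(x_data, min_count=3)
```

Here "a" and "b" each occur three times and receive ids, while "c" and "d" occur once and would later fall back to the unknown-token id.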

build_vocab_generator(generators)[source]
Parameters
  • generators (List[kashgari.generators.CorpusGenerator]) –

Return type

None

get_tensor_shape(batch_size, seq_length)
Parameters
  • batch_size (int) –

  • seq_length (int) –

Return type

Tuple

inverse_transform(labels, *, lengths=None, threshold=0.5, **kwargs)[source]
Parameters
  • labels (Union[List[List[int]], numpy.ndarray]) –

  • lengths (Optional[List[int]]) –

  • threshold (float) –

  • kwargs (Any) –

Return type

List[List[str]]
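
As a rough illustration of what decoding with lengths involves, the sketch below maps label indices back to strings and trims each row to its original length so that padding positions are dropped. It is a sketch under stated assumptions, not the library's code: sequence_inverse_transform_sketch and idx2label are hypothetical names, threshold is ignored, and any offset for special start or end tokens that the real processor may apply is not modeled.

```python
from typing import Dict, List, Optional, Sequence


def sequence_inverse_transform_sketch(
    labels: Sequence[Sequence[int]],
    idx2label: Dict[int, str],
    lengths: Optional[List[int]] = None,
) -> List[List[str]]:
    """Convert rows of label indices to label strings, trimming padding."""
    decoded = []
    for i, row in enumerate(labels):
        if lengths is not None:
            # Keep only the positions that belong to the original sample.
            row = row[: lengths[i]]
        decoded.append([idx2label[int(idx)] for idx in row])
    return decoded


# Hypothetical index-to-label table for a tagging task.
idx2label = {0: "[PAD]", 1: "O", 2: "B-PER", 3: "I-PER"}
predictions = [[2, 3, 1, 0, 0], [1, 0, 0, 0, 0]]
tags = sequence_inverse_transform_sketch(predictions, idx2label, lengths=[3, 1])
```

Without lengths, the padded positions would be decoded as well, which is why passing the original sample lengths is useful.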

to_dict()[source]
Return type

Dict[str, Any]

transform(samples, *, seq_length=None, max_position=None, segment=False)[source]
Parameters
  • samples (List[List[str]]) –

  • seq_length (Optional[int]) –

  • max_position (Optional[int]) –

  • segment (bool) –

Return type

numpy.ndarray
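
The general idea behind transform (token lookup followed by padding or truncation to seq_length) can be sketched without the library. The names below are hypothetical, nested lists stand in for the numpy.ndarray that the real method returns, and max_position, segment, and any special tokens the processor may add are deliberately left out.

```python
from typing import Dict, List, Optional


def sequence_transform_sketch(
    samples: List[List[str]],
    vocab: Dict[str, int],
    seq_length: Optional[int] = None,
    pad_id: int = 0,
    unk_id: int = 1,
) -> List[List[int]]:
    """Convert token sequences to fixed-length rows of integer ids."""
    if seq_length is None:
        # Assumption: default to the longest sample in the batch.
        seq_length = max((len(sample) for sample in samples), default=0)
    rows = []
    for sample in samples:
        # Unknown tokens map to unk_id; long samples are truncated.
        ids = [vocab.get(token, unk_id) for token in sample][:seq_length]
        # Short samples are right-padded with pad_id.
        ids += [pad_id] * (seq_length - len(ids))
        rows.append(ids)
    return rows


# Hypothetical vocabulary with reserved padding and unknown ids.
vocab = {"[PAD]": 0, "[UNK]": 1, "a": 2, "b": 3}
batch = sequence_transform_sketch([["a", "b", "z"], ["b"]], vocab, seq_length=4)
```

Every row has the same length, so the result can be stacked into a single two-dimensional tensor.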

property is_vocab_build: bool
property vocab_size: int

ClassificationProcessor

class kashgari.processors.ClassificationProcessor(multi_label=False, **kwargs)[source]

Bases: kashgari.processors.abc_processor.ABCProcessor

Parameters
  • multi_label (bool) –

  • kwargs (Any) –

Return type

None

__init__(multi_label=False, **kwargs)[source]

Parameters
  • multi_label (bool) –

  • kwargs (Any) –

Return type

None

build_vocab(x_data, y_data)
Parameters
  • x_data (List[List[str]]) –

  • y_data (List[List[str]]) –

Return type

None

build_vocab_generator(generators)[source]
Parameters
  • generators (List[kashgari.generators.CorpusGenerator]) –

Return type

None

get_tensor_shape(batch_size, seq_length)[source]
Parameters
  • batch_size (int) –

  • seq_length (int) –

Return type

Tuple

inverse_transform(labels, *, lengths=None, threshold=0.5, **kwargs)[source]
Parameters
  • labels –

  • lengths –

  • threshold –

  • kwargs –

Return type

Union[List[List[str]], List[str]]
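
The two return shapes (one label per sample, or a list of labels per sample) suggest how threshold might be used in multi-label mode. The sketch below is an assumption-laden illustration rather than the library's implementation: it assumes each input row holds one score per class, picks the highest score in single-label mode, and keeps every class whose score is at least threshold in multi-label mode. The function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Union


def classification_inverse_transform_sketch(
    scores: Sequence[Sequence[float]],
    idx2label: Dict[int, str],
    multi_label: bool = False,
    threshold: float = 0.5,
) -> Union[List[List[str]], List[str]]:
    """Decode per-class scores into label strings."""
    results = []
    for row in scores:
        if multi_label:
            # Keep every class whose score reaches the threshold.
            results.append([idx2label[i] for i, p in enumerate(row) if p >= threshold])
        else:
            # Keep only the single highest-scoring class.
            best = max(range(len(row)), key=row.__getitem__)
            results.append(idx2label[best])
    return results


# Hypothetical class table and model scores.
idx2label = {0: "sports", 1: "tech", 2: "politics"}
scores = [[0.1, 0.7, 0.2], [0.6, 0.55, 0.1]]
single = classification_inverse_transform_sketch(scores, idx2label)
multi = classification_inverse_transform_sketch(scores, idx2label, multi_label=True)
```

Raising threshold makes the multi-label output more conservative; it has no effect in single-label mode in this sketch.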

to_dict()[source]
Return type

Dict[str, Any]

transform(samples, *, seq_length=None, max_position=None, segment=False)[source]
Parameters
  • samples (List[List[str]]) –

  • seq_length (Optional[int]) –

  • max_position (Optional[int]) –

  • segment (bool) –

Return type

numpy.ndarray
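
When multi_label is enabled, a common way to encode label sets is a multi-hot row per sample. The sketch below shows that encoding with plain lists; it assumes a fixed label-to-index table, and the names multi_hot_sketch and label2idx are hypothetical rather than part of the library.

```python
from typing import Dict, List


def multi_hot_sketch(samples: List[List[str]], label2idx: Dict[str, int]) -> List[List[int]]:
    """Encode each sample's label set as a 0/1 row with one column per class."""
    rows = []
    for labels in samples:
        row = [0] * len(label2idx)
        for label in labels:
            row[label2idx[label]] = 1
        rows.append(row)
    return rows


# Hypothetical label table for a three-class problem.
label2idx = {"sports": 0, "tech": 1, "politics": 2}
encoded = multi_hot_sketch([["tech"], ["sports", "politics"], []], label2idx)
```

A sample with no labels encodes to an all-zero row, which a single-label index encoding could not represent.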

property is_vocab_build: bool
property vocab_size: int