Data Processors

SequenceProcessor

class kashgari.processors.SequenceProcessor(build_in_vocab: str = 'text', min_count: int = 3, build_vocab_from_labels: bool = False, **kwargs)

Bases: kashgari.processors.abc_processor.ABCProcessor

Generic processor for sequence samples.

__init__(build_in_vocab: str = 'text', min_count: int = 3, build_vocab_from_labels: bool = False, **kwargs) → None
Parameters:
  • build_in_vocab – initial vocab dict type, one of 'text' or 'labeling'.
  • **kwargs
build_vocab(x_data: List[List[str]], y_data: List[List[str]]) → None
build_vocab_generator(generators: List[kashgari.generators.CorpusGenerator]) → None
get_tensor_shape(batch_size: int, seq_length: int) → Tuple
inverse_transform(labels: Union[List[List[int]], numpy.ndarray], *, lengths: List[int] = None, threshold: float = 0.5, **kwargs) → List[List[str]]
is_vocab_build
to_dict() → Dict[str, Any]
transform(samples: List[List[str]], *, seq_length: int = None, max_position: int = None, segment: bool = False) → numpy.ndarray
vocab_size
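
The build_vocab / transform / inverse_transform cycle listed above can be illustrated with a small stand-in class. This is only a sketch of the general idea, not Kashgari's implementation: ToySequenceProcessor, the reserved [PAD] and [UNK] tokens, and the use of plain lists instead of numpy.ndarray are all assumptions made to keep the example dependency-free.

```python
from collections import Counter
from typing import Dict, List, Optional


class ToySequenceProcessor:
    """Hypothetical stand-in; NOT kashgari.processors.SequenceProcessor."""

    def __init__(self, min_count: int = 3) -> None:
        self.min_count = min_count
        # Reserved padding/unknown tokens are an assumption of this sketch.
        self.vocab2idx: Dict[str, int] = {"[PAD]": 0, "[UNK]": 1}
        self.idx2vocab: Dict[int, str] = {0: "[PAD]", 1: "[UNK]"}

    @property
    def vocab_size(self) -> int:
        return len(self.vocab2idx)

    @property
    def is_vocab_build(self) -> bool:
        # True once at least one non-reserved token has been added.
        return self.vocab_size > 2

    def build_vocab(self, x_data: List[List[str]]) -> None:
        # Keep only tokens that occur at least min_count times.
        counts = Counter(token for sample in x_data for token in sample)
        for token, count in counts.most_common():
            if count >= self.min_count and token not in self.vocab2idx:
                idx = len(self.vocab2idx)
                self.vocab2idx[token] = idx
                self.idx2vocab[idx] = token

    def transform(self, samples: List[List[str]], *,
                  seq_length: Optional[int] = None) -> List[List[int]]:
        # Default to the longest sample when no length is given.
        if seq_length is None:
            seq_length = max(len(sample) for sample in samples)
        unk = self.vocab2idx["[UNK]"]
        rows = []
        for sample in samples:
            # Truncate, map tokens to ids, then right-pad with the [PAD] id.
            ids = [self.vocab2idx.get(t, unk) for t in sample[:seq_length]]
            ids += [0] * (seq_length - len(ids))
            rows.append(ids)
        return rows

    def inverse_transform(self, labels: List[List[int]], *,
                          lengths: Optional[List[int]] = None) -> List[List[str]]:
        result = []
        for i, row in enumerate(labels):
            # Drop padding positions when the true lengths are known.
            if lengths is not None:
                row = row[:lengths[i]]
            result.append([self.idx2vocab[idx] for idx in row])
        return result


corpus = [["a", "b", "a"], ["a", "b", "c"]]
processor = ToySequenceProcessor(min_count=2)
processor.build_vocab(corpus)

encoded = processor.transform([["a", "c", "b"], ["b"]], seq_length=4)
decoded = processor.inverse_transform(encoded, lengths=[3, 1])
print(encoded)  # [[2, 1, 3, 0], [3, 0, 0, 0]]
print(decoded)  # [['a', '[UNK]', 'b'], ['b']]
```

In this sketch the token "c" appears only once, so with min_count=2 it falls back to [UNK]; passing lengths to inverse_transform trims the padded positions before decoding.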

ClassificationProcessor

class kashgari.processors.ClassificationProcessor(multi_label: bool = False, **kwargs)

Bases: kashgari.processors.abc_processor.ABCProcessor

__init__(multi_label: bool = False, **kwargs) → None

build_vocab(x_data: List[List[str]], y_data: List[List[str]]) → None
build_vocab_generator(generators: List[kashgari.generators.CorpusGenerator]) → None
get_tensor_shape(batch_size: int, seq_length: int) → Tuple
inverse_transform(labels: Union[List[int], numpy.ndarray], *, lengths: List[int] = None, threshold: float = 0.5, **kwargs) → Union[List[List[str]], List[str]]
is_vocab_build
to_dict() → Dict[str, Any]
transform(samples: List[List[str]], *, seq_length: int = None, max_position: int = None, segment: bool = False) → numpy.ndarray
vocab_size
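
The multi_label flag and the threshold argument of inverse_transform suggest two decoding modes: one label per sample, or every label whose score passes a cutoff. The sketch below illustrates that distinction with a hypothetical ToyClassificationProcessor. It is not Kashgari's code; the sorted label ordering, the multi-hot encoding, and the ">= threshold" comparison are assumptions of this example.

```python
from typing import Dict, List


class ToyClassificationProcessor:
    """Hypothetical stand-in; NOT kashgari.processors.ClassificationProcessor."""

    def __init__(self, multi_label: bool = False) -> None:
        self.multi_label = multi_label
        self.vocab2idx: Dict[str, int] = {}
        self.idx2vocab: Dict[int, str] = {}

    def build_vocab(self, y_data: list) -> None:
        # Multi-label samples are lists of labels; single-label samples are strings.
        if self.multi_label:
            labels = sorted({label for sample in y_data for label in sample})
        else:
            labels = sorted(set(y_data))
        self.vocab2idx = {label: i for i, label in enumerate(labels)}
        self.idx2vocab = {i: label for label, i in self.vocab2idx.items()}

    def transform(self, samples: list) -> list:
        if not self.multi_label:
            # One class index per sample.
            return [self.vocab2idx[label] for label in samples]
        rows = []
        for sample in samples:
            # Multi-hot row: 1 at the position of every label present.
            row = [0] * len(self.vocab2idx)
            for label in sample:
                row[self.vocab2idx[label]] = 1
            rows.append(row)
        return rows

    def inverse_transform(self, labels: list, *, threshold: float = 0.5) -> list:
        if not self.multi_label:
            return [self.idx2vocab[idx] for idx in labels]
        # Assumption: a label is kept when its score is >= threshold.
        return [
            [self.idx2vocab[i] for i, score in enumerate(row) if score >= threshold]
            for row in labels
        ]


single = ToyClassificationProcessor()
single.build_vocab(["sports", "news", "sports"])
single_ids = single.transform(["sports", "news"])
single_labels = single.inverse_transform(single_ids)

multi = ToyClassificationProcessor(multi_label=True)
multi.build_vocab([["news", "tech"], ["sports"]])
multi_rows = multi.transform([["tech", "news"]])
multi_labels = multi.inverse_transform([[0.9, 0.2, 0.6]], threshold=0.5)

print(single_ids, single_labels)  # [1, 0] ['sports', 'news']
print(multi_rows, multi_labels)   # [[1, 0, 1]] [['news', 'tech']]
```

The two return shapes here mirror the Union[List[List[str]], List[str]] return type in the signature above: a flat list of labels in single-label mode, and a list of label lists in multi-label mode.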