Data Processors
SequenceProcessor
-
class kashgari.processors.SequenceProcessor(build_in_vocab: str = 'text', min_count: int = 3, build_vocab_from_labels: bool = False, **kwargs)
Bases: kashgari.processors.abc_processor.ABCProcessor
Generic processor for sequence samples.
-
__init__(build_in_vocab: str = 'text', min_count: int = 3, build_vocab_from_labels: bool = False, **kwargs) → None
Parameters:
- build_in_vocab – initial vocab dict type, one of 'text' or 'labeling' (the source docstring refers to this parameter as vocab_dict_type).
- **kwargs –
-
build_vocab(x_data: List[List[str]], y_data: List[List[str]]) → None
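The min_count argument suggests that rare tokens are filtered out when the vocabulary is built. The following is a minimal sketch of that idea in plain Python, not Kashgari's implementation: the helper name build_token_vocab, the reserved "[PAD]" and "[UNK]" entries, the frequency ordering, and the "count >= min_count" rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def build_token_vocab(x_data: List[List[str]], min_count: int = 3) -> Dict[str, int]:
    """Hypothetical helper: index every token seen at least `min_count` times."""
    # Reserved entries are an assumption of this sketch.
    vocab = {"[PAD]": 0, "[UNK]": 1}
    counts = Counter(token for sample in x_data for token in sample)
    # Most frequent tokens get the lowest indices; ties are broken alphabetically.
    for token, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


corpus = [["the", "cat"], ["the", "dog"], ["the", "cat", "sat"]]
vocab = build_token_vocab(corpus, min_count=2)
print(vocab)
```

With min_count=2, "dog" and "sat" each occur once and are left out, so they would later fall back to the unknown-token index.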
-
get_tensor_shape(batch_size: int, seq_length: int) → Tuple
-
inverse_transform(labels: Union[List[List[int]], numpy.ndarray], *, lengths: List[int] = None, threshold: float = 0.5, **kwargs) → List[List[str]]
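The signature indicates that index sequences are mapped back to string labels, with lengths available to trim padding. A rough sketch of that behaviour, under stated assumptions, might look like the following; decode_label_ids and the idx2label mapping are hypothetical names, and the real method may handle special tokens differently.

```python
from typing import Dict, List, Optional, Sequence


def decode_label_ids(
    label_ids: Sequence[Sequence[int]],
    idx2label: Dict[int, str],
    lengths: Optional[List[int]] = None,
) -> List[List[str]]:
    """Hypothetical helper: turn rows of label indices back into label strings."""
    decoded = []
    for i, row in enumerate(label_ids):
        if lengths is not None:
            # Keep only the first lengths[i] positions, dropping the padding.
            row = row[: lengths[i]]
        decoded.append([idx2label[int(idx)] for idx in row])
    return decoded


idx2label = {0: "[PAD]", 1: "O", 2: "B-PER"}  # assumed label vocabulary
padded = [[2, 1, 0, 0], [1, 1, 1, 0]]
tags = decode_label_ids(padded, idx2label, lengths=[2, 3])
print(tags)
```

Without lengths, the padded positions are decoded as well, which is why passing the original sample lengths is useful for sequence labeling output.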
-
is_vocab_build
-
transform(samples: List[List[str]], *, seq_length: int = None, max_position: int = None, segment: bool = False) → numpy.ndarray
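To illustrate what converting token samples into a fixed-length index array involves, here is a simplified sketch. It is not the library's code: encode_samples is a hypothetical helper, it returns nested lists rather than a numpy.ndarray so that it needs only the standard library, and it ignores max_position, segment, and any special start or end tokens.

```python
from typing import Dict, List, Optional


def encode_samples(
    samples: List[List[str]],
    vocab: Dict[str, int],
    seq_length: Optional[int] = None,
) -> List[List[int]]:
    """Hypothetical helper: map tokens to indices, then pad or truncate each row."""
    if seq_length is None:
        # Assumption: default to the longest sample in the batch.
        seq_length = max((len(sample) for sample in samples), default=0)
    pad_id, unk_id = vocab["[PAD]"], vocab["[UNK]"]
    encoded = []
    for sample in samples:
        # Unknown tokens fall back to the [UNK] index; long rows are truncated.
        ids = [vocab.get(token, unk_id) for token in sample][:seq_length]
        # Short rows are right-padded with the [PAD] index.
        ids += [pad_id] * (seq_length - len(ids))
        encoded.append(ids)
    return encoded


token_vocab = {"[PAD]": 0, "[UNK]": 1, "the": 2, "cat": 3}  # assumed vocabulary
batch = encode_samples([["the", "cat", "sat"], ["cat"]], token_vocab, seq_length=4)
print(batch)
```

Every row in the result has the same length, which is what allows the batch to be stacked into a single rectangular array.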
-
vocab_size
-
ClassificationProcessor
-
class kashgari.processors.ClassificationProcessor(multi_label: bool = False, **kwargs)
Bases: kashgari.processors.abc_processor.ABCProcessor
-
__init__(multi_label: bool = False, **kwargs) → None
Initialize self. See help(type(self)) for accurate signature.
-
build_vocab(x_data: List[List[str]], y_data: List[List[str]]) → None
-
inverse_transform(labels: Union[List[int], numpy.ndarray], *, lengths: List[int] = None, threshold: float = 0.5, **kwargs) → Union[List[List[str]], List[str]]
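The two possible return types hint at two decoding modes: one label per sample, or a list of labels per sample when multi_label is enabled. The sketch below shows one plausible reading of how threshold could be applied in the multi-label case. The helper decode_predictions, the idx2label mapping, and the strict "score > threshold" comparison are assumptions, not confirmed library behaviour.

```python
from typing import Dict, List, Sequence, Union


def decode_predictions(
    labels: Sequence,
    idx2label: Dict[int, str],
    multi_label: bool = False,
    threshold: float = 0.5,
) -> Union[List[List[str]], List[str]]:
    """Hypothetical helper: decode class indices or per-class scores into labels."""
    if multi_label:
        # Each row holds one score per class; keep every class above the threshold.
        return [
            [idx2label[i] for i, score in enumerate(row) if score > threshold]
            for row in labels
        ]
    # Single-label case: each entry is already a class index.
    return [idx2label[int(idx)] for idx in labels]


class_names = {0: "sports", 1: "politics", 2: "tech"}  # assumed label vocabulary
single = decode_predictions([2, 0], class_names)
multi = decode_predictions(
    [[0.9, 0.2, 0.7], [0.1, 0.4, 0.3]], class_names, multi_label=True
)
print(single, multi)
```

In this reading a sample can legitimately decode to an empty list when no class score exceeds the threshold.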
-
is_vocab_build
-
transform(samples: List[List[str]], *, seq_length: int = None, max_position: int = None, segment: bool = False) → numpy.ndarray
-
vocab_size