Embeddings
BareEmbedding
-
class kashgari.embeddings.BareEmbedding(embedding_size=100, **kwargs)
Bases: kashgari.embeddings.abc_embedding.ABCEmbedding
BareEmbedding is a randomly initialized tf.keras.layers.Embedding layer for text sequence embedding. It is the default embedding class for Kashgari models.
-
__init__(embedding_size=100, **kwargs)
- Parameters
embedding_size (int) – Dimension of the dense embedding.
kwargs (Any) – additional params
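To make "random init" concrete, here is a minimal pure-Python sketch of what a randomly initialized embedding table does: each token id maps to a row of `embedding_size` random floats, and embedding a sequence is a row lookup. The helpers `make_embedding_table` and `embed_ids` are hypothetical names used only for illustration and are not part of Kashgari; the real layer is tf.keras.layers.Embedding, whose weights are updated during training.

```python
import random
from typing import List


def make_embedding_table(vocab_size: int, embedding_size: int = 100,
                         seed: int = 0) -> List[List[float]]:
    """Build a vocab_size x embedding_size table of small random floats.

    Hypothetical helper for illustration; not a Kashgari function.
    The value range is chosen for illustration only.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_size)]
            for _ in range(vocab_size)]


def embed_ids(table: List[List[float]], token_ids: List[int]) -> List[List[float]]:
    """Look up one embedding row per token id."""
    return [table[i] for i in token_ids]


table = make_embedding_table(vocab_size=10, embedding_size=100)
vectors = embed_ids(table, [1, 4, 4, 7])
print(len(vectors), len(vectors[0]))  # 4 rows, each with 100 values
```

Repeated token ids (4 and 4 above) return the same row, which is why the table can later be trained to give each token a meaningful vector.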
-
embed(sentences, *, debug=False)
Batch-embed sentences.
- Parameters
- Returns
vectorized sentence list
- Return type
-
get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)
Calculate a suitable sequence length from the corpus.
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
Returns:
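The role of `cover_rate` can be illustrated with a small sketch. It assumes, and this is an interpretation rather than something the reference states, that `cover_rate` is the fraction of corpus sequences that should fit within the returned length without truncation. The function `seq_length_for_cover_rate` is a hypothetical stand-in, not Kashgari's implementation.

```python
from typing import List


def seq_length_for_cover_rate(lengths: List[int], cover_rate: float = 0.95) -> int:
    """Pick a length that covers roughly `cover_rate` of the sequences.

    Hypothetical helper; the meaning of cover_rate is an assumption.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths)
    # Clamp the index so that cover_rate=1.0 returns the longest sequence.
    index = min(int(cover_rate * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Nine short sentences and one very long outlier.
lengths = [5] * 9 + [50]
print(seq_length_for_cover_rate(lengths, cover_rate=0.5))  # 5
print(seq_length_for_cover_rate(lengths, cover_rate=1.0))  # 50
```

Under this reading, a rate below 1.0 keeps a few unusually long sequences from inflating the padded length of every batch.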
-
setup_text_processor(processor)
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
-
WordEmbedding
-
class kashgari.embeddings.WordEmbedding(w2v_path, *, w2v_kwargs=None, **kwargs)
Bases: kashgari.embeddings.abc_embedding.ABCEmbedding
-
__init__(w2v_path, *, w2v_kwargs=None, **kwargs)
- Parameters
w2v_path (str) – Word2Vec file path.
w2v_kwargs (Dict[str, Any]) – parameters passed to the load_word2vec_format() function of gensim.models.KeyedVectors.
kwargs (Any) – additional params
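For orientation, the sketch below parses the plain-text word2vec format that `w2v_path` typically points to: a header line with the vocabulary size and vector dimension, then one word and its values per line. The function `read_word2vec_text` is a hypothetical pure-Python stand-in, not the gensim loader. Its `limit` argument mirrors the `limit` keyword that gensim's load_word2vec_format() accepts, which is the kind of option one would pass through `w2v_kwargs` (for example `{'limit': 50000}` or `{'binary': True}`).

```python
import io
from typing import Dict, List, Optional, TextIO


def read_word2vec_text(handle: TextIO,
                       limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse word2vec text format: a "count dim" header, then one word per line.

    Hypothetical helper for illustration; gensim does this work in practice.
    """
    _vocab_size, dim = (int(x) for x in handle.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(handle):
        # Stop early once `limit` words have been read.
        if limit is not None and line_no >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values
    return vectors


sample = io.StringIO("3 2\nhello 0.1 0.2\nworld 0.3 0.4\nfoo 0.5 0.6\n")
vectors = read_word2vec_text(sample, limit=2)
print(sorted(vectors))  # ['hello', 'world']
```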
-
embed(sentences, *, debug=False)
Batch-embed sentences.
- Parameters
- Returns
vectorized sentence list
- Return type
-
get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)
Calculate a suitable sequence length from the corpus.
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
Returns:
-
setup_text_processor(processor)
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
-
TransformerEmbedding
-
class kashgari.embeddings.TransformerEmbedding(vocab_path, config_path, checkpoint_path, model_type='bert', **kwargs)
Bases: kashgari.embeddings.abc_embedding.ABCEmbedding
TransformerEmbedding is based on bert4keras. The embeddings themselves are wrapped in our simple embedding interface so that they can be used like any other embedding.
-
embed(sentences, *, debug=False)
Batch-embed sentences.
- Parameters
- Returns
vectorized sentence list
- Return type
-
get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)
Calculate a suitable sequence length from the corpus.
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
Returns:
-
setup_text_processor(processor)
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
-
BertEmbedding
-
class kashgari.embeddings.BertEmbedding(model_folder, **kwargs)
Bases: kashgari.embeddings.transformer_embedding.TransformerEmbedding
BertEmbedding is a thin wrapper around TransformerEmbedding. To load another kind of transformer-based language model, use TransformerEmbedding directly.
-
__init__(model_folder, **kwargs)
- Parameters
model_folder (str) – path of checkpoint folder.
kwargs (Any) – additional params
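Since BertEmbedding takes a single `model_folder` while TransformerEmbedding takes `vocab_path`, `config_path`, and `checkpoint_path`, the wrapper presumably derives the three paths from the folder. The sketch below shows one way that mapping could look. The file names are an assumption based on the layout of Google's original BERT releases, and `bert_paths` is a hypothetical helper, not a Kashgari function.

```python
import os
from typing import Dict


def bert_paths(model_folder: str) -> Dict[str, str]:
    """Map a checkpoint folder to the three paths TransformerEmbedding takes.

    Assumes the file names used by Google's original BERT releases;
    other checkpoints may name these files differently.
    """
    return {
        "vocab_path": os.path.join(model_folder, "vocab.txt"),
        "config_path": os.path.join(model_folder, "bert_config.json"),
        "checkpoint_path": os.path.join(model_folder, "bert_model.ckpt"),
    }


paths = bert_paths("path/to/bert_folder")
for name, value in paths.items():
    print(name, "->", value)
```

If a checkpoint uses different file names, passing the three paths to TransformerEmbedding explicitly avoids relying on this convention.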
-
build_embedding_model(*, vocab_size=None, force=False, **kwargs)
-
embed(sentences, *, debug=False)
Batch-embed sentences.
- Parameters
- Returns
vectorized sentence list
- Return type
-
get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)
Calculate a suitable sequence length from the corpus.
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
Returns:
-
load_embed_vocab()
Load the vocab dict from the embedding layer.
-
setup_text_processor(processor)
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
-