Embeddings

BareEmbedding
class kashgari.embeddings.BareEmbedding(embedding_size=100, **kwargs)

Bases: kashgari.embeddings.abc_embedding.ABCEmbedding

BareEmbedding is a randomly initialized tf.keras.layers.Embedding layer for text sequence embedding. It is the default embedding class for Kashgari models.
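To make the idea concrete, here is a minimal pure-Python sketch (no TensorFlow, and not Kashgari's actual implementation) of what a randomly initialized embedding layer does: it maps each token id to a dense vector of length `embedding_size`, drawn at random and then adjusted during training.

```python
import random

def make_embedding(vocab_size, embedding_size=100, seed=0):
    # Random-init lookup table: one dense vector per token id.
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_size)]
            for _ in range(vocab_size)]

def embed_ids(table, token_ids):
    # Embedding lookup: replace each token id with its dense vector.
    return [table[i] for i in token_ids]
```

The names `make_embedding` and `embed_ids` are illustrative helpers, not part of the Kashgari API.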
__init__(embedding_size=100, **kwargs)

Parameters:
- embedding_size (int) – Dimension of the dense embedding.
- kwargs (Any) – Additional parameters.
embed(sentences, *, debug=False)

Batch embed sentences.

Returns: vectorized sentence list.

get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)

Calculate a proper sequence length according to the corpus.

Parameters:
- generators (List[kashgari.generators.CorpusGenerator])
- use_label (bool)
- cover_rate (float)
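The cover_rate parameter can be read as: pick the smallest sequence length that covers at least that fraction of the corpus. A hedged sketch of that idea (not Kashgari's actual implementation):

```python
import math

def seq_length_for_cover_rate(lengths, cover_rate=0.95):
    # Choose the smallest length L such that at least `cover_rate`
    # of the sequences have length <= L. Illustrative only.
    ranked = sorted(lengths)
    index = min(len(ranked) - 1, math.ceil(len(ranked) * cover_rate) - 1)
    return ranked[index]
```

With a corpus whose sentence lengths are 1 through 100, a cover rate of 0.95 yields 95: padding or truncating to that length leaves only the longest 5% of sentences cut short.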
setup_text_processor(processor)

Parameters:
- processor (kashgari.processors.abc_processor.ABCProcessor)
WordEmbedding
class kashgari.embeddings.WordEmbedding(w2v_path, *, w2v_kwargs=None, **kwargs)

Bases: kashgari.embeddings.abc_embedding.ABCEmbedding
__init__(w2v_path, *, w2v_kwargs=None, **kwargs)

Parameters:
- w2v_path (str) – Word2Vec file path.
- w2v_kwargs (Dict[str, Any]) – Parameters passed to the load_word2vec_format() function of gensim.models.KeyedVectors.
- kwargs (Any) – Additional parameters.
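For reference, the file at w2v_path follows the standard word2vec text format that gensim's load_word2vec_format() reads: a header line with the vocabulary size and vector dimension, then one word per line followed by its vector components. A minimal pure-Python reader of that format (illustrative only; WordEmbedding itself delegates to gensim):

```python
def read_word2vec_text(path):
    # Parse the word2vec *text* format:
    #   header line: "<vocab_size> <dim>"
    #   data lines:  "<word> <v1> <v2> ... <v_dim>"
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        vocab_size, dim = map(int, fh.readline().split())
        for line in fh:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    assert len(vectors) == vocab_size
    return vectors, dim
```

The binary variant of the format needs a different reader; in practice w2v_kwargs can carry options such as the format flag through to gensim.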
embed(sentences, *, debug=False)

Batch embed sentences.

Returns: vectorized sentence list.

get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)

Calculate a proper sequence length according to the corpus.

Parameters:
- generators (List[kashgari.generators.CorpusGenerator])
- use_label (bool)
- cover_rate (float)

setup_text_processor(processor)

Parameters:
- processor (kashgari.processors.abc_processor.ABCProcessor)
TransformerEmbedding
class kashgari.embeddings.TransformerEmbedding(vocab_path, config_path, checkpoint_path, model_type='bert', **kwargs)

Bases: kashgari.embeddings.abc_embedding.ABCEmbedding

TransformerEmbedding is based on bert4keras. The embeddings themselves are wrapped in Kashgari's simple embedding interface so that they can be used like any other embedding.
embed(sentences, *, debug=False)

Batch embed sentences.

Returns: vectorized sentence list.

get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)

Calculate a proper sequence length according to the corpus.

Parameters:
- generators (List[kashgari.generators.CorpusGenerator])
- use_label (bool)
- cover_rate (float)

setup_text_processor(processor)

Parameters:
- processor (kashgari.processors.abc_processor.ABCProcessor)
BertEmbedding
class kashgari.embeddings.BertEmbedding(model_folder, **kwargs)

Bases: kashgari.embeddings.transformer_embedding.TransformerEmbedding

BertEmbedding is a simple wrapper around TransformerEmbedding. If you need to load another kind of transformer-based language model, use TransformerEmbedding instead.
__init__(model_folder, **kwargs)

Parameters:
- model_folder (str) – Path of the checkpoint folder.
- kwargs (Any) – Additional parameters.
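A hedged sketch of what model_folder implies: a Google-style BERT checkpoint folder conventionally contains a vocab file, a config file, and a checkpoint prefix, which correspond to TransformerEmbedding's separate path arguments. The helper and the exact file names below are the conventional ones, given for illustration, and are not guaranteed by Kashgari:

```python
import os

def bert_checkpoint_paths(model_folder):
    # Conventional file names inside a Google BERT checkpoint folder;
    # BertEmbedding resolves something like these and forwards them to
    # TransformerEmbedding (hypothetical helper, for illustration only).
    return {
        "vocab_path": os.path.join(model_folder, "vocab.txt"),
        "config_path": os.path.join(model_folder, "bert_config.json"),
        "checkpoint_path": os.path.join(model_folder, "bert_model.ckpt"),
    }
```

This is why BertEmbedding needs only one argument where TransformerEmbedding needs three.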
build_embedding_model(*, vocab_size=None, force=False, **kwargs)

embed(sentences, *, debug=False)

Batch embed sentences.

Returns: vectorized sentence list.

get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)

Calculate a proper sequence length according to the corpus.

Parameters:
- generators (List[kashgari.generators.CorpusGenerator])
- use_label (bool)
- cover_rate (float)

load_embed_vocab()

Load the vocab dict from the embedding layer.

setup_text_processor(processor)

Parameters:
- processor (kashgari.processors.abc_processor.ABCProcessor)