Embeddings¶
BareEmbedding¶
- class kashgari.embeddings.BareEmbedding(embedding_size=100, **kwargs)[source]¶
Bases:
kashgari.embeddings.abc_embedding.ABCEmbedding
BareEmbedding is a randomly initialized tf.keras.layers.Embedding layer for text sequence embedding, and is the default embedding class for Kashgari models.
- Parameters
embedding_size (int) –
kwargs (Any) –
- __init__(embedding_size=100, **kwargs)[source]¶
- Parameters
embedding_size (int) – Dimension of the dense embedding.
kwargs (Any) – additional params
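Conceptually, a randomly initialized embedding is just a lookup into a random weight table: each token id indexes one dense row. A minimal plain-Python sketch of that idea (the function names below are illustrative, not part of the Kashgari API):

```python
import random

def build_random_embedding(vocab_size, embedding_size, seed=0):
    """Build a vocab_size x embedding_size table of small random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_size)]
            for _ in range(vocab_size)]

def embed_ids(table, token_ids):
    """Look up the dense vector for each token id."""
    return [table[i] for i in token_ids]

# A tiny vocabulary of 10 tokens, each mapped to a 4-dimensional vector.
table = build_random_embedding(vocab_size=10, embedding_size=4)
vectors = embed_ids(table, [1, 3, 3])
```

The same token id always maps to the same vector; the vectors only become meaningful once the layer is trained together with the downstream model.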
- embed(sentences, *, debug=False)¶
Batch embed sentences.
- Parameters
sentences (List[List[str]]) – sentence list to embed
debug (bool) – show debug info
- Returns
vectorized sentence list
- Return type
- get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)¶
Calculate proper sequence length according to the corpus
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
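The cover_rate parameter chooses a sequence length that covers the given fraction of corpus samples rather than the absolute maximum. A rough sketch of that idea, assuming a simple sorted-percentile rule (this helper is illustrative, not the library implementation):

```python
def seq_length_for_cover_rate(lengths, cover_rate=0.95):
    """Return the smallest length that covers `cover_rate` of the samples."""
    ordered = sorted(lengths)
    # Index of the sample at the cover_rate quantile (clamped to range).
    idx = min(int(len(ordered) * cover_rate), len(ordered) - 1)
    return ordered[idx]

# With one long outlier, a 50% cover rate yields a much shorter
# sequence length than padding everything to the maximum (40).
lengths = [5, 7, 8, 9, 10, 12, 15, 18, 20, 40]
max_len = seq_length_for_cover_rate(lengths, cover_rate=0.5)
```

Using a high cover rate (such as the default 0.95) trades a small amount of truncation on the longest samples for far less padding on the rest.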
- setup_text_processor(processor)¶
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
WordEmbedding¶
- class kashgari.embeddings.WordEmbedding(w2v_path, *, w2v_kwargs=None, **kwargs)[source]¶
Bases:
kashgari.embeddings.abc_embedding.ABCEmbedding
- __init__(w2v_path, *, w2v_kwargs=None, **kwargs)[source]¶
- Parameters
w2v_path (str) – Word2Vec file path.
w2v_kwargs (Optional[Dict[str, Any]]) – params passed to the load_word2vec_format() function of gensim.models.KeyedVectors
kwargs (Any) – additional params
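In the text variant of the Word2Vec format that load_word2vec_format() reads, the first line holds the vocabulary size and vector dimension, and each following line is a word followed by its vector components. A minimal parser sketch of that layout (not the gensim implementation):

```python
def parse_word2vec_text(lines):
    """Parse text-format word2vec lines into {word: [float, ...]}."""
    header = lines[0].split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in lines[1:1 + vocab_size]:
        parts = line.split()
        word, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim  # every vector must match the header dim
        vectors[word] = values
    return vectors

# Two words, three dimensions each.
sample = ["2 3", "cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6"]
vecs = parse_word2vec_text(sample)
```

Binary-format files follow the same header-plus-rows structure but store the vector components as packed floats, which is why w2v_kwargs such as binary=True are forwarded to gensim.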
- embed(sentences, *, debug=False)¶
Batch embed sentences.
- Parameters
sentences (List[List[str]]) – sentence list to embed
debug (bool) – show debug info
- Returns
vectorized sentence list
- Return type
- get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)¶
Calculate proper sequence length according to the corpus
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
- setup_text_processor(processor)¶
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
TransformerEmbedding¶
- class kashgari.embeddings.TransformerEmbedding(vocab_path, config_path, checkpoint_path, model_type='bert', **kwargs)[source]¶
Bases:
kashgari.embeddings.abc_embedding.ABCEmbedding
TransformerEmbedding is based on bert4keras. The embedding itself is wrapped into our simple embedding interface so that it can be used like any other embedding.
- Parameters
vocab_path (str) – vocab file path
config_path (str) – model config path
checkpoint_path (str) – model checkpoint path
model_type (str) – transformer model type, e.g. 'bert'
kwargs (Any) – additional params
- embed(sentences, *, debug=False)¶
Batch embed sentences.
- Parameters
sentences (List[List[str]]) – sentence list to embed
debug (bool) – show debug info
- Returns
vectorized sentence list
- Return type
- get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)¶
Calculate proper sequence length according to the corpus
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
- setup_text_processor(processor)¶
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type
BertEmbedding¶
- class kashgari.embeddings.BertEmbedding(model_folder, **kwargs)[source]¶
Bases:
kashgari.embeddings.transformer_embedding.TransformerEmbedding
BertEmbedding is a simple wrapper class around TransformerEmbedding. If you need to load another kind of transformer-based language model, please use TransformerEmbedding.
- Parameters
model_folder (str) –
kwargs (Any) –
- __init__(model_folder, **kwargs)[source]¶
- Parameters
model_folder (str) – path of checkpoint folder.
kwargs (Any) – additional params
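BertEmbedding only needs the checkpoint folder because a Google-style BERT release lays out its files under standard names: a vocab file, a JSON config, and the model checkpoint. A small sketch of how a wrapper might derive the explicit paths that TransformerEmbedding takes (the helper name is illustrative; the file names follow the standard BERT release layout):

```python
import os

def find_bert_files(model_folder):
    """Derive vocab, config, and checkpoint paths from a BERT folder."""
    return {
        "vocab_path": os.path.join(model_folder, "vocab.txt"),
        "config_path": os.path.join(model_folder, "bert_config.json"),
        "checkpoint_path": os.path.join(model_folder, "bert_model.ckpt"),
    }

paths = find_bert_files("bert_base")
```

If your checkpoint uses different file names, pass the paths to TransformerEmbedding explicitly instead.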
- build_embedding_model(*, vocab_size=None, force=False, **kwargs)¶
- embed(sentences, *, debug=False)¶
Batch embed sentences.
- Parameters
sentences (List[List[str]]) – sentence list to embed
debug (bool) – show debug info
- Returns
vectorized sentence list
- Return type
- get_seq_length_from_corpus(generators, *, use_label=False, cover_rate=0.95)¶
Calculate proper sequence length according to the corpus
- Parameters
generators (List[kashgari.generators.CorpusGenerator]) –
use_label (bool) –
cover_rate (float) –
- Return type
- load_embed_vocab()¶
Load vocab dict from embedding layer
- setup_text_processor(processor)¶
- Parameters
processor (kashgari.processors.abc_processor.ABCProcessor) –
- Return type