Word Embeddings and Word Vectors
Word embeddings, or word vectors, are numeric representations of words. They turn text into numbers that models can process, and words with similar meanings get vectors that lie close together in the vector space.
Why Word Embeddings Matter
- Convert text into numeric form
- Capture meaning and relationships
- Improve NLP model performance
- Reduce dimensionality compared to one-hot encoding
How Word Embeddings Work
- Each word is mapped to a vector of continuous values.
- The values carry semantic information learned from data.
- Related words end up with vectors that are close together (see the sketch after this list).
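A minimal sketch of the idea, using made-up four-dimensional vectors (real embeddings usually have tens to hundreds of dimensions, still far fewer than a one-hot encoding, which needs one dimension per vocabulary word):

```python
import numpy as np

# Toy 4-dimensional vectors with made-up values, just to illustrate the idea.
vectors = {
    "king":  np.array([0.50, 0.68, 0.12, 0.90]),
    "queen": np.array([0.48, 0.71, 0.15, 0.88]),
    "apple": np.array([0.91, 0.05, 0.80, 0.10]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 means similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # high: related words
print(cosine_similarity(vectors["king"], vectors["apple"]))  # lower: unrelated words
```

Cosine similarity close to 1 means the two vectors point in nearly the same direction, which is how "closeness" is usually measured in embedding space.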
Popular Embedding Methods
1. Word2Vec
Uses CBOW or Skip-gram to learn embeddings from the surrounding context (see the training sketch after this list).
2. GloVe
Learns vectors from global word co-occurrence statistics.
3. FastText
Uses subword information, which helps with rare and misspelled words (also covered in the sketch after this list).
4. Contextual Embeddings
Generated by models like BERT; the same word can have different vectors depending on its context.
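As a rough illustration of how the first and third methods can be trained in practice, here is a sketch using the gensim library on a tiny toy corpus; the corpus, the 50-dimensional vector size, and the other parameter values are illustrative choices, not settings from this article:

```python
from gensim.models import Word2Vec, FastText

# Tiny toy corpus: each sentence is a list of tokens.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["i", "walk", "to", "work", "every", "day"],
    ["she", "is", "walking", "to", "work"],
]

# Word2Vec: sg=0 uses CBOW, sg=1 uses Skip-gram.
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=100)
print(w2v.wv["king"][:5])                   # first 5 values of the "king" vector
print(w2v.wv.most_similar("king", topn=3))  # nearest neighbours (noisy on a corpus this small)

# FastText: builds vectors from character n-grams (subwords),
# so it can produce a vector even for a word never seen in training.
ft = FastText(corpus, vector_size=50, window=2, min_count=1, epochs=100)
print(ft.wv["walkng"][:5])                  # misspelled word still gets a vector
```

GloVe is usually consumed as pretrained vectors rather than trained locally; loading such vectors is sketched later in this article.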
Types of Word Embeddings
Static Embeddings
Each word has one fixed vector. Examples: Word2Vec, GloVe.
Contextual Embeddings
Each word's vector changes based on the sentence it appears in. Examples: BERT, GPT (a sketch follows below).
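To make the static versus contextual distinction concrete, here is a sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both assumptions, not requirements of this article); it pulls out the vector BERT assigns to the word "bank" in two different sentences:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence):
    """Return the contextual vector BERT assigns to the token 'bank'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_river = bank_vector("the river bank was muddy after the rain")
v_money = bank_vector("she opened a savings account at the bank")

# A static embedding would give "bank" the same vector in both sentences;
# BERT produces two different vectors, one per context.
print(float(torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)))
```

A static Word2Vec or GloVe model would return the identical vector for both occurrences of "bank", so a comparison like this is a quick way to see the contextual effect.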
Example of Meaning in Vector Space
In a good embedding space (a quick check on pretrained vectors is sketched after this list):
- king minus man plus woman lands near queen
- walk and walking stay close
- happy and joyful stay close
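These relations can be checked roughly with the gensim downloader and its pretrained glove-wiki-gigaword-50 vectors; both the library and the specific vector set are assumptions made for this sketch:

```python
import gensim.downloader as api

# Small pretrained GloVe vectors (downloaded on first use).
glove = api.load("glove-wiki-gigaword-50")

# king - man + woman should land near queen.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Related word forms and synonyms sit close together.
print(glove.similarity("walk", "walking"))
print(glove.similarity("happy", "joyful"))
```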
Benefits of Word Embeddings
- Compact representation
- Semantic meaning captured
- Better model accuracy
Limitations
- Static embeddings ignore context
- Can pick up and reproduce biases present in the training data
Word Embeddings in Simple Terms
Word embeddings are a way to turn words into vectors. These vectors carry meaning. Words that appear in the same contexts end up close together in the vector space.
Examples
- Word2Vec learns from context.
- GloVe uses global co-occurrence statistics.
- FastText works with subwords.
- BERT gives contextual meaning.
Key Points
- Text becomes numbers.
- Meaning shows up in the vectors.
- Models understand text as numeric data.
Conclusion
Word embeddings turn text into meaningful vectors. They underpin most modern NLP systems and help models capture the relationships between words.