The Asahi Shimbun Word Vectors are word vectors trained on roughly 8 million articles (about 2.3 billion words in total). They are trained with word2vec (Skip-gram and CBOW) and with GloVe. Versions optimized with "Retrofitting", a fine-tuning method for word vectors, are also provided.
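The Asahi Shimbun Word Vectors are trained on roughly 8 million articles (about 2.3 billion words in total) drawn from the articles held by The Asahi Shimbun Company that were published between August 1984 and August 2017.
Word segmentation uses MeCab with the IPADIC-2.7.0 dictionary. The Skip-gram and CBOW models were trained with the word2vec tool, and a model trained with GloVe is provided as well.
In addition, each of the Skip-gram, CBOW, and GloVe models is also available in a version optimized with "Retrofitting", a fine-tuning method for word vectors [1]. Six models are provided in total.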
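To give a rough idea of what retrofitting does, here is a minimal sketch of the generic update rule from the original retrofitting formulation (Faruqui et al.), in which each vector is pulled toward the vectors of its lexicon neighbours while staying close to its original value. This is an illustration only: the function name `retrofit`, the toy vectors, and the toy lexicon are assumptions, and the variant actually applied to these models (see [1]) may differ.

```python
import numpy as np

def retrofit(vectors, lexicon, n_iters=10, alpha=1.0):
    """Generic retrofitting sketch (not necessarily the variant used here).

    vectors: dict mapping word -> np.ndarray (original vectors, not modified)
    lexicon: dict mapping word -> list of related words (e.g. synonyms)
    """
    new_vecs = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iters):
        for word, neighbours in lexicon.items():
            if word not in vectors:
                continue
            nbrs = [n for n in neighbours if n in new_vecs and n != word]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # each neighbour edge weighted by 1/degree
            neighbour_sum = sum(beta * new_vecs[n] for n in nbrs)
            # Weighted average of the original vector and the current neighbours.
            new_vecs[word] = (alpha * vectors[word] + neighbour_sum) / (
                alpha + beta * len(nbrs)
            )
    return new_vecs

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy data (hypothetical): "a" and "b" are listed as synonyms, "c" is not in the lexicon.
toy = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([-1.0, 0.0]),
}
toy_lexicon = {"a": ["b"], "b": ["a"]}
fitted = retrofit(toy, toy_lexicon)
print(cosine(toy["a"], toy["b"]), cosine(fitted["a"], fitted["b"]))
```

Words that are linked in the lexicon move closer together, while words without lexicon entries keep their original vectors.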
This data was created as part of joint research with Retrieva, Inc. Details of the joint research are available here, and the press release about this data is available here.
The training parameters for word2vec and GloVe are as follows.
word2vec:
Parameter | Option | Value
Skip-gram or CBOW | -cbow | {0, 1}
Dimensions | -size | 300
Context window size | -window | 8
Negative samples | -negative | 5
Hierarchical softmax | -hs | 0
Frequency threshold for subsampling | -sample | 1e-5
Minimum word count | -min-count | 3
Iterations | -iter | 15

GloVe:
Parameter | Variable | Value
Dimensions | VECTOR_SIZE | 300
Context window size | WINDOW_SIZE | 8
Minimum word count | VOCAB_MIN_COUNT | 3
Iterations | MAX_ITER | 15
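As an illustration of how the word2vec options above fit together, the following sketch assembles a command line for the original word2vec tool. The flags and values come from the table; the binary location `./word2vec`, the file names, and the helper `word2vec_command` are hypothetical, and this is not the exact command used to build the released models.

```python
# Values taken from the word2vec parameter table above.
W2V_PARAMS = {
    "-size": 300,
    "-window": 8,
    "-negative": 5,
    "-hs": 0,
    "-sample": "1e-5",
    "-min-count": 3,
    "-iter": 15,
}

def word2vec_command(train_path, output_path, cbow):
    """Build an argument list; -cbow 1 selects CBOW, -cbow 0 selects Skip-gram."""
    cmd = ["./word2vec", "-train", train_path, "-output", output_path,
           "-cbow", str(int(cbow))]
    for flag, value in W2V_PARAMS.items():
        cmd += [flag, str(value)]
    return cmd

# Hypothetical file names: a MeCab-tokenized corpus in, a text vector file out.
print(" ".join(word2vec_command("corpus.tokenized.txt", "skipgram.txt", cbow=False)))
```

The resulting list could be passed to `subprocess.run` once the paths point at a real binary and corpus.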
All word vectors are provided in the word2vec format.
The first line of each file gives the vocabulary size and the vector dimensionality; each subsequent line gives a word followed by its vector.
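The layout described above can be read without any special library. The sketch below parses a tiny in-memory sample rather than one of the distributed files; the function name `read_word2vec_text` and the three-dimensional toy vectors are assumptions made for illustration.

```python
import io
import numpy as np

def read_word2vec_text(stream):
    """Parse word2vec text format: a 'vocab_size dim' header, then 'word v1 ... vN' lines."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = np.array([float(v) for v in values], dtype=np.float32)
    if len(vectors) != vocab_size:
        raise ValueError(f"header says {vocab_size} words, found {len(vectors)}")
    return vectors

# Toy sample: 2 words, 3 dimensions (the real files use 300 dimensions).
sample = io.StringIO("2 3\n王 0.1 0.2 0.3\n女 0.4 0.5 0.6\n")
vecs = read_word2vec_text(sample)
print(len(vecs), vecs["王"].shape)
```

For the full-size files, a dedicated loader such as gensim's `KeyedVectors.load_word2vec_format` is the more practical choice.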
Sample code using gensim in Python is shown below.
>>> from gensim.models import KeyedVectors
>>> # Find the words most similar to a given word (top 5)
>>> vec = KeyedVectors.load_word2vec_format("./cbow.txt")
>>> vec.most_similar("暗い", topn=5)
[('暗く', 0.7123910188674927),
('明るい', 0.6702773571014404),
('暗かっ', 0.5876639485359192),
('薄暗い', 0.58516526222229),
('真っ暗', 0.5563079118728638)]
>>> # Find similar words using the vectors with Retrofitting applied
>>> retro_vec = KeyedVectors.load_word2vec_format("./cbow-retrofitting.txt")
>>> retro_vec.most_similar("暗い", topn=5)
[('薄ぐらい', 0.8090516328811646),
('グルーミー', 0.7773782014846802),
('もの淋しい', 0.7517762780189514),
('陰気臭い', 0.7295931577682495),
('暗く', 0.7175554037094116)]
>>> # Subtract "男" (man) from "王" (king) and add "女" (woman)
>>> retro_vec.most_similar(positive=['女', '王'], negative=['男'], topn=5)
[('女帝', 0.6063517332077026),
('女王', 0.6007771492004395),
('åç', 0.5941751003265381),
('クィーン', 0.583606481552124),
('å', 0.5781991481781006)]
The data is available only to those who have read and agreed to the terms of use below.
If you agree to the terms of use, please state the required information and contact research-pr (at) retrieva.jp.
Within a few business days we will review what you sent, and if your intended use complies with the terms of use, we will reply by email with the URL for the data.
Sending your email via the QR code here is convenient.
[1] This corresponds to the Retrofitting (automatic) setting described in Reference 1.