Extracting information from review text with spaCy's DependencyMatcher
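This is the day-20 article of the Natural Language Processing Advent Calendar 2021.
I'm an engineer in my second year after joining as a new graduate. I belong to two departments at Forcia, the DX Platform Department and the Technology Research Institute, so I wear two hats: web development and natural language processing. There is a proverb that says whoever chases two hares catches neither, but for now I'm aiming, a little out of breath, to become an engineer who can chase both.
By the way, are you all doing dependency parsing?
Dependency parsing is one of the important basic analyses in practical applications of natural language processing. It analyzes which word (or phrase) in a sentence depends on which other word (or phrase), and what kind of relation holds between them (the dependency structure). Dependency parsing is generally performed after morphological analysis, a step that splits a sentence into words or morphemes and assigns a part-of-speech label to each of them.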
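(Image: dependency parse of 「部屋から見える夜景が美しかった。」)
The figure above shows the dependency parse of 「部屋から見える夜景が美しかった。」 ("The night view from the room was beautiful."), a sentence written to resemble a hotel review. Strings such as NOUN and ADP under the words are part-of-speech labels, here noun and adposition (particle). Words that are in a dependency relation are connected by arrows, and each arrow carries a label for the type of relation. I won't explain every label here, but for example:
- 夜景 → 見える acl: adjectival clause modifying a noun
- 美しかっ → 夜景 nsubj: nominal subject
Dependency parsing is useful in many situations. One application is a document summarization system that helps people digest large amounts of text, for example by condensing a sentence with modifiers such as 「料理が高級ホテルのように豪華で美味しかった。」 ("The food was as luxurious as at a high-end hotel, and delicious.") into 「料理が美味しかった。」 ("The food was delicious."). Another is a question answering system that, asked "When does Hotel XX open?", answers "December 20" based on a sentence such as "Hotel XX is scheduled to open on December 20." In this article, let's use a few simple example sentences and try extracting information from review text with dependency parsing!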
Preparation
For dependency parsing and the other analyses in this article we use a Python library called GiNZA. GiNZA is an open-source Japanese NLP library developed by Megagon Labs. It is built on spaCy, a natural language processing framework that incorporates state-of-the-art machine learning, and on SudachiPy, the Python implementation of the open-source morphological analyzer developed at the Works Applications Tokushima Artificial Intelligence NLP Research Institute.
DependencyMatcher, the star of this article and part of its title, belongs to the spaCy API. Once you understand how to use DependencyMatcher from this article, you can apply it not only to Japanese text processed with GiNZA but also to any other language that spaCy supports.
Let's install GiNZA right away.
pip install -U ginza ja-ginza-electra
ja-ginza-electra is the latest language model for GiNZA and gives higher parsing accuracy than the earlier model. However, 16 GB or more of memory is recommended, so if you are working on a machine with little memory, or if you care more about speed than accuracy, install ja-ginza instead of ja-ginza-electra.
Once installation is done, import the libraries to get ready for parsing.
    import ginza
    import spacy

    # The first load takes a while because the model has to be downloaded
    nlp = spacy.load('ja_ginza_electra')
    # nlp = spacy.load('ja_ginza')  # only if you installed ja-ginza instead
Next, prepare the text data. If you have your own data you can use that, but here I prepared three sample sentences written to resemble hotel reviews. Review data is one of the kinds of data most often handled in practical natural language processing.
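    txts = [
        "部屋から見える夜景が美しかった。",  # The night view from the room was beautiful.
        "立地は悪いが食事が美味しい。",  # The location is bad, but the food is delicious.
        "客室露天風呂は大人でも足がのばせてとても広かった。"  # The in-room open-air bath was very spacious; even an adult could stretch their legs.
    ]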
Matching with Matcher
First, let's extract information without dependency parsing, by matching at the level of tokens (words, morphemes). spaCy has an API called Matcher that is suited to token-level matching. To begin, let's use Matcher to see which adjectives appear in these hotel reviews.
    from spacy.matcher import Matcher

    matcher = Matcher(nlp.vocab)  # create a Matcher from the Japanese vocabulary
    patterns = [
        [{"TAG": {"REGEX": "^形容詞"}}]  # rule: the part-of-speech tag starts with 形容詞 (adjective)
    ]
    matcher.add("adj", patterns)  # register the rules on the Matcher under the name "adj"
    s = "".join(txts)  # join the three sentences into one string
    doc = nlp(s)  # parse the given string as a document
    matches = matcher(doc)  # matches now holds the matching results
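A rule is a list of dicts, each collecting the attributes of a token you want to match. For example, the following rule means "the name of the part-of-speech tag starts with 形容詞 (adjective)".
    [{"TAG": {"REGEX": "^形容詞"}}]
TAG means that the condition applies to the part-of-speech tag. Using LEMMA instead of TAG puts the condition on the lemma, and using TEXT puts it on the surface string of the word as it appears in the document.
{"REGEX": "^形容詞"} means that the part-of-speech tag matches the regular expression ^形容詞, that is, every kind of adjective. If you don't want a regular expression, you can give a plain string such as "形容詞-一般". SudachiPy, the morphological analyzer used inside GiNZA, writes part-of-speech tags as hyphen-joined strings that express increasingly fine categories. So when you only care about a coarse category such as 名詞 (noun), 形容詞 (adjective), or 動詞 (verb), specifying the tag with a regular expression works well. Looping over the parsed doc with a for loop shows which tag is attached to which word.
If you want to learn more about writing rules, see the official documentation and the official USAGE guide.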
    doc = nlp(txts[0])
    for token in doc:
        print(token.text, token.tag_)
    # 部屋 名詞-普通名詞-一般
    # から 助詞-格助詞
    # 見える 動詞-一般
    # 夜景 名詞-普通名詞-一般
    # が 助詞-格助詞
    # 美しかっ 形容詞-一般
    # た 助動詞
    # 。 補助記号-句点
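To see why a regular expression is convenient for these hyphenated tags, here is a minimal sketch that applies the same kind of pattern to the tag strings printed above, using only Python's re module. It assumes the REGEX operator behaves like re.search; the helper name coarse_filter is hypothetical and not part of spaCy.

```python
import re

# (surface, tag) pairs copied from the tag listing above
tagged = [
    ("部屋", "名詞-普通名詞-一般"),
    ("から", "助詞-格助詞"),
    ("見える", "動詞-一般"),
    ("夜景", "名詞-普通名詞-一般"),
    ("が", "助詞-格助詞"),
    ("美しかっ", "形容詞-一般"),
    ("た", "助動詞"),
    ("。", "補助記号-句点"),
]


def coarse_filter(pairs, pattern):
    """Return the surfaces whose tag matches the regex (hypothetical helper)."""
    regex = re.compile(pattern)
    return [surface for surface, tag in pairs if regex.search(tag)]


nouns = coarse_filter(tagged, "^名詞")
adjectives = coarse_filter(tagged, "^形容詞")
verbs_anchored = coarse_filter(tagged, "^動詞")
verbs_unanchored = coarse_filter(tagged, "動詞")  # also hits 助動詞
exact_noun = [s for s, t in tagged if t == "名詞"]  # exact match finds nothing

print(nouns)
print(adjectives)
print(verbs_anchored, verbs_unanchored)
print(exact_noun)
```

Without the leading ^, the pattern 動詞 would also pick up the auxiliary 助動詞, which is why the rules in this article anchor the pattern at the start of the tag.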
Once a rule is defined, register it on the Matcher object. Pass the name of the rule as the first argument of matcher.add() and a list of rules as the second. Because it is a list of rules, you can give several conditions at once. For example, if you also want to match na-adjectives (形容動詞, such as 「親切だ」 or 「不愉快だ」) in addition to adjectives, you can write the following.
    patterns = [
        [
            {"TAG": {"REGEX": "^形容詞"}}
        ],
        [
            # How na-adjectives (形容動詞) are expressed in the part-of-speech
            # scheme of the tools used here
            # Reference: https://ccd.ninjal.ac.jp/unidic/glossary
            {"TAG": "名詞-普通名詞-形状詞可能"},
            {"LEMMA": "だ"}
        ]
    ]
    matcher.add("multiple patterns", patterns)
Let's look at the matching results. According to the official documentation, the return value is a list of tuples of the form (match_id, start, end). The text that matched a rule can be retrieved from doc, the parse of the original text, as doc[start:end].
    for _, start, end in matches:
        print(doc[start:end].lemma_)  # .lemma_ returns the lemma (dictionary form)
Running the for loop above prints the following.
    美しい
    悪い
    美味しい
    広い
This gives a rough idea of how the hotel guests felt, but nothing more than a rough idea. For 悪い ("bad") in particular, you can't help wondering what exactly was bad.
So let's improve the rule so that we can also see what each adjective is about. For example, a rule can be defined like this.
    patterns = [
        [
            {"TAG": {"REGEX": "^名詞"}},
            {"TEXT": {"IN": ["は", "が"]}},  # matches either は or が
            {"TAG": {"REGEX": "^形容詞"}}
        ]
    ]
Register this on a Matcher object as before and look at the matching results.
    matcher = Matcher(nlp.vocab)
    matcher.add("noun-adj", patterns)
    doc = nlp(s)  # s is all the sentences in txts joined together
    matches = matcher(doc)
    for _, start, end in matches:
        print(doc[start:end].lemma_)
This prints the following.
    夜景が美しい
    立地は悪い
    食事が美味しい
I see: what is called 悪い ("bad") is the location, while the night view and the food are being praised.
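To make the idea of a token-level rule concrete, here is a toy sketch in plain Python, not spaCy's implementation. It treats a rule as a list of per-token tests and slides it over the tagged tokens of the first review (taken from the tag listing earlier in this article), returning (start, end) spans in the same spirit as Matcher. The function name find_spans and the use of lambdas as tests are assumptions made for illustration.

```python
import re

# Tagged tokens of the first review, as printed earlier in the article
tagged = [
    ("部屋", "名詞-普通名詞-一般"),
    ("から", "助詞-格助詞"),
    ("見える", "動詞-一般"),
    ("夜景", "名詞-普通名詞-一般"),
    ("が", "助詞-格助詞"),
    ("美しかっ", "形容詞-一般"),
    ("た", "助動詞"),
    ("。", "補助記号-句点"),
]

# One test per token: noun, then は or が, then adjective
pattern = [
    lambda text, tag: re.search("^名詞", tag) is not None,
    lambda text, tag: text in ("は", "が"),
    lambda text, tag: re.search("^形容詞", tag) is not None,
]


def find_spans(tokens, pattern):
    """Return (start, end) for every run of adjacent tokens passing all tests."""
    spans = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(test(text, tag) for test, (text, tag) in zip(pattern, window)):
            spans.append((start, start + len(pattern)))
    return spans


spans = find_spans(tagged, pattern)
matched_texts = ["".join(text for text, _ in tagged[a:b]) for a, b in spans]
print(spans, matched_texts)
```

The key property this shows is that each dict in a Matcher rule corresponds to exactly one token and the tokens must be adjacent, which is the limitation the rest of the article runs into.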
But let's look once more at the result we got when extracting adjectives.
    美しい
    悪い
    美味しい
    広い
We now know what the first three refer to, but we have not extracted what is being called 広い ("spacious"). The word 広い appeared in the following sentence.
    客室露天風呂は大人でも足がのばせてとても広かった。
Compared with the other sentences, this one is a little more complex. A person reading it immediately sees that the thing being called spacious is the 客室露天風呂 (in-room open-air bath), or simply the 露天風呂 (open-air bath), but this is not obvious to a computer. The three words right before 広かった are のばせ, て, and とても, so simply looking at what comes directly before 広い does not work. Another option is to decide based on the particles は and が. In fact, the phrase containing the nearest preceding は or が is often what the adjective is about. In this case, however, the nearest word marked with は or が is 足 (legs), and it is hard to believe the reviewer meant that their legs were spacious.
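The failure of the particle heuristic can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the tokenization and the coarse tags below are assumptions for the sake of the example, not actual GiNZA output, and nearest_marked_noun is a hypothetical helper.

```python
# Toy tokenization of the third review; the coarse tags are assumed
tokens = [
    ("客室", "名詞"), ("露天風呂", "名詞"), ("は", "助詞"),
    ("大人", "名詞"), ("で", "助詞"), ("も", "助詞"),
    ("足", "名詞"), ("が", "助詞"), ("のばせ", "動詞"),
    ("て", "助詞"), ("とても", "副詞"), ("広かっ", "形容詞"),
    ("た", "助動詞"), ("。", "補助記号"),
]


def nearest_marked_noun(toks, adj_index):
    """Walk left from the adjective to the closest noun followed by は or が."""
    for i in range(adj_index - 1, 0, -1):
        text, _ = toks[i]
        prev_text, prev_tag = toks[i - 1]
        if text in ("は", "が") and prev_tag == "名詞":
            return prev_text
    return None


adj_index = next(i for i, (_, tag) in enumerate(tokens) if tag == "形容詞")
guess = nearest_marked_noun(tokens, adj_index)
print(guess)  # 足
```

The heuristic stops at 足 because it only looks at word order, while the relation we actually want, between 広かっ and 露天風呂, is a relation in the dependency structure.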
This is where the main theme of this article, dependency parsing, comes in.
Matching with DependencyMatcher
spaCy actually has an API called DependencyMatcher for matching based on dependency structure. It is used much like Matcher, but the way rules are written and the way results are read out are different.
The dependency structure of a sentence forms a tree. In other words, there is exactly one root token, and every token other than the root has exactly one head token that it depends on. If you look closely at the first figure in this article, you will see a tree whose root is the token 美しかっ. Because the dependency structure of a sentence is a tree, a rule that uses dependency structure is also a tree.
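The tree property can be checked on a small hand-written example. The sketch below encodes the first review as a list of head indices. Only the root (美しかっ) and the two arcs 夜景 → 見える and 美しかっ → 夜景 come from the figure described in this article; the remaining head indices are assumptions made to complete the toy tree.

```python
words = ["部屋", "から", "見える", "夜景", "が", "美しかっ", "た", "。"]
# heads[i] is the index of the token that words[i] depends on.
# The root points to itself, which is also how spaCy reports a root's head.
# Arcs other than 見える->夜景 and 夜景->美しかっ are assumed for illustration.
heads = [2, 0, 3, 5, 3, 5, 5, 5]

# A tree has exactly one root
roots = [i for i, h in enumerate(heads) if h == i]


def path_to_root(i):
    """Follow head links from token i up to the root."""
    path = [words[i]]
    while heads[i] != i:
        i = heads[i]
        path.append(words[i])
    return path


root_words = [words[r] for r in roots]
room_path = path_to_root(0)
print(root_words)
print(room_path)
```

Because every non-root token has exactly one head, following the head links from any token always ends at the same root.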
Rules based on dependency structure are a little more complex than token-level rules, so let me show an example first. Let's read it by matching the source code against the tree diagram. This example has only two nodes, but it is still a proper tree.
    patterns = [
        [
            {
                "RIGHT_ID": "adj",
                "RIGHT_ATTRS": {"TAG": {"REGEX": "^形容詞"}}
            },
            {
                "LEFT_ID": "adj",
                "REL_OP": ">",
                "RIGHT_ID": "noun",
                "RIGHT_ATTRS": {"TAG": {"REGEX": "^名詞"}, "DEP": "nsubj"}
            }
        ]
    ]
(Image: diagram of a simple rule for DependencyMatcher)
As the diagram shows, a rule for DependencyMatcher is a tree with the root on the left and nodes growing out to the right.
As with Matcher, patterns is a list of rules, and each rule in it is expressed as a list of dicts. Each dict corresponds to one edge of the tree. The key point is that a dict corresponds to an edge, not to a node. It helps to imagine a dummy node with no conditions attached to the left of the root node, so that an edge between the dummy node and the root node also has to be defined. The items have the following meanings.
- LEFT_ID: the string ID of the node on the left side of the edge. When set, it must be a string that has already appeared as the RIGHT_ID of an earlier element.
- RIGHT_ID: the string ID of the node on the right side of the edge. Give it a name that does not clash with any other node.
- REL_OP: the relation between the left and right nodes. > means that the right node directly depends on the left node, and < is the reverse (the left node depends on the right node). Relations other than dependency can also be defined flexibly. For example, . means that the left node comes immediately before the right node in the sentence, and ; means that the left node comes immediately after the right node.
- RIGHT_ATTRS: the conditions on the token that corresponds to the right node of the edge. As in rules for Matcher, you can specify the part-of-speech tag, the lemma, and so on directly. With DEP you can specify the type of dependency relation.
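The direction of each operator is easy to mix up, so here is a small plain-Python sketch of the four operators mentioned above, evaluated on token indices and a head array. It is a simplification for illustration, not spaCy's implementation (for instance, it ignores spaCy's requirement that both tokens belong to the same tree for the linear-order operators). The function name holds and the toy head indices are assumptions.

```python
words = ["部屋", "から", "見える", "夜景", "が", "美しかっ", "た", "。"]
# Assumed toy parse: heads[i] is the head index of words[i]; the root points to itself
heads = [2, 0, 3, 5, 3, 5, 5, 5]


def holds(op, left, right, heads):
    """Check a simplified version of four REL_OP operators on token indices."""
    if op == ">":  # left is the immediate head of right
        return heads[right] == left and left != right
    if op == "<":  # left is an immediate dependent of right
        return heads[left] == right and left != right
    if op == ".":  # left immediately precedes right
        return left == right - 1
    if op == ";":  # left immediately follows right
        return left == right + 1
    raise ValueError(f"unsupported operator: {op}")


adj, noun, particle = 5, 3, 4  # 美しかっ, 夜景, が
checks = {
    "adj > noun": holds(">", adj, noun, heads),
    "noun < adj": holds("<", noun, adj, heads),
    "noun . particle": holds(".", noun, particle, heads),
    "particle ; noun": holds(";", particle, noun, heads),
    "noun > adj": holds(">", noun, adj, heads),
}
print(checks)
```

Read each key as LEFT_ID, REL_OP, RIGHT_ID: the left node is always the one that was defined earlier in the rule.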
It is a rather specialized read, but the types of dependency relations that appear when parsing Japanese text with GiNZA/spaCy are described in detail in the following paper.
Masayuki Asahara, Hiroshi Kanayama, Yusuke Miyao, Takaaki Tanaka, Mai Omura, Yugo Murawaki, Yuji Matsumoto, Universal Dependencies 日本語コーパス, 自然言語処理 (Journal of Natural Language Processing), 2019, Vol. 26, No. 1, pp. 3-36, https://www.jstage.jst.go.jp/article/jnlp/26/1/26_3/_article/-char/ja.
Until you get used to it, specifying the type of dependency relation may feel difficult. But if you parse a variety of sentences and look up the labels as you go, it gradually starts to make sense. Parse results can be visualized as follows, so make use of that.
    from spacy import displacy

    s = "部屋から見える夜景が美しかった"
    doc = nlp(s)
    displacy.render(doc)  # pass jupyter=True when visualizing in Jupyter
This renders the dependency parse of the sentence as an image.
Each dict also has required items.
- dict that defines the root node: RIGHT_ID, RIGHT_ATTRS
- dict that defines an edge: LEFT_ID, REL_OP, RIGHT_ID, RIGHT_ATTRS
Sometimes any word is acceptable as the right node. In that case, set RIGHT_ATTRS to an empty dict {}.
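The structural requirements described above (required keys, and LEFT_ID referring to an earlier RIGHT_ID) can be written down as a small checker. This is a hypothetical helper for catching typos before handing a rule to spaCy; it does not reproduce the validation spaCy itself performs, and the name validate_pattern is an assumption.

```python
def validate_pattern(pattern):
    """Return a list of problems found in one DependencyMatcher-style rule."""
    problems = []
    seen_ids = set()
    for i, node in enumerate(pattern):
        # The first dict defines the root; later dicts define edges
        required = {"RIGHT_ID", "RIGHT_ATTRS"}
        if i > 0:
            required |= {"LEFT_ID", "REL_OP"}
        missing = required - node.keys()
        if missing:
            problems.append(f"dict {i}: missing {sorted(missing)}")
        if i > 0 and node.get("LEFT_ID") not in seen_ids:
            problems.append(f"dict {i}: LEFT_ID must name an earlier RIGHT_ID")
        right_id = node.get("RIGHT_ID")
        if right_id in seen_ids:
            problems.append(f"dict {i}: duplicate RIGHT_ID {right_id!r}")
        if right_id is not None:
            seen_ids.add(right_id)
    return problems


good = [
    {"RIGHT_ID": "adj", "RIGHT_ATTRS": {"TAG": {"REGEX": "^形容詞"}}},
    {"LEFT_ID": "adj", "REL_OP": ">", "RIGHT_ID": "noun",
     "RIGHT_ATTRS": {"TAG": {"REGEX": "^名詞"}, "DEP": "nsubj"}},
]
bad = [
    {"RIGHT_ID": "adj", "RIGHT_ATTR": {}},  # misspelled key
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "noun",
     "RIGHT_ATTRS": {}},  # "verb" was never defined
]

good_problems = validate_pattern(good)
bad_problems = validate_pattern(bad)
print(good_problems)
print(bad_problems)
```

The second rule trips both checks: the misspelled RIGHT_ATTR key is reported as a missing RIGHT_ATTRS, and the LEFT_ID "verb" does not refer to any node defined earlier.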
If you want to write more complex rules, see the official documentation and the official USAGE guide.
The explanation of how to write rules has grown long, so let's now run the matching with the rule above.
    from spacy.matcher import DependencyMatcher

    matcher = DependencyMatcher(nlp.vocab)
    patterns = [
        [
            {
                "RIGHT_ID": "adj",
                "RIGHT_ATTRS": {"TAG": {"REGEX": "^形容詞"}}
            },
            {
                "LEFT_ID": "adj",
                "REL_OP": ">",
                "RIGHT_ID": "noun",
                "RIGHT_ATTRS": {"TAG": {"REGEX": "^名詞"}, "DEP": "nsubj"}
            }
        ]
    ]
    matcher.add("adj_noun_pair", patterns)

    # txts is the list of three review sentences defined earlier.
    # Rebuild s here because the visualization example reassigned it.
    s = "".join(txts)
    doc = nlp(s)
    matches = matcher(doc)
DependencyMatcher returns each match as a tuple of a match ID and the alignment with the rule (alignments). alignments holds, for each element of the rule, the index of the token that corresponds to the right-hand node of the edge defined by that element. The results can therefore be read out as follows.
    for _, alignments in matches:
        print([doc[alignment].lemma_ for alignment in alignments])
    # ['美しい', '夜景']
    # ['悪い', '立地']
    # ['美味しい', '食事']
    # ['広い', '露天風呂']
For each match, the index of the token corresponding to the ID adj comes first, followed by the index of the token corresponding to the ID noun.
Also, for
    客室露天風呂は大人でも足がのばせてとても広かった。
the matcher correctly picks up that the thing that is spacious is the 露天風呂 (open-air bath). If we are being greedy, there are cases where we would like the answer to be 客室露天風呂 (in-room open-air bath) rather than just 露天風呂. Extracting compound nouns like this is also possible by improving the DependencyMatcher rule, so if you are interested, please give it a try.
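As one possible direction for that exercise, the sketch below adds a third dict that asks for a token depending on the noun. It is an untested idea rather than a verified solution: the dependency label "compound" for 客室 is an assumption about how the parser labels this sentence, and the tokenization, the example alignment, and the helper noun_phrase are hypothetical.

```python
# Possible extension of the two-node rule. The third dict requires a token
# that depends on the noun with the (assumed) label "compound".
compound_pattern = [
    {"RIGHT_ID": "adj", "RIGHT_ATTRS": {"TAG": {"REGEX": "^形容詞"}}},
    {"LEFT_ID": "adj", "REL_OP": ">", "RIGHT_ID": "noun",
     "RIGHT_ATTRS": {"TAG": {"REGEX": "^名詞"}, "DEP": "nsubj"}},
    {"LEFT_ID": "noun", "REL_OP": ">", "RIGHT_ID": "modifier",
     "RIGHT_ATTRS": {"DEP": "compound"}},
]


def noun_phrase(words, alignment):
    """Join the modifier and the noun in sentence order.

    alignment lists token indices in rule order: [adj, noun, modifier].
    """
    _, noun_i, modifier_i = alignment
    return "".join(words[i] for i in sorted([noun_i, modifier_i]))


# Assumed tokenization of the third review and a hypothetical match result
words = ["客室", "露天風呂", "は", "大人", "で", "も", "足", "が",
         "のばせ", "て", "とても", "広かっ", "た", "。"]
alignment = [11, 1, 0]
phrase = noun_phrase(words, alignment)
print(phrase)
```

A rule like this only matches nouns that actually have such a modifier, so in practice it would probably be registered alongside the original two-node rule rather than replacing it.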
Summary
In this article I used simple examples to explain how to extract information from Japanese reviews with GiNZA and spaCy's DependencyMatcher. To show how useful DependencyMatcher is, I also extracted information with Matcher for comparison. Both GiNZA and spaCy are very convenient libraries for doing natural language processing in practice. Beyond the features introduced here, let's master GiNZA/spaCy and put the world's text data to good use!
åæ æªèé
A second-year new-graduate engineer who belongs to the DX Platform Department and the Technology Research Institute.
I'm actually not fond of matcha-flavored sweets, but I use spaCy's Matcher a lot.