The Yahoo Developer Network "Japanese Morphological Analysis API" is impressive

The Yahoo Developer Network's Japanese morphological analysis API looked quite usable, so here is a quick introduction.


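You just send it some text, and it splits the text into words labeled by part of speech and also returns extras such as the kana reading of each word.
If you have a Yahoo ID, all you need to do to use it is register an application ID.
There is a cap of 50,000 requests per day per application ID, but compared with other APIs, which tend to come with many restrictions, that seems fairly generous.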

So, let's try it right away.
As a quick trial, I wrote the following in PHP.

Source
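// Application ID registered with the Yahoo Developer Network
$appId = 'YOUR_APPLICATION_ID';
// Japanese test text to analyze (a short self-introduction followed by a test sentence)
$sentence = '初めまして。僕の名前はバフ。 Yahooデベロッパーネットワークの構文解析を使ってみるよ。これはテストの文章だよ。';
$queryArr = array(
    'appid' => $appId,
    'sentence' => $sentence
);
// http_build_query() URL-encodes the parameters
$query = http_build_query($queryArr);
$requestUri = 'http://jlp.yahooapis.jp/MAService/V1/parse?' . $query;
// simplexml_load_file() returns false on failure, so check before using the result
$resultXml = @simplexml_load_file($requestUri);
if ($resultXml === false) {
    die('The request failed or the response was not valid XML.');
}
var_dump($resultXml->ma_result->word_list);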
Output
object(SimpleXMLElement)[4]
  public 'word' => 
    array
      0 => 
        object(SimpleXMLElement)[3]
          public 'surface' => string '初めまして' (length=15)
          public 'reading' => string 'はじめまして' (length=18)
          public 'pos' => string '感動詞' (length=9)
      1 => 
        object(SimpleXMLElement)[5]
          public 'surface' => string '。' (length=3)
          public 'reading' => string '。' (length=3)
          public 'pos' => string '特殊' (length=6)
      2 => 
        object(SimpleXMLElement)[6]
          public 'surface' => string '僕' (length=3)
          public 'reading' => string 'ぼく' (length=6)
          public 'pos' => string '名詞' (length=6)
      3 => 
        object(SimpleXMLElement)[7]
          public 'surface' => string 'の' (length=3)
          public 'reading' => string 'の' (length=3)
          public 'pos' => string '助詞' (length=6)
      4 => 
        object(SimpleXMLElement)[8]
          public 'surface' => string '名前' (length=6)
          public 'reading' => string 'なまえ' (length=9)
          public 'pos' => string '名詞' (length=6)
      5 => 
        object(SimpleXMLElement)[9]
          public 'surface' => string 'は' (length=3)
          public 'reading' => string 'は' (length=3)
          public 'pos' => string '助詞' (length=6)
      6 => 
        object(SimpleXMLElement)[10]
          public 'surface' => string 'バフ' (length=6)
          public 'reading' => string 'ばふ' (length=6)
          public 'pos' => string '名詞' (length=6)
      7 => 
        object(SimpleXMLElement)[11]
          public 'surface' => string '。' (length=3)
          public 'reading' => string '。' (length=3)
          public 'pos' => string '特殊' (length=6)
      8 => 
        object(SimpleXMLElement)[12]
          public 'surface' => 
            object(SimpleXMLElement)[33]
              ...
          public 'reading' => 
            object(SimpleXMLElement)[34]
              ...
          public 'pos' => string '特殊' (length=6)
      9 => 
        object(SimpleXMLElement)[13]
          public 'surface' => string 'Yahoo' (length=5)
          public 'reading' => string 'Yahoo' (length=5)
          public 'pos' => string '名詞' (length=6)
      10 => 
        object(SimpleXMLElement)[14]
          public 'surface' => string 'デベロッパー' (length=18)
          public 'reading' => string 'でべろっぱー' (length=18)
          public 'pos' => string '名詞' (length=6)
      11 => 
        object(SimpleXMLElement)[15]
          public 'surface' => string 'ネットワーク' (length=18)
          public 'reading' => string 'ねっとわーく' (length=18)
          public 'pos' => string '名詞' (length=6)
      12 => 
        object(SimpleXMLElement)[16]
          public 'surface' => string 'の' (length=3)
          public 'reading' => string 'の' (length=3)
          public 'pos' => string '助詞' (length=6)
      13 => 
        object(SimpleXMLElement)[17]
          public 'surface' => string '構文' (length=6)
          public 'reading' => string 'こうぶん' (length=12)
          public 'pos' => string '名詞' (length=6)
      14 => 
        object(SimpleXMLElement)[18]
          public 'surface' => string '解析' (length=6)
          public 'reading' => string 'かいせき' (length=12)
          public 'pos' => string '名詞' (length=6)
      15 => 
        object(SimpleXMLElement)[19]
          public 'surface' => string 'を' (length=3)
          public 'reading' => string 'を' (length=3)
          public 'pos' => string '助詞' (length=6)
      16 => 
        object(SimpleXMLElement)[20]
          public 'surface' => string '使っ' (length=6)
          public 'reading' => string 'つかっ' (length=9)
          public 'pos' => string '動詞' (length=6)
      17 => 
        object(SimpleXMLElement)[21]
          public 'surface' => string 'て' (length=3)
          public 'reading' => string 'て' (length=3)
          public 'pos' => string '助詞' (length=6)
      18 => 
        object(SimpleXMLElement)[22]
          public 'surface' => string 'みる' (length=6)
          public 'reading' => string 'みる' (length=6)
          public 'pos' => string '助動詞' (length=9)
      19 => 
        object(SimpleXMLElement)[23]
          public 'surface' => string 'よ' (length=3)
          public 'reading' => string 'よ' (length=3)
          public 'pos' => string '助詞' (length=6)
      20 => 
        object(SimpleXMLElement)[24]
          public 'surface' => string '。' (length=3)
          public 'reading' => string '。' (length=3)
          public 'pos' => string '特殊' (length=6)
      21 => 
        object(SimpleXMLElement)[25]
          public 'surface' => string 'これ' (length=6)
          public 'reading' => string 'これ' (length=6)
          public 'pos' => string '名詞' (length=6)
      22 => 
        object(SimpleXMLElement)[26]
          public 'surface' => string 'は' (length=3)
          public 'reading' => string 'は' (length=3)
          public 'pos' => string '助詞' (length=6)
      23 => 
        object(SimpleXMLElement)[27]
          public 'surface' => string 'テスト' (length=9)
          public 'reading' => string 'てすと' (length=9)
          public 'pos' => string '名詞' (length=6)
      24 => 
        object(SimpleXMLElement)[28]
          public 'surface' => string 'の' (length=3)
          public 'reading' => string 'の' (length=3)
          public 'pos' => string '助詞' (length=6)
      25 => 
        object(SimpleXMLElement)[29]
          public 'surface' => string '文章' (length=6)
          public 'reading' => string 'ぶんしょう' (length=15)
          public 'pos' => string '名詞' (length=6)
      26 => 
        object(SimpleXMLElement)[30]
          public 'surface' => string 'だ' (length=3)
          public 'reading' => string 'だ' (length=3)
          public 'pos' => string '助動詞' (length=9)
      27 => 
        object(SimpleXMLElement)[31]
          public 'surface' => string 'よ' (length=3)
          public 'reading' => string 'よ' (length=3)
          public 'pos' => string '助詞' (length=6)
      28 => 
        object(SimpleXMLElement)[32]
          public 'surface' => string '。' (length=3)
          public 'reading' => string '。' (length=3)
          public 'pos' => string '特殊' (length=6)

That's roughly what you get back.
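For actual use, a plain array is usually easier to work with than the raw dump. The following is a minimal sketch of flattening the word list into an array of surface / reading / pos entries. To keep it runnable offline, it parses a hand-written sample instead of calling the API: the `ResultSet` root element name and the sample content are assumptions modeled on the dump above, and `extractWords()` is a hypothetical helper, not part of any library.

```php
<?php
// Hand-written sample that mirrors the structure seen in the dump
// (ma_result > word_list > word). This is NOT a real API response,
// and the root element name "ResultSet" is an assumption.
$sampleXml = <<<XML
<ResultSet>
  <ma_result>
    <word_list>
      <word><surface>僕</surface><reading>ぼく</reading><pos>名詞</pos></word>
      <word><surface>の</surface><reading>の</reading><pos>助詞</pos></word>
      <word><surface>名前</surface><reading>なまえ</reading><pos>名詞</pos></word>
    </word_list>
  </ma_result>
</ResultSet>
XML;

// Hypothetical helper: turn the word list into a plain PHP array.
function extractWords(SimpleXMLElement $resultXml): array
{
    $words = array();
    foreach ($resultXml->ma_result->word_list->word as $word) {
        // Casting to string converts each SimpleXMLElement to its text content.
        $words[] = array(
            'surface' => (string) $word->surface,
            'reading' => (string) $word->reading,
            'pos'     => (string) $word->pos,
        );
    }
    return $words;
}

$words = extractWords(simplexml_load_string($sampleXml));
foreach ($words as $w) {
    echo $w['surface'], "\t", $w['reading'], "\t", $w['pos'], PHP_EOL;
}
```

With a live response, the same helper could presumably be given the `$resultXml` object from the script above. The string casts matter because SimpleXML hands back element objects rather than strings.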

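By adding further request parameters, you can control the response in fairly fine detail, for example restricting the returned values to particular parts of speech or sorting identical morphemes by number of occurrences.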
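As a rough sketch of what adding such parameters might look like, the snippet below only builds the request URI and does not send it. The parameter names and values (`results` set to `uniq` for the frequency-ordered list, `filter` set to `9` for nouns) are written from memory of the V1 documentation and should be treated as assumptions to check against the official parameter reference. `buildParseUri()` is a hypothetical helper introduced here for illustration.

```php
<?php
// Hypothetical helper: merge optional parameters into the base query
// and return the full request URI (no network access happens here).
function buildParseUri(string $appId, string $sentence, array $options = array()): string
{
    $params = array_merge(
        array('appid' => $appId, 'sentence' => $sentence),
        $options
    );
    // http_build_query() URL-encodes every key and value.
    return 'http://jlp.yahooapis.jp/MAService/V1/parse?' . http_build_query($params);
}

$uri = buildParseUri('YOUR_APPLICATION_ID', 'これはテストの文章だよ。', array(
    'results' => 'uniq', // assumed: ask for the list sorted by occurrence count
    'filter'  => '9',    // assumed: part-of-speech code for nouns
));
echo $uri, PHP_EOL;
```

Keeping the options in an array means extra parameters can be added or removed without touching the string concatenation.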

May 23, 2011
