Japanese Question-Answering Corpus (JQAC)

What's JQAC?

The Japanese Question-Answering Corpus (JQAC) is a dataset of question-answer pairs written manually by university students, based on a set of Japanese Wikipedia articles and some public documents.

(distributed under the CC BY-SA 4.0 license)

Categories

| Category | Number of Themes | Author (1) | Number of Questions (1) | Author (2) | Number of Questions (2) | Total |
|---|---|---|---|---|---|---|
| 学問 (Academic Discipline) | 10 | KA | 100 | - | 0 | 100 |
| 技術 (Technology) | 10 | SI | 99 | KI | 57 | 156 |
| 自然 (Nature) | 11 | SI | 83 | KI | 63 | 149 |
| 社会 (Society) | 11 | KI | 66 | SI | 66 | 132 |
| 地理 (Geography) | 10 | KA | 100 | - | 0 | 100 |
| 人間 (Humans) | 10 | SA | 74 | - | 0 | 74 |
| 文化 (Culture) | 10 | HI | 138 | - | 0 | 138 |
| 歴史 (History) | 10 | YA | 60 | - | 0 | 60 |
| 徳島大学シラバス (Tokushima University Syllabus) | 10 | KA | 60 | YA | 60 | 120 |

Format

The JQAC data consists of nine CSV files encoded in UTF-8. All sentences are written in Japanese. Each file is organized into the following columns:

| Theme (Category) | Topic (Title) | Question (What, Who, Where, Whose, How, Yes/No) | Answer | Difficulty (by Author) | Difficulty (by Answerer) | URL (Original Content) |
|---|---|---|---|---|---|---|
| 学問 | アリストテレス | アリストテレスは誰の弟子ですか? | プラトン | 5 | | https://ja.wikipedia.org/wiki/%E3%82%A2%... |
| 学問 | アリストテレス | アリストテレスは紀元前何年に出生しましたか? | 紀元前384年 | 5 | | https://ja.wikipedia.org/wiki/%E3%82%A2%E3%... |
| 学問 | アリストテレス | 紀元前367年,アリストテレスはどこに入門しましたか? | アカデメイア | 5 | | https://ja.wikipedia.org/wiki/%E3%82%A2%E3%... |
| 学問 | アリストテレス | アリストテレスは師プラトンから何と評されましたか? | 学校の精神 | 5 | | https://ja.wikipedia.org/wiki/%E3%82%A2%E3%... |
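
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not the project's own loader. It makes several assumptions: the key names in JQAC_COLUMNS and the function read_jqac are hypothetical, the files are assumed to have no header row, fields are assumed to appear in the order listed above, and the single difficulty value in the sample is assumed to be the author's rating.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical key names derived from the column list above; the real files
# may include a header row or order the fields differently.
JQAC_COLUMNS = [
    "theme", "topic", "question", "answer",
    "difficulty_author", "difficulty_answerer", "url",
]


def read_jqac(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines into one dict per question-answer pair."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        # Pad short rows so that every record carries all seven keys.
        padded = row + [""] * (len(JQAC_COLUMNS) - len(row))
        records.append(dict(zip(JQAC_COLUMNS, padded)))
    return records


# In-memory sample based on the first row shown above (URL left truncated).
sample = (
    "学問,アリストテレス,アリストテレスは誰の弟子ですか?,プラトン,5,,"
    "https://ja.wikipedia.org/wiki/%E3%82%A2%...\n"
)
records = read_jqac(io.StringIO(sample))
print(records[0]["question"], "->", records[0]["answer"])
```

For a file on disk, pass an object opened with open(path, encoding="utf-8", newline="") instead of the in-memory sample. Using csv.reader rather than splitting on commas matters here because some questions contain commas and would have to be quoted in the file.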

Download

The latest JQAC dataset is available here: jqac20180625.tgz
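
Once the archive has been downloaded, its contents can be inspected with Python's standard tarfile module. This is a minimal sketch that assumes the .tgz file is a gzip-compressed tar archive containing the CSV files; the function name list_csv_members is hypothetical, and the internal directory layout of the archive is not known from this page.

```python
import tarfile
from typing import List


def list_csv_members(archive_path: str) -> List[str]:
    """Return the sorted names of .csv files inside a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".csv")
        )


# Example call, assuming the archive sits in the current directory:
# print(list_csv_members("jqac20180625.tgz"))
```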

Feel free to send me any questions or comments regarding this project and dataset.

Reference

Acknowledgments

This work was supported by Works Applications Co., Ltd.

This content has been managed by Hiroki Tanioka (taniokah[at]gmail.com) since 2018.