sheetsvicky/Zizhitongjian

Zizhitongjian

A semantic analysis of outstanding personalities in Zizhi Tongjian

1 Segment Zizhi Tongjian (https://en.wikipedia.org/wiki/Zizhi_Tongjian), a major work of Chinese historiography compiled under the direction of Sima Guang. It records Chinese history from 403 BC to 959 AD, covers 16 dynasties, and runs to about 3 million Chinese characters.

1.1 Initiate segmentation with jieba (https://github.com/fxsjy/jieba).

1.2 Remove symbols and stop words. The stop words were chosen manually from the 1000 most frequent words after segmentation.

1.3 Unify the words that refer to the same person, for each of the following people ('吴起','韩信','霍去病','邓禹','关羽','诸葛亮','桓温','王猛','秦叔宝'). Do the same for the negative controls ('赵高','贾充','李林甫').

Example:

温与秦丞相雄等战于白鹿原,温兵不利 (original document)

'温', '与', '秦丞相', '雄', '等', '战', '于', '白鹿原', ',', '温', '兵', '不利' (segmentation)

'桓温', '秦丞相', '雄', '战', '白鹿原', '桓温', '兵', '不利' (remove symbols and stop words, unify names '温' ==> '桓温')
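The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `STOP_WORDS` set and the `ALIASES` mapping are hypothetical and contain only what is needed to reproduce the example sentence, and the helper names are made up here. `jieba.lcut` is jieba's list-returning segmentation call; its output for a given sentence may differ from the tokens shown above. A single-character alias such as '温' is ambiguous across the whole text, so a real mapping would likely need context.

```python
import unicodedata
from collections import Counter

# Hypothetical values for illustration only; the README does not list them.
STOP_WORDS = {"与", "等", "于"}
ALIASES = {"温": "桓温"}


def segment(text):
    """Segment raw text with jieba (third-party package, imported lazily)."""
    import jieba
    return jieba.lcut(text)


def is_symbol(token):
    """True if the token is blank or made only of punctuation/symbols/separators."""
    stripped = token.strip()
    return not stripped or all(
        unicodedata.category(ch)[0] in "PSZ" for ch in stripped
    )


def stop_word_candidates(tokens, n=1000):
    """Return the n most frequent tokens, to be reviewed by hand as stop words."""
    return [tok for tok, _ in Counter(tokens).most_common(n)]


def clean(tokens, stop_words=STOP_WORDS, aliases=ALIASES):
    """Drop symbols and stop words, then map aliases to a canonical name."""
    kept = []
    for tok in tokens:
        if is_symbol(tok) or tok in stop_words:
            continue
        kept.append(aliases.get(tok, tok))
    return kept


# Tokens copied from the segmentation example above.
tokens = ['温', '与', '秦丞相', '雄', '等', '战', '于', '白鹿原', ',', '温', '兵', '不利']
cleaned = clean(tokens)
```

Keeping the cleaning step separate from `segment` means the stop-word list and alias table can be adjusted and re-applied without re-running segmentation.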

2 Word2vec analysis (https://en.wikipedia.org/wiki/Word2vec). Word2vec uses shallow neural networks to map words to vectors in a high-dimensional space.

2.1 Implement Word2vec in gensim (https://radimrehurek.com/gensim/index.html)

2.2 Visualize posthumous names from (https://zh.wikipedia.org/wiki/%E8%B0%A5%E5%8F%B7) with t-SNE and choose indices for evaluating personality ('义','智','忠','德','勇','仁').

[Figure: t-SNE visualization of posthumous names (image not shown)]
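A minimal sketch of the training step, assuming gensim 4.x (where the dimension parameter is called `vector_size`) and assuming the cleaned corpus is stored as one sentence per line with tokens separated by spaces. The hyperparameter values, the helper names, and the file name in the usage comment are illustrative assumptions; the README does not state the settings actually used.

```python
def read_corpus(lines):
    """Turn whitespace-separated token lines into a list of token lists."""
    return [line.split() for line in lines if line.strip()]


def train_word2vec(sentences, vector_size=100, window=5, min_count=5, seed=0):
    """Train a Word2vec model with gensim (third-party, imported lazily).

    All parameter values here are placeholders, not the author's settings.
    workers=1 is used so that a fixed seed gives more repeatable runs.
    """
    from gensim.models import Word2Vec
    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        window=window,
        min_count=min_count,
        seed=seed,
        workers=1,
    )


# Hypothetical usage ("corpus.txt" is an assumed file name):
# with open("corpus.txt", encoding="utf-8") as f:
#     model = train_word2vec(read_corpus(f))
# vector = model.wv["桓温"]   # learned vector for a unified name
```

Because the names were unified before training, each person is represented by a single vector rather than by several partial ones.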

3 Compare personalities

3.1 Visualize personalities by t-SNE

[Figure: t-SNE visualization of the personalities (image not shown)]
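One possible way to produce such a projection, assuming scikit-learn's `TSNE` and a vector lookup that supports `in` and indexing (gensim's `model.wv` does, as does a plain dict). The function names and the perplexity value are assumptions for illustration. Note that t-SNE requires the perplexity to be smaller than the number of points, so the library default of 30 would not work for twelve names.

```python
# Names taken from step 1.3 of this README.
PEOPLE = ['吴起', '韩信', '霍去病', '邓禹', '关羽', '诸葛亮', '桓温', '王猛', '秦叔宝']
CONTROLS = ['赵高', '贾充', '李林甫']


def collect_vectors(lookup, words):
    """Return (labels, vectors) for the words that exist in the lookup."""
    labels = [w for w in words if w in lookup]
    return labels, [lookup[w] for w in labels]


def project_2d(vectors, perplexity=5.0, seed=0):
    """Project vectors to 2-D with t-SNE (numpy and scikit-learn required).

    perplexity must be smaller than the number of vectors.
    """
    import numpy as np
    from sklearn.manifold import TSNE
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(np.asarray(vectors, dtype=float))


# Hypothetical usage with a trained gensim model named `model`:
# labels, vectors = collect_vectors(model.wv, PEOPLE + CONTROLS)
# points = project_2d(vectors)   # one (x, y) row per label
```

Filtering through `collect_vectors` first avoids a lookup error when a name was dropped from the vocabulary, for example by a `min_count` threshold.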

3.2 Compare them by their similarity scores against the indices

[Figure: similarity scores against the indices (image not shown)]
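The README does not say how the similarity score is computed. A reasonable assumption is cosine similarity between a person's vector and each index word's vector, which is also what gensim's `model.wv.similarity` returns. The sketch below implements that in plain Python; `score_person` is a hypothetical helper, and the vectors in the usage lines are made-up toy values, not results.

```python
import math

# Index words taken from step 2.2 of this README.
INDICES = ['义', '智', '忠', '德', '勇', '仁']


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_person(lookup, person, indices=INDICES):
    """Map each index word found in the lookup to its similarity with person."""
    return {
        idx: cosine(lookup[person], lookup[idx])
        for idx in indices
        if idx in lookup
    }


# Toy 2-D vectors, invented only to show the call shape.
toy = {"桓温": [1.0, 0.0], "勇": [1.0, 0.0], "仁": [0.0, 1.0]}
scores = score_person(toy, "桓温")
```

With a trained gensim model, passing `model.wv` as `lookup` should work the same way, since it supports membership tests and indexing by word.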

3.3 Interesting results

[Two figures (images not shown)]
