There is an emerging “field” in places like North America and Europe that people are calling the “Digital Humanities.” People who work in this field are employing digital tools to enhance the traditional work of humanities scholars, and they are also thinking critically about how digital media might be transforming the way that we produce and process knowledge.

One tool that some scholars employ is software for text mining. Through text mining, scholars can search through large quantities of text to try to detect certain patterns that they can then examine more closely through the traditional techniques of humanities scholars – namely, the close reading of texts.

I decided to try a simple experiment with Voyant Tools, a free online tool for basic text mining (http://voyant-tools.org/).

I input the text of the first chapter of the Đại Việt sử ký toàn thư, both in the original classical Chinese and in Vietnamese translation.

There are various ways that you can analyze the text. Voyant creates a “word cloud” and shows how frequently each word is used. You can produce graphs of a word’s usage over the course of the text simply by clicking on that word in the text. And finally, you can see the words in their contexts.
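
For readers curious about what such a tool is doing underneath, here is a minimal Python sketch of two of these features: counting word frequencies and showing each occurrence of a word with a few words of context. This is not Voyant's own code, and the function names are made up for illustration. It assumes the text is already split into words by spaces or punctuation.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    NFC normalization keeps Vietnamese letters with diacritics as single
    code points, so that "nước" is not broken apart at its tone marks.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", normalized)


def word_frequencies(text: str) -> Counter:
    """Count how many times each token appears."""
    return Counter(tokenize(text))


def keyword_in_context(text: str, keyword: str, window: int = 3):
    """Return (left, keyword, right) for every occurrence of the keyword."""
    tokens = tokenize(text)
    target = unicodedata.normalize("NFC", keyword).lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # A toy string assembled from phrases quoted in this post,
    # not a real passage from the translation.
    sample = "muôn nước và nước Xích Quỷ"
    print(word_frequencies(sample).most_common(3))
    for left, word, right in keyword_in_context(sample, "nước", window=1):
        print(f"{left} [{word}] {right}")
```

The context listing is the part that matters most for a humanities scholar: the count only tells you where to look, while the surrounding words let you start judging each occurrence.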

In order to data mine the classical Chinese text, I had to put a space between each of the characters so that the software could recognize them as separate.
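
Adding those spaces by hand is tedious for a long text. A small Python sketch along the following lines could do it automatically. The function names are hypothetical, and the check for what counts as a Chinese character is a rough one based on Unicode ranges, so treat this as a starting point rather than a finished tool.

```python
def is_han(ch: str) -> bool:
    """Rough check for Han (Chinese) characters by Unicode range."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF     # Extension A
        or 0xF900 <= cp <= 0xFAFF     # Compatibility Ideographs
        or 0x20000 <= cp <= 0x2FFFF   # supplementary ideographic plane
    )


def space_out_han(text: str) -> str:
    """Surround every Han character with spaces so that software which
    splits on whitespace treats each character as a separate word.

    Note: all runs of whitespace, including line breaks, are collapsed
    to single spaces in the result.
    """
    pieces = []
    for ch in text:
        pieces.append(f" {ch} " if is_han(ch) else ch)
    return " ".join("".join(pieces).split())


if __name__ == "__main__":
    print(space_out_han("我越之基"))
```

Treating every character as its own word is a simplification, since classical Chinese also has two-character compounds such as 萬國, but it is the same simplification I made by hand.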

So did the results reveal anything interesting? Kind of. I can see, for instance, how using this software could help a person analyze Vietnamese language translations of classical Chinese texts.

Take a look at the frequencies.

In the classical Chinese text, the term “quốc” (國) appears 27 times. In the Vietnamese translation, the term “nước,” meaning “country,” appears 46 times. Perhaps in some cases nước is used to refer to “water” rather than a “country.” But perhaps this is a sign that the translator injected a term in places where it did not belong.

Whatever the case may be, this numerical discrepancy leads one to want to investigate the issue further by doing what humanities scholars have always done, that is, to read the text closely.

In the first instance, vạn quốc (萬國, “the ten thousand kingdoms”) is translated as muôn nước. Then Xích Quỷ quốc (赤鬼國, “the Scarlet Ghost Kingdom”) is translated as nước Xích Quỷ.

The above two translations are very straightforward and unproblematic. The third time that the term “nước” appears in the translation, however, is not as clear-cut. There we find that “Ngã Việt chi cơ” (我越之基, “the enterprise of We/Our Việt”) is translated as cơ nghiệp của nước Việt ta.

First of all, in the original there is no term here for kingdom/country (國) as there is in the first two cases. So the translator added the term “nước.” Was that addition warranted?

Does Ngã Việt refer to a “country”? How do we know? Was there ever a kingdom called “Ngã Việt”? Did the term ever appear in expressions like “Ngã Việt chi sơn hà” (我越之山河) to refer to the “mountains and rivers” or “territory” of Ngã Việt?

From my reading of original texts, I don’t get the sense that Ngã Việt was used to refer to a “country” in a territorial sense. Instead, I get the sense that it was a concept that was restricted to the elite, and which referred mainly to the existence of a political tradition, as the extended phrase “Ngã Việt chi cơ” (我越之基, “the enterprise of We/Our Việt”) indicates here.

So if I were translating this, I would not add the word “nước” here. It’s not in the original, and to add it distorts the way that the premodern elite viewed the world.

Adding that term, however, does make the past fit the way we view the world in the present. But in the process, the uniqueness of the past is erased.

In any case, it just struck me that this might be one way that text mining could be put to productive use, namely by helping to identify issues to examine, and then enabling a scholar to focus in on the issue and engage in a close textual reading.
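
As a rough illustration of that workflow, the check I did by eye above could be sketched in Python as follows. It assumes that someone has already matched each phrase of the original with its translation by hand, which text mining software does not do for you. The three pairs are the ones discussed in this post, and the function name is made up for the example.

```python
import re
import unicodedata

# Phrase pairs quoted earlier in this post: (classical Chinese, Vietnamese).
PAIRS = [
    ("萬國", "muôn nước"),
    ("赤鬼國", "nước Xích Quỷ"),
    ("我越之基", "cơ nghiệp của nước Việt ta"),
]


def flag_added_term(pairs, source_char, target_word):
    """Return the pairs where the translation uses target_word
    but the original does not contain source_char.

    These are only candidates for close reading, not verdicts.
    """
    target = unicodedata.normalize("NFC", target_word).lower()
    flagged = []
    for source, translation in pairs:
        normalized = unicodedata.normalize("NFC", translation).lower()
        words = re.findall(r"\w+", normalized)
        if target in words and source_char not in source:
            flagged.append((source, translation))
    return flagged


if __name__ == "__main__":
    for source, translation in flag_added_term(PAIRS, "國", "nước"):
        print(source, "->", translation)
```

Run on these three pairs, the sketch singles out 我越之基, which is exactly the case that called for a closer look. It cannot say whether the added word was warranted. That still takes a human reader.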