These wordlists are extracted from Wikipedia. You can use them under the terms of the GNU Free Documentation License.

The processed dumps are:
- dewiki-20060123-pages-articles.xml
- enwiki-20060125-pages-articles.xml

en-2 contains all words that match the regular expression [a-zA-Z0-9_\-\.\+\#\~\$\!\%\&]+ and occur at least twice. en-5 is a smaller subset that contains only the words that occur at least five times. de-2 and de-5 are built in almost the same way from the German export.
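
The filtering described above can be sketched in Python as follows. This is only an illustration, not the script that produced the lists. The function name build_wordlist is hypothetical. The sketch assumes that counting is case-sensitive, that the thresholds mean "at least N occurrences", and that the input is plain text lines. It does not parse the XML dump.

```python
import re
from collections import Counter

# Pattern quoted from the description above. The backslashes inside the
# character class are redundant in Python but harmless.
WORD_RE = re.compile(r"[a-zA-Z0-9_\-\.\+\#\~\$\!\%\&]+")


def build_wordlist(lines, min_count):
    """Return the sorted words that occur at least min_count times.

    Assumption: case-sensitive counting over plain text lines.
    """
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line))
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    sample = ["foo bar foo", "bar foo C++ C++"]
    print(build_wordlist(sample, 2))  # -> ['C++', 'bar', 'foo']
    print(build_wordlist(sample, 5))  # -> []
```

With the same input, a list built with min_count=5 is always a subset of the one built with min_count=2, which matches the relationship between en-5 and en-2.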