Hello World

Hello world!

This is my first post!

I am just going to create a word cloud of the phrase ‘Hello world’ in several languages. The translations are scraped from ReversoContext. The goal of the post is just to say hello, so I won't describe the methods any further. I hope the blog will be at least a bit enjoyable 😀

import re
import requests

import lxml
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from wordcloud import WordCloud

# List of languages
langs = [
    "German",
    "Spanish",
    "French",
    "Dutch",
    "Polish",
    "Portuguese",
    "Romanian",
    "Russian",
    "Turkish",
    "Italian",
]

def request(user_l, trans_l, word):
    # Send a GET request to the ReversoContext page for this language pair
    url = f"https://context.reverso.net/translation/{user_l.lower()}-{trans_l.lower()}/{word.lower()}"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
    }
    response = requests.get(url, headers=headers)
    return response

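The phrase `Hello world` contains a space, and as far as I know `requests` percent-encodes it before sending. Here is a minimal sketch of making that explicit with the standard library. `build_url` is a hypothetical helper name, not part of the original code, and I am assuming the site accepts `%20` for the space.

```python
from urllib.parse import quote


def build_url(user_l, trans_l, word):
    """Build the translation URL with the phrase percent-encoded."""
    # quote() turns the space in "hello world" into %20
    phrase = quote(word.lower())
    return (
        "https://context.reverso.net/translation/"
        f"{user_l.lower()}-{trans_l.lower()}/{phrase}"
    )


print(build_url("english", "German", "Hello world"))
# -> https://context.reverso.net/translation/english-german/hello%20world
```
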
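def get_trans(response, trans_l):
    # Scrape translations for one language from a ReversoContext response.
    # trans_l is unused here but kept so existing calls keep working.
    src = response.content
    soup = BeautifulSoup(src, "lxml")
    translations = soup.find_all(class_=re.compile("^translation"))
    # Skip the first two matches; str.strip() treats its argument as a set of
    # characters (not a regex), so this trims whitespace, commas, "(", ")" and "|"
    translations = [
        a.text.strip("(\r\n|\n) ,") for i, a in enumerate(translations) if i > 1
    ]
    # Drop entries that still contain the English words
    to_remove = re.compile(r"(world|hello)", re.IGNORECASE)
    translations = filter(lambda x: not to_remove.search(x), translations)
    translations = map(lambda x: x.replace(", ", " "), translations)
    return translations
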
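To see what the cleaning steps in `get_trans` do without hitting the website, here is a small sketch that runs the same strip, filter and replace on made-up strings. The sample data and the `clean` helper are my own assumptions, not real scraped output, and the sketch leaves out the step that skips the first two matches.

```python
import re

# Characters to trim from both ends; str.strip() treats the argument
# as a set of characters, not as a regular expression.
TRIM_CHARS = "(\r\n|\n) ,"
TO_REMOVE = re.compile(r"(world|hello)", re.IGNORECASE)


def clean(raw_items):
    """Trim, drop entries still containing the English words, flatten ', '."""
    trimmed = (item.strip(TRIM_CHARS) for item in raw_items)
    kept = (item for item in trimmed if not TO_REMOVE.search(item))
    return [item.replace(", ", " ") for item in kept]


# Made-up sample strings, not real scraped output
sample = ["\r\n  Hallo Welt\n", " Hello World ", "Bonjour, le monde ,"]
print(clean(sample))  # -> ['Hallo Welt', 'Bonjour le monde']
```

One side effect worth knowing: because the argument is a character set, leading or trailing parentheses and pipes are trimmed too, so `"(informal)"` would come out as `"informal"`.
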
def get_all(user_l, word):
    # Collect translations for all languages in a dict with frequency 1
    # (generate_from_frequencies needs a word -> frequency mapping)
    d = {"Hello world": 1}
    for v in langs:
        response = request(user_l, v, word)
        for x in get_trans(response, v):
            d[x] = 1
    # That's quite a weird translation; pop() avoids a KeyError if it is absent
    d.pop("Hallo-Welt-Programm", None)
    return d

translations = get_all("english", "Hello world")

def plot_cloud(wordcloud):
    # Set figure size
    plt.figure(figsize=(15, 15))
    # Display image
    plt.imshow(wordcloud)
    # No axis details
    plt.axis("off")

wordcloud = WordCloud(
    mode="RGBA", width=3000, height=2000, random_state=1
).generate_from_frequencies(translations)

plot_cloud(wordcloud)

[Image: word cloud]

Seems like the work is done. Languages such as Arabic or Japanese are not included, because their scripts are a bit tricky for the word cloud library. So I guess I can say hello world with a clear conscience 🤖
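For anyone who wants to try those scripts anyway, here is a rough sketch of one possible approach: check which scripts the words use, then pass `WordCloud` a font that covers them through its `font_path` parameter. The helper names and the font path in the usage note are hypothetical, and I have not tried this on the scraped data. As far as I know, Arabic additionally needs its letters reshaped and reordered before drawing, which this sketch does not do.

```python
import unicodedata


def scripts_used(words):
    """Return the set of script names (e.g. 'LATIN', 'ARABIC') found in words."""
    scripts = set()
    for word in words:
        for ch in word:
            if ch.isalpha():
                # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A"
                scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def cloud_with_font(frequencies, font_path):
    """Build a word cloud with a font assumed to cover the scripts in frequencies."""
    # Imported here so scripts_used() works without wordcloud installed
    from wordcloud import WordCloud

    return WordCloud(
        font_path=font_path, mode="RGBA", width=3000, height=2000, random_state=1
    ).generate_from_frequencies(frequencies)


print(sorted(scripts_used(["Hallo Welt", "Привет мир", "مرحبا بالعالم"])))
# -> ['ARABIC', 'CYRILLIC', 'LATIN']
```

A call would then look like `cloud_with_font(translations, "/path/to/some-font.ttf")`, where the path is a placeholder for any font file on your machine that contains the needed glyphs.
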