01 Mar 2021
Planet Python
John Ludhi/nbshare.io: Opinion Mining Aspect Level Sentiment Analysis
Opinion Mining - Aspect Level Sentiment Analysis
Aspect-level sentiment analysis employs multiple machine learning processes. The first is parsing the sentence to extract the relations between words and identify the aspects of a review. The second is analysing the sentiment of the adjectives used to describe those aspects.

This can be done automatically using Azure's Text Analytics service. All we need to do is create a free account on Microsoft Azure and create a Text Analytics service: link
- Once you have created your account and logged in, go to the Azure portal.
- Search for Text Analytics and create a new service.
- It will ask for a resource group; click on "create new".
- Choose the free tier which works fine for personal experimentation.
- Once the service is created, go to your resources and look for Keys and Endpoints, copy the keys and put them in the following cell.
KEY = "PUT THE KEY HERE" ENDPOINT = "PUT THE ENDPOINT HERE"
This function is just a helper to authenticate your credentials and connect with Azure. We can then communicate with the Azure ML service through the client object.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

def authenticate_client():
    ta_credential = AzureKeyCredential(KEY)
    text_analytics_client = TextAnalyticsClient(
        endpoint=ENDPOINT,
        credential=ta_credential)
    return text_analytics_client

client = authenticate_client()  # we will interact with Azure ML via this object.
We will use Jupyter's widgets to create an interactive tool for opinion mining.
import ipywidgets as widgets
We will use the Plotly library for interactive visualizations.
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode
from plotly.subplots import make_subplots

init_notebook_mode()  # this line is required to be able to export the notebook as html with the plots.
# given three scores (positive - neutral - negative) this function plots a pie chart of the three sentiments
def plot_sentiment_scores(pos, neut, neg):
    return go.Figure(go.Pie(labels=["Positive", "Neutral", "Negative"],
                            values=[pos, neut, neg],
                            textinfo='label+percent',
                            marker=dict(colors=["#2BAE66FF", "#795750", "#C70039"])),
                     layout=dict(showlegend=False))
Azure's Text Analytics analyzes documents, not just sentences. Each document is a list of sentences, so our input must be a list of sentences.
We can use our Azure client to call the analyze_sentiment method, which returns a list of sentiment results, one per passed document. Since we are just using one document with one sentence, we are interested in the first result it returns, which contains three confidence scores: positive, neutral, and negative.
response = client.analyze_sentiment(documents=["This movie is fantastic"])
response
response[0]
AnalyzeSentimentResult(id=0, sentiment=positive, warnings=[], statistics=None, confidence_scores=SentimentConfidenceScores(positive=1.0, neutral=0.0, negative=0.0), sentences=[SentenceSentiment(text=This movie is fantastic, sentiment=positive, confidence_scores=SentimentConfidenceScores(positive=1.0, neutral=0.0, negative=0.0), offset=0, mined_opinions=[])], is_error=False)
print(f"Positive: {response[0].confidence_scores.positive}") print(f"Neutral: {response[0].confidence_scores.neutral}") print(f"Negative: {response[0].confidence_scores.negative}")
Positive: 1.0
Neutral: 0.0
Negative: 0.0
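The analyze_sentiment call also accepts several documents at once and returns one result per document, in the same order. Below is a minimal sketch assuming the client object created earlier; the example texts are made up.

docs = ["The plot was thin and predictable.",
        "Great soundtrack and a solid cast."]

results = client.analyze_sentiment(documents=docs)
for doc, result in zip(docs, results):
    # each result corresponds to the document at the same position in the input list
    if not result.is_error:
        print(doc, "->", result.sentiment)
        print("  scores:", result.confidence_scores.positive,
              result.confidence_scores.neutral,
              result.confidence_scores.negative)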
Let's put all of this in a function that takes a string of one or more sentences as input and plots the distribution of sentiment scores as pie charts!
def sentiment_analysis_example(sentences):
    document = [sentences]  # we use only one document for this function
    response = client.analyze_sentiment(documents=document)[0]  # we use [0] to get only the first and only document
    print("Document Sentiment: {}".format(response.sentiment))
    plot_sentiment_scores(response.confidence_scores.positive,
                          response.confidence_scores.neutral,
                          response.confidence_scores.negative
                          ).show()

    # here we plot the sentiment for each sentence in the document.
    for idx, sentence in enumerate(response.sentences):
        print("Sentence: {}".format(sentence.text))
        print("Sentence {} sentiment: {}".format(idx+1, sentence.sentiment))
        plot_sentiment_scores(sentence.confidence_scores.positive,
                              sentence.confidence_scores.neutral,
                              sentence.confidence_scores.negative
                              ).show()
sentiment_analysis_example("The acting was good. The graphics however were just okayish. I did not like the ending though.")
Document Sentiment: mixed

Sentence: The acting was good.
Sentence 1 sentiment: positive

Sentence: The graphics however were just okayish.
Sentence 2 sentiment: negative

Sentence: I did not like the ending though.
Sentence 3 sentiment: negative

Aspect-level opinion mining goes beyond reporting the overall sentiment of a sentence in two main ways:
- We extract specific aspects in the sentences.
- We detect the opinion about the aspect in the text, not just a sentiment score.
response = client.analyze_sentiment(
    ["The food and service were unacceptable and meh, but the concierge were nice and ok"],
    show_opinion_mining=True  # only addition is that we set `show_opinion_mining` to True
)[0]
# now we can also access the mined_opinions in a sentence
mined_opinion = response.sentences[0].mined_opinions[0]
aspect = mined_opinion.aspect
print(f"Aspect: {aspect.text}")
for opinion in mined_opinion.opinions:
    print(f"Opinion: {opinion.text}\tSentiment:{opinion.sentiment}".expandtabs(12))
    # p.s. we use expandtabs because unacceptable is longer than 8 characters,
    # so we want the \t to consider it one long word
Aspect: food
Opinion: unacceptable   Sentiment:negative
Opinion: meh            Sentiment:mixed
Let's make this more visual.
def plot_sentiment_gauge(pos_score, title, domain=[0, 1]):
    fig = go.Figure(go.Indicator(
        mode="gauge+number",
        value=pos_score,
        gauge={'axis': {'range': [0, 1]}},
        domain={'x': domain, 'y': [0, 1]},
        title={'text': f"{title}", "font": dict(size=14)}),
        layout=dict(width=800, height=600, margin=dict(l=150, r=150)))
    return fig
def sentiment_analysis_with_opinion_mining_example(sentences,
                                                   document_level=True,
                                                   sentence_level=True,
                                                   aspect_level=True,
                                                   opinion_level=True):
    document = [sentences]
    response = client.analyze_sentiment(document, show_opinion_mining=True)[0]

    if document_level:  # plotting overall document sentiment
        print("Document Sentiment: {}".format(response.sentiment))
        plot_sentiment_scores(response.confidence_scores.positive,
                              response.confidence_scores.neutral,
                              response.confidence_scores.negative
                              ).show()

    if not (sentence_level or aspect_level or opinion_level):
        # no need to continue if no plots are needed
        return response

    for sentence in response.sentences:
        if sentence_level:  # plotting the overall sentence sentiment
            print(f"Sentence: {sentence.text}")
            print(f"Sentence sentiment: {sentence.sentiment}")
            plot_sentiment_scores(
                sentence.confidence_scores.positive,
                sentence.confidence_scores.neutral,
                sentence.confidence_scores.negative).show()

        for mined_opinion in sentence.mined_opinions:
            aspect = mined_opinion.aspect

            if aspect_level:  # plotting the sentiment of the aspect
                plot_sentiment_gauge(
                    aspect.confidence_scores.positive,
                    f"Aspect ({aspect.text})").show()

            if opinion_level:
                opinions = mined_opinion.opinions
                n = len(opinions)
                gauges = list()
                for i, opinion in enumerate(opinions, start=1):
                    gauges.append(plot_sentiment_gauge(
                        opinion.confidence_scores.positive,
                        f"Opinion ({opinion.text})",
                        # this is just to show the plots next to each other
                        domain=[(i-1)/n, i/n]
                    ).data[0])
                go.Figure(gauges, layout=go.Layout(
                    height=600, width=800, autosize=False)).show()
    return response
response = sentiment_analysis_with_opinion_mining_example(
    "The food and service were unacceptable and meh, but the concierge were nice and ok",
    document_level=False,
    sentence_level=False
)
Now let's create some Jupyter widgets to interact with this function.
# some text to get the input
text = widgets.Textarea(placeholder="Enter your text here")

# checkboxes to select different levels of analysis
document_cb = widgets.Checkbox(value=True, description="Document Level")
sentence_cb = widgets.Checkbox(value=True, description="Sentence Level")
aspect_cb = widgets.Checkbox(value=True, description="Aspect Level")
opinion_cb = widgets.Checkbox(value=True, description="Opinion Level")

# some button to trigger the analysis
btn = widgets.Button(description="Analyse")

# some place to show the output on
out = widgets.Output()

def analysis(b):
    with out:
        out.clear_output()
        sentences = text.value  # get the input sentences from the Textarea widget
        # pass the input sentences to our `sentiment_analysis_with_opinion_mining_example` function
        sentiment_analysis_with_opinion_mining_example(sentences,
                                                       document_level=document_cb.value,
                                                       sentence_level=sentence_cb.value,
                                                       aspect_level=aspect_cb.value,
                                                       opinion_level=opinion_cb.value)

btn.on_click(analysis)  # bind the button with the `analysis` function

# put all widgets together in a tool
checkboxes = widgets.VBox([document_cb, sentence_cb, aspect_cb, opinion_cb])
tool = widgets.VBox([widgets.HBox([text, checkboxes]), btn, out])

# give a default value for the text
text.value = "The food and service were unacceptable and meh, but the concierge were nice and ok"
tool
01 Mar 2021 10:41am GMT
Zero to Mastery: Python Monthly 💻🐍 February 2021
15th issue of Python Monthly! Read by 20,000+ Python developers every month. This monthly Python newsletter is focused on keeping you up to date with the industry and keeping your skills sharp, without wasting your valuable time.
01 Mar 2021 10:00am GMT
Tryton News: Newsletter for March 2021
Here's a sneak peek at the improvements that landed during the last month.
Changes for the User
We now show the carrier on the shipment list so it's possible to prioritize shipments based on the carrier.
We've added a wizard to make it easy to add lots to stock moves. The sequence to use for the lot number can be configured for each product.
We ensure the unit prices for stock moves are up to date when their invoices are posted or their moves are done.
The account move lines created by a statement now have the statement line as their origin. This makes it simpler to audit the accounts.
We now use the menu path from which a window was opened as its name.
We now warn the user when they try to post a statement that contains cancelled or paid invoices, and then remove those invoices from the statement.
A delivery usage checkbox has been added to contact mechanisms just like for addresses. It can be used, for example, to indicate which email address to send notifications related to deliveries.
The clients now display the revision on the dialog. This is useful, for example, when opening the party dialog from the invoice when the history is activated. This way the user can see from which date the information is displayed.
It is easy to get lost when quickly opening consecutive dialog fields. To improve the situation, the clients now display breadcrumbs in the title showing the browsing path to the dialog.
We've added the new identifiers from python-stdnum 1.15.
We no longer create accounting moves for stock when the amount involved is 0.
There is now a scheduled task that can be configured to fetch currency rates at a specific frequency. By default it gets the rates from the European Central Bank.
New Modules
Changes for the System Administrator
We've added device cookie support to the clients. This allows these clients to not be affected by the brute force attack protection.
Changes for the Developer
It is now possible to send emails with different "FROM" addresses for the envelope and header.
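As an illustration of the envelope/header distinction, here is a hedged sketch using plain Python smtplib (this is not Tryton's email API, and the addresses are made up):

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "noreply@example.com"          # header FROM, shown to the recipient
msg["To"] = "customer@example.com"
msg["Subject"] = "Your invoice"
msg.set_content("Hello!")

with smtplib.SMTP("localhost") as smtp:
    # the envelope FROM (used for routing and bounces) can differ from the header FROM above
    smtp.send_message(msg, from_addr="bounces@example.com")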
All the warnings can be skipped automatically by adding a single key named _skip_warnings to the context.
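As a hedged sketch of how such a key might be set, using Tryton's usual transaction-context pattern (the model and the delete call are made up for illustration; only the _skip_warnings key comes from this newsletter):

from trytond.pool import Pool
from trytond.transaction import Transaction

def delete_parties_without_warnings(party_ids):
    Party = Pool().get('party.party')
    # warnings raised by operations inside this context are skipped automatically
    with Transaction().set_context(_skip_warnings=True):
        Party.delete(Party.browse(party_ids))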
We've added the trigonometric functions to the SQLite back-end.
Any fields that are loaded eagerly are no longer instantiated automatically; instead the id is just stored in the cache. The instantiation is done only if the field is actually accessed. This improves the performance of some operations by up to 13%, but the actual improvement you can expect will depend a lot on the number of fields the model has.
It is now possible to define help text for each selection value. However, at the moment only the web client can display it.
We made the ModelView.parse_view method public. This allows the XML that makes up the view to be modified by code before it is sent to the client.
It is now possible to group the report renderings by header. As the OpenDocument format only supports a single header and footer definition, this feature renders a different file for each header and places them in a zip file if needed. This is used when rendering company related reports which display the company information in the header/footer.
In order to simplify the dependencies in our web client, we replaced tempusdominus with the browser's native input methods for the types date, datetime-local and time when available.
In order to make better use of the browse cache, the getter method of Function fields is called with cache-sized groups of records.
01 Mar 2021 9:00am GMT
Mike Driscoll: PyDev of the Week: Jonathan Hoffstadt
This week we welcome Jonathan Hoffstadt (@jhoffs1) as our PyDev of the Week! Jonathan is the co-author of Dear PyGUI. It's a neat, new Python GUI package. You can see what else Jonathan has been working on over on Github.
Let's spend some time getting to know Jonathan better!
Can you tell us a little about yourself (hobbies, education, etc):
I'm a mechanical engineer based in Houston, Texas. I have a bachelor's degree in Mechanical Engineering from Louisiana State University, was a Tow Gunner in the U.S. Marines, and I've been working in the oil and gas industry since I graduated university.
My hobbies include chess, shooting, and programming. With programming, I find 3D graphics to be extremely interesting.
Why did you start using Python?
I'd been interested in programming since middle school after I was given a C++ for dummies book as a gift, but I did not encounter Python until university. It was there that I started using Python as a free alternative to MATLAB for assignments. It wasn't long before I was hooked on the language.
I started using it outside of homework for anything and everything I could. This included making small games, automating tasks at internships, controlling breadboards with raspberry pi's, and everything in between. When compared to other languages, I was amazed at how quickly you could make things happen.
I ended up using Python for courses in Finite Element Analysis and Computational Fluid Dynamics. For our senior design capstone project, my team was tasked with building an Arc Welding 3D printer. As the member with the most exposure to programming, I was responsible for the software side of the project in which I used Python to control all the mechanical devices including a robotic arm and custom electronics the team created. I also wrote my first user interface which used tkinter and pygame to wrap an open source slicing engine and provide a 3D view of tool paths and the robotic arm position.
What other programming languages do you know, and which is your favorite?
C, C++, and Java are my other primary languages, though I've worked with C#, Swift, and Objective-C.
The truth is that I have 2 favorite languages. C++ for large projects. Python for small projects, scripting, and just getting things done!
What projects are you working on now?
I currently spend most of my time working on Dear PyGui.
Which Python libraries are your favorite (core or 3rd party)?
My favorite Python libraries would have to be NumPy, Pillow, tqdm, json, and Nuitka.
How did your package, Dear PyGUI, come about?
Dear PyGui is a graphical user interface library I coauthored with my friend, Preston Cothren. As mechanical engineers, we use python daily for calculations, scripts, and plugins for various software used in mechanical design and analysis. We wanted an easy way to add an interface to the various scripts with minimal effort.
The first iteration of the software was called "Engineer's Sandbox", and it was commercial. Not only was it easy to create small interfaces, but it also made it easy to package and distribute them. It came with a built-in IDE and 60 premade apps. "Sandbox" was a C++ program that embedded Python, with the graphics created with wxWidgets. Ultimately, this project was unsuccessful, gaining only a few hundred users. You can see an image of it below:
Six months after abandoning Engineer's Sandbox, we revisited the idea and reassessed. We came to 3 realizations:
1. Our primary target audience (mechanical engineers) were mostly uninterested in programming.
2. The software was too restrictive and limited for developers (our second target audience). Limited widgets, layouts, limited 3rd party operability, etc.
3. Most developers prefer using open source libraries.
From these realizations, we went back to the drawing board and decided to make a full GUI library, with this iteration being open source, a Python extension (instead of standalone software), and as easy to use as possible.
Between iterations, we fell in love with the C++ library, Dear ImGui, and so decided to use it as the underlying library to build around. With Dear ImGui being an immediate mode GUI library, it allowed us to make Dear PyGui extremely dynamic when compared to other UI libraries.
Dear PyGui has continued to rapidly improve and grow in popularity since we released the first open beta in July of 2020:
What are the top three things you've learned as an open-source developer?
As an open-source developer, I've learned that:
1. It's hard work and will make you appreciate open-source software and developers.
2. Listen to the community but also know when to say "no".
3. Funding is difficult to find, so you should enjoy the work you are doing.
Is there anything else you'd like to say?
Yes! This is for those new to programming. I'm often asked how to learn a programming language, library, topic, etc. and my answer has always been: The best way to learn anything in programming is to just start building things.
I typically skim a book then immediately start trying to build something. As I get stuck, I go back to the book to read the relevant sections more closely. Once it's time to refactor and optimize, I typically go back to the book and read the more advanced sections now that I'm more aware of the issues that the advanced sections try to address. I've found this technique helps me a lot. Although you may end up reinventing the wheel by saving the advanced topics for after you're done, you will end up with a deeper understanding that you are unlikely to forget.
Thanks for doing the interview, Jonathan!
The post PyDev of the Week: Jonathan Hoffstadt appeared first on Mouse Vs Python.
01 Mar 2021 6:05am GMT
28 Feb 2021
Planet Python
Matthew Wright: Profiling Python code with line_profiler
Once we have debugged, working, readable (and hopefully testable) code, it may become important to examine it more closely and try to improve the code's performance. Before we can make any progress in determining if our changes are an improvement, we need to measure the current performance and see where it is spending its time. … Continue reading Profiling Python code with line_profiler
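As a taste of the topic, here is a minimal, hypothetical sketch of line_profiler's Python API on a toy function (not taken from the article itself):

from line_profiler import LineProfiler

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = LineProfiler()
profiled_sum = profiler(slow_sum)   # wrap the function so every line is timed
profiled_sum(100_000)
profiler.print_stats()              # per-line hit counts, time, and % time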
The post Profiling Python code with line_profiler appeared first on wrighters.io.
28 Feb 2021 11:44pm GMT
Test and Code: 146: Automation Tools for Web App and API Development and Maintenance - Michael Kennedy
Building any software, including web apps and APIs requires testing.
There's automated testing, and there's manual testing.
In between that is exploratory testing aided by automation tools.
Michael Kennedy joins the show this week to share some of the tools he uses during development and maintenance.
We talk about tools used for semi-automated exploratory testing.
We also talk about some of the other tools and techniques he uses to keep Talk Python Training, Talk Python, and Python Bytes all up and running smoothly.
We talk about:
- Postman
- ngrok
- sitemap link testing
- scripts for manual processes
- using failover servers during maintenance, redeployments, etc
- GitHub webhooks and scripts to automatically switch between failover servers and production during deployments
- floating IP addresses
- services to monitor your site: StatusCake, BetterUptime
- the effect of monitoring on analytics
- crash reporting: Rollbar, Sentry
- response times
- load testing: Locust
Special Guest: Michael Kennedy.
Sponsored By:
- Linode: If it runs on Linux, it runs on Linode. Get started on Linode today with $100 in free credit for listeners of Test & Code.
Support Test & Code : Python Testing
Links:
- Python Bytes Podcast
- Talk Python To Me Podcast
- Talk Python Training
- Postman
- ngrok
- StatusCake
- Better Uptime
- Rollbar
- Sentry
- Locust
- 12 requests per second in Python
28 Feb 2021 11:00pm GMT
Codementor: Server deployment with Python: From A to Z.
In this tutorial you will learn how to configure a server and deploy a web app from scratch by using only Python.
28 Feb 2021 5:58pm GMT
27 Feb 2021
Planet Python
The Open Sourcerer: A new data format has landed in the upcoming GTG 0.5
Here's a general call for testing from your favorite pythonic native Linux desktop personal productivity app, GTG.
In recent months, Diego tackled the epic task of redesigning the XML file format from a new specification devised with the help of Brent Saner (proposal episodes 1, 2 and 3), and then implementing the new file format in GTG. This work has now been merged to the main development branch on GTG's git repository:
Diego's changes are major, invasive technological changes, and they would benefit from extensive testing by everybody with "real data" before 0.5 happens (very soon). I've done some pretty extensive testing & bug reporting in the last few months; Diego fixed all the issues I've reported so far, so I've pretty much run out of serious bugs now, as only a few remain targeted to the 0.5 milestone… But I'm only human, and it is possible that issues might remain, even after my troll-testing.
Grab GTG's git version ASAP, with a copy of your real data (for extra caution, and also because we want you to test with real data); see the instructions in the README, including the "Where is my user data and config stored?" section.
Please torture-test it to make sure everything is working properly, and report issues you may find (if any). Look for anything that might seem broken "compared to 0.4", incorrect task parenting/associations, incorrect tagging, broken content, etc.
If you've tried to break it and still couldn't find any problems, maybe one way to indicate that would be a "👍" on the merge request; I'm not sure we really have another way to know if it turns out that "everything is OK" 🙂
Your help in testing this (or spreading the word) will help ensure a smooth transition for users getting an upgrade from 0.4 to 0.5, letting us release 0.5 with confidence. Thanks!
27 Feb 2021 11:53pm GMT
Weekly Python StackOverflow Report: (cclxv) stackoverflow python report
These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2021-02-27 18:35:08 GMT
- Why aren't my list elements being swapped? - [11/1]
- Title words in a column except certain words - [8/4]
- specify number of spaces between pandas DataFrame columns when printing - [8/1]
- More effective / clean way to aggregate data - [5/5]
- Is there a way to write this if-else using min and max? - [5/4]
- What does the operator += return in Python - [5/4]
- Find missing numbers in a sorted column in Pandas Dataframe - [5/3]
- Perform best cycle sort knowing order at the end - [5/1]
- Making a scroll bar but its inconsistent - [5/1]
- Python import mechanism and module mocks - [5/1]
27 Feb 2021 8:42pm GMT
Corey Gallon: 3 Simple Steps to Build a Python Package for Conda Forge

Hey data hackers! We're all raving fans of Conda Forge - the community-led repository of recipes, build infrastructure and distributions for the conda package manager, right? It's a rich source of the most useful, updated libraries for Python (and many other languages, including R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN … the list goes on). You may well have found, though, that the library you're looking for isn't available in the repo, which is precisely what I found recently when building a machine learning model to predict song popularity, based on its musical attributes. What does one do in such a situation? Package the library yourself, of course! While this may seem a daunting task, we'll work through 3 simple steps to building a Python package for Conda Forge, and submitting it to the repository.
Why build a package for Conda Forge?
The excellent spotipy library, written and maintained by the similarly awesome data scientist and developer Paul Lamere (Director of the Developer Platform at Spotify), is (well … was) not available in Conda Forge. This library is a Python wrapper for the Spotify API. Recently, I wrote a machine learning model to predict a song's popularity based on its musical attributes. The data underlying the model were to be pulled from the Spotify API, using the spotipy library. Unfortunately, at the time I went to build the model, the only Python package repo offering spotipy was PyPI (the repo that 'pip' installs from). You, like me, may have learned the hard way that it is inadvisable to mix packages from both pip and conda in a conda environment. In order to get through my little machine learning project, I downloaded the spotipy source code from PyPI and both built and installed it locally. Wanting to write about the project, though, I realized that this approach is suboptimal for most who want to hackalong with the article, hence the decision to package spotipy myself.
"Enough background, already - let's build!"
Rightho! Enough background … let's get started! The good people at Conda Forge provide detailed instructions on how to contribute packages to the repo. We'll distill, from these detailed instructions, 3 simple steps to follow to build and contribute your Python package to Conda Forge.
- Fork the Conda Forge examples recipes repository at GitHub
- Tailor the example recipe for your package
- Submit a pull request to Conda Forge
Let's get into it!
1. Fork the Conda Forge examples repository at GitHub
Before we head over to GitHub, be sure that the source code for your package is available for download in a single file. This should be an archive (e.g. tarball, zip, etc …) of some kind. In this case, as we're packaging spotipy from PyPI, we can confirm that the source code is, indeed, available in a gzipped tarball there. You should confirm, similarly, for the package you plan to build and contribute.
We'll be working in GitHub for this project. If you've never used GitHub before, or need a brief refresher, here's the great Hello World! documentation they offer.
Okay! Pop over to GitHub, fork the 'staged-recipes' repository, and open your fork in GitHub Desktop. This will clone the repo to your local machine.

In GitHub Desktop, within your newly forked repo, create a new branch from the staged-recipes master branch. Name the new branch as is sensible for your package. In this case, I named the branch 'spotipy'. To create a new branch, simply click the 'master' branch button and type the name of the new branch into the 'Filter' text field.

Now we'll create a new folder in the 'recipes' folder, and copy the example recipe into the new folder. To do this, open the files in the repository in your operating system's file browser.

The newly opened window will look something like this, depending on your operating system (this is Windows 10)

Navigate to the 'recipes' folder (highlighted above) and create a copy of the 'example' folder (CTRL + drag the folder in Windows 10), then rename it to reflect your package name. NB this is an important step - don't just create a folder, copy and rename it so that the example 'meta.yaml' file is copied also.

Within your newly created folder, open the meta.yaml file in your favorite text editor, and …
2. Tailor the example recipe for your package
Conda Forge recipes are written in the YAML (YAML Ain't Markup Language) data serialization language. If you've not written YAML before, or need a brief refresher, here's a great reference. The copy of the example recipe meta.yaml file looks like this
Again, the good people at Conda Forge provide very detailed instructions on how to edit this file - both in the file itself, and in their documentation. Another vital bit of documentation is provided by conda here. Let's save you the hassle of reading through all of this at the start, though I found myself referencing these docs frequently throughout the process and strongly suggest you come back and do the same.
Pro tip: use the Python 'grayskull' package to automatically create a near-perfect meta.yaml file!
I spent ages the first time I did this manually editing the meta.yaml file for my package and iteratively submitting it via a pull request. It turns out that all of that brain damage can be avoided by using the Python packaging tools provided by conda to generate this file. The documentation provided by the Conda team is helpful, though the approach outlined here (i.e. using conda skeleton) did not work for me because, I learned after much banging of my head against the keyboard, conda skeleton needs a major refactoring.
Enter grayskull, which will eventually replace conda skeleton. Grayskull is a tool to generate concise recipes for Conda Forge - precisely the task at hand! We won't go through the process of creating a new conda environment here. Simply install grayskull from conda forge in a new conda environment as follows:
conda install -c conda-forge grayskull
With grayskull installed, use it to build a meta.yaml (recipe) for your package, by calling it, passing it the repository in which the package presently lives, and the name of the package. In our case:
grayskull pypi spotipy
Upon a successful build, you'll see something like this …

… and there will be a newly created folder with the name of your package. Inside this folder, you'll find a near-perfect recipe to inform your tailoring of the meta.yaml file you've copied in your local clone of the repo above.
At this point, you could either copy and paste the file into your clone of the repo, overwriting the example file above, or edit the example file down using this information as inputs. I suggest the latter, as the recipe that grayskull creates isn't quite perfect and will likely be rejected during the pull request process without some edits. Importantly, in this case, Conda Forge requires minimum version limits for Python because we're building to the 'noarch' (i.e. operating system non-specific) target. The edits are simple enough, though. In the 'requirements' section of the YAML file, add minimum versions to the python lines for both host and run.
… annnnnnnd we're donezo! The final, pull request-ready recipe meta.yaml file for the spotipy package is as follows.
Note the subtle differences between this file and the one generated by grayskull. Again, I recommend editing the file using the output of grayskull rather than copying and pasting to avoid potential issues during the pull request process.
3. Submit a pull request to Conda Forge
Rightho! We're almost finished. All that remains is to submit a pull request to the maintainers of Conda Forge. Before we do, we'll need to commit the changes we've made in our local clone of the GitHub repo.

Now we'll push the commit back to GitHub.

Excellent! Now it's time to submit our pull request.
Conda Forge has a really dope continuous integration (CI) pipeline that automates most of the pull request process! A reviewer from the Conda Forge team will, ultimately, review and approve the request, but there is a heap of great feedback from their automated CI that really speeds the process up.
Important note: do not fully submit a pull request for the spotipy package if you've been hacking-along with this article, as it will be rejected as a duplicate (of the package I've already contributed).
When we select "Pull request" from the GitHub desktop app, we're returned to the GitHub website to complete the process. Click the 'Pull request' link at the top right of the repo.

The next screen shows you compared changes between the original repo we cloned, and the edits we've made.

Click the green "Create pull request" button. In the next screen, provide an informative title for the pull request and work carefully through each item of the checklist to confirm that all of the requirements are met.


Once the pull request is submitted, the aforementioned slick CI process is kicked off and works through 5 automated steps. These include linting, checking the package builds, then checking that the package builds for each of 3 target operating systems (Linux, OSX and Windows). When these checks successfully complete, you'll be notified. (This process can take as much as half an hour or so … be patient!)

Now … we play the waiting game until we hear back from the maintainers with any additional feedback …
In a little while (a few hours, in this case) the maintainers will respond with either feedback or, if all went well, confirmation that the pull request was merged. Congratulations - you are now a package maintainer for Conda Forge! You'll receive an email invitation to the Conda Forge organization on GitHub, which must be accepted within 7 days. After accepting, the automated CI process will build the package so that it is available at Conda Forge. Here's the spotipy package.

We can also run a quick search from the shell to confirm that the package is available to install via conda
conda search -c conda-forge spotipy

… and that's it! I hope you've found this both interesting and accelerating as you make your way into the wonderful world of Python packaging for Conda Forge!
What the huh!? Problems I encountered along the way …
This article comes together in a way that suggests graceful ease of development. As usual, that was certainly not the case. Here are a few things I learned by banging my head against the keyboard during the process of packaging spotipy for Conda Forge.
- grayskull would have saved me heaps of time that was otherwise spent digging through source code trying to figure out how to edit the meta.yaml file for the package recipe. In addition to digging through the source code, I spent entirely too much time trying to parse the PKG-INFO file it contains into a recipe for Conda Forge. It worked, and I managed to get it packaged, but OMG grayskull FTW! This article would have been a much longer, more tedious piece without this learning.
- The documentation provided by the Conda team on packaging for Conda is outdated and doesn't work. My first foray into packaging was a frustrating failure as a result, leaving hours spent without the outcome I expected (though I learned a lot). I even ended up posting to the anaconda.org mailing list looking for help, which never came. Perhaps one day, if I muster the motivation, I'll edit the docs and submit a pull request to update the broken bits.
- As always, reading the documentation is particularly helpful. Despite the hackalong provided in this article, I strongly recommend reading the docs I've linked to in order to better understand the packaging process.
Screencast forthcoming!
Stay tuned for a video walkthrough of this article!
27 Feb 2021 8:18pm GMT
Andre Roberge: Friendly-traceback: testing with Real Python
Real Python is an excellent learning resource for beginning and intermediate Python programmers who want to learn more about various Python-related topics. Most of the resources of Real Python are behind a paywall, but there are many articles available for free. One of the free articles, Invalid Syntax in Python: Common Reasons for SyntaxError, is a good overview of possible causes of syntax errors when using Python. The Real Python article shows code raising exceptions due to syntax errors and provides some explanation for each case.
In this blog post, I reproduce the cases covered in the Real Python article and show the information provided by Friendly-traceback. Ideally, you should read this blog post side by side with the Real Python article, as I mostly focus on showing screen captures, with very little added explanation or background.
If you want to follow along using Friendly-traceback, make sure that you use version 0.2.34 or newer.
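Since what follows relies mostly on screen captures, here is a minimal, hypothetical sketch (not taken from either article) of the kind of snippet being analysed, and what Friendly-traceback adds on top of the plain traceback:

# A deliberately broken snippet: a missing comma between dictionary items.
bad_code = 'ages = {"pam": 24 "jim": 31}'

try:
    compile(bad_code, "<example>", "exec")
except SyntaxError as exc:
    print(exc.msg)  # the bare CPython message, e.g. "invalid syntax"

# With Friendly-traceback active, the traceback is followed by a short hint,
# and asking why() in its console prints a longer explanation of the likely cause.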
Missing comma: first example from the article
Misusing the Assignment Operator (=)
Friendly traceback provides a "hint" right after the traceback. We can get more information by asking why().
Misspelling, Missing, or Misusing Python Keywords
Missing Parentheses, Brackets, and Quotes
Mistaking Dictionary Syntax
Using the Wrong Indentation
Defining and Calling Functions
Changing Python Versions
Last example: a TypeError resulting from a syntax error.
There is more ...
27 Feb 2021 11:51am GMT
Cusy: New: Pattern Matching in Python 3.10
27 Feb 2021 10:04am GMT
Fabio Zadrozny: PyDev 8.2.0 released (external linters, Flake8, path mappings, ...)
PyDev 8.2.0 is now available for download.
This release has many improvements for dealing with external linters.
The main ones are the inclusion of support for the Flake8 linter, as well as using a single linter call for analyzing a directory, so analysis should be much faster now (previously external linters were called once for each file).
Note: to request code analysis for all the contents below a folder, right-click it and choose PyDev > Code analysis:
Another change is that comments are now added to the line indentation...
This means that some code such as:
def method():
    if True:
        pass
Will become:
def method():
    # if True:
        # pass
p.s.: it's possible to revert to the old behavior by changing the preferences at PyDev > Editor > Code Style > Comments.
Also note that, after some feedback, an option to format comments as in the code below will be added in the next release (and will probably be made the default):
def method():
    # if True:
    #     pass
Interpreter configuration also got a revamp:
So, it's possible to set a given interpreter to be the default one, and if you work with conda, you can select Choose from Conda to pick one of your conda environments and configure it in PyDev.
Path mappings for remote debugging can now (finally) be configured from within PyDev itself, so changing environment variables is no longer needed for that:
Note that Add path mappings template entry may be clicked multiple times to add multiple entries.
That's it... More details may be found at: http://pydev.org.
Hope you enjoy the release 😊
27 Feb 2021 5:44am GMT
26 Feb 2021
Planet Python
Peter Bengtsson: How MDN's site-search works
tl;dr; Periodically, the whole of MDN is built, by our Node code, in a GitHub Action. A Python script bulk-publishes this to Elasticsearch. Our Django server queries the same Elasticsearch via /api/v1/search. The site-search page is a static single-page app that sends XHR requests to the /api/v1/search endpoint. Search results' sort-order is determined by match and "popularity".
Jamstack'ing
The challenge with "Jamstack" websites is data that is so vast and dynamic that it doesn't make sense to build it statically. Search is one of those. For the record, as of Feb 2021, MDN consists of 11,619 documents (aka. articles) in English. Roughly another 40,000 translated documents. In English alone, there are 5.3 million words. So to build a good search experience we need to, as a static site build side-effect, index all of this in a full-text search database. And Elasticsearch is one such database and it's good. In particular, Elasticsearch is something MDN is already quite familiar with because it's what was used from within the Django app when MDN was a wiki.
Note: MDN gets about 20k site-searches per day from within the site.
Build
When we build the whole site, it's a script that basically loops over all the raw content, applies macros and fixes, dumps one index.html (via React server-side rendering) and one index.json. The index.json contains all the fully rendered text (as HTML!) in blocks of "prose". It looks something like this:
{ "doc": { "title": "DOCUMENT TITLE", "summary": "DOCUMENT SUMMARY", "body": [ { "type": "prose", "value": { "id": "introduction", "title": "INTRODUCTION", "content": "<p>FIRST BLOCK OF TEXTS</p>" } }, ... ], "popularity": 0.12345, ... }
You can see one here: /en-US/docs/Web/index.json
Indexing
Next, after all the index.json files have been produced, a Python script takes over and it traverses all the index.json files and, based on that structure, it figures out the title, summary, and the whole body (as HTML).
Next up, before sending this into the bulk-publisher in Elasticsearch it strips the HTML. It's a bit more than just turning <p>Some <em>cool</em> text.</p> to Some cool text. because it also cleans up things like <div class="hidden"> and certain <div class="notecard warning"> blocks.
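As a rough illustration only (this is not MDN's actual publisher code), the stripping step could look something like this with BeautifulSoup; the selectors here are assumptions based on the class names mentioned above:

from bs4 import BeautifulSoup

def strip_html(prose_html):
    soup = BeautifulSoup(prose_html, "html.parser")
    # Drop markup that should not be indexed, e.g. hidden blocks and
    # (certain) notecard warnings, as described above.
    for node in soup.select('div.hidden, div.notecard.warning'):
        node.decompose()
    # "<p>Some <em>cool</em> text.</p>" becomes "Some cool text."
    return soup.get_text(" ", strip=True)

print(strip_html('<p>Some <em>cool</em> text.</p><div class="hidden">internal</div>'))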
One thing worth noting is that this whole thing runs roughly every 24 hours and then it builds everything. But what if, between two runs, a certain page has been removed (or moved), how do you remove what was previously added to Elasticsearch? The solution is simple: it deletes and re-creates the index from scratch every day. The whole bulk-publish takes a while so right after the index has been deleted, the searches won't be that great. Someone could be unlucky in that they're searching MDN a couple of seconds after the index was deleted and now waiting for it to build up again.
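A minimal sketch of that delete-and-recreate step, assuming the official elasticsearch Python client (again illustrative, not the actual script; the index name mdn_docs is taken from the build log shown below):

from elasticsearch import Elasticsearch

# The cluster URL is of the form described in the "Searching" section below.
es = Elasticsearch("https://USER:PASSWD@HASH.us-west-2.aws.found.io:9243")

# Start from scratch on each run: drop the previous index (if any), recreate it,
# then bulk-publish every document into the fresh index.
es.indices.delete(index="mdn_docs", ignore=[404])
es.indices.create(index="mdn_docs")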
It's an unfortunate reality but it's a risk worth taking for the sake of simplicity. Also, most people are searching for things in English and specifically the Web/ tree, so the bulk-publishing is done in a way that the most popular content is bulk-published first and the rest after. Here's what the build output logs:
Found 50,461 (potential) documents to index
Deleting any possible existing index and creating a new one called mdn_docs
Took 3m 35s to index 50,362 documents. Approximately 234.1 docs/second
Counts per priority prefixes:
  en-us/docs/web    9,056
  *rest*           41,306
So, yes, for 3m 35s there's stuff missing from the index and some unlucky few will get fewer search results than they should. But we can optimize this in the future.
Searching
The way you connect to Elasticsearch is simply by a URL; it looks something like this:
https://USER:PASSWD@HASH.us-west-2.aws.found.io:9243
It's an Elasticsearch cluster managed by Elastic running inside AWS. Our job is to make sure that we put the exact same URL in our GitHub Action ("the writer") as we put it into our Django server ("the reader").
In fact, we have 3 Elastic clusters: Prod, Stage, Dev.
And we have 2 Django servers: Prod, Stage.
So we just need to carefully make sure the secrets are set correctly to match the right environment.
Now, in the Django server, we just need to convert a request like GET /api/v1/search?q=foo&locale=fr (for example) to a query to send to Elasticsearch. We have a simple Django view function that validates the query string parameters, does some rate-limiting, creates a query (using elasticsearch-dsl) and packages the Elasticsearch results back to JSON.
How we make that query is important. In here lies the most important feature of the search; how it sorts results.
In one simple explanation, the sort order is a combination of popularity and "matchness". The assumption is that most people want the popular content. I.e. they search for foreach and mean to go to /en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach, not /en-US/docs/Web/API/NodeList/forEach, both of which contain forEach in the title. The "popularity" is based on Google Analytics pageviews, which we download periodically and normalize into a floating-point number between 0 and 1. At the time of writing, the scoring function does something like this:
rank = doc.popularity * 10 + search.score
This seems to produce pretty reasonable results.
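As a hypothetical illustration (not MDN's actual code) of how those two ingredients combine:

def normalize_popularity(pageviews):
    """Scale raw pageview counts to a popularity between 0 and 1."""
    top = max(pageviews.values())
    return {slug: views / top for slug, views in pageviews.items()}

def rank(popularity, es_score):
    # The formula above: popularity dominates, but the match score still matters.
    return popularity * 10 + es_score

popularity = normalize_popularity({"Array/forEach": 1_200_000, "NodeList/forEach": 90_000})
print(rank(popularity["Array/forEach"], es_score=4.2))    # 14.2
print(rank(popularity["NodeList/forEach"], es_score=4.2)) # 4.95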
But there's more to the "matchness" too. Elasticsearch has its own API for defining boosting and the way we apply it is:
- match phrase in the title: Boost = 10.0
- match phrase in the body: Boost = 5.0
- match in title: Boost = 2.0
- match in body: Boost = 1.0
This is then applied on top of whatever else Elasticsearch does such as "Term Frequency" and "Inverse Document Frequency" (tf and idf). This article is a helpful introduction.
We're most likely not done with this. There's probably a lot more we can do to tune this myriad of knobs and sliders to get the best possible ranking of documents that match.
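Putting the boosts above together, a sketch of the kind of query the view might build with elasticsearch-dsl could look like this (a sketch under stated assumptions, not MDN's actual view code; the index and field names are assumptions):

from elasticsearch_dsl import Q, Search

def build_search(es_client, user_query):
    # Boosted phrase matches on title/body, plus plain matches as a fallback.
    matches = (
        Q("match_phrase", title={"query": user_query, "boost": 10.0})
        | Q("match_phrase", body={"query": user_query, "boost": 5.0})
        | Q("match", title={"query": user_query, "boost": 2.0})
        | Q("match", body={"query": user_query, "boost": 1.0})
    )
    # The final rank then combines this score with the pre-computed popularity,
    # roughly: rank = doc.popularity * 10 + search.score.
    return Search(using=es_client, index="mdn_docs").query(matches)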
Web UI
The last piece of the puzzle is how we display all of this to the user. The way it works is that developer.mozilla.org/$locale/search returns a static page that is blank. As soon as the page has loaded, it lazy-loads JavaScript that can actually issue the XHR request to get and display search results. The code looks something like this:
function SearchResults() {
  const [searchParams] = useSearchParams();
  const sp = createSearchParams(searchParams);
  // add defaults and stuff here
  const fetchURL = `/api/v1/search?${sp.toString()}`;
  const { data, error } = useSWR(fetchURL, async (url) => {
    const response = await fetch(url);
    // various checks on the response.statusCode here
    return await response.json();
  });
  // render 'data' or 'error' accordingly here
}
A lot of interesting details are omitted from this code snippet. You have to check it out for yourself to get a more up-to-date insight into how it actually works. But basically, the window.location (and pushState) query string drives the fetch() call and then all the component has to do is display the search results with some highlighting.
The /api/v1/search endpoint also runs a suggestion query as part of the main search query. This extracts interesting alternative search queries. These are filtered and scored and we issue "sub-queries" just to get a count for each. Now we can do one of those "Did you mean...". For example: search for intersections.
In conclusion
There are a lot of interesting, important, and careful details that are glossed over here in this blog post. It's a constantly evolving system and we're constantly trying to improve and perfect it in a way that fits what users expect.
A lot of people reach MDN via a Google search (e.g. mdn array foreach) but despite that, nearly 5% of all traffic on MDN is the site-search functionality. The /$locale/search?... endpoint is the most frequently viewed page of all of MDN. And having a good search engine that's reliable is nevertheless important. Owning and controlling the whole pipeline allows us to do specific things that are unique to MDN that other websites don't need. For example, we index a lot of raw HTML (e.g. <video>) and we have code snippets that need to be searchable.
Hopefully, the MDN site-search will go from being known as very limited to something that can genuinely help people get to the exact page better than Google can. Yes, it's worth aiming high!
26 Feb 2021 10:02pm GMT
PyCharm: PyCharm and WSL
Over the past few months, I've been monitoring a ticket closely. Over the course of two years, the ticket has accrued over 130 votes. It's the one about WSL support in PyCharm, and by extension, the rest of the JetBrains IDEs. When I say it's "the one", it's because this is probably the most famous ticket with regards to WSL in our tracker. So, the question is, why is this taking so long to implement?
The History of WSL Support
As things stand right now, WSL and WSL2 are both supported on PyCharm. However, the issue is not with the support itself, but rather how it is supported. WSL is currently supported directly via wsl.exe. We initially used SSH and SFTP to run commands and transfer files. We needed to do this because this was the only way in which we could support WSL at the time.
There were multiple reasons for this. WSL showed tremendous promise for people who wanted to develop on open source technologies. However, we needed to make sure that we could adapt to changes in WSL. At the same time, we were dealing with technology that was not our own, and we needed to be careful about building support that would need to be re-done.
However, the biggest problem stems from a limitation of the IntelliJ platform at the time. IntelliJ expects that it is working with a real file system, and in the case of remote machines, you don't have a real file system.
This is why we have a copy of the files on your local machine, which is then uploaded via SFTP. This means that whenever you make changes, there will be delays before you can run them.
However, taking a deeper look at this, we begin to see the core of the issue: we need a way to support remote development better. By remote, I mean any remote host. This means WSL, but it also includes any host on a remote machine, and it means we would not have to build custom implementations for things like WSL from scratch. This is why we began working on a project called "Targets".
The Targets API
This new system provides a layer of abstraction over all remote hosts, whether it is WSL, an AWS or GCP instance, or any other machine for that matter. Now, we use the term "remote" loosely here, because to us, a remote is anything that is not the file system or the operating system that PyCharm is running on.
This means that the way to support interpreters will also change fundamentally; it also means that there is a lot of refactoring involved.
Think of the API as a matrix. Not The Matrix, but a matrix. If you want to support a new remote, then you need to start filling out that matrix, and you need to provide answers to how the IDE will handle different scenarios. So, for example, if you wish to add direct support for Docker or WSL, you will need to fill out the entire matrix of behaviours that can be done from the IDE.
Through this approach, we can indeed pave a way for all future remote targets, but it means that the transition to this API will be gradual, as a lot of the current functionality will need to be re-written in order to take advantage of this.
This also means that when complete, cloud providers will have an easier way of adding all kinds of functionality, and editing should become as fluid as editing on the filesystem itself (or so we hope).
Progress Thus Far
Our plan is to implement the Targets API in 2021, although we're still working through a few issues that arise from the implementation. It will implement some basic things such as Docker support and remote interpreters; as the year progresses, we hope to add further support for WSL and bring it on par with all other remote targets.
Transcript
Nafiul: [00:00:00] Hello, all you beautiful PyCharmers. This is Early Access PyCharm with your host Nafiul Islam. Today I sit down with three people behind our WSL support and ask them some tough questions because a lot of people really want better support for WSL on PyCharm. So let's get into it.
Ilya: [00:00:26] Well, we started to support WSL as a remote interpreter via SSH
because at the time it was the only way to support it.
Nafiul: [00:00:36] This is Ilya. He's one of the people who works on the remote interpreter team, which supports WSL in PyCharm, along with Vladimir as well as, Alex .
Ilya: [00:00:47] So the user had to run an OpenSSH server inside of WSL and connect to it, just as they would connect to any other remote server.
And I believe a couple of years ago, we switched to a new approach. And so users can now launch the WSL processes directly. Under the hood we run WSL.exe and provide the whole path to the Python interpreter and to this script and so on. This is how it works now.
Nafiul: [00:01:19] So Vladimir, can you just tell me how this all started?
Not the WSL part, but also about remote interpreters in general.
Vladimir: [00:01:30] So it started even before we all had joined JetBrains. The oldest commits I've seen were made in 2012, if I'm not mistaken. So I believe that's when it started.
Nafiul: [00:01:45] So is this something that came from the IntelliJ platform or was this something that was made by the PyCharm team itself?
Vladimir: [00:01:51] No. As far as I am concerned initially it was made especially for PyCharm and just a few years ago it was moved to the whole platform.
Nafiul: [00:02:04] Okay. So something went out of PyCharm and became accepted in other IDEs. So that's pretty cool. This is not something that usually happens here at JetBrains. Usually it's IntelliJ that builds the platform. And the features just sort of end up in other IDEs.
So the question that I have is when you're using something like WSL or say Apple comes up with a, with a fancy new mechanism for virtualization. We don't know if that's ever going to happen, but essentially what is preventing us from incorporating or providing native support for something like WSL from the get-go.
Ilya: [00:02:49] Well for WSL, we have a couple of problems. The first one is that all IntelliJ products are initially configured to work with local files. Even if you have your project on some remote system, you still have to store your files locally and the IntelliJ product will copy them to the remote server automatically.
Nafiul: [00:03:11] And how does the sync happen?
Ilya: [00:03:13] There is a special configuration called deployment and IntelliJ monitors your files, and when files are changed, they are copied to their remote server. Or in some cases they are copied before you launch your script.
Nafiul: [00:03:28] So essentially you have to copy the whole file.
You're not changing the files themselves on the server. Like you just do a complete upload. Is that how it works?
Ilya: [00:03:37] Yes. Some products do support very limited file editing on the remote servers. As far as I know, PhpStorm supports this: you can open one file and edit it, but the whole project should be stored on your local machine and you should use your locally installed version control and so on.
Nafiul: [00:04:00] I see. Okay. It makes sense, but explain this to me. You need to copy it back and forth, but so one of the issues that we have with WSL for example, is support for virtual environments, right? That does not seem to be limited by copying and pasting files that are being edited inside of the editor.
So what is kind of holding us back in terms of giving users that support on virtual machines or WSL or whatever.
Ilya: [00:04:31] It's more like a historical problem. We had very different approaches to virtual environments and different interpreter types. But now we are trying to unify all these things together and want to finish this job.
We should have a unified API, which will give us the ability to create a virtual environment on any interpreter type, be it WSL or SSH or whatever.
Sasha: [00:05:01] Yes, actually Ilya said exactly what our plans are, as for now. There are quite a lot of differences between local execution - local file system actions, working with files and executing files - and working with remote machines.
So basically now we have two different implementations for almost every feature. Like we have some extension points that are implemented differently for the local machine and SSH machines. So this, I think, holds us back for some features that we are not exposing to users for remote development, like creating virtualenvs.
But generally the plan is that we are going to provide an API that allows us to use one code base for each of the features we provide and let each feature run on the local machine as well as over SSH and even on Docker or some AWS instances and so on.
Nafiul: [00:06:12] So essentially what you're saying is the reason we haven't solved this problem is because we want to solve this problem, not just for WSL, but for problems like WSL in the future as well.
So that different kinds of machines, virtual, remote… whatever it is … can be supported with a minimum level of effort instead of having to build everything from scratch over and over again. Am I correct in understanding that?
Sasha: [00:06:40] Yeah, it is quite correct.
Nafiul: [00:06:43] So how difficult is this?
Sasha: [00:06:46] As we already have a lot of source code for different type of targets that we have, like local machine, SSH, Docker.
We need to bring all this together and get a single code for each of these features and hide the differences of these targets under the API implementation. So ..
Nafiul: [00:07:11] what you're telling me is you have to change a lot of existing code, make sure that that doesn't break, unify all of that into a framework and then support all the stuff we already support.
And then you can have WSL.
Sasha: [00:07:29] I mean, then we will have some WSL features that we don't have now, because now we have WSL support for project execution
Nafiul: [00:07:39] Yes, absolutely. But essentially what I'm saying is a lot of the features that we have right now will probably need to be reimplemented in order for everything to work and that we'll probably need to be tested.
Is that what you're telling me? Like the mother of all refactorings.
Sasha: [00:07:57] Yeah, something like that. We did a lot of refactorings. For example, for the SSH subsystem, I started it some time ago - I think three years ago. And then Vladimir came to our company, joined…
Nafiul: [00:08:10] You basically made him do all the hard work. Is that what you're saying?
Sasha: [00:08:13] Yes, he made the next iteration of the refactoring, actually. So yeah. We've got a lot of refactoring tasks, because we face new problems and sometimes it requires a complete - not complete, but a general - rewrite of the code. Yeah.
Nafiul: [00:08:34] Okay. That seems like a lot of work. So the question that I have is: once this Targets API is done, does that mean whenever somebody comes out with a new cloud, with a new way of doing things, with a new API, say for IBM cloud or for XYZ cloud or whatever, it will be far easier for them also to implement functionality within PyCharm?
Vladimir: [00:09:01] Yes. I believe the whole idea of the Targets API is to generalize the infrastructure for running processes and synchronizing files away from some high-level things like virtual environments, like Python interpreters and so on. So yes, we want to make a simple API that would allow various cloud companies like IBM cloud, like Amazon and so on and so on just to implement some interface about running some external process, about synchronizing files between machines, and we'll keep all the things about virtualenv and so on away from that API.
Nafiul: [00:09:50] I see, well, thank you very much, Vova, Ilya and Alexander. Thank you for answering some very tough questions and I hope to book you again soon.
Ilya: [00:09:59] Bye!
Nafiul: [00:10:00] And thank you for listening. If you want more of these podcasts, let us know on Twitter.
The post PyCharm and WSL first appeared on JetBrains Blog.
26 Feb 2021 6:27pm GMT
PyCharm: PyCharm and WSL
Over the past few months, I've been monitoring a ticket closely. Over the course of two years, the ticket has accrued over 130 votes. It's the one about WSL support in PyCharm, and by extension, the rest of the JetBrains IDEs. When I say it's "the one", it's because this is the probably the most famous ticket with regards to WSL in our tracker. So, the question is, why is this taking so long to implement?
The History of WSL Support
As things stand right now, WSL and WSL2 are both supported on PyCharm. However, the issue is not with the support itself, but rather how it is supported. WSL is currently supported directly via wsl.exe
. We initially used SSH and SFTP to run commands and transfer files. We needed to do this because this was the only way in which we could support WSL at the time.
There were multiple reasons for this. WSL showed tremendous promise for people who wanted to develop on open source technologies. However, we needed to make sure that we could adapt to changes in WSL. At the same time, we were dealing with technology that was not our own, and we needed to be careful about building support that would need to be re-done.
However, the biggest problem stems from a limitation of the IntelliJ platform at the time. IntelliJ expects that it is working with a real file system, and in the case of remote machines, you don't have a real file system.
This is why we keep a copy of the files on your local machine, which is then uploaded via SFTP. It also means that whenever you make changes, there is a delay before you can run them.
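Conceptually, that deployment-style sync looks something like the sketch below, written here with paramiko purely as an illustration; the host, credentials, and paths are placeholders, and this is not the IDE's own implementation.

import paramiko

# Hypothetical illustration of a deployment-style sync: copy a locally edited
# file to the remote host over SFTP before running it there.
HOST, USER, PASSWORD = "remote.example.com", "user", "secret"  # placeholders

transport = paramiko.Transport((HOST, 22))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)

# Upload the changed file; only after this finishes can the remote run start,
# which is where the delay described above comes from.
sftp.put("project/main.py", "/home/user/project/main.py")

sftp.close()
transport.close()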
Taking a deeper look at this, we get to the core of the issue: we need a better way to support remote development in general. By remote, I mean any remote host. That includes WSL, but also any machine we reach over the network, and it means we should not have to build custom implementations for things like WSL from scratch. This is why we began working on a project called "Targets".
The Targets API
This new system provides a layer of abstraction over all remote hosts, whether it is WSL, an AWS or GCP instance, or any other machine for that matter. We use the term "remote" loosely here, because to us a remote is anything that is not the file system or operating system that PyCharm is running on.
This means that the way we support interpreters will also change fundamentally; it also means that there is a lot of refactoring involved.
Think of the API as a matrix. Not The Matrix, but a matrix. If you want to support a new remote, then you need to start filling out that matrix, and you need to provide answers to how the IDE will handle different scenarios. So, for example, if you wish to add direct support for Docker or WSL, you will need to fill out the entire matrix of behaviours that can be done from the IDE.
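As a purely conceptual sketch of that matrix idea (the class and method names below are invented for illustration and do not correspond to the real Targets API in the IntelliJ platform), each new target fills in the same small set of answers, and feature code is written once against the interface:

import subprocess
from abc import ABC, abstractmethod

class ExecutionTarget(ABC):
    # One "row" of the matrix: every target answers the same questions.

    @abstractmethod
    def run(self, command):
        # Run a process on the target and return its exit code.
        ...

    @abstractmethod
    def upload(self, local_path, remote_path):
        # Make a local file visible to the target.
        ...

class LocalTarget(ExecutionTarget):
    def run(self, command):
        return subprocess.run(command).returncode

    def upload(self, local_path, remote_path):
        pass  # nothing to do: the file system is already local

class WslTarget(ExecutionTarget):
    def __init__(self, distro="Ubuntu"):  # placeholder distribution name
        self.distro = distro

    def run(self, command):
        # Delegate to wsl.exe so the process runs inside the Linux distribution.
        return subprocess.run(["wsl.exe", "-d", self.distro, "--", *command]).returncode

    def upload(self, local_path, remote_path):
        pass  # WSL can already see the Windows file system, so no copy is needed

# Feature code is written once against the interface and works on any target:
def run_tests(target):
    return target.run(["python3", "-m", "pytest"])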
Through this approach we can indeed pave the way for all future remote targets, but it also means that the transition to this API will be gradual, as a lot of the current functionality will need to be rewritten to take advantage of it.
This also means that when complete, cloud providers will have an easier way of adding all kinds of functionality, and editing should become as fluid as editing on the filesystem itself (or so we hope).
Progress Thus Far
Our plan is to implement the Targets API in 2021, although we're still working through a few issues that arise from the implementation. It will first cover some basic things such as Docker support and remote interpreters; as the year progresses, we hope to add further support for WSL and bring it on par with all other remote targets.
Transcript
Nafiul: [00:00:00] Hello, all you beautiful PyCharmers. This is Early Access PyCharm with your host Nafiul Islam. Today I sit down with three people behind our WSL support and ask them some tough questions because a lot of people really want better support for WSL on PyCharm. So let's get into it.
Ilya: [00:00:26] Well, we started to support WSL as a remote interpreter via SSH, because at the time it was the only way to support it.
Nafiul: [00:00:36] This is Ilya. He's one of the people who works on the remote interpreter team, which supports WSL in PyCharm, along with Vladimir and Alex.
Ilya: [00:00:47] So the user had to run an OpenSSH server inside of WSL and connect to it, just as they would connect to any other remote server.
And I believe a couple of years ago we switched to a new approach, so users can now launch WSL processes directly. Under the hood we run wsl.exe and provide the full path to the Python interpreter, to the script, and so on. This is how it works now.
Nafiul: [00:01:19] So Vladimir, can you just tell me how this all started?
Not just the WSL part, but also remote interpreters in general.
Vladimir: [00:01:30] So it started even before we all had joined JetBrains. The oldest commits I've seen were made in 2012, if I'm not mistaken. So I believe that's when it started.
Nafiul: [00:01:45] So is this something that came from the IntelliJ platform, or was this something that was made by the PyCharm team itself?
Vladimir: [00:01:51] No. As far as I know, it was initially made specifically for PyCharm, and just a few years ago it was moved to the whole platform.
Nafiul: [00:02:04] Okay. So something went out of PyCharm and became accepted in other IDEs. So that's pretty cool. This is not something that usually happens here at JetBrains. Usually it's IntelliJ that builds the platform. And the features just sort of end up in other IDEs.
So the question that I have is: when you're using something like WSL, or say Apple comes up with a fancy new mechanism for virtualization (we don't know if that's ever going to happen), what is essentially preventing us from incorporating or providing native support for something like WSL from the get-go?
Ilya: [00:02:49] Well, for WSL we have a couple of problems. The first one is that all IntelliJ products are initially configured to work with local files. Even if you have your project on some remote system, you still have to store your files locally, and the IntelliJ product will copy them to the remote server automatically.
Nafiul: [00:03:11] And how does the sync happen?
Ilya: [00:03:13] There is a special configuration called deployment, and IntelliJ monitors your files; when files are changed, they are copied to the remote server, or in some cases they are copied just before you launch your script.
Nafiul: [00:03:28] So essentially you have to copy the whole file.
You're not changing the files themselves on the server. Like you just do a complete upload. Is that how it works?
Ilya: [00:03:37] Yes. Some products do support very limited file editing on the remote servers. As far as I know, PhpStorm supports that: you can open one file and edit it, but the whole project should be stored on your local machine, and you should use your locally installed version control and so on.
Nafiul: [00:04:00] I see. Okay, that makes sense, but explain this to me. You need to copy things back and forth, but one of the issues that we have with WSL, for example, is support for virtual environments, right? That does not seem to be limited by copying the files that are being edited inside the editor.
So what is kind of holding us back in terms of giving users that support on virtual machines or WSL or whatever?
Ilya: [00:04:31] It's more of a historical problem. We had very different approaches to supporting virtual environments for different interpreter types. But now we are trying to unify all these things together and want to finish this job.
We should have, like, a unified API which will give us the ability to create a virtual environment on any interpreter type, be it WSL or SSH or whatever.
Sasha: [00:05:01] Yes, actually Ilya said exactly what our plans are as of now. There are quite a lot of differences between local execution and local file system actions, and working with files and executing them on remote machines.
So basically now we have two different implementations for almost every feature. Like, we have some extension points that are implemented differently for the local machine and for SSH machines. I think this holds us back on some features that we are not exposing to users for remote development, like creating virtualenvs.
But generally the plan is that we are going to provide an API that allows us to use one code base for each of the features we provide, and let each feature run on the local machine as well as over SSH and even on Docker or some AWS instances and so on.
Nafiul: [00:06:12] So essentially what you're saying is the reason we haven't solved this problem is because we want to solve this problem, not just for WSL, but for problems like WSL in the future as well.
So that different kinds of machines, virtual, remote… whatever it is … can be supported with a minimum level of effort instead of having to build everything from scratch over and over again. Am I correct in understanding that?
Sasha: [00:06:40] Yeah, it is quite correct.
Nafiul: [00:06:43] So how difficult is this?
Sasha: [00:06:46] We already have a lot of source code for the different types of targets that we have, like the local machine, SSH, Docker.
We need to bring all this together, get a single code path for each of these features, and hide the differences between these targets under the API implementation. So…
Nafiul: [00:07:11] What you're telling me is you have to change a lot of existing code, make sure that doesn't break, unify all of that into a framework, and then support all the stuff we already support.
And then you can have WSL.
Sasha: [00:07:29] I mean, then we will have some WSL features that we don't have now, because now we only have WSL support for project execution…
Nafiul: [00:07:39] Yes, absolutely. But essentially what I'm saying is that a lot of the features that we have right now will probably need to be reimplemented in order for everything to work, and that will probably need to be tested.
Is that what you're telling me? Like the mother of all refactorings.
Sasha: [00:07:57] Yeah, something like that. We did a lot of refactorings, for example for the SSH subsystem. I started it some time ago, I think three years ago, and then Vladimir joined our company…
Nafiul: [00:08:10] You basically made him do all the hard work. Is that what you're saying?
Sasha: [00:08:13] Yes, he did the next iteration of the refactoring, actually. So yeah, we've got a lot of refactoring tasks, because we keep facing new problems, and sometimes that requires a complete… well, not complete, but a general rewrite of the code. Yeah.
Nafiul: [00:08:34] Okay, that seems like a lot of work. So the question that I have is: once this Targets API is done, does that mean that whenever somebody comes out with a new cloud, with a new way of doing things, with a new API, say for IBM Cloud or for XYZ cloud or whatever, it will be far easier for them to implement functionality within PyCharm?
Vladimir: [00:09:01] Yes. I believe the whole idea of the Targets API is to generalize the infrastructure for running processes and for synchronizing files, separately from higher-level things like virtual environments, like Python interpreters and so on. So yes, we want to make a simple API that would allow various cloud companies, like IBM Cloud, like Amazon and so on, to just implement some interface for running an external process and for synchronizing files between machines, and we'll keep all the things about virtualenvs and so on away from that API.
Nafiul: [00:09:50] I see, well, thank you very much, Vova, Ilya and Alexander. Thank you for answering some very tough questions and I hope to book you again soon.
Ilya: [00:09:59] Bye!
Nafiul: [00:10:00] And thank you for listening. If you want more of these podcasts, let us know on Twitter.
The post PyCharm and WSL first appeared on JetBrains Blog.
26 Feb 2021 6:27pm GMT
Python Software Foundation: Python Software Foundation Fellow Members for Q4 2020
It's that time of year! Let us welcome the new PSF Fellows for Q4! The following people continue to do amazing things for the Python community:
Batuhan Taskaya
Elaine Wong
Nicole Harris
Pablo Rivera
Philip James
Thank you for your continued contributions. We have added you to our Fellow roster online.
The above members help support the Python ecosystem by contributing to CPython, contributing to the PyLadies community, maintaining Python libraries, creating educational material, improving UX/UI for our infrastructure, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.
Let's continue to recognize Pythonistas all over the world for their impact on our community. The criteria for Fellow members are available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. We are accepting nominations for quarter 2 through May 20, 2021 (the Q1 cut-off has already passed!).
Work Group Needs Members
The Fellow Work Group is looking for more members from all around the world! If you are a PSF Fellow and would like to help review nominations, please email us at psf-fellow at python.org. More information is available at: https://www.python.org/psf/fellows/.
26 Feb 2021 4:52pm GMT
10 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: King William's Town Station
Yesterday morning I had to go to the station in KWT to pick up the bus tickets we had reserved for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons, but Translux and co, the long-distance bus companies, have their offices there.
10 Nov 2011 10:57am GMT
09 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein
Nobody is worried about something like this - by car you simply drive through, and in the city, near Gnobie: "no, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.
09 Nov 2011 8:25pm GMT
08 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: Brai Party
Braai = barbecue evening or something similar.
The would-be technicians patching their SpeakOn / jack plug splitters...
The ladies, the "Mamas" of the settlement, at the official opening speech.
Even though fewer people came than expected: loud music and lots of people...
And of course a fire with real wood for the braai.
08 Nov 2011 2:30pm GMT
07 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: Lumanyano Primary
One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.
Björn and Simphiwe carried the PC to Katja's school.
07 Nov 2011 2:00pm GMT
06 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: Nelisa Haircut
Today I went with Björn to Needs Camp to visit Katja's host family for a special party. First of all we visited some friends of Nelisa (yeah, the one I'm working with in Quigney), Katja's host father's sister, who did her hair.
African women usually get their hair done by putting in extensions, not, like Europeans, by just cutting some hair.
In between she looked like this...
And then she was done. Looks amazing considering the amount of hair she had last week, doesn't it?
06 Nov 2011 7:45pm GMT
05 Nov 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: My Saturday
Somehow it occurred to me today that I need to restructure my blog posts a bit - if I only ever reported on new places, I would have to be on a round trip. So here are a few things from my everyday life today.
First of all: Saturday counts as a day off, at least for us volunteers.
This weekend only Rommel and I are on the farm - Katja and Björn are at their placements by now, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.
Just after we had set off on the dirt road, we realized that we had not packed the things for NeedsCamp and Ethienne, but we did have the pump for the water supply in the car.
So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all kinds of things for a farm - in Berea, a northern district.
At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not such a big effort every time the water runs out again.
Fego Caffé is in the Hemingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, because unfortunately some digits got mixed up when entering the PIN. In any case, shops in South Africa store data as sensitive as a PUK, which in principle gives access to a locked phone.
In the café Rommel then carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.
On the side I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's village. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's shares in Vodacom, they have to rebuild completely from scratch.
We decided to organize a free prepaid card to test, because who knows how accurate the coverage map above is... Before signing even the cheapest 24-month deal, you should know whether it works.
After that we went to Checkers in Vincent, looking for two hotplates for WoodyCape - R 129.00 each, so about 12€ for a two-ring hotplate.
As you can see in the background, there are already Christmas decorations - at the beginning of November, and that in South Africa at a sunny, warm 25°C minimum.
We treated ourselves to lunch at a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside to defrost this morning. Now it is clean again and free of its 3 m thick layer of ice...
Tomorrow... well, I will report on that separately... but probably not until Monday, because then I will be back in Quigney (East London) again and have free internet.
05 Nov 2011 4:33pm GMT
31 Oct 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: Sterkspruit Computer Center
Sterkspruit is one of Hilltop's computer centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.
Photos: pupils in the big classroom, the trainer, the school in the countryside, an adult class in the afternoon, and the "town".
31 Oct 2011 4:58pm GMT
Benedict Stein: Technical Issues
What do you do in an internet café when your ADSL and fax line have been cut off before the end of the month? Well, my idea was to sit outside and eat some ice cream.
At least it's sunny and not as rainy as on the weekend.
31 Oct 2011 3:11pm GMT
30 Oct 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: Nellis Restaurant
For those who are traveling through Zastron, there is a very nice restaurant which serves delicious food at reasonable prices.
In addition they sell home-made juices, jams, and honey.
Photos: the interior, the home-made specialities (the shop in the shop), and the bar.
30 Oct 2011 4:47pm GMT
29 Oct 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: The way back from J'burg
During the 10-12 hour trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadside views.
Photos: a plain street, the Orange River near its beginnings (near Lesotho), Zastron Anglican Church, the bridge between the Free State and the Eastern Cape next to Zastron, my new background ;), the freeway, and the landscape getting dark.
29 Oct 2011 4:23pm GMT
28 Oct 2011
Python Software Foundation | GSoC'11 Students
Benedict Stein: How does a construction site actually work?
Sure, some things may be different, but a lot is the same - a road construction site is an everyday sight in Germany, but how does it actually work in South Africa?
First of all: no, there are no locals digging with their hands - even though more manpower is used here, they are busy working with machinery.
Photos: a perfectly normal "federal highway" and how it is being widened; looots of trucks, because one side is completely closed over a long stretch, which leads to a temporary traffic light with a waiting time of 45 minutes here; but at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 minutes.
28 Oct 2011 4:20pm GMT