Question

I am considering migrating from RStudio to Python, so I am just getting started with Python, using VS Code as my editor. My purpose is mainly to analyze data, build predictive models, and make those models available via a web API.

When working in RStudio with a large data set, I can run a script to read in the data from, say, a CSV file, and then explore the data using the usual statistical approaches. I only need to load the data set ONCE, which is great, since loading a large data set often takes some time. Data exploration requires many repeated "probes" to view histograms, compare counts, etc., so not having to reload the data each time I change a script or run one to view a plot is a great advantage.

However, when using Python in VS Code, it seems to me that if I load the data frame in, say, a function at the top of a script and then have another function to draw a graph, the data is reloaded every time I run that script. So if I want to draw 20 histograms, changing only the title each time, I have to reload the data every time I modify and run the script.

Am I missing a feature of Python and VS Code, or is my summary correct?


Solution

This is more about the IDE than the programming language. When you use RStudio, the IDE keeps an R session running in the background, so data you import stays in that session's memory and you don't have to reload it when you run subsequent code in the same session.

To do this in Python, consider using a Jupyter Notebook or something similar (VS Code can open and run notebooks directly). A notebook keeps a live Python kernel running between cells, so you only need to import your dataframe once, and you can then perform exploratory analysis, create visualizations, and build models as needed.
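A minimal sketch of that workflow inside a plain `.py` file, assuming VS Code with the Python and Jupyter extensions installed: lines starting with `# %%` are treated as cells that can be run one at a time in the Interactive Window, which is backed by a persistent kernel. The inline CSV text, the column names `group` and `value`, and the file name in the comment are hypothetical placeholders for your own data.

```python
# %% Cell 1: load the data once per session (the slow step).
import io

import pandas as pd

# Hypothetical stand-in for a real file; in practice you would write
# something like: df = pd.read_csv("my_large_file.csv")
CSV_TEXT = """group,value
a,1.0
b,2.5
a,3.0
b,4.5
a,2.0
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# %% Cell 2: explore. Edit and re-run this cell as often as you like;
# df is still in the kernel's memory, so Cell 1 is not executed again.
counts = df["group"].value_counts()
print(counts)

# %% Cell 3: one histogram "probe"; change the title and re-run only this cell.
ax = df["value"].plot.hist(bins=5, title="Distribution of value")
```

Run Cell 1 once, then re-run Cell 3 with a different title as many times as needed; only the plotting code executes on each run. The same pattern applies to an `.ipynb` notebook, where each cell plays the role of a `# %%` block.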

Licensed under: CC-BY-SA with attribution