How to start a project in Python?
In this doc I will explain my approach to getting started with a Python project. The following topics will be discussed here.
- IDE
- Source Control
- Dependency Management
- Unit testing and Test Coverage
- Linting
- CI/CD pipeline
IDE Selection
For Python projects the go-to IDE is usually PyCharm by JetBrains. PyCharm is a powerful IDE custom-built for Python. It comes with off-the-shelf tools for debugging Python code, and it supports linting and formatting against PEP8 as well.
But many people use VSCode for its multi-language support and lightweight nature. For this guide I will use VSCode, so let's see how we can set it up.
We will see how to install extensions when the time comes; for the time being, let's look at source control.
Source Control
For source control we use Git. It is ideal to start using Git from the beginning so that you track changes from an early stage. You can find a guide to Git here.
When using Git, we use a file named .gitignore to list the files that we do not want to track. For Python you can use this gitignore from GitHub.
You can check my Git for Beginners doc to start with Git.
Dependency Management
When we install dependencies (libraries) in Python we use pip. But when starting a project it is advised to create a virtual environment before installing any libraries with pip. The main reason is that the libraries are then installed to a location that we know, and not into the system-wide Python installation.
If you install dependencies into the system-wide installation, then when you later generate the requirements.txt it will include all the libraries that are installed, including ones you have used for other projects.
To create a virtual environment you can use venv, which ships with Python. To install libraries you can use pip (pip install 'package name').
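As a minimal sketch, the standard library's venv module can also be driven from Python itself. The helper name create_project_env and the .venv directory name are my own assumptions for illustration; any directory name works.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, with_pip=True):
    """Create a virtual environment inside project_dir and return its path.

    The ".venv" directory name is an assumption, not a requirement.
    """
    env_dir = Path(project_dir) / ".venv"
    # venv.create builds the environment; with_pip=True also bootstraps pip into it.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Example (not run here): create_project_env(".")
```

After the environment is created you still need to activate it in your shell (or point VSCode at its interpreter) before running pip, so that packages land inside it.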
When pip is used, the installed libraries can be exported as a requirements.txt file. Someone else can then use this file to install the same dependencies when they run the project.
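To see why the environment matters here, the sketch below lists every distribution the current interpreter can see, in the name==version form that a requirements.txt uses. It is only an approximation of what pip's own export produces (it ignores editable installs, for example), and the function name installed_requirements is hypothetical.

```python
from importlib.metadata import distributions


def installed_requirements():
    """Return sorted 'name==version' lines for the distributions visible
    to the interpreter that runs this function."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in installed_requirements():
    print(line)
```

Run with the system-wide interpreter, this prints everything ever installed there. Run inside a fresh virtual environment, it prints only what the project itself pulled in.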
I recommend using pipenv for dependencies, as it handles both the creation of the virtual environment and the installation of packages. On top of that, pipenv has npm-like (npm is the Node.js package manager) dependency tracking: it creates two files, Pipfile and Pipfile.lock, with more fine-grained information about the libraries.
You can read about using pipenv here.
Unit testing
I cannot stress enough the importance of unit testing in any sort of development, not just with Python. It is extremely important that developers start writing unit tests along with the code or, better yet, before the code. TDD (test-driven development) is a good approach for starting development from test cases.
Main steps in TDD are (from Wikipedia):
1. Add a test
2. Run all tests. The new test should fail for expected reasons
3. Write the simplest code that passes the new test
4. All tests should now pass
5. Refactor as needed, using tests after each refactor to ensure that functionality is preserved
6. Repeat whenever code changes are done
Python has a built-in unit testing framework, unittest, documented here.
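Here is a small sketch of what one TDD round might look like with the built-in unittest module. The slugify function is a made-up example, not from any library. In TDD the two tests would be written first and fail, and then the one-line implementation would be added to make them pass.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case a title and join
    its words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Hello   World "), "hello-world")


def run_tests():
    """Run SlugifyTest and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(SlugifyTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

In a real project the tests would normally live in their own file (for example test_slugify.py, a name I am assuming) and be run from the project root with python -m unittest.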
Unit tests are expected to run during development, on branch merges, and before deployments. Automating the unit tests is the ideal way to make sure everyone involved can see the test results.
Test Coverage
Test coverage is how much of the code you have written is exercised by your unit tests. It is understandable that 100% of your code may not be covered by unit tests, but coverage should be as high as is practical.
Higher test coverage gives more confidence that your code has actually been tested, and when the tests run with every commit it helps keep code quality up. Coverage.py is a tool you can use for this purpose.
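To make the idea concrete, here is a toy sketch of line coverage using only the standard library's sys.settrace. This is not how Coverage.py is invoked and is far cruder than a real coverage tool. The names classify and executed_lines are made up for illustration.

```python
import sys


# Hypothetical function whose coverage we want to measure.
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def executed_lines(func, *args):
    """Call func(*args) and return the set of executed line offsets,
    counted from the line holding func's 'def' statement."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only trace frames that belong to the function we care about.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return hit


# A single test input only exercises one branch of classify.
print("offsets hit with n=5:", sorted(executed_lines(classify, 5)))
```

A test that only calls classify(5) never reaches the "negative" branch. A coverage report would show that line as missed until a test with a negative number is added.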
Linting
When we write code, we need to write readable code. Code is read more often than it is written. While comments, function names and variable names help with understanding, the formatting of the code (spacing, tabs, new lines) also makes it more readable. PEP8 is Python's standard on code formatting. Using a linting tool helps you comply with this standard and highlights wherever the code does not comply. This keeps a code base clean even when multiple developers are working in it.
VSCode has a Pylint extension to help with this. If you install it, it should automatically start scanning the files you open and provide feedback.
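To give a feel for what a linter reports, here is a toy sketch that checks three PEP8 recommendations: lines of at most 79 characters, no trailing whitespace, and no tabs for indentation. Pylint checks far more than this and works differently internally. The function name style_problems is hypothetical.

```python
MAX_LINE_LENGTH = 79  # the line length limit recommended by PEP8


def style_problems(source):
    """Return a list of (line_number, message) pairs for a few simple
    PEP8 violations found in the given source text."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, "line too long"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # The indentation is whatever precedes the first non-blank character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
    return problems


sample = "x = 1 \nif x:\n\tprint(x)\n"
for number, message in style_problems(sample):
    print(f"line {number}: {message}")
```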
CI/CD pipeline
The code or the project that we are writing will go into production at some point. When it does, the way that we do development and delivery or deployment should be streamlined, because that makes both development and testing easier. Therefore this needs to be thought about right at the beginning of the project. CI means continuous integration: developers merge their changes into the shared code base frequently, and each change is built and tested automatically. Continuous delivery (or deployment) means shipping the code, or delivering the software, to customers in an equally automated and repeatable way.
CI/CD is not just about how you deploy things. It is a philosophy that runs through both development and delivery. Development targets must be vetted to decide when new features will be available, what the effort estimates are, when testers get testable code, and so on.
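As a rough sketch of the "integration" half, a pipeline step usually boils down to running a list of checks and stopping at the first failure. The script below assumes the project's tests can be found by unittest discovery. The DEFAULT_CHECKS list and the run_checks name are my own, and a real pipeline would be defined in whatever CI service you use.

```python
import subprocess
import sys

# Hypothetical list of checks; each entry is (label, command).
# A linter or coverage run could be appended in the same way.
DEFAULT_CHECKS = [
    ("unit tests", [sys.executable, "-m", "unittest", "discover"]),
]


def run_checks(checks=DEFAULT_CHECKS):
    """Run each command in order and return the first non-zero exit
    code, or 0 if every check passed."""
    for name, command in checks:
        print(f"running {name}: {' '.join(command)}")
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"{name} failed with exit code {completed.returncode}")
            return completed.returncode
    return 0
```

A CI job would call run_checks() and exit with its return value, so that a failing check marks the whole build as failed.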
Conclusion
Developing a software product is not just about how we write the code. All of the above aspects need to be thought through before writing any actual code.