The following map summarizes my general WorkFlow for data analysis based on Python, a popular, community-driven, and user-friendly language.
I hope the WorkFlow and Methodology below can also serve as a reference for other languages and projects, where Python is useful as a prototyping or demonstration tool.
Some of the basic concepts include:
Use a mind map to conveniently annotate and organize the outline of a project.
Content type in MindNode
Export the mind map to a Markdown document to expand the details of each topic.
Markdown Basics: the key formatting syntax. Markdown also accepts html markup most of the time.
Containerize: balance system isolation against performance, like a sandbox for a micro-service.
docker pull command to download the busybox image.
docker run, which we did using the busybox image that we downloaded. A list of running containers can be seen using the
1. Start from base image
docker pull image_name  # pull public image/repository from a registry
docker build [--no-cache] -t image_name path/to/Dockerfile [-f renamed-dockerfile]
docker run -it  # interactive
    --rm  # rm container when exit
    -d  # run as detached
    -p 8888:8888  # port fwd to host
    -e DISPLAY=$DISPLAY  # set environment variable
    -u user  # username/uid in image
    -v path/to/local:path/to/container  # mount directory
    image_name [command]
docker port container_id  # show the open ports of a container instance
docker start/attach/stop/rm container_id  # manage a container instance
docker rmi image_id  # remove an image
2. Record needed ingredients in requirement.txt
While developing, record the additional Python packages in a text file named requirement.txt. It will be useful for constructing the Dockerfile that automatically configures the development environment, as well as for hosting an interactive Jupyter notebook with mybinder.
RUN sudo pip install -r requirement.txt
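As a minimal sketch of the recording step, the helper below appends a package name to requirement.txt unless it is already listed. The function name `record_requirement` is hypothetical and only for illustration; editing the file by hand works just as well.

```python
from pathlib import Path


def record_requirement(package, path="requirement.txt"):
    """Append `package` to the requirements file unless it is already listed."""
    req_file = Path(path)
    # Read the packages recorded so far (empty if the file does not exist yet).
    lines = req_file.read_text().splitlines() if req_file.exists() else []
    recorded = [line.strip() for line in lines if line.strip()]
    if package not in recorded:
        recorded.append(package)
    # Rewrite the file with one package per line.
    req_file.write_text("\n".join(recorded) + "\n")
    return recorded
```

Calling `record_requirement("numpy")` right after the first `import numpy` in a notebook keeps the list in step with the code.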
A few suggestions
Code Styling: PEP8
_ for built-in variables
c = a/b
at least 2 spaces before #, then 1 space: inline comments, docstring
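The styling notes above are terse, so here is a small sketch of how I read them in terms of PEP8: a trailing underscore for a name that would otherwise shadow a built-in, spaces around the operator, two spaces before an inline comment, and a docstring. The function `ratio` and the variable `list_` are made up for the example.

```python
def ratio(a, b):
    """Return a divided by b (a one-line docstring in triple double quotes)."""
    c = a / b  # inline comment: two spaces before '#', one space after
    return c


# A trailing underscore avoids shadowing a built-in name such as `list`.
list_ = [ratio(1, 2), ratio(3, 4)]
```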
1. Dev with Jupyter, note issues: an interactive notebook is very handy during development
2. Aggregate to python script: modularize the code into functions
More to read and adopt
3. Checkpoint scripts with git: git log the progress
GitHub Git cheat sheet: some basic operations
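Step 2 above, moving notebook cells into a script, might look like the sketch below. The function `summarize` and the sample data are hypothetical stand-ins for whatever the notebook actually computed; the point is only that the logic lives in a function and the script entry point is kept separate.

```python
import statistics


def summarize(values):
    """Return the mean and population standard deviation of `values`."""
    return {"mean": statistics.mean(values), "std": statistics.pstdev(values)}


def main():
    # Hypothetical data standing in for whatever the notebook loaded.
    data = [1.0, 2.0, 3.0, 4.0]
    print(summarize(data))


if __name__ == "__main__":
    main()
```

Keeping `main()` behind the `__name__` guard means the module can be imported by tests or by a demo notebook without running the analysis.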
When the code is ready to share,
1. Modularize function: if not done earlier
2. Unittest: remember the issues we noted down during development? These are good cases to write tests for. A more proactive approach is test-driven development.
├── __init__.py
├── code.py
├── func_a.py
├── func_b.py
├── func_c.py
└── tests
    ├── __init__.py
    ├── test_funcs.py
    └── test_something.py
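A file such as tests/test_funcs.py could be sketched as follows, using the standard unittest module. The function `safe_ratio` is a hypothetical stand-in defined inline so the example is self-contained; in the layout above it would be imported from func_a.py instead.

```python
import unittest


def safe_ratio(a, b):
    """Hypothetical stand-in for a function that would live in func_a.py."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b


class TestSafeRatio(unittest.TestCase):
    def test_regular_division(self):
        self.assertEqual(safe_ratio(6, 3), 2)

    def test_zero_denominator_is_rejected(self):
        # An issue noted during development becomes a regression test.
        with self.assertRaises(ValueError):
            safe_ratio(1, 0)
```

Running `python -m unittest discover` from the package root should pick up every test_*.py file under tests.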
3. Continuous Integration: use continuous integration to automatically run the tests when something changes in the repository.
.travis.yml file to configure
4. Profiling & Optimization
Premature optimization is the root of all evil. – Donald Knuth
Tips from Cameron Hummels
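In the spirit of measuring before optimizing, here is a minimal profiling sketch with the standard-library cProfile and pstats modules. The function `slow_sum` is a deliberately naive placeholder and `profile_call` is a hypothetical helper name; neither comes from the tips linked above.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop used only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, stream.getvalue()
```

Printing the report from `profile_call(slow_sum, 100000)` shows where the time goes, which is the evidence to collect before rewriting anything.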
5. Documenting: essential for revisiting the project later or developing it further.
versioning: x.y.z (e.g., 0.2.3, 2.7.12, 3.6)
• change x for breaking changes
• change y for non-breaking changes
• change z for bug-fixes
docstring and comments tips
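The x.y.z versioning rule above can be sketched as a small helper. Resetting the lower fields to zero after a bump is an assumption borrowed from common semantic-versioning practice, not something stated here, and the sketch expects all three fields to be present (so "3.6" would need to be written "3.6.0"). The name `bump_version` and the change labels are hypothetical.

```python
def bump_version(version, change):
    """Return the next x.y.z version for a 'breaking', 'feature' or 'bugfix' change."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{x + 1}.0.0"  # change x for breaking changes
    if change == "feature":
        return f"{x}.{y + 1}.0"  # change y for non-breaking changes
    if change == "bugfix":
        return f"{x}.{y}.{z + 1}"  # change z for bug-fixes
    raise ValueError(f"unknown change type: {change}")
```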
Following my WorkFlow, most of the work is already done at this stage. The rest takes minimal effort and still gives a decent finish.
Example finish: use ? for keyboard shortcuts to control the slides.
1. MindNode -> Markdown: re-arrange and convert the outline mind map to markdown.
Warning: a Jupyter notebook should be re-organized for presentation, especially for a dissertation defense! The order of the work is not necessarily the order of the talk! Check my LSST talk for some tips.
2. Markdown -> HTML: extend the details in markdown and convert to html.
Pandoc: powerful tool for conversion.
brew install pandoc
% for front-page info
pandoc -s --mathjax -i -t revealjs --slide-level=2 WorkFlow.md -o WorkFlow.html
3. Slideshow: HTML + reveal.js
Pass -V revealjs-url=http://lab.hakim.se/reveal-js when using pandoc to convert,
or download reveal.js to the same directory as the converted file.
Now the html file is ready to open to start the slideshow; use ? for keyboard shortcuts to control the slides.
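To avoid retyping the conversion, the pandoc call can be wrapped in a few lines of Python. This is only a sketch: it assumes pandoc is on the PATH, that the output format name `revealjs` directly follows `-t`, and the function names are hypothetical.

```python
import subprocess


def pandoc_revealjs_command(source, output, slide_level=2, revealjs_url=None):
    """Build the pandoc argument list for a reveal.js slideshow."""
    cmd = ["pandoc", "-s", "--mathjax", "-i", "-t", "revealjs",
           f"--slide-level={slide_level}"]
    if revealjs_url is not None:
        # Point the slides at a hosted copy of reveal.js.
        cmd += ["-V", f"revealjs-url={revealjs_url}"]
    cmd += [source, "-o", output]
    return cmd


def convert(source, output, **options):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_revealjs_command(source, output, **options), check=True)
```

For example, `convert("WorkFlow.md", "WorkFlow.html")` runs the conversion with the default slide level of 2.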
To take the research/project to a workshop, we need to recall what we’ve done.
1. Config env.: Dockerfile
With everything done, it is now easy to put all the ingredients and recipe together into a Dockerfile.
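As a rough sketch of how the recorded ingredients turn into a recipe, the function below renders a small Dockerfile as text. The base image `python:3`, the `/work` directory, and the function name are all assumptions for illustration; a base image that runs as a non-root user may need a different install line.

```python
def render_dockerfile(base_image="python:3", requirements="requirement.txt"):
    """Return Dockerfile text that installs the recorded packages."""
    lines = [
        f"FROM {base_image}",  # 1. start from a base image
        "WORKDIR /work",  # hypothetical working directory
        f"COPY {requirements} .",
        f"RUN pip install -r {requirements}",  # 2. install the ingredients
        "COPY . .",  # add the project scripts
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file named Dockerfile gives something `docker build` can consume.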
2. Demo: Markdown + Scripts -> Jupyter notebook
Following the mind map and the resulting markdown file, we can put the outline structure into a Jupyter notebook, since it natively supports markdown-formatted cells. We can then fill in the function calls and visualization code in between.
My secret on toggling code cells
Another wheel to edit slide styles on the fly
3. Slideshow: nbconvert + reveal.js
Wrap up commands to convert notebook into slideshow
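One possible wrap-up, assuming Jupyter with nbconvert is installed and on the PATH, is a thin wrapper around `jupyter nbconvert --to slides`. The function names and the notebook file name in the usage note are hypothetical.

```python
import subprocess


def nbconvert_slides_command(notebook):
    """Build the command that turns a notebook into reveal.js slides."""
    return ["jupyter", "nbconvert", "--to", "slides", notebook]


def make_slides(notebook):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_slides_command(notebook), check=True)
```

Calling `make_slides("WorkFlow.ipynb")` should produce a .slides.html file that opens in a browser.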
When it is ready to take the project public, there are a few handy wheels to make it more appealing.
Live slideshow: add some markup to the url of the html file in the repository to render the slideshow live; this does not always work.
reveal html file on github: go to
Live notebook demo: binder
Everything is ready, just paste the repository link to mybinder.org
Project webpage: github.io + HTML
github.io: use any of the converted html files to set it up in 3 steps
Documentation: Read the Docs
This step depends on how often and how well the project is documented. If the earlier guidance is followed, it is painless.
sphinx-quickstart -a "Name" -p Repo -v 0.1 --ext-autodoc -q
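After the quickstart, the generated conf.py usually needs to know where the package lives before autodoc can import it. The fragment below is a sketch of the relevant lines only; the ".." location assumes conf.py sits in a docs directory one level below the code, which may not match your layout.

```python
import os
import sys

# Make the package one level above the docs directory importable for autodoc.
# The ".." location is an assumption about where conf.py lives.
sys.path.insert(0, os.path.abspath(".."))

# Enabled by --ext-autodoc; listed here so the setting is visible.
extensions = ["sphinx.ext.autodoc"]
```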
Building test: Travis CI
Follow the manual/documentation!