<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Alex C. Viana&apos;s Blog</title><description>Personal blog</description><link>https://acviana.com/</link><item><title>Welcome, Substack Posts</title><link>https://acviana.com/posts/2026-03-27-welcome-substack-posts/</link><guid isPermaLink="true">https://acviana.com/posts/2026-03-27-welcome-substack-posts/</guid><description>Adding 5 posts from my old Substack newsletter to the blog.</description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Back in 2020–2021 I ran a short-lived newsletter on &lt;a href=&quot;https://alexcviana.substack.com/&quot;&gt;Substack&lt;/a&gt; called &lt;em&gt;Alex&apos;s Newsletter&lt;/em&gt;. I wrote 5 posts there before life got in the way, and they&apos;ve been sitting off on their own ever since.&lt;/p&gt;
&lt;p&gt;As part of my ongoing effort to get all my writing in one place (see &lt;a href=&quot;/posts/2026-01-15-hello-astro&quot;&gt;Hello, Astro&lt;/a&gt;), I&apos;ve now migrated those 5 posts here under the new &lt;a href=&quot;/tags/source-substack&quot;&gt;&lt;code&gt;source-substack&lt;/code&gt;&lt;/a&gt; tag.&lt;/p&gt;
&lt;p&gt;The process was almost embarrassingly easy. I asked an agent to write a Python migration script that pulled the posts from the Substack RSS feed, stripped the boilerplate, downloaded the images locally, and converted everything to Markdown. The whole thing was done in under 10 minutes.&lt;/p&gt;
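&lt;p&gt;For flavor, here&apos;s a minimal, hypothetical sketch of one step such a script might take, turning a post&apos;s metadata into a slugged filename and Markdown frontmatter. The function and field names are illustrative, not the actual script:&lt;/p&gt;

```python
import re
from datetime import date

def make_markdown_stub(title, pub_date, description):
    """Build a slugged filename and a frontmatter block for one migrated post."""
    # Slugify the title: lowercase, non-alphanumerics collapsed to hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    filename = f"{pub_date.isoformat()}-{slug}.md"
    frontmatter = "\n".join([
        "---",
        f"title: {title}",
        f"pubDatetime: {pub_date.isoformat()}",
        f"description: {description}",
        "tags:",
        "  - source-substack",
        "---",
    ])
    return filename, frontmatter

name, fm = make_markdown_stub("Welcome, Substack Posts", date(2026, 3, 27), "Migrated post")
print(name)  # 2026-03-27-welcome-substack-posts.md
```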
&lt;p&gt;Modern agents are genuinely good at this kind of task.&lt;/p&gt;
</content:encoded></item><item><title>Hello, cloud-reader</title><link>https://acviana.com/posts/2026-03-27-hello-cloud-reader/</link><guid isPermaLink="true">https://acviana.com/posts/2026-03-27-hello-cloud-reader/</guid><description>Building a personal RSS reader on the Cloudflare developer platform.</description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;m from the generation that still complains about the death of Google Reader. So when I had some surprisingly quick wins using AI at work this week — wins that genuinely convinced me I could just crank something out in an evening — I decided to spend my Friday night building my own RSS reader.&lt;/p&gt;
&lt;p&gt;The result is &lt;a href=&quot;https://github.com/acviana/cloud-reader&quot;&gt;cloud-reader&lt;/a&gt;, running entirely on the Cloudflare developer platform. You can see the live deployment at &lt;a href=&quot;https://cloud-reader.alexcostaviana.workers.dev&quot;&gt;cloud-reader.alexcostaviana.workers.dev&lt;/a&gt; — though I may put it behind Cloudflare Access at some point.&lt;/p&gt;
&lt;p&gt;The stack is all Cloudflare under the hood: a Worker for the REST API, D1 (SQLite) for storage, and static assets — all in a single deployment. On top of that: Hono for routing, Drizzle ORM, React 19 + Vite for the frontend, and Cloudflare&apos;s Kumo UI components with Tailwind v4.&lt;/p&gt;
&lt;p&gt;I used the same agents + worklog approach I&apos;ve been developing on this blog. Every decision and dead end goes into a &lt;code&gt;WORKLOG.md&lt;/code&gt;, which makes it easy to reload context between sessions without losing continuity. I also broke the project into explicit milestones — the milestones weren&apos;t about managing complexity; they were about protecting space at the beginning to carefully think through architecture and design decisions: things like migration strategy, linting setup, and tool choices. Getting those right up front meant the project moved &lt;em&gt;faster&lt;/em&gt; once it had some maturity, not slower.&lt;/p&gt;
&lt;p&gt;Like I &lt;a href=&quot;https://x.com/AlexVianaPro/status/2037753394736062626&quot;&gt;posted on Twitter&lt;/a&gt;: in about two hours I had 60+ commits — tested, linted, type-checked, and documented. I feel like I&apos;m living in the future and way behind everyone at the same time.&lt;/p&gt;
</content:encoded></item><item><title>Hello, Astro</title><link>https://acviana.com/posts/2026-01-15-hello-astro/</link><guid isPermaLink="true">https://acviana.com/posts/2026-01-15-hello-astro/</guid><description>Migrating from Next.js to Astro</description><pubDate>Thu, 15 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I made a bunch of big changes to my blog to kick off 2026. I switched to the &lt;a href=&quot;https://astro.build/&quot;&gt;Astro&lt;/a&gt; framework with the &lt;a href=&quot;https://astro.build/themes/details/astropaper/&quot;&gt;Astropaper&lt;/a&gt; template and switched hosting providers to Cloudflare Workers (disclosure: I&apos;m currently employed at Cloudflare which recently acquired Astro).&lt;/p&gt;
&lt;p&gt;I&apos;m really happy with the update! My blog now has search, a dedicated tags page, better typography, a reading progress bar, and overall better aesthetics. You can find the source code on &lt;a href=&quot;https://github.com/acviana/cloudflare-astro-blog&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;14 years and 4 blogs in one place&lt;/h2&gt;
&lt;p&gt;This is my 4th blog going as far back as 2012. I started tech blogging on &lt;a href=&quot;https://www.tumblr.com/theothersideofthescreen-blog&quot;&gt;Tumblr&lt;/a&gt; (!!), then transitioned to the Python Pelican framework hosted on &lt;a href=&quot;http://acviana.github.io/&quot;&gt;GitHub.io&lt;/a&gt;, and my last blog was built with the Nextra framework and hosted on &lt;a href=&quot;https://vercel-nextjs-blog-rose.vercel.app/&quot;&gt;Vercel&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I started the process of collecting some of my previous writing in my last blog iteration. As part of this migration I finished that process and have now collected all my previous blog posts in one place for the first time. All my blog posts are now tagged with &lt;code&gt;source-*&lt;/code&gt; so you can see what came from where.&lt;/p&gt;
&lt;p&gt;[Update: it&apos;s actually now 14 years and 5 blogs — I&apos;ve since added posts from my old &lt;a href=&quot;/posts/2026-03-27-welcome-substack-posts&quot;&gt;Substack newsletter&lt;/a&gt;.]&lt;/p&gt;
&lt;h2&gt;Coding with agents&lt;/h2&gt;
&lt;p&gt;This project involved a lot of boilerplate configuration and metadata tweaking. That&apos;s not the kind of thing I have a ton of time for, so it was the perfect project for checking out the latest generation of agents. I wrote basically no code directly for this project, instead pushing everything to Claude Code through OpenCode.&lt;/p&gt;
&lt;p&gt;I&apos;m very happy with both the final result of the blog and the experience of working with agents at the start of 2026. I want to write more about working with agents on this project later, but for now I just wanted to throw up a hello world.&lt;/p&gt;
</content:encoded></item><item><title>Verifying Git Branch Histories with Diff, Log, and Blame</title><link>https://acviana.com/posts/2025-04-29-verifying-git-branch-histories/</link><guid isPermaLink="true">https://acviana.com/posts/2025-04-29-verifying-git-branch-histories/</guid><description>Rebuilding and checking a git branch</description><pubDate>Tue, 29 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I recently had to perform some &quot;git surgery&quot; to rebuild the &lt;code&gt;main&lt;/code&gt; branch of a repository I was working on. The goal was to recreate the branch with the exact same code but a corrected commit history. This gave me the opportunity to use Git in a few ways I hadn’t before. I took extensive notes to convince myself—and others—that the process worked, which became the basis for this post.&lt;/p&gt;
&lt;h2&gt;Context&lt;/h2&gt;
&lt;p&gt;The short backstory is that we had a few commits in the project trunk that needed to be removed. These problematic commits were squashed patches of diffs that had &lt;em&gt;also&lt;/em&gt; been correctly merged earlier as unsquashed merge commits. As a result, the branch technically had all the right code, but nearly all the changes were attributed to just two squashed commits. That meant the &lt;code&gt;git log&lt;/code&gt; was technically accurate, but &lt;code&gt;git blame&lt;/code&gt; became almost useless.&lt;/p&gt;
&lt;p&gt;With a combination of branching, merging, and cherry-picking, I was able to create what I believed was a clean, corrected branch. But how do you &lt;em&gt;verify&lt;/em&gt; that?&lt;/p&gt;
&lt;h2&gt;Comparing Branch Heads&lt;/h2&gt;
&lt;p&gt;The first thing I wanted to check was that the content at the branch tips was identical. Normally, when comparing files, I might use &lt;code&gt;diff&lt;/code&gt; or compare &lt;code&gt;md5&lt;/code&gt; hashes. But since we’re working within Git, we can compare the branches directly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check the diff against main—there should be no changes:
&amp;gt; git diff --stat develop-new..main

# Also check against a previous commit on main to confirm that a diff *would* show something
&amp;gt; git diff --stat develop-new..main~1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This obviously isn’t a new command, but previously I’d only used &lt;code&gt;git diff&lt;/code&gt; on individual files or within a single branch.&lt;/p&gt;
&lt;h2&gt;Comparing the Git Log&lt;/h2&gt;
&lt;p&gt;This part was a bit trickier. I needed to confirm that the Git history looked correct. The first step was to manually inspect the log:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# The graph should look clean and include all expected commits
&amp;gt; git log --graph --oneline --pretty=format:&quot;%h %ad %s&quot; --date=short
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m not sure there’s a more analytical way to verify this than just reading through and making sure the history makes sense. But there &lt;em&gt;is&lt;/em&gt; a way to be more precise when checking whether the commits we intended to exclude actually stayed out:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# These commits should only exist on main—not on the new branch
&amp;gt; git branch --contains 48d14a0
&amp;gt; git branch --contains 4915a48
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These commands scan the entire commit tree and list which branches contain the specified commits, helping confirm they weren’t accidentally included in the clean branch.&lt;/p&gt;
&lt;h2&gt;Git Blame&lt;/h2&gt;
&lt;p&gt;After confirming that the file contents were identical and the unwanted commits were removed, the last check was to ensure the blame history was corrected. I picked a random file that I knew should have a large number of different commits. As with the &lt;code&gt;git log&lt;/code&gt;, you can inspect this visually—but here, there&apos;s also a more quantitative approach:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Count the unique commits that appear in the file&apos;s blame output
&amp;gt; git blame .github/workflows/publish-image.yml | awk &apos;{print $1}&apos; | sort | uniq -c | sort -nr

&amp;gt; git blame main -- .github/workflows/publish-image.yml | awk &apos;{print $1}&apos; | sort | uniq -c | sort -nr
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This counts how many lines are attributed to each commit. The cleaned-up branch should show a more diverse history—exactly what we were aiming for.&lt;/p&gt;
&lt;p&gt;At this point, I was finally convinced we could replace the &lt;code&gt;main&lt;/code&gt; branch with this new corrected one.&lt;/p&gt;
&lt;p&gt;Thanks for reading! If you enjoyed this article, you should check out this post, which touches on &lt;a href=&quot;/posts/2024-09-30-weekly-update&quot;&gt;cleaning git history&lt;/a&gt;.&lt;/p&gt;
</content:encoded></item><item><title>Brian Eno&apos;s Oblique Strategies in the Terminal</title><link>https://acviana.com/posts/2025-04-01-oblique-strategies-in-the-terminal/</link><guid isPermaLink="true">https://acviana.com/posts/2025-04-01-oblique-strategies-in-the-terminal/</guid><description>Building a terminal splashscreen with Brian Eno&apos;s Oblique Strategies</description><pubDate>Tue, 01 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This post is about a fun little command-line utility I made with ChatGPT&apos;s help to randomly generate exercises from Brian Eno&apos;s &quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Oblique_Strategies&quot;&gt;Oblique Strategies&lt;/a&gt;&quot; in an aesthetically pleasing presentation. Here’s what it looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2025-04-01-oblique-strategies-terminal-screenshot.png&quot; alt=&quot;Screenshot of terminal with Oblique Strategy&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I love seeing one of these every time I open a terminal — it&apos;s like a creative spark before doing any work. It recenters me and challenges me to reconsider my assumptions and process.&lt;/p&gt;
&lt;p&gt;Here&apos;s how I built it. The script is written in Fish shell and works best in terminals with solid Unicode and font rendering support, like Kitty or Alacritty. It reads in the exercises from a source file adapted from &lt;a href=&quot;https://github.com/joelparkerhenderson/oblique-strategies&quot;&gt;this GitHub repo&lt;/a&gt;. Here is the source code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# ~/.config/fish/functions/print_random_oblique_strategy.fish

function print_random_oblique_strategy
    set strategy_file ~/Developer/oblique-strategies.txt

    if test -f $strategy_file
        set strategies (cat $strategy_file)
        set count (count $strategies)
        set index (random 1 $count)
        set strategy $strategies[$index]

        # Terminal width
        set term_width (tput cols)

        # Padding inside the box
        set padding 4
        set strategy_length (string length -- $strategy)
        set content_width (math &quot;$strategy_length + $padding&quot;)
        set left_margin (math &quot;floor(($term_width - $content_width) / 2)&quot;)

        # Box drawing characters
        set hline &quot;─&quot;
        set vline &quot;│&quot;
        set tl &quot;┌&quot;
        set tr &quot;┐&quot;
        set bl &quot;└&quot;
        set br &quot;┘&quot;

        # Construct the box
        set top (string repeat -n $left_margin &quot; &quot;)$tl(string repeat -n (math &quot;$content_width - 2&quot;) $hline)$tr
        set middle (string repeat -n $left_margin &quot; &quot;)$vline (set_color --bold)&quot;$strategy&quot;(set_color normal) $vline
        set bottom (string repeat -n $left_margin &quot; &quot;)$bl(string repeat -n (math &quot;$content_width - 2&quot;) $hline)$br

        echo
        echo (set_color yellow)&quot;🎲 Oblique Strategy:&quot;(set_color normal)
        echo
        echo $top
        echo $middle
        echo $bottom
        echo
    end
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The script defines a function called &lt;code&gt;print_random_oblique_strategy&lt;/code&gt;, which can be run anywhere in your Fish shell. To set this up, save the function in &lt;code&gt;~/.config/fish/functions/print_random_oblique_strategy.fish&lt;/code&gt;, and make sure your strategy list is in &lt;code&gt;~/Developer/oblique-strategies.txt&lt;/code&gt;. To have it run whenever a new shell is started, I added it as a command in &lt;code&gt;~/.config/fish/config.fish&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To get a clean splash page, I needed to do two more things. First, I removed the default &lt;em&gt;&quot;Welcome to Fish...&quot;&lt;/em&gt; message by adding &lt;code&gt;set fish_greeting&lt;/code&gt; (which sets it to nothing) in the &lt;code&gt;config.fish&lt;/code&gt; file. Finally, macOS displays a &lt;em&gt;&quot;last login&quot;&lt;/em&gt; message, which you can silence by running &lt;code&gt;touch ~/.hushlogin&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you enjoy this you might also be interested in my post on &lt;a href=&quot;/posts/2023-12-08-practicing-jazz-with-the-command-line&quot;&gt;practicing jazz from the command line&lt;/a&gt;. While you&apos;re at it, check out this transcendent 6-hour &lt;a href=&quot;https://www.youtube.com/watch?v=ZWUlLHv7-64&amp;amp;t=2s&quot;&gt;&quot;time-stretched&quot; version&lt;/a&gt; of Eno&apos;s &lt;em&gt;&quot;Music for Airports&quot;&lt;/em&gt;.&lt;/p&gt;
</content:encoded></item><item><title>2024-12-09 Weekly Note - Upgrading Raspberry Pi OS</title><link>https://acviana.com/posts/2024-12-09-weekly-note/</link><guid isPermaLink="true">https://acviana.com/posts/2024-12-09-weekly-note/</guid><description>Rebuilding my Raspberry Pi, Pi-Hole, and PADD configuration from scratch.</description><pubDate>Mon, 09 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Early in the pandemic I got a Raspberry Pi and set it up to run &lt;a href=&quot;https://pi-hole.net/&quot;&gt;Pi-Hole&lt;/a&gt; as an ad blocker on my home network. In theory, it felt like a nice-to-have. But once you get used to surfing the web without walls of ads on almost every page - there’s no going back.&lt;/p&gt;
&lt;p&gt;Last weekend I decided to update the operating system on the Pi, Raspberry Pi OS (formerly Raspbian). Because I was going between major release versions this required a full reinstall and reconfiguration of my setup. But, as I’ve mentioned before, I really like rebuilding my tools from scratch! It&apos;s more work, but it helps remove old cruft and forces you to really understand how your systems work.&lt;/p&gt;
&lt;p&gt;I took some notes so I wouldn&apos;t forget what I was doing and in the end I had enough for a quick post. These notes are more of a road map than a detailed guide. I figured between existing how-to guides, forums, and LLMs you should be able to work through any system-specific issues you run into. Good luck!&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;The Task:&lt;/strong&gt; I wanted to upgrade my Raspbian OS version (now Raspberry Pi OS) from one based on Debian 10 (“Buster”, released in 2019) to one based on Debian 12 (“Bookworm”, released in 2023). This wipes the system, so to complete the process I&apos;ll need to set up ad blocking on the Pi, &lt;a href=&quot;https://github.com/pi-hole/PADD&quot;&gt;PADD&lt;/a&gt; for visualization, and point my router at the Pi for DNS resolution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My Setup:&lt;/strong&gt; I’m running a 4 year old &lt;a href=&quot;https://www.canakit.com/raspberry-pi-4-starter-kit.html&quot;&gt;CanaKit Raspberry Pi 4&lt;/a&gt; with a 7” touchscreen display. I basically only use the system to run Pi-Hole so I run a distro without a desktop and set it to boot to the PADD utility which gives me a nice ASCII “dashboard” view of the network and system activity.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/2024-12-09-padd-screenshot.png&quot; alt=&quot;Screenshot of PADD utility showing an ASCII based display ad blocking&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Every time I walk by my setup in the basement &lt;em&gt;&quot;it sparks joy&quot;&lt;/em&gt; as Marie Kondo would say.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Raspberry Pi OS Upgrade:&lt;/strong&gt; Upgrading the OS requires etching a new ISO image on the MicroSD card. There are very clear warnings on the Raspberry Pi OS site not to try to force an upgrade between versions from the CLI. I of course ignored the warnings and just tried to upgrade directly from the CLI — and broke my install. &lt;code&gt;¯\_(ツ)_/¯&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For the new OS I used the 64-bit version of Raspberry Pi OS Lite, which does not include the desktop. Using the Pi-specific disk image etcher from Raspberry Pi, you need to enable SSH for remote access, and you should pick a non-default username and password to keep the script kiddies out. From there, boot from the new disk image and do the usual software updates with &lt;code&gt;apt&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SSH Access:&lt;/strong&gt; The fresh install generates a new SSH host key (and the Pi may have picked up a new IP address), so your next attempt to SSH in will fail. Instead you&apos;ll get an error that starts with &lt;code&gt;WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED&lt;/code&gt; and contains a warning about a possible &lt;a href=&quot;https://en.wikipedia.org/wiki/Man-in-the-middle_attack&quot;&gt;man-in-the-middle attack&lt;/a&gt;. If you don’t know what this means it’s worth a few minutes of reading to help understand some of the networking steps later on.&lt;/p&gt;
&lt;p&gt;You can fix this SSH issue either by just blowing away the old key with something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;  ssh-keygen -R raspberrypi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or you could also just reassign the old static IP address back to your Pi in your router&apos;s admin panel.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi-Hole and PADD:&lt;/strong&gt; Now that we’ve SSH’ed into the system, we can set up the ad blocker. Both Pi-Hole and PADD install easily with &lt;code&gt;curl&lt;/code&gt; commands following the instructions in their docs. Next, update the &lt;code&gt;.bashrc&lt;/code&gt; script to run PADD on login using the instructions in the repo &lt;code&gt;README&lt;/code&gt;. Finally, configure the Pi to automatically boot into your user, which you can do with the &lt;code&gt;raspi-config&lt;/code&gt; CLI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Network Configuration:&lt;/strong&gt; The last step is to configure your router to use the Pi-Hole as a DNS resolver and block any blacklisted ad-related URLs. To do this you use the DHCP section of your router admin panel to assign a static IP to the Raspberry Pi’s MAC address. You then use that static IP address as the IP for the router&apos;s primary DNS resolver.&lt;/p&gt;
&lt;p&gt;If DNS and network configuration is new to you, you can ask your friendly local LLM, or if you want a deeper dive I highly recommend &lt;a href=&quot;https://jvns.ca/categories/dns/&quot;&gt;Julia Evans&apos;s&lt;/a&gt; wonderful newsletters and zines on how DNS works.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;End Result:&lt;/strong&gt; You might have to reboot some combination of the router, Pi, and Pi-Hole to get everything to link up but now you should be surfing the internet with few if any 3rd party ads.&lt;/p&gt;
&lt;p&gt;Looking at the PADD display I’m usually blocking 10-30% of DNS requests!&lt;/p&gt;
</content:encoded></item><item><title>2024-11-11 Weekly Update - Python, Inverse Machine Learning, Set Cardinality</title><link>https://acviana.com/posts/2024-11-11-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-11-11-weekly-update/</guid><description>Sabbatical Week 20 - Python structural pattern matching and package logging, uv in Docker, inverse reinforcement learning, and Cantor&apos;s theory of cardinality</description><pubDate>Mon, 11 Nov 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I took the last few weeks off of blogging to focus on some other projects. The highlight was a demo at the local AI Tinkerers meetup of the AI medical document summarization tool I&apos;ve been helping to build. This week, I’m back to blogging and I’ve got a blend of programming and machine learning notes to share.&lt;/p&gt;
&lt;p&gt;This is a longer post, so if you need some music settle into anything by &lt;a href=&quot;https://linenoise.io/&quot;&gt;Linenoise.io&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Python Structural Pattern Matching&lt;/h3&gt;
&lt;p&gt;When I first learned about structural pattern matching in Python (i.e. &lt;code&gt;match&lt;/code&gt; and &lt;code&gt;case&lt;/code&gt; keywords) I glossed over it as equivalent to if/else statements. After reading this &lt;a href=&quot;https://realpython.com/structural-pattern-matching/#finding-practical-uses-for-pattern-matching&quot;&gt;great article from Real Python&lt;/a&gt; (more than once) I see there’s a lot more depth here. In particular I learned about deconstruction in pattern matching (similar to &lt;code&gt;*args&lt;/code&gt;), conditional pattern execution, and applications to complex nested data structures. I’m not sure how often I would reach for this in production but it’s definitely something I want to experiment with in the future.&lt;/p&gt;
&lt;h3&gt;Logging for Python Packages&lt;/h3&gt;
&lt;p&gt;The Python logging module is not the easiest API to grok, but it does have a lot of useful properties once you understand how to hold the tool. It’s also one of those tools where I’ve inherited a logging setup more than I’ve had to set it up myself.&lt;/p&gt;
&lt;p&gt;So this week I spent some time digging into what the best practices are for instrumenting logging. The &lt;a href=&quot;https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library&quot;&gt;Python standard library docs&lt;/a&gt; are pretty rich in examples, patterns, and anti-patterns for logging. I learned a lot of things but my biggest “ah-ha moment” was understanding that loggers inherit configurations (e.g. a handler) on import from whatever configuration is executed first. That means that you need to align your logger configurations with the entry point(s) to your code. So for example maybe you define a logger on your main pipeline entry and another in the module’s &lt;code&gt;if __name__ == &quot;__main__&quot;:&lt;/code&gt; block if it can be run as a stand-alone script.&lt;/p&gt;
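&lt;p&gt;A minimal sketch of that pattern, with the package and function names invented for the example: the library module only creates a named logger (plus a &lt;code&gt;NullHandler&lt;/code&gt; so it stays silent if the application configures nothing), and the entry point owns the handler configuration:&lt;/p&gt;

```python
import io
import logging

# Library side: modules just ask for a named logger and never attach real handlers.
# The NullHandler keeps the package quiet if the application configures nothing.
logging.getLogger("mypackage").addHandler(logging.NullHandler())
lib_logger = logging.getLogger("mypackage.pipeline")

def run_pipeline():
    lib_logger.info("pipeline started")

# Application side: the entry point owns the configuration, so records
# propagate up from "mypackage.pipeline" to the root handler attached here.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

run_pipeline()
print(stream.getvalue().strip())  # mypackage.pipeline INFO pipeline started
```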
&lt;h3&gt;UV in Docker&lt;/h3&gt;
&lt;p&gt;Like most people in the Python ecosystem, I&apos;ve made uv my new go-to tool for Python dependency and environment management. First it replaced Poetry for me, then pyenv, and now I want to understand how to use it effectively in a Docker container. Astral’s &lt;a href=&quot;https://github.com/astral-sh/uv-docker-example&quot;&gt;example project&lt;/a&gt; as well as this article by &lt;a href=&quot;https://hynek.me/articles/docker-uv/&quot;&gt;Hynek Schlawack&lt;/a&gt; are a great place to get started. But knowing my interest in fundamentals, I can tell this is going to lead me to a deeper dive on Docker.&lt;/p&gt;
&lt;p&gt;While you’re at it you should also check out this &lt;a href=&quot;https://bsky.app/profile/brohrer.bsky.social/post/3l7lkivbsia2m&quot;&gt;uv cheatsheet&lt;/a&gt;, which translates commands from other tools (pip, venv, poetry) to their equivalents in uv. This framing helped some of the new commands “click” for me.&lt;/p&gt;
&lt;h3&gt;Inverse Reinforcement learning&lt;/h3&gt;
&lt;p&gt;I was talking to a local founder and they mentioned “inverse reinforcement learning” (IRL). I had spent some time this year working (unsuccessfully) on reproducing an experiment from Sutton’s “Reinforcement Learning” textbook, so I felt (barely) confident enough to ask them to explain IRL. Fortunately, I was able to get it right away — studying the fundamentals, even briefly, always pays off!&lt;/p&gt;
&lt;p&gt;In traditional reinforcement learning you are trying to optimize agent behavior against a known reward function, for example in a multi-armed bandit problem. In IRL it&apos;s the inverse: you are given agent behaviors and instead have to find the reward function that best explains them. This &lt;a href=&quot;https://thegradient.pub/learning-from-humans-what-is-inverse-reinforcement-learning/&quot;&gt;article&lt;/a&gt; has a good summary as well as citations to foundational papers in the area.&lt;/p&gt;
&lt;h3&gt;Cantor’s Theory of Cardinality&lt;/h3&gt;
&lt;p&gt;I’m enjoying this &lt;a href=&quot;https://youtu.be/9_xG0AGRa-w?si=RZvyNmB7eKzeutQr&quot;&gt;second lecture&lt;/a&gt; from the MIT OpenCourseWare series on analysis. This lecture is about using bijections to compare the cardinality of sets, specifically infinite sets. It concludes with Cantor’s proof that there are more real numbers than integers. This builds nicely on the proof work I’ve been doing all year out of “The Book of Proof”, enough so that I was able to follow along while cooking dinner.&lt;/p&gt;
&lt;p&gt;I&apos;m not sure other folks would enjoy this lecture: if you already know the material it’s probably too slow, and if you’re not familiar with it (and not planning to put in some serious study) it’s probably too fast. But it’s a well-done resource worth highlighting. Free MIT math content online -- what a time to be alive!&lt;/p&gt;
&lt;h3&gt;Odds and Ends&lt;/h3&gt;
&lt;p&gt;Updates on some books and audiobooks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I finished “The MANIAC” audiobook, which I thought was fantastic. I especially loved the dark overtones and the bridge to AlphaGo in the last section of the book. It makes a great complement to the more neutral biography “The Man from the Future: The Visionary Life of John von Neumann”, which I listened to earlier this year.&lt;/li&gt;
&lt;li&gt;I’m now listening to “A Mind At Play: How Claude Shannon Invented The Information Age”, which is, maybe I’d call it, folksier in tone than “The MANIAC”. It gives a lot of insight into the transition in electrical engineering from essentially tinkerers and craftsmen to the modern practice, which is so intertwined with physics and mathematics.&lt;/li&gt;
&lt;li&gt;I’m trying to read “The Missing Billionaires”, which is essentially a book-long study of the application of the Kelly Criterion as a risk management tool. I’m making glacial progress (a few pages as I nod off at night) but the book is great if you’re interested in something partway between quantitative finance and personal investing advice.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2024-10-07 Weekly Update - Levenshtein, Hamming, and Jaccard Distances</title><link>https://acviana.com/posts/2024-10-07-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-10-07-weekly-update/</guid><description>Sabbatical Week 15 - General metric spaces, Levenshtein, Hamming, and Jaccard distances plus links on Python 3.13, inline functions, and The MANIAC</description><pubDate>Mon, 07 Oct 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This week as part of my consulting work I’ve been working on evaluation metrics for LLM-powered text extraction. Essentially, trying to quantify whether the LLM grabbed the right text from a document. This means getting hands-on with some common text comparison algorithms. These were largely already familiar to me but I hadn&apos;t had a chance to use most of them on a real engineering problem before. I collected my notes and turned them into this week&apos;s post.&lt;/p&gt;
&lt;p&gt;If you need something to listen to while you read may I suggest &lt;a href=&quot;https://open.spotify.com/track/5JfTTsxXiIwQKIWy9DcEV1?si=555abf6b6c804867&quot;&gt;&quot;Something Will Happen&quot;&lt;/a&gt; by house/jazz artist Beloiz. Come for the lush melodies, stay for the encouraging voice-overs by Willem Dafoe.&lt;/p&gt;
&lt;h3&gt;String Metrics&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href=&quot;https://en.wikipedia.org/wiki/String_metric&quot;&gt;https://en.wikipedia.org/wiki/String_metric&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Formally, all the string metrics discussed here are proper metrics in the mathematical sense, each defining a metric space. A metric space is a set $M$ together with a notion of distance $d$ between set elements that must satisfy the following four conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Self-Distance is Zero:&lt;/strong&gt; Distance from a point to itself is zero: $d(x,x) = 0$&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Positivity:&lt;/strong&gt; The distance function always returns a positive value for any two distinct points: $x \neq y \Rightarrow d(x,y) &amp;gt; 0$&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Symmetry:&lt;/strong&gt; The distance function is symmetric: $d(x,y) = d(y,x)$&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Triangle Inequality:&lt;/strong&gt; The following property holds: $d(x,z) \leq d(x,y) + d(y,z)$&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Levenshtein Distance&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href=&quot;https://en.wikipedia.org/wiki/Levenshtein_distance&quot;&gt;https://en.wikipedia.org/wiki/Levenshtein_distance&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;L(bat, cat) = 1

bat : 0
cat : 1 (b -&amp;gt; c)

L(bat, egg) = 3

bat : 0
eat : 1 (b -&amp;gt; e)
egt : 2 (a -&amp;gt; g)
egg : 3 (t -&amp;gt; g)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Levenshtein distance can be normalized by dividing by the length of the longer of the two text strings. This results in a range of $[0,1]$.&lt;/p&gt;
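&lt;p&gt;The textbook dynamic-programming (Wagner–Fischer) implementation is short enough to sketch here. In practice I&apos;d reach for a library like rapidfuzz; this hand-rolled version is just to make the definition concrete:&lt;/p&gt;

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def normalized_levenshtein(a: str, b: str) -> float:
    """Scale into [0, 1] by dividing by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

print(levenshtein("bat", "cat"))  # 1
print(levenshtein("bat", "egg"))  # 3
```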
&lt;h3&gt;Hamming Distance&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href=&quot;https://en.wikipedia.org/wiki/Hamming_distance&quot;&gt;https://en.wikipedia.org/wiki/Hamming_distance&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Unlike the Levenshtein distance, the Hamming distance is only defined for strings of the same length. As a result it only measures &lt;em&gt;substitutions&lt;/em&gt;, so it is the number of positions at which the two strings differ.&lt;/p&gt;
&lt;p&gt;Similar to the Levenshtein distance, the Hamming distance can be normalized by dividing by the length of the strings.&lt;/p&gt;
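&lt;p&gt;Since it only counts substitutions, the implementation is essentially a one-liner (again, the names here are my own sketch):&lt;/p&gt;

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))

def normalized_hamming(a: str, b: str) -> float:
    """Scale into [0, 1] by dividing by the (shared) string length."""
    return hamming(a, b) / len(a) if a else 0.0

print(hamming("bat", "cat"))          # 1
print(normalized_hamming("bat", "egg"))  # 1.0 (all three positions differ)
```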
&lt;h3&gt;Jaccard Distance&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href=&quot;https://en.wikipedia.org/wiki/Jaccard_index&quot;&gt;https://en.wikipedia.org/wiki/Jaccard_index&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Jaccard &lt;em&gt;distance&lt;/em&gt; is the complement of the Jaccard &lt;em&gt;coefficient&lt;/em&gt; (or index), that is, $1 - J(A,B)$, where the Jaccard coefficient is the ratio of the intersection of two sets over their union:&lt;/p&gt;
&lt;p&gt;$$
J(A,B) = \frac{|{A \cap B}|}{|A \cup B|}
$$&lt;/p&gt;
&lt;p&gt;Note that the Jaccard coefficient does not define a metric space because $J(A,A) = 1$, not $0$, which is why we subtract it from $1$ to get the Jaccard distance. For text comparisons, set elements can be defined in different ways, leading to different results: for example, by characters, by words, or by n-grams.&lt;/p&gt;
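&lt;p&gt;A minimal sketch makes the tokenization point concrete: the same two strings get different Jaccard distances depending on whether we treat them as sets of characters or sets of words (the helper below is my own, not a library call):&lt;/p&gt;

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A intersect B| / |A union B|; 0 for identical sets."""
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1 - len(a & b) / len(a | b)

s1, s2 = "the cat sat", "the cat ran"

# By characters: the strings share 6 of 9 distinct characters (incl. the space).
print(jaccard_distance(set(s1), set(s2)))                      # about 0.33 (1/3)

# By words: 2 of the 4 distinct words are shared.
print(jaccard_distance(set(s1.split()), set(s2.split())))      # 0.5
```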
&lt;h3&gt;Odds and Ends&lt;/h3&gt;
&lt;p&gt;Here are some other things that have caught my interest this week.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.13 features:&lt;/strong&gt; Python 3.13 was released on 2024-10-07. I &lt;a href=&quot;https://www.acviana.com/posts/2024-09-23-weekly-update&quot;&gt;recently wrote&lt;/a&gt; about some new (to me) Python features I had missed so I was curious to check out what’s new in this release. I found a nice overview on this &lt;a href=&quot;https://realpython.com/podcasts/rpp/223/&quot;&gt;Real Python Podcast episode&lt;/a&gt; which includes links to all the relevant feature PEPs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inline Functions:&lt;/strong&gt; In another callback to a recent post, I was talking about &lt;a href=&quot;https://www.acviana.com/posts/2024-09-09-weekly-update&quot;&gt;“taste” in programming&lt;/a&gt;, which I loosely defined as being something a little more granular than design patterns. This &lt;a href=&quot;http://number-none.com/blow/john_carmack_on_inlined_code.html&quot;&gt;post on inline functions&lt;/a&gt; from John Carmack (and the associated &lt;a href=&quot;https://news.ycombinator.com/item?id=41758371&quot;&gt;Hacker News discussion&lt;/a&gt;) on when to be inconsistent is exactly the kind of programming nuance I’ve been thinking about.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The MANIAC:&lt;/strong&gt; I’ve been working on this &lt;a href=&quot;https://www.penguinrandomhouse.com/books/725022/the-maniac-by-benjamin-labatut/&quot;&gt;audiobook&lt;/a&gt; about the life and work of John von Neumann. Written in a semi-fictional voice, the book covers everything from his work with Gödel on the incompleteness theorem through the impact of his work on recent Nobel Prize winner Demis Hassabis and Lee Sedol&apos;s legendary Go match against AlphaGo. It&apos;s a dark, almost creepy, read on technological progress but I&apos;m still enjoying it.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2024-09-30 Weekly Update - Cleaning git history, blog meta-updates, and new developer tools</title><link>https://acviana.com/posts/2024-09-30-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-09-30-weekly-update/</guid><description>Sabbatical Week 14 - Removing sensitive content from your git history, framework and title format updates for the blog, and new tools I&apos;ve started using over the last 3 months.</description><pubDate>Mon, 30 Sep 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This week&apos;s update is again focused on technical content. I&apos;ve written up some notes on cleaning your git history, some updates to this blog, and some new tools I&apos;ve started using in my daily workflow. I hope you enjoy it, and reach out if you want to chat about anything I&apos;ve written.&lt;/p&gt;
&lt;h3&gt;Removing Sensitive Data From Git&lt;/h3&gt;
&lt;p&gt;The other day I was working with a consulting client and trying to live-fix a bug on a call. I was going too fast and I ended up accidentally committing an API key to our repo and pushing the change. Fortunately, we&apos;re the only two people who have access to the repo, but I still wanted to properly scrub it from the repo.&lt;/p&gt;
&lt;p&gt;I haven&apos;t had to scrub a git history for a private key in a few years. Since then, the recommended tool for cleaning up a git history has gone from &lt;code&gt;git-filter-branch&lt;/code&gt; to &lt;a href=&quot;https://github.com/newren/git-filter-repo&quot;&gt;git-filter-repo&lt;/a&gt;. I had a pretty easy time finding suggestions on how to delete a word/file from the history but I had a harder time finding a command to &lt;em&gt;verify&lt;/em&gt; that the text was removed.&lt;/p&gt;
&lt;p&gt;I&apos;ve captured both in here and subbed &lt;code&gt;&amp;lt;REDACTED&amp;gt;&lt;/code&gt; for the sensitive text.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check that the API key is in fact in the repo history
bash-3.2$ git log -S &amp;lt;REDACTED&amp;gt; main --name-only --pretty=format: | sort -u
template_notebook.ipynb

# Replace all references to the API key in all commits to ADD_API_KEY
bash-3.2$ uv run git-filter-repo --force --replace-text &amp;lt;(echo &quot;&amp;lt;REDACTED&amp;gt;==&amp;gt;ADD_API_KEY&quot;)
Parsed 22 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 27440ad Replaced unmaintained pdfminer with pdfminer.six (#11)
Enumerating objects: 76, done.
Counting objects: 100% (76/76), done.
Delta compression using up to 10 threads
Compressing objects: 100% (35/35), done.
Writing objects: 100% (76/76), done.
Total 76 (delta 42), reused 69 (delta 39), pack-reused 0 (from 0)
Completely finished after 0.10 seconds.

# Check that the API references are gone
bash-3.2$ git log -S &quot;&amp;lt;REDACTED&amp;gt;&quot; main --name-only --pretty=format: | sort -u

# Confirm that they have been replaced by ADD_API_KEY
bash-3.2$ git log -S &quot;ADD_API_KEY&quot; main --name-only --pretty=format: | sort -u
template_notebook.ipynb

# I&apos;m always scared when I run this one ...
bash-3.2$ git push origin -f main:main
Enumerating objects: 76, done.
Counting objects: 100% (76/76), done.
Delta compression using up to 10 threads
Compressing objects: 100% (32/32), done.
Writing objects: 100% (76/76), 90.13 KiB | 90.13 MiB/s, done.
Total 76 (delta 42), reused 76 (delta 42), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (42/42), done.
To &amp;lt;REDACTED&amp;gt;
 + 4da8e1f...27440ad main -&amp;gt; main (forced update)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hopefully this is not something I need to do for a few more years but I figured it was worth capturing it in some notes for myself and others.&lt;/p&gt;
&lt;h3&gt;Blog Updates&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Version Update:&lt;/strong&gt; I updated my blog dependencies including bumping the blog framework to Nextra v3.0. This required massaging some configuration files but overall was not too bad. Most importantly, it seems to have fixed a bug where some new posts were not being added to the main post feed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weekly Updates Titles:&lt;/strong&gt; Now that I have 2 months of weekly notes I realized that I need more descriptive titles for my posts than just the date. I&apos;ve added more description to the titles and you can now find all those posts under the &lt;a href=&quot;https://www.acviana.com/tags/weekly-update&quot;&gt;weekly-update tag&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;New (To Me) Developer Tools&lt;/h3&gt;
&lt;p&gt;At the start of this sabbatical I set up a new laptop and I used that as an excuse to try out some new developer tools. As a result, I ended up replacing a surprising number of tools that were previously daily-drivers for me. Here&apos;s what&apos;s new to me:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;New&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal&lt;/td&gt;
&lt;td&gt;iTerm2&lt;/td&gt;
&lt;td&gt;Kitty&lt;/td&gt;
&lt;td&gt;GPU acceleration for LazyVim&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal Prompt&lt;/td&gt;
&lt;td&gt;Tide&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Never set it up and didn&apos;t miss it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IDE&lt;/td&gt;
&lt;td&gt;SublimeText&lt;/td&gt;
&lt;td&gt;LazyVim&lt;/td&gt;
&lt;td&gt;Trying to live more in the terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Theme&lt;/td&gt;
&lt;td&gt;Dracula&lt;/td&gt;
&lt;td&gt;TokyoNight&lt;/td&gt;
&lt;td&gt;Time for an aesthetic change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git&lt;/td&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;LazyGit&lt;/td&gt;
&lt;td&gt;Great UI and terminal-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python Projects&lt;/td&gt;
&lt;td&gt;Poetry&lt;/td&gt;
&lt;td&gt;Astral uv&lt;/td&gt;
&lt;td&gt;The solution we&apos;ve been waiting for&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;One of the themes in my new choices is working more in the terminal. You can see that in my moving from Sublime Text to LazyVim. That prompted me to move from iTerm2 to Kitty for performance reasons. I still love the Fish shell but went for a more stripped-back prompt over my previous powerline9000-inspired configuration. Lastly, like everyone else in the Python ecosystem, I&apos;ve been joining the Astral bandwagon and moving my workflow over to uv from Poetry (and pipenv before that). I suspect I&apos;ll migrate off of pyenv in a few weeks as I get more comfortable with uv.&lt;/p&gt;
&lt;p&gt;Updates like this feel like a &quot;spring cleaning&quot; of my tech stack and are one of the reasons that I don&apos;t back up my config/dotfiles between machines. First of all, I don&apos;t tend to use tools that require a ton of configuration out of the box. But more to the point, having to configure each of my machines separately helps me discover new tools, drop tools that I don&apos;t miss, and makes sure I really understand my tools well enough to troubleshoot them.&lt;/p&gt;
</content:encoded></item><item><title>2024-09-23 Weekly Update - Python syntax, Obsidian dataview, vim, and Python notebooks</title><link>https://acviana.com/posts/2024-09-23-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-09-23-weekly-update/</guid><description>Sabbatical Week 13 - python syntax, Obsidian dataview, LazyVim configs, and python notebooks</description><pubDate>Mon, 23 Sep 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I didn&apos;t get an update out last week because I&apos;ve been heads down on some projects. But, more time working on code while practicing evergreen note taking means a lot of ready-to-go content for weekly updates.&lt;/p&gt;
&lt;h2&gt;Easier Blogging with Obsidian&apos;s Dataview&lt;/h2&gt;
&lt;p&gt;I&apos;ve been using Obsidian as my note-taking app for about a year now. I finally have so much content now that staying on top of all my threads of thinking is becoming challenging. To help with that I&apos;ve started using the popular &lt;a href=&quot;https://blacksmithgu.github.io/obsidian-dataview/&quot;&gt;dataview plug-in&lt;/a&gt; to generate indexes of files for different projects.&lt;/p&gt;
&lt;p&gt;Obsidian files are fundamentally and transparently Markdown files, and dataview lets you query your file metadata using a SQL-ish dialect. For example, this dataview query I figured out this week returns a Markdown-formatted list of all the Obsidian files you&apos;ve edited in the last week.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;LIST
WHERE file.mtime &amp;gt;= date(today) - dur(1 week)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is useful because looking at all the files I modified in a week is a great hack to source content for these weekly updates.&lt;/p&gt;
&lt;h2&gt;New (To Me) Python Syntax&lt;/h2&gt;
&lt;p&gt;This week I came across two Python expressions I hadn&apos;t seen before, though both are several years old. The first is a shorthand to add dictionaries, the second is a way of marking positional-only arguments in a function.&lt;/p&gt;
&lt;h3&gt;Dictionary Addition&lt;/h3&gt;
&lt;p&gt;The pipe operator &lt;code&gt;|&lt;/code&gt; can be used to combine two dictionaries like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; a = {&quot;this&quot;: &quot;that&quot;}

&amp;gt;&amp;gt;&amp;gt; b = {&quot;foo&quot;: &quot;bar&quot;}

&amp;gt;&amp;gt;&amp;gt; # Regular addition with + doesn&apos;t work
... a + b
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
TypeError: unsupported operand type(s) for +: &apos;dict&apos; and &apos;dict&apos;

&amp;gt;&amp;gt;&amp;gt; # Use | instead
... a | b
{&apos;this&apos;: &apos;that&apos;, &apos;foo&apos;: &apos;bar&apos;}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This was introduced in &lt;a href=&quot;https://peps.python.org/pep-0584/&quot;&gt;PEP-584 - Add Union Operators to dict&lt;/a&gt; and released in &lt;a href=&quot;https://docs.python.org/3/whatsnew/3.9.html&quot;&gt;Python 3.9&lt;/a&gt;. Previously you would have had to do something like &lt;code&gt;{**a, **b}&lt;/code&gt; (assuming you don&apos;t want to modify either dictionary).&lt;/p&gt;
&lt;h3&gt;Restricting Positional-Only Function Arguments&lt;/h3&gt;
&lt;p&gt;The second syntax is that you can use the slash symbol &lt;code&gt;/&lt;/code&gt; to denote function variables that &lt;em&gt;must&lt;/em&gt; be positional-only as opposed to keyword arguments. Here&apos;s an example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; def a(arg_1, /, arg_2):
...     print(arg_1, arg_2)
...

&amp;gt;&amp;gt;&amp;gt; a(&quot;this&quot;, &quot;that&quot;)
this that

&amp;gt;&amp;gt;&amp;gt; # This is fine
... a(&quot;this&quot;, arg_2=&quot;that&quot;)
this that

&amp;gt;&amp;gt;&amp;gt; # This is not
... a(arg_1=&quot;this&quot;, arg_2=&quot;that&quot;)
Traceback (most recent call last):
  File &quot;&amp;lt;stdin&amp;gt;&quot;, line 1, in &amp;lt;module&amp;gt;
TypeError: a() got some positional-only arguments passed as keyword arguments: &apos;arg_1&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This was introduced in &lt;a href=&quot;https://docs.python.org/3/whatsnew/3.8.html#positional-only-parameter&quot;&gt;Python 3.8&lt;/a&gt; back in 2019 along with the &quot;walrus&quot; operator (&lt;code&gt;:=&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;These updates got me thinking about my first Python version. I&apos;ve been using Python since 2008, which would put me at about Python 2.5 as my first version. It&apos;s pretty neat to watch the language continue to evolve and change after 16 years, even if I am late to the party sometimes.&lt;/p&gt;
&lt;h2&gt;My First LazyVim Configuration File&lt;/h2&gt;
&lt;p&gt;During this sabbatical I&apos;ve been seriously trying to use Vim, more precisely NeoVim (even more precisely LazyVim) as my main text editor and IDE. My big jump this week is that I finally got comfortable enough with my setup to create a configuration file. To LazyVim&apos;s credit, the defaults are so sane that I really didn&apos;t have much motivation to change anything for months.&lt;/p&gt;
&lt;p&gt;Configuration files in LazyVim are fundamentally Lua functions that return maps of the configuration settings. LazyVim&apos;s functionality is built around an ecosystem of plugins so to configure your settings you need to know 1) which plugin is controlling the setting you want to configure and then 2) the configuration schema for that particular package.&lt;/p&gt;
&lt;p&gt;This is all kind of a steep learning curve but really not that bad once you get the hang of it. For example, here is the config file I wrote to display hidden dot files in Neotree, the sidebar tree view plugin.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;return {
  &quot;nvim-neo-tree/neo-tree.nvim&quot;,
  opts = {
    filesystem = {
      filtered_items = {
        visible = true,
        show_hidden_count = true,
        hide_dotfiles = false,
        hide_gitignored = true,
        hide_by_name = {
          -- &apos;.git&apos;,
          -- &apos;.DS_Store&apos;,
          -- &apos;thumbs.db&apos;,
        },
        never_show = {},
      },
    },
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One of the things I had to figure out is that, unlike a Python module, the file name isn&apos;t special. LazyVim is opening all the Lua files in the config search path and loading all the configurations they return.&lt;/p&gt;
&lt;h2&gt;Using Editable Modules in Jupyter Notebooks&lt;/h2&gt;
&lt;p&gt;A somewhat eccentric development pattern I often find myself using is editing a Python module while &lt;em&gt;also&lt;/em&gt; importing and using that same module in a Jupyter notebook.&lt;/p&gt;
&lt;p&gt;To me, this allows you to have the data exploration tools of a notebook along with the best practices and tooling of a normal Python project. So as I&apos;m working on a data project I might start migrating functions out of my notebook into my module when they feel done. This forces me to build useful pipelines of code but still allows me an interactive environment to explore the data outputs I&apos;m producing.&lt;/p&gt;
&lt;p&gt;To work in this way you need to tell the Jupyter server to refresh the already loaded module. If you&apos;re only doing this once you can just restart the notebook server. But if you&apos;re doing it constantly you can set a flag to force module reloading. This is one of those syntaxes I have to look up every year and throw at the wall for a bit until it works, so I finally decided to take some notes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Loads the `autoreload` extension
%load_ext autoreload

# Reload all modules imported with `%aimport` every time before executing the Python code typed. Same as `%autoreload 1`
%autoreload explicit

# Reload all modules (except those excluded by `%aimport`) every time before executing the Python code typed. Same as `%autoreload 2`
%autoreload all

# List modules which are to be automatically imported or not to be imported.
%aimport

# Import modules ‘foo’, ‘bar’ and mark them to be autoreloaded for `%autoreload 1`
%aimport foo, bar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I think this development pattern should manage to upset everyone; data scientists won&apos;t want the hassle of pushing data down to a module and developers won&apos;t want to use a notebook environment. But to me, it&apos;s the best of both worlds.&lt;/p&gt;
</content:encoded></item><item><title>2024-09-09 Weekly Update - Evergreen notes, LLMs, programming taste</title><link>https://acviana.com/posts/2024-09-09-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-09-09-weekly-update/</guid><description>Sabbatical Week 11 - evergreen notes, LLMs, and programming taste</description><pubDate>Mon, 09 Sep 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I feel like this weekly update experiment is coming along. Even though I&apos;m a little late on this week&apos;s update I still feel:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;motivated to share&lt;/li&gt;
&lt;li&gt;able to knock out some quick notes&lt;/li&gt;
&lt;li&gt;without being a perfectionist&lt;/li&gt;
&lt;li&gt;confident enough to share&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In part this is from the positive encouragement I receive every week - thank you!&lt;/p&gt;
&lt;p&gt;If you need a soundtrack while you read, may I recommend &lt;a href=&quot;https://open.spotify.com/track/7qK7Bk0ksb05zFPUv8pgag?si=46231c7421ce473e&quot;&gt;&quot;Obra Akyedzi&quot;&lt;/a&gt;, an upbeat afrobeat track by Ghanaian musician &lt;a href=&quot;https://en.wikipedia.org/wiki/Ebo_Taylor&quot;&gt;Ebo Taylor&lt;/a&gt; with production by Ali Shaheed Muhammad (A Tribe Called Quest).&lt;/p&gt;
&lt;p&gt;Here&apos;s what I&apos;ve been up to.&lt;/p&gt;
&lt;h3&gt;Evergreen Notes&lt;/h3&gt;
&lt;p&gt;As I mentioned last week I&apos;ve been interested in Andy Matuschak&apos;s concept of &lt;a href=&quot;https://notes.andymatuschak.org/z5E5QawiXCMbtNtupvxeoEX&quot;&gt;&quot;Evergreen Notes&quot;&lt;/a&gt;. In short, he feels notes should be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;atomic&lt;/li&gt;
&lt;li&gt;concept-oriented&lt;/li&gt;
&lt;li&gt;densely linked&lt;/li&gt;
&lt;li&gt;ontological&lt;/li&gt;
&lt;li&gt;written for yourself first&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&apos;ve tried to put this into practice by regularly refining my talking points for my job interviews. It&apos;s still early, but I think I like this approach. It feels more like refining my understanding of discrete topics and not just dumping information into yet another unlinked and unread note.&lt;/p&gt;
&lt;h3&gt;LLMs and Pipelines&lt;/h3&gt;
&lt;p&gt;My main focus this week has been helping a local founder create a data pipeline for an LLM-based medical document summarization tool. As I mentioned last week, I&apos;m using ChatGPT, Python, and the Instructor library for most of this work. What&apos;s most striking to me so far about working with LLMs is how ... familiar it all feels.&lt;/p&gt;
&lt;p&gt;While the iteration process of prompt engineering is novel, much of the scaffolding for the project centers around familiar data engineering problems. For example, I want to make it really easy for us to run experiments and understand the results. Of course, the LLM is doing incredible and novel work, but most of the work to enable that is familiar data pipelining. This means I spend my time thinking about data schemas, validation and serialization, and function parameters.&lt;/p&gt;
&lt;p&gt;Which leads me to my next point.&lt;/p&gt;
&lt;h3&gt;Taste in Programming&lt;/h3&gt;
&lt;p&gt;Last week &lt;a href=&quot;https://blog.gitbutler.com/why-github-actually-won/&quot;&gt;this article&lt;/a&gt; by GitHub co-founder Scott Chacon was making the rounds on Hacker News. He argued that GitHub &quot;won&quot; in part because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It was at the right place at the right time&lt;/li&gt;
&lt;li&gt;It had good taste&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can disagree with that conclusion but I still think it&apos;s important to develop your personal sense of programming &quot;taste&quot;. By taste I don&apos;t quite mean programming techniques like TDD or language features like types. I mean something closer to your personal definition of &quot;good&quot;.&lt;/p&gt;
&lt;p&gt;Lately I&apos;ve been thinking more about my own taste in programming. In part, this is because I&apos;m writing more code than usual. More specifically, this LLM project has given me a playground where I can both set the style guide and have complete architectural control. At the same time, unlike a toy project, the codebase is getting big enough that I&apos;m starting to notice my own inconsistencies and eccentricities. Things like variable names, function scope, abstractions, etc.&lt;/p&gt;
&lt;p&gt;There are a lot of best practices and paradigms to choose from, but there&apos;s also a certain element of taste at play. This Medium article on &quot;&lt;a href=&quot;https://medium.com/@iftimiealexandru/data-pipeline-recipes-in-python-8561e07b2556&quot;&gt;Data pipeline recipes in Python&lt;/a&gt;&quot; gives a good sense of some of the design decisions I&apos;m talking about. But the most helpful reference I have here is actually not a technical one.&lt;/p&gt;
&lt;p&gt;Instead, I&apos;ve found myself thinking about the men&apos;s fashion commentator Derek Guy (aka &lt;a href=&quot;https://x.com/dieworkwear&quot;&gt;@DieWorkWear&lt;/a&gt;). One of the points he continually makes is that &quot;good taste&quot; is about well-executed coherent ideas. If you want to look like a cowboy, or a 90&apos;s rapper, or landed gentry, then your outfit should coherently convey that. (Of course you can choose to break all the rules, but this only tends to go well when you intimately understand the rules you are breaking.)&lt;/p&gt;
&lt;p&gt;I think it&apos;s the same with code: you want coherent ideas, well executed. As my codebase grows, I find myself asking more and more &quot;why am I doing this?&quot; and maybe more importantly &quot;is it consistent with my other decisions in this codebase?&quot;.&lt;/p&gt;
&lt;h3&gt;Odds and Ends&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;As I mentioned above Derek Guy (aka &lt;a href=&quot;https://x.com/dieworkwear&quot;&gt;@DieWorkWear&lt;/a&gt;) is obviously a great follow if you have any interest in men&apos;s fashion. But his writing is also an overall excellent example of analysis, especially in how he combines technical details (&quot;this is how a sport coat is constructed&quot;) with big-picture framing (&quot;this is the purpose of a sport coat&quot;).&lt;/li&gt;
&lt;li&gt;Here&apos;s another Hacker News post from last month I enjoyed, &lt;a href=&quot;https://emnudge.dev/blog/markov-chains-are-funny/&quot;&gt;&quot;Markov chains are Funnier than LLMs&quot;&lt;/a&gt;. I enjoyed this in part because I enjoy meta-analysis of comedy (I&apos;m really fun at parties). But, I also enjoyed the technical point that Markov chains produce distributions of outcomes (the edges of which can be inadvertently funny) but LLMs only produce the mean outcome (which is not funny).&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2024-09-06 Weekly Update - Structured outputs for LLMs and proving the binomial theorem</title><link>https://acviana.com/posts/2024-09-06-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-09-06-weekly-update/</guid><description>Sabbatical Week 10 - structured outputs for LLMs and proving the binomial theorem</description><pubDate>Fri, 06 Sep 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This is the summary of what I was up to in the 10th week of my sabbatical. This write-up focuses on math proofs, working with LLMs, and wraps up with some odds and ends. A small update: my work on a web scraper for a friend&apos;s company last week helped them close a customer deal!&lt;/p&gt;
&lt;p&gt;I continue to really enjoy connecting with everyone who reaches out after reading these posts. I&apos;m talking to folks about everything from career advice, open roles, kicking around data problems, and just catching up. If you&apos;re on the fence, please reach out, I would love to hear from you!&lt;/p&gt;
&lt;p&gt;If you need a soundtrack to read to, let me suggest &lt;a href=&quot;https://open.spotify.com/track/5JJRxktdvtSjN3AeITJNCs?si=5f4c0c7ed4f341e6&quot;&gt;&quot;Glass, Concrete &amp;amp; Stone&quot;&lt;/a&gt; by David Byrne. You might recognize it from the 3rd season of &quot;The Bear&quot;.&lt;/p&gt;
&lt;h2&gt;Proving The Binomial Theorem&lt;/h2&gt;
&lt;p&gt;I finally gave in and looked up the answer to the proof of the binomial theorem I was working on in Hammack&apos;s &quot;The Book of Proof&quot;. I had the right set-up but never cracked the middle step. But even after looking at the answer I wasn&apos;t following how to make the jump.&lt;/p&gt;
&lt;p&gt;Because this is a famous proof, I felt confident asking ChatGPT for help. Still, I was genuinely surprised. Not only did ChatGPT produce what appears to be a correct proof (as far as I can tell), when asked it was also able to specifically expand on the step that was confusing me. The trick was connecting the $S_k$ and $S_{k+1}$ statements with a substitution of the form $j = k + 1$. This is a recurring technique in these induction proofs. In this case, it allows you to change the indexes on a sum and use an identity of the binomial coefficient. You can find the transcript of the chat &lt;a href=&quot;https://chatgpt.com/share/c22a1c50-aec3-45b5-8d5a-4bfa3a40b625&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you&apos;re interested in LLMs and their ability to work on novel proofs (not just repeat famous ones) you should check out this recent talk by Terence Tao on &lt;a href=&quot;https://www.youtube.com/watch?v=e049IoFBnLA&quot;&gt;AI and Mathematics&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;The Opportunity Cost of Pure Math&lt;/h2&gt;
&lt;p&gt;I&apos;m glad I now understand this proof, but I go back and forth on whether this level of rigor and completeness is a good use of my time. Should I be putting this much effort into being able to solve these proofs? Or should I be trying to get to &quot;good enough&quot; so I can cover more material? It would probably be more helpful in my career to be working on something more applied like standard data science or statistics techniques. And it would be more enjoyable to &quot;fast forward&quot; to more abstract and complex ideas like vector spaces.&lt;/p&gt;
&lt;p&gt;There is an opportunity cost to everything we do. Especially as I get older and my family life is changing, time is more precious than ever. And there is also a part of me that is &lt;em&gt;embarrassed&lt;/em&gt; that math doesn&apos;t come easier to me and that my progress is so gradual. But at the end of the day, I enjoy the work and I feel much &quot;stronger&quot; mathematically than I did 2 years ago.&lt;/p&gt;
&lt;p&gt;In practice, I do think I could be better about capping the amount of time I spend working on any one problem before getting help. I have to remind myself the point is to learn, not have a stroke of genius or reinvent math on my own.&lt;/p&gt;
&lt;h2&gt;LLMs for Document Summarization&lt;/h2&gt;
&lt;p&gt;My big project this week has been helping a pre-seed founder with some prompt engineering for an LLM-based medical summarization tool. The first thing I added to the project was to start treating each LLM run as an &lt;em&gt;experiment&lt;/em&gt;. This means that for every run we need to store not only the results but all the metadata needed to recreate the results. This means data like the model, the data inputs, the prompts, etc.&lt;/p&gt;
&lt;p&gt;Once I had that output format down I tried to compare a few results. It only took me a few minutes to realize we needed structured LLM outputs! Fortunately, I had been keeping up with LLM tooling and I knew that the &lt;a href=&quot;https://python.useinstructor.com/&quot;&gt;Instructor package&lt;/a&gt;, built on top of &lt;a href=&quot;https://docs.pydantic.dev/latest/&quot;&gt;Pydantic&lt;/a&gt;, was the go-to solution for this. To quote Jason Liu, the creator of Instructor, &lt;a href=&quot;https://x.com/jxnlco/status/1832879090917061076&quot;&gt;&quot;Pydantic is all you need&quot;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&apos;ll give a quick overview of structured outputs and why they&apos;re useful. Let&apos;s use dentistry as an example. Your typical unstructured text output from a generative LLM might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;The patient came in on Nov. 3rd for a cleaning and we found one cavity.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point here isn&apos;t whether the summary is right or wrong; the point is that because this is an unstructured, human-readable response, it&apos;s not clear how we would assess the accuracy of this output &lt;em&gt;at scale&lt;/em&gt;. For example, if we had 1k records and 5 different prompts we wanted to try, how would we be able to parse and compare all those outputs?&lt;/p&gt;
&lt;p&gt;We can get a lot closer to a scalable solution just by enforcing a structured response. For example, the above response could become:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
 &quot;procedure_date&quot;: &quot;2022-11-03&quot;,
 &quot;procedure_type&quot;: &quot;cleaning&quot;,
 &quot;has_cavity&quot;: true
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This model is something we could build automated testing and scoring around! There&apos;s still a lot of complexity to be addressed but this framework gives us something stable to build against.&lt;/p&gt;
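&lt;p&gt;As a sketch of what that scoring could look like (standard library only, with illustrative field names rather than any real schema), you could parse each structured response and compare it field by field against a ground-truth record:&lt;/p&gt;

```python
import json

# Hypothetical ground truth for one record; field names are illustrative only.
truth = {"procedure_date": "2022-11-03", "procedure_type": "cleaning", "has_cavity": True}

# Structured outputs from two different prompts for the same record.
outputs = [
    '{"procedure_date": "2022-11-03", "procedure_type": "cleaning", "has_cavity": true}',
    '{"procedure_date": "2022-11-03", "procedure_type": "exam", "has_cavity": true}',
]


def score(raw: str, truth: dict) -> float:
    """Fraction of fields the model got exactly right."""
    parsed = json.loads(raw)
    return sum(parsed.get(k) == v for k, v in truth.items()) / len(truth)


scores = [score(raw, truth) for raw in outputs]
```

With structured outputs, comparing 1k records across 5 prompts becomes a loop over scores instead of a manual reading exercise.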
&lt;h2&gt;Odds and Ends&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The web scraper I worked on last week helped my friend&apos;s company win a customer by creating a better search index of the customer&apos;s content.&lt;/li&gt;
&lt;li&gt;I updated the node dependency on this blog which fixed some issues with recent posts appearing in the blog index (&lt;a href=&quot;https://github.com/acviana/vercel-nextjs-blog/pull/26&quot;&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;I started actually saving more bookmarks to my DuckDB bookmark app. Here are some of my favorites from this week:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://lazyvim-ambitious-devs.phillips.codes/&quot;&gt;LazyVim for Ambitious Developers&lt;/a&gt; - The missing NeoVim/LazyVim guide I&apos;ve been looking for!&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://press.princeton.edu/books/paperback/9780691158662/office-hours-with-a-geometric-group-theorist&quot;&gt;Office Hours with a Geometric Group Theorist&lt;/a&gt; - Content like this reminds me why I&apos;m working so hard on my proofs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://notes.andymatuschak.org/About_these_notes?stackedNotes=z5E5QawiXCMbtNtupvxeoEX&quot;&gt;Andy Matuschak on Evergreen Notes&lt;/a&gt; - Useful advice on how to take notes that are actually useful for building understanding.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>2024-08-30 Weekly Update - Web scraping and prompt engineering</title><link>https://acviana.com/posts/2024-08-30-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-08-30-weekly-update/</guid><description>Sabbatical Week 8 - some web scraping and prompt engineering</description><pubDate>Fri, 30 Aug 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This is my 3rd weekly sabbatical summary and I&apos;m happy with how this series is shaping up. Writing this week&apos;s post started to feel more like the quick and direct process I&apos;m working towards. As with past weeks, I continue to be pleasantly surprised by the amount of engagement these posts generate. This week&apos;s post is focused on some advising work I&apos;ve been doing with some other odds and ends thrown in to round things out. I hope you enjoy it.&lt;/p&gt;
&lt;p&gt;If you need a soundtrack while you read, Chance the Rapper dropped &lt;a href=&quot;https://open.spotify.com/track/51wZRATIHtYIfb0tMpp3e2?si=3aa644a36f5a46bb&quot;&gt;one more summer single&lt;/a&gt; before Labor Day.&lt;/p&gt;
&lt;h2&gt;Consulting&lt;/h2&gt;
&lt;p&gt;My most exciting update this week is that I got to do some hands-on work for two companies. I&apos;ve been advising 1-2 companies or investors a week during my sabbatical, but this is the first time I&apos;ve gotten to help out with a technical problem. The first was a web scraper and the second is an LLM application.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Web Scraping:&lt;/strong&gt; The first project was a one-off web scraper for a friend&apos;s company to help them gather content for a customer demo. I began my startup career working for a security company that ran a large dark web scraper, and it was fun to return to that problem space for a few hours. I originally started the project in &lt;a href=&quot;https://scrapy.org/&quot;&gt;Scrapy&lt;/a&gt;, which is a nice structured crawling and scraping framework. But Scrapy started to feel like overkill for this task, so I wrote a simpler script using just requests and &lt;a href=&quot;https://beautiful-soup-4.readthedocs.io/en/latest/&quot;&gt;Beautiful Soup&lt;/a&gt;. This was pretty quick work and ChatGPT helped out a lot by generating the HTML tag selectors I needed.&lt;/p&gt;
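&lt;p&gt;The scraper itself isn&apos;t included here, but the general shape is easy to sketch with just the standard library (swapping in &lt;code&gt;html.parser&lt;/code&gt; for Beautiful Soup and a hardcoded page for the live fetch):&lt;/p&gt;

```python
from html.parser import HTMLParser

# A tiny stand-in for a fetched page; a real scraper would download this
# first with requests or urllib.
PAGE = """
<html><body>
  <h2 class="title">First article</h2>
  <h2 class="title">Second article</h2>
  <h2 class="other">Sidebar</h2>
</body></html>
"""


class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="title"> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


scraper = TitleScraper()
scraper.feed(PAGE)
```

Beautiful Soup replaces all of this with one `soup.select("h2.title")` call, which is why it&apos;s the usual choice for quick jobs like this.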
&lt;p&gt;&lt;strong&gt;Prompt engineering:&lt;/strong&gt; The second project I&apos;m working on is an ongoing effort to help a pre-seed founder with some generative AI prompt engineering and QA validation. The use case is medical document summarization. This is a really interesting project for me. While I&apos;ve been using generative AI both locally and via ChatGPT for nearly a year, this is my first time doing so for a commercial use case. I&apos;m still in the early stages but already learning a lot.&lt;/p&gt;
&lt;h2&gt;Math and CS&lt;/h2&gt;
&lt;p&gt;I didn&apos;t want to completely skip working on any math this week, so I&apos;m working on an induction proof of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Binomial_theorem&quot;&gt;Binomial Theorem&lt;/a&gt;. I think I have the shape of the proof; I just need to figure out how to connect the dots. As always, I struggle with deciding when I&apos;ve reached the point of diminishing returns working on my own vs. learning from the answer. Whenever I do wrap this up, the next problem starts a series of Fibonacci-related questions which I&apos;m looking forward to.&lt;/p&gt;
&lt;p&gt;I didn&apos;t get to any LeetCode this week but that was the trade off for doing more software projects than usual.&lt;/p&gt;
&lt;h2&gt;Odds and Ends&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Bookmarking Tool:&lt;/strong&gt; I made a little bit of progress on my &lt;a href=&quot;https://github.com/acviana/bookmark-thing&quot;&gt;bookmark tool&lt;/a&gt; project, mostly small updates to the queries, and I started pushing data up to MotherDuck. It&apos;s basically at the point where I could start using it to track links.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Neovim:&lt;/strong&gt; I&apos;ve been using Neovim casually for most of 2024. But, as I&apos;ve been &lt;a href=&quot;https://x.com/AlexVianaPro/status/1829908268929728957&quot;&gt;telling folks online&lt;/a&gt;, watching me use Neovim is like watching a caveman trying to rub two Teslas together to make fire (the follow-up joke writes itself). I&apos;m slowly getting better at learning the key mappings and installing the extensions I need but what I really need to do is rip off the band-aid and start customizing my setup. Soon!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Blog Framework:&lt;/strong&gt; I&apos;ve been using my current blog setup for about 2 years now. If you have a blog, you know that means I&apos;m getting the itch to rip it out and start over (and then blog about it). So I&apos;ve spent a little time trying out different frameworks to see what else is out there. TBD if I&apos;ll actually go through with this!&lt;/p&gt;
&lt;p&gt;That&apos;s it! Thank you for reading, and if you&apos;re on the fence about reaching out for any reason, please do; I&apos;d love to hear from you!&lt;/p&gt;
</content:encoded></item><item><title>2024-08-23 Weekly Update - More math proofs, LeetCode, DuckDB, and networking</title><link>https://acviana.com/posts/2024-08-23-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-08-23-weekly-update/</guid><description>Sabbatical Week 7 - Some in-person events plus more proofs, LeetCode, and DuckDB</description><pubDate>Fri, 23 Aug 2024 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;... the most important thing is to have fun!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;- Reader feedback on my last post&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Unpopular opinion I agree with- learning should not be fun.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;- Andrej Karpathy &lt;a href=&quot;https://x.com/udayan_w/status/1824715943919886338&quot;&gt;Twitter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Last week I wrote my &lt;a href=&quot;https://www.acviana.com/posts/2024-08-16-weekly-update&quot;&gt;first weekly update&lt;/a&gt;. My only goal was to write something up without belaboring it too much. I was pleasantly surprised to get a wide range of positive feedback from readers (I have readers!?!).&lt;/p&gt;
&lt;p&gt;This included encouragement to continue writing from colleagues ranging from investors to former coworkers. I got a thank you from someone I gave helpful career advice to years ago at a conference, connected with someone looking for career advice, and got a couple of intros to folks in my network. I&apos;m humbled and appreciative of all the engagement.&lt;/p&gt;
&lt;p&gt;This week, my goal is to see if I can start to make a habit of these updates while I&apos;m on sabbatical. I&apos;m hoping they get easier each week as I build on what I&apos;ve written before.&lt;/p&gt;
&lt;p&gt;This week I&apos;ll be talking about more math proofs, some LeetCode, my bookmark tool side project, some networking events, and my job search. If you need a soundtrack while you read, may I suggest &quot;&lt;a href=&quot;https://open.spotify.com/track/3diGyW1Q9dHoE9Qk1u4hXe?si=ec57d6a97b08411a&quot;&gt;Retreat&lt;/a&gt;,&quot; a 2023 jazz/funk scorcher by Chicago-based saxophonist Isaiah Collier.&lt;/p&gt;
&lt;h2&gt;Math&lt;/h2&gt;
&lt;p&gt;One of the trade-offs I made this week is that I didn&apos;t put much time into math work.&lt;/p&gt;
&lt;p&gt;I wrapped up the proof I was working on last week, which asks you to show that the &lt;a href=&quot;https://en.wikipedia.org/wiki/Harmonic_series_(mathematics)&quot;&gt;harmonic series&lt;/a&gt;, summed over its first $2^n$ terms, satisfies $\sum_{k=1}^{2^n} \frac{1}{k} \geq 1 + \frac{n}{2}$. This in effect shows that the harmonic series is unbounded. I got reasonably close to the solution on my own but was missing the final technique to complete the proof. Once I read the answer I got it, but I spent a little more time with the proof until I felt like the missing technique was a logical conclusion and not a flash of insight. This problem in particular, because it&apos;s about proving a bound, is a nice setup for the theory-of-calculus chapters that make up the last third of the book. If you&apos;re interested, I was able to eventually nudge ChatGPT to the &lt;a href=&quot;https://chatgpt.com/share/2913baf6-1b56-445d-bcd2-58ce6ede57fd&quot;&gt;correct proof&lt;/a&gt;.&lt;/p&gt;
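&lt;p&gt;For a taste of the argument without spoiling the whole proof: the inductive step hinges on grouping the $2^n$ new terms and bounding each by the smallest of them. Assuming $\sum_{k=1}^{2^n} \frac{1}{k} \geq 1 + \frac{n}{2}$, we get $\sum_{k=1}^{2^{n+1}} \frac{1}{k} \geq 1 + \frac{n}{2} + \sum_{k=2^n+1}^{2^{n+1}} \frac{1}{k} \geq 1 + \frac{n}{2} + 2^n \cdot \frac{1}{2^{n+1}} = 1 + \frac{n+1}{2}$, since there are $2^n$ new terms each at least $\frac{1}{2^{n+1}}$.&lt;/p&gt;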
&lt;p&gt;The other bit of math study I did was wrapping up my notes and a few of the problems from Wasserman&apos;s &quot;&lt;a href=&quot;https://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/1441923225&quot;&gt;All of Statistics&lt;/a&gt;&quot;. This included finishing up a proof on the continuity of probabilities (again, ChatGPT can walk you through &lt;a href=&quot;https://chatgpt.com/share/6a0590d6-edb5-473c-bf7f-3dd0348d1995&quot;&gt;the proof&lt;/a&gt;). Working through this book feels like a bit more of a grind. I&apos;m not sure if it&apos;s the subject matter or the book, but I&apos;m on the fence about whether I&apos;m going to continue with it right now.&lt;/p&gt;
&lt;h2&gt;Programming&lt;/h2&gt;
&lt;p&gt;I did some more LeetCode problems this week and focused on two areas. The first was going back and doing some &quot;medium&quot; difficulty problems in string manipulation, and the other was working through more &quot;easy&quot; dynamic programming problems.&lt;/p&gt;
&lt;p&gt;I&apos;m getting better at solving these problems but they&apos;re taking more time because I&apos;m now more interested in making sure I really understand all the standard CS techniques. Usually this means doing a problem 2 or 3 times to make sure I can get all the memory and runtime optimizations. This is hard, but it feels like I&apos;m getting the &quot;point&quot; of each exercise.&lt;/p&gt;
&lt;p&gt;Just like with my math proofs I&apos;m getting much better at efficiently learning and not just beating my head against the wall when I&apos;m stuck. What that looks like for me is aggressively cutting the problem down until I have identified a missing technique, then checking the answers to find that technique, then trying to apply it from memory.&lt;/p&gt;
&lt;p&gt;To be clear, I still get &lt;em&gt;very&lt;/em&gt; frustrated when I get stuck! But time boxing myself, taking breaks, and focusing on the big picture helps me keep my cool.&lt;/p&gt;
&lt;h2&gt;Bookmark Thing&lt;/h2&gt;
&lt;p&gt;I made a little more progress on my eloquently named &lt;a href=&quot;https://github.com/acviana/bookmark-thing&quot;&gt;bookmark thing&lt;/a&gt; side project. This started as an excuse to play with DuckDB some more while tracking interesting links. Because DuckDB is an OLAP database I wanted to see if I could just throw everything into one flat denormalized table.&lt;/p&gt;
&lt;p&gt;I wanted to use some of the list parsing functions to modify the tag list associated with each entry. This was working fine until I tried to perform an update on a tag list and started getting unexpected constraint errors. After some digging I realized that DuckDB currently executes list updates as a delete followed by an insert, which is known to cause these types of errors. If you&apos;re interested you can check out the relevant &lt;a href=&quot;https://github.com/duckdb/duckdb/issues/11915&quot;&gt;GitHub issue&lt;/a&gt; and the documentation &lt;a href=&quot;https://duckdb.org/docs/sql/data_types/list#updating-lists&quot;&gt;explanation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Instead of trying to figure out a workaround (such as handling this in the application code) I decided to just revert to a typical normalized database model and build a denormalized view on top of that.&lt;/p&gt;
&lt;p&gt;This new schema has the drawback that the database is now more of an OLTP workload, instead of the OLAP workload DuckDB is optimized for. But that&apos;s fine; I&apos;m still getting more exposure to DuckDB. I have some ideas for other datasets and other projects I could try that are a better fit for DuckDB&apos;s strengths.&lt;/p&gt;
&lt;p&gt;I now have working versions of my schema creation as well as the CRUD queries I want to run. I&apos;m eager to get this wrapped up in a Typer CLI interface but I think next week I&apos;ll be interacting with my code in SQL while moving it to MotherDuck, DuckDB&apos;s hosted offering.&lt;/p&gt;
&lt;p&gt;Overall, this project is reaffirming my belief in aggressively using your prototypes as much as possible before writing additional code.&lt;/p&gt;
&lt;h2&gt;Odds and ends&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I finally &lt;a href=&quot;https://github.com/acviana/vercel-nextjs-blog/pull/22&quot;&gt;updated&lt;/a&gt; the Twitter card template from the blog default.&lt;/li&gt;
&lt;li&gt;I really feel like I&apos;m starting to hit my stride with using ChatGPT to work on what I saw Charlie Marsh call &quot;side quests&quot; - stuff like changing the Twitter card. I&apos;ve been squashing tons of small annoying bugs and warnings this way and it&apos;s starting to become my go-to.&lt;/li&gt;
&lt;li&gt;Lastly, I&apos;ve been listening to Martin Kleppmann&apos;s excellent &quot;&lt;a href=&quot;https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/&quot;&gt;Designing Data-Intensive Applications&lt;/a&gt;&quot;. I finished up the chapters on database replication and partitioning and started the chapter on transactions. I&apos;m learning a lot more than I thought I would by listening to a technical book! That being said, I know I&apos;m only retaining a fraction of the material, but it&apos;s more than the 0% I would get by just looking at the book on my shelf and thinking &quot;I should read that&quot;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Networking&lt;/h2&gt;
&lt;p&gt;This week I started putting more time into networking and exploring job opportunities, in particular going to in-person events and sending out more applications.&lt;/p&gt;
&lt;p&gt;I went to two in-person events last week. The first was the monthly meetup of the &lt;a href=&quot;https://chicago.aitinkerers.org/&quot;&gt;Chicago AI Tinkerers&lt;/a&gt;. This was an awesome event hosted by Drive Capital. I saw some cool demos and made a couple of great connections. The second was a meta-meetup of tech organizations at &lt;a href=&quot;https://www.mhubchicago.com/&quot;&gt;mHUB&lt;/a&gt;, one of our local hard-tech innovation spaces. This was organized around the DNC convention happening downtown and was intended to be a showcase of the Chicago hard tech ecosystem. I got to meet companies working on sharing commercial import/export data, infrastructure for recycling, energy optimization platforms, and even other hard-tech incubators. I also met one of my &quot;neighbors&quot; in my coworking space and got to learn about what they&apos;re working on.&lt;/p&gt;
&lt;p&gt;Lastly, after talking to so many folks over the last few weeks, I finally feel like I&apos;m at a point where I&apos;m ready to start sending out general applications to job openings. I think my ideal job still rhymes with &quot;I&apos;ll know it when I see it&quot;, but I feel like I&apos;ll be able to spot it and jump on it.&lt;/p&gt;
&lt;p&gt;Again, if you want to chat about opportunities or talk shop, please reach out.&lt;/p&gt;
</content:encoded></item><item><title>2024-08-16 Weekly Update - math proofs, LeetCode, DuckDB, and networking</title><link>https://acviana.com/posts/2024-08-16-weekly-update/</link><guid isPermaLink="true">https://acviana.com/posts/2024-08-16-weekly-update/</guid><description>Week 6 of my sabbatical - math proofs, LeetCode, DuckDB, and networking</description><pubDate>Fri, 16 Aug 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Like most bloggers, I have a long backlog of unfinished posts. I aspire to write concise, well-scoped ideas that I can quickly get out the door. But I tend to revert to my natural discursive writing style and completionist tendencies, which makes things hard to finish.&lt;/p&gt;
&lt;p&gt;To work on that I&apos;m trying a &quot;weekly update&quot; format. I&apos;m drawing direct inspiration from Simon Willison&apos;s &quot;&lt;a href=&quot;https://simonwillison.net/tags/weeknotes/&quot;&gt;weeknotes&lt;/a&gt;&quot; as well as John D. Cook&apos;s short-form technical &lt;a href=&quot;https://www.johndcook.com/blog/&quot;&gt;blogging&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is my first attempt at this format and it&apos;s been a success in that I actually got something published! I hope you enjoy it. If you need a soundtrack while you read, may I suggest Four Tet&apos;s latest album &lt;a href=&quot;https://open.spotify.com/album/4pnU9CGKI2YneVwkqK6EIN?si=96fD3W52TWWQL_SOWhaCww&quot;&gt;Three +&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;I&apos;m on Sabbatical&lt;/h3&gt;
&lt;p&gt;My biggest update is that I&apos;ve left my former job and I&apos;m about 6 weeks into a self-proclaimed sabbatical. While I&apos;m looking for my next role I&apos;ve been focusing my energy on &quot;sharpening my tools&quot; by studying math and programming fundamentals (phrasing taken from &lt;a href=&quot;https://x.com/JohnDCook/status/1824425328409792685&quot;&gt;this tweet&lt;/a&gt;), networking with interesting folks, and just generally resetting and refocusing. Here are some examples of what&apos;s been interesting me.&lt;/p&gt;
&lt;h3&gt;Mathematical Induction Proofs&lt;/h3&gt;
&lt;p&gt;For nearly 2 years now I&apos;ve made a concerted effort to study math again. I spent 2023 working through Sheldon Axler&apos;s modestly titled &lt;em&gt;&quot;Linear Algebra Done Right (4th Edition)&quot;&lt;/em&gt;. The text is an algebraic approach to linear algebra that largely eschews the use of the determinant. I only got a few chapters in, but the material was eye-opening. As I worked through the book, though, I noticed I was getting too bogged down in the mechanics of completing the proofs and I was losing momentum.&lt;/p&gt;
&lt;p&gt;So this year I took a step &quot;back&quot; and have been working through Richard Hammack&apos;s &lt;em&gt;&quot;The Book of Proof (3rd Edition)&quot;&lt;/em&gt;. This book focuses on teaching the mechanics of proofs using a range of examples from different areas of math. It is exactly what I needed as an undergrad math major. I&apos;ve gotten a lot more comfortable with the tools needed to work on the math I&apos;m interested in learning (I even submitted a small typo!).&lt;/p&gt;
&lt;p&gt;Currently, I&apos;m working through the chapter on mathematical induction. Most of the problems so far have been about elementary number theory or series. I got stuck this week on a problem that implies the divergence of the harmonic series. But I think I&apos;ve just about unwound my confusion. The rest of the chapter has problems on the Fibonacci sequence, Pascal&apos;s triangle, and graph theory.&lt;/p&gt;
&lt;h3&gt;CS Fundamentals with LeetCode&lt;/h3&gt;
&lt;p&gt;In keeping with my interest in learning fundamentals, I&apos;ve also been working through basic CS problems on LeetCode, e.g. linked lists. Because I don&apos;t have a CS degree there are a lot of fundamental CS problems I&apos;ve only read about. While I conceptually understand them I&apos;ve never worked through them in practice. I used to shrug these off as being academic problems, and maybe they are, but I&apos;ve had a surprising amount of fun solving them. I feel like I&apos;m solving chess puzzles (something I enjoy) but I also feel like I&apos;m closing some important gaps in my knowledge.&lt;/p&gt;
&lt;p&gt;This week I&apos;ve been working on Binary Search Trees and Binary Search. Coincidentally, this involves a lot of recursion which complements the math proofs I&apos;m working on.&lt;/p&gt;
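&lt;p&gt;As a taste of the recursion involved, here&apos;s a minimal recursive binary search over a sorted list (a generic sketch, not any specific LeetCode solution):&lt;/p&gt;

```python
def binary_search(items, target, lo=0, hi=None):
    """Return the index of target in sorted list items, or -1 if absent."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:  # empty search window: target not present
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search(items, target, mid + 1, hi)  # search right half
    return binary_search(items, target, lo, mid - 1)      # search left half


data = [1, 3, 5, 8, 13, 21]
```

Each call discards half the remaining window, which is the same divide-and-shrink structure an inductive proof leans on.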
&lt;h3&gt;Building a Bookmark Tool with DuckDB&lt;/h3&gt;
&lt;p&gt;I&apos;ve also been feeling the itch to do something more applied so I started working on &lt;a href=&quot;https://github.com/acviana/bookmark-thing&quot;&gt;a small bookmarking tool&lt;/a&gt; to keep track of interesting links. I decided to do this in &lt;a href=&quot;https://duckdb.org/&quot;&gt;DuckDB&lt;/a&gt; which is a tool I&apos;ve wanted to dig into more. It&apos;s also giving me a chance to check out MotherDuck, their hosted service.&lt;/p&gt;
&lt;p&gt;Right now the &quot;app&quot; is just a series of SQL queries while I prototype the different operations I want to perform, mostly CRUD operations and trivial analytics. I have some more queries I want to add, specifically around tag manipulations. Shortly, I want to turn this into a Python CLI app with the &lt;a href=&quot;https://typer.tiangolo.com/&quot;&gt;Typer&lt;/a&gt; library.&lt;/p&gt;
&lt;p&gt;One thing I&apos;ve been pleasantly surprised by is the number of convenience functions in DuckDB and how pythonic some of the syntax is. There are functions for testing list membership, you can index and slice lists just like in Python, you can declare checks at the column schema level, and there are even &lt;a href=&quot;https://duckdb.org/2023/08/23/even-friendlier-sql.html#list-comprehensions&quot;&gt;list comprehensions&lt;/a&gt;!&lt;/p&gt;
&lt;h3&gt;Networking&lt;/h3&gt;
&lt;p&gt;I finally shared on LinkedIn that I&apos;m between jobs and looking to use this time to network and as a result I&apos;ve had tons of interesting meetings. Just this week alone I&apos;ve talked to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A couple of VCs who want to connect me to portfolio companies&lt;/li&gt;
&lt;li&gt;The investment arm of a family office&lt;/li&gt;
&lt;li&gt;The tech transfer office of a national lab&lt;/li&gt;
&lt;li&gt;A local data science director&lt;/li&gt;
&lt;li&gt;A founder of a stealth mode database company&lt;/li&gt;
&lt;li&gt;A few local CTOs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s been a blast! I&apos;m meeting a ton of smart people and it&apos;s helping me better understand what I want to be working on.&lt;/p&gt;
&lt;p&gt;If &lt;em&gt;you&lt;/em&gt; want to chat about anything related to tech, please reach out.&lt;/p&gt;
</content:encoded></item><item><title>Migrating Posts From My GitHub Blog</title><link>https://acviana.com/posts/2024-06-20-migrating-github-blog-posts/</link><guid isPermaLink="true">https://acviana.com/posts/2024-06-20-migrating-github-blog-posts/</guid><description>Porting some old posts over from 2014-2017</description><pubDate>Wed, 26 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Over the years, I&apos;ve had four different technical blogs. A &lt;a href=&quot;https://www.tumblr.com/theothersideofthescreen-blog&quot;&gt;Tumblr&lt;/a&gt;, a static blog on &lt;a href=&quot;https://github.com/acviana/acviana.github.io&quot;&gt;Github Pages&lt;/a&gt;, a &lt;a href=&quot;https://alexcviana.substack.com/&quot;&gt;Substack&lt;/a&gt;, and now this blog. For a long time now I&apos;ve wanted to collect all my writing in one place. I&apos;ve finally started that project by porting all my posts from my long-dormant Github Pages blog over here. I still need to clean up some links and images but the content is ready to go. You can find all the posts under the &lt;code&gt;#github-blog&lt;/code&gt; tag.&lt;/p&gt;
&lt;p&gt;These posts started around 2013 when I was still working for the Hubble Space Telescope and showcase some of my astronomy pipelines and analysis projects I was working on. They end in 2017 when I had a leadership position at a startup and was an organizer for the local Python Meetup group.&lt;/p&gt;
&lt;p&gt;Looking back on these posts now, more than anything I&apos;m thankful that I took the time to document where I&apos;ve been along the way in my professional journey. It&apos;s so easy to forget where you&apos;ve been and how hard you&apos;ve worked. Looking back on my work gives me a sense of both pride and perspective. It also gives me motivation to write even more and raise the bar on my output!&lt;/p&gt;
</content:encoded></item><item><title>Fumbling Into Floats</title><link>https://acviana.com/posts/2024-05-01-fumbling-into-floats/</link><guid isPermaLink="true">https://acviana.com/posts/2024-05-01-fumbling-into-floats/</guid><description>Down a rabbit hole of floating point numbers</description><pubDate>Sat, 04 May 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;What Do You Mean?&lt;/h2&gt;
&lt;p&gt;Recently, I was working on a side project writing a &lt;a href=&quot;https://github.com/acviana/multiarmed-bandit-simulation/tree/main&quot;&gt;multi-armed bandit simulator&lt;/a&gt; from scratch. Part of the code requires iteratively appending to a series of numbers and calculating the mean. It turns out that instead of recalculating the mean over the entire series each time, you can use a more computationally efficient &lt;a href=&quot;https://math.stackexchange.com/a/106720&quot;&gt;incremental mean&lt;/a&gt;. Here&apos;s what that looks like in Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def incremental_mean(mean: float, observation: float, n: int) -&amp;gt; float:
    return mean + ((observation - mean) / n)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The benefit of using this incremental mean formula is that each incremental calculation is $\mathcal{O}(1)$ while calculating the mean of the entire series would be $\mathcal{O}(n)$. To see this, I benchmarked both methods with a series of 1000 randomly generated floats, once using my incremental mean function and once using the &lt;code&gt;statistics.mean&lt;/code&gt; function from the Python Standard Library. Here are the results:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;%%timeit
[statistics.mean(input_list[0 : index + 1]) for index in range(len(input_list))]
433 ms ± 62.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
rolling_incremental_mean(input_list)
513 µs ± 28.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
&lt;/code&gt;&lt;/pre&gt;
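&lt;p&gt;The benchmark calls a &lt;code&gt;rolling_incremental_mean&lt;/code&gt; helper that isn&apos;t shown above. A minimal version, assuming it just applies the incremental formula across the whole list, could look like this:&lt;/p&gt;

```python
def incremental_mean(mean: float, observation: float, n: int) -> float:
    # O(1) update: fold one new observation into the running mean.
    return mean + ((observation - mean) / n)


def rolling_incremental_mean(values: list) -> list:
    """Return the running mean after each element of values."""
    means = []
    mean = 0.0
    for n, value in enumerate(values, start=1):
        mean = incremental_mean(mean, value, n)
        means.append(mean)
    return means


result = rolling_incremental_mean([2.0, 4.0, 6.0])  # running means: 2.0, 3.0, 4.0
```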
&lt;p&gt;Fantastic, it&apos;s nearly 1000x faster! Great, we&apos;re done.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;We were not done.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;What the Float?!?&lt;/h2&gt;
&lt;p&gt;I swapped my faster incremental formula into my project, got the same results but faster, and moved on. But, while I was debugging a different error I decided to return to this function just to double-check. I tried to run an &lt;code&gt;assert&lt;/code&gt; against the two outputs from my benchmark to confirm that they were the same and the test failed.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;No surprise, probably just a typo!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Then I looped over the two outputs and checked them all pairwise. This time, it made it through a few dozen numbers before failing.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;That&apos;s weird, shouldn&apos;t the formula always be right or wrong?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Then I tried plotting the residuals (differences) between the two outputs and got this graph:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/incremental-mean-residuals.png&quot; alt=&quot;incremental mean residuals&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;That&apos;s weird ... why are the errors quantized? Shouldn&apos;t they be - oh right, floats.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;We All Float On Alright&lt;/h2&gt;
&lt;p&gt;Here&apos;s what I thought was going on, to first order.&lt;/p&gt;
&lt;p&gt;Numbers are infinite but computers are decidedly finite. This means any computational (physical) representation of the number line is going to have certain constraints. The most common representation computers use is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Floating-point_arithmetic&quot;&gt;floating-point numbers&lt;/a&gt;, or &quot;floats&quot;. This represents every number as some real number of finite length raised to an exponent, like $123.456e^{10}$.&lt;/p&gt;
&lt;p&gt;Now here&apos;s where the constraints come in. Because the number of digits in the prefix to the exponent (also called the &quot;mantissa&quot;) is limited, it means we can only construct a finite number of mantissas &lt;em&gt;per exponent&lt;/em&gt;. That means that as the exponents get smaller, the numbers we can represent get closer and closer. The exponents themselves have a limited range (much less than the range of the mantissa) which means at some point we get to the smallest interval we can express in our floating point system.&lt;/p&gt;
&lt;p&gt;I&apos;m hand-waving away some details (which we&apos;ll return to later) but that&apos;s what went through my mind the first time I saw the graph of my residuals. I noticed the errors were on the order of $1e^{-15}$. This is effectively zero for my purposes, so I figured I must have hit the floating point limit. I assumed there must be some rounding approximation in the standard library &lt;code&gt;statistics.mean&lt;/code&gt; function or something.&lt;/p&gt;
&lt;p&gt;Regardless of the details, this meant my function was correct so I could just stop there.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I did not stop there.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Down the Rabbit Hole&lt;/h2&gt;
&lt;p&gt;I really wanted to get back to my main project, but now my interest was piqued. Could I &lt;em&gt;prove&lt;/em&gt; that this was just rounding errors on floating point math?&lt;/p&gt;
&lt;p&gt;To start with, I confirmed that both the outputs each had 1000 distinct values which seemed normal. Now, how many different residual values were there?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt;len(residuals)
1000

&amp;gt;&amp;gt;&amp;gt;len(set(residuals))
18
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;OK, only 18 distinct values out of 1000. So now my suspicion is that this effect is being introduced when I subtract the two outputs. 18 isn&apos;t that many, let&apos;s take a look.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt;sorted(set(residuals))
  [-6.661338147750939e-16,
 -4.440892098500626e-16,
 -3.3306690738754696e-16,
 -2.220446049250313e-16,
 -1.1102230246251565e-16,
 -3.469446951953614e-18,
 0.0,
 5.551115123125783e-17,
 1.1102230246251565e-16,
 2.220446049250313e-16,
 3.3306690738754696e-16,
 4.440892098500626e-16,
 6.661338147750939e-16,
 8.881784197001252e-16,
 1.1102230246251565e-15,
 1.3322676295501878e-15,
 1.5543122344752192e-15]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Many of those seem suspiciously evenly spaced at something like $1.11e^{-16}$. You can see this more clearly if you plot them on a number line.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/floating-point-residuals-1d.png&quot; alt=&quot;Incremental mean residuals on a 1d number line&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We can calculate the distances between points to make sure they&apos;re really the same.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; residuals_set = sorted(set(residuals))
&amp;gt;&amp;gt;&amp;gt; set([
    residuals_set[i] - residuals_set[i + 1] 
    for i in range(len(residuals_set) - 1)
])

{-2.220446049250313e-16,
 -1.1102230246251565e-16,
 -1.0755285551056204e-16,
 -5.551115123125783e-17,
 -3.469446951953614e-18}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That seems promising: there are only 5 distinct distances. Maybe the smallest of those is the minimum floating point step size?&lt;/p&gt;
&lt;h2&gt;An Epsilon of Delta&lt;/h2&gt;
&lt;p&gt;It turns out you can inspect the float properties of your system with &lt;code&gt;sys.float_info&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import sys
&amp;gt;&amp;gt;&amp;gt; sys.float_info

sys.float_info(
    max=1.7976931348623157e+308, 
    max_exp=1024, 
    max_10_exp=308, 
    min=2.2250738585072014e-308, 
    min_exp=-1021, 
    min_10_exp=-307, 
    dig=15, 
    mant_dig=53, 
    epsilon=2.220446049250313e-16, 
    radix=2, 
    rounds=1
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The value we care about is &lt;code&gt;epsilon=2.220446049250313e-16&lt;/code&gt;. This value is one of our spacings, but confusingly it&apos;s not the smallest one. The Python docs define &lt;code&gt;float_info.epsilon&lt;/code&gt; as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;difference between 1.0 and the least value greater than 1.0 that is representable as a float.&lt;/p&gt;
&lt;/blockquote&gt;
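&lt;p&gt;You can verify that definition right in the REPL (a quick sanity check of my own): adding half of epsilon to 1.0 rounds back down to 1.0, while adding a full epsilon lands on the next representable float.&lt;/p&gt;

```python
import sys

eps = sys.float_info.epsilon

# Half an epsilon rounds away under round-to-nearest-even;
# a full epsilon reaches the next representable float above 1.0.
print(1.0 + eps / 2 == 1.0)  # True
print(1.0 + eps == 1.0)      # False
```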
&lt;p&gt;OK, that definition seems consistent with the idea that we&apos;re hitting the lower limit of what we can represent with floats. It doesn&apos;t quite explain why I see values smaller than that, but still, this seems like a reasonable place to stop!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;We&apos;re not stopping.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Epsilon Families&lt;/h2&gt;
&lt;p&gt;If you squint at the list of distinct residuals you might notice that many of them are offset by the &lt;code&gt;float_info.epsilon&lt;/code&gt; value. Actually, it looks like there are two &quot;families&quot; of residuals, each coming at intervals of &lt;code&gt;epsilon&lt;/code&gt;, but with one family offset by 1/2 &lt;code&gt;epsilon&lt;/code&gt;. This plot helps show that.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/floating-point-residuals-by-epsilon-family.png&quot; alt=&quot;Floating Point Residual by Epsilon Family&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The same data appears in both plots with the families color coded. On the top they all appear on the same number line. On the bottom they are separated for clarity and expressed in units of epsilon. You can see the two families, their spacings and offsets, as well as the two points that don&apos;t fit into that scheme.&lt;/p&gt;
&lt;h2&gt;It&apos;s Epsilons All The Way Down&lt;/h2&gt;
&lt;p&gt;At one point while working on this I remember thinking&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Boy, it&apos;s lucky that these numbers were multiples of roughly 1.11 or I wouldn&apos;t have been able to spot the pattern! (e.g. 2.22, 3.33, etc.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yeah, so about that. It turns out this idea of epsilon &quot;families&quot; wasn&apos;t quite right, and the clue was in the remaining points that I thought didn&apos;t &quot;fit&quot; the pattern. After making lots more plots I finally realized that &lt;em&gt;all&lt;/em&gt; the errors can be expressed in units of epsilon. Take a look:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; a = [
    -6.661338147750939e-16,
    -4.440892098500626e-16,
    -3.3306690738754696e-16,
    -2.220446049250313e-16,
    -1.1102230246251565e-16,
    -3.469446951953614e-18,
    0.0,
    5.551115123125783e-17,
    1.1102230246251565e-16,
    2.220446049250313e-16,
    3.3306690738754696e-16,
    4.440892098500626e-16,
    6.661338147750939e-16,
    8.881784197001252e-16,
    1.1102230246251565e-15,
    1.3322676295501878e-15,
    1.5543122344752192e-15,
]

&amp;gt;&amp;gt;&amp;gt; [item / sys.float_info.epsilon for item in a]
[-3.0,
 -2.0,
 -1.5,
 -1.0,
 -0.5,
 -0.015625,
 0.0,
 0.25,
 0.5,
 1.0,
 1.5,
 2.0,
 3.0,
 4.0,
 5.0,
 6.0,
 7.0]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;OK, so epsilon itself is not the smallest difference we can express (that was already clear). But it does look like the basis for representing all small differences, because every one of our errors is some rational multiple of epsilon.&lt;/p&gt;
&lt;p&gt;To test this I ran some more simulations and cranked up the sample size to 50k. The residuals moved around a lot more, as you can see in the plot below. But when I checked, every single one was still a multiple of epsilon.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/mean-and-residuals-large-dataset.png&quot; alt=&quot;Mean and Residuals for 50k Steps&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The pattern was something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For values larger than $\pm 2$ the step size is $k \cdot \epsilon$ where $k$ is an integer (or sometimes $\epsilon/2$).&lt;/li&gt;
&lt;li&gt;For values between $\pm 2$ and $\pm 1/4$ the step size was $\epsilon/4$.&lt;/li&gt;
&lt;li&gt;For values smaller than $\pm 1/4$ the step size goes like $\epsilon/2^{k}$ (e.g. $0.015625$ is $1/64$).&lt;/li&gt;
&lt;/ul&gt;
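&lt;p&gt;You can check this scaling directly with &lt;code&gt;math.ulp&lt;/code&gt; (available since Python 3.9; my own check, not something I used at the time), which gives the gap to the next representable float:&lt;/p&gt;

```python
import math

eps = 2.0 ** -52  # the same value as sys.float_info.epsilon

# The gap between adjacent floats, measured in units of epsilon,
# scales with the power-of-two range (binade) the value sits in.
for x in [4.0, 1.0, 0.25, 0.015625]:
    print(x, math.ulp(x) / eps)
```

&lt;p&gt;This prints ratios of 4.0, 1.0, 0.25, and 0.015625: exactly the $\epsilon \cdot 2^{k}$ scaling described above.&lt;/p&gt;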
&lt;p&gt;Interestingly, I didn&apos;t always see the same step sizes and values so I think there&apos;s a lot of optimization going on under the hood.&lt;/p&gt;
&lt;p&gt;But, all this to say I&apos;m satisfied that epsilon really &lt;em&gt;is&lt;/em&gt; the fundamental building block of small number representations (at least close to 0), which explains the quantization effect I saw.&lt;/p&gt;
&lt;h2&gt;At the Bottom of Everything&lt;/h2&gt;
&lt;p&gt;This is very much an inductive approach to understanding the problem, one where I&apos;m working backwards from the data. I could come at this from a deductive angle, but I think that would involve reading IEEE specs, CS textbooks, and Python source code. All that sounds fun (seriously!) but I want to return to my main project of simulating multi-armed bandits. So, at least for now, this is proof enough for me.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Now, we are finally done.&lt;/em&gt;&lt;/p&gt;
</content:encoded></item><item><title>The Data Chief Podcast Episode</title><link>https://acviana.com/posts/2024-01-24-the-data-chief-podcast-episode/</link><guid isPermaLink="true">https://acviana.com/posts/2024-01-24-the-data-chief-podcast-episode/</guid><description>Some thoughts on appearing on a professionally produced podcast</description><pubDate>Wed, 24 Jan 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Last July I was the guest on a data leadership podcast called &quot;The Data Chief&quot;. I&apos;ve done a decent amount of mid-sized public speaking but I had never been on a professionally produced podcast before. It was a great experience and we got to cover a lot of ground from my diverse work history (astronomy, infosec, health tech, developer tools), to becoming an executive, and some of my thoughts on running a data team at scale.&lt;/p&gt;
&lt;p&gt;I’ve had a slow but steady stream of people reaching out to me since then saying they enjoyed the recording and were able to apply some of what they learned. I thought it was a good nudge to write some thoughts down about the experience.&lt;/p&gt;
&lt;p&gt;You can find the episode webpage &lt;a href=&quot;https://www.thoughtspot.com/data-chief/ep75/balancing-long-term-vision-with-near-term-action-with-vercel-s-vp-of-data#transcript&quot;&gt;here&lt;/a&gt; as well as the embedded Spotify link below.&lt;/p&gt;
&lt;p&gt;&lt;iframe style=&quot;border-radius:12px&quot; src=&quot;https://open.spotify.com/embed/episode/6qajXurCEN1gX2Wfs3wJ9Z?utm_source=generator&quot; width=&quot;100%&quot; height=&quot;352&quot; frameBorder=&quot;0&quot; allowfullscreen=&quot;&quot; allow=&quot;autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture&quot; loading=&quot;lazy&quot;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;h3&gt;Highlights From My Conversation&lt;/h3&gt;
&lt;p&gt;A theme I tried to bring into the conversation was my perspective of a first time junior executive, someone who is on the executive team but not yet in the C-suite. I wanted to share some perspectives on how I got where I am, what I think worked for me, and what problems I’m working on.&lt;/p&gt;
&lt;p&gt;One of the ideas I emphasize for folks who want to move into leadership positions is understanding &lt;em&gt;your&lt;/em&gt; unique value as a leader. You can’t be everything to everyone, and to make it to the top of any hiring pile you have to have the self-awareness to understand what makes you special.&lt;/p&gt;
&lt;p&gt;For example, I’ve worked with a lot of security teams in my career. This is unusual and it’s influenced the way I think about risk. As I said on the show:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can never completely neutralize risk. You want to document the risk and accept it. And I think it&apos;s the same thing with data work. You don&apos;t want to, for every situation, have the same degree of precision, the same amount of data required. It&apos;s sometimes okay to be directionally accurate and to estimate things.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That’s probably not a make-or-break opinion, but it is something that is based on my experience and it sets me apart a little bit. Having the self-reflection to identify these things in yourself is something aspiring leaders need to be able to do.&lt;/p&gt;
&lt;p&gt;There’s lots more in the episode so if that sounds interesting please check it out and let me know what you think.&lt;/p&gt;
&lt;h3&gt;Sheepishly Helpful&lt;/h3&gt;
&lt;p&gt;I was actually hesitant to appear on The Data Chief because it feels intimidating, even arrogant, to get in front of a microphone and claim you have something meaningful to say about data leadership. I’m easily taken in by the common cognitive distortion that just because there are people who are more knowledgeable than me, then clearly I can’t have anything meaningful to contribute to the conversation.&lt;/p&gt;
&lt;p&gt;I’m glad I was able to ignore that voice in my head. Earlier this week someone reached out to thank me for helping them find a job leading a data team. My contribution was this podcast episode and a few other conversations that inspired them to want to come back to data work and gave them a roadmap to do so.&lt;/p&gt;
&lt;p&gt;This was kind of a full circle moment because this is the same individual who encouraged me to go on the podcast in the first place. We had met at a local data meetup and I had offered him some career advice. In turn, I told him I had been invited to appear on a data podcast but doubted I had anything interesting to say. He politely corrected my self-perception (a common theme for me) and encouraged me to go ahead.&lt;/p&gt;
&lt;p&gt;All this is to say, everyone struggles with being “enough”. If you want to do some public speaking and get a chance to — go for it! And let me know.&lt;/p&gt;
</content:encoded></item><item><title>Practicing Jazz with the Linux Command Line</title><link>https://acviana.com/posts/2023-12-08-practicing-jazz-with-the-command-line/</link><guid isPermaLink="true">https://acviana.com/posts/2023-12-08-practicing-jazz-with-the-command-line/</guid><description>Using linux command line tools for jazz ear training</description><pubDate>Fri, 08 Dec 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;With ChatGPT&apos;s help, I wrote a command line script that loops over a play-along jazz track starting at a random point each time. Starting at a random point forces me to develop my ear by finding my place in the chord changes, an important skill in Jazz.&lt;/p&gt;
&lt;h2&gt;Some Background&lt;/h2&gt;
&lt;p&gt;Over the pandemic I started taking jazz piano lessons. Jazz is improvisational music meaning it&apos;s not written out ahead of time like classical music. Even though it&apos;s improvised there is a guiding structure, mostly in the form of preset chord changes. To keep their place in the music, Jazz musicians need to be able to hear these changes.&lt;/p&gt;
&lt;p&gt;One of the ways to build this skill is to play along with backing audio tracks that simulate playing with a band. This is useful practice, but it&apos;s easy to just keep in time and play along without really listening and responding.&lt;/p&gt;
&lt;p&gt;To develop your ear, you need randomness. So I decided to see if I could write a script to throw me some musical curve balls by starting the track in random places.&lt;/p&gt;
&lt;h2&gt;Using ChatGPT&lt;/h2&gt;
&lt;p&gt;Playing an audio file seemed like a good fit for a bash script. I&apos;m sure there are lots of Python libraries I could have used but that felt like overkill.&lt;/p&gt;
&lt;p&gt;I chose to use ChatGPT to help me out because bash&apos;s syntax is stable and well-documented. It&apos;s also an implementation detail, not something I&apos;m interested in building mastery in. Better to just outsource it to the LLM.&lt;/p&gt;
&lt;h2&gt;The Script&lt;/h2&gt;
&lt;p&gt;Here&apos;s the first pass from ChatGPT 3.5:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash

if [ -z &quot;$1&quot; ]; then
  echo &quot;Usage: $0 &amp;lt;music_file&amp;gt;&quot;
  exit 1
fi

music_file=&quot;$1&quot;

while true; do
  start_time=$(shuf -i 0-$(mpv --no-video --length &quot;$music_file&quot;) -n 1)
  mpv --start=&quot;$start_time&quot; &quot;$music_file&quot;
done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like the answers to many technical questions I ask ChatGPT, the outline of this answer makes sense at first glance. And while it set me on the right track, there were a handful of bugs and the script as written didn&apos;t run. I essentially &quot;paired&quot; with ChatGPT for about 20 minutes, asking questions about bugs, trying alternate tools, and adding an escape key to kill the loop.&lt;/p&gt;
&lt;p&gt;The final script looks like this; I&apos;ll break it down below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash

if [ -z &quot;$1&quot; ]; then
  echo &quot;Usage: $0 &amp;lt;music_file&amp;gt;&quot;
  exit 1
fi

music_file=&quot;$1&quot;
track_length=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 &quot;$music_file&quot;)
echo &quot;Press q twice to quit&quot;

while true; do
  start_time=$(gshuf -i 0-$(echo &quot;$track_length&quot; | cut -d. -f1) -n 1)
  mpv --start=&quot;$start_time&quot; &quot;$music_file&quot; --no-audio-display

  read -t 1 -n 1 key
  if [ &quot;$key&quot; == &quot;q&quot; ]; then
    echo &quot;Exiting...&quot;
    break
  fi

done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First we check for a command line argument that we assume gives the location of the audio track. Then we use &lt;code&gt;ffprobe&lt;/code&gt; to get the track length from the metadata, &lt;code&gt;gshuf&lt;/code&gt; to generate the starting timestamp, and &lt;code&gt;mpv&lt;/code&gt; to play the track. Finally, we listen for &lt;code&gt;q&lt;/code&gt; to quit the program.&lt;/p&gt;
&lt;p&gt;This was a fun little evening project that yielded a useful practice tool as well as some more experience working with LLMs. In the future I might want to up the complexity by also having it &lt;em&gt;stop&lt;/em&gt; at a random time - but in the meantime I need to practice more first.&lt;/p&gt;
</content:encoded></item><item><title>Notes on Chapter 1 and 2 of What are Embeddings</title><link>https://acviana.com/posts/2023-10-01-notes-on-chapter-1-and-2-of-what-are-embeddings/</link><guid isPermaLink="true">https://acviana.com/posts/2023-10-01-notes-on-chapter-1-and-2-of-what-are-embeddings/</guid><description>Notes on Chapter 1 and 2 of Vicki Boykis&apos;s &quot;What Are Embeddings&quot; paper</description><pubDate>Sun, 01 Oct 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://vicki.substack.com/p/what-are-embeddings&quot;&gt;What are Embeddings&lt;/a&gt; is an excellent long-form primer on the topic of &lt;a href=&quot;https://en.wikipedia.org/wiki/Word_embedding&quot;&gt;embeddings&lt;/a&gt; - vector representations of data used for deep learning applications. The material makes a great counterpart to my current self-study of linear algebra, showing how concepts in vector spaces can be applied to deep learning in a very concrete way.&lt;/p&gt;
&lt;p&gt;This paper is distinct, in my opinion, in that Vicki goes to great lengths to contextualize the problem of embeddings at a number of different levels, starting from when you might want to use an ML model or a recommender system at all. She also gives historical context on how embeddings evolved and what problems they solved. I encourage you to read the original text.&lt;/p&gt;
&lt;p&gt;Below are my highlights and notes from the first 2 chapters, the Introduction and &quot;Recommendations as a Business Problem&quot;. Most of my notes focus on the latter chapter, on different frameworks for thinking about end-to-end search and recommendation systems.&lt;/p&gt;
&lt;p&gt;I look forward to finding more time to dig into the remaining sections which are more technical, as well as read the foundational &lt;a href=&quot;https://en.wikipedia.org/wiki/Word2vec&quot;&gt;Word2Vec&lt;/a&gt; paper.&lt;/p&gt;
&lt;h2&gt;1. Introduction&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;embeddings — deep learning models’ internal representations of their input data (p. 4)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The usage of embeddings to generate compressed, context-specific representations of content exploded in popularity after the publication of Google’s Word2Vec paper (p.4)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;[A graph of] Embeddings papers in Arxiv by month. It’s interesting to note the decline in frequency of embeddings-specific papers, possibly in tandem with the rise of deep learning architectures like GPT (p. 5)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Transformer [66] architecture, with its self-attention mechanism, a much more specialized case of calculating context around a given word, has become the de-facto way to learn representations of growing multimodal vocabularies (p.5)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;As a general definition, embeddings are data that has been transformed into n-dimensional matrices for use in deep learning computations. (p.5)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;3 steps of the embedding process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Transform&lt;/strong&gt; multimodal input into representations such as vectors, tensors, or graphs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compress&lt;/strong&gt; inputs for use by an ML task, a fixed input such as tagging, labeling, or semantic search.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create an Embedding Space&lt;/strong&gt; that is specific to the training space but can be generalized to other tasks and domains. This generalizability is the reason embeddings are so popular.&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;We often talk about item embeddings being in X dimensions, ranging anywhere from 100 to 1000, with diminishing returns in usefulness somewhere beyond 200-300 in the context of using them for machine learning problems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;In Systems Thinking, Donella Meadows writes, “You think that because you understand ’one’ that you must therefore understand ’two’ because one and one make two. But you forget that you must also understand ’and.’&quot; (p.9)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;2. Recommendations as a Business Problem&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;How do we solve the problem of what to show in the timeline here so that our users find the content relevant and interesting, and balance the needs of our advertisers and business partners? (p.10)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;2.1 Building a Web App&lt;/h3&gt;
&lt;p&gt;No Notes&lt;/p&gt;
&lt;h3&gt;2.2 Rules-based systems versus machine learning&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;In short, the difference between programming and machine learning development is that we are not generating answers through business rules, but business rules through data. (p. 14)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;2.3 Building a web app with machine learning&lt;/h3&gt;
&lt;p&gt;The 4 components of a machine learning system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input Data:&lt;/strong&gt; processing data from a stream or a database&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature Engineering and Selection:&lt;/strong&gt; picking which features (attributes of the data) to use as inputs to machine learning. Embeddings are used as inputs here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Building:&lt;/strong&gt; Select the important features and train the model, iterating to optimize performance. Embeddings are &lt;em&gt;also&lt;/em&gt; the output of this step and can be (re)used downstream.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Serving&lt;/strong&gt;: Put the model in prod.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The 3 highest-level types of ML tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Supervised:&lt;/strong&gt; when we have training data that can tell us if the model predictions are correct (e.g. regression, SVM, NN)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unsupervised:&lt;/strong&gt; when there is no single ground-truth answer; patterns can be detected but have to be interpreted (e.g. clustering, PCA).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reinforcement Learning:&lt;/strong&gt; Akin to a game theory problem. We have an agent moving through a model and we want to iteratively learn an optimal strategy. (e.g. PCA and Word2Vec).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2.4 Formulating a machine learning problem&lt;/h3&gt;
&lt;p&gt;Again, the attributes of the data are called the features. In general, this is a really good section to review as an overview of the ML learning process.&lt;/p&gt;
&lt;p&gt;We use linear regression to train a model but then use &lt;strong&gt;gradient descent&lt;/strong&gt; to minimize the cost function (in this example MSE). As a reminder:&lt;/p&gt;
&lt;p&gt;$$
MSE = \frac{1}{N} \sum_{i=1}^{N}(y_{i} - (mx_{i} + b))^{2}
$$&lt;/p&gt;
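&lt;p&gt;As a refresher for myself (a toy sketch of mine, not code from the paper), one gradient descent step for this cost just moves $m$ and $b$ along the negative gradient of the MSE:&lt;/p&gt;

```python
# One gradient-descent step for MSE on a simple line fit y = m*x + b.
def mse_step(xs, ys, m, b, lr=0.05):
    n = len(xs)
    # Partial derivatives of MSE with respect to m and b.
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    return m - lr * dm, b - lr * db

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]  # data on the line y = 2x + 1
m, b = 0.0, 0.0
for _ in range(2000):
    m, b = mse_step(xs, ys, m, b)
print(round(m, 2), round(b, 2))  # converges to roughly 2.0 and 1.0
```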
&lt;h4&gt;2.4.1 The Task of Recommendations&lt;/h4&gt;
&lt;p&gt;The general task of recommender systems is &lt;strong&gt;information retrieval&lt;/strong&gt;, which is focused on pulling relevant information from large corpuses. Within information retrieval we have 2 complementary approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Search:&lt;/strong&gt; Direct information seeking, well established at this point.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; The query is not directly given but inferred based on learned taste and preferences.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;The first industrial recommender systems were created to filter messages in email and newsgroups [22] at the Xerox Palo Alto Research Center based on a growing need to filter incoming information from the web. The most common recommender systems today are those at Netflix, YouTube, and other large-scale platforms that need a way to surface relevant content to users.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Common approaches to recommender systems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Collaborative Filtering:&lt;/strong&gt; &lt;em&gt;&quot;finding missing user-item interactions in a given set of user-item interaction history&quot;&lt;/em&gt;, e.g. most people who watch 8 Star Wars movies watch the 9th and you still have not watched the 9th. There are two main approaches:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Neighborhood Models:&lt;/strong&gt; Finding similar users based on similarity functions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Matrix Factorization:&lt;/strong&gt; &lt;em&gt;&quot;the process of representing users and items in a feature matrix made up of low-dimensional factor vectors, which in our case, are also known as embeddings, and learning those feature vectors through the process of minimizing a cost function&quot;&lt;/em&gt; (p. 21) This is similar to Word2Vec.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Filtering:&lt;/strong&gt; &quot;&lt;em&gt;This approach uses metadata available about our items (for example in movies or music, the title, year released, genre, and so on) as initial or additional features input into models and work well when we don’t have much information about user activity&lt;/em&gt;&quot; (p. 21)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn to Rank:&lt;/strong&gt; pair-wise rankings of items. &quot;&lt;em&gt;This step normally takes place after candidate generation, in a filtering step, because it’s computationally expensive to rank extremely large lists&lt;/em&gt;&quot; (p.21)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Neural Recommendations:&lt;/strong&gt; NNs capture the same relationships as matrix factorization, and this is where we see deep learning networks like Word2Vec and BERT. An example is convolutional and recurrent NNs for sequential recommendation, such as in music playlists.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recommender systems tend to have a specialized architecture with 4 stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Candidate Generation:&lt;/strong&gt; A first-pass model that generates a smaller list of candidates down from millions to thousands or hundreds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ranking:&lt;/strong&gt; Ordering the list of candidate recommendations based on predicted user preference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Filtering:&lt;/strong&gt; Remove unwanted items e.g. NSFW content or sale items&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retrieval:&lt;/strong&gt; Hit the model endpoint to get the final list.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Embeddings play a role in search and recommendation systems similar to that of databases in backend architectures.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Embeddings are a type of machine learning feature — or model input data — that we use first as input into the feature engineering stage, and the first set of results that come from our candidate generation stage, that are then incorporated into downstream processing steps of ranking and retrieval to produce the final items the user sees.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;2.4.2 Machine Learning Features&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;As a general rule, the creation of the correct formulation of input data is perhaps the heart of machine learning. I.e. if we have bad input, we will get bad output.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The process of formatting data correctly to feed into a model is called feature engineering [...] when we have textual data, we need to turn it into numerical representations so that we can compare these representations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;2.5 Numerical Feature Vectors&lt;/h3&gt;
&lt;p&gt;This section just explains encoding text as a numerical value so we can represent it as a vector.&lt;/p&gt;
&lt;h3&gt;2.6 From Words to Vectors in Three Easy Pieces&lt;/h3&gt;
&lt;p&gt;In deep learning architectures and NLP-related tasks you repeatedly see the following core concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Encoding:&lt;/strong&gt; Representing non-numerical multimodal data as numbers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vectors:&lt;/strong&gt; We store our data as vectors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lookup Matrices:&lt;/strong&gt; hash tables to allow for efficient lookups between words and numbers&lt;/li&gt;
&lt;/ul&gt;
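&lt;p&gt;A toy illustration of those three pieces together (mine, not from the paper), using a one-hot encoding as the simplest possible vector representation:&lt;/p&gt;

```python
# Encode words as integers, keep a lookup table between words and
# indices, and represent each word as a vector.
vocab = ["jazz", "piano", "band"]
word_to_index = {word: i for i, word in enumerate(vocab)}  # lookup table

def one_hot(word):
    # A vector with a single 1.0 at the word's index.
    vec = [0.0] * len(vocab)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("piano"))  # [0.0, 1.0, 0.0]
```

&lt;p&gt;Real embeddings replace these sparse one-hot vectors with dense, learned ones, but the encode/vector/lookup plumbing is the same.&lt;/p&gt;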
</content:encoded></item><item><title>Finite Dimensional Vector Spaces Ch. 1 Problem 2(c)</title><link>https://acviana.com/posts/2023-09-23-finite-dimensional-vector-spaces-chapter-1-problem-2c/</link><guid isPermaLink="true">https://acviana.com/posts/2023-09-23-finite-dimensional-vector-spaces-chapter-1-problem-2c/</guid><description>Working through problem 2(c) from chapter 1 of Halmos&apos;s &quot;Finite Dimensional Vector Spaces&quot;</description><pubDate>Sat, 23 Sep 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;m 9 months into my self-directed study of math and I wanted to start writing about some of the problems I enjoyed working on. First up is problem 2(c) from &lt;a href=&quot;https://en.wikipedia.org/wiki/Paul_Halmos&quot;&gt;Paul Halmos&apos;s&lt;/a&gt; foundational 1942 text &lt;em&gt;&quot;Finite Dimensional Vector Spaces&quot;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I was excited by this problem because of the technique I had to use to solve it. It&apos;s the first time I&apos;ve been given a problem that can&apos;t be solved in the original domain, but can be solved by mapping your set to a new domain with different properties. It&apos;s a technique I&apos;ve seen before, but never one I&apos;ve been asked to apply myself.&lt;/p&gt;
&lt;h2&gt;The Setup&lt;/h2&gt;
&lt;p&gt;The first two parts of this problem are quick to verify and on-par with what you would see in any other introductory abstract algebra text covering &lt;a href=&quot;https://en.wikipedia.org/wiki/Field_(mathematics)&quot;&gt;fields&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Briefly, the set of all positive integers ($\mathbb{Z}_{+}$) is not a field because there is no additive inverse ($\nexists -x \in \mathbb{Z}_{+}$ such that $x + (-x) = 0$ for $\forall x \in \mathbb{Z}_{+}$).&lt;/p&gt;
&lt;p&gt;Similarly, the set of all integers ($\mathbb{Z}$), while it does have an additive inverse, is still not a field because there is no multiplicative inverse ($\nexists x^{-1} \in \mathbb{Z}$ such that  $xx^{-1} = 1$ for $\forall x \in \mathbb{Z}$).&lt;/p&gt;
&lt;p&gt;This is all pretty standard stuff, but part (c) was interesting to me.&lt;/p&gt;
&lt;h2&gt;The Problem&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Can the answers to the question be changed by re-defining addition or multiplication (or both)?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After having no luck with my usual bag of tricks (modulo addition, trig functions, even/odd numbers) I found some help on &lt;a href=&quot;https://math.stackexchange.com/a/1356925/1141983&quot;&gt;Math Stackexchange&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We can, in fact, redefine addition (here $\oplus$ instead of $+$) and multiplication ($\odot$ instead of $\cdot$) such that $\mathbb{Z}_{+}$ (or $\mathbb{Z}$) is a field $\mathfrak{F}(\mathbb{Z}_{+},\oplus,\odot)$.&lt;/p&gt;
&lt;p&gt;We want to take advantage of the fact that the rational numbers ($\mathbb{Q}$) &lt;em&gt;are&lt;/em&gt; a field. Note that $\mathbb{Z}_{+}$ is infinite but has the same cardinality as $\mathbb{Q}$ -- they are both &lt;a href=&quot;https://en.wikipedia.org/wiki/Countable_set&quot;&gt;countably infinite&lt;/a&gt; (written as $\aleph_{0}$). Using this fact, we can define a function $f: \mathbb{Z}_{+} \rightarrow \mathbb{Q}$. That is, a &lt;a href=&quot;https://en.wikipedia.org/wiki/Bijection&quot;&gt;bijection&lt;/a&gt; (a mapping that is one-to-one and onto) that maps each member of $\mathbb{Z}_{+}$ onto a unique member of $\mathbb{Q}$.&lt;/p&gt;
&lt;p&gt;Then we can use our function $f$ (and its inverse $f^{-1}$) to build new definitions of addition and multiplication for $x, y \in \mathbb{Z}_{+}$ such that they now meet the &lt;a href=&quot;https://en.wikipedia.org/wiki/Field_(mathematics)#Classic_definition&quot;&gt;properties of a field&lt;/a&gt;:
$$
\begin{align}
&amp;amp;\oplus : f^{-1}(f(x) + f(y)) &amp;amp; \\
&amp;amp;\odot : f^{-1}(f(x) \cdot f(y)) &amp;amp;
\end{align}
$$&lt;/p&gt;
&lt;h2&gt;Why It Works&lt;/h2&gt;
&lt;p&gt;These new definitions of addition and multiplication satisfy the properties of a field by moving our binary operations to $\mathbb{Q}$, performing the operation there, and then mapping the result back to $\mathbb{Z}_{+}$. Note that we never actually define the function $f$; there could be infinitely many such mappings, and it is enough to show that one exists.&lt;/p&gt;
&lt;p&gt;The general technique here is to use a function ($f$) to map a set ($\mathbb{Z}_{+}$ or $\mathbb{Z}$) into a codomain ($\mathbb{Q}$), use the properties we need in that codomain (the additive and multiplicative inverses), and then use an inverse function ($f^{-1}$) to return to the original space ($\mathbb{Z}_{+}$ or $\mathbb{Z}$).&lt;/p&gt;
&lt;p&gt;Conceptually, this idea of mapping to a new space with useful properties reminds me of related techniques from other areas of applied math. For example, a &lt;a href=&quot;https://en.wikipedia.org/wiki/Fourier_transform&quot;&gt;Fourier transform&lt;/a&gt; moves a waveform from the time domain to the frequency domain, and a &lt;a href=&quot;https://en.wikipedia.org/wiki/Laplace_transform&quot;&gt;Laplace transform&lt;/a&gt; takes a function of time into the complex frequency domain.&lt;/p&gt;
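&lt;p&gt;To make this concrete, here is a small Python sketch of the idea (my own illustration, not part of the original problem). I use the Calkin–Wilf sequence as one concrete choice of bijection between the positive integers and the positive rationals, extend it to all of $\mathbb{Q}$ by interleaving $0$ and the negatives, and tabulate it for small inputs. The names &lt;code&gt;oplus&lt;/code&gt; and &lt;code&gt;odot&lt;/code&gt; are hypothetical, and the operations are only defined when the result lands inside the finite table.&lt;/p&gt;

```python
from fractions import Fraction

def calkin_wilf(n):
    # first n terms of the Calkin-Wilf sequence, which enumerates
    # every positive rational exactly once
    q, out = Fraction(1), []
    for _ in range(n):
        out.append(q)
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)
    return out

# Tabulate a bijection f: Z+ -> Q for small inputs:
# f(1) = 0, f(2k) = k-th positive rational, f(2k+1) = its negative.
pos = calkin_wilf(500)
f = {1: Fraction(0)}
for k, q in enumerate(pos, start=1):
    f[2 * k] = q
    f[2 * k + 1] = -q
f_inv = {value: n for n, value in f.items()}

def oplus(x, y):
    # x (+) y = f^{-1}(f(x) + f(y)); defined here only when the
    # sum happens to land inside the finite table
    return f_inv[f[x] + f[y]]

def odot(x, y):
    # x (.) y = f^{-1}(f(x) * f(y))
    return f_inv[f[x] * f[y]]

print(oplus(2, 3))  # -> 1, because f(2) = 1 and f(3) = -1 sum to f(1) = 0
```

&lt;p&gt;With this particular $f$, index $1$ plays the role of $0$ and index $2$ plays the role of $1$, and every element now has an additive inverse, e.g. $2 \oplus 3 = 1$.&lt;/p&gt;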
&lt;h2&gt;Why It&apos;s Interesting&lt;/h2&gt;
&lt;p&gt;Like most people, most of my math career has been focused on &lt;em&gt;calculation&lt;/em&gt;: I was given formulas and told to apply them. The motivation and reasoning behind the techniques was explained, which is how I recognized some of the parallels I pointed out. But these were always presented as tools to be used, not general techniques that we could also apply.&lt;/p&gt;
&lt;p&gt;So to wrap things up, even though this is an elementary result, it got me excited because this is the type of deeper mathematical tool set I would like to build.&lt;/p&gt;
</content:encoded></item><item><title>The Best Math Advice I Ever Got</title><link>https://acviana.com/posts/2023-08-12-the-best-math-advice-i-ever-got/</link><guid isPermaLink="true">https://acviana.com/posts/2023-08-12-the-best-math-advice-i-ever-got/</guid><description>On turning hieroglyphics into pictures</description><pubDate>Sat, 12 Aug 2023 05:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;Hieroglyphics and Pictures&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;Math is just a picture which you can understand hidden in hieroglyphics which you don&apos;t understand.
Your job is to turn the hieroglyphics back into a picture.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;- Prof. David Griffeath, UW - Madison&lt;/p&gt;
&lt;p&gt;That quote is the best pedagogical math advice I ever got.
It was an off-hand remark in a 1-credit elective on &lt;a href=&quot;https://en.wikipedia.org/wiki/Cellular_automaton&quot;&gt;cellular automata&lt;/a&gt; (as in &lt;a href=&quot;https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life&quot;&gt;Conway&apos;s Game of Life&lt;/a&gt;), an esoteric course even by math standards.
I doubt any of the other students remember hearing it, and Professor Griffeath probably doesn&apos;t remember saying it.
You never know the impact your words are going to have!&lt;/p&gt;
&lt;p&gt;But offhand remark or not, this idea of &quot;translating&quot; concepts into pictures resonated with me.
Case-in-point, the only concept that I vividly remember from my senior-level abstract algebra class are the diagrams I drew for myself to understand how 3 major classes of functions map between sets; &lt;a href=&quot;https://en.wikipedia.org/wiki/Injective_function&quot;&gt;injective&lt;/a&gt; (one-to-one), &lt;a href=&quot;https://en.wikipedia.org/wiki/Surjective_function&quot;&gt;surjective&lt;/a&gt; (on-to), and &lt;a href=&quot;https://en.wikipedia.org/wiki/Bijection&quot;&gt;bijective&lt;/a&gt; (both surjective and bijective).&lt;/p&gt;
&lt;p&gt;Pictures can be powerful tools.
When I say this I&apos;m not trying to say I&apos;m in some way a &quot;visual learner&quot; (the popular theory of &quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Learning_styles&quot;&gt;Learning Styles&lt;/a&gt;&quot; is widely debunked).
Instead, to me drawing a picture is shorthand for saying you have a deeper understanding of a concept than just parroting a definition.
It&apos;s when you understand the meaning of an idea enough to translate it into different forms, such as a picture.&lt;/p&gt;
&lt;p&gt;This quote has been kicking around in my head a lot over the past 8 months as I&apos;ve been taking a second pass at some of the subjects from my undergrad coursework, linear algebra in particular.
I don&apos;t always draw a literal picture, but I am going more slowly than I have in the past and striving for that same level of deeper understanding that allows me to express concepts in my own language and shorthand.
Being able to rewrite something has proven a pretty good heuristic for how well I understand it.
And realizing my &quot;pictures&quot; are wrong has been a pretty effective way to correct my understanding!&lt;/p&gt;
&lt;p&gt;This brings me to the final thing I&apos;ve always liked about this quote, which is that I find it encouraging.
On the other side of whatever difficult concept you&apos;re banging your head against is something friendly and accessible, a picture.&lt;/p&gt;
&lt;p&gt;Your job is just to get to that picture.&lt;/p&gt;
</content:encoded></item><item><title>Hello, Manjaro!</title><link>https://acviana.com/posts/2023-07-12-hello-manjaro/</link><guid isPermaLink="true">https://acviana.com/posts/2023-07-12-hello-manjaro/</guid><description>First steps with Arch</description><pubDate>Wed, 12 Jul 2023 05:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Hello, Manjaro!!&lt;/h1&gt;
&lt;p&gt;Hello from a fresh Manjaro Linux install on my 2014 MacBook Air 6,2!&lt;/p&gt;
&lt;p&gt;This is my first time running an Arch-based Linux distro and my first time running Linux on Apple hardware. I wrote out a few thoughts on picking Manjaro and getting everything setup.&lt;/p&gt;
&lt;p&gt;But first, let&apos;s talk about dorking around with Linux distros while everyone else is playing with AI.&lt;/p&gt;
&lt;h2&gt;Under Pressure&lt;/h2&gt;
&lt;p&gt;If you&apos;re even moderately connected to the more &quot;online&quot; parts of the tech world, it&apos;s hard not to feel pressure to constantly be doing more.
Pressure to build a public persona, pressure to ship, pressure to be up-to-date, pressure to get the next job, pressure to join &quot;the discourse&quot;, to monetize your side hustle, to turn your project into a company.&lt;/p&gt;
&lt;p&gt;I was thinking about this pressure as I was an hour into sitting cross-legged on the floor, tethered to my router with a 3-foot Ethernet cable, trying to set up Wi-Fi drivers on an 8-year-old machine when I have &lt;strong&gt;four&lt;/strong&gt; other perfectly good laptops around the house.
As I finally got the Wi-Fi icon to light up I couldn&apos;t help thinking ...&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I did it! ... But should I have?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The rest of the tech world is exploring a brave new world of AI and LLMs.
I&apos;ve got work I could be doing for my job or math I could be studying to further my career.
Instead, I&apos;m losing the feeling in my legs while tweaking Linux settings &lt;em&gt;and then&lt;/em&gt; doubling down on that decision by blogging about it for the 2 (maybe) people who read this blog.&lt;/p&gt;
&lt;p&gt;My point is that playing with Linux is not what productivity culture tells me I should be doing.
But, I think it&apos;s important to learn to let go of that.
This is something &lt;a href=&quot;https://vercel-nextjs-blog-acviana.vercel.app/posts/2023-01-01-i-love-computers&quot;&gt;I&apos;ve written about before&lt;/a&gt;, but sometimes, it&apos;s nice just to fiddle for the sake of fiddling.
To write for the sake of writing.
To practice both as a craft, or if nothing else just so the ideas will stop rattling around in my head.&lt;/p&gt;
&lt;p&gt;So let&apos;s talk about messing around with Linux, AI will still be there when we get back.&lt;/p&gt;
&lt;h2&gt;Going Beyond Ubuntu&lt;/h2&gt;
&lt;p&gt;The machine I set this up on is an old MacBook Air that hasn&apos;t been used in a few years.
It&apos;s about 8 years old at this point, which is well past the hardware cutoff for the latest macOS updates.
But, with the right OS, a quad-core i5 CPU and 4 GB of RAM is more than enough for the programming and web surfing I do.
So I decided to give this machine a second life as a Linux box, but which flavor?&lt;/p&gt;
&lt;p&gt;Ubuntu was my first choice. I have been running it for 10+ years across two older IBM/Lenovo laptops, so I was comfortable with it and I knew it met my needs.
At first I was worried it wouldn&apos;t work well on Apple hardware, but from what I&apos;ve read that doesn&apos;t seem to be an issue.
And in case it isn&apos;t obvious from the MacBook model date, this is still an Intel-based CPU.
So I&apos;m not worried about compiling against the new generation of Apple CPUs, just general hardware support.&lt;/p&gt;
&lt;p&gt;But, as I mentioned, this machine only has 4 GB of memory.
That &lt;em&gt;barely&lt;/em&gt; meets the required minimums for my standard Ubuntu + Gnome desktop setup.
I could address this by running a lighter-weight Ubuntu distro like Xubuntu or Kubuntu.
Or, if I wanted to stay in the Debian family, there&apos;s Zorin OS, which I used last year to get an old laptop up and running for my dad.&lt;/p&gt;
&lt;p&gt;But, none of those felt like compelling options to me.
I was doing this purely for fun and after 10 years on Ubuntu I wanted to try something completely new.&lt;/p&gt;
&lt;h2&gt;Settling on Manjaro and Arch&lt;/h2&gt;
&lt;p&gt;When I first started running Linux on my own machines Ubuntu felt like the front-runner for folks who just wanted something FOSS &quot;that just worked&quot; in a desktop environment.
But as I was looking around now in 2023 it seemed like there were a lot more well-supported options that had taken the same &quot;batteries included&quot; approach.
One distro that I had never heard of seemed to be on everyone&apos;s list: Manjaro.&lt;/p&gt;
&lt;p&gt;Manjaro (man-JAR-o), is a popular and relatively user-friendly distro based on Arch Linux.
This immediately caught my eye because Arch Linux is a notoriously difficult distro to set up.
It comes with essentially nothing included, not even a desktop, so you start from a command line and you have to really know your way around Linux to set it up.
The payoff is a high level of customization.
Manjaro tries to bridge that gap by providing a user-friendly Arch setup.&lt;/p&gt;
&lt;p&gt;Honestly, they had me at notoriously difficult. I decided to give Manjaro a shot.&lt;/p&gt;
&lt;h2&gt;First Steps with Manjaro&lt;/h2&gt;
&lt;p&gt;All-in-all this was a pretty easy system to get set up.
The one major exception, as I mentioned in the beginning of the post, was getting the Wi-Fi drivers to work.
But that likely has nothing to do with Manjaro or Arch; Apple hardware is proprietary.
As I understand it, that means that they do not release open-source drivers for some of their hardware such as the webcam and, crucially, their wireless network card.&lt;/p&gt;
&lt;p&gt;The solution to this is to install the OS with a wired connection and then install the drivers needed to support the Broadcom network hardware on my machine. Fortunately, this is a well-documented problem on the Manjaro / Arch message boards and after about 90 minutes I had a working Wi-Fi connection.&lt;/p&gt;
&lt;p&gt;The steps are well-documented elsewhere, but just to give you a sense of the scale of the solution, I had to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Figure out how to read the system configuration with &lt;code&gt;inxi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Decide I wanted to go with the Broadcom DKMS driver (&lt;code&gt;broadcom-wl-dkms&lt;/code&gt;) plus &lt;code&gt;wpa_supplicant&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Learn how to use pacman (the Arch package manager).&lt;/li&gt;
&lt;li&gt;Find my kernel version and install the corresponding Linux header files.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;ip link set&lt;/code&gt; to enable the network interface.&lt;/li&gt;
&lt;/ol&gt;
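&lt;p&gt;For flavor, the commands involved look roughly like the following. This is a sketch, not a copy-paste recipe: the header package and interface name (&lt;code&gt;linux515-headers&lt;/code&gt;, &lt;code&gt;wlp3s0&lt;/code&gt;) are placeholders you would replace with your own kernel version and wireless interface.&lt;/p&gt;

```shell
# 1. Inspect the network hardware (this is what reveals the Broadcom chip)
inxi -N

# 2 & 3. Install the DKMS Broadcom driver and wpa_supplicant with pacman
sudo pacman -S broadcom-wl-dkms wpa_supplicant

# 4. Find the running kernel and install matching headers
#    (Manjaro names them after the kernel series)
uname -r
sudo pacman -S linux515-headers   # placeholder: match your kernel version

# 5. Bring the wireless interface up ("wlp3s0" is a placeholder name)
ip link show                      # list interfaces to find the real name
sudo ip link set wlp3s0 up
```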
&lt;p&gt;And then it worked! So not terrible, but definitely something you need at least intermediate Linux knowledge to troubleshoot. And it&apos;s an issue you should expect if you&apos;re running Linux on a Mac.&lt;/p&gt;
&lt;p&gt;All-in-all a fun weekend project and a decent excuse to crank out a blog post. Thanks for reading!&lt;/p&gt;
</content:encoded></item><item><title>Hello, LaTeX!</title><link>https://acviana.com/posts/2023-05-21-hello-latex/</link><guid isPermaLink="true">https://acviana.com/posts/2023-05-21-hello-latex/</guid><description>Playing around with LaTeX formulas for fun</description><pubDate>Mon, 22 May 2023 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This year I&apos;ve been working on gaining a deeper understanding of AI and ML tools.
To do this I&apos;ve been studying a lot of math, specifically &lt;a href=&quot;https://en.wikipedia.org/wiki/Vector_space&quot;&gt;vector spaces&lt;/a&gt;.
As I&apos;ve been getting increasingly excited about the progress I&apos;m making,
I figured some blog posts were in my future.
This seemed like a good enough reason to go on a side quest to set up equation rendering.&lt;/p&gt;
&lt;p&gt;Fortunately, the blogging framework I&apos;m using right now, &lt;a href=&quot;https://nextra.site/&quot;&gt;Nextra&lt;/a&gt;, comes with $\LaTeX$ (nice) support built in.
After migrating some configuration files, I was able to get it set up and wanted to do a little &quot;hello world&quot; post with some equations!&lt;/p&gt;
&lt;p&gt;So, as a proof-of-concept, here is the definition of a subspace, a core concept I&apos;ve been studying in Sheldon Axler&apos;s text &lt;a href=&quot;https://link.springer.com/book/10.1007/978-3-319-11080-6&quot;&gt;Linear Algebra Done Right&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Subspaces&lt;/h3&gt;
&lt;p&gt;For a vector space $V$ over a field $F$ (nominally either $\mathbb{R}$ the real numbers or $\mathbb{C}$ the complex numbers), the subset $U \subseteq V$ is said to be a &lt;em&gt;subspace&lt;/em&gt; of V if and only if ($\Leftrightarrow$) the following 3 conditions are met:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Additive Identity:&lt;/strong&gt; The additive identity $0 \in V$ is also $\in U$. That is, $0 \in U \cap V$, where $0$ is an element in $V$ such that $0 + x = x + 0 = x$ for all $x \in V$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Closure Under Addition:&lt;/strong&gt; For any $x,y \in U$ we also have $x+y \in U$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Closure Under Scalar Multiplication:&lt;/strong&gt; For $a \in F$ and $x \in U$ we have $ax \in U$.&lt;/li&gt;
&lt;/ul&gt;
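&lt;p&gt;As a quick sanity check (my own toy example, not from Axler), the snippet below numerically spot-checks the three conditions for the subspace $U = \{(x, 2x) : x \in \mathbb{R}\}$ of $V = \mathbb{R}^2$. Random sampling is of course no substitute for a proof.&lt;/p&gt;

```python
import random

# Numerically spot-check the three subspace conditions for the example
# subspace U = {(x, 2x) : x in R} inside V = R^2. A sanity check, not a proof.
def in_U(v, tol=1e-9):
    x, y = v
    return abs(y - 2 * x) < tol

assert in_U((0.0, 0.0))  # 1. the additive identity of V lies in U
for _ in range(1000):
    a = random.uniform(-10, 10)      # scalar from the field F = R
    s = random.uniform(-10, 10)
    t = random.uniform(-10, 10)
    u, w = (s, 2 * s), (t, 2 * t)    # two arbitrary elements of U
    assert in_U((u[0] + w[0], u[1] + w[1]))  # 2. closure under addition
    assert in_U((a * u[0], a * u[1]))        # 3. closure under scalar mult.
print("all three subspace conditions hold on random samples")
```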
&lt;p&gt;&lt;strong&gt;Update 2023-06-25:&lt;/strong&gt; Fixed the definition of a subspace to include the trivial case $U = V$ i.e. $U \subseteq V$ instead of $U \subset V$. Thanks Dad :)&lt;/p&gt;
</content:encoded></item><item><title>Journal Club: Talking About Large Language Models</title><link>https://acviana.com/posts/2023-01-16-talking-about-large-language-models/</link><guid isPermaLink="true">https://acviana.com/posts/2023-01-16-talking-about-large-language-models/</guid><description>Notes on Murray Shanahan&apos;s &quot;Talking About Large Language Models&quot;</description><pubDate>Mon, 16 Jan 2023 06:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I want to do more intentional self-directed study in 2023.
One of the ways I&apos;m working on this is by reading academic and technical papers.
Taking notes on what I&apos;m reading gives me some more direction and helps organize my thoughts.
Hopefully, I&apos;ll continue commenting on articles as time and inspiration allows.&lt;/p&gt;
&lt;p&gt;My first paper of the year is &lt;a href=&quot;https://arxiv.org/abs/2212.03551&quot;&gt;“Talking About Large Language Models”&lt;/a&gt; by Murray Shanahan.
It was a good first pick!
The topic of Large Language Models (LLMs) is basically inescapable in tech right now and the fact that the article is fairly philosophical and less technical in nature made it an accessible entry point for trying to understand these systems.
Overall, I think the article gives a useful framing of LLM systems that I&apos;ll continue to use as I try to understand LLMs and AI generally.&lt;/p&gt;
&lt;p&gt;What follows are my notes but the paper itself isn&apos;t that long so you should consider reading the original if this seems interesting to you.&lt;/p&gt;
&lt;p&gt;With apologies to &lt;a href=&quot;https://en.wikipedia.org/wiki/What_We_Talk_About_When_We_Talk_About_Love&quot;&gt;Raymond Carver&lt;/a&gt;, let&apos;s dive in.&lt;/p&gt;
&lt;h3&gt;What We Talk About When We Talk About Talking About Large Language Models&lt;/h3&gt;
&lt;p&gt;The paper starts by reminding us that despite the incredible and evolving accomplishments of LLMs, they are fundamentally still &quot;next [text] token prediction&quot; machines.
As such, they lack any internal structures that could be mapped to anything like the human experience of &quot;knowing&quot; or &quot;understanding&quot;.
Because LLMs and other forms of &quot;exotic, mind-like entities&quot;  are poised to become a fixture in our everyday lives we need to be precise in talking about what these systems are actually doing, both as practitioners and in the general public.&lt;/p&gt;
&lt;h3&gt;Unreasonable Effectiveness&lt;/h3&gt;
&lt;p&gt;The article opens by pointing out 3 ways in which large language models (LLMs) are “unreasonably” effective.
As an aside, this is a reference to Halevy, Norvig, and Pereira&apos;s 2009 article &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf&quot;&gt;&quot;The Unreasonable Effectiveness of Data&quot;&lt;/a&gt;.
This in turn is a reference to Wigner&apos;s 1960 article &lt;a href=&quot;https://www.maths.ed.ac.uk/~v1ranick/papers/wigner.pdf&quot;&gt;&quot;The Unreasonable Effectiveness of Mathematics in the Natural Sciences&quot;&lt;/a&gt; (additional &lt;a href=&quot;https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_of_Mathematics_in_the_Natural_Sciences&quot;&gt;background&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The third of these three points is the focus of the article and the most relevant to understanding LLMs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;… a great many tasks that demand intelligence in humans can be reduced to next token prediction with a sufficiently performant model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The more you sit with this statement the more remarkable it becomes!
However, in spite of this (or rather because of this) the rest of the article focuses on reminding us that this &quot;effectiveness&quot; is &lt;strong&gt;not&lt;/strong&gt; in fact anything resembling “intelligence”.&lt;/p&gt;
&lt;h3&gt;Defining LLMs&lt;/h3&gt;
&lt;p&gt;Continuing on, we get a definition of LLMs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;LLMs are generative mathematical models of the statistical distribution of tokens in the vast public corpus of human generated text …&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And from there the author reminds us that, under the hood, all LLM systems are doing is something like the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here’s a fragment of text. Tell me how this fragment might go on. According to your model of the statistics of human language, what words are likely to come next?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Reiterating and tying those points together more explicitly:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are two important takeaways here. First, the basic function of a large language model, namely to generate statistically likely continuations of word sequences, is extraordinarily versatile. Second, notwithstanding this versatility, at the heart of every such application is a model doing just that one thing: generating statistically likely continuations of word sequences.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Having established how LLMs work (probabilistic prediction on sequences of text tokens), the article now focuses on the fact that, by design, LLMs cannot “know” or “understand” anything. They say what is likely to come next.&lt;/p&gt;
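&lt;p&gt;To make the “statistically likely continuations” idea tangible, here is a toy next-token predictor of my own: raw bigram counts over a ten-word corpus. Real LLMs learn neural representations over vast corpora, but the interface -- given a context, emit a likely next token -- is the same in spirit.&lt;/p&gt;

```python
from collections import Counter, defaultdict

# A toy "language model": raw bigram counts over a tiny corpus.
# Real LLMs learn neural representations; this only illustrates the
# interface: given a context token, emit a statistically likely next token.
corpus = "the cat sat on the mat and the cat slept".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    # return the most frequent observed continuation of `token`
    return bigrams[token].most_common(1)[0][0]

print(predict_next("the"))  # -> cat ("cat" follows "the" twice, "mat" once)
```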
&lt;p&gt;These concepts are largely preliminary to the main point of the article, which is the limitations of these systems and how our language should reflect this.
But, as a technical reader who is not an AI practitioner, I find these preliminary definitions an extremely useful framework for thinking about all the AI and LLM news I see every day.&lt;/p&gt;
&lt;h3&gt;LLMs, Knowledge, and Anthropomorphism&lt;/h3&gt;
&lt;p&gt;The remainder of the paper presses the issue that these systems, by the nature of their scope, cannot be said to &quot;think&quot; or &quot;know&quot;.
And further, it&apos;s important &lt;strong&gt;not&lt;/strong&gt; to reach for convenient idiomatic expressions that suggest they can.
This precise language is important for an accurate long-term societal understanding of what these systems are doing.&lt;/p&gt;
&lt;p&gt;The author provides a nice analogy to illustrate these limitations:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An encyclopedia doesn&apos;t literally “know” or “believe” anything, in the way that a human does, and neither does a bare-bones LLM.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The author supports this position by arguing that LLMs operate in a context completely outside of that of human &quot;communicative intent&quot;; meaning it knows nothing about the person asking the question, people in general, or the effect of its response.
The author then preempts the argument that such knowledge could arise as an emergent phenomenon from a neural network like system because sequence prediction (the essence of LLMs) will never contain a notion of communicative intent.
I&apos;m omitting the details because neither argument is particularly lengthy and interested readers would be better served to read the source material.&lt;/p&gt;
&lt;h3&gt;Wrapping Up&lt;/h3&gt;
&lt;p&gt;To conclude, the author asserts (and I largely agree) that these arguments are neither pedantic nor purely philosophical.
LLMs, AI, and other “exotic, mind-like entities” are not going anywhere, nor are they well understood. The linguistic choices we make now could potentially impact the ability of practitioners, lawmakers, and the general public to reason about the limitations and abilities of these new and exciting systems.&lt;/p&gt;
&lt;p&gt;This paper was an interesting and relevant start to the year and I&apos;m glad I found time to jot down my thoughts. I&apos;m hoping the next paper I pick will come from the citations in this paper so I can start to build a deeper understanding of this space.&lt;/p&gt;
</content:encoded></item><item><title>I Love Computers</title><link>https://acviana.com/posts/2023-01-01-i-love-computers/</link><guid isPermaLink="true">https://acviana.com/posts/2023-01-01-i-love-computers/</guid><description>Working on computers for fun ... and not for profit</description><pubDate>Sun, 01 Jan 2023 06:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;m coming off an 11-day break for the winter holidays.
I came into the holidays this year really run down and needing a break.
But weirdly, one of the things I did to recharge from my job working with computers all day was ... tinkering with and maintaining various computers in my house.
I have 3 different generations of MacBooks in my house but I spent most of my time on my favorite machine, a 9 year old ThinkPad X1 Carbon running Ubuntu.&lt;/p&gt;
&lt;p&gt;I worked on everything from updating software and installing new utilities to tweaking custom GUI aesthetics, fixing nagging warning messages, and fiddling with configuration files.
It was a nice change of pace from my work life where I have to prioritize efficiency over say, getting a mouse pointer size just right.&lt;/p&gt;
&lt;p&gt;It was also a nice reminder that ever since I was a kid I have loved computers for their own sake.
Not just for building useful stuff, I just enjoy the act of working on them, even in the absence of a project.
There&apos;s something very rewarding to me about taking a system that&apos;s nearly a decade old and getting it to be almost as nice to work on as the MacBook M1 Pro I use for work.&lt;/p&gt;
&lt;p&gt;Happy new year and I hope we all get to do more of the things that make us happy in 2023.&lt;/p&gt;
&lt;p&gt;To start off the year with some festive energy, please enjoy Masayoshi Takanaka&apos;s &lt;a href=&quot;https://www.youtube.com/watch?v=aGm_X6viE0A&quot;&gt;samba cover of the Star Wars theme&lt;/a&gt;.
It&apos;s completely irreverent, wildly fun, and so much better than it has any right to be.&lt;/p&gt;
</content:encoded></item><item><title>One Day Of Advent Of Code</title><link>https://acviana.com/posts/2023-01-07-one-day-of-advent-of-code/</link><guid isPermaLink="true">https://acviana.com/posts/2023-01-07-one-day-of-advent-of-code/</guid><description>Working on Advent of Code and learning to take only what you need from something</description><pubDate>Sun, 01 Jan 2023 06:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This year I finally accomplished a dream of mine - I &lt;em&gt;only&lt;/em&gt; did a single day of the annual &lt;a href=&quot;https://adventofcode.com/&quot;&gt;Advent of Code&lt;/a&gt; challenge.
While most people aspire to complete the entire 25 day exercise, I&apos;ve learned Advent of Code is as much about me wrestling with my own perfectionism as it is about solving coding puzzles.&lt;/p&gt;
&lt;p&gt;I&apos;ve been doing Advent of Code for 4 years now.
3 years ago, I did the most I had ever done and I was miserable.
For 13 days AOC was eating up what little time and energy I had after work.
I was caught in a cycle where I was ashamed of myself for not solving the problems more easily.
But, I was unable to step away because I felt like I was so close to finishing. This made me invest even more time which made me even more ashamed.&lt;/p&gt;
&lt;p&gt;Last year, I only did 4 days of puzzles but I also stopped trying to solve as many problems as I could.
Part of this is because I stopped feeling like I needed to use AOC to prove my programming ability to myself and others.
This allowed me to start using AOC as a sandbox to try new (to me) Python tools and features like type hints and mypy, and to build out workflows in Make.
I had &lt;em&gt;way&lt;/em&gt; more fun working this way and ended up with some tools and patterns I was able to use for the rest of the year.&lt;/p&gt;
&lt;p&gt;I discovered that last part, up-to-date knowledge of how to build complete modern Python projects, is really valuable to me.
As a then manager and now executive, I&apos;m not coding regularly.
So when I do have a chance to get my hands dirty, for fun or for profit, it&apos;s a huge accelerator to have a go-to template to quickly build out a project with up-to-date tools and practices.&lt;/p&gt;
&lt;p&gt;I don&apos;t mean &quot;template&quot; metaphorically.
I have a &lt;a href=&quot;https://github.com/acviana/python-project-template/&quot;&gt;cookiecutter template&lt;/a&gt; I use to build my projects and now I use Advent of Code as my chance to give it an annual refresh.&lt;/p&gt;
&lt;p&gt;This year, I leaned even more into &quot;doing more with less&quot; and only did one day&apos;s worth of puzzles.
I then spent the remaining time making updates to my solution workflow including switching from flake8 to &lt;a href=&quot;https://github.com/charliermarsh/ruff&quot;&gt;ruff&lt;/a&gt;, adding &lt;a href=&quot;https://mypy-lang.org/&quot;&gt;mypy&lt;/a&gt; into my workflows, and adding a template documentation page with &lt;a href=&quot;https://www.sphinx-doc.org/en/master/&quot;&gt;Sphinx&lt;/a&gt; driven by docstrings and type hints with &lt;a href=&quot;https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html&quot;&gt;autodoc&lt;/a&gt;.
With the exception of Ruff this was all stuff I&apos;d done before, but had never taken the time to systematize.&lt;/p&gt;
&lt;p&gt;I then started porting the ideas I enjoyed into my template.
Checking for bugs in your template can be tricky because the template itself is full of jinja-style template variables (e.g. &lt;code&gt;{{project-name}}&lt;/code&gt;) so you can&apos;t run it directly.
To help with this I created a &lt;a href=&quot;https://github.com/acviana/python-project-template-testing&quot;&gt;new project&lt;/a&gt; with a &lt;code&gt;Makefile&lt;/code&gt; that pulls my cookiecutter template and runs a &quot;hello world&quot; build to check for obvious bugs.&lt;/p&gt;
&lt;p&gt;I really wish I had the time to do all 25 of the AOC puzzles every year, but the truth is I don&apos;t. Instead, I&apos;m glad I&apos;ve gotten to the point in my life where I have the confidence to make use of something in the way I want to.&lt;/p&gt;
</content:encoded></item><item><title>The Coconut Programming Language</title><link>https://acviana.com/posts/2022-10-24-coconut-programming-language/</link><guid isPermaLink="true">https://acviana.com/posts/2022-10-24-coconut-programming-language/</guid><description>A functional programming language that compiles to Python</description><pubDate>Mon, 24 Oct 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A friend of mine tipped me off to a neat little project called the &lt;a href=&quot;https://coconut-lang.org/&quot;&gt;Coconut programming language&lt;/a&gt;.
From the docs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Coconut is a functional programming language that compiles to Python. Since all valid Python is valid Coconut, using Coconut will only extend and enhance what you&apos;re already capable of in Python.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Because I have a math degree and use functions in my Python code I like to pretend I&apos;m a functional programmer (mustache twirl).
However, other than my annual 30-minute attempt to learn Haskell on the first day of &lt;a href=&quot;https://adventofcode.com/&quot;&gt;Advent of Code&lt;/a&gt; and the occasional &lt;a href=&quot;https://realpython.com/lessons/type-hinting/&quot;&gt;Python type hint&lt;/a&gt; -- I haven&apos;t made much progress on that front.&lt;/p&gt;
&lt;p&gt;On one hand, I doubt I&apos;ll spend much time with Coconut. But on the other, I do think this would be the first project I would reach for now if I wanted to get my hands dirty working in a functional framework.&lt;/p&gt;
&lt;p&gt;My last thought here is that this project kind of reminds me of &lt;a href=&quot;https://coffeescript.org/&quot;&gt;CoffeeScript&lt;/a&gt;. That effort seems to have &lt;a href=&quot;https://javascript.plainenglish.io/coffeescript-6dd64142b8dd&quot;&gt;run out of steam&lt;/a&gt;, but developers of a certain age will remember it as another Haskell-influenced language that compiles down to a &quot;mainstream&quot; language.&lt;/p&gt;
</content:encoded></item><item><title>Where The Water Goes by David Owen</title><link>https://acviana.com/posts/2022-09-17-where-the-water-goes/</link><guid isPermaLink="true">https://acviana.com/posts/2022-09-17-where-the-water-goes/</guid><description>Life and death along the Colorado River</description><pubDate>Sat, 17 Sep 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;David Owen&apos;s &lt;a href=&quot;https://www.npr.org/2017/04/11/522778149/where-the-water-goes-is-effortlessly-engaging-and-also-scary&quot;&gt;&quot;Where The Water Goes&quot;&lt;/a&gt; follows the author along the length of the Colorado River as he narrates the complex legal, economic, and environmental issues around water usage.&lt;/p&gt;
&lt;p&gt;What stood out to me in this book is the cascading of unintended consequences, even from what seem like sensible environmentally conscious decisions.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allowing man-made lakes to dry up releases captured pollution into the air&lt;/li&gt;
&lt;li&gt;Water waste provides necessary &quot;slack&quot; in water system capacity and replenishes the water table&lt;/li&gt;
&lt;li&gt;Efficient water consumption induces more water utilization&lt;/li&gt;
&lt;li&gt;Removing water-intensive plants increases dust and creates &quot;heat islands&quot;&lt;/li&gt;
&lt;li&gt;Man-made waterways become critical support systems for endangered species&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The ways in which decisions, compromises, and accidents continued to change the social, environmental, and literal landscape were mind-boggling.
Overall, it made me interested in learning more about the state of the waterways in the upper Midwest where I live.&lt;/p&gt;
</content:encoded></item><item><title>The Crystal Programming Language</title><link>https://acviana.com/posts/2022-09-06-the-cryatal-programming-language/</link><guid isPermaLink="true">https://acviana.com/posts/2022-09-06-the-cryatal-programming-language/</guid><description>A quick peek at the Crystal programming language</description><pubDate>Tue, 06 Sep 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I recently came across a Hacker News thread on the &lt;a href=&quot;https://crystal-lang.org/&quot;&gt;Crystal programming language&lt;/a&gt; that piqued my interest. Reading around, it seems to hit many of the language features I&apos;m interested in.&lt;/p&gt;
&lt;p&gt;From what I can gather, Crystal is an imperative, compiled, and statically typed language with a syntax heavily influenced by Ruby.
Types can often be inferred by the compiler, resulting in a syntax closer to that of dynamic languages, while static typing leads to more efficient compiled code.&lt;/p&gt;
&lt;p&gt;Here&apos;s a sample webserver from their documentation page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# A very basic HTTP server
require &quot;http/server&quot;

server = HTTP::Server.new do |context|
  context.response.content_type = &quot;text/plain&quot;
  context.response.print &quot;Hello world, got #{context.request.path}!&quot;
end

puts &quot;Listening on http://127.0.0.1:8080&quot;
server.listen(8080)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This syntax looks very clean to me. It&apos;s just different enough from Python to be interesting but not so much that I couldn&apos;t quickly be productive.&lt;/p&gt;
&lt;p&gt;Maybe I&apos;ll try it out on one or two Advent of Code puzzles this year.&lt;/p&gt;
&lt;p&gt;Thanks for reading, and please enjoy this recording of a &lt;a href=&quot;https://open.spotify.com/track/6Q8v1qifgM8zIyBbie5MM4?si=ccd8cd41b0cd4e32&quot;&gt;Beethoven late string quartet (No. 12, Op. 127)&lt;/a&gt; by the LaSalle Quartet.&lt;/p&gt;
</content:encoded></item><item><title>SQLite and DuckDB</title><link>https://acviana.com/posts/2022-09-05-sqlite-and-duckdb/</link><guid isPermaLink="true">https://acviana.com/posts/2022-09-05-sqlite-and-duckdb/</guid><description>Diving into a recent paper on running OLAP workloads on my favorite database</description><pubDate>Mon, 05 Sep 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I recently came across a &lt;a href=&quot;https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/&quot;&gt;great blog post&lt;/a&gt; by Simon Willison (creator of &lt;a href=&quot;https://datasette.io/&quot;&gt;Datasette&lt;/a&gt;) digging into a recent academic paper comparing OLAP workloads on SQLite against DuckDB.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://duckdb.org/&quot;&gt;DuckDB&lt;/a&gt; is a newer file-based database that&apos;s generating a lot of buzz and is nicknamed &quot;the SQLite for analytics&quot;.&lt;/p&gt;
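&lt;p&gt;For anyone unfamiliar with the OLAP side of this, here&apos;s a toy aggregate query against an in-memory SQLite database using Python&apos;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. This is a made-up example of the &lt;em&gt;kind&lt;/em&gt; of analytical workload being compared, not the paper&apos;s actual benchmark suite:&lt;/p&gt;

```python
import sqlite3

# A toy OLAP-style query: group-by aggregation over a small fact table.
# Table and column names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 20.0), ("south", 5.0)],
)
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
# rows is [("north", 30.0), ("south", 5.0)]
```

&lt;p&gt;The paper is essentially asking how fast SQLite can chew through queries like this at scale compared to DuckDB&apos;s column-oriented engine.&lt;/p&gt;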
&lt;p&gt;If you&apos;re into that sort of thing you should check out the &lt;a href=&quot;https://vldb.org/pvldb/volumes/15/paper/SQLite%3A%20Past%2C%20Present%2C%20and%20Future&quot;&gt;original paper&lt;/a&gt;.
It&apos;s pretty readable and gives an overview of SQLite&apos;s history, architecture, OLAP benchmarks, bottlenecks, and potential optimizations.&lt;/p&gt;
&lt;p&gt;I love this type of well-organized research and I feel like I&apos;ve learned a lot more about my favorite database even though I&apos;m only 1/3 of the way through the paper.&lt;/p&gt;
&lt;p&gt;Thanks for reading and please enjoy &lt;a href=&quot;https://www.artic.edu/artworks/100858/sky-above-clouds-iv&quot;&gt;this wonderful Georgia O&apos;Keeffe painting&lt;/a&gt;, which has moved me to tears before.&lt;/p&gt;
</content:encoded></item><item><title>Hello, Darkness</title><link>https://acviana.com/posts/2022-09-01-hello-darkness/</link><guid isPermaLink="true">https://acviana.com/posts/2022-09-01-hello-darkness/</guid><description>My old Friend ...</description><pubDate>Thu, 01 Sep 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I figured out how to turn on a &quot;dark mode&quot; toggle.
It&apos;s &lt;a href=&quot;https://nextra.vercel.app/themes/blog&quot;&gt;somewhat buried in the docs&lt;/a&gt; but you can add a &lt;code&gt;darkMode: true&lt;/code&gt; flag in &lt;code&gt;theme.config.js&lt;/code&gt; to turn this feature on.&lt;/p&gt;
&lt;p&gt;I was a little confused at first because the app doesn&apos;t default to dark mode; it just adds a sun/moon icon in the upper right of the page where you can toggle your color preference.&lt;/p&gt;
&lt;p&gt;It&apos;s a pretty intense high-contrast dark mode though, and I&apos;m not sure I like it enough to keep it.&lt;/p&gt;
&lt;p&gt;But this is a neat little app and I&apos;m enjoying playing with it.&lt;/p&gt;
&lt;p&gt;Please enjoy this documentary on the &lt;a href=&quot;https://watchdocumentaries.com/helvetica/&quot;&gt;Helvetica typeface&lt;/a&gt;.&lt;/p&gt;
</content:encoded></item><item><title>Hello, World!</title><link>https://acviana.com/posts/2022-09-01-hello-world/</link><guid isPermaLink="true">https://acviana.com/posts/2022-09-01-hello-world/</guid><description>First Post!</description><pubDate>Wed, 31 Aug 2022 05:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;m starting this blog with some experiments with 100-word posts.&lt;/p&gt;
&lt;p&gt;I&apos;m not witty enough for Twitter, but I mostly don&apos;t want to argue with strangers.
Writing a newsletter was fun, but my tiny ideas keep exploding into multi-page essays.
The proofreading and editing buried me for weeks.&lt;/p&gt;
&lt;p&gt;So I&apos;m trying for ~100 words at a time.
About the length of a few tweets.
I mostly just want a place to jot down my thoughts.
Some code, some math, some music.&lt;/p&gt;
&lt;p&gt;Maybe I should just text my friends more.&lt;/p&gt;
&lt;p&gt;But I like playing around with Markdown so here we are.&lt;/p&gt;
&lt;p&gt;Please enjoy this video of &lt;a href=&quot;https://www.youtube.com/watch?v=VhMWUayNMcM&quot;&gt;Jaco Pastorius playing bass&lt;/a&gt;.&lt;/p&gt;
</content:encoded></item><item><title>Prototyping for Data Teams</title><link>https://acviana.com/posts/2021-07-23-prototyping-for-data-teams/</link><guid isPermaLink="true">https://acviana.com/posts/2021-07-23-prototyping-for-data-teams/</guid><description>Hello again!</description><pubDate>Fri, 23 Jul 2021 23:17:47 GMT</pubDate><content:encoded>&lt;p&gt;Hello, welcome back, and thanks for reading! If you enjoy this issue please consider sharing or subscribing.&lt;/p&gt;
&lt;h2&gt;We&apos;re Back!&lt;/h2&gt;
&lt;p&gt;I took a break from this newsletter to work on some of the projects I wrote about in this issue. I&apos;m trying to make time to write again and part of that process is trying to nudge this newsletter towards a shorter and more sustainable format.&lt;/p&gt;
&lt;p&gt;I think the tone I&apos;m aiming for is less like the comprehensive Medium-style guides of my first issues and instead more like informal sketches of ideas I&apos;m kicking around. Something more like an email you would send a friend when you come across an interesting article or have an idea you want to share.&lt;/p&gt;
&lt;p&gt;You&apos;re that friend. We&apos;re data friends now.&lt;/p&gt;
&lt;h2&gt;Prototyping for Data Teams&lt;/h2&gt;
&lt;p&gt;The main project I&apos;ve been working on for the last few months is organizing some thoughts around the role of prototyping for the modern data practitioner.&lt;/p&gt;
&lt;p&gt;My thesis goes something like this. The expanding role of modern data teams, powered by executive-level buy-in and self-service SaaS tools, has allowed data teams to take on projects that are higher impact and more complex than their previous work. This is interesting and exciting, but because the stakes are now higher the costs of building the wrong things are also higher.&lt;/p&gt;
&lt;p&gt;One of the ways to reduce project risk is to create a prototype. But to prototype a data product (even something as basic as a dashboard) you often have to use data that is somehow approximate, simulated, or outright faked. I think this flies in the face of the natural tendency of data teams to obsess over data accuracy and as a result we tend to skip this step. The result is that we don&apos;t really start validating whether we&apos;ve built the right thing until after we&apos;ve built it, at which point we&apos;re already dealing with a sunk cost.&lt;/p&gt;
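&lt;p&gt;To make that concrete, here&apos;s the kind of throwaway Python you might use to fake daily metrics for a dashboard prototype. Everything here -- the column names, the ranges, the dates -- is invented for illustration:&lt;/p&gt;

```python
import random
from datetime import date, timedelta

# Generate plausible-but-fake daily metrics so a dashboard prototype
# can be reviewed before any real pipeline exists.
random.seed(42)  # reproducible fake data

start = date(2021, 7, 1)
fake_metrics = [
    {
        "day": (start + timedelta(days=i)).isoformat(),
        "signups": random.randint(50, 150),
        "churned": random.randint(0, 10),
    }
    for i in range(14)  # two weeks of made-up history
]
```

&lt;p&gt;Nobody will mistake it for real data, but it&apos;s enough to put a believable chart in front of stakeholders and find out if it&apos;s even the right chart.&lt;/p&gt;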
&lt;p&gt;My first public iteration of this idea came last month when I spoke at the Data Quality Meetup hosted by &lt;a href=&quot;https://www.datafold.com/&quot;&gt;DataFold&lt;/a&gt;. It&apos;s a great event I encourage you to check out. You can read a &lt;a href=&quot;https://www.datafold.com/blog/data-quality-meetup-4/#fake-it-till-you-make-it-a-backward-approach-to-data-products&quot;&gt;write-up of the entire event&lt;/a&gt; and see my section in the video below.&lt;/p&gt;
&lt;h2&gt;Hitting the Conference Circuit&lt;/h2&gt;
&lt;p&gt;I read a blog post years ago by someone who was a frequent speaker at tech conferences about how they approached the conference submission process (unfortunately, I can&apos;t find the link anymore). The author&apos;s method was to pick one topic at the beginning of the year, make the best presentation they could around that topic, and then submit iterations of that talk to every conference for the rest of the year. It&apos;s a simple concept but it was mind-blowing to me at the time.&lt;/p&gt;
&lt;p&gt;This newsletter has already gotten me in the habit of publicly articulating my ideas and then revisiting and rehashing them. From there, it was a natural progression to bring the same approach to my conference presentations. Case in point, I submitted an updated version of my Data Quality Meetup talk to the &lt;a href=&quot;https://coalesce.getdbt.com/&quot;&gt;dbt Coalesce Conference&lt;/a&gt; in December. And then again to the &lt;a href=&quot;https://fivetran.com/blog/modern-data-stack-conference-2021&quot;&gt;Fivetran Modern Data Stack Conference&lt;/a&gt; in September.&lt;/p&gt;
&lt;p&gt;You can read &lt;a href=&quot;https://sessionize.com/app/speaker/session/270782&quot;&gt;my Coalesce Conference submission here&lt;/a&gt; and &lt;a href=&quot;https://www.loom.com/share/17580900baaa4314af85a2f83d2ee04c&quot;&gt;watch my pitch video here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Submitting to conferences is scary. You could get rejected, you could get accepted and not be happy with your talk. And there&apos;s nothing like watching a video of yourself speaking to make you realize what a weirdly twitchy and awkward human you are. 🤪&lt;/p&gt;
&lt;p&gt;Submitting to conferences means opening yourself up to failure. So let&apos;s talk about failure.&lt;/p&gt;
&lt;h2&gt;Failure as a Goal and Failure as a Process&lt;/h2&gt;
&lt;p&gt;2 years ago when I was last on the job market, I read a blog post about someone trying to get to 100 rejections in their job hunt. I can&apos;t find the original post, but it was certainly inspired by &lt;a href=&quot;https://www.ted.com/talks/jia_jiang_what_i_learned_from_100_days_of_rejection&quot;&gt;this TED Talk&lt;/a&gt;. I know TED talks are something of a meme these days but this was a useful nudge for me.&lt;/p&gt;
&lt;p&gt;I realized that if I wasn&apos;t getting rejections then I wasn&apos;t being ambitious enough in my job search. So even though it left me anxious to the point of insomnia, I applied to everything I was even remotely interested in: from seed stage startups to billion-dollar companies, and from VC firms to boutique analytics consultancies.&lt;sup&gt;1&lt;/sup&gt; It made my job hunt much scarier but also more interesting and fulfilling. I think I ended up feeling much more settled in my final decision as well.&lt;/p&gt;
&lt;p&gt;It&apos;s in this same spirit that I&apos;m intentionally telling you about the conferences I&apos;ve applied to without knowing if my talks were accepted.&lt;/p&gt;
&lt;p&gt;I&apos;m doing this because I think it&apos;s healthy to make sure we&apos;re &quot;failing&quot; just a little bit, and even to loosen up our concept of what counts as success. I&apos;m doing it because I want to choose to celebrate the hard work I put into this process and what I was able to produce.&lt;/p&gt;
&lt;p&gt;Let me be clear, I &lt;em&gt;hate&lt;/em&gt; failure and rejection. I&apos;m naturally anxious and risk-averse. I mean, I leave the house with two raincoats. But I&apos;ve been learning to let some of that go when it doesn&apos;t serve me.&lt;/p&gt;
&lt;p&gt;Lastly, I hope it&apos;s not presumptuous of me to assume that you might need the same reminder to put yourself out there a little more.&lt;/p&gt;
&lt;p&gt;Seeing as we&apos;re data friends now.&lt;/p&gt;
&lt;h2&gt;Odds &amp;amp; Ends&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Around the Internet:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://zelby.substack.com/&quot;&gt;3 Things&lt;/a&gt;: This newsletter is increasingly influenced by the format of my friend Elaine&apos;s &quot;3 Things&quot; newsletter -- concise, original, and consistent. Elaine is a VC who digs into 3 business rabbit holes she finds interesting each week. The breadth of her ideas is astounding to me and I admire her delivery.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Listening:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.flowstate.fm/&quot;&gt;Flow State&lt;/a&gt;: If I have your phone number and you&apos;re into music I&apos;ve probably already texted you about this one. &quot;Flow State&quot; is a substack that comes out every work day with &quot;2 hours of music perfect for working&quot;. They lean heavily into ambient, experimental, electronic, and jazz artists, with minimal lyrics and more upbeat selections on Friday. My only complaint is that it&apos;s too much content!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Reading:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;http://www.irrationalexuberance.com/main.html?src=%2F&quot;&gt;Irrational Exuberance&lt;/a&gt;: Robert J. Shiller is a Nobel laureate in economics and &lt;em&gt;Irrational Exuberance&lt;/em&gt; is probably his most famous book about behavioral economics, specifically the wisdom/folly of crowds in pricing assets such as stocks and real estate. The topic is fascinating but this book is a textbook in disguise. It&apos;s incredibly exhaustive and discursive; I was barely able to get through it, though I&apos;m glad I did.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://press.princeton.edu/books/paperback/9780691170817/the-box&quot;&gt;The Box&lt;/a&gt;: I enthusiastically read a whole book about shipping containers in case you wanted to know exactly how much of a nerd I can be.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://us.macmillan.com/books/9781250058294&quot;&gt;90% of Everything&lt;/a&gt;: Another book about logistics, this time about the shipping industry. I thought this was a nearly perfect non-fiction book. Informative but not overly serious and with a narrative arc that doesn&apos;t overshadow the content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; I should acknowledge there&apos;s an &lt;a href=&quot;https://hbr.org/2017/06/7-practical-ways-to-reduce-bias-in-your-hiring-process&quot;&gt;abundance of research&lt;/a&gt; that shows that as a straight white male with a nondescript name that obscures my immigrant identity and a traditional STEM background, I&apos;m generally taken much more seriously as a candidate for a broad range of positions than folks from other backgrounds.&lt;/p&gt;
</content:encoded></item><item><title>Configuring Your OSX Terminal</title><link>https://acviana.com/posts/2021-05-07-configuring-your-osx-terminal/</link><guid isPermaLink="true">https://acviana.com/posts/2021-05-07-configuring-your-osx-terminal/</guid><description>For fun and productivity!</description><pubDate>Fri, 07 May 2021 12:30:05 GMT</pubDate><content:encoded>&lt;p&gt;Hello and welcome back! This edition of my newsletter is about how to configure an attractive and functional terminal environment. I&apos;ll be walking through my current setup (pictured below), which uses iTerm2, the Dracula color theme, Meslo LGS Nerd Font, and the Fish shell with the Tide theme. This tutorial is geared towards OSX so your mileage may vary on Linux and Windows systems.&lt;/p&gt;
&lt;h3&gt;Why Should You Care?&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!rI3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7e7789e-dc25-4ec6-a460-8f86639bf25c_1140x178.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/e7e7789e-dc25-4ec6-a460-8f86639bf25c_1140x178.png&quot; alt=&quot;Basic Bash Shell&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As always, why should we bother with this in the first place?&lt;/p&gt;
&lt;p&gt;Given the justifiable momentum right now toward low-code and no-code solutions in the data space, the command line might not be high on many people&apos;s list of favorite tools. But even as we gain new SaaS tools, the command line remains an incredibly powerful systems programming tool. For folks in the analytics space, the command line is useful for everything from parsing and peeking at data to setting up your Python environment. Using the terminal becomes essential as your work gets closer to the Data Engineering and DevOps domains (even if those words aren&apos;t in your job title).&lt;/p&gt;
&lt;p&gt;But for a lot of people who work in data, and even some engineers, the terminal is their &lt;em&gt;least&lt;/em&gt; favorite tool. The steep learning curve and dramatic failure modes aside, it doesn&apos;t help that the terminal environment itself can feel like stepping back in time -- forget linting and autocomplete, there aren&apos;t even any colors!&lt;/p&gt;
&lt;p&gt;But check this out …&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!jsiW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2f27d358-df53-42e5-b8fc-b508cb12bdb7_1424x202.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/2f27d358-df53-42e5-b8fc-b508cb12bdb7_1424x202.png&quot; alt=&quot;Completed terminal example&quot; /&gt;&lt;/a&gt;Sharp fonts, contrasting colors, fun glyphs -- all this could be yours!&lt;/p&gt;
&lt;p&gt;That&apos;s my current terminal setup pictured above. From left to right you can see my system type, my current working directory, my git branch, the number of staged and untracked files in that repo, which virtual environment I&apos;m using, and the timestamp of the last command. On the next line I&apos;ve started to type &lt;code&gt;git add&lt;/code&gt; and &lt;code&gt;tests/&lt;/code&gt; is being suggested as an autocomplete option based on the modified files in the repo.&lt;/p&gt;
&lt;p&gt;These aren&apos;t just neat tricks; having all this information clearly displayed helps prevent many of my most frequent command line mistakes. Notice that form is as important as function here. The font face is clear down to the details in the file icon. The colors clearly separate out different information. There&apos;s visual whitespace between each command. These are little things, but when you are staring at a terminal for hours a day these little things add up.&lt;/p&gt;
&lt;p&gt;Lastly, I think there&apos;s something to be said for the educational value that comes from taking ownership of your tools; customizing them, breaking them along the way, but ultimately learning how they work.&lt;/p&gt;
&lt;p&gt;Let&apos;s get started!&lt;/p&gt;
&lt;h3&gt;iTerm2&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!xOf2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1149d329-c5a4-4bb6-ae3e-80fd6044eee5_1336x436.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/1149d329-c5a4-4bb6-ae3e-80fd6044eee5_1336x436.png&quot; alt=&quot;iterm 2-column example&quot; /&gt;&lt;/a&gt;lolcat, figlet, and cowsay are some of the very serious tools you can use in the command line&lt;/p&gt;
&lt;p&gt;Let&apos;s start by downloading the &lt;a href=&quot;https://iterm2.com/&quot;&gt;iTerm&lt;/a&gt; terminal emulator for OSX. iTerm is a feature-rich alternative to the built-in OSX terminal application. Some of the &lt;a href=&quot;https://iterm2.com/features.html&quot;&gt;features&lt;/a&gt; that I use regularly are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Better control of font size, font, and transparency&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tabs, Split screens, and moving panes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A configurable &lt;a href=&quot;https://medium.com/@msvechla/customizing-the-new-iterm2-status-bar-to-your-needs-252eee06bf39&quot;&gt;status bar&lt;/a&gt; (visible at the bottom of the screen shot above) that can show things like location, git info, CPU and memory use.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even just this handful of features is a quality of life improvement over the OSX Terminal app. Note that iTerm only runs on OSX so Windows and Linux users can skip this step. The good news is that the Linux terminal (or at least Ubuntu&apos;s) can already do most of what I use iTerm for. Windows users should look at setting up &lt;a href=&quot;https://ubuntu.com/blog/new-installation-options-coming-for-ubuntu-wsl&quot;&gt;WSL 2&lt;/a&gt;, which gives you a native Linux terminal in your Windows environment.&lt;/p&gt;
&lt;h2&gt;Colors&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!PUYE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F00d4d0ab-2717-4b17-9fe7-9a300aed6876_1456x332.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/00d4d0ab-2717-4b17-9fe7-9a300aed6876_1456x332.png&quot; alt=&quot;Dracula Theme command line example&quot; /&gt;&lt;/a&gt;Credit: https://draculatheme.com/&lt;/p&gt;
&lt;p&gt;Now that we have a better terminal environment, let&apos;s get rid of the default black and white color palette. We want our color theme to provide helpful highlighting as well as reduce eyestrain and be aesthetically pleasing. iTerm already comes with a few built-in color presets which you can find under Preferences &amp;gt; Profiles &amp;gt; Colors &amp;gt; Color Presets. Personally, for years I used the popular built-in Solarized Dark theme and would switch to Solarized Light in bright lighting conditions, like outdoors. Lately, I&apos;ve switched to the &lt;a href=&quot;https://draculatheme.com/&quot;&gt;Dracula color theme&lt;/a&gt; (the background story is worth a read).&lt;/p&gt;
&lt;p&gt;If none of these appeal to you, you can grab all the iTerm2 color files you want from &lt;a href=&quot;https://iterm2colorschemes.com/&quot;&gt;this repo&lt;/a&gt;. They&apos;re a little hard to interpret but you can find plenty of &quot;best of&quot; lists on the web that can guide you toward the most popular ones. After you download them you can import and select them from the same Color Presets menu.&lt;/p&gt;
&lt;p&gt;These colors might not look like much right now, but as we keep going they&apos;ll start to pop a bit more as our terminal makes better use of them.&lt;/p&gt;
&lt;h2&gt;Fonts&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!ZLox!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6be146-476a-42b0-9585-3bb32ea0d05f_1488x474.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/7b6be146-476a-42b0-9585-3bb32ea0d05f_1488x474.png&quot; alt=&quot;Nerd font example&quot; /&gt;&lt;/a&gt;Source: https://www.nerdfonts.com/&lt;/p&gt;
&lt;p&gt;Typefaces are a rich world of nerdiness unto themselves with their own subculture of designers. There&apos;s a documentary on &lt;a href=&quot;https://vimeo.com/286172171&quot;&gt;Helvetica&lt;/a&gt;. There are &lt;a href=&quot;http://thinkingwithtype.com/&quot;&gt;books&lt;/a&gt; about &lt;a href=&quot;https://www.amazon.com/Just-My-Type-About-Fonts/dp/1592407463?psc=1&quot;&gt;typefaces&lt;/a&gt; (including one by some &lt;a href=&quot;https://www.letteringandtype.com/site/thebook/&quot;&gt;good friends&lt;/a&gt; of mine). All this is to say that while you might never have given the font in your terminal much thought -- some people have.&lt;/p&gt;
&lt;p&gt;OK, but what if you don&apos;t want a research project and just want a good font? From what I&apos;ve read, a safe bet for OSX users is the Meslo LG font family, which you can &lt;a href=&quot;https://github.com/andreberg/Meslo-Font&quot;&gt;read more about here&lt;/a&gt;. I picked Meslo because it&apos;s a monospaced font designed for software applications and is specifically optimized for OSX systems and displays. But this optimization means it doesn&apos;t always look great on my external monitors, in which case I switch to the &lt;a href=&quot;https://fonts.google.com/specimen/Source+Code+Pro&quot;&gt;Source Code Pro family&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But wait! These are not &lt;em&gt;exactly&lt;/em&gt; the typefaces you want to install. We&apos;re going to make our terminals fancy (🧐 🎩) by choosing variants of our fonts that include glyphs from the &lt;a href=&quot;https://www.nerdfonts.com/&quot;&gt;Nerd Font&lt;/a&gt; family. For the font families I mentioned, these are the &lt;a href=&quot;https://github.com/IlanCosman/tide#meslo-nerd-font&quot;&gt;Meslo LGS Nerd Font&lt;/a&gt; and the &lt;a href=&quot;https://github.com/ryanoasis/nerd-fonts/tree/master/patched-fonts/SourceCodePro/Regular&quot;&gt;Source Code Pro Nerd Font&lt;/a&gt; (aka &quot;Sauce Code Pro&quot; for namespace reasons). We&apos;ll make use of these new glyphs in the next section.&lt;/p&gt;
&lt;p&gt;Lastly, make your font bigger. While reading a small font won&apos;t permanently damage your eyes, it does cause eye strain. Many of us correct for this by hunching over and squinting without even realizing it, leading to muscle strain and headaches. Here is an experiment you can try: make your font just a little bit bigger than you think it needs to be and then see if you even remember to change it back.&lt;/p&gt;
&lt;h2&gt;Fish Shell and Themes&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!5MX-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6b104681-6b83-4bac-b284-91b20ce10822_1082x372.png&quot;&gt;&lt;img src=&quot;/assets/substack/configuring-your-osx-terminal/6b104681-6b83-4bac-b284-91b20ce10822_1082x372.png&quot; alt=&quot;Tide theme logo&quot; /&gt;&lt;/a&gt;Source: https://github.com/IlanCosman/tide&lt;/p&gt;
&lt;p&gt;So far, we have a modern terminal emulator, a nice color theme, and a sharp font. Now let&apos;s bring them all together. Within your terminal you are always working in a &quot;shell&quot;. You can think of a shell as a scripting language for interacting with your operating system. The most common shell is called bash; though the OSX default recently switched to zsh, a more modern bash-compatible option.&lt;/p&gt;
&lt;p&gt;For the past few years though I&apos;ve been using &lt;a href=&quot;https://fishshell.com/&quot;&gt;Fish Shell&lt;/a&gt;, a user-friendly shell that comes set up with things like color highlighting and extensive auto-completion for everything from git to unix commands (&quot;Finally, a command line shell for the 90s&quot;).&lt;/p&gt;
&lt;p&gt;There is a trade-off. Fish achieves at least some of this by &lt;em&gt;not&lt;/em&gt; being bash compliant. I&apos;ve found this is rarely an issue as Fish will respect a &lt;code&gt;#!/bin/bash&lt;/code&gt; header and otherwise I can drop into a bash shell anytime I need to. In contrast, almost every command I run in Fish gives me some kind of value, such as autocompleting file paths.&lt;/p&gt;
&lt;p&gt;Fish works great out of the box but to get the best experience I prefer setting a theme with a plug-in manager. All of these themes will make heavy use of the Nerd Fonts we installed earlier. For the past few years I&apos;ve been using the &lt;a href=&quot;https://github.com/oh-my-fish/oh-my-fish&quot;&gt;Oh My Fish&lt;/a&gt; plugin manager with the &lt;a href=&quot;https://github.com/oh-my-fish/oh-my-fish/blob/master/docs/Themes.md#default&quot;&gt;default theme&lt;/a&gt;. I briefly used the &lt;a href=&quot;https://github.com/oh-my-fish/oh-my-fish/blob/master/docs/Themes.md#bobthefish-1&quot;&gt;bobthefish&lt;/a&gt; theme but the &lt;a href=&quot;https://github.com/powerline/powerline&quot;&gt;Powerline-inspired&lt;/a&gt; aspects were too distracting for me.&lt;/p&gt;
&lt;p&gt;In the process of writing this up I came across the &lt;a href=&quot;https://github.com/IlanCosman/tide&quot;&gt;Tide theme&lt;/a&gt; which is not supported by OMF but you can install using the &lt;a href=&quot;https://github.com/jorgebucaran/fisher&quot;&gt;Fisher&lt;/a&gt; plugin manager. I like this theme quite a bit, in part because of the built-in setup manager that walks you through a customization workflow. This is the setup I have pictured at the top of the article and my new default on my personal and work computers.&lt;/p&gt;
&lt;p&gt;Whatever you pick, there is a fork in the road here: OMF and Fisher can&apos;t be installed on the same system because they try to edit the same Fish configuration files, so you do have to choose one.&lt;/p&gt;
&lt;p&gt;That&apos;s it! Hopefully you have a more functional and attractive setup now or have otherwise learned a little bit more about the tools available to you.&lt;/p&gt;
&lt;h2&gt;Odds and Ends&lt;/h2&gt;
&lt;p&gt;It&apos;s been a little bit since I&apos;ve put out a newsletter. I&apos;ve spent a good portion of that time playing with some of the frameworks I wrote about in my last edition, including a &lt;a href=&quot;https://github.com/acviana/dagster-advent-of-code&quot;&gt;Dagster MVP&lt;/a&gt; built around an Advent of Code challenge that I want to write up at some point.&lt;/p&gt;
&lt;p&gt;And while all that has kept me busy the truth is that &lt;em&gt;writing is hard&lt;/em&gt; and I&apos;ve gotten stuck on a lot of the drafts I&apos;ve started. But I think I&apos;m learning how to write more tightly-scoped newsletters that should be easier to get out the door.&lt;/p&gt;
&lt;p&gt;Anyway, here are some odds and ends for you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Around the Internet:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://towardsdatascience.com/what-i-learned-in-my-first-6-months-on-the-data-team-at-healthjoy-e5218144429b&quot;&gt;What I Learned in My First 6 Months on the Data Team at HealthJoy&lt;/a&gt; - One of my teammates recently started blogging and wrote a great article about his first 6 months on our Data Team. He talks a lot about how we&apos;re trying to build a modern team around collaboration and data as a product.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.politico.com/news/magazine/2019/11/29/penn-station-robert-caro-073564&quot;&gt;This is Why Your Holiday Travel is Awful&lt;/a&gt; - A long read on the debacle that is Penn Station in NYC, the politics of government projects, and why we can&apos;t seem to build anything in America.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Reading:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.npr.org/2019/02/05/691293961/in-go-ahead-in-the-rain-the-love-for-a-tribe-called-quest-is-infectious&quot;&gt;Go Ahead in the Rain: Notes to a Tribe Called Quest&lt;/a&gt;: I finished this wonderful book, part history of ATCQ and part personal memoir of the author as he was growing up with the group. Probably my favorite hip-hop book since I read the 33 1/3 series entry on &lt;a href=&quot;https://www.amazon.com/Nas-Illmatic-33-1-3/dp/0826429076&quot;&gt;Nas&apos;s Illmatic&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.kirkusreviews.com/book-reviews/tressie-mcmillan-cottom/thick/&quot;&gt;Thick: And Other Essays&lt;/a&gt;: I&apos;ve been following recent MacArthur Genius Grant recipient Dr. Tressie McMillan Cottom for years, so I&apos;m glad to finally put some money in her pocket. This first volume of her essays is affecting and profound. She also has a great &lt;a href=&quot;https://tressie.substack.com/&quot;&gt;Substack&lt;/a&gt; where she talks about everything from her writing process to Peloton to Dolly Parton.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Listening:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&apos;ve been bouncing around a lot lately and not really loving anything too much, but this is what&apos;s been at the top of my rotation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On a recommendation I checked out Benny the Butcher&apos;s &lt;a href=&quot;https://open.spotify.com/album/20XfOL0gmcOQhupwC2bMDj?si=vANC2kNNTUa2pTMPIcgR_Q&quot;&gt;The Plugs I Met 2&lt;/a&gt; (mostly for the Harry Fraud production)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Floated around some SoCal Lo-Fi pop by &lt;a href=&quot;https://open.spotify.com/artist/1zeHZCkBteZhJHsRI9qv29?si=x7SbU1veRWG4VeCuqqyIDw&quot;&gt;Little Monarch&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Went down a Chicago blues history rabbit hole listening to Alligator Records&apos; first artist &lt;a href=&quot;https://open.spotify.com/artist/737qPoiQQkeuIzuJy54aK4?si=4Rs6JVgaTymJJN3YokrkxQ&quot;&gt;Hound Dog Taylor&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;My personal favorites &lt;a href=&quot;https://open.spotify.com/album/3wW4dsaL7EVVdAfu9aIU1M?si=IK5xONXBSYiN8n2Ak976RA&quot;&gt;The Hold Steady&lt;/a&gt; put out a pretty decent studio album&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
</content:encoded></item><item><title>6 Python Tools I&apos;d Like To Try In 2021</title><link>https://acviana.com/posts/2021-03-02-6-python-tools-id-like-to-try-in/</link><guid isPermaLink="true">https://acviana.com/posts/2021-03-02-6-python-tools-id-like-to-try-in/</guid><description>I could probably have tried all these tools in the time it took me to write this ...</description><pubDate>Tue, 02 Mar 2021 14:43:39 GMT</pubDate><content:encoded>&lt;p&gt;I recently discovered most of what I&apos;ve arrived at (in addition to a bunch of other stuff) is covered by Claudio Jolowicz&apos;s excellent (and wonderfully illustrated) &lt;a href=&quot;https://cjolowicz.github.io/posts/hypermodern-python-01-setup/&quot;&gt;Hypermodern Python&lt;/a&gt; series. My ideal project setup includes &quot;hypermodern&quot; tools like Black, pyenv, and Poetry but also more traditional tools like PyTest and Flake8, build pipelines in Makefiles and GitLab CI/CD config files, as well as more mundane things like &lt;code&gt;README&lt;/code&gt; layouts.&lt;/p&gt;
&lt;p&gt;At work, I&apos;ve been slowly adding these things to a template project, but this is mostly used as a reference implementation for individual tools that we gradually backport to other projects as time allows. In previous roles, we tried to take this a step further by using a GitHub-style development model to fork new projects off of a base project but this never quite got traction.&lt;/p&gt;
&lt;p&gt;I thought about building a separate template project for my personal projects on &lt;a href=&quot;https://github.com/acviana&quot;&gt;GitHub&lt;/a&gt; but something about it just felt inefficient. I could hear Raymond Hettinger&apos;s catchphrase from his PyCon talks in the back of my mind: &quot;&lt;a href=&quot;https://www.youtube.com/watch?v=npw4s1QTmPg&quot;&gt;There MUST be a better way!&lt;/a&gt;&quot;&lt;/p&gt;
&lt;p&gt;Enter Cookiecutter. From their website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A command-line utility that creates projects from &lt;strong&gt;cookiecutters&lt;/strong&gt;  (project templates), e.g. creating a Python package project from a Python package project template.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This looks very promising and is the top project I want to try this year. I&apos;m an advocate of getting folks out of notebooks workflows and into a proper project format when work starts to move beyond exploratory analysis. Giving everyone a dynamic template to start with is a huge enabler for that. You can even find maintained templates for things like &lt;a href=&quot;https://github.com/audreyfeldroy/cookiecutter-pypackage&quot;&gt;Python packages&lt;/a&gt; and &lt;a href=&quot;https://github.com/drivendata/cookiecutter-data-science&quot;&gt;data science&lt;/a&gt;.&lt;/p&gt;
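&lt;p&gt;To give a flavor of what a project template is doing under the hood, here&apos;s a minimal sketch of the core idea -- variable substitution into file skeletons -- using only the standard library&apos;s &lt;code&gt;string.Template&lt;/code&gt;. This is just an illustration of the concept; Cookiecutter itself uses Jinja2 templates driven by a &lt;code&gt;cookiecutter.json&lt;/code&gt; file, and the project name and values below are made up:&lt;/p&gt;

```python
from string import Template

# A stand-in for a templated README in a project skeleton.
readme_template = Template(
    "# $project_name\n\n"
    "$description\n\n"
    "Maintained by $author.\n"
)

# Values a templating tool would prompt the user for.
context = {
    "project_name": "my-data-pipeline",
    "description": "An example project generated from a template.",
    "author": "Alex",
}

# Substitute the context into the template to produce the final file.
readme = readme_template.substitute(context)
print(readme)
```

&lt;p&gt;Cookiecutter applies that same substitution across an entire directory tree, including file and folder names.&lt;/p&gt;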
&lt;h2&gt;pre-commit&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!pJM8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F37ff1c2b-6920-4654-b872-6770ecee8cde_2000x2000.svg&quot;&gt;&lt;img src=&quot;/assets/substack/6-python-tools-id-like-to-try-in/37ff1c2b-6920-4654-b872-6770ecee8cde_2000x2000.svg&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;pre-commit is another newish package that feels like an emerging standard to me (check out the &lt;a href=&quot;https://cjolowicz.github.io/posts/hypermodern-python-03-linting/#managing-git-hooks-with-precommit&quot;&gt;Hypermodern post&lt;/a&gt;). Currently, I wrap all my &quot;pre-commit&quot; steps like Black and Flake8 into a Makefile. This works pretty well, but you still have to remember to run a &lt;code&gt;make pre-commit&lt;/code&gt; before you push. While this isn&apos;t a &lt;em&gt;huge&lt;/em&gt; issue in my team&apos;s current workflow, I don&apos;t like anything that relies on the user having to remember something.&lt;/p&gt;
&lt;p&gt;One solution to this issue is to write a &lt;a href=&quot;https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks&quot;&gt;pre-commit hook&lt;/a&gt; in git. These are scripts that are automatically run by git before committing code. I&apos;ve known about pre-commit hooks for a long time but never quite felt compelled to write my own.&lt;/p&gt;
&lt;p&gt;Enter pre-commit. From their website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We built pre-commit to solve our hook issues. It is a multi-language package manager for pre-commit hooks. You specify a list of hooks you want and pre-commit manages the installation and execution of any hook written in any language before every commit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I learned about pre-commit sometime last year and it seemed like something nice to get around to eventually. But what I&apos;m now realizing is that people are using pre-commit as a way to share git-hook recipes. For example, instead of showing everyone how to set up their text editors to remove trailing whitespace I can just turn on the &lt;code&gt;trailing-whitespace&lt;/code&gt; hook like in this &lt;a href=&quot;https://cjolowicz.github.io/posts/hypermodern-python-03-linting/#managing-git-hooks-with-precommit&quot;&gt;Hypermodern post&lt;/a&gt;. This connected the dots for me with cookiecutter as a way to increase standardization and reusability at my job. I think the possibilities really became clear to me when I read about the 3rd project on my list.&lt;/p&gt;
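&lt;p&gt;As a concrete example, a minimal &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; that turns on the whitespace hooks from the shared &lt;code&gt;pre-commit-hooks&lt;/code&gt; repo might look something like this (the &lt;code&gt;rev&lt;/code&gt; value below is a placeholder -- pin it to a current release tag):&lt;/p&gt;

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1  # placeholder: pin to a current release tag
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```

&lt;p&gt;After adding the file, running &lt;code&gt;pre-commit install&lt;/code&gt; wires the hooks into git so they run automatically on every commit.&lt;/p&gt;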
&lt;h2&gt;pre-commit-dbt&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!F4H_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F736e46b5-dadd-4a3f-a418-35cf2c0d5440_1200x447.png&quot;&gt;&lt;img src=&quot;/assets/substack/6-python-tools-id-like-to-try-in/736e46b5-dadd-4a3f-a418-35cf2c0d5440_1200x447.png&quot; alt=&quot;dbt-pre-commit&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our data warehouse at work is centered around the &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt&lt;/a&gt; package, its &lt;a href=&quot;https://www.youtube.com/watch?v=5W6VrnHVkCA&amp;amp;t=2s&quot;&gt;recommended design patterns&lt;/a&gt;, and even the concept of an &lt;a href=&quot;https://blog.getdbt.com/what-is-an-analytics-engineer/&quot;&gt;Analytics Engineer&lt;/a&gt;. I&apos;ll be honest, my team has a pretty good handle on dbt which means I&apos;m largely hands-off on this project. But, from our team meetings my engineering sense was telling me that we could be doing more in terms of enforcing things like testing and documentation. So I was excited to connect the dots back to the pre-commit project (maybe framework is a better word here?) in the latest dbt community newsletter.&lt;/p&gt;
&lt;p&gt;From the pre-commit-dbt GitHub page:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;List of pre-commit hooks to ensure the quality of your dbt projects.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You&apos;ve got my attention! It continues:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;dbt is awesome, but when a number of models, sources, and macros grow it starts to be challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or test. Besides, with the growing number of objects, dbt slows down, users stop running models/tests (because they want to deploy the feature quickly), and the demands on reviews increase.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Perfect. These are exactly the kinds of code and project quality issues I&apos;m trying to improve on my current team, and I think using some of these dbt pre-commit hooks could really help improve the overall quality of our data warehouse as a platform.&lt;/p&gt;
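&lt;p&gt;Hooking this into the same config file is just another entry in &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt;. I&apos;m writing this sketch from memory before actually trying the tool, so treat the repo revision and hook ids below as placeholders and check the project&apos;s README for the current list:&lt;/p&gt;

```yaml
repos:
  - repo: https://github.com/offbi/pre-commit-dbt
    rev: v1.0.0  # placeholder: pin to a current release tag
    hooks:
      - id: check-model-has-description  # hook ids from memory; verify in the README
      - id: check-model-has-tests
```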
&lt;p&gt;And thinking about our data warehouse as a platform leads me to the next interesting project on our list.&lt;/p&gt;
&lt;h2&gt;Dagster&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!GjRc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac218550-2688-40b1-a57d-f101c69633d8_408x408.png&quot;&gt;&lt;img src=&quot;/assets/substack/6-python-tools-id-like-to-try-in/ac218550-2688-40b1-a57d-f101c69633d8_408x408.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of the benefits of doing the &lt;a href=&quot;https://adventofcode.com/&quot;&gt;Advent of Code&lt;/a&gt; puzzles this last December is that it really helped me understand my personal software design preferences. Getting into a daily rhythm of completing a mini-project and then discussing them in a Discord channel with my friends allowed me to refine and articulate some programming ideas I had been taking for granted (link to my &lt;a href=&quot;https://github.com/acviana/advent-of-code-2020&quot;&gt;code repo&lt;/a&gt;). Upon reflection, I also realized that these were ideas that I was failing to communicate effectively while mentoring other team members.&lt;/p&gt;
&lt;p&gt;From my point of view, the Advent of Code problems are all essentially mini data pipelines with an input file and a single summary output. This had me advocating to my friends about the merits of highly-decoupled and functional programming paradigms assembled using composition and with strong test coverage (there&apos;s a draft of a newsletter just about this).&lt;/p&gt;
&lt;p&gt;You can almost directly scale those ideas of composing decoupled functions with no side-effects from toy python pipelines into full data architectures. At a certain scale this argument runs into pushback about the practical (not computational) scalability of microservices architectures. However, I think there&apos;s still a lot of value in the concept of highly independent chunks of code that do exactly one thing perfectly.&lt;/p&gt;
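&lt;p&gt;To make that concrete, here&apos;s a toy sketch of the pattern I have in mind: small, pure functions with no side effects, each independently testable, composed into a pipeline. The function names and data are made up for illustration:&lt;/p&gt;

```python
from functools import reduce

# Each step is a small, pure function: one input, one output, no side effects.
def parse(raw: str) -> list[int]:
    """Parse a newline-delimited input into integers."""
    return [int(line) for line in raw.strip().splitlines()]

def transform(values: list[int]) -> list[int]:
    """Keep only the even values."""
    return [v for v in values if v % 2 == 0]

def summarize(values: list[int]) -> int:
    """Reduce the pipeline output to a single summary number."""
    return sum(values)

def pipeline(raw: str) -> int:
    # Compose the steps by threading the output of one into the next.
    return reduce(lambda acc, step: step(acc), [parse, transform, summarize], raw)

print(pipeline("1\n2\n3\n4"))  # 2 + 4 = 6
```

&lt;p&gt;An orchestrator generalizes exactly this picture: the steps become nodes in a dependency graph and the framework handles execution and data passing between them.&lt;/p&gt;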
&lt;p&gt;I was thinking about these types of patterns when I remembered a recent issue of Tristan Handy&apos;s &lt;a href=&quot;http://roundup.fishtownanalytics.com/issues/orchestration-with-dagster-end-to-end-data-scientists-snowflake-s-s-1-experimentation-stitch-fix-dsr-233-270986&quot;&gt;Data Science Roundup&lt;/a&gt; (Tristan is the founder of &lt;a href=&quot;https://www.fishtownanalytics.com/&quot;&gt;Fishtown Analytics&lt;/a&gt; which maintains dbt core) talking about a new data orchestration tool called &lt;a href=&quot;https://dagster.io/&quot;&gt;Dagster&lt;/a&gt;. From their website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Dagster is a data orchestrator for machine learning, analytics, and ETL&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Continuing:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With Dagster&apos;s pluggable execution, the same pipeline can run in-process, against your local file system or on a distributed work queue, against your production data lake. You can set up Dagster&apos;s web interface in a minute on your laptop, or deploy it on-premise or in any cloud.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can read more from one of the project organizers on this &lt;a href=&quot;https://medium.com/dagster-io/dagster-the-data-orchestrator-5fe5cadb0dfb&quot;&gt;Medium post&lt;/a&gt; including some helpful sample code. I&apos;ve scanned their docs and there seems to be a lot of overlap with how I&apos;m already thinking about pipelines. I&apos;d have to read more to see how this compares to something like Airflow, Databricks, or Matillion, but I could see this being a very cool project. In fact, this is the project that has me the most excited of anything on my list. I see a lot of possibilities for composing various scripts and processes that are starting to pop up in our team&apos;s work.&lt;/p&gt;
&lt;p&gt;There&apos;s even typing:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Type hints in Python are something I&apos;m increasingly pulling into my work. In working on my Advent of Code puzzles I finally started using &lt;a href=&quot;https://mypy.readthedocs.io/en/stable/&quot;&gt;mypy&lt;/a&gt; and type hints and loved the structure and constraints they added to my code. Unfortunately, everything I&apos;ve read and tried has led me to believe the type hint integration between Pandas and mypy just isn&apos;t &lt;em&gt;quite&lt;/em&gt; there yet. Nonetheless, the next two projects on my list for 2021 are interesting in and of themselves, but also because of how they leverage type hints to autogenerate functionality and documentation.&lt;/p&gt;
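&lt;p&gt;If you haven&apos;t used type hints yet, this is the kind of small annotation mypy can check. The example function is my own, not from any of the tools above; running &lt;code&gt;mypy&lt;/code&gt; over a file like this catches type mismatches statically, before the code ever runs:&lt;/p&gt;

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # OK: 2.0

# A call like mean("oops") still runs far enough to raise at runtime,
# but mypy would flag it as a type error before you ever executed it.
```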
&lt;h2&gt;Typer&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!sYZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9194d015-369d-4362-af9b-3efbb9746d3d_591x237.svg&quot;&gt;&lt;img src=&quot;/assets/substack/6-python-tools-id-like-to-try-in/9194d015-369d-4362-af9b-3efbb9746d3d_591x237.svg&quot; alt=&quot;Typer&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&apos;ve been writing command line interfaces since the beginning of my career, when I was creating image analysis pipelines for the Hubble Space Telescope, and I&apos;ll always have a soft spot for a solid CLI framework. I started out like everyone else, manually parsing &lt;code&gt;sys.argv&lt;/code&gt;, then moved to the now-deprecated &lt;a href=&quot;https://docs.python.org/3/library/optparse.html#module-optparse&quot;&gt;optparse&lt;/a&gt;, then argparse, and finally moved on to &lt;a href=&quot;https://click.palletsprojects.com/en/7.x/&quot;&gt;Click&lt;/a&gt; a few years ago, which is the modern Python standard and a great tool.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://typer.tiangolo.com/&quot;&gt;Typer&lt;/a&gt; is a newish CLI framework and the self described &quot;little sibling&quot; of FastAPI, the next and final project on this list. From their website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Typer is a library for building CLI applications that users will &lt;strong&gt;love using&lt;/strong&gt;  and developers will &lt;strong&gt;love creating&lt;/strong&gt;. Based on Python 3.6+ type hints&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Check out this &quot;Hello world&quot; example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import typer


def main(name: str):
    typer.echo(f&quot;Hello {name}&quot;)


if __name__ == &quot;__main__&quot;:
    typer.run(main)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&apos;s it. That&apos;s enough to generate a CLI complete with unix style &lt;code&gt;--help&lt;/code&gt; flags and everything:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python main.py --help

Usage: main.py [OPTIONS] NAME

Arguments:
  NAME  [required]

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or customize the installation.
  --help                Show this message and exit.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Amazing! This reminds me of the &lt;a href=&quot;https://flask.palletsprojects.com/en/1.1.x/quickstart/&quot;&gt;Flask API hello world&lt;/a&gt;! Which brings us to our final project I&apos;ve got on my radar for 2021.&lt;/p&gt;
&lt;h2&gt;FastAPI&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!H4Hb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6083ea-8bb2-461c-bbba-7183b734f30b_1023x369.png&quot;&gt;&lt;img src=&quot;/assets/substack/6-python-tools-id-like-to-try-in/4e6083ea-8bb2-461c-bbba-7183b734f30b_1023x369.png&quot; alt=&quot;FastAPI&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I spent a lot of time early in my engineering career bouncing back and forth between Django and Flask. Both are great projects with their own merits. Django is the go-to batteries-included option but a lot of times my projects ended up needing something much more lightweight. Flask offered us that minimal code footprint (at first) but as our projects scaled I found we had to spend more and more time thinking about which design principles we wanted to impose on our app. Or, to be more honest, thinking about how to reconcile the various patterns we had half-implemented throughout our app.&lt;/p&gt;
&lt;p&gt;Some of this is inevitable as projects start to scale, but Flask&apos;s hands-off nature seems to make the issue especially acute from the get-go. Oftentimes, while I still felt like I didn&apos;t want to go full Django with its ORM and admin, I did still wish Flask was making more decisions for me.&lt;/p&gt;
&lt;p&gt;Since then I&apos;ve been increasingly interested in how opinionated projects can enforce a certain structure which can help guide your work, and this is one of the things that drew me to FastAPI. From their website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://fastapi.tiangolo.com/&quot;&gt;FastAPI&lt;/a&gt; is Typer&apos;s (older?) sibling. The idea is the same, using type hints allows you to more richly define boilerplate for your API. In the same way that Typer generates a command line help interface, FastAPI generates rich Swagger docs.&lt;/p&gt;
&lt;p&gt;As the name would suggest, FastAPI is also fast. Built on top of another project called Starlette, it&apos;s one of the &lt;a href=&quot;https://fastapi.tiangolo.com/features/#starlette-features&quot;&gt;fastest Python frameworks&lt;/a&gt;. I don&apos;t see myself building too many APIs in the near future but if I do I&apos;m certainly going to reach for this project.&lt;/p&gt;
&lt;h2&gt;Odds and Ends&lt;/h2&gt;
&lt;p&gt;Around the Internet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://longreads.com/2018/03/15/welcome-to-the-center-of-the-universe/&quot;&gt;Welcome to the Center of the Universe&lt;/a&gt;: This is a long read about the Deep Space Network, the network of satellite dishes that has given us 24/7 contact with all of our deep space missions for over 50 years. It&apos;s an incredible feat of engineering and commitment characteristic of our space program. It also had me reminiscing about my first career in astronomy, when I was doing overnight shifts of thermal vacuum testing on the &lt;a href=&quot;https://www.stsci.edu/hst/instrumentation/wfc3&quot;&gt;Hubble WFC3&lt;/a&gt; camera at Goddard Space Flight Center.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://wwnorton.com/books/Natures-Metropolis/&quot;&gt;Nature&apos;s Metropolis&lt;/a&gt;: I finally wrapped up this long but fantastic read about the economic history of Chicago and the greater midwest. I learned about the many historic economic connections between the places I&apos;ve lived, studied, visited, and vacationed, down to individual buildings I&apos;ve been in. It changed the way I think of my city and the region it&apos;s situated in.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://pitchfork.com/thepitch/go-ahead-in-the-rain-a-tribe-called-quest-book-review-hanif-abdurraqib/&quot;&gt;Go Ahead in the Rain: Notes to A Tribe Called Quest&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; I&apos;m looking forward to poet and writer Hanif Abdurraqib&apos;s part history and part personal memoir of ATCQ. I haven&apos;t done a deep dive on the group&apos;s history since a tumultuous 2016 when in a few short months I watched the &lt;a href=&quot;https://en.wikipedia.org/wiki/Beats,_Rhymes_%26_Life:_The_Travels_of_A_Tribe_Called_Quest&quot;&gt;Beats, Rhymes &amp;amp; Life&lt;/a&gt; documentary for the first time, only to learn of Phife Dawg&apos;s untimely death a few months later, closely followed by the release of their &lt;a href=&quot;https://en.wikipedia.org/wiki/We_Got_It_from_Here..._Thank_You_4_Your_Service&quot;&gt;last album&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Listening:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chick Corea: It&apos;s unfortunate that so recently after losing MF DOOM I have to say goodbye to another treasured musician: jazz pianist Chick Corea. I&apos;ve been fascinated by Chick&apos;s playing since I came across Return to Forever&apos;s &lt;a href=&quot;https://open.spotify.com/album/2mLtPMLV5nWE0rzjVvcEmt?si=ome_TqvgRfexGPflZR_fJA&quot;&gt;Romantic Warrior&lt;/a&gt; in high school. Here he is with his long-time collaborator Gary Burton in a 2016 Tiny Desk Concert:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And here he is in 1992 at the &lt;a href=&quot;https://www.youtube.com/watch?v=7-hoh_ZIKIc&quot;&gt;Tokyo Blue Note&lt;/a&gt; with his New Akoustic Band trio.&lt;/p&gt;
&lt;p&gt;Chick knew he was on his way out soon after a rare form of cancer was discovered, so I&apos;ll close out this newsletter with an excerpt from his goodbye message to his fans.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It is my hope that those who have an inkling to play, write, perform or otherwise, do so. If not for yourself then for the rest of us.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
</content:encoded></item><item><title>My Python Documentation Toolchain</title><link>https://acviana.com/posts/2021-02-07-my-python-documentation-toolchain/</link><guid isPermaLink="true">https://acviana.com/posts/2021-02-07-my-python-documentation-toolchain/</guid><description>I tried to write a short technical article and it took me 6 weeks</description><pubDate>Sun, 07 Feb 2021 19:03:47 GMT</pubDate><content:encoded>&lt;p&gt;Let&apos;s start with why we bothered to build a documentation &lt;em&gt;system&lt;/em&gt; in the first place instead of just taking ad-hoc notes.&lt;/p&gt;
&lt;h3&gt;Why The Fuss About Documentation?&lt;/h3&gt;
&lt;p&gt;I think most people who write software would agree that projects should have at least &lt;em&gt;some&lt;/em&gt; documentation and that &lt;em&gt;major&lt;/em&gt; projects, like Pandas, should have exhaustive documentation. But for an internal, medium-sized project with only a few users, isn&apos;t just a &lt;code&gt;README&lt;/code&gt; file, sane variable names, and a few comment strings more than enough? More to the point, real projects have deadlines. Wouldn&apos;t developer effort be better spent getting code out the door and not on comprehensive documentation?&lt;/p&gt;
&lt;p&gt;In my opinion, for code to be truly high-quality it needs to be maintainable. Solid design principles can make it easier to understand what your code is doing. Tests can ensure your code is behaving as you intended. But documentation helps you understand &lt;em&gt;why&lt;/em&gt; you are doing what you are doing. It adds context, and data work in particular is heavily contextual, relying on nuanced understandings of upstream and downstream data flows.&lt;/p&gt;
&lt;p&gt;While cutting corners on documentation is a pragmatic approach, and certainly something I&apos;ve done myself, I don&apos;t think it works well in the long run. In my experience it&apos;s just another form of technical debt. In the extreme but not uncommon case of a single project maintainer years of system knowledge can be lost when that individual leaves the organization.&lt;/p&gt;
&lt;p&gt;These are well-known arguments but I want to push this idea of the value of documentation beyond just creating maintainable code. In a modern data team it&apos;s common to have folks that are new to writing software in the form of a project and not just one-off scripts or notebooks. This is a transition I can&apos;t encourage enough but one that can be initially very confusing for the practitioner. Forcing these team members to slow down, think about what they&apos;re doing, and be explicit about their intentions and understanding in the documentation, is beneficial not only for code quality but also for their own professional development.&lt;/p&gt;
&lt;p&gt;Let me close this section by saying that I&apos;ve never met a technical team, mine included, that was satisfied with their documentation -- or their testing for that matter. But if you can agree as an organization that documentation is a critical part of delivering long-term value then you can at least strive to make sure your documentation is constantly improving. But to do that at scale you need a solid repeatable structure, and for that you need tools.&lt;/p&gt;
&lt;h3&gt;Why So Many Tools?&lt;/h3&gt;
&lt;p&gt;Even if you buy my argument about documentation being important, why do we need so many tools? Can&apos;t you just … keep the documentation up-to-date? My argument is that setting up a toolchain for your documentation gives your project structure and a direction, just like with a testing or a deployment system. As long as you have structure and direction you can iterate and evolve. Without it, you&apos;re lost and &quot;documentation&quot; will be scattered throughout your project, often completely outdated and irrelevant.&lt;/p&gt;
&lt;p&gt;I see more parallels here between documentation and testing. One of the main arguments in favor of testing is that, beyond the immediate value of tests as code validation, tests also encourage people to write code that is &lt;em&gt;testable by design.&lt;/em&gt; That is, code that is intended to be intelligible and verifiable, which is overall better code. I think the same thing happens with documentation. When documentation becomes part of the structure in your projects, people will be inherently concerned with communicating all the ideas surrounding their code and their work will start to become more documentable (and intelligible) by design.&lt;/p&gt;
&lt;p&gt;But that structure needs to start from the ground up.&lt;/p&gt;
&lt;h3&gt;Documentation with Python Docstrings&lt;/h3&gt;
&lt;p&gt;Python has the concept of a &lt;em&gt;docstring,&lt;/em&gt; which is a multiline string at the start of a module, function, class, or method. If you&apos;re not familiar it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def raise_to_power(base, exponent):
    &quot;&quot;&quot;This is a docstring&quot;&quot;&quot;
    # This is a comment
    return base ** exponent 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like their name says, docstrings are intended to hold documentation. This might seem like a trivial feature at first; after all, we can also put important information in the &lt;code&gt;README&lt;/code&gt; or comments. So why should we use docstrings?&lt;/p&gt;
&lt;p&gt;I&apos;ve found that you get the most out of tools when you lean into their opinionated design decisions. Python as a language has made an intentional decision that documentation belongs in docstrings and as a result tools and best practices have been built up around this assumption. Aligning to these conventions means you can take advantage of all these additional tools. For example, if you load the previous function into a Python REPL you&apos;ll find it already knows what to do with the docstring:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; help(raise_to_power)

Help on function raise_to_power in module __main__:

raise_to_power(base, exponent)
    This is a docstring
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can get the same response with the &lt;code&gt;?&lt;/code&gt; operator in IPython or a Jupyter notebook. This is just a hint at what we can do with information once it&apos;s collected into docstrings. But, to do more we need to standardize our docstrings so other tools can make use of them.&lt;/p&gt;
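&lt;p&gt;Because the docstring is just an attribute on the function object, you can also get at it programmatically, which is exactly what documentation tools do under the hood. Here&apos;s the same function again with the standard library&apos;s &lt;code&gt;inspect&lt;/code&gt; module:&lt;/p&gt;

```python
import inspect

def raise_to_power(base, exponent):
    """This is a docstring"""
    # This is a comment
    return base ** exponent

# The raw docstring lives on the __doc__ attribute ...
print(raise_to_power.__doc__)

# ... and inspect.getdoc() additionally normalizes indentation,
# which matters for multiline docstrings.
print(inspect.getdoc(raise_to_power))
```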
&lt;h3&gt;Docstring Guidelines with the Google Style Guide&lt;/h3&gt;
&lt;p&gt;There are a couple of different conventions for Python docstrings but I strongly prefer the conventions from the &lt;a href=&quot;https://google.github.io/styleguide/pyguide.html#381-docstrings&quot;&gt;Google Python Style Guide&lt;/a&gt;. I&apos;m a big fan of the Google syntax because it strikes a balance between human and machine readable; similar in my opinion to Markdown.&lt;/p&gt;
&lt;p&gt;We&apos;ll leverage the fact that this format is machine readable when we start building webpages for our documentation but first let&apos;s look at a sample. Here&apos;s the same function we just saw but now with a Google style docstring:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def raise_to_power(base, exponent):
    &quot;&quot;&quot;
    Raise a base to an exponent, i.e. base ** exponent.

    Args:
        base (int): The base value
        exponent (int): The exponent value

    Returns:
        int: The output of the calculation
    &quot;&quot;&quot;
    return base ** exponent
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I think most people will agree that this is readable but some people might question if simple functions should be documented this extensively. Have we really added anything to our project in this case? I would argue yes.&lt;/p&gt;
&lt;p&gt;First of all, never underestimate the extent to which you have made assumptions about the context and clarity of intent in your code. In the case of our example, I would want this docstring to explain &lt;em&gt;why&lt;/em&gt; we are wrapping functionality that already exists in the Python standard library into a function. There&apos;s a useful side effect of this: forcing myself to write out assumptions in docstrings has become a form of &lt;a href=&quot;https://en.wikipedia.org/wiki/Rubber_duck_debugging&quot;&gt;rubber duck debugging&lt;/a&gt; for me that has frequently caused me to change my mind about a particular implementation.&lt;/p&gt;
&lt;p&gt;Second, as a practical matter, I already have enough decisions to make in writing my code. Taking the decision of when and how to write docstrings off the table frees me to focus on the decisions that do matter. And as we&apos;re about to see, having a uniform documentation standard not only frees me to think about what matters, it allows us to leverage automation.&lt;/p&gt;
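&lt;p&gt;Before we get to that automation, one property of docstrings worth keeping in mind is that they travel with the function object itself; this is what IPython&apos;s &lt;code&gt;?&lt;/code&gt; operator and the built-in &lt;code&gt;help()&lt;/code&gt; read from. A quick sketch, reusing the function above:&lt;/p&gt;

```python
def raise_to_power(base, exponent):
    """
    Raise a base to an exponent, i.e. base ** exponent.

    Args:
        base (int): The base value
        exponent (int): The exponent value

    Returns:
        int: The output of the calculation
    """
    return base ** exponent


# The docstring is stored on the function object, so any tool
# (IPython's ?, help(), Sphinx autodoc) can read it at runtime.
print(raise_to_power.__doc__)
print(raise_to_power(2, 3))  # 8
```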
&lt;h3&gt;Automation with darglint, Flake8, and Merge Templates&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!Shp_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9bbd5ca-e951-4714-a94a-881c2925b525_2022x360.png&quot;&gt;&lt;img src=&quot;/assets/substack/my-python-documentation-toolchain/a9bbd5ca-e951-4714-a94a-881c2925b525_2022x360.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of the pain points of maintaining docstrings is that it&apos;s hard to keep them up-to-date. One of the reasons I like to keep documentation right in the code is that it makes it easier to remember to update both. But relying on memory is always an unreliable plan. Fortunately, we can lean on some automation to help.&lt;/p&gt;
&lt;p&gt;We use GitLab CI/CD in the cloud and Makefiles locally for all our code quality checks, including testing and linting (though I have my eye on migrating the latter to the Python &lt;a href=&quot;https://pre-commit.com/&quot;&gt;pre-commit&lt;/a&gt; tool). For linting, we first run auto-formatting with the &lt;a href=&quot;https://black.readthedocs.io/en/stable/&quot;&gt;Black code formatter&lt;/a&gt; and then do a pass with &lt;a href=&quot;https://flake8.pycqa.org/en/latest/&quot;&gt;Flake8&lt;/a&gt;. To make sure our docstrings are accurate we use a tool called &lt;a href=&quot;https://github.com/terrencepreilly/darglint&quot;&gt;darglint&lt;/a&gt;, which can run as a Flake8 extension. Darglint (&quot;document argument linter&quot;) checks that your function call signature and docstring match. If you change one without updating the other, Flake8 will fail.&lt;/p&gt;
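&lt;p&gt;To make this concrete, here&apos;s a minimal sketch of what the shared config might look like. This is illustrative rather than our actual setup: darglint reads its own &lt;code&gt;[darglint]&lt;/code&gt; section from &lt;code&gt;setup.cfg&lt;/code&gt; (or a standalone &lt;code&gt;.darglint&lt;/code&gt; file), and the option values here are just one reasonable choice:&lt;/p&gt;

```ini
# setup.cfg (illustrative values)
[flake8]
max-line-length = 88

[darglint]
# Match the Google docstring convention used in this post
docstring_style = google
# Require the most complete docstring checks
strictness = full
```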
&lt;p&gt;It&apos;s worth taking just a moment to talk about tool selection here. Darglint is a small project that &lt;em&gt;seems&lt;/em&gt; pretty stable. Nonetheless, it&apos;s much earlier in its development than anything else in our projects, and I&apos;ve been burnt before by teammates pulling trendy new tools into our code, only to find a few years later that we have a critical dependency on a seemingly abandoned piece of technology.&lt;/p&gt;
&lt;p&gt;A key consideration for me when evaluating an architecture or tool is making sure I have a plan, or at least the ability, to migrate off of it. In this case, if we had to rip out darglint tomorrow, we would lose the ability to lint our docstrings, which would be an annoyance at worst and would only require changing a few lines in some config files. I decided this was worth the risk to help maintain our docstrings.&lt;/p&gt;
&lt;p&gt;Another way we keep the docs up-to-date is to use &lt;a href=&quot;https://docs.gitlab.com/ee/user/project/description_templates.html#creating-merge-request-templates&quot;&gt;GitLab merge templates&lt;/a&gt;, which include the ability to create a blocking markdown checkbox for the developer to confirm they&apos;ve updated the docs. This is a helpful nudge, but it doesn&apos;t help if you forget what you changed or don&apos;t realize the scope of the impact of your changes. Ultimately, I don&apos;t think there is a complete solution to that problem. But I do think making high-impact docs that people want to use will help you find the inevitable documentation problems sooner.&lt;/p&gt;
&lt;p&gt;So let&apos;s make our docs high-impact by building some webpages from our docstrings.&lt;/p&gt;
&lt;h3&gt;Documentation Webpages with Sphinx&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!E311!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa06dd6a-2260-4c21-9b35-5c94c43b2c0e_1458x718.png&quot;&gt;&lt;img src=&quot;/assets/substack/my-python-documentation-toolchain/fa06dd6a-2260-4c21-9b35-5c94c43b2c0e_1458x718.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At this point in the process, if we&apos;ve written our docstrings with even a moderate amount of care, we now have an important body of knowledge about how our code works. And because we&apos;ve written them using strict standards, we can now export our documentation into webpages. To do this we use a popular Python library called &lt;a href=&quot;https://www.sphinx-doc.org/en/master/&quot;&gt;Sphinx&lt;/a&gt; which generates static HTML pages. Sphinx was developed to create the documentation for Python itself, so the formatting out-of-the-box looks pretty similar to the historical python.org docs.&lt;/p&gt;
&lt;p&gt;You can build a perfectly usable documentation page using just vanilla Sphinx; but with a few plug-ins we can make a much nicer site with much less work. First, we add the &lt;a href=&quot;https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc&quot;&gt;autodoc&lt;/a&gt; extension which will automatically find and collect all the docstrings in our package. Then we add the &lt;a href=&quot;https://www.sphinx-doc.org/en/master/usage/extensions/viewcode.html&quot;&gt;viewcode&lt;/a&gt; extension which will allow us to click through to the raw source code within the doc pages themselves. The &lt;a href=&quot;https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html&quot;&gt;napoleon&lt;/a&gt; extension allows Sphinx to parse our Google style docstrings as opposed to Sphinx&apos;s default reStructuredText format. Finally, and this is just personal preference, we can set our HTML theme to the &lt;a href=&quot;https://sphinx-rtd-theme.readthedocs.io/en/stable/&quot;&gt;ReadTheDocs theme&lt;/a&gt;.&lt;/p&gt;
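&lt;p&gt;Pulling that together, the relevant part of a &lt;code&gt;conf.py&lt;/code&gt; is just a few lines. This is a sketch with placeholder project metadata, not a complete config:&lt;/p&gt;

```python
# conf.py -- minimal sketch; the project name is a placeholder
project = "my-project"

extensions = [
    "sphinx.ext.autodoc",   # collect docstrings from the package
    "sphinx.ext.viewcode",  # link doc pages to highlighted source
    "sphinx.ext.napoleon",  # parse Google-style docstrings
]

# Personal preference: the ReadTheDocs theme (pip install sphinx-rtd-theme)
html_theme = "sphinx_rtd_theme"
```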
&lt;p&gt;Like I said, this is not intended to be a tutorial, so I&apos;ve skipped a few steps such as setting up your &lt;code&gt;conf.py&lt;/code&gt; file, your base &lt;code&gt;index.rst&lt;/code&gt; file, and making sure your package is installable and not just a script. Fortunately, there are plenty of resources on these steps. Setting this stuff up is honestly a pain, but you only have to do it once and after that it&apos;s pretty stable. And once you&apos;ve figured it out for one project it&apos;s easy to repeat for all your projects.&lt;/p&gt;
&lt;p&gt;So now that we have a nice looking document on our local machine, how do we publish it? After all, telling someone to RTFM just doesn&apos;t have the same bite to it when they have to go build the docs from source on their local machine.&lt;/p&gt;
&lt;h3&gt;Going Live with GitLab Pages&lt;/h3&gt;
&lt;p&gt;We use GitLab as our version control platform, which has a Pages feature that allows you to host static webpages from pipeline build artifacts (GitHub has similar functionality). The GitLab examples include a &lt;a href=&quot;https://gitlab.com/pages/sphinx&quot;&gt;Python + Sphinx example&lt;/a&gt; that you can pretty much cut and paste into your CI/CD, and you&apos;ve now got a live, automatically documented project.&lt;/p&gt;
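&lt;p&gt;For reference, the shape of the job is roughly the following. This is a sketch based on the pattern in the linked example; the image tag, install step, and docs paths are illustrative and will differ per project:&lt;/p&gt;

```yaml
# .gitlab-ci.yml (sketch; image and paths are illustrative)
pages:
  stage: deploy
  image: python:3.9
  script:
    - pip install sphinx sphinx-rtd-theme
    - sphinx-build -b html docs/ public
  artifacts:
    paths:
      - public  # GitLab Pages serves the contents of public/
```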
&lt;p&gt;Nice work!&lt;/p&gt;
&lt;h3&gt;Closing Thoughts&lt;/h3&gt;
&lt;p&gt;It&apos;s worth pointing out that something interesting happens at this point. When docs are published and automatically updated I&apos;ve noticed they become a lot more &lt;em&gt;real&lt;/em&gt; to teams. People are proud of them and &lt;em&gt;want&lt;/em&gt; to keep them up-to-date. Questions get directed to the docs, the docs get updated to be more helpful, and you start to get a feedback loop.&lt;/p&gt;
&lt;p&gt;There were a number of other directions I wanted to take this article as I was writing it, including building a documentation ecosystem by connecting the other self-documenting services in our cloud warehouse, such as Looker and dbt. Or how, following the example of the &lt;a href=&quot;https://fastapi.tiangolo.com/&quot;&gt;FastAPI&lt;/a&gt; project, data pipelines could use Python type hints to build even more automatic documentation. However, I eventually realized these topics were way too much for a single article, so I split them out into drafts for future articles. I&apos;m quickly (re)learning that in writing, just like in programming, scoping and structure are everything.&lt;/p&gt;
&lt;h3&gt;Odds and Ends&lt;/h3&gt;
&lt;p&gt;Around the Internet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://newsletter.limitsofinference.com/p/overdiagnosis-why-scientists-and&quot;&gt;Overdiagnosis&lt;/a&gt; (The Limits of Inference) - If you love data and philosophy you should check out my friend Clare&apos;s substack. Her last article in particular on false positives and what they mean for widespread COVID testing is an excellent reminder of the importance of core statistical principles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently Listening:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://open.spotify.com/album/36ARP3Wl5RQBoWEwdRU8In?si=10ZvMtIISUqOn3oQGx9o5Q&quot;&gt;Portrait of Wes&lt;/a&gt; (Wes Montgomery Trio): A friend of mine recently turned me on to under-appreciated Indiana jazz guitar player Wes Montgomery. Opting for Melvin Rhyne on organ instead of a bass player, his trios have an especially interesting sound. Check out the up-tempo cover of Art Blakey&apos;s &quot;Moanin&apos;&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently Reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Death%27s_End&quot;&gt;Death&apos;s End / Three-Body Problem Trilogy&lt;/a&gt;: Just finished up this series. If you like your science fiction to take on a truly cosmological scale, are pretty comfortable with plot devices that rely heavily on physics concepts like the speed of light or the strong atomic force, and can handle characters that act more like platonic ideas of humans than actual people - then this is the trilogy for you. It&apos;s also interesting to think about how the author&apos;s Chinese cultural perspective comes through in his writing. If that last bit seems interesting I&apos;d recommend &lt;a href=&quot;https://en.wikipedia.org/wiki/China_in_Ten_Words&quot;&gt;China in Ten Words&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://edgeeffects.net/natures-metropolis/&quot;&gt;Nature&apos;s Metropolis&lt;/a&gt;: Absolutely blown away by this economic and business history of Chicago - I see why it was a Pulitzer Prize finalist. For example the chapter on grain could not sound more boring and yet ends up being an amazing primer for all the current stock market upheaval around GameStop.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
</content:encoded></item><item><title>Welcome!</title><link>https://acviana.com/posts/2020-12-31-welcome/</link><guid isPermaLink="true">https://acviana.com/posts/2020-12-31-welcome/</guid><description>Well, the good news is I didn&apos;t start a podcast ...</description><pubDate>Thu, 31 Dec 2020 23:50:10 GMT</pubDate><content:encoded>&lt;p&gt;It seems that every 4 years I get the itch to write a &lt;a href=&quot;https://acviana.github.io/&quot;&gt;tech blog&lt;/a&gt;. Recently, I was getting ready to breathe some life back into my long dormant &lt;a href=&quot;https://rakhim.org/honestly-undefined/19/&quot;&gt;static web blog&lt;/a&gt; when my friend informed me it&apos;s 2021, blogs are out and &lt;em&gt;newsletters&lt;/em&gt; are back in -- time is a flat circle.&lt;/p&gt;
&lt;p&gt;Now it&apos;s the last few hours of 2020 and some of my friends have already begun subscribing to the empty URL where I have indicated there will one day be a newsletter. This seems like as good a reason as any to let go of my perfectionism and kick this thing off.&lt;/p&gt;
&lt;p&gt;So … here we go!&lt;/p&gt;
&lt;h3&gt;Who&apos;s Writing This Thing?&lt;/h3&gt;
&lt;p&gt;Whether you just wandered in from Twitter or if I&apos;ve known you for years, I thought it would be good to start things off with some background about myself to give you an idea of where I&apos;m coming from with this newsletter.&lt;/p&gt;
&lt;p&gt;My &lt;a href=&quot;https://sites.google.com/view/magviana&quot;&gt;parents&lt;/a&gt; are both biostatisticians so I grew up around discussions about how statistics isn&apos;t magic. I went to UW-Madison where I majored in Astronomy and Math. I spent the first 7 years of my career working for the Hubble Space Telescope at STScI cutting my teeth as a self-taught software developer and writing data analysis pipelines. Following that I spent 4.5 years at a dark web information security startup. I was their first hire, starting as an engineer, then VP of Engineering, and finally VP of Technical Products. Since 2019 I&apos;ve been the Director of Data at HealthJoy, a Chicago-based healthcare startup, where I lead our Data Team. In my current role my projects span traditional analytics, data engineering, and data science as well as management across those fields (my opinions here are not the opinions of my employer etc.).&lt;/p&gt;
&lt;p&gt;My career has always spanned data and software. I was always the engineer who cared the most about benchmarks and monitoring, or the analyst who cared the most about software and automation. Throughout my career I&apos;ve been deeply interested in the craft of writing quality software and producing data analysis that&apos;s fundamentally sound. I tend to ask why we&apos;re doing things, which for better or worse has ended up with me leading new projects or teams. Along the way I&apos;ve learned some things about how to build and work in changing organizations.&lt;/p&gt;
&lt;h3&gt;What&apos;s the Plan?&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://substackcdn.com/image/fetch/$s_!24Pr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F042d113d-cc1d-4543-a875-6b311ece49ab_500x300.jpeg&quot;&gt;&lt;img src=&quot;/assets/substack/welcome/042d113d-cc1d-4543-a875-6b311ece49ab_500x300.jpeg&quot; alt=&quot;JK Rowling doesn&apos;t exist: conspiracy theories the internet can&apos;t resist |  Internet | The Guardian&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I want to write about a mix of software, data, technology, math and science, and how to lead teams that work in those areas. This isn&apos;t strictly a software newsletter but there will be some posts -- wait no, &lt;em&gt;issues&lt;/em&gt; -- that focus pretty heavily on software development. But within that the emphasis will be less on tips and tricks and instead focus on the theme of building stable, maintainable software. Overall though, there should be plenty of non-software topics.&lt;/p&gt;
&lt;p&gt;So what exactly does that look like? Here are some drafts I have on the back burner:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What I learned doing &lt;a href=&quot;https://adventofcode.com/&quot;&gt;Advent of Code 2020&lt;/a&gt; with 5 of my friends&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How I try to create reusable software project structures myself and the teams I lead&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How to structure data team interactions with stakeholders&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A long read about chess, deep learning, and sports&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some grumpy thoughts about deep learning in industry&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Side projects to measure my home&apos;s internet connection speed with a RaspberryPi, generate &lt;em&gt;perfect&lt;/em&gt; picture hanging arrangements, and analyze my chess performance&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;General odds and ends about tech, data, and some bits of culture to round things out.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But, to quote the title of another substack I read &lt;a href=&quot;https://ftrain.substack.com/&quot;&gt;I am absolutely going to bail on this in a month&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Odds and Ends&lt;/h3&gt;
&lt;p&gt;Around the Internet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I&apos;ve started using &lt;a href=&quot;https://github.com/kmikiy/SpotMenu/tree/master&quot;&gt;SpotMenu&lt;/a&gt; for Spotify on OSX to display current track information and as a mini-controller in my menu bar. It&apos;s such an obvious tool, I&apos;m amazed I&apos;ve gone so long without it. It makes me a little bit nostalgic for WinAmp.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After getting tired of &lt;a href=&quot;https://ethanschoonover.com/solarized/&quot;&gt;Solarized Dark&lt;/a&gt; as my terminal and editor color scheme I&apos;ve switched over to &lt;a href=&quot;https://draculatheme.com/&quot;&gt;Dracula&lt;/a&gt; (created by a fellow Brazilian!).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently Reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://edgeeffects.net/natures-metropolis/&quot;&gt;Nature&apos;s Metropolis&lt;/a&gt;: Deepening my love affair with Chicago after moving back here in 2017. This was the most recommended book in a &lt;a href=&quot;https://mcmansionhell.com/&quot;&gt;McMansion Hell&lt;/a&gt; Twitter thread asking for Chicago book recs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Death%27s_End&quot;&gt;Death&apos;s End&lt;/a&gt;: Wrapping up the Three Body Problem trilogy. My love/hate relationship with this series has tilted towards love as the themes have taken on a cosmological scale.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://avehtari.github.io/ROS-Examples/&quot;&gt;Regression And Other Stories&lt;/a&gt;: I&apos;ve been looking for a text that balances theory and practice like this for a while. I&apos;m sure I&apos;ll be writing more about it in the future.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently Listening:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://open.spotify.com/album/3hmzJ6czCNKNiSdRvFGToy?si=yFwe1ncGQyqaht6xPTxbjA&quot;&gt;Getting into Knives&lt;/a&gt; - The Mountain Goats: Trying to catch up on 2020 releases I missed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://open.spotify.com/album/4UG3kz6qoHtNI1glQ2wdon?si=LnxJZg0YQzuEimiUOjLDuA&quot;&gt;Operation: Doomsday&lt;/a&gt; - MF DOOM: While writing this I was deeply saddened to learn that one of my favorite MCs, &lt;a href=&quot;https://en.wikipedia.org/wiki/MF_Doom&quot;&gt;MF DOOM&lt;/a&gt;, has passed away.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://genius.com/Mf-doom-doomsday-lyrics#note-1837315&quot;&gt;On Doomsday!&lt;/a&gt;, &lt;a href=&quot;https://genius.com/Mf-doom-doomsday-lyrics#note-42199&quot;&gt;ever since the womb &apos;til I&apos;m back where my brother went&lt;/a&gt;, &lt;a href=&quot;https://genius.com/Mf-doom-doomsday-lyrics#note-164076&quot;&gt;that&apos;s what my tomb will say&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://genius.com/Mf-doom-doomsday-lyrics#note-42200&quot;&gt;Right above my government; Dumile&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://genius.com/Mf-doom-doomsday-lyrics#note-164078&quot;&gt;Either unmarked or engraved, hey, who&apos;s to say?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Rest in Power DOOM&lt;/p&gt;
&lt;hr /&gt;
</content:encoded></item><item><title>An Intro to Regex</title><link>https://acviana.com/posts/2017-05-10-intro-to-regex/</link><guid isPermaLink="true">https://acviana.com/posts/2017-05-10-intro-to-regex/</guid><description>A presentation on using Regex for analysts</description><pubDate>Wed, 10 May 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://dl.dropboxusercontent.com/s/7yhsckbmoairs7o/mother-and-daughter-talk-e1447834530360.jpg?dl=0&quot; alt=&quot;A mother and daughter talking&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;How to Talk to your Analysts about Regex&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Alex Viana - 05/05/17&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The following is a presentation I gave to some non-technical colleagues at work to help them get started on reading and writing regular expressions. Sharing in case others find it useful as well.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Regular expressions (regexes) are a way of defining search patterns that can be applied to text. Throughout this document regexes will be written in Python syntax, quoted and preceded by the letter &lt;code&gt;r&lt;/code&gt;, like this: &lt;code&gt;r&quot;a regex pattern&quot;&lt;/code&gt;, to differentiate them from raw text, like this: &lt;code&gt;&quot;some raw text&quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A regex can be as simple as a literal match like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r&quot;match this string exactly&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which would only match against the literal text &lt;code&gt;&quot;match this string exactly&quot;&lt;/code&gt;. By assigning special properties to certain characters, regex can also search for text that matches complicated patterns like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r&quot;10.(6[4-9]|[7-9]\d|1\d\d|2[0-4]\d|25[0-5]).(1?\d\d?|2[0-4]\d|25[0-5]).(1?\d\d?|2[0-4]\d|25[0-5])&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This &lt;a href=&quot;https://stackoverflow.com/questions/39884618/regex-to-find-a-range-in-an-ip-address&quot;&gt;regex&lt;/a&gt; would match against an IP address range that begins with &lt;code&gt;10.&lt;/code&gt;, has a second octet in the 64-255 range, and accepts any values for the third and fourth octets.&lt;/p&gt;
&lt;p&gt;Complicated regex patterns like this can be pretty inscrutable even when they are working correctly and hence the old joke:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You have a problem. You try to solve it with Regular Expressions. Now you have two problems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Still, regexes are very useful and you can get a lot of mileage out of just a few parts of the regex syntax.&lt;/p&gt;
&lt;h2&gt;Testing Regexes&lt;/h2&gt;
&lt;p&gt;I highly recommend using the &lt;a href=&quot;https://regex101.com/&quot;&gt;Regex101 website&lt;/a&gt; to test regexes, both to understand existing expressions and to test expressions you are writing. Try it out to play with the example searches in this tutorial.&lt;/p&gt;
&lt;h2&gt;Special Characters&lt;/h2&gt;
&lt;p&gt;As we saw, matching string literals, i.e. patterns that have only one possible match, is trivial:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;this&quot;
Matches: &quot;this&quot;
Misses: &quot;that&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The regex syntax is largely built around assigning special properties to punctuation marks. As a result, if you ever want to literally search for any of these characters you will have to &lt;em&gt;escape&lt;/em&gt; those characters with a backslash (&lt;code&gt;\&lt;/code&gt;). So if you wanted to search for a backslash you would have to escape the backslash like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;\\&quot;
Matches: &quot;\&quot;
Misses: &quot;Something else&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pretty meta, right? Similarly, because the period is also a special character (as we&apos;ll see in the next section), if you wanted to search for the literal string &lt;code&gt;&quot;google.com&quot;&lt;/code&gt; you would escape the period:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;google\.com&quot;
Matches: &quot;google.com&quot;
Misses: &quot;google4com&quot;
&lt;/code&gt;&lt;/pre&gt;
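&lt;p&gt;If you&apos;re building patterns from Python, you don&apos;t have to do this escaping by hand: the standard library&apos;s &lt;code&gt;re.escape&lt;/code&gt; function escapes every special character in a literal string for you. A quick sketch:&lt;/p&gt;

```python
import re

# re.escape backslash-escapes regex special characters in a literal string
pattern = re.escape("google.com")

# The escaped pattern matches only the literal text, not "googleXcom"
assert re.fullmatch(pattern, "google.com")
assert re.fullmatch(pattern, "googleXcom") is None
print(pattern)
```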
&lt;h2&gt;Replacing a Single Character&lt;/h2&gt;
&lt;p&gt;String literals without special characters aren&apos;t very useful though; the power of regex comes from matching patterns. One of the simplest regex patterns is the period (&lt;code&gt;.&lt;/code&gt;), which means &quot;match any single character in this position&quot;. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;1.3&quot;
Matches: &quot;123&quot;, &quot;1.3&quot;
Misses: &quot;13&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the period is a special character, as we saw in the last section, you have to escape it if you literally want to search for &lt;code&gt;&quot;1.3&quot;&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;1\.3&quot;
Matches: &quot;1.3&quot;
Misses: &quot;123&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can constrain the possible characters we match on by using square brackets (&lt;code&gt;[]&lt;/code&gt;). For example if we only wanted to match on the numbers 1, 2, or 3:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;1[123]3&quot;
Matches: &quot;113&quot;, &quot;123&quot;, &quot;133&quot;
Misses: &quot;13&quot;, &quot;143&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A common pattern in regex is matching against any letter. This can be done by specifying a range like this: &lt;code&gt;[1-3]&lt;/code&gt;. So any lowercase letter would be &lt;code&gt;[a-z]&lt;/code&gt;, uppercase would be &lt;code&gt;[A-Z]&lt;/code&gt;, both would be &lt;code&gt;[a-zA-Z]&lt;/code&gt;, and adding digits would be &lt;code&gt;[a-zA-Z0-9]&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;[a-zA-Z0-9]&quot;
Matches: Any uppercase letter, lowercase letter, or number
Misses: &quot;!!!&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Replacing Multiple Characters&lt;/h2&gt;
&lt;p&gt;On their own, square brackets match a single character exactly once. We can also match a character or set of characters repeatedly by using curly braces (&lt;code&gt;{}&lt;/code&gt;), where the number inside the braces indicates the number of times, or range of times, to match.&lt;/p&gt;
&lt;p&gt;For starters, we can rewrite our last regex, which matches only once, as &lt;code&gt;r&quot;1[123]{1}3&quot;&lt;/code&gt; and produce the same search results (try it!).&lt;/p&gt;
&lt;p&gt;Repeating the set match twice then looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;1[123]{2}3&quot;
Matches: &quot;1113&quot;, &quot;1123&quot;, &quot;1333&quot;
Misses: &quot;1243&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also repeat a variable number of times. Repeating a set between x and y times is written &lt;code&gt;{x,y}&lt;/code&gt;. For example, this regular expression will match from the set &lt;code&gt;[123]&lt;/code&gt; either once or twice (but not zero times):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;1[123]{1,2}3&quot;
Matches: &quot;113&quot;, &quot;123&quot;, &quot;133&quot;, &quot;1113&quot;, &quot;1123&quot;, &quot;1333&quot;
Misses: &quot;13&quot;, &quot;143&quot;, &quot;1243&quot;
&lt;/code&gt;&lt;/pre&gt;
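&lt;p&gt;These quantifier examples can be checked directly with Python&apos;s &lt;code&gt;re&lt;/code&gt; module; &lt;code&gt;fullmatch&lt;/code&gt; requires the whole string to match the pattern:&lt;/p&gt;

```python
import re

# The pattern from the example above: a 1, then one or two
# characters from the set {1, 2, 3}, then a 3.
pattern = re.compile(r"1[123]{1,2}3")

matches = ["113", "123", "133", "1113", "1123", "1333"]
misses = ["13", "143", "1243"]

for text in matches:
    assert pattern.fullmatch(text) is not None

for text in misses:
    assert pattern.fullmatch(text) is None

print("all quantifier examples check out")
```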
&lt;h2&gt;Word Boundaries&lt;/h2&gt;
&lt;p&gt;Another useful pattern is defining the boundaries of what you want to match against. You can do this with any string literal, like we did in the section on replacing a single character, where the search terms were all bounded by a &lt;code&gt;1&lt;/code&gt; and a &lt;code&gt;3&lt;/code&gt;. You could bound a match with whitespace using a literal space character. More generally, you can use &lt;code&gt;\b&lt;/code&gt; to match any word break. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;\bthis\b&quot;
Matches: &quot;that this that&quot;
Misses: &quot;thatthisthat&quot;
&lt;/code&gt;&lt;/pre&gt;
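&lt;p&gt;The same example can be checked with Python&apos;s &lt;code&gt;re&lt;/code&gt; module, where &lt;code&gt;search&lt;/code&gt; looks for the pattern anywhere in the string:&lt;/p&gt;

```python
import re

# \b matches a word boundary: a transition between a word character
# (letter, digit, underscore) and a non-word character or string edge
pattern = re.compile(r"\bthis\b")

assert pattern.search("that this that") is not None
assert pattern.search("thatthisthat") is None
print("word boundary examples check out")
```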
&lt;h2&gt;Logical Operators&lt;/h2&gt;
&lt;p&gt;The vertical bar (&lt;code&gt;|&lt;/code&gt;) is a logical or operator. Note that regex matching is case-sensitive by default, so to catch capitalized names we can combine the alternation with the square brackets we saw earlier. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Regex: r&quot;[Tt]yler|[Pp]erry&quot;
Matches: &quot;Katie Perry&quot;, &quot;Steve Tyler&quot;, &quot;Tyler Perry&quot;, &quot;Perry Como&quot;
Misses: &quot;something else&quot;
&lt;/code&gt;&lt;/pre&gt;
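&lt;p&gt;One practical note when trying this in Python: matching is case-sensitive by default, so a lowercase pattern like &lt;code&gt;r&quot;tyler|perry&quot;&lt;/code&gt; needs the case-insensitive flag to catch capitalized names. A sketch:&lt;/p&gt;

```python
import re

# re.IGNORECASE makes the alternation match regardless of case;
# an alternative is spelling out classes like [Tt]yler|[Pp]erry
pattern = re.compile(r"tyler|perry", re.IGNORECASE)

assert pattern.search("Katie Perry") is not None
assert pattern.search("Steve Tyler") is not None
assert pattern.search("Tyler Perry") is not None
assert pattern.search("something else") is None
print("alternation examples check out")
```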
&lt;h2&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;There&apos;s a lot more you can do with regexes and a lot more syntax to learn, but hopefully this was a useful starting point.&lt;/p&gt;
&lt;p&gt;If you &lt;em&gt;really&lt;/em&gt; want to dive into the deep end, check out one of my favorite regex problems: trying to write a regex that &lt;a href=&quot;https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url&quot;&gt;matches any valid URL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Congratulations, you now have two problems!&lt;/p&gt;
</content:encoded></item><item><title>What The F*ck Is The Internet Vol. 1 -  Bits, Encoding, Packets, and Protocols</title><link>https://acviana.com/posts/2017-03-17-what-the-fck-is-the-internet-vol-1/</link><guid isPermaLink="true">https://acviana.com/posts/2017-03-17-what-the-fck-is-the-internet-vol-1/</guid><description>A short presentation about how data is encoded over networks</description><pubDate>Fri, 17 Mar 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I organize a weekly series of tech talks at work. It&apos;s a good way to stay up on what technologies people are using in and out of the office. Recently, I&apos;ve started presenting at these talks on various computer science fundamentals I was never exposed to in college, using the presentation as a driver to read up on these topics.&lt;/p&gt;
&lt;p&gt;Last winter I read a &lt;a href=&quot;https://www.amazon.com/Cryptography-Short-Introduction-Fred-Piper/dp/0192803158&quot;&gt;short book on cryptography&lt;/a&gt; and gave a presentation on various &lt;a href=&quot;https://en.wikipedia.org/wiki/History_of_cryptography&quot;&gt;historical cyphers&lt;/a&gt; as a way to provide a context for modern cryptographic techniques (there were no slides). The theme was basically, &quot;we tried this and it didn&apos;t work for this reason so now we do this instead&quot;.&lt;/p&gt;
&lt;p&gt;I followed that up a few weeks ago with a presentation on basic networking. And when I say basic, I really mean from the ground up, starting with binary encoding and working up only as far as protocols. Because we sometimes have non-technical members of our team sit in on these talks I made an effort to really walk through all the fundamentals step by step. Here are the slides:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.slideshare.net/alexcostaviana/what-the-fck-is-the-internet-vol-1&quot;&gt;What the f*ck is the internet? - vol. 1 (slides on SlideShare)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hopefully, I&apos;ll be able to follow this up later in the spring with another presentation digging into how network messages are routed using IP addresses and sockets.&lt;/p&gt;
&lt;p&gt;Meanwhile, as my experience of being a self-taught software engineer becomes increasingly common it&apos;s good to see new resources springing up as more developers try to fill in our knowledge gaps. I particularly liked this site which popped up on Hacker News this week: &lt;a href=&quot;https://teachyourselfcs.com/&quot;&gt;https://teachyourselfcs.com/&lt;/a&gt;&lt;/p&gt;
</content:encoded></item><item><title>Special Methods in Python</title><link>https://acviana.com/posts/2017-03-15-python_special_methods/</link><guid isPermaLink="true">https://acviana.com/posts/2017-03-15-python_special_methods/</guid><description>Exploring Python&apos;s built-in methods</description><pubDate>Wed, 15 Mar 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;This post is intended to accompany a presentation I gave to the &lt;a href=&quot;https://www.meetup.com/baltimore-python/events/238058777/&quot;&gt;Baltimore Python Meetup&lt;/a&gt; on Wednesday, March 15th 2017.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This post assumes some familiarity with object-oriented programming as it relates to classes and inheritance in Python.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The purpose of this post is to explore Python&apos;s &lt;a href=&quot;https://docs.python.org/3/reference/datamodel.html#special-method-names&quot;&gt;special methods&lt;/a&gt;, also called &quot;magic&quot;, &quot;built-in&quot;, &quot;double underscore&quot;, or &quot;dunder&quot; methods. A common example is the &lt;code&gt;__init__&lt;/code&gt; method. These methods are considered special because they are referenced by Python to determine class behavior. Note that the double underscore notation is just a naming convention for indicating which methods Python considers special. Adding a double underscore to any other method (e.g. &lt;code&gt;__foo__&lt;/code&gt;) doesn&apos;t give it any special properties.&lt;/p&gt;
&lt;h2&gt;Special Methods&lt;/h2&gt;
&lt;p&gt;To explore how these methods work we&apos;re going to pretend we&apos;re building a Scrabble-type game. A natural object for this kind of project is one that encapsulates a word and its score. Let&apos;s start with a super basic class:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: class ScrabbleWord():
  ...:     pass
  ...:
In [2]: my_word = ScrabbleWord()
In [3]: my_word.word = &apos;cat&apos;
In [4]: my_word.score = 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We start with an &quot;empty&quot; class and manually assign it two attributes: a word, and a score which I calculated by consulting a table of Scrabble letter values. Using Python&apos;s &lt;code&gt;dir()&lt;/code&gt; function we can inspect the attributes and methods on the &lt;code&gt;my_word&lt;/code&gt; object.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: dir(my_word)
Out[1]: [&apos;__class__&apos;, &apos;__delattr__&apos;, &apos;__dict__&apos;, &apos;__doc__&apos;,
  &apos;__format__&apos;, &apos;__getattribute__&apos;, &apos;__hash__&apos;, &apos;__init__&apos;,
  &apos;__module__&apos;, &apos;__new__&apos;, &apos;__reduce__&apos;, &apos;__reduce_ex__&apos;, &apos;__repr__&apos;,
  &apos;__setattr__&apos;, &apos;__sizeof__&apos;, &apos;__str__&apos;, &apos;__subclasshook__&apos;,
  &apos;__weakref__&apos;, &apos;score&apos;, &apos;word&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At the very end, we can see the &lt;code&gt;score&lt;/code&gt; and &lt;code&gt;word&lt;/code&gt; attributes I added, as well as about a dozen other methods and attributes that all Python classes have. These other attributes, which all happen to be special attributes and methods, came from the Python base class. They are the minimum set of methods Python needs to be able to work with an object, so every class inherits them. Let&apos;s start looking at what we can do with these special methods.&lt;/p&gt;
&lt;h2&gt;The __init__ Method&lt;/h2&gt;
&lt;p&gt;One of the first things we can change about our &lt;code&gt;ScrabbleWord&lt;/code&gt; class is that we have to manually add the word and the score. Since we will always want to do this for every class instance we should just make it part of the object creation.&lt;/p&gt;
&lt;p&gt;To define the way an object is created (also referred to as &quot;instantiated&quot;) we can write our own &lt;code&gt;__init__&lt;/code&gt; method. If we look at the output of the &lt;code&gt;dir()&lt;/code&gt; function, though, we can see that &lt;code&gt;ScrabbleWord&lt;/code&gt; already has an &lt;code&gt;__init__&lt;/code&gt; method. Replacing an existing method inherited from the parent class like this is called overriding. In this case the parent, implicit in our class definition, is the Python base class for all objects.&lt;/p&gt;
&lt;p&gt;As it turns out, all classes have an &lt;code&gt;__init__&lt;/code&gt; method. This is because Python always calls the &lt;code&gt;__init__&lt;/code&gt; method to initialize a new class instance. This is an example of what we mean when we say &lt;code&gt;__init__&lt;/code&gt; is a &quot;special&quot; method; these methods are referenced by Python to determine class behavior. Our replacement looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class ScrabbleWords(object):

    letter_values = {
        &apos;a&apos;: 1, &apos;b&apos;: 3, &apos;c&apos;: 3, &apos;d&apos;: 2, &apos;e&apos;: 1, &apos;f&apos;: 4, &apos;g&apos;: 2, &apos;h&apos;: 4,
        &apos;i&apos;: 1, &apos;j&apos;: 8, &apos;k&apos;: 5, &apos;l&apos;: 1, &apos;m&apos;: 3, &apos;n&apos;: 1, &apos;o&apos;: 1, &apos;p&apos;: 3,
        &apos;q&apos;: 10, &apos;r&apos;: 1, &apos;s&apos;: 1, &apos;t&apos;: 1, &apos;u&apos;: 1, &apos;v&apos;: 4,
        &apos;w&apos;: 4, &apos;x&apos;: 8, &apos;y&apos;: 4, &apos;z&apos;: 10
    }

    def __init__(self, word):
        self.word = word
        self.score = sum(
            [ScrabbleWords.letter_values[letter] for letter in self.word]
        )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;__init__&lt;/code&gt; method now takes an argument called &lt;code&gt;word&lt;/code&gt; which it assigns as an attribute to the class. I added a class attribute called &lt;code&gt;letter_values&lt;/code&gt; that is accessed as an attribute on the class,  i.e. &lt;code&gt;ScrabbleWords.letter_values&lt;/code&gt;. This attribute is a dictionary that maps letters to their Scrabble score. &lt;code&gt;__init__&lt;/code&gt; uses this dictionary in a list comprehension to turn the input word into a list of letter scores, sum those up, and then assign the sum to the &lt;code&gt;score&lt;/code&gt; attribute. Calling the class now looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: my_word = ScrabbleWords(&apos;cat&apos;)

In [2]: my_word.word
Out[2]: &apos;cat&apos;

In [3]: my_word.score
Out[3]: 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not bad, we now provide the word at the same time as we create the object, and both the word and the calculated score are saved on the object. Let&apos;s see what else we can do.&lt;/p&gt;
&lt;h2&gt;The __repr__ and __str__ Methods&lt;/h2&gt;
&lt;p&gt;Let&apos;s take a look at what happens when we try to inspect or print our class instance [^1].&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: my_word
Out[1]: &amp;lt;__main__.ScrabbleWord instance at 0x1084c80e0&amp;gt;

In [2]: print(my_word)
&amp;lt;__main__.ScrabbleWord instance at 0x1084c80e0&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This tells us that the &lt;code&gt;my_word&lt;/code&gt; object is an instance of the &lt;code&gt;ScrabbleWord&lt;/code&gt; class from the &lt;code&gt;__main__&lt;/code&gt; namespace and is located at memory address &lt;code&gt;0x1084c80e0&lt;/code&gt;. While this is all correct, it&apos;s the default representation of any Python object, and that information is not as useful as it could be for building our Scrabble game.&lt;/p&gt;
&lt;p&gt;We can make the results more descriptive by overloading the &lt;code&gt;__repr__&lt;/code&gt; and &lt;code&gt;__str__&lt;/code&gt; methods to fit our use case. That could look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    def __repr__(self):
        return &apos;{}: {} points&apos;.format(self.word, self.score)

    def __str__(self):
        return &apos;{}: {} points&apos;.format(self.word, self.score)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now interacting with our object will look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: my_word
Out[1]: cat: 5 points

In [2]: print(my_word)
cat: 5 points
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can read more about the difference between the &lt;code&gt;__repr__&lt;/code&gt; and &lt;code&gt;__str__&lt;/code&gt; methods in this &lt;a href=&quot;https://stackoverflow.com/questions/1436703/difference-between-str-and-repr-in-python&quot;&gt;Stack Overflow Question&lt;/a&gt;.&lt;/p&gt;
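&lt;p&gt;As an aside: if the two bodies are identical, you may only need &lt;code&gt;__repr__&lt;/code&gt;. When a class doesn&apos;t define &lt;code&gt;__str__&lt;/code&gt;, Python falls back to &lt;code&gt;__repr__&lt;/code&gt; for &lt;code&gt;str()&lt;/code&gt; and &lt;code&gt;print()&lt;/code&gt;. A minimal sketch, using a simplified stand-in for our class (the score is passed in directly just to keep the example short):&lt;/p&gt;

```python
# Simplified stand-in class: score is passed in directly rather than
# calculated from a letter-value table.
class Word:
    def __init__(self, word, score):
        self.word = word
        self.score = score

    # Only __repr__ is defined; str() and print() fall back to it.
    def __repr__(self):
        return '{}: {} points'.format(self.word, self.score)

w = Word('cat', 5)
print(repr(w))  # cat: 5 points
print(str(w))   # cat: 5 points -- the __repr__ fallback
```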
&lt;h2&gt;The __eq__ and Other Operator Methods&lt;/h2&gt;
&lt;p&gt;At some point we&apos;re probably going to want some type of score optimization algorithm for our game, maybe for a simple computer opponent. To do that it would be helpful to have a way of comparing the value of two &lt;code&gt;ScrabbleWord&lt;/code&gt; objects. Right now we can do that manually by explicitly comparing the &lt;code&gt;score&lt;/code&gt; attribute.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: cat_word  = ScrabbleWords(&apos;cat&apos;)

In [2]: dog_word = ScrabbleWords(&apos;dog&apos;)

In [3]: cat_word
Out[3]: cat: 5 points

In [4]: dog_word
Out[4]: dog: 5 points

In [5]: cat_word.score == dog_word.score
Out[5]: True

In [6]: cat_word.score &amp;gt; dog_word.score
Out[6]: False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But if we compare the objects themselves Python doesn&apos;t know we mean to compare the scores and we get unexpected behavior. (These results are from Python 2, which falls back to an arbitrary but consistent ordering for objects that don&apos;t define comparisons; Python 3 raises a &lt;code&gt;TypeError&lt;/code&gt; for the ordering operators instead.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [5]: cat_word == dog_word
Out[5]: False

In [8]: cat_word &amp;gt; dog_word
Out[8]: True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It turns out we can define the behavior of the comparison operators &lt;code&gt;==&lt;/code&gt;, &lt;code&gt;!=&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;, and &lt;code&gt;&amp;gt;=&lt;/code&gt; using the &lt;code&gt;__eq__&lt;/code&gt;, &lt;code&gt;__ne__&lt;/code&gt;, &lt;code&gt;__lt__&lt;/code&gt;, &lt;code&gt;__gt__&lt;/code&gt;, &lt;code&gt;__le__&lt;/code&gt;, and &lt;code&gt;__ge__&lt;/code&gt; methods, respectively. If we want our comparisons to implicitly use the &lt;code&gt;score&lt;/code&gt; attribute, so we don&apos;t have to write it out every time, our definitions could look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def __eq__(self, y):
    return self.score == y.score

def __ne__(self, y):
    return self.score != y.score

def __lt__(self, y):
    return self.score &amp;lt; y.score

def __gt__(self, y):
    return self.score &amp;gt; y.score

def __le__(self, y):
    return self.score &amp;lt;= y.score

def __ge__(self, y):
    return self.score &amp;gt;= y.score
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now using our object would go like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: cat_word == dog_word
Out[1]: True

In [2]: cat_word != dog_word
Out[2]: False

In [3]: cat_word &amp;lt; dog_word
Out[3]: False

In [4]: cat_word &amp;gt; dog_word
Out[4]: False

In [5]: cat_word &amp;lt;= dog_word
Out[5]: True

In [6]: cat_word &amp;gt;= dog_word
Out[6]: True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s worth pointing out what&apos;s happening here. Take the &lt;code&gt;__eq__&lt;/code&gt; method as an example: when Python sees the &lt;code&gt;==&lt;/code&gt; operator between two instances of our &lt;code&gt;ScrabbleWords&lt;/code&gt; class it evaluates the &lt;code&gt;__eq__&lt;/code&gt; method to determine the result.&lt;/p&gt;
&lt;p&gt;But look at our code and you can see the comparison being performed is &lt;code&gt;self.score == y.score&lt;/code&gt;, which is something like &lt;code&gt;&amp;lt;int&amp;gt; == &amp;lt;int&amp;gt;&lt;/code&gt;. So what determines the answer of &lt;em&gt;that&lt;/em&gt; statement? Because everything in Python is an object, including integers, Python uses the integer class&apos;s &lt;code&gt;__eq__&lt;/code&gt; method to evaluate this.&lt;/p&gt;
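&lt;p&gt;Writing out all six comparison methods gets repetitive. The standard library can generate most of them for you: &lt;code&gt;functools.total_ordering&lt;/code&gt; fills in the remaining ordering methods from just &lt;code&gt;__eq__&lt;/code&gt; and one ordering method like &lt;code&gt;__lt__&lt;/code&gt;. A sketch, again using a simplified stand-in for our class with the score passed in directly:&lt;/p&gt;

```python
import functools

# total_ordering generates __gt__, __le__, and __ge__ from the two
# methods we define below. (Simplified stand-in class, not the full
# ScrabbleWords implementation from this post.)
@functools.total_ordering
class Word:
    def __init__(self, word, score):
        self.word = word
        self.score = score

    def __eq__(self, y):
        return self.score == y.score

    def __lt__(self, y):
        # Delegating to int's own __lt__, in the spirit of this post.
        return self.score.__lt__(y.score)

cat = Word('cat', 5)
dog = Word('dog', 5)
print(cat == dog)  # True
print(cat >= dog)  # True -- generated for us by total_ordering
```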
&lt;h2&gt;The __iter__ and __len__ Methods&lt;/h2&gt;
&lt;p&gt;When we play a word on the board in Scrabble we might care about how long a word is (to see if it&apos;ll fit on the board) and we might need to look at a word letter-by-letter (to evaluate things like bonus letter scores). If we try to do either of these things with our class in its current state we&apos;ll see that neither is supported.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: len(cat_word)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
&amp;lt;ipython-input-48-7ee5d4e3efb8&amp;gt; in &amp;lt;module&amp;gt;()
----&amp;gt; 1 len(cat_word)

AttributeError: ScrabbleWords instance has no attribute &apos;__len__&apos;

In [2]: [letter for letter in cat_word]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
&amp;lt;ipython-input-49-b747de29668a&amp;gt; in &amp;lt;module&amp;gt;()
----&amp;gt; 1 [letter for letter in cat_word]

TypeError: iteration over non-sequence
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again we can get around this by directly accessing the attribute we want to work with, in this case the &lt;code&gt;word&lt;/code&gt; attribute.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: len(cat_word.word)
Out[1]: 3

In [2]: [letter for letter in cat_word.word]
Out[2]: [&apos;c&apos;, &apos;a&apos;, &apos;t&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But, following the theme of this post, we can do this implicitly with the special methods, in this case the &lt;code&gt;__len__&lt;/code&gt; and &lt;code&gt;__iter__&lt;/code&gt; methods.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def __iter__(self):
    for letter in self.word:
        yield letter

def __len__(self):
    return len(self.word)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now our class works like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [43]: len(cat_word)
Out[43]: 3

In [44]: [letter for letter in cat_word]
Out[44]: [&apos;c&apos;, &apos;a&apos;, &apos;t&apos;]
&lt;/code&gt;&lt;/pre&gt;
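&lt;p&gt;A nice side effect of implementing &lt;code&gt;__iter__&lt;/code&gt; is that everything built on the iteration protocol now works, not just list comprehensions: &lt;code&gt;list()&lt;/code&gt;, &lt;code&gt;sorted()&lt;/code&gt;, and even the &lt;code&gt;in&lt;/code&gt; operator, which falls back to iterating when a class defines no &lt;code&gt;__contains__&lt;/code&gt;. A sketch with a simplified stand-in class:&lt;/p&gt;

```python
# Simplified stand-in class defining only the two methods from above.
class Word:
    def __init__(self, word):
        self.word = word

    def __iter__(self):
        for letter in self.word:
            yield letter

    def __len__(self):
        return len(self.word)

cat = Word('cat')
print(list(cat))    # ['c', 'a', 't']
print(sorted(cat))  # ['a', 'c', 't']
print('a' in cat)   # True -- membership falls back to iteration
print(len(cat))     # 3
```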
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Python special methods become increasingly important as you start working on more involved projects. If you write your own libraries, these are common &quot;Pythonic&quot; programming patterns. And even if you only leverage existing libraries, at some point you will need to understand exactly what some special method is doing in order to understand a bug or quirk in the library.&lt;/p&gt;
&lt;p&gt;Just doing some quick poking around in the GitHub repos for some of my favorite open source projects turns up a couple of examples of classes with lots of special methods in the wild: &lt;a href=&quot;https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/result.py#L42&quot;&gt;BaseRowProxy&lt;/a&gt; in SQLAlchemy, and &lt;a href=&quot;https://github.com/astropy/astropy/blob/master/astropy/coordinates/sites.py#L26&quot;&gt;SiteRegistry&lt;/a&gt; in Astropy.&lt;/p&gt;
&lt;p&gt;[^1]: Note I&apos;m using the Python 3 style print function. To use this in Python 2 run &lt;code&gt;from __future__ import print_function&lt;/code&gt;.&lt;/p&gt;
</content:encoded></item><item><title>Setting Up Pelican with GitHub Pages</title><link>https://acviana.com/posts/2017-02-25-setting-up-pelican-with-github-pages/</link><guid isPermaLink="true">https://acviana.com/posts/2017-02-25-setting-up-pelican-with-github-pages/</guid><description>Obligatory metapost on setting up a blog</description><pubDate>Sat, 25 Feb 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This is a short post about how I set up my Pelican project to deploy with GitHub Pages. Specifically, I want to version control both the output static pages that are being served as well as the source Markdown files used to generate the output. What I ended up with isn&apos;t novel, but it took me a while to wrap my head around and wasn&apos;t highlighted in any of the documentation or blog posts I found, so I wanted to share. Hopefully, it can save you some time.&lt;/p&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.getpelican.com/&quot;&gt;Pelican&lt;/a&gt; is a static blogging tool written in Python that generates static webpages from Markdown. &lt;a href=&quot;https://pages.github.com/&quot;&gt;GitHub Pages&lt;/a&gt; is a site hosting feature from GitHub that will serve the content of a repo in your account named &lt;code&gt;$USERNAME.github.io&lt;/code&gt; at that same URL - super easy. This makes for a great blogging workflow: just write some Markdown, peep the output, commit, push, done! The only complication in setting up this workflow is that your project actually contains two different types of content, the source and the output, which you want to store separately.&lt;/p&gt;
&lt;h2&gt;The Problem&lt;/h2&gt;
&lt;p&gt;The problem I ran into is that when you run the pelican quickstart command you get a file tree that looks roughly like this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;├── Makefile
├── content/
├── output/
├── pelicanconf.py
└── publishconf.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With most new projects you just initialize a new repo at the root level and go. But I didn&apos;t want everything in the same place. I didn&apos;t want the source Markdown files, config files, etc. to be served to my GitHub Pages site; I only wanted the contents of the &lt;code&gt;output/&lt;/code&gt; folder. At the same time, though, I did want to keep track of the source Markdown files. These are critical if I ever want to regenerate the output files to do something like fix a typo or change the site theme.&lt;/p&gt;
&lt;p&gt;What&apos;s worse is that I already &lt;em&gt;had&lt;/em&gt; two repos set up like this on GitHub from when I was last actively blogging. But over the course of 3 years I had forgotten the Pelican workflow, missed a few releases, and changed laptops, so I was struggling to remember how to reconnect them. I tried playing around with having two parallel projects with git submodules, git branches, symbolic links, and shell scripting to sync the files over, but all of these felt really hacky. Finally, I remembered what I had done.&lt;/p&gt;
&lt;h2&gt;My Solution&lt;/h2&gt;
&lt;p&gt;What I ended up doing was cloning my source file repo into the &lt;code&gt;content/&lt;/code&gt; folder and my GitHub Pages repo into the &lt;code&gt;output/&lt;/code&gt; folder.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;├── Makefile
├── content/
│   ├── 2013/
│   ├── 2014/
│   ├── 2017/
│   ├── images
│   ├── pages
│   └── .git
├── output/
│   ├── index.html
│   ├── ...
│   └── .git
├── pelicanconf.py
└── publishconf.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It seems obvious in hindsight, but because it&apos;s a break from the traditional model of one repo per project it took a while for me to think of it.&lt;/p&gt;
&lt;h2&gt;Sources&lt;/h2&gt;
&lt;p&gt;Initially, about 4 years ago, I modeled a lot of my Pelican configuration and setup on &lt;a href=&quot;https://jakevdp.github.io/blog/2013/05/07/migrating-from-octopress-to-pelican/&quot;&gt;this post&lt;/a&gt; by Jake Vanderplas. Rereading it now I recognized lots of the configuration decisions I had made in the past, but couldn&apos;t find anything about how he set up his repos. I think what I must have done all those years ago is look at the &lt;a href=&quot;https://github.com/jakevdp/PythonicPerambulations&quot;&gt;GitHub source&lt;/a&gt; for his blog and copy the pattern from there.&lt;/p&gt;
&lt;p&gt;Actually, what Jake does is better than my setup. He has a git repo at the root level so all his config settings are saved, with a .gitignore file that ignores the &lt;code&gt;output/&lt;/code&gt; directory, which I&apos;m assuming contains another git repo he uses for deployment. Nested git repos, neat!&lt;/p&gt;
&lt;p&gt;Like I said, this isn&apos;t anything most people couldn&apos;t have figured out on their own eventually, but it took me longer than it should have, so hopefully this saves you some time.&lt;/p&gt;
</content:encoded></item><item><title>Exploring Python&apos;s &apos;in&apos; operator</title><link>https://acviana.com/posts/2017-02-17-exploring-pythons-in-operator/</link><guid isPermaLink="true">https://acviana.com/posts/2017-02-17-exploring-pythons-in-operator/</guid><description>Digging into how the in operator works</description><pubDate>Fri, 17 Feb 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I spent 12 hours on a plane this week traveling between the East Coast and San Francisco for the RSA conference. I&apos;ve appreciated the way flying forces me to disconnect and spend a little more time on side projects such as this blog. On this particular flight I picked up an idea I&apos;ve been wanting to play with for a while: overloading Python&apos;s special object methods.&lt;/p&gt;
&lt;h2&gt;Magic Methods&lt;/h2&gt;
&lt;p&gt;These are the object methods that begin with a double underscore, such as &lt;code&gt;__init__&lt;/code&gt;, and are also known as &quot;magic&quot;, &quot;double-underscore&quot;, or &quot;dunder&quot; methods. These methods define a lot of the inherent behavior of Python objects, including the Python built-ins such as lists and dictionaries. They also define how objects interact with operators such as &lt;code&gt;+&lt;/code&gt; (the &lt;code&gt;__add__&lt;/code&gt; method) and &lt;code&gt;&amp;lt;=&lt;/code&gt; (the &lt;code&gt;__le__&lt;/code&gt; method).&lt;/p&gt;
&lt;p&gt;Because so much of how Python works is wrapped up in these methods, I figured messing with them would be a good way to gain a deeper understanding of the language. So I started making little toy classes that implemented these methods in weird ways, and everything was going as I expected until I got to the &lt;code&gt;__contains__&lt;/code&gt; method. The &lt;code&gt;__contains__&lt;/code&gt; method is used to check membership with the &lt;code&gt;in&lt;/code&gt; operator. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: &apos;foo&apos; in [&apos;foo&apos;, &apos;bar&apos;]
Out[1]: True

In [2]: [&apos;foo&apos;, &apos;bar&apos;].__contains__(&apos;foo&apos;)
Out[2]: True

In [3]: 1 in [&apos;foo&apos;, &apos;bar&apos;]
Out[3]: False

In [4]: [&apos;foo&apos;, &apos;bar&apos;].__contains__(1)
Out[4]: False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In each case the &lt;code&gt;__contains__&lt;/code&gt; method of Python&apos;s built-in list object is being called to evaluate the &lt;code&gt;in&lt;/code&gt; operator. As it turns out, this is an oversimplification. I realized this when I took my first pass at building my own &lt;code&gt;__contains__&lt;/code&gt; method and did something a little weird:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class MyObject(object):

    def __contains__(self, y):
        return &apos;__contains__&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a nonsense method; returning a string instead of a Boolean is unexpected behavior and doesn&apos;t make sense in the context of evaluating membership. But I figured it would always evaluate to &lt;code&gt;False&lt;/code&gt;. A little more exploration showed that there was something special about the &lt;code&gt;in&lt;/code&gt; operator.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In [1]: mo = MyObject()

In [2]: mo.__contains__(1)
Out[2]: &apos;__contains__&apos;

In [3]: mo.__contains__(1) == True
Out[3]: False

In [4]: 1 in mo
Out[4]: True
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I was expecting &lt;code&gt;1 in mo&lt;/code&gt; to evaluate to something like &lt;code&gt;mo.__contains__(1) == True&lt;/code&gt;, which evaluates to &lt;code&gt;&apos;__contains__&apos; == True&lt;/code&gt;, which should be &lt;code&gt;False&lt;/code&gt;. Instead I got &lt;code&gt;True&lt;/code&gt;, so there was something going on that I didn&apos;t understand.&lt;/p&gt;
&lt;p&gt;A little but of searching on StackOverflow led me to &lt;a href=&quot;http://stackoverflow.com/a/18753584/1216837&quot;&gt;this hint&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;First, &lt;code&gt;in&lt;/code&gt; always casts the result of &lt;code&gt;__contains__&lt;/code&gt; to a bool&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So this answers my question but doesn&apos;t really teach me much more about Python. This is the beauty of being on a plane. Normally, if I ran into something like this at work I would just take the answer at face value and move on. But being stuck on a plane I had the time and incentive to go a little further down the rabbit hole and read the CPython source.&lt;/p&gt;
&lt;h2&gt;Reading the Source&lt;/h2&gt;
&lt;p&gt;I&apos;ve only looked at the &lt;a href=&quot;https://github.com/python/cpython/tree/2.7&quot;&gt;cpython source code&lt;/a&gt; a few times before this. I&apos;ve never written a line of C in my life but I&apos;ve been programming long enough that I can kind of squint and make out the flow of most things.&lt;/p&gt;
&lt;p&gt;It didn&apos;t take me too long to get oriented enough to find how &lt;code&gt;__contains__&lt;/code&gt; is &lt;a href=&quot;https://github.com/python/cpython/blob/2.7/Objects/listobject.c#L438&quot;&gt;implemented&lt;/a&gt; for lists in a function called &lt;code&gt;list_contains&lt;/code&gt; in the &lt;code&gt;listobject.c&lt;/code&gt; file. This function iterates over the list&apos;s contents, calling the C &lt;a href=&quot;https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Objects/object.c#L727&quot;&gt;function&lt;/a&gt; &lt;code&gt;PyObject_RichCompareBool&lt;/code&gt; on each one.&lt;/p&gt;
&lt;p&gt;The comment on the &lt;code&gt;PyObject_RichCompareBool&lt;/code&gt; function reads:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/* Perform a rich comparison with integer result.  This wraps
   PyObject_RichCompare(), returning -1 for error, 0 for false, 1 for true. */
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;OK, so our casting of everything to a boolean must happen in here. Digging in a little more, this function drops down into the &lt;code&gt;PyObject_IsTrue&lt;/code&gt; &lt;a href=&quot;https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Objects/object.c#L1314&quot;&gt;function&lt;/a&gt;. And here we finally have our answer. &lt;code&gt;PyObject_IsTrue&lt;/code&gt; checks if the input is &lt;code&gt;True&lt;/code&gt;, &lt;code&gt;False&lt;/code&gt;, or &lt;code&gt;None&lt;/code&gt;, and then a handful of sequence types. If none of these match, the function returns &lt;code&gt;1&lt;/code&gt;, which cascades up to &lt;code&gt;__contains__&lt;/code&gt; and is then interpreted as &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;
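&lt;p&gt;We can see that casting from Python itself with a pair of toy classes (hypothetical classes of my own, not anything from the CPython source): a non-empty string returned from &lt;code&gt;__contains__&lt;/code&gt; is truthy and becomes &lt;code&gt;True&lt;/code&gt;, while an empty one becomes &lt;code&gt;False&lt;/code&gt;:&lt;/p&gt;

```python
# "in" passes whatever __contains__ returns through Python's truth test,
# the same casting PyObject_IsTrue performs in the C source.
class Truthy:
    def __contains__(self, y):
        return '__contains__'  # non-empty string: truthy

class Falsy:
    def __contains__(self, y):
        return ''              # empty string: falsy

print(Truthy().__contains__(1))  # '__contains__' -- the raw, uncast value
print(1 in Truthy())             # True
print(1 in Falsy())              # False
```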
</content:encoded></item><item><title>Writing a FITS File Bigger Than Your Memory</title><link>https://acviana.com/posts/2014-04-08-writing-a-fits-file-bigger-than-your-memory/</link><guid isPermaLink="true">https://acviana.com/posts/2014-04-08-writing-a-fits-file-bigger-than-your-memory/</guid><description>Getting in the weeds with the FITS file format</description><pubDate>Tue, 08 Apr 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;I&apos;ve listed &lt;a href=&quot;https://plus.google.com/+ErikBray/about&quot;&gt;Erik Bray&lt;/a&gt; (&lt;a href=&quot;https://github.com/embray&quot;&gt;GitHub&lt;/a&gt;) as a co-author on this post. Erik is one of the PyFITS developers and this post was born out of an email chain where he explained most of what follows to me, several times.&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I’ve always thought that one of the the great things about physics is that you can add more digits to any number and see what happens and nobody can stop you.&quot;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Randall Munroe, &lt;a href=&quot;https://what-if.xkcd.com/20/&quot;&gt;What If?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&apos;ve been working a lot lately on my HST WFC3 &lt;a href=&quot;http://acviana.github.io/tag/psf.html&quot;&gt;PSF project&lt;/a&gt; and recently had to solve a challenging scaling problem that forced me to deal with the hardware limits of my machine. I needed to create a FITS file containing a 4,000,000 x 11 x 11 data cube [^1]. This ended up being more than my 16 GB machine could handle, and it resulted in &lt;a href=&quot;http://en.wikipedia.org/wiki/Paging&quot;&gt;paging&lt;/a&gt; to virtual memory, which killed performance. As I was trying to find a solution to this problem a play on Randall Munroe&apos;s quote from the beginning of this post kept popping up in my head:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;The annoying thing about writing software is that people can just add zeros and break everything and you can&apos;t stop them.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So even though it was exciting that my dataset had grown to the point that it couldn&apos;t all fit in memory at once, I was now faced with a problem: how do you create a FITS file from a NumPy array that&apos;s too big to fit in memory?&lt;/p&gt;
&lt;h2&gt;Reading the Docs&lt;/h2&gt;
&lt;p&gt;I work with FITS files using the Astropy &lt;code&gt;io.fits&lt;/code&gt; &lt;a href=&quot;http://astropy.readthedocs.org/en/latest/io/fits/index.html&quot;&gt;module&lt;/a&gt;. For those of you familiar with PyFITS you&apos;ll find that the code in Astropy has been wholly migrated over from PyFITS so the functionality is currently identical. So much so that you can still use the PyFITS docs to understand Astropy&apos;s &lt;code&gt;io.fits&lt;/code&gt;. This is great because the PyFITS FAQ explicitly answers this question: &lt;a href=&quot;http://pyfits.readthedocs.org/en/latest/appendix/faq.html#how-can-i-create-a-very-large-fits-file-from-scratch&quot;&gt;How can I create a very large fits file from scratch?&lt;/a&gt;. Go ahead and read that section.&lt;/p&gt;
&lt;p&gt;With that FAQ as a starting point I ended up with this code snippet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

import numpy as np
from astropy.io import fits

data = np.zeros((1, 1, 1), dtype=np.float64)
hdu = fits.PrimaryHDU(data=data)
header = hdu.header
header[&apos;NAXIS1&apos;] = 11
header[&apos;NAXIS2&apos;] = 11
header[&apos;NAXIS3&apos;] = 4000000
header.tofile(fits_file_name, clobber=True)
header_length = len(header.tostring())
data_length = int(math.ceil(11 * 11 * 4000000 * 8 / 2880.0)) * 2880
with open(fits_file_name, &apos;rb+&apos;) as fobj:
    fobj.seek(header_length + data_length - 1)
    fobj.write(&apos;\0&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The FAQ got me 80% of the way there and Erik helped me connect the dots. It&apos;s worth walking through this last 20%, as well as giving an explanation of what exactly is going on.&lt;/p&gt;
&lt;h2&gt;Hacking the Header&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;data = np.zeros((1, 1, 1), dtype=np.float64)
hdu = fits.PrimaryHDU(data=data)
header = hdu.header
header[&apos;NAXIS1&apos;] = 11
header[&apos;NAXIS2&apos;] = 11
header[&apos;NAXIS3&apos;] = 4000000
header.tofile(fits_file_name, clobber=True)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To begin with I create a dummy NumPy array just to get the FITS dimensionality right. As we keep going you&apos;ll see why we didn&apos;t just create the HDU with an array the size of our expected output in the first place. Then I create a &lt;code&gt;PrimaryHDU&lt;/code&gt; instance with that NumPy array. Under the hood the &lt;code&gt;fits&lt;/code&gt; module is using this to set up some basic elements of the FITS file format. But this is all being done in memory; I haven&apos;t written anything to the disk yet. Next I change the &lt;code&gt;NAXIS&lt;/code&gt; keywords required by the FITS standard to match those of our expected output and not those of our dummy NumPy array. Changing these keywords doesn&apos;t do anything to actually change the dimensions or size of the file; those were set by our initial NumPy array. But it does update our header to match the data we&apos;ll be putting in. To wrap this up I use the &lt;code&gt;tofile&lt;/code&gt; method to write &lt;em&gt;only the FITS header&lt;/em&gt; to the disk, with no actual data following it. PyFITS generally won&apos;t let you write an invalid FITS file - in this case it would complain that the &lt;code&gt;NAXIS&lt;/code&gt; values don&apos;t match the 1 x 1 x 1 array we used to create the file - but we &lt;em&gt;can&lt;/em&gt; write just the header as we&apos;re doing here. The &lt;code&gt;clobber&lt;/code&gt; option overwrites the file if it already exists.&lt;/p&gt;
&lt;h2&gt;Hacking the Data&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;header_length = len(header.tostring())
data_length = int(math.ceil(11 * 11 * 4000000 * 8 / 2880.0)) * 2880
with open(fits_file_name, &apos;rb+&apos;) as fobj:
    fobj.seek(header_length + data_length - 1)
    fobj.write(&apos;\0&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now comes the interesting part. One specific advantage Python offers scientists without a programming background is that Python is a &quot;high-level&quot; programming language. The term &quot;high&quot; is subjective, but the point is that Python takes care of many of the &quot;low-level&quot; aspects of programming such as memory allocation, pointers, and garbage collection. However, as you get further into the language you&apos;ll find that you have to learn how these concepts are implemented to solve more complicated problems. This was one of those times for me.&lt;/p&gt;
&lt;p&gt;Let&apos;s jump to the middle where I use the &quot;new&quot; standard Python convention of using &lt;code&gt;with&lt;/code&gt; to open a file object. In this case I open it with the &lt;code&gt;rb+&lt;/code&gt; setting, which means I&apos;m going to read (&lt;code&gt;r&lt;/code&gt;) and update (&lt;code&gt;+&lt;/code&gt;) the file in binary mode (&lt;code&gt;b&lt;/code&gt;). Binary mode means that rather than trying to decode strings in something like UTF-8 the file object will read raw bytes.&lt;/p&gt;
&lt;p&gt;When you open a file object in binary update mode your current position is the beginning of the file, meaning if you tell Python to start reading or writing it will start right at the beginning of the file. In my case I don&apos;t want that, so I first use the &lt;code&gt;seek&lt;/code&gt; method to tell Python how far ahead to skip in units of bytes. In this case I want to seek ahead by the length of my final FITS file in bytes. It&apos;s worth noting that I&apos;m taking advantage of the fact that &lt;code&gt;seek&lt;/code&gt; will, by design, seek past the end of the file. It&apos;s also worth noting that this is all happening at a fairly high level; I&apos;m not actually seeking over disk sectors or memory locations myself.&lt;/p&gt;
&lt;p&gt;For clarity I&apos;ve broken up this calculation into two variables, &lt;code&gt;header_length&lt;/code&gt; and &lt;code&gt;data_length&lt;/code&gt;. I use the &lt;code&gt;tostring&lt;/code&gt; &lt;a href=&quot;http://stsdas.stsci.edu/stsci_python_sphinxdocs_2.13/pyfits/api_docs/api_headers.html?highlight=tostring#pyfits.Header.tostring&quot;&gt;method&lt;/a&gt; to return a string representation of the header. Because each character can be represented by one byte the string length is the same as the byte length.&lt;/p&gt;
&lt;p&gt;Next I figure out how many bytes my data is going to be. I do this by multiplying the number of elements in my array by the number of bytes required to store each element. This is interesting and something I hadn&apos;t really thought about before; I can tell Python exactly how big my file will be before I write anything to it. I&apos;m using a double precision, or 64-bit, floating point, which can be represented by 8 bytes. So my data will take &lt;code&gt;11 x 11 x 4,000,000 x 8 bytes&lt;/code&gt; or a little over 3.6 GB.&lt;/p&gt;
&lt;p&gt;Now you can start to see why we needed to go through all this trouble. This data would first have to be read into memory from a different format (SQL in this case) and then copied into a NumPy array; that&apos;s 7+ GB of memory right there. Additionally, there&apos;s a couple of GB of metadata I need to load in as well, plus what&apos;s being used by the system. And my data set is growing constantly. Suddenly, it&apos;s clear why we couldn&apos;t do this all in memory.&lt;/p&gt;
&lt;p&gt;Actually, you&apos;ll notice it&apos;s a little more complicated than that. I&apos;m actually rounding up to the next multiple of 2880 bytes. This is because, and I&apos;m trying to say this with a straight face, of historical limitations on how &lt;em&gt;tape&lt;/em&gt; machines used to read FITS data. So just like the FORTRAN line limit, this is a historical artifact we just have to live with. If you read the documentation carefully for the &lt;code&gt;header.tostring&lt;/code&gt; method you&apos;ll see this was done automatically for the header when it was turned into a string.&lt;/p&gt;
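&lt;p&gt;That rounding rule is simple enough to capture in a small helper (a sketch; the function name is mine, not from the original script):&lt;/p&gt;

```python
import math

def fits_padded_length(n_bytes):
    # Round a byte count up to the next multiple of the 2880-byte FITS block.
    return int(math.ceil(n_bytes / 2880.0)) * 2880

# the data section from this post: 11 x 11 x 4,000,000 doubles
padded = fits_padded_length(11 * 11 * 4000000 * 8)
```

&lt;p&gt;Any byte count that is already a multiple of 2880 passes through unchanged; everything else gets bumped up to the next block boundary.&lt;/p&gt;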
&lt;p&gt;Also notice that I never had to tell Python &lt;em&gt;what&lt;/em&gt; was going to be in those bytes. An array of complex decimals takes up just as much space as an array of zeros if they&apos;re both stored as the same data type. &lt;em&gt;This&lt;/em&gt; is the reason I couldn&apos;t just create our original HDU with a 4,000,000 x 11 x 11 NumPy array of zeros - that would take up just as much room as a NumPy array of the real data!&lt;/p&gt;
&lt;p&gt;But what about the &lt;code&gt;- 1&lt;/code&gt; at the end? Well, I go back just one byte from the end to write the final character of the FITS file, the &lt;code&gt;\0&lt;/code&gt; byte. In this case it acts as a kind of placeholder staking out how big the file is going to be. So what was the point of all that? Well, without ever needing to create anything in memory anywhere near the size of our dataset we&apos;ve now created a file exactly big enough to hold all our data.&lt;/p&gt;
&lt;p&gt;Note that what we&apos;ve created is called a &lt;a href=&quot;http://en.wikipedia.org/wiki/Sparse_file&quot;&gt;sparse file&lt;/a&gt; and how it&apos;s implemented will depend on your file system. Certain file systems will just add some metadata letting the system know that it&apos;s just blank space, whereas other file systems such as Apple&apos;s HFS+ will go through and write each &quot;blank&quot; byte which will take just as long as writing real data.&lt;/p&gt;
&lt;h2&gt;(Finally) Writing the Data&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;with fits.open(fits_file_name, mode=&apos;update&apos;) as hdul:
    for bottom_index, top_index in zip(bottom_index_list, top_index_list):
        numpy_data_cube = get_data_chunk(bottom_index, top_index)
        hdul[0].data[bottom_index:top_index, :, :] = numpy_data_cube
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that I&apos;ve created the output file of the correct size and dimensionality I can start writing data to the file. First I use &lt;code&gt;fits.open&lt;/code&gt; with the &lt;code&gt;with&lt;/code&gt; convention to open the FITS file. But wait, what&apos;s going on here? Won&apos;t opening this file just read all the data into memory - the exact thing we&apos;re trying to avoid?&lt;/p&gt;
&lt;p&gt;What&apos;s happening is that the &lt;code&gt;fits&lt;/code&gt; module is by default opening the file with &lt;a href=&quot;http://en.wikipedia.org/wiki/Mmap&quot;&gt;mmap&lt;/a&gt;. What this means is that the file is read in a &quot;lazy&quot; or &quot;on-demand&quot; mode, data is only read into memory as needed. You can find more info in both the &lt;a href=&quot;http://astropy.readthedocs.org/en/latest/io/fits/index.html#working-with-large-files&quot;&gt;Astropy&lt;/a&gt; and &lt;a href=&quot;http://pyfits.readthedocs.org/en/latest/appendix/faq.html#how-do-i-open-a-very-large-image-that-won-t-fit-in-memory&quot;&gt;PyFITS&lt;/a&gt; docs.&lt;/p&gt;
&lt;p&gt;Next I step through the data I want to add to the FITS file in chunks that easily fit in memory. I wrote some generic code that does this by stepping over some indices and passing them to a function &lt;code&gt;get_data_chunk&lt;/code&gt; that returns the next &quot;chunk&quot; of records as a NumPy array. I index the FITS data just like a NumPy array (because it is one) and then update it with the chunk from my current data cube. Iterating over this eventually writes the entire file without ever needing to store the entire FITS data in memory at once.&lt;/p&gt;
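&lt;p&gt;The index lists driving that loop can be generated along these lines (a sketch; the &lt;code&gt;chunk_size&lt;/code&gt; value here is a hypothetical choice, just something that fits comfortably in memory):&lt;/p&gt;

```python
# Build (bottom, top) index pairs that cover all records in fixed-size chunks.
chunk_size = 500000      # hypothetical records-per-chunk value
total_records = 4000000
bottom_index_list = list(range(0, total_records, chunk_size))
top_index_list = [min(bottom + chunk_size, total_records)
                  for bottom in bottom_index_list]
```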
&lt;h2&gt;One More Thing&lt;/h2&gt;
&lt;p&gt;Erik pointed out that there is a little-known feature in PyFITS called the &lt;a href=&quot;https://github.com/spacetelescope/PyFITS/blob/master/lib/pyfits/hdu/streaming.py&quot;&gt;StreamingHDU&lt;/a&gt; class. It&apos;s an alternative way to make a FITS file on disk by writing out the header and then the data one chunk at a time; see the &lt;a href=&quot;http://pyfits.readthedocs.org/en/latest/api_docs/api_hdus.html#streaminghdu&quot;&gt;StreamingHDU documentation&lt;/a&gt; for details.&lt;/p&gt;
&lt;p&gt;[^1]: &lt;a href=&quot;http://en.wikipedia.org/wiki/FITS&quot;&gt;FITS&lt;/a&gt; is a standard data file format in astronomy.&lt;/p&gt;
</content:encoded></item><item><title>Working with NumPy Arrays and SQL</title><link>https://acviana.com/posts/2014-04-07-numpy-arrays-and-sql/</link><guid isPermaLink="true">https://acviana.com/posts/2014-04-07-numpy-arrays-and-sql/</guid><description>Binary encoding in SQL for NumPy objects</description><pubDate>Mon, 07 Apr 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Lately I&apos;ve been doing a lot (millions) of calculations involving small NumPy arrays of HST PSFs. Naturally, I wanted to save the output of these calculations for later analysis. I put all the results in a MySQL database so I could easily select subsets of the data for future work (by filter, image, date, etc.). However, sometimes the outputs of these calculations are arrays themselves. This left me searching for a good way to save these NumPy arrays to a SQL database.&lt;/p&gt;
&lt;p&gt;Before I dive into this it&apos;s worth noting that there are non-SQL storage options that are specifically designed for use cases like this such as &lt;a href=&quot;http://www.pytables.org/moin&quot;&gt;PyTables&lt;/a&gt; or &lt;a href=&quot;http://en.wikipedia.org/wiki/Hierarchical_Data_Format&quot;&gt;HDF5&lt;/a&gt;. But, my project was already pretty tightly integrated with SQLAlchemy and I wasn&apos;t concerned with having readable, hierarchical, or queryable array information, which are the strengths of these other storage systems as I understand them. The queries I&apos;m going to write are going to be constructed on other fields and the data is only going to be analyzed once it has been read back in as a NumPy array in Python. So, all I really needed was a way to go between NumPy and some SQL data type.&lt;/p&gt;
&lt;h2&gt;Starting with String&lt;/h2&gt;
&lt;p&gt;So my first thought was to just flatten the array into a string and then write that to the database as a &lt;code&gt;VARCHAR&lt;/code&gt; field. So something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = np.array([[1,2],[3,4]])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which gives us:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[[1 2]
 [3 4]]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then transform it into something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&apos;1,2,3,4&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then I would just code up some logic in Python that would know to convert it back into a 2x2 array. The problem is that you then start getting really awkward, obtuse Python like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;string_array = str(numpy_array.flatten().tolist())[1:-1]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And on top of that you have to convince yourself that you are always reading and writing your strings in the correct order in terms of left/right and up/down, which means writing more tests. This quickly started to not feel right to me, especially if the end result was a human-readable SQL field that was never going to be read by a human while in the database.&lt;/p&gt;
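&lt;p&gt;For completeness, the round trip that logic has to implement looks something like this (a sketch of the idea, not production code):&lt;/p&gt;

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
string_array = str(a.flatten().tolist())[1:-1]   # '1, 2, 3, 4'
# the reverse trip needs hard-coded knowledge of the shape (and dtype)
restored = np.array([int(x) for x in string_array.split(',')]).reshape(2, 2)
```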
&lt;h2&gt;Moving to Byte Strings&lt;/h2&gt;
&lt;p&gt;After some digging I switched to raw byte strings. This isn&apos;t human readable (which is fine) but it easily and consistently goes in and out of NumPy arrays with built-in methods and sits nicely in a SQL &lt;code&gt;BLOB&lt;/code&gt; field. Writing looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;byte_array = numpy_array.tostring()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which gives me:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And converting back to numpy is just as easy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;numpy_array = numpy.fromstring(byte_array)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ta-Da! This does what I wanted and in my opinion has that intangible &quot;pythonic&quot; feel to it. The default NumPy datatype is &lt;code&gt;numpy.float64&lt;/code&gt; but you can specify others with the &lt;code&gt;dtype&lt;/code&gt; parameter. While this solution met my needs there are probably many other ways to accomplish this; feel free to tell me about them in the comments.&lt;/p&gt;
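&lt;p&gt;Here&apos;s a minimal end-to-end sketch of the same idea, using the standard library&apos;s &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for MySQL/SQLAlchemy (and the newer &lt;code&gt;tobytes&lt;/code&gt;/&lt;code&gt;frombuffer&lt;/code&gt; spellings of &lt;code&gt;tostring&lt;/code&gt;/&lt;code&gt;fromstring&lt;/code&gt;):&lt;/p&gt;

```python
import sqlite3

import numpy as np

a = np.array([[1, 2], [3, 4]])

# store the raw bytes in a BLOB column
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (id INTEGER PRIMARY KEY, psf BLOB)')
conn.execute('INSERT INTO results (psf) VALUES (?)', (a.tobytes(),))

# read it back and rebuild the array with the original dtype and shape
blob = conn.execute('SELECT psf FROM results').fetchone()[0]
restored = np.frombuffer(blob, dtype=a.dtype).reshape(a.shape)
```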
</content:encoded></item><item><title>Fitting 2D Gaussians with agpy</title><link>https://acviana.com/posts/2014-01-30-2d-guassian-fitting-with-agpy/</link><guid isPermaLink="true">https://acviana.com/posts/2014-01-30-2d-guassian-fitting-with-agpy/</guid><description>Fitting two-dimensional Gaussians with a small open source package</description><pubDate>Thu, 30 Jan 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Update 01/30/2014:&lt;/strong&gt; Adam has split his &lt;code&gt;gaussfitter&lt;/code&gt; code off into its own GitHub repository &lt;a href=&quot;https://github.com/keflavich/gaussfitter/blob/master/gaussfitter/gaussfitter.py&quot;&gt;here&lt;/a&gt; (&lt;em&gt;&quot;PR&apos;s Welcome!&quot;&lt;/em&gt;). This removes some dependencies and changes the import statement but as of right now everything else is the same. I&apos;ve maintained the old links to the original agpy repo in the post below but please use the above repo for the latest version.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;After some &lt;a href=&quot;http://acviana.github.io/posts/2013/counting-to-10-million-stars/&quot;&gt;initial work&lt;/a&gt; fitting WFC3 UVIS PSFs with 1D Gaussians through the x and y axes I decided to look at 2D Gaussian fitting as well. I was disappointed to find there wasn&apos;t already a canned procedure to do this in something like SciPy. But after some digging I decided to use &lt;a href=&quot;http://casa.colorado.edu/~ginsbura/&quot;&gt;Adam Ginsburg&apos;s&lt;/a&gt; personal agpy library. I briefly met Adam at the &lt;a href=&quot;http://dotastronomy.com/&quot;&gt;dotAstronomy&lt;/a&gt; conference last year in Boston. He&apos;s a contributor to &lt;a href=&quot;http://www.astropy.org/&quot;&gt;Astropy&lt;/a&gt;, &lt;a href=&quot;http://astroquery.readthedocs.org/en/latest/&quot;&gt;AstroQuery&lt;/a&gt;, and &lt;a href=&quot;http://aplpy.github.io/&quot;&gt;APLpy&lt;/a&gt; so I had a hunch I could trust his code and it&apos;s worked out great.&lt;/p&gt;
&lt;p&gt;You can clone the repo &lt;a href=&quot;https://github.com/keflavich/agpy&quot;&gt;here&lt;/a&gt;. There are a couple of dependencies but I only satisfied the AstroPy and Numpy requirements and that was enough to run the &lt;code&gt;gaussfit&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from agpy import gaussfitter

mpfit, psf_fit = gaussfitter.gaussfit(
    psf_array,
    returnmp=True, 
    returnfitimage=True
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;gaussfit&lt;/code&gt; without the &lt;code&gt;returnmp&lt;/code&gt; or &lt;code&gt;returnfitimage&lt;/code&gt; parameters just returns a list with the following model parameters (in order):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;height&lt;/li&gt;
&lt;li&gt;amplitude&lt;/li&gt;
&lt;li&gt;x&lt;/li&gt;
&lt;li&gt;y&lt;/li&gt;
&lt;li&gt;width_x&lt;/li&gt;
&lt;li&gt;width_y&lt;/li&gt;
&lt;li&gt;rotation angle.&lt;/li&gt;
&lt;/ul&gt;
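&lt;p&gt;For intuition, those seven parameters describe a model along these lines (an illustrative re-implementation, &lt;em&gt;not&lt;/em&gt; agpy&apos;s actual code, and conventions such as the rotation sign may differ):&lt;/p&gt;

```python
import numpy as np

def gaussian_2d(height, amplitude, x0, y0, width_x, width_y, rotation):
    """Return f(x, y) for a 7-parameter 2D Gaussian (rotation in degrees)."""
    theta = np.deg2rad(rotation)
    def model(x, y):
        # rotate coordinates about the center (x0, y0)
        xr = (x - x0) * np.cos(theta) - (y - y0) * np.sin(theta)
        yr = (x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
        return height + amplitude * np.exp(
            -0.5 * ((xr / width_x) ** 2 + (yr / width_y) ** 2))
    return model

y, x = np.mgrid[0:21, 0:21]
psf_model = gaussian_2d(0.0, 1.0, 10.0, 10.0, 2.0, 3.0, 0.0)(x, y)
```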
&lt;p&gt;Adding &lt;code&gt;returnfitimage=True&lt;/code&gt; will also return a NumPy array of the model with the same dimensions as the input data. Lastly, setting &lt;code&gt;returnmp=True&lt;/code&gt; will return a &lt;code&gt;mpfit&lt;/code&gt; instance, which is the class used to generate the fit. The class is defined in the &lt;code&gt;agpy.mpfit_custom&lt;/code&gt; module. The &lt;code&gt;mpfit&lt;/code&gt; instance contains two useful attributes, &lt;code&gt;mpfit.params&lt;/code&gt; which is the same list of parameters that &lt;code&gt;gaussfit&lt;/code&gt; returns by default, and &lt;code&gt;mpfit.covar&lt;/code&gt; which is a 7x7 &lt;a href=&quot;http://en.wikipedia.org/wiki/Covariance_matrix&quot;&gt;covariance matrix&lt;/a&gt; for the 7 model parameters.&lt;/p&gt;
&lt;p&gt;It took me a little bit of work to figure out all these outputs but they were exactly what I needed so I followed up with Adam and submitted my &lt;em&gt;first&lt;/em&gt; FOSS PR on GitHub with some documentation &lt;a href=&quot;https://github.com/keflavich/agpy/pull/2&quot;&gt;improvements&lt;/a&gt;. It&apos;s a small contribution but still a personal milestone.&lt;/p&gt;
&lt;p&gt;Finally, I made a plot of the input data, the model, and the residual (difference) at two different scales. I&apos;m definitely happy with this and am looking forward to digging into the covariance matrix a little more to really understand how well I&apos;m fitting these PSFs.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/2d-gaussians.png&quot; /&gt;&lt;/p&gt;
</content:encoded></item><item><title>Faster File Existence Testing with Sets</title><link>https://acviana.com/posts/2014-01-19-faster-file-checking-with-sets/</link><guid isPermaLink="true">https://acviana.com/posts/2014-01-19-faster-file-checking-with-sets/</guid><description>Using set membership to skip disk reads</description><pubDate>Sun, 19 Jan 2014 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;It&apos;s Time to Think about Performance&lt;/h2&gt;
&lt;p&gt;Lately at work I&apos;ve been thinking a lot about the performance of my code. In the past most of my work fell into one of two performance categories: (roughly) overnight or (roughly) right now. In either case I didn&apos;t really care about performance. Either the task was going to take so long I had time to go do something else, in which case I didn&apos;t care if it took 1 hour or 10. Or it was going to be done fast enough I could immediately start iterating on the results, again in which case I didn&apos;t really care if it was going to take 1 second or 10. I think this is indicative of the scientific computing mindset where you are both the programmer and the user: fast means fast enough for &lt;em&gt;you&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;But recently my datasets have been getting bigger (which is awesome) which has forced me to be more careful about my programming. I&apos;m routinely finding my scripts out-growing both of my performance &quot;categories&quot; and either taking several minutes to run or several days. Both scenarios leave &lt;em&gt;me&lt;/em&gt; waiting around, which is the real problem. While I always try, to the best of my abilities, to write high-quality code my time is more scarce and expensive than CPU time. This means that I optimize my time, not the CPU&apos;s. However, when I &lt;em&gt;do&lt;/em&gt; find myself waiting around for some code to run, it&apos;s time to roll up my sleeves and find some speedups.&lt;/p&gt;
&lt;p&gt;The work I do is very I/O intensive involving lots of databases and data files. I/O is extremely &lt;a href=&quot;https://gist.github.com/hellerbarde/2843375&quot;&gt;expensive&lt;/a&gt; in terms of latency so reducing trips to the disk can yield sizable speedups. Here&apos;s an example I found today that includes an introduction to a handy (and I would argue underutilized) Python type called sets.&lt;/p&gt;
&lt;h2&gt;The Slow Way&lt;/h2&gt;
&lt;p&gt;I was working on a project where I wanted to verify that all the files I had listed in a database actually existed in my file system [^1]. To do this I wrote a SQL query in SQLAlchemy to grab all the file names listed in the database. Then I looped over the records returned by the query and used &lt;code&gt;os.path.exists&lt;/code&gt; to test the existence of each file in the file system.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;%%timeit
for record in database_query:
    if not os.path.exists(record.fits_file):
        print &apos;Missing {}&apos;.format(record.fits_file)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There were 3,096 iterations (records) in this loop and the IPython &lt;code&gt;%%timeit&lt;/code&gt; cell magic gave the following result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1 loops, best of 3: 103 s per loop
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a bit too long of a wait for me. It&apos;s long enough for me to get distracted by Facebook or maybe writing a blog post. I kid but task switching &lt;em&gt;does&lt;/em&gt; have a real &lt;a href=&quot;http://www.codinghorror.com/blog/2006/09/the-multi-tasking-myth.html&quot;&gt;mental overhead&lt;/a&gt;. I&apos;m not advocating optimizing every task that makes you sit around for a few minutes, but in this case the solution was trivial and applicable to lots of my projects.&lt;/p&gt;
&lt;h2&gt;The Fast Way&lt;/h2&gt;
&lt;p&gt;It occurred to me that I was making 3,096 separate trips to the disk. It&apos;s my understanding that there is some overhead for each disk read so I thought maybe it would be faster to read everything I needed at once and then work with the result in memory. To do this I used &lt;code&gt;glob&lt;/code&gt; to create a list of all the files in my file system I wanted to check my query against. This gave me all the data I wanted in memory from one SQL query and one &lt;code&gt;glob&lt;/code&gt; command. That reduced the problem to a membership testing problem and Python has a great built-in type for this, &lt;a href=&quot;http://docs.python.org/2/tutorial/datastructures.html#sets&quot;&gt;sets&lt;/a&gt;. Sets are unordered &lt;a href=&quot;https://en.wikipedia.org/wiki/Hash_table&quot;&gt;hash tables&lt;/a&gt; which means their average performance for a lookup operation is the holy grail of speed, &lt;code&gt;O(1)&lt;/code&gt;. Incorporating all this into my code looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;%%timeit
file_set = set(glob.glob(file_search_string))
for record in database_query:
    if record.fits_file not in file_set:
        print &apos;Missing {}&apos;.format(record.fits_file)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It turns out I was right, this is almost a full order of magnitude faster than my original code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1 loops, best of 3: 10.6 s per loop
&lt;/code&gt;&lt;/pre&gt;
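&lt;p&gt;The gap between the two kinds of membership test is easy to see in isolation with a quick micro-benchmark (hypothetical file names):&lt;/p&gt;

```python
import timeit

# membership in a list is O(n); membership in a set is O(1) on average
file_names = ['file_{}.fits'.format(i) for i in range(100000)]
file_set = set(file_names)
target = 'file_99999.fits'  # worst case for the list scan

list_time = timeit.timeit(lambda: target in file_names, number=100)
set_time = timeit.timeit(lambda: target in file_set, number=100)
```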
&lt;h2&gt;Caveats&lt;/h2&gt;
&lt;p&gt;So I think the principles behind this speedup are solid but, as always, your mileage may vary and there are some caveats I can think of.&lt;/p&gt;
&lt;p&gt;First of all, the file system I am searching is a network file system that I&apos;m connecting to over VPN, this makes each disk read exceptionally expensive. Secondly, the &lt;code&gt;glob&lt;/code&gt; operation is very expensive, almost all the run time is spent in that step. So if you&apos;re only checking a few files it might be faster to just look them up one-by-one than to use wildcards to scan a file tree. I&apos;m not sure where the tipping point is, but it&apos;s certainly worthwhile if you&apos;re checking every file like I am.&lt;/p&gt;
&lt;p&gt;I&apos;ve just started thinking about these topics so if I missed something in my code or my explanation I would love to hear about it in the comments.&lt;/p&gt;
&lt;p&gt;[^1]: If you&apos;re wondering why I want to do this, yes, it&apos;s because I screwed up and put the wrong files in the database.&lt;/p&gt;
</content:encoded></item><item><title>2014 Summer Internship Opportunity</title><link>https://acviana.com/posts/2014-01-16-SASP-Internship-Ad/</link><guid isPermaLink="true">https://acviana.com/posts/2014-01-16-SASP-Internship-Ad/</guid><description>Call for submissions for my internship opportunity at STScI</description><pubDate>Thu, 16 Jan 2014 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The Moving Target Pipeline group at the Space Telescope Science Institute (STScI) in Baltimore, Maryland seeks a graduate or undergraduate student for a paid summer internship through the Institute&apos;s Space Astronomy Summer Program (SASP). The intern will work with our group to develop, test, and run our software pipeline which processes Solar System object observations (&quot;moving targets&quot;) taken with the Hubble Space Telescope.&lt;/p&gt;
&lt;p&gt;The Moving Target Pipeline is the first attempt in the 22-year history of the Hubble Space Telescope to create a generalized data processing pipeline to meet the unique requirements of Solar System data. Using Python and MySQL the pipeline creates calibrated science data, dynamically scaled preview images, and predicts possible detections of &quot;incidental&quot; or serendipitous objects such as moons and asteroids. These data products and software techniques are of broad interest to the solar system research community as well as STScI, even influencing plans for the new James Webb Space Telescope (JWST). More information on the project can be found &lt;a href=&quot;http://acviana.github.io/posts/2013/mtpipeline-ddrf/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is an ideal opportunity for a student interested in software development, whose goal is either a career in software or wishes to add programming to his or her highly-desirable skill set. Our intern will be exposed to many aspects of professional software development including documentation, testing, code reviews, distributed version control, and remote software execution. Last year&apos;s intern in our group (a college senior) reported the internship experience helpful in both job interviews and subsequently, on the job.&lt;/p&gt;
&lt;p&gt;Strong candidates for this position will have experience in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Working in the Linux command line&lt;/li&gt;
&lt;li&gt;The Python programming language&lt;/li&gt;
&lt;li&gt;SQL databases, specifically MySQL&lt;/li&gt;
&lt;li&gt;Version control, specifically Git and GitHub&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A demonstrated ability to work and learn independently as well as attention to detail are all valued. A background in astronomy or physics is desirable but not required. Applicants with other relevant qualifications, such as experience in other programming languages or success in prior internships, are encouraged to demonstrate how they could contribute to our work.&lt;/p&gt;
&lt;p&gt;Candidates are invited to apply by following this &lt;a href=&quot;http://www.stsci.edu/institute/smo/students/applications&quot;&gt;link&lt;/a&gt;. Women and minorities are especially encouraged to apply. The deadline for applications is January 31st, 2014.&lt;/p&gt;
&lt;p&gt;Please note that all applications go into a general applicant pool and will be considered for &lt;em&gt;all&lt;/em&gt; internship opportunities available through the SASP. If you are specifically interested in an internship with our Moving Target project please include the words &quot;Moving Target&quot; somewhere in your application to ensure our group reviews your application.&lt;/p&gt;
&lt;p&gt;We look forward to your application,&lt;/p&gt;
&lt;p&gt;The Moving Target Team&lt;br /&gt;
Space Telescope Science Institute&lt;br /&gt;
Baltimore, Maryland&lt;/p&gt;
</content:encoded></item><item><title>Guilty As Charged</title><link>https://acviana.com/posts/2013-12-19-guilty-as-charged/</link><guid isPermaLink="true">https://acviana.com/posts/2013-12-19-guilty-as-charged/</guid><description>Just a little git humor</description><pubDate>Thu, 19 Dec 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Earlier this month the wonderful Randall Munroe at XKCD posted this great &lt;a href=&quot;http://xkcd.com/1296/&quot;&gt;strip&lt;/a&gt; poking fun at the decrease in commit message quality as a project ages.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;http://imgs.xkcd.com/comics/git_commit.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As the meme goes, &quot;nailed it&quot;.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/github-commit-history.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(I did finally get everything to work BTW.)&lt;/p&gt;
</content:encoded></item><item><title>A Basic Automation Setup for Astronomy - Part 2</title><link>https://acviana.com/posts/2013-12-10-a-basic-automation-setup-2/</link><guid isPermaLink="true">https://acviana.com/posts/2013-12-10-a-basic-automation-setup-2/</guid><description>Part 2 of a automation pipeline for astronomy tasks</description><pubDate>Tue, 10 Dec 2013 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Why Log Files?&lt;/h2&gt;
&lt;p&gt;Let&apos;s put on our imagining caps. Pretend that you have already set up an automated pipeline like the one I described in the first post in this series (in fact, some people already have). One day your boss walks into your office and asks for some details about something your pipeline made 6 months ago. Maybe they want to know what input files were used. Or what settings and options were used. Or maybe they want to reproduce a figure. How would you do that?&lt;/p&gt;
&lt;p&gt;Something that I think sometimes gets lost when you transfer your work to a more automated workflow is our old friend the lab notebook. For any data products your automation platform produces you should be able to tell someone exactly what inputs were used, what software, and even what version of your software. Ideally, someone should even be able to figure this out for themselves. Log files are a great way to accomplish this type of detailed record keeping, essentially generating an automated lab notebook. Here is an example output from one of my scripts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2013-11-22 10:08:52,719 - INFO - User: jstrummer
2013-11-22 10:08:52,720 - INFO - Host: casbah
2013-11-22 10:08:52,745 - INFO - Machine: x86_64
2013-11-22 10:08:52,747 - INFO - Platform: Linux-2.6.32-358.18.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago
2013-11-22 10:08:52,747 - INFO - Command-line arguments used:
2013-11-22 10:08:52,747 - INFO - reproc: False
2013-11-22 10:08:52,747 - INFO - filelist: /*_saturn/*single_sci.fits
2013-11-22 10:08:52,820 - INFO - Processing 2722 files.
2013-11-22 10:08:52,821 - INFO - Now running on u2ona109t_c0m_center_single_sci.fits
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the command line &lt;code&gt;grep&lt;/code&gt; utility you can quickly start to look for trends. If you want to do even more you can explore the Pandas Python package, more on that in a later post.&lt;/p&gt;
&lt;h2&gt;Getting Started with the Python Logger&lt;/h2&gt;
&lt;p&gt;Python has a great &lt;a href=&quot;http://docs.python.org/2/library/logging.html&quot;&gt;logging&lt;/a&gt; module. You can find a basic tutorial &lt;a href=&quot;http://docs.python.org/2/howto/logging.html#logging-basic-tutorial&quot;&gt;here&lt;/a&gt;. There are lots of ways to invoke and set up the Python logger. I usually define a function something like this and import it throughout my project.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging
import os
import datetime

def setup_logging(module):
    &quot;&quot;&quot;Set up the logging.&quot;&quot;&quot;
    log_dir = os.path.join(&apos;/my-project/logs/&apos;, module)
    if not os.path.exists(log_dir):
        os.makedirs(log_dir)
    log_file = os.path.join(log_dir,
        module + &apos;_&apos; + datetime.datetime.now().strftime(&apos;%Y-%m-%d-%H-%M&apos;) + &apos;.log&apos;)
    logging.basicConfig(filename=log_file,
        format=&apos;%(asctime)s %(levelname)s: %(message)s&apos;,
        datefmt=&apos;%m/%d/%Y %H:%M:%S %p&apos;,
        level=logging.INFO)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a separate log directory for each module and logs the output in a date-stamped file, e.g. &lt;code&gt;/my-project/logs/my_module/my_module_2013-12-22-15-41.log&lt;/code&gt;. Every line in the log file will then begin with a date and time stamp, followed by the level name and then the logging message, similar to the output I showed in the last section. Lastly, I tell it to log all statements down to the &quot;INFO&quot; level.&lt;/p&gt;
&lt;h2&gt;A Logging Recipe&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;import functools
import getpass
import logging
import socket
import sys
import time

import pyraf

def log_info(func):
    &quot;&quot;&quot;Decorator to log some useful environment information.&quot;&quot;&quot;
    @functools.wraps(func)
    def wrapped(*a, **kw):
        # Log user, system, and Python metadata
        logging.info(&apos;User: &apos; + getpass.getuser())
        logging.info(&apos;System: &apos; + socket.gethostname())
        logging.info(&apos;Python Version: &apos; + sys.version.replace(&apos;\n&apos;, &apos;&apos;))
        logging.info(&apos;Python Executable Path: &apos; + sys.executable)

        # Log PyRAF data
        logging.info(&apos;PyRAF Version: &apos; + pyraf.__version__)
        logging.info(&apos;PyRAF Path: &apos; + pyraf.__path__[0])

        # Call the function and log the execution time.
        t1_time = time.time()
        result = func(*a, **kw)
        t2_time = time.time()
        hours_time, remainder_time = divmod(t2_time - t1_time, 60 * 60)
        minutes_time, seconds_time = divmod(remainder_time, 60)
        logging.info(&apos;Elapsed Real Time: {0:.0f}:{1:.0f}:{2:f}&apos;.format(
            hours_time, minutes_time, seconds_time))
        return result
    return wrapped
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Your script would then look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@log_info
def main():
    &quot;&quot;&quot;My main function.&quot;&quot;&quot;
    make_science()

if __name__ == &apos;__main__&apos;:
    main()
&lt;/code&gt;&lt;/pre&gt;
</content:encoded></item><item><title>That Time I Made a Metaclass</title><link>https://acviana.com/posts/2013-12-04-that-time-i-made-a-metaclass/</link><guid isPermaLink="true">https://acviana.com/posts/2013-12-04-that-time-i-made-a-metaclass/</guid><description>I made a metaclass in Python (but should I have?)</description><pubDate>Wed, 04 Dec 2013 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;&quot;If you don&apos;t know what a metaclass is you don&apos;t need to use one.&quot;&lt;/p&gt;
&lt;p&gt;- David Beazley&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was shooting some messages back and forth this morning with some current and former coworkers on Twitter on the topic of Python metaclasses. One coworker said metaclasses were something he&apos;d never really gotten around to using. I mentioned that I had used them exactly once, to generate database Object Relational Models (ORMs). My second coworker said that was a common use case and that it would be nice to see an example. Since a tech blog is a shining example of a hammer searching for a nail, I immediately got to work on this post.&lt;/p&gt;
&lt;h2&gt;Some Background&lt;/h2&gt;
&lt;p&gt;I took David Beazley&apos;s Python Master class in 2009 (?) and I still remembered the quote from the start of this post, so for years I didn&apos;t worry about metaclasses because I knew I didn&apos;t need them. Finally though, I did need them.&lt;/p&gt;
&lt;p&gt;One of my favorite Python modules, despite its near vertical learning curve, is the &lt;a href=&quot;http://www.sqlalchemy.org/&quot;&gt;SQLAlchemy&lt;/a&gt; database toolkit. One of the features in this module is a very nice Object Relational Mapper (ORM) which maps database tables to Python classes. The ORM can be used in a number of ways and I prefer to use what&apos;s called the &lt;a href=&quot;http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/declarative.html&quot;&gt;Declarative Base&lt;/a&gt; syntax. The basic idea is that you create a parent class called &lt;code&gt;Base&lt;/code&gt; that contains information about your database connection and metadata. All your ORM classes are then child classes of &lt;code&gt;Base&lt;/code&gt; and you use them to work with your tables. Here is a basic example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class MyTable(Base):
    &quot;&quot;&quot;Defines a SQLAlchemy ORM&quot;&quot;&quot;

    def __init__(self, init_dict):
        self.__dict__.update(init_dict)

    __tablename__ = &apos;my_table&apos;
    id = Column(Integer, primary_key=True, index=True)
    foo1 = Column(String(50))
    foo2 = Column(String(50))
    foo3 = Column(String(50))
    ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could imagine an application that would need to dynamically define several of these tables, but you don&apos;t have to because I&apos;m about to tell you about one.&lt;/p&gt;
&lt;h2&gt;My Problem&lt;/h2&gt;
&lt;p&gt;The most common image file format in astronomy is called &lt;a href=&quot;http://en.wikipedia.org/wiki/FITS&quot;&gt;FITS&lt;/a&gt;. FITS files have multiple layers (called &quot;extensions&quot;), each with its own set of metadata (called &quot;headers&quot;). For one of my projects we have over a million FITS files, and we index these files with a MySQL database that maps the header keywords in the extensions to fields in SQL tables. We have about a dozen different file types, each with a handful of extensions, and each of those has tens of header keywords. If you spelled out every ORM explicitly, with a class and an attribute for every column like we do above, we would literally have thousands of rows of ORM definitions. I&apos;m a big proponent of the DRY principle (Don&apos;t Repeat Yourself) for the sake of readability and maintainability, so this was a pretty big red flag in my opinion.&lt;/p&gt;
&lt;h2&gt;My Solution&lt;/h2&gt;
&lt;p&gt;Notice that we don&apos;t need to dynamically create many instances of the same class. Instead we need to dynamically create many class definitions. This is the specific need that drove me to use a metaclass. I ended up with something like the code snippet below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def orm_factory(class_name):
    &quot;&quot;&quot;Creates SQLAlchemy ORM classes.&quot;&quot;&quot;
    def __init__(self, init_dict):
        self.__dict__.update(init_dict)

    class_attributes_dict = {}
    class_attributes_dict[&apos;__init__&apos;] = __init__
    class_attributes_dict[&apos;id&apos;] = Column(Integer, primary_key=True, index=True)
    class_attributes_dict[&apos;__tablename__&apos;] = class_name.lower()
    class_attributes_dict.update(get_column_defs(class_name))

    return type(class_name.upper(), (Base,), class_attributes_dict)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could then call &lt;code&gt;orm_factory&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Class1 = orm_factory(&apos;Class1&apos;)
Class2 = orm_factory(&apos;Class2&apos;)
Class3 = orm_factory(&apos;Class3&apos;)
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And there you have your classes, dynamically created using metaclasses.&lt;/p&gt;
&lt;h2&gt;Solution Breakdown&lt;/h2&gt;
&lt;p&gt;Let&apos;s walk through this. First, let&apos;s look at the last line of the &lt;code&gt;orm_factory&lt;/code&gt; function. This is maybe the &quot;craziest&quot; part of the whole function. That&apos;s because &lt;code&gt;type&lt;/code&gt; is actually a metaclass constructor. That&apos;s right, the thing that tells you &lt;code&gt;type(1)&lt;/code&gt; is &lt;code&gt;int&lt;/code&gt; is also used as a metaclass constructor to maintain backward compatibility [^1]. (If you want to tickle your brain, check the type of type.) To really wrap your head around metaclasses and type, check out Jake VanderPlas&apos;s &lt;a href=&quot;http://jakevdp.github.io/blog/2012/12/01/a-primer-on-python-metaclasses/&quot;&gt;excellent post&lt;/a&gt; on the subject.&lt;/p&gt;
&lt;p&gt;The basic idea is you pass &lt;code&gt;type&lt;/code&gt; the string name you would like to give the constructed class, a tuple of parent classes, and a dictionary of any other attributes for the class. Looking up the code block you&apos;ll see that I create a dictionary for these attributes called &lt;code&gt;class_attributes_dict&lt;/code&gt;.&lt;/p&gt;
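&lt;p&gt;Here is a tiny, self-contained sketch of that three-argument form, stripped of all the SQLAlchemy machinery (the &lt;code&gt;Point&lt;/code&gt; class and its attributes are made up for illustration):&lt;/p&gt;

```python
def __init__(self, x):
    self.x = x

# type(name, bases, attribute_dict) builds a brand new class on the fly.
Point = type('Point', (object,), {'__init__': __init__, 'dims': 1})

p = Point(42)
assert p.x == 42 and p.dims == 1
assert isinstance(Point, type)  # the class itself is an instance of type
```

&lt;p&gt;Note that the plain function becomes the new class&apos;s &lt;code&gt;__init__&lt;/code&gt; method once it lands in the attribute dictionary, exactly like the nested &lt;code&gt;__init__&lt;/code&gt; in &lt;code&gt;orm_factory&lt;/code&gt;.&lt;/p&gt;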
&lt;p&gt;Notice I&apos;m doing a little bit of magic by creating a &lt;code&gt;get_column_defs()&lt;/code&gt; function. This function dynamically adds the appropriate column definitions, for example by pulling them from the FITS headers. The implementation of this function isn&apos;t important to the topic of this post; what matters is that there is some dynamic aspect to the column definitions (and hence the class creation) that necessitates the use of a metaclass.&lt;/p&gt;
&lt;p&gt;Also, notice that the attributes in &lt;code&gt;class_attributes_dict&lt;/code&gt; include an &lt;code&gt;__init__&lt;/code&gt; method defined as a function. This is one of those weird moments in Python when you define a function &lt;em&gt;inside&lt;/em&gt; of another function. We do this because we&apos;re never going to use the &lt;code&gt;__init__&lt;/code&gt; function outside of the &lt;code&gt;orm_factory&lt;/code&gt; function it&apos;s nested in, so there is no reason to globally scope it. In this case we define &lt;code&gt;__init__&lt;/code&gt; just like it was a method, with a reference to &lt;code&gt;self&lt;/code&gt; and everything. Even though there is nothing special about this function, it will still take on the properties of a method after it&apos;s passed to type. I personally think this is pretty cool and it gives you some insight into how classes are built.&lt;/p&gt;
&lt;p&gt;So that&apos;s my example of metaclasses. It took me a couple of long days to figure this all out but I learned a lot about the inner workings of Python in the process. It&apos;s not so hard once you see it, but it&apos;s also not something I anticipate having to do again soon.&lt;/p&gt;
&lt;p&gt;[^1]: I swear I read this somewhere but I&apos;m still hunting for the source.&lt;/p&gt;
</content:encoded></item><item><title>The Joy of &quot;Screen&quot;</title><link>https://acviana.com/posts/2013-11-24-the-joy-of-screen/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-24-the-joy-of-screen/</guid><description>Using the screen command line utility to manage remote sessions</description><pubDate>Sun, 24 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Because I frequently work remotely as well as run long processes, I recently gave up my desktop system at work in exchange for a laptop and a couple of virtual machines. I use the laptop for day-to-day work but offload longer computations or automated processes to my VMs. But before I could fully take advantage of this setup I had to solve a new problem I didn&apos;t have to worry about with my desktop: how do I run a process remotely without maintaining an open SSH connection?&lt;/p&gt;
&lt;p&gt;I went to our IT department with this question and my favorite sys admin[^1] turned me onto the joy of &lt;a href=&quot;http://en.wikipedia.org/wiki/GNU_Screen&quot;&gt;Screen&lt;/a&gt;. Since then I&apos;ve been spreading the word around my branch about this awesome little tool. Screen is a robust little piece of software that allows you to manage your shell session in a variety of ways. Screen can do a lot and it&apos;s worth taking the time to read through some tutorials, but in this post I&apos;ll explain my basic workflow: starting a process on a remote host and leaving it running after closing the SSH connection.&lt;/p&gt;
&lt;p&gt;First you SSH into your remote machine as normal and then type&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ screen -ls 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is going to return a list of all the Screen[^2] sessions you have active on your machine. If this is your first time using screen you should see something like &lt;code&gt;No Sockets found in ...&lt;/code&gt;. Great, now we&apos;re going to start up our first Screen session. I&apos;m going to call this one &lt;code&gt;making_science&lt;/code&gt;, so I type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ screen -S making_science
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&apos;ll then get a new command line prompt; you are now &quot;in&quot; your Screen session, or as Screen calls it, &quot;attached&quot;. If you run &lt;code&gt;screen -ls&lt;/code&gt; again you&apos;ll now get something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;there is a screen on:
        48625.making_science    (Attached)
1 Socket in blah-blah
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now go ahead and launch your long script. I usually run it in the background by appending an ampersand (&lt;code&gt;&amp;amp;&lt;/code&gt;) like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ make_some_science.py &amp;amp;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now we finally get to the good part: we&apos;re going to detach the session without killing the process, even if you close the SSH connection. You can do this either by typing &lt;code&gt;ctrl + a&lt;/code&gt; then &lt;code&gt;:detach&lt;/code&gt;, or if you don&apos;t have those key bindings, &lt;code&gt;screen -D&lt;/code&gt;. Anytime you want to check back in on it, just reattach your screen session on the same host with &lt;code&gt;screen -r&lt;/code&gt; and everything will be just as you left it, even your command history. When you&apos;re finally done with your session you can just kill it with &lt;code&gt;exit&lt;/code&gt; and it&apos;ll be removed from your list of screens.&lt;/p&gt;
&lt;p&gt;Lastly, if like me you&apos;re suspicious by nature, you&apos;re going to want to check to make sure your process is &lt;em&gt;actually&lt;/em&gt; still running so you don&apos;t die a little inside when you come back to work the next morning expecting a pile of fresh data and instead get a stack trace. There are a variety of tools that allow you to do this; the two command line solutions I use are &lt;code&gt;top&lt;/code&gt; [^3] and &lt;code&gt;ps&lt;/code&gt;. Either one will list the processes currently running on your machine. I usually start up a process with screen, detach the session, confirm that the process is still running with &lt;code&gt;top&lt;/code&gt; or &lt;code&gt;ps&lt;/code&gt;, and &lt;em&gt;then&lt;/em&gt; close the SSH connection. If I&apos;m feeling extra careful I&apos;ll check the log files from another host after closing the SSH connection to make sure things are still humming along.&lt;/p&gt;
&lt;p&gt;And that&apos;s it. Go ask your IT department very nicely if they can build you a VM, and then unleash your codebase while you go about your life. Hooray!&lt;/p&gt;
&lt;p&gt;[^1]: If you&apos;re going to do any serious DevOps work, make friends with your IT staff. Ask them for help nicely and thank them profusely. Buy them beers. A good relationship with your sys admins is invaluable to getting sh*t done. This really shouldn&apos;t even be a footnote.&lt;/p&gt;
&lt;p&gt;[^2]: Screen is inconsistently capitalized in the websites and blogs I saw. I decided to follow the convention in the GNU &lt;a href=&quot;https://www.gnu.org/software/screen/&quot;&gt;docs&lt;/a&gt; and treat it as a proper noun. I bring this up here because this is the first place in this post where Screen isn&apos;t capitalized because it starts a sentence. I also bring it up because I&apos;m a nerd.&lt;/p&gt;
&lt;p&gt;[^3]: Depending on your version of &lt;code&gt;top&lt;/code&gt; you can type &lt;code&gt;u&lt;/code&gt; on the main screen and then your username to view all the processes being run by your user name. Given that &lt;code&gt;top&lt;/code&gt; shows all the system processes this helps remove all the ones you don&apos;t care about.&lt;/p&gt;
</content:encoded></item><item><title>A Basic Automation Setup for Astronomy - Part 1</title><link>https://acviana.com/posts/2013-11-23-a-basic-automation-setup/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-23-a-basic-automation-setup/</guid><description>A barebones automation setup for astronomy pipelines</description><pubDate>Sat, 23 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;For one of my projects at work I engineered an automation platform for one of our instrument teams. This platform allows us to automatically execute 20+ daily scripts, written in a variety of programming languages, as data is coming down from the telescope. All of our team&apos;s scripts, from downloading the data and copying and indexing it in an SQL database to running in-house scripts and system self-diagnostics, run on the same automation platform. Adding a script to this platform requires as little as 4 lines of code. Our codebase is updated with hourly builds from our team of 6 developers and all execution and maintenance is performed via a service account on a Red Hat Linux virtual machine.&lt;/p&gt;
&lt;p&gt;This is going to be a series of posts where I&apos;m going to cover all the odds and ends I had to stick together to build this system. This is far from a generalized solution but hopefully you can learn from my mistakes and build something to suit your own needs faster and better than I did.&lt;/p&gt;
&lt;p&gt;As an aside, this is what I consider to be the DevOps side of my job. It&apos;s only in the last year that this has become part of my work and it&apos;s only now that I&apos;m starting to identify this as a valuable skill in response to a hard problem. Previously, I was just embarrassed I kept breaking things. But looking back on it, automating this system in this way is one of the hardest things I&apos;ve done in my job. I&apos;m fortunate I have a team that didn&apos;t tell me to stop wasting my time and do it the old way by running everything by hand. And now that it&apos;s up and running I haven&apos;t had to fix it in weeks.&lt;/p&gt;
&lt;p&gt;In this first post we&apos;ll just cover combining the automated execution solution (cron) with the environment configuration solution (Ureka).&lt;/p&gt;
&lt;h2&gt;Cron: Running Your Code&lt;/h2&gt;
&lt;p&gt;At the heart of our system, like many automation solutions, is the Unix job scheduler &lt;a href=&quot;http://en.wikipedia.org/wiki/Cron&quot;&gt;cron&lt;/a&gt;. There are other applications our team considered using to fill this role, such as &lt;a href=&quot;http://research.cs.wisc.edu/htcondor/&quot;&gt;Condor&lt;/a&gt;, a distributed computing platform; &lt;a href=&quot;http://jenkins-ci.org/&quot;&gt;Jenkins CI&lt;/a&gt;, a Java-based web frontend for continuous integration; &lt;a href=&quot;http://en.wikipedia.org/wiki/Launchd&quot;&gt;launchd&lt;/a&gt;, an OS X job scheduler; and even &lt;a href=&quot;http://www.celeryproject.org/&quot;&gt;Celery&lt;/a&gt;, a distributed job queue. In the end we chose cron because, once we got it working, it was the most direct and simple solution for our architecture. However, as you&apos;ll see later, that simplicity made it incredibly difficult to use with IRAF/PyRAF. But first let&apos;s start with some cron basics.&lt;/p&gt;
&lt;p&gt;If you look at some cron tutorials you&apos;ll see that cron jobs are scheduled with a syntax like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00 11 * * * my_script.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will execute &lt;code&gt;my_script.py&lt;/code&gt; every day at 11 am. This is nothing you can&apos;t find in any cron tutorial, but here are some tricks I found useful that I had to dig around for a little bit. You can run multiple scripts sequentially on one line by separating them with a semicolon (&lt;code&gt;;&lt;/code&gt;) like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00 11 * * * my_first_script.py; my_second_script.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each script will run once after the preceding script finishes regardless of the preceding script&apos;s exit status. Alternatively, you can make the execution of the second script dependent on the successful completion of the first script with a double ampersand (&lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;) like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00 11 * * * my_first_script.py &amp;amp;&amp;amp; my_second_script.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now cron is likely going to try to be helpful by emailing you any outputs from your code. The recipient of this email can be defined by setting the &lt;code&gt;MAILTO&lt;/code&gt; variable before the job definitions like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;MAILTO = &apos;my_team_list@my_institution.edu&apos;
00 11 * * * my_first_script.py &amp;amp;&amp;amp; my_second_script.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also define any other variables or aliases you want in this same manner. But let&apos;s say that you only want to hear from cron when something breaks. You can do this by redirecting &lt;code&gt;STDOUT&lt;/code&gt; just as you would from the command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;MAILTO = &apos;my_team_list@my_institution.edu&apos;
00 11 * * * my_first_script.py &amp;amp;&amp;amp; my_second_script.py &amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you will only get an email when something gets passed to &lt;code&gt;STDERR&lt;/code&gt;. For our setup, this is all the cron syntax we needed to understand. Now onto setting up your environment.&lt;/p&gt;
&lt;h2&gt;Ureka: Setting Up Your Environment&lt;/h2&gt;
&lt;p&gt;Right now you might be thinking, &lt;em&gt;&quot;My environment is already set up! Right?&quot;&lt;/em&gt;. This is when using cron starts to become a little non-trivial; cron does not know about your &lt;a href=&quot;http://stackoverflow.com/questions/2229825/where-can-i-set-environment-variables-that-crontab-will-use&quot;&gt;environment variables&lt;/a&gt;, like &lt;em&gt;any&lt;/em&gt; of them. In a lot of applications this is not a big deal; you can just define some environment variables just like I defined &lt;code&gt;MAILTO&lt;/code&gt; above, and you&apos;re set.&lt;/p&gt;
&lt;p&gt;But, if you&apos;re in astronomy, one of your default software tools is likely IRAF/PyRAF. This takes installing software and declaring environment variables to an entirely new level of difficulty. I spent &lt;em&gt;weeks&lt;/em&gt; working on the problem of getting cron to run in an IRAF/PyRAF compatible environment without any success. I tried half a dozen different approaches and talked to several people, all of whom confessed to giving up due to the same complication. In the end, I partnered with one of our best IT people and we came up with a solution. The first step of that solution is to use Ureka.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://ssb.stsci.edu/ureka/&quot;&gt;Ureka&lt;/a&gt; is a software package developed by STScI and Gemini. From the website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ureka is a collection of useful astronomy software that is generally centered around Python and IRAF. The software provides everything you need to run the data reduction packages provided by STScI and Gemini.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Ureka is great: it builds a completely isolated IRAF and Python environment in minutes and loads or unloads the environment with a single command. You can run ds9, IRAF, PyRAF, and Python. Plus Python comes loaded with IPython, the IPython notebook, matplotlib, numpy, scipy, and pandas. If you need more than that you can immediately run &lt;code&gt;pip install&lt;/code&gt; because your paths have already been set up for you. You&apos;re on your own for IDL though.&lt;/p&gt;
&lt;p&gt;Whether you&apos;re like me and work on an institute machine with pre-built libraries or if you&apos;re running everything on your own machine and have root, Ureka is worth looking into because it &lt;em&gt;just works&lt;/em&gt;. I spent a lot of time learning to use package managers, building from source into different prefixes, virtualenv, and the ins and outs of pip, but when I got a new virtual machine last month I used Ureka and literally set up everything I needed in 3 commands. I was sold.&lt;/p&gt;
&lt;p&gt;So now we have our automation tool, cron, and our environment setup with Ureka. Now it&apos;s time to combine them.&lt;/p&gt;
&lt;h2&gt;Cron + Ureka: Automatic Environment Setup&lt;/h2&gt;
&lt;p&gt;Like we just saw, you can run more than one script on a single line in cron. You can start the Ureka environment with the command &lt;code&gt;ur_setup&lt;/code&gt; and exit with &lt;code&gt;ur_forget&lt;/code&gt;. So I &lt;em&gt;thought&lt;/em&gt; the following command would have been enough to run our scripts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00 11 * * * ur_setup &amp;amp;&amp;amp; my_second_script.py &amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But it doesn&apos;t work. Somehow this does not run &lt;code&gt;my_second_script.py&lt;/code&gt; using the environment set up by &lt;code&gt;ur_setup&lt;/code&gt;; my guess is that each script is launched in an independent shell that doesn&apos;t propagate variables back to the parent cron environment. This independence is generally a desirable feature so that makes sense, though it makes life hard in our case. This is where everyone I talked to crashed and burned when trying to use cron to automate astronomy software, whether they were using Ureka or not. But eventually one of our IT specialists worked out a wrapper script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/tcsh
ur_setup
&quot;$*&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s non-intuitive at first, but what it does is run &lt;code&gt;ur_setup&lt;/code&gt; and then take a script name as a command line argument and run that script. Because this is all done in the same shell session, the script is launched in the Ureka environment - &lt;em&gt;finally&lt;/em&gt;. I can&apos;t tell you how happy I was to finally get this to work. The execution looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00 11 * * * cron_setup.sh my_second_script.py &amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Done, Right?&lt;/h2&gt;
&lt;p&gt;That was a bit of a long post, and you might be tempted to call it quits and just run with this setup, but I would encourage you not to. We still have to talk about a deployment solution for your code using your version control system (because your code is version controlled, right?), using Python to wrap code written in other languages, and using the Python logging module to generate logs. Doesn&apos;t that sound nice? I&apos;ll put the link to all that right [here] once it&apos;s ready.&lt;/p&gt;
</content:encoded></item><item><title>The Data are Inconclusive</title><link>https://acviana.com/posts/2013-11-21-inconclusive-data/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-21-inconclusive-data/</guid><description>Error bars change the interpretation of my stellar PSF project</description><pubDate>Thu, 21 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;&quot;Give me a point and I can draw a line. Give me two points and I can draw a curve. That&apos;s astronomy.&quot; - Anonymous Professor.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I ran the full pipeline on the PSF data for the first time. The data are inconclusive.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/inconclusive-data.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Haha, oh error bars. Who hasn&apos;t made a plot just like this before?&lt;/p&gt;
</content:encoded></item><item><title>The First Thousand PSFs</title><link>https://acviana.com/posts/2013-11-20-the-first-thousand-psfs/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-20-the-first-thousand-psfs/</guid><description>Plotting one thousand stellar PSFs</description><pubDate>Wed, 20 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In my last post on my PSF Project I characterized the PSF shape by fitting a 1-D Gaussian to a row and column slice through the central pixel. In this post I start to think about how to characterize entire images with thousands of stars.&lt;/p&gt;
&lt;h2&gt;Adding Some Zeros&lt;/h2&gt;
&lt;p&gt;Many applications for the PSF database involve looking at PSF changes over a time series. One way to do this is to move from characterizing individual PSFs to characterizing the PSFs in an entire image and seeing how that changes over time. [^1] In other words, we need to add some zeros to our lone star and start working our way towards our final dataset of 10 million. I decided to do this by digging into the first stellar field I could find. This happened to be &lt;code&gt;iabj01a2q_flt.fits&lt;/code&gt;, an image of &lt;a href=&quot;http://en.wikipedia.org/wiki/47_Tucanae&quot;&gt;NGC104&lt;/a&gt;, which adds 3 more zeros to our star total, bringing it to roughly 1,500 stars. Here is the image; it&apos;s actually quite nice (sorry for the large file):&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/Visit01-iabj01a2q_flt.jpg&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In case you want to play along at home, this image is in the public domain and can be found &lt;a href=&quot;http://archive.stsci.edu/cgi-bin/mastpreview?mission=hst&amp;amp;dataid=IABJ01A2Q&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Plotting the PSF Fits&lt;/h2&gt;
&lt;p&gt;So I went ahead and fitted a Gaussian in both the row and column directions for all the stars in that image. Then I studied the distribution by plotting the fit parameters against each other in each slice.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/image_variable_matrix.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;First of all, in hindsight what I wanted was a &lt;a href=&quot;https://www.google.com/webhp#q=scatter%20plot%20matrix&quot;&gt;scatter plot matrix&lt;/a&gt; but this is exploratory work so I can go back and make that plot in the future.&lt;/p&gt;
&lt;p&gt;In terms of the actual data, we don&apos;t see anything noticeably different between the row and column slices, which is expected at this point, though there may be directionally-dependent instrumental effects, such as the charge transfer efficiency, which may come into play later. Looking at the individual fit parameters, there&apos;s not much going on with the amplitude, but things get interesting when we plot the standard deviation against the mean (&lt;code&gt;mu&lt;/code&gt;). However, the correlation we&apos;re seeing in the plot is actually a sampling effect.&lt;/p&gt;
&lt;h2&gt;Interpreting the Results&lt;/h2&gt;
&lt;p&gt;Let&apos;s start by explaining what &lt;code&gt;sigma&lt;/code&gt; and &lt;code&gt;mu&lt;/code&gt; mean in this context (it might be helpful to refer to my &lt;a href=&quot;http://acviana.github.io/posts/2013/11/18/counting-to-10-million-stars/&quot;&gt;last post&lt;/a&gt;). First of all, &lt;code&gt;mu&lt;/code&gt;, the mean of the Gaussian fit, is almost always between 4.5 and 5.5. This is by design, because the algorithm that centers the PSF cutouts does a good job of picking the brightest pixel. &lt;code&gt;sigma&lt;/code&gt;, then, is a measure of the width of the PSF.&lt;/p&gt;
&lt;p&gt;The actual energy distribution of a star is a continuous distribution. However, detectors take discrete samples from this distribution, pretty much literally making a histogram. If you&apos;ve played around with histograms much you might see where this is going. If a star happens to land exactly in the middle of a pixel (&lt;code&gt;mu&lt;/code&gt; = 5) most of the light from the PSF will land in that pixel. This will result in a narrow &lt;code&gt;sigma&lt;/code&gt;. But if the &lt;em&gt;same&lt;/em&gt; PSF lands right in the middle of two pixels (equidistant from their centers) then the same amount of flux is split between those two pixels. The total amount of flux is still the same, but the distribution is different. This distribution is less peaked, which is to say wider and with a larger &lt;code&gt;sigma&lt;/code&gt;.&lt;/p&gt;
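&lt;p&gt;Here is a toy numerical sketch of that effect (my own illustration, not the project code): integrate a fixed-width Gaussian over unit-width pixels for a star centered on a pixel and for one centered on a pixel boundary, then measure the width of each binned distribution. The true PSF width of 0.5 pixels is a made-up value chosen to make the effect obvious:&lt;/p&gt;

```python
import math

def binned_sigma(center, true_sigma=0.5, n_pix=11):
    """Integrate a Gaussian PSF over unit-width pixels and return the
    standard deviation of the resulting discrete (binned) distribution."""
    root2 = math.sqrt(2.0)
    fluxes, positions = [], []
    for i in range(n_pix):
        # Pixel i collects all the flux falling between i - 0.5 and i + 0.5.
        lo, hi = i - 0.5, i + 0.5
        flux = 0.5 * (math.erf((hi - center) / (root2 * true_sigma))
                      - math.erf((lo - center) / (root2 * true_sigma)))
        fluxes.append(flux)
        positions.append(float(i))
    total = sum(fluxes)
    mean = sum(p * f for p, f in zip(positions, fluxes)) / total
    var = sum(f * (p - mean) ** 2 for p, f in zip(positions, fluxes)) / total
    return math.sqrt(var)

sigma_centered = binned_sigma(5.0)  # star centered on a pixel: narrower
sigma_edge = binned_sigma(5.5)      # star split between two pixels: wider
assert sigma_edge > sigma_centered
```

&lt;p&gt;Both binned widths come out larger than the true width, and the boundary-centered star measures wider still, which is exactly the trend in the scatter plot above.&lt;/p&gt;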
&lt;p&gt;I think this is supported by the fact that as you move away from the middle of the central pixel (&lt;code&gt;mu&lt;/code&gt; = 5) the distribution of &lt;code&gt;sigma&lt;/code&gt; for a given mean moves up (gets wider) in an absolute sense, but not in a relative sense. Put another way, the width of your PSF increases the further the peak is from the central pixel, but the difference between a peaked and broad PSF at any given mean is about the same.&lt;/p&gt;
&lt;h2&gt;What&apos;s Next?&lt;/h2&gt;
&lt;p&gt;The next step is pretty clear, we need to use this distribution of parameters to characterize the &quot;average&quot; PSF shape in each image and plot that as a time series. This will almost certainly yield nothing but noise, but I&apos;m confident that as we tease the data out such as separating each filter or different parts of the detector we&apos;ll start to see some real trends.&lt;/p&gt;
&lt;p&gt;But, this sampling effect is bothering me. If we just take a mean and standard deviation of all the &lt;code&gt;sigma&lt;/code&gt; parameters in each image, those results are going to be heavily influenced by the sampling effect (I&apos;m deliberately trying to avoid saying the &lt;code&gt;sigma&lt;/code&gt; of the &lt;code&gt;sigma&lt;/code&gt;s). However, after looking at a sample set of plots I think this is likely characteristic of all our data. Consistency will have to suffice for now, because as you&apos;ll see in my next post there are more pressing problems as we start to add zeros to our star count.&lt;/p&gt;
&lt;p&gt;[^1]: Most astronomers talk about PSF widths in terms of the full width at half maximum (&lt;a href=&quot;http://en.wikipedia.org/wiki/Full_width_at_half_maximum&quot;&gt;FWHM&lt;/a&gt;). I&apos;m using the standard deviation (sigma) in my work but they only differ by a coefficient; &lt;code&gt;FWHM = sigma x 2 x (2 x ln(2)) ^ (1/2)&lt;/code&gt;. Sorry about the lack of nice math symbols, I haven&apos;t played with that plugin yet.&lt;/p&gt;
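&lt;p&gt;For reference, that footnote conversion is easy to sanity-check in a couple of lines (this is just the standard Gaussian identity, not code from the project):&lt;/p&gt;

```python
import math

def sigma_to_fwhm(sigma):
    # FWHM = 2 * sqrt(2 * ln 2) * sigma, roughly 2.3548 * sigma
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
```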
</content:encoded></item><item><title>Counting to 10 Million Stars</title><link>https://acviana.com/posts/2013-11-18-counting-to-10-million-stars/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-18-counting-to-10-million-stars/</guid><description>Working with large datasets of stellar PSFs</description><pubDate>Mon, 18 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve started a new project working with 10 million stellar PSFs. In my first few steps in the project I performed some model fitting and made a pretty visualization of the individual data points.&lt;/p&gt;
&lt;h2&gt;My New (Little) Big Data Project&lt;/h2&gt;
&lt;p&gt;I am starting a new project that I&apos;m pretty excited about because it is pushing me more in the direction of &quot;Big Data&quot;; it&apos;s one of the reasons I decided to start this blog. Lots of people throw the term big data around with different meanings. Personally, I consider something to be Big Data when the complexity of the task is dominated by complications from the size of the data. The &quot;task&quot; could be anything related to the data including storage, computation, or visualization. Specifically, this project is going to push the computation and database aspects of my work into the Big Data zone.&lt;/p&gt;
&lt;p&gt;The dataset for this project is 10 million stellar &lt;a href=&quot;http://en.wikipedia.org/wiki/Point_spread_function&quot;&gt;PSF&lt;/a&gt; observations taken with the HST WFC3 UVIS instrument. These PSFs were data mined from the total on-orbit data set of roughly 35 thousand WFC3 UVIS observations using a colleague&apos;s specialized FORTRAN code, which extracted an 11x11 array centered on each PSF. This is an especially powerful method of constructing our dataset because it allows us to use any incidental PSF observations taken when the target was not a star or stellar field.&lt;/p&gt;
&lt;h2&gt;Fitting 1-D Gaussian Distributions&lt;/h2&gt;
&lt;p&gt;After some initial work I was able to create a reader that takes the output text files from my colleague&apos;s code and transforms them into a numpy array. Next, we decided we wanted to start by characterizing the PSFs with two 1-D Gaussian fits through the center pixel, one in the row direction and another in the column direction.&lt;/p&gt;
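&lt;p&gt;For an 11x11 stamp the two fit inputs are just the central row and column of the array. A minimal sketch, with a random array standing in for a real PSF stamp:&lt;/p&gt;

```python
import numpy as np

psf = np.random.rand(11, 11)  # stand-in for one 11x11 PSF stamp
row_slice = psf[5, :]  # 1-D cut through the central pixel, row direction
col_slice = psf[:, 5]  # 1-D cut through the central pixel, column direction
```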
&lt;p&gt;First of all, I was &lt;em&gt;shocked&lt;/em&gt; to learn, after an hour of googling and popping my head into people&apos;s offices, that the definition of a Gaussian distribution isn&apos;t tucked away somewhere in NumPy or SciPy. Thinking about it, I &lt;em&gt;guess&lt;/em&gt; it makes sense because it&apos;s not clear what format your inputs and outputs should be, but I&apos;m still a little surprised that all the tutorials I found on this subject began with defining the Gaussian distribution. Anyway, once I got past that the rest wasn&apos;t too hard.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, A, mu, sigma):
    &quot;&quot;&quot;Definition of the Gaussian function.&quot;&quot;&quot;
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))


def get_gaussian_dict(data):
    &quot;&quot;&quot;Use curve fit to return a dictionary with all the model 
    information.&quot;&quot;&quot;
    p0 = [data.max(), np.where(data == data.max())[0][0], 1]
    coeff, var_matrix = curve_fit(gaussian, range(len(data)), data, p0)
    A, mu, sigma = coeff
    output_dict = {}
    output_dict[&apos;amplitude&apos;] = float(A)
    output_dict[&apos;mu&apos;] = float(mu)
    output_dict[&apos;sigma&apos;] = float(sigma)
    output_dict[&apos;var_matrix&apos;] = var_matrix
    # resample_range is a helper (not shown here) that evaluates the
    # model on a grid 10x finer than the data, for smoother plotting.
    output_dict[&apos;model_data&apos;] = gaussian(resample_range(data, 10), A, mu, sigma)
    return output_dict
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I use scipy&apos;s impressive &lt;code&gt;curve_fit&lt;/code&gt; function to perform the model fitting. The last argument &lt;code&gt;curve_fit&lt;/code&gt; takes is &lt;code&gt;p0&lt;/code&gt;, the initial guess for the fitting parameters. Fortunately, our data is very well behaved so we can easily do a good job guessing the initial parameters from the input data. Because I like to make my functions as general as possible I return all the possible information from the fit in a dictionary. For example, in the future I&apos;ll probably want to dig into the &lt;a href=&quot;http://en.wikipedia.org/wiki/Covariance_matrix&quot;&gt;covariance matrix&lt;/a&gt; that &lt;code&gt;curve_fit&lt;/code&gt; returns to calculate a goodness-of-fit estimator, and I&apos;ll be able to do that with the same function.&lt;/p&gt;
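&lt;p&gt;As a sketch of that future direction (synthetic data here, not our pipeline code), the square roots of the diagonal of the covariance matrix give the 1-sigma uncertainty on each fitted parameter:&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, mu, sigma):
    return A * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

# Synthetic 1-D slice: a known Gaussian plus a little noise.
x = np.arange(11)
rng = np.random.default_rng(0)
y = gaussian(x, 100.0, 5.3, 1.2) + rng.normal(0.0, 1.0, x.size)

p0 = [y.max(), float(np.argmax(y)), 1.0]
coeff, var_matrix = curve_fit(gaussian, x, y, p0=p0)
perr = np.sqrt(np.diag(var_matrix))  # 1-sigma parameter uncertainties
```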
&lt;h2&gt;Eye Candy&lt;/h2&gt;
&lt;p&gt;Finally, all of this is visualized in the 4-panel figure below.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/psf-4-panel-view.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The bottom row contains the row and column slices and the Gaussian fits, with the model parameters printed in the upper corners. The upper row contains a heat map and, just for fun, a 3D wire frame of the PSF. I could make some tweaks here and there, such as matching the wire frame and heat map color bars, but this is already more than enough to visualize a single data point; I need to start working my way up to 10 million.&lt;/p&gt;
</content:encoded></item><item><title>The Moving Target Pipeline</title><link>https://acviana.com/posts/2013-11-18-mtpipeline-ddrf/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-18-mtpipeline-ddrf/</guid><description>My research grant proposal for a Hubble Space Telescope moving target pipeline</description><pubDate>Mon, 18 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago I was awarded a research grant to continue working on a prototype software pipeline for HST moving target (solar system) observations. The grant came from an internal source called the Director&apos;s Discretionary Research Fund (DDRF). My project, called the Moving Target Pipeline, was fully funded at $21,000 and allows me to buy back 25% of my time for one year to work on the project. Here is the proposal abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;We propose a moving target pipeline for the WFC3 and ACS instruments based on our existing WFPC2 software to produce properly drizzled FITS images, dynamically scaled preview images, and predicted ephemeris positions. Such a pipeline is relevant to ongoing HST scientific observations, the Hubble Legacy Archive (HLA), and serves to lay the design groundwork for JWST’s moving target processing. We request funds to support a senior RIA for our software development activities. [^1]&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Continuing from the proposal:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Our WFPC2 pipeline addresses the 4 main issues that impede performing Solar System astronomy with HST archival data: (1) identifying cosmic rays, (2) drizzling, (3) scaled preview images, and (4) identifying incidental ephemeris observations.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This image gives an attractive visual of what we can already accomplish for WFPC2 data and will expand to the WFC3 and ACS cameras:&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;width: 800px; max-width: 100%; height: auto;&quot; alt=&quot;Oops, something broke.&quot; src=&quot;/images/mtpipeline-mars-before-after.png&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can find the full proposal &lt;a href=&quot;https://www.dropbox.com/s/04m5rboqkkmzuvm/2013_Fall_DDRF_Proposal_No_Recs.pdf&quot;&gt;here&lt;/a&gt;. Our project will be open source and available on GitHub. It will be an extension of our existing work on a citizen science project for WFPC2, which you can browse &lt;a href=&quot;https://github.com/STScI-Citizen-Science/MTPipeline&quot;&gt;here&lt;/a&gt;. This builds off a number of other grants and &lt;a href=&quot;http://archive.stsci.edu/prepds/planetpipeline/index.html&quot;&gt;existing work&lt;/a&gt; in this area [^2].&lt;/p&gt;
&lt;p&gt;I hope that this phase of the project will be useful for planetary scientists using HST.&lt;/p&gt;
&lt;p&gt;[^1]: Full disclosure, I misspelled &quot;activities&quot; in the actual abstract. &lt;em&gt;facepalm&lt;/em&gt;
[^2]: Humblebrag / scavenger hunt, spot the astronaut co-investigator in the parent proposal :-)&lt;/p&gt;
</content:encoded></item><item><title>The Trouble with Tech Blogs</title><link>https://acviana.com/posts/2013-11-17-the-trouble-with-tech-blogs/</link><guid isPermaLink="true">https://acviana.com/posts/2013-11-17-the-trouble-with-tech-blogs/</guid><description>Starting a Pelican-based GitHub blog</description><pubDate>Sun, 17 Nov 2013 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;the trouble with poetry is&lt;br /&gt;
that it encourages the writing of more poetry&lt;br /&gt;
- Billy Collins, &lt;a href=&quot;http://www.edutopia.org/trouble-poetry&quot;&gt;The Trouble with Poetry&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To paraphrase one of my favorite poets, the trouble with tech blogs is that they encourage the writing of more tech blogs. The Internet probably doesn&apos;t need another tech blog but I think this will be a fun way for me to both share and keep track of what I&apos;m working on. Hopefully, some people will at least find it entertaining if not helpful.&lt;/p&gt;
&lt;p&gt;I tried to start this blog over a year ago with Octopress. I&apos;d never worked with Ruby before but I immediately fell in love with the idea of static HTML generated from Markdown. It seemed so elegant yet customizable, definitely a hacker&apos;s blogging platform. Of course I almost immediately broke the deploy step to GitHub and got pulled away to other projects. Then I stumbled across Pelican earlier this year, which is the same idea as Octopress but in Python! After several false starts I now finally have a working blog again!&lt;/p&gt;
&lt;p&gt;And so another tech blog begins!&lt;/p&gt;
</content:encoded></item><item><title>All other languages were for some reason inferior, and as a Python programmer, I was the member of an elite cabal of superhuman...</title><link>https://acviana.com/posts/all-other-languages-were-for-some-reason-inferior/</link><guid isPermaLink="true">https://acviana.com/posts/all-other-languages-were-for-some-reason-inferior/</guid><description>All other languages were for some reason inferior, and as a Python programmer, I was the member of an elite cabal of superhuman...</description><pubDate>Fri, 14 Sep 2012 21:14:48 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;All other languages were for some reason inferior, and as a Python programmer, I was the member of an elite cabal of superhuman ultranerds, smarter than those childish Rails/JavaScript/PHP/whatever developers that couldn’t write a bubble sort or comprehend even basic algorithmic complexity, but more in touch with reality than the grey-bearded wizards of Lisp/Haskell/whatever that sat in their caves/towers/whatever solving contrived, nonexistent problems for people that don’t exist, or those insane Erlang programmers who are content writing sumerian cuneiform all day long. &lt;a href=&quot;http://jordanorelli.tumblr.com/post/31533769172/why-i-went-from-python-to-go-and-not-node-js&quot;&gt;http://jordanorelli.tumblr.com/post/31533769172/why-i-went-from-python-to-go-and-not-node-js&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item><item><title>2012 DjangoCon Slides</title><link>https://acviana.com/posts/2012-djangocon-slides/</link><guid isPermaLink="true">https://acviana.com/posts/2012-djangocon-slides/</guid><description>2012 DjangoCon Slides</description><pubDate>Thu, 06 Sep 2012 16:34:22 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;http://www.slideshare.net/alexcostaviana/djangocon-lightning-talk-hello-from-hubble&quot;&gt;DjangoCon Lightning Talk: Hello from Hubble&lt;/a&gt;&lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;http://www.slideshare.net/alexcostaviana&quot;&gt;alexcostaviana&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here are my slides from my lightning talk at the 2012 DjangoCon in Washington DC. It was my first talk at a conference so I was glad I got such a warm response. It would be nice to return next year with some working Django sites!&lt;/p&gt;
</content:encoded></item><item><title>Overly honest commenting on hitting &apos;accept&apos; on a GitHub pull request for the first time.</title><link>https://acviana.com/posts/overly-honest-commenting-on-hitting-accept-on-a/</link><guid isPermaLink="true">https://acviana.com/posts/overly-honest-commenting-on-hitting-accept-on-a/</guid><description>Overly honest commenting on hitting &apos;accept&apos; on a GitHub pull request for the first time.</description><pubDate>Sun, 02 Sep 2012 18:28:54 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/overly-honest-commenting-on-hitting-accept-on-a/tumblr_m9qx86ClKn1rt9cjfo1_1280.png&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Overly honest commenting on hitting &apos;accept&apos; on a GitHub pull request for the first time.&lt;/p&gt;
</content:encoded></item><item><title>Making sure we stay at the forefront of space exploration is a big priority for my administration. The passing of Neil Armstrong...</title><link>https://acviana.com/posts/making-sure-we-stay-at-the-forefront-of-space/</link><guid isPermaLink="true">https://acviana.com/posts/making-sure-we-stay-at-the-forefront-of-space/</guid><description>Making sure we stay at the forefront of space exploration is a big priority for my administration. The passing of Neil Armstrong...</description><pubDate>Thu, 30 Aug 2012 09:09:26 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Making sure we stay at the forefront of space exploration is a big priority for my administration. The passing of Neil Armstrong this week is a reminder of the inspiration and wonder that our space program has provided in the past; the curiosity probe on mars is a reminder of what remains to be discovered. The key is to make sure that we invest in cutting edge research that can take us to the next level - so even as we continue work with the international space station, we are focused on a potential mission to a asteroid as a prelude to a manned Mars flight.&lt;br /&gt;
- President Barack Obama&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item><item><title>The academic system does not respect practitioners’ knowledge (or timescales). Practitioners don’t understand that computer...</title><link>https://acviana.com/posts/the-academic-system-does-not-respect/</link><guid isPermaLink="true">https://acviana.com/posts/the-academic-system-does-not-respect/</guid><description>The academic system does not respect practitioners’ knowledge (or timescales). Practitioners don’t understand that computer...</description><pubDate>Mon, 27 Aug 2012 14:03:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;The academic system does not respect practitioners&apos; knowledge (or timescales). Practitioners don&apos;t understand that computer scientists don&apos;t care about building software.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Taken from Greg Wilson&apos;s &quot;&lt;a href=&quot;http://www.slideshare.net/gvwilson/two-solitudes&quot;&gt;Two Solitudes&lt;/a&gt;&quot; presentation. Slide 78.&lt;/p&gt;
&lt;p&gt;And this is why I don&apos;t see myself ever going to grad school.&lt;/p&gt;
</content:encoded></item><item><title>I&apos;m on-site today at Southern Illinois University - Edwardsville working on Space Telescope&apos;s collaboration with...</title><link>https://acviana.com/posts/im-on-site-today-at-southern-illinois-university/</link><guid isPermaLink="true">https://acviana.com/posts/im-on-site-today-at-southern-illinois-university/</guid><description>I&apos;m on-site today at Southern Illinois University - Edwardsville working on Space Telescope&apos;s collaboration with...</description><pubDate>Tue, 21 Aug 2012 16:28:09 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/im-on-site-today-at-southern-illinois-university/tumblr_m94jmx7CBQ1rt9cjfo1_1280.png&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&apos;m on-site today at Southern Illinois University - Edwardsville working on Space Telescope&apos;s collaboration with &lt;a href=&quot;http://cosmoquest.org/&quot;&gt;CosmoQuest&lt;/a&gt; on an exciting crowd-sourcing project for citizen-scientists.&lt;/p&gt;
&lt;p&gt;I&apos;m providing the back-end pipeline to produce the images for the web app (I&apos;ll share the GitHub link once it&apos;s a little further along). Right now I&apos;m using a little imaging tool I created (below) to examine the steps in the image scaling. The three rows are the original image, the image after the step was applied (in this case the log of the original image), and a delta image. Each row has an image and a histogram. In this case you can see there is a single pixel pulling the stretch way out of whack, so I&apos;m working on a function to trim the lowest outliers.&lt;/p&gt;
</content:encoded></item><item><title>On Leaving Academia « Ars Experientia</title><link>https://acviana.com/posts/on-leaving-academia-ars-experientia/</link><guid isPermaLink="true">https://acviana.com/posts/on-leaving-academia-ars-experientia/</guid><description>On Leaving Academia « Ars Experientia</description><pubDate>Tue, 24 Jul 2012 17:54:26 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;http://cs.unm.edu/~terran/academic_blog/?p=113&quot;&gt;On Leaving Academia « Ars Experientia&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I saw this blog post being passed around Twitter this afternoon by the likes of &lt;a href=&quot;https://twitter.com/dabeaz&quot;&gt;David Beazley&lt;/a&gt; and &lt;a href=&quot;https://twitter.com/gvwilson&quot;&gt;Greg Wilson&lt;/a&gt;, and the timing was perfect for a post I was thinking about writing.&lt;/p&gt;
&lt;p&gt;One of the things I noticed last week at &lt;a href=&quot;http://conference.scipy.org/scipy2012/&quot;&gt;SciPy&lt;/a&gt; was how many smart people have left academia or were weighing the benefits of leaving. I found out a friend of mine, who just got tenure at the age of 39, makes only a few hundred dollars more a year than I do as a 27-year-old analyst. He said he basically has to talk himself into staying in academia every summer.&lt;/p&gt;
&lt;p&gt;There are of course unhappy people in the private sector. And there are many smart people still going into academia. But most of the current and aspiring academics I know or have had a chance to talk to do not paint a pretty picture of the journey to academia, or of what happens when you get there. It seems to me that even in the last 5 years since I finished undergrad the game has changed.&lt;/p&gt;
&lt;p&gt;What I am personally seeing is a lot of very smart and motivated people looking around and changing their minds. And as Terran Lane points out in his blog post, this is a very disturbing trend.&lt;/p&gt;
</content:encoded></item><item><title>Benchmarking Python Sets, Comprehensions, and Loops</title><link>https://acviana.com/posts/benchmarking-python-sets-comprehensions-and/</link><guid isPermaLink="true">https://acviana.com/posts/benchmarking-python-sets-comprehensions-and/</guid><description>Benchmarking Python Sets, Comprehensions, and Loops</description><pubDate>Sat, 14 Jul 2012 14:17:00 GMT</pubDate><content:encoded>&lt;p&gt;I learned how to use Python&apos;s cProfile module last week and thought it was pretty cool. So at work this week when we came across a couple of lines of code I thought could be optimized, I used it as an opportunity to do a little profiling.&lt;/p&gt;
&lt;p&gt;I compared loops against list comprehensions and then against the set object. I knew loops would be slower in both cases, but like a homework problem I wanted to go through the steps and get a better handle on numbers.&lt;/p&gt;
&lt;h2&gt;Comprehension vs Loops&lt;/h2&gt;
&lt;p&gt;I started by comparing a list comprehension against an equivalent for loop to pull all the even numbers out of the first 10,000 integers.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def lists_with_for_loop():
    evens = []
    for item in range(10000):
        if item % 2 == 0:
            evens.append(item)

def lists_with_comprehensions(): 
    evens = [x for x in range(10000) if x % 2 == 0]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are the results:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;function&lt;/th&gt;
&lt;th&gt;ncalls&lt;/th&gt;
&lt;th&gt;tottime&lt;/th&gt;
&lt;th&gt;percall&lt;/th&gt;
&lt;th&gt;cumtime&lt;/th&gt;
&lt;th&gt;percal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;lists_with_for_loop&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;0.022&lt;/td&gt;
&lt;td&gt;0.022&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lists_with_comprehensions&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.002&lt;/td&gt;
&lt;td&gt;0.002&lt;/td&gt;
&lt;td&gt;0.002&lt;/td&gt;
&lt;td&gt;0.002&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The list comprehension wins by about an order of magnitude.&lt;/p&gt;
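&lt;p&gt;One caveat worth noting: &lt;code&gt;cProfile&lt;/code&gt; adds per-call overhead, so the gap it reports can be exaggerated. A quick cross-check with &lt;code&gt;timeit&lt;/code&gt; (this is a re-run sketch, not the original benchmark):&lt;/p&gt;

```python
import timeit

def lists_with_for_loop():
    evens = []
    for item in range(10000):
        if item % 2 == 0:
            evens.append(item)
    return evens

def lists_with_comprehensions():
    return [x for x in range(10000) if x % 2 == 0]

# Both build the same list; timeit reports seconds per 100 runs.
loop_time = timeit.timeit(lists_with_for_loop, number=100)
comp_time = timeit.timeit(lists_with_comprehensions, number=100)
```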
&lt;h2&gt;Sets vs Loops&lt;/h2&gt;
&lt;p&gt;To compare sets and loops I made a loop that generated a list of all the unique entries in a list of 10,000 integers between 0 and 1,000. It occurred to me that this was a relatively &quot;sparse&quot; case so I also made a &quot;dense&quot; one where the integers are distributed between 0 and 10. Here is the code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as N

# (random_integers includes both bounds; in modern NumPy use
# N.random.randint(0, 1001, 10000) instead.)
SPARSE_INPUT_LIST = N.random.random_integers(0, 1000, 10000)
DENSE_INPUT_LIST = N.random.random_integers(0, 10, 10000)

def unique_with_for_loop_dense():
    output = []
    for item in DENSE_INPUT_LIST:
        if item not in output:
            output.append(item)

def unique_with_for_loop_sparse():
    output = [] 
    for item in SPARSE_INPUT_LIST:
        if item not in output:
            output.append(item)

def unique_with_set_dense():
    output = set(DENSE_INPUT_LIST)

def unique_with_set_sparse():
    output = set(SPARSE_INPUT_LIST)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here are the results:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;function&lt;/th&gt;
&lt;th&gt;ncalls&lt;/th&gt;
&lt;th&gt;tottime&lt;/th&gt;
&lt;th&gt;percall&lt;/th&gt;
&lt;th&gt;cumtime&lt;/th&gt;
&lt;th&gt;percal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;unique_with_for_loop_sparse&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.518&lt;/td&gt;
&lt;td&gt;0.518&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;unique_with_set_dense&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;unique_with_for_loop_dense&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;unique_with_set_sparse&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.008&lt;/td&gt;
&lt;td&gt;0.008&lt;/td&gt;
&lt;td&gt;0.008&lt;/td&gt;
&lt;td&gt;0.008&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So sets always beat loops, but the difference is more dramatic the sparser the input list is. This makes sense since the set object uses a hash table while the for loop I wrote just does a list lookup (though I&apos;m not exactly sure how it&apos;s implemented). The larger the output list/set, the more efficient the hash is compared to the list.&lt;/p&gt;
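&lt;p&gt;The hash-vs-list distinction also suggests a middle ground if you need to preserve the order of first appearance, which &lt;code&gt;set()&lt;/code&gt; alone throws away: keep the loop, but do the membership test against a set. A general sketch, not code from our pipeline:&lt;/p&gt;

```python
def unique_preserving_order(values):
    # Membership checks hit a set (average O(1)) instead of scanning
    # a list (O(n)), while the output preserves first-seen order.
    seen = set()
    output = []
    for item in values:
        if item not in seen:
            seen.add(item)
            output.append(item)
    return output
```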
&lt;p&gt;I made a plot of this for a non-negative set of 1,000,000 integers of varying densities.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/assets/tumblr/benchmarking-python-sets-comprehensions-and/tumblr_m76462f3Sd1qi9n6f.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A couple of notes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The benchmarks and the plots were generated on different computers so the numbers don&apos;t match up, though the trends should be the same. I just didn&apos;t feel like entering all the numbers in the html &lt;code&gt;table&lt;/code&gt; tags again.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To generate the plot I used the &lt;code&gt;time&lt;/code&gt; module and took the difference of two time instances to find the run time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can view the source code for the benchmarks and the plot (and any other future code from this blog) on my GitHub &lt;a href=&quot;https://github.com/acviana/BlogExamples&quot;&gt;page&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
</content:encoded></item><item><title>Pain with the Windows Command Prompt</title><link>https://acviana.com/posts/pain-with-the-windows-command-prompt/</link><guid isPermaLink="true">https://acviana.com/posts/pain-with-the-windows-command-prompt/</guid><description>Pain with the Windows Command Prompt</description><pubDate>Wed, 04 Jul 2012 16:54:06 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;[I wrote this post a few months ago but never posted it for some reason. Ah, the joys of trying to develop on Windows.]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I love my trusty IBM Thinkpad. It&apos;s coming up on 5 years old and, other than some fan buzz and a RAM card that needed to be replaced, it runs just fine.&lt;/p&gt;
&lt;p&gt;But the Windows Command Prompt can just ruin my day.&lt;/p&gt;
&lt;p&gt;So I run my code to ingest the data into the database. No problem. But then I need to see if it made it into the database. I try just typing &lt;code&gt;sqlite3&lt;/code&gt; in the Command Prompt. No dice. No big deal, I just download the SQLite3 Windows shell. But the problem with this tool is it just launches. You can&apos;t do something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite3 mydatabase.db
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Bummer. In hindsight I should have tried to call the full path to &lt;code&gt;sqlite3.exe&lt;/code&gt; in the folder with my database. But since SQLite was just firing up in the folder where &lt;code&gt;sqlite3.exe&lt;/code&gt; lives I focused on trying to open the database from within the SQLite3 shell.&lt;/p&gt;
&lt;p&gt;Enter my next problem. There is no tab completion or copy/paste functionality in the SQLite3 shell. So now I have to type the full path out. No big deal right?&lt;/p&gt;
&lt;p&gt;Wrong. My database lives with my code in a folder in my Dropbox area. On a Windows machine the Dropbox root folder is called &lt;code&gt;My Dropbox&lt;/code&gt;, with the space. This creates a minor headache because now I have to get SQLite to accept a path name with a space in it. After some playing with quotes, and forward and back slashes, I decided that the space in the &lt;code&gt;My Dropbox&lt;/code&gt; path element was annoying enough in general that I should just change it altogether (it&apos;s just &lt;code&gt;Dropbox&lt;/code&gt; on my OSX system at work).&lt;/p&gt;
&lt;p&gt;It turns out this is a known &apos;non-issue&apos; with Dropbox that is intentionally configured that way to avoid confusing non-technical Dropbox users. Around the time I was reading about scripts and re-downloading instructions to change this, I realized I had gone much too far afield wrestling with it.&lt;/p&gt;
&lt;p&gt;In the end I hacked together a solution by just making a copy of &lt;code&gt;sqlite3.exe&lt;/code&gt; in the same folder as my database. Then I can use the &lt;code&gt;.restore&lt;/code&gt; command inside the SQLite3 shell without worrying about the full path. Definitely a hack, but it worked and I was able to get back to work.&lt;/p&gt;
</content:encoded></item><item><title>Benchmarking the Python SQLite3 Connection</title><link>https://acviana.com/posts/benchmarking-the-python-sqlite3-connection/</link><guid isPermaLink="true">https://acviana.com/posts/benchmarking-the-python-sqlite3-connection/</guid><description>Benchmarking the Python SQLite3 Connection</description><pubDate>Wed, 04 Jul 2012 16:51:00 GMT</pubDate><content:encoded>&lt;p&gt;We have several SQLite calls in our pipeline that look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlite3

for file in file_list:
    conn = sqlite3.connect(database)
    c = conn.cursor()
    command = &apos;SELECT * FROM table&apos;
    c.execute(command)
    conn.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I originally wrote our database calls this way because at the time (this was about 2 years ago) I was worried about the code crashing with an open database connection. I thought this would be vaguely &quot;bad&quot; so I was overenthusiastic about keeping connections closed. The design pattern stuck.&lt;/p&gt;
&lt;p&gt;But someone pointed out to us that it would be faster to do something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;conn = sqlite3.connect(database)
c = conn.cursor()
for file in file_list:
    command = &apos;SELECT * FROM table&apos;
    c.execute(command)
conn.close()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The reason is that there is some overhead involved in opening a Python SQLite3 connection. When you loop over the connection opening/closing steps you multiply this overhead.&lt;/p&gt;
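&lt;p&gt;You can see the effect with a toy database (the table name and contents here are made up for illustration, not from our pipeline):&lt;/p&gt;

```python
import os
import sqlite3
import tempfile
import time

# Build a throwaway database with one small table.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE observations (filename TEXT)")
conn.executemany(
    "INSERT INTO observations VALUES (?)",
    [("f%04d.fits" % i,) for i in range(100)],
)
conn.commit()
conn.close()

def reconnect_each_query(n):
    # Original pattern: open and close a connection per query.
    for _ in range(n):
        conn = sqlite3.connect(db_path)
        conn.execute("SELECT * FROM observations").fetchall()
        conn.close()

def reuse_one_connection(n):
    # Faster pattern: open once, run all n queries, close once.
    conn = sqlite3.connect(db_path)
    for _ in range(n):
        conn.execute("SELECT * FROM observations").fetchall()
    conn.close()

t0 = time.perf_counter()
reconnect_each_query(200)
t1 = time.perf_counter()
reuse_one_connection(200)
t2 = time.perf_counter()
```

&lt;p&gt;The reconnecting version pays the connection overhead 200 times, which is the same per-query cost the extrapolation below scales up.&lt;/p&gt;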
&lt;p&gt;My officemate went ahead and tested the overhead by running both design patterns over a test set of queries. Based on our results you can infer a connection overhead of 0.0134s. I took that data and assumed a linear increase in the overhead time for each additional query to extrapolate the results up to the scale of our filesystem. The results are below. The first row contains the real results and the next two are the extrapolated predictions.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Multiple&lt;/th&gt;
&lt;th&gt;Single&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;th&gt;Delta %&lt;/th&gt;
&lt;th&gt;Records&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0:02:23&lt;/td&gt;
&lt;td&gt;0:02:05&lt;/td&gt;
&lt;td&gt;0:00:18&lt;/td&gt;
&lt;td&gt;12.59%&lt;/td&gt;
&lt;td&gt;9300&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;wfc3g flt files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0:19:13&lt;/td&gt;
&lt;td&gt;0:16:48&lt;/td&gt;
&lt;td&gt;0:02:25&lt;/td&gt;
&lt;td&gt;12.59%&lt;/td&gt;
&lt;td&gt;75000&lt;/td&gt;
&lt;td&gt;8.0645&lt;/td&gt;
&lt;td&gt;All flt files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3:31:25&lt;/td&gt;
&lt;td&gt;3:04:48&lt;/td&gt;
&lt;td&gt;0:26:36&lt;/td&gt;
&lt;td&gt;12.59%&lt;/td&gt;
&lt;td&gt;825000&lt;/td&gt;
&lt;td&gt;88.7097&lt;/td&gt;
&lt;td&gt;All fits files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As you can see, we achieve a 13% speed up which, if we are querying the entire database, saves us almost 30 minutes. Since we are running this process overnight this isn&apos;t a huge deal. Going forward it would be smart to use the faster form. However, I&apos;m not sure if it&apos;s worth it to go back and fix our old SQLite calls as we have more pressing issues.&lt;/p&gt;
&lt;p&gt;All in all, this was an interesting exercise and will probably lead to some more detailed profiling of our pipeline in the future.&lt;/p&gt;
</content:encoded></item><item><title>Hipster Passwords</title><link>https://acviana.com/posts/hipster-passwords/</link><guid isPermaLink="true">https://acviana.com/posts/hipster-passwords/</guid><description>Hipster Passwords</description><pubDate>Wed, 04 Jul 2012 16:37:16 GMT</pubDate><content:encoded>&lt;p&gt;K: I can appreciate using &quot;12345&quot; as a password, like ironically. You know, like Spaceballs.&lt;br /&gt;
A: Passwords aren&apos;t t-shirts.&lt;/p&gt;
</content:encoded></item><item><title>_Finding the Right Plot_
I really like data visualization, even if I can&apos;t claim to be an expert at it. Today I wanted to...</title><link>https://acviana.com/posts/finding-the-right-plot-i-really-like-data/</link><guid isPermaLink="true">https://acviana.com/posts/finding-the-right-plot-i-really-like-data/</guid><description>_Finding the Right Plot_
I really like data visualization, even if I can&apos;t claim to be an expert at it. Today I wanted to...</description><pubDate>Tue, 03 Jul 2012 18:31:13 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/finding-the-right-plot-i-really-like-data/tumblr_m6lyo1p5dg1rt9cjfo1_1280.png&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Finding the Right Plot&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I really like data visualization, even if I can&apos;t claim to be an expert at it. Today I wanted to plot some data in a flat file a coworker had generated. I first tried this with Google Docs because I just wanted something quick and dirty. But while I discovered some nice new features, I wasn&apos;t quite able to do what I wanted, so I switched to Python with the Matplotlib module.&lt;/p&gt;
&lt;p&gt;In Matplotlib I was pretty quickly able to get what I wanted, but then there was the matter of deciding what I wanted. I went through the 4 plots above.&lt;/p&gt;
&lt;p&gt;The first was the standard scatter plot. This was no good because the close fluctuations in February just look scattered and hard to follow. Then I tried a line plot, but now I felt it was hard to pick out the individual points. Since the data points are non-uniformly distributed, this is key information. Next I tried a plot with both lines and points (actually the two plots overlaid). This produced the best of both worlds: you could see the individual data points as well as follow their progression.&lt;/p&gt;
&lt;p&gt;But then I noticed there was a problem with this chart as well. The use of the line implies an interpolation between the data points that might not be true. For example the increase shown on the plot from mid-May to mid-June actually occurred suddenly, not gradually as the line implies.&lt;/p&gt;
&lt;p&gt;Finally I settled on a bar plot. This allows us to see that the data is non-uniformly distributed. We can pick out each individual point as well as follow the flow of the data. Lastly, nothing about the data markers is misleading about the nature of the data: areas with no data look &quot;empty&quot;, which is exactly what they are.&lt;/p&gt;
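&lt;p&gt;The progression above can be sketched in a few lines of Matplotlib. The data here is made up (just some non-uniformly spaced points standing in for my coworker&apos;s file), as is the output filename:&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical non-uniformly sampled data standing in for the real file.
days = [1, 3, 4, 20, 21, 40]
values = [2, 3, 2, 8, 9, 9]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].scatter(days, values)           # points alone: hard to follow
axes[0, 0].set_title("scatter")
axes[0, 1].plot(days, values)              # line alone: points disappear
axes[0, 1].set_title("line")
axes[1, 0].plot(days, values, marker="o")  # both: implies false interpolation
axes[1, 0].set_title("line + points")
axes[1, 1].bar(days, values)               # bars: gaps in the data stay empty
axes[1, 1].set_title("bar")
fig.savefig("plot_styles.png")
```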
&lt;p&gt;Fun little project for the end of the workday.&lt;/p&gt;
</content:encoded></item><item><title>Just registered for [SciPy2012](http://conference.scipy.org/). This will be my first programming conference of any kind. I&apos;m all...</title><link>https://acviana.com/posts/just-registered-for-scipy2012-this-will-be-my/</link><guid isPermaLink="true">https://acviana.com/posts/just-registered-for-scipy2012-this-will-be-my/</guid><description>Just registered for [SciPy2012](http://conference.scipy.org/). This will be my first programming conference of any kind. I&apos;m all...</description><pubDate>Mon, 02 Jul 2012 16:39:25 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/just-registered-for-scipy2012-this-will-be-my/tumblr_m6jytpmj0A1rt9cjfo1_400.png&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Just registered for &lt;a href=&quot;http://conference.scipy.org/&quot;&gt;SciPy2012&lt;/a&gt;. This will be my first programming conference of any kind. I&apos;m all around excited but I&apos;m the most excited for the opportunity to hack during the sprints during the last two days. Plus, Austin is a sweet town!&lt;/p&gt;
</content:encoded></item><item><title>I just wanted to share a quick and dirty visualization I made to analyze a processing pipeline I&apos;m working on.  
What...</title><link>https://acviana.com/posts/i-just-wanted-to-share-a-quick-and-dirty/</link><guid isPermaLink="true">https://acviana.com/posts/i-just-wanted-to-share-a-quick-and-dirty/</guid><description>I just wanted to share a quick and dirty visualization I made to analyze a processing pipeline I&apos;m working on.  
What...</description><pubDate>Tue, 26 Jun 2012 12:11:58 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/i-just-wanted-to-share-a-quick-and-dirty/tumblr_m68ifz80OI1rt9cjfo1_1280.gif&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I just wanted to share a quick and dirty visualization I made to analyze a processing pipeline I&apos;m working on.&lt;/p&gt;
&lt;p&gt;What you are looking at is a GIF of the same image displayed with a linear stretch at 11 different stretch cut-offs. The cut-offs are created by flattening the top and bottom pixels. It&apos;s a little quick and dirty, but I&apos;ve noticed it works across a broad range of image types.&lt;/p&gt;
&lt;p&gt;However, the top % of pixels will often include data from the target. This is a good strategy when the target is saturated, since you aren&apos;t losing any information by scaling it down. But when the target is not saturated, you end up &quot;smoothing&quot; out real features.&lt;/p&gt;
&lt;p&gt;To test this I tried varying the %-clip levels. For each image I made 11 png files, each showing an XX% to YY% stretch. They start at 99% and 1% and step by a tenth of a percent out to 99.9% and 0.1%.&lt;/p&gt;
&lt;p&gt;Each of these png files contains 4 images. The first is the before image, then the after image. The 3rd image is the most interesting: it shows the top pixels that were flattened in red and the bottom pixels in blue. The last image is the histogram of the after image. The title shows the stretch range, and the title of the &quot;flagged&quot; image shows the value of the top % cut-off.&lt;/p&gt;
&lt;p&gt;The &quot;flagged&quot; image is definitely the most interesting: it shows the compromises you make to create an acceptable linear scaling. I&apos;m still thinking about how to improve the end product, but I&apos;m happy with the visualization I created to get there.&lt;/p&gt;
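&lt;p&gt;If you want to play with the idea, the core percentile-clip stretch is only a few lines of NumPy. This is my own sketch of the approach (the function name is made up), not the pipeline code itself:&lt;/p&gt;

```python
import numpy as np

def percentile_stretch(image, lower_pct=1.0, upper_pct=99.0):
    """Flatten pixels outside the given percentiles, then rescale to [0, 1]."""
    lo, hi = np.percentile(image, [lower_pct, upper_pct])
    clipped = np.clip(image, lo, hi)   # flatten the top and bottom pixels
    return (clipped - lo) / (hi - lo)  # linear stretch of what remains
```

&lt;p&gt;Stepping the arguments from (1.0, 99.0) toward (0.1, 99.9) flattens fewer and fewer pixels, which is exactly the sweep the GIF steps through.&lt;/p&gt;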
</content:encoded></item><item><title>I&apos;m excited I finally have a reason to learn JavaScript at work. The first thing, per the suggestion of my friend who does UX,...</title><link>https://acviana.com/posts/im-excited-i-finally-have-a-reason-to-learn/</link><guid isPermaLink="true">https://acviana.com/posts/im-excited-i-finally-have-a-reason-to-learn/</guid><description>I&apos;m excited I finally have a reason to learn JavaScript at work. The first thing, per the suggestion of my friend who does UX,...</description><pubDate>Mon, 25 Jun 2012 13:33:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/im-excited-i-finally-have-a-reason-to-learn/tumblr_m66rjaNUFd1rt9cjfo2_640.jpg&quot; alt=&quot;Photo&quot; /&gt; &lt;img src=&quot;/assets/tumblr/im-excited-i-finally-have-a-reason-to-learn/tumblr_m66rjaNUFd1rt9cjfo1_r1_250.gif&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&apos;m excited I finally have a reason to learn JavaScript at work. The first thing, per the suggestion of my friend who does UX, was have work order me a copy of &lt;a href=&quot;http://www.amazon.com/JavaScript-Good-Parts-Douglas-Crockford/dp/0596517742&quot;&gt;JavaScript: The Good Parts&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Excited to read through it, but I couldn&apos;t help laughing when I stumbled upon the size comparison between &lt;em&gt;JavaScript: The Good Parts&lt;/em&gt; and &lt;em&gt;JavaScript: The Definitive Guide&lt;/em&gt;.&lt;/p&gt;
</content:encoded></item><item><title>Widget Factory vs Film Crew Organizational Models</title><link>https://acviana.com/posts/widget-factory-vs-film-crew-organizational-models/</link><guid isPermaLink="true">https://acviana.com/posts/widget-factory-vs-film-crew-organizational-models/</guid><description>Widget Factory vs Film Crew Organizational Models</description><pubDate>Wed, 20 Jun 2012 14:58:18 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;http://programmers.stackexchange.com/a/45814&quot;&gt;Widget Factory vs Film Crew Organizational Models&lt;/a&gt;&lt;/p&gt;
</content:encoded></item><item><title>Top 30 Most Popular Stolen LinkedIn Passwords</title><link>https://acviana.com/posts/top-30-most-popular-stolen-linkedin-passwords/</link><guid isPermaLink="true">https://acviana.com/posts/top-30-most-popular-stolen-linkedin-passwords/</guid><description>Top 30 Most Popular Stolen LinkedIn Passwords</description><pubDate>Sat, 09 Jun 2012 09:30:42 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;http://mashable.com/2012/06/08/linkedin-stolen-passwords-list/&quot;&gt;Top 30 Most Popular Stolen LinkedIn Passwords&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Seriously? &quot;link&quot; is the most popular stolen password? What&apos;s your bank password? &quot;bank&quot;? Or just &quot;$&quot;? I can&apos;t believe that LinkedIn even allows a 4-letter password.&lt;/p&gt;
&lt;p&gt;I wonder what percent of the total passwords these top 30 represent.&lt;/p&gt;
</content:encoded></item><item><title>You need to do something spectacular before anyone knows you&apos;re doing anything at all.</title><link>https://acviana.com/posts/you-need-to-do-something-spectacular-before-anyone/</link><guid isPermaLink="true">https://acviana.com/posts/you-need-to-do-something-spectacular-before-anyone/</guid><description>You need to do something spectacular before anyone knows you&apos;re doing anything at all.</description><pubDate>Wed, 30 May 2012 11:47:03 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;You need to do something spectacular before anyone knows you&apos;re doing anything at all. One of my bosses. Everything I&apos;ve done at work that&apos;s really stood out has been in that vein.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item><item><title>My First List Comprehension </title><link>https://acviana.com/posts/my-first-list-comprehension/</link><guid isPermaLink="true">https://acviana.com/posts/my-first-list-comprehension/</guid><description>My First List Comprehension</description><pubDate>Wed, 02 May 2012 13:50:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve been programming in Python for over 4 years. That&apos;s how long it took me to use &lt;a href=&quot;http://docs.python.org/tutorial/datastructures.html#list-comprehensions&quot;&gt;list comprehensions&lt;/a&gt;. I&apos;m a little embarrassed about this.&lt;/p&gt;
&lt;p&gt;To be fair I&apos;ve known about list comprehensions for years but just never used them. It&apos;s one of those nice bits of syntactic sugar that you can get by without. This is especially true if you&apos;re off in the corner reinventing the wheel, as I was the first couple of years I was programming. I knew they existed, I had a rough idea of what they were for, but I just didn&apos;t see the use for them. Like learning most things in software development, it was just a matter of having the right problem.&lt;/p&gt;
&lt;p&gt;In my case I was refactoring some of my code so my officemate could collaborate with me more effectively, and I didn&apos;t like the way I was modifying lists. Let&apos;s say I had a list of names and wanted to remove all the names that didn&apos;t start with &apos;b&apos;. I was doing something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;name_list = [&apos;bill&apos;, &apos;bob&apos;, &apos;brian&apos;, &apos;betty&apos;, &apos;farnsworth&apos;]
b_name_only_list = []
for name in name_list:
    if name[0] == &apos;b&apos;:
        b_name_only_list.append(name)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This would create a list of the names that start with &apos;b&apos;. For smaller projects this worked, but it felt dumb to have to make a new list to do this. What if I didn&apos;t want two lists? What if I just wanted to chuck the non-b names? I tried to do this by modifying the list &quot;in-place&quot; with something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for name in name_list:
    if name[0] != &apos;b&apos;:
        name_list.remove(name)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But modifying a list as you are iterating over it creates errors and other weirdness such as this little guy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = [1, 2, 3]
for item in a:
    a.remove(item)
print a
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which prints &lt;code&gt;[2]&lt;/code&gt;. (Explained &lt;a href=&quot;http://stackoverflow.com/questions/7226997/removing-items-from-a-list-in-a-loop&quot;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;I got the feeling I was barking up the wrong tree. So after a little searching on StackOverflow I remembered, &quot;oh yeah, list comprehensions, I should try that.&quot; Sure enough I can just do something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;name_list = [name for name in name_list if name[0] == &apos;b&apos;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sweet.&lt;/p&gt;
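&lt;p&gt;One footnote worth knowing: the comprehension builds a &lt;em&gt;new&lt;/em&gt; list and rebinds the name; it doesn&apos;t modify the original list object. If you genuinely need in-place behavior, say because another variable points at the same list, slice assignment does the trick. A quick sketch:&lt;/p&gt;

```python
name_list = ['bill', 'bob', 'brian', 'betty', 'farnsworth']
alias = name_list  # a second reference to the same list object

# Rebinding: name_list now points at a brand-new list...
name_list = [name for name in name_list if name[0] == 'b']
# ...but alias still sees all five names.

# Slice assignment: replace the contents of the object itself,
# so every reference sees the filtered result.
alias[:] = [name for name in alias if name[0] == 'b']
```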
</content:encoded></item><item><title>Just had the same thought this morning.
(via Yahoo&apos;s genius content strategy - The Oatmeal)</title><link>https://acviana.com/posts/just-had-the-same-thought-this-morning-via/</link><guid isPermaLink="true">https://acviana.com/posts/just-had-the-same-thought-this-morning-via/</guid><description>Just had the same thought this morning.
(via Yahoo&apos;s genius content strategy - The Oatmeal)</description><pubDate>Tue, 24 Apr 2012 16:11:28 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/assets/tumblr/just-had-the-same-thought-this-morning-via/tumblr_m305j5VREm1rt9cjfo1_1280.jpg&quot; alt=&quot;Photo&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Just had the same thought this morning.&lt;/p&gt;
&lt;p&gt;(via &lt;a href=&quot;http://theoatmeal.com/pl/state_web_spring/yahoo&quot;&gt;Yahoo&apos;s genius content strategy - The Oatmeal&lt;/a&gt;)&lt;/p&gt;
</content:encoded></item><item><title>Sure, I trust Google to index the contents of all my files. Why not?</title><link>https://acviana.com/posts/sure-i-trust-google-to-index-the-contents-of-all/</link><guid isPermaLink="true">https://acviana.com/posts/sure-i-trust-google-to-index-the-contents-of-all/</guid><description>Sure, I trust Google to index the contents of all my files. Why not?</description><pubDate>Tue, 24 Apr 2012 12:33:38 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Sure, I trust Google to index the contents of all my files. Why not? &lt;a href=&quot;http://daringfireball.net/linked/2012/04/24/google-drive&quot;&gt;Daring Fireball Linked List: Introducing Google Drive&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item><item><title>`ls` in the Windows Command Prompt</title><link>https://acviana.com/posts/ls-in-the-windows-command-prompt/</link><guid isPermaLink="true">https://acviana.com/posts/ls-in-the-windows-command-prompt/</guid><description>`ls` in the Windows Command Prompt</description><pubDate>Fri, 13 Apr 2012 15:59:51 GMT</pubDate><content:encoded>&lt;p&gt;Right now, to interact with the Windows Command Prompt, I use a wrapper called Console2. It adds some basic functionality to the Command Prompt, like tabs and copy/paste. It&apos;s an improvement, but not a great one, because it&apos;s still just a wrapper around the primitive Command Prompt.&lt;/p&gt;
&lt;p&gt;So I&apos;ve been slowly transitioning over to Cygwin. Last week one of my buddies &lt;a href=&quot;http://theothersideofthescreen.tumblr.com/post/20543972366/getting-cygwin-to-see-my-local-file-system&quot;&gt;showed me&lt;/a&gt; what to add to my Windows &lt;code&gt;PATH&lt;/code&gt; variable to allow Cygwin to see the local filespace. I was, and still am, confused about why adding paths &lt;em&gt;inside&lt;/em&gt; of the Cygwin filespace allowed Cygwin to see the rest of the drive, but it worked.&lt;/p&gt;
&lt;p&gt;But today I noticed something interesting. One of the minor annoyances of having to use the Windows Command Prompt at home and OSX at work is that in Windows you have to use &lt;code&gt;dir&lt;/code&gt; instead of &lt;code&gt;ls&lt;/code&gt; to view the contents of a folder, and I have to remember to switch back and forth.&lt;/p&gt;
&lt;p&gt;But today, &lt;code&gt;ls&lt;/code&gt; worked! It was such a little thing that I didn&apos;t even notice till I had done it a few times. I realized (I think) this is because the paths I added to the &lt;code&gt;PATH&lt;/code&gt; variable made the Cygwin binaries available to the rest of the Windows environment.&lt;/p&gt;
&lt;p&gt;Neat! A nice surprise.&lt;/p&gt;
</content:encoded></item><item><title>Update 04/12/12</title><link>https://acviana.com/posts/update-041212/</link><guid isPermaLink="true">https://acviana.com/posts/update-041212/</guid><description>Update 04/12/12</description><pubDate>Thu, 12 Apr 2012 16:25:56 GMT</pubDate><content:encoded>&lt;p&gt;It&apos;s been a productive few days.&lt;/p&gt;
&lt;p&gt;Got some stuff done:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Joined GitHub and have my eyes on a project I want to contribute to.&lt;/li&gt;
&lt;li&gt;Asked my first question on Stack Overflow (got my answer a few hours later).&lt;/li&gt;
&lt;li&gt;Working my way through JavaScript examples on &lt;a href=&quot;http://www.codecademy.com/&quot;&gt;Code Academy&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Got IPython Notebook to work on my laptop, excited to play with that.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Still scratching my head on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Getting IPython Notebook to work on my work computer because of some complications between the IPython in our central Python source and my local site-packages.&lt;/li&gt;
&lt;li&gt;Getting Cygwin on my laptop to see my python install outside of the Cygwin drive tree.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Got some good programming projects lined up at work as well.&lt;/p&gt;
&lt;p&gt;Feeling good, feeling like I&apos;m moving along.&lt;/p&gt;
</content:encoded></item><item><title>In December 2006, Palm CEO Ed Colligan summarily dismissed the idea that a traditional personal computing company could compete...</title><link>https://acviana.com/posts/in-december-2006-palm-ceo-ed-colligan-summarily/</link><guid isPermaLink="true">https://acviana.com/posts/in-december-2006-palm-ceo-ed-colligan-summarily/</guid><description>In December 2006, Palm CEO Ed Colligan summarily dismissed the idea that a traditional personal computing company could compete...</description><pubDate>Mon, 09 Apr 2012 20:35:25 GMT</pubDate><content:encoded>
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In December 2006, Palm CEO Ed Colligan summarily dismissed the idea that a traditional personal computing company could compete in the smartphone business. “We’ve learned and struggled for a few years here figuring out how to make a decent phone,” he said. “PC guys are not going to just figure this out. They’re not going to just walk in.”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In January 2007, Microsoft CEO Steve Ballmer laughed off the prospect of an expensive smartphone without a keyboard having a chance in the marketplace as follows: “Five hundred dollars? Fully subsidized? With a plan? I said that’s the most expensive phone in the world and it doesn’t appeal to business customers because it doesn’t have a keyboard, which makes it not a very good e-mail machine.”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In March 2007, computing industry pundit John C. Dvorak argued that “Apple should pull the plug on the iPhone” since “There is no likelihood that Apple can be successful in a business this competitive.” Dvorak believed the mobile handset business was already locked up by the era’s major players. “This is not an emerging business. In fact it’s gone so far that it’s in the process of consolidation with probably two players dominating everything, Nokia Corp. and Motorola Inc.”&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;http://www.forbes.com/sites/adamthierer/2012/04/01/bye-bye-blackberry-how-long-will-apple-last/&quot;&gt;http://www.forbes.com/sites/adamthierer/2012/04/01/bye-bye-blackberry-how-long-will-apple-last/&lt;/a&gt;&lt;/p&gt;
</content:encoded></item><item><title>RSS in Chrome</title><link>https://acviana.com/posts/rss-in-chrome/</link><guid isPermaLink="true">https://acviana.com/posts/rss-in-chrome/</guid><description>RSS in Chrome</description><pubDate>Mon, 09 Apr 2012 20:32:29 GMT</pubDate><content:encoded>&lt;p&gt;Why does clicking on an RSS feed link take you to XML source code for the feed? I&apos;m pretty sure that&apos;s never what anyone wants when they click the RSS button. Shouldn&apos;t there be something like the &lt;code&gt;mailto:&lt;/code&gt; handler?&lt;/p&gt;
</content:encoded></item><item><title>Paying Bills Online</title><link>https://acviana.com/posts/paying-bills-online/</link><guid isPermaLink="true">https://acviana.com/posts/paying-bills-online/</guid><description>Paying Bills Online</description><pubDate>Mon, 09 Apr 2012 11:59:00 GMT</pubDate><content:encoded>&lt;p&gt;I just spent an hour paying a series of small bills, parking tickets, vehicle registration, toll fees, medical bills, etc. All were &amp;lt;$25. About half the time I just gave up and called the organization, sat on hold, and paid over the phone. Some trends I saw:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why are these pages so ugly? Look at a clean page, then look at your page. Recognize the problem.&lt;/li&gt;
&lt;li&gt;Why do the names of things change across your pages and documents? Account number, member number, user number, etc.&lt;/li&gt;
&lt;li&gt;Why do different divisions have different billing pages? You have one company. Give me one billing portal.&lt;/li&gt;
&lt;li&gt;You already have my email. I give it to you every time I interact with you.&lt;/li&gt;
&lt;li&gt;Why am I hunting through menus and sidebars for what I want? Guide me; don&apos;t send me on what you think is a logical hunt.&lt;/li&gt;
&lt;li&gt;Why do I have to keep typing in my credit card number? There has to be a better way. Hell, why isn&apos;t there a card reader on my laptop? I&apos;d rather scan it every time than have the numbers sitting in a cache somewhere.&lt;/li&gt;
&lt;li&gt;Why do you send anything in bulk mail ever?&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Inbox Zero</title><link>https://acviana.com/posts/inbox-zero/</link><guid isPermaLink="true">https://acviana.com/posts/inbox-zero/</guid><description>Inbox Zero</description><pubDate>Sun, 08 Apr 2012 22:41:43 GMT</pubDate><content:encoded>&lt;p&gt;I just watched &lt;a href=&quot;http://youtu.be/z9UjeTMb3Yk&quot;&gt;this&lt;/a&gt; Google Tech Talk on &quot;Inbox Zero&quot; by &lt;a href=&quot;http://www.merlinmann.com/&quot;&gt;Merlin Mann&lt;/a&gt;. It wasn&apos;t as interesting as I expected but I did get some useful ideas from it. The biggest immediate change is that I put &lt;em&gt;everything&lt;/em&gt; from my inbox, all 9k+ messages, into a folder called DMZ. I&apos;ll slowly go through and sort those guys out. But in the meantime, my inbox has zero messages. A fresh slate. Now I have to set up my system to keep it at zero.&lt;/p&gt;
</content:encoded></item><item><title>Email is not a messaging protocol, it’s a to-do list, right? Or at least my inbox is a to-do list and email is the protocol for...</title><link>https://acviana.com/posts/email-is-not-a-messaging-protocol-its-a-to-do/</link><guid isPermaLink="true">https://acviana.com/posts/email-is-not-a-messaging-protocol-its-a-to-do/</guid><description>Email is not a messaging protocol, it’s a to-do list, right? Or at least my inbox is a to-do list and email is the protocol for...</description><pubDate>Sun, 08 Apr 2012 21:45:58 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;Email is not a messaging protocol, it&apos;s a to-do list, right? Or at least my inbox is a to-do list and email is the protocol for putting things on it. Here&apos;s the problem, it is a shitty to-do list. Any one of you can put something on my to-do list and I don&apos;t want that. Paul Graham of YCombinator on Email at PyCon 2012. &lt;a href=&quot;http://youtu.be/R9ITLdmfdLI&quot;&gt;Video&lt;/a&gt; and &lt;a href=&quot;http://paulgraham.com/ambitious.html&quot;&gt;text&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item><item><title>The Greedy Algorithm </title><link>https://acviana.com/posts/the-greedy-algorithm/</link><guid isPermaLink="true">https://acviana.com/posts/the-greedy-algorithm/</guid><description>The Greedy Algorithm</description><pubDate>Sun, 08 Apr 2012 21:07:00 GMT</pubDate><content:encoded>&lt;p&gt;I was reading a fun bit of &lt;a href=&quot;http://dfkoz.tumblr.com/post/20389927354/whats-a-pound-of-change-worth&quot;&gt;analysis&lt;/a&gt; of value of a pound of coins on Dan Kozikowski&apos;s blog and stumbled upon the greedy algorithm (via &lt;a href=&quot;http://en.wikipedia.org/wiki/Greedy_algorithm&quot;&gt;wikipedia&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, you pick the local max (the highest-value coin in this example) and it ends up getting you the global max (the fewest number of coins to make the total value).&lt;/p&gt;
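&lt;p&gt;In code, the coin version is about as short as algorithms get. A quick sketch with US coin values (the function name is mine):&lt;/p&gt;

```python
def greedy_change(amount, denominations=(25, 10, 5, 1)):
    """Make change by always taking the largest coin that still fits."""
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins
```

&lt;p&gt;Worth knowing: the greedy choice happens to be globally optimal for US coin denominations, but not for every coin system. With coins {1, 3, 4}, greedy makes 6 as 4+1+1 when 3+3 is better.&lt;/p&gt;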
&lt;p&gt;Knowing the names of little things like the greedy algorithm is definitely something I&apos;m trying to work on these days. It just doesn&apos;t look good in interviews when I have to go back and forth a few times before going &quot;ohhhh, right &lt;em&gt;that&lt;/em&gt; thing. I&apos;ve used that before but I didn&apos;t know that&apos;s what it was called.&quot;&lt;/p&gt;
</content:encoded></item><item><title>Getting Cygwin to see my local file system. </title><link>https://acviana.com/posts/getting-cygwin-to-see-my-local-file-system/</link><guid isPermaLink="true">https://acviana.com/posts/getting-cygwin-to-see-my-local-file-system/</guid><description>Getting Cygwin to see my local file system.</description><pubDate>Thu, 05 Apr 2012 15:41:09 GMT</pubDate><content:encoded>&lt;p&gt;My friend showed me how to see my local file system on my Windows Vista machine. I added the following paths to my Windows PATH variable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;C:\Cygwin\bin;
C:\Cygwin\usr\bin;
C:\Cygwin\usr\local\bin;
C:\Cygwin\lib;
C:\Cygwin\usr\lib
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I can see the local file system with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd c:/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&apos;m not sure what adding these paths to this variable changed, though. Cygwin was already working, so I&apos;m not sure why adding paths that point to its own directory structure allows me to see the local file structure.&lt;/p&gt;
</content:encoded></item><item><title>On the other side of the screen, it all looks so easy.</title><link>https://acviana.com/posts/on-the-other-side-of-the-screen-it-all-looks-so/</link><guid isPermaLink="true">https://acviana.com/posts/on-the-other-side-of-the-screen-it-all-looks-so/</guid><description>On the other side of the screen, it all looks so easy.</description><pubDate>Thu, 05 Apr 2012 15:26:25 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;On the other side of the screen, it all looks so easy. Tron (1982)&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded></item></channel></rss>