Sort things out: code, project, and open source

As a pet project that fit the end-of-year mood, I spent some of my personal time re-releasing my previous code projects. In this post, I announce the repositories I created and document the additional work I did on them.

The projects are:

All these repositories were created with their original commit history. Battling with git filter-branch was challenging but fun. I took apart the work done in HTML5 Word Cloud to achieve maximum reusability; by reusability, I also mean making the code understandable and fixable by others. I’ve already used the Google OAuth 2 Login in an unannounced project and it works great. My goal is to eventually rewrite HTML5 Word Cloud and have it use the libraries from these individual repositories, with slicker, better UIs.

These are the things I learned, and implemented:

Coding style, comments, and variable naming

One of the first things I learned from working at Mozilla is to express thoughts to people, not computers, through code. In fact, in a collaborative project like Mozilla, having people understand your code is far more important than having computers run it efficiently. To achieve that, code should:

  • Promote shared understanding through a unified coding style. Avoid magical syntax that could confuse the reader.
  • Promote shared understanding by annotating the work with comments. David Flanagan (yes, the David) did amazing work on this in the Gaia code base, but sadly we failed to keep up with it.
  • Avoid secretive variable names; these are particularly rampant in wordcloud2.js, and I haven’t been able to fix them all.

Testing and continuous integration

No, I had never written automated tests for my projects until this commit. Before that, all testing was done manually, by trying out the actual product or on manual test pages.

In WordFreq and Google OAuth 2 Web Client, I (re)wrote QUnit tests for most of the functionality. The test coverage is not quite there yet, and for the Google OAuth 2 library there are no automated tests in PhantomJS because some tests must run with a logged-in session.
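For readers who have not seen a QUnit test, here is a minimal sketch of the shape such a test takes. The function `countWords` is a hypothetical stand-in and not WordFreq's actual interface, and the `QUnit.test` / `assert.deepEqual` calls follow the QUnit 2.x style, which differs a little from the global `test()` helpers of older releases.

```javascript
// Hypothetical function under test: counts how often each word appears.
// It is a stand-in for illustration, not the real WordFreq interface.
function countWords(text) {
  var counts = {};
  var words = text.toLowerCase().match(/[a-z']+/g) || [];
  words.forEach(function (word) {
    counts[word] = (counts[word] || 0) + 1;
  });
  return counts;
}

// Register the tests only when QUnit has been loaded (e.g. on a test page),
// so the file can still be loaded elsewhere without errors.
if (typeof QUnit !== 'undefined') {
  QUnit.module('countWords');

  QUnit.test('counts repeated words', function (assert) {
    assert.deepEqual(countWords('the cat and the hat'),
                     { the: 2, cat: 1, and: 1, hat: 1 });
  });

  QUnit.test('returns an empty object for empty input', function (assert) {
    assert.deepEqual(countWords(''), {});
  });
}
```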

I’ve also hooked up the automated tests of WordFreq to Travis CI. It’s nice to know such a service exists; it could further reinforce the fork-commit-pull-request contribution cycle on GitHub. Unfortunately, the WordFreq tests fail randomly on the Travis CI testing VM.

Beyond browser context

JavaScript does not just live in the browser. With WordFreq, I tried Node.js this time. WordFreq was rewritten so that it comes with a synchronous interface and an asynchronous interface (powered by Web Workers). With the synchronous interface, WordFreq is available as an npm package and is accessible from the command line. Node.js is fun; I would love to use it for more things and move away from shell scripting. 🙂
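The idea of one core with two front doors can be sketched roughly as follows. All names here (`tally`, `processSync`, `processAsync`) are made up for illustration and are not WordFreq's real API. In a browser the asynchronous variant would hand the text to a Web Worker; this sketch defers with `setTimeout` instead so that it runs anywhere.

```javascript
// Shared core (hypothetical): plain, synchronous word counting.
function tally(text) {
  var counts = {};
  (text.toLowerCase().match(/[a-z']+/g) || []).forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  // Return [word, count] pairs, most frequent first; ties in alphabetical order.
  return Object.keys(counts)
    .map(function (w) { return [w, counts[w]]; })
    .sort(function (a, b) { return b[1] - a[1] || (a[0] < b[0] ? -1 : 1); });
}

// Synchronous interface: suitable for a command-line tool.
function processSync(text) {
  return tally(text);
}

// Asynchronous interface: the result is delivered through a callback.
// setTimeout is only a stand-in for posting the work to a Web Worker.
function processAsync(text, callback) {
  setTimeout(function () {
    callback(tally(text));
  }, 0);
}
```

Keeping the counting logic in one plain function means the command-line path and the browser path cannot drift apart; only the delivery mechanism differs.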

Toward a real collaborative open source project

Putting your work on GitHub is easy, but nurturing it into a solid free software project is hard. You need to learn and adopt some software engineering practices besides writing great code in order to contribute effectively to the free software ecosystem. These are my baby steps toward that; feedback and contributions welcome!

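I wrote an input method too, and in JavaScript!

After hearing jserv's talk on the new Chewing (新酷音) input method project at COSCUP 2012, I decided to finish this long-overdue post, and to change its title while I was at it.

At Mozilla, my first task after joining the Boot to Gecko project was to work on the input methods for Gaia. After a month or two, I ended up with a Zhuyin input method with automatic candidate selection that runs standalone in the browser (no need to ask a server to pick characters):

Browser demo page for "JS Zhuyin", the Gaia Zhuyin input method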


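To bring the word database into the program, it has to be converted to JSON. It was originally converted from the new Chewing database, but I later found that its LGPL license did not fit the project (we use Apache). After a long search, I found the 小麥注音 (McBopomofo) database maintained by mjhsieh. As for the conversion, at one point I wrote a PhantomJS script to handle it, then switched to our own JavaScript shell. The only annoyance is that jsshell must be started with -U, otherwise strings are not treated as UTF-8.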


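Keeping the whole 3 MB of JSON in memory is not impossible, but it will not run on a phone that way. The other consequence is that the data has to be downloaded again on every page load, with no guarantee of hitting the HTTP cache.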

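The database solution Gecko provides is IndexedDB. I can't really say whether it is pleasant to use or not; I had no choice anyway (laughs). Compared with fetching data from a JS object by key, IndexedDB's async flow is of course more complicated, but I wouldn't call it hard to use (and disk I/O should be off the main thread anyway). The more complicated part is the advanced lookup features of mobile input methods (for example, typing "ㄊㄅ" should find "台北"), which need IndexedDB features such as range search.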
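To make the range-search idea concrete, here is a small sketch. It assumes, purely for illustration, that each phrase is also stored under a key made of the initials of its syllables; the post does not say how the real database is keyed. The query runs against an in-memory array so that it works outside a browser; with IndexedDB, the same bounds would be passed to `IDBKeyRange.bound()` and the resulting range to `openCursor()` on an index.

```javascript
// Hypothetical records: each phrase stored under the initials of its syllables.
var records = [
  { initials: 'ㄊㄅ', phrase: '台北' },
  { initials: 'ㄊㄅㄕ', phrase: '台北市' },
  { initials: 'ㄊㄓ', phrase: '台中' },
  { initials: 'ㄍㄒ', phrase: '高雄' }
];

// Bounds that cover every key starting with `prefix`.
// '\uffff' sorts after any character that can follow the prefix.
function prefixBounds(prefix) {
  return { lower: prefix, upper: prefix + '\uffff' };
}

// In-memory stand-in for a range query. In a browser the equivalent range is
//   IDBKeyRange.bound(b.lower, b.upper, false, true)
// handed to index.openCursor(range).
function rangeSearch(records, prefix) {
  var b = prefixBounds(prefix);
  return records
    .filter(function (r) {
      return r.initials >= b.lower && r.initials < b.upper;
    })
    .map(function (r) { return r.phrase; });
}
```

With this data, `rangeSearch(records, 'ㄊㄅ')` returns 台北 and 台北市 but neither 台中 nor 高雄, because string keys compare code unit by code unit.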


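I did not implement anything like a tree search. Instead I naively used brute force: cut the N syllables into every possible segmentation, then send each piece to the database to look up phrases and ask for their scores. As with the word cloud, a very simple algorithm turned out to be quite effective.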
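A minimal sketch of this brute-force approach follows. The dictionary, its scores, and the choice of summing scores to rank candidates are all assumptions made for the example; the real input method queries IndexedDB asynchronously and may combine scores differently. For N syllables there are 2^(N-1) ways to cut the sequence.

```javascript
// Hypothetical phrase table: syllables joined by '-' map to a phrase and score.
var dictionary = {
  'ㄊㄞˊ': { phrase: '台', score: 1 },
  'ㄅㄟˇ': { phrase: '北', score: 1 },
  'ㄕˋ': { phrase: '市', score: 1 },
  'ㄊㄞˊ-ㄅㄟˇ': { phrase: '台北', score: 5 },
  'ㄊㄞˊ-ㄅㄟˇ-ㄕˋ': { phrase: '台北市', score: 9 }
};

// Enumerate every way to cut the syllable list into contiguous chunks.
function allSegmentations(syllables) {
  if (syllables.length === 0) return [[]];
  var results = [];
  for (var i = 1; i <= syllables.length; i++) {
    var head = syllables.slice(0, i);
    allSegmentations(syllables.slice(i)).forEach(function (rest) {
      results.push([head].concat(rest));
    });
  }
  return results;
}

// Look up every chunk of every segmentation and keep the best total score.
// Summing the scores is an assumption for this sketch.
function bestCandidate(syllables, dictionary) {
  var best = null;
  allSegmentations(syllables).forEach(function (chunks) {
    var text = '';
    var score = 0;
    for (var i = 0; i < chunks.length; i++) {
      var entry = dictionary[chunks[i].join('-')];
      if (!entry) return; // Unknown chunk: discard this segmentation.
      text += entry.phrase;
      score += entry.score;
    }
    if (!best || score > best.score) best = { text: text, score: score };
  });
  return best;
}
```

The exponential growth is tolerable here because the number of syllables waiting to be converted at any moment is small.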


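What remained were code organization problems. A program involving this many functions is not easy to keep tidy. I rewrote it twice myself, and when a colleague in Beijing later ported it into a Pinyin input method, he rewrote it again in a prototype-based style, which looks even neater. Boot to Gecko (product name Firefox OS) initially targets South America for launch, so Chinese support has been put aside for now while I work on the system side.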

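I deliberately kept the Zhuyin JS file in a state where other environments and architectures can load it smoothly and fall back gracefully (I even sneaked in a CommonJS define()). The demo page works even on the iPhone; browsers without IndexedDB simply load the JSON word database on every visit. There are some fancy ideas, such as running it through node-gtk or some other route to strike back at the desktop or Android. But it uses more memory than the alternatives and its algorithm is no stronger, so doing it would only prove that it can be done … XD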


COSCUP 2011 website, mobile version

This post is an introduction written for the Mobile App contest.

The COSCUP 2011 website ships media query CSS built on Responsive Design concepts; on a phone, the site applies a mobile CSS layout:

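The whole page is re-laid out with CSS; the sponsor logos on the right are hidden, and a randomly chosen sponsor is shown below the title instead. The sponsors page lists all sponsors.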

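The schedule table uses CSS to strip the default styling of <table>. The "Current session" button lets you jump quickly to the session in progress during the conference: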
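One way such a button could work is sketched below. The session ids and times are made-up sample data, and `findCurrentSession` is a hypothetical helper, not the site's actual code. In the page, the button handler could then set `location.hash` to the returned id so the browser scrolls to that row.

```javascript
// Made-up sample schedule; each id is assumed to match an element id in the page.
var sessions = [
  { id: 'session-a', start: '2011-08-20T09:00:00+08:00', end: '2011-08-20T10:00:00+08:00' },
  { id: 'session-b', start: '2011-08-20T10:00:00+08:00', end: '2011-08-20T11:00:00+08:00' }
];

// Return the session in progress at `now`, or null when nothing is running.
function findCurrentSession(sessions, now) {
  var t = now.getTime();
  for (var i = 0; i < sessions.length; i++) {
    var s = sessions[i];
    if (Date.parse(s.start) <= t && t < Date.parse(s.end)) {
      return s;
    }
  }
  return null;
}
```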


On iOS, the site can be added to the Home Screen:

Launching it from the Home Screen enters full-screen mode:


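The design goal of the COSCUP 2011 website is to provide the best experience on both desktop and mobile at the same URL, with the same static HTML. On mobile, we want to present what phone users need right now in a Web App-like form. We did not do anything complicated on the front end, such as session filtering or offline browsing, though those are all within a Web App's potential. Taking the Web App form also achieves true cross-platform support: any phone with a browser can run it.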


Finally, thanks to the COSCUP 2011 website team for their help, especially 魏藥 for developing the back-end content management.