My wonderful wife Erin bought me a Raspberry Pi for my birthday in November 2012. It’s changed the way I think about computers and coding education. I wrote a long form post about my latest ideas on HASTAC.org. You can read more by checking out my G+ feed, and below are some selected posts from it.
I started Comp-core.org, the Computational Core Curriculum, to collect and collaborate on course materials (and code, via our github repos) for those of us who teach this material. It’s small now. I first got this idea from a panel I moderated in the Fall at UNC called Code in the Classroom. A little wrap-up of that event, which featured profs from Duke and UNC, is here.
So check it out and shoot me an email or join comp-core’s open google group if you want to contribute!
Inspired by this post by Harj Taggar, I took email off of my iPhone in September 2012. After a week or two, it’s really helped me stay focused while I’m away from the computer. That, combined with Rescue Time when I’m at my laptop, has helped me be more efficient with the hours I spend working. I’m interested in this both from an information science perspective (when every device can do everything, how do we make space for thought and focus?) and from a personal perspective (how do I achieve what I want to achieve and leave time for life?). I’ll try to pay attention to both aspects.
Harj wrote his post after 6 months, so I’m going to give it plenty of time before I make any sweeping pronouncements. Stay tuned and I’ll post my reflections here.
I’m learning Clojure. I’d like it to be my primary language.
I spent tons of time evaluating languages, and have compiled my notes into a stub of a curriculum I’d like to teach at UNC one semester. For now though, I thought it might be helpful to throw up a few links that I’ve found particularly helpful.
The features of Clojure that caught my attention
Edit code at runtime
Prototype blazing fast
Macros: programs that can make other programs at run time or compile time
My setup, shown below, is two terminals, one with VimClojure editing a koan file, the other running the koan script. Edit and save (:w) in vim and the koan script will automatically move on to the next blank (or koan that you got wrong). It still seems magical to me that I’m editing code while a program is running.
Note: Getting the cool highlighting and rainbow parens is described below at the VimClojure link.
There are all sorts of great blip.tv videos on Clojure related topics. (I recommend this one featuring Rich Hickey, the originator of Clojure. He’s not talking about Clojure per se, but design/programming techniques ‘from the hammock’, using your ‘background mind’. Watch the video to see what he means).
Getting Serious with Clojure
I’m just barely serious with clojure, so this is a stub.
VimClojure – I started out using pico, which served me well, but now mostly use vim for terminal editing. Set up rainbow parens in vimslime as described here.
Clojure Eclipse Plugin – Everyone’s favorite IDE (ok, a lot of people’s favorite) has a Clojure plugin, called CounterClockwise. Instructions for a nice setup are here.
Think PowerPoint but easier and everywhere.
Google still doesn’t have the tight integration that Office does, but for me that’s more than made up for by the ease and access.
An excellent tool for creating show-stopping presentations. The software uses vector graphics and takes advantage of that fact to allow near-infinite scalability. What does that mean? Take a look at a Prezi I made about Prezi’s innovative user interface: (press the play button and wait for it to load. use the mouse or arrow keys to navigate)
There’s a big difference between knowing no Linux and knowing a little Linux. I highly recommend getting to a little Linux. It’s so easy!
Try Before You Buy
…Well, the Linux is free so you’ll never have to buy it. But the best way to figure out what it is and play around is using VirtualBox. You can load Linux within your current operating system. Killer. Oracle spearheads the free/opensource project, so you know it’s got some firepower behind it. One day I’ll post a tutorial on setting up VB.
My VirtualBox Machines
After you load it you’ll be able to create new virtual machines. Simply download the .iso for your preferred distribution (see below) and mount it as a virtual CD-ROM.
Elliott’s Guide to Linux Distributions
Start Here: Ubuntu
Ubuntu is quickly becoming the standard for non-server, non-expert Linux. That said, there is perhaps the largest community of experts working to tweak and improve it. It’s also backed by a company- Canonical Ltd- which, despite the philosophical reservations of some idealists, makes the end product higher quality in my opinion. You can even buy hardware pre-installed with Ubuntu from system76.com. Pretty sweet.
Ubuntu is focused on ease of use, and it delivers. It’s also customizable and powerful. A great blend, ready for almost anything you’d like to throw at it.
Research Computing: the RedHat Clan
Red Hat is another corporation that backs/spearheads open source projects. RedHat, Fedora, CentOS, and even TarHeel Linux all owe some aspect of their existence to Red Hat the company. RedHat Enterprise Linux is their flagship product, and is the backbone for a significant chunk of the Internet we know and love. Fedora is the community-developed version of their software where they test out new features. It’s re-versioned every 6 months or so, whether it needs it or not, so it’s great for cutting-edge, not so great for your standard user (that said, it’s much much more stable than even a few years ago). CentOS is RedHat Linux with the logos taken out. TarHeel Linux is UNC’s version of RedHat Linux supported for research computing. Check out Emerald, UNC’s research cluster, for info on how to submit massive MatLab or Mathematica jobs to 700 processors.
SQLite is a simple, lightweight, cross-platform relational database sponsored by, amongst others, Bloomberg and Oracle.
Believe it or not, I learned Access back in 7th grade. I just remembered that. Nothing I learned really stuck with me because I didn’t understand relational databases at the time. See below for more on that. Like most Microsoft products, Access is very powerful and widely misunderstood and underutilized.
Filemaker Pro was bought by Apple at some point. The suite of products ranges from Bento, a template-centered system for personal information management, to Filemaker Pro Server, a multi-user environment.
Real men (or hapless students) use the command-line based SQL*Plus. SQL Developer is the graphical interface. These products are database management systems. They both manage Oracle’s relational database, called Oracle v11.
Check out my Reading list for a sense of my perspectives on databases. If you find any inaccuracies or imprecisions in the below, let me know.
I’m very interested in different types of databases and their implications for information, knowledge, and scholarly practice. In contrast to the management software discussed above, the below focuses on the underlying structural options for storing and organizing data.
Relational Databases run your life: they contain your eBay account, your Hospital Records, your Drivers License info, etc, etc, etc. These things keep track of your membership in things, your account values, store inventories, etc. They are excellent for ‘complete’ systems, where the user can define ahead of time what kinds of data will go into the system and, crucially, is certain of the content. Big, subtle problems come along, though, when RDBs are misunderstood or stretched beyond their capabilities (or, as I’m investigating, when indeterminacy or functional indeterminacy enters the picture). Much like statistics, this system is only a grammar that relates elements in predefined ways. I’m working on a formal evaluation of the capabilities/pitfalls of RDBs.
There are many ways to use this one basic format. Access and FileMaker both support the Relational Model (developed by E.F. Codd), as does database giant Oracle. SQL is supposed to be a standard interface protocol (it’s not really a language) to the relational model.
Relational Databases break related data into tiny bits so that they can be more efficiently coordinated. How do you know where to make a break? Well, there’s a whole science to database design (and the dreaded Normal Forms), but essentially you identify fault lines within the data for aspects that vary independently of each other. For instance, ‘Author’ varies separately from ‘Title’ in a book (e.g. one Author writes many books, or many people write a text called ‘Biology’). But within ‘Author’, we can specify ‘First’ and ‘Last’ names. We could specify ‘First letter,’ ‘Second letter’, etc of each name. But if, in our case, these series of letters always appear together, we know we’ve gotten granular enough. Knowing when to stop splitting up your data is the art of database design. There’s also a science that can tell you when splitting your data up (e.g. into individual letters) won’t be very efficient. That doesn’t mean it’s not desirable, though: I believe that genetics databases split each nucleotide out and use references to the nucleotide in allele descriptions rather than storing ‘ACCG’ etc. Ok time to zoom out again.
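To make the Author/Title fault line concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names, and sample people are all made up for illustration; a real library catalog would look different.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# 'Author' and 'Title' vary independently, so each gets its own table.
# 'First' and 'Last' are as granular as we go for names.
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        first     TEXT NOT NULL,
        last      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
""")

# Hypothetical sample data: one author with two books,
# and two different authors who each wrote a 'Biology'.
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [(1, "Ada", "Example"), (2, "Ben", "Sample")],
)
conn.executemany(
    "INSERT INTO book (title, author_id) VALUES (?, ?)",
    [("Biology", 1), ("Chemistry", 1), ("Biology", 2)],
)


def titles_by(last):
    """All titles written by the author with this last name."""
    rows = conn.execute(
        "SELECT book.title FROM book JOIN author USING (author_id) "
        "WHERE author.last = ? ORDER BY book.title",
        (last,),
    ).fetchall()
    return [title for (title,) in rows]


def authors_of(title):
    """Last names of everyone who wrote a book with this title."""
    rows = conn.execute(
        "SELECT author.last FROM book JOIN author USING (author_id) "
        "WHERE book.title = ? ORDER BY author.last",
        (title,),
    ).fetchall()
    return [last for (last,) in rows]


print(titles_by("Example"))
print(authors_of("Biology"))
```

Because each author is stored once and books only point at them, fixing a misspelled name is a single-row change no matter how many books that author wrote.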
Many people use databases and they can get very complex. So far I haven’t found many people with the technical skills to manipulate them and even fewer who can give a good conceptual overview. Hopefully the above is serviceable. Please do shoot me an email if I’ve gotten it wrong.
Data Warehouses (‘Atomic Data Stores’)
Large organizations (particularly in Healthcare and Finance) use Data Warehouses to analyze massive amounts of static data. The entire industry of ‘Business Intelligence’ is pretty much predicated upon their existence. They are a type of Relational Database utilizing a special data model called a Star Schema (or, when it’s more complex, a Snowflake Schema). Basically, the schema collapses the standard relational model down into Facts (the ‘atom’ of atomic) and dimensions. Roughly, the Facts correspond to ‘measurements’ you’ve made and Dimensions are aspects you want to analyze. It’s all a design question, though.
The difference between a data warehouse and a regular relational database is that the latter is operational: it’s updated on the fly. Warehouses are for storage and analysis only. Their schemata (or, for mere English speakers, ‘schemas’) are optimized for retrieval and comparison. If you start trying to update the ‘facts’ you’ll have to go through all of them to make sure you don’t introduce any irregularities.
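Here is a rough sketch of what a tiny star schema might look like, again with Python’s sqlite3. The healthcare-flavored tables (fact_visit, dim_department, dim_date) and the numbers in them are invented for illustration, not taken from any real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, with dimension tables around it (the 'star').
conn.executescript("""
    CREATE TABLE dim_department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_visit (
        dept_id    INTEGER REFERENCES dim_department(dept_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        charge_usd INTEGER  -- the 'measurement' recorded by each fact
    );
""")

conn.executemany(
    "INSERT INTO dim_department VALUES (?, ?)",
    [(1, "Cardiology"), (2, "Radiology")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(1, 2011, 11), (2, 2012, 1)],
)
conn.executemany(
    "INSERT INTO fact_visit VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 75), (1, 2, 250)],
)


def charges_by_department():
    """Total charges, sliced along the department dimension."""
    return conn.execute(
        "SELECT d.name, SUM(f.charge_usd) FROM fact_visit AS f "
        "JOIN dim_department AS d ON d.dept_id = f.dept_id "
        "GROUP BY d.name ORDER BY d.name"
    ).fetchall()


def charges_by_year():
    """The same facts, sliced along the date dimension instead."""
    return conn.execute(
        "SELECT t.year, SUM(f.charge_usd) FROM fact_visit AS f "
        "JOIN dim_date AS t ON t.date_id = f.date_id "
        "GROUP BY t.year ORDER BY t.year"
    ).fetchall()


print(charges_by_department())
print(charges_by_year())
```

The facts never change between the two queries. Only the dimension you analyze them by does, which is why the layout suits retrieval and comparison better than on-the-fly updates.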
(via Lew Hassel)
Above we talked about the field ‘Author’ and how it might be reasonably broken up into ‘First’ and ‘Last’ names. The details of this are part of a database’s schema. Think of a schema as a blueprint, or plumbing pipes, or electrical wiring. And just like those systems, if you didn’t plan ahead for certain functionality, chances are trying to add it will break the model. So to break the model above, all we have to do is try to add a ‘Co-Author’.
Imagine trying to add a sink where there wasn’t any plumbing. That’s realizing that what you’re trying to do wasn’t anticipated in the design. It’s frustrating.
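To see the missing plumbing in code, here is a small sketch (Python’s sqlite3, with made-up table names and sample data) of a schema that allows exactly one author per book, and the re-plumbing it takes to support a ‘Co-Author’.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original design: each book row has room for exactly one author.
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, last TEXT NOT NULL);
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Example'), (2, 'Sample');
    INSERT INTO book VALUES (1, 'Biology', 1);
""")

# There is nowhere to put a second author, so we add new pipes:
# a link table with one row per (book, author) pair, seeded from
# the authorships the old design already recorded.
conn.executescript("""
    CREATE TABLE book_author (
        book_id   INTEGER NOT NULL REFERENCES book(book_id),
        author_id INTEGER NOT NULL REFERENCES author(author_id),
        PRIMARY KEY (book_id, author_id)
    );
    INSERT INTO book_author (book_id, author_id)
        SELECT book_id, author_id FROM book;
""")

# Only now can 'Biology' get its co-author.
conn.execute("INSERT INTO book_author VALUES (1, 2)")


def authors_of(book_id):
    """Last names of all authors linked to a book, via the link table."""
    rows = conn.execute(
        "SELECT author.last FROM book_author "
        "JOIN author USING (author_id) "
        "WHERE book_author.book_id = ? ORDER BY author.last",
        (book_id,),
    ).fetchall()
    return [last for (last,) in rows]


print(authors_of(1))
```

Every query that used to read book.author_id now has to be rewritten to go through the link table, which is the frustrating part.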
Document-oriented databases try to circumvent all this by being built around ‘documents’ instead of into a rigid schema. Think of them like a modular house that you can plug together like legos: you will still get horrible results if you don’t plan to have things meet up, but the pieces can be more autonomous and so it’s easier to add on as you go.
Document oriented databases allow the functionality we love in Apps like GoogleDocs.
An interesting feature here is that ‘documents’ are really just a grouping of fields. There are no specified relationships between documents or fields (i.e. no foreign keys), so the relationship between the fields is latent and dependent upon the capabilities of the field’s data format. For instance if I want to know if a ‘document’ (say, this blog post) was created by the same author as the rest of the blog posts here, I’ll just search all the posts’ ‘Author’ field and see if they match up. This is a consequence of the document-oriented databases’ ‘flatness’- their lack of rigid relations.
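A minimal sketch of that ‘search the Author field’ idea, using plain Python dictionaries as stand-ins for documents. This is not any particular document database’s API, and the sample posts and field names are made up.

```python
# Each 'document' is just a grouping of fields. Documents don't have to
# share the same fields, and nothing links one document to another.
posts = [
    {"title": "Learning Clojure", "author": "Elliott", "tags": ["clojure"]},
    {"title": "Guest post", "author": "Guest"},
    {"title": "Databases", "author": "Elliott"},
    {"title": "Untitled draft"},  # no author field at all
]


def same_author(docs, target):
    """Titles of all docs whose 'author' field matches the target's.

    There is no foreign key to follow, so the only option is to look at
    every document and compare field values directly.
    """
    wanted = target.get("author")
    if wanted is None:
        return []
    return [doc["title"] for doc in docs if doc.get("author") == wanted]


print(same_author(posts, posts[0]))
```

Notice that the match depends entirely on the strings being identical. If one post said ‘elliott’ in lowercase, nothing in the data model would connect it to the others.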
The document orientation (‘persuasion,’ if you will..) is great for semi-structured content like blog posts. It would be horrible at tracking your bank account balances.
Object Oriented Databases
I don’t understand object oriented databases very well. More when I do.
In early November I was asked to talk about Data Curation for Art Markets research by the Duke Art, Law and Markets Initiative (DALMI) conference. For a small, interdisciplinary, international and growing field like Art Markets, data curation and sharing has the potential to be a powerful catalyst for growth. Currently, sourcing data from archives or previous publications can be a huge part of the work- before analysis begins. Economists may not have the historical background to collect or evaluate appropriate data, while Art Historians may not have the quantitative toolsets they need for effective analysis. Collaboration has been the solution for the field, but shared, vetted, appropriately described data is the way forward.
Omnigraffle is the Big Kahuna of diagramming programs. I use it extensively. There are complaints I have about it, but they are the kind of complaints you have when you find a piece of software so useful you know all of its quirks. That said, Omni Group’s software ergonomics pretty much beat anyone’s.
Visio is the de-facto standard for Windows diagramming. I don’t like to use Windows, so don’t have much experience creating content with Visio. You may like it though. Not to be confused with Vizio.
Gliffy is a neat web tool that wants to be a web-enabled version of OmniGraffle. Not only is it a great diagramming option for Macs, it’s one of the best I’ve found that works with Linux. (sorry, Dia…)
It runs on the web and there is a free version. It’s pretty powerful, and certainly more convenient than box-bound software. It’s not as powerful as Omnigraffle, though, and the interface isn’t as refined (for instance, there’s no equivalent to holding down alt to duplicate elements).
I used Gliffy back when I was just starting to learn about ER diagrams:
I haven’t found it useful enough to upgrade beyond the 5 documents you get for free. Instead, I bought Omnigraffle.
TechSmith’s Morae is an interesting- and very expensive- entry into the UX Analysis Space. It combines screen & webcam capture with indexed notetaking from multiple observers.
I’m sure there are more programs in the UX space. Stay tuned.
I went ahead and put Excel in strikethrough. It’s the go-to dataviz tool for the masses, and is great at what it’s designed to do. But if you have limited needs there may be something easier and freer. And if you have advanced needs, Excel’s extremely limited/hard to use visual formatting options will drive you up the wall, as they did to me during my art history thesis research.
Tableau is an expensive but interesting option geared towards business intelligence. I’m keeping an eye on them… (Not to be confused with Tableau, LLC, forensic data analysis hardware provider)
Google Fusion Tables
This is a labs product, so don’t expect fireworks. But take heart that Google has projects like this up its sleeve. Basically, you connect to or upload data and then select from some canned visualizations. The components of the vis are open to public comment. An interesting but mildly useful feature as implemented.
More to come! – Meanwhile, check out this sweet post on visualization by Hilary Davis. (Thanks to Alex Gallin for being the reason I found Hilary’s post)
Growing up, I was the kid who never took notes or did homework assignments. I would read a lot, but not always the things that were assigned for class… The upshot of all this is that I’m uncommonly knowledgeable about random things and have stunted organizational skills.
I do homework now (mostly reading), and man, it takes a lot of time and planning! I’ve had to impose order on my mind and on my time to get it all done. I started trying to think in a structured way about tasks, priorities, and my capacity to get things done.
There’s an App for that?
A few months ago I started looking into a to-do list program. There are millions of them. Some are one basic list, others have nesting and categorization, and some offer syncing across all your devices. The good ones seem to be expensive (though there are some free ones to try too). Dig through this list if you’d like to see some offerings.
So, not wanting a new system before I could prove to myself that I could use a to-do list at all, I decided to use what I already had. Last semester (Fall 2010) I started using the to-do list capabilities of Calendar for Mac, synced using MobileMe. Basic, sturdy, but with an egregious lack of support on the iPhone. The interface was clearly not tweaked, either. Apple seriously fell short on this. Even the recent major calendar update had no changes to the to-do list. Bummer.
But using a to-do list was great! I could put all my readings, etc in there and, though the list itself got HUGE, Calendar let me set it to only display those items due within the timeframe I was looking at (usually a week at a time). Which was great because I also started using the Mac calendar system heavily (again synced with MobileMe), and I could tick off items while planning my week.
Somewhere along the way I started to hear about the Getting Things Done system. This isn’t a computer program but a system described in one of those self-help type books on productivity. I’m extremely suspicious of such things. But I’m always willing to give a suspicious listen. The basics of the system are to state your goals, align your projects with your goals, break your projects into actionable items, and do your items within their necessary contexts. I started planning in the GTD style, and quickly found that a simple to-do list won’t, so to speak, do. May sound confusing at first, but, as I hope to show with OmniFocus, the software based on GTD I’ve decided to use, it’s pretty powerful.
I’m still not buying the book, though…
OmniFocus offers a 14 day free trial period. The first time I tried OmniFocus (spring 2010) I didn’t quite get it. Now I know that, before trying a heavy-duty productivity program,
you should first start living with a perpetual to-do list and get used to that, and
you have to have a LOT of things to do before nested to-do lists, projects, and contexts really pay off.
So my experience with the Mac to-do list wasn’t all for nought and my crazy grad school schedule ends up being a plus. I’d suggest picking a free to-do list solution, living with it for a week or two, and then doing the OmniFocus 14-day trial. Make sure it’s a busy time, too- otherwise you won’t see the benefits.
Aside: There’s no free trial on the iPhone version of Omnifocus, but, if you find yourself in the moderately-in-need of a to-do program category, this might be the better solution. It’s got most of the major functionality and if most of your to-dos aren’t connected to links to pdfs (i.e. grad school reading homework), you may not miss the stuff the desktop app has. Also, the desktop software is a whopping $80 ($50 for students), while the iPhone app is only $20. For those of you hesitant to buy software, seriously consider this. It’s a great app under active development.
Screenshot of my Projects Library. This view is where I do my planning, listing out all the actionable things I can think of for each project. There’s even a place, the Inbox, to dump to-dos and assign them to projects later on.
The Library is where you plan. Think of all of your goals, and make sure you’ve got a project that corresponds to each. So, if you want to keep in touch with friends, make the Keep in Touch with Friends Project. Then start listing the actionable items (buy stationery, set up facebook account, write a letter to Johnny, friend Martha on facebook) within that project. You can make items recurring (though the recurrence support is a little bit basic…), to support Write Martha a Letter popping up every 2 months, call Johnny every week, etc.
If other Projects are related to each other, such as your classes in grad school, just group them in a folder. You can even nest projects within other projects.
A great thing about organizing things this way is that you can highlight a project or folder and Focus (Ctrl-Cmd-F). This will hide everything not connected with the project(s) or folder(s) you selected. This means that you can forget about Martha long enough to get your schoolwork done. It also lets you make a to-do list of fun stuff (yes, fun stuff) that you can focus on during the weekends. Forgetting about other stuff is crucial to getting the thing at hand done, and OmniFocus supports it.
This is a focus on school tasks. Notice that in the 'Projects' view in the back, only school-related projects are there. In the 'Due Soon' window on top, I don't see all the to-dos I have for around the house or applying for summer internships.
The screenshot above includes an item, ‘Revise Personal Productivity Post’. …Check!
Notice that I have lots of folders within ‘School’ and they’re not all relevant all the time. So, for instance, when I’m doing my homework, I’ll focus only on the ‘Class’ folder to screen out stuff like the W-9 I have to fill out, etc.
Contexts help you tackle tasks that require similar resources – going to the Library, for instance – at the same time, regardless of which project they’re for.
Contexts are a way of looking at tasks through the lens of what resources you have. For instance, say you’re on your Mac and want to get some things done. The Mac context allows you to filter out tasks that require your girlfriend or being on campus, so you can concentrate on what you can get done now. I really like this view for things like getting library books, filling out paperwork in x office, and going to the grocery store.
Doing what you can now is a great way to be super productive. Need to grab a library book? Add the task, select the Library context, and put it on your calendar during the week. Any other books you add to the list in the meantime will be waiting for you in the Library context so you can make sure to get them all done when you’re there. Same thing with groceries, filling out forms in your department’s office, and responding to emails. You can even make a person a context, so you can remember to do specific things the next time you see your significant other or your tax guy.
Like OmniGraffle, another excellent product from the Omni Group, the interface is excellent. I have some small gripes with it here and there, but in most cases there is a way to do everything I initially complain about that makes better sense than how I would’ve done it. That’s the classic tradeoff in interfaces- intuitive may not always equal efficient.
Omnifocus has this great little menu bar that leads you right to your outstanding tasks
That said, in other cases, like the menu bar to the right, it works exactly the way you’d expect. You can tell it to display overdue (red) and/or due-soon (orange) tasks by context (as shown). The icon will always show you the total. (I was busy when I took the screenshot to the right…right now, the icon only says 2, one of which is writing this post…).
As a professional app, pretty much everything is customizable. So, above, where I told you that Red means overdue…well, you can pick any color you want. The fonts for various sections of the app are all separately formattable in the preferences. Keep that in mind and take a second look at those screenshots.
Upcoming: iPhone app review
OmniFocus supports syncing across multiple macs pretty well, but it also syncs with its spiffy iPhone app. I’ll review that app, with screenshots.
**Update: Mendeley is now the center of my scholarly workflow. In about 8 months of using it, I’ve accumulated 1500 scholarly papers from school, the lab, and my personal research, using about a gigabyte of cloud storage. How else could I have managed?
It offers syncing with a web account (500MB free), so my citations AND pdfs are always nearby. It allows group joining/sharing of PDFs (though the group has to be closed/restricted access, to avoid copyright violations, I suppose). It’s got an excellent full-screen PDF reader and annotator:
There’s even an iPhone app that lets you read or send citations and papers from your phone.
I really dig it. I’m using it to manage all of my class readings (extra love to professors who provide links to PDFs), take notes, and collect interesting citations. Along with OmniFocus, it’s an indispensable part of me getting things done.
Highly recommended. It’s free-as-in-beer, which means you get some in the hopes that you buy more. I think it’s definitely worth it. I’ll likely start paying for web space once/if my 500MB gets filled.
Ed.: I started paying for extra space, then got mondo space for free when I became UNC’s Mendeley Advisor in May 2011. But I’d still be paying the $5/month even if it wasn’t free. It’s that awesome.
**[Older] Update: I got started on Mendeley. Am setting it up now.
Mendeley does support Zotero and CiteULike live-updating!
I also have set up Zotero, but don’t really like it because it’s stuck in Firefox. I use Safari most of the time because Firefox is so dern slow.
Mendeley's link-based importer in my QuickBar in Safari
Even if I don’t ever use Zotero myself, this integration means that I can join Zotero-based groups. That was a big concern for me: making sure the system I chose could play well with others. I’m at a grand total of 3 sources right now.
So far I’m impressed with Mendeley’s link-based import (even though it utilizes pop-ups).
As I put Mendeley through its paces, I’ll pay particular attention to:
Adding web-based sources
Adding sources directly from the Library’s page
Annotating and noting on sources (e.g. pp. 112-123, or ‘as contra Foucault’)
Searching through online groups and publicly available bibliographies
The below was written before I had chosen a system to try:
When I was an Undergrad at Duke they provided Endnote (it was v4 back then if I recall) as a citation manager. The newer versions of Microsoft Word have had citation support through the powerful, underutilized, and clunky document field functionality (the same features that will give you a Table of Contents or even a linked index).
RefWorks is UNC’s offering to its students and faculty. It’s a web-based citation tool and, like all niche tools, has some core functionalities that make it worth considering and some drawbacks that users love to hate.
Zotero is the free open source offering. I’m a big fan of free open source, so I really want Zotero to rock. And it looks like it’s come a long way and does rock. See niche software comment above, though. It’s first on the list to test because if it’s good enough I’m done looking. I’ll not be at UNC forever, and I don’t want to have to pay for access to my citations if I am using RefWorks.
CiteULike is another contender. Recommended by University of Chicago’s Mark Olsen.
Mendeley is another contender, with both a web and an application-based client. First mentioned to me by Jason Priem.
I’m currently selecting from these alternatives because I desperately need to manage my lit. Stay tuned for thoughts and reflections.
Aside: George Mason’s Center for History and New Media is the impetus behind Zotero. As this software becomes more and more useful in the production of scholarly work, it’s interesting to factor a project like Zotero into the mix. The scholar who provides serious and ongoing help to the production of Zotero may benefit his field- and all of academia- in a way that’s hard to measure. Ultimately, unless his institution supports this contribution, his open-source contributions will be deleterious to his tenure chances. I suppose the non-tenure track staff/associate research prof plug belongs here, as well as a big up to tenured profs who spend time & energy on open source.
The department of Medical Informatics at Columbia University maintains the Medical Entities Dictionary, a concept-oriented repository with links to many of the vocabularies and codes below.
The Department of Veterans Affairs (VA) maintains an open source health records/clinical support system called VistA.
HL7 is a pioneering data interchange format for health data. It focuses on interoperability between different formats. This allows specialist medical fields to choose best-of-breed equipment and information systems without compromising the ability to communicate with other aspects of the health system.
LOINC is a controlled vocabulary for clinical and laboratory results. This allows standardization of observations like “patient had a temperature of 105”, “Strep positive”, etc.
CPT is an established standard of medical procedures maintained by the American Medical Association. For instance, ‘26010’ is ‘Drainage of finger abscess; simple’.
ICD-9, ICD-9-CM, and ICD-10 are standardized, hierarchical controlled vocabularies that have their origin in linking billing with the patient’s chief complaint. Like all controlled vocabularies, they have their problems. Also, since they’re used for billing, they’re not perfectly representative of diagnoses, etc. They demonstrate the huge potential of standardization, though, and are the basis of many secondary analyses of clinical data.
SNOMED is another medical controlled vocabulary that focuses on coverage scope and multilingual support.
UMLS is maintained by the National Library of Medicine. It is the most comprehensive system (6.4 million concepts from over 100 source vocabularies) and subsumes and interrelates many of the systems above. It includes a Metathesaurus, a Semantic Network, the SPECIALIST Lexicon, and the MetamorphoSys configuration tool. There have been concerns with this service’s efficiency and precision, but it’s pretty much the only attempt to unify so many standards.
This site is a collection of cool tools, articles, and links I’ve found. It’s for my own reference and convenience and I hope it helps you too! If this site ever seems a little out of date, well, check me on Academia, Linkedin, or GitHub because I’m probably up to something fun. And, as of 2012 I’ve been using my G+ page as a blog of sorts.
In general, the topmost categories below are most current.
You’ll find my Resume & CV under the Pages to the left.