Inching closer to a viable PDF workflow

05/28/2012 8:27:06

When I was in medical school, various professors and attendings would give out paper copies of journal articles that they recommended we read. I would tear out copies of journal articles I found interesting from the few (not so good) journals that actually came to my house. More often than not, I would tear out articles with the intention of reading them, but instead they would end up in an ever-growing pile, never to be looked at again. After only a year or two, this collection of articles had grown large enough to take up several large boxes. I tried to start a filing system, and began organizing the articles according to whatever seemed to be the central topic, a Sisyphean task since most articles of interest cross boundaries into several topics.

At that moment, however, I quit my Emergency Medicine residency. As I would be moving again, and spending an unknown amount of time in a limited amount of space, I realized I would never go back and read all those unread articles. So I did the only sensible thing. I threw them all away. It was very freeing to get rid of the clutter. I was relatively inexperienced, and the reality is that my selection of articles was probably not very good, but it does sadden me that there is no way to go back and thumb through that collection. But I never would have carted those boxes around, so they had to go….

Fortunately, something key happened between medical school and my second residency: everything moved to PDF. Virtually every journal article I need now comes in PDF format. The few I need that our library does not subscribe to I can order online through the InterLibrary Loan (ILL) program, and they arrive in my inbox as a PDF - sometimes a pixel-perfect downloaded PDF, and sometimes a scanned PDF. But either way, no more piles of paper.

But I still am left with the problem of organizing, searching, and yes, remembering to read all those PDFs…. Fortunately, I am finally starting to find tools that work fairly well for accomplishing these tasks.

Because I use a Mac, many of these tools are Mac-centric. If you’re reading this article, and don’t use a Mac — I really have to question why. There are just no tools like this available for any other platform. Once you see how powerful they are, I can’t imagine doing this any other way.

Obtaining PDFs

Most of the PDFs I read relate to medicine, so I discover articles using tools geared for medical literature. I am sure similar tools exist for fields outside of medicine, but I am not familiar with them.

PubMed: the de facto place to look for information. The search features of the site aren’t bad, but they’re not great. What is great, however, is that the site links to the full text version of articles when available, and also indicates which belong to a growing collection called PubMed Central - freely available full text regardless of subscription or institutional affiliation. Even better, you can set up saved searches through “My NCBI” that will run every week and email you new citations matching your criteria. I am up to 15–20 saved searches that arrive in my inbox every Saturday morning and keep me apprised of the latest articles on various topics of interest.

Pubget: a relatively new site that provides an alternate front-end to PubMed searches. The search seems a little easier, but perhaps a little less powerful. But what I like is that it makes it easier to get to the PDF, provided that your institution is one that is configured to work with the site. They were helpful in adding my institution when I asked, and it didn’t take too long. They have an iPad app, but I don’t find it very useful.

Google: sometimes it really is just as easy to just search the web. I enter the article’s title, followed by "pdf" (in quotes). I usually do this if I am unable to find a PDF through other means, and sometimes I am surprised with what I can find this way.

Chrome search bar: whatever site you use to search, do yourself a favor and configure your browser to search it when you use a keyword. I simply type “pubget:” or “pubmed:” in the address bar of Chrome, and then a search phrase. I am instantly taken to the appropriate site to get the results of my search. Very useful.
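A keyword search like this boils down to a URL template with a slot for the search phrase. Here is a minimal sketch of the idea in Python. The `expand` helper and the `TEMPLATES` table are hypothetical names for illustration, and the URL templates are assumptions that may not match what each site currently uses:

```python
from urllib.parse import quote_plus

# Hypothetical keyword -> URL template table; "{q}" marks where the query goes.
# The templates are assumptions: check each site's current search URL.
TEMPLATES = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/?term={q}",
    "google": "https://www.google.com/search?q={q}",
}

def expand(typed: str) -> str:
    """Turn 'keyword: search phrase' into a search URL."""
    keyword, _, phrase = typed.partition(":")
    template = TEMPLATES[keyword.strip().lower()]
    # quote_plus encodes spaces as "+" and escapes characters such as quotes.
    return template.format(q=quote_plus(phrase.strip()))

print(expand("pubmed: diagnostic error"))
```

In the browser itself you would only enter the template (with the browser's own placeholder in place of `{q}`) in the search engine settings; the code just shows what happens to the phrase you type.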

Organizing PDFs

Once I find and download a PDF, I need to do something with it. My first step is to save it to one of two folders — “Papers to Read” or “Papers to File”.

The next step for me is to tag the files with various keywords. Sometimes I do this when I download. Sometimes I do it when I read the article. Sometimes I forget to do it, and have to come back later.

I use the OpenMeta “standard” for tagging. It has its pros and cons, but there are very good tools available now that support it, and it becomes really easy to organize documents. Apple may choose to break it in the future, but something else will almost certainly take its place. And it is soooooo flexible.

I use Tagger by Ali Rantakari to apply tags to various documents. It’s a simple, straightforward app, but where it excels is in making it easy to apply tags to the document you are interested in. I simply call the app from the Spotlight search bar, and it knows that I want it to work on whichever documents are in the frontmost window. If I’m in the Finder, it opens whichever documents are selected and I can batch tag them. If I’m in Preview, it opens the document for the frontmost window. Same for almost any other app that I use. I also created a command line shortcut so I can tag files easily from the Terminal. (Disclaimer: Ali wrote some other code that I used to build MultiMarkdown Composer.) There are plenty of other apps that support OpenMeta tagging, but Tagger is my favorite. If you like it, please consider making a donation to Ali to support his great work.

Separately from the tags that are applied, however, I need a place to put all of these PDFs. The system that has turned out to work the best for me is to put them all in a master folder in my Dropbox folder. This way they are available to me via the iPhone Dropbox app wherever I am. The search feature on the iPad/iPhone app isn’t great, but it usually will let me find the article I am interested in so I can share it with students and residents on the fly. (If you decide to join Dropbox, please consider using my referral link - we will each get a little extra space on our accounts).

Within that master folder, each PDF is stored inside a folder named after the first author’s last name. The PDF filename is the title of the paper. Some people want really complicated hierarchies (e.g. author/journal/year or something). I simply wanted a way to keep the list of PDFs short enough within any particular folder to be manageable. I can simply type the author’s last name in Spotlight, and open the folder for that author to see all of his or her articles. I don’t have enough articles from any single first author to overwhelm this approach. Sure, occasionally articles get blended from different authors sharing the same last name, but it doesn’t affect my ability to find the articles. And remember, I rarely dig through this hierarchy to find what I’m looking for. I just need someplace to put these PDFs so that they aren’t a total nightmare to look at.
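The author/title scheme is simple enough to write down as a rule. This is only a sketch in Python: `filed_path` is a hypothetical helper, the folder name is made up, and replacing "/" and ":" in titles is my assumption about characters that cause trouble in Mac filenames:

```python
from pathlib import Path

def filed_path(master: Path, author_last: str, title: str) -> Path:
    """Return master/<author last name>/<title>.pdf for one paper."""
    # Assumption: "/" and ":" are awkward in filenames, so swap them for "-".
    safe_title = title.replace("/", "-").replace(":", "-").strip()
    return master / author_last.strip() / f"{safe_title}.pdf"

# Example with a made-up folder, author, and title.
print(filed_path(Path("Dropbox/PDFs"), "Smith", "Sepsis: a review"))
```

Keeping the rule this flat means each folder only has to hold the papers of authors sharing one last name, which is the point of the scheme.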

I use Papers to actually store my PDFs into those folders. I have written before about Papers. It’s a beautiful-looking app. It makes it really easy to match a PDF with the appropriate citation information. It will automatically file the PDF once you match that information (e.g. into my author/article title.pdf scheme). It sucks at synchronizing with the iOS version of Papers, however. Papers is the app that I keep coming back to because no one else has come up with something better. I recently tried Papers 2 again, and it seems to be a little bit better. We’ll see if I pay to upgrade. But again, where Papers does excel is in taking a random PDF and helping you match it to the appropriate citation information and name the PDF file appropriately. Once the PDF is tucked away, I usually delete it from Papers (the database seems to get corrupted often, so there’s no point in saving that information only to lose it later; perhaps Papers 2 is finally better on this point). Papers can also help with setting up bibliographies, but I’ve never used it for this (I use BibDesk).

Tagging System

Everyone needs to come up with a tagging system that works for them. There is no “one size fits all” approach. But please do yourself a favor — always tag in lowercase, and all of your tags should be a single word (e.g. toread, not to read). You’ll thank me later.

Tags help me organize my documents by content (independently of where the PDF is actually stored). For example, all of my medical PDFs get tagged with medicine; this is mainly to help me limit searches when necessary. Articles I still need to read get tagged with toread. Other tags relate to specific specialties within medicine, or areas of interest, or projects I am working on, etc.

For the tags to be useful, you must have enough of them to distinguish one topic from another, but not so many that you can’t remember which tag would have been applied to a given situation. I recommend that you periodically revisit the tags you have in use and look to see if any can be merged, deleted, split, etc.

I also recommend that whatever tool you use to apply tags should offer autocompletion. This helps you keep your tags consistent, so that you don’t end up applying different variations of a tag to different documents (e.g. toread, to read, things to read, todo, etc.).
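To make the lowercase, single-word rule and the autocompletion idea concrete, here is a small Python sketch. The `normalize` and `complete` helpers are hypothetical and are not how Tagger or any other app implements this:

```python
from typing import Iterable, List

def normalize(tag: str) -> str:
    """Lowercase a tag and squeeze out whitespace, e.g. 'To Read' -> 'toread'."""
    return "".join(tag.lower().split())

def complete(prefix: str, known: Iterable[str]) -> List[str]:
    """Suggest existing tags that start with what has been typed so far."""
    p = normalize(prefix)
    return sorted(t for t in known if t.startswith(p))

known_tags = {"toread", "medicine", "education", "physicalexam"}
print(normalize("To Read"))
print(complete("ed", known_tags))
```

Suggesting only tags that already exist is what keeps variations like "to read" and "things to read" from creeping in.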

Finding a Needle in a Haystack

Finding that article you vaguely remember reading that relates to the matter at hand is the whole reason for doing all this work. I have a couple of ways to find specific articles depending on what I’m doing.

One of the easiest is simply to use Spotlight. The reason I stuck with OpenMeta tagging is that it is Spotlight-compatible. Which means a quick “Cmd-Space” and I can search via something like “tag:education tag:physicalexam” to find articles that relate to teaching the physical exam in medicine. You can either pick an article right from the Spotlight list, or use “Show All in Finder” to create a new window with your chosen articles. You can do amazing things with Spotlight - it’s worth reading more about. If you really want to channel your inner geek, you can run Spotlight searches from the command-line and do other things with the results. But that’s getting outside the scope of this article.
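For the curious, a command-line Spotlight search can be driven from a short script. The Python sketch below wraps the macOS `mdfind` tool. It assumes OpenMeta tags are indexed under the attribute `kMDItemOMUserTags`; the helper names and the folder path are hypothetical:

```python
import subprocess

# Assumption: OpenMeta tags are indexed by Spotlight under this attribute name.
TAG_ATTR = "kMDItemOMUserTags"

def tag_query(*tags: str) -> str:
    """Build a Spotlight query string that requires every given tag."""
    return " && ".join(f'{TAG_ATTR} == "{t}"' for t in tags)

def find_tagged(folder: str, *tags: str) -> list:
    """Run mdfind (macOS only) and return the matching file paths."""
    cmd = ["mdfind", "-onlyin", folder, tag_query(*tags)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

# Only builds the query; call find_tagged("/path/to/PDFs", ...) on a Mac to run it.
print(tag_query("education", "physicalexam"))
```

The returned paths can then be fed to anything else, which is where the "do other things with the results" part comes in.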

Another interesting thing you can do with Spotlight is to save your search in the Finder. You can come back to the same search any time, and it’s always up to date. I haven’t found this so useful, but you may.

I also use another app from Ali Rantakari, TagLists, to find articles. I set up a couple of lists in this app, such as “toread && (diagnosis || cognitive || decision || error)”. This list shows me the articles in areas of interest that I have not read yet, for example. I rarely, if ever, use this app to find a specific article.
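A list expression like that is just a boolean test against each file's set of tags. Here is a minimal Python sketch of an evaluator for it. It is not TagLists' actual implementation, the `matches` helper is hypothetical, and it assumes && binds more tightly than || and that the expression is well formed:

```python
import re

def matches(expr: str, tags: set) -> bool:
    """Evaluate a tag expression built from tag names, &&, || and parentheses."""
    tokens = re.findall(r"&&|\|\||[()]|[^\s()&|]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "&&":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            take()  # consume the closing ")"
            return value
        return tok in tags  # a bare word is true if the file carries that tag

    return parse_or()

expr = "toread && (diagnosis || cognitive || decision || error)"
print(matches(expr, {"toread", "error", "medicine"}))
```

A file shows up in the list only when it is still tagged toread and carries at least one of the topic tags.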

As I mentioned earlier, because my PDFs are all in a folder in my Dropbox account, I can use the Dropbox app on my iPhone to search for an article on the go. If something comes up on rounds and I know I have a good article, I can email it to the team before we get distracted by other things.

Finally, we come to DEVONthink. DEVONthink is a fantastic tool. It is not so aesthetically pleasing. It is complex and sometimes not very intuitive. But it’s really powerful and does some great things. I cannot begin to explain how to use this app, but I can tell you some of what I have learned that changed how I use it.

DEVONthink

First, there are several versions of DEVONthink. I use the Personal version, but have also tried the other two versions at various times. They get a bit pricey, and I suspect most users never take full advantage of the advanced features.

I don’t recommend storing your PDFs inside the DEVONthink database itself. If you do that, it becomes difficult to access them in other ways. What I did was create my DEVONthink database, and then use the File->Index… command to have DEVONthink index several folders but to leave the files in place. I index two folders - the Dropbox folder containing my PDFs as described above, and the folder I use to store all of my text notes (which are shared between nvAlt on my Mac, and MultiMarkdown Composer on my iPhone/iPad — that’s also a separate article….)

Both sets of data (PDF and text files) share a common tag set (so I can search via Spotlight or whatever). DEVONthink can read the OpenMeta tags, and synchronize them to its own tag system. Though to be honest, I’m not sure that the tags are really all that useful inside of DEVONthink; they could do more with it.

Where DEVONthink excels is search. Once it indexes your files, it has a lightning-fast search feature. Whenever you view a file, it can show you similar/related files. If you store your files in the database, you can create an organization structure in the database and it learns which files belong in which folder and can automatically file them for you. In practice, however, this gets really tricky and you end up accidentally creating duplicates of your files and causing yourself a bunch of headaches. So I stopped doing this, but others most certainly find it useful.

I can’t even begin to scratch the surface of what DEVONthink can do. I don’t use it all the time, but I do use it to index my documents, and it’s useful for the more challenging searches.

The latest updates to DEVONthink may have finally gotten synchronization with DEVONthink To Go working. DTTG is the iOS version of the app. It’s fairly limited, but does make it easy to read your PDFs on your iPad. I’ve tried using this in the past and always grew frustrated, but recently started using this again and it seems to be working better. We’ll see.

Reading Your PDFs

At some point, I have to quit wasting time by playing with organization, and tagging, and actually have to do the work of reading these articles. At one point, I would print them to paper so that I could read the PDFs, easily take them from home to the hospital and back, take notes, etc. Then I realized that after reading the article I had a choice — keep the paper copy, or throw it away. Obviously, keeping all of the paper copies defeats the point of going digital, but throwing it away loses the highlighting or annotations that I had made.

This prompted me to start reading more of the PDFs electronically. I sometimes read them on my Mac in the Preview app. I can then use Preview’s annotation tools to highlight certain passages (I rarely write anything in annotations, but often highlight).

But I still struggled with a good workflow for reading on the iPad. It seems that recent updates to DEVONthink have offered a solution.

I tag unread articles with toread. In DEVONthink on my Mac, I created a smart folder that shows me articles that are tagged toread and also carry whatever tags I’m interested in at the moment. I then mark those files as unread within DEVONthink (which is independent of whether they are tagged toread). I can then grab a bunch of them and replicate them into the “Mobile Sync” folder in DEVONthink. It’s important that you replicate the PDFs, not duplicate or move them. Replicating is like making an alias in the Finder. The PDF appears in two places, but the app knows it is still a single file. You can delete the replicant without deleting the original.

I then sync with DEVONthink To Go on my iPad and iPhone. This gives me a portable “reading list” that I can take with me for whenever I have time and want to catch up on some articles. In DTTG, I simply look at the “Unread Items”, pick one of interest and read it. It will be marked as read, and when I next sync with my Mac, the read status will be reflected there. I can also flag an item or label it on the iOS device, and that will make it back to the Mac as well.

DTTG does not support annotations however. This means I can’t see any highlighting in the article (not a big deal), but I also can’t highlight something for later. Fortunately, GoodReader handles annotations and then some. Again, I can’t begin to go into all the features GoodReader offers. But what is important here is that I can send the current PDF from DTTG to GoodReader, annotate it in GoodReader (e.g. highlight, add text, draw on it, whatever) and then send the PDF back to DTTG. DTTG recognizes the PDF as being an updated version of the same article. And even though you can’t see them, the annotations are still there. When you next sync with your Mac, the updated version of the document will be uploaded, and saved in the appropriate place on your Mac. Complete with annotations. And the best part is that the OpenMeta tags are preserved, since DEVONthink on the Mac understands them and keeps them intact when you upload the modified version of the PDF.

This means that I can read and highlight a PDF on the go, and have my changes saved back to my master library when I next sync.

You can’t change tags on your PDF on the iPad, however (at least not yet?). So what I do is label the files with the “To Do” label in DTTG. When they get back to my Mac, a Smart Group will gather them so I can go back and tweak each file as needed (e.g. remove the toread tag, apply other tags, etc.).

Finally, I have a Smart Group in DEVONthink on my Mac that looks for documents tagged toread but that have been marked as read. I verify that these documents have in fact been read, and then remove the toread tag. At that point I know I’ve read the article, and don’t have to think about it again until I decide to search for it for some reason.
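The rule behind that last Smart Group is easy to state precisely. The Python sketch below models it with a hypothetical `Doc` record. It is not DEVONthink's API or data model, just the logic of "tagged toread, but already marked as read":

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    name: str
    tags: set = field(default_factory=set)
    unread: bool = True  # read/unread state, tracked separately from the tags

def needs_cleanup(doc: Doc) -> bool:
    """True for documents still tagged toread that are already marked as read."""
    return "toread" in doc.tags and not doc.unread

def finish(doc: Doc) -> None:
    """Drop the toread tag once the article has been verified as read."""
    doc.tags.discard("toread")

library = [
    Doc("a.pdf", {"toread", "medicine"}, unread=False),
    Doc("b.pdf", {"toread", "education"}, unread=True),
]
print([d.name for d in library if needs_cleanup(d)])
```

Keeping the read state and the toread tag separate is what lets the two disagree briefly, and that disagreement is what this group surfaces.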

Limitations

This system isn’t perfect. It works really well for me, but might not work for you.

OpenMeta is a Mac standard; it won’t work on other operating systems. OpenMeta is not supported on iOS devices, so you have to go through an app that supports tags in its own way (e.g. DEVONthink and DEVONthink To Go).

Not everyone likes tagging. I previously thought it was something of a waste of time. I don’t tag everything, just my PDF collection, and my text note collection. I can tell you that having these tags in place makes Spotlight searching much more productive as I can zero in on exactly the sort of files I want. It becomes easy to see what articles I have not read yet that relate to a certain topic. I can quickly create a saved search when I am working on a lecture or an idea so that I can see a bunch of related documents together. It works for me, but perhaps not you.

In any event, hopefully this sparked some ideas for someone, and I welcome feedback and comments!

Old Comments

By Fletcher on May 28,2012 10:59:43

Testing that comments and OpenID still works.

By Asaf on May 29,2012 07:15:44

Thank you for a very useful review of your PDF workflow.

You might want to check out Bookends as an alternative to several of your applications: detection of PDF metadata, reference management, and seamless synchronization with their mobile version.
I would be interested to hear how you keep and organize notes about your readings; only in your “text notes”? Do you also save the underlined sections from your PDFs as notes?

Thanks,

Asaf

By Fletcher on May 29,2012 20:45:51

The way I keep my notes depends on what I’m doing.

Long-term notes/brainstorming ideas go in nvAlt on my Mac, which synchronizes to my iPhone/iPad via Dropbox.
I don’t take many notes in the PDFs themselves - most of my annotations consist of highlighting to help me quickly review articles down the road.
When I’m in the actual writing/preparation phase for a talk, I will often print the key articles and jot notes on paper - I then tend to spread things out on a big table/counter to help organize. The last talk I prepped for I even ended up taping things all over the walls like some movie. I’ll probably end up doing that again.
I don’t currently export the highlighted annotations from my PDFs.
I have not looked at Bookends lately.

By Howard on June 16,2012 16:41:26

A good research workflow is something I am super passionate about. I actually developed mine while writing my dissertation, which was maybe not the best time to do it, but it worked out. The nice part about developing my workflow under those conditions was that it was tested under an intensive endeavor. If you’re ever feeling so inclined, you can check out some of my post-dissertation accounts on my wordpress blog. At the end of the day, Papers was the hub of my project. I used the built-in text editor “attached” to each PDF in Papers to write the sections of the literature review. So, as I was writing each article I was not so much annotating as I was composing the text as it would more or less appear in the dissertation. Super efficient. I could go on and on, but I’ll leave it to you to review using the link if you so choose.
I was surprised to see that Papers’ database becomes corrupted so often for you. I cannot remember this ever happening to me, and I have been using Papers since 2007; at this point I have just under a thousand articles, which I know is not a ton, but substantial enough, to be sure (by the by, Papers 2 was a disappointment when I gave it the trial run. I don’t see myself paying to upgrade). I couldn’t help but feel that, although this workflow functions nicely for you, it involves a few extra steps, which in part is due to a corrupt Papers database. Nevertheless, it gave me some good ideas. I am learning LaTeX, and went ahead and fired up Bibdesk after I read that bit of your post. Keep up all the good work, and thanks again for MMD!

By Bill on September 3,2012 12:44:23

I have been looking for a useful solution for my Mac (at home). I’d like to get something as simple as my work (Win7) setup.

On Windows (at work), I get my papers from JSTOR and from the various publishers’ websites or the authors’ preprints. My workflow is very simple:

1. OCR the document if it is not already.
2. Store using paper name (author, year) in a documents hierarchy (rather loose).
3. Index the entire document using X1.

Hence I just need to use some text to recall papers on the topic in X1. Occasionally I’ll add keywords or comments with Acrobat. If I want to I can also stick new pages (e.g. extensive commentary) onto the document, and that is indexed as a part of the document. A side benefit is that I can search for documents that reference another article…

Is there anything like X1 for the Mac?

By Fletcher on September 7,2012 13:59:45

Bill,

I’m not familiar with X1, so I’m not sure about a similar app for the Mac.

DevonThink (at the higher end versions) offers an OCR feature, as well as full text indexing of the document and notes. If you haven’t checked into that yet, it’s worth a look.

If that doesn’t look like it works, add a couple of more details about what you’re looking for specifically, and I can brainstorm a bit more…

By Fletcher on September 7,2012 14:03:33

Howard,

Just catching up on podcasts, and heard your segment on Mac Power Users. Congrats!

When I tested Papers, especially with iOS sync, the database would usually become corrupted within a few hours (or even minutes) of testing, and with a database size of tens to maybe 100 PDFs. Nothing that should tax its capacity in the least.

More frustrating was that my attempts to share info and get support on the user forums led to my posts being edited by the moderator (with no indication to others that they had been edited) in a way that basically removed my criticism of the app….

I’m glad it works for you, and others. I’m not willing to trust my data to it any more.

That said, the workflow for identifying a PDF of a journal article is fantastic!

By Phil on March 12,2013 16:12:30

Hi Fletcher,

I enjoyed this article very much. One problem I am running into with OpenMeta tags is that they get deleted from a PDF if you annotate the PDF in many popular iOS apps, such as PDF Expert or iAnnotate. Then, if using Dropbox for sync, the PDF gets synced back to the Mac without any tags.

Have you noticed or experienced this? If so, have you found any workarounds?

Thanks.

By Fletcher on March 12,2013 22:33:57

Unfortunately, iOS doesn’t offer native support for file metadata, so OpenMeta tags do not work with any iOS apps to date. Not sure why Dropbox doesn’t at least preserve them or support them in their API, but they don’t.

I’m experimenting with a new workflow, and unfortunately it doesn’t really support OpenMeta either.

By Fletcher on September 9,2013 20:26:03

An update about Papers for Mac.

First, a disclaimer. I purchased version 1 of Papers. After working with the developers there a bit on a different issue, I was given a license code for Papers 2.

I have now been using Papers 2 (currently 2.6.4) for quite a while, and it has been much more reliable than version 1. There were still a few sync issues with iOS, but I found that if I ensured that only one iOS copy was running at a time (e.g. not iPhone and iPad at the same time) that things seemed to be reliable.

When I get some free time, I was thinking about writing an updated workflow article but haven’t had time yet. I did, however, want to mention that Papers 2 finally seems to be stable enough to use for real work, including iOS sync.

Fletcher Thompson Penney owner@fletcherpenney.net