About this site

This website provides data on the number of records in UK Institutional Repositories over time. Data collection began in late summer 2006 and has continued weekly ever since.

What is this?

This website shows some basic information about the number of records in UK-based Institutional Repositories (IRs). Perhaps its key feature is that it has collected this information over time, so it can show the historical growth of various IRs. See: table showing number of records in UK institutional repositories over time.

What can you do on this site?

  • See a table showing the number of records in UK IRs, with a column for either each week or month since July 2006. The table can be re-ordered and exported as a text file.
  • Compare a number of IRs, and see a pretty graph.
  • More features will hopefully be added over time.

How does it work? Where does it get the data from?

Once a week (on a Tuesday, for no good reason) a simple script collects data from the Registry of Open Access Repositories (ROAR) based at the University of Southampton (ECS). The script accesses this simple text list of repositories (see the ROAR FAQ for more) and adds the total of records (and a few other details) for each IR which is registered in the UK to a database. This script is written in Perl and is rather crude (like everything I write!): it has little error detection and will happily record nothing if it cannot connect to ROAR. The pages you are reading now are written in PHP and simply extract data from the database table.
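The weekly collection step described above can be sketched roughly as follows. This is an illustration in Python, not the actual Perl script. The tab-separated layout of the listing, the field values, the table name `record_counts` and the function names are all assumptions made for the example, and the real ROAR list may well look different. Unlike the crude original, this sketch refuses to store a snapshot when nothing was parsed.

```python
import sqlite3
from datetime import date

# Hypothetical listing format: one repository per line, tab-separated as
# name, country code, repository type, record count. The real ROAR list
# may be laid out differently.
SAMPLE_LISTING = (
    "Example University Repository\tuk\tinstitutional\t1200\n"
    "Another Archive\tus\tinstitutional\t5400\n"
    "Example Subject Archive\tuk\tsubject\t800\n"
)


def parse_uk_irs(listing: str) -> list[tuple[str, int]]:
    """Return (name, record_count) for UK institutional repositories."""
    rows = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, country, repo_type, count = fields
        if country.lower() == "uk" and repo_type == "institutional":
            rows.append((name, int(count)))
    return rows


def store_snapshot(conn, collected_on, rows):
    """Insert one weekly snapshot; refuse to record an empty result."""
    if not rows:
        raise ValueError("no repositories parsed; nothing recorded")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record_counts "
        "(collected_on TEXT, name TEXT, records INTEGER)"
    )
    conn.executemany(
        "INSERT INTO record_counts VALUES (?, ?, ?)",
        [(collected_on.isoformat(), name, count) for name, count in rows],
    )
    conn.commit()
    return len(rows)


# An in-memory database stands in for the site's real database.
conn = sqlite3.connect(":memory:")
stored = store_snapshot(conn, date(2008, 3, 4), parse_uk_irs(SAMPLE_LISTING))
```

Keeping one row per repository per collection date is what would let pages like the full table show a column for each week.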

Lies, more lies and statistics

It would be silly to base success on, or use as a measure of performance, the number of records in an IR. This site has the potential to encourage people to compare IRs by record count and conclude one IR is doing 'better' than another. The number of records really isn't a good measure of anything. IRs are primarily about full text documents and articles, not simply records. Some organisations will have historic publication databases which they may put into their IR. Plus, IRs have different criteria: those with a wide remit, such as including learning objects, images, even newsletters, may well have higher counts than others, though this says nothing about which has the largest pool of research.

What's a more successful IR: one with many records, or one with a few but each with a publicly available full text file of the research? One with many basic records of poor quality, or one with only a few that are complete and accurate? One with many records for the Library's Annual Report, Newsletter, PowerPoint presentations etc, or one with fewer items but each being a piece of high quality research that has been published in a highly regarded journal? This website can do nothing more than what it does, which is to show the number of records in each IR. Use caution when drawing conclusions from this data.

ROAR. I've noticed that the numbers in ROAR can sometimes (though rarely) fluctuate a little, and the number recorded here relies on a repository's OAI interface working correctly and accurately. It should also be noted that, from what I can tell, ROAR connects to each IR every few days. There is therefore a delay between new records being added to an IR, their being recorded in ROAR, and then, once a week, their being recorded by this website. So when this website shows a particular number of records for an IR on a given date, bear in mind that the IR in question may actually have had more (or even fewer) records on that date.

So this website is at best inaccurate and at worst misleading?

Yes.

So no use at all then?

Well, actually, some may find it useful as a guide to growth over time of UK IRs, based just on the number of records, not full text items.

Full text estimates

ROAR records two extra values for a number of archives: "PDF/MS-Word" and "Research Papers". You can see what these mean in the ROAR FAQ, but basically the first is a good indicator of the percentage of records that have a full text document available. These are not collected for all IRs, and do not seem to be updated (i.e. they never change). The percentage is based on a sample of 1,000 records in the IR.

This all leads to...

RANDOM GUESS total docs

This site will show a "RANDOM GUESS total docs" column in the details for some IRs (see this example). This is based on the idea that if we have the total number of records, and we know the percentage of these that have a PDF/Word document available, then we can work out how many full text (pdf/word) documents are available from an IR.

However, for reasons given above (see last question), this is simply not an accurate count; at best it can be called a rough estimate. Hence it is a random guess at the total number of documents available.
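The arithmetic behind the "RANDOM GUESS total docs" column can be sketched as below. The function name and the figures are made up for illustration; the site itself is written in PHP and may round or store the value differently.

```python
def estimate_full_text_docs(total_records: int, full_text_percent: float) -> int:
    """Rough estimate of full text documents from a sampled percentage."""
    if not 0 <= full_text_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Scale the record total by the share of sampled records with a file.
    return round(total_records * full_text_percent / 100)


# Hypothetical figures: 2,500 records, 40% of the sample with a PDF/Word file.
estimate = estimate_full_text_docs(2500, 40)
print(estimate)  # 1000
```

Because the percentage comes from a sample and does not seem to be updated, the result inherits all of the caveats above.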

Why only UK IRs?

No real reason! Originally this set out as a way to compare the IR I was running with other similar IRs as a basic form of measure; I was much more interested in the rate of increase than in a simple total. In time I may expand this site to include more IRs. ROAR includes many types of repositories, but this site only records information for those that are listed as institutional or departmental repositories.

How can I see what is being stored for a given IR?

At the bottom of the table showing weekly record totals on a page for one specific IR you will see a link "raw data that is stored for this repository". This will show all the data stored for a specific IR. It's not very interesting. The Full text estimate field was only implemented in April 2007, so it will be empty before this time.

My IR isn't shown

Is it on ROAR? If not, register it! If it is already on ROAR, is it in the UK, and is it listed as an 'Institutional or departmental' repository (the only sort this site collects, see this list in ROAR)? Has it only just been registered with ROAR? It can take a while for it to be fully set up and for data to be collected. Does it have any records (according to ROAR)? If not, it will not be shown. Contact me if something still seems amiss.

The export functions don't work properly

There are two choices: tab delimited, and comma delimited (with each field value in double quotes). If you have trouble opening these in Excel, try saving the file first and then opening it in Excel (you can also try changing the file extension to just txt). This is something I need to work on, though as far as I can tell the content-type and file extensions are correct.
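For anyone wondering what the two export formats look like, here is a minimal sketch using Python's standard `csv` module. The sample rows and the `export_rows` helper are invented for the example; the site's own export is done in PHP and may differ in details such as line endings.

```python
import csv
import io

# Made-up table: a header row followed by one repository's weekly totals.
rows = [
    ["Repository", "2008-03-04", "2008-03-11"],
    ["Example University Repository", "1200", "1215"],
]


def export_rows(rows, delimiter, quote_all):
    """Serialise rows as delimited text, optionally quoting every field."""
    buffer = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(
        buffer, delimiter=delimiter, quoting=quoting, lineterminator="\n"
    )
    writer.writerows(rows)
    return buffer.getvalue()


tab_export = export_rows(rows, "\t", quote_all=False)   # tab delimited
comma_export = export_rows(rows, ",", quote_all=True)   # comma delimited, quoted
```

Quoting every field in the comma version means a repository name that itself contains a comma does not get split into two columns.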

How do you make the cool charts?

Using the easy and useful Google Chart API.

Bug list

IRs that have changed their name will show up in most places with their original name, not their current name.

IRs that have changed their name will show up in table.php twice (or however many times they have changed their name).

Who's responsible for this atrocity?

It was developed by Chris Keene, with the webpages developed on a few wet Sunday afternoons in March 2008. I'm based at the University of Sussex, and the site is currently hosted on a Sussex-based server, though apart from this it is not affiliated in any way with the University.

You can contact me at chriskeene at gmail.com. I love feedback!

Contact: chriskeene@gmail.com | Using data from Registry of Open Access Repositories (ROAR).