Smooinaghtyn: obbyr

Showing posts with label obbyr. Show all posts

Monday, 3 June 2013

Foaddey

Skeeal firrinagh.

Wednesday, 16 January 2013

Excel: calculating composite actual and proportional use with COUNTIFS and SUMIFS

This is the second part of a discussion of my recent capacity-usage stats project tracking where and when people sit in a library. It might be helpful to any project involving nested data, comparing incomplete datasets, or datasets which increase in size (as in, number of rows of data) during the project. You can find the first part here.

This part of the job involved COUNTIFS, SUMIFS, INDEX, COUNTA and named ranges to track composite average usage while compensating for patchy data.

This half of the project concerned tracking usage of the library building's graduate study space, which is not part of the library, and therefore was monitored separately. People were interestd in the patterns of use over time, and in the proportion of capacity that was actually used, to help establish how efficiently the space is exploited. This could, for example, show that there was insufficient study space, or that the space was underused and could be better put to some other purpose. The main complication of this task, compared to the last, was that counts in the study space were much patchier than in the library. All counts included each area of the library, but sometimes the staff didn't get round to counting in the graduate space, since this involved going to a different area of the building.

Because the analyses include potential versus actual usage, the number of counts taken influences the figures. This means that it's important to only include data from counts that actually included the graduate space, otherwise any analyses will behave as though the count was zero, undermining accuracy and leading to under-reporting.

The potential-use-tracking field, which checks the number of counts taken and the amount of seating available, needs to include only those counts which actually included the graduate space (those with a non-empty value in that column). The actual-use-tracking field needs to only include rows with a count entry for the graduate space (only sum values from rows that include a non-null value in the "graduate space" column).

Luckily, at this point I discovered SUMIFS and COUNTIFS.

To begin with, I set up Time entries as rows. For columns, I tracked Number of Counts, Total Users (in that timeslot), Potential Use (at full capacity), Mean Use, and Percentage Use.

As I mentioned last time, the range of cells with recorded values in is called Readercount, and the set of capacity values for seating types is called Seats. To count cells based on multiple criteria, I turned to COUNTIFS. I calculated the number of relevant counts as follows:

=COUNTIFS( INDEX(Readercount,0,time column), $Bchosen timeslot, INDEX(Readercount,0,grad space column), "<>" )

=count up( all entries in the Time column within the range Readercount, if they match the chosen timeslot, and if the value in the grad space column is not null )

For summing the number of readers at various points, I had to use SUMIFS instead. This handy value sums up cells that meet multiple criteria.

=SUMIFS( INDEX(Readercount,0,grad space column), INDEX(Readercount,0,time column), $Bchosen timeslot, INDEX(Readercount,0,grad space column), "<>" )

=sum up( all entries in the grad space column within the range Readercount for which the time column in that row matches the chosen timeslot and the value in the grad space column is not null )

Potential use was:

=COUNTIFS( INDEX(Readercount,0,time column), $Bchosen timeslot, INDEX(Readercount,0,grad space column), "<>" ) * INDEX(Seats,0,grad space capacity )

Average use was just total users divided by number of counts taken for the chosen time slot. Rather than build the formulae again, I simply referenced the existing cells.

=D4/C4

In other words:

=Sum of users / Number of counts

In full, this is:

=SUMIFS( INDEX(Readercount,0,grad space column), INDEX(Readercount,0,time column), $Bchosen timeslot , INDEX(Readercount,0,grad space column), "<>" ) / COUNTIFS( INDEX(Readercount,0,time column), $Bchosen timeslot, INDEX(Readercount,0,grad space column), "<>" )

You can probably see why I went for the cell references...

Similarly, percentage use was:

=D4 / ( C4 * INDEX(Seats,0,16) ) * 100

=total users / ( counts taken * grad space capacity ) *100

In case anyone's curious, there's a couple of reasons why I used range references for the seating capacities in this project, rather than just plain numbers.

It makes the formulae more transparent, even though it also makes them more complex. Anyone reading through it can quickly work out, and confirm by finding the range in question, that INDEX(Seats,0,16) refers to a fixed value that corresponds to a seating capacity. With plain numbers, it's much harder to work out where the numbers are coming from and why they're there. This is important for debugging, and also makes it easier for someone other than me to maintain the spreadsheets in future.
It allows for changes in measurement type. Some of the capacities are somewhat hazy. Students sometimes use window bays as improvised seating. The graduate space has a few armchairs as well as the work desks; at present, students are counted regardless of where they're sitting, but only the work desks are treated as "capacity", because the aim is to determine if there are enough useful workspaces for everyone.
It allows for future-proofing. If the capacity in a room is changed, someone can amend the value in a single cell and start a new set of statistics.
It makes the project more transferrable. Individual formulae, or the whole spreadsheet, can be copied over to a new document and tweaked to track similar data elsewhere.
Because I do a lot of filling with my formulae ('fill down' and the like), actual numbers are a potential liability, as they can end up changing across the block of cells, rather than remaining static. Not that hard to overcome, but annoying nevertheless.

I used exactly the same structures to calculate for date, so I won't bother repeating that. I simply changed which column I referenced in INDEX.

Building charts

The previous charts, for the library itself, were simply tracking average values that were constantly updated. So they could simply track a fixed block of cells. The same is true of the use-by-timeslot chart for the graduate space, which simply took average values from the chart.

For the graduate space, however, I also had to generate a chart showing patterns of usage over time, which needed updating alongside the data. While it's a bit fiddly in the short term, the lowest-maintenance and most elegant way to do this was to use named ranges in the chart's dataset.

I defined two new dynamic named ranges for this sheet: Graddate and Graduse.

Graddate was defined as

=OFFSET( 'Graduate charts'!$B$11, 0,0, COUNTA( 'Graduate charts'!$B:$B) -7 )

=OFFSET( this worksheet!$date column$first row in date column, 0,0, COUNTA( this worksheet!$date column:$date column ) -7 )

In the Graduate Charts worksheet, go to the first cell in the Date column, and adjust this starting position by 0 rows and 0 columns. From here, select a number of cells equal to ( number of non-empty cells in the date column - number of irrelevant non-empty cells in this column )

The reason for the -7 is simply that this worksheet contains all the grad space analysis, both timeslot-based and date-based. The same column (B) contains seven cells'-worth of data relating to the timeslot-based calculations, which is irrelevant here, and I compensated by ignoring 7 rows. This wasn't really necessary - all that would happen otherwise is seven blank rows at the end of the date-based section would be included in the range. Excel should ignore these when generating charts, so it wouldn't make much difference, but it's tidier and better practice to make the adjustment as I go along. After all, in future I might use some of those cells for something else, and break the formulae. In theory I could have used another COUNTA rather than the fixed 7 value, which would have been better practice, but... I didn't.

To get the charts, I used these names as data sources. I started simply by manually selecting some appropriate cells from the correct columns, then went into Select Data Source.

I set the Legend Entries (Series), in the Series values (not Series name!) to:

='name of file' !Graduse

I set the Horizontal Category Axis Labels to:

='name of file' !Graddate

This gave me a chart based on a dynamic range of cells. As soon as further entries are added to the worksheet, the chart will adapt to include their data. All I have to do is keep filling down the dates and formulae as new data is added.

Annoyingly, I have to manually delete dates when no counts were taken in the graduate space - including weekends and holidays - as these produce #DIV/0 errors, and are treated as counts with zero students on the graph. Actual zero counts are left in, of course.

I could, in theory, write some code to do this for me. Every time a new row was added to the raw data, it would check to see if the date was already present on the grad calculations sheet, and if not, add a new valid line so long as a count was present for the graduate space. So far it hasn't been worth the time.

Pseudocode for adding new rows

This is pure pseudocode, dashed off hastily, but it gives the outline of how I'd do this if necessary.

onEvent(rawData.gradCount.newRow)
{
  if(gradData.indexOf(newRow.date)==0)&&(newRow.gradCount!='null')
    {
    get(gradData.lastRow)
    setDate(gradData.lastRow)==rawData.newRow.date
    fillDown(gradData.lastRow.formulae)
    }
}

Finally, here is a link to a (slightly tidied up) version of the Excel file.

Friday, 21 December 2012

Excel: library heatmaps

A quick discussion of a recent project involving COUNTIF, SUMIF, INDEX, named ranges and production of a heatmap using a surface chart.

I was recently asked to do what turned into a fairly fiddly statistics project. A university library has moved to new premises, and the new building has a mixture of furniture types: armchairs, small café-style tables, large desks and so on. They want to monitor usage of the different furniture, to find out whether they've got the right mix for their students, or whether anything should be changed. This means tracking the number of students in each category of furniture at each counting time.

There are a few layers of complication.

There are several counting times each day, so there's nested/duplicate data. This basically introduces a three-axis analysis: tracking the number of students in each type of seating at each time on each day. It also means that analyses over time will need to collate the information for each day, as will analyses by time of day.
Counts can't always be taken, depending on staff availability, so sampling will be erratic.
Furniture has been broken down heavily as a starting point, but may need regrouping later.
A graduate study room is to be tracked as well, but separately from the main library.

I set up the usual table structure: ever-increasing Time and Date rows as the rows, and the furniture types as fixed columns, with a Total column at the end to sum the students in the library at any time. There's also a secondary column to track the Graduate room. I named this Readercount, and defined it as follows:

=OFFSET( 'Raw data'!$A$2, 0, 0, COUNTA('Raw data'!$A:$A), COUNTA('Raw data'!$2:$2) )

This creates a dynamic named range, which expands as new rows are added to the bottom. What it's doing is:

Select this stuff(Location of first cell I want, move down 0 rows, move across 0 cols, select a number of rows equal to (the number of rows in column A containing data), select a number of cols equal to (the number of cols in row 2 containing data )

In other words: get the first cell in the block, then select all the rows and all the columns that are in use.

It starts at Row 2, because row 1 contains values for the number of seats in each section, as I'll explain in a moment.

Using names rather than just blocks of cells has two advantages. One is that I can use dynamic ranges, which is really useful for calculations like sum and mean values. The other is that you don't have to worry about formulae breaking in unexpected way if you move, delete or add cells - as long as the name still selects the right cells, all formulae based on it should still work. It also helps pinpoint mistakes in formulae, because names are more transparent and simpler than cell references. You can avoid repeatedly defining complex cell ranges and risking mistakes.

Calculating usage

A heatmap seemed like a sensible and intuitive way to map the average usage of each seating type, and show how it changes over time. For the 'heat', I decided to use the percentage of the seating type that was in use. I added in a row giving the number of seats of each type, and named it Seats - this is row 1.

On a new sheet, I did some test analyses to make sure things were working. For any complex work, I tend to go through step-by-step progression, which is a habit from coding, but also a useful way to check I actually know what I'm doing. This makes it much easier to pick up on typos, data entry errors and other confounding factors, before getting into complex formulae. Since I didn't really know what I was trying to do, I faffed about with several ideas and several new functions before getting something really useful; I won't bother discussing the mistakes and dead ends. I ended up with three main blocks of calculation.

The first is a simple block of potential usage. This calculates how many seats were available in each seating type, in each timeslot - adjusted to account for the number of counts taken. It shows the maximum possible students who could have been recorded.

This is a simple COUNTIF. I gave a timeslot in column B, and each successive column had a version of the following code:

=COUNTIF( Readercount, $Bthis row) * INDEX(Seats,0,column)

The row always matches the current row; this selects the timeslot value in column B as the timeslot you want to filter for with COUNTIF. Using $B allows easy drag-and-fill of the other columns. The column value picks the column from range Seats that matches the seating type.

In other words:

=COUNTIF( Readercount, this timeslot) * INDEX(Seats,0,this seating type )

Count up instances in the Readercount block of this timeslot, and multiply them by the number of seats listed for this seating type in the Seats block.

Note that with INDEX, if you enter 0 for the row or column, it will pass through every row or column in the chosen range. This is crucial to the COUNT* and SUM* formulae. If you enter a specific row and a specific column, it'll only look up one specific cell.

Next I calculated the actual seat usage: the total number of people who used each type of seating during each timeslot. I gave a timeslot in column B, and each successive column (representing a seating type) had a version of the following code:

=SUMIF( INDEX(Readercount,0,2), $Brow, INDEX(Readercount,0,column) )

i.e.

=SUMIF( INDEX(Readercount,0,timeslot), this timeslot, INDEX(Readercount,0,this seating type) )

Go through all rows in Readercount. Whenever the value in the Time column matches our timeslot value, find the column in that row that corresponds to the current seating type, and add its value to a running total.

In practical terms: total number of readers counted in that slot

Thirdly, I calculated the mean seat usage, by taking the total usage and dividing by the number of counts for the timeslot in question. Luckily, there was no variation in counts between the seating types, only between timeslots, so that's one less thing to worry about. As always, I gave a timeslot in column B, and each successive column (representing a seating type) had a version of the following code:

=SUMIF( INDEX(Readercount,0,2), $Brow, INDEX(Readercount,0,column) ) / COUNTIF(Readercount, $Brow)

i.e.

=SUMIF( INDEX(Readercount,0,timeslot), this timeslot, INDEX(Readercount,0,this seating type) ) / COUNTIF(Readercount, timeslot)

Here's a timeslot and a seating type. For every timeslot value in Readercount that matches our chosen timeslot value, get that row's entry for our chosen seating type, and add it to a running total. Finally, divide that total by the number of counts made for this timeslot.

In practical terms: total number of readers counted in that slot / number of counts in that slot

Finally, I worked up to the percentage usage. This was simply the mean seat usage multiplied by 100.

=SUMIF( INDEX(Readercount,0,2), $Brow, INDEX(Readercount,0,column) ) / ( COUNTIF(Readercount, $Brow) * INDEX(Seats,0,column)) * 100

i.e.

=SUMIF( INDEX(Readercount,0,timeslot), this timeslot, INDEX(Readercount,0,this seating type) ) / ( COUNTIF(Readercount, timeslot) * number of seats available ) * 100

For every timeslot value in Readercount that matches our chosen timeslot value, get that row's entry for our chosen seating type, and add it to a running total. Divide that total by the number of counts made for this timeslot multiplied by the number of seats in the room. Finally, multiply by 100 to give a percentage figure.

In practical terms: total number of students counted in that slot / hypothetical maximum number that could have been counted * 100

For the actual heatmaps, I selected relevant cells from the percentage usage block, and created a surface chart. There were several changes necessary.

The default interval for the 'heat' (percentage usage) scale was 20. I had to tweak this by going into the Depth Axis, and setting the Major Unit to 10.
The default colours are revolting and useless for a heatmap. Astonishingly, there doesn't seem to be any simple way to specify a gradient for the colours used, or any preset colour ranges. You can, however, add a gradient fill to the legend, and you can add a gradient as the fill for a single value: I can't think of a single reason why you'd ever want to do either of these things. Baffling. I had to manually go through each 'band' on the legend and set its fill colours.
The vertical time axis automatically places the earlier values at the bottom of the scale. I found this unintuitive, and flipped it (Format Axis > Options > Series in Reverse Order).
A graduate study room is to be tracked as well, but separately from the main library.

The heatmap indicates that the “South Desk” and “West Desk” are by far the most popular, that the “Comfy Chairs” and “Oval Table” are the least popular, and shows up the generally higher occupation rates between 11.30-5.30pm, which correspond to more students present.

On request, I also created a line graph of the seating usage, with each timeslot as a separate line. Compared to the heatmap, this emphasised the pattern of use for each seating type. It demonstrated that this was more or less constant throughout the day, except for a slightly increased tendency to use armchairs at the end of the day, based on a small sample of counts at this time.

Analysing the graduate data proved considerably fiddlier, even though it's a single column of data. I'll discuss that in another post. EDIT: Other post is now up.

If anyone is interested, you can get a (slightly cleaned-up, but hopefully otherwise unchanged) copy of the spreadsheet here.

Friday, 19 October 2012

Tostid 'sy lioarlann!

She shiaghtin skeeagh elley v'ayn. Ta shin tarroogh dy liooar. Shen ny haghyrys my t'ou jannoo un lioarlann ass daa, as freayll skimmee fer jeu dys tendeil ymmydeyryn y jees oc ta çheet da'n un voayl nish. Va daa wheesh ny h-ymmydeyryn jannoo famman dys cur erash daa wheesh ny lioaryn da lieh ny sleih dobbree lieh ny buird do nagh dod ny smoo jin cur shirveish eer dy row traa ain. Va barraghyn scudleelioaryn myr nah voalley rish ny boallaghyn, as dagh shirveishagh roie mygeayrt my fer keoie.

Ec y traa cheddin, ta ard-eiyr ayn. Hoshiaght, dy chredjagh oo rieau dy vel lioarlann ny voayl shee, jean aasmooinaghtyn. Ta'n boayl lane dy yeshaghtyn sheeanagh: co-earrooderyn, taishbyneryn boalley foawragh beilloo, jeshaghtyn coip-hollys, dagh fer jeu sheer-chronnaney eer tra nagh vel peiagh erbee bentyn roo. Ta çhellvaneyn oik, intercom as tree(!) coryssyn arree screeaghey er fud y laa, as mannagh vel, ta çhellvaneyn as radioyn laue ny dorysseyryn ry-chlashtyn dy baghtal trooid dorryssyn foshlit failtagh ny lioarlann. S'doillee eh feddyn magh my ta bun sheean ennagh mastey'n tharmane, ny loayrt rish sleih harrish yllagh y giat "sauçhys" ta shassoo noi peiagh neuloghtagh ennagh, as yllagh yn ourallagh boght tra ta'n giat dooney er e lurgaghyn. Ta'n giat jouyllagh shen screebey er ard lesh dagh peiagh goll ny hrooid, as eisht erash lesh cling; myr noi-chastey skeeys, ny keayrtyn t'eh aaghooishtey ass cadley boiragh, builley ry-cheilley myr ribbey roddan as cur magh sheean ny s'cooie da clag raauee çhennar fovooirane na da jeshaght stiuree entreilys ta currit ass nagh ren oo cur jee y kaartey ayd kiart er yn aght oardit. Gow padjer nagh jagh y larym arree lioaryn hene er bioyraghey! Ta'n gullyrnee shen dy liooar dy chur er fakideryn roie er çhea er son fastee, er deiney lajer giallaghey, as er shenn ard-scoillaryn baneaghey as tuittym dy trome noi boalley ennagh fo aggle. Foddee oo credjal dy vel peiagh ennagh fo drogh-ouyrys eab kynney-ghunverys, cha nel jus er n'eeassaghey lioar ennagh nagh ren ny lioarlannee aarlaghey dy kiart. Shen gyn çheet er beepyn as çherryn gyn scuirr lesh lioaryn goll er eeassaghey, glionnagyn goll er dooishtey as y corys intercom goll er brishey er lesh çhellvane creeney ennagh gyn coodagh fondagh, as frappal myr aile lane fuygh glass. Ta shligganys beayn lectraneagh ayn nagh lhiggagh dhyt ny sheeanyn y aarheynn; t'ad covestey myr sheer-tarmane ta soodraght harrish keeallyn castit myr mooir leoaie seiy.

'Sy choirrey çherragh cronnaneagh shoh, ta sheean beayn tarrarey boallaghyn, cur sheese caabylyn, bwoalley dorryssyn, builley coshey as obbraghey oardyn faggys as foddey jeed neuvaghtal, bunnys, da cleayshyn skee. Da peiagh erbee elley - noanee neuchreoit ta ventreil stiagh son y chied cheayrt 'sy çhiamble ynsee shoh - t'eh clashtyn dy vel ard-eailley electronica annymoil er lheh ayns Akihabara er ny crosh-skedjal marish taishbynys çhaghnoalleeaght lhieggey bleinoil yn teihll. Aigh vie nagh vel peiagh erbee geabbey dy obbraghey ayns shoh, eh?

As shen jerrey shayll 10-oor Jeheiney ec jerrey'n çhiaghtin. As ta mee gobbraghey mairagh myrgeddin. As kiart erash da'n fer oast aym cur stiagh - fy-yerrey, er dy shiaght shiaghteeyn! - jeshaght deayrtee noa (dgms, fer ta cur magh ushtey gyn screeaghey as gyn ceau lieh jeh trooid ny boallaghyn as sheese da'n çhamyr aarlee!) ta'n uinniag shamyr aarlee er ngoaill toshiaght lhiggey stiagh ushtey.

"T'eh tayrn dy traa ny lhiabbagh", myr shen.

Thursday, 12 July 2012

Eddyr Laghyn Seyrey

Cha row monney traa aym rish tammylt dy screeu artyn ayns shoh, ny da'n Chishtey Stoo noadyr. Ta mee kiart er geau shiaghtin ayns Ard ny Loghyn marish y lught-thie, gyn eddyr-voggyl erbee ny co-earrooder noadyr as gyn traa dy yannoo red erbee dou hene. Nish ta mee gobbraghey dy jeean 'syn obbyr rish shiaghtin, as eisht Jesarn hem erash da'n Twoaie as eisht harrish y cheayn dys yn Çhenn Ellan Aalin (as 'Liugh as Gheayagh, tra bee'm aynjee, er lhiam) er son y Chruinnaght. As tra hem erash bee eh orrym obbraghey myr fer keoie er y fa dy vel shin scughey'n lioarlann ta mee gobbraghey ayn. S'treisht lhiam dy noddym screeu art ny ghaa voish yn Chruinnaght tra ta mee ayns Mannin.

Monday, 12 March 2012

Jerrey'n imbagh

Ta jerrey’n imbagh er jeet nish, as ta reddyn çheet dy ve ny s’kiuney. S’aalin eh, son y chooid smoo. S’mie lhiam ny studeyryn ta çheet da’n lioarlann, she adsyn cree yn obbyr aym; t’ad cur bree as gennalys er y voayl, as nish ta’n obbyr hene ta faagit ayms beggan dree. Ny yei shen, ta mee laccal beggan kiunid erreish da imbagh tarroogh as tree bleeantyn boiragh. Ta ny smoo boirey ry-heet chammah, agh rish tammylt ta caa ain beggan fea y ghoaill. Cha nel ny turrysee lane my vlaa foast, myr shen ta'n chaayr tostagh dy liooar. Ta mee jus ceau wheesh traa as fondagh liorish shooyl mygeayrt dy kiune as lhiggey da reddyn lheie ersooyl. Dy firrinagh, ta mee er ve fo 'troos rish eash er dy daink çhingys er yn ard-ven ain 'sy toshiaght, as cha bee reddyn ny share derrey jerrey'n arraghey oik ta foin 'sy touree shoh. Agh cre'n feeu accan? S'treisht lhiam nagh bee eh myr shoh er son dy bragh, as myr shen cha nel eh orrym agh surranse rish tammylt. Rish jerrey'n imbagh, ta shin dooney ny s'leaie as myr shen ta ny smoo traa seyr aym er son ceaughyn traa as reddyn elley. Lesh shen as aase ny greiney rish yn arragh, ta mee roie ny s'menkey as ta shen yindyssagh. S'doillee dhyt credjal eh - wahll, by ghoillee douys eh y chredjal - dy vel taitnys erbee ry-gheddyn ass roie mannagh vel uss roie hannah, agh erreish dhyt geddyn barriaght er neuollanid erbee 'sy toshiaght, tra foddee oo roie tammylt gyn scuirr, ta seyrsnys as boggaghey ennagh ry-gheddyn ass jus roie, obbraghey muskylyn as lhiggey da'n inçhyn cadley. Share dou roie roish yn obbyr, myr saase dooishtee, agh t'eh cur orrym yeearree cadley 'syn obbyr ny keayrtyn...

Red quaagh er lhiam, shen y scansh eddyr pobblaght y lioarlann ‘syn imbagh as y phobblaght t’ayn nish. Rish yn imbagh ta pobblaght cheaghlagh ain, agh nish she lught gillyn t’ayn dy fondagh; deiney jeean as smooinaghtagh as glioonagyn oc. Ta ny mraane cho goan as airhey, bunnys.

Friday, 24 February 2012

Embedding mp4 videos in a Virtual Learning Environment

I recently finished a fairly long project to make a DVD collection more readily available to members of a university library. The DVDs are expensive academic ones, and the library is not that keen on lending them out. Also, this would restrict usage to one person at a time, which means if they’re assigned viewing for a seminar or assignment, life becomes very difficult. This being the case, the library got permission to make the DVDs available on the university network. However, for copyright reasons and to minimise the risk of piracy, access has to be restricted to only current members of the university, which means controlling access pretty heavily; they can’t just be made available over the internet.

The small faculty library have limited influence over the university computing services, which meant that bespoke viewing solutions were not really an option. For example, they couldn’t have a special password-protected page set up that would control access through user identification. For copyright purposes, something like this with by-user watermarking would be a nice solution; but technically difficult and probably expensive, even once the basic page was set up.

The chosen solution was to make use of an existing user-restriction system: the university Virtual Learning Envirnoment (VLE). The university IT service agreed to set things up, but for various reasons were eventually forced to drop the project. Obviously (as I’m posting this) it fell to me to handle it. I thought it might be useful to post a description in case any other poor sap has to do a similar project in the future.

A complication that arose fairly early was that the VLE would not (at least at that time) allow media files to be played directly. As a result, I had to set up embedded players within the VLE to play each set of files.

There were several basic stages to the project:

Transcribe the content descriptions of each DVD for use on the VLE. Thse came as text pages on the DVD, (not files; still text images!) which would have been somewhat nightmarish to transcribe. Fortunately the library had paper handbooks that had accompanied the original VHS versions of the DVDs.
Rip the DVD to .mp4.
Create XML playlists for each DVD.
Create webpages for each DVD with an embedded player.
Create static webpages for the transcriptions of each DVD, and an overall DVD listing page linking to the individual embedded-player pages.

Some were trickier than others, but I’ll run through them all quickly.

Transcription

The paper handbooks were invaluable here. I simply scanned the pages and OCRed them to get the basic text. Naturally there were plenty of OCR errors to correct, some of which I picked up at the time and others much later.

Ripping

For ripping the DVDs I used Handbrake (http://handbrake.fr/). This is a very useful bit of kit, fully open-source and GPL for those worried about it, and usable on Windows, OS X and Linux.

To rip DVDs with Handbrake:

Go to Options. Set the default folder to wherever you want this lot of files stored (which saves hassle later), and set up a filename scheme. This needs to be set for each DVD if you want the files named anything useful.
Insert the DVD and select “DVD” as the source.
In my case, the DVDs were fairly low resolution, so for space-efficiency and quick processing we chose the “Iphone and Ipod Touch” setting defaults.
Set audio however you want; in our case it was worth mixing down to Mono because there was only a mono track.
Select “Web Optimise”
Manually select the first track, so chapters “1 through 1”
Check the filename and path. Manually chance the prefix to .mp4 if you want .mp4s, as it tends to default to m4v.
Add the chapter to the queue
Change the start track to the next one. This will automatically change the end track, so you’re still only picking one track at a time.
Repeat until all tracks are added.
Open the queue, check all the tracks are actually there, and off you go. Ripping will take a while, so I suggest using a machine you won’t be needing for the rest of the afternoon; it might be safe enough, but there’s always a chance of causing problems if you try to work on it at the same time.

XML Playlists

A playlist looks like this:

  <playlist version="1" xmlns="http://xspf.org/ns/0/">
    <title>playlist title</title>
    <info>main DVD page URL</info>
    <tracklist>
      <track>
        <title>number and title of track</title>
        <creator>copyright holder</creator>
        <info>filename.mp4</info>
        <annotation>description of track</annotation>
        <location>filepath/filename.mp4</location>
      </track>
...(repeat for other tracks)     </tracklist>
  </playlist>

Sections in red need to be filled in appropriately. In the case of the VLE I was working with, files seemed to need absolute paths rather than relative ones.

I wrote them in Notepad++ (http://notepad-plus-plus.org/), which keeps track of bracket pairs, quote pairs and so on, making it amazingly useful for this kind of thing. In fact I do all my HTML editing in NP++.

Pages with embedded players

The player I ended up using is called JW Player, available from Longtail Video (http://www.longtailvideo.com/players/). It supports both Flash and HTML5. I used the simpler Flash option as the VLE was liable to create enough problems, without worrying about browser issues. I believe the newest release of JWP (currently 5.9) may make things simpler, but haven’t tried it. I started the project with the January 2011 release (probably 5.0), and have not seen any particular reason to change over.

I downloaded JWP and unzipped it into a folder on the VLE. That’s about all you need to do with it.

Each embedded-player webpage looks like this:

<h2>DVD Title</h2>
<script type='text/javascript' src='../../media/swfobject.js'></script>
<div id="mediaspace">descriptive text that should not appear unless there are problems. I used it to include a warning that library catalogue machines could not play the videos</div>
<script type='text/javascript'>
  var so = new SWFObject('../../media/player.swf','mpl','980','480','9');
    so.addParam('allowfullscreen','true');
    so.addParam('allowscriptaccess','always');
    so.addParam('wmode','opaque');
    so.addVariable('playlistfile','DVD_playlist.xml');
    so.addVariable('playlistsize','340');
    so.addVariable('playlist','right');
    so.addVariable('bufferlength','30');
    so.addVariable('autostart','true');
    so.addVariable("repeat","list");
    so.addVariable("shuffle","false");
    so.addVariable('backcolor','52708b');
    so.addVariable('frontcolor','000000');
    so.addVariable('lightcolor','ffffff');
    so.write('mediaspace');
</script>
<p>Read the <a href="path/thisdvdsummary.html">written summary</a>
</p>
<p>Return to the <a href="path/maindvdlist.html">main DVD listing</a>
</p>

Static pages

The pages with transcriptions, DVD listings and other information were just bog standard HTML pages.

Folder structure

Just for completeness, here’s the folder structure I used for the project, with generic names instead of the specifics I used.

Main (root folder for the project)
    Main.Media (contents of the JW Player download, used to actually play the .mp4s)
    Main.DVDs (includes the homepage for the videos, and any summaries or transcriptions to the contents of multiple individual DVDs; the latter could instead have been in a specific Summaries subfolder)
        Main.DVDs.ThisDVD (includes the individual mp4s; the xml playlist; summary page for the specific DVD; and the embedded-player page).

Sunday, 23 October 2011

Gyn vree

Er y gherrid, cha nel monney stoo noa jeant aym er son yn ynnyd-eggey beg aym. Dy firrinagh, ta bea er ve trome rish y vlein shoh chaie. Cha nel my vea hene cho olk, agh ta mee tarroogh as fo stroos bentyn rish yn obbyr as reddyn elley. Ta'n rheynn gobbraghey ain arraghey dys troggal elley y sourey shoh hugain, as ta obbyr ass towse ry-yannoo. Agh ta'n lught reirey neuarryltagh briwnys erbee y yannoo, as ta'n ard-ven ersooyl fo asslaynt ennagh kerroo ny bleeaney dagh blein, as cho skee as faase nagh nod ee jannoo monney tra t'ee erash. Myr shen, ta'n lught obbree er nyannoo tooilley obbyr rish daa vlein, as ceaut as skee. Cha nel shen ro-olk dooys, dy firrinagh, agh t'eh jeeaghyn dy vel dagh ainjyssagh aym fo doilleeid trome ennagh; çhingys, mooinjer çhingey, studeyrys gyn raah, argid, staartaghyn, cooish ghraih chrampagh... as myr shen ta aeraght trome as groamey er feie ny caayrey. T'eh cur lhag-chreeys orrym as cha nel eh greinney aittys ny gientynaght - ny bree.

Liorish ooilley shen, cha nel monney bree aynym ny laghyn shoh, as s'doillee eh red ennagh y screeu. Ta shalee ny ghaa foym as lieh-yeant, agh cha nel traa ny breeaghys aym ad y chooilleeney. Chammah's shen, mannagh vel glout dy hraa ayd, cha aashagh eh reddyn gientynagh y yannoo dy corrym. T'ou jummal traa liorish smooinaghtyn er c'red v'ou jannoo, ny shirrey noteyn, ny shirrey er y vree skeeeallagh v'aynyd y keayrt s'jerree. Myr shen, ga dy by vie lhiam red ennagh noa y chur magh, ta mee aareaghey ny skeealyn ta caghlaait ny screeuit aym hannah, obbyr gyn feme er wheesh bree. S'treisht lhiam dy bee caa aym red ennagh noa y yannoo dy leah.