About Matt Phillpott

I am an historian of early modern Britain and the Digital Resources Manager at the School of Advanced Study. My main area of interest is in the authentication of knowledge in early print, including religious, historical, and agricultural texts.

HISTORE workshop podcasts

All the presentations from June’s Histore digital tools workshop are now available online from the History SPOT website and below:

Jonathan Blaney (IHR) and Matt Phillpott (IHR) – Histore Digital Tools – Introduction to the Project

Matteo Romanello (DAI/KCL) – An Introduction to Text Mining

Pip Willcox (Bodleian Library, Oxford) – New Tools for Old Books

HISTORE workshop: breakout group 3

The third and final set of notes from the HISTORE workshop comes from Group 3.  This group was chaired by Jonathan Blaney, who is managing the HISTORE project but is primarily employed on the British History Online website, Connected Histories, and other IHR digital projects.

 

Group 3 breakout (chaired by Jonathan Blaney)

  • lack of joined-up thinking
  • no support from faculty
  • no support generally
  • collaboration between disciplines
  • not even the initial knowledge – no demand therefore
  • funding pressures mean you’re afraid to experiment
  • not even Zotero! the initial costs are too high, even if it might be more efficient in the end
  • add them to skills workshops – show that your PhD is the complete package
  • lots of independent historians are interested or would be if they knew
  • seeing the relevance is the key
  • case studies are important
  • big showcase projects may be offputting
  • practical, small-scale examples
  • texts not necessarily freely available
  • need guidance on copyright – people know about traditional copyright but not digital copyright
  • Twitter!
  • Google Hangouts
  • need to enthuse people – virally
  • rating/guidance
  • people in academia should be used to the intellectual challenge of learning tools
  • we don’t see the failures
  • citation – rules not up to date
  • how-to-cite boxes are great
  • the desire for “everything” to be freely available, particularly for interested people without institutional affiliation
  • bringing data together in a bucket, so digital tools can be used on it once, rather than iteratively
  • encouraging a habit of recording search methods (data, databases, search terms, date of access) so searches/queries can be replicated
  • educating readers/users to understand what the data is
  • human-readable URLs are particularly useful for citations, e.g. DNB, OED (not EEBO)
  • one attendee’s son was offered a course on digital resources for History, but he and his classmates rejected it
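
The habit of recording search methods noted above could be as simple as keeping a small log file. What follows is a minimal sketch in Python and was not discussed at the workshop: the `SearchRecord` class, its field names and the example query are all assumptions made purely for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class SearchRecord:
    """One search, with enough detail to repeat it later (hypothetical fields)."""
    database: str       # the resource that was searched
    search_terms: str   # the query exactly as it was typed
    date_accessed: str  # ISO date, e.g. 2012-07-02
    notes: str = ""     # anything else needed to replicate the search


def write_log(records, handle):
    """Write the records as CSV, one row per search, with a header row."""
    names = [f.name for f in fields(SearchRecord)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Invented example entry; the query and date are placeholders.
log = [
    SearchRecord(
        database="British History Online",
        search_terms="enclosure AND tithe",
        date_accessed=date(2012, 7, 2).isoformat(),
        notes="full-text search, default settings",
    )
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_log(log, buffer)
print(buffer.getvalue())
```

A plain CSV file is used here only because it opens in any spreadsheet; a notebook or a reference manager would serve the same purpose.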

 

HISTORE workshop: breakout group 2

The second set of notes from the HISTORE workshop comes from Group 2.  This group was chaired by Mark Merry, who is responsible for several IHR digital projects and is a key training officer for our research training courses.

 

Group 2 (chaired by Mark Merry)

  • Training in digital tools in the humanities needs to begin earlier in students’ lives – even at undergrad level – they should be viewed as fundamental research skills and given as much weight as non-digital skills tuition
  • That is, tuition should be about logic and engagement and transferable methodologies, and about taxonomies and topic modelling
  • There is a need to teach concepts, broad approaches and the ‘theory’ of digital tool usage as the tools themselves become obsolete too quickly for the training to be about the specifics of any one tool – we should be striving to inspire people to train themselves as much as ‘showing them how to do it’
  • There needs to be more training to raise awareness about the existence of appropriate tools and toolkits
  • Online training should be privileged ahead of face to face – for simple convenience and to reach researchers without institutional affiliation
  • Academics could learn from the tools available to ‘amateur’ researchers as well as their proficiency with the techniques and approaches involved
  • Training should comprise demonstration not lecturing – both in terms of the goal of the training (i.e. to show what is possible) and in terms of the medium of the training resource (screencasting etc.)

 

HISTORE workshop: breakout group 1

The first set of notes from the HISTORE workshop comes from Group 1.  This group was chaired by Matt Phillpott, project officer for the History SPOT platform on which the modules will eventually be located.

Group 1 (chaired by Matt Phillpott)

1. Ideal Project

1.1   Greater provision and availability of datasets as raw data for reuse outside of pre-defined institutional frameworks.

1.2   Discovery of internet resources – it is hard to find what is out there.  Is some kind of central index needed?  The Arts-Humanities.net site (http://www.arts-humanities.net/) does try to do this but is itself often overlooked.

 

2. Digital Tools – experiences good and bad

2.1   Old Bailey Online and related resources explain the use of digital tools well but not many other projects do this.

2.2   Related to point 1.2, Connected Histories provides a useful search-engine approach to searching various digital projects.

3. Online training vs. Face-to-face – how in-depth do you want to go?

3.1   Most technological training is found through ad-hoc Google searches.  People look for free content that meets a specific need.

3.2   Forums, YouTube and blog posts were mentioned specifically as places where people find training advice.

3.3   Online training is useful but face-to-face discussion and training can also go a long way to improving knowledge.

 

4. Do you want to start a project and learn as you go, flying blind, or would you prefer to be in command of the technical requirements before you start?

4.1   The group agreed that a large part of training is on the go.  It would be pointless learning a skill and then finding that it won’t work for the project.  At the same time there was concern that it is not easy to find out what tools are out there and might be useful.

5. What are the impediments to digital research for you?

5.1   Training and expertise are lacking within the History discipline.  This means that UG students do not get the training unless they are fortunate to have a lecturer who understands the technology.

5.2   There is a presumption that the young know what to do with technology, which is only true as far as it goes.  There is a lot that needs to be taught which isn’t.

5.3   REF is a large impediment as it does not recognise or reward extensive work done using digital tools.  It focuses only on more traditional outputs such as monographs and articles.  Why spend so much time learning and using a digital tool, when traditional methods are rewarded more favourably?

Digital tools Workshop – overview of the breakout sessions

Our recent workshop on digital tools for historians has given us plenty of food for thought.  Do historians want training in digital tools?  The answer seemed to be yes (although admittedly we might have been talking with the already converted). 

Do historians have time or incentive to undertake training in digital tools?  Ah!  Now we have a problem.  The overwhelming response during our breakout sessions was that there was little incentive or guidance within the profession in regard to digital tools.  Indeed, a newly published British Library study funded by JISC has confirmed that Generation Y at least (that is, those born between 1982 and 1994) are not as ready to use complex digital tools as is often assumed.  The report, Researchers of Tomorrow: The research behaviour of Generation Y doctoral students (2012), suggests that more tailor-made training is required, although it also agrees that there remains a reluctance to undertake such training unless it is already recognised as essential to students’ current research.

A further problem, touched upon in our breakout sessions, is the lack of basic knowledge about what tools exist to achieve research tasks.  There is no advice as to how easy or difficult those tools are to use (including how much time and cost it will take to learn them).  Neither is there much advice on how tools can be adapted and used in historical research in general.

These are all serious impediments that historians will need to address, as digital tools can offer exciting new opportunities to learn things from our textual heritage.  Group 2 from our breakout sessions, for example, argued for digital tools training to be included within undergraduate tuition.  This, they argued, should be viewed as a fundamental research skill and be given as much weight as non-digital skills tuition.  Group 3 suggested adding digital tools training to skills workshops as a means of adding to the PhD ‘package’.

What was interesting, however, was a feeling that came out of all three groups: such dedicated training is not generally where they themselves go to learn these skills, nor something that they necessarily want to go through to achieve their initial aims.  They liked to dip into a subject to learn what they need and then, if it is useful enough, consider a full face-to-face or online course.  Group 1 emphasised that if they need to learn something about a digital tool they will generally Google it and find the information on forums, blogs, and wikis.  Indeed, many participants had used free training materials found through these methods.

Nevertheless, such searching depends on knowing what tools exist in the first place and which are useful to research.  Group 1 discussed the need for a central location where such information could be found by historians.  It was pointed out that Arts-Humanities.net provides such a service.  It was interesting that few in the group were aware of this.

In all, it would appear from the discussion in our breakout groups that historians want more easily available information on what tools there are and how these might be applicable to their own research.  They want to be able to find out a little bit about these tools quickly, and, where possible, gain a basic knowledge of how they work and what can be done with them, before considering spending their time on a training course.  What type of training course they would prefer was, however, not made quite clear.  Do historians want face-to-face training on specific tools or techniques?  Or would they prefer online courses?  Perhaps a mixture of both?

From these discussions it would appear that our approach with the two HISTORE modules (one on semantic data and another on text mining) was the right one.  We are creating two relatively short, freely available modules that introduce each subject and suggest what historians can potentially gain from using such tools.  The modules are broken down into sections which work through the process from the basic to the more complex (although they are not intended to give everything you would want to know about the tools).  These, then, are introductions.  The first section of each course will introduce you to the tool and can be read within 30 minutes (probably more like 10 if you don’t do the exercises).  From there you can go further if you would like to gain a basic grasp of the tool.  In some cases that might well be enough for what you need.  At the very least the modules should enable you to judge for yourself whether more training and time should be spent learning about the tool.

Over the course of the next week we shall post brief bullet point notes from each of the breakout sessions, so you can see a little more of what was said.  Soon after this, we will also post the audio and hopefully video from the presentations given at the workshop.  By the end of August we hope to have the modules ready for release and so we will be talking a little more about these very soon!

Definition investigations (1): Text Mining

As one of the team members for the HISTORE project I thought I would do a little bit of digging into other attempts (largely from other disciplines) at describing, in the simplest way possible, some of the tools that we will be looking at.  In my first entry I will look at text mining.  Jonathan has already provided a brief definition.  Text mining is “The derivation of structured, meaningful data from a large body of unstructured data, using automated analytical methods”.  But how are other people defining this particular tool?

When I carry out a basic Google search on the question ‘what is text mining?’, the first item that appears on my screen is a short article by Marti Hearst whose title matches my search query: ‘What is Text Mining?’ (17 October 2003).  Professor Hearst works in the School of Information at UC Berkeley and researches various digital tools: search engines, social technology, computational linguistics (including text mining), information visualisation, and website usability.  The article describes text mining as ‘the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources’.

Hearst emphasises that text mining is an aspect of data mining but differs in that it attempts to extract information from natural language text rather than from structured databases of facts.  Thus, text mining attempts to dig out new knowledge from free-flowing text such as might be found in an article, monograph, or primary source material.  Text mining is not a glorified search: search engines look for something that is already known and cannot easily separate the chaff (irrelevant data) from the corn (relevant data)!  Hearst also believes that data mining differs from programmes designed for information extraction:

‘I distinguish between what I call “real” text mining, that discovers new pieces of knowledge, from approaches that find overall trends in textual data’. (Hearst, 2003)    

A second, longer article by Hearst, ‘Untangling Text Data Mining’ (which appears online and in the Proceedings of ACL’99: the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland [1999]), looks at how corpus-based computational linguistics engages with text data mining.  It concludes that, whilst the field is good at producing better text analysis algorithms, it fails to search for new facts and trends about the actual world.  Although this is now quite an old study of text mining, the semi-automated approach that Hearst calls for to enable better text mining results is still underused, at the very least in the humanities sector.
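
To make the working definition quoted earlier a little more concrete (structured data derived from unstructured text by automated methods), here is a toy sketch in Python. It is an illustration only, not a method taken from Hearst or from the HISTORE modules: the function names, the stopword list and the sample sentence are all invented. By Hearst's own standard, counting words and picking out dates in this way only surfaces trends in a text; it would not qualify as "real" text mining.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list for the example.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "was", "at"}


def term_frequencies(text):
    """Count how often each word occurs, ignoring case and stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)


def find_years(text):
    """Pull out four-digit years beginning with 1, in order of appearance."""
    return [int(year) for year in re.findall(r"\b1[0-9]{3}\b", text)]


# Invented sample text, standing in for a digitised source.
sample = "The harvest failed in 1596 and the harvest failed again in 1597."

frequencies = term_frequencies(sample)
years = find_years(sample)

print(frequencies.most_common(2))
print(years)
```

Even at this scale the output is structured (a table of counts and a list of dates) where the input was free-flowing prose, which is the step the definition describes.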

Three aspects of HISTORE

When drawing up the initial idea for the HISTORE project we broke the deliverables up into three main portions that seemed to make sense from the perspective of both compiling the resources in the first place and then presenting them in a useful way to historians. 

These were:

  1. A tool audit (by example of existing projects)
  2. Case studies (one per tool)
  3. Training modules (two tools demonstrated)

These will be made available through the IHR’s History Online and History SPOT platforms, which are now our primary location for digital data, listings and online training materials.  Much of this material will be produced in-house through our own extensive expertise in these areas; however, there are various parts where we have planned (and budgeted) for external help.  The following is a brief breakdown of what we currently see these deliverables as containing.

Tools Audit

The tools audit will form a database of current relevant digital projects for historians using one or more of the tools selected for investigation for the HISTORE project.  These will be organised by function, with a faceted browsing interface to allow filtering of tools along multiple dimensions.  The tools audit will be made permanently available on History Online with direct links to the case studies and training modules on History SPOT.  
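
As a rough idea of what filtering "along multiple dimensions" means in practice, here is a minimal sketch in Python. It is not the design of the actual audit database: the `AuditEntry` fields, the facet names and the three placeholder entries are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    """One project in the audit (hypothetical fields)."""
    project: str
    function: str  # what the tool is used for
    tool: str      # which kind of tool the project uses
    period: str    # historical period covered


def filter_entries(entries, **facets):
    """Keep only the entries that match every facet value supplied."""
    return [
        entry for entry in entries
        if all(getattr(entry, name) == value for name, value in facets.items())
    ]


def facet_counts(entries, facet):
    """Count entries per value of one facet, as shown beside filter links."""
    counts = {}
    for entry in entries:
        value = getattr(entry, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts


# Placeholder entries, not real audit data.
entries = [
    AuditEntry("Project A", "analysis", "text mining", "early modern"),
    AuditEntry("Project B", "analysis", "semantic data", "modern"),
    AuditEntry("Project C", "discovery", "text mining", "modern"),
]

matches = filter_entries(entries, tool="text mining", period="modern")
print([entry.project for entry in matches])
print(facet_counts(entries, "tool"))
```

Each facet narrows the list independently, so a visitor can combine them in any order, which is the main appeal of a faceted interface over a fixed hierarchy of categories.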

Case Studies

A representative tool from each of the main areas relevant to historical research will be included in a series of case studies describing what the tool can be used for, providing examples of actual use, and demonstrating how it can be combined with other tools/software.  These case studies will be made available on History SPOT.

Training Modules

The audit will inform the choice of training areas.  Two free online modules will be developed to train historians in the basic use of two digital tools.  The modules will be multimedia in nature and provide a general understanding and awareness of the tools’ use.  Again, these will be made available on History SPOT.