AIML428 (2021) - Text Mining and Natural Language Processing


This course focuses on text mining and natural language processing. It covers a variety of topics including text representation, document classification and clustering, opinion mining, information retrieval, recommender systems, query expansion, and information extraction.

Course learning objectives

Students who pass this course should be able to:

  1. Explain problems and principles in the related research areas of information retrieval, information extraction, clustering and classification, and natural language processing.
  2. Build systems to solve problems in text mining or natural language processing related areas.
  3. Read, interpret, and explain research papers in text mining and natural language processing.

Course content

The course is primarily offered in-person, but there is also a remote option: all components of the course have online alternatives for students who cannot attend in-person.
Students taking this course remotely must have access to a computer with a camera and microphone and a reliable high-speed internet connection that will support real-time video plus audio connections and screen sharing. Students must be able to use Zoom; other communication applications may also be used. A mobile phone connection alone is not considered sufficient. The computer must be adequate to support the programming required by the course: almost any modern Windows, Macintosh, or Unix laptop or desktop computer will be sufficient, but an Android or iOS tablet will not.
If the assessment of the course includes tests, the tests will generally be run in-person on the Kelburn campus. There will be a remote option for students who cannot attend in-person and who have a strong justification (for example, being enrolled from overseas). The remote test option is likely to use the ProctorU system for online supervision of the tests. ProctorU requires installation of monitoring software on your computer which also uses your camera and microphone, and monitors your test-taking in real-time. Students who will need to use the remote test option must contact the course coordinator in the first two weeks to get permission and make arrangements.

Withdrawal from Course

Withdrawal dates and process:


Dr Xiaoying Gao (Coordinator)

Teaching Format

This course will be offered in-person and online. For students in Wellington, there will be a combination of in-person components and web/internet based resources. It will also be possible to take the course entirely online for those who cannot attend on campus, with all the components provided in-person also made available online.
Two lectures per week, plus associated assignments and projects.

Student feedback

Student feedback on University courses may be found at:

Dates (trimester, teaching & break dates)

  • Teaching: 22 February 2021 - 28 May 2021
  • Break: 05 April 2021 - 18 April 2021
  • Study period: 31 May 2021 - 03 June 2021
  • Exam period: 04 June 2021 - 19 June 2021

Class Times and Room Numbers

22 February 2021 - 04 April 2021

  • Monday 15:10 - 16:00 – LT105, Alan MacDiarmid Building, Kelburn
  • Friday 15:10 - 16:00 – LT105, Alan MacDiarmid Building, Kelburn

19 April 2021 - 30 May 2021

  • Monday 15:10 - 16:00 – LT105, Alan MacDiarmid Building, Kelburn
  • Friday 15:10 - 16:00 – LT105, Alan MacDiarmid Building, Kelburn


There are no required texts for this offering.

Mandatory Course Requirements

There are no mandatory course requirements for this course.

If you believe that exceptional circumstances may prevent you from meeting the mandatory course requirements, contact the Course Coordinator for advice as soon as possible.


Assessment Item                                                               Due Date or Test Date   CLO(s)    Percentage
Oral presentation (10 mins, 15 hours to prepare)                              TBC                     1, 3      20%
Peer review of oral presentations and project demos                           TBC                     1, 3      5%
Paper review (one page) (15 hours)                                            TBC                     3         15%
Project (baseline code, full code, demonstration, 2-page report) (40 hours)   TBC                     2         35%
Test                                                                          TBC                     1, 2, 3   25%


Any assignment submitted after the deadline will be penalised by 20% of the full assignment marks per day late. Individual extensions will only be granted in exceptional personal circumstances. We have a late days policy to cover minor problems.
LATE DAYS POLICY: Each student has three "late days", which may be used for any assignment or assignments during the course. No penalty is applied for these late days. You do not need to apply for them; any late days you have left will be automatically applied to assignments that you submit late.
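As an illustration only, the penalty and late-day rules above can be sketched in a few lines of Python. The function name apply_late_policy and its parameters are hypothetical. The outline does not say how part-days are counted or whether a mark can go below zero, so this sketch assumes that any part of a day counts as a whole day and that marks are floored at zero; the course coordinator's interpretation takes precedence.

```python
import math

PENALTY_PERCENT_PER_DAY = 20  # from the policy: 20% of full marks per day late
FREE_LATE_DAYS = 3            # from the policy: three late days per student


def apply_late_policy(raw_mark, full_marks, hours_late, late_days_left):
    """Return (final_mark, late_days_left) for one late or on-time submission.

    Assumptions (not stated in the outline): any part of a day counts as a
    whole day, and the final mark cannot drop below zero.
    """
    days_late = max(0, math.ceil(hours_late / 24))
    # Remaining late days are applied automatically before any penalty.
    used = min(days_late, late_days_left)
    penalised_days = days_late - used
    penalty = full_marks * PENALTY_PERCENT_PER_DAY * penalised_days / 100
    final_mark = max(0.0, raw_mark - penalty)
    return final_mark, late_days_left - used


# A student with all three late days submits 30 hours late (counted as 2 days):
mark, left = apply_late_policy(80, 100, 30, FREE_LATE_DAYS)   # (80.0, 1)
# The same student later submits 48 hours late with one late day remaining:
mark2, left2 = apply_late_policy(80, 100, 48, left)           # (60.0, 0)
```

In the second call one late day is covered for free and the other attracts the 20% penalty, so a raw mark of 80 out of 100 becomes 60.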


Individual extensions will only be granted in exceptional personal circumstances, and should be negotiated with the course coordinator before the deadline whenever possible. Documentation (eg, medical certificate) may be required.

Submission & Return

All work is submitted through the ECS submission system, accessible through the course web pages. Marks and comments will be returned using our online assessment system.


In order to maintain satisfactory progress in AIML 428, you should plan to spend an average of 10 hours per week on this paper. A plausible and approximate breakdown for these hours would be:

  • Lectures: 2
  • Readings: 3
  • Assignments: 5

Teaching Plan


Communication of Additional Information

All online material for this course can be accessed at

Offering CRN: 33070

Points: 15
Prerequisites: 60 300-level points
Corequisites: AIML 420 or COMP 307
Restrictions: COMP 423
Duration: 22 February 2021 - 20 June 2021
Starts: Trimester 1
Campus: Kelburn