Contribute to Gwulo | Gwulo: Old Hong Kong

Your contributions help Gwulo grow. Here are some ideas for ways to contribute:

If you've got any questions, or you'd like to contribute in some other way, please leave a comment below.

Regards, David

Hi David,

I would like to contribute to the running costs of Gwulo but do not really want to use the patron system. The reason is that I will be charged a transaction fee every month by my bank or credit card company.

Can I make an annual contribution and set this up as a direct debit on my credit card? That way I only pay the transaction fee once, which means I can increase my contribution.

Hi Thomas,

Several other Patrons asked if they could make an annual payment instead of monthly. Please see suggestions for how to do this at: https://gwulo.com/comment/35546#comment-35546

Thanks for your support!

Best regards, David

I used OCR software to convert the 1930 Jurors list into a Word document and a searchable PDF file. Conversion into an Excel file was unsuccessful.

You can take a look at the results here. They're not 100% accurate, but C&P from them may make importing into Excel easier.

1930 Jurors list

I used OCR (ABBYY FineReader 12) to convert the 1941 Jurors list (see https://gwulo.com/jurors-list-1941).

In the end the time to prepare a page (copy & paste, and check & correct errors) was a little bit faster than the current method we use (correct the details from the previous year's list), but more difficult to use in a group, so I've stuck with the current method.

Hi David,

I split each page of the 1930 Jurors list into individual JPEG images, removed the lines and dots that confuse most OCR software, and enhanced the black level of the words before converting each image into a TXT file with OCR.

1930Jurors_2.jpg, by tkjho
1930Jurors_2_edited.jpg, by tkjho

This is what I got: a list of the occupations, names and addresses that is relatively easy to proofread, correct and C&P, as long as only one line is allowed for each item.

Banker
Jardine, Matheson & Co., Ld.
Per pro., Mackinnon, Mackenzie & Co.
Merchant
Chief Manager, Bank of East Asia; Ld.
Stock Broker, Geo. & H. A. Lamwert
General Manager, Union Ince. Socty. of
Canton, Ld.
Exchange Manager, Bank of Canton, LL
Director, Reiss, Massey & Ld.
Principal, Little, Adams & Wood
Assistant Manager, Butterfield & Swire.
Resident Partner, Mackinnon
Mackenzie
Butterfield & Swire _
Director, Gilman & Co., Ld.
Shanghai Bank
Caldbeck, Macgregor & Co.
Gen. Manager, Standard Oil Co.
Merchant, J. D. Hutchison & Co.
Merchant, Bradley & Co., Ld.
Manager, Bank of China, Ld.,
Merchant
Exchange Broker
Principal, C. A. da Roza
Merchant, W. R. Loxley & Co
Manager, Mercantile Bank of India,Ld,
Incorporated Accountant, Percy Smith,
Seth & Fleming
Butterfield & Swire
Freight Agent, Canadian Pacific S.S., Ld.
Merchant, Shewan, Tomes & Co.
Merchant, Silva-Netto & Co.
Manager, China Underwriters, Ld.
Borneinann & Co.
Shipping Manager, Jardine, Matheson
& Ca., Ld.
Managing Director, Hong Kong Hotel
Shar:Broker, Tester & Abraham
Dodwell & Co,, Ld.
A. S. Watson & Co., Ld..
Compradore, H.K. & K. W. & G. Co., Ld
Department Manager, Sun Life Insur.
ance Co., Ld.
Leigh & Orange


Ho Kom-tong,
Ho Leung
Johnson, Marcus Theodore .
Joseph, Joseph Edgar -
Kan Tong-po
Lammert, Herbert Alexander.
Lauder, Paul
Lay Kam-fat
Lewis, Brian Lander
Little, Alexander Coulbourne.
Little. John Hargraves
:Mackie, Charles Gordon
Stewart
McHutchon, James Maitland
Miskin, Geoffrey
Murphy, Lewis Newton
Oliver, Roland Edward Henry:
Parker, Philo Woodworth
Pearce, Thomas Ernest
Plummer, John Archibald,
Pui Tso-yi (T. Y. Pei).
Rocha, Joao Maria da
Rodgers, Robert
Roza, Carlos Augusto an
Russell, Donald Oscar
Sandes, Charles Lancelot
Compton
Seth, John Hennessey
Shaw, Thomas Henry Robert
Sheppard, John Oram
Shields, Andrew Lusk
Silva-Netto, Antonio
Ferreira 13atalha
Start, Herbert Rothsay.
Sum Pak-ming,
Siltherland, Robert
Taggart, James Harper
Tester, Percy
Warren, John Percival
Wong, James Mow Lain
Wong Kam-Ink
Wong, 'KWong-tin
Wong-Tape, Benjamin
Wool, Gerald George


7 Caine Road.
On premises.
On premises.
Hong Kong Hotel,.
On premises.
170 The Peak.
On premises.
16 Mosque Street.
11 Peak Mansions.
5 Aighburth Hall, May Road.
188 The Peak.
On premises.
On premises.
104 The Peak.
On premises.
On premises.
AltcAena, The Peak.
299 The Peak.
515 The Peak.
9 Village Road.
3 Robinson Road.
137 The Peak.
3 May Road.
On premises.
Galesend, 302 The Peak.
Deepdene, Deep Water Bay.
On premises.
1.■Hattori Road, Hong Kong.
16 Peak Road.
32 Granville Road.
512 The Peak-.
On premises.
368 The Peak.
On premises.
9 Stewart Terrace, The Peak.
On premises.
On premises.
11 Arbuthnot Road.
Aimai Villas, Kowloon.
Kia Ora, Kowloon. City.
On premises. 
 

If you'd like to try transcribing some years' lists with OCR, I suggest you work on some of the missing lists from 1893 and earlier (you can see the list of which years we've got at https://gwulo.com/node/6706).

I'd originally planned to go back to these once we've finished the 1930s, but it'd be great to get a head start on them with your help. It'll also let you see how long it takes on average to produce an accurate page, and we can compare that with the current method.

I tried this list with OCR. Since the original file is in pretty good shape, I did not bother with any enhancement. I loaded the PDF into Nuance PDF Converter Professional and had it converted into a readable and editable PDF file. Then I C&P'ed the names column block by block into a text editor, from a few lines to a page at a time depending on how the software highlighted the text, to prevent the entries from getting mixed up, and did the same with the occupations column. Then I proofread the text, made corrections and imported it into Excel. The .............. after the names were not removed, as that requires line-by-line editing. If I had to enhance the original file in Photoshop, I would lasso them and remove them all in one go.

Total time taken: C&P 18 minutes + proofread 64 minutes + import to Excel 3 minutes = 85 minutes for 8 pages.
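For comparing methods, it may help to turn those timings into a per-page figure. The following is just a quick sketch of the arithmetic in Python, using only the numbers quoted above; the variable names are made up for illustration.

```python
# Timings reported above for the 8-page list, in minutes.
copy_paste = 18
proofread = 64
excel_import = 3
pages = 8

total = copy_paste + proofread + excel_import
per_page = total / pages

print(f"Total: {total} minutes")
print(f"Average per page: {per_page:.1f} minutes")
print(f"Share spent proofreading: {proofread / total:.0%}")
```

On these figures, most of the time goes into proofreading rather than C&P or the Excel import.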

Tan King Sing was listed as the manager of an opium farm on Bonham Strand.

1881 Juror List

I next tried using Win10's built-in OneNote to do the OCR. I first split the original PDF into individual scanned pages, then cropped out each column and converted it separately to minimize confusion for the OCR software. No C&P of the OCR output was needed. Then I enhanced the same PDF column and repeated the conversion to check for any difference. It seems that the better the original JPEG file, the more accurate the conversion. More time is needed to prepare the files before OCR, but less time is needed to proofread the more accurate output.

Corrections can be made right in OneNote with the PDF and the OCR output side by side, then imported into Excel. Editing this enhanced list took me 5 minutes 55 seconds.

1930JurorsOccupationPage1-OneNoteOCR, by tkjho

The XLS you made with Nuance looks better. A quick glance at the OneNote output shows mistakes in lines 3 & 4 of the first column of OCR'd text, and even more errors in those lines in the column of OCR text from the enhanced PDF. So Nuance looks like the way to go.

Next steps to put the Nuance file online will be:

  • Remove the ".... ". We can do that with two search & replace passes. First replace all ".." with nothing, then replace all "<space>." with nothing. That should remove the extra dots that were there for padding, but leave them on abbreviations like "Co."
  • Add an extra column on the left of the spreadsheet, with an s for special jurors and a c for common jurors
  • Create an HTML table from the spreadsheet
  • Make a new forum post containing the table.
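For anyone who would rather script the first three steps than do them by hand, here is a minimal Python sketch. It applies the two search & replace passes literally, adds the s/c column, and builds a plain HTML table. The function names and the sample rows are invented for illustration (they are not real juror entries), and the table markup is an assumption, not necessarily the format used on the site.

```python
import html


def strip_leader_dots(text: str) -> str:
    """Apply the two search & replace passes described in the steps above."""
    text = text.replace("..", "")  # pass 1: remove padding dots, two at a time
    text = text.replace(" .", "")  # pass 2: remove a leftover dot that follows a space
    return text.strip()


def to_html_table(header, rows) -> str:
    """Build a simple HTML table, escaping characters such as '&'."""
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>")
    for row in rows:
        lines.append("  <tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical rows standing in for the spreadsheet: (name, occupation, address).
raw_rows = [
    ("Smith, John ........", "Clerk, Example & Co. .....", "1 Example Road."),
    ("Jones, Mary .......", "Merchant", "On premises."),
]

juror_type = "s"  # "s" for special jurors, "c" for common jurors
clean_rows = [
    (juror_type,) + tuple(strip_leader_dots(cell) for cell in row)
    for row in raw_rows
]

table = to_html_table(["Type", "Name", "Occupation", "Address"], clean_rows)
print(table)
```

One thing to spot-check: if the padding dots follow an abbreviation with no space in between (for example "Co......"), the first pass can remove the abbreviation's own dot as well, so a quick look over the cleaned text is still worthwhile.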

Let me know if you need any help with any of these steps.

Hi David,

The Nuance file was edited and the OneNote ones were pre-edit. The OCR output depends very much on the quality of the original PDF scan. Quite a lot of the letters were broken rather than continuous, resulting in d being read as cl, H being read as I-I, and so on. I think it may be faster to use OCR, separate each PDF into 3 long columns, and put each column side by side with the original scan for proofreading and corrections, as it took me only 85 minutes to do this two-column list of 8 pages.
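As a rough illustration of how those two misreadings could be flagged for a proofreader, here is a small Python sketch. It only suggests candidates and never changes the text automatically, because "cl" also occurs in correctly read words. The confusion table holds just the two pairs mentioned above; the function name and test strings are made up for illustration.

```python
# OCR misreading -> intended letter, limited to the two cases mentioned above.
CONFUSIONS = {"I-I": "H", "cl": "d"}


def suggest(line: str):
    """Return (misreading, candidate line) pairs for a human to accept or reject."""
    suggestions = []
    for wrong, right in CONFUSIONS.items():
        if wrong in line:
            suggestions.append((wrong, line.replace(wrong, right)))
    return suggestions


for sample in ["I-Iong Kong I-Iotel", "Calclbeck, Macgregor & Co.", "Sinclair", "On premises."]:
    print(repr(sample), "->", suggest(sample))
```

The "Sinclair" case comes back with a wrong candidate, which is why this is better used to draw the eye to suspect lines than to correct them blindly.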

Understood, thanks. The 1881 list looks good, thank you for posting that. I've added the standard headers, and slotted it into the list of Jurors lists at https://gwulo.com/node/6706

If you can work backwards or forwards from there to add in any more years, they will be gratefully received!

While living in Hong Kong, 1959-63, there were some pretty heavy typhoons. We had to put shutters up at the doors and windows to the balcony and sit it out until somebody came round with an “all clear” message.

Went back 2 years ago and visited a museum which shows the devastation caused by the strongest typhoon while we lived there. All we had was many sea creatures, sand and wood on the balcony, but the damage elsewhere was immense.

Hi David,

I tried another way to do the lists. This one is a combination of your present method, OCR and EXCEL's database format. This may be faster as one does not have to compare a long row of data in a spreadsheet with another long row in the PDF.

1) Open the PDF and open the previous year's Excel file, then click Data > Form to open the database form. Place the DB form just below the PDF for easier comparison. Delete old entries that no longer apply, correct typos, and highlight the new jurors in the PDF so they can all be entered into the DB in one go later. This avoids scrolling back and forth through hundreds of entries.

Juror PDF and Previous year's DB, by tkjho

2) After finishing all the pages that one wants to do, enter the new jurors by opening an OCR PDF below the original PDF. Either C&P from the OCR or type the info into the DB forms.

Original Juror PDF, OCR PDF and DB form, by tkjho

3) After it's all done, sort the DB.

Sort Database in EXCEL, by tkjho

I've done the 1931 Special Jurors list and will email it to you.

Thanks for the extra investigation, and the file by email. I guess most contributors will stick to the simpler approach, but you're clearly very comfortable with these tools so if they save you time then they're good news!

Regards, David