tag:blogger.com,1999:blog-4953008377950396317.post6662790302789950635..comments2024-03-28T18:39:59.184-07:00Comments on SearchReSearch: Search Challenge (9/2/15): How to search in a scanned document?Dan Russellhttp://www.blogger.com/profile/13603209997260423532noreply@blogger.comBlogger16125tag:blogger.com,1999:blog-4953008377950396317.post-55795893844082222082015-09-07T12:42:56.306-07:002015-09-07T12:42:56.306-07:00I'm not sure how A.M. got there (where are the...I'm not sure how A.M. got there (where are the snips coming from?) or why you are not seeing the 7 with ⌘f… or am I missing the question all together?<br /><a href="http://postimg.org/image/q45mv2ns5/" rel="nofollow">⌘f on Aui Maisi's link example… shows 7</a><br /><br />OK, now I see where the article was cached & it makes more sense to see - nicely done, A.M. - pretty clever approach getting around OCR on a scanned document, works for<br />items on the net that Google has indexed - but if someone had directly sent a long scanned pdf that wasn't on the web, OCR would still be needed… right?<br /><a href="http://postimg.org/image/6ru4jzb6x/" rel="nofollow">used the full text version - ⌘f shows 7</a><br /><a href="http://postimg.org/image/yowly47x7/" rel="nofollow">using the cached version</a>remmijhttps://www.blogger.com/profile/17985809654574916217noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-17157895232281223662015-09-07T10:38:17.000-07:002015-09-07T10:38:17.000-07:00This is great. But how did you find all 6 + 1 ins...This is great. But how did you find all 6 + 1 instances? Regular control-F only finds 6. What did you do to find the 7th? Dan Russellhttps://www.blogger.com/profile/13603209997260423532noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-26685668047854147682015-09-06T14:05:19.516-07:002015-09-06T14:05:19.516-07:00I don't know what happened to my first comment...I don't know what happened to my first comment. I think my connection has some issues. 
Here it is again. <br /><br />Good day, <b> Dr. Russell, fellow SearchResearchers </b><br /><br />Searched:<br /><br />I know that Optical Character Recognition (OCR) helps with this Challenge and that Google Drive works. I wasn't sure it works with large documents, so I searched. <br /><br />[search scan document google drive]<br /><br />Drive only extracts text from the first 10 pages.<br /><br /><a href="https://plus.google.com/+GoogleDocs/posts/6i1Nf4iCnw7" rel="nofollow">Did you know that you can search through text in scanned documents? Or convert them to Docs so you can make edits?</a><br /><br />[search scan document] [search scan document online] [search scan document OR PDF online]<br /><br />This gives some options. Some need an email address, others only accept small files, so I needed to find other ways.<br />[pdf searchable][pdf searchable online]<br /><br /><a href="http://www.online-convert.com/" rel="nofollow">Convert Document</a><br /><br /><b> Answers </b><br /><br />1. How can you transform this document (LINK) into something that you can search within? <br />A: I did it this way, to practice Google Drive (first 10 pages):<br />1. Uploading to Google Drive.<br />2. Opening with Google Docs.<br />3. Ctrl f [multiple documents]<br />4. This finds "multiple documents" 2 times.<br /><br />With the Convert Document source above, I changed the file to docx and opened it in Chrome. <br /><br />2. Once you've done that, can you determine how many times the authors refer to "multiple documents" in that paper? (This was my original search task--finding interesting papers about how people read multiple documents at the same reading session. That's how I found this paper.) <br />A. Six times. 
Ramon Gonzalezhttps://www.blogger.com/profile/16129830563029534511noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-63854541012811738452015-09-06T13:30:14.170-07:002015-09-06T13:30:14.170-07:00I just checked my links and it looks like this lin...I just checked my links and it looks like this link won't work, sorry. If you want to try Kami download the app Kami and link it to Google Drive. <br /><br />I have tried uploading scanned/online articles to Google Docs in Google Drive but I find a lot of editing required. In some cases despite the editing it is worth the bother. For example if you want to use text in a presentation, add your own comments etc. using Google Documents for OCR documents can be very beneficial. It works well in a class environment where students are working together on exercises.<br /><br />I tend to use Kami for language learning. I highlight unknown phrases/words. As well when working with online books or scanned books that have exercises within the document it's easy to fill in answers. Very handy.Rosemary Mhttps://www.blogger.com/profile/12291661159622665464noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-78818757564330531512015-09-06T08:24:32.826-07:002015-09-06T08:24:32.826-07:00Hello Fred, good day :)
I know, I didn't know...Hello Fred, good day :)<br /><br />I know, I didn't know what "TLDR" meant. Now I know. I tried to read the whole document and lost count of the "multiple documents" mentions. In any case, it was very interesting reading. <br /><br />As I mentioned, I like your path. I tried different ways and got the same result but with more steps. <br /><br />Looking forward to knowing Dr. Russell's path. I know <b> Terl's </b> works. <b>RRR's </b> does too, and that app is new to me. Mine is different but also works. <br /><br />About apps or extensions, how can we know if one is trustworthy and safe to add to our Drive?<br /><br />Enjoy Sunday!<br /><br />Ramon Gonzalezhttps://www.blogger.com/profile/16129830563029534511noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-81439714355484659302015-09-06T03:10:21.493-07:002015-09-06T03:10:21.493-07:00And now for something completely different: mix a ...And now for something completely different: mix a snip from the 1st, another from the 2nd, eg:["reading comprehension strategies" "strategies are developmental"]; get the (text-) cached version; mark the 6+1 finds at http://ow.ly/RQEar. Voilà.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-68583510395647164192015-09-05T16:50:07.561-07:002015-09-05T16:50:07.561-07:00Hi Ramón, if it is more than ten pages sorry, but ...Hi Ramón, if it is more than ten pages sorry, but TLDR. ;-)krossbowhttps://www.blogger.com/profile/07877826327758153784noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-11671577425262955352015-09-05T12:56:06.834-07:002015-09-05T12:56:06.834-07:00Hello, Fred. Thanks for the Url about Google Tips....Hello, Fred. Thanks for the Url about Google Tips. Your way is simpler and just looks at the first 10 pages, not the whole document. <br /><br />OCR is great. 
Ramon Gonzalezhttps://www.blogger.com/profile/16129830563029534511noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-18765771961889022962015-09-05T09:57:47.906-07:002015-09-05T09:57:47.906-07:00Hello everyone,
1. I placed the PDF in my Google ...Hello everyone,<br /><br />1. I placed the PDF in my Google Drive. Right-clicked on the file and selected OPEN WITH > Google Docs. The converted file has each page as an image and then the text of that page below. This is helpful if you are doing translations or fixing OCR that Google Drive does. Answer - use Google Drive. <a href="https://get.google.com/tips/#%21/tips/hack-photos-and-PDFs-to-say-what-you-want?category=learn-better" rel="nofollow">https://get.google.com/tips/#!/tips/hack-photos-and-PDFs-to-say-what-you-want?category=learn-better</a><br /><br />2. Used Command-F to find that they mention [ multiple documents ] twice. <br /><br />Next question - would they consider reading the pdf image page and the OCR text below it in the Google Doc for translation as multiple documents in one or is still just one document? krossbowhttps://www.blogger.com/profile/07877826327758153784noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-74576446805216414692015-09-03T08:01:25.387-07:002015-09-03T08:01:25.387-07:00Good day everyone.
After reading your comments. I...Good day everyone.<br /><br />After reading your comments. I tried [Ocr online] results are good for images and smaller files. <br /><br />I found as mentioned before 5 multiple documents and 1 in reference. That is total 6. <b> Terl </b> mentions 6 on text and 1 on references can not find the one I miss.<br /><br />I'd like to know in the answer if possible how <b> Dr. Russell </b> selected "multiple documents" as the word to search.<br /><br />Yes, <b> Remmij </b> 10 pages in Drive is still the current amount.Ramon Gonzalezhttps://www.blogger.com/profile/16129830563029534511noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-5572896421613422142015-09-02T18:25:29.764-07:002015-09-02T18:25:29.764-07:00…am guessing this is headed in a Google/Drive-cent...…am guessing this is headed in a <a href="http://tinyurl.com/nr9dq2d" rel="nofollow">Google/Drive-centric</a> direction?? <a href="http://www.dailytech.com/Exclusive+Googles+New+Search+Icon+Was+Created+in+2008+by+Russian+Designer/article37480.htm" rel="nofollow">a small detour</a><br />the key seemed to be finding the term [optical character recognition] - there seem to be a number available… for $$ like Adobe Acrobat, some like what<br />is offered by Google in Drive (<a href="https://support.google.com/drive/answer/176692?hl=en" rel="nofollow">About Optical Character Recognition in Google Drive</a>) and a number of free on line offerings - which is the way I went as an alternative - <br />this required registration to really do the job, but seemed to work fine - had it convert to Word… <a href="http://www.onlineocr.net/" rel="nofollow"><b>on line ocr</b></a><br /><br /><i>"2. Once you've done that, can you determine how many times the authors refer to "multiple documents" in that paper? (This was my original search task--finding interesting papers about how people read multiple documents at the same reading session. That's how I found this paper.) 
"</i><br />I'm going with 7 times, including in Notes out of the 24 pages… (saw where Drive only does 10 pages… don't know if that is current)remmijhttps://www.blogger.com/profile/17985809654574916217noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-83241253335143154452015-09-02T17:35:21.936-07:002015-09-02T17:35:21.936-07:00How did I find out? Out of necessity I went in sear...How did I find out? Out of necessity I went in search of such a tool. One particular use I needed it for is my online book collection. I have even asked the company to expand their search abilities to incorporate more Google Search functions. Rosemary Mhttps://www.blogger.com/profile/12291661159622665464noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-31824064459533649542015-09-02T17:27:17.643-07:002015-09-02T17:27:17.643-07:00If I understand correctly the challenge this tool ...If I understand the challenge correctly, this tool is something I have been using for a while. It was formerly known as Notable PDF and is now known as Kami. I uploaded your document to Google Drive, where I have linked Kami with Google Drive. I now use the OCR function in Kami, and here is the scanned document. 
<br /><br />https://web.kamihq.com/web/viewer.html?file=https://notabletemporarydownloads.s3.amazonaws.com/Notable%2520PDF%2520Export%2520-%2520LHvAVw3OvM-T_ciCP82_RA.pdf<br /><br />or http://bit.ly/srs_Sept_2_2015<br /><br />"Multiple documents" showed up 5 times.Rosemary Mhttps://www.blogger.com/profile/12291661159622665464noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-54939446793201269712015-09-02T12:11:09.500-07:002015-09-02T12:11:09.500-07:00Looked it up in GoogleBooks and searched within and...Looked it up in GoogleBooks, searched within, and found that the search term "multiple documents" is mentioned 7 times in the book but 5 times in the selection you chose.<br /><br />Tried to get Drive to convert, with no success.<br /><br />jon tUjonhttps://www.blogger.com/profile/06450649073262987652noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-3085574010389400732015-09-02T11:30:15.953-07:002015-09-02T11:30:15.953-07:00Since it's a PDF file, I just use the OCR tool...Since it's a PDF file, I just use the OCR tools built into Adobe Acrobat Pro. I simply opened the text recognition table and clicked "In This File" and let it run. Not everyone has access to Acrobat Pro, however, so you might need a different OCR tool.<br /><br />Knowing the text recognition isn't perfect, I then selected "Find all Suspects" and scrolled through the document looking for any highlighted instance of the word "multiple" or "document" to make sure the OCR got it right. There were three instances of "multiple" and none of "document", and only one of the suspects was in the phrase "multiple documents". However, in each case, it actually had the correct spelling. 
If I was going to be using this document for a lot of things, I might actually play with the settings a bit to try to get better recognition and go through it and fix all the suspects (there were a large number) but for the challenge I didn't need to do that.<br /><br />Then searching through the document I found 6 instances of "multiple documents" in the text and one in the references (which I didn't check for errors).Tom Stephenshttps://www.blogger.com/profile/15280708969464181482noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-67643904795430302092015-09-02T08:32:01.929-07:002015-09-02T08:32:01.929-07:00Cool challenge, and timely. Thank you, Sir!Cool challenge, and timely. Thank you, Sir!Anonymousnoreply@blogger.com