tag:blogger.com,1999:blog-4953008377950396317.post4470328889151329323..comments2024-03-17T06:13:15.256-07:00Comments on SearchReSearch: Answer: Finding things with additional property limits (beginning web scraping)Dan Russellhttp://www.blogger.com/profile/13603209997260423532noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-4953008377950396317.post-76735006256947991302015-06-19T00:16:54.139-07:002015-06-19T00:16:54.139-07:00This comment has been removed by the author.Anonymoushttps://www.blogger.com/profile/03851416537513600164noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-30792239819212005872015-02-28T08:00:16.281-08:002015-02-28T08:00:16.281-08:00scraping, not scrapping… meat-sacks of the world c...scraping, not scrapping… <b>meat-sacks of the world comply</b> - or confit, with a nice couscous: Julius Marx<br /><a href="http://goo.gl/maps/lS4pv" rel="nofollow">keep an eye here</a><br /><a href="http://goo.gl/maps/y5yRH" rel="nofollow">need an image update, too sunny</a>remmijhttps://www.blogger.com/profile/17985809654574916217noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-3422908162671233012015-02-28T06:40:23.231-08:002015-02-28T06:40:23.231-08:00Clipping off the "obviously not in the sofa p...Clipping off the "obviously not in the sofa price range" data is a good move. If I were going to give this report to my manager, I would have cleaned the data a bit more--and this is a fast way to improve the data quality. <br /><br />WRT "big data" -- yeah, it's a thing. I moderated a panel on the future of big data -- the YouTube video is https://www.youtube.com/watch?v=KenqiihxT1U -- my part starts at 10:50, and I moderate the discussion throughout. It's a pretty quick and useful overview of some of the prospects and possibilities for Big Data. 
<br />Dan Russellhttps://www.blogger.com/profile/13603209997260423532noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-52759385373878551732015-02-28T06:32:17.546-08:002015-02-28T06:32:17.546-08:00That's a good trick to know. Thanks.
BTW, I...That's a good trick to know. Thanks. <br /><br />BTW, I went back and updated the map so it's really interactive now. I added an addendum to my post: <br /><br /><i> In response to a couple of questions from readers, I went back and made the map of "big data internships" truly interactive. Now if you click on a pin, you'll see the city, the company with the position, and a link to the job posting. </i> <br /><br />Hope this is more useful as an example. <br />Dan Russellhttps://www.blogger.com/profile/13603209997260423532noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-49748746085800371362015-02-28T04:51:00.479-08:002015-02-28T04:51:00.479-08:00Anne and I thought we had posted but our response ...Anne and I thought we had posted, but our response never appeared, so I must have previewed it and then never published it. One thing we did on the Ikea search was to change the dollar amounts in the search. We limited the search to a range from $100 (changing the bottom number from 0 to $100) to $1975, which was the amount already there. By doing that we eliminated most, but not all, of the extraneous stuff. There was one couch that was on sale for $167, so we probably could have set the bottom limiter at $150 and not removed any actual couches. This was an interesting search for Anne and me because there were many new terms. We had never heard of big data jobs. Since we are high school educators, we should know what fields of employment are out there for our students. As I had said in my unpublished post, schools are pushing careers in STEM, and this might come under the umbrella of T for technology, but still I don't think most people outside of Silicon Valley know about this. We certainly didn't, and when I've asked around, no one else did either. 
So we did a search on big data jobs, which led us to this article in Forbes magazine - http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015/ - which really gave us a great overview of the topic and showed that the growth in this field is enormous. As always we love these challenges and consider working on them part of our ongoing professional development. Debra Gottslebenhttps://www.blogger.com/profile/08074610468240387547noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-89091216405970576682015-02-27T21:36:43.744-08:002015-02-27T21:36:43.744-08:00fwiw… in regard to what passager mentioned about m...fwiw… in regard to what passager mentioned about making map pinpoints interactive, I would point to Rosemary's map in<br />her 2/26/15, 5:50PM post as an example of a higher info content pin pop-up - her map takes some zooming to get started<br />and the accompanying Fusion table is unfortunately a 404 result — as opposed to passager (Google Fusion Tables product) or Fred's map (My Maps product)<br />Rosemary's includes live links to the listed internship - something that would be really useful and efficient in reviewing the positions. Clever Rosemary. 
Just thought that was a nice touch.<br />Think it illustrates that it still comes down to how - and what - data is used to construct the map - seems the inputs are still best determined by Scott Adams <a href="http://blog.dilbert.com/post/111758467951/robots-read-news" rel="nofollow">meat-sacks, eh?</a> ;-)<br /><br />Also have to give DrDan some slack on these challenges - he often ends up trying to cover a lot of ground for an audience of varied experience… and he operates from<br />a very different knowledge base/starting point than many of his readers and knowing that there are multiple ways the often times non-concrete solution can be arrived at…<br />all while doing his official Goo job: "anesthesiologist of search" or "search Gurkha"… or something like that.<br /><a href="https://pbs.twimg.com/media/BdjauvACYAApm2V.jpg" rel="nofollow">Google search guru Dan Russell at work…</a><br /><br />Just saying highly comprehensive how-to tutorials aren't what these weekly questions are about… imo. Think Dan is just trying to stimulate exploration while creating a catalyst<br />to support his "sensemaking and information foraging" skill building exercises. As GRayR said, thanks for making the effort Dan… even if the result isn't as perfect or concise<br />as might be hoped for. They are almost always useful or, at least, entertaining and engaging… hard to do week in and week out.remmijhttps://www.blogger.com/profile/17985809654574916217noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-21927000486670042332015-02-27T17:52:31.462-08:002015-02-27T17:52:31.462-08:00Thanks Dan for the answer. I think I learned the p...Thanks Dan for the answer. I think I learned the page number trick in WordPress, where you have an option that lets you choose the number of posts per page and to paginate. If you set that option to 0 you get all the posts on the same page. 
Whenever I see a pagination appearing in the address bar I use the trick if needed.passagerhttps://www.blogger.com/profile/05897589130110709598noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-61155385113799770952015-02-27T16:31:42.189-08:002015-02-27T16:31:42.189-08:00Hello Dr. Russell, Passager and everyone.
I like t...Hello Dr. Russell, Passager and everyone.<br /><br />I like this challenge because I needed this tool and never thought it existed.<br /><br />Thanks for the trick, Passager. I'll try that. About import, I downloaded the desktop option and signed in with Google. From what I saw, there is no difference from the one that works online. I'll work with it more. Also, as you both mentioned, the download is sometimes one page. I changed the searches and found that with that query, you can change 5 pages and increase the number. My query allowed 20 pages to download.<br /><br />About question 1, I like the answer. I thought maybe one specific page was needed. Also, internships are great. Lots of ways to practice and have a better career. I wish I had some of them.<br /><br />I am looking forward to the next post about this topic.<br /><br />Have a great weekend everyone.Ramon Gonzalezhttps://www.blogger.com/profile/16129830563029534511noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-59978264917112394892015-02-27T15:49:45.839-08:002015-02-27T15:49:45.839-08:00Philippe -- Well, my map IS interactive, but the p...Philippe -- Well, my map IS interactive, but the pop-up information is pretty sparse. As you point out, it's easy to aggregate the data about a city onto the pin. I should have done that. <br /><br />AND... You're absolutely right about the sofa data. Almost everything in my table that's < $100 isn't a sofa, but a funky chair or something related to Ikea sofas. I really should have cleaned the data more carefully. (Mea culpa--I was trying to get the post finished up so I could get to work!) <br /><br />That's a GREAT observation about changing the pagenumber to 0. I didn't know that. How did you learn about that trick? (I don't think it works in general, so I'm curious how you figured it out.) <br /><br />Yes, Import.IO has real data size limits--which is why a stand-alone app is sometimes the best way to go. 
<br /><br />And while the data has some "additional junk" in it, the overall quality isn't terrible. Some additional filtering has to be done by the human user at the time of reading! <br /><br />Excellent comments. Thanks very much! <br />Dan Russellhttps://www.blogger.com/profile/13603209997260423532noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-42570375759077091172015-02-27T14:31:07.480-08:002015-02-27T14:31:07.480-08:00Dan,
This time I really don't get it.
For th...Dan,<br /><br />This time I really don't get it.<br /><br />For the first challenge you asked for an interactive map ( <em>Ideally, you'd make an interactive map, where you can click on the red button and read about the internship.</em>) but don't give it in your answer although the solution is quite simple.<br /><br />In the second one you give an answer that completely misses the point: most of the products you show in your list are not sofas but covers, chairs, tables… The first sofa is "<em>LYCKSELE LÖVÅS Sofa bed $199.00 Unit price</em>" and no sofa appears under that $199 price although on the Ikea site you can find a $99 sofa. You say "<em>It's kind of a hassle to get all 36 (pages)</em>" but as I said in my answer you just have to add (or change 1 to 0 in your case) <strong>&pageNumber=0</strong> at the end of the URL to get all the results on one BIG page (when pagination occurs on a page and you see a page number in the URL, replacing it with 0 often does the trick and gives you all the results, as I'm sure you know). It appears that import.io does not like such a big page and asks you to download their app (I haven't done it yet but will give it a try), hence the use of OpenRefine. I really don't understand the use of wrong charts. No data is better than bad data, isn't it?<br /><br />Anyway thanks for those always interesting challenges.<br /><br />Philippepassagerhttps://www.blogger.com/profile/05897589130110709598noreply@blogger.comtag:blogger.com,1999:blog-4953008377950396317.post-18679169079400777942015-02-27T11:17:39.496-08:002015-02-27T11:17:39.496-08:00Dan,
I don't get to participate as much in the...Dan,<br />I don't get to participate as much in these wonderful search lessons of yours as I want. But I do read them all and save them for later. This one is especially interesting and useful. I just want to say thanks. GRayRhttps://www.blogger.com/profile/11355369343845540787noreply@blogger.com