Monday, 2 April 2012

Attack Of The Clones: Bonkers Sees Doubly Red In A Battle With Content Scrapers

This has not been a good weekend. For starters, I unintentionally acquired an orange thatch in place of hair. It was relatively cheap as thatching goes (with it being a dying art of course), but that was no consolation. Plus I failed to sleep, failed to edge the lawn, do a food shop, polish my shoes, or catch up on back episodes of Corrie. In short, most of the household chores and pleasurable activities I had set out to accomplish this weekend went by the board. This is because I was preoccupied by the sudden discovery that a month’s worth of past posts on Bonkers had been hijacked by Israeli terrorists - or by a Tel Aviv-based "content scraper site" (to give it its proper name) – posing as a harmless RSS index directory. Yup, I had been cloned.

Now my blog is already listed on a number of “aggregator” or directory sites like Tree Hugger or Day Life, and I have no problem with any of those. On such sites just the title and a bitesize taster of your post will typically be shown. Readers have to click on the link below - which will say something like: “Full article on (Site Name)” – if they wish to carry on reading, whereupon they are routed back to the original blog post. Thanks to this “titbit”-quoting approach, RSS feed sites are no threat at all to blog owners, and may well serve as additional sources of referred traffic.

As ever, Wikipedia helpfully explains this other, bothersome phenomenon of "content" or "web scraping" – and no, I am not scraping stuff from Wikipedia, just quoting a little bit of it:

“A scraper site is a spam website that copies all of its content from other websites using web scraping.”

One more sentence if the Wikipedia authors don’t mind, and then I am done.

“Some scraper sites are created to make money by using advertising programs.”

Too right they are! Yes, this one plonked my posts up on its site, hedged them all around with Google ads, and then sat back and watched as the revenue poured in...or maybe just trickled in, one small monetised drop at a time - who knows? Hey, it is the principle I object to, the piggybacking on someone else's time and effort in the hope of financial gain.

So I fired off a pretty stern email to the site owner, in which I came over all UPPER CASE, WHICH IS NOT LIKE ME AT ALL. I will, however, reproduce my complaint in normal text, to preempt mass eyestrain on the part of readers:

“Please take down all content from my blog, Bonkers about Perfume - you have
basically stolen my blog in its entirety for your own purposes and I will
pursue the matter further if you do not remove it from your site
immediately.”

To the site owner’s credit, he responded pretty promptly, asking me for my blog domain name and following this up with further inquiries about the exact web URL of the content I had a problem with. That said, in amongst our flurry of exchanges he couldn’t resist a defensive side swipe:

“And tobe clear we are only rss index directory (no scraping).”

And to be clear, if the lifting wholesale of material, including text and images, does not constitute scraping, I sincerely hope never to encounter a bona fide content scraper in a dark alley.

So anyway, by the end of the afternoon my blog content had been excised from the rogue site – scraped off, no less - though not without the site owner pointing out how I was in fact cutting off my SEO nose to spite my face:

“FYI to be in out site it is good for you SEO & traffice wise but it is no longer relevant for you…”

Note the suspenseful dots pointing to my ill-judged decision….

Well, as it happens, having checked them out, the content scraper’s site doesn’t appear to register on Google’s page rank scale of 0-10 whereas Bonkers is a 3, ie "off the starting blocks", you could say. (For anyone not familiar with this techie blog rating malarkey, Google page ranking is one key measure of a blog’s importance in cyberspace - for more on this see point 4 of my recent post on blogging here.) So I fail to see why my ranking would be improved by an association with an unranked site, or for that matter why I would get any referred traffic to speak of, given that entire blog posts of mine were available for people to read over on the scraper site!

Not long after I sent my first tetchy email, something prompted me to click on a link on the scraper site that had previously led to my post on the Perfume Lovers London “Leather Event”; in its place I found a large chunk of what I took to be Hebrew, but at least the Bonkers blog content was gone. My thoughts immediately turned to Ari of Scents of Self, our go-to “in-perfume-community” Hebrew speaker, so I sent her the text and asked if she could confirm it was in fact Hebrew, and if so, whether she could kindly find a moment to tell me the gist. Ari was keen to help, but finding herself unfamiliar with some of the “Internet-y” terminology, she had the bright idea of running the text through Google Translate, and came up with the following:

"There is not even one character of copied content on this site. Display pages are a type of display window that broadcasts RSS within the online conversion of standard RSS to a display state that a web surfer is able to browse. And that every change that occurred in the original broadcast (feed) varies respectively in our website online. The donor site publishers with the broadcast channels are promoting Google and directing visitors. If you still want us to remove the display page of your transmission source, please contact us through the button "contact" and send us the download link."

So they would be the donor site publisher, I take it? Funny that I should feel as though I am the (unwitting) donor in all this...like those bodies harvested for organs without the prior permission of the deceased or their next of kin. And as for this business of "displaying" as opposed to "scraping", well, that is a nice point of semantics. That would also mean that pubs which screen Sky TV football matches using foreign satellite decoders are also merely "displaying" the games, rather than filching them in any more reprehensible manner.

And I find it interesting that this rather pat defensive statement should have appeared where my post had been, instead of being written in an email directly to me. I wouldn't be surprised if this was stock copy the site has got ready to bang in whenever it suddenly incurs a textual hole, as another disgruntled blog owner reclaims their material as their own.

Well, what have I learnt from this unsettling incident? Firstly, that it really does pay to make a direct approach in the first instance to the site which has copied your material, as this avoids having to seek out alternative avenues of complaint that may lie deep within the bowels of Google or Blogger. I also found out that fellow blogger My Perfume Life is being scammed in exactly the same way, and have dropped her a line to this effect. And I have proved once more that I can call upon friends in Perfume Land – even on a "sudden death" basis - for help on all manner of random topics. : - )

Lastly, I learnt (for the umpteenth time!) never to write a post on Blogspot software, even though it supposedly saves drafts as you go along. Beastly Blogger contrived to eat this post completely - or, you could say, to scratch it off the "compose" window with a single gouge of its heartless fingernails - just as I was doing a final proof. Which rather begs the question:

“Where’s the blinkin' duplicate post when you really need one?!”


And I would be interested to know if anyone else out there is aware of having been cloned, scammed or scraped in a similar fashion?

If so, did you take any action against the scrapers, or did you decide to go with the outflow? : - )


Since writing this account of my own experience, Tarleisio of The Alembicated Genie has written a powerful and moving post on the subject of content theft - see link below:

Phantoms in the Fumosphere


Photo of print scraper from drsmith7383 via Flickr CC, photo of boot scraper from sywlch via Flickr CC, other photos my own.

33 comments:

karen! said...

ugh, what a mess. Glad that it seems to be sorted now.

Natalie said...

Oh, how horribly frustrating. I'm not aware of it having been done to me, and I don't think my site is important enough to attract that. But, the internet is so vast.

It makes my blood boil that these scrapers (who are really thieves) are always finding new ways to carry on with this type of ... crap. (I can use the ellipsis too!)

Glad you got it sorted.

Vanessa said...

Hi Karen,

Thanks for dropping in with your good wishes. It does seem to be sorted, thanks, and I was pleasantly surprised at the quick reaction of the "donor site publisher" to my complaint - to be truthful I wasn't sure I would hear back from them at all.

Vanessa said...

Hi Natalie,

I learnt the term "ellipses" in the course of writing this post, but stuck to my dots as the other always makes me think of "elliptical billiard balls". And I can't for the life of me recall where that quote is from... : - )

It never occurred to me that blog size is a factor in the scraping strategies of these bottom-feeding bots, if bots is even the word. It is some kind of software that goes trawling for content, I do know that, but I assumed it is the more vulnerable "open content" sites that get picked off, but in fairly arbitrary fashion.

Tara said...

I'm such a techno dumb dumb I'm not even sure what RSS is, let alone an RSS index directory but it's clear to me that this content scraper copied your hard work without crediting it or linking to your site. What's worse however, is that it was copied IN FULL so there would be little benefit to you even if they did.

I'm relieved that at least they did respond quickly and took the appropriate action. I didn't see that coming. Though making out they are not a content scraper and that it was your loss was a cheek!

Now do you have to keep checking Google or something to see if anybody else does it in the future?

Vanessa said...

Hi tara,

I am also a bit fuzzy on the exact nature of RSS Feed or other wizzy plug-ins associated with blogs - or even what a plug-in is - but I'd say your assessment of the whole sorry tale is absolutely spot on on every point you make.

It is interesting to get other people's reactions to the story, as I was so embroiled in it myself at the weekend that I wondered if I could possibly have been missing the benefit of being "displayed" by my "donor site publisher" who had so kindly taken it upon himself to promote me in this way.

I do think you just have to keep a general eye out for stuff like this happening, but if there is such a thing as a "clone alerting service" I would be interested...

: - )

Ines said...

I don't think my blog ranks high enough to be on the map for content stealing. :)
And I'm not even sure how I would come across it anyway - not being online so much lately.
I'm glad you resolved your problem.

Now, I'm off to learn how to discover my google ranking...

Vanessa said...

Hi Ines,

As I was saying to Natalie, I don't think size has too much to do with it - I suspect it is more a case of random opportune chance. I once had my life savings siphoned out of my bank account by fraudsters who sent a phishing virus that logged my keystrokes - the fraud department said that they would just have been online with their software gizmos and happened to catch me logging on to my bank account at the time. Based on the size of my savings - while significant to me - you wouldn't have thought I would be deliberately targeted!

I wonder if Blogger blogs are more vulnerable vs Wordpress - that would be an interesting thing to know...

Re your Google PR ranking, you are a 3, like me. There are PR checker sites you just enter your blog address in and it throws up the number! So if size did come into it, you could be equally attractive to our scraper friends... : - (

(Not that PR rank is THE be-all and end-all of SEO or anything, but it is one commonly used indicator.)

How I discovered this hijacking is by googling "Bonkers" and "perfume" to see different ways those words crop up in search phrases. I have got inspiration for a future blog post that way. Yet another one still to be written!

It is also a way of finding out other perfume blogs who are linking to you, so you can thank them and reciprocate, and I just enjoy random googling stuff anyway. Which is doubtless why I have bags under my eyes. : - )

Vanessa said...

PS And I imagine I shall be doing more of it in future to keep an eye out for something similar occurring again. Or moving to Wordpress, if that *is* in fact more secure!

Vanessa said...

Oh, and I even found a content scraper site that had lifted *content about content scraping*. So how shameless is that you might think?!! This is why I think it is a fairly arbitrary process. I doubt very much if the owners of such sites have much of a clue who they are featuring on their platforms - I get the impression they are just grabbing left and right using their automated software grapnels. Certainly this guy didn't seem to know who I was or what material of mine he had on his website!

The Candy Perfume Boy said...

What a bunch of horrible skeezbags!! I am very glad that you were able to get your content deleted from their site though, even if you did have to use capitals to do so :P

I live in fear of this sort of thing happening. I saw one Beauty Blogger on Twitter who had her entire blog scraped onto another blog that looked very similar. It is very annoying indeed and it's a shame that these sorts of things tend to be quite difficult to deal with.

That said, a good deal of vigilance within the perfume-blogging community is a great help and we can all come together to fight the scrapers.... !

Vanessa said...

Hi Candy Perfume Boy,

Haha - I was glad to depart from my normal capitalisation habits on this occasion, as it appears to have had the desired effect!

Sorry to hear about the case of the beauty blogger - it is such a creepy feeling when you see it done to you.

As you say, we will keep an eye out for one another. I hope MyPerfumeLife can sort hers out too - I know she was on a blogging break for a while, which may have given the scrapers a good opportunity to nip in and pull the same stunt on her! : - (

Perfumeshrine said...

Hi Vanessa!

Sorry this happened to you. I'm glad it's sorted out now though.

Personally, it's happened so many times and so many people use aggregators to run things from my site I've stopped paying attention, otherwise I'd be losing my whole day trying to tackle everything. But yes, on a level it's enervating to see others making money out of one's words.

Hope you're otherwise well :-)

Vanessa said...

Hi Perfumeshrine,

Nice to hear from you, and I am not surprised to learn that your blog has experienced this problem time and again. I don't mind classic aggregator sites but these people went too far imo in lifting content wholesale.

I did find another perfume site which had got a mash of many of my past blog posts masquerading as "copy". It was complete gibberish and anyone looking at it would know that someone's blog had been put through the shredder and spat out again so they cannot be accused of plagiarism, so I just ignored it. Though for all the sense it made, they might as well have written fjfjsafjsafjk39urnlsdnff= - you know, like when Charlie Bonkers the cat walks on the keyboard! Would take them no time at all to do that, though maybe the scrambled text has more of a semblance of real copy??

Anyway, I guess it is a question of degree whether I would weigh in and try to act on something in future - and also, how often it happens, and how weary I am of the fight. For a blog like yours, it must be like tackling a veritable "scraper hydra"!

Btw, I think my middle-aged hormones may be in a bit of a tizz at the moment, so that may have ratcheted my indignation levels up a notch!

Anonymous said...

Blergh.... What a nightmare! I'm so sorry that you've gone through this--- but I am glad that you found the stolen posts, and were able to "recover" them.

I don't think that this has happened to me... But don't really know how to check. Seems like I should get more savvy on the Google-ing!

Vanessa said...

Hi Dee,

Thanks for your good wishes. I think this site must just have caught my eye in amongst my idle googling because a) it was only on the second page of results at the time and b) had an odd termination of .il which wasn't familiar to me, prompting me to investigate further.

In fact that is another thing I have learnt in all this - Israel's country ending for website addresses! : - )

Marie in Denmark said...

They not only stole the contents of your blog, they also stole your time and a portion of your peace of mind - which is even worse.
Shame on them. Happy you won.

Anonymous said...

Hiya Bonks!

As a thoroughly non-techie person, I'm more intrigued by how you acquired the orange thatch.

Were you fooling around with electric cables? Auditioning for a part in "The Wizard of Oz"? Did someone switch the labels on your conditioner and bleach bottles? Or was it the shock of the scraping schemers?

What happened? I'm all agog!

yours, agog-ing,

Anna in Edinburgh

Dionne said...

Ouch! First Indieperfumes has some creep use her blog name on Tumblr, and now this. I'm glad you were able to get it resolved, Vanessa.

Vanessa said...

Hi Marie,

You are absolutely right at that - it took up most of yesterday one way or another. Well, I clocked the issue on Saturday night, so it was on my mind from that point onwards.

And as well as having my savings stolen in that online fraud incident, I have also been burgled - not while I was at home, luckily. The thief lived over the road and spent an estimated two hours in my house, drinking sherry and eating multiple bowls of cornflakes, the police reckoned, before making off with my telly and a Hoover, to add to the eight he had already nicked and stashed away. His preoccupation with hoovers was quite curious, thinking about it...

Anyway, this was *nothing as bad* as either of those "violations", but it rattled my cage, no question.

Vanessa said...

Hi Anna,

I love your suggestions as to the cause of my orange thatch!! I think the stylist left me reading one too many Hello! magazines, or that is my working theory - that can turn subtle highlights into raging ginger and brass-coloured stripes in the time it takes to read about yet another of the Duchess of Cornwall's L K Bennett ensembles.

There was also the later shock of the scraping business as you say, which doubtless "set" the colour as the weekend progressed.

Regarding my upcoming visit to Edinburgh - will PM you about your likely movements - I can tell you that hat contingency plans are being drawn up as I type. Or I may just make an emergency appointment with another salon if there is time!

Vanessa said...

Hi Dionne,

I did read about that happening to someone in connection with a Tumblr account now you mention it - didn't clock the name at the time - very bad luck for her.

I have in fact bought the name bonkersaboutperfume.com (for about eight quid a year!) to preempt anyone choosing a similar name quite legitimately, but am not qualified to say whether a non-blogspot "termination" might have given me any more protection than in the present instance when it comes to these scraper bots. I am woefully ignorant about such things tbh, but thanks for your commiserations!

Ines said...

Vanessa, thank you for the explanations. :)
I must say when I look back on my blogging experience, I now realized I started my blog knowing absolutely nothing about what was lying ahead. :)
I still don't know half as much as other bloggers it seems.

Vanessa said...

Hi Ines,

I would agree that when I started my blog, I had no idea either about the mechanics of blogging, the friends I would make, the whole concept of traffic and networking - none of it, really! I was unemployed at the time, and just started writing to keep my brain ticking over and as a creative outlet. And it also gave structure to my weeks, and still does!

The notion of blog audience and traffic is a side issue, but can also be quite absorbing. Hence my curiosity that someone should be looking at my blog from a hospital in The Maldives, as I wrote in my recent post on the subject of blogging. That sort of random trivia really appeals to me!

Indeed I find the algorithms behind the Internet generally fascinating. I was in a forum the other day when I got an ad pop up for a "mature chat room". Now how did that happen and why does it sound downright creepy to me rather than merely age-appropriate?!!

Unknown said...

Actually, just today @tarleisio let me know that my content was being scraped by another blogger on blogspot. I have no way of contacting them, but I left comments on my (stolen) posts and I am filling out the Google complaint form.

Tommasina said...

How horrible - vacuum cleaner, cornflakes, orange thatch, blog-stealing maniac, and all!

FWIW, "elliptical billiard balls" is from The Mikado: I recognized it straight away, since my mother did a production of it at her (secondary modern) school lo! many a year ago (and my father sang the lead role the night before he took his Oxbridge entrance exams, poor farm lad that he was~)...

Big hugs and lots of sympathy to you (((V)))

Undina said...

Vanessa,

First - I'm really sorry that it happened to you. It sucks! I'm glad you were able to resolve it. Here's more about what happened to several other bloggers: http://thealembicatedgenie.com/2012/04/03/phantoms-in-the-fumosphere/

Now back to your blog. Technically the person who responded to you was kind of telling the truth: the whole content came from your RSS feed (though if their article didn't link back to your posts they were stealing the content - no matter how they've got it).

To see how it happens log off and then click on the Subscribe to: Post Comments (Atom) link on the bottom of the page. You'll notice that it shows full text with all the images. So all they needed to do was to subscribe to your feed and start saving the content.

Now what you can (and should!) do about it:
1. Go to your blog's Settings tab.
2. Select Other.
3. Under Allow Blog Feed select Short.

Done.
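If you want to double-check what a feed is actually giving away, something like the rough Python sketch below might help. It assumes (and this is only an assumption about how such feeds are typically laid out, not anything confirmed in the post) that a "full" feed carries the whole post in an Atom `content` element while a "short" feed carries only a `summary`. The `feed_exposure` function and the `SAMPLE_FEED` text are made up for illustration; in practice you would paste in the XML saved from your own blog's feed link.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_exposure(atom_xml):
    """Return a (title, kind, length) tuple for each entry in an Atom feed.

    kind is "full" if the entry has a non-empty <content> element,
    "short" if it only has a non-empty <summary>, and "none" otherwise.
    length is the number of characters of text the entry exposes.
    """
    root = ET.fromstring(atom_xml)
    report = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="(untitled)")
        content = entry.find(ATOM + "content")
        summary = entry.find(ATOM + "summary")
        if content is not None and (content.text or "").strip():
            kind, body = "full", content.text
        elif summary is not None and (summary.text or "").strip():
            kind, body = "short", summary.text
        else:
            kind, body = "none", ""
        report.append((title, kind, len(body.strip())))
    return report


# Hypothetical two-entry feed, invented purely for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Post published with a full feed</title>
    <content type="html">&lt;p&gt;The entire post body, images and all.&lt;/p&gt;</content>
  </entry>
  <entry>
    <title>Post published with a short feed</title>
    <summary type="text">Just the opening lines...</summary>
  </entry>
</feed>"""

if __name__ == "__main__":
    for title, kind, length in feed_exposure(SAMPLE_FEED):
        print(f"{kind:5} {length:4} chars  {title}")
```

If every entry comes back as "full", anyone subscribed to the feed can save complete posts; after switching the setting to Short you would expect "short" instead.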

Ask me if you have any questions.

Vanessa said...

Here is a comment from Krista Janicki which she left (my) yesterday evening. It didn't show up for some reason, so am "scraping" it from my email notification and posting it here!

From KRISTA JANICKI

"Actually, just today @tarleisio let me know that my content was being scraped by another blogger on blogspot. I have no way of contacting them, but I left comments on my (stolen) posts and I am filling out the Google complaint form."

Vanessa said...

Hi Tommasina,

Thanks very much for your sympathy and for clarifying that reference to The Mikado. I should have remembered it, for I acted in a production of the opera at my school c1974. I played the part of Koko. If I can find a photo I shall put it up on Facebook for a laugh!

: - )

Vanessa said...

Hi Undina,

I am extremely grateful to you for explaining that whole business of RSS Feed. I must say I barely knew what it was, and didn't realise I had a setting that was set to "full", effectively leaving the back door open to content thieves. I believe I have fixed the problem now. The guy who lifted the stuff was right then about just "displaying" material I had allowed him to, but it is very much an ethical nicety whether full display like that counts as nicking stuff or not.

As I said in a comment on Tarleisio's blog - and thanks for the tip off about her tour de force of a post, btw! - it is a wonder more people haven't posted massive chunks of RSS Feed of my blog if I was inadvertently allowing them to do so!

Vanessa said...

Hi Krista,

I am so sorry to learn that this has also happened to you, and it is great that Tarleisio alerted you to the problem so you could take it up with Google, as you have done. I hope MyPerfumeLife is able to sort out her similar issue with the Israeli site.

It is a horrible feeling when such a thing happens, and Tarleisio's post sums up the blog owner's viewpoint brilliantly. In fact I shall add a link to it now!

Martha said...

I find it reassuring that the scraper site has no page rank - and for that reason I generally feel angry but not worried about scrapers - they might as well be Xeroxing your content and lining a birdcage with it for all the exposure they get. It's infuriating that they feel entitled to do so, and somehow maddening that they seem to feel that you should be grateful for this exciting birdcage-lining opportunity. But I generally assume that it's only a matter of time before each one goes under.

Vanessa said...

Hi ChickenFreak,

Yes, the site's lack of Google rank made me feel a bit better about the whole thing and I love your "birdcage-lining" analogy!

That said, my annoyance at the fact that I was meant to be grateful for the opportunity to have my content extensively "displayed" at the bottom of a birdcage more than offset this, I sense. : - ) And the whole principle of the thing - as in lack of principles, rather - made me hopping mad!