Instagram API
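After a later revision the API switched to sandbox mode: every app must pass review before it can use the public_content scope. At first you can only read your own content. You can also invite other users into your sandbox, and once they accept you can fetch their data through the API. Anything beyond that requires the app to pass review.
1. First register an account on the following site:
https://www.instagram.com/developer
The Authentication page explains how to obtain an access token.
The Endpoints page explains how to use the API.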
2. There are two ways to obtain access to the API:
Three-step flow (recommended)
client-ID --> code --> access-token
Two-step flow (for apps without a server; less secure)
client-ID --> access-token
See:
https://www.instagram.com/developer/authentication/
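The last step of the three-step flow (code --> access-token) can be sketched as follows. This is only a minimal sketch: it assumes the token endpoint `https://api.instagram.com/oauth/access_token` and the form fields described on the authentication page linked above, and `buildTokenExchange` and the placeholder credentials are hypothetical names used for illustration.

```javascript
// Sketch of the "code --> access-token" step of the three-step flow.
// Assumption: endpoint and field names follow the authentication page
// linked above; verify them before relying on this.
const TOKEN_ENDPOINT = 'https://api.instagram.com/oauth/access_token';

// Describe the POST request that trades the code for an access token.
// The client secret is why this step belongs on a server.
function buildTokenExchange({ clientId, clientSecret, redirectUri, code }) {
  const body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'authorization_code',
    redirect_uri: redirectUri,
    code,
  });
  return {
    url: TOKEN_ENDPOINT,
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  };
}

// Placeholder values, for illustration only.
const exchange = buildTokenExchange({
  clientId: 'YOUR_CLIENT_ID',
  clientSecret: 'YOUR_CLIENT_SECRET',
  redirectUri: 'http://localhost:3000/users/auth/instagram/callback',
  code: 'CODE_FROM_REDIRECT',
});
console.log(exchange.method, exchange.url);
console.log(exchange.body);
```

The function only describes the request, so it can be handed to whatever HTTP client the server already uses.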
Example:
https://www.instagram.com/oauth/authorize?client_id=3b6d608eb019431ca90bde60c9178e21&redirect_uri=http://localhost:3000/users/auth/instagram/callback&response_type=token&scope=basic+public_content+follower_list+comments+relationships+likes
You are then redirected to the redirect URL you supplied, with the access token appended to the end of the URL.
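The two-step flow can be sketched as a pair of helpers: one that assembles the authorization URL and one that reads the token from the redirect. This is a minimal sketch; `buildAuthorizeUrl` and `extractAccessToken` are hypothetical names, and it assumes the token arrives in the URL fragment as `#access_token=...`.

```javascript
// Build the authorization URL for the two-step flow.
function buildAuthorizeUrl(clientId, redirectUri, scopes) {
  const url = new URL('https://www.instagram.com/oauth/authorize');
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('response_type', 'token');
  // Spaces are serialized as "+", giving scope=basic+public_content+...
  url.searchParams.set('scope', scopes.join(' '));
  return url.toString();
}

// Assumption: the redirect looks like <redirect_uri>#access_token=<token>.
// Returns null when no token is present.
function extractAccessToken(redirectedUrl) {
  const fragment = new URL(redirectedUrl).hash.replace(/^#/, '');
  return new URLSearchParams(fragment).get('access_token');
}

const authorizeUrl = buildAuthorizeUrl(
  'YOUR_CLIENT_ID', // placeholder
  'http://localhost:3000/users/auth/instagram/callback',
  ['basic', 'public_content']
);
console.log(authorizeUrl);
```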
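Scraper parameters
Because the official API now exposes so little, these notes use scraping to get at the data instead.
Query all posts for a given tag
GET
https://www.instagram.com/explore/tags/<TAG>/?__a=1
Query a user's basic information and first few posts
GET
https://www.instagram.com/<username>/?__a=1
This returns the user's numeric ID, name, follower count, and so on.
Query a user's posts and stories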
https://www.instagram.com/graphql/query/?query_hash=...&variables=...
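The first two endpoints above differ only in their path, so they can be built with two small helpers. A minimal sketch; `tagUrl` and `profileUrl` are hypothetical names, and the path segment is percent-encoded so that non-ASCII tags stay valid in a URL.

```javascript
const BASE = 'https://www.instagram.com';

// URL that returns the posts for a tag as JSON.
function tagUrl(tag) {
  return `${BASE}/explore/tags/${encodeURIComponent(tag)}/?__a=1`;
}

// URL that returns a user's basic information and first few posts as JSON.
function profileUrl(username) {
  return `${BASE}/${encodeURIComponent(username)}/?__a=1`;
}

console.log(tagUrl('cat'));
console.log(profileUrl('liona_luona'));
```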
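Query request format
Instagram's XHR requests use the following fixed format:
https://www.instagram.com/graphql/query/?query_hash=....&variables=...
query_hash identifies the kind of request. For example, the hash for loading more of a user's images is always
472f257a40c653c64c666ce877d59d2b
variables is a JSON string that has been URL-encoded.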
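Putting the format above into code: serialize the variables as JSON, URL-encode the result, and append it to the query string. A minimal sketch; `buildQueryUrl` is a hypothetical helper name.

```javascript
// Build a graphql/query URL from a query_hash and a plain variables object.
function buildQueryUrl(queryHash, variables) {
  const encoded = encodeURIComponent(JSON.stringify(variables));
  return 'https://www.instagram.com/graphql/query/' +
    `?query_hash=${queryHash}&variables=${encoded}`;
}

const exampleUrl = buildQueryUrl('472f257a40c653c64c666ce877d59d2b', {
  id: '275237117',
  first: 12,
});
console.log(exampleUrl);
```

Going through `JSON.stringify` rather than hand-written template strings keeps quoting and escaping correct for every value.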
Query string parameters
There are two parameters: query_hash and variables.
Type 1:
When query_hash is 7e1e0c68bbe459cf48cbd5533ddee9d4
(loads information related to the user's suggested friends)
variables:
{
"user_id":"275237117",
"include_chaining":true,
"include_reel":true,
"include_suggested_users":false,
"include_logged_out_extras":false
}
e.g.
https://www.instagram.com/graphql/query/?query_hash=7e1e0c68bbe459cf48cbd5533ddee9d4&variables=%7B%22user_id%22%3A%22275237117%22%2C%22include_chaining%22%3Atrue%2C%22include_reel%22%3Atrue%2C%22include_suggested_users%22%3Afalse%2C%22include_logged_out_extras%22%3Afalse%7D
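When inspecting an XHR copied from the browser, it helps to go the other way and decode the URL back into its hash and variables. A minimal sketch; `parseQueryUrl` is a hypothetical helper name.

```javascript
// Split a graphql/query URL into its query_hash and decoded variables object.
function parseQueryUrl(queryUrl) {
  const params = new URL(queryUrl).searchParams;
  return {
    queryHash: params.get('query_hash'),
    // searchParams.get() already percent-decodes the value.
    variables: JSON.parse(params.get('variables')),
  };
}

const decoded = parseQueryUrl(
  'https://www.instagram.com/graphql/query/' +
  '?query_hash=7e1e0c68bbe459cf48cbd5533ddee9d4' +
  '&variables=%7B%22user_id%22%3A%22275237117%22%2C%22include_chaining%22%3Atrue' +
  '%2C%22include_reel%22%3Atrue%2C%22include_suggested_users%22%3Afalse' +
  '%2C%22include_logged_out_extras%22%3Afalse%7D'
);
console.log(decoded.queryHash);
console.log(decoded.variables);
```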
Type 2:
When query_hash is 472f257a40c653c64c666ce877d59d2b
(loads the user's posts)
variables:
{
"id":"275237117",
"first":12,
"after":"AQBEU_pfdtAHWuxSKwtTEIYRnN8LIHtBASC8bAaQGgpD9r3ZaaVu0qMQzh_qArARwpdM2jt0tprfp35rtcX268DNOFUTBEH7yme7oC8R6mRAug"
}
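Here first is the number of images to fetch, and after is the end_cursor: the position where the previous page ended, so the response starts right after it. A larger first returns more images per request.
A suggested after value is: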
AQBEU_pfdtAHWuxSKwtTEIYRnN8LIHtBASC8bAaQGgpD9r3ZaaVu0qMQzh_qArARwpdM2jt0tprfp35rtcX268DNOFUTBEH7yme7oC8R6mRAug
The end_cursor can be read from the response to the previous query.
e.g.
https://www.instagram.com/graphql/query/?query_hash=472f257a40c653c64c666ce877d59d2b&variables=%7B%22id%22%3A%22275237117%22%2C%22first%22%3A12%2C%22after%22%3A%22AQDgv0_xlXhuHI_YQW8deViqPYXPj7dim6ODe_tAbM6XLhqwbe-Xp4JPEHpLAJ5XGusu-nKdFoCYCVFcF7OkjSscKISMfCYIsEVs8zx9h2rWaQ%22%7D
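Reading the end_cursor from one response and feeding it into the next request can be sketched as below. This is a minimal sketch: `nextPageVariables` is a hypothetical name, the response shape matches the one used by the code later in these notes, and the `has_next_page` field is an assumption that is not shown in the original notes.

```javascript
// Given a parsed response for the "load user posts" query, return the
// variables for the next page, or null when there is nothing more to load.
function nextPageVariables(response, userId, pageSize) {
  const media = response.data.user.edge_owner_to_timeline_media;
  const pageInfo = media.page_info;
  // Assumption: page_info may carry has_next_page; a missing cursor is
  // also treated as the end of the list.
  if (pageInfo.has_next_page === false || !pageInfo.end_cursor) {
    return null;
  }
  return { id: userId, first: pageSize, after: pageInfo.end_cursor };
}

// Hand-made response for illustration only.
const sampleResponse = {
  data: { user: { edge_owner_to_timeline_media: {
    edges: [],
    page_info: { has_next_page: true, end_cursor: 'CURSOR_FROM_RESPONSE' },
  } } },
};
console.log(nextPageVariables(sampleResponse, '275237117', 12));
```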
Type 3:
When query_hash is bf41e22b1c4ba4c9f31b844ebb7d9056
(loads the user's story videos)
query_hash: bf41e22b1c4ba4c9f31b844ebb7d9056
variables: {"reel_ids":["275237117"],"precomposed_overlay":false}
reel_ids holds the user ID.
e.g.
https://www.instagram.com/graphql/query/?query_hash=bf41e22b1c4ba4c9f31b844ebb7d9056&variables=%7B%22reel_ids%22%3A%5B%22275237117%22%5D%2C%22precomposed_overlay%22%3Afalse%7D
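The three request types above can be collected into one lookup table so callers pick a request by name instead of by hash. A minimal sketch; the names `QUERIES` and `queryUrlFor` are hypothetical, and the hashes are the ones quoted in these notes, which may change over time.

```javascript
// One entry per request type: the query_hash and how to build its variables.
const QUERIES = {
  suggestedUsers: {
    hash: '7e1e0c68bbe459cf48cbd5533ddee9d4',
    variables: (userId) => ({
      user_id: userId,
      include_chaining: true,
      include_reel: true,
      include_suggested_users: false,
      include_logged_out_extras: false,
    }),
  },
  timelineMedia: {
    hash: '472f257a40c653c64c666ce877d59d2b',
    variables: (userId, first, after) => ({ id: userId, first, after }),
  },
  reels: {
    hash: 'bf41e22b1c4ba4c9f31b844ebb7d9056',
    variables: (userId) => ({ reel_ids: [userId], precomposed_overlay: false }),
  },
};

// Build the full request URL for a named request type.
function queryUrlFor(kind, userId, ...args) {
  const query = QUERIES[kind];
  if (!query) throw new Error(`Unknown query type: ${kind}`);
  const encoded = encodeURIComponent(JSON.stringify(query.variables(userId, ...args)));
  return `https://www.instagram.com/graphql/query/?query_hash=${query.hash}&variables=${encoded}`;
}

console.log(queryUrlFor('reels', '275237117'));
```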
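Fetching the images from a user's posts
Now let's try to fetch all of a user's posts. First we need the ID of the user we want to query,
so we look it up as follows: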
const https = require('https');

// GET https://www.instagram.com/<username>/<querystring> and resolve with the raw body.
function https_request(username, querystring) {
  return new Promise((resolve, reject) => {
    let chunk = '';
    const options = {
      hostname: 'www.instagram.com',
      port: 443,
      path: `/${username}/${querystring}`,
      method: 'GET'
    };
    const req = https.request(options, (res) => {
      res.on('data', (d) => {
        chunk += d;
      });
      res.on('end', () => {
        resolve(chunk);
      });
    });
    // Reject on network errors so the returned promise never hangs.
    req.on('error', reject);
    req.end();
  });
}

https_request('liona_luona', '?__a=1').then(data => {
  // The numeric user ID is at graphql.user.id.
  console.log(JSON.parse(data).graphql.user.id);
}).catch(console.error);
Now we have the ID.
The call can be changed as follows to look up how many posts the user has published:
https_request('liona_luona', '?__a=1').then(data => {
console.log(JSON.parse(data).graphql.user.edge_owner_to_timeline_media.count)
})
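Both lookups read the same response, so they can be combined into one parsing step that also fails with a readable message when the body is not the expected JSON. A minimal sketch; `extractProfileSummary` is a hypothetical name and the sample body is hand-made for illustration.

```javascript
// Parse a `?__a=1` profile response body into the two values used below.
function extractProfileSummary(body) {
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    throw new Error('Response is not JSON (an HTML page may have been returned)');
  }
  const user = parsed && parsed.graphql && parsed.graphql.user;
  if (!user) {
    throw new Error('Response has no graphql.user field');
  }
  return {
    id: user.id,
    postCount: user.edge_owner_to_timeline_media.count,
  };
}

// Hand-made body with the same shape as the fields read above.
const sampleBody = JSON.stringify({
  graphql: { user: { id: '275237117', edge_owner_to_timeline_media: { count: 42 } } },
});
const summary = extractProfileSummary(sampleBody);
console.log(summary.id, summary.postCount);
```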
With the ID and the post count we can assemble the parameters.
First we build the variables for the query string:
{
"id":"275237117",
"first": 1000, // or the post count found above, since we want everything in one request
"after":"AQBEU_pfdtAHWuxSKwtTEIYRnN8LIHtBASC8bAaQGgpD9r3ZaaVu0qMQzh_qArARwpdM2jt0tprfp35rtcX268DNOFUTBEH7yme7oC8R6mRAug"
}
Send the request:
const https = require('https');

// GET https://www.instagram.com/<path>/<querystring> and resolve with the raw body.
function https_request(path, querystring) {
  return new Promise((resolve, reject) => {
    let chunk = '';
    const options = {
      hostname: 'www.instagram.com',
      port: 443,
      path: `/${path}/${querystring}`,
      method: 'GET'
    };
    const req = https.request(options, (res) => {
      res.on('data', (d) => {
        chunk += d;
      });
      res.on('end', () => {
        resolve(chunk);
      });
    });
    // Reject on network errors so the returned promise never hangs.
    req.on('error', reject);
    req.end();
  });
}

https_request('test001', '?__a=1').then(data => {
  const user = JSON.parse(data).graphql.user;
  // JSON.stringify keeps the id quoted as a string, matching the examples above.
  const variables = JSON.stringify({
    id: user.id,
    first: user.edge_owner_to_timeline_media.count,
    after: 'AQBo_T54D3Isvkn39aEAn5WO1VvQXmLmZzReXHtfgylI-l4IrcVMMRs0Kqz1Q2tu5Jrkcw1ScAfAUddkbVuBiDTXhkHI5jz58I1xj3kxVuzlDQ'
  });
  const urlencodeP = encodeURIComponent(variables);
  console.log(urlencodeP);
  const querystring = `?query_hash=472f257a40c653c64c666ce877d59d2b&variables=${urlencodeP}`;
  return https_request('graphql/query', querystring);
}).then(data => {
  console.log(data);
}).catch(console.error);
Note: if first exceeds 1000, the request times out and a Facebook error page is returned.
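Given that limit, a user with more than 1000 posts has to be read in several requests. The split can be sketched as below; `planPageSizes` is a hypothetical helper name, and the default of 1000 is the limit stated in the note above.

```javascript
// Split a total post count into per-request sizes no larger than the limit.
function planPageSizes(totalPosts, maxPerRequest = 1000) {
  const sizes = [];
  for (let remaining = totalPosts; remaining > 0; remaining -= maxPerRequest) {
    sizes.push(Math.min(remaining, maxPerRequest));
  }
  return sizes;
}

console.log(planPageSizes(2345)); // [ 1000, 1000, 345 ]
```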
Next, we read all of the user's images with an async loop:
const https = require('https');

const articles = [];
const iterateCount = 1000; // posts per request; see the note above about the limit
const username = 'liona_luona';

// GET https://www.instagram.com/<path>/<querystring> and resolve with the raw body.
function https_request(path, querystring) {
  return new Promise((resolve, reject) => {
    let chunk = '';
    const options = {
      hostname: 'www.instagram.com',
      port: 443,
      path: `/${path}/${querystring}`,
      method: 'GET'
    };
    const req = https.request(options, (res) => {
      res.on('data', (d) => {
        chunk += d;
      });
      res.on('end', () => {
        resolve(chunk);
      });
    });
    // Reject on network errors so the returned promise never hangs.
    req.on('error', reject);
    req.end();
  });
}

async function main() {
  const profile = JSON.parse(await https_request(username, '?__a=1'));
  const userID = profile.graphql.user.id;
  const userArticleCount = profile.graphql.user.edge_owner_to_timeline_media.count;
  // Starting cursor; every later cursor is taken from the previous response.
  let current_endCursor = 'AQBo_T54D3Isvkn39aEAn5WO1VvQXmLmZzReXHtfgylI-l4IrcVMMRs0Kqz1Q2tu5Jrkcw1ScAfAUddkbVuBiDTXhkHI5jz58I1xj3kxVuzlDQ';

  for (let i = 0; i < userArticleCount; i += iterateCount) {
    const urlencodeP = encodeURIComponent(JSON.stringify({
      id: userID,
      first: iterateCount,
      after: current_endCursor
    }));
    const querystring = `?query_hash=472f257a40c653c64c666ce877d59d2b&variables=${urlencodeP}`;
    const page = JSON.parse(await https_request('graphql/query', querystring));
    if (page.status === 'fail') {
      // For example "rate limited": stop here instead of waiting forever.
      console.log(page);
      return;
    }
    const media = page.data.user.edge_owner_to_timeline_media;
    media.edges.forEach(article => {
      articles.push(article.node.display_url);
    });
    current_endCursor = media.page_info.end_cursor;
  }
  // All pages have been read.
  console.log(articles);
}

main().catch(err => {
  console.log('No user');
});
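Notes:
1. Since 2017/10/1 only Basic information is available; the other APIs are no longer open.
https://www.instagram.com/developer/changelog/
2. After the scraper has run many times, the following error appears:
{"message": "rate limited", "status": "fail"}
The fix is to increase first and reduce the number of requests made while iterating.
3. Since 2019, `?__a=1` requests must include a cookie.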
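For note 2, besides sending fewer requests, a scraper can detect the rate-limit response and wait before retrying. This is a different technique from the one in the note (exponential backoff), shown as a minimal sketch: all function names are hypothetical, `sendRequest` stands for any function that returns a promise of the response body, and the delay values are arbitrary, not limits documented by Instagram.

```javascript
// True when the body is the rate-limit error shown in note 2.
function isRateLimited(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed.status === 'fail' && parsed.message === 'rate limited';
  } catch (err) {
    return false; // not JSON, or not an object: not the rate-limit error
  }
}

// Waiting times in milliseconds that double after every attempt.
function backoffDelays(attempts, baseMs = 1000) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Call sendRequest() until it returns something other than the rate-limit
// error, sleeping a little longer after each failed attempt.
async function requestWithBackoff(sendRequest, attempts = 4, baseMs = 1000) {
  for (const delay of backoffDelays(attempts, baseMs)) {
    const body = await sendRequest();
    if (!isRateLimited(body)) return body;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('Still rate limited after all retries');
}

console.log(isRateLimited('{"message": "rate limited", "status": "fail"}'));
console.log(backoffDelays(3));
```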
Getting data from a user ID
const https = require('https');

const accounts = ['8'];

// https://i.instagram.com/api/v1/users/{user_id}/info/
const doRequest = (account) => {
  return new Promise((resolve, reject) => {
    let chunk = '';
    const options = {
      host: 'i.instagram.com',
      port: 443,
      path: `/api/v1/users/${account}/info/`,
      method: 'GET',
      headers: {
        // Session values redacted: copy the Cookie header from your own logged-in browser session.
        cookie: 'ig_did=<redacted>; mid=<redacted>; ig_nrcb=1; fbm_124024574287414=base_domain=.instagram.com; fbsr_124024574287414=<redacted>; csrftoken=<redacted>; ds_user_id=<redacted>; sessionid=<redacted>; rur=FTW',
        'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 Instagram 12.0.0.16.90 (iPhone9,4; iOS 10_3_3; en_US; en-US; scale=2.61; gamut=wide; 1080x1920)'
      },
    };
    const req = https.request(options, (res) => {
      //console.log('STATUS: ' + res.statusCode);
      //console.log('HEADERS: ' + JSON.stringify(res.headers));
      res.setEncoding('utf8');
      res.on('data', (data) => {
        chunk += data;
      });
      res.on('end', () => {
        try {
          // Parsing can fail if an HTML error page comes back instead of JSON.
          const data = JSON.parse(chunk);
          console.log(data);
          resolve(data);
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', (e) => {
      console.log('problem with request: ' + e.message);
      reject(e);
    });
    // A GET request has no body, so nothing is written before end().
    req.end();
  });
};

// Start one request per account and wait for all of them.
const promises = accounts.map((account) => {
  console.log(account);
  return doRequest(account);
});

Promise.all(promises).then(() => {
  console.log('finish');
}).catch((err) => {
  console.log(err);
});